arrow-rs.git
7 months agoPrepare for 6.3.0 release: Version updated + CHANGELOG (#981) 6.3.0
Andrew Lamb [Fri, 26 Nov 2021 12:20:17 +0000 (07:20 -0500)] 
Prepare for 6.3.0 release: Version updated + CHANGELOG (#981)

* Update release to 6.3.0

* Add 6.3.0 CHANGELOG

7 months agoadd more error test case and change the code style (#952) (#976)
Andrew Lamb [Fri, 26 Nov 2021 12:05:57 +0000 (07:05 -0500)] 
add more error test case and change the code style (#952) (#976)

Co-authored-by: Kun Liu <liukun@apache.org>
7 months agoSupport reading decimal data in the csv reader if the user provides a schema with a decimal...
Andrew Lamb [Wed, 24 Nov 2021 12:10:55 +0000 (07:10 -0500)] 
Support reading decimal data in the csv reader if the user provides a schema with a decimal data type (#941) (#974)

* support decimal data type for csv reader

* format code and fix lint check

* fix the clippy error

* enhance csv-to-decimal parsing and add more tests

Co-authored-by: Kun Liu <liukun@apache.org>
7 months agoAdding Pretty Print Support For Fixed Size List (#958) (#968)
Andrew Lamb [Tue, 23 Nov 2021 18:25:11 +0000 (13:25 -0500)] 
Adding Pretty Print Support For Fixed Size List (#958) (#968)

* Inferring 2. as Float64 for issue #929

* Adding pretty print support for fixed size list array

* fixing linting errors

* adding null row to test

Co-authored-by: Brian Rackle <brianrackle@hotmail.com>
7 months agoFix bug in temporal utilities due to DST being ignored. (#955) (#967)
Andrew Lamb [Tue, 23 Nov 2021 18:25:00 +0000 (13:25 -0500)] 
Fix bug in temporal utilities due to DST being ignored. (#955) (#967)

* Check behaviour of temporal utilities for DST.

* Fix temporal util bug ignoring dst.

* Refactor macro for efficiency.

Co-authored-by: Navin <navin@novemberkilo.com>
7 months agoInferring 2. as Float64 for issue #929 (#950) (#966)
Andrew Lamb [Tue, 23 Nov 2021 18:24:32 +0000 (13:24 -0500)] 
Inferring 2. as Float64 for issue #929 (#950) (#966)

Co-authored-by: Brian Rackle <brianrackle@hotmail.com>
7 months agoFix CI for latest nightly (#970) (#973)
Andrew Lamb [Tue, 23 Nov 2021 15:05:00 +0000 (10:05 -0500)] 
Fix CI for latest nightly (#970) (#973)

* Fix arrow doc examples

* more cleanup

7 months agoFix primitive sort when input contains more nulls than the given sort limit (#954...
Andrew Lamb [Mon, 22 Nov 2021 20:58:55 +0000 (15:58 -0500)] 
Fix primitive sort when input contains more nulls than the given sort limit (#954) (#965)

Co-authored-by: Jörn Horstmann <git@jhorstmann.net>
7 months agoUpdate comfy-table to 5.0 (#957) (#964)
Andrew Lamb [Mon, 22 Nov 2021 20:55:35 +0000 (15:55 -0500)] 
Update comfy-table to 5.0 (#957) (#964)

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
7 months agoFix csv writing of timestamps to show timezone. (#849) (#963)
Andrew Lamb [Mon, 22 Nov 2021 20:55:29 +0000 (15:55 -0500)] 
Fix csv writing of timestamps to show timezone. (#849) (#963)

* Write timestamps (in csvs) with timezone.

* More tests and more verbose naming.

* Please linter.

* Please clippy.

* Cleanup based on review feedback.

Co-authored-by: Navin <navin@novemberkilo.com>
7 months agoAdding ability to parse float from number with leading decimal (#831) (#962)
Andrew Lamb [Mon, 22 Nov 2021 20:55:19 +0000 (15:55 -0500)] 
Adding ability to parse float from number with leading decimal (#831) (#962)

* Adding ability to parse float from number with leading decimal

* Fixing deprecated std::usize::MAX constant per https://doc.rust-lang.org/core/usize/constant.MAX.html and making consistent with other usages

* Add test case for 2. and issue link
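
The inference cases named above (a leading decimal point such as `.5`, and a trailing one such as `2.`) can be illustrated with a minimal sketch. It is not the csv reader's actual code: `InferredType` and `infer_type` are hypothetical names, and the sketch leans on the standard library's `str::parse`, which accepts both spellings as `f64` but rejects them as `i64`.

```rust
/// Hypothetical stand-in for the column types a CSV reader might infer.
#[derive(Debug, PartialEq)]
enum InferredType {
    Int64,
    Float64,
    Utf8,
}

/// Try the narrowest type first: integer, then float, then fall back to string.
/// Assumption: this ordering is illustrative, not the reader's real logic.
fn infer_type(cell: &str) -> InferredType {
    let cell = cell.trim();
    if cell.parse::<i64>().is_ok() {
        InferredType::Int64
    } else if cell.parse::<f64>().is_ok() {
        // "2." and ".5" are not valid integers, but both parse as f64.
        InferredType::Float64
    } else {
        InferredType::Utf8
    }
}
```

Under these assumptions `infer_type("2.")` and `infer_type(".5")` both yield `Float64`, while `infer_type("2")` stays `Int64`. Note that `f64` parsing also accepts spellings such as `NaN` and `inf`, which a real reader may want to treat differently.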

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Brian Rackle <brianrackle@hotmail.com>
7 months agoadd ilike comparator (#874) (#961)
Andrew Lamb [Mon, 22 Nov 2021 20:55:07 +0000 (15:55 -0500)] 
add ilike comparator (#874) (#961)

* add ilike comparator

* add ilike comparator

Co-authored-by: Jordan Deitch <jdeitch@digitalocean.com>
Co-authored-by: Jordan Deitch <jwdeitch@users.noreply.github.com>
7 months agoRemove unpassable cargo publish check from verify-release-candidate.sh (#882) (#949)
Andrew Lamb [Mon, 15 Nov 2021 18:58:16 +0000 (13:58 -0500)] 
Remove unpassable cargo publish check from verify-release-candidate.sh (#882) (#949)

7 months agoPrepare for 6.2.0 release (#947) 6.2.0
Andrew Lamb [Fri, 12 Nov 2021 11:56:10 +0000 (06:56 -0500)] 
Prepare for 6.2.0 release (#947)

* Update version to 6.2.0

* Add CHANGELOG for 6.2.0

7 months agoFix validation for offsets of StructArrays (#942) (#946)
Andrew Lamb [Fri, 12 Nov 2021 11:49:08 +0000 (06:49 -0500)] 
Fix validation for offsets of StructArrays (#942) (#946)

* reproduce validation error

* Fix validation bug

Co-authored-by: Ben Chambers <bjchambers@gmail.com>
7 months agoimplement take kernel for null arrays (#939) (#944)
Andrew Lamb [Fri, 12 Nov 2021 11:18:19 +0000 (06:18 -0500)] 
implement take kernel for null arrays (#939) (#944)

Co-authored-by: Ben Chambers <35960+bjchambers@users.noreply.github.com>
7 months agoadd checker for appending i128 to decimal builder (#928) (#943)
Andrew Lamb [Fri, 12 Nov 2021 11:18:09 +0000 (06:18 -0500)] 
add checker for appending i128 to decimal builder (#928) (#943)

* add check for appending i128 to decimal builder

* remove the ArrowError(DecimalError)

Co-authored-by: Kun Liu <liukun@apache.org>
7 months agoValidate arguments to ArrayData::new and null bit buffer and buffers (#810) (#936)
Andrew Lamb [Tue, 9 Nov 2021 13:58:02 +0000 (08:58 -0500)] 
Validate arguments to ArrayData::new and null bit buffer and buffers (#810) (#936)

* Validate arguments to ArrayData::new: null bit buffer and buffers

* Rename is_int_type to is_dictionary_key_type()

* Correctly handle self.offset in offsets buffer

* Consolidate checks

* Fix test output

7 months agofix some warning about unused variables in panic tests (#894) (#933)
Andrew Lamb [Tue, 9 Nov 2021 12:25:05 +0000 (07:25 -0500)] 
fix some warning about unused variables in panic tests (#894) (#933)

Co-authored-by: Jiayu Liu <Jimexist@users.noreply.github.com>
7 months agofix some clippy warnings (#896) (#930)
Andrew Lamb [Tue, 9 Nov 2021 12:24:34 +0000 (07:24 -0500)] 
fix some clippy warnings (#896) (#930)

Co-authored-by: Jiayu Liu <Jimexist@users.noreply.github.com>
7 months agofeat(ipc): add support for deserializing messages with nested dictionary fields ...
Andrew Lamb [Tue, 9 Nov 2021 12:24:20 +0000 (07:24 -0500)] 
feat(ipc): add support for deserializing messages with nested dictionary fields (#923) (#931)

* feat(ipc): read a message containing nested dictionary fields

* Apply suggestions from code review

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* address lints

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Helgi Kristvin Sigurbjarnarson <helgi@lacework.net>
7 months agotest moving out (#895) (#932)
Andrew Lamb [Tue, 9 Nov 2021 12:24:09 +0000 (07:24 -0500)] 
test moving out (#895) (#932)

Co-authored-by: Jiayu Liu <Jimexist@users.noreply.github.com>
7 months agoCherry pick Automatically retry failed MIRI runs to work around intermittent failures...
Andrew Lamb [Tue, 9 Nov 2021 12:23:42 +0000 (07:23 -0500)] 
Cherry pick Automatically retry failed MIRI runs to work around intermittent failures  (#934)

* Automatically retry failed MIRI runs to work around intermittent failures (#922)

* Move MIRI checks into a shell script

* add retry loop

* Do not use cache for miri

7 months agoUpdate mod.rs (#909) (#919)
Andrew Lamb [Fri, 5 Nov 2021 17:46:37 +0000 (13:46 -0400)] 
Update mod.rs (#909) (#919)

Co-authored-by: kingeasternsun <kingeasternsun@gmail.com>
7 months agoMark boolean kernels public (#913) (#920)
Andrew Lamb [Fri, 5 Nov 2021 17:46:29 +0000 (13:46 -0400)] 
Mark boolean kernels public (#913) (#920)

7 months agodoc example mistype (#904) (#918)
Andrew Lamb [Fri, 5 Nov 2021 10:52:06 +0000 (06:52 -0400)] 
doc example  mistype (#904) (#918)

Co-authored-by: kingeasternsun <kingeasternsun@gmail.com>
7 months agoallow null array to be cast to all other types (#884) (#917)
Andrew Lamb [Fri, 5 Nov 2021 10:51:42 +0000 (06:51 -0400)] 
allow null array to be cast to all other types (#884) (#917)

Co-authored-by: Jiayu Liu <Jimexist@users.noreply.github.com>
7 months agoFix instances of UB that cause tests to not pass under miri (#878) (#916)
Andrew Lamb [Fri, 5 Nov 2021 10:51:30 +0000 (06:51 -0400)] 
Fix instances of UB that cause tests to not pass under miri (#878) (#916)

* Fix unaligned access in bit-packing

* Fix creation of unaligned reference in murmur_hash2_64a

* Remove now-unnecessary unsafe

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Ben Kimock <kimockb@gmail.com>
7 months agofeat(ipc): Support writing dictionaries nested in structs and unions (#870) (#915)
Andrew Lamb [Fri, 5 Nov 2021 10:51:22 +0000 (06:51 -0400)] 
feat(ipc): Support writing dictionaries nested in structs and unions (#870) (#915)

* feat(ipc): Support for writing dictionaries nested in structs and unions

Dictionaries are lost when serializing a RecordBatch for IPC, producing
invalid arrow data. This PR changes encoded_batch to recursively find
all dictionary fields within the schema (currently only in structs and
unions) so nested dictionaries are properly serialized.

* address lint and clippy
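
The recursive search for dictionary fields described in this commit can be sketched as follows. This is a simplified model, not the arrow IPC writer: `DataType`, `Field`, `field`, and `collect_dictionary_fields` are hypothetical, pared-down stand-ins, and only struct and union children are descended into, matching the scope stated in the message.

```rust
/// Hypothetical, pared-down schema model (not the arrow crate's types).
enum DataType {
    Int32,
    Utf8,
    /// Key type and value type of a dictionary-encoded column.
    Dictionary(Box<DataType>, Box<DataType>),
    Struct(Vec<Field>),
    Union(Vec<Field>),
}

struct Field {
    name: String,
    data_type: DataType,
}

/// Small constructor helper to keep examples short.
fn field(name: &str, data_type: DataType) -> Field {
    Field { name: name.to_string(), data_type }
}

/// Walk a field and record the name of every dictionary field found,
/// descending into struct and union children.
fn collect_dictionary_fields<'a>(field: &'a Field, out: &mut Vec<&'a str>) {
    match &field.data_type {
        DataType::Dictionary(_, _) => out.push(field.name.as_str()),
        DataType::Struct(children) | DataType::Union(children) => {
            for child in children {
                collect_dictionary_fields(child, out);
            }
        }
        // Leaf types without children carry no nested dictionaries.
        _ => {}
    }
}
```

A top-level-only scan would miss a dictionary that sits inside a struct or a union, which is the failure mode the commit message describes.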

Co-authored-by: Helgi Kristvin Sigurbjarnarson <helgikrs@gmail.com>
7 months agoFix references to changelog (#905)
Andrew Lamb [Tue, 2 Nov 2021 12:59:16 +0000 (08:59 -0400)] 
Fix references to changelog (#905)

8 months agoRelease 6.1.0 (#880) 6.1.0
Andrew Lamb [Fri, 29 Oct 2021 13:27:02 +0000 (09:27 -0400)] 
Release 6.1.0 (#880)

* Update changelog for 6.1 release

* Update version to 6.1.0

8 months agoimplement eq_dyn and neq_dyn (#858) (#867)
Andrew Lamb [Wed, 27 Oct 2021 12:44:25 +0000 (08:44 -0400)] 
implement eq_dyn and neq_dyn (#858) (#867)

Co-authored-by: Jiayu Liu <Jimexist@users.noreply.github.com>
8 months agofix: fix a bug in offset calculation for unions (#863) (#871)
Andrew Lamb [Wed, 27 Oct 2021 11:29:42 +0000 (07:29 -0400)] 
fix: fix a bug in offset calculation for unions (#863) (#871)

The `value_offset` function only read the least significant byte in the
offset array, causing issues with unions with more than 255 rows of any
given variant. Fix the issue by reading the entire i32 offset and add a
unit test.
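
The difference between reading one byte and reading the whole `i32` can be shown with a small standalone sketch. The function names are hypothetical and the buffer layout (little-endian `i32` values packed back to back) is an assumption for illustration; this is not the `value_offset` implementation itself.

```rust
/// Buggy variant: looks only at the least significant byte of each offset.
fn value_offset_lsb_only(offsets: &[u8], index: usize) -> i32 {
    offsets[index * 4] as i32
}

/// Fixed variant: decodes all four little-endian bytes of the i32 offset.
fn value_offset_full(offsets: &[u8], index: usize) -> i32 {
    let s = index * 4;
    i32::from_le_bytes([offsets[s], offsets[s + 1], offsets[s + 2], offsets[s + 3]])
}

/// Helper that packs i32 offsets into a little-endian byte buffer.
fn encode_offsets(values: &[i32]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(values.len() * 4);
    for v in values {
        buf.extend_from_slice(&v.to_le_bytes());
    }
    buf
}
```

With offsets `[0, 255, 256, 300]`, both variants agree up to 255, but the one-byte read wraps from 256 onward (256 reads as 0, 300 reads as 44), which is why the problem only shows up once a variant has enough rows.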

Co-authored-by: Helgi Kristvin Sigurbjarnarson <helgikrs@gmail.com>
8 months agoadd lt_bool, lt_eq_bool, gt_bool, gt_eq_bool (#860) (#868)
Andrew Lamb [Wed, 27 Oct 2021 11:29:33 +0000 (07:29 -0400)] 
add lt_bool, lt_eq_bool, gt_bool, gt_eq_bool (#860) (#868)

Co-authored-by: Jiayu Liu <Jimexist@users.noreply.github.com>
8 months agoTest out new tarpaulin version (#852) (#866)
Andrew Lamb [Wed, 27 Oct 2021 10:46:17 +0000 (06:46 -0400)] 
Test out new tarpaulin version (#852) (#866)

8 months agofix(ipc): Support serializing structs containing dictionaries (#848) (#865)
Andrew Lamb [Wed, 27 Oct 2021 10:46:11 +0000 (06:46 -0400)] 
fix(ipc): Support serializing structs containing dictionaries (#848) (#865)

* fix(ipc): Support serializing structs containing dictionaries

Dictionary fields nested in structs were not properly marked as
dictionary fields when serializing to fb.

* style: cargo fmt

Co-authored-by: Helgi Kristvin Sigurbjarnarson <helgikrs@gmail.com>
8 months agoImplement boolean equality kernels (#844) (#857)
Andrew Lamb [Mon, 25 Oct 2021 10:51:41 +0000 (06:51 -0400)] 
Implement boolean equality kernels (#844) (#857)

* Implement boolean equality kernels

* Respect offset

* Simplify

Co-authored-by: Daniël Heres <danielheres@gmail.com>
8 months agoCherry pick fix parquet_derive with default features (and fix cargo publish) (#856)
Andrew Lamb [Mon, 25 Oct 2021 10:51:22 +0000 (06:51 -0400)] 
Cherry pick fix parquet_derive with default features (and fix cargo publish) (#856)

* fix parquet_derive with default features (and fix `cargo publish`) (#837)

* Run all tests and do dry runs of cargo publish

* Add test for building parquet derive with default features

* fix feature flags in parquet crate

* fixup rat

* fix default feature test

* Update parquet_derive/test/dependency/default-features/Cargo.toml

* Remove merge issue

8 months agoUse kernel utility for parsing timestamps in csv reader. (#832) (#853)
Andrew Lamb [Sun, 24 Oct 2021 11:09:43 +0000 (07:09 -0400)] 
Use kernel utility for parsing timestamps in csv reader. (#832) (#853)

* Use kernel utility for parsing timestamps in csvs.

* Remove cruft.

* Cleanup.

* Lint.

* Remove erroneous stringify.

Co-authored-by: Navin <navin@novemberkilo.com>
8 months agoUpdate README.md (#834) (#854)
Andrew Lamb [Sun, 24 Oct 2021 11:09:26 +0000 (07:09 -0400)] 
Update README.md (#834) (#854)

fix readme with invalid markdown syntax

Co-authored-by: Jiayu Liu <Jimexist@users.noreply.github.com>
8 months ago[MINOR] Delete temp file from docs (#836) (#855)
Andrew Lamb [Sun, 24 Oct 2021 11:08:51 +0000 (07:08 -0400)] 
[MINOR] Delete temp file from docs (#836) (#855)

* Delete temp file from docs

* fix

* Use gitignore instead

8 months agoForce fresh cargo cache key in CI (#839) (#851)
Andrew Lamb [Sat, 23 Oct 2021 21:18:57 +0000 (17:18 -0400)] 
Force fresh cargo cache key in CI (#839) (#851)

8 months ago[Minor] Fix clippy errors with new rust version (1.56) and float formatting with...
Andrew Lamb [Sat, 23 Oct 2021 12:34:02 +0000 (08:34 -0400)] 
[Minor] Fix clippy errors with new rust version (1.56) and float formatting with nightly (#845) (#850)

* Clippy fixes

* Test formatting fixes

* Test formatting fixes

* Fixup

Co-authored-by: Daniël Heres <danielheres@gmail.com>
8 months agoUpdate version to 6.0.0 (#828) 6.0.0
Andrew Lamb [Wed, 13 Oct 2021 19:14:49 +0000 (15:14 -0400)] 
Update version to 6.0.0 (#828)

8 months agoAdd Changelog for 6.0.0 (#827)
Andrew Lamb [Wed, 13 Oct 2021 19:05:26 +0000 (15:05 -0400)] 
Add Changelog for 6.0.0 (#827)

* Add Changelog

* Cleanup Changelog

8 months agoReplace `ArrayData::new()` with `ArrayData::try_new()` and `unsafe ArrayData::new_unc...
Andrew Lamb [Wed, 13 Oct 2021 17:17:32 +0000 (13:17 -0400)] 
Replace `ArrayData::new()` with `ArrayData::try_new()` and `unsafe ArrayData::new_unchecked` (#822)

* Replace `ArrayData::new()` with `ArrayData::try_new()` and `unsafe ArrayData::new_unchecked`

* Fix compile for simd

* remove unsafe in benches

8 months agoJSON reader - empty nested list should not create child value (#826)
Wakahisa [Wed, 13 Oct 2021 13:46:07 +0000 (15:46 +0200)] 
JSON reader - empty nested list should not create child value (#826)

* JSON reader - empty nested list should not create child value

* PR review

8 months agoAdd support for parsing timezone using chrono-tz (#824)
Sumit [Wed, 13 Oct 2021 12:59:33 +0000 (14:59 +0200)] 
Add support for parsing timezone using chrono-tz (#824)

- add chrono-tz as an optional dependency
- try parse using chrono for the numeric format
- if not then try using chrono-tz if present
- return error if neither result in FixedOffset

8 months agohandle tz while extracting second/minute/hour from temporal arrays (#771)
Sumit [Mon, 11 Oct 2021 20:16:28 +0000 (22:16 +0200)] 
handle tz while extracting second/minute/hour from temporal arrays (#771)

The patch rewrites the behaviour using macros to capture the
repetitive nature of the operations

8 months agoFewer ByteArray allocations when writing binary columns (#820)
Wakahisa [Mon, 11 Oct 2021 19:59:11 +0000 (21:59 +0200)] 
Fewer ByteArray allocations when writing binary columns (#820)

* split benchmarks of primitive arrays

* add list benches

* Allocate one ByteArray per row group write

* enumerate

8 months ago[nit] update readme.md and reformat (#821)
Jiayu Liu [Fri, 8 Oct 2021 17:52:49 +0000 (01:52 +0800)] 
[nit] update readme.md and reformat (#821)

* update readme.md and reformat

* update arrow crate

8 months agoSeparate parquet writer benchmarks (#818)
Wakahisa [Thu, 7 Oct 2021 10:47:38 +0000 (12:47 +0200)] 
Separate parquet writer benchmarks (#818)

* split benchmarks of primitive arrays

* add list benches

8 months agoFix null count when casting ListArray (#816)
Andrew Lamb [Thu, 7 Oct 2021 00:29:04 +0000 (20:29 -0400)] 
Fix null count when casting ListArray (#816)

9 months agoAdd Parquet writer example to docs (#797)
Matthew Turner [Thu, 30 Sep 2021 19:10:13 +0000 (15:10 -0400)] 
Add Parquet writer example to docs (#797)

* First example parquet writer

* Add WriterProp examples

* Add missing imports

* Remove options and run doctest

* One more section to run

* no_run on read example

* Make reader run test

* Fix get_schema_by_cols

9 months agoexpose buffer ops (#809)
Ben Chambers [Thu, 30 Sep 2021 19:09:46 +0000 (12:09 -0700)] 
expose buffer ops (#809)

9 months agoparquet: Avoid NaN check for non-floats (#798)
Kornelijus Survila [Thu, 30 Sep 2021 10:43:57 +0000 (04:43 -0600)] 
parquet: Avoid NaN check for non-floats (#798)

It was especially expensive for `ByteArray` columns, potentially taking as
long as the rest of encoding.

9 months agoRemove extra quote in release instructions (#804)
Andrew Lamb [Sun, 26 Sep 2021 11:10:58 +0000 (07:10 -0400)] 
Remove extra quote in release instructions (#804)

9 months agoDoctests for DictionaryArrays. (#805)
Navin [Sun, 26 Sep 2021 11:04:57 +0000 (21:04 +1000)] 
Doctests for DictionaryArrays. (#805)

9 months agoMake parquet's optional arrow dependency skip the default features (#801)
msalib [Fri, 24 Sep 2021 15:38:13 +0000 (11:38 -0400)] 
Make parquet's optional arrow dependency skip the default features (#801)

* Make parquet only depend on minimal arrow features

parquet depends on arrow but arrow by default has a large number of features. That means that users who depend on parquet get the full arrow feature set, even if they don't need it. But parquet itself only needs the ipc feature.

* ipc is not even needed

9 months agoadd wasm32 to hash, fix wasm32 build (#787)
Mike Seddon [Tue, 21 Sep 2021 16:16:46 +0000 (02:16 +1000)] 
add wasm32 to hash, fix wasm32 build (#787)

* add wasm32 to hash

* cargo fmt

9 months agoDoctests for arrays - via collect method. (#785)
Navin [Tue, 21 Sep 2021 16:15:40 +0000 (02:15 +1000)] 
Doctests for arrays - via collect method. (#785)

9 months agoMake BooleanBufferBuilder get_bit not require mutable reference (#784)
Boaz [Sun, 19 Sep 2021 15:05:39 +0000 (18:05 +0300)] 
Make BooleanBufferBuilder get_bit not require mutable reference (#784)

9 months agofix: nanosecond timestamp scaling during string conversion (#780) (#781)
Ilya Biryukov [Fri, 17 Sep 2021 16:06:35 +0000 (19:06 +0300)] 
fix: nanosecond timestamp scaling during string conversion (#780) (#781)

Some datetime formats passed to `string_to_timestamp_nanos` were parsing
milliseconds as nanoseconds.

E.g. `1970-01-01 00:00:00.123` would parse as `123` nanoseconds instead
of milliseconds.
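
The scaling issue can be made concrete with a short sketch. It is not the body of `string_to_timestamp_nanos`; `fraction_to_nanos` is a hypothetical helper that only handles the digits after the decimal point, on the assumption that the fraction has between one and nine digits.

```rust
/// Convert the digits after the decimal point of a seconds field into
/// nanoseconds. Returns None for empty input, non-digits, or more than
/// nine digits (finer than nanosecond precision).
fn fraction_to_nanos(fraction: &str) -> Option<u32> {
    if fraction.is_empty()
        || fraction.len() > 9
        || !fraction.bytes().all(|b| b.is_ascii_digit())
    {
        return None;
    }
    let digits: u32 = fraction.parse().ok()?;
    // Scale by the number of missing digits: "123" means 123 milliseconds,
    // so it must be multiplied by 10^6 to become nanoseconds.
    Some(digits * 10u32.pow(9 - fraction.len() as u32))
}
```

Here `fraction_to_nanos("123")` gives 123,000,000 nanoseconds, whereas treating the raw digits as nanoseconds would give 123, the behaviour the commit describes as the bug.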

9 months agoAdd support for riscv64 (#769)
Felix Yan [Thu, 16 Sep 2021 21:34:38 +0000 (05:34 +0800)] 
Add support for riscv64 (#769)

* Fix riscv64 target_arch

This should be defined for riscv64 instead, as `riscv` doesn't match it.
I have no idea for riscv32 though.

* parquet: Use murmur_hash2_64a for riscv64

9 months agochore: Reduce the amount of code generated by monomorphization (#715)
Markus Westerlind [Mon, 13 Sep 2021 16:55:56 +0000 (18:55 +0200)] 
chore: Reduce the amount of code generated by monomorphization (#715)

* chore: Reduce the number of instantiations of take* (-3%)

Many types have the same native type, so simplifying these functions to
work directly with native types reduces the number of instantiations.

Reduces the number of llvm lines generated by ~3%

* chore: Shrink try_from_trusted_len_iter (-0.5%)

* chore: Only compile sort_primitive per native type (-8.5%)

* chore: Make the inner take_ functions less generic (-3.5%)

* chore: Don't duplicate sort_list (-13%)

* chore: Extract the "valid" sorting (-7%)

* chore: Extract the array sorter (-1%)
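
The idea behind these reductions, working on native types so that logical types sharing a native type also share one instantiation, can be sketched as below. The trait and type names are hypothetical stand-ins rather than the arrow crate's definitions.

```rust
/// Hypothetical logical-type trait: several logical types may share a native type.
trait LogicalType {
    type Native: Copy;
}

struct Int32Type;
struct Date32Type;

impl LogicalType for Int32Type {
    type Native = i32;
}
impl LogicalType for Date32Type {
    type Native = i32;
}

/// The heavy lifting is generic only over the native type, so the compiler
/// emits one copy for `i32` no matter how many logical types map to it.
fn take_native<N: Copy>(values: &[N], indices: &[usize]) -> Vec<N> {
    indices.iter().map(|&i| values[i]).collect()
}

/// Thin per-logical-type wrapper: still instantiated per `T`, but it only
/// forwards to the shared native implementation.
fn take<T: LogicalType>(values: &[T::Native], indices: &[usize]) -> Vec<T::Native> {
    take_native(values, indices)
}
```

Calling `take::<Int32Type>` and `take::<Date32Type>` both end up in the single `take_native::<i32>` instantiation, which is the mechanism by which the amount of generated code shrinks.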

9 months agofix: Support length on slices with null (#745)
Ben Chambers [Sun, 12 Sep 2021 11:02:34 +0000 (04:02 -0700)] 
fix: Support length on slices with null (#745)

* fix: Support length on slices with null

* actually test length

9 months agoAdded PartialEq to RecordBatch (#750)
Matthew Turner [Sat, 11 Sep 2021 16:52:23 +0000 (12:52 -0400)] 
Added PartialEq to RecordBatch (#750)

* Added PartialEq to RecordBatch

* derive PartialEq and add tests

9 months agoExport `RowColumnIter` to fix doc (#763)
Richard [Sat, 11 Sep 2021 07:53:16 +0000 (15:53 +0800)] 
Export `RowColumnIter` to fix doc (#763)

* Export RowColumnIter to fix doc

* Add documentation for RowColumnIter

* Improve documentation for RowColumnIter

9 months agoUse latest nightly in CI to Fix CI for SIMD (#767)
Jorge Leitao [Fri, 10 Sep 2021 17:16:55 +0000 (18:16 +0100)] 
Use latest nightly in CI to Fix CI for SIMD  (#767)

* Fixed CI for SIMD

* Updated nightly for wasm

9 months agoUpdate Bitmap::len to return bits (#749)
Matthew Turner [Thu, 9 Sep 2021 21:34:01 +0000 (17:34 -0400)] 
Update Bitmap::len to return bits (#749)

9 months agoOptimize array::transform::utils::set_bits (#716)
mathiaspeters-sig [Thu, 9 Sep 2021 21:31:59 +0000 (23:31 +0200)] 
Optimize array::transform::utils::set_bits (#716)

* Added tests

* Updated tests and improved implementation

* Cleanup

* Stopped collecting bytes before writing to write_data

* Added tests

* Cleanup and comments

* Fixed clippy warning

* Fixed an endianness issue

* Fixed comments and naming

* Made tests less prone to off-by-n errors

9 months agofix: Scalar math operations on slices (#743)
Ben Chambers [Thu, 9 Sep 2021 20:25:46 +0000 (13:25 -0700)] 
fix: Scalar math operations on slices (#743)

* fix: Scalar math operations on slices

* remove conditional

9 months agofix: new_null_array for structs (#736)
Ben Chambers [Thu, 9 Sep 2021 20:02:10 +0000 (13:02 -0700)] 
fix: new_null_array for structs (#736)

9 months agofix: Allow parquet to be compiled without arrow (fix --no-default-features) (#731)
Markus Westerlind [Thu, 9 Sep 2021 20:00:05 +0000 (22:00 +0200)] 
fix: Allow parquet to be compiled without arrow (fix --no-default-features) (#731)

* fix: Allow parquet to be compiled without arrow

`--no-default-features` is currently broken in the parquet crate due to
arrow being required. With some small tweaks it can be made entirely
optional.

Added some extra steps to catch when `--no-default-features` does not
work on CI as well.

* Fix CI

* Fix path on CI

* --features test_common is needed for clippy

9 months agoAdd `append_nulls` and `append_trusted_len_iter` to `PrimitiveBuilder` (#728)
Ben Chambers [Thu, 9 Sep 2021 19:58:39 +0000 (12:58 -0700)] 
Add `append_nulls` and `append_trusted_len_iter` to `PrimitiveBuilder` (#728)

* stub out impl

* mark unsafe

* add tests

9 months agoUpgrade lexical-core to 0.8 (#748)
Daniël Heres [Sun, 5 Sep 2021 10:21:16 +0000 (12:21 +0200)] 
Upgrade lexical-core to 0.8 (#748)

* Upgrade lexical-core

* Use num instead

9 months agofix: Comparisons against scalar slices (#741)
Ben Chambers [Fri, 3 Sep 2021 00:15:09 +0000 (17:15 -0700)] 
fix: Comparisons against scalar slices (#741)

9 months agofix: Handle slices in unary kernel (#739)
Ben Chambers [Fri, 3 Sep 2021 00:12:47 +0000 (17:12 -0700)] 
fix: Handle slices in unary kernel (#739)

9 months agoRemove optional prettytable-rs dependency (#737)
Krisztián Szűcs [Thu, 2 Sep 2021 19:54:50 +0000 (21:54 +0200)] 
Remove optional prettytable-rs dependency (#737)

10 months agoPyO3 bridge for pyarrow interoperability (#691)
Krisztián Szűcs [Wed, 1 Sep 2021 10:37:35 +0000 (12:37 +0200)] 
PyO3 bridge for pyarrow interoperability  (#691)

* PyO3 bridge for pyarrow interoperability

* Fix clippy warnings

* Simplify error handling

* Fix clippy warnings

* Fix integration test workflow

* Address review comments

* Virtualenv

* Fix integration test

10 months agoFix decimal repr in schema (#721)
Sergii Mikhtoniuk [Tue, 31 Aug 2021 11:36:58 +0000 (04:36 -0700)] 
Fix decimal repr in schema (#721)

Fixes #713

10 months agoFix decimal value_as_string (#722)
Sergii Mikhtoniuk [Sun, 29 Aug 2021 10:22:16 +0000 (03:22 -0700)] 
Fix decimal value_as_string (#722)

Fixes #710

10 months agoAdd a note on rust compiler testing and compatibility (#726)
Andrew Lamb [Sat, 28 Aug 2021 16:22:24 +0000 (12:22 -0400)] 
Add a note on rust compiler testing and compatibility (#726)

* Add a note on rust compiler testing and compatibility

* prettier

10 months agoParquet Derive: remove obscure feature flags, make chrono time emit converted type...
Xavier Lange [Sat, 28 Aug 2021 11:17:39 +0000 (07:17 -0400)] 
Parquet Derive: remove obscure feature flags, make chrono time emit converted type (#712)

* remove feature flags, make timestamp emit converted types

* remove tracking numbers

* NaiveDateTime emits converted type

* formatting

* formatting

10 months agoSupport arrow readers for strings with DELTA_BYTE_ARRAY encoding (#709)
Ilya Biryukov [Thu, 26 Aug 2021 11:54:44 +0000 (14:54 +0300)] 
Support arrow readers for strings with DELTA_BYTE_ARRAY encoding (#709)

* Support arrow readers for strings with DELTA_BYTE_ARRAY encoding

* Review fixes

1. move slice init out of the loop,
2. add tests for nulls,
3. use `debug_assert` for programming error assertion.

10 months agofix edition 2021 (#714)
Jiayu Liu [Thu, 26 Aug 2021 11:50:59 +0000 (19:50 +0800)] 
fix edition 2021 (#714)

10 months agoImplement `regexp_matches_utf8` (#706)
baishen [Thu, 26 Aug 2021 11:50:04 +0000 (06:50 -0500)] 
Implement `regexp_matches_utf8` (#706)

* impl regexp_matches_utf8

* fix clippy

* add bench

* optimize

10 months agoSupport binary data type in `build_struct_array`. (#702)
Yuan Zhou [Sat, 21 Aug 2021 10:33:11 +0000 (18:33 +0800)] 
Support binary data type in `build_struct_array`. (#702)

* Support binary data type in `build_struct_array`.

* Modify test case.

* cargo fmt

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
10 months agoDoctest for PrimitiveArray using from_iter_values. (#694)
Navin [Thu, 19 Aug 2021 20:45:54 +0000 (06:45 +1000)] 
Doctest for PrimitiveArray using from_iter_values. (#694)

* Doctest for PrimitiveArray using from_iter_values.

* Better example for building a PrimitiveArray.

10 months agoUpdate dev README with fancier regular expression for maintenance release notes ...
Andrew Lamb [Wed, 18 Aug 2021 16:33:47 +0000 (12:33 -0400)] 
Update dev README with fancier regular expression for maintenance release notes (#687)

* Update dev README with fancier regular expression

I am trying to incrementally improve the release notes

* Add bullet

* prettier

* tweak

10 months agoChange to comfy-table from prettytable-rs (#656)
Chojan Shang [Mon, 16 Aug 2021 21:11:05 +0000 (05:11 +0800)] 
Change to comfy-table from prettytable-rs (#656)

* Change to comfy-table

Signed-off-by: Chojan Shang <psiace@outlook.com>
* Apply review

Signed-off-by: Chojan Shang <psiace@outlook.com>
10 months agoallow casting from Timestamp based arrays to utf8 (#664)
Sumit [Mon, 16 Aug 2021 21:09:46 +0000 (23:09 +0200)] 
allow casting from Timestamp based arrays to utf8 (#664)

The change uses the existing `PrimitiveArray::value_as_datetime` to
support casting from `Timestamp(_,_)` to `[Large]Utf8`.

10 months agoAdd get_bit to BooleanBufferBuilder (#693)
Boaz [Sun, 15 Aug 2021 18:58:14 +0000 (21:58 +0300)] 
Add get_bit to BooleanBufferBuilder (#693)

* Add get_bit to BooleanBufferBuilder

* fix clippy

10 months agoAllow creation of String arrays from &Option<&str> iterators (#680)
Pete Koomen [Thu, 12 Aug 2021 15:42:28 +0000 (08:42 -0700)] 
Allow creation of String arrays from &Option<&str> iterators (#680)

* Allow creation of String arrays from &Option<&str> iterators

* Add links in doc comments

Co-authored-by: Jorge Leitao <jorgecarleitao@gmail.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
10 months agoWrite FixedLenByteArray stats for FixedLenByteArray columns (not ByteArray stats...
Andrew Lamb [Tue, 10 Aug 2021 00:58:03 +0000 (20:58 -0400)] 
Write FixedLenByteArray stats for FixedLenByteArray columns (not ByteArray stats) (#662)

10 months agoMake rand an optional dependency (#674)
Roee Shlomo [Mon, 9 Aug 2021 11:31:07 +0000 (14:31 +0300)] 
Make rand an optional dependency (#674)

Closes #671

Signed-off-by: roee88 <roee88@gmail.com>
10 months agoWrite boolean stats for boolean columns (not i32 stats) (#661)
Andrew Lamb [Sun, 8 Aug 2021 12:32:47 +0000 (08:32 -0400)] 
Write boolean stats for boolean columns (not i32 stats) (#661)

10 months agoDoctests for DictionaryArray::from_iter, PrimitiveDictionaryBuilder and DecimalBuilde...
Navin [Sun, 8 Aug 2021 10:40:42 +0000 (20:40 +1000)] 
Doctests for DictionaryArray::from_iter, PrimitiveDictionaryBuilder and DecimalBuilder. (#673)

* Doctest for PrimitiveDictionaryBuilder.

* Doctests for DictionaryArray::from_iter.

* Documentation for DecimalBuilder.

10 months agoAdd some doc comments to parquet bit_util (#663)
Andrew Lamb [Sun, 8 Aug 2021 10:36:24 +0000 (06:36 -0400)] 
Add some doc comments to parquet bit_util (#663)

10 months agoallocate enough bytes when writing booleans (#658)
Ben Chambers [Sun, 8 Aug 2021 07:57:17 +0000 (00:57 -0700)] 
allocate enough bytes when writing booleans (#658)

* allocate enough bytes when writing booleans

* round up to nearest multiple of 256
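
The two steps above can be illustrated with a generic sizing sketch. The log does not say which quantity is rounded, so this assumes the byte count is what gets rounded up to a multiple of 256; all function names are hypothetical and this is not the parquet writer's code.

```rust
/// Number of bytes needed to hold `num_bits` bit-packed booleans.
fn bytes_for_bits(num_bits: usize) -> usize {
    (num_bits + 7) / 8
}

/// Round `value` up to the nearest multiple of `multiple`.
fn round_up_to_multiple(value: usize, multiple: usize) -> usize {
    assert!(multiple > 0, "multiple must be non-zero");
    ((value + multiple - 1) / multiple) * multiple
}

/// Assumed sizing rule: enough bytes for every boolean, padded to 256.
fn boolean_buffer_capacity(num_values: usize) -> usize {
    round_up_to_multiple(bytes_for_bits(num_values), 256)
}
```

For example, 2049 booleans need 257 bytes, which this rule pads to 512; truncating division (2049 / 8 = 256) would leave the buffer one byte short.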