arrow-rs.git
7 months agoPrepare for 6.2.0 release (#947) 6.2.0
Andrew Lamb [Fri, 12 Nov 2021 11:56:10 +0000 (06:56 -0500)] 
Prepare for 6.2.0 release (#947)

* Update version to 6.2.0

* Add CHANGELOG for 6.2.0

7 months agoFix validation for offsets of StructArrays (#942) (#946)
Andrew Lamb [Fri, 12 Nov 2021 11:49:08 +0000 (06:49 -0500)] 
Fix validation for offsets of StructArrays (#942) (#946)

* reproduce validation error

* Fix validation bug

Co-authored-by: Ben Chambers <bjchambers@gmail.com>
7 months agoimplement take kernel for null arrays (#939) (#944)
Andrew Lamb [Fri, 12 Nov 2021 11:18:19 +0000 (06:18 -0500)] 
implement take kernel for null arrays (#939) (#944)

Co-authored-by: Ben Chambers <35960+bjchambers@users.noreply.github.com>
7 months agoadd checker for appending i128 to decimal builder (#928) (#943)
Andrew Lamb [Fri, 12 Nov 2021 11:18:09 +0000 (06:18 -0500)] 
add checker for appending i128 to decimal builder (#928) (#943)

* add check for appending i128 to decimal builder

* remove the ArrowError(DecimalError)

Co-authored-by: Kun Liu <liukun@apache.org>
7 months agoValidate arguments to ArrayData::new and null bit buffer and buffers (#810) (#936)
Andrew Lamb [Tue, 9 Nov 2021 13:58:02 +0000 (08:58 -0500)] 
Validate arguments to ArrayData::new and null bit buffer and buffers (#810) (#936)

* Validate arguments to ArrayData::new: null bit buffer and buffers

* Rename is_int_type to is_dictionary_key_type()

* Correctly handle self.offset in offsets buffer

* Consolidate checks

* Fix test output

7 months agofix some warning about unused variables in panic tests (#894) (#933)
Andrew Lamb [Tue, 9 Nov 2021 12:25:05 +0000 (07:25 -0500)] 
fix some warning about unused variables in panic tests (#894) (#933)

Co-authored-by: Jiayu Liu <Jimexist@users.noreply.github.com>
7 months agofix some clippy warnings (#896) (#930)
Andrew Lamb [Tue, 9 Nov 2021 12:24:34 +0000 (07:24 -0500)] 
fix some clippy warnings (#896) (#930)

Co-authored-by: Jiayu Liu <Jimexist@users.noreply.github.com>
7 months agofeat(ipc): add support for deserializing messages with nested dictionary fields ...
Andrew Lamb [Tue, 9 Nov 2021 12:24:20 +0000 (07:24 -0500)] 
feat(ipc): add support for deserializing messages with nested dictionary fields (#923) (#931)

* feat(ipc): read a message containing nested dictionary fields

* Apply suggestions from code review

* address lints

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Helgi Kristvin Sigurbjarnarson <helgi@lacework.net>
7 months agotest moving out (#895) (#932)
Andrew Lamb [Tue, 9 Nov 2021 12:24:09 +0000 (07:24 -0500)] 
test moving out (#895) (#932)

Co-authored-by: Jiayu Liu <Jimexist@users.noreply.github.com>
7 months agoCherry pick Automatically retry failed MIRI runs to work around intermittent failures...
Andrew Lamb [Tue, 9 Nov 2021 12:23:42 +0000 (07:23 -0500)] 
Cherry pick Automatically retry failed MIRI runs to work around intermittent failures (#934)

* Automatically retry failed MIRI runs to work around intermittent failures (#922)

* Move MIRI checks into a shell script

* add retry loop

* Do not use cache for miri

7 months agoUpdate mod.rs (#909) (#919)
Andrew Lamb [Fri, 5 Nov 2021 17:46:37 +0000 (13:46 -0400)] 
Update mod.rs (#909) (#919)

Co-authored-by: kingeasternsun <kingeasternsun@gmail.com>
7 months agoMark boolean kernels public (#913) (#920)
Andrew Lamb [Fri, 5 Nov 2021 17:46:29 +0000 (13:46 -0400)] 
Mark boolean kernels public (#913) (#920)

7 months agodoc example mistype (#904) (#918)
Andrew Lamb [Fri, 5 Nov 2021 10:52:06 +0000 (06:52 -0400)] 
doc example mistype (#904) (#918)

Co-authored-by: kingeasternsun <kingeasternsun@gmail.com>
7 months agoallow null array to be cast to all other types (#884) (#917)
Andrew Lamb [Fri, 5 Nov 2021 10:51:42 +0000 (06:51 -0400)] 
allow null array to be cast to all other types (#884) (#917)

Co-authored-by: Jiayu Liu <Jimexist@users.noreply.github.com>
7 months agoFix instances of UB that cause tests to not pass under miri (#878) (#916)
Andrew Lamb [Fri, 5 Nov 2021 10:51:30 +0000 (06:51 -0400)] 
Fix instances of UB that cause tests to not pass under miri (#878) (#916)

* Fix unaligned access in bit-packing

* Fix creation of unaligned reference in murmur_hash2_64a

* Remove now-unnecessary unsafe

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Ben Kimock <kimockb@gmail.com>
7 months agofeat(ipc): Support writing dictionaries nested in structs and unions (#870) (#915)
Andrew Lamb [Fri, 5 Nov 2021 10:51:22 +0000 (06:51 -0400)] 
feat(ipc): Support writing dictionaries nested in structs and unions (#870) (#915)

* feat(ipc): Support for writing dictionaries nested in structs and unions

Dictionaries are lost when serializing a RecordBatch for IPC, producing
invalid arrow data. This PR changes encoded_batch to recursively find
all dictionary fields within the schema (currently only in structs and
unions) so nested dictionaries are properly serialized.

* address lint and clippy

Co-authored-by: Helgi Kristvin Sigurbjarnarson <helgikrs@gmail.com>
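
The recursive search described above can be sketched roughly as follows. This is a minimal illustration only: `DataType`, `Field`, `field` and `collect_dictionary_fields` are hypothetical, simplified stand-ins and not the arrow crate's own types or the actual `encoded_batch` code.

```rust
// Hypothetical, simplified schema types (assumptions, not the arrow crate's API).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Utf8,
    Dictionary(Box<DataType>, Box<DataType>),
    Struct(Vec<Field>),
    Union(Vec<Field>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
}

// Small helper to build a field.
fn field(name: &str, data_type: DataType) -> Field {
    Field {
        name: name.to_string(),
        data_type,
    }
}

/// Collects the names of all dictionary-typed fields, descending into
/// structs and unions so that nested dictionaries are not missed.
fn collect_dictionary_fields<'a>(fields: &'a [Field], out: &mut Vec<&'a str>) {
    for f in fields {
        match &f.data_type {
            DataType::Dictionary(_, _) => out.push(f.name.as_str()),
            DataType::Struct(children) | DataType::Union(children) => {
                collect_dictionary_fields(children, out)
            }
            _ => {}
        }
    }
}
```

A top-level-only scan would skip any dictionary sitting inside a struct or union; the recursion is what makes the nested ones visible to the writer.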
7 months agoFix references to changelog (#905)
Andrew Lamb [Tue, 2 Nov 2021 12:59:16 +0000 (08:59 -0400)] 
Fix references to changelog (#905)

8 months agoRelease 6.1.0 (#880) 6.1.0
Andrew Lamb [Fri, 29 Oct 2021 13:27:02 +0000 (09:27 -0400)] 
Release 6.1.0 (#880)

* Update changelog for 6.1 release

* Update version to 6.1.0

8 months agoimplement eq_dyn and neq_dyn (#858) (#867)
Andrew Lamb [Wed, 27 Oct 2021 12:44:25 +0000 (08:44 -0400)] 
implement eq_dyn and neq_dyn (#858) (#867)

Co-authored-by: Jiayu Liu <Jimexist@users.noreply.github.com>
8 months agofix: fix a bug in offset calculation for unions (#863) (#871)
Andrew Lamb [Wed, 27 Oct 2021 11:29:42 +0000 (07:29 -0400)] 
fix: fix a bug in offset calculation for unions (#863) (#871)

The `value_offset` function only read the least significant byte in the
offset array, causing issues with unions with more than 255 rows of any
given variant. Fix the issue by reading the entire i32 offset and add a
unit test.

Co-authored-by: Helgi Kristvin Sigurbjarnarson <helgikrs@gmail.com>
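
To illustrate the class of bug described above, here is a minimal sketch that contrasts reading one byte of an offset with reading the whole i32. The function names and the little-endian byte-buffer layout are assumptions made for the example, not the actual `value_offset` code.

```rust
/// Buggy variant: only the first (least significant, little-endian) byte
/// of the 4-byte offset is read, so values above 255 are truncated.
fn value_offset_lsb_only(offsets: &[u8], index: usize) -> i32 {
    offsets[index * 4] as i32
}

/// Fixed variant: all four bytes of the i32 at `index` are decoded.
fn value_offset_full(offsets: &[u8], index: usize) -> i32 {
    let s = index * 4;
    i32::from_le_bytes([offsets[s], offsets[s + 1], offsets[s + 2], offsets[s + 3]])
}

/// Helper for the example: serialises i32 offsets as little-endian bytes.
fn offsets_to_bytes(offsets: &[i32]) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(offsets.len() * 4);
    for v in offsets {
        bytes.extend_from_slice(&v.to_le_bytes());
    }
    bytes
}
```

With offsets `[0, 255, 256, 300]`, the single-byte read agrees with the full read only up to 255; after that it wraps around.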
8 months agoadd lt_bool, lt_eq_bool, gt_bool, gt_eq_bool (#860) (#868)
Andrew Lamb [Wed, 27 Oct 2021 11:29:33 +0000 (07:29 -0400)] 
add lt_bool, lt_eq_bool, gt_bool, gt_eq_bool (#860) (#868)

Co-authored-by: Jiayu Liu <Jimexist@users.noreply.github.com>
8 months agoTest out new tarpaulin version (#852) (#866)
Andrew Lamb [Wed, 27 Oct 2021 10:46:17 +0000 (06:46 -0400)] 
Test out new tarpaulin version (#852) (#866)

8 months agofix(ipc): Support serializing structs containing dictionaries (#848) (#865)
Andrew Lamb [Wed, 27 Oct 2021 10:46:11 +0000 (06:46 -0400)] 
fix(ipc): Support serializing structs containing dictionaries (#848) (#865)

* fix(ipc): Support serializing structs containing dictionaries

Dictionary fields nested in structs were not properly marked as
dictionary fields when serializing to fb.

* style: cargo fmt

Co-authored-by: Helgi Kristvin Sigurbjarnarson <helgikrs@gmail.com>
8 months agoImplement boolean equality kernels (#844) (#857)
Andrew Lamb [Mon, 25 Oct 2021 10:51:41 +0000 (06:51 -0400)] 
Implement boolean equality kernels (#844) (#857)

* Implement boolean equality kernels

* Respect offset

* Simplify

Co-authored-by: Daniël Heres <danielheres@gmail.com>
8 months agoCherry pick fix parquet_derive with default features (and fix cargo publish) (#856)
Andrew Lamb [Mon, 25 Oct 2021 10:51:22 +0000 (06:51 -0400)] 
Cherry pick fix parquet_derive with default features (and fix cargo publish) (#856)

* fix parquet_derive with default features (and fix `cargo publish`) (#837)

* Run all tests and do dry runs of cargo publish

* Add test for building parquet derive with default features

* fix feature flags in parquet crate

* fixup rat

* fix default feature test

* Update parquet_derive/test/dependency/default-features/Cargo.toml

* Remove merge issue

8 months agoUse kernel utility for parsing timestamps in csv reader. (#832) (#853)
Andrew Lamb [Sun, 24 Oct 2021 11:09:43 +0000 (07:09 -0400)] 
Use kernel utility for parsing timestamps in csv reader. (#832) (#853)

* Use kernel utility for parsing timestamps in csvs.

* Remove cruft.

* Cleanup.

* Lint.

* Remove erroneous stringify.

Co-authored-by: Navin <navin@novemberkilo.com>
8 months agoUpdate README.md (#834) (#854)
Andrew Lamb [Sun, 24 Oct 2021 11:09:26 +0000 (07:09 -0400)] 
Update README.md (#834) (#854)

fix readme with invalid markdown syntax

Co-authored-by: Jiayu Liu <Jimexist@users.noreply.github.com>
8 months ago[MINOR] Delete temp file from docs (#836) (#855)
Andrew Lamb [Sun, 24 Oct 2021 11:08:51 +0000 (07:08 -0400)] 
[MINOR] Delete temp file from docs (#836) (#855)

* Delete temp file from docs

* fix

* Use gitignore instead

8 months agoForce fresh cargo cache key in CI (#839) (#851)
Andrew Lamb [Sat, 23 Oct 2021 21:18:57 +0000 (17:18 -0400)] 
Force fresh cargo cache key in CI (#839) (#851)

8 months ago[Minor] Fix clippy errors with new rust version (1.56) and float formatting with...
Andrew Lamb [Sat, 23 Oct 2021 12:34:02 +0000 (08:34 -0400)] 
[Minor] Fix clippy errors with new rust version (1.56) and float formatting with nightly (#845) (#850)

* Clippy fixes

* Test formatting fixes

* Test formatting fixes

* Fixup

Co-authored-by: Daniël Heres <danielheres@gmail.com>
8 months agoUpdate version to 6.0.0 (#828) 6.0.0
Andrew Lamb [Wed, 13 Oct 2021 19:14:49 +0000 (15:14 -0400)] 
Update version to 6.0.0 (#828)

8 months agoAdd Changelog for 6.0.0 (#827)
Andrew Lamb [Wed, 13 Oct 2021 19:05:26 +0000 (15:05 -0400)] 
Add Changelog for 6.0.0 (#827)

* Add Changelog

* Cleanup Changelog

8 months agoReplace `ArrayData::new()` with `ArrayData::try_new()` and `unsafe ArrayData::new_unc...
Andrew Lamb [Wed, 13 Oct 2021 17:17:32 +0000 (13:17 -0400)] 
Replace `ArrayData::new()` with `ArrayData::try_new()` and `unsafe ArrayData::new_unchecked` (#822)

* Replace `ArrayData::new()` with `ArrayData::try_new()` and `unsafe ArrayData::new_unchecked`

* Fix compile for simd

* remove unsafe in benches

8 months agoJSON reader - empty nested list should not create child value (#826)
Wakahisa [Wed, 13 Oct 2021 13:46:07 +0000 (15:46 +0200)] 
JSON reader - empty nested list should not create child value (#826)

* JSON reader - empty nested list should not create child value

* PR review

8 months agoAdd support for parsing timezone using chrono-tz (#824)
Sumit [Wed, 13 Oct 2021 12:59:33 +0000 (14:59 +0200)] 
Add support for parsing timezone using chrono-tz (#824)

- add chrono-tz as an optional dependency
- first try parsing the numeric format using chrono
- if that fails, try chrono-tz when it is present
- return an error if neither results in a FixedOffset
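
The fallback order in the list above might look roughly like the sketch below. It uses only the standard library: `parse_numeric_offset`, `lookup_named_zone` and `parse_timezone` are hypothetical names, and the tiny hard-coded table merely stands in for a real timezone database such as chrono-tz.

```rust
/// Parses a numeric offset such as "+05:30" into seconds east of UTC.
fn parse_numeric_offset(tz: &str) -> Option<i32> {
    let sign = match tz.as_bytes().first().copied()? {
        b'+' => 1,
        b'-' => -1,
        _ => return None,
    };
    let (hours, minutes) = tz[1..].split_once(':')?;
    let two_digits = |s: &str| s.len() == 2 && s.bytes().all(|b| b.is_ascii_digit());
    if !two_digits(hours) || !two_digits(minutes) {
        return None;
    }
    let hours: i32 = hours.parse().ok()?;
    let minutes: i32 = minutes.parse().ok()?;
    if hours > 23 || minutes > 59 {
        return None;
    }
    Some(sign * (hours * 3600 + minutes * 60))
}

/// Toy stand-in for a named-zone lookup (assumption: real zones need a
/// timezone database, and their offsets can change with DST).
fn lookup_named_zone(tz: &str) -> Option<i32> {
    match tz {
        "UTC" => Some(0),
        "Asia/Kolkata" => Some(19_800),
        _ => None,
    }
}

/// Tries the numeric format first, then the named lookup, else errors.
fn parse_timezone(tz: &str) -> Result<i32, String> {
    parse_numeric_offset(tz)
        .or_else(|| lookup_named_zone(tz))
        .ok_or_else(|| format!("invalid timezone: {}", tz))
}
```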

8 months agohandle tz while extracting second/minute/hour from temporal arrays (#771)
Sumit [Mon, 11 Oct 2021 20:16:28 +0000 (22:16 +0200)] 
handle tz while extracting second/minute/hour from temporal arrays (#771)

The patch rewrites the behaviour using macros to capture the
repetitive nature of the operations

8 months agoFewer ByteArray allocations when writing binary columns (#820)
Wakahisa [Mon, 11 Oct 2021 19:59:11 +0000 (21:59 +0200)] 
Fewer ByteArray allocations when writing binary columns (#820)

* split benchmarks of primitive arrays

* add list benches

* Allocate one ByteArray per row group write

* enumerate

8 months ago[nit] update readme.md and reformat (#821)
Jiayu Liu [Fri, 8 Oct 2021 17:52:49 +0000 (01:52 +0800)] 
[nit] update readme.md and reformat (#821)

* update readme.md and reformat

* update arrow crate

8 months agoSeparate parquet writer benchmarks (#818)
Wakahisa [Thu, 7 Oct 2021 10:47:38 +0000 (12:47 +0200)] 
Separate parquet writer benchmarks (#818)

* split benchmarks of primitive arrays

* add list benches

8 months agoFix null count when casting ListArray (#816)
Andrew Lamb [Thu, 7 Oct 2021 00:29:04 +0000 (20:29 -0400)] 
Fix null count when casting ListArray (#816)

9 months agoAdd Parquet writer example to docs (#797)
Matthew Turner [Thu, 30 Sep 2021 19:10:13 +0000 (15:10 -0400)] 
Add Parquet writer example to docs (#797)

* First example parquet writer

* Add WriterProp examples

* Add missing imports

* Remove options and run doctest

* One more section to run

* no_run on read example

* Make reader run test

* Fix get_schema_by_cols

9 months agoexpose buffer ops (#809)
Ben Chambers [Thu, 30 Sep 2021 19:09:46 +0000 (12:09 -0700)] 
expose buffer ops (#809)

9 months agoparquet: Avoid NaN check for non-floats (#798)
Kornelijus Survila [Thu, 30 Sep 2021 10:43:57 +0000 (04:43 -0600)] 
parquet: Avoid NaN check for non-floats (#798)

It was especially expensive for `ByteArray` columns, potentially taking as
long as the rest of encoding.

9 months agoRemove extra quote in release instructions (#804)
Andrew Lamb [Sun, 26 Sep 2021 11:10:58 +0000 (07:10 -0400)] 
Remove extra quote in release instructions (#804)

9 months agoDoctests for DictionaryArrays. (#805)
Navin [Sun, 26 Sep 2021 11:04:57 +0000 (21:04 +1000)] 
Doctests for DictionaryArrays. (#805)

9 months agoMake parquet's optional arrow dependency skip the default features (#801)
msalib [Fri, 24 Sep 2021 15:38:13 +0000 (11:38 -0400)] 
Make parquet's optional arrow dependency skip the default features (#801)

* Make parquet only depend on minimal arrow features

parquet depends on arrow but arrow by default has a large number of features. That means that users who depend on parquet get the full arrow feature set, even if they don't need it. But parquet itself only needs the ipc feature.

* ipc is not even needed

9 months agoadd wasm32 to hash, fix wasm32 build (#787)
Mike Seddon [Tue, 21 Sep 2021 16:16:46 +0000 (02:16 +1000)] 
add wasm32 to hash, fix wasm32 build (#787)

* add wasm32 to hash

* cargo fmt

9 months agoDoctests for arrays - via collect method. (#785)
Navin [Tue, 21 Sep 2021 16:15:40 +0000 (02:15 +1000)] 
Doctests for arrays - via collect method. (#785)

9 months agoMake BooleanBufferBuilder get_bit not require mutable reference (#784)
Boaz [Sun, 19 Sep 2021 15:05:39 +0000 (18:05 +0300)] 
Make BooleanBufferBuilder get_bit not require mutable reference (#784)

9 months agofix: nanosecond timestamp scaling during string conversion (#780) (#781)
Ilya Biryukov [Fri, 17 Sep 2021 16:06:35 +0000 (19:06 +0300)] 
fix: nanosecond timestamp scaling during string conversion (#780) (#781)

Some datetime formats passed to `string_to_timestamp_nanos` were parsing
milliseconds as nanoseconds.

E.g. `1970-01-01 00:00:00.123` would parse as `123` nanoseconds instead
of milliseconds.
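
A minimal sketch of the scaling involved, assuming the fractional part of the seconds field is available as a digit string. `fraction_to_nanos` is a hypothetical helper written for this example, not the code inside `string_to_timestamp_nanos`.

```rust
/// Converts the digits after the decimal point of a seconds field into
/// nanoseconds. Returns None for an empty string, more than nine digits,
/// or any non-digit character.
fn fraction_to_nanos(fraction: &str) -> Option<u32> {
    let all_digits = fraction.bytes().all(|b| b.is_ascii_digit());
    if fraction.is_empty() || fraction.len() > 9 || !all_digits {
        return None;
    }
    let digits: u32 = fraction.parse().ok()?;
    // Scale by the number of missing digits: ".123" has three digits,
    // so it is multiplied by 10^6 to give 123_000_000 ns.
    Some(digits * 10u32.pow(9 - fraction.len() as u32))
}
```

Treating the parsed digits as nanoseconds without this scaling is what turns `.123` into 123 ns instead of 123 ms.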

9 months agoAdd support for riscv64 (#769)
Felix Yan [Thu, 16 Sep 2021 21:34:38 +0000 (05:34 +0800)] 
Add support for riscv64 (#769)

* Fix riscv64 target_arch

This should be defined for riscv64 instead, as `riscv` doesn't match it.
I have no idea for riscv32 though.

* parquet: Use murmur_hash2_64a for riscv64
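
As a small illustration of the point about `target_arch`, the sketch below selects code with the exact value `riscv64`. `arch_label` and `is_riscv64` are hypothetical names for this example; which hash function a given target should use is outside its scope.

```rust
/// Compiled only when the target architecture is exactly "riscv64".
#[cfg(target_arch = "riscv64")]
fn arch_label() -> &'static str {
    "riscv64"
}

/// Fallback for every other architecture.
#[cfg(not(target_arch = "riscv64"))]
fn arch_label() -> &'static str {
    "other"
}

/// The same check as a boolean expression, evaluated at compile time.
fn is_riscv64() -> bool {
    cfg!(target_arch = "riscv64")
}
```

`target_arch` is compared as a whole string, so a shorter value such as `riscv` never matches a `riscv64` target.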

9 months agochore: Reduce the amount of code generated by monomorphization (#715)
Markus Westerlind [Mon, 13 Sep 2021 16:55:56 +0000 (18:55 +0200)] 
chore: Reduce the amount of code generated by monomorphization (#715)

* chore: Reduce the number of instantiations of take* (-3%)

Many types have the same native type, so simplifying these functions to
work directly with native types reduces the number of instantiations.

Reduces the number of llvm lines generated by ~3%

* chore: Shrink try_from_trusted_len_iter (-0.5%)

* chore: Only compile sort_primitive per native type (-8.5%)

* chore: Make the inner take_ functions less generic (-3.5%)

* chore: Don't duplicate sort_list (-13%)

* chore: Extract the "valid" sorting (-7%)

* chore: Extract the array sorter (-1%)
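
The general technique behind these bullets can be sketched as a thin generic wrapper around a worker that only knows the native type. All names here (`LogicalType`, `TimestampMillis`, `Date64`, `take_native_i64`, `take`) are hypothetical and chosen for the example; they are not the kernels touched by the commit.

```rust
/// Non-generic worker: compiled once for i64, no matter how many logical
/// types share that native representation.
fn take_native_i64(values: &[i64], indices: &[usize]) -> Vec<i64> {
    indices.iter().map(|&i| values[i]).collect()
}

/// Hypothetical marker trait for logical types stored as i64.
trait LogicalType {
    fn name() -> &'static str;
}

struct TimestampMillis;
struct Date64;

impl LogicalType for TimestampMillis {
    fn name() -> &'static str {
        "TimestampMillis"
    }
}

impl LogicalType for Date64 {
    fn name() -> &'static str {
        "Date64"
    }
}

/// Thin generic shim: only this small function is instantiated per
/// logical type; the loop itself lives in `take_native_i64`.
fn take<T: LogicalType>(values: &[i64], indices: &[usize]) -> (&'static str, Vec<i64>) {
    (T::name(), take_native_i64(values, indices))
}
```

Calling `take::<Date64>` and `take::<TimestampMillis>` produces two copies of the shim but only one copy of the loop, which is where the reduction in generated code comes from.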

9 months agofix: Support length on slices with null (#745)
Ben Chambers [Sun, 12 Sep 2021 11:02:34 +0000 (04:02 -0700)] 
fix: Support length on slices with null (#745)

* fix: Support length on slices with null

* actually test length

9 months agoAdded PartialEq to RecordBatch (#750)
Matthew Turner [Sat, 11 Sep 2021 16:52:23 +0000 (12:52 -0400)] 
Added PartialEq to RecordBatch (#750)

* Added PartialEq to RecordBatch

* derive PartialEq and add tests

9 months agoExport `RowColumnIter` to fix doc (#763)
Richard [Sat, 11 Sep 2021 07:53:16 +0000 (15:53 +0800)] 
Export `RowColumnIter` to fix doc (#763)

* Export RowColumnIter to fix doc

* Add documentation for RowColumnIter

* Improve documentation for RowColumnIter

9 months agoUse latest nightly in CI to Fix CI for SIMD (#767)
Jorge Leitao [Fri, 10 Sep 2021 17:16:55 +0000 (18:16 +0100)] 
Use latest nightly in CI to Fix CI for SIMD (#767)

* Fixed CI for SIMD

* Updated nightly for wasm

9 months agoUpdate Bitmap::len to return bits (#749)
Matthew Turner [Thu, 9 Sep 2021 21:34:01 +0000 (17:34 -0400)] 
Update Bitmap::len to return bits (#749)

9 months agoOptimize array::transform::utils::set_bits (#716)
mathiaspeters-sig [Thu, 9 Sep 2021 21:31:59 +0000 (23:31 +0200)] 
Optimize array::transform::utils::set_bits (#716)

* Added tests

* Updated tests and improved implementation

* Cleanup

* Stopped collecting bytes before writing to write_data

* Added tests

* Cleanup and comments

* Fixed clippy warning

* Fixed an endianness issue

* Fixed comments and naming

* Made tests less prone to off-by-n errors

9 months agofix: Scalar math operations on slices (#743)
Ben Chambers [Thu, 9 Sep 2021 20:25:46 +0000 (13:25 -0700)] 
fix: Scalar math operations on slices (#743)

* fix: Scalar math operations on slices

* remove conditional

9 months agofix: new_null_array for structs (#736)
Ben Chambers [Thu, 9 Sep 2021 20:02:10 +0000 (13:02 -0700)] 
fix: new_null_array for structs (#736)

9 months agofix: Allow parquet to be compiled without arrow (fix --no-default-features) (#731)
Markus Westerlind [Thu, 9 Sep 2021 20:00:05 +0000 (22:00 +0200)] 
fix: Allow parquet to be compiled without arrow (fix --no-default-features) (#731)

* fix: Allow parquet to be compiled without arrow

`--no-default-features` is currently broken in the parquet crate due to
arrow being required. With some small tweaks it can be made entirely
optional.

Added some extra steps to catch when `--no-default-features` does not
work on CI as well.

* Fix CI

* Fix path on CI

* --features test_common is needed for clippy

9 months agoAdd `append_nulls` and `append_trusted_len_iter` to `PrimitiveBuilder` (#728)
Ben Chambers [Thu, 9 Sep 2021 19:58:39 +0000 (12:58 -0700)] 
Add `append_nulls` and `append_trusted_len_iter` to `PrimitiveBuilder` (#728)

* stub out impl

* mark unsafe

* add tests

9 months agoUpgrade lexical-core to 0.8 (#748)
Daniël Heres [Sun, 5 Sep 2021 10:21:16 +0000 (12:21 +0200)] 
Upgrade lexical-core to 0.8 (#748)

* Upgrade lexical-core

* Use num instead

9 months agofix: Comparisons against scalar slices (#741)
Ben Chambers [Fri, 3 Sep 2021 00:15:09 +0000 (17:15 -0700)] 
fix: Comparisons against scalar slices (#741)

9 months agofix: Handle slices in unary kernel (#739)
Ben Chambers [Fri, 3 Sep 2021 00:12:47 +0000 (17:12 -0700)] 
fix: Handle slices in unary kernel (#739)

9 months agoRemove optional prettytable-rs dependency (#737)
Krisztián Szűcs [Thu, 2 Sep 2021 19:54:50 +0000 (21:54 +0200)] 
Remove optional prettytable-rs dependency (#737)

10 months agoPyO3 bridge for pyarrow interoperability (#691)
Krisztián Szűcs [Wed, 1 Sep 2021 10:37:35 +0000 (12:37 +0200)] 
PyO3 bridge for pyarrow interoperability (#691)

* PyO3 bridge for pyarrow interoperability

* Fix clippy warnings

* Simplify error handling

* Fix clippy warnings

* Fix integration test workflow

* Address review comments

* Virtualenv

* Fix integration test

10 months agoFix decimal repr in schema (#721)
Sergii Mikhtoniuk [Tue, 31 Aug 2021 11:36:58 +0000 (04:36 -0700)] 
Fix decimal repr in schema (#721)

Fixes #713

10 months agoFix decimal value_as_string (#722)
Sergii Mikhtoniuk [Sun, 29 Aug 2021 10:22:16 +0000 (03:22 -0700)] 
Fix decimal value_as_string (#722)

Fixes #710

10 months agoAdd a note on rust compiler testing and compatibility (#726)
Andrew Lamb [Sat, 28 Aug 2021 16:22:24 +0000 (12:22 -0400)] 
Add a note on rust compiler testing and compatibility (#726)

* Add a note on rust compiler testing and compatibility

* prettier

10 months agoParquet Derive: remove obscure feature flags, make chrono time emit converted type...
Xavier Lange [Sat, 28 Aug 2021 11:17:39 +0000 (07:17 -0400)] 
Parquet Derive: remove obscure feature flags, make chrono time emit converted type (#712)

* remove feature flags, make timestamp emit converted types

* remove tracking numbers

* NaiveDateTime emits converted type

* formatting

* formatting

10 months agoSupport arrow readers for strings with DELTA_BYTE_ARRAY encoding (#709)
Ilya Biryukov [Thu, 26 Aug 2021 11:54:44 +0000 (14:54 +0300)] 
Support arrow readers for strings with DELTA_BYTE_ARRAY encoding (#709)

* Support arrow readers for strings with DELTA_BYTE_ARRAY encoding

* Review fixes

1. move slice init out of the loop,
2. add tests for nulls,
3. use `debug_assert` for programming error assertion.

10 months agofix edition 2021 (#714)
Jiayu Liu [Thu, 26 Aug 2021 11:50:59 +0000 (19:50 +0800)] 
fix edition 2021 (#714)

10 months agoImplement `regexp_matches_utf8` (#706)
baishen [Thu, 26 Aug 2021 11:50:04 +0000 (06:50 -0500)] 
Implement `regexp_matches_utf8` (#706)

* impl regexp_matches_utf8

* fix clippy

* add bench

* optimize

10 months agoSupport binary data type in `build_struct_array`. (#702)
Yuan Zhou [Sat, 21 Aug 2021 10:33:11 +0000 (18:33 +0800)] 
Support binary data type in `build_struct_array`. (#702)

* Support binary data type in `build_struct_array`.

* Modify test case.

* cargo fmt

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
10 months agoDoctest for PrimitiveArray using from_iter_values. (#694)
Navin [Thu, 19 Aug 2021 20:45:54 +0000 (06:45 +1000)] 
Doctest for PrimitiveArray using from_iter_values. (#694)

* Doctest for PrimitiveArray using from_iter_values.

* Better example for building a PrimitiveArray.

10 months agoUpdate dev README with fancier regular expression for maintenance release notes ...
Andrew Lamb [Wed, 18 Aug 2021 16:33:47 +0000 (12:33 -0400)] 
Update dev README with fancier regular expression for maintenance release notes (#687)

* Update dev README with fancier regular expression

I am trying to incrementally improve the release notes

* Add bullet

* prettier

* tweak

10 months agoChange to comfy-table from prettytable-rs (#656)
Chojan Shang [Mon, 16 Aug 2021 21:11:05 +0000 (05:11 +0800)] 
Change to comfy-table from prettytable-rs (#656)

* Change to comfy-table

* Apply review

Signed-off-by: Chojan Shang <psiace@outlook.com>
10 months agoallow casting from Timestamp based arrays to utf8 (#664)
Sumit [Mon, 16 Aug 2021 21:09:46 +0000 (23:09 +0200)] 
allow casting from Timestamp based arrays to utf8 (#664)

The change uses the existing `PrimitiveArray::value_as_datetime` to
support casting from `Timestamp(_,_)` to `[Large]Utf8`.

10 months agoAdd get_bit to BooleanBufferBuilder (#693)
Boaz [Sun, 15 Aug 2021 18:58:14 +0000 (21:58 +0300)] 
Add get_bit to BooleanBufferBuilder (#693)

* Add get_bit to BooleanBufferBuilder

* fix clippy

10 months agoAllow creation of String arrays from &Option<&str> iterators (#680)
Pete Koomen [Thu, 12 Aug 2021 15:42:28 +0000 (08:42 -0700)] 
Allow creation of String arrays from &Option<&str> iterators (#680)

* Allow creation of String arrays from &Option<&str> iterators

* Add links in doc comments

Co-authored-by: Jorge Leitao <jorgecarleitao@gmail.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
10 months agoWrite FixedLenByteArray stats for FixedLenByteArray columns (not ByteArray stats...
Andrew Lamb [Tue, 10 Aug 2021 00:58:03 +0000 (20:58 -0400)] 
Write FixedLenByteArray stats for FixedLenByteArray columns (not ByteArray stats) (#662)

10 months agoMake rand an optional dependency (#674)
Roee Shlomo [Mon, 9 Aug 2021 11:31:07 +0000 (14:31 +0300)] 
Make rand an optional dependency (#674)

Closes #671

Signed-off-by: roee88 <roee88@gmail.com>
10 months agoWrite boolean stats for boolean columns (not i32 stats) (#661)
Andrew Lamb [Sun, 8 Aug 2021 12:32:47 +0000 (08:32 -0400)] 
Write boolean stats for boolean columns (not i32 stats) (#661)

10 months agoDoctests for DictionaryArray::from_iter, PrimitiveDictionaryBuilder and DecimalBuilde...
Navin [Sun, 8 Aug 2021 10:40:42 +0000 (20:40 +1000)] 
Doctests for DictionaryArray::from_iter, PrimitiveDictionaryBuilder and DecimalBuilder. (#673)

* Doctest for PrimitiveDictionaryBuilder.

* Doctests for DictionaryArray::from_iter.

* Documentation for DecimalBuilder.

10 months agoAdd some doc comments to parquet bit_util (#663)
Andrew Lamb [Sun, 8 Aug 2021 10:36:24 +0000 (06:36 -0400)] 
Add some doc comments to parquet bit_util (#663)

10 months agoallocate enough bytes when writing booleans (#658)
Ben Chambers [Sun, 8 Aug 2021 07:57:17 +0000 (00:57 -0700)] 
allocate enough bytes when writing booleans (#658)

* allocate enough bytes when writing booleans

* round up to nearest multiple of 256

10 months agoFix parquet string statistics generation (#643)
Andrew Lamb [Sun, 8 Aug 2021 07:46:14 +0000 (03:46 -0400)] 
Fix parquet string statistics generation (#643)

* Fix string statistics generation, add tests

* fix Int96 stats test

* Add notes for additional tickets

10 months agoAdd a note about arrow crate security / safety (#628)
Andrew Lamb [Sat, 7 Aug 2021 00:38:01 +0000 (20:38 -0400)] 
Add a note about arrow crate security / safety (#628)

* Add note about safety to arrow README.md

* Prettier

* Remove note about making modules private

10 months agoTiny tweaks to release readme (#670)
Andrew Lamb [Fri, 6 Aug 2021 11:43:39 +0000 (07:43 -0400)] 
Tiny tweaks to release readme (#670)

10 months agoRemove undefined behavior in `value` method of boolean and primitive arrays (#644)
Daniël Heres [Tue, 3 Aug 2021 07:11:24 +0000 (09:11 +0200)] 
Remove undefined behavior in `value` method of boolean and primitive arrays (#644)

* Remove UB in `value`

* Add safety note

10 months agoDoctests for from_iter for BooleanArray & for BooleanBuilder. (#647)
Navin [Mon, 2 Aug 2021 20:47:16 +0000 (06:47 +1000)] 
Doctests for from_iter for BooleanArray & for BooleanBuilder. (#647)

10 months agodraft question template (#649)
Ruihang Xia [Mon, 2 Aug 2021 19:43:46 +0000 (03:43 +0800)] 
draft question template (#649)

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
10 months agoupdate documentation (#648)
Ruihang Xia [Mon, 2 Aug 2021 17:32:41 +0000 (01:32 +0800)] 
update documentation (#648)

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
10 months agoFix data corruption in json decoder f64-to-i64 cast (#652)
Christian Williams [Mon, 2 Aug 2021 17:29:09 +0000 (13:29 -0400)] 
Fix data corruption in json decoder f64-to-i64 cast (#652)

* Add failing test for JSON writer i64 bug

* Add special handling for i64/u64 to json decoder array builder

* Fix linter error - linter wants .flatten on a new line
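
One way a float-to-integer cast can corrupt data is sketched below: an i64 that passes through f64 loses precision above 2^53. Whether this is exactly the path the decoder took is an assumption; `parse_i64_via_f64` and `parse_i64_direct` are hypothetical helpers for the example.

```rust
/// Lossy path: the number is decoded as f64 first, then cast to i64.
/// f64 has a 53-bit significand, so large integers get rounded.
fn parse_i64_via_f64(text: &str) -> Option<i64> {
    text.parse::<f64>().ok().map(|f| f as i64)
}

/// Direct path: the text is parsed as i64 without a float in between.
fn parse_i64_direct(text: &str) -> Option<i64> {
    text.parse::<i64>().ok()
}
```

For `9007199254740993` (2^53 + 1) the lossy path yields `9007199254740992`, while the direct path keeps the value intact; small values such as `42` are unaffected either way.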

11 months agoAdd human readable Format for parquet ByteArray (#642)
Andrew Lamb [Sat, 31 Jul 2021 15:02:20 +0000 (11:02 -0400)] 
Add human readable Format for parquet ByteArray (#642)

11 months agoMinimal MapArray support (#491)
Wakahisa [Sat, 31 Jul 2021 05:20:56 +0000 (07:20 +0200)] 
Minimal MapArray support (#491)

* add DataType::Map to datatypes

* barebones MapArray and MapBuilder

This commit adds the MapArray and MapBuilder.
The interfaces are however incomplete at this stage.

* minimal IPC read and write

* barebones MapArray (missed)

* add equality for map, relying on list

A map is a list with some specific rules, so for equality it is the same as a list

* json reader for MapArray

* add schema roundtrip

* read and write maps from/to arrow map

* clippy

* Calculate map levels separately

Avoids the generic case of list > struct > [key, value], which adds overhead

* Fix map reader context and path

* Map array tests

* add doc comments and clean up code

* wip: review feedback

* add test for map

* fix clippy 1.54 lints

11 months agoSpeed up filter_record_batch with one array (#637)
Daniël Heres [Fri, 30 Jul 2021 19:30:33 +0000 (21:30 +0200)] 
Speed up filter_record_batch with one array (#637)

* Speed up filter_record_batch with one array

* Don't into()

11 months agoAdd note about changelog generation to README (#639)
Andrew Lamb [Fri, 30 Jul 2021 11:57:37 +0000 (07:57 -0400)] 
Add note about changelog generation to README (#639)

* Add note about changelog generation to README

* make it prettier

11 months agoFix clippy lints for Rust 1.54 (#631)
Andrew Lamb [Thu, 29 Jul 2021 20:03:10 +0000 (16:03 -0400)] 
Fix clippy lints for Rust 1.54 (#631)