arrow-rs.git
4 months ago Update changelog (refs: 1287/head, 9.0.1)
Andrew Lamb [Mon, 7 Feb 2022 17:08:08 +0000 (12:08 -0500)] 
Update changelog

4 months ago Update version to 9.0.1
Andrew Lamb [Mon, 7 Feb 2022 17:06:36 +0000 (12:06 -0500)] 
Update version to 9.0.1

4 months ago Fix bitmask creation in chunked part of simd comparison (#1286)
Jörn Horstmann [Mon, 7 Feb 2022 17:04:42 +0000 (18:04 +0100)] 
Fix bitmask creation in chunked part of simd comparison (#1286)

4 months ago Prepare for 9.0.0 release: Update version + CHANGELOG (#1265) (refs: 9.0.0)
Andrew Lamb [Fri, 4 Feb 2022 12:22:22 +0000 (07:22 -0500)] 
Prepare for 9.0.0 release: Update version + CHANGELOG (#1265)

* Update version to 9.0.0

* Update changelog generator script

* Initial changelog

* Updates

* Update again

* rat

4 months ago upgrade clap (#1261)
Jiayu Liu [Thu, 3 Feb 2022 07:39:24 +0000 (15:39 +0800)] 
upgrade clap (#1261)

4 months ago Remove unsupported flag in rustfmt.toml (#1262)
Andrew Lamb [Wed, 2 Feb 2022 20:37:59 +0000 (15:37 -0500)] 
Remove unsupported flag in rustfmt.toml (#1262)

4 months ago Improve module documentation for parquet crate (#1253)
Andrew Lamb [Wed, 2 Feb 2022 20:37:42 +0000 (15:37 -0500)] 
Improve module documentation for parquet crate (#1253)

4 months ago Faster bitmask iteration (#1228)
Raphael Taylor-Davies [Wed, 2 Feb 2022 20:37:10 +0000 (20:37 +0000)] 
Faster bitmask iteration (#1228)

* Add UnalignedBitChunks (#1227)

* Clippy

* Fix flaky test

* Improve test legibility

* Fix SlicesIterator offset direction

* Format

* Fix byte-aligned termination

* Test edge-cases

* More tests

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* Review feedback

* Make UnalignedBitChunkIterator crate local

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
4 months ago Add `async` arrow parquet reader (#1154)
Raphael Taylor-Davies [Wed, 2 Feb 2022 11:31:21 +0000 (11:31 +0000)] 
Add `async` arrow parquet reader (#1154)

* Async parquet reader (#111)

Add Sync + Send bounds to parquet crate

* Remove Sync from DataType

* Review feedback

* Add basic test

* Fix lints

* Review feedback

* Tweak CI

4 months ago Refresh readme / contributing guide (#1252)
Andrew Lamb [Wed, 2 Feb 2022 11:16:05 +0000 (06:16 -0500)] 
Refresh readme / contributing guide (#1252)

4 months ago Update chrono-tz requirement from 0.4 to 0.6 (#1259)
dependabot[bot] [Tue, 1 Feb 2022 21:58:26 +0000 (16:58 -0500)] 
Update chrono-tz requirement from 0.4 to 0.6 (#1259)

Updates the requirements on [chrono-tz](https://github.com/chronotope/chrono-tz) to permit the latest version.
- [Release notes](https://github.com/chronotope/chrono-tz/releases)
- [Changelog](https://github.com/chronotope/chrono-tz/blob/main/CHANGELOG.md)
- [Commits](https://github.com/chronotope/chrono-tz/commits/v0.6.1)

---
updated-dependencies:
- dependency-name: chrono-tz
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
4 months ago Update zstd requirement from 0.9 to 0.10 (#1257)
dependabot[bot] [Tue, 1 Feb 2022 21:58:13 +0000 (16:58 -0500)] 
Update zstd requirement from 0.9 to 0.10 (#1257)

Updates the requirements on [zstd](https://github.com/gyscos/zstd-rs) to permit the latest version.
- [Release notes](https://github.com/gyscos/zstd-rs/releases)
- [Commits](https://github.com/gyscos/zstd-rs/compare/0.9.0...v0.10.0)

---
updated-dependencies:
- dependency-name: zstd
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
4 months ago Add dependabot (#1256)
Andrew Lamb [Tue, 1 Feb 2022 17:48:16 +0000 (12:48 -0500)] 
Add dependabot (#1256)

4 months ago Batch multiple records in ArrowWriter (#1214)
Raphael Taylor-Davies [Tue, 1 Feb 2022 14:28:15 +0000 (14:28 +0000)] 
Batch multiple records in ArrowWriter (#1214)

* Batch multiple records in ArrowWriter

* Document max_group_size and reduce default (#1213)

* Review feedback

* Write multiple arrays without concat

* Clippy

* Test aggregating complex types

* Test complex slice

* Clippy

4 months ago Revert "Pin WASM / packed SIMD tests to nightly-2022-01-17 (#1204)" (#1251)
Andrew Lamb [Tue, 1 Feb 2022 14:02:49 +0000 (09:02 -0500)] 
Revert "Pin WASM / packed SIMD tests to nightly-2022-01-17 (#1204)" (#1251)

This reverts commit d68c4ae14077d60326eb57fe28133645f800d7e5.

4 months ago Add docs examples for dynamically compare functions (#1250)
Remzi Yang [Mon, 31 Jan 2022 21:58:58 +0000 (05:58 +0800)] 
Add docs examples for dynamically compare functions (#1250)

* add examples

Signed-off-by: remzi <13716567376yh@gmail.com>
* correct rust format

Signed-off-by: remzi <13716567376yh@gmail.com>
4 months ago Fix NullArrayReader (#1245) (#1246)
Raphael Taylor-Davies [Sat, 29 Jan 2022 15:53:10 +0000 (15:53 +0000)] 
Fix NullArrayReader (#1245) (#1246)

4 months ago Revert making parquet::data_type and parquet::arrow::schema experimental (#1244)
Raphael Taylor-Davies [Fri, 28 Jan 2022 19:20:04 +0000 (19:20 +0000)] 
Revert making parquet::data_type and parquet::arrow::schema experimental (#1244)

4 months ago rename to Bitmap::bit_len (#1242)
Remzi Yang [Fri, 28 Jan 2022 11:43:07 +0000 (19:43 +0800)] 
rename to Bitmap::bit_len (#1242)

Signed-off-by: remzi <13716567376yh@gmail.com>
4 months ago Add Rust Docs examples for UnionArray (#1241)
Remzi Yang [Thu, 27 Jan 2022 20:54:21 +0000 (04:54 +0800)] 
Add Rust Docs examples for UnionArray (#1241)

Signed-off-by: remzi <13716567376yh@gmail.com>
4 months ago Remove Copy trait from dyn scalar kernels (#1243)
Matthew Turner [Thu, 27 Jan 2022 17:05:33 +0000 (12:05 -0500)] 
Remove Copy trait from dyn scalar kernels (#1243)

4 months ago dyn compare for binary array (#1238)
Remzi Yang [Wed, 26 Jan 2022 12:18:25 +0000 (20:18 +0800)] 
dyn compare for binary array (#1238)

* dyn compare two binary array

Signed-off-by: remzi <13716567376yh@gmail.com>
* add dyn comparison for binary array

Signed-off-by: remzi <13716567376yh@gmail.com>
* add tests for dyn compare binary array and scalar

Signed-off-by: remzi <13716567376yh@gmail.com>
* remove DictionaryArray from dyn compare, because there is no easy way to build a binary dictionary array

Signed-off-by: remzi <13716567376yh@gmail.com>
* fix mistakes in test code

Signed-off-by: remzi <13716567376yh@gmail.com>
* add Nones into the test cases

Signed-off-by: remzi <13716567376yh@gmail.com>
* add non utf8 scalar

Signed-off-by: remzi <13716567376yh@gmail.com>
* correct the code format

Signed-off-by: remzi <13716567376yh@gmail.com>
4 months ago Improve documentation for Bitmap (#1237)
Andrew Lamb [Tue, 25 Jan 2022 11:35:26 +0000 (06:35 -0500)] 
Improve documentation for Bitmap (#1237)

4 months ago Fix null bitmap length validation (#1231) (#1232)
Raphael Taylor-Davies [Mon, 24 Jan 2022 21:31:23 +0000 (21:31 +0000)] 
Fix null bitmap length validation (#1231) (#1232)

* Fix null bitmap length validation (#1231)

* Reuse existing test

4 months ago Remove explicit simd arithmetic kernels except for division/modulo (#1221)
Jörn Horstmann [Mon, 24 Jan 2022 19:50:30 +0000 (20:50 +0100)] 
Remove explicit simd arithmetic kernels except for division/modulo (#1221)

* Extend arithmetic benchmarks

* Remove explicit simd arithmetic except for div/mod because autovectorization generates better code

* Remove unneeded return keywords
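
As a rough illustration of the autovectorization point above: an element-wise kernel written as a plain iterator chain leaves the compiler free to emit vector instructions on its own. This is only a sketch and not the arrow-rs kernel; the function name `add_f32` and the use of bare `f32` slices without null handling are assumptions made for the example.

```rust
/// Element-wise addition over two equally sized slices.
/// Hypothetical stand-in for an arithmetic kernel; it ignores null bitmaps.
fn add_f32(lhs: &[f32], rhs: &[f32]) -> Vec<f32> {
    assert_eq!(lhs.len(), rhs.len(), "inputs must have the same length");
    // No explicit SIMD types are used here: the loop body is simple enough
    // that the optimizer can usually vectorize it in release builds.
    lhs.iter().zip(rhs.iter()).map(|(a, b)| a + b).collect()
}
```

Whether vectorization actually happens depends on the target and the optimization level, so benchmarks remain the deciding factor.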

4 months ago Remove memory-check feature (#1222)
Jörn Horstmann [Mon, 24 Jan 2022 19:28:15 +0000 (20:28 +0100)] 
Remove memory-check feature (#1222)

4 months ago [Minor]Re-export `array::builder::make_builder` to make it available for downstream...
Yijie Shen [Mon, 24 Jan 2022 19:27:18 +0000 (03:27 +0800)] 
[Minor]Re-export `array::builder::make_builder` to make it available for downstream (#1235)

* Re-export `array::builder::make_builder`

* Update mod.rs

4 months ago [Minor]`into_inner` for IPC `FileWriter` (#1236)
Yijie Shen [Mon, 24 Jan 2022 19:23:40 +0000 (03:23 +0800)] 
[Minor]`into_inner` for IPC `FileWriter` (#1236)

* `into_inner` for IPC `FileWriter`

* lint

4 months ago Do not concatenate identical dictionaries (#1219)
Raphael Taylor-Davies [Mon, 24 Jan 2022 19:22:43 +0000 (19:22 +0000)] 
Do not concatenate identical dictionaries (#1219)

* Do not concatenate identical dictionaries (#504)

* Review feedback

4 months ago Remove arrow array reader (#1197) (#1234)
Raphael Taylor-Davies [Mon, 24 Jan 2022 15:07:26 +0000 (15:07 +0000)] 
Remove arrow array reader (#1197) (#1234)

4 months ago Preserve dictionary encoding when decoding parquet into Arrow arrays, 60x perf improv...
Raphael Taylor-Davies [Mon, 24 Jan 2022 12:00:44 +0000 (12:00 +0000)] 
Preserve dictionary encoding when decoding parquet into Arrow arrays, 60x perf improvement (#171) (#1180)

* Preserve dictionary encoding from parquet (#171)

* Use OffsetBuffer::into_array for dictionary

* Fix and test handling of empty dictionaries

Don't panic if missing dictionary page

* Use ArrayRef instead of Arc<ArrayData>

* Update doc comments

* Add integration test

Tweak RecordReader buffering logic

* Add benchmark

* Set write batch size in parquet fuzz tests

Fix bug in column writer with small page sizes

* Fix test_dictionary_preservation

* Add batch_size comment

5 months ago Add non utf8 values into the test cases of BinaryArray comparison (#1220)
Remzi Yang [Sun, 23 Jan 2022 10:11:39 +0000 (18:11 +0800)] 
Add non utf8 values into the test cases of BinaryArray comparison (#1220)

* add eq_dyn for BinaryArray

Signed-off-by: remzi <13716567376yh@gmail.com>
* correct the code formatting

Signed-off-by: remzi <13716567376yh@gmail.com>
* add comparison support for fully qualified binary array
delete dyn comparison which will be added in successive PRs

Signed-off-by: remzi <13716567376yh@gmail.com>
* add tests for comparison of fully qualified BinaryArray

Signed-off-by: remzi <13716567376yh@gmail.com>
* add 2 missed tests

Signed-off-by: remzi <13716567376yh@gmail.com>
* move 2 functions

Signed-off-by: remzi <13716567376yh@gmail.com>
* fix reference error

Signed-off-by: remzi <13716567376yh@gmail.com>
* add non utf8 bytes to the test cases of BinaryArray comparison

Signed-off-by: remzi <13716567376yh@gmail.com>
5 months ago Update DECIMAL_RE to allow scientific notation in auto inferred schemas (#1216)
Patrick More [Sat, 22 Jan 2022 20:18:10 +0000 (12:18 -0800)] 
Update DECIMAL_RE to allow scientific notation in auto inferred schemas (#1216)

* Update DECIMAL_RE to allow scientific notation in auto inferred schemas

* Fixed format lint

5 months ago Prepare for 8.0.0 release: Update CHANGELOG and versions (#1212) (refs: 8.0.0)
Andrew Lamb [Fri, 21 Jan 2022 12:06:57 +0000 (07:06 -0500)] 
Prepare for 8.0.0 release: Update CHANGELOG and versions (#1212)

* Update version to 8.0.0

* Update Changelog for 8.0.0

* restore RAT

* Remove items that were released in 7.0.0

5 months ago Improve changelog generator script settings (#1210)
Andrew Lamb [Fri, 21 Jan 2022 11:58:45 +0000 (06:58 -0500)] 
Improve changelog generator script settings (#1210)

* Update changelog script

* fix typo

5 months ago Return error from JSON writer rather than panic (#1205)
Yang [Wed, 19 Jan 2022 21:28:52 +0000 (05:28 +0800)] 
Return error from JSON writer rather than panic (#1205)

* Return error from JSON writer rather than panic

* fix comment

5 months ago fix a bug in variable sized equality (#1209)
Helgi Kristvin Sigurbjarnarson [Wed, 19 Jan 2022 21:28:31 +0000 (13:28 -0800)] 
fix a bug in variable sized equality (#1209)

A missing validity buffer was being treated as all values being null,
rather than all values being valid, causing equality to fail on some
equivalent string and binary arrays.
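
The rule described above (an absent validity buffer means every slot is valid) can be sketched as follows. This is a simplified model and not the arrow-rs equality code: validity is represented as an optional `bool` slice instead of a packed bitmap, and the names `is_valid` and `columns_equal` are made up for the example.

```rust
/// Returns true if slot `i` holds a value. `None` models a missing
/// validity buffer, which must be read as "all values are valid".
fn is_valid(validity: Option<&[bool]>, i: usize) -> bool {
    match validity {
        Some(bits) => bits[i],
        None => true,
    }
}

/// Compares two string columns slot by slot, taking nulls into account.
fn columns_equal(
    lhs: &[&str],
    lhs_validity: Option<&[bool]>,
    rhs: &[&str],
    rhs_validity: Option<&[bool]>,
) -> bool {
    if lhs.len() != rhs.len() {
        return false;
    }
    (0..lhs.len()).all(|i| {
        match (is_valid(lhs_validity, i), is_valid(rhs_validity, i)) {
            (true, true) => lhs[i] == rhs[i], // both present: compare values
            (false, false) => true,           // both null: equal, values ignored
            _ => false,                       // null on one side only
        }
    })
}
```

With this rule, a column without a validity buffer compares equal to the same data carrying an explicit all-valid buffer, which is the case the fix addresses.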

5 months ago Update parquet crate readme (#1192)
Andrew Lamb [Wed, 19 Jan 2022 18:17:48 +0000 (13:17 -0500)] 
Update parquet crate readme (#1192)

* Update parquet crate readme

* prettier

5 months ago Add comparison support for fully qualified BinaryArray (#1195)
Remzi Yang [Wed, 19 Jan 2022 18:06:47 +0000 (02:06 +0800)] 
Add comparison support for fully qualified BinaryArray (#1195)

* add eq_dyn for BinaryArray

Signed-off-by: remzi <13716567376yh@gmail.com>
* correct the code formatting

Signed-off-by: remzi <13716567376yh@gmail.com>
* add comparison support for fully qualified binary array
delete dyn comparison which will be added in successive PRs

Signed-off-by: remzi <13716567376yh@gmail.com>
* add tests for comparison of fully qualified BinaryArray

Signed-off-by: remzi <13716567376yh@gmail.com>
* add 2 missed tests

Signed-off-by: remzi <13716567376yh@gmail.com>
* move 2 functions

Signed-off-by: remzi <13716567376yh@gmail.com>
* fix reference error

Signed-off-by: remzi <13716567376yh@gmail.com>
5 months ago feat: add support for casting Duration/Interval to Int64Array (#1196)
Edd Robinson [Wed, 19 Jan 2022 12:19:29 +0000 (12:19 +0000)] 
feat: add support for casting Duration/Interval to Int64Array (#1196)

* feat: add support for casting Duration to Int64Array

* feat: cast from Interval to Int64

5 months ago Pin WASM / packed SIMD tests to nightly-2022-01-17 (#1204)
Andrew Lamb [Wed, 19 Jan 2022 02:35:43 +0000 (21:35 -0500)] 
Pin WASM / packed SIMD tests to nightly-2022-01-17 (#1204)

5 months ago bugfix in display of float16 array (#1194)
Helgi Kristvin Sigurbjarnarson [Tue, 18 Jan 2022 21:59:27 +0000 (13:59 -0800)] 
bugfix in display of float16 array (#1194)

Due to a typo the float16 array was being cast to a float32 array,
causing a crash when pretty printing a record batch containing float16.

5 months ago parquet: Optimized ByteArrayReader, Add UTF-8 Validation (#1040) (#1082)
Raphael Taylor-Davies [Tue, 18 Jan 2022 12:13:21 +0000 (12:13 +0000)] 
parquet: Optimized ByteArrayReader, Add UTF-8 Validation (#1040) (#1082)

* Optimized ByteArrayReader (#1040)

UTF-8 Validation (#786)

* Fix arrow_array_reader benchmark

* Allow running subset of arrow_array_reader benchmarks

* Faster UTF-8 validation

* Tweak null handling

* Add license

* Refine `ValuesBuffer::pad_nulls`

* Tweak error handling

* Use page null count if available

* Doc comments

* Test DELTA_BYTE_ARRAY encoding

* Support legacy Encoding::PLAIN_DICTIONARY

* Add OffsetBuffer unit tests

Review feedback

* More tests

* Fix lint

* Review feedback

5 months ago feat(parquet): support for reading structs nested within lists (#1187)
Helgi Kristvin Sigurbjarnarson [Tue, 18 Jan 2022 12:09:12 +0000 (04:09 -0800)] 
feat(parquet): support for reading structs nested within lists (#1187)

* feat(parquet): support for reading structs nested within lists

* fix: logical conflict

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
5 months ago update nightly version for miri (#1189)
Jiayu Liu [Tue, 18 Jan 2022 01:52:01 +0000 (09:52 +0800)] 
update nightly version for miri (#1189)

5 months ago Truncate bitmask on split (#1183)
Raphael Taylor-Davies [Mon, 17 Jan 2022 15:51:21 +0000 (15:51 +0000)] 
Truncate bitmask on split (#1183)

* Truncate bitmask on split

* Fix BooleanBufferBuilder::resize

* Format

5 months ago fix: Fix a bug in how filter indices are calculated (#1185)
Helgi Kristvin Sigurbjarnarson [Mon, 17 Jan 2022 15:49:14 +0000 (07:49 -0800)] 
fix: Fix a bug in how filter indices are calculated (#1185)

* fix: Fix a bug in how filter indices are calculated

Using the definition level and the nullability of the column only
produces the correct indices if max_definition - 1 is the list level.
For deeper nesting (struct in a list) this produces incorrect indices,
silently causing incorrect data to be written.

This fix uses the array offsets to compute the indices instead.

* add assertions
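
To make the offset-based approach above more concrete, here is a small sketch of deriving child indices from list offsets. It illustrates the general idea only and is not the function changed in this commit; the name `child_indices_from_offsets`, the `bool` slice used for list validity, and the choice to skip children of null lists are all assumptions for the example.

```rust
/// Collects the indices of child values that belong to non-null list slots.
/// `offsets` has one more entry than there are lists; list `i` owns the
/// child range `offsets[i]..offsets[i + 1]`.
fn child_indices_from_offsets(offsets: &[i32], list_valid: &[bool]) -> Vec<usize> {
    assert_eq!(offsets.len(), list_valid.len() + 1, "offsets must have len + 1 entries");
    let mut indices = Vec::new();
    for (i, window) in offsets.windows(2).enumerate() {
        if list_valid[i] {
            // The offsets say exactly which children exist, independent of
            // how deeply the child type is nested.
            indices.extend(window[0] as usize..window[1] as usize);
        }
    }
    indices
}
```

Because the offsets describe the child ranges directly, the result does not depend on where the list sits relative to the maximum definition level.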

5 months ago Support DecimalType in sort and take kernels (#1172)
Kun Liu [Mon, 17 Jan 2022 15:42:45 +0000 (23:42 +0800)] 
Support DecimalType in sort and take kernels (#1172)

5 months ago add from_iter_values for binary array (#1188)
Jiayu Liu [Mon, 17 Jan 2022 15:32:56 +0000 (23:32 +0800)] 
add from_iter_values for binary array (#1188)

5 months ago Use tempfile for parquet tests (#1165)
Raphael Taylor-Davies [Sun, 16 Jan 2022 12:08:35 +0000 (12:08 +0000)] 
Use tempfile for parquet tests (#1165)

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
5 months ago Serialize i128 as JSON string (#1175)
Raphael Taylor-Davies [Sat, 15 Jan 2022 18:49:09 +0000 (18:49 +0000)] 
Serialize i128 as JSON string (#1175)

5 months ago Add ticket reference for false positive (#1181)
Andrew Lamb [Sat, 15 Jan 2022 18:48:15 +0000 (13:48 -0500)] 
Add ticket reference for false positive (#1181)

5 months ago Fix record formatting in 1.58 (#1178)
Raphael Taylor-Davies [Sat, 15 Jan 2022 11:34:44 +0000 (11:34 +0000)] 
Fix record formatting in 1.58 (#1178)

5 months ago Bugfix in parquet writing empty lists of structs (#1166)
Helgi Kristvin Sigurbjarnarson [Fri, 14 Jan 2022 18:09:51 +0000 (10:09 -0800)] 
Bugfix in parquet writing empty lists of structs (#1166)

Fix a bug in the definition level calculation for fields nested within a
struct and a list. When a list is empty or null in parquet the nested
field gets a null value. However, in arrow, the value is simply missing.
When serializing an immediate child of the list, the list offsets are
used to calculate the correct definition level for its children, but it
is not carried further to fields nested deeper (e.g., fields on a struct
within a list).  This (somewhat hacky) fix treats a struct within a list
as if it were a list.
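
The mismatch described above (parquet records an entry for an empty or null list, while arrow has no child value at all) can be sketched for one simple schema. The example assumes an optional list whose elements and nested leaf field are required, so the maximum definition level is 2. The function name `leaf_definition_levels` and the `bool` slice for list validity are hypothetical, and this is not the writer code from the commit.

```rust
/// Definition levels for a required leaf nested under an optional list
/// (for example a required field of a required struct inside the list).
/// Level 0: the list is null. Level 1: the list is empty.
/// Level 2: an element exists, so the leaf value is defined.
fn leaf_definition_levels(offsets: &[i32], list_valid: &[bool]) -> Vec<i16> {
    assert_eq!(offsets.len(), list_valid.len() + 1, "offsets must have len + 1 entries");
    let mut levels = Vec::new();
    for (i, window) in offsets.windows(2).enumerate() {
        let len = (window[1] - window[0]) as usize;
        if !list_valid[i] {
            levels.push(0); // one entry for the null list
        } else if len == 0 {
            levels.push(1); // one entry for the empty list
        } else {
            // One fully defined entry per element; the list offsets must
            // reach every leaf below the list, however deep it is.
            levels.extend(std::iter::repeat(2).take(len));
        }
    }
    levels
}
```

Four lists with offsets `[0, 2, 2, 2, 3]`, the third of them null, yield five level entries even though the child array holds only three values.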

5 months ago Fix compilation error with simd feature (#1169)
Jörn Horstmann [Fri, 14 Jan 2022 16:17:44 +0000 (17:17 +0100)] 
Fix compilation error with simd feature (#1169)

5 months ago Fix new clippy lints introduced in Rust 1.58 (#1170)
Andrew Lamb [Fri, 14 Jan 2022 16:17:36 +0000 (11:17 -0500)] 
Fix new clippy lints introduced in Rust 1.58 (#1170)

5 months ago Simplify and reduce code duplication in arithmetic kernels (#1161)
Jörn Horstmann [Thu, 13 Jan 2022 18:27:52 +0000 (19:27 +0100)] 
Simplify and reduce code duplication in arithmetic kernels (#1161)

* Simplify and reduce code duplication in arithmetic kernels

* Update comments

5 months ago Update dev/release/README for master releases, remove supporting scripts (#1143)
Andrew Lamb [Thu, 13 Jan 2022 18:14:39 +0000 (13:14 -0500)] 
Update dev/release/README for master releases, remove supporting scripts (#1143)

* Update dev/release/README for master releases

* remove cherry pick script

5 months ago Improve parquet reading performance for columns with nulls by preserving bitmask...
Raphael Taylor-Davies [Thu, 13 Jan 2022 15:18:55 +0000 (15:18 +0000)] 
Improve parquet reading performance for columns with nulls by preserving bitmask when possible (#1037) (#1054)

* Preserve bitmask (#1037)

* Remove now unnecessary box (#1061)

* Fix handling of empty bitmasks

* More docs

* Add nested nullability test case

* Add packed decoder test

5 months ago Remove left over readme file from arrow/arrow-rs split (#1162)
Andrew Lamb [Thu, 13 Jan 2022 06:33:24 +0000 (01:33 -0500)] 
Remove left over readme file from arrow/arrow-rs split (#1162)

5 months ago Fuzz test different parquet encodings (#1156)
Raphael Taylor-Davies [Wed, 12 Jan 2022 14:44:07 +0000 (14:44 +0000)] 
Fuzz test different parquet encodings (#1156)

5 months ago Add subtract_scalar kernel (#1152)
Liang-Chi Hsieh [Wed, 12 Jan 2022 12:22:42 +0000 (04:22 -0800)] 
Add subtract_scalar kernel (#1152)

* Add subtract_scalar

* Rebase

5 months ago Add multiply_scalar (#1159)
Liang-Chi Hsieh [Wed, 12 Jan 2022 12:21:37 +0000 (04:21 -0800)] 
Add multiply_scalar (#1159)

5 months ago feat(json): support for map arrays in json writer (#1149)
Helgi Kristvin Sigurbjarnarson [Tue, 11 Jan 2022 19:19:38 +0000 (11:19 -0800)] 
feat(json): support for map arrays in json writer (#1149)

5 months ago Add add_scalar kernel (#1151)
Liang-Chi Hsieh [Tue, 11 Jan 2022 19:18:52 +0000 (11:18 -0800)] 
Add add_scalar kernel (#1151)

* Add add_scalar

* move simd_float_unary_math_op to simd_unary_math_op

5 months ago Document safety justification of some uses of `from_trusted_len_iter` (#1148)
Andrew Lamb [Tue, 11 Jan 2022 19:10:01 +0000 (14:10 -0500)] 
Document safety justification of some uses of `from_trusted_len_iter` (#1148)

5 months ago Move simd right out of for_each loop (#1150)
Liang-Chi Hsieh [Tue, 11 Jan 2022 19:08:52 +0000 (11:08 -0800)] 
Move simd right out of for_each loop (#1150)

5 months ago Generify ColumnReaderImpl and RecordReader (#1040) (#1041)
Raphael Taylor-Davies [Tue, 11 Jan 2022 18:01:15 +0000 (18:01 +0000)] 
Generify ColumnReaderImpl and RecordReader (#1040) (#1041)

* Simplify record reader

* Generify ColumnReaderImpl and RecordReader (#1040)

* Tweak count_records predicate

* Pre-allocate bitmask

* fix: TypedBuffer::split update len

* Simplify GenericRecordReader

* Move column decoders into module

* Remove `RecordBuffer::create` method

* Remove `TypedBuffer<i16>::count_records`

* Pass null count to `ColumnValueDecoder::read`

* Pull null padding out of column reader

* Review feedback

* Format

* License headers

* Further doc tweaks

* Further docs

* Restrict ScalarBuffer types

5 months ago Remove `GenericStringArray::from_vec` and `GenericStringArray::from_opt_vec` (#1147)
Andrew Lamb [Tue, 11 Jan 2022 17:58:02 +0000 (12:58 -0500)] 
Remove `GenericStringArray::from_vec` and `GenericStringArray::from_opt_vec` (#1147)

5 months ago BooleanBufferBuilder::append_packed (#1038) (#1039)
Raphael Taylor-Davies [Tue, 11 Jan 2022 15:12:30 +0000 (15:12 +0000)] 
BooleanBufferBuilder::append_packed (#1038) (#1039)

* BooleanBufferBuilder::append_packed (#1038)

* Update docstring

* Add packed_append_range

* Fix capacity

* Use set_bits from transform::util

* Add license

* Format

5 months ago Improve parquet performance: Skip levels computation for required struct arrays in...
Raphael Taylor-Davies [Tue, 11 Jan 2022 14:05:03 +0000 (14:05 +0000)] 
Improve parquet performance: Skip levels computation for required struct arrays in parquet (#1035)

* Skip levels computation for required struct arrays (#1034)

* Review feedback

5 months ago Restrict RecordReader and friends to scalar types (#1132) (#1155)
Raphael Taylor-Davies [Tue, 11 Jan 2022 14:02:30 +0000 (14:02 +0000)] 
Restrict RecordReader and friends to scalar types (#1132) (#1155)

5 months ago Extends parquet fuzz tests to also test nulls, dictionaries and row groups with...
Raphael Taylor-Davies [Tue, 11 Jan 2022 14:00:50 +0000 (14:00 +0000)] 
Extends parquet fuzz tests to also test nulls, dictionaries and row groups with multiple pages (#1053) (#1110)

* Parquet fuzz tests (#1053)

* Test multiple WriterVersions

* Revert array_reader change

5 months ago Move more parquet functionality behind experimental feature flag (#1032) (#1134)
Raphael Taylor-Davies [Mon, 10 Jan 2022 21:50:31 +0000 (21:50 +0000)] 
Move more parquet functionality behind experimental feature flag (#1032) (#1134)

* Move more parquet functionality behind experimental feature flag (#1032)

* Fix logical conflicts

5 months ago Implement SIMD comparison operations for types with less than 4 lanes (i128) (#1146)
Jörn Horstmann [Mon, 10 Jan 2022 21:49:46 +0000 (22:49 +0100)] 
Implement SIMD comparison operations for types with less than 4 lanes (i128) (#1146)

* Implement simd mask creation for 128 bit types

* Adjust comparison kernels to always append 64 bit chunks

* Only append minimal number of bytes

* Add benchmark for MonthDayNano comparison

* Fix typo in comment

Co-authored-by: Paddy Horan <5733408+paddyhoran@users.noreply.github.com>
* Fix typo in comment

Co-authored-by: Paddy Horan <5733408+paddyhoran@users.noreply.github.com>
Co-authored-by: Paddy Horan <5733408+paddyhoran@users.noreply.github.com>
5 months ago Fix undefined behavior in GenericStringArray::from_iter_values (#1145)
Andrew Lamb [Mon, 10 Jan 2022 21:47:06 +0000 (16:47 -0500)] 
Fix undefined behavior in GenericStringArray::from_iter_values (#1145)

* Fix undefined behavior in GenericStringArray::from_iter_values

* Cleanup code and tests

* clippy

* Fix test

5 months ago Update readme to clarify versioning (#1142) (refs: active_release, 7.0.0)
Andrew Lamb [Sat, 8 Jan 2022 10:28:32 +0000 (05:28 -0500)] 
Update readme to clarify versioning (#1142)

5 months ago Update version to 7.0.0 and update CHANGELOG (#1141)
Andrew Lamb [Sat, 8 Jan 2022 10:19:08 +0000 (05:19 -0500)] 
Update version to 7.0.0 and update CHANGELOG (#1141)

* Update changelog generator

* Bring changelog from 6.5.0

* Update changelog

* Update version to 7.0.0

5 months ago feat(ipc): support for reading union arrays through IPC (#1140)
Helgi Kristvin Sigurbjarnarson [Thu, 6 Jan 2022 22:46:16 +0000 (14:46 -0800)] 
feat(ipc): support for reading union arrays through IPC (#1140)

5 months ago Dyn comparison of interval arrays (#1106) (#1107)
Raphael Taylor-Davies [Thu, 6 Jan 2022 22:22:35 +0000 (22:22 +0000)] 
Dyn comparison of interval arrays (#1106) (#1107)

* Dyn comparison of interval arrays (#1106)

* fix fmt

* Skip test when simd is enabled

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
5 months ago feat: union schema serialization/deserialization for ipc (#1135)
Helgi Kristvin Sigurbjarnarson [Thu, 6 Jan 2022 22:12:40 +0000 (14:12 -0800)] 
feat: union schema serialization/deserialization for ipc (#1135)

5 months ago *_dyn_scalar kernels: Support Float32Array and Float64Array, (#1127)
Andrew Lamb [Thu, 6 Jan 2022 22:12:23 +0000 (17:12 -0500)] 
*_dyn_scalar kernels: Support Float32Array and Float64Array, (#1127)

* *_dyn_scalar kernels: Support Float32Array and Float64Array, use ToPrimitive rather than `Into<i128>`, and take `&dyn Array` rather than `ArrayRef`

* Update APIs for *_dyn_bool_scalar kernels

5 months ago Add more information on SIMD (#1138)
Benson Muite [Thu, 6 Jan 2022 22:11:01 +0000 (01:11 +0300)] 
Add more information on SIMD (#1138)

5 months ago Add dyn boolean kernels (#1131)
Matthew Turner [Wed, 5 Jan 2022 21:53:32 +0000 (16:53 -0500)] 
Add dyn boolean kernels (#1131)

* Add dyn bool kernels

* Add tests

* Update error messages

* Update test

* Fix test

* Update doc strings

5 months ago Fix reading of dictionary encoded pages with null values (#1111) (#1130)
Yordan Pavlov [Wed, 5 Jan 2022 19:57:19 +0000 (19:57 +0000)] 
Fix reading of dictionary encoded pages with null values (#1111) (#1130)

* fix reading of dictionary encoded pages with null values

* fix linting issues

5 months ago Make arrow::array_reader private (#1032) (#1133)
Raphael Taylor-Davies [Wed, 5 Jan 2022 16:27:27 +0000 (16:27 +0000)] 
Make arrow::array_reader private (#1032) (#1133)

5 months ago Implement Array for ArrayRef, Improve as_* kernels to take `&dyn Array` (#1129)
Andrew Lamb [Wed, 5 Jan 2022 13:45:27 +0000 (08:45 -0500)] 
Implement Array for ArrayRef, Improve as_* kernels to take `&dyn Array` (#1129)

* Implement Array for ArrayRef

* Improve as_* kernels to take &dyn Array

* remove unneeded pyarrow binding

5 months ago Add Schema::with_metadata and Field::with_metadata (#1092)
Andrew Lamb [Wed, 5 Jan 2022 12:29:25 +0000 (07:29 -0500)] 
Add Schema::with_metadata and Field::with_metadata (#1092)

5 months ago allow using custom datetime format for inference and parsing csv file (#1112)
Sumit [Sun, 2 Jan 2022 16:42:43 +0000 (17:42 +0100)] 
allow using custom datetime format for inference and parsing csv file (#1112)

* allow using custom datetime format for inference and parsing csv file

The patch extends the current implementation to allow passing a custom
datetime_re and datetime_format to the ReaderBuilder.

datetime_re is used to infer the schema of the csv and then datetime_format is
used to parse the actual string to a Date64.
Of course, passing incompatible datetime_re and datetime_format values
will make parsing or inference fail; this is an expected but
hard-to-detect failure.

* Incorporate some clippy recommendations for limit count of call args

The patch adds a new struct to collect all these options together and
then passes the struct around. Ideally the struct could be embedded into
the reader but that can be done as separate exercise.

* Detect presence of timezone in format while parsing csv for date64

The patch decides on using NaiveDateTime or DateTime from chrono lib
based on presence of timezone components

chrono expects a timezone to be present if DateTime is used and errors
otherwise, whereas NaiveDateTime ignores the timezone even if explicitly
provided.
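
One way to decide between the two chrono types is to scan the format string for specifiers that carry offset or zone information. The sketch below is a guess at such a check using only the standard library, not the detection logic from this patch; the function name `format_has_timezone` is hypothetical, and the specifiers treated as timezone-bearing (`%z`, `%:z`, `%#z`, `%Z`, `%+`) are taken from chrono's strftime syntax.

```rust
/// Returns true if a strftime-style format string contains a specifier
/// that refers to a timezone or UTC offset.
fn format_has_timezone(format: &str) -> bool {
    let mut chars = format.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '%' {
            continue;
        }
        // Skip the ':' and '#' modifiers that may precede 'z'.
        while matches!(chars.peek().copied(), Some(':') | Some('#')) {
            chars.next();
        }
        // Consume the specifier itself; "%%" is a literal percent sign and
        // falls through to the wildcard arm.
        match chars.next() {
            Some('z') | Some('Z') | Some('+') => return true,
            _ => {}
        }
    }
    false
}
```

A caller could then parse with DateTime when this returns true and with NaiveDateTime otherwise.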

5 months ago Update Union Array to add `UnionMode`, match latest Arrow Spec, and rename `new...
Andrew Lamb [Sun, 2 Jan 2022 14:37:45 +0000 (09:37 -0500)] 
Update Union Array to add `UnionMode`, match latest Arrow Spec, and rename `new` -> `unsafe new_unchecked()` (#885)

* Update union array to new null handling

* Update arrow/src/array/array_union.rs

* correct comment

5 months ago Add kernel and tests (#1125)
Matthew Turner [Sun, 2 Jan 2022 14:24:50 +0000 (09:24 -0500)] 
Add kernel and tests (#1125)

5 months ago Add kernel and tests (#1123)
Matthew Turner [Sun, 2 Jan 2022 14:24:18 +0000 (09:24 -0500)] 
Add kernel and tests (#1123)

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
5 months ago Add kernel and tests (#1122)
Matthew Turner [Sun, 2 Jan 2022 13:45:08 +0000 (08:45 -0500)] 
Add kernel and tests (#1122)

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
5 months ago Add neq dyn scalar kernel (#1118)
Matthew Turner [Sun, 2 Jan 2022 13:02:22 +0000 (08:02 -0500)] 
Add neq dyn scalar kernel (#1118)

* Add lt_dyn_scalar and tests

* Add lt_eq_dyn_scalar kernel

* Add gt_dyn_scalar kernel

* Add gt_eq_dyn_scalar kernel

* Add neq_dyn_scalar kernel

* Add kernel to err message

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
5 months ago Add gt eq dyn scalar kernel (#1117)
Matthew Turner [Sun, 2 Jan 2022 12:08:24 +0000 (07:08 -0500)] 
Add gt eq dyn scalar kernel (#1117)

* Add lt_dyn_scalar and tests

* Add lt_eq_dyn_scalar kernel

* Add gt_dyn_scalar kernel

* Add gt_eq_dyn_scalar kernel

* Add kernel to err message

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
5 months ago Add gt dyn scalar kernel (#1116)
Matthew Turner [Sun, 2 Jan 2022 11:47:28 +0000 (06:47 -0500)] 
Add gt dyn scalar kernel (#1116)

* Add gt_dyn_scalar kernel

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
5 months ago Add lt eq dyn scalar kernel (#1115)
Matthew Turner [Sun, 2 Jan 2022 11:33:51 +0000 (06:33 -0500)] 
Add lt eq dyn scalar kernel (#1115)

* Add lt_dyn_scalar and tests

* Add lt_eq_dyn_scalar kernel

* Add kernel to error message

* fix merge problem

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
5 months ago Add kernel and tests (#1121)
Matthew Turner [Sun, 2 Jan 2022 11:25:00 +0000 (06:25 -0500)] 
Add kernel and tests (#1121)

5 months ago Add kernel and tests (#1124)
Matthew Turner [Sun, 2 Jan 2022 11:11:00 +0000 (06:11 -0500)] 
Add kernel and tests (#1124)

5 months ago Add lt dyn scalar kernel (#1114)
Matthew Turner [Sun, 2 Jan 2022 11:08:36 +0000 (06:08 -0500)] 
Add lt dyn scalar kernel (#1114)

* Add lt_dyn_scalar and tests

* Add kernel to error message