arrow-rs.git
Tag: 9.1.0
Andrew Lamb [Sat, 19 Feb 2022 16:36:34 +0000 (11:36 -0500)] 
Update versions and CHANGELOG for 9.1.0 release (#1325)

* Update version to 10.0.0

* Update changelog generator script

* Initial Changelog

* iter

* one more

* Set version to 9.1.0

* Make it more clear 1282 was not fixed

* touchups

* Update changelog

Co-authored-by: Wakahisa <nevilledips@gmail.com>
Remzi Yang [Sat, 19 Feb 2022 16:05:11 +0000 (00:05 +0800)] 
Update the document of function `MutableArrayData::extend` (#1336)

* update document

Signed-off-by: remzi <13716567376yh@gmail.com>
* correct the fmt

Signed-off-by: remzi <13716567376yh@gmail.com>
Andrew Lamb [Sat, 19 Feb 2022 16:02:28 +0000 (11:02 -0500)] 
Clean up DictionaryArray construction in test (#1314)

Andrew Lamb [Sat, 19 Feb 2022 15:57:52 +0000 (10:57 -0500)] 
Cleanup: remove some dead / test only code (#1331)

Shani Solomon [Thu, 17 Feb 2022 18:32:17 +0000 (20:32 +0200)] 
Expose page encoding `ColumnChunkMetadata` (#1322)

* init

* replaced test file

* init

* thrift conversion

* refactor

* tests

* clippy

Sergey Glushchenko [Thu, 17 Feb 2022 12:28:26 +0000 (13:28 +0100)] 
Enable dead_code lint (#1324)

Shani Solomon [Wed, 16 Feb 2022 16:58:08 +0000 (18:58 +0200)] 
Expose column index and offset index (#1318)

# Which issue does this PR close?
Closes #1317.

Exposes the offsets and lengths of the column index and offset index so that Parquet engines can optimize their reads.

Sergey Glushchenko [Wed, 16 Feb 2022 13:41:38 +0000 (14:41 +0100)] 
Enable more lints (#1315)

Shani Solomon [Wed, 16 Feb 2022 12:02:38 +0000 (14:02 +0200)] 
fix test bug and ensure that bloom filter metadata is serialized in `to_thrift` (#1320)

* fix test bug and cc metadata to_thrift

* fmt

Liang-Chi Hsieh [Wed, 16 Feb 2022 11:57:21 +0000 (03:57 -0800)] 
Implement an iterator for DictionaryArray (#1296)

* Add DictionaryIter

* Try suggested approach.

* do check

* Add to boolean, string and binary arrays

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Andrew Lamb [Tue, 15 Feb 2022 20:24:12 +0000 (15:24 -0500)] 
Use new DecimalArray creation API in arrow crate (#1249)

* Use new API in ffi.rs

* Use new API in sort.rs

* Use new API in pretty.rs

* Use new API in array_binary.rs

* Use new API in equal_json.rs

* Use new API in take.rs

* Use new API in cast.rs

* Use new API in equal.rs

* clippy

Andrew Lamb [Tue, 15 Feb 2022 20:12:21 +0000 (15:12 -0500)] 
Add `DictionaryArray::try_new()` to create dictionaries from pre existing arrays (#1300)

* Add DictionaryArray::try_new()

* Update arrow/src/array/array_dictionary.rs

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Shani Solomon [Tue, 15 Feb 2022 20:04:11 +0000 (22:04 +0200)] 
Expose has bloom offset (#1309)

* expose hasBloomFilters

* added test

* var name

* rename

* updated test to support new test file

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Sergey Glushchenko [Tue, 15 Feb 2022 18:38:54 +0000 (19:38 +0100)] 
Enable clippy::type_complexity (#1310)

Andrew Lamb [Tue, 15 Feb 2022 18:15:25 +0000 (13:15 -0500)] 
Update parquet-testing pin (#1311)

Raphael Taylor-Davies [Tue, 15 Feb 2022 15:11:42 +0000 (15:11 +0000)] 
Vectorized DeltaBitPackDecoder (#1281) (#1284)

* Vectorized `DeltaBitPackDecoder` (#1281)

* Review feedback

Sergey Glushchenko [Mon, 14 Feb 2022 11:44:46 +0000 (12:44 +0100)] 
Enable clippy::float_equality_without_abs lint (#1305)

Liang-Chi Hsieh [Sun, 13 Feb 2022 13:35:40 +0000 (05:35 -0800)] 
Implement DictionaryArray support in eq_dyn (#1263)

* Implement DictionaryArray support in eq_dyn

* Address review comment: make eq_dict generic and rename it to cmp_dict. Remove unsafe code.

* Other integer types

* Fix clippy error

* Fix format

* Fix clippy and format

* Add cmp_dict_utf8 and cmp_dict_binary to cover the utf8/binary value array cases

* Add binary test

* Add remaining types

* Add Float32 and Float64 and update a few comments.
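
The idea behind a generic dictionary comparison can be sketched with plain slices. This is a minimal illustration, not the arrow crate's code: `Dict` and `cmp_dict_sketch` are hypothetical names, keys are modelled as `Option<usize>`, and a length mismatch is reported as `None` rather than an Arrow error.

```rust
/// Hypothetical, simplified stand-in for a dictionary-encoded array:
/// `keys[i]` indexes into `values`, and `None` marks a null slot.
struct Dict<'a, T> {
    keys: &'a [Option<usize>],
    values: &'a [T],
}

/// Compare two dictionary arrays slot by slot with a caller-supplied
/// operator, resolving each key to its value first. A null on either
/// side yields a null (`None`) result for that slot.
fn cmp_dict_sketch<T, F>(
    left: &Dict<T>,
    right: &Dict<T>,
    op: F,
) -> Option<Vec<Option<bool>>>
where
    F: Fn(&T, &T) -> bool,
{
    if left.keys.len() != right.keys.len() {
        return None; // length mismatch is treated as an error in this sketch
    }
    Some(
        left.keys
            .iter()
            .zip(right.keys.iter())
            .map(|(l, r)| match (l, r) {
                (Some(l), Some(r)) => Some(op(&left.values[*l], &right.values[*r])),
                _ => None,
            })
            .collect(),
    )
}
```

Passing the operator as a closure is what lets one routine serve equality, ordering, and other comparisons alike.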

Andrew Lamb [Sun, 13 Feb 2022 13:26:56 +0000 (08:26 -0500)] 
Clean up DecimalArray creation in parquet crate (#1247)

Andrew Lamb [Sun, 13 Feb 2022 12:25:54 +0000 (07:25 -0500)] 
Changes for 9.0.2 (#1291)

* Fix bitmask creation in chunked part of simd comparison (#1286)

* Update version to 9.0.1

* Update changelog

* Fix bitmask creation also for simd comparisons with scalar (#1290)

* Update versions and changelog for 9.0.2

Co-authored-by: Jörn Horstmann <git@jhorstmann.net>
Sergey Glushchenko [Sun, 13 Feb 2022 08:59:49 +0000 (09:59 +0100)] 
arrow: enable clippy::vec_init_then_push lint (#1303)

Raphael Taylor-Davies [Thu, 10 Feb 2022 11:36:50 +0000 (11:36 +0000)] 
Fix test_unaligned_bit_chunk_iterator (#1297)

* Fix test_unaligned_bit_chunk_iterator

* Add more verification

* Format

Andy Grove [Thu, 10 Feb 2022 07:58:08 +0000 (00:58 -0700)] 
fix failing csv_writer bench (#1293)

Raphael Taylor-Davies [Wed, 9 Feb 2022 21:03:15 +0000 (21:03 +0000)] 
Specialized filter kernels (#1248)

* Add specialized primitive filter kernels

* Filter context

* Optimize null buffer construction

* Clippy

* Benchmark filter construction

* Review feedback

* Specialized string filter

* Specialized dictionary filter kernel

* Use trusted_len_iter

* Review feedback

* Add fuzz filter test

* Clarify selective vs selectivity confusion

* Revert change to MutableBuffer::from_trusted_len_iter_bool

* Fix filter_bits offset handling

* Review feedback

* Use i64 for chunk offset

* Only optimize filter when filtering multiple columns

* Test truncated filter

* Review feedback

* Add IterationStrategy::None

* Remove selective / selectivity docs confusion

Jörn Horstmann [Wed, 9 Feb 2022 16:33:36 +0000 (17:33 +0100)] 
Fix bitmask creation also for simd comparisons with scalar (#1290)

Andrew Lamb [Tue, 8 Feb 2022 20:10:14 +0000 (15:10 -0500)] 
`DecimalArray` API ergonomics: add iter(), create from iter(), change precision / scale (#1223)

* DecimalArray: create from iter, iter(), docs

* Add with_precision and scale

* Implement iter() and into_iter() for DecimalArray

* Clean up and tests

* Return Result rather than panic

* Refactor error handling into separate function

* Validate data in `with_precision_and_scale`

* Use named constant values

* clippy

Raphael Taylor-Davies [Tue, 8 Feb 2022 19:50:48 +0000 (19:50 +0000)] 
Skip zero-ing primitive nulls (#1280)

Andrew Lamb [Tue, 8 Feb 2022 12:04:36 +0000 (07:04 -0500)] 
Run rustdoc in CI and error if warnings (#1266)

* Run rustdoc in CI and error if warnings

* Update .github/workflows/rust.yml

* use nightly for doc check

* install rustfmt

* attempt to install pythondev

* try2

* try3

Remzi Yang [Tue, 8 Feb 2022 12:04:17 +0000 (20:04 +0800)] 
Fix some clippy lints in parquet crate, rename `LevelEncoder` variants to conform to Rust standards (#1273)

* disallow vec_init_then_push

Signed-off-by: remzi <13716567376yh@gmail.com>
* disallow upper case acronyms

Signed-off-by: remzi <13716567376yh@gmail.com>
* disallow transmute ptr to ptr

Signed-off-by: remzi <13716567376yh@gmail.com>
* disallow same item push

Signed-off-by: remzi <13716567376yh@gmail.com>
* disallow approx constant

Signed-off-by: remzi <13716567376yh@gmail.com>
* disallow cast ptr alignment

Signed-off-by: remzi <13716567376yh@gmail.com>
* disallow float cmp

Signed-off-by: remzi <13716567376yh@gmail.com>
* check float equality with abs

Signed-off-by: remzi <13716567376yh@gmail.com>
* check incomplete features

Signed-off-by: remzi <13716567376yh@gmail.com>
* check single char names

Signed-off-by: remzi <13716567376yh@gmail.com>
* check needless range loop

Signed-off-by: remzi <13716567376yh@gmail.com>
* fix more vec_init_then_push lint, especially in macro

Signed-off-by: remzi <13716567376yh@gmail.com>
* fix more float equality without abs

Signed-off-by: remzi <13716567376yh@gmail.com>
* fix more vec push same items

Signed-off-by: remzi <13716567376yh@gmail.com>
* fix more needless range loop

Signed-off-by: remzi <13716567376yh@gmail.com>
Jörn Horstmann [Mon, 7 Feb 2022 17:04:42 +0000 (18:04 +0100)] 
Fix bitmask creation in chunked part of simd comparison (#1286)

Raphael Taylor-Davies [Mon, 7 Feb 2022 15:03:30 +0000 (15:03 +0000)] 
Restrict Decoder to compatible types (#1276) (#1277)

Andrew Lamb [Mon, 7 Feb 2022 14:58:19 +0000 (09:58 -0500)] 
Fix doc warnings (#1268)

Ze'ev Maor [Sun, 6 Feb 2022 01:44:01 +0000 (03:44 +0200)] 
Make rle decoder public (#1271)

Tag: 9.0.0
Andrew Lamb [Fri, 4 Feb 2022 12:22:22 +0000 (07:22 -0500)] 
Prepare for 9.0.0 release: Update version + CHANGELOG (#1265)

* Update version to 9.0.0

* Update changelog generator script

* Initial changelog

* Updates

* Update again

* rat

Jiayu Liu [Thu, 3 Feb 2022 07:39:24 +0000 (15:39 +0800)] 
upgrade clap (#1261)

Andrew Lamb [Wed, 2 Feb 2022 20:37:59 +0000 (15:37 -0500)] 
Remove unsupported flag in rustfmt.toml (#1262)

Andrew Lamb [Wed, 2 Feb 2022 20:37:42 +0000 (15:37 -0500)] 
Improve module documentation for parquet crate (#1253)

Raphael Taylor-Davies [Wed, 2 Feb 2022 20:37:10 +0000 (20:37 +0000)] 
Faster bitmask iteration (#1228)

* Add UnalignedBitChunks (#1227)

* Clippy

* Fix flaky test

* Improve test legibility

* Fix SlicesIterator offset direction

* Format

* Fix byte-aligned termination

* Test edge-cases

* More tests

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* Review feedback

* Make UnalignedBitChunkIterator crate local

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Raphael Taylor-Davies [Wed, 2 Feb 2022 11:31:21 +0000 (11:31 +0000)] 
Add `async` arrow parquet reader (#1154)

* Async parquet reader (#111)

Add Sync + Send bounds to parquet crate

* Remove Sync from DataType

* Review feedback

* Add basic test

* Fix lints

* Review feedback

* Tweak CI

Andrew Lamb [Wed, 2 Feb 2022 11:16:05 +0000 (06:16 -0500)] 
Refresh readme / contributing guide (#1252)

dependabot[bot] [Tue, 1 Feb 2022 21:58:26 +0000 (16:58 -0500)] 
Update chrono-tz requirement from 0.4 to 0.6 (#1259)

Updates the requirements on [chrono-tz](https://github.com/chronotope/chrono-tz) to permit the latest version.
- [Release notes](https://github.com/chronotope/chrono-tz/releases)
- [Changelog](https://github.com/chronotope/chrono-tz/blob/main/CHANGELOG.md)
- [Commits](https://github.com/chronotope/chrono-tz/commits/v0.6.1)

---
updated-dependencies:
- dependency-name: chrono-tz
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
dependabot[bot] [Tue, 1 Feb 2022 21:58:13 +0000 (16:58 -0500)] 
Update zstd requirement from 0.9 to 0.10 (#1257)

Updates the requirements on [zstd](https://github.com/gyscos/zstd-rs) to permit the latest version.
- [Release notes](https://github.com/gyscos/zstd-rs/releases)
- [Commits](https://github.com/gyscos/zstd-rs/compare/0.9.0...v0.10.0)

---
updated-dependencies:
- dependency-name: zstd
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Andrew Lamb [Tue, 1 Feb 2022 17:48:16 +0000 (12:48 -0500)] 
Add dependabot (#1256)

Raphael Taylor-Davies [Tue, 1 Feb 2022 14:28:15 +0000 (14:28 +0000)] 
Batch multiple records in ArrowWriter (#1214)

* Batch multiple records in ArrowWriter

* Document max_group_size and reduce default (#1213)

* Review feedback

* Write multiple arrays without concat

* Clippy

* Test aggregating complex types

* Test complex slice

* Clippy

Andrew Lamb [Tue, 1 Feb 2022 14:02:49 +0000 (09:02 -0500)] 
Revert "Pin WASM / packed SIMD tests to nightly-2022-01-17 (#1204)" (#1251)

This reverts commit d68c4ae14077d60326eb57fe28133645f800d7e5.

Remzi Yang [Mon, 31 Jan 2022 21:58:58 +0000 (05:58 +0800)] 
Add docs examples for dynamically compare functions (#1250)

* add examples

Signed-off-by: remzi <13716567376yh@gmail.com>
* correct rust format

Signed-off-by: remzi <13716567376yh@gmail.com>
Raphael Taylor-Davies [Sat, 29 Jan 2022 15:53:10 +0000 (15:53 +0000)] 
Fix NullArrayReader (#1245) (#1246)

Raphael Taylor-Davies [Fri, 28 Jan 2022 19:20:04 +0000 (19:20 +0000)] 
Revert making parquet::data_type and parquet::arrow::schema experimental (#1244)

Remzi Yang [Fri, 28 Jan 2022 11:43:07 +0000 (19:43 +0800)] 
rename to Bitmap::bit_len (#1242)

Signed-off-by: remzi <13716567376yh@gmail.com>
Remzi Yang [Thu, 27 Jan 2022 20:54:21 +0000 (04:54 +0800)] 
Add Rust Docs examples for UnionArray (#1241)

Signed-off-by: remzi <13716567376yh@gmail.com>
Matthew Turner [Thu, 27 Jan 2022 17:05:33 +0000 (12:05 -0500)] 
Remove Copy trait from dyn scalar kernels (#1243)

Remzi Yang [Wed, 26 Jan 2022 12:18:25 +0000 (20:18 +0800)] 
dyn compare for binary array (#1238)

* dyn compare two binary array

Signed-off-by: remzi <13716567376yh@gmail.com>
* add dyn comparison for binary array

Signed-off-by: remzi <13716567376yh@gmail.com>
* add tests for dyn compare binary array and scalar

Signed-off-by: remzi <13716567376yh@gmail.com>
* remove DictionaryArray from dyn compare, because there is no easy way to build a binary dictionary array

Signed-off-by: remzi <13716567376yh@gmail.com>
* fix mistakes in test code

Signed-off-by: remzi <13716567376yh@gmail.com>
* add Nones into the test cases

Signed-off-by: remzi <13716567376yh@gmail.com>
* add non utf8 scalar

Signed-off-by: remzi <13716567376yh@gmail.com>
* correct the code format

Signed-off-by: remzi <13716567376yh@gmail.com>
Andrew Lamb [Tue, 25 Jan 2022 11:35:26 +0000 (06:35 -0500)] 
Improve documentation for Bitmap (#1237)

Raphael Taylor-Davies [Mon, 24 Jan 2022 21:31:23 +0000 (21:31 +0000)] 
Fix null bitmap length validation (#1231) (#1232)

* Fix null bitmap length validation (#1231)

* Reuse existing test

Jörn Horstmann [Mon, 24 Jan 2022 19:50:30 +0000 (20:50 +0100)] 
Remove explicit simd arithmetic kernels except for division/modulo (#1221)

* Extend arithmetic benchmarks

* Remove explicit simd arithmetic except for div/mod because autovectorization generates better code

* Remove unneeded return keywords
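
As a rough illustration of the kind of code that relies on autovectorization instead of explicit SIMD, an element-wise kernel can be written as a plain iterator chain. This is a sketch under assumptions, not the arrow kernel itself: `add_sketch` is a hypothetical name, nulls are ignored, and wrapping addition is chosen here only to make overflow behavior explicit.

```rust
/// Element-wise addition over two equal-length slices.
/// There are no SIMD intrinsics here: the loop body is simple enough
/// that the compiler is free to vectorize it in optimized builds.
fn add_sketch(a: &[i32], b: &[i32]) -> Vec<i32> {
    assert_eq!(a.len(), b.len(), "inputs must have equal length");
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| x.wrapping_add(*y))
        .collect()
}
```

Whether a given loop is actually vectorized depends on the compiler version, target features, and optimization level, so benchmarks remain the deciding factor.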

Jörn Horstmann [Mon, 24 Jan 2022 19:28:15 +0000 (20:28 +0100)] 
Remove memory-check feature (#1222)

Yijie Shen [Mon, 24 Jan 2022 19:27:18 +0000 (03:27 +0800)] 
[Minor]Re-export `array::builder::make_builder` to make it available for downstream (#1235)

* Re-export `array::builder::make_builder`

* Update mod.rs

Yijie Shen [Mon, 24 Jan 2022 19:23:40 +0000 (03:23 +0800)] 
[Minor]`into_inner` for IPC `FileWriter` (#1236)

* `into_inner` for IPC `FileWriter`

* lint

Raphael Taylor-Davies [Mon, 24 Jan 2022 19:22:43 +0000 (19:22 +0000)] 
Do not concatenate identical dictionaries (#1219)

* Do not concatenate identical dictionaries (#504)

* Review feedback

Raphael Taylor-Davies [Mon, 24 Jan 2022 15:07:26 +0000 (15:07 +0000)] 
Remove arrow array reader (#1197) (#1234)

Raphael Taylor-Davies [Mon, 24 Jan 2022 12:00:44 +0000 (12:00 +0000)] 
Preserve dictionary encoding when decoding parquet into Arrow arrays, 60x perf improvement (#171) (#1180)

* Preserve dictionary encoding from parquet (#171)

* Use OffsetBuffer::into_array for dictionary

* Fix and test handling of empty dictionaries

Don't panic if missing dictionary page

* Use ArrayRef instead of Arc<ArrayData>

* Update doc comments

* Add integration test

Tweak RecordReader buffering logic

* Add benchmark

* Set write batch size in parquet fuzz tests

Fix bug in column writer with small page sizes

* Fix test_dictionary_preservation

* Add batch_size comment

Remzi Yang [Sun, 23 Jan 2022 10:11:39 +0000 (18:11 +0800)] 
Add non utf8 values into the test cases of BinaryArray comparison (#1220)

* add eq_dyn for BinaryArray

Signed-off-by: remzi <13716567376yh@gmail.com>
* correct the code formatting

Signed-off-by: remzi <13716567376yh@gmail.com>
* add comparison support for fully qualified binary array
delete dyn comparison which will be added in successive PRs

Signed-off-by: remzi <13716567376yh@gmail.com>
* add tests for comparison of fully qualified BinaryArray

Signed-off-by: remzi <13716567376yh@gmail.com>
* add 2 missed tests

Signed-off-by: remzi <13716567376yh@gmail.com>
* move 2 functions

Signed-off-by: remzi <13716567376yh@gmail.com>
* fix reference error

Signed-off-by: remzi <13716567376yh@gmail.com>
* add non utf8 bytes to the test cases of BinaryArray comparison

Signed-off-by: remzi <13716567376yh@gmail.com>
Patrick More [Sat, 22 Jan 2022 20:18:10 +0000 (12:18 -0800)] 
Update DECIMAL_RE to allow scientific notation in auto inferred schemas (#1216)

* Update DECIMAL_RE to allow scientific notation in auto inferred schemas

* Fixed format lint

Tag: 8.0.0
Andrew Lamb [Fri, 21 Jan 2022 12:06:57 +0000 (07:06 -0500)] 
Prepare for 8.0.0 release: Update CHANGELOG and versions (#1212)

* Update version to 8.0.0

* Update Changelog for 8.0.0

* restore RAT

* Remove items that were released in 7.0.0

Andrew Lamb [Fri, 21 Jan 2022 11:58:45 +0000 (06:58 -0500)] 
Improve changelog generator script settings (#1210)

* Update changelog script

* fix typo

Yang [Wed, 19 Jan 2022 21:28:52 +0000 (05:28 +0800)] 
Return error from JSON writer rather than panic (#1205)

* Return error from JSON writer rather than panic

* fix comment

Helgi Kristvin Sigurbjarnarson [Wed, 19 Jan 2022 21:28:31 +0000 (13:28 -0800)] 
fix a bug in variable sized equality (#1209)

A missing validity buffer was being treated as all values being null,
rather than all values being valid, causing equality to fail on some
equivalent string and binary arrays.
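
The validity rule described above can be sketched with plain slices. This is a simplified illustration, not the arrow crate's equality code: `is_valid` and `variable_sized_equal_sketch` are hypothetical names, and a `bool` slice stands in for Arrow's packed validity bitmap.

```rust
/// Returns whether slot `i` holds a value.
/// `None` means no validity buffer was allocated, which must be read
/// as "every slot is valid", not "every slot is null".
fn is_valid(validity: Option<&[bool]>, i: usize) -> bool {
    validity.map_or(true, |bits| bits[i])
}

/// Logical equality of two string arrays with optional validity buffers.
fn variable_sized_equal_sketch(
    lhs: &[&str],
    lhs_validity: Option<&[bool]>,
    rhs: &[&str],
    rhs_validity: Option<&[bool]>,
) -> bool {
    lhs.len() == rhs.len()
        && (0..lhs.len()).all(|i| {
            match (is_valid(lhs_validity, i), is_valid(rhs_validity, i)) {
                (true, true) => lhs[i] == rhs[i], // both present: compare values
                (false, false) => true,           // both null: equal
                _ => false,                       // null vs. value: not equal
            }
        })
}
```

With this rule, an array without a validity buffer compares equal to an array holding the same values and an all-true buffer, which is the case the fix restores.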

Andrew Lamb [Wed, 19 Jan 2022 18:17:48 +0000 (13:17 -0500)] 
Update parquet crate readme (#1192)

* Update parquet crate readme

* prettier

Remzi Yang [Wed, 19 Jan 2022 18:06:47 +0000 (02:06 +0800)] 
Add comparison support for fully qualified BinaryArray (#1195)

* add eq_dyn for BinaryArray

Signed-off-by: remzi <13716567376yh@gmail.com>
* correct the code formatting

Signed-off-by: remzi <13716567376yh@gmail.com>
* add comparison support for fully qualified binary array
delete dyn comparison which will be added in successive PRs

Signed-off-by: remzi <13716567376yh@gmail.com>
* add tests for comparison of fully qualified BinaryArray

Signed-off-by: remzi <13716567376yh@gmail.com>
* add 2 missed tests

Signed-off-by: remzi <13716567376yh@gmail.com>
* move 2 functions

Signed-off-by: remzi <13716567376yh@gmail.com>
* fix reference error

Signed-off-by: remzi <13716567376yh@gmail.com>
Edd Robinson [Wed, 19 Jan 2022 12:19:29 +0000 (12:19 +0000)] 
feat: add support for casting Duration/Interval to Int64Array (#1196)

* feat: add support for casting Duration to Int64Array

* feat: cast from Interval to Int64

Andrew Lamb [Wed, 19 Jan 2022 02:35:43 +0000 (21:35 -0500)] 
Pin WASM / packed SIMD tests to nightly-2022-01-17 (#1204)

Helgi Kristvin Sigurbjarnarson [Tue, 18 Jan 2022 21:59:27 +0000 (13:59 -0800)] 
bugfix in display of float16 array (#1194)

Due to a typo the float16 array was being cast to a float32 array,
causing a crash when pretty printing a record batch containing float16.

Raphael Taylor-Davies [Tue, 18 Jan 2022 12:13:21 +0000 (12:13 +0000)] 
parquet: Optimized ByteArrayReader, Add UTF-8 Validation (#1040) (#1082)

* Optimized ByteArrayReader (#1040)

UTF-8 Validation (#786)

* Fix arrow_array_reader benchmark

* Allow running subset of arrow_array_reader benchmarks

* Faster UTF-8 validation

* Tweak null handling

* Add license

* Refine `ValuesBuffer::pad_nulls`

* Tweak error handling

* Use page null count if available

* Doc comments

* Test DELTA_BYTE_ARRAY encoding

* Support legacy Encoding::PLAIN_DICTIONARY

* Add OffsetBuffer unit tests

Review feedback

* More tests

* Fix lint

* Review feedback

Helgi Kristvin Sigurbjarnarson [Tue, 18 Jan 2022 12:09:12 +0000 (04:09 -0800)] 
feat(parquet): support for reading structs nested within lists (#1187)

* feat(parquet): support for reading structs nested within lists

* fix: logical conflict

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Jiayu Liu [Tue, 18 Jan 2022 01:52:01 +0000 (09:52 +0800)] 
update nightly version for miri (#1189)

Raphael Taylor-Davies [Mon, 17 Jan 2022 15:51:21 +0000 (15:51 +0000)] 
Truncate bitmask on split (#1183)

* Truncate bitmask on split

* Fix BooleanBufferBuilder::resize

* Format

Helgi Kristvin Sigurbjarnarson [Mon, 17 Jan 2022 15:49:14 +0000 (07:49 -0800)] 
fix: Fix a bug in how filter indices are calculated (#1185)

* fix: Fix a bug in how filter indices are calculated

Using the definition level and the nullability of the column only
produces the correct indices if max_definition - 1 is the list level.
For deeper nesting (struct in a list) this produces incorrect indices,
silently causing incorrect data to be written.

This fix uses the array offsets to compute the indices instead.

* add assertions
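
One way to picture deriving value indices from list offsets, as opposed to definition levels, is the following sketch. It is an assumption-laden illustration and not the code from this change: `child_indices_sketch` is a hypothetical name, offsets are plain `usize` values, and a `bool` slice stands in for the child validity bitmap.

```rust
/// Collect the indices of child values that should be written:
/// every child slot referenced by some list's offset range, minus
/// the slots that are null in the child array itself.
fn child_indices_sketch(offsets: &[usize], child_valid: &[bool]) -> Vec<usize> {
    offsets
        .windows(2)                     // one (start, end) pair per list
        .flat_map(|w| w[0]..w[1])       // child slots covered by that list
        .filter(|&i| child_valid[i])    // drop null child values
        .collect()
}
```

Because the ranges come straight from the offsets, the result does not depend on how many nesting levels sit between the list and the leaf values.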

Kun Liu [Mon, 17 Jan 2022 15:42:45 +0000 (23:42 +0800)] 
Support DecimalType in sort and take kernels (#1172)

Jiayu Liu [Mon, 17 Jan 2022 15:32:56 +0000 (23:32 +0800)] 
add from_iter_values for binary array (#1188)

Raphael Taylor-Davies [Sun, 16 Jan 2022 12:08:35 +0000 (12:08 +0000)] 
Use tempfile for parquet tests (#1165)

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Raphael Taylor-Davies [Sat, 15 Jan 2022 18:49:09 +0000 (18:49 +0000)] 
Serialize i128 as JSON string (#1175)

Andrew Lamb [Sat, 15 Jan 2022 18:48:15 +0000 (13:48 -0500)] 
Add ticket reference for false positive (#1181)

Raphael Taylor-Davies [Sat, 15 Jan 2022 11:34:44 +0000 (11:34 +0000)] 
Fix record formatting in 1.58 (#1178)

Helgi Kristvin Sigurbjarnarson [Fri, 14 Jan 2022 18:09:51 +0000 (10:09 -0800)] 
Bugfix in parquet writing empty lists of structs (#1166)

Fix a bug in the definition level calculation for fields nested within a
struct and a list. When a list is empty or null in parquet the nested
field gets a null value. However, in arrow, the value is simply missing.
When serializing an immediate child of the list, the list offsets are
used to calculate the correct definition level for its children, but it
is not carried further to fields nested deeper (e.g., fields on a struct
within a list).  This (somewhat hacky) fix treats a struct within a list
as if it were a list.
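
The mismatch between the two formats can be seen in a small sketch of definition levels for a nullable list of nullable items. This is a simplified model, not the writer's code: `def_levels_sketch` is a hypothetical name, and the level values (0 = null list, 1 = empty list, 2 = null item, 3 = present item) assume exactly this two-level schema.

```rust
/// Compute Parquet-style definition levels for an optional list of
/// optional items, given Arrow-style offsets and validity flags.
fn def_levels_sketch(
    offsets: &[usize],
    list_valid: &[bool],
    item_valid: &[bool],
) -> Vec<i16> {
    let mut levels = Vec::new();
    for (i, w) in offsets.windows(2).enumerate() {
        if !list_valid[i] {
            levels.push(0); // null list: one entry, although Arrow stores no child slot
        } else if w[0] == w[1] {
            levels.push(1); // empty list: also one entry with no child slot
        } else {
            for j in w[0]..w[1] {
                levels.push(if item_valid[j] { 3 } else { 2 });
            }
        }
    }
    levels
}
```

Null and empty lists each contribute a level entry without a matching child slot, so anything nested below the list has to derive its levels from the list offsets rather than from its own length.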

Jörn Horstmann [Fri, 14 Jan 2022 16:17:44 +0000 (17:17 +0100)] 
Fix compilation error with simd feature (#1169)

Andrew Lamb [Fri, 14 Jan 2022 16:17:36 +0000 (11:17 -0500)] 
Fix new clippy lints introduced in Rust 1.58 (#1170)

Jörn Horstmann [Thu, 13 Jan 2022 18:27:52 +0000 (19:27 +0100)] 
Simplify and reduce code duplication in arithmetic kernels (#1161)

* Simplify and reduce code duplication in arithmetic kernels

* Update comments

Andrew Lamb [Thu, 13 Jan 2022 18:14:39 +0000 (13:14 -0500)] 
Update dev/release/README for master releases, remove supporting scripts (#1143)

* Update dev/release/README for master releases

* remove cherry pick script

Raphael Taylor-Davies [Thu, 13 Jan 2022 15:18:55 +0000 (15:18 +0000)] 
Improve parquet reading performance for columns with nulls by preserving bitmask when possible (#1037) (#1054)

* Preserve bitmask (#1037)

* Remove now unnecessary box (#1061)

* Fix handling of empty bitmasks

* More docs

* Add nested nullability test case

* Add packed decoder test

Andrew Lamb [Thu, 13 Jan 2022 06:33:24 +0000 (01:33 -0500)] 
Remove left over readme file from arrow/arrow-rs split (#1162)

Raphael Taylor-Davies [Wed, 12 Jan 2022 14:44:07 +0000 (14:44 +0000)] 
Fuzz test different parquet encodings (#1156)

Liang-Chi Hsieh [Wed, 12 Jan 2022 12:22:42 +0000 (04:22 -0800)] 
Add subtract_scalar kernel (#1152)

* Add subtract_scalar

* Rebase

Liang-Chi Hsieh [Wed, 12 Jan 2022 12:21:37 +0000 (04:21 -0800)] 
Add multiply_scalar (#1159)

Helgi Kristvin Sigurbjarnarson [Tue, 11 Jan 2022 19:19:38 +0000 (11:19 -0800)] 
feat(json): support for map arrays in json writer (#1149)

Liang-Chi Hsieh [Tue, 11 Jan 2022 19:18:52 +0000 (11:18 -0800)] 
Add add_scalar kernel (#1151)

* Add add_scalar

* move simd_float_unary_math_op to simd_unary_math_op

Andrew Lamb [Tue, 11 Jan 2022 19:10:01 +0000 (14:10 -0500)] 
Document safety justification of some uses of `from_trusted_len_iter` (#1148)

Liang-Chi Hsieh [Tue, 11 Jan 2022 19:08:52 +0000 (11:08 -0800)] 
Move simd right out of for_each loop (#1150)

Raphael Taylor-Davies [Tue, 11 Jan 2022 18:01:15 +0000 (18:01 +0000)] 
Generify ColumnReaderImpl and RecordReader (#1040) (#1041)

* Simplify record reader

* Generify ColumnReaderImpl and RecordReader (#1040)

* Tweak count_records predicate

* Pre-allocate bitmask

* fix: TypedBuffer::split update len

* Simplify GenericRecordReader

* Move column decoders into module

* Remove `RecordBuffer::create` method

* Remove `TypedBuffer<i16>::count_records`

* Pass null count to `ColumnValueDecoder::read`

* Pull null padding out of column reader

* Review feedback

* Format

* License headers

* Further doc tweaks

* Further docs

* Restrict ScalarBuffer types

Andrew Lamb [Tue, 11 Jan 2022 17:58:02 +0000 (12:58 -0500)] 
Remove `GenericStringArray::from_vec` and `GenericStringArray::from_opt_vec` (#1147)

Raphael Taylor-Davies [Tue, 11 Jan 2022 15:12:30 +0000 (15:12 +0000)] 
BooleanBufferBuilder::append_packed (#1038) (#1039)

* BooleanBufferBuilder::append_packed (#1038)

* Update docstring

* Add packed_append_range

* Fix capacity

* Use set_bits from transform::util

* Add license

* Format