arrow-rs.git
3 months agoPrepare for 11.0.0 release (#1461) 11.0.0
Andrew Lamb [Fri, 18 Mar 2022 07:46:56 +0000 (03:46 -0400)] 
Prepare for 11.0.0 release (#1461)

* Update version to 11.0.0

* Update changelog

* update changelog

* fixup

* tweak

3 months agoFix generate_interval_case in integration test (#1446)
Liang-Chi Hsieh [Thu, 17 Mar 2022 12:45:56 +0000 (05:45 -0700)] 
Fix generate_interval_case in integration test (#1446)

* Fix generate_interval_case

* Fix

3 months agorewrite doc (#1450)
Remzi Yang [Thu, 17 Mar 2022 12:43:10 +0000 (20:43 +0800)] 
rewrite doc (#1450)

Signed-off-by: remzi <13716567376yh@gmail.com>
3 months agoenhancement: remove redundant if/clamp_min/abs (#1428)
jakevin [Thu, 17 Mar 2022 07:35:28 +0000 (15:35 +0800)] 
enhancement: remove redundant if/clamp_min/abs (#1428)

3 months agoSet `default-features = false` for `zstd` in the parquet crate to support `wasm32...
Kyle Barron [Wed, 16 Mar 2022 12:23:01 +0000 (06:23 -0600)] 
Set `default-features = false` for `zstd` in the parquet crate to support `wasm32-unknown-unknown` (#1414)

* Update zstd version for wasm support

* Bump to 0.11.1

3 months ago`filter` kernel should work with FixedSizeListArrays (#1434)
Liang-Chi Hsieh [Wed, 16 Mar 2022 12:22:49 +0000 (05:22 -0700)] 
`filter` kernel should work with FixedSizeListArrays (#1434)

* filter kernel should work with FixedSizeListArrays

* Fix clippy

* Fix clippy

3 months agofilter kernel should work with UnionArray (#1412)
Liang-Chi Hsieh [Wed, 16 Mar 2022 07:40:44 +0000 (00:40 -0700)] 
filter kernel should work with UnionArray (#1412)

3 months agoRewrite doc example of ListArray and LargeListArray (#1447)
Remzi Yang [Wed, 16 Mar 2022 06:45:33 +0000 (14:45 +0800)] 
Rewrite doc example of ListArray and LargeListArray (#1447)

3 months agoFix generate_decimal128_case (#1440)
Liang-Chi Hsieh [Mon, 14 Mar 2022 12:12:14 +0000 (05:12 -0700)] 
Fix generate_decimal128_case (#1440)

3 months agoFix integration doc (#1438)
Liang-Chi Hsieh [Mon, 14 Mar 2022 07:11:02 +0000 (00:11 -0700)] 
Fix integration doc (#1438)

3 months agoFix DeltaBitPack MiniBlock Bit Width Padding (#1418)
Raphael Taylor-Davies [Mon, 14 Mar 2022 07:05:54 +0000 (07:05 +0000)] 
Fix DeltaBitPack MiniBlock Bit Width Padding (#1418)

* Consistent DeltaBitPackEncoder bit width padding (#1416)

Ignore non-zero padded bit widths in DeltaBitPackDecoder (#1417)

* chore: review feedback

* Add test of DeltaBitPackDecoder padding

* Revert formatting

3 months agoAdd doc example for creating `FixedSizeListArray` (#1426)
Remzi Yang [Sat, 12 Mar 2022 17:51:34 +0000 (01:51 +0800)] 
Add doc example for creating `FixedSizeListArray` (#1426)

3 months agoSupport nullable keys in DictionaryArray::try_new (#1430)
Jörn Horstmann [Fri, 11 Mar 2022 20:02:05 +0000 (21:02 +0100)] 
Support nullable keys in DictionaryArray::try_new (#1430)

* Support nullable keys in DictionaryArray::try_new

* Set null count so it does not have to be recalculated

3 months agoFix possibly unaligned writes in MutableBuffer (#1421)
Jörn Horstmann [Fri, 11 Mar 2022 19:22:18 +0000 (20:22 +0100)] 
Fix possibly unaligned writes in MutableBuffer (#1421)

* Fix possibly unaligned writes in MutableBuffer

* Remove debug output and make from_trusted_len_iter follow the same pattern

* Add comment in extend_from_slice

3 months agoAdd value_unchecked() for FixedSizeBinaryArray (#1420)
jakevin [Fri, 11 Mar 2022 18:31:08 +0000 (02:31 +0800)] 
Add value_unchecked() for FixedSizeBinaryArray (#1420)

3 months agoUpdate zstd requirement from 0.10 to 0.11 (#1415)
dependabot[bot] [Fri, 11 Mar 2022 12:19:41 +0000 (07:19 -0500)] 
Update zstd requirement from 0.10 to 0.11 (#1415)

Updates the requirements on [zstd](https://github.com/gyscos/zstd-rs) to permit the latest version.
- [Release notes](https://github.com/gyscos/zstd-rs/releases)
- [Commits](https://github.com/gyscos/zstd-rs/compare/v0.10.0...v0.11.0)

---
updated-dependencies:
- dependency-name: zstd
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
3 months agoImplement basic FlightSQL Server (#1386)
Wang Fenjin [Fri, 11 Mar 2022 11:19:18 +0000 (19:19 +0800)] 
Implement basic FlightSQL Server (#1386)

* init impl flight sql server mod

Change-Id: I108b2468b078470bb8b6f95c031035cc09227986

* update according to comments

Change-Id: Ibb381e105041b38e6402850a2338403f802568ec

* fix ci error

Change-Id: I9485e510f1a960b6e094e559c3679434f8474ec1

* format code

Change-Id: I7ef4ade3acc81ccf5df088c866d41b538cf6f4f2

* fix clippy issue

Change-Id: I35d108ef43f2c2245444cfd5ea82da00b4f694f9

* add more test

Change-Id: Ic159cea2c76b017e183d2946e2d24e6fd1f9b4c1

* improve error handling

Change-Id: I709c16613092fd42ccff827eed3e3ad3f28368e2

* delete unnecessary Sync

Change-Id: I03ed0f69ddb1203ecd75982815fa72eca4d81160

* add flight_sql_server example

Change-Id: Ia35d697aaac3c72feba9c3aaf380ee3930484c48

* get rid of type annotation in unpack

Change-Id: I6006702d424ac6595f58c66057df267c4fd24476

* fix comments

Change-Id: I740d3d4e5aabbb56219291381e6a6db6506eca28

* add feature flight-sql

Change-Id: I223cf76be10ff379fcc9000c730d99c9773c7c3d

* delete all-features flag as packed_simd_2 is not supported

Change-Id: I50915b85b2f806bac5cd3207623e3f4e0e1974a1

* add feature flag for example

Change-Id: I562efcfa89a606b8061d2715ca1b6775e2a952a9

* fix do_put and do_action API

Change-Id: I80bef8c2b0a713a87c43487708ae721f5f8f9da9

* format code

Change-Id: Ie664a5fca965759dbba59ad9e34fc6e33150ddbf

* rename feature

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* rename to flight-sql-experimental

Change-Id: I4de4fe3768b0316e69ba6798406310632933d25d

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
3 months agoRemove duplicate bound check in the function shift (#1409)
Remzi Yang [Fri, 11 Mar 2022 02:40:15 +0000 (10:40 +0800)] 
Remove duplicate bound check in the function shift (#1409)

Signed-off-by: remzi <13716567376yh@gmail.com>
3 months agoDirectly write to MutableBuffer in substring (#1423)
Liang-Chi Hsieh [Fri, 11 Mar 2022 02:37:06 +0000 (18:37 -0800)] 
Directly write to MutableBuffer in substring (#1423)

3 months agoImplement projection for arrow file / streams (#1339)
Daniël Heres [Wed, 9 Mar 2022 17:40:53 +0000 (18:40 +0100)] 
Implement projection for arrow file / streams (#1339)

* Implement projection for arrow file / streams

* Tests

* Fix

* Fix

* Add test

* Add test

* Add link

* Undo change to existing test

* Update arrow/src/ipc/reader.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* Use project

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
3 months agoAdd dictionary support for C data interface (#1407)
Chao Sun [Wed, 9 Mar 2022 07:27:43 +0000 (23:27 -0800)] 
Add dictionary support for C data interface (#1407)

* initial commit

* add integration tests for python

* address comments

3 months agofix (#1406)
Remzi Yang [Tue, 8 Mar 2022 15:28:20 +0000 (23:28 +0800)] 
fix (#1406)

Signed-off-by: remzi <13716567376yh@gmail.com>
3 months agoadd unit test to check all none (#1405)
jakevin [Mon, 7 Mar 2022 21:29:59 +0000 (05:29 +0800)] 
add unit test to check all none (#1405)

3 months agoImprove integration testing docs (#1403)
Andrew Lamb [Sun, 6 Mar 2022 18:41:11 +0000 (13:41 -0500)] 
Improve integration testing docs (#1403)

3 months agoMove csv Parser trait and its implementations to utils module (#1385)
Sumit [Sun, 6 Mar 2022 13:36:36 +0000 (14:36 +0100)] 
Move csv Parser trait and its implementations to utils module (#1385)

* move Parser trait to utils

this allows the parser capabilities to be reused by the json module

* implement parse_formatted for date32

* remove redundant checks

* Update arrow/src/util/mod.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* make Parser trait pub(crate) only and not pub

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
3 months agoIntroduce `ReadOptions` with builder API, for parquet filter row groups that satisfy...
Yijie Shen [Sun, 6 Mar 2022 11:44:54 +0000 (19:44 +0800)] 
Introduce `ReadOptions` with builder API, for parquet filter row groups that satisfy all filters, and enable filter row groups by range. (#1389)

* Filter row groups by comparing midpoint with offset range

* lint

* ReadOptions with builder API

* fix comments

* precise range doc

* tab to space
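
The midpoint rule from the first bullet of this commit can be sketched as follows. This is a minimal, std-only illustration and not the parquet crate's code: `RowGroupSpan`, `midpoint`, and `filter_by_range` are hypothetical names, and the assumption is that a row group is kept when the midpoint of its byte span falls inside the requested half-open offset range.

```rust
use std::ops::Range;

/// Hypothetical stand-in for a row group's position in the file:
/// where its bytes start and how many compressed bytes it occupies.
#[derive(Debug, Clone, PartialEq)]
pub struct RowGroupSpan {
    pub start: u64,
    pub compressed_size: u64,
}

impl RowGroupSpan {
    /// Byte offset halfway through the row group.
    pub fn midpoint(&self) -> u64 {
        self.start + self.compressed_size / 2
    }
}

/// Keep only the row groups whose midpoint lies in `range` (end exclusive).
/// Assumption: each row group is assigned to exactly one range this way,
/// so adjacent ranges never select the same row group twice.
pub fn filter_by_range(groups: &[RowGroupSpan], range: Range<u64>) -> Vec<RowGroupSpan> {
    groups
        .iter()
        .filter(|g| range.contains(&g.midpoint()))
        .cloned()
        .collect()
}
```

With three 100-byte row groups starting at offsets 0, 100, and 200, the midpoints are 50, 150, and 250, so the range `0..150` selects only the first group and `100..300` selects the last two.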

3 months agoAdd note in contributing guideline about types of contributions (#1396)
Andrew Lamb [Sun, 6 Mar 2022 11:41:58 +0000 (06:41 -0500)] 
Add note in contributing guideline about types of contributions (#1396)

* Add note in contributing guideline about types of contributions

* prettier

3 months agofix: Fix grpc schema hack in flight integration test (#1402)
Andrew Lamb [Sat, 5 Mar 2022 13:54:34 +0000 (08:54 -0500)] 
fix: Fix grpc schema hack in flight integration test (#1402)

3 months agoPrepare for the 10.0.0 release (#1395) 10.0.0
Andrew Lamb [Sat, 5 Mar 2022 11:38:10 +0000 (06:38 -0500)] 
Prepare for the 10.0.0 release (#1395)

* Update version to 10.0.0

* Initial 10.0.0 CHANGELOG

* Cleanup CHANGELOG

* Update for last change

3 months agoAdd extract month and day in temporal.rs (#1388)
Yang Jiang [Fri, 4 Mar 2022 11:42:22 +0000 (19:42 +0800)] 
Add extract month and day in temporal.rs (#1388)

* Add extract month in temporal.rs

* fix clippy

* implement day

* add ut

* fix clippy

3 months agoClarify release instructions about when to merge CHANGELOG update (#1370)
Andrew Lamb [Thu, 3 Mar 2022 18:26:06 +0000 (13:26 -0500)] 
Clarify release instructions about when to merge CHANGELOG update (#1370)

3 months agofeat: support maps in MutableArrayData (#1379)
Helgi Kristvin Sigurbjarnarson [Thu, 3 Mar 2022 18:15:46 +0000 (10:15 -0800)] 
feat: support maps in MutableArrayData (#1379)

Additionally, this allows the use of `filter` on record batches and
arrays containing maps.

3 months agoAdd write method to Json Writer (#1383)
Matthew Turner [Thu, 3 Mar 2022 17:49:18 +0000 (12:49 -0500)] 
Add write method to Json Writer (#1383)

* Add write method

* Add docs

3 months agoSpeed up the function `min_max_string` (#1374)
Remzi Yang [Thu, 3 Mar 2022 17:47:35 +0000 (01:47 +0800)] 
Speed up the function `min_max_string` (#1374)

* clean up the code

Signed-off-by: remzi <13716567376yh@gmail.com>
* bring back the optimization when null count is zero

Signed-off-by: remzi <13716567376yh@gmail.com>
* prettify the trait bound and update comment

Signed-off-by: remzi <13716567376yh@gmail.com>
* use value_unchecked to replace array.value
10% extra speed up

Signed-off-by: remzi <13716567376yh@gmail.com>
* update the performance data

Signed-off-by: remzi <13716567376yh@gmail.com>
3 months agoAllow primitive array creation from iterators of PrimitiveTypes (as well as `Option...
Liang-Chi Hsieh [Thu, 3 Mar 2022 14:04:43 +0000 (06:04 -0800)] 
Allow primitive array creation from iterators of PrimitiveTypes (as well as `Option`) (#1367)

* More idiomatic primitive array creation

* Use From instead for clippy

* Rename to NativeAdapter and add document
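
The adapter idea named in this commit can be illustrated without the arrow crate. The sketch below is an assumption about the general pattern, not the crate's `NativeAdapter`: `Slot` and `collect_slots` are hypothetical names, and the element type is fixed to `i32` to keep the example small. A wrapper implements `From` for both the plain value and its `Option`, so one collecting function accepts either kind of iterator.

```rust
/// Hypothetical wrapper: one possibly-null `i32` element.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Slot(pub Option<i32>);

/// A plain value is always valid (non-null).
impl From<i32> for Slot {
    fn from(value: i32) -> Self {
        Slot(Some(value))
    }
}

/// An `Option` carries its own validity.
impl From<Option<i32>> for Slot {
    fn from(value: Option<i32>) -> Self {
        Slot(value)
    }
}

/// Collect from any iterator whose items convert into `Slot`,
/// so both `i32` and `Option<i32>` iterators are accepted.
pub fn collect_slots<I, P>(iter: I) -> Vec<Option<i32>>
where
    I: IntoIterator<Item = P>,
    P: Into<Slot>,
{
    iter.into_iter().map(|item| item.into().0).collect()
}
```

Calling `collect_slots(vec![1, 2, 3])` and `collect_slots(vec![Some(1), None])` both compile, which is the ergonomic gain the commit title describes.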

3 months agoImprove performance of dictionary kernels, add benchmark and add `take_iter_unchecked...
Liang-Chi Hsieh [Thu, 3 Mar 2022 11:34:00 +0000 (03:34 -0800)] 
Improve performance of dictionary kernels, add benchmark and add `take_iter_unchecked` (#1372)

* Add benchmark and take_iter_unchecked.

* Add Safety section for clippy

* Update arrow/src/compute/kernels/comparison.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
3 months agoSupport extract `week` in temporal.rs (#1376)
Yang Jiang [Wed, 2 Mar 2022 16:30:16 +0000 (00:30 +0800)] 
Support extract `week` in temporal.rs (#1376)

* Add extract week in temporal.rs

* add more test conditions

* add comments

3 months agoAdd Clone to IpcWriteOptions (#1382)
Matthew Turner [Wed, 2 Mar 2022 16:15:57 +0000 (11:15 -0500)] 
Add Clone to IpcWriteOptions (#1382)

3 months agoRefactor `RecordBatch::validate_new_batch` (#1361)
Remzi Yang [Wed, 2 Mar 2022 15:26:11 +0000 (23:26 +0800)] 
Refactor `RecordBatch::validate_new_batch` (#1361)

* refactor checking same row count

Signed-off-by: remzi <13716567376yh@gmail.com>
* refactor matching schema

Signed-off-by: remzi <13716567376yh@gmail.com>
* add more comments
simplify the iterator

Signed-off-by: remzi <13716567376yh@gmail.com>
3 months agorefactor (#1346)
Shani Solomon [Wed, 2 Mar 2022 15:25:25 +0000 (17:25 +0200)] 
refactor (#1346)

3 months agoImplement DictionaryArray support in neq_dyn, lt_dyn, lt_eq_dyn, gt_dyn, gt_eq_dyn...
Liang-Chi Hsieh [Tue, 1 Mar 2022 11:37:48 +0000 (03:37 -0800)] 
Implement DictionaryArray support in neq_dyn, lt_dyn, lt_eq_dyn, gt_dyn, gt_eq_dyn (#1326)

* Implement DictionaryArray support in neq_dyn, lt_dyn, lt_eq_dyn, gt_dyn, gt_eq_dyn

* Fix clippy

* Fix format

* Add test

* For review comment and suggestion

* Allow reasonable boolean comparisons

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
3 months agoUpdate pyo3 requirement from 0.15 to 0.16 (#1369)
dependabot[bot] [Tue, 1 Mar 2022 04:10:49 +0000 (12:10 +0800)] 
Update pyo3 requirement from 0.15 to 0.16 (#1369)

3 months agosupport as_decimal_array api (#1356)
Kun Liu [Mon, 28 Feb 2022 21:02:06 +0000 (05:02 +0800)] 
support as_decimal_array api (#1356)

3 months agoUse DictionaryArray's iterator (#1330)
Liang-Chi Hsieh [Mon, 28 Feb 2022 20:55:21 +0000 (12:55 -0800)] 
Use DictionaryArray's iterator (#1330)

3 months agoUpdate contributing guide (#1368)
Remzi Yang [Mon, 28 Feb 2022 19:17:53 +0000 (03:17 +0800)] 
Update contributing guide (#1368)

* add build environment

Signed-off-by: remzi <13716567376yh@gmail.com>
* update the format

Signed-off-by: remzi <13716567376yh@gmail.com>
3 months agoUpdate flatbuffers requirement from =2.1.0 to =2.1.1 (#1364)
dependabot[bot] [Mon, 28 Feb 2022 19:17:32 +0000 (14:17 -0500)] 
Update flatbuffers requirement from =2.1.0 to =2.1.1 (#1364)

Updates the requirements on [flatbuffers](https://github.com/google/flatbuffers) to permit the latest version.
- [Release notes](https://github.com/google/flatbuffers/releases)
- [Commits](https://github.com/google/flatbuffers/commits)

---
updated-dependencies:
- dependency-name: flatbuffers
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
3 months agoUpdate flatbuffers requirement from =2.0.0 to =2.1.0 (#1359)
dependabot[bot] [Fri, 25 Feb 2022 12:20:52 +0000 (20:20 +0800)] 
Update flatbuffers requirement from =2.0.0 to =2.1.0 (#1359)

Updates the requirements on [flatbuffers](https://github.com/google/flatbuffers) to permit the latest version.
- [Release notes](https://github.com/google/flatbuffers/releases)
- [Commits](https://github.com/google/flatbuffers/commits)

---
updated-dependencies:
- dependency-name: flatbuffers
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
3 months agoFix clippy lints (#1363)
Remzi Yang [Fri, 25 Feb 2022 12:18:13 +0000 (20:18 +0800)] 
Fix clippy lints (#1363)

* fix some

Signed-off-by: remzi <13716567376yh@gmail.com>
* fix some warnings

Signed-off-by: remzi <13716567376yh@gmail.com>
* fix some warning

Signed-off-by: remzi <13716567376yh@gmail.com>
* fix all clippy lints

Signed-off-by: remzi <13716567376yh@gmail.com>
4 months agoRemove delimiter from csv Writer (#1342)
Sergey Glushchenko [Thu, 24 Feb 2022 07:50:29 +0000 (08:50 +0100)] 
Remove delimiter from csv Writer (#1342)

4 months agoPublicly export arrow::array::MapBuilder (#1355)
tjwilson90 [Thu, 24 Feb 2022 07:48:33 +0000 (23:48 -0800)] 
Publicly export arrow::array::MapBuilder (#1355)

4 months agoMake bounds configurable in csv ReaderBuilder (#1341)
Sergey Glushchenko [Thu, 24 Feb 2022 07:42:54 +0000 (08:42 +0100)] 
Make bounds configurable in csv ReaderBuilder (#1341)

4 months agoRefactor `StructArray::from` (#1360)
Remzi Yang [Thu, 24 Feb 2022 07:38:49 +0000 (15:38 +0800)] 
Refactor `StructArray::from` (#1360)

* add async to default features

Signed-off-by: remzi <13716567376yh@gmail.com>
* rewrite

Signed-off-by: remzi <13716567376yh@gmail.com>
* update

Signed-off-by: remzi <13716567376yh@gmail.com>
4 months agoAdd with_datetime_format to csv WriterBuilder (#1347)
Sergey Glushchenko [Wed, 23 Feb 2022 19:15:28 +0000 (20:15 +0100)] 
Add with_datetime_format to csv WriterBuilder (#1347)

4 months agofix: add LargeUtf8 support in json writer (#1358)
Tiphaine Ruy [Wed, 23 Feb 2022 18:59:22 +0000 (19:59 +0100)] 
fix: add LargeUtf8 support in json writer (#1358)

4 months agoRemove redundant has_ methods for optional column metadata fields (#1345)
Shani Solomon [Tue, 22 Feb 2022 17:03:53 +0000 (19:03 +0200)] 
Remove redundant has_ methods for optional column metadata fields (#1345)

4 months agoArrow Rust + Conbench Integration (#1289)
diana [Tue, 22 Feb 2022 09:20:54 +0000 (02:20 -0700)] 
Arrow Rust + Conbench Integration (#1289)

* Arrow Rust + Conbench Integration

* remove --src-dir

4 months agoRefactor Bitmap::new (#1343)
Remzi Yang [Mon, 21 Feb 2022 05:59:05 +0000 (13:59 +0800)] 
Refactor Bitmap::new (#1343)

### Which issue does this PR close?

Closes #1337.

### Rationale for this change

`bit_util` already provides functions to compute a ceiling division and to round up to a multiple, so we can use them in `Bitmap::new` to make the code faster and cleaner.

### What changes are included in this PR?

### Are there any user-facing changes?

None.
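
The two kinds of helper mentioned in the rationale can be sketched with the standard library alone. These are illustrative re-implementations, not the `bit_util` source: the function names are hypothetical, and the 64-byte rounding in `bitmap_byte_len` is an assumption about how a bitmap buffer might be sized.

```rust
/// Ceiling division: the smallest `n` such that `n * divisor >= value`.
/// Written without `value + divisor - 1` so it cannot overflow.
pub fn ceil(value: usize, divisor: usize) -> usize {
    value / divisor + usize::from(value % divisor != 0)
}

/// Round `num` up to the next multiple of 64.
/// Note: overflows for values within 63 of `usize::MAX`.
pub fn round_up_to_multiple_of_64(num: usize) -> usize {
    (num + 63) & !63
}

/// Bytes to allocate for a bitmap holding `num_bits` bits.
/// Assumption: one bit per slot, padded to a 64-byte boundary.
pub fn bitmap_byte_len(num_bits: usize) -> usize {
    round_up_to_multiple_of_64(ceil(num_bits, 8))
}
```

For example, 10 bits need `ceil(10, 8) = 2` bytes, which pads to 64, while 513 bits need 65 bytes, which pads to 128.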

4 months agoDon't use Arc::from_raw when importing ArrowArray and ArrowSchema (#1334)
Liang-Chi Hsieh [Sun, 20 Feb 2022 18:26:03 +0000 (10:26 -0800)] 
Don't use Arc::from_raw when importing ArrowArray and ArrowSchema (#1334)

4 months agoUpdate versions and CHANGELOG for 9.1.0 release (#1325) 9.1.0
Andrew Lamb [Sat, 19 Feb 2022 16:36:34 +0000 (11:36 -0500)] 
Update versions and CHANGELOG for 9.1.0 release (#1325)

* Update version to 10.0.0

* Update changelog generator script

* Initial Changelog

* iter

* one more

* Set version to 9.1.0

* Make it more clear 1282 was not fixed

* touchups

* Update changelog

Co-authored-by: Wakahisa <nevilledips@gmail.com>
4 months agoUpdate the document of function `MutableArrayData::extend` (#1336)
Remzi Yang [Sat, 19 Feb 2022 16:05:11 +0000 (00:05 +0800)] 
Update the document of function `MutableArrayData::extend` (#1336)

* update document

Signed-off-by: remzi <13716567376yh@gmail.com>
* correct the fmt

Signed-off-by: remzi <13716567376yh@gmail.com>
4 months agoClean up DictionaryArray construction in test (#1314)
Andrew Lamb [Sat, 19 Feb 2022 16:02:28 +0000 (11:02 -0500)] 
Clean up DictionaryArray construction in test (#1314)

4 months agoCleanup: remove some dead / test only code (#1331)
Andrew Lamb [Sat, 19 Feb 2022 15:57:52 +0000 (10:57 -0500)] 
Cleanup: remove some dead / test only code (#1331)

4 months agoExpose page encoding `ColumnChunkMetadata` (#1322)
Shani Solomon [Thu, 17 Feb 2022 18:32:17 +0000 (20:32 +0200)] 
Expose page encoding `ColumnChunkMetadata` (#1322)

* init

* replaced test file

* init

* thrift conversion

* refactor

* tests

* clippy

4 months agoEnable dead_code lint (#1324)
Sergey Glushchenko [Thu, 17 Feb 2022 12:28:26 +0000 (13:28 +0100)] 
Enable dead_code lint (#1324)

4 months agoExpose column index and offset index (#1318)
Shani Solomon [Wed, 16 Feb 2022 16:58:08 +0000 (18:58 +0200)] 
Expose column index and offset index (#1318)

# Which issue does this PR close?
Closes #1317.

Expose the offsets and lengths of the column index and offset index so that Parquet engines can optimize their reads.

4 months agoEnable more lints (#1315)
Sergey Glushchenko [Wed, 16 Feb 2022 13:41:38 +0000 (14:41 +0100)] 
Enable more lints (#1315)

4 months agofix test bug and ensure that bloom filter metadata is serialized in `to_thrift` ...
Shani Solomon [Wed, 16 Feb 2022 12:02:38 +0000 (14:02 +0200)] 
fix test bug and ensure that bloom filter metadata is serialized in `to_thrift` (#1320)

* fix test bug and cc metadata to_thrift

* fmt

4 months agoImplement an iterator for DictionaryArray (#1296)
Liang-Chi Hsieh [Wed, 16 Feb 2022 11:57:21 +0000 (03:57 -0800)] 
Implement an iterator for DictionaryArray (#1296)

* Add DictionaryIter

* Try suggested approach.

* do check

* Add to boolean, string and binary arrays

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
4 months agoUse new DecimalArray creation API in arrow crate (#1249)
Andrew Lamb [Tue, 15 Feb 2022 20:24:12 +0000 (15:24 -0500)] 
Use new DecimalArray creation API in arrow crate (#1249)

* Use new API in ffi.rs

* Use new API in sort.rs

* Use new API in pretty.rs

* Use new API in array_binary.rs

* Use new API in equal_json.rs

* Use new API in take.rs

* Use new API in cast.rs

* Use new API in equal.rs

* clippy

4 months agoAdd `DictionaryArray::try_new()` to create dictionaries from pre existing arrays...
Andrew Lamb [Tue, 15 Feb 2022 20:12:21 +0000 (15:12 -0500)] 
Add `DictionaryArray::try_new()` to create dictionaries from pre existing arrays (#1300)

* Add DictionaryArray::try_new()

* Update arrow/src/array/array_dictionary.rs

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
4 months agoExpose has bloom offset (#1309)
Shani Solomon [Tue, 15 Feb 2022 20:04:11 +0000 (22:04 +0200)] 
Expose has bloom offset (#1309)

* expose hasBloomFilters

* added test

* var name

* rename

* updated test to support new test file

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
4 months agoEnable clippy::type_complexity (#1310)
Sergey Glushchenko [Tue, 15 Feb 2022 18:38:54 +0000 (19:38 +0100)] 
Enable clippy::type_complexity (#1310)

4 months agoUpdate parquet-testing pin (#1311)
Andrew Lamb [Tue, 15 Feb 2022 18:15:25 +0000 (13:15 -0500)] 
Update parquet-testing pin (#1311)

4 months agoVectorized DeltaBitPackDecoder (#1281) (#1284)
Raphael Taylor-Davies [Tue, 15 Feb 2022 15:11:42 +0000 (15:11 +0000)] 
Vectorized DeltaBitPackDecoder (#1281) (#1284)

* Vectorized `DeltaBitPackDecoder` (#1281)

* Review feedback

4 months agoEnable clippy::float_equality_without_abs lint (#1305)
Sergey Glushchenko [Mon, 14 Feb 2022 11:44:46 +0000 (12:44 +0100)] 
Enable clippy::float_equality_without_abs lint (#1305)

4 months agoImplement DictionaryArray support in eq_dyn (#1263)
Liang-Chi Hsieh [Sun, 13 Feb 2022 13:35:40 +0000 (05:35 -0800)] 
Implement DictionaryArray support in eq_dyn (#1263)

* Implement DictionaryArray support in eq_dyn

* For review comment: make eq_dict as generic and rename to cmp_dict. Remove unsafeness.

* Other integer types

* Fix clippy error

* Fix format

* Fix clippy and format

* Add cmp_dict_utf8 and cmp_dict_binary to cover the utf8/binary value array cases

* Add binary test

* Add remaining types

* Add Float32 and Float64 and update a few comments.

4 months agoClean up DecimalArray creation in parquet crate (#1247)
Andrew Lamb [Sun, 13 Feb 2022 13:26:56 +0000 (08:26 -0500)] 
Clean up DecimalArray creation in parquet crate (#1247)

4 months agoChanges for 9.0.2 (#1291)
Andrew Lamb [Sun, 13 Feb 2022 12:25:54 +0000 (07:25 -0500)] 
Changes for 9.0.2 (#1291)

* Fix bitmask creation in chunked part of simd comparison (#1286)

* Update version to 9.0.1

* Update changelog

* Fix bitmask creation also for simd comparisons with scalar (#1290)

* Update versions and changelog for 9.0.2

Co-authored-by: Jörn Horstmann <git@jhorstmann.net>
4 months agoarrow: enable clippy::vec_init_then_push lint (#1303)
Sergey Glushchenko [Sun, 13 Feb 2022 08:59:49 +0000 (09:59 +0100)] 
arrow: enable clippy::vec_init_then_push lint (#1303)

4 months agoFix test_unaligned_bit_chunk_iterator (#1297)
Raphael Taylor-Davies [Thu, 10 Feb 2022 11:36:50 +0000 (11:36 +0000)] 
Fix test_unaligned_bit_chunk_iterator (#1297)

* Fix test_unaligned_bit_chunk_iterator

* Add more verification

* Format

4 months agofix failing csv_writer bench (#1293)
Andy Grove [Thu, 10 Feb 2022 07:58:08 +0000 (00:58 -0700)] 
fix failing csv_writer bench (#1293)

4 months agoSpecialized filter kernels (#1248)
Raphael Taylor-Davies [Wed, 9 Feb 2022 21:03:15 +0000 (21:03 +0000)] 
Specialized filter kernels (#1248)

* Add specialized primitive filter kernels

* Filter context

* Optimize null buffer construction

* Clippy

* Benchmark filter construction

* Review feedback

* Specialized string filter

* Specialized dictionary filter kernel

* Use trusted_len_iter

* Review feedback

* Add fuzz filter test

* Clarify selective vs selectivity confusion

* Revert change to MutableBuffer::from_trusted_len_iter_bool

* Fix filter_bits offset handling

* Review feedback

* Use i64 for chunk offset

* Only optimize filter when filtering multiple columns

* Test truncated filter

* Review feedback

* Add IterationStrategy::None

* Remove selective / selectivity docs confusion

4 months agoFix bitmask creation also for simd comparisons with scalar (#1290)
Jörn Horstmann [Wed, 9 Feb 2022 16:33:36 +0000 (17:33 +0100)] 
Fix bitmask creation also for simd comparisons with scalar (#1290)

4 months ago`DecimalArray` API ergonomics: add iter(), create from iter(), change precision ...
Andrew Lamb [Tue, 8 Feb 2022 20:10:14 +0000 (15:10 -0500)] 
`DecimalArray` API ergonomics: add iter(), create from iter(), change precision / scale (#1223)

* DecimalArray: create from iter, iter(), docs

* Add with_precision and scale

* Implement iter() and into_iter() for DecimalArray

* Clean up and tests

* Return Result rather than panic

* Refactor error handling into separate function

* Validate data in `with_precision_and_scale`

* Use named constant values

* clippy
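
The "return `Result` rather than panic" and "validate data" bullets describe a pattern that can be shown in isolation. The sketch below is a hedged illustration, not the `DecimalArray` implementation: `validate_precision` is a hypothetical helper, values are assumed to be stored as unscaled `i128` integers, and the error type is a plain `String`.

```rust
/// Check that every unscaled value fits in `precision` decimal digits.
/// Returns an error describing the first offending value instead of panicking.
pub fn validate_precision(values: &[i128], precision: u32) -> Result<(), String> {
    // 10^precision is the first magnitude that no longer fits;
    // `checked_pow` yields `None` when that power overflows `i128`.
    let limit = 10_i128
        .checked_pow(precision)
        .ok_or_else(|| format!("precision {} is too large for i128", precision))?;

    for (index, value) in values.iter().enumerate() {
        // `limit` is positive, so the cast to `u128` is lossless.
        if value.unsigned_abs() >= limit as u128 {
            return Err(format!(
                "value {} at index {} does not fit in precision {}",
                value, index, precision
            ));
        }
    }
    Ok(())
}
```

Under these assumptions, `validate_precision(&[123, -999], 3)` succeeds and `validate_precision(&[1000], 3)` returns an error, leaving the caller to decide how to handle it.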

4 months agoSkip zero-ing primitive nulls (#1280)
Raphael Taylor-Davies [Tue, 8 Feb 2022 19:50:48 +0000 (19:50 +0000)] 
Skip zero-ing primitive nulls (#1280)

4 months agoRun rustdoc in CI and error if warnings (#1266)
Andrew Lamb [Tue, 8 Feb 2022 12:04:36 +0000 (07:04 -0500)] 
Run rustdoc in CI and error if warnings (#1266)

* Run rustdoc in CI and error if warnings

* Update .github/workflows/rust.yml

* use nightly for doc check

* install rustfmt

* attempt to install pythondev

* try2

* try3

4 months agoFix some clippy lints in parquet crate, rename `LevelEncoder` variants to conform...
Remzi Yang [Tue, 8 Feb 2022 12:04:17 +0000 (20:04 +0800)] 
Fix some clippy lints in parquet crate, rename `LevelEncoder` variants to conform to Rust standards (#1273)

* disallow vec_init_then_push

Signed-off-by: remzi <13716567376yh@gmail.com>
* disallow upper case acronyms

Signed-off-by: remzi <13716567376yh@gmail.com>
* disallow transmute ptr to ptr

Signed-off-by: remzi <13716567376yh@gmail.com>
* disallow same item push

Signed-off-by: remzi <13716567376yh@gmail.com>
* disallow approx constant

Signed-off-by: remzi <13716567376yh@gmail.com>
* disallow cast ptr alignment

Signed-off-by: remzi <13716567376yh@gmail.com>
* disallow float cmp

Signed-off-by: remzi <13716567376yh@gmail.com>
* check float equality with abs

Signed-off-by: remzi <13716567376yh@gmail.com>
* check incomplete features

Signed-off-by: remzi <13716567376yh@gmail.com>
* check single char names

Signed-off-by: remzi <13716567376yh@gmail.com>
* check needless range loop

Signed-off-by: remzi <13716567376yh@gmail.com>
* fix more vec_init_then_push lint, especially in macro

Signed-off-by: remzi <13716567376yh@gmail.com>
* fix more float equality without abs

Signed-off-by: remzi <13716567376yh@gmail.com>
* fix more vec push same items

Signed-off-by: remzi <13716567376yh@gmail.com>
* fix more needless range loop

Signed-off-by: remzi <13716567376yh@gmail.com>
4 months agoFix bitmask creation in chunked part of simd comparison (#1286)
Jörn Horstmann [Mon, 7 Feb 2022 17:04:42 +0000 (18:04 +0100)] 
Fix bitmask creation in chunked part of simd comparison (#1286)

4 months agoRestrict Decoder to compatible types (#1276) (#1277)
Raphael Taylor-Davies [Mon, 7 Feb 2022 15:03:30 +0000 (15:03 +0000)] 
Restrict Decoder to compatible types (#1276) (#1277)

4 months agoFix doc warnings (#1268)
Andrew Lamb [Mon, 7 Feb 2022 14:58:19 +0000 (09:58 -0500)] 
Fix doc warnings (#1268)

4 months agoMake rle decoder public (#1271)
Ze'ev Maor [Sun, 6 Feb 2022 01:44:01 +0000 (03:44 +0200)] 
Make rle decoder public (#1271)

4 months agoPrepare for 9.0.0 release: Update version + CHANGELOG (#1265) 9.0.0
Andrew Lamb [Fri, 4 Feb 2022 12:22:22 +0000 (07:22 -0500)] 
Prepare for 9.0.0 release: Update version + CHANGELOG (#1265)

* Update version to 9.0.0

* Update changelog generator script

* Initial changelog

* Updates

* Update again

* rat

4 months agoupgrade clap (#1261)
Jiayu Liu [Thu, 3 Feb 2022 07:39:24 +0000 (15:39 +0800)] 
upgrade clap (#1261)

4 months agoRemove unsupported flag in rustfmt.toml (#1262)
Andrew Lamb [Wed, 2 Feb 2022 20:37:59 +0000 (15:37 -0500)] 
Remove unsupported flag in rustfmt.toml (#1262)

4 months agoImprove module documentation for parquet crate (#1253)
Andrew Lamb [Wed, 2 Feb 2022 20:37:42 +0000 (15:37 -0500)] 
Improve module documentation for parquet crate (#1253)

4 months agoFaster bitmask iteration (#1228)
Raphael Taylor-Davies [Wed, 2 Feb 2022 20:37:10 +0000 (20:37 +0000)] 
Faster bitmask iteration (#1228)

* Add UnalignedBitChunks (#1227)

* Clippy

* Fix flaky test

* Improve test legibility

* Fix SlicesIterator offset direction

* Format

* Fix byte-aligned termination

* Test edge-cases

* More tests

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* Review feedback

* Make UnalignedBitChunkIterator crate local

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
4 months agoAdd `async` arrow parquet reader (#1154)
Raphael Taylor-Davies [Wed, 2 Feb 2022 11:31:21 +0000 (11:31 +0000)] 
Add `async` arrow parquet reader (#1154)

* Async parquet reader (#111)

Add Sync + Send bounds to parquet crate

* Remove Sync from DataType

* Review feedback

* Add basic test

* Fix lints

* Review feedback

* Tweak CI

4 months agoRefresh readme / contributing guide (#1252)
Andrew Lamb [Wed, 2 Feb 2022 11:16:05 +0000 (06:16 -0500)] 
Refresh readme / contributing guide (#1252)

4 months agoUpdate chrono-tz requirement from 0.4 to 0.6 (#1259)
dependabot[bot] [Tue, 1 Feb 2022 21:58:26 +0000 (16:58 -0500)] 
Update chrono-tz requirement from 0.4 to 0.6 (#1259)

Updates the requirements on [chrono-tz](https://github.com/chronotope/chrono-tz) to permit the latest version.
- [Release notes](https://github.com/chronotope/chrono-tz/releases)
- [Changelog](https://github.com/chronotope/chrono-tz/blob/main/CHANGELOG.md)
- [Commits](https://github.com/chronotope/chrono-tz/commits/v0.6.1)

---
updated-dependencies:
- dependency-name: chrono-tz
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
4 months agoUpdate zstd requirement from 0.9 to 0.10 (#1257)
dependabot[bot] [Tue, 1 Feb 2022 21:58:13 +0000 (16:58 -0500)] 
Update zstd requirement from 0.9 to 0.10 (#1257)

Updates the requirements on [zstd](https://github.com/gyscos/zstd-rs) to permit the latest version.
- [Release notes](https://github.com/gyscos/zstd-rs/releases)
- [Commits](https://github.com/gyscos/zstd-rs/compare/0.9.0...v0.10.0)

---
updated-dependencies:
- dependency-name: zstd
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>