Andrew Lamb [Mon, 7 Feb 2022 17:08:08 +0000 (12:08 -0500)]
Update changelog
Andrew Lamb [Mon, 7 Feb 2022 17:06:36 +0000 (12:06 -0500)]
Update version to 9.0.1
Jörn Horstmann [Mon, 7 Feb 2022 17:04:42 +0000 (18:04 +0100)]
Fix bitmask creation in chunked part of simd comparison (#1286)
Andrew Lamb [Fri, 4 Feb 2022 12:22:22 +0000 (07:22 -0500)]
Prepare for 9.0.0 release: Update version + CHANGELOG (#1265)
* Update version to 9.0.0
* Update changelog generator script
* Initial changelog
* Updates
* Update again
* rat
Jiayu Liu [Thu, 3 Feb 2022 07:39:24 +0000 (15:39 +0800)]
upgrade clap (#1261)
Andrew Lamb [Wed, 2 Feb 2022 20:37:59 +0000 (15:37 -0500)]
Remove unsupported flag in rustfmt.toml (#1262)
Andrew Lamb [Wed, 2 Feb 2022 20:37:42 +0000 (15:37 -0500)]
Improve module documentation for parquet crate (#1253)
Raphael Taylor-Davies [Wed, 2 Feb 2022 20:37:10 +0000 (20:37 +0000)]
Faster bitmask iteration (#1228)
* Add UnalignedBitChunks (#1227)
* Clippy
* Fix flaky test
* Improve test legibility
* Fix SlicesIterator offset direction
* Format
* Fix byte-aligned termination
* Test edge-cases
* More tests
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* Review feedback
* Make UnalignedBitChunkIterator crate local
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Raphael Taylor-Davies [Wed, 2 Feb 2022 11:31:21 +0000 (11:31 +0000)]
Add `async` arrow parquet reader (#1154)
* Async parquet reader (#111)
Add Sync + Send bounds to parquet crate
* Remove Sync from DataType
* Review feedback
* Add basic test
* Fix lints
* Review feedback
* Tweak CI
Andrew Lamb [Wed, 2 Feb 2022 11:16:05 +0000 (06:16 -0500)]
Refresh readme / contributing guide (#1252)
dependabot[bot] [Tue, 1 Feb 2022 21:58:26 +0000 (16:58 -0500)]
Update chrono-tz requirement from 0.4 to 0.6 (#1259)
Updates the requirements on [chrono-tz](https://github.com/chronotope/chrono-tz) to permit the latest version.
- [Release notes](https://github.com/chronotope/chrono-tz/releases)
- [Changelog](https://github.com/chronotope/chrono-tz/blob/main/CHANGELOG.md)
- [Commits](https://github.com/chronotope/chrono-tz/commits/v0.6.1)
---
updated-dependencies:
- dependency-name: chrono-tz
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
dependabot[bot] [Tue, 1 Feb 2022 21:58:13 +0000 (16:58 -0500)]
Update zstd requirement from 0.9 to 0.10 (#1257)
Updates the requirements on [zstd](https://github.com/gyscos/zstd-rs) to permit the latest version.
- [Release notes](https://github.com/gyscos/zstd-rs/releases)
- [Commits](https://github.com/gyscos/zstd-rs/compare/0.9.0...v0.10.0)
---
updated-dependencies:
- dependency-name: zstd
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Andrew Lamb [Tue, 1 Feb 2022 17:48:16 +0000 (12:48 -0500)]
Add dependabot (#1256)
Raphael Taylor-Davies [Tue, 1 Feb 2022 14:28:15 +0000 (14:28 +0000)]
Batch multiple records in ArrowWriter (#1214)
* Batch multiple records in ArrowWriter
* Document max_group_size and reduce default (#1213)
* Review feedback
* Write multiple arrays without concat
* Clippy
* Test aggregating complex types
* Test complex slice
* Clippy
Andrew Lamb [Tue, 1 Feb 2022 14:02:49 +0000 (09:02 -0500)]
Revert "Pin WASM / packed SIMD tests to nightly-2022-01-17 (#1204)" (#1251)
This reverts commit d68c4ae14077d60326eb57fe28133645f800d7e5.
Remzi Yang [Mon, 31 Jan 2022 21:58:58 +0000 (05:58 +0800)]
Add doc examples for dynamic comparison functions (#1250)
* add examples
Signed-off-by: remzi <13716567376yh@gmail.com>
* correct rust format
Signed-off-by: remzi <13716567376yh@gmail.com>
Raphael Taylor-Davies [Sat, 29 Jan 2022 15:53:10 +0000 (15:53 +0000)]
Fix NullArrayReader (#1245) (#1246)
Raphael Taylor-Davies [Fri, 28 Jan 2022 19:20:04 +0000 (19:20 +0000)]
Revert making parquet::data_type and parquet::arrow::schema experimental (#1244)
Remzi Yang [Fri, 28 Jan 2022 11:43:07 +0000 (19:43 +0800)]
rename to Bitmap::bit_len (#1242)
Signed-off-by: remzi <13716567376yh@gmail.com>
Remzi Yang [Thu, 27 Jan 2022 20:54:21 +0000 (04:54 +0800)]
Add Rust Docs examples for UnionArray (#1241)
Signed-off-by: remzi <13716567376yh@gmail.com>
Matthew Turner [Thu, 27 Jan 2022 17:05:33 +0000 (12:05 -0500)]
Remove Copy trait from dyn scalar kernels (#1243)
Remzi Yang [Wed, 26 Jan 2022 12:18:25 +0000 (20:18 +0800)]
dyn compare for binary array (#1238)
* dyn compare two binary array
Signed-off-by: remzi <13716567376yh@gmail.com>
* add dyn comparison for binary array
Signed-off-by: remzi <13716567376yh@gmail.com>
* add tests for dyn compare binary array and scalar
Signed-off-by: remzi <13716567376yh@gmail.com>
* remove DictionaryArray from dyn compare because there is no easy way to build a binary dictionary array
Signed-off-by: remzi <13716567376yh@gmail.com>
* fix mistakes in test code
Signed-off-by: remzi <13716567376yh@gmail.com>
* add Nones into the test cases
Signed-off-by: remzi <13716567376yh@gmail.com>
* add non utf8 scalar
Signed-off-by: remzi <13716567376yh@gmail.com>
* correct the code format
Signed-off-by: remzi <13716567376yh@gmail.com>
Andrew Lamb [Tue, 25 Jan 2022 11:35:26 +0000 (06:35 -0500)]
Improve documentation for Bitmap (#1237)
Raphael Taylor-Davies [Mon, 24 Jan 2022 21:31:23 +0000 (21:31 +0000)]
Fix null bitmap length validation (#1231) (#1232)
* Fix null bitmap length validation (#1231)
* Reuse existing test
Jörn Horstmann [Mon, 24 Jan 2022 19:50:30 +0000 (20:50 +0100)]
Remove explicit simd arithmetic kernels except for division/modulo (#1221)
* Extend arithmetic benchmarks
* Remove explicit simd arithmetic except for div/mod because autovectorization generates better code
* Remove unneeded return keywords
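To illustrate the kind of code this change relies on, here is a minimal sketch (not the arrow kernel itself): an element-wise loop with no explicit SIMD, which the compiler is free to autovectorize. The function name `add_wrapping` is hypothetical and does not appear in the commit.

```rust
/// Element-wise wrapping addition written as a plain iterator chain.
/// No explicit SIMD is used; vectorization is left to the compiler.
/// (Hypothetical helper for illustration, not part of arrow.)
fn add_wrapping(a: &[i32], b: &[i32]) -> Vec<i32> {
    assert_eq!(a.len(), b.len(), "inputs must have equal length");
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| x.wrapping_add(*y))
        .collect()
}
```

Whether such a loop actually vectorizes depends on the target and optimization level, which is why the commit extends the benchmarks alongside the removal.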
Jörn Horstmann [Mon, 24 Jan 2022 19:28:15 +0000 (20:28 +0100)]
Remove memory-check feature (#1222)
Yijie Shen [Mon, 24 Jan 2022 19:27:18 +0000 (03:27 +0800)]
[Minor] Re-export `array::builder::make_builder` to make it available for downstream (#1235)
* Re-export `array::builder::make_builder`
* Update mod.rs
Yijie Shen [Mon, 24 Jan 2022 19:23:40 +0000 (03:23 +0800)]
[Minor] `into_inner` for IPC `FileWriter` (#1236)
* `into_inner` for IPC `FileWriter`
* lint
Raphael Taylor-Davies [Mon, 24 Jan 2022 19:22:43 +0000 (19:22 +0000)]
Do not concatenate identical dictionaries (#1219)
* Do not concatenate identical dictionaries (#504)
* Review feedback
Raphael Taylor-Davies [Mon, 24 Jan 2022 15:07:26 +0000 (15:07 +0000)]
Remove arrow array reader (#1197) (#1234)
Raphael Taylor-Davies [Mon, 24 Jan 2022 12:00:44 +0000 (12:00 +0000)]
Preserve dictionary encoding when decoding parquet into Arrow arrays, 60x perf improvement (#171) (#1180)
* Preserve dictionary encoding from parquet (#171)
* Use OffsetBuffer::into_array for dictionary
* Fix and test handling of empty dictionaries
Don't panic if missing dictionary page
* Use ArrayRef instead of Arc<ArrayData>
* Update doc comments
* Add integration test
Tweak RecordReader buffering logic
* Add benchmark
* Set write batch size in parquet fuzz tests
Fix bug in column writer with small page sizes
* Fix test_dictionary_preservation
* Add batch_size comment
Remzi Yang [Sun, 23 Jan 2022 10:11:39 +0000 (18:11 +0800)]
Add non utf8 values into the test cases of BinaryArray comparison (#1220)
* add eq_dyn for BinaryArray
Signed-off-by: remzi <13716567376yh@gmail.com>
* correct the code formatting
Signed-off-by: remzi <13716567376yh@gmail.com>
* add comparison support for fully qualified binary array
delete dyn comparison which will be added in successive PRs
Signed-off-by: remzi <13716567376yh@gmail.com>
* add tests for comparison of fully qualified BinaryArray
Signed-off-by: remzi <13716567376yh@gmail.com>
* add 2 missed tests
Signed-off-by: remzi <13716567376yh@gmail.com>
* move 2 functions
Signed-off-by: remzi <13716567376yh@gmail.com>
* fix reference error
Signed-off-by: remzi <13716567376yh@gmail.com>
* add non utf8 bytes to the test cases of BinaryArray comparison
Signed-off-by: remzi <13716567376yh@gmail.com>
Patrick More [Sat, 22 Jan 2022 20:18:10 +0000 (12:18 -0800)]
Update DECIMAL_RE to allow scientific notation in auto inferred schemas (#1216)
* Update DECIMAL_RE to allow scientific notation in auto inferred schemas
* Fixed format lint
Andrew Lamb [Fri, 21 Jan 2022 12:06:57 +0000 (07:06 -0500)]
Prepare for 8.0.0 release: Update CHANGELOG and versions (#1212)
* Update version to 8.0.0
* Update Changelog for 8.0.0
* restore RAT
* Remove items that were released in 7.0.0
Andrew Lamb [Fri, 21 Jan 2022 11:58:45 +0000 (06:58 -0500)]
Improve changelog generator script settings (#1210)
* Update changelog script
* fix typo
Yang [Wed, 19 Jan 2022 21:28:52 +0000 (05:28 +0800)]
Return error from JSON writer rather than panic (#1205)
* Return error from JSON writer rather than panic
* fix comment
Helgi Kristvin Sigurbjarnarson [Wed, 19 Jan 2022 21:28:31 +0000 (13:28 -0800)]
fix a bug in variable sized equality (#1209)
A missing validity buffer was being treated as all values being null,
rather than all values being valid, causing equality to fail on some
equivalent string and binary arrays.
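The intended semantics can be sketched in std-only Rust as follows. This is an illustration under stated assumptions, not the arrow implementation: `is_valid` and `columns_equal` are hypothetical helpers, and the validity bitmap is assumed to be least-significant-bit first within each byte.

```rust
/// Returns whether slot `i` is valid (non-null).
/// A missing bitmap means every slot is valid, not every slot null.
/// (Hypothetical helper; bits are assumed LSB-first within each byte.)
fn is_valid(validity: Option<&[u8]>, i: usize) -> bool {
    match validity {
        None => true,
        Some(bits) => bits[i / 8] & (1 << (i % 8)) != 0,
    }
}

/// Compares two string columns slot by slot, honoring nulls.
/// Two null slots are equal; a null and a valid slot are not.
fn columns_equal(
    lhs: &[&str],
    lhs_validity: Option<&[u8]>,
    rhs: &[&str],
    rhs_validity: Option<&[u8]>,
) -> bool {
    if lhs.len() != rhs.len() {
        return false;
    }
    (0..lhs.len()).all(|i| {
        match (is_valid(lhs_validity, i), is_valid(rhs_validity, i)) {
            (true, true) => lhs[i] == rhs[i],
            (false, false) => true,
            _ => false,
        }
    })
}
```

With this reading, a column without a validity buffer compares equal to the same values carrying an all-ones bitmap, which is the case the fix addresses.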
Andrew Lamb [Wed, 19 Jan 2022 18:17:48 +0000 (13:17 -0500)]
Update parquet crate readme (#1192)
* Update parquet crate readme
* prettier
Remzi Yang [Wed, 19 Jan 2022 18:06:47 +0000 (02:06 +0800)]
Add comparison support for fully qualified BinaryArray (#1195)
* add eq_dyn for BinaryArray
Signed-off-by: remzi <13716567376yh@gmail.com>
* correct the code formatting
Signed-off-by: remzi <13716567376yh@gmail.com>
* add comparison support for fully qualified binary array
delete dyn comparison which will be added in successive PRs
Signed-off-by: remzi <13716567376yh@gmail.com>
* add tests for comparison of fully qualified BinaryArray
Signed-off-by: remzi <13716567376yh@gmail.com>
* add 2 missed tests
Signed-off-by: remzi <13716567376yh@gmail.com>
* move 2 functions
Signed-off-by: remzi <13716567376yh@gmail.com>
* fix reference error
Signed-off-by: remzi <13716567376yh@gmail.com>
Edd Robinson [Wed, 19 Jan 2022 12:19:29 +0000 (12:19 +0000)]
feat: add support for casting Duration/Interval to Int64Array (#1196)
* feat: add support for casting Duration to Int64Array
* feat: cast from Interval to Int64
Andrew Lamb [Wed, 19 Jan 2022 02:35:43 +0000 (21:35 -0500)]
Pin WASM / packed SIMD tests to nightly-2022-01-17 (#1204)
Helgi Kristvin Sigurbjarnarson [Tue, 18 Jan 2022 21:59:27 +0000 (13:59 -0800)]
bugfix in display of float16 array (#1194)
Due to a typo the float16 array was being cast to a float32 array,
causing a crash when pretty printing a record batch containing float16.
Raphael Taylor-Davies [Tue, 18 Jan 2022 12:13:21 +0000 (12:13 +0000)]
parquet: Optimized ByteArrayReader, Add UTF-8 Validation (#1040) (#1082)
* Optimized ByteArrayReader (#1040)
UTF-8 Validation (#786)
* Fix arrow_array_reader benchmark
* Allow running subset of arrow_array_reader benchmarks
* Faster UTF-8 validation
* Tweak null handling
* Add license
* Refine `ValuesBuffer::pad_nulls`
* Tweak error handling
* Use page null count if available
* Doc comments
* Test DELTA_BYTE_ARRAY encoding
* Support legacy Encoding::PLAIN_DICTIONARY
* Add OffsetBuffer unit tests
Review feedback
* More tests
* Fix lint
* Review feedback
Helgi Kristvin Sigurbjarnarson [Tue, 18 Jan 2022 12:09:12 +0000 (04:09 -0800)]
feat(parquet): support for reading structs nested within lists (#1187)
* feat(parquet): support for reading structs nested within lists
* fix: logical conflict
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Jiayu Liu [Tue, 18 Jan 2022 01:52:01 +0000 (09:52 +0800)]
update nightly version for miri (#1189)
Raphael Taylor-Davies [Mon, 17 Jan 2022 15:51:21 +0000 (15:51 +0000)]
Truncate bitmask on split (#1183)
* Truncate bitmask on split
* Fix BooleanBufferBuilder::resize
* Format
Helgi Kristvin Sigurbjarnarson [Mon, 17 Jan 2022 15:49:14 +0000 (07:49 -0800)]
fix: Fix a bug in how filter indices are calculated (#1185)
* fix: Fix a bug in how filter indices are calculated
Using the definition level and the nullability of the column only
produces the correct indices if max_definition - 1 is the list level.
For deeper nesting (struct in a list) this produces incorrect indices,
silently causing incorrect data to be written.
This fix uses the array offsets to compute the indices instead.
* add assertions
Kun Liu [Mon, 17 Jan 2022 15:42:45 +0000 (23:42 +0800)]
Support DecimalType in sort and take kernels (#1172)
Jiayu Liu [Mon, 17 Jan 2022 15:32:56 +0000 (23:32 +0800)]
add from_iter_values for binary array (#1188)
Raphael Taylor-Davies [Sun, 16 Jan 2022 12:08:35 +0000 (12:08 +0000)]
Use tempfile for parquet tests (#1165)
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Raphael Taylor-Davies [Sat, 15 Jan 2022 18:49:09 +0000 (18:49 +0000)]
Serialize i128 as JSON string (#1175)
Andrew Lamb [Sat, 15 Jan 2022 18:48:15 +0000 (13:48 -0500)]
Add ticket reference for false positive (#1181)
Raphael Taylor-Davies [Sat, 15 Jan 2022 11:34:44 +0000 (11:34 +0000)]
Fix record formatting in 1.58 (#1178)
Helgi Kristvin Sigurbjarnarson [Fri, 14 Jan 2022 18:09:51 +0000 (10:09 -0800)]
Bugfix in parquet writing empty lists of structs (#1166)
Fix a bug in the definition level calculation for fields nested within a
struct and a list. When a list is empty or null in parquet the nested
field gets a null value. However, in arrow, the value is simply missing.
When serializing an immediate child of the list, the list offsets are
used to calculate the correct definition level for its children, but it
is not carried further to fields nested deeper (e.g., fields on a struct
within a list). This (somewhat hacky) fix treats a struct within a list
as if it were a list.
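The mismatch between the two representations can be sketched for the simplest case, a nullable list of required elements with a maximum definition level of 2. This is a simplified illustration of standard Parquet definition levels, not the code changed by this commit; `list_definition_levels` is a hypothetical name.

```rust
/// Definition levels for a nullable list column with required elements
/// (maximum definition level 2):
///   0 = the list itself is null
///   1 = the list is present but empty
///   2 = an element is present
/// `offsets` are Arrow-style list offsets; `list_valid[i]` says whether
/// list `i` is non-null. (Hypothetical helper for illustration.)
fn list_definition_levels(offsets: &[i32], list_valid: &[bool]) -> Vec<i16> {
    let mut levels = Vec::new();
    for (i, window) in offsets.windows(2).enumerate() {
        let len = (window[1] - window[0]) as usize;
        if !list_valid[i] {
            // A null list still occupies one slot in Parquet.
            levels.push(0);
        } else if len == 0 {
            // So does an empty list, although the Arrow child has no value for it.
            levels.push(1);
        } else {
            levels.extend(std::iter::repeat(2).take(len));
        }
    }
    levels
}
```

For offsets `[0, 2, 2, 2, 3]` the Arrow child array holds three values, yet five levels are produced. Any field nested deeper than the list needs the same offset information to emit the extra slots, which is what the commit message describes.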
Jörn Horstmann [Fri, 14 Jan 2022 16:17:44 +0000 (17:17 +0100)]
Fix compilation error with simd feature (#1169)
Andrew Lamb [Fri, 14 Jan 2022 16:17:36 +0000 (11:17 -0500)]
Fix new clippy lints introduced in Rust 1.58 (#1170)
Jörn Horstmann [Thu, 13 Jan 2022 18:27:52 +0000 (19:27 +0100)]
Simplify and reduce code duplication in arithmetic kernels (#1161)
* Simplify and reduce code duplication in arithmetic kernels
* Update comments
Andrew Lamb [Thu, 13 Jan 2022 18:14:39 +0000 (13:14 -0500)]
Update dev/release/README for master releases, remove supporting scripts (#1143)
* Update dev/release/README for master releases
* remove cherry pick script
Raphael Taylor-Davies [Thu, 13 Jan 2022 15:18:55 +0000 (15:18 +0000)]
Improve parquet reading performance for columns with nulls by preserving bitmask when possible (#1037) (#1054)
* Preserve bitmask (#1037)
* Remove now unnecessary box (#1061)
* Fix handling of empty bitmasks
* More docs
* Add nested nullability test case
* Add packed decoder test
Andrew Lamb [Thu, 13 Jan 2022 06:33:24 +0000 (01:33 -0500)]
Remove left over readme file from arrow/arrow-rs split (#1162)
Raphael Taylor-Davies [Wed, 12 Jan 2022 14:44:07 +0000 (14:44 +0000)]
Fuzz test different parquet encodings (#1156)
Liang-Chi Hsieh [Wed, 12 Jan 2022 12:22:42 +0000 (04:22 -0800)]
Add subtract_scalar kernel (#1152)
* Add subtract_scalar
* Rebase
Liang-Chi Hsieh [Wed, 12 Jan 2022 12:21:37 +0000 (04:21 -0800)]
Add multiply_scalar (#1159)
Helgi Kristvin Sigurbjarnarson [Tue, 11 Jan 2022 19:19:38 +0000 (11:19 -0800)]
feat(json): support for map arrays in json writer (#1149)
Liang-Chi Hsieh [Tue, 11 Jan 2022 19:18:52 +0000 (11:18 -0800)]
Add add_scalar kernel (#1151)
* Add add_scalar
* move simd_float_unary_math_op to simd_unary_math_op
Andrew Lamb [Tue, 11 Jan 2022 19:10:01 +0000 (14:10 -0500)]
Document safety justification of some uses of `from_trusted_len_iter` (#1148)
Liang-Chi Hsieh [Tue, 11 Jan 2022 19:08:52 +0000 (11:08 -0800)]
Move simd right out of for_each loop (#1150)
Raphael Taylor-Davies [Tue, 11 Jan 2022 18:01:15 +0000 (18:01 +0000)]
Generify ColumnReaderImpl and RecordReader (#1040) (#1041)
* Simplify record reader
* Generify ColumnReaderImpl and RecordReader (#1040)
* Tweak count_records predicate
* Pre-allocate bitmask
* fix: TypedBuffer::split update len
* Simplify GenericRecordReader
* Move column decoders into module
* Remove `RecordBuffer::create` method
* Remove `TypedBuffer<i16>::count_records`
* Pass null count to `ColumnValueDecoder::read`
* Pull null padding out of column reader
* Review feedback
* Format
* License headers
* Further doc tweaks
* Further docs
* Restrict ScalarBuffer types
Andrew Lamb [Tue, 11 Jan 2022 17:58:02 +0000 (12:58 -0500)]
Remove `GenericStringArray::from_vec` and `GenericStringArray::from_opt_vec` (#1147)
Raphael Taylor-Davies [Tue, 11 Jan 2022 15:12:30 +0000 (15:12 +0000)]
BooleanBufferBuilder::append_packed (#1038) (#1039)
* BooleanBufferBuilder::append_packed (#1038)
* Update docstring
* Add packed_append_range
* Fix capacity
* Use set_bits from transform::util
* Add license
* Format
Raphael Taylor-Davies [Tue, 11 Jan 2022 14:05:03 +0000 (14:05 +0000)]
Improve parquet performance: Skip levels computation for required struct arrays in parquet (#1035)
* Skip levels computation for required struct arrays (#1034)
* Review feedback
Raphael Taylor-Davies [Tue, 11 Jan 2022 14:02:30 +0000 (14:02 +0000)]
Restrict RecordReader and friends to scalar types (#1132) (#1155)
Raphael Taylor-Davies [Tue, 11 Jan 2022 14:00:50 +0000 (14:00 +0000)]
Extend parquet fuzz tests to also test nulls, dictionaries and row groups with multiple pages (#1053) (#1110)
* Parquet fuzz tests (#1053)
* Test multiple WriterVersions
* Revert array_reader change
Raphael Taylor-Davies [Mon, 10 Jan 2022 21:50:31 +0000 (21:50 +0000)]
Move more parquet functionality behind experimental feature flag (#1032) (#1134)
* Move more parquet functionality behind experimental feature flag (#1032)
* Fix logical conflicts
Jörn Horstmann [Mon, 10 Jan 2022 21:49:46 +0000 (22:49 +0100)]
Implement SIMD comparison operations for types with less than 4 lanes (i128) (#1146)
* Implement simd mask creation for 128 bit types
* Adjust comparison kernels to always append 64 bit chunks
* Only append minimal number of bytes
* Add benchmark for MonthDayNano comparison
* Fix typo in comment
Co-authored-by: Paddy Horan <5733408+paddyhoran@users.noreply.github.com>
* Fix typo in comment
Co-authored-by: Paddy Horan <5733408+paddyhoran@users.noreply.github.com>
Co-authored-by: Paddy Horan <5733408+paddyhoran@users.noreply.github.com>
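The idea of emitting comparison results as 64-bit mask chunks can be sketched with a scalar loop. This is an illustration only: the commit works on SIMD lanes, while the hypothetical `eq_bitmask` below compares one element at a time and simply packs one result bit per element.

```rust
/// Compares two i128 slices for equality and packs the results into
/// 64-bit chunks, one bit per element, least-significant bit first.
/// The final chunk is zero-padded if the length is not a multiple of 64.
/// (Scalar sketch with a hypothetical name; the real kernel uses SIMD.)
fn eq_bitmask(a: &[i128], b: &[i128]) -> Vec<u64> {
    assert_eq!(a.len(), b.len(), "inputs must have equal length");
    a.chunks(64)
        .zip(b.chunks(64))
        .map(|(ca, cb)| {
            let mut mask = 0u64;
            for (i, (x, y)) in ca.iter().zip(cb.iter()).enumerate() {
                if x == y {
                    mask |= 1u64 << i;
                }
            }
            mask
        })
        .collect()
}
```

Working in fixed 64-bit chunks keeps the output layout independent of how many lanes a given type has per SIMD register, which matters for 128-bit types that have fewer than 4 lanes.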
Andrew Lamb [Mon, 10 Jan 2022 21:47:06 +0000 (16:47 -0500)]
Fix undefined behavior in GenericStringArray::from_iter_values (#1145)
* Fix undefined behavior in GenericStringArray::from_iter_values
* Cleanup code and tests
* clippy
* Fix test
Andrew Lamb [Sat, 8 Jan 2022 10:28:32 +0000 (05:28 -0500)]
Update readme to clarify versioning (#1142)
Andrew Lamb [Sat, 8 Jan 2022 10:19:08 +0000 (05:19 -0500)]
Update version to 7.0.0 and update CHANGELOG (#1141)
* Update changelog generator
* Bring changelog from 6.5.0
* Update changelog
* Update version to 7.0.0
Helgi Kristvin Sigurbjarnarson [Thu, 6 Jan 2022 22:46:16 +0000 (14:46 -0800)]
feat(ipc): support for reading union arrays through IPC (#1140)
Raphael Taylor-Davies [Thu, 6 Jan 2022 22:22:35 +0000 (22:22 +0000)]
Dyn comparison of interval arrays (#1106) (#1107)
* Dyn comparison of interval arrays (#1106)
* fix fmt
* Skip test when simd is enabled
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Helgi Kristvin Sigurbjarnarson [Thu, 6 Jan 2022 22:12:40 +0000 (14:12 -0800)]
feat: union schema serialization/deserialization for ipc (#1135)
Andrew Lamb [Thu, 6 Jan 2022 22:12:23 +0000 (17:12 -0500)]
*_dyn_scalar kernels: Support Float32Array and Float64Array (#1127)
* *_dyn_scalar kernels: Support Float32Array and Float64Array, use ToPrimitive rather than `Into<i128>`, and take `&dyn Array` rather than `ArrayRef`
* Update APIs for *_dyn_bool_scalar kernels
Benson Muite [Thu, 6 Jan 2022 22:11:01 +0000 (01:11 +0300)]
Add more information on SIMD (#1138)
Matthew Turner [Wed, 5 Jan 2022 21:53:32 +0000 (16:53 -0500)]
Add dyn boolean kernels (#1131)
* Add dyn bool kernels
* Add tests
* Update error messages
* Update test
* Fix test
* Update doc strings
Yordan Pavlov [Wed, 5 Jan 2022 19:57:19 +0000 (19:57 +0000)]
Fix reading of dictionary encoded pages with null values (#1111) (#1130)
* fix reading of dictionary encoded pages with null values
* fix linting issues
Raphael Taylor-Davies [Wed, 5 Jan 2022 16:27:27 +0000 (16:27 +0000)]
Make arrow::array_reader private (#1032) (#1133)
Andrew Lamb [Wed, 5 Jan 2022 13:45:27 +0000 (08:45 -0500)]
Implement Array for ArrayRef, Improve as_* kernels to take `&dyn Array` (#1129)
* Implement Array for ArrayRef
* Improve as_* kernels to take &dyn Array
* remove unneeded pyarrow binding
Andrew Lamb [Wed, 5 Jan 2022 12:29:25 +0000 (07:29 -0500)]
Add Schema::with_metadata and Field::with_metadata (#1092)
Sumit [Sun, 2 Jan 2022 16:42:43 +0000 (17:42 +0100)]
allow using custom datetime format for inference and parsing csv file (#1112)
* allow using custom datetime format for inference and parsing csv file
The patch extends the current implementation to allow passing a custom
datetime_re and datetime_format to the ReaderBuilder.
datetime_re is used to infer the schema of the csv, and then datetime_format is
used to parse the actual string to a Date64.
Of course, passing incompatible datetime_re and datetime_format values
will cause parsing or inference to fail. This is an expected but
hard-to-detect failure.
* Incorporate some clippy recommendations to limit the number of call arguments
The patch adds a new struct to collect all these options together and
then passes the struct around. Ideally the struct could be embedded into
the reader, but that can be done as a separate exercise.
* Detect presence of timezone in format while parsing csv for date64
The patch chooses between NaiveDateTime and DateTime from the chrono library
based on the presence of timezone components in the format.
chrono expects a timezone to be present if DateTime is used and returns an error
otherwise, whereas NaiveDateTime ignores the timezone even if it is explicitly
provided.
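One way to detect timezone components in a strftime-style format string is sketched below in std-only Rust. This is an assumption about how such a check could look, not the code from the patch: `format_has_timezone` is a hypothetical name, and the set of specifiers treated as timezone-bearing (`%z`, `%:z`, `%#z`, `%Z`, `%+`) is an illustrative choice.

```rust
/// Returns true if a strftime-style format string contains a timezone
/// specifier. `%%` is treated as an escaped literal percent sign.
/// (Hypothetical helper; the specifier set is an assumption.)
fn format_has_timezone(format: &str) -> bool {
    let mut chars = format.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '%' {
            continue;
        }
        // Skip optional modifier characters such as the ':' in "%:z".
        while let Some(&m) = chars.peek() {
            if matches!(m, ':' | '#' | '-' | '_' | '0') {
                chars.next();
            } else {
                break;
            }
        }
        // The character after the modifiers names the specifier.
        if let Some('z') | Some('Z') | Some('+') = chars.next() {
            return true;
        }
    }
    false
}
```

A caller could then parse into a timezone-aware type when this returns true and into a naive one otherwise, mirroring the choice described in the commit message.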
Andrew Lamb [Sun, 2 Jan 2022 14:37:45 +0000 (09:37 -0500)]
Update Union Array to add `UnionMode`, match latest Arrow Spec, and rename `new` -> `unsafe new_unchecked()` (#885)
* Update union array to new null handling
* Update arrow/src/array/array_union.rs
* correct comment
Matthew Turner [Sun, 2 Jan 2022 14:24:50 +0000 (09:24 -0500)]
Add kernel and tests (#1125)
Matthew Turner [Sun, 2 Jan 2022 14:24:18 +0000 (09:24 -0500)]
Add kernel and tests (#1123)
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Matthew Turner [Sun, 2 Jan 2022 13:45:08 +0000 (08:45 -0500)]
Add kernel and tests (#1122)
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Matthew Turner [Sun, 2 Jan 2022 13:02:22 +0000 (08:02 -0500)]
Add neq dyn scalar kernel (#1118)
* Add lt_dyn_scalar and tests
* Add lt_eq_dyn_scalar kernel
* Add gt_dyn_scalar kernel
* Add gt_eq_dyn_scalar kernel
* Add neq_dyn_scalar kernel
* Add kernel to err message
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Matthew Turner [Sun, 2 Jan 2022 12:08:24 +0000 (07:08 -0500)]
Add gt eq dyn scalar kernel (#1117)
* Add lt_dyn_scalar and tests
* Add lt_eq_dyn_scalar kernel
* Add gt_dyn_scalar kernel
* Add gt_eq_dyn_scalar kernel
* Add kernel to err message
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Matthew Turner [Sun, 2 Jan 2022 11:47:28 +0000 (06:47 -0500)]
Add gt dyn scalar kernel (#1116)
* Add gt_dyn_scalar kernel
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Matthew Turner [Sun, 2 Jan 2022 11:33:51 +0000 (06:33 -0500)]
Add lt eq dyn scalar kernel (#1115)
* Add lt_dyn_scalar and tests
* Add lt_eq_dyn_scalar kernel
* Add kernel to error message
* fix merge problem
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Matthew Turner [Sun, 2 Jan 2022 11:25:00 +0000 (06:25 -0500)]
Add kernel and tests (#1121)
Matthew Turner [Sun, 2 Jan 2022 11:11:00 +0000 (06:11 -0500)]
Add kernel and tests (#1124)
Matthew Turner [Sun, 2 Jan 2022 11:08:36 +0000 (06:08 -0500)]
Add lt dyn scalar kernel (#1114)
* Add lt_dyn_scalar and tests
* Add kernel to error message