arrow-rs.git
5 months agoPrepare for 8.0.0 release: Update CHANGELOG and versions (#1212) 8.0.0
Andrew Lamb [Fri, 21 Jan 2022 12:06:57 +0000 (07:06 -0500)] 
Prepare for 8.0.0 release: Update CHANGELOG and versions (#1212)

* Update version to 8.0.0

* Update Changelog for 8.0.0

* restore RAT

* Remove items that were released in 7.0.0

5 months agoImprove changelog generator script settings (#1210)
Andrew Lamb [Fri, 21 Jan 2022 11:58:45 +0000 (06:58 -0500)] 
Improve changelog generator script settings (#1210)

* Update changelog script

* fix typo

5 months agoReturn error from JSON writer rather than panic (#1205)
Yang [Wed, 19 Jan 2022 21:28:52 +0000 (05:28 +0800)] 
Return error from JSON writer rather than panic (#1205)

* Return error from JSON writer rather than panic

* fix comment

5 months agofix a bug in variable sized equality (#1209)
Helgi Kristvin Sigurbjarnarson [Wed, 19 Jan 2022 21:28:31 +0000 (13:28 -0800)] 
fix a bug in variable sized equality (#1209)

A missing validity buffer was being treated as all values being null,
rather than all values being valid, causing equality to fail on some
equivalent string and binary arrays.
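
The rule described above can be illustrated with a minimal, std-only sketch. This is not the arrow-rs implementation: `StrArray`, `is_valid` and `arrays_equal` are hypothetical names introduced only for illustration, and the validity buffer is modelled as an optional `Vec<bool>` rather than a packed bitmap.

```rust
/// Simplified stand-in for a string array (hypothetical type, not arrow's).
/// `validity == None` means no validity buffer is present.
struct StrArray {
    values: Vec<&'static str>,
    validity: Option<Vec<bool>>,
}

impl StrArray {
    /// A missing validity buffer is read as "every slot is valid",
    /// not as "every slot is null".
    fn is_valid(&self, i: usize) -> bool {
        self.validity.as_ref().map_or(true, |v| v[i])
    }
}

/// Two arrays are equal when they have the same length and, slot by slot,
/// are either both null or both valid with the same value.
fn arrays_equal(a: &StrArray, b: &StrArray) -> bool {
    a.values.len() == b.values.len()
        && (0..a.values.len()).all(|i| match (a.is_valid(i), b.is_valid(i)) {
            (true, true) => a.values[i] == b.values[i],
            (false, false) => true,
            _ => false,
        })
}
```

With this reading, an array without a validity buffer compares equal to an array holding the same values and an all-valid buffer, which is the case the commit message describes as previously failing.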

5 months agoUpdate parquet crate readme (#1192)
Andrew Lamb [Wed, 19 Jan 2022 18:17:48 +0000 (13:17 -0500)] 
Update parquet crate readme (#1192)

* Update parquet crate readme

* prettier

5 months agoAdd comparison support for fully qualified BinaryArray (#1195)
Remzi Yang [Wed, 19 Jan 2022 18:06:47 +0000 (02:06 +0800)] 
Add comparison support for fully qualified BinaryArray (#1195)

* add eq_dyn for BinaryArray

Signed-off-by: remzi <13716567376yh@gmail.com>
* correct the code formatting

Signed-off-by: remzi <13716567376yh@gmail.com>
* add comparison support for fully qualified binary array
delete dyn comparison which will be added in successive PRs

Signed-off-by: remzi <13716567376yh@gmail.com>
* add tests for comparison of fully qualified BinaryArray

Signed-off-by: remzi <13716567376yh@gmail.com>
* add 2 missed tests

Signed-off-by: remzi <13716567376yh@gmail.com>
* move 2 functions

Signed-off-by: remzi <13716567376yh@gmail.com>
* fix reference error

Signed-off-by: remzi <13716567376yh@gmail.com>
5 months agofeat: add support for casting Duration/Interval to Int64Array (#1196)
Edd Robinson [Wed, 19 Jan 2022 12:19:29 +0000 (12:19 +0000)] 
feat: add support for casting Duration/Interval to Int64Array (#1196)

* feat: add support for casting Duration to Int64Array

* feat: cast from Interval to Int64

5 months agoPin WASM / packed SIMD tests to nightly-2022-01-17 (#1204)
Andrew Lamb [Wed, 19 Jan 2022 02:35:43 +0000 (21:35 -0500)] 
Pin WASM / packed SIMD tests to nightly-2022-01-17 (#1204)

5 months agobugfix in display of float16 array (#1194)
Helgi Kristvin Sigurbjarnarson [Tue, 18 Jan 2022 21:59:27 +0000 (13:59 -0800)] 
bugfix in display of float16 array (#1194)

Due to a typo the float16 array was being cast to a float32 array,
causing a crash when pretty printing a record batch containing float16.

5 months ago parquet: Optimized ByteArrayReader, Add UTF-8 Validation (#1040) (#1082)
Raphael Taylor-Davies [Tue, 18 Jan 2022 12:13:21 +0000 (12:13 +0000)] 
 parquet: Optimized ByteArrayReader, Add UTF-8 Validation (#1040)  (#1082)

* Optimized ByteArrayReader (#1040)

UTF-8 Validation (#786)

* Fix arrow_array_reader benchmark

* Allow running subset of arrow_array_reader benchmarks

* Faster UTF-8 validation

* Tweak null handling

* Add license

* Refine `ValuesBuffer::pad_nulls`

* Tweak error handling

* Use page null count if available

* Doc comments

* Test DELTA_BYTE_ARRAY encoding

* Support legacy Encoding::PLAIN_DICTIONARY

* Add OffsetBuffer unit tests

Review feedback

* More tests

* Fix lint

* Review feedback

5 months agofeat(parquet): support for reading structs nested within lists (#1187)
Helgi Kristvin Sigurbjarnarson [Tue, 18 Jan 2022 12:09:12 +0000 (04:09 -0800)] 
feat(parquet): support for reading structs nested within lists (#1187)

* feat(parquet): support for reading structs nested within lists

* fix: logical conflict

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
5 months agoupdate nightly version for miri (#1189)
Jiayu Liu [Tue, 18 Jan 2022 01:52:01 +0000 (09:52 +0800)] 
update nightly version for miri (#1189)

5 months agoTruncate bitmask on split (#1183)
Raphael Taylor-Davies [Mon, 17 Jan 2022 15:51:21 +0000 (15:51 +0000)] 
Truncate bitmask on split (#1183)

* Truncate bitmask on split

* Fix BooleanBufferBuilder::resize

* Format

5 months agofix: Fix a bug in how filter indices are calculated (#1185)
Helgi Kristvin Sigurbjarnarson [Mon, 17 Jan 2022 15:49:14 +0000 (07:49 -0800)] 
fix: Fix a bug in how filter indices are calculated (#1185)

* fix: Fix a bug in how filter indices are calculated

Using the definition level and the nullability of the column only
produces the correct indices if max_definition - 1 is the list level.
For deeper nesting (struct in a list) this produces incorrect indices,
silently causing incorrect data to be written.

This fix uses the array offsets to compute the indices instead.

* add assertions
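
As a rough illustration of deriving indices from offsets, the sketch below lists the child slots that a list array actually references. It is an assumption-laden simplification, not the code from this commit: `child_indices_from_offsets` is a hypothetical helper, offsets are plain `i32` values, and list validity is an optional slice of booleans.

```rust
/// Returns the child-array indices referenced by a list array.
/// `offsets` has one more entry than there are lists; list `i` covers
/// child slots `offsets[i]..offsets[i + 1]`.
/// `validity` is `None` when every list is valid (hypothetical model).
fn child_indices_from_offsets(offsets: &[i32], validity: Option<&[bool]>) -> Vec<usize> {
    let mut indices = Vec::new();
    for (i, window) in offsets.windows(2).enumerate() {
        let is_valid = validity.map_or(true, |v| v[i]);
        if is_valid {
            // Empty lists have window[0] == window[1] and contribute nothing.
            indices.extend(window[0] as usize..window[1] as usize);
        }
    }
    indices
}
```

Because the result depends only on the offsets and the list validity, it does not change with how deeply the child is nested, which is the property the definition-level approach lacked for structs inside lists.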

5 months agoSupport DecimalType in sort and take kernels (#1172)
Kun Liu [Mon, 17 Jan 2022 15:42:45 +0000 (23:42 +0800)] 
Support DecimalType in sort and take kernels (#1172)

5 months agoadd from_iter_values for binary array (#1188)
Jiayu Liu [Mon, 17 Jan 2022 15:32:56 +0000 (23:32 +0800)] 
add from_iter_values for binary array (#1188)

5 months agoUse tempfile for parquet tests (#1165)
Raphael Taylor-Davies [Sun, 16 Jan 2022 12:08:35 +0000 (12:08 +0000)] 
Use tempfile for parquet tests (#1165)

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
5 months agoSerialize i128 as JSON string (#1175)
Raphael Taylor-Davies [Sat, 15 Jan 2022 18:49:09 +0000 (18:49 +0000)] 
Serialize i128 as JSON string (#1175)

5 months agoAdd ticket reference for false positive (#1181)
Andrew Lamb [Sat, 15 Jan 2022 18:48:15 +0000 (13:48 -0500)] 
Add ticket reference for false positive (#1181)

5 months agoFix record formatting in 1.58 (#1178)
Raphael Taylor-Davies [Sat, 15 Jan 2022 11:34:44 +0000 (11:34 +0000)] 
Fix record formatting in 1.58 (#1178)

5 months agoBugfix in parquet writing empty lists of structs (#1166)
Helgi Kristvin Sigurbjarnarson [Fri, 14 Jan 2022 18:09:51 +0000 (10:09 -0800)] 
Bugfix in parquet writing empty lists of structs (#1166)

Fix a bug in the definition level calculation for fields nested within a
struct and a list. When a list is empty or null in parquet the nested
field gets a null value. However, in arrow, the value is simply missing.
When serializing an immediate child of the list, the list offsets are
used to calculate the correct definition level for its children, but it
is not carried further to fields nested deeper (e.g., fields on a struct
within a list).  This (somewhat hacky) fix treats a struct within a list
as if it were a list.

5 months agoFix compilation error with simd feature (#1169)
Jörn Horstmann [Fri, 14 Jan 2022 16:17:44 +0000 (17:17 +0100)] 
Fix compilation error with simd feature (#1169)

5 months agoFix new clippy lints introduced in Rust 1.58 (#1170)
Andrew Lamb [Fri, 14 Jan 2022 16:17:36 +0000 (11:17 -0500)] 
Fix new clippy lints introduced in Rust 1.58 (#1170)

5 months agoSimplify and reduce code duplication in arithmetic kernels (#1161)
Jörn Horstmann [Thu, 13 Jan 2022 18:27:52 +0000 (19:27 +0100)] 
Simplify and reduce code duplication in arithmetic kernels (#1161)

* Simplify and reduce code duplication in arithmetic kernels

* Update comments

5 months agoUpdate dev/release/README for master releases, remove supporting scripts (#1143)
Andrew Lamb [Thu, 13 Jan 2022 18:14:39 +0000 (13:14 -0500)] 
Update dev/release/README for master releases, remove supporting scripts (#1143)

* Update dev/release/README for master releases

* remove cherry pick script

5 months agoImprove parquet reading performance for columns with nulls by preserving bitmask...
Raphael Taylor-Davies [Thu, 13 Jan 2022 15:18:55 +0000 (15:18 +0000)] 
Improve parquet reading performance for columns with nulls by preserving bitmask when possible (#1037) (#1054)

* Preserve bitmask (#1037)

* Remove now unnecessary box (#1061)

* Fix handling of empty bitmasks

* More docs

* Add nested nullability test case

* Add packed decoder test

5 months agoRemove left over readme file from arrow/arrow-rs split (#1162)
Andrew Lamb [Thu, 13 Jan 2022 06:33:24 +0000 (01:33 -0500)] 
Remove left over readme file from arrow/arrow-rs split (#1162)

5 months agoFuzz test different parquet encodings (#1156)
Raphael Taylor-Davies [Wed, 12 Jan 2022 14:44:07 +0000 (14:44 +0000)] 
Fuzz test different parquet encodings (#1156)

5 months agoAdd subtract_scalar kernel (#1152)
Liang-Chi Hsieh [Wed, 12 Jan 2022 12:22:42 +0000 (04:22 -0800)] 
Add subtract_scalar kernel (#1152)

* Add subtract_scalar

* Rebase

5 months agoAdd multiply_scalar (#1159)
Liang-Chi Hsieh [Wed, 12 Jan 2022 12:21:37 +0000 (04:21 -0800)] 
Add multiply_scalar (#1159)

5 months agofeat(json): support for map arrays in json writer (#1149)
Helgi Kristvin Sigurbjarnarson [Tue, 11 Jan 2022 19:19:38 +0000 (11:19 -0800)] 
feat(json): support for map arrays in json writer (#1149)

5 months agoAdd add_scalar kernel (#1151)
Liang-Chi Hsieh [Tue, 11 Jan 2022 19:18:52 +0000 (11:18 -0800)] 
Add add_scalar kernel (#1151)

* Add add_scalar

* move simd_float_unary_math_op to simd_unary_math_op

5 months agoDocument safety justification of some uses of `from_trusted_len_iter` (#1148)
Andrew Lamb [Tue, 11 Jan 2022 19:10:01 +0000 (14:10 -0500)] 
Document safety justification of some uses of `from_trusted_len_iter` (#1148)

5 months agoMove simd right out of for_each loop (#1150)
Liang-Chi Hsieh [Tue, 11 Jan 2022 19:08:52 +0000 (11:08 -0800)] 
Move simd right out of for_each loop (#1150)

5 months ago Generify ColumnReaderImpl and RecordReader (#1040) (#1041)
Raphael Taylor-Davies [Tue, 11 Jan 2022 18:01:15 +0000 (18:01 +0000)] 
 Generify ColumnReaderImpl and RecordReader (#1040)  (#1041)

* Simplify record reader

* Generify ColumnReaderImpl and RecordReader (#1040)

* Tweak count_records predicate

* Pre-allocate bitmask

* fix: TypedBuffer::split update len

* Simplify GenericRecordReader

* Move column decoders into module

* Remove `RecordBuffer::create` method

* Remove `TypedBuffer<i16>::count_records`

* Pass null count to `ColumnValueDecoder::read`

* Pull null padding out of column reader

* Review feedback

* Format

* License headers

* Further doc tweaks

* Further docs

* Restrict ScalarBuffer types

5 months agoRemove `GenericStringArray::from_vec` and `GenericStringArray::from_opt_vec` (#1147)
Andrew Lamb [Tue, 11 Jan 2022 17:58:02 +0000 (12:58 -0500)] 
Remove `GenericStringArray::from_vec` and `GenericStringArray::from_opt_vec` (#1147)

5 months agoBooleanBufferBuilder::append_packed (#1038) (#1039)
Raphael Taylor-Davies [Tue, 11 Jan 2022 15:12:30 +0000 (15:12 +0000)] 
BooleanBufferBuilder::append_packed (#1038) (#1039)

* BooleanBufferBuilder::append_packed (#1038)

* Update docstring

* Add packed_append_range

* Fix capacity

* Use set_bits from transform::util

* Add license

* Format

5 months agoImprove parquet performance: Skip levels computation for required struct arrays in...
Raphael Taylor-Davies [Tue, 11 Jan 2022 14:05:03 +0000 (14:05 +0000)] 
Improve parquet performance: Skip levels computation for required struct arrays in parquet (#1035)

* Skip levels computation for required struct arrays (#1034)

* Review feedback

5 months agoRestrict RecordReader and friends to scalar types (#1132) (#1155)
Raphael Taylor-Davies [Tue, 11 Jan 2022 14:02:30 +0000 (14:02 +0000)] 
Restrict RecordReader and friends to scalar types (#1132) (#1155)

5 months agoExtends parquet fuzz tests to also test nulls, dictionaries and row groups with...
Raphael Taylor-Davies [Tue, 11 Jan 2022 14:00:50 +0000 (14:00 +0000)] 
Extends parquet fuzz tests to also test nulls, dictionaries and row groups with multiple pages (#1053) (#1110)

* Parquet fuzz tests (#1053)

* Test multiple WriterVersions

* Revert array_reader change

5 months agoMove more parquet functionality behind experimental feature flag (#1032) (#1134)
Raphael Taylor-Davies [Mon, 10 Jan 2022 21:50:31 +0000 (21:50 +0000)] 
Move more parquet functionality behind experimental feature flag (#1032)  (#1134)

* Move more parquet functionality behind experimental feature flag (#1032)

* Fix logical conflicts

5 months agoImplement SIMD comparison operations for types with less than 4 lanes (i128) (#1146)
Jörn Horstmann [Mon, 10 Jan 2022 21:49:46 +0000 (22:49 +0100)] 
Implement SIMD comparison operations for types with less than 4 lanes (i128) (#1146)

* Implement simd mask creation for 128 bit types

* Adjust comparison kernels to always append 64 bit chunks

* Only append minimal number of bytes

* Add benchmark for MonthDayNano comparison

* Fix typo in comment

Co-authored-by: Paddy Horan <5733408+paddyhoran@users.noreply.github.com>
* Fix typo in comment

Co-authored-by: Paddy Horan <5733408+paddyhoran@users.noreply.github.com>
5 months agoFix undefined behavior in GenericStringArray::from_iter_values (#1145)
Andrew Lamb [Mon, 10 Jan 2022 21:47:06 +0000 (16:47 -0500)] 
Fix undefined behavior in GenericStringArray::from_iter_values (#1145)

* Fix undefined behavior in GenericStringArray::from_iter_values

* Cleanup code and tests

* clippy

* Fix test

5 months agoUpdate readme to clarify versioning (#1142) active_release 7.0.0
Andrew Lamb [Sat, 8 Jan 2022 10:28:32 +0000 (05:28 -0500)] 
Update readme to clarify versioning (#1142)

5 months agoUpdate version to 7.0.0 and update CHANGELOG (#1141)
Andrew Lamb [Sat, 8 Jan 2022 10:19:08 +0000 (05:19 -0500)] 
Update version to 7.0.0 and update CHANGELOG (#1141)

* Update changelog generator

* Bring changelog from 6.5.0

* Update changelog

* Update version to 7.0.0

5 months agofeat(ipc): support for reading union arrays through IPC (#1140)
Helgi Kristvin Sigurbjarnarson [Thu, 6 Jan 2022 22:46:16 +0000 (14:46 -0800)] 
feat(ipc): support for reading union arrays through IPC (#1140)

5 months agoDyn comparison of interval arrays (#1106) (#1107)
Raphael Taylor-Davies [Thu, 6 Jan 2022 22:22:35 +0000 (22:22 +0000)] 
Dyn comparison of interval arrays (#1106) (#1107)

* Dyn comparison of interval arrays (#1106)

* fix fmt

* Skip test when simd is enabled

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
5 months agofeat: union schema serialization/deserialization for ipc (#1135)
Helgi Kristvin Sigurbjarnarson [Thu, 6 Jan 2022 22:12:40 +0000 (14:12 -0800)] 
feat: union schema serialization/deserialization for ipc (#1135)

5 months ago*_dyn_scalar kernels: Support Float32Array and Float64Array (#1127)
Andrew Lamb [Thu, 6 Jan 2022 22:12:23 +0000 (17:12 -0500)] 
*_dyn_scalar kernels: Support Float32Array and Float64Array (#1127)

* *_dyn_scalar kernels: Support Float32Array and Float64Array, use ToPrimitive rather than `Into<i128>`, and take `&dyn Array` rather than `ArrayRef`

* Update APIs for *_dyn_bool_scalar kernels

5 months agoAdd more information on SIMD (#1138)
Benson Muite [Thu, 6 Jan 2022 22:11:01 +0000 (01:11 +0300)] 
Add more information on SIMD (#1138)

5 months agoAdd dyn boolean kernels (#1131)
Matthew Turner [Wed, 5 Jan 2022 21:53:32 +0000 (16:53 -0500)] 
Add dyn boolean kernels (#1131)

* Add dyn bool kernels

* Add tests

* Update error messages

* Update test

* Fix test

* Update doc strings

5 months agoFix reading of dictionary encoded pages with null values (#1111) (#1130)
Yordan Pavlov [Wed, 5 Jan 2022 19:57:19 +0000 (19:57 +0000)] 
Fix reading of dictionary encoded pages with null values (#1111) (#1130)

* fix reading of dictionary encoded pages with null values

* fix linting issues

5 months agoMake arrow::array_reader private (#1032) (#1133)
Raphael Taylor-Davies [Wed, 5 Jan 2022 16:27:27 +0000 (16:27 +0000)] 
Make arrow::array_reader private (#1032) (#1133)

5 months agoImplement Array for ArrayRef, Improve as_* kernels to take `&dyn Array` (#1129)
Andrew Lamb [Wed, 5 Jan 2022 13:45:27 +0000 (08:45 -0500)] 
Implement Array for ArrayRef, Improve as_* kernels to take `&dyn Array` (#1129)

* Implement Array for ArrayRef

* Improve as_* kernels to take &dyn Array

* remove unneeded pyarrow binding

5 months agoAdd Schema::with_metadata and Field::with_metadata (#1092)
Andrew Lamb [Wed, 5 Jan 2022 12:29:25 +0000 (07:29 -0500)] 
Add Schema::with_metadata and Field::with_metadata (#1092)

5 months agoallow using custom datetime format for inference and parsing csv file (#1112)
Sumit [Sun, 2 Jan 2022 16:42:43 +0000 (17:42 +0100)] 
allow using custom datetime format for inference and parsing csv file (#1112)

* allow using custom datetime format for inference and parsing csv file

The patch extends the current implementation to allow passing a custom
datetime_re and datetime_format to the ReaderBuilder.

datetime_re is used to infer the schema of the csv, and datetime_format is
then used to parse the actual string to a Date64.
Of course, passing incompatible datetime_re and datetime_format values
will cause parsing or inference to fail; this is an expected but
hard-to-detect failure.

* Incorporate some clippy recommendations for limit count of call args

The patch adds a new struct to collect all these options together and
then passes the struct around. Ideally the struct could be embedded into
the reader, but that can be done as a separate exercise.

* Detect presence of timezone in format while parsing csv for date64

The patch chooses between NaiveDateTime and DateTime from the chrono
library based on whether the format contains timezone components.

chrono expects a timezone to be present if DateTime is used and errors
otherwise, whereas NaiveDateTime ignores the timezone even if it is
explicitly provided.
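
One way to make that choice is to scan the format string for timezone specifiers. The sketch below is a std-only guess at the idea, not the code from this patch: `ParserKind` and `parser_for_format` are hypothetical names, and the list of specifiers (`%Z`, `%z`, `%:z`, `%#z`) is taken from chrono's strftime-style syntax and may not be exhaustive.

```rust
/// Which kind of chrono parser a format string calls for (hypothetical enum).
#[derive(Debug, PartialEq)]
enum ParserKind {
    /// No timezone component: parse as a naive date-time.
    Naive,
    /// Timezone component present: parse as a timezone-aware date-time.
    WithTimezone,
}

/// Picks a parser kind by looking for timezone specifiers in `format`.
/// Limitation of this sketch: an escaped percent such as "%%z" would be
/// misread as containing a timezone specifier.
fn parser_for_format(format: &str) -> ParserKind {
    const TZ_SPECIFIERS: [&str; 4] = ["%Z", "%z", "%:z", "%#z"];
    if TZ_SPECIFIERS.iter().any(|spec| format.contains(spec)) {
        ParserKind::WithTimezone
    } else {
        ParserKind::Naive
    }
}
```

A caller would then dispatch to the naive or the timezone-aware parsing routine depending on the returned variant.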

5 months agoUpdate Union Array to add `UnionMode`, match latest Arrow Spec, and rename `new...
Andrew Lamb [Sun, 2 Jan 2022 14:37:45 +0000 (09:37 -0500)] 
Update Union Array to add `UnionMode`,  match latest Arrow Spec, and rename `new` -> `unsafe new_unchecked()` (#885)

* Update union array to new null handling

* Update arrow/src/array/array_union.rs

* correct comment

5 months agoAdd kernel and tests (#1125)
Matthew Turner [Sun, 2 Jan 2022 14:24:50 +0000 (09:24 -0500)] 
Add kernel and tests (#1125)

5 months agoAdd kernel and tests (#1123)
Matthew Turner [Sun, 2 Jan 2022 14:24:18 +0000 (09:24 -0500)] 
Add kernel and tests (#1123)

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
5 months agoAdd kernel and tests (#1122)
Matthew Turner [Sun, 2 Jan 2022 13:45:08 +0000 (08:45 -0500)] 
Add kernel and tests (#1122)

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
5 months agoAdd neq dyn scalar kernel (#1118)
Matthew Turner [Sun, 2 Jan 2022 13:02:22 +0000 (08:02 -0500)] 
Add neq dyn scalar kernel (#1118)

* Add lt_dyn_scalar and tests

* Add lt_eq_dyn_scalar kernel

* Add gt_dyn_scalar kernel

* Add gt_eq_dyn_scalar kernel

* Add neq_dyn_scalar kernel

* Add kernel to err message

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
5 months agoAdd gt eq dyn scalar kernel (#1117)
Matthew Turner [Sun, 2 Jan 2022 12:08:24 +0000 (07:08 -0500)] 
Add gt eq dyn scalar kernel (#1117)

* Add lt_dyn_scalar and tests

* Add lt_eq_dyn_scalar kernel

* Add gt_dyn_scalar kernel

* Add gt_eq_dyn_scalar kernel

* Add kernel to err message

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
5 months agoAdd gt dyn scalar kernel (#1116)
Matthew Turner [Sun, 2 Jan 2022 11:47:28 +0000 (06:47 -0500)] 
Add gt dyn scalar kernel (#1116)

* Add gt_dyn_scalar kernel

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
5 months agoAdd lt eq dyn scalar kernel (#1115)
Matthew Turner [Sun, 2 Jan 2022 11:33:51 +0000 (06:33 -0500)] 
Add lt eq dyn scalar kernel (#1115)

* Add lt_dyn_scalar and tests

* Add lt_eq_dyn_scalar kernel

* Add kernel to error message

* fix merge problem

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
5 months agoAdd kernel and tests (#1121)
Matthew Turner [Sun, 2 Jan 2022 11:25:00 +0000 (06:25 -0500)] 
Add kernel and tests (#1121)

5 months agoAdd kernel and tests (#1124)
Matthew Turner [Sun, 2 Jan 2022 11:11:00 +0000 (06:11 -0500)] 
Add kernel and tests (#1124)

5 months agoAdd lt dyn scalar kernel (#1114)
Matthew Turner [Sun, 2 Jan 2022 11:08:36 +0000 (06:08 -0500)] 
Add lt dyn scalar kernel (#1114)

* Add lt_dyn_scalar and tests

* Add kernel to error message

5 months agofix bug: error type for BufferBuilder (#1104)
Kun Liu [Sun, 2 Jan 2022 11:06:40 +0000 (19:06 +0800)] 
fix bug: error type for BufferBuilder (#1104)

* fix bug: error type for BufferBuilder

* fix clippy

5 months agoDefine eq_dyn_scalar API (#1074)
Matthew Turner [Sat, 1 Jan 2022 12:06:08 +0000 (07:06 -0500)] 
Define eq_dyn_scalar API (#1074)

* Squash

* Cleanup error messages

5 months agoMutableArrayData support extend decimal data type (#1100)
Kun Liu [Wed, 29 Dec 2021 19:51:05 +0000 (03:51 +0800)] 
MutableArrayData support extend decimal data type (#1100)

* support extend decimal data type

* add more test

5 months agoPrint the 'FixedSizeBinaryArray' like a normal 'BinaryArray' (#1097)
Francis Le Roy [Wed, 29 Dec 2021 19:50:40 +0000 (20:50 +0100)] 
Print the 'FixedSizeBinaryArray' like a normal 'BinaryArray' (#1097)

* Print the 'FixedBinaryArray' like a normal 'BinaryArray'

* apply cargo fmt

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
5 months agoimplement eq_dyn, neq_dyn, lt_dyn, lt_eq_dyn, gt_dyn, gt_eq_dyn for timestamp types...
Liang-Chi Hsieh [Wed, 29 Dec 2021 13:20:27 +0000 (05:20 -0800)] 
implement eq_dyn, neq_dyn, lt_dyn, lt_eq_dyn, gt_dyn, gt_eq_dyn for timestamp types (#1095)

* implement eq_dyn, neq_dyn, lt_dyn, lt_eq_dyn, gt_dyn, gt_eq_dyn for timestamp types

* Simplify test code

5 months agoAllow proc-macro2 dependency to be flexible (#1102)
Andrew Lamb [Wed, 29 Dec 2021 12:01:22 +0000 (07:01 -0500)] 
Allow proc-macro2 dependency to be flexible (#1102)

6 months agosupport cast decimal to decimal (#1084)
Kun Liu [Thu, 23 Dec 2021 13:53:00 +0000 (21:53 +0800)] 
support cast decimal to decimal (#1084)

* support cast decimal to decimal

* add test case

* remove meaningless code

6 months agoFix like regex escaping (#1085)
Daniël Heres [Wed, 22 Dec 2021 17:54:30 +0000 (18:54 +0100)] 
Fix like regex escaping (#1085)

* Fix like regex escaping

* Fix like regex escaping

* Fix doctest

* Simplify

6 months agosupport cast decimal to signed numeric (#1073)
Kun Liu [Wed, 22 Dec 2021 16:43:44 +0000 (00:43 +0800)] 
support cast decimal to signed numeric (#1073)

* add cast test macro function; refactor other type to decimal type; add decimal to signed numeric type
support decimal to unsigned numeric

* address the comments and fix the clippy

6 months agoUpdate pyo3 to 0.15 (#1076)
dbr/Ben [Wed, 22 Dec 2021 16:36:06 +0000 (03:36 +1100)] 
Update pyo3 to 0.15 (#1076)

* Update pyo3 to 0.15

* Update pyo3 in integration tests also

6 months agoparquet: Use constant for RLE decoder buffer size (#1070)
Andrew Lamb [Tue, 21 Dec 2021 11:52:56 +0000 (06:52 -0500)] 
parquet: Use constant for RLE decoder buffer size (#1070)

6 months agoAdd Schema::project and RecordBatch::project functions (#1033)
Stephen Carman [Mon, 20 Dec 2021 16:48:43 +0000 (11:48 -0500)] 
Add Schema::project and RecordBatch::project functions  (#1033)

* Allow Schema and RecordBatch to project schemas on specific columns returning a new schema with those columns only

* Addressing PR updates and adding a test for out of range projection

* switch to &[usize]

* fix: clippy and fmt
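
Projection by column index, including the out-of-range case mentioned above, can be sketched without the arrow crate. The `Field` and `Schema` types below are simplified stand-ins, and the `String` error and its wording are assumptions made for illustration; the real functions may differ in signature and error type.

```rust
/// Simplified stand-in for a schema field (hypothetical, name only).
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
}

/// Simplified stand-in for a schema (hypothetical).
#[derive(Debug, Clone, PartialEq)]
struct Schema {
    fields: Vec<Field>,
}

impl Schema {
    /// Returns a new schema containing only the fields at `indices`,
    /// in the order given. An index past the last field yields an error
    /// instead of a panic.
    fn project(&self, indices: &[usize]) -> Result<Schema, String> {
        let fields = indices
            .iter()
            .map(|&i| {
                self.fields.get(i).cloned().ok_or_else(|| {
                    format!(
                        "project index {} out of bounds, schema has {} fields",
                        i,
                        self.fields.len()
                    )
                })
            })
            .collect::<Result<Vec<_>, _>>()?;
        Ok(Schema { fields })
    }
}
```

Taking `&[usize]` keeps the call site flexible: arrays, vectors and slices can all be passed by reference without allocation.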

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
6 months agoBox RleDecoder index buffer (#1061) (#1062)
Raphael Taylor-Davies [Mon, 20 Dec 2021 16:46:23 +0000 (16:46 +0000)] 
Box RleDecoder index buffer (#1061) (#1062)

* Box RleDecoder index buffer (#1061)

* Format

6 months agosupport cast signed numeric to decimal (#1044)
Kun Liu [Mon, 20 Dec 2021 16:36:53 +0000 (00:36 +0800)] 
support cast signed numeric to decimal (#1044)

* support cast signed numeric to decimal

* add test for i8, i16, i32, i64, f32, f64 cast to decimal

* change format of float64

* add none test; merge integer test together

6 months agofix(compute): LIKE escape parenthesis (#1042)
Dmitry Patsura [Mon, 20 Dec 2021 16:31:52 +0000 (19:31 +0300)] 
fix(compute): LIKE escape parenthesis (#1042)

Signed-off-by: Dmitry Patsura <talk@dmtry.me>
6 months agoAdd MONTH_DAY_NANO interval type, impl `ArrowNativeType` for `i128` (#779)
baishen [Mon, 20 Dec 2021 14:41:00 +0000 (22:41 +0800)] 
Add MONTH_DAY_NANO interval type, impl `ArrowNativeType` for `i128` (#779)

* support interval MonthDayNano

* fix

* fix

* fix

* fix test

* add IPC integration test

* fix rat

* update patch

* fix

* fmt

* fix

* fix

* fix

* fix

* fix

* fix

* remove integration-testing/unskip.patch

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
6 months agoBooleanBufferBuilder correct buffer length (#1051) (#1052)
Raphael Taylor-Davies [Mon, 20 Dec 2021 13:28:17 +0000 (13:28 +0000)] 
BooleanBufferBuilder correct buffer length (#1051) (#1052)

6 months agoAddress benchmarks that aren't compiling (#1001)
Carol (Nichols || Goulding) [Fri, 17 Dec 2021 19:23:33 +0000 (14:23 -0500)] 
Address benchmarks that aren't compiling (#1001)

* Add a CI job that checks benchmarks (but doesn't run them)

* The feature test_common must be turned on to build parquet benchmarks

* Align cache keys

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
6 months agoRemove outdated safety example from doc (#1050)
Andrew Lamb [Fri, 17 Dec 2021 15:38:07 +0000 (10:38 -0500)] 
Remove outdated safety example from doc (#1050)

6 months agoUse existing array type in `take` kernel (#1046)
Max Burke [Fri, 17 Dec 2021 11:20:49 +0000 (03:20 -0800)] 
Use existing array type in `take` kernel (#1046)

* Need to use type from data so that we do not lose, for example, timezone information

* add test for take preserving timezone

6 months agoAvoid allocating vector of indices in lexicographical_partition_ranges (#998)
Jörn Horstmann [Wed, 15 Dec 2021 19:56:45 +0000 (20:56 +0100)] 
Avoid allocating vector of indices in lexicographical_partition_ranges (#998)

* Avoid allocating vector of indices in lexicographical_partition_ranges

* Adjust comments

* Improve comments and remove one unneeded parameter

6 months agoMark `MutableBuffer::typed_data_mut` unsafe (#1029)
Andrew Lamb [Wed, 15 Dec 2021 19:56:16 +0000 (14:56 -0500)] 
Mark `MutableBuffer::typed_data_mut` unsafe (#1029)

* Mark `MutableBuffer::typed_data_mut` unsafe

* fmt

* Mark use of `typed_data_mut` as unsafe in simd kernels

6 months agoExtract method to drive PageIterator -> RecordReader (#1031)
Raphael Taylor-Davies [Tue, 14 Dec 2021 19:41:18 +0000 (19:41 +0000)] 
Extract method to drive PageIterator -> RecordReader (#1031)

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
6 months agoSimplify parquet arrow `RecordReader` (#1021)
Raphael Taylor-Davies [Mon, 13 Dec 2021 21:44:47 +0000 (21:44 +0000)] 
Simplify parquet arrow `RecordReader` (#1021)

6 months agoClarify governance of arrow crate (#1030)
Andrew Lamb [Sun, 12 Dec 2021 21:04:33 +0000 (16:04 -0500)] 
Clarify governance of arrow crate (#1030)

6 months agoForce new cargo and target caching to fix CI (#1023)
Andrew Lamb [Fri, 10 Dec 2021 15:11:45 +0000 (10:11 -0500)] 
Force new cargo and target caching to fix CI (#1023)

6 months agoFix: fixes a broken link and some missing styling in the main arrow crate docs (...
Adam Gutglick [Thu, 9 Dec 2021 18:18:49 +0000 (20:18 +0200)] 
Fix: fixes a broken link and some missing styling in the main arrow crate docs (#1013)

6 months agoRemove out of date comment (#1008)
Andrew Lamb [Mon, 6 Dec 2021 20:39:52 +0000 (15:39 -0500)] 
Remove out of date comment (#1008)

6 months agoMinimize features of indexmap and chrono (#1000)
Carol (Nichols || Goulding) [Sat, 4 Dec 2021 15:28:30 +0000 (10:28 -0500)] 
Minimize features of indexmap and chrono (#1000)

* Disable default features of chrono; only enable features needed

Chrono's default features contain "oldtime", which is deprecated.
According to [the docs](https://docs.rs/chrono/0.4.19/chrono/#duration),

> new code should disable the oldtime feature and use the
> chrono::Duration type instead. The oldtime feature is enabled by
> default for backwards compatibility, but future versions of Chrono
> are likely to remove the feature entirely.

so follow that recommendation by setting default-features to false. And
actually, only Arrow needs the "clock" feature, so all the other
features can stay off too to minimize the feature set that projects
depending on arrow or parquet are forced to enable.

* Explicitly enable indexmap's "std" feature

The indexmap crate uses the autocfg crate to do target detection to
determine whether `std` is available. Arrow isn't targeting `no_std`
environments, so the target detection isn't necessary. This might save
some build time.

https://github.com/bluss/indexmap/pull/145

6 months agoDocstrings for Timestamp*Array. (#988)
Navin [Sat, 4 Dec 2021 15:21:47 +0000 (02:21 +1100)] 
Docstrings for Timestamp*Array. (#988)

* Docstrings for TimestampSecondArray.

* fixup! Docstrings for TimestampSecondArray.

6 months agoUpdate rust version to 1.57 (#1003)
Carlos [Sat, 4 Dec 2021 15:13:19 +0000 (23:13 +0800)] 
Update rust version to 1.57 (#1003)

6 months agoAdd full data validation for ArrayData::try_new() (#921)
Andrew Lamb [Sat, 4 Dec 2021 11:43:45 +0000 (06:43 -0500)] 
Add full data validation for ArrayData::try_new() (#921)

* Add full data validation for ArrayData::try_new()

* Only look at offset+len indexes

Co-authored-by: Jörn Horstmann <git@jhorstmann.net>
* fix test

* fmt

* test for array indexes

Co-authored-by: Jörn Horstmann <git@jhorstmann.net>
6 months agoRemove unneeded `rc` feature of serde (#990)
Carol (Nichols || Goulding) [Fri, 3 Dec 2021 21:58:04 +0000 (16:58 -0500)] 
Remove unneeded `rc` feature of serde (#990)

Fixes #989.

This feature opts into impls for `Rc` and `Arc`, but none of the data
structures that use Serialize/Deserialize actually contain `Rc` or
`Arc`s.

See:

- [Serde docs](https://serde.rs/feature-flags.html#-features-rc)
- [PR adding this](https://github.com/apache/arrow/pull/3016)