arrow-rs.git
7 weeks agoPrepare for release of 14.0.0 (#1693) 14.0.0
Andrew Lamb [Fri, 13 May 2022 16:54:14 +0000 (12:54 -0400)] 
Prepare for release of 14.0.0 (#1693)

* Update version to 14.0.0

* Initial draft of changelog

* updates

* update

* RAT, ETC

* Final fixups

7 weeks agoFix docs.rs build (#1696)
Andrew Lamb [Fri, 13 May 2022 10:00:45 +0000 (06:00 -0400)] 
Fix docs.rs build (#1696)

8 weeks agoAdd `async` into doc features (#1349)
Remzi Yang [Thu, 12 May 2022 14:49:14 +0000 (22:49 +0800)] 
Add `async` into doc features (#1349)

* add async to default features

Signed-off-by: remzi <13716567376yh@gmail.com>
* remove async from default features

Signed-off-by: remzi <13716567376yh@gmail.com>
* add doc metadata

Signed-off-by: remzi <13716567376yh@gmail.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
8 weeks agofix bench command line options (#1685)
kazuhiko kikuchi [Thu, 12 May 2022 11:22:00 +0000 (20:22 +0900)] 
fix bench command line options (#1685)

Fix criterion benches rejecting command-line options:

```
error: Unrecognized option: 'save-baseline'
error: bench failed
```
https://github.com/bheisler/criterion.rs/issues/193#issuecomment-415740713

8 weeks agosupport duration in ffi (#1689)
ryan-jacobs1 [Thu, 12 May 2022 02:19:00 +0000 (22:19 -0400)] 
support duration in ffi (#1689)

8 weeks agoUse bytes in parquet (#1474) (#1683)
Raphael Taylor-Davies [Wed, 11 May 2022 16:54:07 +0000 (17:54 +0100)] 
Use bytes in parquet (#1474) (#1683)

8 weeks agoFix generate_unions_case for Rust case (#1677)
Liang-Chi Hsieh [Tue, 10 May 2022 23:45:24 +0000 (16:45 -0700)] 
Fix generate_unions_case for Rust case (#1677)

* Fix generate_unions_case for rust case

* Add test

8 weeks agoEnable branch protection (#1679)
Raphael Taylor-Davies [Tue, 10 May 2022 19:30:48 +0000 (20:30 +0100)] 
Enable branch protection (#1679)

8 weeks agoSupport dictionary arrays in length and bit_length (#1674)
Liang-Chi Hsieh [Mon, 9 May 2022 19:31:37 +0000 (12:31 -0700)] 
Support dictionary arrays in length and bit_length (#1674)

* Support dictionary arrays in length and bit_length

* Fix typo

8 weeks agoAdd dictionary array support for substring function (#1665)
Chao Sun [Mon, 9 May 2022 17:57:21 +0000 (10:57 -0700)] 
Add dictionary array support for substring function (#1665)

* initial commit

* add test

* comments

* more comments

8 weeks agoFix logical merge conflict in #1588 (#1678)
Raphael Taylor-Davies [Mon, 9 May 2022 10:47:28 +0000 (11:47 +0100)] 
Fix logical merge conflict in #1588 (#1678)

8 weeks agoAdd support for nested list arrays from parquet to arrow arrays (#993) (#1588)
Raphael Taylor-Davies [Mon, 9 May 2022 10:23:17 +0000 (11:23 +0100)] 
Add support for nested list arrays from parquet to arrow arrays (#993) (#1588)

* Add support for nested list arrays (#993)

* More tests

* Minor cleanup

* Filter nulls

* Update comments

* Fix doc

* Fix clippy

* Apply suggestions from code review

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* More tests

* Add sanity check to ListArrayReader

* Fix test_struct_array_reader

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
8 weeks agoreplace by (#1664)
Remzi Yang [Sun, 8 May 2022 23:04:24 +0000 (07:04 +0800)] 
replace  by (#1664)

Signed-off-by: remzi <13716567376yh@gmail.com>
8 weeks agoReceive schema from flight data. (#1670)
Liang-Chi Hsieh [Sun, 8 May 2022 20:25:31 +0000 (13:25 -0700)] 
Receive schema from flight data. (#1670)

8 weeks agodo not require exact version (#1668)
Kamil Konior [Sun, 8 May 2022 13:51:39 +0000 (15:51 +0200)] 
do not require exact version (#1668)

8 weeks agoExclude `dict_id` and `dict_is_ordered` from equality comparison of `Field` (#1647)
Liang-Chi Hsieh [Sun, 8 May 2022 07:00:35 +0000 (00:00 -0700)] 
Exclude `dict_id` and `dict_is_ordered` from equality comparison of `Field` (#1647)

* Impl Eq for Field

* Impl Hash

* Impl PartialOrd for Field

* Impl Ord instead

* Add comment and test.
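
Editor's sketch of the idea behind this change, assuming a simplified stand-in type. `SketchField` and `hash_of` are hypothetical names and not arrow's `Field` API. It shows hand-written `PartialEq` and `Hash` impls that skip the dictionary bookkeeping fields, so equal values also hash equally:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical, simplified stand-in for a schema field (not arrow's `Field`).
#[derive(Debug, Clone)]
pub struct SketchField {
    pub name: String,
    pub nullable: bool,
    pub dict_id: i64,
    pub dict_is_ordered: bool,
}

// Equality deliberately ignores `dict_id` and `dict_is_ordered`.
impl PartialEq for SketchField {
    fn eq(&self, other: &Self) -> bool {
        self.name == other.name && self.nullable == other.nullable
    }
}

impl Eq for SketchField {}

// `Hash` must agree with `Eq`, so it hashes exactly the compared fields.
impl Hash for SketchField {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.name.hash(state);
        self.nullable.hash(state);
    }
}

/// Helper for demonstration: hash a field with the std default hasher.
pub fn hash_of(field: &SketchField) -> u64 {
    let mut hasher = DefaultHasher::new();
    field.hash(&mut hasher);
    hasher.finish()
}
```

Two `SketchField` values that differ only in `dict_id` or `dict_is_ordered` compare equal and produce the same hash, which keeps them usable as `HashMap` keys.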

8 weeks agoFix generate_nested_dictionary_case integration test failure for Rust cases (#1636)
Liang-Chi Hsieh [Sun, 8 May 2022 06:58:41 +0000 (23:58 -0700)] 
Fix generate_nested_dictionary_case integration test failure for Rust cases (#1636)

* Fix ipc nested dict

* Rename dictionaries_by_field.

* Fix a few more inconsistent names

* Rename a few more

2 months agoAdd `DecimalType` support in `new_null_array` (#1659)
Yijie Shen [Fri, 6 May 2022 18:48:35 +0000 (02:48 +0800)] 
Add `DecimalType` support in `new_null_array` (#1659)

* minor: creation of all null decimal array

* fmt

2 months agoRemove parquet dictionary converters (#1661) (#1662)
Raphael Taylor-Davies [Fri, 6 May 2022 18:45:00 +0000 (19:45 +0100)] 
Remove parquet dictionary converters (#1661) (#1662)

2 months agoRemove `StringOffsetTrait` and `BinaryOffsetTrait` (#1645)
Remzi Yang [Fri, 6 May 2022 06:43:54 +0000 (14:43 +0800)] 
Remove `StringOffsetTrait` and `BinaryOffsetTrait` (#1645)

* remove StringOffsetTrait

Signed-off-by: remzi <13716567376yh@gmail.com>
* remove BinaryOffsetTrait

Signed-off-by: remzi <13716567376yh@gmail.com>
2 months agoPretty Print `UnionArray`s (#1648)
Trent Feda [Fri, 6 May 2022 06:12:31 +0000 (02:12 -0400)] 
Pretty Print `UnionArray`s (#1648)

* Add Union support to pretty/display

* Add inner null to nested Union test, Add type id to error print

2 months agoMinor: simplify the function `GenericListArray::get_type` (#1650)
Remzi Yang [Thu, 5 May 2022 19:04:43 +0000 (03:04 +0800)] 
Minor: simplify the function `GenericListArray::get_type` (#1650)

* simplify the code

Signed-off-by: remzi <13716567376yh@gmail.com>
* correct the format

Signed-off-by: remzi <13716567376yh@gmail.com>
2 months agoAdd `substring` support for `FixedSizeBinaryArray` (#1633)
Remzi Yang [Thu, 5 May 2022 01:30:37 +0000 (09:30 +0800)] 
Add `substring` support for `FixedSizeBinaryArray` (#1633)

* add function, no tests yet

Signed-off-by: remzi <13716567376yh@gmail.com>
* adjust the fn structure

Signed-off-by: remzi <13716567376yh@gmail.com>
* cargo fmt

Signed-off-by: remzi <13716567376yh@gmail.com>
* add tests

Signed-off-by: remzi <13716567376yh@gmail.com>
* fix clippy

Signed-off-by: remzi <13716567376yh@gmail.com>
* add identical test cases for utf8 and binary

Signed-off-by: remzi <13716567376yh@gmail.com>
* add benchmark

Signed-off-by: remzi <13716567376yh@gmail.com>
* fix offset bug

Signed-off-by: remzi <13716567376yh@gmail.com>
* fix a nit

Signed-off-by: remzi <13716567376yh@gmail.com>
2 months agocorrect arrow-flight readme version (#1641)
Andrew Lamb [Tue, 3 May 2022 17:58:52 +0000 (13:58 -0400)] 
correct arrow-flight readme version (#1641)

2 months agoDo not assume dictionaries exist in footer (#1631)
Peter C. Jentsch [Tue, 3 May 2022 16:14:13 +0000 (09:14 -0700)] 
Do not assume dictionaries exist in footer (#1631)

* do not assume footer exists, fixes issue #1335

* fix cargo fmt and clippy errors

2 months agoFix UnionArray is_null (#1632)
Liang-Chi Hsieh [Mon, 2 May 2022 21:03:42 +0000 (14:03 -0700)] 
Fix UnionArray is_null (#1632)

2 months agoexpose row-group flush in public api (#1634)
Kamil Konior [Mon, 2 May 2022 17:07:31 +0000 (19:07 +0200)] 
expose row-group flush in public api (#1634)

* expose row-group flush in public api

* a try to improve method names

* tiny refactors, beautifying code

2 months agoPrepare for arrow-rs 13.0.0 release (#1624) 13.0.0
Andrew Lamb [Fri, 29 Apr 2022 17:54:38 +0000 (13:54 -0400)] 
Prepare for arrow-rs 13.0.0 release (#1624)

* Update version to 13.0.0

* Draft changelog

* updates

* moar

* cleanup

* remove duplicated entry, update for #1589

* Final updates

2 months agoFix decimals min max statistics (#1621)
Atef Sawaed [Fri, 29 Apr 2022 17:49:43 +0000 (20:49 +0300)] 
Fix decimals min max statistics (#1621)

* Fix incorrect writing of min/max statistics

* Refactor

* Decimals Byte array comparison

* Add Decimals test

* Use slice instead of vector

* Fix build error

* Fix build error

* Coding Style

* More tests

* Refactor

* Improve code readability

Co-authored-by: Atef Sawaed <atefsawaed@microsoft.com>
2 months agouse standard library (#1629)
Remzi Yang [Fri, 29 Apr 2022 17:22:35 +0000 (01:22 +0800)] 
use standard library (#1629)

Signed-off-by: remzi <13716567376yh@gmail.com>
2 months agoClarify docs on UnionBuilder::append_null (#1628)
Andrew Lamb [Thu, 28 Apr 2022 18:51:01 +0000 (14:51 -0400)] 
Clarify docs on UnionBuilder::append_null (#1628)

2 months agoFix Null Mask Handling in ArrayData And UnionArray (#1589)
Raphael Taylor-Davies [Thu, 28 Apr 2022 17:39:37 +0000 (18:39 +0100)] 
Fix Null Mask Handling in ArrayData And UnionArray (#1589)

* Fix ListArray and StructArray equality (#626)

* Simplify null masking in equality comparisons

Various UnionArray fixes (#1598) (#1596) (#1591) (#1590)

Fix handling of null masks in ArrayData equality (#1599)

* Miscellaneous fixes

* Fix structure null equality

* Review feedback

2 months agoUpdate flatbuffers requirement from =2.1.1 to =2.1.2 (#1622)
dependabot[bot] [Wed, 27 Apr 2022 18:59:07 +0000 (14:59 -0400)] 
Update flatbuffers requirement from =2.1.1 to =2.1.2 (#1622)

Updates the requirements on [flatbuffers](https://github.com/google/flatbuffers) to permit the latest version.
- [Release notes](https://github.com/google/flatbuffers/releases)
- [Commits](https://github.com/google/flatbuffers/commits)

---
updated-dependencies:
- dependency-name: flatbuffers
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2 months agoAdd `substring` support for binary (#1608)
Remzi Yang [Mon, 25 Apr 2022 20:29:20 +0000 (04:29 +0800)] 
Add `substring` support for binary (#1608)

* add substring for binary
fix some test for string array

Signed-off-by: remzi <13716567376yh@gmail.com>
* fix another bug in test

Signed-off-by: remzi <13716567376yh@gmail.com>
* update doc

Signed-off-by: remzi <13716567376yh@gmail.com>
* add with_nulls tests

Signed-off-by: remzi <13716567376yh@gmail.com>
* add without_nulls tests

Signed-off-by: remzi <13716567376yh@gmail.com>
* fix clippy

Signed-off-by: remzi <13716567376yh@gmail.com>
2 months agoDon't access and validate offset buffer in ListArray::from(ArrayData) (#1602)
Jörn Horstmann [Mon, 25 Apr 2022 20:27:30 +0000 (22:27 +0200)] 
Don't access and validate offset buffer in ListArray::from(ArrayData) (#1602)

* Don't access and validate offset buffer in ListArray::from(ArrayData)

* Simplify empty buffer creation

* fix clippy

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2 months agoAdd example readme (#1615)
Andrew Lamb [Mon, 25 Apr 2022 11:18:21 +0000 (07:18 -0400)] 
Add example readme (#1615)

* Add example readme

* RAT

* prettier

* Review comments

2 months agoImprove docs and examples links on main readme (#1614)
Andrew Lamb [Mon, 25 Apr 2022 10:24:27 +0000 (06:24 -0400)] 
Improve docs and examples links on main readme (#1614)

* Improve docs and examples links on main readme

* markdown

* improve wording

2 months agoRead/Write nested dictionaries under FixedSizeList in IPC (#1610)
Liang-Chi Hsieh [Mon, 25 Apr 2022 06:15:57 +0000 (23:15 -0700)] 
Read/Write nested dictionaries under FixedSizeList in IPC (#1610)

* Read/Write nested dictionaries under FixedSizeList in IPC

* Fix clippy

2 months agoUpdate datatypes in parquet::basic::LogicalType (#1612)
tfeda [Sun, 24 Apr 2022 05:42:45 +0000 (01:42 -0400)] 
Update datatypes in parquet::basic::LogicalType (#1612)

* Convert LogicalType::STRING(StringType) to LogicalType::String

* Convert LogicalType::MAP(MapType) to LogicalType::Map

* Convert LogicalType::LIST(ListType) to LogicalType::List

* Convert LogicalType::ENUM(EnumType) to LogicalType::Enum

* Convert LogicalType::DECIMAL(DecimalType) to LogicalType::Decimal { scale, precision }, Fix String proc macro error

* Convert LogicalType::DATE(DateType) to LogicalType::Date

* Convert LogicalType::TIME(TimeType) to LogicalType::Time { is_adjusted_to_u_t_c: bool, unit: TimeUnit }

* Convert LogicalType::TIMESTAMP(TimestampType) to LogicalType::Timestamp { is_adjusted_to_u_t_c: bool, unit: TimeUnit }

* Convert LogicalType::INTEGER(IntType) to LogicalType::Integer { bit_width: i8, is_signed: bool }

* Convert LogicalType::UNKNOWN,JSON,BSON to LogicalType::Unknown,Json,Bson

* Convert LogicalType::UUID to LogicalType::Uuid

* Add ref t to simplify printing in src/arrow/schema from_int32()
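
Editor's sketch of what the conversion above looks like from a caller's side. `SketchLogicalType`, `SketchTimeUnit` and `describe` are hypothetical names that mirror only a few of the variants listed, not the actual `parquet::basic::LogicalType` definition. Fields carried directly on the variant can be destructured in a single `match` arm:

```rust
/// Hypothetical time unit, standing in for the crate's `TimeUnit`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SketchTimeUnit {
    Millis,
    Micros,
    Nanos,
}

/// Hypothetical mirror of a few logical types using unit and struct variants.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SketchLogicalType {
    String,
    Decimal { scale: i32, precision: i32 },
    Timestamp { is_adjusted_to_u_t_c: bool, unit: SketchTimeUnit },
    Integer { bit_width: i8, is_signed: bool },
}

/// Render a short description by destructuring the variant fields directly.
pub fn describe(t: &SketchLogicalType) -> String {
    match t {
        SketchLogicalType::String => "string".to_string(),
        SketchLogicalType::Decimal { scale, precision } => {
            format!("decimal({}, {})", precision, scale)
        }
        SketchLogicalType::Timestamp { is_adjusted_to_u_t_c, unit } => {
            format!("timestamp({:?}, utc={})", unit, is_adjusted_to_u_t_c)
        }
        SketchLogicalType::Integer { bit_width, is_signed } => {
            format!("{}int{}", if *is_signed { "" } else { "u" }, bit_width)
        }
    }
}
```

With the older tuple-variant shape the same match would first bind an inner struct and then read its fields, so the struct-variant form is mostly a readability gain for callers.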

2 months agoParquet: schema validation should allow scale == precision for decimal type (#1607)
Chao Sun [Sat, 23 Apr 2022 06:42:01 +0000 (23:42 -0700)] 
Parquet: schema validation should allow scale == precision for decimal type (#1607)

2 months agoFix map nullable flag in `ParquetTypeConverter` (#1592)
Liang-Chi Hsieh [Thu, 21 Apr 2022 08:22:37 +0000 (01:22 -0700)] 
Fix map nullable flag in `ParquetTypeConverter` (#1592)

* Fix map nullable flag

* Fix tests

* Add another map case

2 months agoRead/write nested dictionary under map in ipc stream reader/writer (#1583)
Liang-Chi Hsieh [Wed, 20 Apr 2022 18:58:55 +0000 (11:58 -0700)] 
Read/write nested dictionary under map in ipc stream reader/writer (#1583)

* IPC read/write nested dict in map

* For review comments

* update comment

2 months agoRead/write nested dictionary under large list in ipc stream reader/writer (#1585)
Liang-Chi Hsieh [Wed, 20 Apr 2022 07:25:18 +0000 (00:25 -0700)] 
Read/write nested dictionary under large list in ipc stream reader/writer (#1585)

* Read/Write nested dictionaries under LargeList in IPC

* For review comments

2 months agoAdd utf-8 validation checking for `substring` (#1577)
Remzi Yang [Tue, 19 Apr 2022 16:26:34 +0000 (00:26 +0800)] 
Add utf-8 validation checking for `substring` (#1577)

* add utf-8 validation checking
update doc
add a test for invalid array type

Signed-off-by: remzi <13716567376yh@gmail.com>
* add tests
clean up

Signed-off-by: remzi <13716567376yh@gmail.com>
* test the worst case

Signed-off-by: remzi <13716567376yh@gmail.com>
* update doc and tests

Signed-off-by: remzi <13716567376yh@gmail.com>
* update doc

Signed-off-by: remzi <13716567376yh@gmail.com>
* use std method is_char_boundary
update doc

Signed-off-by: remzi <13716567376yh@gmail.com>
* add 2 substring benches

Signed-off-by: remzi <13716567376yh@gmail.com>
* replace dyn Fn with loop unswitching

Signed-off-by: remzi <13716567376yh@gmail.com>
* Update arrow/src/compute/kernels/substring.rs

Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
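
Editor's sketch of the boundary check mentioned in this commit ("use std method is_char_boundary"). `checked_substring` is a hypothetical helper working on a single `&str` with non-negative byte offsets; the real kernel operates on whole arrays and its signature is not reproduced here:

```rust
/// Return the bytes `start..start + len` of `s` as a `&str`, or an error when
/// the range is out of bounds or would split a multi-byte UTF-8 character.
pub fn checked_substring(s: &str, start: usize, len: usize) -> Result<&str, String> {
    // Reject ranges that overflow or run past the end of the string.
    let end = start
        .checked_add(len)
        .filter(|&e| e <= s.len())
        .ok_or_else(|| format!("range {}+{} is out of bounds for length {}", start, len, s.len()))?;

    // Both ends must sit on a character boundary for the slice to be valid UTF-8.
    if !s.is_char_boundary(start) || !s.is_char_boundary(end) {
        return Err(format!("range {}..{} splits a UTF-8 character", start, end));
    }
    Ok(&s[start..end])
}
```

For `"héllo"` the character `é` occupies bytes 1 and 2, so a range ending at byte 2 is rejected while a range covering bytes 1 to 3 is accepted.
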
2 months agoDerive `Clone` and `PartialEq` for json `DecoderOptions` (#1581)
Andrew Lamb [Mon, 18 Apr 2022 05:51:18 +0000 (01:51 -0400)] 
Derive `Clone` and `PartialEq` for json `DecoderOptions` (#1581)

* Implement Clone and PartialEq for DecoderOptions

* fmt

2 months agoSupport casting to/from `DataType::Null` in `cast` kernel (#1572)
DuRipeng [Mon, 18 Apr 2022 05:50:24 +0000 (13:50 +0800)] 
Support casting to/from `DataType::Null` in `cast` kernel (#1572)

* cast null from and to others

* fmt fix

* add more ut

Co-authored-by: duripeng <duripeng@baidu.com>
2 months agoRemove reference indirection for Copy types in substring (#1576)
Raphael Taylor-Davies [Sun, 17 Apr 2022 10:52:55 +0000 (11:52 +0100)] 
Remove reference indirection for Copy types in substring (#1576)

2 months agoReplace &Option<T> with Option<&T> (#1571)
tfeda [Sun, 17 Apr 2022 08:37:56 +0000 (04:37 -0400)] 
Replace &Option<T> with Option<&T> (#1571)

* Convert &Option<T> arguments to Option<&T>

* fix cargo fmt issues

* Update FileMetaData created_by() to return Option<&str>, Fix integration test
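
Editor's sketch of the pattern described above, assuming a hypothetical `SketchFileMetaData` type rather than the parquet crate's own struct. Returning `Option<&str>` hides how the value is stored, and callers can pass `None` or a borrowed literal without building an `Option<String>` first:

```rust
/// Hypothetical stand-in for a metadata struct holding an optional string.
pub struct SketchFileMetaData {
    created_by: Option<String>,
}

impl SketchFileMetaData {
    pub fn new(created_by: Option<String>) -> Self {
        Self { created_by }
    }

    /// Old shape: exposes the storage type to every caller.
    pub fn created_by_ref(&self) -> &Option<String> {
        &self.created_by
    }

    /// New shape: `as_deref` turns `&Option<String>` into `Option<&str>`.
    pub fn created_by(&self) -> Option<&str> {
        self.created_by.as_deref()
    }
}

/// A function taking `Option<&str>` accepts owned and borrowed sources alike.
pub fn describe_writer(created_by: Option<&str>) -> String {
    match created_by {
        Some(name) => format!("written by {}", name),
        None => "writer unknown".to_string(),
    }
}
```

`describe_writer(meta.created_by())` and `describe_writer(None)` both work without cloning or constructing a `String`.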

2 months agoUse littleendian arrow files for projection_should_work (#1573)
Liang-Chi Hsieh [Sat, 16 Apr 2022 17:23:53 +0000 (10:23 -0700)] 
Use littleendian arrow files for projection_should_work (#1573)

2 months agoPrepare for 12.0.0 release: Update version and CHANGELOG (#1569) 12.0.0
Andrew Lamb [Fri, 15 Apr 2022 22:46:09 +0000 (18:46 -0400)] 
Prepare for 12.0.0 release: Update version and CHANGELOG (#1569)

* Update version to 12.0.0

* Update changelog script

* Update changelog for 12.0.0

2 months agoSplit out ListArrayReader into separate module (#1483) (#1563)
Raphael Taylor-Davies [Fri, 15 Apr 2022 14:09:25 +0000 (15:09 +0100)] 
Split out ListArrayReader into separate module (#1483) (#1563)

* Split out ListArrayReader into separate module (#1483)

* Fix merge conflict

2 months agofix infinite loop in not fully packed bit-packed runs (#1555)
Raphael Taylor-Davies [Fri, 15 Apr 2022 14:09:12 +0000 (15:09 +0100)] 
fix infinite loop in not fully packed bit-packed runs (#1555)

* fix infinite loop in not fully packed bit-packed runs

* Add test and also fix get_batch_with_dict

Co-authored-by: Andrei Liakhovich <anliakho@microsoft.com>
2 months agoAdd test for creating FixedSizeBinaryArray::try_from_sparse_iter failed when given...
Andrew Lamb [Fri, 15 Apr 2022 13:46:50 +0000 (09:46 -0400)] 
Add test for creating FixedSizeBinaryArray::try_from_sparse_iter failed when given all Nones (#1551)

* Add test for creating FixedSizeBinaryArray::try_from_sparse_iter failed when given all Nones

* fix test

2 months agoRead/write nested dictionary in ipc stream reader/writer (#1566)
Liang-Chi Hsieh [Fri, 15 Apr 2022 13:05:23 +0000 (06:05 -0700)] 
Read/write nested dictionary in ipc stream reader/writer (#1566)

* Read dictionary inside dictionary

* Fix clippy

2 months agoinitial commit (#1564)
Chao Sun [Fri, 15 Apr 2022 12:52:03 +0000 (05:52 -0700)] 
initial commit (#1564)

2 months agoSplit out MapArray into separate module (#1483) (#1562)
Raphael Taylor-Davies [Fri, 15 Apr 2022 12:50:28 +0000 (13:50 +0100)] 
Split out MapArray into separate module (#1483) (#1562)

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2 months agoSupport empty projection in ParquetRecordBatchReader (#1560)
Raphael Taylor-Davies [Fri, 15 Apr 2022 12:43:42 +0000 (13:43 +0100)] 
Support empty projection in ParquetRecordBatchReader (#1560)

* Support empty projection in ParquetRecordBatchReader

* Fix async reader

* Fix RAT

2 months agoFix incorrect `into_buffers` for UnionArray (#1567)
Liang-Chi Hsieh [Thu, 14 Apr 2022 18:42:29 +0000 (11:42 -0700)] 
Fix incorrect `into_buffers` for UnionArray (#1567)

* Fix incorrect buffers for UnionArray

* Add test

* Re-enable test_filter_union_array_sparse

2 months agoAdd CI check for full validation mode (#1546)
Andrew Lamb [Thu, 14 Apr 2022 17:30:16 +0000 (13:30 -0400)] 
Add CI check for full validation mode (#1546)

* Add force_validate feature

* Disable some redundant checks

* Add issue link

* Add test with force_validate feature flag

* fix up message

* disable due to https://github.com/apache/arrow-rs/issues/1547

* disable ipc test failure

* fix clippy

* Fix doctest to pass with force_validate enabled

2 months agoAdd option to skip decoding arrow metadata from parquet (#1459) (#1558)
Raphael Taylor-Davies [Thu, 14 Apr 2022 14:45:15 +0000 (15:45 +0100)] 
Add option to skip decoding arrow metadata from parquet (#1459) (#1558)

* Add option to skip decoding arrow metadata from parquet (#1459)

Fix inference from null logical type (#1557)

Replace some `&Option<T>` with `Option<&T>` (#1556)

* Update parquet/src/arrow/arrow_reader.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* Fmt

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2 months agoImprove JSON reader documentation (#1559)
Andrew Lamb [Wed, 13 Apr 2022 16:46:53 +0000 (12:46 -0400)] 
Improve JSON reader documentation (#1559)

* Improve JSON reader documentation

* Apply suggestions from code review

Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
2 months agoConsolidate JSON reader options (#1539)
Andrew Lamb [Wed, 13 Apr 2022 12:51:15 +0000 (08:51 -0400)] 
Consolidate JSON reader options (#1539)

2 months agoFix reading dictionaries from nested structs in ipc `StreamReader` (#1550)
Thomas Peiselt [Wed, 13 Apr 2022 11:28:32 +0000 (13:28 +0200)] 
Fix reading dictionaries from nested structs in ipc `StreamReader` (#1550)

* Fix reading dictionaries from nested structs in ipc `StreamReader`

* Fix clippy error

* Apply review comment about field naming in test

2 months agoCreate RecordBatch With Non-Zero Row Count But No Columns (#1536) (#1552)
Raphael Taylor-Davies [Tue, 12 Apr 2022 23:44:22 +0000 (00:44 +0100)] 
Create RecordBatch With Non-Zero Row Count But No Columns (#1536) (#1552)

* Support empty RecordBatch (#1536)

* Placate clippy

* Review feedback

* Fix doc

* Fix create_record_batch_slice_empty_batch test

2 months agoAllow json reader/decoder to work with format_strings for each field (#1451)
Sumit [Tue, 12 Apr 2022 13:39:05 +0000 (15:39 +0200)] 
Allow json reader/decoder to work with format_strings for each field (#1451)

* implement parser for remaining types used by json decoder

* added format strings (hashmap) to json reader

the format_string map's key is column name.
The value will be used to parse the date64/date32 types from json
if the read value is of string type

add tests for formatted parser for date{32,64}type for json readers

all-parsers start

fixup! added format strings (hashmap) to json reader

* add DecoderOptions struct for holding options for decoder

that way later extensions to the decoder can be added to this struct
without breaking API.

* Fixup some comments

* added test for string parsing json reader for time{32,64} types

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
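
Editor's sketch of the options-struct design this commit describes: a map from column name to format string, held in a struct so later options can be added without changing function signatures. `SketchDecoderOptions`, its fields, its default batch size and its method names are all assumptions for illustration and are not taken from the arrow JSON reader:

```rust
use std::collections::HashMap;

/// Hypothetical options holder; new fields can be added without breaking callers.
#[derive(Debug, Clone, PartialEq)]
pub struct SketchDecoderOptions {
    batch_size: usize,
    // Key: column name. Value: format string used to parse string-encoded dates.
    format_strings: Option<HashMap<String, String>>,
}

impl SketchDecoderOptions {
    /// Start from defaults (the default batch size here is an arbitrary choice).
    pub fn new() -> Self {
        Self { batch_size: 1024, format_strings: None }
    }

    pub fn with_batch_size(mut self, batch_size: usize) -> Self {
        self.batch_size = batch_size;
        self
    }

    pub fn with_format_strings(mut self, format_strings: HashMap<String, String>) -> Self {
        self.format_strings = Some(format_strings);
        self
    }

    pub fn batch_size(&self) -> usize {
        self.batch_size
    }

    /// Look up the format string configured for a column, if any.
    pub fn format_for(&self, column: &str) -> Option<&str> {
        self.format_strings
            .as_ref()
            .and_then(|formats| formats.get(column))
            .map(|format| format.as_str())
    }
}
```

A caller would chain the builder methods, for example `SketchDecoderOptions::new().with_batch_size(64)`, and a decoder would consult `format_for` only when a date column arrives as a string.
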
2 months agominor: enable Date32/64 to String/LargeString cast (#1534)
Yijie Shen [Tue, 12 Apr 2022 13:34:18 +0000 (21:34 +0800)] 
minor: enable Date32/64 to String/LargeString cast (#1534)

2 months agoupdate the doc of `substring` (#1529)
Remzi Yang [Sun, 10 Apr 2022 11:15:27 +0000 (19:15 +0800)] 
update the doc of `substring` (#1529)

* update doc

Signed-off-by: remzi <13716567376yh@gmail.com>
* update doc

Signed-off-by: remzi <13716567376yh@gmail.com>
* Update arrow/src/compute/kernels/substring.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2 months agofix clippy errors in 1.60 (#1527)
Andrew Lamb [Fri, 8 Apr 2022 10:19:56 +0000 (06:19 -0400)] 
fix clippy errors in 1.60 (#1527)

2 months agoAdd `new_from_strings` to create `MapArrays` (#1507)
Liang-Chi Hsieh [Thu, 7 Apr 2022 21:07:45 +0000 (14:07 -0700)] 
Add `new_from_strings` to create `MapArrays` (#1507)

* Add new_from_strings

* Fix clippy

* Update arrow/src/array/array_map.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* Fix typo too

* For review

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2 months agoFix reading nested lists from parquet files (#1517)
Liang-Chi Hsieh [Thu, 7 Apr 2022 21:06:39 +0000 (14:06 -0700)] 
Fix reading nested lists from parquet files (#1517)

* Fix

* Add test

2 months agoDecouple buffer deallocation from ffi and allow creating buffers from rust vec (...
Jörn Horstmann [Thu, 7 Apr 2022 21:01:21 +0000 (23:01 +0200)] 
Decouple buffer deallocation from ffi and allow creating buffers from rust vec (#1494)

* Decouple buffer deallocation from ffi and allow zero-copy buffer creation from rust vectors or strings

* Move allocation owner to alloc module

* Rename and comment Deallocation variants

* Fix doc link

* Explicitly assert that Buffer is UnwindSafe

* fix: doc comment

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2 months agoSpeed up the `substring` kernel by about 2x (#1512)
Remzi Yang [Thu, 7 Apr 2022 20:31:30 +0000 (04:31 +0800)] 
Speed up the `substring` kernel by about 2x (#1512)

* speed up substring

Signed-off-by: remzi <13716567376yh@gmail.com>
* add comments

Signed-off-by: remzi <13716567376yh@gmail.com>
* reformat code

Signed-off-by: remzi <13716567376yh@gmail.com>
* use trait object to simplify the code

Signed-off-by: remzi <13716567376yh@gmail.com>
* reformat code

Signed-off-by: remzi <13716567376yh@gmail.com>
* fmt code

Signed-off-by: remzi <13716567376yh@gmail.com>
2 months agochore: Update `prost`, `prost-derive` and `prost-types` to 0.10, `tonic`, and `tonic...
Andrew Lamb [Thu, 7 Apr 2022 19:30:46 +0000 (15:30 -0400)] 
chore: Update `prost`, `prost-derive` and `prost-types` to 0.10, `tonic`, and `tonic-build` to `0.7` (#1510)

* chore: Update prost, prost-derive and prost-types to 0.10

* Update tonic requirement from 0.6 to 0.7

Updates the requirements on [tonic](https://github.com/hyperium/tonic) to permit the latest version.
- [Release notes](https://github.com/hyperium/tonic/releases)
- [Changelog](https://github.com/hyperium/tonic/blob/master/CHANGELOG.md)
- [Commits](https://github.com/hyperium/tonic/compare/v0.6.0...v0.7.0)

---
updated-dependencies:
- dependency-name: tonic
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
* Update tonic-build requirement from 0.6 to 0.7

Updates the requirements on [tonic-build](https://github.com/hyperium/tonic) to permit the latest version.
- [Release notes](https://github.com/hyperium/tonic/releases)
- [Changelog](https://github.com/hyperium/tonic/blob/master/CHANGELOG.md)
- [Commits](https://github.com/hyperium/tonic/compare/v0.6.0...v0.7.0)

---
updated-dependencies:
- dependency-name: tonic-build
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
* Update generated code

* Try installing cmake dependencies for flight

* install cmake and protobuf

* Use --experimental_allow_proto3_optional flag

* fix apt-install

* try to install just protobuf compiler

* Add action to configure workspace

* Use prost enabled toolchain

* fixes

* fixups

* fix clippy

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
3 months agoFix for missing documentation of `GenericListBuilder` (#1525)
Sven Cattell [Wed, 6 Apr 2022 17:39:50 +0000 (13:39 -0400)] 
Fix for missing documentation of `GenericListBuilder` (#1525)

* Should fix issue #1518

* Lint change.

3 months agoAdd a diagram to `take` kernel documentation (#1524)
Andrew Lamb [Tue, 5 Apr 2022 19:55:56 +0000 (15:55 -0400)] 
Add a diagram to `take` kernel documentation (#1524)

3 months agoMark remove-old-releases.sh executable (#1522)
Andrew Lamb [Tue, 5 Apr 2022 13:39:56 +0000 (09:39 -0400)] 
Mark remove-old-releases.sh executable (#1522)

3 months agoDelete duplicate code in the `sort` kernel (#1519)
Remzi Yang [Tue, 5 Apr 2022 13:32:52 +0000 (21:32 +0800)] 
Delete duplicate code in the `sort` kernel (#1519)

* remove repeated code

Signed-off-by: remzi <13716567376yh@gmail.com>
* delete weird file

Signed-off-by: remzi <13716567376yh@gmail.com>
3 months agoPrepare for release version `11.1.0` (#1514) 11.1.0
Andrew Lamb [Fri, 1 Apr 2022 15:23:38 +0000 (11:23 -0400)] 
Prepare for release version `11.1.0` (#1514)

* Update release version to 11.1.0

* draft: changelog

* more

* update

* Fixup

3 months ago Implement ArrayEqual for UnionArray (#1469)
Liang-Chi Hsieh [Thu, 31 Mar 2022 18:20:28 +0000 (11:20 -0700)] 
Implement ArrayEqual for UnionArray (#1469)

* init

* more

* Remove dense/sparse case

* Fix clippy

* For review

* For review

3 months agoAdd FFI for Arrow C Stream Interface (#1384)
Liang-Chi Hsieh [Thu, 31 Mar 2022 01:07:06 +0000 (18:07 -0700)] 
Add FFI for Arrow C Stream Interface (#1384)

* Add FFI for Arrow C Stream Interface

* Add ArrowArrayStreamReader

* Add test

* Fix clippy

* fix format

* define error code

* Regenerate ffi binding using bindgen

* Rewrite test

* Remove CStreamInterface

* Fix clippy error

* Fix more clippy errors

* For review comment.

* Fix clippy error

* Fix clippy error

* not run example code in comment

* ignore doctest

* For review

* Fix clippy

* For review comment

* For review

* Add export_reader_into_raw

* For review

3 months agoClarify docs that SlicesIterator ignores null values (#1504)
Andrew Lamb [Wed, 30 Mar 2022 17:49:12 +0000 (13:49 -0400)] 
Clarify docs that SlicesIterator ignores null values (#1504)

* Clarify docs that SlicesIterator ignores null values

* Update arrow/src/compute/kernels/filter.rs

Co-authored-by: Yijie Shen <henry.yijieshen@gmail.com>
3 months agoUpdate release scripts to automatically clean up old release versions (#1467)
Andrew Lamb [Wed, 30 Mar 2022 17:48:33 +0000 (13:48 -0400)] 
Update release scripts to automatically clean up old release versions (#1467)

* Automatically clean up old release versions

* Update dev/release/release-tarball.sh

Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
* Add message to delete command

* fix submodules

Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
3 months agoSupport calculating number of chars for `StringArray` (#1503)
Remzi Yang [Wed, 30 Mar 2022 17:47:29 +0000 (01:47 +0800)] 
Support calculating number of chars for `StringArray` (#1503)

* add functions, no tests yet

Signed-off-by: remzi <13716567376yh@gmail.com>
* add tests
delete unchecked fn
update doc

Signed-off-by: remzi <13716567376yh@gmail.com>
* use lib method
update doc and test

Signed-off-by: remzi <13716567376yh@gmail.com>
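
Editor's sketch of the distinction behind this commit: for UTF-8 strings the number of characters can differ from the number of bytes. The helpers below are hypothetical and work on a plain slice of optional strings rather than on a `StringArray`:

```rust
/// Byte length of each value; nulls (`None`) stay null.
pub fn byte_lengths(values: &[Option<&str>]) -> Vec<Option<usize>> {
    values.iter().map(|value| value.map(|s| s.len())).collect()
}

/// Number of Unicode scalar values (`char`s) in each value; nulls stay null.
pub fn char_counts(values: &[Option<&str>]) -> Vec<Option<usize>> {
    values
        .iter()
        .map(|value| value.map(|s| s.chars().count()))
        .collect()
}
```

For `"héllo"` the byte length is 6 and the character count is 5, because `é` takes two bytes in UTF-8.
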
3 months agoImplement `size_hint` and `ExactSizeIterator` for `DecimalArray` (#1506)
Andrew Lamb [Wed, 30 Mar 2022 17:46:20 +0000 (13:46 -0400)] 
Implement `size_hint` and `ExactSizeIterator` for `DecimalArray` (#1506)

* Implement size_hint and ExactSizeIterator for DecimalArray

* clippy

* Apply suggestions from code review

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
* Move DecimalIter to iterator.rs

* Bring back doc fixes

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
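
Editor's sketch of what an exact `size_hint` plus the std `ExactSizeIterator` trait give an array iterator. `SketchDecimalIter` is a hypothetical iterator over a slice of optional `i128` values, not the crate's `DecimalIter`:

```rust
/// Hypothetical iterator over nullable decimal values stored as `i128`.
pub struct SketchDecimalIter<'a> {
    values: &'a [Option<i128>],
    position: usize,
}

impl<'a> SketchDecimalIter<'a> {
    pub fn new(values: &'a [Option<i128>]) -> Self {
        Self { values, position: 0 }
    }
}

impl<'a> Iterator for SketchDecimalIter<'a> {
    type Item = Option<i128>;

    fn next(&mut self) -> Option<Self::Item> {
        // `get` returns `None` once the position runs past the end.
        let item = self.values.get(self.position).copied()?;
        self.position += 1;
        Some(item)
    }

    // Lower and upper bound are equal because the remaining count is known.
    fn size_hint(&self) -> (usize, Option<usize>) {
        let remaining = self.values.len() - self.position;
        (remaining, Some(remaining))
    }
}

// The default `len()` relies on `size_hint` being exact, which it is above.
impl<'a> ExactSizeIterator for SketchDecimalIter<'a> {}
```

With these impls, `len()` is available on the iterator and `collect()` can pre-allocate the right capacity.
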
3 months agoupdate doc (#1491)
Remzi Yang [Tue, 29 Mar 2022 18:59:40 +0000 (02:59 +0800)] 
update doc (#1491)

Signed-off-by: remzi <13716567376yh@gmail.com>
3 months agoUse Arrow take kernel within ListArrayReader (#1490)
Liang-Chi Hsieh [Tue, 29 Mar 2022 18:59:00 +0000 (11:59 -0700)] 
Use Arrow take kernel within ListArrayReader (#1490)

* Remove remove_indices

* For review

3 months agoFix miri error in try_from_trusted_len_iter (#1497)
Jörn Horstmann [Tue, 29 Mar 2022 16:48:37 +0000 (18:48 +0200)] 
Fix miri error in try_from_trusted_len_iter (#1497)

3 months agoAdd `length` kernel support for List Array (#1488)
Remzi Yang [Mon, 28 Mar 2022 20:47:45 +0000 (04:47 +0800)] 
Add `length` kernel support for List Array (#1488)

* add fn for list length
code format

Signed-off-by: remzi <13716567376yh@gmail.com>
* add list support into length function

Signed-off-by: remzi <13716567376yh@gmail.com>
* add tests

Signed-off-by: remzi <13716567376yh@gmail.com>
* update doc

Signed-off-by: remzi <13716567376yh@gmail.com>
3 months agoSupport sort for decimal data type (#1487)
Yijie Shen [Mon, 28 Mar 2022 20:38:58 +0000 (04:38 +0800)] 
Support sort for decimal data type (#1487)

3 months agoFix generate_non_canonical_map_case, fix `MapArray` equality (#1476)
Liang-Chi Hsieh [Sun, 27 Mar 2022 10:46:56 +0000 (03:46 -0700)] 
Fix generate_non_canonical_map_case, fix `MapArray` equality (#1476)

* Revamp list_equal for map type

* Canonicalize schema

* Add nullability and metadata

3 months agoFix reading/writing nested null arrays (#1480) (#1036) (#1399) (#1481)
Raphael Taylor-Davies [Fri, 25 Mar 2022 16:43:52 +0000 (16:43 +0000)] 
Fix reading/writing nested null arrays (#1480) (#1036) (#1399) (#1481)

3 months agoSplit ArrayReaderBuilder into its own module (#1483) (#1485)
Raphael Taylor-Davies [Fri, 25 Mar 2022 12:55:36 +0000 (12:55 +0000)] 
Split ArrayReaderBuilder into its own module (#1483) (#1485)

* Split ArrayReaderBuilder into its own module (#1483)

* Add license header

3 months agoSupport the `length` kernel on Binary Array (#1465)
Remzi Yang [Thu, 24 Mar 2022 18:32:34 +0000 (02:32 +0800)] 
Support the `length` kernel on Binary Array (#1465)

* support length on binary array (not test)
rewrite unary_offset using macro

Signed-off-by: remzi <13716567376yh@gmail.com>
* add tests

Signed-off-by: remzi <13716567376yh@gmail.com>
* add non-utf8 test cases

Signed-off-by: remzi <13716567376yh@gmail.com>
* fix some doc

Signed-off-by: remzi <13716567376yh@gmail.com>
* update doc
simplify the way to get offsets. No performance penalty

Signed-off-by: remzi <13716567376yh@gmail.com>
3 months agofix doc (#1471)
Remzi Yang [Wed, 23 Mar 2022 20:53:16 +0000 (04:53 +0800)] 
fix doc (#1471)

Signed-off-by: remzi <13716567376yh@gmail.com>
3 months agoFix doc (#1463)
Liang-Chi Hsieh [Tue, 22 Mar 2022 11:11:10 +0000 (04:11 -0700)] 
Fix doc (#1463)

3 months agoFix generate_map_case (#1457)
Liang-Chi Hsieh [Tue, 22 Mar 2022 11:07:08 +0000 (04:07 -0700)] 
Fix generate_map_case (#1457)

3 months agoImprove performance of DictionaryArray::try_new() (#1435)
jakevin [Tue, 22 Mar 2022 11:06:25 +0000 (19:06 +0800)] 
Improve performance of DictionaryArray::try_new() (#1435)

* improve `DictionaryArray::try_new()` #1313

* *: fix typo

* *: add cheap validate and unit test

* *: polish the error

* Add safety note

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
3 months agoFix Parquet reader for null list (#1448)
Liang-Chi Hsieh [Tue, 22 Mar 2022 10:09:08 +0000 (03:09 -0700)] 
Fix Parquet reader for null list (#1448)

* Fix Parquet reader for null list

* Test on forked parquet-testing

* For review comments

* Fix clippy

3 months agoRemove Clone and copy source structs internally (#1449)
Liang-Chi Hsieh [Sat, 19 Mar 2022 06:26:50 +0000 (23:26 -0700)] 
Remove Clone and copy source structs internally (#1449)

* Remove Clone and copy source structs internally

* Remove drop_in_place and add more comment

* Add export_into_raw

* Fix format

* Fix clippy

* Move to export_array_into_raw

* Fix clippy

* Fix doc

* Use write_unaligned

3 months agoPrepare for 11.0.0 release (#1461) 11.0.0
Andrew Lamb [Fri, 18 Mar 2022 07:46:56 +0000 (03:46 -0400)] 
Prepare for 11.0.0 release (#1461)

* Update version to 11.0.0

* Update changelog

* update changelog

* fixup

* tweak