arrow-rs.git
30 hours ago  Move DecimalArray to array_decimal.rs (#1986)  master
Remzi Yang [Fri, 1 Jul 2022 18:40:43 +0000 (02:40 +0800)] 
Move DecimalArray to array_decimal.rs (#1986)

Signed-off-by: remzi <13716567376yh@gmail.com>
43 hours ago  Remove PrimitiveBuilder::finish_dict (#1978) (#1980)
Raphael Taylor-Davies [Fri, 1 Jul 2022 05:25:36 +0000 (06:25 +0100)] 
Remove PrimitiveBuilder::finish_dict (#1978) (#1980)

45 hours ago  Fix clippy (#1984)
Liang-Chi Hsieh [Fri, 1 Jul 2022 03:40:13 +0000 (20:40 -0700)] 
Fix clippy (#1984)

2 days ago  Declare the value_length of decimal array as a `const` (#1968)
Remzi Yang [Thu, 30 Jun 2022 23:26:59 +0000 (07:26 +0800)] 
Declare the value_length of decimal array as a `const` (#1968)

* declare the value_length of decimal array as a const

Signed-off-by: remzi <13716567376yh@gmail.com>
* simplify the value function

Signed-off-by: remzi <13716567376yh@gmail.com>
2 days ago  Support dictionary array for subtract and multiply kernel (#1971)
Liang-Chi Hsieh [Thu, 30 Jun 2022 16:05:40 +0000 (09:05 -0700)] 
Support dictionary array for subtract and multiply kernel (#1971)

* Support dictionary array for subtract kernel

* Support dictionary array in multiply kernel

2 days ago  add column index and offset index (#1935)
Kun Liu [Thu, 30 Jun 2022 09:51:30 +0000 (17:51 +0800)] 
add column index and offset index (#1935)

3 days ago  Calculate n_buffers in FFI_ArrowArray by data layout (#1960)
Liang-Chi Hsieh [Wed, 29 Jun 2022 19:54:42 +0000 (12:54 -0700)] 
Calculate n_buffers in FFI_ArrowArray by data layout (#1960)

* Fix n_buffers

* Add test

* Add code comment

* Don't put null pointer if no null buffer by spec

* Trigger Build

3 days ago  Arbitrary size concat elements utf8 (#1787)
Ismail-Maj [Wed, 29 Jun 2022 19:14:37 +0000 (21:14 +0200)] 
Arbitrary size concat elements utf8 (#1787)

* arbitrary size combine_option_bitmap and tests

* more tests and error

* format

* more tests

* clone and reduce

* arbitrary size concat_elements_utf8

* nit

* tests

* Update arrow/src/compute/kernels/concat_elements.rs

* support one element input

* split implementations

* fmt

3 days ago  Faster StringDictionaryBuilder (#1851) (#1861)
Raphael Taylor-Davies [Wed, 29 Jun 2022 18:12:40 +0000 (19:12 +0100)] 
Faster StringDictionaryBuilder (#1851) (#1861)

3 days ago  Fix the behavior of `from_fixed_size_list` when offset > 0 (#1964)
Remzi Yang [Wed, 29 Jun 2022 17:27:22 +0000 (01:27 +0800)] 
Fix the behavior of `from_fixed_size_list` when offset > 0 (#1964)

* fix the behaviour of from_fixed_size_list when offset > 0

Signed-off-by: remzi <13716567376yh@gmail.com>
* Update arrow/src/array/array_binary.rs

Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
3 days ago  Set is_adjusted_to_utc if any timezone set (#1932) (#1953)
Raphael Taylor-Davies [Wed, 29 Jun 2022 09:29:22 +0000 (10:29 +0100)] 
Set is_adjusted_to_utc if any timezone set (#1932) (#1953)

* Set is_adjusted_to_utc if any timezone set (#1932)

* Fix roundtrip

3 days ago  Use InMemoryColumnChunkReader (#1956)
Raphael Taylor-Davies [Wed, 29 Jun 2022 09:28:53 +0000 (10:28 +0100)] 
Use InMemoryColumnChunkReader (#1956)

3 days ago  fix the doc of value_length (#1957)
Remzi Yang [Wed, 29 Jun 2022 04:21:05 +0000 (12:21 +0800)] 
fix the doc of value_length (#1957)

Signed-off-by: remzi <13716567376yh@gmail.com>
4 days ago  Add add_dyn for DictionaryArray support (#1951)
Liang-Chi Hsieh [Tue, 28 Jun 2022 16:37:54 +0000 (09:37 -0700)] 
Add add_dyn for DictionaryArray support (#1951)

* Add add_dyn for DictionaryArray support

* Print lengths in error message

* Add null values to test cases

4 days ago  Unpin clap (#1867) (#1954)
Raphael Taylor-Davies [Tue, 28 Jun 2022 10:03:37 +0000 (11:03 +0100)] 
Unpin clap (#1867) (#1954)

4 days ago  fix bug: write column metadata after the column chunk data (#1947)
Kun Liu [Tue, 28 Jun 2022 06:34:53 +0000 (14:34 +0800)] 
fix bug: write column metadata after the column chunk data (#1947)

4 days ago  remove casting other type to null type (#1942)
Kun Liu [Tue, 28 Jun 2022 03:56:47 +0000 (11:56 +0800)] 
remove casting other type to null type (#1942)

6 days ago  Require Send+Sync bounds for Allocation trait (#1945)
Jörn Horstmann [Sun, 26 Jun 2022 21:18:05 +0000 (23:18 +0200)] 
Require Send+Sync bounds for Allocation trait (#1945)

8 days ago  Update version and changelog for version 17.0.0 (#1926)  17.0.0
Andrew Lamb [Fri, 24 Jun 2022 20:27:54 +0000 (16:27 -0400)] 
Update version and changelog for version 17.0.0 (#1926)

* Update version to 17.0.0

* Initial changelog

* Updates

* update changelog

* touchups

* Fix old changelog

* Fix heading

8 days ago  add readme (#1940)
Remzi Yang [Fri, 24 Jun 2022 19:00:26 +0000 (03:00 +0800)] 
add readme (#1940)

Signed-off-by: remzi <13716567376yh@gmail.com>
8 days ago  Set adjusted to UTC if UTC timezone (#1932) (#1937)
Raphael Taylor-Davies [Fri, 24 Jun 2022 12:22:39 +0000 (13:22 +0100)] 
Set adjusted to UTC if UTC timezone (#1932) (#1937)

8 days ago  Support casting `NULL` to/from `Decimal` (#1922)
Kun Liu [Fri, 24 Jun 2022 07:00:16 +0000 (15:00 +0800)] 
Support casting `NULL` to/from `Decimal` (#1922)

* support NULL type values to decimal

* support NULL type values to decimal

* Update arrow/src/compute/kernels/cast.rs

* Update arrow/src/compute/kernels/cast.rs

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
9 days ago  Split up parquet::arrow::array_reader (#1483) (#1933)
Raphael Taylor-Davies [Thu, 23 Jun 2022 20:51:30 +0000 (21:51 +0100)] 
Split up parquet::arrow::array_reader (#1483) (#1933)

* Split up parquet::arrow::array_reader

* RAT

9 days ago  Update indexmap dependency (#1929)
Raphael Taylor-Davies [Thu, 23 Jun 2022 20:03:13 +0000 (21:03 +0100)] 
Update indexmap dependency (#1929)

9 days ago  Add Decimal256 API (#1914)
Liang-Chi Hsieh [Thu, 23 Jun 2022 19:29:48 +0000 (12:29 -0700)] 
Add Decimal256 API (#1914)

* Add Decimal256

* Dedup

* Truncate string representation by precision

* Update arrow/src/util/decimal.rs

Co-authored-by: Remzi Yang <59198230+HaoYang670@users.noreply.github.com>
* Update arrow/src/util/decimal.rs

Co-authored-by: Remzi Yang <59198230+HaoYang670@users.noreply.github.com>
* Update arrow/src/util/decimal.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* For review

* Fix clippy

* For review

* Move another one

Co-authored-by: Remzi Yang <59198230+HaoYang670@users.noreply.github.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
9 days ago  Add ArrowWriter doctest (#1927) (#1930)
Raphael Taylor-Davies [Thu, 23 Jun 2022 19:03:24 +0000 (20:03 +0100)] 
Add ArrowWriter doctest (#1927) (#1930)

9 days ago  Complete and fixup split of `arrow::array::builder` module (#1843) (#1928)
Raphael Taylor-Davies [Thu, 23 Jun 2022 18:38:56 +0000 (19:38 +0100)] 
Complete and fixup split of `arrow::array::builder` module (#1843) (#1928)

* Fix merge conflicts from (#1879)

* Split out of decimal_builder (#1843)

* Fix RAT

* Format

* Restore (#1842)

9 days ago  replace checked_add(sub).unwrap() with +(-) (#1924)
Remzi Yang [Thu, 23 Jun 2022 12:47:08 +0000 (20:47 +0800)] 
replace checked_add(sub).unwrap() with +(-) (#1924)

Signed-off-by: remzi <13716567376yh@gmail.com>
10 days ago  Update half requirement from 1.8 to 2.0 (#1919)
dependabot[bot] [Wed, 22 Jun 2022 13:15:18 +0000 (09:15 -0400)] 
Update half requirement from 1.8 to 2.0 (#1919)

Updates the requirements on [half](https://github.com/starkat99/half-rs) to permit the latest version.
- [Release notes](https://github.com/starkat99/half-rs/releases)
- [Changelog](https://github.com/starkat99/half-rs/blob/main/CHANGELOG.md)
- [Commits](https://github.com/starkat99/half-rs/compare/v1.8.0...v2.0.0)

---
updated-dependencies:
- dependency-name: half
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
11 days ago  minor: add a diagram to docstring for DictionaryArray (#1909)
Andrew Lamb [Tue, 21 Jun 2022 15:04:39 +0000 (11:04 -0400)] 
minor: add a diagram to docstring for DictionaryArray (#1909)

* minor: add a diagram to docstring for DictionaryArray

* minor: clarify docstring on `DictionaryArray::lookup_key`

* Apply suggestions from code review

Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Jörn Horstmann <git@jhorstmann.net>
* make values smaller and keys larger

Co-authored-by: Wakahisa <nevilledips@gmail.com>
Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Jörn Horstmann <git@jhorstmann.net>
11 days ago  minor: clarify docstring on `DictionaryArray::lookup_key` (#1910)
Andrew Lamb [Tue, 21 Jun 2022 11:23:17 +0000 (07:23 -0400)] 
minor: clarify docstring on `DictionaryArray::lookup_key` (#1910)

* minor: clarify docstring on `DictionaryArray::lookup_key`

* Update arrow/src/array/array_dictionary.rs

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
11 days ago  Fix max and min decimal (#1917)
Liang-Chi Hsieh [Tue, 21 Jun 2022 04:57:34 +0000 (21:57 -0700)] 
Fix max and min decimal (#1917)

12 days ago  Add `DictionaryArray::key` function (#1912)
Andrew Lamb [Mon, 20 Jun 2022 06:10:16 +0000 (02:10 -0400)] 
Add `DictionaryArray::key` function (#1912)

12 days ago  Issue #1876: Explicitly declare the used features for each dependency in parquet...
Martin Grigorov [Mon, 20 Jun 2022 06:02:44 +0000 (09:02 +0300)] 
Issue #1876: Explicitly declare the used features for each dependency in parquet (#1895)

Declare that parquet module uses rand's std and std_rng features

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
12 days ago  Issue #1876: Explicitly declare the used features for each dependency in parquet_deri...
Martin Grigorov [Mon, 20 Jun 2022 06:02:11 +0000 (09:02 +0300)] 
Issue #1876: Explicitly declare the used features for each dependency in parquet_derive_test (#1897)

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
13 days ago  Fix misaligned reference and logic error in crc32 (#1906)
Ben Kimock [Sun, 19 Jun 2022 08:08:56 +0000 (04:08 -0400)] 
Fix misaligned reference and logic error in crc32 (#1906)

Previously, this code tried to turn a &[u8] into a &[u32] without
checking alignment. This means it could and did create misaligned
references, which is UB. This can be detected by running the tests with
-Zbuild-std --target=x86_64-unknown-linux-gnu (or whatever your host
is). This change adopts the approach from the murmurhash implementation.

The previous implementation also ignored the tail bytes. The loop at the
end treats num_bytes as if it is the full length of the slice, but it
isn't; num_bytes is the number of bytes after the last 4-byte group. This can
be observed for example by changing "hello" to just "hell" in the tests.
Under the old implementation, the test will still pass. Now, the value
that comes out changes, and "hello" and "hell" hash to different values.
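
As a rough illustration of the approach described above (not the actual parquet code), the sketch below copies four bytes at a time into a `u32`, so no `&[u32]` reference is ever created, and then folds in the leftover tail bytes. The names `mix` and `hash_bytes` are hypothetical, and the rotate-xor step is only a placeholder for the real CRC32 update.

```rust
/// Hypothetical mixing step; a stand-in for the real CRC32 update.
fn mix(state: u32, word: u32) -> u32 {
    state.rotate_left(5) ^ word
}

/// Hash `data` four bytes at a time without reinterpreting it as `&[u32]`.
fn hash_bytes(data: &[u8], seed: u32) -> u32 {
    let mut state = seed;
    let mut chunks = data.chunks_exact(4);
    for chunk in &mut chunks {
        // Copying the bytes into a u32 has no alignment requirement.
        let word = u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
        state = mix(state, word);
    }
    // Fold in the 0 to 3 tail bytes that do not fill a whole word.
    for &byte in chunks.remainder() {
        state = mix(state, u32::from(byte));
    }
    state
}
```

With the tail folded in, `hash_bytes(b"hello", 0)` and `hash_bytes(b"hell", 0)` differ, which is the property the commit message says the old implementation lacked.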

13 days ago  Issue #1876: Explicitly declare the used features for each dependency in parquet_deri...
Martin Grigorov [Sun, 19 Jun 2022 08:01:43 +0000 (11:01 +0300)] 
Issue #1876: Explicitly declare the used features for each dependency in parquet_derive (#1896)

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
13 days ago  Issue #1876: Explicitly declare the used features for each dependency in integration_...
Martin Grigorov [Sun, 19 Jun 2022 07:58:31 +0000 (10:58 +0300)] 
Issue #1876: Explicitly declare the used features for each dependency in integration_testing (#1898)

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
13 days ago  Refine the `bit_util` of Parquet. (#1905)
Remzi Yang [Sun, 19 Jun 2022 07:50:24 +0000 (15:50 +0800)] 
Refine the `bit_util` of Parquet. (#1905)

* refine log2

Signed-off-by: remzi <13716567376yh@gmail.com>
* remove log2

Signed-off-by: remzi <13716567376yh@gmail.com>
* return as u8

Signed-off-by: remzi <13716567376yh@gmail.com>
* refine more functions

Signed-off-by: remzi <13716567376yh@gmail.com>
* update docs

Signed-off-by: remzi <13716567376yh@gmail.com>
* rm auto file

Signed-off-by: remzi <13716567376yh@gmail.com>
2 weeks ago  Add validation to `RecordBatch` for non-nullable fields containing null values (...
Andy Grove [Sat, 18 Jun 2022 14:43:12 +0000 (08:43 -0600)] 
Add validation to `RecordBatch` for non-nullable fields containing null values (#1890)

2 weeks ago  Use bit_slice in combine_option_bitmap (#1900)
Jörn Horstmann [Sat, 18 Jun 2022 05:55:15 +0000 (07:55 +0200)] 
Use bit_slice in combine_option_bitmap (#1900)

2 weeks ago  Correct nullable in read_dictionary (#1893)
Liang-Chi Hsieh [Sat, 18 Jun 2022 05:49:48 +0000 (22:49 -0700)] 
Correct nullable in read_dictionary (#1893)

* Correct nullable

* For review

2 weeks ago  Split up arrow::array::builder module (#1843) (#1879)
Dalton Modlin [Fri, 17 Jun 2022 19:51:56 +0000 (14:51 -0500)] 
Split up arrow::array::builder module (#1843) (#1879)

* Split up arrow::array::builder module (#1843)

* Split up arrow::array::builder module (#1843)

- Removed old builder.rs
- Added missing licensing header to builder submodules
- Updated builder submodule imports and exports
- Updated array mod file builder imports and exports

2 weeks ago  Closes #1902: Print the original and projected RecordBatch in dynamic_types example...
Martin Grigorov [Fri, 17 Jun 2022 18:25:18 +0000 (21:25 +0300)] 
Closes #1902: Print the original and projected RecordBatch in dynamic_types example (#1903)

* Closes #1902: Print the original and projected RecordBatch in dynamic_types example

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
* Issue #1902: Fix formatting issue

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
2 weeks ago  Minor: Add examples to docstring for `weekday` (#1894)
Andrew Lamb [Fri, 17 Jun 2022 16:15:45 +0000 (12:15 -0400)] 
Minor: Add examples to docstring for `weekday` (#1894)

2 weeks ago  Feature add weekday temporal kernel (#1891)
Remco Verhoef [Fri, 17 Jun 2022 10:52:16 +0000 (12:52 +0200)] 
Feature add weekday temporal kernel (#1891)

* add weekday temporal kernel

* add test

* fmt

* fix test

* use correct weekday numbers

2 weeks ago  Clean up the test code of `substring` kernel. (#1853)
Remzi Yang [Fri, 17 Jun 2022 10:46:42 +0000 (18:46 +0800)] 
Clean up the test code of `substring` kernel. (#1853)

* clean up

Signed-off-by: remzi <13716567376yh@gmail.com>
* clean up fixed binary

Signed-off-by: remzi <13716567376yh@gmail.com>
* trigger GitHub actions

* directly panic

Signed-off-by: remzi <13716567376yh@gmail.com>
* add docs for helper macros

Signed-off-by: remzi <13716567376yh@gmail.com>
2 weeks ago  Implement UnionArray FieldData using Type Erasure (#1842)
Raphael Taylor-Davies [Thu, 16 Jun 2022 20:45:42 +0000 (21:45 +0100)] 
Implement UnionArray FieldData using Type Erasure (#1842)

* Strongly typed UnionBuilder

* Update arrow/src/array/builder.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2 weeks ago  Issue #1876: Explicitly declare the used features for each dependency in parquet...
Martin Grigorov [Thu, 16 Jun 2022 20:19:55 +0000 (23:19 +0300)] 
Issue #1876: Explicitly declare the used features for each dependency in parquet (#1881)

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
2 weeks ago  Issue #1876: Explicitly declare the used features for each dependency in arrow-flight...
Martin Grigorov [Thu, 16 Jun 2022 20:19:45 +0000 (23:19 +0300)] 
Issue #1876: Explicitly declare the used features for each dependency in arrow-flight (#1880)

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
2 weeks ago  rename function (#1889)
Andrey Frolov [Thu, 16 Jun 2022 20:18:26 +0000 (23:18 +0300)] 
rename function (#1889)

2 weeks ago  Support specifying list capacities for MutableArrayData (#1885)
Jörn Horstmann [Thu, 16 Jun 2022 20:07:36 +0000 (22:07 +0200)] 
Support specifying list capacities for MutableArrayData (#1885)

2 weeks ago  Expose `BitSliceIterator` and `BitIndexIterator` (#1864) (#1865)
Raphael Taylor-Davies [Thu, 16 Jun 2022 20:07:04 +0000 (21:07 +0100)] 
Expose `BitSliceIterator` and `BitIndexIterator` (#1864) (#1865)

* Expose BitSliceIterator and BitIndexIterator (#1864)

* Update arrow/src/compute/kernels/filter.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* Format

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2 weeks ago  docs: remove experimental marker on C Stream Interface (#1821)
Will Jones [Thu, 16 Jun 2022 19:31:27 +0000 (21:31 +0200)] 
docs: remove experimental marker on C Stream Interface (#1821)

2 weeks ago  Do not print exit code from miri, instead it should be the return value of the script...
Jörn Horstmann [Thu, 16 Jun 2022 19:09:36 +0000 (21:09 +0200)] 
Do not print exit code from miri, instead it should be the return value of the script (#1873)

2 weeks ago  Add Decimal128 API and use it in DecimalArray and DecimalBuilder (#1871)
Liang-Chi Hsieh [Thu, 16 Jun 2022 06:50:39 +0000 (23:50 -0700)] 
Add Decimal128 API and use it in DecimalArray and DecimalBuilder (#1871)

* Add Decimal128

* Fix clippy

* Add code comment

2 weeks ago  Mark typed buffer APIs safe (#996) (#1027) (#1866)
Raphael Taylor-Davies [Wed, 15 Jun 2022 13:18:45 +0000 (14:18 +0100)] 
Mark typed buffer APIs safe (#996) (#1027) (#1866)

* Mark typed buffer APIs safe (#996) (#1027)

* Fix parquet

* Format

* Review feedback

2 weeks ago  Add vec-inspired APIs to BufferBuilder (#1850) (#1860)
Raphael Taylor-Davies [Wed, 15 Jun 2022 13:18:28 +0000 (14:18 +0100)] 
Add vec-inspired APIs to BufferBuilder (#1850) (#1860)

2 weeks ago  Add `nilike` support in `comparison` (#1846)
Alex Qyoun-ae [Wed, 15 Jun 2022 13:13:49 +0000 (17:13 +0400)] 
Add `nilike` support in `comparison` (#1846)

2 weeks ago  Issue #1876 - Explicitly declare the used features for each dependency (#1877)
Martin Grigorov [Wed, 15 Jun 2022 09:06:16 +0000 (12:06 +0300)] 
Issue #1876 - Explicitly declare the used features for each dependency (#1877)

* Issue #1876 - Explicitly declare the used features for each dependency

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
* Issue #1876 - Enable "std" and "std_rng" features for rand in dev-dependencies

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
2 weeks ago  Fixes #1874 - Upgrade `regex` dependency to 1.5.6 (#1875)
Martin Grigorov [Tue, 14 Jun 2022 19:37:54 +0000 (22:37 +0300)] 
Fixes #1874 - Upgrade `regex` dependency to 1.5.6 (#1875)

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
2 weeks ago  Fix memory leak in ffi test (#1878)
Liang-Chi Hsieh [Tue, 14 Jun 2022 18:08:28 +0000 (11:08 -0700)] 
Fix memory leak in ffi test (#1878)

* Fix leaks

* Fix leaks under array::ffi::tests

2 weeks ago  Add PyArrow integration test for C Stream Interface (#1848)
Liang-Chi Hsieh [Mon, 13 Jun 2022 22:29:46 +0000 (15:29 -0700)] 
Add PyArrow integration test for C Stream Interface (#1848)

* Add PyArrow integration test for ArrowArrayStream

* Trigger Build

2 weeks ago  Update vendored protobuf (#1869)
Raphael Taylor-Davies [Mon, 13 Jun 2022 21:29:01 +0000 (22:29 +0100)] 
Update vendored protobuf (#1869)

2 weeks ago  Exclude some long-running tests when running under miri (#1863)
Jörn Horstmann [Mon, 13 Jun 2022 21:18:02 +0000 (23:18 +0200)] 
Exclude some long-running tests when running under miri (#1863)

* Exclude some long-running tests when running under miri

* Print exit code of miri

* Disable miri stacked borrows checking because it uses too much memory for github actions

* Exclude another slow test from running under miri

* Add link to miri issue

2 weeks ago  Pin clap to 3.1 (#1867) (#1868)
Raphael Taylor-Davies [Mon, 13 Jun 2022 19:02:20 +0000 (20:02 +0100)] 
Pin clap to 3.1 (#1867) (#1868)

2 weeks ago  Omit validity buffer in PrimitiveArray::from_iter when all values are valid (#1859)
Jörn Horstmann [Mon, 13 Jun 2022 18:32:55 +0000 (20:32 +0200)] 
Omit validity buffer in PrimitiveArray::from_iter when all values are valid (#1859)

* Omit validity buffer in PrimitiveArray::from_iter when all values are valid

* Use bool::then instead of if

Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
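
The idea behind #1859 can be sketched without any arrow types, under the assumption that a validity mask is only worth keeping when at least one value is null. The function name `collect_with_validity` and the use of `Vec<bool>` as the mask are hypothetical simplifications; the real code works on Arrow buffers.

```rust
/// Collect optional values into a value vector plus an optional validity mask.
/// The mask is `None` when every value is valid.
fn collect_with_validity(
    iter: impl IntoIterator<Item = Option<i32>>,
) -> (Vec<i32>, Option<Vec<bool>>) {
    let mut values = Vec::new();
    let mut validity = Vec::new();
    let mut null_count = 0usize;
    for item in iter {
        match item {
            Some(v) => {
                values.push(v);
                validity.push(true);
            }
            None => {
                // Nulls still occupy a slot in the value vector.
                values.push(0);
                validity.push(false);
                null_count += 1;
            }
        }
    }
    // `bool::then` returns `Some(mask)` only when there is at least one null.
    let validity = (null_count > 0).then(|| validity);
    (values, validity)
}
```

Using `bool::then` keeps the "mask or nothing" decision in a single expression instead of an `if`/`else` that builds the `Option` by hand.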
2 weeks ago  Zero copy page decoding from bytes (#1810)
Raphael Taylor-Davies [Mon, 13 Jun 2022 12:36:42 +0000 (13:36 +0100)] 
Zero copy page decoding from bytes (#1810)

2 weeks ago  Add two `from` methods for `FixedSizeBinaryArray` (#1854)
Remzi Yang [Mon, 13 Jun 2022 11:09:51 +0000 (19:09 +0800)] 
Add two `from` methods for `FixedSizeBinaryArray` (#1854)

* add from

Signed-off-by: remzi <13716567376yh@gmail.com>
* fmt

Signed-off-by: remzi <13716567376yh@gmail.com>
* add tests

Signed-off-by: remzi <13716567376yh@gmail.com>
2 weeks ago  Remove simd and avx512 bitwise kernels in favor of autovectorization (#1830)
Jörn Horstmann [Sun, 12 Jun 2022 17:09:02 +0000 (19:09 +0200)] 
Remove simd and avx512 bitwise kernels in favor of autovectorization (#1830)

* Remove simd and avx512 bitwise kernels since they are actually slightly slower than the autovectorized version

* Add notes about target-cpu to README

3 weeks ago  Refactor parquet::arrow module (#1827)
Raphael Taylor-Davies [Sat, 11 Jun 2022 17:52:05 +0000 (18:52 +0100)] 
Refactor parquet::arrow module (#1827)

* Refactor parquet::arrow module

* Fix doc

* Remove legacy benchmarks

3 weeks ago  speed up `substring_by_char` by about 2.5x (#1832)
Remzi Yang [Sat, 11 Jun 2022 09:18:04 +0000 (17:18 +0800)] 
speed up `substring_by_char` by about 2.5x (#1832)

* speed up substring_by_char

Signed-off-by: remzi <13716567376yh@gmail.com>
* better estimate the length of value buffer

Signed-off-by: remzi <13716567376yh@gmail.com>
3 weeks ago  Add `quarter` support in `temporal` (#1836)
Alex Qyoun-ae [Sat, 11 Jun 2022 07:14:50 +0000 (11:14 +0400)] 
Add `quarter` support in `temporal` (#1836)

3 weeks ago  MINOR: Remove version check from `test_command_help` (#1844)
Liang-Chi Hsieh [Sat, 11 Jun 2022 05:35:24 +0000 (22:35 -0700)] 
MINOR: Remove version check from `test_command_help` (#1844)

* Remove version

* Update parquet/src/bin/parquet-fromcsv.rs

3 weeks ago  Update versions and CHANGELOG for `16.0.0` (#1826)  16.0.0
Andrew Lamb [Fri, 10 Jun 2022 17:27:03 +0000 (13:27 -0400)] 
Update versions and CHANGELOG for `16.0.0` (#1826)

* Update versions to 16.0.0

* Update changelog

* Updates

* updates

* Update for latest

* polish

3 weeks ago  Update module docs (#1840)
Raphael Taylor-Davies [Fri, 10 Jun 2022 17:08:21 +0000 (18:08 +0100)] 
Update module docs (#1840)

3 weeks ago  Make equals_datatype method public, enabling other modules (#1838)
Remco Verhoef [Fri, 10 Jun 2022 16:58:29 +0000 (18:58 +0200)] 
Make equals_datatype method public, enabling other modules (#1838)

* we'd like to use this function in datafusion

3 weeks ago  Update safety disclaimer (#1837)
Raphael Taylor-Davies [Fri, 10 Jun 2022 10:54:40 +0000 (11:54 +0100)] 
Update safety disclaimer (#1837)

3 weeks ago  add parquet-fromcsv (#1) (#1798)
kazuhiko kikuchi [Fri, 10 Jun 2022 07:50:55 +0000 (16:50 +0900)] 
add parquet-fromcsv (#1) (#1798)

* add parquet-fromcsv (#1)

add a command-line tool for converting CSV to Parquet.

* add `text` for non-rust documentation text

* Update parquet/src/bin/parquet-fromcsv.rs

Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
* Update parquet/src/bin/parquet-fromcsv.rs

Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
* Update parquet/src/bin/parquet-fromcsv.rs

Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
* Update parquet/src/bin/parquet-fromcsv.rs

Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
* automate update help text

* remove anyhow

* add rat_exclude_files

* update test_command_help

* fix clippy warnings

* add writer-version, max-row-group-size arg

* fix cargo fmt lint

Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
3 weeks ago  fix annotation (#1831)
Yang Jiang [Fri, 10 Jun 2022 06:51:12 +0000 (14:51 +0800)] 
fix annotation (#1831)

3 weeks ago  Change to use `resolver v2`, test more feature flag combinations in CI, fix errors...
Raphael Taylor-Davies [Thu, 9 Jun 2022 20:53:25 +0000 (21:53 +0100)] 
Change to use `resolver v2`, test more feature flag combinations in CI, fix errors (#1630) (#1822)

* Test more feature flag combinations in CI (#1630)

* Clippy lints

* Fix clippy fix

* Fix running examples from workspace root

* Format

* Fix arrow benchmark features

* Split up CI yaml

* Add docs

* Rework caching

* Use lockfile for cache key

Don't install unused components

3 weeks ago  Update MIRI pin (#1828)
Raphael Taylor-Davies [Thu, 9 Jun 2022 19:58:27 +0000 (20:58 +0100)] 
Update MIRI pin (#1828)

* Update MIRI

* Unpin MIRI

3 weeks ago  Fix list equal for empty offset list array (#1818)
Liang-Chi Hsieh [Thu, 9 Jun 2022 16:17:24 +0000 (09:17 -0700)] 
Fix list equal for empty offset list array (#1818)

* Fix list equal for empty offset list array

* For review

3 weeks ago  Add ScalarBuffer abstraction (#1811) (#1820)
Raphael Taylor-Davies [Thu, 9 Jun 2022 07:47:41 +0000 (08:47 +0100)] 
Add ScalarBuffer abstraction (#1811) (#1820)

* Add ScalarBuffer abstraction (#1811)

* Lint fixes

3 weeks ago  Seal ArrowNativeType and OffsetSizeTrait (#1028) (#1819)
Raphael Taylor-Davies [Wed, 8 Jun 2022 16:11:00 +0000 (17:11 +0100)] 
Seal ArrowNativeType and OffsetSizeTrait (#1028) (#1819)

* Seal ArrowNativeType (#1028)

* Add docs

3 weeks ago  Don't overwrite existing data on snappy decompress (#1806) (#1807)
Raphael Taylor-Davies [Wed, 8 Jun 2022 07:27:34 +0000 (08:27 +0100)] 
Don't overwrite existing data on snappy decompress (#1806) (#1807)

* Don't trample existing data on snappy decompress (#1806)

* Review feedback

3 weeks ago  Fix Decimal and List ArrayData Validation (#1813) (#1814) (#1816)
Raphael Taylor-Davies [Tue, 7 Jun 2022 22:06:44 +0000 (23:06 +0100)] 
Fix Decimal and List ArrayData Validation (#1813) (#1814) (#1816)

* Fix DecimalArray validation (#1813)

Fix offset validation for sliced children of list arrays (#1814)

* Update arrow/src/array/data.rs

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
3 weeks ago  Add public API for decoding parquet footer (#1804)
Raphael Taylor-Davies [Tue, 7 Jun 2022 21:25:02 +0000 (22:25 +0100)] 
Add public API for decoding parquet footer (#1804)

* Add public API for decoding parquet footer

* Review feedback

3 weeks ago  Add AsyncFileReader trait (#1803)
Raphael Taylor-Davies [Tue, 7 Jun 2022 10:19:22 +0000 (11:19 +0100)] 
Add AsyncFileReader trait (#1803)

* Add AsyncChunkReader trait

* Review feedback

* Rename to AsyncFileReader

3 weeks ago  rename to substring_kernels.rs (#1805)
Remzi Yang [Tue, 7 Jun 2022 08:07:02 +0000 (16:07 +0800)] 
rename to substring_kernels.rs (#1805)

Signed-off-by: remzi <13716567376yh@gmail.com>
3 weeks ago  Use IPC row count info in IPC reader (#1796)
Liang-Chi Hsieh [Mon, 6 Jun 2022 16:59:53 +0000 (09:59 -0700)] 
Use IPC row count info in IPC reader (#1796)

* Use IPC row count info

* Add test

* Update arrow/src/ipc/reader.rs

Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
3 weeks ago  Write validity buffer for UnionArray in V4 IPC message (#1794)
Liang-Chi Hsieh [Mon, 6 Jun 2022 16:03:14 +0000 (09:03 -0700)] 
Write validity buffer for UnionArray in V4 IPC message (#1794)

* Write validity buffer for Union Array in V4 IPC message

* Add test

* Fix clippy

* Fix clippy

3 weeks ago  feat: Add function for row alignment with page mask (#1791)
Yang Jiang [Mon, 6 Jun 2022 14:33:25 +0000 (22:33 +0800)] 
feat: Add function for row alignment with page mask (#1791)

* add range and rowRanges

* add some tests for range

* add filter logic

* fix fmt

* fix todo

* fix test

* Apply suggestions from code review

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
* fix compute_row_ranges

* fix annotation

* change to use std::ops::RangeInclusive

* Apply suggestions from code review

Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
* fix

* Update parquet/src/file/page_index/range.rs

Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
3 weeks ago  Fix typos in the Memory and Buffers section of the docs home (#1795)
Marc Garcia [Mon, 6 Jun 2022 13:33:31 +0000 (14:33 +0100)] 
Fix typos in the Memory and Buffers section of the docs home (#1795)

3 weeks ago  Add `substring_by_char` (#1784)
Remzi Yang [Mon, 6 Jun 2022 13:28:14 +0000 (21:28 +0800)] 
Add `substring_by_char` (#1784)

* add substring by char

Signed-off-by: remzi <13716567376yh@gmail.com>
* update docs

Signed-off-by: remzi <13716567376yh@gmail.com>
* update docs

Signed-off-by: remzi <13716567376yh@gmail.com>
* add benchmark

Signed-off-by: remzi <13716567376yh@gmail.com>
* Update arrow/src/compute/kernels/substring.rs

Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
* Update arrow/src/compute/kernels/substring.rs

Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
* Update arrow/src/compute/kernels/substring.rs

Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
* Update arrow/src/compute/kernels/substring.rs

Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
3 weeks ago  Access metadata of flushed row groups on write (#1691) (#1774)
Raphael Taylor-Davies [Mon, 6 Jun 2022 12:57:55 +0000 (13:57 +0100)] 
Access metadata of flushed row groups on write (#1691) (#1774)

* Access metadata of flushed row groups on write (#1691)

* Add tests

3 weeks ago  Arbitrary size combine option bitmap (#1781)
Ismail-Maj [Mon, 6 Jun 2022 12:52:02 +0000 (14:52 +0200)] 
Arbitrary size combine option bitmap (#1781)

* arbitrary size combine_option_bitmap and tests

* more tests and error

* format

* more tests

* clone and reduce

* nit

3 weeks ago  Read and skip validity buffer of UnionType Array for V4 ipc message (#1789)
Liang-Chi Hsieh [Sun, 5 Jun 2022 09:00:44 +0000 (02:00 -0700)] 
Read and skip validity buffer of UnionType Array for V4 ipc message (#1789)

* Read validity buffer for V4 ipc message

* Add unit test

* Fix clippy

4 weeks ago  Add `ParquetFileArrowReader::try_new` (#1782)
Raphael Taylor-Davies [Fri, 3 Jun 2022 23:32:54 +0000 (00:32 +0100)] 
Add `ParquetFileArrowReader::try_new` (#1782)

* Add ParquetFileArrowReader::try_new

* Review feedback

4 weeks ago  Implement ChunkReader for Bytes (#1775)
Raphael Taylor-Davies [Fri, 3 Jun 2022 14:37:22 +0000 (15:37 +0100)] 
Implement ChunkReader for Bytes (#1775)

Deprecate SliceableCursor