Remzi Yang [Fri, 1 Jul 2022 18:40:43 +0000 (02:40 +0800)]
Move DecimalArray to array_decimal.rs (#1986)
Signed-off-by: remzi <13716567376yh@gmail.com>
Raphael Taylor-Davies [Fri, 1 Jul 2022 05:25:36 +0000 (06:25 +0100)]
Remove PrimitiveBuilder::finish_dict (#1978) (#1980)
Liang-Chi Hsieh [Fri, 1 Jul 2022 03:40:13 +0000 (20:40 -0700)]
Fix clippy (#1984)
Remzi Yang [Thu, 30 Jun 2022 23:26:59 +0000 (07:26 +0800)]
Declare the value_length of decimal array as a `const` (#1968)
* declare the value_length of decimal array as a const
Signed-off-by: remzi <13716567376yh@gmail.com>
* simplify the value function
Signed-off-by: remzi <13716567376yh@gmail.com>
Liang-Chi Hsieh [Thu, 30 Jun 2022 16:05:40 +0000 (09:05 -0700)]
Support dictionary array for subtract and multiply kernel (#1971)
* Support dictionary array for subtract kernel
* Support dictionary array in multiply kernel
Kun Liu [Thu, 30 Jun 2022 09:51:30 +0000 (17:51 +0800)]
add column index and offset index (#1935)
Liang-Chi Hsieh [Wed, 29 Jun 2022 19:54:42 +0000 (12:54 -0700)]
Calculate n_buffers in FFI_ArrowArray by data layout (#1960)
* Fix n_buffers
* Add test
* Add code comment
* Don't put null pointer if no null buffer by spec
* Trigger Build
Ismail-Maj [Wed, 29 Jun 2022 19:14:37 +0000 (21:14 +0200)]
Arbitrary size concat elements utf8 (#1787)
* arbitrary size combine_option_bitmap and tests
* more tests and error
* format
* more tests
* clone and reduce
* arbitrary size concat_elements_utf8
* nit
* tests
* Update arrow/src/compute/kernels/concat_elements.rs
* support one element input
* split implementations
* fmt
Raphael Taylor-Davies [Wed, 29 Jun 2022 18:12:40 +0000 (19:12 +0100)]
Faster StringDictionaryBuilder (#1851) (#1861)
Remzi Yang [Wed, 29 Jun 2022 17:27:22 +0000 (01:27 +0800)]
Fix the behavior of `from_fixed_size_list` when offset > 0 (#1964)
* fix the behaviour of from_fixed_size_list when offset > 0
Signed-off-by: remzi <13716567376yh@gmail.com>
* Update arrow/src/array/array_binary.rs
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
Raphael Taylor-Davies [Wed, 29 Jun 2022 09:29:22 +0000 (10:29 +0100)]
Set is_adjusted_to_utc if any timezone set (#1932) (#1953)
* Set is_adjusted_to_utc if any timezone set (#1932)
* Fix roundtrip
Raphael Taylor-Davies [Wed, 29 Jun 2022 09:28:53 +0000 (10:28 +0100)]
Use InMemoryColumnChunkReader (#1956)
Remzi Yang [Wed, 29 Jun 2022 04:21:05 +0000 (12:21 +0800)]
fix the doc of value_length (#1957)
Signed-off-by: remzi <13716567376yh@gmail.com>
Liang-Chi Hsieh [Tue, 28 Jun 2022 16:37:54 +0000 (09:37 -0700)]
Add add_dyn for DictionaryArray support (#1951)
* Add add_dyn for DictionaryArray support
* Print lengths in error message
* Add null values to test cases
Raphael Taylor-Davies [Tue, 28 Jun 2022 10:03:37 +0000 (11:03 +0100)]
Unpin clap (#1867) (#1954)
Kun Liu [Tue, 28 Jun 2022 06:34:53 +0000 (14:34 +0800)]
fix bug: write column metadata after the column chunk data (#1947)
Kun Liu [Tue, 28 Jun 2022 03:56:47 +0000 (11:56 +0800)]
remove casting other type to null type (#1942)
Jörn Horstmann [Sun, 26 Jun 2022 21:18:05 +0000 (23:18 +0200)]
Require Send+Sync bounds for Allocation trait (#1945)
Andrew Lamb [Fri, 24 Jun 2022 20:27:54 +0000 (16:27 -0400)]
Update version and changelog for version 17.0.0 (#1926)
* Update version to 17.0.0
* Initial changelog
* Updates
* update changelog
* touchups
* Fix old changelog
* Fix heading
Remzi Yang [Fri, 24 Jun 2022 19:00:26 +0000 (03:00 +0800)]
add readme (#1940)
Signed-off-by: remzi <13716567376yh@gmail.com>
Raphael Taylor-Davies [Fri, 24 Jun 2022 12:22:39 +0000 (13:22 +0100)]
Set adjusted to UTC if UTC timezone (#1932) (#1937)
Kun Liu [Fri, 24 Jun 2022 07:00:16 +0000 (15:00 +0800)]
Support casting `NULL` to/from `Decimal` (#1922)
* support NULL type values to decimal
* support NULL type values to decimal
* Update arrow/src/compute/kernels/cast.rs
* Update arrow/src/compute/kernels/cast.rs
Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Raphael Taylor-Davies [Thu, 23 Jun 2022 20:51:30 +0000 (21:51 +0100)]
Split up parquet::arrow::array_reader (#1483) (#1933)
* Split up parquet::arrow::array_reader
* RAT
Raphael Taylor-Davies [Thu, 23 Jun 2022 20:03:13 +0000 (21:03 +0100)]
Update indexmap dependency (#1929)
Liang-Chi Hsieh [Thu, 23 Jun 2022 19:29:48 +0000 (12:29 -0700)]
Add Decimal256 API (#1914)
* Add Decimal256
* Dedup
* Truncate string representation by precision
* Update arrow/src/util/decimal.rs
Co-authored-by: Remzi Yang <59198230+HaoYang670@users.noreply.github.com>
* Update arrow/src/util/decimal.rs
Co-authored-by: Remzi Yang <59198230+HaoYang670@users.noreply.github.com>
* Update arrow/src/util/decimal.rs
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* For review
* Fix clippy
* For review
* Move another one
Co-authored-by: Remzi Yang <59198230+HaoYang670@users.noreply.github.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Raphael Taylor-Davies [Thu, 23 Jun 2022 19:03:24 +0000 (20:03 +0100)]
Add ArrowWriter doctest (#1927) (#1930)
Raphael Taylor-Davies [Thu, 23 Jun 2022 18:38:56 +0000 (19:38 +0100)]
Complete and fixup split of `arrow::array::builder` module (#1843) (#1928)
* Fix merge conflicts from (#1879)
* Split out of decimal_builder (#1843)
* Fix RAT
* Format
* Restore (#1842)
Remzi Yang [Thu, 23 Jun 2022 12:47:08 +0000 (20:47 +0800)]
replace checked_add(sub).unwrap() with +(-) (#1924)
Signed-off-by: remzi <13716567376yh@gmail.com>
dependabot[bot] [Wed, 22 Jun 2022 13:15:18 +0000 (09:15 -0400)]
Update half requirement from 1.8 to 2.0 (#1919)
Updates the requirements on [half](https://github.com/starkat99/half-rs) to permit the latest version.
- [Release notes](https://github.com/starkat99/half-rs/releases)
- [Changelog](https://github.com/starkat99/half-rs/blob/main/CHANGELOG.md)
- [Commits](https://github.com/starkat99/half-rs/compare/v1.8.0...v2.0.0)
---
updated-dependencies:
- dependency-name: half
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Andrew Lamb [Tue, 21 Jun 2022 15:04:39 +0000 (11:04 -0400)]
minor: add a diagram to docstring for DictionaryArray (#1909)
* minor: add a diagram to docstring for DictionaryArray
* minor: clarify docstring on `DictionaryArray::lookup_key`
* Apply suggestions from code review
Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Jörn Horstmann <git@jhorstmann.net>
* make values smaller and keys larger
Co-authored-by: Wakahisa <nevilledips@gmail.com>
Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Jörn Horstmann <git@jhorstmann.net>
Andrew Lamb [Tue, 21 Jun 2022 11:23:17 +0000 (07:23 -0400)]
minor: clarify docstring on `DictionaryArray::lookup_key` (#1910)
* minor: clarify docstring on `DictionaryArray::lookup_key`
* Update arrow/src/array/array_dictionary.rs
Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Liang-Chi Hsieh [Tue, 21 Jun 2022 04:57:34 +0000 (21:57 -0700)]
Fix max and min decimal (#1917)
Andrew Lamb [Mon, 20 Jun 2022 06:10:16 +0000 (02:10 -0400)]
Add `DictionaryArray::key` function (#1912)
Martin Grigorov [Mon, 20 Jun 2022 06:02:44 +0000 (09:02 +0300)]
Issue #1876: Explicitly declare the used features for each dependency in parquet (#1895)
Declare that parquet module uses rand's std and std_rng features
Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
Martin Grigorov [Mon, 20 Jun 2022 06:02:11 +0000 (09:02 +0300)]
Issue #1876: Explicitly declare the used features for each dependency in parquet_derive_test (#1897)
Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
Ben Kimock [Sun, 19 Jun 2022 08:08:56 +0000 (04:08 -0400)]
Fix misaligned reference and logic error in crc32 (#1906)
Previously, this code tried to turn a &[u8] into a &[u32] without
checking alignment. This means it could and did create misaligned
references, which is UB. This can be detected by running the tests with
-Zbuild-std --target=x86_64-unknown-linux-gnu (or whatever your host
is). This change adopts the approach from the murmurhash implementation.
The previous implementation also ignored the tail bytes. The loop at the
end treats num_bytes as if it is the full length of the slice, but it
isn't; num_bytes is the number of bytes after the last 4-byte group. This can
be observed for example by changing "hello" to just "hell" in the tests.
Under the old implementation, the test will still pass. Now, the value
that comes out changes, and "hello" and "hell" hash to different values.
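The two points in the message above (no misaligned `&[u32]`, and no ignored tail bytes) can be sketched as follows. This is a minimal illustration, not the code from this commit: `read_words` and `toy_checksum` are hypothetical names, and the mixing step is a toy checksum rather than crc32.

```rust
/// Hypothetical helper: decodes `data` as little-endian u32 words by copying
/// 4 bytes at a time, so no `&[u32]` is ever created and alignment of the
/// input slice does not matter. Also returns the 0..=3 leftover tail bytes.
fn read_words(data: &[u8]) -> (Vec<u32>, &[u8]) {
    let chunks = data.chunks_exact(4);
    // `remainder` borrows from `data`, so it stays valid after `chunks` is consumed.
    let tail = chunks.remainder();
    let words = chunks
        .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect();
    (words, tail)
}

/// Toy checksum (NOT crc32): mixes every full word, then every tail byte,
/// so inputs that differ only in the tail produce different results.
fn toy_checksum(data: &[u8]) -> u32 {
    let (words, tail) = read_words(data);
    let mut acc = 0u32;
    for w in words {
        acc = acc.rotate_left(5) ^ w;
    }
    for &b in tail {
        acc = acc.rotate_left(5) ^ u32::from(b);
    }
    acc
}
```

With this sketch, `toy_checksum(b"hello")` and `toy_checksum(b"hell")` differ, which mirrors the test change described in the message.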
Martin Grigorov [Sun, 19 Jun 2022 08:01:43 +0000 (11:01 +0300)]
Issue #1876: Explicitly declare the used features for each dependency in parquet_derive (#1896)
Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
Martin Grigorov [Sun, 19 Jun 2022 07:58:31 +0000 (10:58 +0300)]
Issue #1876: Explicitly declare the used features for each dependency in integration_testing (#1898)
Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
Remzi Yang [Sun, 19 Jun 2022 07:50:24 +0000 (15:50 +0800)]
Refine the `bit_util` of Parquet. (#1905)
* refine log2
Signed-off-by: remzi <13716567376yh@gmail.com>
* remove log2
Signed-off-by: remzi <13716567376yh@gmail.com>
* return as u8
Signed-off-by: remzi <13716567376yh@gmail.com>
* refine more functions
Signed-off-by: remzi <13716567376yh@gmail.com>
* update docs
Signed-off-by: remzi <13716567376yh@gmail.com>
* rm auto file
Signed-off-by: remzi <13716567376yh@gmail.com>
Andy Grove [Sat, 18 Jun 2022 14:43:12 +0000 (08:43 -0600)]
Add validation to `RecordBatch` for non-nullable fields containing null values (#1890)
Jörn Horstmann [Sat, 18 Jun 2022 05:55:15 +0000 (07:55 +0200)]
Use bit_slice in combine_option_bitmap (#1900)
Liang-Chi Hsieh [Sat, 18 Jun 2022 05:49:48 +0000 (22:49 -0700)]
Correct nullable in read_dictionary (#1893)
* Correct nullable
* For review
Dalton Modlin [Fri, 17 Jun 2022 19:51:56 +0000 (14:51 -0500)]
Split up arrow::array::builder module (#1843) (#1879)
* Split up arrow::array::builder module (#1843)
* Split up arrow::array::builder module (#1843)
- Removed old builder.rs
- Added missing licensing header to builder submodules
- Updated builder submodule imports and exports
- Updated array mod file builder imports and exports
Martin Grigorov [Fri, 17 Jun 2022 18:25:18 +0000 (21:25 +0300)]
Closes #1902: Print the original and projected RecordBatch in dynamic_types example (#1903)
* Closes #1902: Print the original and projected RecordBatch in dynamic_types example
Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
* Issue #1902: Fix formatting issue
Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
Andrew Lamb [Fri, 17 Jun 2022 16:15:45 +0000 (12:15 -0400)]
Minor: Add examples to docstring for `weekday` (#1894)
Remco Verhoef [Fri, 17 Jun 2022 10:52:16 +0000 (12:52 +0200)]
Feature add weekday temporal kernel (#1891)
* add weekday temporal kernel
* add test
* fmt
* fix test
* use correct weekday numbers
Remzi Yang [Fri, 17 Jun 2022 10:46:42 +0000 (18:46 +0800)]
Clean up the test code of `substring` kernel. (#1853)
* clean up
Signed-off-by: remzi <13716567376yh@gmail.com>
* clean up fixed binary
Signed-off-by: remzi <13716567376yh@gmail.com>
* trigger GitHub actions
* directly panic
Signed-off-by: remzi <13716567376yh@gmail.com>
* add docs for helper macros
Signed-off-by: remzi <13716567376yh@gmail.com>
Raphael Taylor-Davies [Thu, 16 Jun 2022 20:45:42 +0000 (21:45 +0100)]
Implement UnionArray FieldData using Type Erasure (#1842)
* Strongly typed UnionBuilder
* Update arrow/src/array/builder.rs
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Martin Grigorov [Thu, 16 Jun 2022 20:19:55 +0000 (23:19 +0300)]
Issue #1876: Explicitly declare the used features for each dependency in parquet (#1881)
Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
Martin Grigorov [Thu, 16 Jun 2022 20:19:45 +0000 (23:19 +0300)]
Issue #1876: Explicitly declare the used features for each dependency in arrow-flight (#1880)
Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
Andrey Frolov [Thu, 16 Jun 2022 20:18:26 +0000 (23:18 +0300)]
rename function (#1889)
Jörn Horstmann [Thu, 16 Jun 2022 20:07:36 +0000 (22:07 +0200)]
Support specifying list capacities for MutableArrayData (#1885)
Raphael Taylor-Davies [Thu, 16 Jun 2022 20:07:04 +0000 (21:07 +0100)]
Expose `BitSliceIterator` and `BitIndexIterator` (#1864) (#1865)
* Expose BitSliceIterator and BitIndexIterator (#1864)
* Update arrow/src/compute/kernels/filter.rs
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* Format
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Will Jones [Thu, 16 Jun 2022 19:31:27 +0000 (21:31 +0200)]
docs: remove experimental marker on C Stream Interface (#1821)
Jörn Horstmann [Thu, 16 Jun 2022 19:09:36 +0000 (21:09 +0200)]
Do not print exit code from miri, instead it should be the return value of the script (#1873)
Liang-Chi Hsieh [Thu, 16 Jun 2022 06:50:39 +0000 (23:50 -0700)]
Add Decimal128 API and use it in DecimalArray and DecimalBuilder (#1871)
* Add Decimal128
* Fix clippy
* Add code comment
Raphael Taylor-Davies [Wed, 15 Jun 2022 13:18:45 +0000 (14:18 +0100)]
Mark typed buffer APIs safe (#996) (#1027) (#1866)
* Mark typed buffer APIs safe (#996) (#1027)
* Fix parquet
* Format
* Review feedback
Raphael Taylor-Davies [Wed, 15 Jun 2022 13:18:28 +0000 (14:18 +0100)]
Add vec-inspired APIs to BufferBuilder (#1850) (#1860)
Alex Qyoun-ae [Wed, 15 Jun 2022 13:13:49 +0000 (17:13 +0400)]
Add `nilike` support in `comparison` (#1846)
Martin Grigorov [Wed, 15 Jun 2022 09:06:16 +0000 (12:06 +0300)]
Issue #1876 - Explicitly declare the used features for each dependency (#1877)
* Issue #1876 - Explicitly declare the used features for each dependency
Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
* Issue #1876 - Enable "std" and "std_rng" features for rand in dev-dependencies
Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
Martin Grigorov [Tue, 14 Jun 2022 19:37:54 +0000 (22:37 +0300)]
Fixes #1874 - Upgrade `regex` dependency to 1.5.6 (#1875)
Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
Liang-Chi Hsieh [Tue, 14 Jun 2022 18:08:28 +0000 (11:08 -0700)]
Fix memory leak in ffi test (#1878)
* Fix leaks
* Fix leaks under array::ffi::tests
Liang-Chi Hsieh [Mon, 13 Jun 2022 22:29:46 +0000 (15:29 -0700)]
Add PyArrow integration test for C Stream Interface (#1848)
* Add PyArrow integration test for ArrowArrayStream
* Trigger Build
Raphael Taylor-Davies [Mon, 13 Jun 2022 21:29:01 +0000 (22:29 +0100)]
Update vendored protobuf (#1869)
Jörn Horstmann [Mon, 13 Jun 2022 21:18:02 +0000 (23:18 +0200)]
Exclude some long-running tests when running under miri (#1863)
* Exclude some long-running tests when running under miri
* Print exit code of miri
* Disable miri stacked borrows checking because it uses too much memory for github actions
* Exclude another slow test from running under miri
* Add link to miri issue
Raphael Taylor-Davies [Mon, 13 Jun 2022 19:02:20 +0000 (20:02 +0100)]
Pin clap to 3.1 (#1867) (#1868)
Jörn Horstmann [Mon, 13 Jun 2022 18:32:55 +0000 (20:32 +0200)]
Omit validity buffer in PrimitiveArray::from_iter when all values are valid (#1859)
* Omit validity buffer in PrimitiveArray::from_iter when all values are valid
* Use bool::then instead of if
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
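The idea of omitting the validity buffer when every value is valid, using `bool::then`, can be sketched in plain Rust. This is only an illustration under stated assumptions: `validity_mask` is a hypothetical helper and a `Vec<bool>` stands in for the real bitmap; it is not the `PrimitiveArray::from_iter` implementation.

```rust
/// Hypothetical helper, not the arrow-rs API: returns a validity mask only
/// when at least one value is null. `None` means "all values are valid",
/// so no mask needs to be allocated or stored.
fn validity_mask<T>(values: &[Option<T>]) -> Option<Vec<bool>> {
    let null_count = values.iter().filter(|v| v.is_none()).count();
    // `bool::then` runs the closure and wraps its result in `Some` only if
    // the condition is true; otherwise it yields `None`.
    (null_count > 0).then(|| values.iter().map(Option::is_some).collect())
}
```

For `[Some(1), Some(2)]` the helper yields `None`; for `[Some(1), None]` it yields `Some(vec![true, false])`.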
Raphael Taylor-Davies [Mon, 13 Jun 2022 12:36:42 +0000 (13:36 +0100)]
Zero copy page decoding from bytes (#1810)
Remzi Yang [Mon, 13 Jun 2022 11:09:51 +0000 (19:09 +0800)]
Add two `from` methods for `FixedSizeBinaryArray` (#1854)
* add from
Signed-off-by: remzi <13716567376yh@gmail.com>
* fmt
Signed-off-by: remzi <13716567376yh@gmail.com>
* add tests
Signed-off-by: remzi <13716567376yh@gmail.com>
Jörn Horstmann [Sun, 12 Jun 2022 17:09:02 +0000 (19:09 +0200)]
Remove simd and avx512 bitwise kernels in favor of autovectorization (#1830)
* Remove simd and avx512 bitwise kernels since they are actually slightly slower than the autovectorized version
* Add notes about target-cpu to README
Raphael Taylor-Davies [Sat, 11 Jun 2022 17:52:05 +0000 (18:52 +0100)]
Refactor parquet::arrow module (#1827)
* Refactor parquet::arrow module
* Fix doc
* Remove legacy benchmarks
Remzi Yang [Sat, 11 Jun 2022 09:18:04 +0000 (17:18 +0800)]
speed up `substring_by_char` by about 2.5x (#1832)
* speed up substring_by_char
Signed-off-by: remzi <13716567376yh@gmail.com>
* better estimate the length of value buffer
Signed-off-by: remzi <13716567376yh@gmail.com>
Alex Qyoun-ae [Sat, 11 Jun 2022 07:14:50 +0000 (11:14 +0400)]
Add `quarter` support in `temporal` (#1836)
Liang-Chi Hsieh [Sat, 11 Jun 2022 05:35:24 +0000 (22:35 -0700)]
MINOR: Remove version check from `test_command_help` (#1844)
* Remove version
* Update parquet/src/bin/parquet-fromcsv.rs
Andrew Lamb [Fri, 10 Jun 2022 17:27:03 +0000 (13:27 -0400)]
Update versions and CHANGELOG for `16.0.0` (#1826)
* Update versions to 16.0.0
* Update changelog
* Updates
* updates
* Update for latest
* polish
Raphael Taylor-Davies [Fri, 10 Jun 2022 17:08:21 +0000 (18:08 +0100)]
Update module docs (#1840)
Remco Verhoef [Fri, 10 Jun 2022 16:58:29 +0000 (18:58 +0200)]
Make equals_datatype method public, enabling other modules (#1838)
* we'd like to use this function in datafusion
Raphael Taylor-Davies [Fri, 10 Jun 2022 10:54:40 +0000 (11:54 +0100)]
Update safety disclaimer (#1837)
kazuhiko kikuchi [Fri, 10 Jun 2022 07:50:55 +0000 (16:50 +0900)]
add parquet-fromcsv (#1) (#1798)
* add parquet-fromcsv (#1)
add command line tool for converting csv to parquet.
* add `text` for non-rust documentation text
* Update parquet/src/bin/parquet-fromcsv.rs
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
* Update parquet/src/bin/parquet-fromcsv.rs
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
* Update parquet/src/bin/parquet-fromcsv.rs
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
* Update parquet/src/bin/parquet-fromcsv.rs
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
* automate update help text
* remove anyhow
* add rat_exclude_files
* update test_command_help
* fix clippy warnings
* add writer-version, max-row-group-size arg
* fix cargo fmt lint
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
Yang Jiang [Fri, 10 Jun 2022 06:51:12 +0000 (14:51 +0800)]
fix annotation (#1831)
Raphael Taylor-Davies [Thu, 9 Jun 2022 20:53:25 +0000 (21:53 +0100)]
Change to use `resolver v2`, test more feature flag combinations in CI, fix errors (#1630) (#1822)
* Test more feature flag combinations in CI (#1630)
* Clippy lints
* Fix clippy fix
* Fix running examples from workspace root
* Format
* Fix arrow benchmark features
* Split up CI yaml
* Add docs
* Rework caching
* Use lockfile for cache key
Don't install unused components
Raphael Taylor-Davies [Thu, 9 Jun 2022 19:58:27 +0000 (20:58 +0100)]
Update MIRI pin (#1828)
* Update MIRI
* Unpin MIRI
Liang-Chi Hsieh [Thu, 9 Jun 2022 16:17:24 +0000 (09:17 -0700)]
Fix list equal for empty offset list array (#1818)
* Fix list equal for empty offset list array
* For review
Raphael Taylor-Davies [Thu, 9 Jun 2022 07:47:41 +0000 (08:47 +0100)]
Add ScalarBuffer abstraction (#1811) (#1820)
* Add ScalarBuffer abstraction (#1811)
* Lint fixes
Raphael Taylor-Davies [Wed, 8 Jun 2022 16:11:00 +0000 (17:11 +0100)]
Seal ArrowNativeType and OffsetSizeTrait (#1028) (#1819)
* Seal ArrowNativeType (#1028)
* Add docs
Raphael Taylor-Davies [Wed, 8 Jun 2022 07:27:34 +0000 (08:27 +0100)]
Don't overwrite existing data on snappy decompress (#1806) (#1807)
* Don't trample existing data on snappy decompress (#1806)
* Review feedback
Raphael Taylor-Davies [Tue, 7 Jun 2022 22:06:44 +0000 (23:06 +0100)]
Fix Decimal and List ArrayData Validation (#1813) (#1814) (#1816)
* Fix DecimalArray validation (#1813)
Fix offset validation for sliced children of list arrays (#1814)
* Update arrow/src/array/data.rs
Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Raphael Taylor-Davies [Tue, 7 Jun 2022 21:25:02 +0000 (22:25 +0100)]
Add public API for decoding parquet footer (#1804)
* Add public API for decoding parquet footer
* Review feedback
Raphael Taylor-Davies [Tue, 7 Jun 2022 10:19:22 +0000 (11:19 +0100)]
Add AsyncFileReader trait (#1803)
* Add AsyncChunkReader trait
* Review feedback
* Rename to AsyncFileReader
Remzi Yang [Tue, 7 Jun 2022 08:07:02 +0000 (16:07 +0800)]
rename to substring_kernels.rs (#1805)
Signed-off-by: remzi <13716567376yh@gmail.com>
Liang-Chi Hsieh [Mon, 6 Jun 2022 16:59:53 +0000 (09:59 -0700)]
Use IPC row count info in IPC reader (#1796)
* Use IPC row count info
* Add test
* Update arrow/src/ipc/reader.rs
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
Liang-Chi Hsieh [Mon, 6 Jun 2022 16:03:14 +0000 (09:03 -0700)]
Write validity buffer for UnionArray in V4 IPC message (#1794)
* Write validity buffer for Union Array in V4 IPC message
* Add test
* Fix clippy
* Fix clippy
Yang Jiang [Mon, 6 Jun 2022 14:33:25 +0000 (22:33 +0800)]
feat: Add function for row alignment with page mask (#1791)
* add range and rowRanges
* add some tests for range
* add filter logic
* fix fmt
* fix todo
* fix test
* Apply suggestions from code review
Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
* fix compute_row_ranges
* fix annotation
* change to use std::ops::RangeInclusive
* Apply suggestions from code review
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
* fix
* Update parquet/src/file/page_index/range.rs
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
Marc Garcia [Mon, 6 Jun 2022 13:33:31 +0000 (14:33 +0100)]
Fix typos in the Memory and Buffers section of the docs home (#1795)
Remzi Yang [Mon, 6 Jun 2022 13:28:14 +0000 (21:28 +0800)]
Add `substring_by_char` (#1784)
* add substring by char
Signed-off-by: remzi <13716567376yh@gmail.com>
* update docs
Signed-off-by: remzi <13716567376yh@gmail.com>
* update docs
Signed-off-by: remzi <13716567376yh@gmail.com>
* add benchmark
Signed-off-by: remzi <13716567376yh@gmail.com>
* Update arrow/src/compute/kernels/substring.rs
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
* Update arrow/src/compute/kernels/substring.rs
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
* Update arrow/src/compute/kernels/substring.rs
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
* Update arrow/src/compute/kernels/substring.rs
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
Raphael Taylor-Davies [Mon, 6 Jun 2022 12:57:55 +0000 (13:57 +0100)]
Access metadata of flushed row groups on write (#1691) (#1774)
* Access metadata of flushed row groups on write (#1691)
* Add tests
Ismail-Maj [Mon, 6 Jun 2022 12:52:02 +0000 (14:52 +0200)]
Arbitrary size combine option bitmap (#1781)
* arbitrary size combine_option_bitmap and tests
* more tests and error
* format
* more tests
* clone and reduce
* nit
Liang-Chi Hsieh [Sun, 5 Jun 2022 09:00:44 +0000 (02:00 -0700)]
Read and skip validity buffer of UnionType Array for V4 ipc message (#1789)
* Read validity buffer for V4 ipc message
* Add unit test
* Fix clippy
Raphael Taylor-Davies [Fri, 3 Jun 2022 23:32:54 +0000 (00:32 +0100)]
Add `ParquetFileArrowReader::try_new` (#1782)
* Add ParquetFileArrowReader::try_new
* Review feedback
Raphael Taylor-Davies [Fri, 3 Jun 2022 14:37:22 +0000 (15:37 +0100)]
Implement ChunkReader for Bytes (#1775)
Deprecate SliceableCursor