iceberg.git
7 months agoAdd version.txt for release 0.12.1 apache-iceberg-0.12.1
Ryan Blue [Mon, 1 Nov 2021 21:27:07 +0000 (14:27 -0700)] 
Add version.txt for release 0.12.1

7 months agoSpark 3.2: Remove extra parens to fix checkstyle (#3386) 0.12.x
Kyle Bendickson [Sun, 31 Oct 2021 18:56:56 +0000 (11:56 -0700)] 
Spark 3.2: Remove extra parens to fix checkstyle (#3386)

7 months agoORC: Fix importing ORC files with float and double columns and test (#3332)
Kyle Bendickson [Fri, 29 Oct 2021 19:02:24 +0000 (12:02 -0700)] 
ORC: Fix importing ORC files with float and double columns and test (#3332)

7 months agoSpark: Fix ClassCastException when using bucket UDF (#3368)
Chen Zhang [Tue, 26 Oct 2021 23:46:09 +0000 (07:46 +0800)] 
Spark: Fix ClassCastException when using bucket UDF (#3368)

7 months agoHive: Fix Catalogs.hiveCatalog method for default catalogs (#3338)
pvary [Tue, 26 Oct 2021 22:21:43 +0000 (00:21 +0200)] 
Hive: Fix Catalogs.hiveCatalog method for default catalogs (#3338)

7 months agoBuild: Fix ErrorProne NewHashMapInt warnings (#3260)
Kyle Bendickson [Sun, 10 Oct 2021 17:53:21 +0000 (10:53 -0700)] 
Build: Fix ErrorProne NewHashMapInt warnings (#3260)

This should improve performance by allocating the correct size directly rather than reallocating later.

7 months agoAvro: Fix file import with correct row count (#3273)
Szehon Ho [Wed, 20 Oct 2021 15:09:10 +0000 (08:09 -0700)] 
Avro: Fix file import with correct row count (#3273)

7 months agoCore: Validate concurrently added delete files in OverwriteFiles (#3199)
Anton Okolnychyi [Fri, 1 Oct 2021 18:48:43 +0000 (11:48 -0700)] 
Core: Validate concurrently added delete files in OverwriteFiles (#3199)

7 months agoCore: Validate concurrently added delete files in RowDelta (#3195)
Anton Okolnychyi [Tue, 28 Sep 2021 19:34:39 +0000 (12:34 -0700)] 
Core: Validate concurrently added delete files in RowDelta (#3195)

7 months agoHive: Ensure tableLevelMutex is unlocked when uncommitted metadata delete fails ...
jshmchenxi [Mon, 11 Oct 2021 10:27:50 +0000 (18:27 +0800)] 
Hive: Ensure tableLevelMutex is unlocked when uncommitted metadata delete fails (#3264)

7 months agoCore: Optimize check for referenced data files in BaseRowDelta (#3071)
Anton Okolnychyi [Mon, 13 Sep 2021 21:25:49 +0000 (11:25 -1000)] 
Core: Optimize check for referenced data files in BaseRowDelta (#3071)

This change optimizes our check for referenced data files in BaseRowDelta by pushing down the conflict detection filter. Previously, we opened manifests even when they belonged to partitions we were not interested in.
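
To illustrate why pushing the filter down helps, here is a minimal Python sketch. It is not Iceberg's Java implementation: `Manifest`, `open_manifest`, and the integer partition bounds are hypothetical stand-ins. The point it shows is that a manifest whose partition range cannot overlap the range of interest is never opened.

```python
from dataclasses import dataclass, field


@dataclass
class Manifest:
    """Hypothetical manifest: partition bounds plus the data files it lists."""
    name: str
    min_partition: int
    max_partition: int
    files: list = field(default_factory=list)


opened = []  # records which manifests were actually read


def open_manifest(manifest):
    """Stand-in for the expensive step of reading a manifest file."""
    opened.append(manifest.name)
    return manifest.files


def referenced_files(manifests, lo, hi):
    """Collect data files, skipping manifests that cannot overlap [lo, hi]."""
    result = []
    for m in manifests:
        if m.max_partition < lo or m.min_partition > hi:
            continue  # filter pushed down: this manifest is never opened
        result.extend(open_manifest(m))
    return result


manifests = [
    Manifest("m1", 0, 9, ["a.parquet"]),
    Manifest("m2", 10, 19, ["b.parquet"]),
    Manifest("m3", 20, 29, ["c.parquet"]),
]
files = referenced_files(manifests, 12, 15)
```

In this sketch only `m2` is opened. Without the range check, all three manifests would be read and their entries filtered afterwards.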

7 months agoParquet: Fix map projection after map to key_value rename (#3309)
Ryan Blue [Tue, 19 Oct 2021 15:05:48 +0000 (08:05 -0700)] 
Parquet: Fix map projection after map to key_value rename (#3309)

7 months agoHotfix: Fix Flink test imports. (#3319)
Ryan Blue [Tue, 19 Oct 2021 18:34:11 +0000 (11:34 -0700)] 
Hotfix: Fix Flink test imports. (#3319)

7 months agoFlink: Fix CDC validation errors (#3258)
Ryan Blue [Tue, 19 Oct 2021 11:56:26 +0000 (04:56 -0700)] 
Flink: Fix CDC validation errors (#3258)

7 months agoHive: Fix NoSuchMethodError of OrcTail with Hive3.x and Vectorized ORC (#3155)
Omar Al-Safi [Tue, 12 Oct 2021 09:10:41 +0000 (11:10 +0200)] 
Hive: Fix NoSuchMethodError of OrcTail with Hive3.x and Vectorized ORC (#3155)

7 months agoAWS: Add check to create staging directory if not exists for S3OutputStream (#3175)
Rajarshi Sarkar [Tue, 28 Sep 2021 19:46:24 +0000 (01:16 +0530)] 
AWS: Add check to create staging directory if not exists for S3OutputStream (#3175)

7 months agoData: Fix equality deletes with date/time types (#3135)
xloya [Tue, 28 Sep 2021 18:05:01 +0000 (02:05 +0800)] 
Data: Fix equality deletes with date/time types (#3135)

Co-authored-by: xiaojiebao <xiaojiebao@xiaomi.com>

7 months agoCore: Fix JDBC properties, only keep keys with jdbc. prefix (#3078)
Đặng Minh Dũng [Sun, 26 Sep 2021 22:55:37 +0000 (05:55 +0700)] 
Core: Fix JDBC properties, only keep keys with jdbc. prefix (#3078)

7 months agoAWS: Fix DynamoDbCatalog.dropNamespace attr check (#3035)
Bijan Houle [Mon, 13 Sep 2021 15:38:16 +0000 (09:38 -0600)] 
AWS: Fix DynamoDbCatalog.dropNamespace attr check (#3035)

7 months agoCore: Fix null value check for table properties (#3052)
Aman Rawat [Mon, 30 Aug 2021 17:51:47 +0000 (23:21 +0530)] 
Core: Fix null value check for table properties (#3052)

Co-authored-by: rawataaryan9 <rawataaryan9@github.com>

10 months agoAdd version.txt for release 0.12.0 apache-iceberg-0.12.0
Carl Steinbach [Mon, 9 Aug 2021 23:21:23 +0000 (16:21 -0700)] 
Add version.txt for release 0.12.0

10 months agoCore: Add predicate pushdown for files metadata table (#2926) release-base-0.12.0
Szehon Ho [Mon, 9 Aug 2021 22:58:51 +0000 (15:58 -0700)] 
Core: Add predicate pushdown for files metadata table (#2926)

10 months agoSpark: Fix random test failures in TestDeleteReachableFilesAction (#2951)
Szehon Ho [Mon, 9 Aug 2021 21:55:41 +0000 (14:55 -0700)] 
Spark: Fix random test failures in TestDeleteReachableFilesAction (#2951)

10 months agoAPI: Validate identifier fields in Schema (#2943)
ismail simsek [Mon, 9 Aug 2021 20:58:25 +0000 (22:58 +0200)] 
API: Validate identifier fields in Schema (#2943)

10 months agoSpark: Fix broken RepartitionByExpression in RewriteDelete for 3.1 (#2954)
Wing Yew Poon [Mon, 9 Aug 2021 20:55:05 +0000 (13:55 -0700)] 
Spark: Fix broken RepartitionByExpression in RewriteDelete for 3.1 (#2954)

The constructor of RepartitionByExpression changed between Spark 3.0 and 3.1.
There was an instance of constructing RepartitionByExpression that was missed in the original commit (#2512).

10 months agoAWS: Fix concurrent modification integration test (#2948)
Jack Ye [Mon, 9 Aug 2021 19:58:27 +0000 (12:58 -0700)] 
AWS: Fix concurrent modification integration test (#2948)

10 months agoDoc: update README local build instructions (#2949)
Jack Ye [Fri, 6 Aug 2021 22:43:31 +0000 (15:43 -0700)] 
Doc: update README local build instructions (#2949)

10 months agoDocs: Add Hadoop conf overrides in Spark (#2922)
Kyle Bendickson [Fri, 6 Aug 2021 21:14:26 +0000 (14:14 -0700)] 
Docs: Add Hadoop conf overrides in Spark (#2922)

10 months agoFlink: Add uidPrefix to operator name so that Flink web UI can show names for differe...
Steven Zhen Wu [Fri, 6 Aug 2021 14:53:40 +0000 (07:53 -0700)] 
Flink: Add uidPrefix to operator name so that Flink web UI can show names for different iceberg sinks in a job (#2886)

10 months agoCore: Fix nested schema projection in AllDataFilesTable (#2941)
Russell Spitzer [Thu, 5 Aug 2021 17:52:11 +0000 (12:52 -0500)] 
Core: Fix nested schema projection in AllDataFilesTable (#2941)

10 months agoSpark: Fix nested struct pruning (#2877)
Russell Spitzer [Thu, 5 Aug 2021 12:23:07 +0000 (07:23 -0500)] 
Spark: Fix nested struct pruning (#2877)

* Spark: Support Nested Struct Pruning in DataTasks

Previously, DataTasks returned full schemas for some tables and pruned schemas for others, relying on the underlying framework to do the actual projection. This change makes projection and pruning a core responsibility of the task. It fixes an issue where Spark could push down some nested struct predicates to a metadata table, but we did not recognize this when doing the projection in the framework. StaticDataTasks now support projection at creation time, but only if it does not require pruning fields from within a struct that is an element of a List or Map.
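
The restriction in the last sentence can be sketched in Python. This is an illustration only, not the Java code from the commit: rows are modeled as dicts (structs) and lists, and `project` with its projection format is hypothetical. Pruning inside a top-level struct works, while a projection that would reach into a struct nested in a list is rejected.

```python
def project(row, projection):
    """Keep only the requested fields of a struct-like dict.

    `projection` maps a field name either to True (keep the whole value)
    or to a nested projection dict (prune inside a struct).
    """
    out = {}
    for name, sub in projection.items():
        value = row[name]
        if sub is True:
            out[name] = value
        elif isinstance(value, dict):
            out[name] = project(value, sub)
        else:
            # e.g. a list of structs: pruning inside it is not supported here
            raise ValueError(f"cannot prune inside non-struct field {name!r}")
    return out


row = {
    "id": 1,
    "location": {"lat": 52.5, "lon": 13.4},
    "points": [{"x": 1, "y": 2}],
}

# Prune inside a struct, keep the list of structs whole.
pruned = project(row, {"location": {"lat": True}, "points": True})

# Pruning inside the list elements is rejected.
try:
    project(row, {"points": {"x": True}})
    rejected = False
except ValueError:
    rejected = True
```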

10 months agoAdd list of Github collaborators to asf.yaml (#2909)
Carl Steinbach [Thu, 5 Aug 2021 02:51:33 +0000 (19:51 -0700)] 
Add list of Github collaborators to asf.yaml (#2909)

10 months agoFlink: Add FlinkWriterFactory (#2924)
Anton Okolnychyi [Wed, 4 Aug 2021 14:35:22 +0000 (04:35 -1000)] 
Flink: Add FlinkWriterFactory (#2924)

10 months agoAdd thenewstack.io article to list of blog posts (#2930)
Carl Steinbach [Wed, 4 Aug 2021 01:21:53 +0000 (18:21 -0700)] 
Add thenewstack.io article to list of blog posts (#2930)

10 months agoCore: Allow creating v2 tables through table property (#2887)
Jack Ye [Tue, 3 Aug 2021 22:27:58 +0000 (15:27 -0700)] 
Core: Allow creating v2 tables through table property (#2887)

10 months agoCore: Support nulls in StructLike collections (#2929)
Anton Okolnychyi [Tue, 3 Aug 2021 19:47:39 +0000 (09:47 -1000)] 
Core: Support nulls in StructLike collections (#2929)

10 months agoFlink: Switch to using SerializableTable (#2923)
Anton Okolnychyi [Tue, 3 Aug 2021 17:29:44 +0000 (07:29 -1000)] 
Flink: Switch to using SerializableTable (#2923)

10 months agoParquet: Annotate UUID fields (#2913)
Piotr Findeisen [Tue, 3 Aug 2021 16:21:16 +0000 (18:21 +0200)] 
Parquet: Annotate UUID fields (#2913)

The spec mandates that UUID fields in Parquet have logical type "UUID"
(https://iceberg.apache.org/spec/#parquet). This is possible to fulfill
after 236615497bdc2c6fbedbd3acc41a4ed85c4a8bfd, as
`LogicalTypeAnnotation.uuidType` was added in Parquet 1.12.0.

10 months agoCore: Fix partition field IDs in table replacement (#2906)
Ryan Blue [Tue, 3 Aug 2021 14:39:58 +0000 (07:39 -0700)] 
Core: Fix partition field IDs in table replacement (#2906)

Co-authored-by: Jun He <jun-he@users.noreply.github.com>

10 months agoDocs: Update Slack invite link (#2904)
Ryan Blue [Sun, 1 Aug 2021 23:23:03 +0000 (16:23 -0700)] 
Docs: Update Slack invite link (#2904)

* Docs: Update Slack invite link.

* Update the intro paragraph as well.

10 months agoBuild: Fix site publishing in .asf.yaml.
Ryan Blue [Sun, 1 Aug 2021 18:28:03 +0000 (11:28 -0700)] 
Build: Fix site publishing in .asf.yaml.

10 months agoCore: Add WriterFactory (#2873)
Anton Okolnychyi [Sat, 31 Jul 2021 07:18:17 +0000 (21:18 -1000)] 
Core: Add WriterFactory (#2873)

10 months agoApi#2880: Close the underlying iterator in ClosingIterator in hasNext() call (#2881)
Saurabh Agarwal [Fri, 30 Jul 2021 11:48:22 +0000 (17:18 +0530)] 
Api#2880: Close the underlying iterator in ClosingIterator in hasNext() call (#2881)

11 months agoCore: Add includeColumnStats option in FindFiles API (#2875)
Flyangz [Thu, 29 Jul 2021 10:40:05 +0000 (18:40 +0800)] 
Core: Add includeColumnStats option in FindFiles API (#2875)

11 months agoCore: Fix the NPE in DataFiles.Builder#copy (#2852)
openinx [Thu, 29 Jul 2021 07:35:07 +0000 (15:35 +0800)] 
Core: Fix the NPE in DataFiles.Builder#copy (#2852)

11 months ago[python] Updating pyarrow dependencies (#2888)
Ted Gooch [Wed, 28 Jul 2021 23:50:51 +0000 (16:50 -0700)] 
[python] Updating pyarrow dependencies (#2888)

Co-authored-by: tgooch <tgooch@netflix.com>

11 months ago[Python] support BucketByteBuffer and BucketUUID (#2836)
jun-he [Wed, 28 Jul 2021 19:34:22 +0000 (12:34 -0700)] 
[Python] support BucketByteBuffer and BucketUUID (#2836)

* [Python] support BucketByteBuffer and BucketUUID

* Add additional unit tests for bucket hash methods.

11 months agoDocs: Update Slack invite link (#2882)
Eduard Tudenhöfner [Wed, 28 Jul 2021 15:20:39 +0000 (17:20 +0200)] 
Docs: Update Slack invite link (#2882)

11 months agoDocs: Avoid insinuating other file format is supported (#2883)
Piotr Findeisen [Wed, 28 Jul 2021 11:42:29 +0000 (13:42 +0200)] 
Docs: Avoid insinuating other file format is supported (#2883)

11 months agoDocs: Add Tencent blog - Flink + Iceberg: How to Construct a Whole-scenario Real...
Daniel Weeks [Wed, 28 Jul 2021 01:26:25 +0000 (18:26 -0700)] 
Docs: Add Tencent blog - Flink + Iceberg: How to Construct a Whole-scenario Real-time Data Warehouse (#2876)

11 months agoCore: Add validation for row-level deletes with rewrites (#2865)
Ryan Blue [Tue, 27 Jul 2021 15:59:53 +0000 (08:59 -0700)] 
Core: Add validation for row-level deletes with rewrites (#2865)

11 months agoBuild: Run tests against Spark 3.0 and Spark 3.1
Russell Spitzer [Tue, 27 Jul 2021 03:11:46 +0000 (22:11 -0500)] 
Build: Run tests against Spark 3.0 and Spark 3.1

Due to an issue in build.gradle, the Spark test and test31 modules were both running with Spark 3.1. Removing the interdependency fixes the issue, and both Gradle tasks now run with their respective Spark versions.

11 months agoSpec: Make contains_nan partition summary field optional in v2. (#2864)
Ryan Blue [Tue, 27 Jul 2021 00:11:01 +0000 (17:11 -0700)] 
Spec: Make contains_nan partition summary field optional in v2. (#2864)

11 months agoSpec: Add back distinct_counts in data_file metadata (#2805)
Ryan Blue [Tue, 27 Jul 2021 00:10:44 +0000 (17:10 -0700)] 
Spec: Add back distinct_counts in data_file metadata (#2805)

* Spec: Add back distinct_counts in data_file metadata.

* Update for review comments.

11 months agoDocs: Add security page to ASF site (#2813)
Carl Steinbach [Mon, 26 Jul 2021 22:30:09 +0000 (15:30 -0700)] 
Docs: Add security page to ASF site (#2813)

This patch adds a page to the site docs which describes the process for
reporting security vulnerabilities found in Apache Iceberg.

11 months agoCore: Support multiple specs in OutputFileFactory (#2858)
Anton Okolnychyi [Mon, 26 Jul 2021 21:57:38 +0000 (11:57 -1000)] 
Core: Support multiple specs in OutputFileFactory (#2858)

11 months agoDocs: Add new blog - Apache Iceberg: An Architectural Look Under the Covers (#2856)
Dave Nielsen [Mon, 26 Jul 2021 02:26:35 +0000 (19:26 -0700)] 
Docs: Add new blog - Apache Iceberg: An Architectural Look Under the Covers (#2856)

11 months agoCore: Add DataWriter builders (#2857)
Anton Okolnychyi [Sat, 24 Jul 2021 01:03:46 +0000 (15:03 -1000)] 
Core: Add DataWriter builders (#2857)

11 months agoCore: Add table properties for Avro and Parquet delete files (#2851)
Anton Okolnychyi [Fri, 23 Jul 2021 21:28:02 +0000 (11:28 -1000)] 
Core: Add table properties for Avro and Parquet delete files (#2851)

11 months agoAPI: Fix string bucketing with non-BMP characters (#2849)
Piotr Findeisen [Thu, 22 Jul 2021 19:24:52 +0000 (21:24 +0200)] 
API: Fix string bucketing with non-BMP characters (#2849)

11 months agoCore: Adds SortRewriteStrategy (#2609)
Russell Spitzer [Wed, 21 Jul 2021 13:47:00 +0000 (08:47 -0500)] 
Core: Adds SortRewriteStrategy (#2609)

A rewrite strategy for data files which aims to reorder data within data files to optimally lay them out
in relation to a column. For example, if the Sort strategy is used on a set of files ordered
by column x, and the original set has File A (x: 0 - 50), File B (x: 10 - 40) and File C (x: 30 - 60),
this strategy will attempt to rewrite those files into File A' (x: 0 - 20), File B' (x: 21 - 40),
File C' (x: 41 - 60).
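
The idea in the example above can be sketched in a few lines of Python. This is an illustration, not the SortRewriteStrategy code: files are modeled as plain lists of x values, and `sort_rewrite` and `value_range` are hypothetical helpers. Merging, sorting, and re-splitting turns overlapping ranges into disjoint ones.

```python
def sort_rewrite(files, num_outputs):
    """Merge all rows, sort them, and split them into contiguous chunks."""
    rows = sorted(x for f in files for x in f)
    if not rows:
        return []
    size = -(-len(rows) // num_outputs)  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def value_range(f):
    """Return the (min, max) of column x for one file."""
    return (min(f), max(f))


# Three input files whose x ranges overlap, as in the commit message.
file_a = [0, 25, 50]
file_b = [10, 20, 40]
file_c = [30, 45, 60]

rewritten = sort_rewrite([file_a, file_b, file_c], 3)
ranges = [value_range(f) for f in rewritten]
```

With these sample values the output ranges are (0, 20), (25, 40), and (45, 60). A filter on x now needs to touch far fewer files, because min/max statistics can exclude the rest.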

11 months agoDoc: add documentation for JDBC and DynamoDB catalogs (#2831)
Jack Ye [Wed, 21 Jul 2021 01:56:57 +0000 (18:56 -0700)] 
Doc: add documentation for JDBC and DynamoDB catalogs (#2831)

11 months agoSpec: Fix missing negative in binary/fixed hash examples (#2840)
Piotr Findeisen [Tue, 20 Jul 2021 21:00:34 +0000 (23:00 +0200)] 
Spec: Fix missing negative in binary/fixed hash examples (#2840)

11 months agoNessie: Bump Nessie to 0.8.3 / Rename auth_type to auth-type (#2834)
Eduard Tudenhöfner [Tue, 20 Jul 2021 08:32:40 +0000 (10:32 +0200)] 
Nessie: Bump Nessie to 0.8.3 / Rename auth_type to auth-type (#2834)

11 months agoSPARK: Allow spark catalogs to have hadoop configuration overrides p… (#2792)
Kyle Bendickson [Mon, 19 Jul 2021 22:09:03 +0000 (15:09 -0700)] 
SPARK: Allow spark catalogs to have hadoop configuration overrides p… (#2792)

Previously, Iceberg catalogs loaded into Spark always used the Hadoop Configuration owned by the underlying Spark session. This made it impossible to use a different set of configuration values, which may be required to connect to a remote catalog. This patch allows Spark catalogs to have Hadoop configuration overrides per catalog, permitting different configurations for different underlying Iceberg catalogs.
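
A minimal Python sketch of per-catalog overrides follows. It is not the patch itself: `catalog_hadoop_conf` is a hypothetical helper, and the `spark.sql.catalog.<name>.hadoop.` key prefix is an assumption made for illustration, since the commit message does not spell out the property layout. The sketch shows overrides for one catalog being layered on top of the session configuration without affecting other catalogs.

```python
def catalog_hadoop_conf(session_conf, spark_props, catalog_name):
    """Return a copy of the session Hadoop conf with one catalog's overrides applied."""
    # Assumed key layout for illustration: spark.sql.catalog.<name>.hadoop.<key>
    prefix = f"spark.sql.catalog.{catalog_name}.hadoop."
    merged = dict(session_conf)  # never mutate the shared session conf
    for key, value in spark_props.items():
        if key.startswith(prefix):
            merged[key[len(prefix):]] = value
    return merged


session_conf = {
    "fs.s3a.endpoint": "http://default:9000",
    "fs.s3a.path.style.access": "true",
}
spark_props = {
    "spark.sql.catalog.remote.hadoop.fs.s3a.endpoint": "http://remote:9000",
    "spark.sql.catalog.local.type": "hadoop",
}

remote_conf = catalog_hadoop_conf(session_conf, spark_props, "remote")
local_conf = catalog_hadoop_conf(session_conf, spark_props, "local")
```

Copying the session configuration before applying overrides keeps each catalog's view independent, which is what allows two catalogs in one session to point at different endpoints.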

11 months agoMove Assert.assertTrue(..) instance checks to AssertJ assertions (#2756)
Eduard Tudenhöfner [Mon, 19 Jul 2021 10:57:09 +0000 (12:57 +0200)] 
Move Assert.assertTrue(..) instance checks to AssertJ assertions (#2756)

11 months agoORC: Upgrade ORC dependency to 1.6.9 (#2781)
Kyle Bendickson [Fri, 16 Jul 2021 15:34:38 +0000 (08:34 -0700)] 
ORC: Upgrade ORC dependency to 1.6.9 (#2781)

11 months agoBump Nessie to 0.8.2 + related changes (#2588)
Robert Stupp [Thu, 15 Jul 2021 22:21:27 +0000 (00:21 +0200)] 
Bump Nessie to 0.8.2 + related changes (#2588)

* Bump Nessie to 0.8.2 + replace Gradle plugin with new JUnit extension

More changes in this PR in following commits.

Replace Gradle plugin with new JUnit extension.
See [Add JAX-RS tests and add JUnit/Jupyter extension](https://github.com/projectnessie/nessie/pull/1566)

* Changes required by Nessie-API changes

Apply changes to Iceberg required by API changes in Nessie:
* [Re-introduce wrapper classes for query params of CommitLog/Entries](https://github.com/projectnessie/nessie/pull/1595)
* [Server-side commit range filtering](https://github.com/projectnessie/nessie/pull/1596)
* [Add hashOnRef query param to support time travel on a named ref](https://github.com/projectnessie/nessie/pull/1589)
* [Only accept NamedRefs in REST API](https://github.com/projectnessie/nessie/pull/1583)

* Bugfix: must send the Contents.id of the existing table

Nessie's `Contents.id` is a random ID generated when the `Contents.Key` is first used (think:
CREATE TABLE) and must not be changed. This change addresses a bug in the Iceberg-Nessie code
that caused a new id for every change.

* Throw `CommitStateUnknownException` for `renameTable` as well

Follow-up of #2515

* Fix race-condition & save one roundtrip to Nessie during "commit"

When committing a change, the Nessie-API now returns the hash of the commit for the change.
This returned hash should then be used as the "expected hash" for the next commit.

The previous approach was to commit the change to Nessie and then do another request to
retrieve the new hash of HEAD.

This old approach is prone to a race condition, namely when another commit happens after
"this" commit but before retrieving the "new HEAD", so "this" instance would wrongly
ignore the other commit's changes during conflict checks.

See [Let VersionStore.create()+commit() return the current hash](https://github.com/projectnessie/nessie/pull/1089)
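
The race described above can be sketched with a small in-memory model in Python. This is not the Nessie API: `VersionStore`, `ConflictError`, and the hashing scheme here are hypothetical. The sketch shows a writer that keeps the hash returned by its own commit, so a commit made by someone else in the meantime is detected as a conflict.

```python
import hashlib


class ConflictError(Exception):
    """Raised when the expected hash no longer matches HEAD."""


class VersionStore:
    """Hypothetical in-memory store with compare-and-swap commits."""

    def __init__(self):
        self.head = "0" * 8
        self.log = []

    def commit(self, expected_hash, change):
        """Apply `change` only if HEAD is still `expected_hash`; return the new HEAD."""
        if expected_hash != self.head:
            raise ConflictError(f"expected {expected_hash}, HEAD is {self.head}")
        self.log.append(change)
        digest = hashlib.sha256((self.head + change).encode()).hexdigest()
        self.head = digest[:8]
        return self.head


store = VersionStore()

# Writer A commits and keeps the hash returned by its own commit.
a_expected = store.commit(store.head, "a1")

# Another writer commits right afterwards.
store.commit(store.head, "b1")

# Writer A's next commit uses the returned hash, so the conflict is detected.
try:
    store.commit(a_expected, "a2")
    conflict_detected = False
except ConflictError:
    conflict_detected = True
```

Under the old approach, writer A would have re-read `store.head` after its commit. Had that read happened after "b1", A would have picked up the newer hash and its next commit would have succeeded without ever checking the other writer's change.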

11 months agoSpark: Remove unused FileRewriteCoordinator code (#2819)
Russell Spitzer [Thu, 15 Jul 2021 17:45:24 +0000 (12:45 -0500)] 
Spark: Remove unused FileRewriteCoordinator code (#2819)

Since we changed our implementation of Spark3BinPackStrategy, we no longer need some
of the functionality that was previously in FileRewriteCoordinator. Here we remove
those functions and related test code.

11 months agoAdd support for reading/writing timestamps without timezone. (#2757)
sshkvar [Thu, 15 Jul 2021 17:06:44 +0000 (20:06 +0300)] 
Add support for reading/writing timestamps without timezone. (#2757)

Previously, Spark could not handle Iceberg tables that contained Timestamp.withoutTimeZone. New parameters are introduced to allow Timestamp without TimeZone to be treated as Timestamp with TimeZone.

Co-authored-by: bkahloon <kahlonbakht@gmail.com>
Co-authored-by: shardulm94

11 months agoCore: Use Avro 1.10.1 (#1648)
Fokko Driesprong [Tue, 13 Jul 2021 15:59:22 +0000 (17:59 +0200)] 
Core: Use Avro 1.10.1 (#1648)

Co-authored-by: Fokko Driesprong <fdriesprong@ebay.com>

11 months agoSpark : Add Files Perf improvement by push down partition filter to Spark/Hive catalo...
Szehon Ho [Tue, 13 Jul 2021 14:58:04 +0000 (07:58 -0700)] 
Spark : Add Files Perf improvement by push down partition filter to Spark/Hive catalog (#2777)

Pushes down partition filters in Spark/Hive import to the underlying catalog instead of retrieving all partitions and then filtering.

11 months agoDocs: Fix link to intellij-java-palantir-style.xml (#2817)
Eduard Tudenhöfner [Tue, 13 Jul 2021 13:40:28 +0000 (15:40 +0200)] 
Docs: Fix link to intellij-java-palantir-style.xml (#2817)

11 months agoSpark: Add missing deprecation annotations for old actions (#2811)
Anton Okolnychyi [Tue, 13 Jul 2021 05:49:37 +0000 (19:49 -1000)] 
Spark: Add missing deprecation annotations for old actions (#2811)

11 months agoSpark: Use JavaSparkContext.fromSparkContext instead of constructor (#2812)
Anton Okolnychyi [Tue, 13 Jul 2021 02:28:51 +0000 (16:28 -1000)] 
Spark: Use JavaSparkContext.fromSparkContext instead of constructor (#2812)

11 months agoAPI: Use delete instead of remove in action names (#2810)
Anton Okolnychyi [Tue, 13 Jul 2021 02:20:32 +0000 (16:20 -1000)] 
API: Use delete instead of remove in action names (#2810)

11 months agoSpark: Add table property to skip delete snapshots in streaming (#2752)
daksha121 [Tue, 13 Jul 2021 00:16:36 +0000 (17:16 -0700)] 
Spark: Add table property to skip delete snapshots in streaming (#2752)

11 months agoSpark: Parallelize task init when fetching locality info (#2800)
jshmchenxi [Mon, 12 Jul 2021 17:24:19 +0000 (01:24 +0800)] 
Spark: Parallelize task init when fetching locality info (#2800)

11 months agoUpgrade to Tez 0.10.1 (#2790)
Marton Bod [Mon, 12 Jul 2021 11:31:58 +0000 (13:31 +0200)] 
Upgrade to Tez 0.10.1 (#2790)

11 months agoReduce code duplication in VectorizedParquetDefinitionLevelReader
Eduard Tudenhoefner [Mon, 28 Jun 2021 10:27:03 +0000 (12:27 +0200)] 
Reduce code duplication in VectorizedParquetDefinitionLevelReader

11 months agoReduce code duplication in VectorizedPageIterator
Eduard Tudenhoefner [Mon, 28 Jun 2021 08:23:19 +0000 (10:23 +0200)] 
Reduce code duplication in VectorizedPageIterator

11 months agoReduce code duplication in VectorizedDictionaryEncodedParquetValuesReader
Eduard Tudenhoefner [Fri, 25 Jun 2021 16:48:11 +0000 (18:48 +0200)] 
Reduce code duplication in VectorizedDictionaryEncodedParquetValuesReader

11 months agoReduce code duplication in VectorizedColumnIterator
Eduard Tudenhoefner [Fri, 25 Jun 2021 15:58:09 +0000 (17:58 +0200)] 
Reduce code duplication in VectorizedColumnIterator

11 months agoRefactor VectorizedArrowReader
Eduard Tudenhoefner [Fri, 25 Jun 2021 14:52:51 +0000 (16:52 +0200)] 
Refactor VectorizedArrowReader

11 months agoDon't use deprecated methods
Eduard Tudenhoefner [Fri, 25 Jun 2021 14:43:28 +0000 (16:43 +0200)] 
Don't use deprecated methods

11 months agoSpark: Reimplement RewriteDatafilesAction with partial progress (#2591)
Russell Spitzer [Sun, 11 Jul 2021 23:50:36 +0000 (18:50 -0500)] 
Spark: Reimplement RewriteDatafilesAction with partial progress (#2591)

11 months agoBuild: Upgrade to JUnit 5 (#2797)
Eduard Tudenhöfner [Sat, 10 Jul 2021 23:15:27 +0000 (01:15 +0200)] 
Build: Upgrade to JUnit 5 (#2797)

11 months agoDocs: Fixes broken links to old spark doc page (#2801)
Russell Spitzer [Fri, 9 Jul 2021 18:50:17 +0000 (13:50 -0500)] 
Docs: Fixes broken links to old spark doc page (#2801)

11 months agoBuild: Change Spark Versions to Support M1 Processors (#2795)
Russell Spitzer [Fri, 9 Jul 2021 14:45:03 +0000 (09:45 -0500)] 
Build: Change Spark Versions to Support M1 Processors (#2795)

The Snappy native library that Spark pulls into our current build
lacks M1 support. Upgrading Spark upgrades Snappy to a version
that includes these native libs. This has no effect on the Spark
runtime for end users, since we do not include Spark in our
release jars.

11 months agoCore: Fix JdbcCatalog CATALOG_TABLE_NAME to be lowercase (#2778)
Ward Harris [Thu, 8 Jul 2021 07:30:18 +0000 (15:30 +0800)] 
Core: Fix JdbcCatalog CATALOG_TABLE_NAME to be lowercase (#2778)

11 months agoBuild: bump up DiffPlug Spotless version (#2776)
Jack Ye [Tue, 6 Jul 2021 20:50:37 +0000 (13:50 -0700)] 
Build: bump up DiffPlug Spotless version (#2776)

11 months agoSpec: Update v2 change summary (#2762)
Ryan Blue [Tue, 6 Jul 2021 19:57:10 +0000 (12:57 -0700)] 
Spec: Update v2 change summary (#2762)

11 months agoStyle: Delete blank line of CachedClientPool.java (#2787)
southernriver [Tue, 6 Jul 2021 14:59:29 +0000 (22:59 +0800)] 
Style: Delete blank line of CachedClientPool.java (#2787)

11 months agoNessie: Properly format code in Nessie module (#2733)
Eduard Tudenhöfner [Mon, 5 Jul 2021 17:59:22 +0000 (19:59 +0200)] 
Nessie: Properly format code in Nessie module (#2733)

11 months agoDocs: Describe available Benchmarks and how to run them (#2767)
Eduard Tudenhöfner [Fri, 2 Jul 2021 12:04:29 +0000 (14:04 +0200)] 
Docs: Describe available Benchmarks and how to run them (#2767)

11 months agoSpark: RemoveReachableFiles action should fail if GC is disabled (#2763)
Karuppayya [Fri, 2 Jul 2021 00:25:48 +0000 (17:25 -0700)] 
Spark: RemoveReachableFiles action should fail if GC is disabled (#2763)

Co-authored-by: Karuppayya Rajendran <karuppayya.rajendran@apple.com>

11 months agoDocs: Fix typo in flink.md (#2772)
Ada Wong [Fri, 2 Jul 2021 00:22:39 +0000 (08:22 +0800)] 
Docs: Fix typo in flink.md (#2772)

11 months agoDocs: Update for mkdocs 1.2 (#2747)
Ryan Blue [Thu, 1 Jul 2021 20:36:51 +0000 (13:36 -0700)] 
Docs: Update for mkdocs 1.2 (#2747)

* Docs: Fix mkdocs use_directory_urls in 1.2.

* Fix broken links and update redirects.

11 months agoSpark: Add limited support for vectorized reads for Parquet V2 (#2749)
Samarth Jain [Thu, 1 Jul 2021 17:00:48 +0000 (10:00 -0700)] 
Spark: Add limited support for vectorized reads for Parquet V2 (#2749)

With this change, we have added support for vectorized reads of Parquet data written in V2 format.
The only data encodings we support are dictionary and plain.
Vectorized reads against data written using Delta/RLE and other encodings are
not supported. Note that, as of this commit, Spark's own Parquet vectorized reader
does not support such encodings either.

11 months agoDocs: Describe how to configure Code formatter for IntelliJ IDEA (#2766)
Eduard Tudenhöfner [Thu, 1 Jul 2021 16:38:36 +0000 (18:38 +0200)] 
Docs: Describe how to configure Code formatter for IntelliJ IDEA (#2766)