John Zhuge [Wed, 6 Jul 2022 16:52:59 +0000 (09:52 -0700)]
Spark: Update Antlr in Spark 3.2 extensions to 4.8 (#5208)
This will match the Antlr version in Spark 3.2.
Russell Spitzer [Wed, 1 Jun 2022 19:37:18 +0000 (14:37 -0500)]
Add version.txt for release 0.13.2
Russell Spitzer [Wed, 1 Jun 2022 18:48:42 +0000 (13:48 -0500)]
Dev: Fix Source Release Script (#4932)
Prashant Singh [Wed, 1 Jun 2022 14:37:52 +0000 (20:07 +0530)]
Spark: Extend commit unknown exception handling to SparkPositionDeltaWrite (#4893)
Co-authored-by: Prashant Singh <psinghvk@amazon.com>
Szehon Ho [Sun, 29 May 2022 20:39:19 +0000 (13:39 -0700)]
Core: Fix query failure when using projection on top of partitions metadata table (#4720) (#4890)
Russell Spitzer [Wed, 25 May 2022 23:54:21 +0000 (16:54 -0700)]
Spark: Fix Alignment of Merge Commands with Mixed Case (#4848) (#4874)
* Spark: Fix Alignment of Merge Commands with Mixed Case
Prior to this change, a mixed-case insert statement would fail to be marked
as aligned after our alignment rule was applied, which would then fail the
entire MERGE INTO command. The commands were correctly aligned, but
our alignment check was always case sensitive.
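To illustrate the failure mode described above, here is a minimal sketch of an alignment check with an optional case-insensitive comparison. The function name and parameters are hypothetical and are not taken from the Spark extensions code; the sketch only shows why an always-case-sensitive comparison rejects assignments that are in fact aligned.

```python
def is_aligned(assignment_columns, table_columns, case_sensitive=False):
    """Return True when the assignments name the table columns in table order.

    Hypothetical helper for illustration only; not Iceberg's actual rule.
    """
    if len(assignment_columns) != len(table_columns):
        return False
    # Compare names as-is, or fold case first when the check is case insensitive.
    normalize = (lambda name: name) if case_sensitive else str.casefold
    return all(
        normalize(assigned) == normalize(expected)
        for assigned, expected in zip(assignment_columns, table_columns)
    )
```

With `case_sensitive=True`, the columns `["ID", "Dep"]` are reported as not aligned against `["id", "dep"]`, even though they are in the right order.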
Eduard Tudenhöfner [Tue, 24 May 2022 18:50:24 +0000 (20:50 +0200)]
Spark: Backport CommitStateUnknownException handling for RewriteManifestSparkAction (#4850) (#4854)
Co-authored-by: Prashant Singh <psinghvk@amazon.com>
Szehon Ho [Tue, 24 May 2022 14:58:39 +0000 (07:58 -0700)]
Core: Backport filter pushdown fix for metadata tables with evolved specs to 0.13 (#4520) (#4569)
Eduard Tudenhöfner [Tue, 24 May 2022 14:56:17 +0000 (16:56 +0200)]
Spark: Handle CommitStateUnknown exception in RewriteManifestSparkAction (#4836) (#4852)
Co-authored-by: Prashant Singh <35593236+singhpk234@users.noreply.github.com>
Co-authored-by: Prashant Singh <psinghvk@amazon.com>
Eduard Tudenhöfner [Mon, 23 May 2022 23:55:19 +0000 (01:55 +0200)]
Nessie: Fix NPE while accessing refreshed table in nessie catalog (#4509) (#4840)
Co-authored-by: Ajantha Bhat <ajanthabhat@gmail.com>
Eduard Tudenhöfner [Thu, 19 May 2022 20:20:47 +0000 (22:20 +0200)]
Flink: Backport upsert delete file metadata fixes to 0.13 (#4786)
Co-authored-by: Kyle Bendickson <kjbendickson@gmail.com>
Co-authored-by: liliwei <hililiwei@gmail.com>
Co-authored-by: wangzeyu <1249369293@qq.com>
Co-authored-by: openinx <openinx@gmail.com>
Eduard Tudenhöfner [Tue, 17 May 2022 16:49:25 +0000 (18:49 +0200)]
Spark: Update commit state unknown handling (backport to 0.13) (#4787)
Co-authored-by: Russell Spitzer <rspitzer@apple.com>
Eduard Tudenhöfner [Tue, 17 May 2022 16:46:41 +0000 (18:46 +0200)]
Core: Fix transaction retry logic (#4464) (#4783)
Co-authored-by: Ajantha Bhat <ajanthabhat@gmail.com>
Eduard Tudenhöfner [Tue, 17 May 2022 16:45:38 +0000 (18:45 +0200)]
Core: Fix delete file handling in upgraded tables with rewritten manifests (#4514) (#4782)
Co-authored-by: vanliu <vanliu@tencent.com>
Eduard Tudenhöfner [Tue, 17 May 2022 16:45:08 +0000 (18:45 +0200)]
Spark: Fix NPEs in Spark value converter (#4663) (#4781)
Co-authored-by: Edgar Rodriguez <edgar.rodriguez@airbnb.com>
Eduard Tudenhöfner [Tue, 17 May 2022 16:44:31 +0000 (18:44 +0200)]
Core: Fix table corruption from OOM during commit cleanup (#4673) (#4779)
Co-authored-by: Ryan Blue <blue@apache.org>
Kyle Bendickson [Thu, 12 May 2022 14:24:35 +0000 (07:24 -0700)]
Flink 1.12: Log a warning message when upsert is enabled (#4754)
Xianyang Liu [Mon, 18 Apr 2022 17:12:14 +0000 (01:12 +0800)]
Core: Fix metadata read failure after dropping a partition transform for V1 format (#3411) (#4572)
Wing Yew Poon [Fri, 18 Feb 2022 23:36:10 +0000 (15:36 -0800)]
Core: Remove accidentally added method to TableMetadata (#4155)
Rishi [Thu, 17 Feb 2022 18:03:43 +0000 (10:03 -0800)]
Core: Fix history timestamp for rollbacks (#4135)
Jack Ye [Thu, 10 Feb 2022 23:21:35 +0000 (15:21 -0800)]
Add version.txt for release 0.13.1
Peidian li [Fri, 4 Feb 2022 17:38:39 +0000 (01:38 +0800)]
Flink: Ensure temp manifest names are unique across tasks (#3986)
Cheng Pan [Wed, 2 Feb 2022 22:06:16 +0000 (06:06 +0800)]
Spark: Fix create table in Hadoop catalog root namespace (#4024)
Anton Okolnychyi [Tue, 1 Feb 2022 20:46:48 +0000 (12:46 -0800)]
Spark 3.2: Fix predicate pushdown in row-level operations (#4023)
Jack Ye [Fri, 28 Jan 2022 08:56:28 +0000 (00:56 -0800)]
Add version.txt for release 0.13.0
Anton Okolnychyi [Fri, 28 Jan 2022 01:30:04 +0000 (17:30 -0800)]
Spark 3.2: Fix cardinality check for alternative join implementations (#3992)
Ashish Singh [Fri, 28 Jan 2022 00:17:56 +0000 (16:17 -0800)]
Docs: Add s3.checksum-enabled to AWS (#3996)
liliwei [Fri, 28 Jan 2022 00:17:01 +0000 (08:17 +0800)]
Docs: Fix MapType example (#3993)
Jack Ye [Fri, 28 Jan 2022 00:14:47 +0000 (16:14 -0800)]
Docs: Update release instructions (#3982)
Ashish Singh [Tue, 25 Jan 2022 20:50:19 +0000 (12:50 -0800)]
AWS: Support checksum validation with S3 eTags (#3813)
* [S3FileIO] Add capability to perform checksum validations using S3 eTags.
* fix checkstyle error
* Update to move checksum checks to s3 server side
* Enable s3 checksum checks in aws integration tests
* Catch protocol error and log helpful error message
* Use digest bytes instead of MessageDigest and update tests
* Fix checkstyle failure
* Use DigestOutputStream
* Remove redundant spaces
* rename etag to checksum in leftover places
* address
* Remove unused import
* Config name change
* minor updates
0xffmeta [Tue, 25 Jan 2022 10:54:36 +0000 (18:54 +0800)]
Docs: Add section to include instructions for Hive on Tez (#3944)
Anton Okolnychyi [Tue, 25 Jan 2022 08:03:36 +0000 (00:03 -0800)]
Spark 3.2: Revise distribution and ordering for merge-on-read DELETE (#3970)
Rajarshi Sarkar [Tue, 25 Jan 2022 05:46:57 +0000 (11:16 +0530)]
Docs: Add Amazon EMR announcement (#3976)
夏川和 [Tue, 25 Jan 2022 01:14:43 +0000 (17:14 -0800)]
AWS: fix Glue catalog for unknown commit status (#3967)
Anton Okolnychyi [Mon, 24 Jan 2022 22:48:45 +0000 (14:48 -0800)]
Spark 3.2: Add tests for copy-on-write MERGE distribution and ordering (#3964)
夏川和 [Mon, 24 Jan 2022 21:29:48 +0000 (13:29 -0800)]
AWS: show old fields in Glue table (#3888)
Yufei Gu [Mon, 24 Jan 2022 20:39:47 +0000 (12:39 -0800)]
Core: Add reserved UUID Table Property and Expose in HMS. (#3914)
Co-authored-by: Karuppayya Rajendran <karuppayya.rajendran@apple.com>
Co-authored-by: Yufei Gu <yufei_gu@apple.com>
夏川和 [Mon, 24 Jan 2022 19:55:20 +0000 (11:55 -0800)]
AWS: fix Iceberg to Glue schema conversion (#3887)
Peidian li [Mon, 24 Jan 2022 19:07:53 +0000 (03:07 +0800)]
Core: Fix delete file index with manifests of only existing files (#3943)
Nan Zhu [Mon, 24 Jan 2022 19:05:46 +0000 (11:05 -0800)]
Core: Allow removing and adding the same partition field as a noop (#3954)
vanliu [Mon, 24 Jan 2022 19:01:37 +0000 (03:01 +0800)]
Hive: Make Iceberg table filter optional in HiveCatalog (#3908)
This adds an option to return all Hive tables, not just Iceberg tables, which avoids loading metadata and slowing down the operation.
Szehon Ho [Mon, 24 Jan 2022 18:38:16 +0000 (10:38 -0800)]
Parquet: NPE in Parquet Writer Metrics when data value max bound will overflow (#3760)
Previously, writing metrics whose max bound would overflow when incremented resulted in a null value for the metrics and caused an NPE when the value was put in the Metrics array. Now null values returned from truncate are ignored.
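The overflow case can be sketched as follows, assuming an upper bound is produced by truncating a binary value and incrementing its last byte. Both function names are hypothetical stand-ins, not the Parquet writer's actual methods; `None` plays the role of the null result.

```python
def truncate_binary_max(value: bytes, length: int):
    """Return a short upper bound for value, or None if incrementing overflows."""
    if len(value) <= length:
        return value
    prefix = bytearray(value[:length])
    # Increment the last byte that is not already 0xFF and drop what follows it.
    for i in range(length - 1, -1, -1):
        if prefix[i] != 0xFF:
            prefix[i] += 1
            return bytes(prefix[: i + 1])
    # Every byte in the prefix is 0xFF, so no larger prefix of this length exists.
    return None


def collect_upper_bounds(values_by_field, length):
    """Build the upper-bound map, skipping fields whose truncation overflowed."""
    bounds = {}
    for field_id, value in values_by_field.items():
        truncated = truncate_binary_max(value, length)
        if truncated is not None:  # ignore the null case instead of storing it
            bounds[field_id] = truncated
    return bounds
```

Skipping the field leaves it without an upper bound, which is safe for pruning because a missing bound cannot exclude any value.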
cccs-eric [Mon, 24 Jan 2022 16:27:23 +0000 (11:27 -0500)]
Python: Fix type for partition_type struct fields (#3939) (#3940)
Signed-off-by: cccs-eric <eric.ladouceur@cyber.gc.ca>
kingeasternsun [Mon, 24 Jan 2022 13:40:52 +0000 (21:40 +0800)]
Data: Read metrics in parallel during TableMigration (#3876)
Adds a parameter for reading the metrics of files in parallel, rather than one at a time in TableMigrationUtils.
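A rough sketch of the idea, assuming a caller-supplied `read_one` function that reads the metrics of a single file. The names `read_all_metrics`, `read_one`, and `parallelism` are hypothetical and do not reflect the actual TableMigrationUtils signature.

```python
from concurrent.futures import ThreadPoolExecutor


def read_all_metrics(paths, read_one, parallelism=1):
    """Read metrics for every path, optionally using a pool of worker threads."""
    if parallelism <= 1:
        # Sequential behavior: one file at a time.
        return [read_one(path) for path in paths]
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # pool.map returns results in the same order as the input paths.
        return list(pool.map(read_one, paths))
```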
Co-authored-by: King <wangdongyang@deepexi.com>
Rajarshi Sarkar [Mon, 24 Jan 2022 00:39:55 +0000 (06:09 +0530)]
Core: Added no-arg constructor in ResolvingFileIO (#3923)
Samuel Redai [Mon, 24 Jan 2022 00:37:31 +0000 (16:37 -0800)]
Python: Add FileIO, InputFile, and OutputFile abstract base classes (#3691)
Zhangg7723 [Mon, 24 Jan 2022 00:31:57 +0000 (08:31 +0800)]
Core: Fix an error message in BinPackStrategy (#3919)
Anton Okolnychyi [Sat, 22 Jan 2022 01:05:48 +0000 (17:05 -0800)]
Core: Deprecate the MERGE cardinality check property (#3953)
Ryan Blue [Sat, 22 Jan 2022 00:28:28 +0000 (16:28 -0800)]
Fix SparkCatalog time travel check. (#3942)
Anton Okolnychyi [Fri, 21 Jan 2022 23:46:24 +0000 (15:46 -0800)]
Spark 3.2: Revise distribution and ordering in copy-on-write UPDATE (#3949)
Russell Spitzer [Fri, 21 Jan 2022 20:00:19 +0000 (14:00 -0600)]
Spark: Backport Streaming Test Refactors (#3948)
Back-porting test refactor from #3775
Anton Okolnychyi [Fri, 21 Jan 2022 19:38:29 +0000 (11:38 -0800)]
Spark 3.2: Revise distribution and ordering in copy-on-write DELETE (#3930)
Eduard Tudenhöfner [Fri, 21 Jan 2022 17:09:39 +0000 (18:09 +0100)]
JMH: Parameterize spark project version for JMH Benchmarks (#3946)
Given that we have multiple Spark project versions in the codebase and
that users might want to run a particular Benchmark from a specific
Spark version, we should make the Spark project version a parameter of
the JMH Benchmark Action.
Marton Bod [Thu, 20 Jan 2022 10:22:50 +0000 (11:22 +0100)]
Hive: Do not skip IO config serialization for metadata queries (#3911)
xloya [Thu, 20 Jan 2022 05:47:02 +0000 (13:47 +0800)]
[Spark] Backport rewrite data files are eliminated by deletes to Spark v3.0 and Spark v3.1 (#3935)
Backport of #3724 to Spark 3.0 and 3.1
Co-authored-by: Jiebao Xiao <xiaojiebao@xiaomi.com>
liliwei [Thu, 20 Jan 2022 02:34:43 +0000 (10:34 +0800)]
Flink 1.14: Add tests to check whether should remove meta columns in source reader (#3893)
openinx [Thu, 20 Jan 2022 01:52:10 +0000 (09:52 +0800)]
Flink 1.12: Fix SerializableTable with Kryo (#3926)
xiaotianzhang01 [Thu, 20 Jan 2022 00:42:26 +0000 (08:42 +0800)]
Spark: Add helper to register truncate UDF (#3708)
Co-authored-by: zhangxiaotian13 <zhangxiaotian13@jd.com>
liliwei [Wed, 19 Jan 2022 23:23:59 +0000 (07:23 +0800)]
Docs: Add compression codec options (#3892)
cccs-eric [Wed, 19 Jan 2022 23:22:39 +0000 (18:22 -0500)]
Python: Fix incorrect single-value encoding for boolean (#3924) (#3927)
Signed-off-by: cccs-eric <eric.ladouceur@cyber.gc.ca>
Pucheng Yang [Wed, 19 Jan 2022 23:21:44 +0000 (15:21 -0800)]
Python: Fix quote handling in expression parser (#3875)
openinx [Wed, 19 Jan 2022 21:53:17 +0000 (05:53 +0800)]
Flink 1.14: Add Kryo tests for SerializableTable (#3925)
Kyle Bendickson [Wed, 19 Jan 2022 19:05:21 +0000 (11:05 -0800)]
Flink: Fix flaky tests that depend on row order (#3931)
xloya [Wed, 19 Jan 2022 16:34:43 +0000 (00:34 +0800)]
[Spark][Core]: Support RewriteDataFiles when Files are Completely Eliminated by Deletes (#3724)
Previously, RewriteDataFiles would fail if the outcome of a rewrite was the complete removal of all DataFiles. This outcome is now possible with Merge on Read, so it is now allowed.
Co-authored-by: Jiebao Xiao <xiaojiebao@xiaomi.com>
Yi Tang [Wed, 19 Jan 2022 00:07:21 +0000 (08:07 +0800)]
Flink: Fix classloader in Avro ManifestReader (#3906)
Nick Ouellet [Wed, 19 Jan 2022 00:03:34 +0000 (19:03 -0500)]
Python: Expand primitive types to individual classes (#3839)
Co-authored-by: Sam Redai <sam@tabular.io>
Kyle Bendickson [Wed, 19 Jan 2022 00:02:41 +0000 (16:02 -0800)]
Build: Fix source-release script in actions, add git remote validation (#3915)
Anton Okolnychyi [Tue, 18 Jan 2022 20:17:52 +0000 (12:17 -0800)]
Spark 3.2: Add tests for resolving star actions in MERGE by name (#3918)
Co-authored-by: Kyle Bendickson <kjbendickson@gmail.com>
Anton Okolnychyi [Tue, 18 Jan 2022 17:31:59 +0000 (09:31 -0800)]
Spark 3.2: Add tests for multiple NOT MATCHED clauses (#3917)
xiaotianzhang01 [Tue, 18 Jan 2022 16:55:59 +0000 (00:55 +0800)]
Docs: Add LOCALLY ORDERED BY and DISTRIBUTED BY clauses (#3820)
Co-authored-by: zhangxiaotian13 <zhangxiaotian13@jd.com>
liliwei [Tue, 18 Jan 2022 16:24:10 +0000 (00:24 +0800)]
Docs: Link expire_snapshots to table expiration properties (#3878)
Anton Okolnychyi [Tue, 18 Jan 2022 07:19:44 +0000 (23:19 -0800)]
Spark 3.2: Implement merge-on-read DELETE (#3763)
Russell Spitzer [Sat, 15 Jan 2022 05:42:26 +0000 (23:42 -0600)]
Core: Split FileScanTasks on Offsets (#460) (#3292)
Previously, FileScanTasks would only be split if they exceeded the requested target split size. This prevented the combination of tasks that were smaller than the split size but could be combined to make a request closer to the requested split size. To fix this, we split all files on their offsets when we are splitting, and then recombine them during the creation of scan tasks to try to hit the desired split sizes.
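The split-then-recombine approach can be sketched as below. This is a simplified illustration with hypothetical names, using plain greedy sequential packing; it is not the planner's actual combining logic.

```python
def split_on_offsets(path, file_length, offsets):
    """Split one file into (path, start, length) pieces at the given offsets."""
    starts = sorted(offsets)
    ends = starts[1:] + [file_length]
    return [(path, start, end - start) for start, end in zip(starts, ends)]


def combine(splits, target_size):
    """Greedily group consecutive splits without exceeding target_size per group."""
    groups, current, current_size = [], [], 0
    for split in splits:
        length = split[2]
        # Close the current group if adding this split would pass the target.
        if current and current_size + length > target_size:
            groups.append(current)
            current, current_size = [], 0
        current.append(split)
        current_size += length
    if current:
        groups.append(current)
    return groups
```

For example, a 100-byte file with offsets 0, 40, and 80 yields pieces of 40, 40, and 20 bytes. With a target of 64, the last two pieces are grouped into one 60-byte task, which could not happen if the file were only split when it exceeded the target.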
Anton Okolnychyi [Fri, 14 Jan 2022 19:21:13 +0000 (11:21 -0800)]
Spark 3.2: Implement copy-on-write MERGE (#3804)
Hongyue/Steve Zhang [Fri, 14 Jan 2022 18:21:33 +0000 (10:21 -0800)]
Spark: Support parallelism in RemoveOrphanFiles (#3872)
Co-authored-by: Steve Zhang <hongyue_zhang@apple.com>
Ryan Blue [Thu, 13 Jan 2022 20:17:55 +0000 (12:17 -0800)]
Parquet: Lazily initialize the underlying writer in ParquetWriter (#3780)
Co-authored-by: Tim Steinbach <tim.steinbach@shopify.com>
Anurag Mantripragada [Thu, 13 Jan 2022 17:46:10 +0000 (09:46 -0800)]
API: Register existing tables in Iceberg HiveCatalog (#3851)
Co-authored-by: Anton Okolnychyi <aokolnychyi@apple.com>
Robert Stupp [Thu, 13 Jan 2022 12:28:08 +0000 (13:28 +0100)]
Bump Nessie from 0.17.0 to 0.18.0 (#3890)
Rui Li [Thu, 13 Jan 2022 06:00:12 +0000 (14:00 +0800)]
Test: Make sure to delete temp folders (#3790)
smallx [Thu, 13 Jan 2022 01:11:44 +0000 (09:11 +0800)]
Spark: Reduce requests from SparkSessionCatalog.invalidateTable (#3861)
Huaxin Gao [Wed, 12 Jan 2022 23:38:54 +0000 (15:38 -0800)]
Spark 3.2: Push down partition filter when importing file tables (#3745)
Chen Zhang [Wed, 12 Jan 2022 16:58:03 +0000 (00:58 +0800)]
Spark 3.1: Add Spark UI metrics for merge into DynamicFileFilterExec (#3882)
Co-authored-by: zhangchen351 <zhangchen351@jd.com>
xiaotianzhang01 [Wed, 12 Jan 2022 16:55:44 +0000 (00:55 +0800)]
Spark 3.1: Fix binary literals in pushdown filters (#3728)
Co-authored-by: zhangxiaotian13 <zhangxiaotian13@jd.com>
Chen Zhang [Tue, 11 Jan 2022 17:49:34 +0000 (01:49 +0800)]
Spark 3.0: Add Spark UI metrics for merge into DynamicFileFilterExec (#3863)
Co-authored-by: zhangchen351 <zhangchen351@jd.com>
Robert Stupp [Tue, 11 Jan 2022 12:55:31 +0000 (13:55 +0100)]
Allow using a custom NessieClientBuilder implementation (#3877)
Nessie defaults to use the HttpClientBuilder, but certain use cases
require a custom client builder implementation. This change allows
this by having a new configuration option.
zhang chaoming [Mon, 10 Jan 2022 23:16:38 +0000 (07:16 +0800)]
Build: Suppress warning about Flink nanosecond access (#3868)
Co-authored-by: zhangchaoming <zhangchaoming@360.com>
Eduard Tudenhöfner [Mon, 10 Jan 2022 21:34:45 +0000 (22:34 +0100)]
Build: Only use scalastyle plugin with scala modules (#3869)
Steven Zhen Wu [Mon, 10 Jan 2022 21:31:38 +0000 (13:31 -0800)]
Flink 1.14: Add FLIP-27 Iceberg source split (#3870)
Steven Zhen Wu [Mon, 10 Jan 2022 21:30:44 +0000 (13:30 -0800)]
Revert "Flink: Add FLIP-27 Iceberg source split (#3501)" (#3871)
This reverts commit d2c26a02190a16539c8c0621c4d8aac2e9e3ec6c.
Kyle Bendickson [Mon, 10 Jan 2022 21:30:18 +0000 (13:30 -0800)]
Docs: Update copyright year in site mkdocs file to 2022 (#3873)
smallx [Sun, 9 Jan 2022 22:49:39 +0000 (06:49 +0800)]
Spark: Fix table UUID exceptions with CachingCatalog (#3837)
Yufei Gu [Sun, 9 Jan 2022 22:48:39 +0000 (14:48 -0800)]
Core: Replace set with bitmap for faster delete filtering (#3535)
Kyle Bendickson [Sun, 9 Jan 2022 22:35:43 +0000 (14:35 -0800)]
Build: Update NOTICE to include copyright to 2022 (#3855)
openinx [Sun, 9 Jan 2022 22:23:58 +0000 (06:23 +0800)]
Flink 1.13: Fix SerializableTable with Kryo (#3857)
Steven Zhen Wu [Sun, 9 Jan 2022 18:15:08 +0000 (10:15 -0800)]
Flink: Add FLIP-27 Iceberg source split (#3501)
Karl Manong [Fri, 7 Jan 2022 22:26:37 +0000 (06:26 +0800)]
Build: Upgrade gradle to 7.3.3 (#3793)
Huaxin Gao [Fri, 7 Jan 2022 22:12:49 +0000 (14:12 -0800)]
Core: Fix partitions metadata table with a column named partition (#3845)
* Partition Metadata table breaks with a partition column named 'partition'
* address comments
* fix style
* add checkConflicts
* remove TestTables.clearTables in the end of test
* address comments
* checkConflict => checkConflicts
Kyle Bendickson [Fri, 7 Jan 2022 21:57:04 +0000 (13:57 -0800)]
Spec: Initial OpenAPI template for a REST catalog (#3770)
Ajantha Bhat [Fri, 7 Jan 2022 09:03:36 +0000 (14:33 +0530)]
Docs: Fix broken link of Dremio with Iceberg (#3856)
Nan Zhu [Tue, 4 Jan 2022 23:46:45 +0000 (15:46 -0800)]
Core: Allow adding a dropped partition column name (#3632)