Prashant Singh [Tue, 24 May 2022 14:56:55 +0000 (20:26 +0530)]
Spark: Backport CommitStateUnknownException handling for RewriteManifestSparkAction (#4850)
Co-authored-by: Prashant Singh <psinghvk@amazon.com>
Ryan Blue [Mon, 23 May 2022 23:36:15 +0000 (16:36 -0700)]
Core: Refactor REST catalog to RESTSessionCatalog (#4846)
Ryan Blue [Mon, 23 May 2022 23:35:27 +0000 (16:35 -0700)]
API: Update SessionCatalog context to use a credentials map (#4845)
Ryan Blue [Mon, 23 May 2022 21:59:36 +0000 (14:59 -0700)]
Core: Remove Authorization header handling from HTTPClient (#4832)
Kyle Bendickson [Mon, 23 May 2022 21:58:05 +0000 (14:58 -0700)]
Core: Add serialization for MetadataUpdate (#4716)
Ryan Blue [Mon, 23 May 2022 21:50:12 +0000 (14:50 -0700)]
Core: Add OAuth2 helpers for REST catalog (#4833)
Ryan Blue [Mon, 23 May 2022 21:14:33 +0000 (14:14 -0700)]
Spec: Add more context about OAuth2 to the REST spec (#4843)
Prashant Singh [Mon, 23 May 2022 17:39:42 +0000 (23:09 +0530)]
Spark: Handle CommitStateUnknown exception in RewriteManifestSparkAction (#4836)
Co-authored-by: Prashant Singh <psinghvk@amazon.com>
Adam Szita [Mon, 23 May 2022 09:50:13 +0000 (11:50 +0200)]
Add project method to IcebergGenerics to enable projections by Schema (#4819)
Đặng Minh Dũng [Sat, 21 May 2022 19:19:42 +0000 (02:19 +0700)]
AWS: support Path-Style Access (#4823)
Xingfan Xia [Fri, 20 May 2022 21:44:16 +0000 (14:44 -0700)]
AWS: handle Glue exceptions as iceberg errors instead of commit state unknown (#4821)
Fokko Driesprong [Fri, 20 May 2022 15:53:18 +0000 (17:53 +0200)]
Python: Fix the type_string of the NestedField (#4814)
If there is a doc, the rest gets ignored, which is kind of awkward.
Before:
```python
➜ python git:(master) ✗ python3
Python 3.9.12 (main, Mar 26 2022, 15:44:31)
[Clang 13.1.6 (clang-1316.0.21.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from iceberg.types import NestedField, LongType
>>> str(NestedField(
... field_id=2,
... name='bar',
... field_type=LongType(),
... is_optional=False
... ))
'2: bar: required long'
>>> str(NestedField(
... field_id=2,
... name='bar',
... field_type=LongType(),
... is_optional=False,
... doc="Just a long"
... ))
' (Just a long)'
```
Now:
```python
➜ python git:(master) ✗ python3
Python 3.9.12 (main, Mar 26 2022, 15:44:31)
[Clang 13.1.6 (clang-1316.0.21.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from iceberg.types import NestedField, LongType
>>> str(NestedField(
... field_id=2,
... name='bar',
... field_type=LongType(),
... is_optional=False,
... doc="Just a long"
... ))
'2: bar: required long (Just a long)'
```
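The intended formatting can be sketched as a standalone function. This is a minimal illustration, not the library's code: `nested_field_string` and its parameters are hypothetical names chosen for this example.

```python
from typing import Optional


def nested_field_string(
    field_id: int,
    name: str,
    type_name: str,
    is_optional: bool,
    doc: Optional[str] = None,
) -> str:
    """Hypothetical helper: build the field string and append the doc, if any."""
    requiredness = "optional" if is_optional else "required"
    base = f"{field_id}: {name}: {requiredness} {type_name}"
    # The doc is appended to the base string instead of replacing it.
    return base if doc is None else f"{base} ({doc})"
```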
Szehon Ho [Wed, 18 May 2022 23:52:56 +0000 (16:52 -0700)]
Spark 3.2: Add positional and equality delete file count to ExpireSnapshot results (#4629)
Kyle Bendickson [Wed, 18 May 2022 21:50:33 +0000 (14:50 -0700)]
Build: Update release scripts to publish Spark 3.2 with Scala 2.13 (#4167)
Pucheng Yang [Wed, 18 May 2022 17:16:39 +0000 (10:16 -0700)]
Python legacy: Use process pool for planning (#4745)
Fokko Driesprong [Wed, 18 May 2022 17:10:15 +0000 (19:10 +0200)]
Python: Set Python 3.8 as minimum version (#4784)
fallnirvana [Wed, 18 May 2022 16:03:44 +0000 (00:03 +0800)]
Flink: Support hadoop-conf-dir for hdfs-site.xml and core-site.xml (#4622)
Rajarshi Sarkar [Wed, 18 May 2022 15:28:02 +0000 (20:58 +0530)]
Core: Use ImmutableMap for catalog properties (#4803)
Kyle Bendickson [Wed, 18 May 2022 00:53:14 +0000 (17:53 -0700)]
Python - Add pylint disable super-init-not-called for one line (#4797)
Ryan Blue [Wed, 18 May 2022 00:18:28 +0000 (17:18 -0700)]
Core: Add request headers to REST client. (#4772)
Ashish Singh [Tue, 17 May 2022 22:51:25 +0000 (15:51 -0700)]
Core: Allow controlling table properties through catalog config (#4011)
Co-authored-by: Rajarshi Sarkar <srajars@amazon.com>
Fokko Driesprong [Tue, 17 May 2022 20:18:37 +0000 (22:18 +0200)]
Change types into dataclasses (#4767)
* Change types into dataclasses
Proposal to change the types into dataclasses.
This has several improvements:
- We can use the dataclass field(repr=True) to include the fields in the representation, instead of building our own strings
- We can assign the types in the post_init when they are dynamic (Lists, Maps, Structs, etc.), or just override them when they are static (Primitives)
- We don't have to implement any eq methods because they come for free
- The types are frozen, which is kind of nice since we re-use them
- The code is much more concise
- We can assign the min/max of the int/long/float as Final as of 3.8: https://peps.python.org/pep-0591/
My inspiration was the comment by Kyle:
https://github.com/apache/iceberg/pull/4742#discussion_r869494393
This would entail implementing eq, but why not use the generated one since we're comparing all the attributes :)
Would love to get your input
* Remove explicit repr and eq
* Use @cached_property to cache the string
Add missing words to spelling
* Add additional guard for initializing StructType using kwargs
* Replace type with field_type
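A rough sketch of the idea, assuming a simplified parameterized type. `FixedType` and `string_type` here are illustrative stand-ins and are not claimed to match the library's classes:

```python
from dataclasses import dataclass, field
from functools import cached_property


@dataclass(frozen=True)
class FixedType:
    """Hypothetical stand-in for a parameterized type."""

    # repr=True (the default) includes the field in the generated __repr__.
    length: int = field(repr=True)

    @cached_property
    def string_type(self) -> str:
        # Computed on first access, then stored in the instance __dict__.
        # This works on a frozen dataclass because cached_property writes
        # to __dict__ directly rather than going through __setattr__.
        return f"fixed[{self.length}]"


a = FixedType(16)
b = FixedType(16)
same = a == b  # the generated __eq__ compares all fields
```

Because the class is frozen, assigning to `a.length` raises `dataclasses.FrozenInstanceError`, and instances are hashable, so they can be shared and reused safely.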
Szehon Ho [Tue, 17 May 2022 20:04:49 +0000 (13:04 -0700)]
Spec: Add note about reserved field id 141 in manifests (#4750)
Fokko Driesprong [Tue, 17 May 2022 19:45:56 +0000 (21:45 +0200)]
Python: Test for warning of PyLint (#4791)
* Python: Test for warning of PyLint
* Add ignore rule for overriding constructor
Ryan Blue [Tue, 17 May 2022 17:42:12 +0000 (10:42 -0700)]
Core: Add OAuth2 to REST catalog spec (#4771)
* Core: Add OAuth2 to REST catalog spec.
* Fix OAuth2 requests and responses.
* Add descriptions for token params.
* Core: Add support for form encoding to RESTClient.
* Clarify OAuth2 getToken response fields.
* Rename route to use a plural.
* Ensure headers are always added in HTTPClient.
* Add JWT token type.
* Refactor OAuth in OpenAPI doc.
JustinLee [Tue, 17 May 2022 16:43:16 +0000 (00:43 +0800)]
Flink: Fix comment in BaseDeltaTaskWriter.java (#4788)
liliwei [Tue, 17 May 2022 16:42:30 +0000 (00:42 +0800)]
Docs: Add zstd for Avro to configuration page (#4790)
liliwei [Tue, 17 May 2022 16:41:35 +0000 (00:41 +0800)]
Flink: Port #4752 to Flink 1.14 & 1.13 (#4789)
emkornfield [Tue, 17 May 2022 16:34:51 +0000 (09:34 -0700)]
Spec: Clarify manifest length is size in bytes, fix a typo (#4793)
Ajantha Bhat [Tue, 17 May 2022 12:18:07 +0000 (17:48 +0530)]
Fix typo in revapi.yml (#4778)
Ryan Blue [Tue, 17 May 2022 01:52:05 +0000 (18:52 -0700)]
API: Add SessionCatalog interface and base class (#4773)
Anton Okolnychyi [Mon, 16 May 2022 18:10:02 +0000 (11:10 -0700)]
Core: Add content and delete file counts to manifest tables (#4764)
Piotr Findeisen [Mon, 16 May 2022 18:09:08 +0000 (20:09 +0200)]
Remove redundant assignment (#4755)
`refresh()` updates `this.base`.
liliwei [Mon, 16 May 2022 12:07:46 +0000 (20:07 +0800)]
Docs: Updated version support description for Flink (#4756)
Kyle Bendickson [Sun, 15 May 2022 19:21:59 +0000 (12:21 -0700)]
Core: Accept all API changes in master (#4770)
Kyle Bendickson [Sat, 14 May 2022 18:33:50 +0000 (11:33 -0700)]
Build: Add binary compatibility checks via revapi gradle plugin (#4638)
Anton Okolnychyi [Sat, 14 May 2022 02:09:40 +0000 (19:09 -0700)]
Core: Align snapshot summary property names for delete files (#4766)
Anton Okolnychyi [Sat, 14 May 2022 00:00:09 +0000 (17:00 -0700)]
Spark 3.2: Clean static vars in SparkTableUtil (#4765)
Fokko Driesprong [Fri, 13 May 2022 23:23:53 +0000 (01:23 +0200)]
Python: Add visitor to build Accessor for a Schema (#4685)
Fokko Driesprong [Fri, 13 May 2022 23:09:03 +0000 (01:09 +0200)]
Python: Add spellcheck to the CI (#4730)
jun-he [Fri, 13 May 2022 22:59:36 +0000 (15:59 -0700)]
Python: Add VoidTransform (#4727)
Anton Okolnychyi [Fri, 13 May 2022 20:09:58 +0000 (13:09 -0700)]
Spark 3.2: Avoid reflection to load metadata tables in SparkTableUtil (#4758)
Steven Zhen Wu [Fri, 13 May 2022 19:55:57 +0000 (12:55 -0700)]
API: Introduce a new IncrementalAppendScan interface (#4580)
southernriver [Fri, 13 May 2022 17:51:16 +0000 (01:51 +0800)]
Core: Optimize when not deleting oldest metadata files after commit (#4651)
William Hyun [Fri, 13 May 2022 17:35:23 +0000 (10:35 -0700)]
ORC: Upgrade to 1.7.4 (#4573)
liliwei [Fri, 13 May 2022 16:07:43 +0000 (00:07 +0800)]
Flink 1.15: Call getCatalogTable directly (#4752)
Mingliang Liu [Fri, 13 May 2022 16:03:19 +0000 (09:03 -0700)]
Flink 1.15: Improve unit tests for sink (#4699)
Kyle Bendickson [Thu, 12 May 2022 23:19:35 +0000 (16:19 -0700)]
[SPEC] Fix typo in REST Catalog OpenAPI Spec (#4761)
Ryan Blue [Thu, 12 May 2022 19:09:24 +0000 (12:09 -0700)]
API: Add expression sanitizer, sanitize scan log messages (#4672)
Ajantha Bhat [Tue, 10 May 2022 23:34:03 +0000 (05:04 +0530)]
Spark: Allow use-caching to be disabled in RewriteManifestsProcedure (#4722)
Xianyang Liu [Tue, 10 May 2022 20:16:22 +0000 (04:16 +0800)]
Spark: Support parallel clean up for abort (#4704)
Co-authored-by: xianyangliu <xianyangliu@tencent.com>
Fokko Driesprong [Tue, 10 May 2022 02:34:09 +0000 (04:34 +0200)]
Python: test_schema_find_column_name was defined twice (#4729)
Ulrich Konrad [Tue, 10 May 2022 02:17:42 +0000 (04:17 +0200)]
Spark 3.2: Remove Thread.sleep in TestRemoveOrphanFilesAction (#4711)
Yi Tang [Tue, 10 May 2022 02:16:43 +0000 (10:16 +0800)]
Flink 1.15: Migrate back to use ReusableArrayData (#4712)
Kyle Bendickson [Tue, 10 May 2022 02:13:41 +0000 (19:13 -0700)]
Flink: Backport flaky test fix for limit 1 query in TestFlinkTableSource (#4724)
Fokko Driesprong [Tue, 10 May 2022 02:09:06 +0000 (04:09 +0200)]
Python: Remove mypy error in Transform (#4728)
Fokko Driesprong [Tue, 10 May 2022 02:06:08 +0000 (04:06 +0200)]
Python: Skip missing tox interpreters (#4731)
Kyle Bendickson [Tue, 10 May 2022 01:59:49 +0000 (18:59 -0700)]
Core: Support table rename in RESTCatalog (#4448)
Russell Spitzer [Mon, 9 May 2022 22:09:34 +0000 (17:09 -0500)]
Spark 2.4 - 3.X: Backport Fix CommitStateUnknown Handling (#4687) (#4719)
Kyle Bendickson [Mon, 9 May 2022 17:40:35 +0000 (10:40 -0700)]
Core: Add JSON parser for UpdateRequirement (#4693)
Prashant Singh [Mon, 9 May 2022 17:12:00 +0000 (22:42 +0530)]
Core: Fix query failure when using projection on top of partitions metadata table (#4720)
Co-authored-by: Prashant Singh <psinghvk@amazon.com>
Eduard Tudenhöfner [Mon, 9 May 2022 15:54:07 +0000 (17:54 +0200)]
Nessie: Use ref.hash parameter to read data at given hash (#4700)
This makes sure that the `NessieCatalog` reads data at the given
`ref.hash` (`NessieConfigConstants.CONF_NESSIE_REF_HASH`) when it's provided. If `ref.hash` is `null`, then this means
that data should be read from whatever the latest `HEAD` is.
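The selection rule described above can be sketched as follows. This is only an illustration: `hash_to_read` is a hypothetical function, not part of `NessieCatalog`.

```python
from typing import Optional


def hash_to_read(ref_hash: Optional[str], head_hash: str) -> str:
    """Hypothetical helper: pick the commit hash to read table data at."""
    # A configured ref.hash pins reads to that commit; None means latest HEAD.
    return head_hash if ref_hash is None else ref_hash
```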
Russell Spitzer [Sat, 7 May 2022 01:39:02 +0000 (20:39 -0500)]
Spark: Fix Commit State Unknown Handling (#4687)
Previously the snapshotUpdate operation could potentially throw a
CommitStateUnknownException which would cause Spark's Datasource to call
its abort method. In this case we cannot remove the underlying data files
since those data files may actually have been successfully committed. We
previously added this code to prevent the removal of metadata files but
this issue is also possible in data file cleanup.
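The behaviour can be sketched in a few lines. This is a simplified model, not the Spark integration itself: `CommitStateUnknownError`, `WriteJob`, and the `delete_file` callback are hypothetical names used only for this example.

```python
from typing import Callable, List


class CommitStateUnknownError(Exception):
    """Hypothetical stand-in for the exception named in the commit message."""


class WriteJob:
    def __init__(self, delete_file: Callable[[str], None]) -> None:
        self._delete_file = delete_file
        self._cleanup_after_failure = True

    def commit(self, files: List[str], do_commit: Callable[[List[str]], None]) -> None:
        try:
            do_commit(files)
        except CommitStateUnknownError:
            # The commit may have succeeded, so the files may be live data.
            self._cleanup_after_failure = False
            raise

    def abort(self, files: List[str]) -> None:
        # Skip cleanup when the outcome of the commit is unknown.
        if not self._cleanup_after_failure:
            return
        for path in files:
            self._delete_file(path)
```

In this sketch a definite failure still cleans up the written files, while an unknown commit state leaves them in place.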
Kyle Bendickson [Fri, 6 May 2022 23:43:51 +0000 (16:43 -0700)]
Flink 1.15 - Fix flaky test that has implicit order dependency in TestFlinkTableSource (#4697)
Szehon Ho [Fri, 6 May 2022 21:18:41 +0000 (14:18 -0700)]
Core: Add all_delete_files and all_files tables (#4694)
Rajarshi Sarkar [Thu, 5 May 2022 15:28:32 +0000 (20:58 +0530)]
Spark 3.x: Support rewrite data files with starting sequence number (#4701) Backports (#3480)
Backport of #3480
Kyle Bendickson [Thu, 5 May 2022 02:57:56 +0000 (19:57 -0700)]
Core: Fix incorrect null check in JsonUtils::getIntOrNull and JsonUtils::getLongOrNull (#4696)
Szehon Ho [Wed, 4 May 2022 23:06:02 +0000 (16:06 -0700)]
Core: Fix filter pushdown for Partitions table with evolved specs (#4637)
Eduard Tudenhöfner [Wed, 4 May 2022 19:53:23 +0000 (21:53 +0200)]
Nessie: Support Namespace properties (#4610)
Kyle Bendickson [Wed, 4 May 2022 18:43:59 +0000 (11:43 -0700)]
License: Add Apache HTTPComponents to runtime licenses (#4688)
Samuel Redai [Wed, 4 May 2022 18:37:13 +0000 (14:37 -0400)]
Python: Add UnboundReference class (#4679)
Robert Stupp [Wed, 4 May 2022 16:17:06 +0000 (18:17 +0200)]
Bump Nessie to 0.28.0 and adapt test code (#4594)
Enhances Iceberg commit conflict detection by maintaining the commit-id from which a
table metadata has been loaded, to use it as the "expected hash" in a Nessie commit.
Makes Nessie specific properties available in `TableMetadata` properties:
* `nessie.content.id` - the Nessie `Content.getId()`
* `nessie.commit.id` - the commit ID used to retrieve the table metadata
* `nessie.reference.name` - the reference name from which the table metadata has been loaded
Also fixes an issue via `org.apache.iceberg.nessie.NessieTableOperations#loadTableMetadata`
that caused too many "previous files", because the `TableMetadata.buildFrom()` assumed that
it's only called for ongoing modifications.
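The "expected hash" check is a form of optimistic concurrency. The toy model below illustrates the idea only. `ToyBranch` and `CommitConflictError` are hypothetical and do not reflect the Nessie client API.

```python
class CommitConflictError(Exception):
    """Hypothetical error raised when the branch moved since metadata was loaded."""


class ToyBranch:
    """Toy model of a reference whose head advances with each commit."""

    def __init__(self, head: str) -> None:
        self.head = head

    def commit(self, expected_hash: str, new_hash: str) -> None:
        # Reject the commit if another writer advanced the branch after the
        # table metadata was loaded at expected_hash.
        if expected_hash != self.head:
            raise CommitConflictError(
                f"expected {expected_hash}, but head is {self.head}"
            )
        self.head = new_hash
```

A writer records the commit ID when it loads the table metadata and passes it back as `expected_hash`. A second writer holding the same stale ID is rejected instead of silently overwriting the first commit.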
Yufei Gu [Wed, 4 May 2022 15:45:34 +0000 (08:45 -0700)]
Hive: Log new metadata location in commit (#4681)
jun-he [Wed, 4 May 2022 15:43:27 +0000 (08:43 -0700)]
Python: Add UnknownTransform (#4684)
felixYyu [Wed, 4 May 2022 15:13:59 +0000 (23:13 +0800)]
Core: Prefer ImmutableList.of to Collections.emptyList (#4691)
felixYyu [Wed, 4 May 2022 15:12:57 +0000 (23:12 +0800)]
Core: Add missing override annotations (#4690)
Kyle Bendickson [Tue, 3 May 2022 15:53:01 +0000 (08:53 -0700)]
Flink: Add support for Flink 1.15 (#4553)
Kyle Bendickson [Tue, 3 May 2022 03:02:58 +0000 (20:02 -0700)]
Core: Add serialization for AddPartitionSpec, AddSortOrder (#4668)
Ashish Singh [Mon, 2 May 2022 23:49:02 +0000 (16:49 -0700)]
Flink: Support malformed Parquet lists added by addFiles API (#4555)
Kyle Bendickson [Mon, 2 May 2022 23:38:00 +0000 (16:38 -0700)]
Flink 1.13: Backport fix for order dependent flink table source tests (#4682)
Samuel Redai [Mon, 2 May 2022 23:17:49 +0000 (19:17 -0400)]
Python: Add a skeleton for the BuildPositionAccessors visitor (#4678)
Co-authored-by: Fokko Driesprong <fokko@apache.org>
Mingliang Liu [Mon, 2 May 2022 20:58:52 +0000 (13:58 -0700)]
Core: Fix ConvertEqualityDeleteStrategy::options return type (#4669)
Ryan Blue [Mon, 2 May 2022 19:26:48 +0000 (12:26 -0700)]
Core: Fix table corruption from OOM during commit cleanup (#4673)
Chen, Junjie [Mon, 2 May 2022 19:06:53 +0000 (03:06 +0800)]
Core: Add eq and pos delete file counts to snapshot summary (#4677)
Samuel Redai [Mon, 2 May 2022 18:57:59 +0000 (14:57 -0400)]
Python: Specify pip>=21.1 in tox dependencies (#4676)
Andre [Fri, 29 Apr 2022 22:49:35 +0000 (15:49 -0700)]
Spark: Metadata-only delete should throw ValidationException instead of IllegalArgumentException (#4630)
Edgar Rodriguez [Fri, 29 Apr 2022 20:31:41 +0000 (16:31 -0400)]
Spark: Fix NPEs in Spark value converter (#4663)
Limian (Raymond) Zhang [Fri, 29 Apr 2022 20:28:15 +0000 (13:28 -0700)]
Docs: Default value support feature specification (#4301)
Kyle Bendickson [Fri, 29 Apr 2022 16:25:32 +0000 (09:25 -0700)]
Flink 1.14: Fix order-dependent FlinkTableSource tests that break in 1.15 (#4635)
Steven Zhen Wu [Fri, 29 Apr 2022 16:22:38 +0000 (09:22 -0700)]
Spark: Depend on Parquet directly for Spark modules (#4667)
Eduard Tudenhöfner [Fri, 29 Apr 2022 01:49:10 +0000 (03:49 +0200)]
Spec: Add/update required dialect field in View metadata example (#4648)
Kyle Bendickson [Fri, 29 Apr 2022 01:48:16 +0000 (18:48 -0700)]
Infra: Remove Szehon from collaborators (now a committer) (#4643)
Kyle Bendickson [Fri, 29 Apr 2022 01:46:59 +0000 (18:46 -0700)]
Core: Rename test class to match ConfigResponse (#4653)
Ryan Blue [Thu, 28 Apr 2022 17:06:34 +0000 (10:06 -0700)]
Spark 3.2: Add property to allow disabling HiddenPathFilter in RemoveOrphansFiles (#4655)
Co-authored-by: Ulrich Konrad <u.konrad@kasasi.de>
Ryan Blue [Thu, 28 Apr 2022 16:14:10 +0000 (09:14 -0700)]
Core: Add ByteBufferInputStream and implementations (#4623)
Samuel Redai [Wed, 27 Apr 2022 21:21:09 +0000 (17:21 -0400)]
Python: Update docs to avoid InterpreterNotFound errors from tox (#4650)
Kyle Bendickson [Wed, 27 Apr 2022 18:49:56 +0000 (11:49 -0700)]
Core: Add AddSchema, SetCurrentSchema, and SetDefaultPartitionSpec serialization (#4632)
Kyle Bendickson [Wed, 27 Apr 2022 18:37:10 +0000 (11:37 -0700)]
Core: Add TableMetadata to REST serializers (#4641)
Kyle Bendickson [Wed, 27 Apr 2022 18:10:21 +0000 (11:10 -0700)]
Core - Fix field names in LoadTableResponse object to match REST spec (#4642)
Kyle Bendickson [Wed, 27 Apr 2022 18:09:04 +0000 (11:09 -0700)]
Core - Use singleton RESTObjectMapper in REST RequestResponseTestBase class (#4636)
* Core - Use the dedicated RESTObjectMapper in REST request response test base class
* Update tests that previously threw on unrecognized fields, as the object mapper is currently permissive; all but one of these cases fail validation anyway
* Fix last failing check