hudi.git
85 min ago[HUDI-3953]Flink Hudi module should support low-level source and sink api (#5445) master
JerryYue-M [Sat, 2 Jul 2022 00:38:46 +0000 (08:38 +0800)] 
[HUDI-3953]Flink Hudi module should support low-level source and sink api (#5445)

Co-authored-by: jerryyue <jerryyue@didiglobal.com>
31 hours ago[HUDI-3634] Could read empty or partial HoodieCommitMetaData in downstream if using...
RexAn [Thu, 30 Jun 2022 18:07:40 +0000 (02:07 +0800)] 
[HUDI-3634] Could read empty or partial HoodieCommitMetaData in downstream if using HDFS (#5048)

Add HDFS-specific logic for creating immutable files: first create file.tmp, then rename it to the final file name.
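
The write-then-rename pattern described above can be sketched on a local filesystem. This is a minimal illustration, not Hudi's HDFS code: the function name `write_immutable` and the `.tmp` suffix handling are assumptions made for the example, and `os.replace` stands in for the HDFS rename.

```python
import os


def write_immutable(path: str, data: bytes) -> None:
    """Write data so that readers see either no file or the complete file."""
    tmp_path = path + ".tmp"  # hypothetical temp name, mirroring "file.tmp"
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes are on disk before renaming
    # Renaming within one filesystem swaps the name in a single step,
    # so a downstream reader never observes a partially written file.
    os.replace(tmp_path, path)
```

A reader that opens the final path only ever sees the complete content, because the final name does not exist until the rename happens.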

33 hours ago[HUDI-3984] Remove mandatory check of partiton path for cli command (#5458)
miomiocat [Thu, 30 Jun 2022 17:00:13 +0000 (01:00 +0800)] 
[HUDI-3984] Remove mandatory check of partiton path for cli command (#5458)

37 hours ago[HUDI-4285] add ByteBuffer#rewind after ByteBuffer#get in AvroDeseria… (#5907)
komao [Thu, 30 Jun 2022 12:48:50 +0000 (20:48 +0800)] 
[HUDI-4285] add ByteBuffer#rewind after ByteBuffer#get in AvroDeseria… (#5907)

* [HUDI-4285] add ByteBuffer#rewind after ByteBuffer#get in AvroDeserializer

* add ut

Co-authored-by: wangzixuan.wzxuan <wangzixuan.wzxuan@bytedance.com>
47 hours ago[HUDI-4346] Fix params not update BULKINSERT_ARE_PARTITIONER_RECORDS_SORTED (#5999)
RexAn [Thu, 30 Jun 2022 02:26:00 +0000 (10:26 +0800)] 
[HUDI-4346] Fix params not update BULKINSERT_ARE_PARTITIONER_RECORDS_SORTED (#5999)

47 hours ago[MINOR] Following #2070, Fix BindException when running tests on shared machines...
cxzl25 [Thu, 30 Jun 2022 02:20:59 +0000 (10:20 +0800)] 
[MINOR] Following #2070, Fix BindException when running tests on shared machines. (#5951)

2 days ago[HUDI-4336] Fix records overwritten bug with binary primary key (#5996)
luoyajun [Thu, 30 Jun 2022 01:12:00 +0000 (09:12 +0800)] 
[HUDI-4336] Fix records overwritten bug with binary primary key (#5996)

2 days ago[HUDI-4331] Allow loading external config file from class loader (#5987)
wenningd [Thu, 30 Jun 2022 00:04:34 +0000 (17:04 -0700)] 
[HUDI-4331] Allow loading external config file from class loader (#5987)

Co-authored-by: Wenning Ding <wenningd@amazon.com>
2 days ago[MINOR] Make CLI 'commit rollback' using rollbackUsingMarkers false as default (...
YueZhang [Wed, 29 Jun 2022 17:12:46 +0000 (01:12 +0800)] 
[MINOR] Make CLI 'commit rollback' using rollbackUsingMarkers false as default (#5174)

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2 days ago[HUDI-1575] Claim RFC-56: Early Conflict Detection For Multi-writer (#6002)
YueZhang [Wed, 29 Jun 2022 08:43:31 +0000 (16:43 +0800)] 
[HUDI-1575] Claim RFC-56: Early Conflict Detection For Multi-writer (#6002)

Co-authored-by: yuezhang <yuezhang@yuezhang-mac.freewheelmedia.net>
3 days ago[HUDI-4334] close SparkRDDWriteClient after usage in Create/Delete/RollbackSavepoints...
Teng [Tue, 28 Jun 2022 22:13:29 +0000 (06:13 +0800)] 
[HUDI-4334] close SparkRDDWriteClient after usage in Create/Delete/RollbackSavepointsProcedure (#5994)

3 days ago[HUDI-1176] Upgrade hudi to log4j2 (#5366)
bschell [Tue, 28 Jun 2022 19:54:23 +0000 (14:54 -0500)] 
[HUDI-1176] Upgrade hudi to log4j2 (#5366)

* Move to log4j2

cr: https://code.amazon.com/reviews/CR-71010705

* Upgrade unit tests to log4j2

* update exclusion

Co-authored-by: Brandon Scheller <bschelle@amazon.com>
3 days ago[HUDI-4320] Make sure `HoodieStorageConfig.PARQUET_WRITE_LEGACY_FORMAT_ENABLED` could...
Alexey Kudinkin [Tue, 28 Jun 2022 19:27:32 +0000 (12:27 -0700)] 
[HUDI-4320] Make sure `HoodieStorageConfig.PARQUET_WRITE_LEGACY_FORMAT_ENABLED` could be specified by the writer (#5970)

Fixed the sequence that determines whether Parquet's legacy-format writing property should be overridden, so that the override only kicks in when the property has not been explicitly specified by the caller
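
The "only override when not explicitly specified" rule can be sketched as follows. This is an illustrative sketch, not the Hudi implementation: the key name `example.parquet.legacy.format` and the function `resolve_legacy_format` are hypothetical.

```python
# Hypothetical property key, used only for this illustration.
LEGACY_FORMAT_KEY = "example.parquet.legacy.format"


def resolve_legacy_format(writer_props: dict, schema_needs_legacy: bool) -> dict:
    """Return a copy of the properties with the automatic override applied.

    The automatic value is only filled in when the caller did not set the
    property; an explicit caller value always wins.
    """
    props = dict(writer_props)
    if LEGACY_FORMAT_KEY not in props and schema_needs_legacy:
        props[LEGACY_FORMAT_KEY] = "true"
    return props
```

With this ordering, a caller who explicitly sets the property to "false" keeps that value even when the automatic check would have chosen "true".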

3 days ago[HUDI-4332] The current instant may be wrong under some extreme conditions in AppendW...
BruceLin [Tue, 28 Jun 2022 12:42:26 +0000 (20:42 +0800)] 
[HUDI-4332] The current instant may be wrong under some extreme conditions in AppendWriteFunction. (#5988)

3 days ago[HUDI-4333] fix HoodieFileIndex's listFiles method log print skipping percent NaN...
ForwardXu [Tue, 28 Jun 2022 07:08:48 +0000 (15:08 +0800)] 
[HUDI-4333] fix HoodieFileIndex's listFiles method log print skipping percent NaN (#5990)

4 days ago[HUDI-4325] fix spark sql procedure cause ParseException with semicolon (#5982)
KnightChess [Tue, 28 Jun 2022 01:44:41 +0000 (09:44 +0800)] 
[HUDI-4325] fix spark sql procedure cause ParseException with semicolon (#5982)

* [HUDI-4325] fix spark sql procedure cause ParseException with semicolon

4 days ago[HUDI-3506] Add call procedure for CommitsCommand (#5974)
superche [Tue, 28 Jun 2022 01:43:36 +0000 (09:43 +0800)] 
[HUDI-3506] Add call procedure for CommitsCommand (#5974)

* [HUDI-3506] Add call procedure for CommitsCommand

Co-authored-by: superche <superche@tencent.com>
4 days ago[HUDI-4291] Fix flaky TestCleanPlanExecutor#testKeepLatestFileVersions (#5930)
Sagar Sumit [Mon, 27 Jun 2022 11:57:16 +0000 (17:27 +0530)] 
[HUDI-4291] Fix flaky TestCleanPlanExecutor#testKeepLatestFileVersions (#5930)

4 days ago[HUDI-4311] Fix Flink lose data on some rollback scene (#5950)
吴祥平 [Mon, 27 Jun 2022 08:09:44 +0000 (16:09 +0800)] 
[HUDI-4311] Fix Flink lose data on some rollback scene (#5950)

4 days ago[HUDI-3504] Support bootstrap command based on Call Produce Command (#5977)
ForwardXu [Mon, 27 Jun 2022 05:06:50 +0000 (13:06 +0800)] 
[HUDI-3504] Support bootstrap command based on Call Produce Command (#5977)

4 days ago[HUDI-4315] Do not throw exception in BaseSpark3Adapter#toTableIdentifier (#5957)
leesf [Mon, 27 Jun 2022 04:50:58 +0000 (12:50 +0800)] 
[HUDI-4315] Do not throw exception in BaseSpark3Adapter#toTableIdentifier (#5957)

4 days ago[HUDI-4316] Support for spillable diskmap configuration when constructing HoodieMerge...
cxzl25 [Mon, 27 Jun 2022 03:09:30 +0000 (11:09 +0800)] 
[HUDI-4316] Support for spillable diskmap configuration when constructing HoodieMergedLogRecordScanner (#5959)

5 days ago[HUDI-4309] Spark3.2 custom parser should not throw exception (#5947)
cxzl25 [Mon, 27 Jun 2022 01:37:23 +0000 (09:37 +0800)] 
[HUDI-4309] Spark3.2 custom parser should not throw exception (#5947)

5 days ago[HUDI-5246] Bumping mysql connector version due to security vulnerability (#5851)
Sivabalan Narayanan [Sun, 26 Jun 2022 23:54:57 +0000 (16:54 -0700)] 
[HUDI-5246] Bumping mysql connector version due to security vulnerability (#5851)

5 days ago[MINOR] Remove -T option from CI build (#5972)
Shiyan Xu [Sun, 26 Jun 2022 15:34:05 +0000 (10:34 -0500)] 
[MINOR] Remove -T option from CI build (#5972)

5 days ago[HUDI-3502] Support hdfs parquet import command based on Call Produce Command (#5956)
ForwardXu [Sun, 26 Jun 2022 03:27:14 +0000 (11:27 +0800)] 
[HUDI-3502] Support hdfs parquet import command based on Call Produce Command (#5956)

6 days ago[HUDI-4296] Fix the bug that TestHoodieSparkSqlWriter.testSchemaEvolutionForTableType...
xiarixiaoyao [Sat, 25 Jun 2022 13:03:19 +0000 (21:03 +0800)] 
[HUDI-4296] Fix the bug that TestHoodieSparkSqlWriter.testSchemaEvolutionForTableType is flaky (#5973)

6 days ago[HUDI-4319] Fixed Parquet's `PLAIN_DICTIONARY` encoding not being applied when bulk...
Alexey Kudinkin [Sat, 25 Jun 2022 03:52:28 +0000 (20:52 -0700)] 
[HUDI-4319] Fixed Parquet's `PLAIN_DICTIONARY` encoding not being applied when bulk-inserting (#5966)

* Fixed Dictionary encoding config not being properly propagated to the Parquet writer (so the encoding could not be applied, which substantially bloated the storage footprint)

6 days agoRevert "[TEST][DO_NOT_MERGE]fix random failed for ci (#5948)" (#5971)
xiarixiaoyao [Sat, 25 Jun 2022 03:23:17 +0000 (11:23 +0800)] 
Revert "[TEST][DO_NOT_MERGE]fix random failed for ci (#5948)" (#5971)

This reverts commit e8fbd4daf49802f60f800ccc92e66369d44f07f6.

6 days ago[TEST][DO_NOT_MERGE]fix random failed for ci (#5948)
xiarixiaoyao [Sat, 25 Jun 2022 02:15:08 +0000 (10:15 +0800)] 
[TEST][DO_NOT_MERGE]fix random failed for ci (#5948)

7 days ago[HUDI-3512] Add call procedure for StatsCommand (#5955)
jiz [Sat, 25 Jun 2022 01:43:23 +0000 (09:43 +0800)] 
[HUDI-3512] Add call procedure for StatsCommand (#5955)

Co-authored-by: zhanshaoxiong <shaoxiong0001@@gmail.com>
7 days ago[HUDI-4260] Change KEYGEN_CLASS_NAME without default value (#5877)
luokey [Fri, 24 Jun 2022 07:05:03 +0000 (15:05 +0800)] 
[HUDI-4260] Change KEYGEN_CLASS_NAME without default value (#5877)

* Change KEYGEN_CLASS_NAME without default value

Co-authored-by: 854194341@qq.com <loukey_7821>
7 days ago[HUDI-3735] TestHoodieSparkMergeOnReadTableRollback is flaky (#5874)
xi chaomin [Fri, 24 Jun 2022 06:47:36 +0000 (14:47 +0800)] 
[HUDI-3735] TestHoodieSparkMergeOnReadTableRollback is flaky (#5874)

7 days ago[HUDI-4273] Support inline schedule clustering for Flink stream (#5890)
Zhaojing Yu [Fri, 24 Jun 2022 03:28:06 +0000 (11:28 +0800)] 
[HUDI-4273] Support inline schedule clustering for Flink stream (#5890)

* [HUDI-4273] Support inline schedule clustering for Flink stream

* delete deprecated clustering plan strategy and add clustering ITTest

7 days ago[HUDI-3509] Add call procedure for HoodieLogFileCommand (#5949)
jiz [Fri, 24 Jun 2022 02:16:54 +0000 (10:16 +0800)] 
[HUDI-3509] Add call procedure for HoodieLogFileCommand (#5949)

Co-authored-by: zhanshaoxiong <jiimmyzhan@tencent.com>
8 days ago[HUDI-4290] Fix fetchLatestBaseFiles to filter replaced filegroups (#5941)
Sagar Sumit [Thu, 23 Jun 2022 14:10:08 +0000 (19:40 +0530)] 
[HUDI-4290] Fix fetchLatestBaseFiles to filter replaced filegroups (#5941)

* [HUDI-4290] Fix fetchLatestBaseFiles to filter replaced filegroups

* Separate out incremental sync fsview test with clustering

8 days ago[HUDI-4299] Fix problem about hudi-example-java run failed on idea. (#5936)
Forus [Thu, 23 Jun 2022 13:46:22 +0000 (21:46 +0800)] 
[HUDI-4299] Fix problem about hudi-example-java run failed on idea. (#5936)

9 days ago[HUDI-3508] Add call procedure for FileSystemViewCommand (#5929)
jiz [Wed, 22 Jun 2022 09:50:20 +0000 (17:50 +0800)] 
[HUDI-3508] Add call procedure for FileSystemViewCommand (#5929)

* [HUDI-3508] Add call procedure for FileSystemView

* minor

Co-authored-by: jiimmyzhan <jiimmyzhan@tencent.com>
9 days ago[minor] following 4270, add unit tests for the keys lost case (#5918)
Danny Chan [Wed, 22 Jun 2022 08:56:06 +0000 (16:56 +0800)] 
[minor] following 4270, add unit tests for the keys lost case (#5918)

9 days ago[HUDI-4279] Strength the remote fs view lagging check when latest commit refresh...
LinMingQiang [Wed, 22 Jun 2022 02:32:21 +0000 (10:32 +0800)] 
[HUDI-4279] Strength the remote fs view lagging check when latest commit refresh is enabled (#5917)

Signed-off-by: LinMingQiang <1356469429@qq.com>
10 days agoRevert master (#5925)
Zhaojing Yu [Tue, 21 Jun 2022 08:58:50 +0000 (16:58 +0800)] 
Revert master (#5925)

* Revert "udate"

This reverts commit 092e35c1e300f1eb1a7474136826fed26bc10ccd.

* Revert "[HUDI-3475] Initialize hudi table management module."

This reverts commit 4640a3bbb8e212030f94848a0112784d98772de8.

10 days agoudate 4872/head
喻兆靖 [Tue, 21 Jun 2022 07:22:04 +0000 (15:22 +0800)] 
udate

10 days ago[HUDI-3475] Initialize hudi table management module.
喻兆靖 [Wed, 8 Jun 2022 01:54:31 +0000 (09:54 +0800)] 
[HUDI-3475] Initialize hudi table management module.

10 days ago[HUDI-4270] Bootstrap op data loading missing (#5888)
Bo Cui [Tue, 21 Jun 2022 03:47:39 +0000 (11:47 +0800)] 
[HUDI-4270] Bootstrap op data loading missing (#5888)

10 days ago[HUDI-4177] Fix hudi-cli rollback with rollbackUsingMarkers method call (#5734)
Shawn Chang [Tue, 21 Jun 2022 02:54:12 +0000 (19:54 -0700)] 
[HUDI-4177] Fix hudi-cli rollback with rollbackUsingMarkers method call (#5734)

* Fix hudi-cli rollback with rollbackUsingMarkers method call
* Add test for hudi-cli rollbackUsingMarkers

Co-authored-by: Shawn Chang <yxchang@amazon.com>
11 days ago[HUDI-4251] Fix the problem that the command 'commits sync' description does not...
Forus [Mon, 20 Jun 2022 23:03:58 +0000 (07:03 +0800)] 
[HUDI-4251] Fix the problem that the command 'commits sync' description does not match. (#5881)

11 days ago[HUDI-4173] Fix wrong results if the user read no base files hudi table by glob paths...
RexAn [Mon, 20 Jun 2022 17:32:34 +0000 (01:32 +0800)] 
[HUDI-4173] Fix wrong results if the user read no base files hudi table by glob paths (#5723)

11 days ago[MINOR] Update DOAP with 0.11.1 Release (#5908)
Y Ethan Guo [Mon, 20 Jun 2022 16:27:35 +0000 (09:27 -0700)] 
[MINOR] Update DOAP with 0.11.1 Release (#5908)

11 days ago[HUDI-4258] Fix when HoodieTable removes data file before the end of Flink job (...
Alexander Trushev [Mon, 20 Jun 2022 09:07:49 +0000 (16:07 +0700)] 
[HUDI-4258] Fix when HoodieTable removes data file before the end of Flink job (#5876)

* [HUDI-4258] Fix when HoodieTable removes data file before the end of Flink job

11 days ago[HUDI-4259] Flink create avro schema not conformance to standards (#5878)
luokey [Mon, 20 Jun 2022 07:41:23 +0000 (15:41 +0800)] 
[HUDI-4259]  Flink create avro schema not conformance to standards (#5878)

* flink create avro schema not conformance to standards

Co-authored-by: 854194341@qq.com <loukey_7821>
11 days agofix remove redundant Variable (#5806)
felixYyu [Mon, 20 Jun 2022 07:21:49 +0000 (15:21 +0800)] 
fix remove redundant Variable (#5806)

11 days ago[HUDI-4277] supoort flink table source with computed column (#5897)
Shizhi Chen [Mon, 20 Jun 2022 07:19:32 +0000 (15:19 +0800)] 
[HUDI-4277] supoort flink table source with computed column (#5897)

Co-authored-by: chenshizhi <chenshizhi@bilibili.com>
11 days ago[MINOR] Add "spillable_map_path" in FlinkCompactionConfig. To avoid the disk space...
5herhom [Mon, 20 Jun 2022 07:15:23 +0000 (15:15 +0800)] 
[MINOR] Add "spillable_map_path" in FlinkCompactionConfig. To avoid the disk space of "/tmp" full when compacting offline. (#5905)

11 days ago[HUDI-4275] Refactor rollback inflight instant for clustering/compaction to reuse...
huberylee [Mon, 20 Jun 2022 06:29:21 +0000 (14:29 +0800)] 
[HUDI-4275] Refactor rollback inflight instant for clustering/compaction to reuse some code (#5894)

12 days ago[HUDI-3507] Support export command based on Call Produce Command (#5901)
ForwardXu [Sun, 19 Jun 2022 10:48:22 +0000 (18:48 +0800)] 
[HUDI-3507] Support export command based on Call Produce Command (#5901)

2 weeks ago[HUDI-4165] Support Create/Drop/Show/Refresh Index Syntax for Spark SQL (#5761)
huberylee [Fri, 17 Jun 2022 10:33:58 +0000 (18:33 +0800)] 
[HUDI-4165] Support Create/Drop/Show/Refresh Index Syntax for Spark SQL (#5761)

* Support Create/Drop/Show/Refresh Index Syntax for Spark SQL

2 weeks ago[HUDI-4265] Deprecate useless targetTableName parameter in HoodieMultiTableDeltaStrea...
董可伦 [Fri, 17 Jun 2022 08:57:14 +0000 (16:57 +0800)] 
[HUDI-4265] Deprecate useless targetTableName parameter in HoodieMultiTableDeltaStreamer (#5883)

2 weeks ago[HUDI-4214] improve repeat init write schema in ExpressionPayload (#5820)
KnightChess [Thu, 16 Jun 2022 09:58:37 +0000 (17:58 +0800)] 
[HUDI-4214] improve repeat init write schema in ExpressionPayload (#5820)

* [HUDI-4214] improve repeat init write schema in ExpressionPayload

2 weeks ago[HUDI-4217] improve repeat init object in ExpressionPayload (#5825)
KnightChess [Wed, 15 Jun 2022 12:21:28 +0000 (20:21 +0800)] 
[HUDI-4217] improve repeat init object in ExpressionPayload (#5825)

2 weeks ago[HUDI-4218] [HUDI-4218] Expose the real exception information when an exception occur...
董可伦 [Wed, 15 Jun 2022 10:10:35 +0000 (18:10 +0800)] 
[HUDI-4218] [HUDI-4218] Expose the real exception information when an exception occurs in the tableExists method (#5827)

2 weeks ago[HUDI-3499] Add Call Procedure for show rollbacks (#5848)
superche [Wed, 15 Jun 2022 08:50:15 +0000 (16:50 +0800)] 
[HUDI-3499] Add Call Procedure for show rollbacks (#5848)

* Add Call Procedure for show rollbacks

* fix

* add ut for show_rollback_detail and exception handle

Co-authored-by: superche <superche@tencent.com>
2 weeks ago[HUDI-4255] Make the flink merge and replace handle intermediate file visible (#5866)
Danny Chan [Wed, 15 Jun 2022 06:23:23 +0000 (14:23 +0800)] 
[HUDI-4255] Make the flink merge and replace handle intermediate file visible (#5866)

2 weeks ago[minor] Following HUDI-4207, remote the new wrapper #init method (#5865)
Danny Chan [Wed, 15 Jun 2022 00:48:13 +0000 (08:48 +0800)] 
[minor] Following HUDI-4207, remote the new wrapper #init method (#5865)

2 weeks ago[MINOR] Fix typo of DisruptorExecutor in RFC 53 (#5860)
felixYyu [Tue, 14 Jun 2022 06:30:17 +0000 (14:30 +0800)] 
[MINOR] Fix typo of DisruptorExecutor in RFC 53 (#5860)

2 weeks ago[HUDI-4207] HoodieFlinkWriteClient.getOrCreateWriteHandle throws an e… (#5788)
HunterXHunter [Mon, 13 Jun 2022 14:36:06 +0000 (22:36 +0800)] 
[HUDI-4207] HoodieFlinkWriteClient.getOrCreateWriteHandle throws an e… (#5788)

Add more logs to help debug exceptions thrown by HoodieFlinkWriteClient.getOrCreateWriteHandle

2 weeks ago[HUDI-4006] failOnDataLoss on delta-streamer kafka sources (#5718)
Qi Ji [Mon, 13 Jun 2022 14:31:57 +0000 (22:31 +0800)] 
[HUDI-4006] failOnDataLoss on delta-streamer kafka sources (#5718)

Add a new config key, hoodie.deltastreamer.source.kafka.enable.failOnDataLoss.
When failOnDataLoss=false (the default, matching current behaviour), log a warning instead of silently seeking to the earliest offset.
When failOnDataLoss is set, fail explicitly.
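
The two behaviours can be sketched as below. This is a simplified illustration under stated assumptions, not the DeltaStreamer code: `resolve_start_offset`, `DataLossError`, and the integer offsets are hypothetical stand-ins for the real Kafka offset handling.

```python
import logging

log = logging.getLogger("kafka_offsets_example")


class DataLossError(RuntimeError):
    """Raised when checkpointed offsets are no longer available (hypothetical)."""


def resolve_start_offset(checkpoint: int, earliest: int,
                         fail_on_data_loss: bool = False) -> int:
    """Pick the offset to resume from for one partition."""
    if checkpoint >= earliest:
        return checkpoint  # nothing was lost, resume where we stopped
    msg = (f"offsets {checkpoint}..{earliest - 1} are no longer available; "
           f"earliest available offset is {earliest}")
    if fail_on_data_loss:
        raise DataLossError(msg)  # fail explicitly
    log.warning(msg)  # default: warn, then fall back to earliest
    return earliest
```

The default path still falls back to the earliest offset, but it now leaves a warning in the log so the gap does not go unnoticed.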

2 weeks ago[HUDI-3863] Add UT for drop partition column in deltastreamer testsuite (#5727)
luoyajun [Mon, 13 Jun 2022 14:29:32 +0000 (22:29 +0800)] 
[HUDI-3863] Add UT for drop partition column in deltastreamer testsuite (#5727)

2 weeks ago[HUDI-3682] testReaderFilterRowKeys fails in TestHoodieOrcReaderWriter (#5790)
xi chaomin [Mon, 13 Jun 2022 14:22:12 +0000 (22:22 +0800)] 
[HUDI-3682] testReaderFilterRowKeys fails in TestHoodieOrcReaderWriter (#5790)

testReaderFilterRowKeys needs to get the key from RECORD_KEY_METADATA_FIELD, but the writer in the current UT does not populate the meta field and the schema does not contain meta fields.

This fix writes data with a schema that contains the meta fields and calls writeAvroWithMetadata for writing.

Co-authored-by: xicm <xicm@asiainfo.com>
2 weeks agoStrip extra spaces when creating new configuration (#5849)
superche [Mon, 13 Jun 2022 11:10:38 +0000 (19:10 +0800)] 
Strip extra spaces when creating new configuration (#5849)

Co-authored-by: superche <superche@tencent.com>
2 weeks ago[MINOR] fix AvroSchemaConverter duplicate branch in 'switch' (#5813)
sandyfog [Mon, 13 Jun 2022 02:55:24 +0000 (10:55 +0800)] 
[MINOR]  fix AvroSchemaConverter duplicate branch in 'switch' (#5813)

2 weeks ago[HUDI-4224] Fix CI issues (#5842)
Shiyan Xu [Sun, 12 Jun 2022 18:44:18 +0000 (11:44 -0700)] 
[HUDI-4224] Fix CI issues (#5842)

- Upgrade junit to 5.7.2
- Downgrade surefire and failsafe to 2.22.2
- Fix test failures that were previously not reported
- Improve azure pipeline configs

Co-authored-by: liujinhui1994 <965147871@qq.com>
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>
2 weeks ago[HUDI-4205] Fix NullPointerException in HFile reader creation (#5841)
Y Ethan Guo [Sat, 11 Jun 2022 21:46:43 +0000 (14:46 -0700)] 
[HUDI-4205] Fix NullPointerException in HFile reader creation (#5841)

Replace SerializableConfiguration with SerializableWritable for broadcasting the hadoop configuration before initializing HFile readers

2 weeks ago[HUDI-4223] Fix NullPointerException from getLogRecordScanner when reading metadata...
Y Ethan Guo [Sat, 11 Jun 2022 20:19:24 +0000 (13:19 -0700)] 
[HUDI-4223] Fix NullPointerException from getLogRecordScanner when reading metadata table (#5840)

When the metadata table path is explicitly specified for reading in Spark, "hoodie.metadata.enable" is overridden to true for proper read behavior.

2 weeks ago[HUDI-4221] Fixing getAllPartitionPaths perf hit w/ FileSystemBackedMetadata (#5829)
Sivabalan Narayanan [Sat, 11 Jun 2022 20:17:42 +0000 (16:17 -0400)] 
[HUDI-4221] Fixing getAllPartitionPaths perf hit w/ FileSystemBackedMetadata (#5829)

3 weeks ago[HUDI-3889] Do not validate table config if save mode is set to Overwrite (#5619)
xi chaomin [Thu, 9 Jun 2022 23:23:51 +0000 (07:23 +0800)] 
[HUDI-3889] Do not validate table config if save mode is set to Overwrite (#5619)

Co-authored-by: xicm <xicm@asiainfo.com>
3 weeks ago[HUDI-4139]improvement for flink write operator name to identify tables easily (...
yanenze [Thu, 9 Jun 2022 21:48:20 +0000 (05:48 +0800)] 
[HUDI-4139]improvement for flink write operator name to identify tables easily (#5744)

Co-authored-by: yanenze <yanenze@keytop.com.cn>
3 weeks ago[HUDI-4213] Infer keygen clazz for Spark SQL (#5815)
Danny Chan [Thu, 9 Jun 2022 12:37:58 +0000 (20:37 +0800)] 
[HUDI-4213] Infer keygen clazz for Spark SQL (#5815)

3 weeks ago[MINOR] FlinkStateBackendConverter add more exception message (#5809)
sandyfog [Thu, 9 Jun 2022 07:13:27 +0000 (15:13 +0800)] 
[MINOR] FlinkStateBackendConverter add more  exception message (#5809)

* [MINOR] FlinkStateBackendConverter add more  exception message

3 weeks ago[MINOR][DOCS] Update the README.md file in hudi-examples (#5803)
liuzhuang2017 [Thu, 9 Jun 2022 00:45:00 +0000 (08:45 +0800)] 
[MINOR][DOCS] Update the README.md file in hudi-examples (#5803)

3 weeks ago[HUDI-4178] Addressing performance regressions in Spark DataSourceV2 Integration...
Alexey Kudinkin [Tue, 7 Jun 2022 23:30:46 +0000 (16:30 -0700)] 
[HUDI-4178] Addressing performance regressions in Spark DataSourceV2 Integration (#5737)

There are multiple issues with our current DataSource V2 integration: because we advertise Hudi tables as V2, Spark expects them to implement certain APIs that are not implemented at the moment; instead, we use a custom resolution rule (in HoodieSpark3Analysis) to manually fall back to the V1 APIs. This commit fixes the issue by reverting the DSv2 APIs and making Spark use V1, except for the schema evaluation logic.

3 weeks ago[HUDI-4198] Fix hive config for AWSGlueClientFactory (#5768)
Raymond Xu [Tue, 7 Jun 2022 14:51:31 +0000 (07:51 -0700)] 
[HUDI-4198] Fix hive config for AWSGlueClientFactory (#5768)

* HiveConf needs to load fs conf to allow instantiation via AWSGlueClientFactory

* Resolve metastore uri config before loading fs conf

* Skip hiveql due to CI issue

Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
3 weeks ago[HUDI-4200] Fixing sorting of keys fetched from metadata table (#5773)
Sivabalan Narayanan [Tue, 7 Jun 2022 12:19:52 +0000 (08:19 -0400)] 
[HUDI-4200] Fixing sorting of keys fetched from metadata table (#5773)

- Keys fetched from the metadata table, especially through the base file reader, are not sorted, which may result in an NPE (key prefix search) or unnecessary seeks to the start of the HFile (full key lookups). This patch fixes that. This is not an issue with log blocks, since sorting is taken care of within HoodieHfileDataBlock.
- Commit where the sorting was mistakenly reverted: [HUDI-3760] Adding capability to fetch Metadata Records by prefix #5208
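
Why sorted lookup keys matter for a forward-only reader can be sketched with a toy model. This is not the HFile reader: `scan_with_cursor` is a hypothetical function that models a sorted file as a Python list and counts how often the cursor has to go back to the start.

```python
from typing import List, Tuple


def scan_with_cursor(file_keys: List[str],
                     lookup_keys: List[str]) -> Tuple[List[str], int]:
    """Look up keys in a sorted file with a forward-only cursor.

    Returns the keys that were found and the number of rewinds to the
    start of the file.
    """
    found, rewinds, cursor = [], 0, 0
    last_key = None
    for key in lookup_keys:
        if last_key is not None and key < last_key:
            cursor = 0  # the cursor cannot move backwards, so start over
            rewinds += 1
        while cursor < len(file_keys) and file_keys[cursor] < key:
            cursor += 1
        if cursor < len(file_keys) and file_keys[cursor] == key:
            found.append(key)
        last_key = key
    return found, rewinds
```

Passing `sorted(lookup_keys)` instead of the raw list keeps the rewind count at zero in this model, which mirrors the unnecessary seeks the patch avoids.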

3 weeks ago[MINOR][RFC-53] Fix typos (#5764)
YueZhang [Tue, 7 Jun 2022 00:28:28 +0000 (08:28 +0800)] 
[MINOR][RFC-53] Fix typos (#5764)

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
3 weeks ago[MINOR] Mark AWSGlueCatalogSyncClient experimental (#5775)
Raymond Xu [Tue, 7 Jun 2022 00:25:59 +0000 (17:25 -0700)] 
[MINOR] Mark AWSGlueCatalogSyncClient experimental (#5775)

3 weeks ago[HUDI-4171] Fixing Non partitioned with virtual keys in read path (#5747)
Sivabalan Narayanan [Mon, 6 Jun 2022 19:48:21 +0000 (15:48 -0400)] 
[HUDI-4171] Fixing Non partitioned with virtual keys in read path (#5747)

- When the non-partitioned key generator is used with virtual keys, the read path could break since the partition path may not exist.

3 weeks ago[HUDI-4197] Fix Async indexer to support building FILES partition (#5766)
Sivabalan Narayanan [Mon, 6 Jun 2022 19:47:11 +0000 (15:47 -0400)] 
[HUDI-4197] Fix Async indexer to support building FILES partition (#5766)

- When the async indexer is invoked with only the "FILES" partition, it fails; this patch fixes that case. Also, if the metadata table itself is not initialized and someone wants to build indexes via AsyncIndexer, they are expected to index the "FILES" partition first, followed by other partitions. In general, AsyncIndexer is limited to building only one index at a time. Guards have been added to ensure these conditions are met.
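
The two guards described above can be sketched as a small validation function. This is an illustration only, not the AsyncIndexer code: `validate_index_request` and the way initialized partitions are passed in are assumptions, and "COLUMN_STATS" in the usage note is just an example partition name.

```python
FILES = "FILES"


def validate_index_request(requested, initialized):
    """Return the single partition to build, or raise ValueError.

    requested:   list of partition names the caller wants to index
    initialized: set of partition names already built in the metadata table
    """
    if len(requested) != 1:
        raise ValueError("only one index partition can be built at a time")
    partition = requested[0]
    if FILES not in initialized and partition != FILES:
        raise ValueError("index the FILES partition before any other partition")
    return partition
```

For example, `validate_index_request(["COLUMN_STATS"], set())` is rejected in this sketch, while the same request succeeds once "FILES" is in the initialized set.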

3 weeks ago[HUDI-4140] Fixing hive style partitioning and default partition with bulk insert...
Sivabalan Narayanan [Mon, 6 Jun 2022 17:21:00 +0000 (13:21 -0400)] 
[HUDI-4140] Fixing hive style partitioning and default partition with bulk insert row writer with SimpleKeyGen and virtual keys (#5664)

The bulk insert row writer code path had a gap with respect to hive-style partitioning and the default partition when virtual keys are enabled with SimpleKeyGen. This patch fixes the issue.

3 weeks ago[HUDI-4176] Fixing `TableSchemaResolver` to avoid repeated `HoodieCommitMetadata...
Alexey Kudinkin [Mon, 6 Jun 2022 17:14:26 +0000 (10:14 -0700)] 
[HUDI-4176] Fixing `TableSchemaResolver` to avoid repeated `HoodieCommitMetadata` parsing (#5733)

As outlined in HUDI-4176, we hit a roadblock while testing Hudi on a large dataset (~1 TB) with pretty fat commits, where Hudi's commit metadata could reach hundreds of MB.
Given the size of some of our commit metadata instances, Spark's parsing and resolving phase (when spark.sql(...) is involved, but before the returned Dataset is dereferenced) starts to dominate some of our queries' execution time.

- Rebased onto new APIs to avoid excessive Hadoop's Path allocations
- Eliminated hasOperationField completely to avoid repetitive computations
- Cleaned up duplication in HoodieActiveTimeline
- Added caching for common instances of HoodieCommitMetadata
- Made tableStructSchema lazy
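
The caching idea in the list above can be sketched with memoization. This is not the TableSchemaResolver change itself: `load_commit_metadata` and the in-memory `_RAW_COMMITS` store are hypothetical stand-ins for reading and parsing large commit metadata from the timeline.

```python
import json
from functools import lru_cache

# Hypothetical in-memory stand-in for commit metadata files on the timeline.
_RAW_COMMITS = {
    "001": '{"operationType": "upsert", "partitions": 3}',
    "002": '{"operationType": "insert", "partitions": 1}',
}


@lru_cache(maxsize=128)
def load_commit_metadata(instant: str) -> dict:
    """Parse the metadata of one instant, at most once per instant.

    The returned dict is shared between callers, so treat it as read-only.
    """
    return json.loads(_RAW_COMMITS[instant])
```

Repeated lookups of the same instant then return the already parsed object instead of parsing the payload again, which is where the saving comes from when the payload is large.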

3 weeks ago[HUDI-4101] When BucketIndexPartitioner take partition path for dispersion may cause...
HunterXHunter [Mon, 6 Jun 2022 13:53:55 +0000 (21:53 +0800)] 
[HUDI-4101] When BucketIndexPartitioner take partition path for dispersion may cause the fileID of the task to not be loaded correctly (#5763)

Co-authored-by: john.wick <john.wick@vipshop.com>
3 weeks ago[HUDI-4195] Bulk insert should use right keygen for non-partitioned table (#5759)
Sagar Sumit [Mon, 6 Jun 2022 11:19:03 +0000 (16:49 +0530)] 
[HUDI-4195] Bulk insert should use right keygen for non-partitioned table (#5759)

3 weeks ago[HUDI-4188] Fix flaky ITTestDataSTreamWrite.testWriteCopyOnWrite (#5749)
Danny Chan [Mon, 6 Jun 2022 04:12:48 +0000 (12:12 +0800)] 
[HUDI-4188] Fix flaky ITTestDataSTreamWrite.testWriteCopyOnWrite (#5749)

3 weeks ago[HUDI-4192] HoodieHFileReader scan top cells after bottom cells throw NullPointerExce...
marchpure [Mon, 6 Jun 2022 04:07:26 +0000 (12:07 +0800)] 
[HUDI-4192] HoodieHFileReader scan top cells after bottom cells throw NullPointerException (#5755)

Seek to the top cells to avoid the NullPointerException

3 weeks ago[HUDI-4190] Include hbase-protocol for shading in the bundles (#5750)
Y Ethan Guo [Mon, 6 Jun 2022 00:42:16 +0000 (17:42 -0700)] 
[HUDI-4190] Include hbase-protocol for shading in the bundles (#5750)

3 weeks ago[HUDI-4168] Add Call Procedure for marker deletion (#5738)
Saisai Shao [Sun, 5 Jun 2022 03:05:38 +0000 (11:05 +0800)] 
[HUDI-4168] Add Call Procedure for marker deletion (#5738)

* Add Call Procedure for marker deletion

3 weeks ago[HUDI-4187] Fix partition order in aws glue sync (#5731)
Nicolas Paris [Sat, 4 Jun 2022 09:16:52 +0000 (11:16 +0200)] 
[HUDI-4187] Fix partition order in aws glue sync (#5731)

4 weeks ago[HUDI-4183] Fix using HoodieCatalog to create non-hudi tables (#5743)
leesf [Fri, 3 Jun 2022 09:16:48 +0000 (17:16 +0800)] 
[HUDI-4183] Fix using HoodieCatalog to create non-hudi tables (#5743)

4 weeks ago[HUDI-4179] Cluster with sort cloumns invalid (#5739)
KnightChess [Thu, 2 Jun 2022 12:28:21 +0000 (20:28 +0800)] 
[HUDI-4179] Cluster with sort cloumns invalid (#5739)

4 weeks ago[HUDI-4167] Remove the timeline refresh with initializing hoodie table (#5716)
Danny Chan [Thu, 2 Jun 2022 01:48:48 +0000 (09:48 +0800)] 
[HUDI-4167] Remove the timeline refresh with initializing hoodie table (#5716)

The timeline refresh on table initialization invokes the fs view #sync, which has two actions now:

1. reload the timeline of the fs view, so that the next fs view request is based on this timeline metadata
2. if this is a local fs view, clear all the local states; if this is a remote fs view, send request to sync the remote fs view

But look at the construction: the meta client is freshly instantiated, so the timeline is already the latest;
the table is also freshly constructed, so the fs view has no local state. That means the #sync is entirely unnecessary.

In this patch, the metadata lifecycle and the data set fs view are kept in sync: when the fs view is refreshed, the underlying metadata
is also refreshed synchronously. The freshness of the metadata follows the same rules as the data fs view:

1. if the fs view is local, the visibility is based on the client table metadata client's latest commit
2. if the fs view is remote, the timeline server would #sync the fs view and metadata together based on the lagging server local timeline

From the client's perspective, there is no need to care about the refresh action anymore, whether or not the metadata table is enabled.
That makes the client logic clearer and less error-prone.

Removing the timeline refresh has another benefit: it avoids unnecessary #refresh of the remote fs view. If all the clients send requests to #sync the
remote fs view, the server would encounter conflicts and the clients would encounter response errors.

4 weeks ago[HUDI-3670] free temp views in sql transformers (#5080)
Qi Ji [Wed, 1 Jun 2022 14:35:40 +0000 (22:35 +0800)] 
[HUDI-3670] free temp views in sql transformers (#5080)

4 weeks ago[HUDI-4011] Add hudi-aws-bundle (#5674)
Sagar Sumit [Wed, 1 Jun 2022 12:30:29 +0000 (18:00 +0530)] 
[HUDI-4011] Add hudi-aws-bundle (#5674)

Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>