hudi.git
3 months ago[HUDI-1176] Upgrade hudi to log4j2 (#5366)
bschell [Tue, 28 Jun 2022 19:54:23 +0000 (14:54 -0500)] 
[HUDI-1176] Upgrade hudi to log4j2 (#5366)

* Move to log4j2

cr: https://code.amazon.com/reviews/CR-71010705

* Upgrade unit tests to log4j2

* update exclusion

Co-authored-by: Brandon Scheller <bschelle@amazon.com>
3 months ago[HUDI-4320] Make sure `HoodieStorageConfig.PARQUET_WRITE_LEGACY_FORMAT_ENABLED` could...
Alexey Kudinkin [Tue, 28 Jun 2022 19:27:32 +0000 (12:27 -0700)] 
[HUDI-4320] Make sure `HoodieStorageConfig.PARQUET_WRITE_LEGACY_FORMAT_ENABLED` could be specified by the writer (#5970)

Fixed sequence determining whether Parquet's legacy-format writing property should be overridden to only kick in when it has not been explicitly specified by the caller
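
The override rule described above can be sketched as follows. This is a minimal illustration in Python, not Hudi's implementation: the function `resolve_legacy_format` and the option key `parquet.write.legacy.format` are hypothetical stand-ins chosen for the example.

```python
from typing import Dict, Mapping

# Hypothetical option key, used only for illustration.
LEGACY_FORMAT_KEY = "parquet.write.legacy.format"


def resolve_legacy_format(user_opts: Mapping[str, str], inferred: bool) -> Dict[str, str]:
    """Return writer options, applying the inferred value only as a fallback."""
    opts = dict(user_opts)
    # An explicit caller setting always wins; infer only when the key is absent.
    if LEGACY_FORMAT_KEY not in opts:
        opts[LEGACY_FORMAT_KEY] = str(inferred).lower()
    return opts
```

With this shape, a caller that passes the key keeps its value, and the inferred value is applied only when nothing was specified.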

3 months ago[HUDI-4332] The current instant may be wrong under some extreme conditions in AppendW...
BruceLin [Tue, 28 Jun 2022 12:42:26 +0000 (20:42 +0800)] 
[HUDI-4332] The current instant may be wrong under some extreme conditions in AppendWriteFunction. (#5988)

3 months ago[HUDI-4333] fix HoodieFileIndex's listFiles method log print skipping percent NaN... 5608/head
ForwardXu [Tue, 28 Jun 2022 07:08:48 +0000 (15:08 +0800)] 
[HUDI-4333] fix HoodieFileIndex's listFiles method log print skipping percent NaN (#5990)

3 months ago[HUDI-4325] fix spark sql procedure cause ParseException with semicolon (#5982)
KnightChess [Tue, 28 Jun 2022 01:44:41 +0000 (09:44 +0800)] 
[HUDI-4325] fix spark sql procedure cause ParseException with semicolon (#5982)

* [HUDI-4325] fix spark sql procedure cause ParseException with semicolon

3 months ago[HUDI-3506] Add call procedure for CommitsCommand (#5974)
superche [Tue, 28 Jun 2022 01:43:36 +0000 (09:43 +0800)] 
[HUDI-3506] Add call procedure for CommitsCommand (#5974)

* [HUDI-3506] Add call procedure for CommitsCommand

Co-authored-by: superche <superche@tencent.com>
3 months ago[HUDI-4291] Fix flaky TestCleanPlanExecutor#testKeepLatestFileVersions (#5930)
Sagar Sumit [Mon, 27 Jun 2022 11:57:16 +0000 (17:27 +0530)] 
[HUDI-4291] Fix flaky TestCleanPlanExecutor#testKeepLatestFileVersions (#5930)

3 months ago[HUDI-4311] Fix Flink lose data on some rollback scene (#5950)
吴祥平 [Mon, 27 Jun 2022 08:09:44 +0000 (16:09 +0800)] 
[HUDI-4311] Fix Flink lose data on some rollback scene (#5950)

3 months ago[HUDI-3504] Support bootstrap command based on Call Produce Command (#5977)
ForwardXu [Mon, 27 Jun 2022 05:06:50 +0000 (13:06 +0800)] 
[HUDI-3504] Support bootstrap command based on Call Produce Command (#5977)

3 months ago[HUDI-4315] Do not throw exception in BaseSpark3Adapter#toTableIdentifier (#5957)
leesf [Mon, 27 Jun 2022 04:50:58 +0000 (12:50 +0800)] 
[HUDI-4315] Do not throw exception in BaseSpark3Adapter#toTableIdentifier (#5957)

3 months ago[HUDI-4316] Support for spillable diskmap configuration when constructing HoodieMerge...
cxzl25 [Mon, 27 Jun 2022 03:09:30 +0000 (11:09 +0800)] 
[HUDI-4316] Support for spillable diskmap configuration when constructing HoodieMergedLogRecordScanner (#5959)

3 months ago[HUDI-4309] Spark3.2 custom parser should not throw exception (#5947)
cxzl25 [Mon, 27 Jun 2022 01:37:23 +0000 (09:37 +0800)] 
[HUDI-4309] Spark3.2 custom parser should not throw exception (#5947)

3 months ago[HUDI-5246] Bumping mysql connector version due to security vulnerability (#5851)
Sivabalan Narayanan [Sun, 26 Jun 2022 23:54:57 +0000 (16:54 -0700)] 
[HUDI-5246] Bumping mysql connector version due to security vulnerability (#5851)

3 months ago[MINOR] Remove -T option from CI build (#5972)
Shiyan Xu [Sun, 26 Jun 2022 15:34:05 +0000 (10:34 -0500)] 
[MINOR] Remove -T option from CI build (#5972)

3 months ago[HUDI-3502] Support hdfs parquet import command based on Call Produce Command (#5956)
ForwardXu [Sun, 26 Jun 2022 03:27:14 +0000 (11:27 +0800)] 
[HUDI-3502] Support hdfs parquet import command based on Call Produce Command (#5956)

3 months ago[HUDI-4296] Fix the bug that TestHoodieSparkSqlWriter.testSchemaEvolutionForTableType...
xiarixiaoyao [Sat, 25 Jun 2022 13:03:19 +0000 (21:03 +0800)] 
[HUDI-4296] Fix the bug that TestHoodieSparkSqlWriter.testSchemaEvolutionForTableType is flaky (#5973)

3 months ago[HUDI-4319] Fixed Parquet's `PLAIN_DICTIONARY` encoding not being applied when bulk...
Alexey Kudinkin [Sat, 25 Jun 2022 03:52:28 +0000 (20:52 -0700)] 
[HUDI-4319] Fixed Parquet's `PLAIN_DICTIONARY` encoding not being applied when bulk-inserting (#5966)

* Fixed Dictionary encoding config not being properly propagated to Parquet writer (making it unable to apply it, substantially bloating the storage footprint)

3 months agoRevert "[TEST][DO_NOT_MERGE]fix random failed for ci (#5948)" (#5971)
xiarixiaoyao [Sat, 25 Jun 2022 03:23:17 +0000 (11:23 +0800)] 
Revert "[TEST][DO_NOT_MERGE]fix random failed for ci (#5948)" (#5971)

This reverts commit e8fbd4daf49802f60f800ccc92e66369d44f07f6.

3 months ago[TEST][DO_NOT_MERGE]fix random failed for ci (#5948)
xiarixiaoyao [Sat, 25 Jun 2022 02:15:08 +0000 (10:15 +0800)] 
[TEST][DO_NOT_MERGE]fix random failed for ci (#5948)

3 months ago[HUDI-3512] Add call procedure for StatsCommand (#5955)
jiz [Sat, 25 Jun 2022 01:43:23 +0000 (09:43 +0800)] 
[HUDI-3512] Add call procedure for StatsCommand (#5955)

Co-authored-by: zhanshaoxiong <shaoxiong0001@@gmail.com>
3 months ago[HUDI-4260] Change KEYGEN_CLASS_NAME without default value (#5877)
luokey [Fri, 24 Jun 2022 07:05:03 +0000 (15:05 +0800)] 
[HUDI-4260] Change KEYGEN_CLASS_NAME without default value (#5877)

* Change KEYGEN_CLASS_NAME without default value

Co-authored-by: 854194341@qq.com <loukey_7821>
3 months ago[HUDI-3735] TestHoodieSparkMergeOnReadTableRollback is flaky (#5874)
xi chaomin [Fri, 24 Jun 2022 06:47:36 +0000 (14:47 +0800)] 
[HUDI-3735] TestHoodieSparkMergeOnReadTableRollback is flaky (#5874)

3 months ago[HUDI-4273] Support inline schedule clustering for Flink stream (#5890)
Zhaojing Yu [Fri, 24 Jun 2022 03:28:06 +0000 (11:28 +0800)] 
[HUDI-4273] Support inline schedule clustering for Flink stream (#5890)

* [HUDI-4273] Support inline schedule clustering for Flink stream

* delete deprecated clustering plan strategy and add clustering ITTest

3 months ago[HUDI-3509] Add call procedure for HoodieLogFileCommand (#5949)
jiz [Fri, 24 Jun 2022 02:16:54 +0000 (10:16 +0800)] 
[HUDI-3509] Add call procedure for HoodieLogFileCommand (#5949)

Co-authored-by: zhanshaoxiong <jiimmyzhan@tencent.com>
3 months ago[HUDI-4290] Fix fetchLatestBaseFiles to filter replaced filegroups (#5941)
Sagar Sumit [Thu, 23 Jun 2022 14:10:08 +0000 (19:40 +0530)] 
[HUDI-4290] Fix fetchLatestBaseFiles to filter replaced filegroups (#5941)

* [HUDI-4290] Fix fetchLatestBaseFiles to filter replaced filegroups

* Separate out incremental sync fsview test with clustering

3 months ago[HUDI-4299] Fix problem about hudi-example-java run failed on idea. (#5936)
Forus [Thu, 23 Jun 2022 13:46:22 +0000 (21:46 +0800)] 
[HUDI-4299] Fix problem about hudi-example-java run failed on idea. (#5936)

3 months ago[HUDI-3508] Add call procedure for FileSystemViewCommand (#5929)
jiz [Wed, 22 Jun 2022 09:50:20 +0000 (17:50 +0800)] 
[HUDI-3508] Add call procedure for FileSystemViewCommand (#5929)

* [HUDI-3508] Add call procedure for FileSystemView

* minor

Co-authored-by: jiimmyzhan <jiimmyzhan@tencent.com>
3 months ago[minor] following 4270, add unit tests for the keys lost case (#5918)
Danny Chan [Wed, 22 Jun 2022 08:56:06 +0000 (16:56 +0800)] 
[minor] following 4270, add unit tests for the keys lost case (#5918)

3 months ago[HUDI-4279] Strength the remote fs view lagging check when latest commit refresh...
LinMingQiang [Wed, 22 Jun 2022 02:32:21 +0000 (10:32 +0800)] 
[HUDI-4279] Strength the remote fs view lagging check when latest commit refresh is enabled (#5917)

Signed-off-by: LinMingQiang <1356469429@qq.com>
3 months agoRevert master (#5925)
Zhaojing Yu [Tue, 21 Jun 2022 08:58:50 +0000 (16:58 +0800)] 
Revert master (#5925)

* Revert "udate"

This reverts commit 092e35c1e300f1eb1a7474136826fed26bc10ccd.

* Revert "[HUDI-3475] Initialize hudi table management module."

This reverts commit 4640a3bbb8e212030f94848a0112784d98772de8.

3 months agoudate 4872/head
喻兆靖 [Tue, 21 Jun 2022 07:22:04 +0000 (15:22 +0800)] 
udate

3 months ago[HUDI-3475] Initialize hudi table management module.
喻兆靖 [Wed, 8 Jun 2022 01:54:31 +0000 (09:54 +0800)] 
[HUDI-3475] Initialize hudi table management module.

3 months ago[HUDI-4270] Bootstrap op data loading missing (#5888)
Bo Cui [Tue, 21 Jun 2022 03:47:39 +0000 (11:47 +0800)] 
[HUDI-4270] Bootstrap op data loading missing (#5888)

3 months ago[HUDI-4177] Fix hudi-cli rollback with rollbackUsingMarkers method call (#5734)
Shawn Chang [Tue, 21 Jun 2022 02:54:12 +0000 (19:54 -0700)] 
[HUDI-4177] Fix hudi-cli rollback with rollbackUsingMarkers method call (#5734)

* Fix hudi-cli rollback with rollbackUsingMarkers method call
* Add test for hudi-cli rollbackUsingMarkers

Co-authored-by: Shawn Chang <yxchang@amazon.com>
3 months ago[HUDI-4251] Fix the problem that the command 'commits sync' description does not...
Forus [Mon, 20 Jun 2022 23:03:58 +0000 (07:03 +0800)] 
[HUDI-4251] Fix the problem that the command 'commits sync' description does not match. (#5881)

3 months ago[HUDI-4173] Fix wrong results if the user read no base files hudi table by glob paths...
RexAn [Mon, 20 Jun 2022 17:32:34 +0000 (01:32 +0800)] 
[HUDI-4173] Fix wrong results if the user read no base files hudi table by glob paths (#5723)

3 months ago[MINOR] Update DOAP with 0.11.1 Release (#5908)
Y Ethan Guo [Mon, 20 Jun 2022 16:27:35 +0000 (09:27 -0700)] 
[MINOR] Update DOAP with 0.11.1 Release (#5908)

3 months ago[HUDI-4258] Fix when HoodieTable removes data file before the end of Flink job (...
Alexander Trushev [Mon, 20 Jun 2022 09:07:49 +0000 (16:07 +0700)] 
[HUDI-4258] Fix when HoodieTable removes data file before the end of Flink job (#5876)

* [HUDI-4258] Fix when HoodieTable removes data file before the end of Flink job

3 months ago[HUDI-4259] Flink create avro schema not conformance to standards (#5878)
luokey [Mon, 20 Jun 2022 07:41:23 +0000 (15:41 +0800)] 
[HUDI-4259]  Flink create avro schema not conformance to standards (#5878)

* flink create avro schema not conformance to standards

Co-authored-by: 854194341@qq.com <loukey_7821>
3 months agofix remove redundant Variable (#5806)
felixYyu [Mon, 20 Jun 2022 07:21:49 +0000 (15:21 +0800)] 
fix remove redundant Variable (#5806)

3 months ago[HUDI-4277] support flink table source with computed column (#5897)
Shizhi Chen [Mon, 20 Jun 2022 07:19:32 +0000 (15:19 +0800)] 
[HUDI-4277] support flink table source with computed column (#5897)

Co-authored-by: chenshizhi <chenshizhi@bilibili.com>
3 months ago[MINOR] Add "spillable_map_path" in FlinkCompactionConfig. To avoid the disk space...
5herhom [Mon, 20 Jun 2022 07:15:23 +0000 (15:15 +0800)] 
[MINOR] Add "spillable_map_path" in FlinkCompactionConfig. To avoid the disk space of "/tmp" full when compacting offline. (#5905)

3 months ago[HUDI-4275] Refactor rollback inflight instant for clustering/compaction to reuse...
huberylee [Mon, 20 Jun 2022 06:29:21 +0000 (14:29 +0800)] 
[HUDI-4275] Refactor rollback inflight instant for clustering/compaction to reuse some code (#5894)

3 months ago[HUDI-3507] Support export command based on Call Produce Command (#5901)
ForwardXu [Sun, 19 Jun 2022 10:48:22 +0000 (18:48 +0800)] 
[HUDI-3507] Support export command based on Call Produce Command (#5901)

3 months ago[HUDI-4165] Support Create/Drop/Show/Refresh Index Syntax for Spark SQL (#5761)
huberylee [Fri, 17 Jun 2022 10:33:58 +0000 (18:33 +0800)] 
[HUDI-4165] Support Create/Drop/Show/Refresh Index Syntax for Spark SQL (#5761)

* Support Create/Drop/Show/Refresh Index Syntax for Spark SQL

3 months ago[HUDI-4265] Deprecate useless targetTableName parameter in HoodieMultiTableDeltaStrea...
董可伦 [Fri, 17 Jun 2022 08:57:14 +0000 (16:57 +0800)] 
[HUDI-4265] Deprecate useless targetTableName parameter in HoodieMultiTableDeltaStreamer (#5883)

3 months ago[HUDI-4214] improve repeat init write schema in ExpressionPayload (#5820)
KnightChess [Thu, 16 Jun 2022 09:58:37 +0000 (17:58 +0800)] 
[HUDI-4214] improve repeat init write schema in ExpressionPayload (#5820)

* [HUDI-4214] improve repeat init write schema in ExpressionPayload

3 months ago[HUDI-4217] improve repeat init object in ExpressionPayload (#5825)
KnightChess [Wed, 15 Jun 2022 12:21:28 +0000 (20:21 +0800)] 
[HUDI-4217] improve repeat init object in ExpressionPayload (#5825)

3 months ago[HUDI-4218] [HUDI-4218] Expose the real exception information when an exception occur...
董可伦 [Wed, 15 Jun 2022 10:10:35 +0000 (18:10 +0800)] 
[HUDI-4218] [HUDI-4218] Expose the real exception information when an exception occurs in the tableExists method (#5827)

3 months ago[HUDI-3499] Add Call Procedure for show rollbacks (#5848)
superche [Wed, 15 Jun 2022 08:50:15 +0000 (16:50 +0800)] 
[HUDI-3499] Add Call Procedure for show rollbacks (#5848)

* Add Call Procedure for show rollbacks

* fix

* add ut for show_rollback_detail and exception handle

Co-authored-by: superche <superche@tencent.com>
3 months ago[HUDI-4255] Make the flink merge and replace handle intermediate file visible (#5866)
Danny Chan [Wed, 15 Jun 2022 06:23:23 +0000 (14:23 +0800)] 
[HUDI-4255] Make the flink merge and replace handle intermediate file visible (#5866)

3 months ago[minor] Following HUDI-4207, remote the new wrapper #init method (#5865)
Danny Chan [Wed, 15 Jun 2022 00:48:13 +0000 (08:48 +0800)] 
[minor] Following HUDI-4207, remote the new wrapper #init method (#5865)

3 months ago[MINOR] Fix typo of DisruptorExecutor in RFC 53 (#5860)
felixYyu [Tue, 14 Jun 2022 06:30:17 +0000 (14:30 +0800)] 
[MINOR] Fix typo of DisruptorExecutor in RFC 53 (#5860)

3 months ago[HUDI-4207] HoodieFlinkWriteClient.getOrCreateWriteHandle throws an e… (#5788)
HunterXHunter [Mon, 13 Jun 2022 14:36:06 +0000 (22:36 +0800)] 
[HUDI-4207] HoodieFlinkWriteClient.getOrCreateWriteHandle throws an e… (#5788)

Add more logs to help debug HoodieFlinkWriteClient.getOrCreateWriteHandle throwing an exception

3 months ago[HUDI-4006] failOnDataLoss on delta-streamer kafka sources (#5718)
Qi Ji [Mon, 13 Jun 2022 14:31:57 +0000 (22:31 +0800)] 
[HUDI-4006] failOnDataLoss on delta-streamer kafka sources (#5718)

Add a new config key, hoodie.deltastreamer.source.kafka.enable.failOnDataLoss:
- when failOnDataLoss=false (current behaviour, the default), log a warning instead of silently seeking to earliest
- when failOnDataLoss is set, fail explicitly
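
The two behaviours above can be sketched as follows. This is a hedged Python illustration rather than the delta-streamer code: the function `resolve_start_offset`, the exception `DataLossError`, and the assumption that data loss means "the checkpointed offset is older than the earliest offset still available in Kafka" are all hypothetical.

```python
import logging

logger = logging.getLogger("kafka_offsets_sketch")


class DataLossError(RuntimeError):
    """Raised when the checkpointed offset is no longer available (hypothetical)."""


def resolve_start_offset(checkpoint: int, earliest: int, fail_on_data_loss: bool = False) -> int:
    """Pick the offset to resume from, given the earliest offset still retained."""
    if checkpoint >= earliest:
        # Nothing was lost: resume exactly where the last run stopped.
        return checkpoint
    msg = f"offsets {checkpoint}..{earliest - 1} are no longer available"
    if fail_on_data_loss:
        # Flag set: fail explicitly.
        raise DataLossError(msg)
    # Default: warn, then fall back to the earliest available offset.
    logger.warning(msg)
    return earliest
```

The default keeps the job running but makes the gap visible in the logs; setting the flag turns the same condition into a hard failure.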

3 months ago[HUDI-3863] Add UT for drop partition column in deltastreamer testsuite (#5727)
luoyajun [Mon, 13 Jun 2022 14:29:32 +0000 (22:29 +0800)] 
[HUDI-3863] Add UT for drop partition column in deltastreamer testsuite (#5727)

3 months ago[HUDI-3682] testReaderFilterRowKeys fails in TestHoodieOrcReaderWriter (#5790)
xi chaomin [Mon, 13 Jun 2022 14:22:12 +0000 (22:22 +0800)] 
[HUDI-3682] testReaderFilterRowKeys fails in TestHoodieOrcReaderWriter (#5790)

testReaderFilterRowKeys needs to get the key from RECORD_KEY_METADATA_FIELD, but the writer in the current UT does not populate the meta fields and the schema does not contain them.

This fix writes data with a schema that contains the meta fields and calls writeAvroWithMetadata for writing.

Co-authored-by: xicm <xicm@asiainfo.com>
3 months agoStrip extra spaces when creating new configuration (#5849)
superche [Mon, 13 Jun 2022 11:10:38 +0000 (19:10 +0800)] 
Strip extra spaces when creating new configuration (#5849)

Co-authored-by: superche <superche@tencent.com>
3 months ago[MINOR] fix AvroSchemaConverter duplicate branch in 'switch' (#5813)
sandyfog [Mon, 13 Jun 2022 02:55:24 +0000 (10:55 +0800)] 
[MINOR]  fix AvroSchemaConverter duplicate branch in 'switch' (#5813)

3 months ago[HUDI-4224] Fix CI issues (#5842)
Shiyan Xu [Sun, 12 Jun 2022 18:44:18 +0000 (11:44 -0700)] 
[HUDI-4224] Fix CI issues (#5842)

- Upgrade junit to 5.7.2
- Downgrade surefire and failsafe to 2.22.2
- Fix test failures that were previously not reported
- Improve azure pipeline configs

Co-authored-by: liujinhui1994 <965147871@qq.com>
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>
3 months ago[HUDI-4205] Fix NullPointerException in HFile reader creation (#5841)
Y Ethan Guo [Sat, 11 Jun 2022 21:46:43 +0000 (14:46 -0700)] 
[HUDI-4205] Fix NullPointerException in HFile reader creation (#5841)

Replace SerializableConfiguration with SerializableWritable for broadcasting the hadoop configuration before initializing HFile readers

3 months ago[HUDI-4223] Fix NullPointerException from getLogRecordScanner when reading metadata...
Y Ethan Guo [Sat, 11 Jun 2022 20:19:24 +0000 (13:19 -0700)] 
[HUDI-4223] Fix NullPointerException from getLogRecordScanner when reading metadata table (#5840)

When explicitly specifying the metadata table path for reading in spark, the "hoodie.metadata.enable" is overwritten to true for proper read behavior.

3 months ago[HUDI-4221] Fixing getAllPartitionPaths perf hit w/ FileSystemBackedMetadata (#5829)
Sivabalan Narayanan [Sat, 11 Jun 2022 20:17:42 +0000 (16:17 -0400)] 
[HUDI-4221] Fixing getAllPartitionPaths perf hit w/ FileSystemBackedMetadata (#5829)

3 months ago[HUDI-3889] Do not validate table config if save mode is set to Overwrite (#5619)
xi chaomin [Thu, 9 Jun 2022 23:23:51 +0000 (07:23 +0800)] 
[HUDI-3889] Do not validate table config if save mode is set to Overwrite (#5619)

Co-authored-by: xicm <xicm@asiainfo.com>
3 months ago[HUDI-4139]improvement for flink write operator name to identify tables easily (...
yanenze [Thu, 9 Jun 2022 21:48:20 +0000 (05:48 +0800)] 
[HUDI-4139]improvement for flink write operator name to identify tables easily (#5744)

Co-authored-by: yanenze <yanenze@keytop.com.cn>
3 months ago[HUDI-4213] Infer keygen clazz for Spark SQL (#5815)
Danny Chan [Thu, 9 Jun 2022 12:37:58 +0000 (20:37 +0800)] 
[HUDI-4213] Infer keygen clazz for Spark SQL (#5815)

3 months ago[MINOR] FlinkStateBackendConverter add more exception message (#5809)
sandyfog [Thu, 9 Jun 2022 07:13:27 +0000 (15:13 +0800)] 
[MINOR] FlinkStateBackendConverter add more  exception message (#5809)

* [MINOR] FlinkStateBackendConverter add more  exception message

3 months ago[MINOR][DOCS] Update the README.md file in hudi-examples (#5803)
liuzhuang2017 [Thu, 9 Jun 2022 00:45:00 +0000 (08:45 +0800)] 
[MINOR][DOCS] Update the README.md file in hudi-examples (#5803)

3 months ago[HUDI-4178] Addressing performance regressions in Spark DataSourceV2 Integration...
Alexey Kudinkin [Tue, 7 Jun 2022 23:30:46 +0000 (16:30 -0700)] 
[HUDI-4178] Addressing performance regressions in Spark DataSourceV2 Integration (#5737)

There are multiple issues with our current DataSource V2 integration: because we advertise Hudi tables as V2, Spark expects them to implement certain APIs that are not implemented at the moment, so we use a custom resolution rule (in HoodieSpark3Analysis) to manually fall back to the V1 APIs. This commit fixes the issue by reverting the DSv2 APIs and making Spark use V1, except for the schema evaluation logic.

3 months ago[HUDI-4198] Fix hive config for AWSGlueClientFactory (#5768)
Raymond Xu [Tue, 7 Jun 2022 14:51:31 +0000 (07:51 -0700)] 
[HUDI-4198] Fix hive config for AWSGlueClientFactory (#5768)

* HiveConf needs to load fs conf to allow instantiation via AWSGlueClientFactory

* Resolve metastore uri config before loading fs conf

* Skip hiveql due to CI issue

Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
3 months ago[HUDI-4200] Fixing sorting of keys fetched from metadata table (#5773)
Sivabalan Narayanan [Tue, 7 Jun 2022 12:19:52 +0000 (08:19 -0400)] 
[HUDI-4200] Fixing sorting of keys fetched from metadata table (#5773)

- Keys fetched from the metadata table, especially from the base file reader, are not sorted, which may result in an NPE (key prefix search) or unnecessary seeks to the start of the HFile (full key lookups). This patch fixes that. This is not an issue with log blocks, since sorting is taken care of within HoodieHfileDataBlock.
- Commit where the sorting was mistakenly reverted: [HUDI-3760] Adding capability to fetch Metadata Records by prefix #5208
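
Why sorted lookup keys matter for a key-sorted file can be sketched as follows. This is an illustrative Python model, not the HFile reader: `lookup_sorted` is a hypothetical function over an in-memory list of `(key, value)` pairs that are assumed to be sorted by key.

```python
from typing import Dict, Iterable, List, Tuple


def lookup_sorted(entries: List[Tuple[str, str]], keys: Iterable[str]) -> Dict[str, str]:
    """Look up keys in a key-sorted list using a single forward pass."""
    found: Dict[str, str] = {}
    pos = 0
    # Sorting the requested keys lets the cursor only ever move forward,
    # so no lookup has to seek back to the start of the entries.
    for key in sorted(set(keys)):
        while pos < len(entries) and entries[pos][0] < key:
            pos += 1
        if pos < len(entries) and entries[pos][0] == key:
            found[key] = entries[pos][1]
    return found
```

If the keys were visited in their original, unsorted order, the forward-only cursor would already be past some of them, which is the situation the patch avoids by sorting first.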

3 months ago[MINOR][RFC-53] Fix typos (#5764)
YueZhang [Tue, 7 Jun 2022 00:28:28 +0000 (08:28 +0800)] 
[MINOR][RFC-53] Fix typos (#5764)

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
3 months ago[MINOR] Mark AWSGlueCatalogSyncClient experimental (#5775)
Raymond Xu [Tue, 7 Jun 2022 00:25:59 +0000 (17:25 -0700)] 
[MINOR] Mark AWSGlueCatalogSyncClient experimental (#5775)

3 months ago[HUDI-4171] Fixing Non partitioned with virtual keys in read path (#5747)
Sivabalan Narayanan [Mon, 6 Jun 2022 19:48:21 +0000 (15:48 -0400)] 
[HUDI-4171] Fixing Non partitioned with virtual keys in read path (#5747)

- When the non-partitioned key generator is used with virtual keys, the read path could break since the partition path may not exist.

3 months ago[HUDI-4197] Fix Async indexer to support building FILES partition (#5766)
Sivabalan Narayanan [Mon, 6 Jun 2022 19:47:11 +0000 (15:47 -0400)] 
[HUDI-4197] Fix Async indexer to support building FILES partition (#5766)

- When the async indexer is invoked with only the "FILES" partition, it fails. This patch fixes that case. Also, if the metadata table itself is not initialized and someone wants to build indexes via AsyncIndexer, they are expected to index the "FILES" partition first, followed by the other partitions. In general, AsyncIndexer is limited to building only one index at a time, so guards have been added to ensure these conditions are met.

3 months ago[HUDI-4140] Fixing hive style partitioning and default partition with bulk insert...
Sivabalan Narayanan [Mon, 6 Jun 2022 17:21:00 +0000 (13:21 -0400)] 
[HUDI-4140] Fixing hive style partitioning and default partition with bulk insert row writer with SimpleKeyGen and virtual keys (#5664)

Bulk insert row writer code path had a gap wrt hive style partitioning and default partition when virtual keys are enabled with SimpleKeyGen.  This patch fixes the issue.

3 months ago[HUDI-4176] Fixing `TableSchemaResolver` to avoid repeated `HoodieCommitMetadata...
Alexey Kudinkin [Mon, 6 Jun 2022 17:14:26 +0000 (10:14 -0700)] 
[HUDI-4176] Fixing `TableSchemaResolver` to avoid repeated `HoodieCommitMetadata` parsing (#5733)

As has been outlined in HUDI-4176, we've hit a roadblock while testing Hudi on a large dataset (~1 TB) with pretty fat commits, where Hudi's commit metadata could reach into 100s of MBs.
Given the size of some of our commit metadata instances, Spark's parsing and resolving phase (when spark.sql(...) is involved, but before the returned Dataset is dereferenced) starts to dominate some of our queries' execution time.

- Rebased onto new APIs to avoid excessive Hadoop Path allocations
- Eliminated hasOperationField completely to avoid repetitive computations
- Cleaning up duplication in HoodieActiveTimeline
- Added caching for common instances of HoodieCommitMetadata
- Made tableStructSchema lazy
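
The caching idea in the list above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not Hudi's code: `CommitMetadataCache`, its `read_bytes` callback, and the use of JSON as the serialized form are hypothetical choices made for the example.

```python
import json
from typing import Callable, Dict


class CommitMetadataCache:
    """Parse each instant's commit metadata at most once (hypothetical sketch)."""

    def __init__(self, read_bytes: Callable[[str], bytes]):
        self._read_bytes = read_bytes
        self._cache: Dict[str, dict] = {}
        self.parse_count = 0  # exposed only to make the effect observable

    def get(self, instant: str) -> dict:
        if instant not in self._cache:
            # Parsing is the expensive step when metadata reaches 100s of MBs.
            self.parse_count += 1
            self._cache[instant] = json.loads(self._read_bytes(instant))
        return self._cache[instant]
```

Repeated requests for the same instant then return the already-parsed object instead of parsing the payload again.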

3 months ago[HUDI-4101] When BucketIndexPartitioner take partition path for dispersion may cause...
HunterXHunter [Mon, 6 Jun 2022 13:53:55 +0000 (21:53 +0800)] 
[HUDI-4101] When BucketIndexPartitioner take partition path for dispersion may cause the fileID of the task to not be loaded correctly (#5763)

Co-authored-by: john.wick <john.wick@vipshop.com>
3 months ago[HUDI-4195] Bulk insert should use right keygen for non-partitioned table (#5759)
Sagar Sumit [Mon, 6 Jun 2022 11:19:03 +0000 (16:49 +0530)] 
[HUDI-4195] Bulk insert should use right keygen for non-partitioned table (#5759)

3 months ago[HUDI-4188] Fix flaky ITTestDataSTreamWrite.testWriteCopyOnWrite (#5749)
Danny Chan [Mon, 6 Jun 2022 04:12:48 +0000 (12:12 +0800)] 
[HUDI-4188] Fix flaky ITTestDataSTreamWrite.testWriteCopyOnWrite (#5749)

3 months ago[HUDI-4192] HoodieHFileReader scan top cells after bottom cells throw NullPointerExce...
marchpure [Mon, 6 Jun 2022 04:07:26 +0000 (12:07 +0800)] 
[HUDI-4192] HoodieHFileReader scan top cells after bottom cells throw NullPointerException (#5755)

SeekTo top cells to avoid the NullPointerException

3 months ago[HUDI-4190] Include hbase-protocol for shading in the bundles (#5750)
Y Ethan Guo [Mon, 6 Jun 2022 00:42:16 +0000 (17:42 -0700)] 
[HUDI-4190] Include hbase-protocol for shading in the bundles (#5750)

3 months ago[HUDI-4168] Add Call Procedure for marker deletion (#5738)
Saisai Shao [Sun, 5 Jun 2022 03:05:38 +0000 (11:05 +0800)] 
[HUDI-4168] Add Call Procedure for marker deletion (#5738)

* Add Call Procedure for marker deletion

3 months ago[HUDI-4187] Fix partition order in aws glue sync (#5731)
Nicolas Paris [Sat, 4 Jun 2022 09:16:52 +0000 (11:16 +0200)] 
[HUDI-4187] Fix partition order in aws glue sync (#5731)

3 months ago[HUDI-4183] Fix using HoodieCatalog to create non-hudi tables (#5743)
leesf [Fri, 3 Jun 2022 09:16:48 +0000 (17:16 +0800)] 
[HUDI-4183] Fix using HoodieCatalog to create non-hudi tables (#5743)

3 months ago[HUDI-4179] Cluster with sort columns invalid (#5739)
KnightChess [Thu, 2 Jun 2022 12:28:21 +0000 (20:28 +0800)] 
[HUDI-4179] Cluster with sort columns invalid (#5739)

3 months ago[HUDI-4167] Remove the timeline refresh with initializing hoodie table (#5716)
Danny Chan [Thu, 2 Jun 2022 01:48:48 +0000 (09:48 +0800)] 
[HUDI-4167] Remove the timeline refresh with initializing hoodie table (#5716)

The timeline refresh on table initialization invokes the fs view #sync, which has two actions now:

1. reload the timeline of the fs view, so that the next fs view request is based on this timeline metadata
2. if this is a local fs view, clear all the local states; if this is a remote fs view, send request to sync the remote fs view

But look at the construction: the meta client is freshly instantiated, so the timeline is already the latest,
and the table is also freshly constructed, so the fs view has no local state. That means the #sync is entirely unnecessary.

In this patch, the metadata lifecycle and the data set fs view are kept in sync: when the fs view is refreshed, the underlying metadata
is also refreshed synchronously. The freshness of the metadata follows the same rules as the data fs view:

1. if the fs view is local, the visibility is based on the client table metadata client's latest commit
2. if the fs view is remote, the timeline server would #sync the fs view and metadata together based on the lagging server local timeline

From the client's perspective, there is no need to care about the refresh action anymore, whether or not the metadata table is enabled.
That makes the client logic clearer and less error-prone.

Removing the timeline refresh has another benefit: it avoids unnecessary #refresh of the remote fs view. If all the clients send requests to #sync the
remote fs view, the server would encounter conflicts and the clients would encounter response errors.

3 months ago[HUDI-3670] free temp views in sql transformers (#5080)
Qi Ji [Wed, 1 Jun 2022 14:35:40 +0000 (22:35 +0800)] 
[HUDI-3670] free temp views in sql transformers (#5080)

3 months ago[HUDI-4011] Add hudi-aws-bundle (#5674)
Sagar Sumit [Wed, 1 Jun 2022 12:30:29 +0000 (18:00 +0530)] 
[HUDI-4011] Add hudi-aws-bundle (#5674)

Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
4 months ago[HUDI-4174] Add hive conf dir option for flink sink (#5725)
Danny Chan [Wed, 1 Jun 2022 08:17:36 +0000 (16:17 +0800)] 
[HUDI-4174] Add hive conf dir option for flink sink (#5725)

4 months ago[HUDI-4107] Added --sync-tool-classes config option in HoodieMultiTableDeltaStreamer...
Kumud Kumar Srivatsava Tirupati [Tue, 31 May 2022 14:57:50 +0000 (20:27 +0530)] 
[HUDI-4107] Added --sync-tool-classes config option in HoodieMultiTableDeltaStreamer (#5597)

* added --sync-tool-classes config option in multitable delta streamer

* added a testcase to assert if syncClientToolClassNames is getting picked to the deltastreamer execution context

4 months ago[HUDI-4149] Drop-Table fails when underlying table directory is broken (#5672)
Jin Xing [Mon, 30 May 2022 11:09:26 +0000 (19:09 +0800)] 
[HUDI-4149] Drop-Table fails when underlying table directory is broken (#5672)

4 months ago[HUDI-4163] Catch general exception instead of IOException while fetching rollback...
Danny Chan [Mon, 30 May 2022 05:08:02 +0000 (13:08 +0800)] 
[HUDI-4163] Catch general exception instead of IOException while fetching rollback plan during rollback (#5703)

If the avro file is corrupted, an InvalidAvroMagicException is thrown.

4 months ago[HUDI-4086] Use CustomizedThreadFactory in async compaction and clustering (#5563)
苏承祥 [Sun, 29 May 2022 05:35:47 +0000 (13:35 +0800)] 
[HUDI-4086] Use CustomizedThreadFactory in async compaction and clustering (#5563)

Co-authored-by: 苏承祥 <sucx@tuya.com>
4 months ago[HUDI-3551] Fix testStorageSchemes for oci storage (#5711)
Raymond Xu [Sat, 28 May 2022 19:13:37 +0000 (12:13 -0700)] 
[HUDI-3551] Fix testStorageSchemes for oci storage (#5711)

4 months ago[HUDI-3551] Add the Oracle Cloud Infrastructure (oci) Object Storage URI scheme ...
Carter Shanklin [Sat, 28 May 2022 15:26:14 +0000 (08:26 -0700)] 
[HUDI-3551] Add the Oracle Cloud Infrastructure (oci) Object Storage URI scheme (#4952)

4 months ago[HUDI-4166] Added SimpleClient plugin for integ test (#5710)
uday08bce [Sat, 28 May 2022 15:20:52 +0000 (17:20 +0200)] 
[HUDI-4166] Added SimpleClient plugin for integ test (#5710)

4 months ago[MINOR] Fix Hive and meta sync config for sql statement (#5316)
ForwardXu [Sat, 28 May 2022 14:56:39 +0000 (22:56 +0800)] 
[MINOR] Fix Hive and meta sync config for sql statement (#5316)

4 months ago[HUDI-4160] Make database regex of MaxwellJsonKafkaSourcePostProcessor optional ...
wangxianghu [Sat, 28 May 2022 07:13:24 +0000 (11:13 +0400)] 
[HUDI-4160] Make database regex of MaxwellJsonKafkaSourcePostProcessor optional (#5697)

4 months ago[HUDI-4151] flink split_reader supports rocksdb (#5675)
Bo Cui [Sat, 28 May 2022 00:37:34 +0000 (08:37 +0800)] 
[HUDI-4151] flink split_reader supports rocksdb (#5675)

* [HUDI-4151] flink split_reader supports rocksdb