sivabalan [Wed, 26 Jan 2022 01:15:31 +0000 (20:15 -0500)]
[MINOR] Update release version to reflect published version 0.10.1
sivabalan [Wed, 19 Jan 2022 23:05:51 +0000 (18:05 -0500)]
Bumping release candidate number 2
sivabalan [Wed, 19 Jan 2022 23:02:07 +0000 (18:02 -0500)]
Removing an extraneous test class
Thinking Chen [Tue, 18 Jan 2022 19:51:09 +0000 (03:51 +0800)]
[HUDI-3245] Convert uppercase letters to lowercase in storage configs (#4602)
Danny Chan [Tue, 18 Jan 2022 09:46:40 +0000 (17:46 +0800)]
[HUDI-3263] Do not nullify members in HoodieTableFileSystemView#resetViewState to avoid NPE (#4625)
Yuwei XIAO [Mon, 17 Jan 2022 22:24:24 +0000 (06:24 +0800)]
[HUDI-3194] fix MOR snapshot query during compaction (#4540)
Danny Chan [Mon, 17 Jan 2022 10:18:45 +0000 (18:18 +0800)]
[HUDI-3257] Excluding clustering instants from pending rollback info (#4616)
Sivabalan Narayanan [Wed, 19 Jan 2022 16:03:56 +0000 (11:03 -0500)]
[HUDI-3268] Fixing NullPointerException with HoodieFileIndex when keygenclass is null in table config (#4633)
sivabalan narayanan [Thu, 13 Jan 2022 12:46:40 +0000 (07:46 -0500)]
Bumping release candidate number 1 for 0.10.1
yuzhaojing [Fri, 31 Dec 2021 05:12:32 +0000 (13:12 +0800)]
[HUDI-3120] Cache compactionPlan in buffer (#4463)
Co-authored-by: yuzhaojing <yuzhaojing@bytedance.com>
WangMinChao [Wed, 15 Dec 2021 12:16:48 +0000 (20:16 +0800)]
[HUDI-3024] Add explicit write handler for flink (#4329)
Co-authored-by: wangminchao <wangminchao@asinking.com>
Y Ethan Guo [Wed, 12 Jan 2022 17:03:27 +0000 (09:03 -0800)]
[HUDI-3007] Fix issues in HoodieRepairTool (#4564)
Sagar Sumit [Thu, 13 Jan 2022 11:55:00 +0000 (17:25 +0530)]
[HUDI-3010] Unbundle parquet-avro and shade other dependencies in presto bundle (#4551) (#4578)
Sivabalan Narayanan [Thu, 13 Jan 2022 00:30:34 +0000 (19:30 -0500)]
[HUDI-2943] Complete pending clustering before deltastreamer sync (against 0.10.1 minor release branch) (#4573)
Sagar Sumit [Wed, 12 Jan 2022 17:05:21 +0000 (22:35 +0530)]
[HUDI-3235] Fix ClassNotFoundException due to log4j-core dependency (#4575)
- Move log4j-core to top level pom
Sivabalan Narayanan [Tue, 11 Jan 2022 02:50:14 +0000 (21:50 -0500)]
Removing extraneous warn logs in ClusteringUtils (#4553)
t0il3ts0ap [Mon, 10 Jan 2022 23:09:47 +0000 (04:39 +0530)]
[HUDI-3148] Create pushgateway client based on port (#4497)
Co-authored-by: anoop narang <anoop.narang@navi.com>
Co-authored-by: sivabalan narayanan <n.siva.b@gmail.com>
Y Ethan Guo [Mon, 10 Jan 2022 21:07:52 +0000 (13:07 -0800)]
[MINOR] Fix port number in setupKafka.sh (#4546)
Y Ethan Guo [Mon, 10 Jan 2022 20:31:25 +0000 (12:31 -0800)]
[HUDI-2735] Allow empty commits in Kafka Connect Sink for Hudi (#4544)
Sivabalan Narayanan [Mon, 10 Jan 2022 02:14:28 +0000 (21:14 -0500)]
Removing rollbacks instants from timeline for restore operation (#4518)
Thinking Chen [Sun, 9 Jan 2022 23:31:57 +0000 (07:31 +0800)]
[HUDI-3112] Fix KafkaConnect cannot sync to Hive Problem (#4458)
RexAn [Sun, 9 Jan 2022 10:23:46 +0000 (18:23 +0800)]
[HUDI-3157] Remove aws jars from hudi bundles (#4542)
Co-authored-by: Hui An <hui.an@shopee.com>
Raymond Xu [Tue, 11 Jan 2022 00:37:55 +0000 (16:37 -0800)]
[HUDI-3195] Fix spark 3 pom (#4555)
* [HUDI-3195] Fix spark 3 pom
- drop 3.0.x profile
- update readme
- update build CI bot.yml
* fix spark 3 bundle name
Yann Byron [Mon, 10 Jan 2022 20:15:35 +0000 (04:15 +0800)]
[HUDI-3131] fix ctas error in spark3.1.1 (#4549)
Yann Byron [Sun, 9 Jan 2022 07:43:25 +0000 (15:43 +0800)]
[HUDI-3125] spark-sql write timestamp directly (#4471)
Thinking Chen [Sun, 9 Jan 2022 07:10:17 +0000 (15:10 +0800)]
[HUDI-3104] Kafka-connect support of hadoop config environments and properties (#4451)
Sivabalan Narayanan [Sat, 8 Jan 2022 15:34:47 +0000 (10:34 -0500)]
[HUDI-3178] Fixing metadata table compaction so as to not include uncommitted data (#4530)
- A write can eventually fail in the data table after its commit has already succeeded in the metadata table (MDT). If compaction is then triggered in the MDT, it can include the uncommitted data, and once compacted that data can no longer be ignored when reading from the metadata table. This patch fixes the bug by triggering metadata table compaction before applying the commit to the metadata table.
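The ordering described above can be illustrated with a toy model. This is only a sketch: `ToyMetadataTable` and its fields are hypothetical names for illustration, not Hudi classes, and the two lists stand in for base files and log files.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, not Hudi code.
class ToyMetadataTable {
    // Instants already folded into base files; these can no longer be filtered out on read.
    final List<String> compactedBase = new ArrayList<>();
    // Instants that exist only in log files; a reader can still skip these if they never commit.
    final List<String> pendingLog = new ArrayList<>();

    // Fold everything currently in the log into the base.
    void compact() {
        compactedBase.addAll(pendingLog);
        pendingLog.clear();
    }

    // Compact first, then apply the new instant, so the newest instant
    // (whose data table write may still fail) is never part of this compaction.
    void applyCommit(String instant) {
        compact();
        pendingLog.add(instant);
    }
}
```

After `applyCommit("001")` followed by `applyCommit("002")`, only `001` has been compacted; `002` stays in the log until a later commit arrives.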
Sagar Sumit [Sat, 8 Jan 2022 15:29:36 +0000 (20:59 +0530)]
[HUDI-3139] Shade htrace and parquet-avro in presto bundle (#4495)
Filter out unnecessary classes
Sagar Sumit [Sat, 8 Jan 2022 15:22:44 +0000 (20:52 +0530)]
[HUDI-2909] Handle logical type in TimestampBasedKeyGenerator (#4203)
* [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator
The timestamp-based key generator was returning different values for the row-writer and non-row-writer paths. This patch fixes that; the fix is guarded by a config flag (`hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled`).
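A minimal sketch of setting the flag named above in a plain `java.util.Properties` object. The helper class and method names are hypothetical; only the config key comes from this commit message, and how the properties reach the writer depends on the caller.

```java
import java.util.Properties;

// Hypothetical helper for illustration; only the key string is taken from the commit message.
class KeyGenWriteOptions {
    static final String CONSISTENT_LOGICAL_TIMESTAMP =
        "hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled";

    // Builds a Properties object carrying the flag as a string value.
    static Properties withConsistentLogicalTimestamp(boolean enabled) {
        Properties props = new Properties();
        props.setProperty(CONSISTENT_LOGICAL_TIMESTAMP, String.valueOf(enabled));
        return props;
    }
}
```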
董可伦 [Sat, 8 Jan 2022 02:48:37 +0000 (10:48 +0800)]
[HUDI-3192] Spark metastore schema evolution broken (#4533)
Sagar Sumit [Fri, 7 Jan 2022 21:20:11 +0000 (02:50 +0530)]
[HUDI-3185] HoodieConfig#getBoolean should return false when default not set (#4536)
Remove unnecessary config
Sivabalan Narayanan [Fri, 7 Jan 2022 16:38:58 +0000 (11:38 -0500)]
[HUDI-2947] Fixing checkpoint fetch in deltastreamer (#4485)
* Fixing checkpoint fetch in deltastreamer
* Addressing comments
董可伦 [Fri, 7 Jan 2022 12:59:55 +0000 (20:59 +0800)]
[MINOR] fix typos in DDLExecutor (#4534)
Y Ethan Guo [Fri, 7 Jan 2022 12:56:08 +0000 (04:56 -0800)]
[HUDI-3188] Update quick start guide for Kafka Connect Sink for Hudi (#4527)
Raymond Xu [Fri, 7 Jan 2022 07:26:35 +0000 (23:26 -0800)]
[HUDI-3100] Add config for hive conditional sync (#4440)
YueZhang [Fri, 7 Jan 2022 02:16:29 +0000 (10:16 +0800)]
[HUDI-3183] Wrong result of HoodieArchivedTimeline loadInstants with TimeRangeFilter (#4521)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
Thinking Chen [Thu, 6 Jan 2022 23:46:51 +0000 (07:46 +0800)]
[HUDI-3118] Add default HUDI_DIR in setupKafka.sh (#4460)
xuzifu666 [Thu, 6 Jan 2022 23:36:13 +0000 (07:36 +0800)]
[MINOR] Remove unused methods in HoodieColumnProjectionUtils (#4408)
sivabalan narayanan [Mon, 10 Jan 2022 15:52:08 +0000 (10:52 -0500)]
Fixing clustering yaml
sivabalan narayanan [Mon, 10 Jan 2022 15:08:33 +0000 (10:08 -0500)]
Fixing TestHoodieDeltaStreamerWithMultiWriter tests for deterministic execution
sivabalan narayanan [Mon, 10 Jan 2022 12:30:31 +0000 (07:30 -0500)]
Fixing build failure with TestHoodieClientMultiWriter
Sivabalan Narayanan [Thu, 6 Jan 2022 18:04:10 +0000 (13:04 -0500)]
[HUDI-3165] Enabling InProcessLockProvider for all multi-writer tests instead of FileSystemBasedLockProviderTestClass (#4427)
hehexiaoduantui [Thu, 6 Jan 2022 07:49:30 +0000 (15:49 +0800)]
Update HiveIncrementalPuller to configure filesystem (#4431)
* Update HiveIncrementalPuller.java
fix get FileSystem bug
* Update HiveIncrementalPuller.java
fix error
* Update HiveIncrementalPuller.java
fix error
Vinish Reddy [Wed, 5 Jan 2022 16:43:10 +0000 (22:13 +0530)]
[HUDI-3168] Fixing null schema with empty commit in incremental relation (#4513)
Sagar Sumit [Wed, 5 Jan 2022 13:09:58 +0000 (18:39 +0530)]
[HUDI-3170] Do not preserve filename when preserveCommitMetadata enabled (#4512)
Danny Chan [Wed, 5 Jan 2022 08:41:33 +0000 (16:41 +0800)]
[HUDI-3171] Sync empty table to hive metastore (#4511)
Sivabalan Narayanan [Wed, 5 Jan 2022 02:57:18 +0000 (21:57 -0500)]
[HUDI-2966] Closing LogRecordScanner in compactor (#4478)
* Closing LogRecordScanner in compactor
* Addressing comments
Nicolas Paris [Tue, 4 Jan 2022 21:42:28 +0000 (22:42 +0100)]
[HUDI-3147] Add endpoint_url to dynamodb lock provider (#4500)
Co-authored-by: Nicolas Paris <nicolas.paris@adevinta.com>
Manoj Govindassamy [Tue, 4 Jan 2022 21:41:33 +0000 (13:41 -0800)]
[HUDI-3141] Metadata merged log record reader - avoiding NullPointerException when fetching records by keys (#4505)
- HoodieMetadataMergedLogRecordReader#getRecordsByKeys() and its parent class methods
are not thread safe. When multiple queries come in for getting log records
by keys, they all operate on the same log record reader instance provided by
HoodieBackedTableMetadata#openReadersIfNeeded() and they trip over each other
as they clear/put/get the same class member records.
- The fix is to streamline the mutation of the class member records by making
HoodieMetadataMergedLogRecordReader#getRecordsByKeys() a synchronized method,
so that concurrent log record readers do not run into an NPE.
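The pattern described above can be sketched as follows. `KeyedRecordReader` is a hypothetical stand-in, not the Hudi class: it reuses one member map across calls, and marking the method `synchronized` makes the clear/put/get sequence run for one caller at a time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a reader that reuses one member map across calls.
class KeyedRecordReader {
    private final Map<String, String> source;                    // stands in for the log blocks
    private final Map<String, String> records = new HashMap<>(); // shared mutable member

    KeyedRecordReader(Map<String, String> source) {
        this.source = new HashMap<>(source);
    }

    // synchronized: without it, one caller could clear `records`
    // while another caller is still reading from it.
    synchronized List<String> getRecordsByKeys(List<String> keys) {
        records.clear();
        for (String key : keys) {
            String value = source.get(key);
            if (value != null) {
                records.put(key, value);
            }
        }
        List<String> result = new ArrayList<>();
        for (String key : keys) {
            String value = records.get(key);
            if (value != null) {
                result.add(value);
            }
        }
        return result;
    }
}
```

Synchronizing the whole method is the simplest option and serializes concurrent lookups; returning a per-call map would avoid the shared state altogether, at the cost of a larger change.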
Sagar Sumit [Tue, 4 Jan 2022 21:32:05 +0000 (03:02 +0530)]
[HUDI-2774] Handle duplicate instants when fetching pending clustering plans (#4118)
Sivabalan Narayanan [Tue, 4 Jan 2022 05:18:04 +0000 (00:18 -0500)]
Adding tests to validate different key generators (#4473)
harshal [Mon, 3 Jan 2022 06:49:43 +0000 (12:19 +0530)]
[HUDI-2558] Fixing Clustering w/ sort columns with null values fails (#4404)
Raymond Xu [Mon, 3 Jan 2022 04:34:37 +0000 (20:34 -0800)]
[MINOR] Update README.md (#4492)
Update Spark 3 build instructions
YueZhang [Mon, 3 Jan 2022 03:43:30 +0000 (11:43 +0800)]
[HUDI-3138] Fix broken UT test for TestHiveSyncTool.testDropPartitions (#4493)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
Aimiyoo [Sat, 1 Jan 2022 07:38:38 +0000 (15:38 +0800)]
[HUDI-3040] Fix HoodieSparkBootstrapExample error info for usage (#4341)
YueZhang [Fri, 31 Dec 2021 07:56:33 +0000 (15:56 +0800)]
[HUDI-3107] Fix HiveSyncTool drop partitions using JDBC or hivesql or hms (#4453)
* constructDropPartitions when drop partitions using jdbc
* done
* done
* code style
* code review
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
Sivabalan Narayanan [Thu, 6 Jan 2022 23:54:32 +0000 (18:54 -0500)]
Fixing build failures
yuzhaojing [Thu, 30 Dec 2021 03:54:34 +0000 (11:54 +0800)]
[HUDI-3124] Bootstrap when timeline have completed instant (#4467)
Co-authored-by: yuzhaojing <yuzhaojing@bytedance.com>
董可伦 [Thu, 30 Dec 2021 03:53:17 +0000 (11:53 +0800)]
[HUDI-2675] Fix the exception 'Not an Avro data file' when archive and clean (#4016)
Sivabalan Narayanan [Thu, 30 Dec 2021 02:45:09 +0000 (21:45 -0500)]
Revert "[HUDI-3043] Revert async cleaner leak commit to unblock CI failure (#4343)" (#4465)
This reverts commit
7e7ad1558c0dcc06e059f631e43e44dc04100aa4.
ForwardXu [Wed, 29 Dec 2021 12:23:23 +0000 (20:23 +0800)]
[HUDI-3108] Fix Purge Drop MOR Table Cause error (#4455)
xuzifu666 [Wed, 29 Dec 2021 10:43:16 +0000 (18:43 +0800)]
[MINOR] HoodieInstantTimeGenerator improve method used (#4462)
Udit Mehrotra [Tue, 28 Dec 2021 15:15:05 +0000 (07:15 -0800)]
[HUDI-2983] Remove Log4j2 transitive dependencies (#4281)
Sivabalan Narayanan [Tue, 28 Dec 2021 10:26:30 +0000 (05:26 -0500)]
Fixing dynamoDbLockConfig required prop check (#4422)
ForwardXu [Tue, 28 Dec 2021 06:11:14 +0000 (14:11 +0800)]
[HUDI-3106] Fix HiveSyncTool not sync schema (#4452)
Yann Byron [Tue, 28 Dec 2021 05:39:52 +0000 (13:39 +0800)]
[HUDI-3093] fix spark-sql query table that write with TimestampBasedKeyGenerator (#4416)
Danny Chan [Sat, 25 Dec 2021 10:10:43 +0000 (18:10 +0800)]
[HUDI-3102] Do not store rollback plan in inflight instant (#4445)
Danny Chan [Sat, 25 Dec 2021 06:10:45 +0000 (14:10 +0800)]
[HUDI-3101] Excluding compaction instants from pending rollback info (#4443)
Danny Chan [Wed, 22 Dec 2021 03:10:27 +0000 (11:10 +0800)]
[HUDI-3032] Do not clean the log files right after compaction for metadata table (#4336)
harshal patil [Tue, 14 Dec 2021 11:58:18 +0000 (17:28 +0530)]
[HUDI-3008] Fixing HoodieFileIndex partition column parsing for nested fields
Raymond Xu [Tue, 21 Dec 2021 05:01:59 +0000 (21:01 -0800)]
[HUDI-2970] Add test for archiving replace commit (#4345)
zhangyue19921010 [Tue, 21 Dec 2021 03:59:50 +0000 (11:59 +0800)]
[HUDI-3070] Add rerunFailingTestsCount for flaky tests (#4398)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
Sivabalan Narayanan [Tue, 21 Dec 2021 01:27:22 +0000 (17:27 -0800)]
[MINOR] Increasing CI timeout to 90 mins (#4407)
Manoj Govindassamy [Sun, 19 Dec 2021 18:31:02 +0000 (10:31 -0800)]
[HUDI-3064][HUDI-3054] FileSystemBasedLockProviderTestClass tryLock fix and TestHoodieClientMultiWriter test fixes (#4384)
- Made FileSystemBasedLockProviderTestClass thread safe and fixed the
tryLock retry logic.
- Made TestHoodieClientMultiWriter.testHoodieClientBasicMultiWriter
deterministic in verifying the HoodieWriteConflictException.
Sivabalan Narayanan [Sun, 19 Dec 2021 07:59:39 +0000 (23:59 -0800)]
[HUDI-2970] Adding tests for archival of replace commit actions (#4268)
Danny Chan [Sun, 19 Dec 2021 02:09:48 +0000 (10:09 +0800)]
[minor] fix NetworkUtils#getHostname (#4355)
Raymond Xu [Sun, 19 Dec 2021 01:58:51 +0000 (17:58 -0800)]
[HUDI-3052] Fix flaky testJsonKafkaSourceResetStrategy (#4381)
Raymond Xu [Sun, 19 Dec 2021 01:00:56 +0000 (17:00 -0800)]
[MINOR] Azure CI IT tasks clean up (#4337)
Sivabalan Narayanan [Sat, 18 Dec 2021 21:15:48 +0000 (13:15 -0800)]
[HUDI-3054] Fixing default lock configs for FileSystemBasedLock and fixing a flaky test (#4374)
Sivabalan Narayanan [Sat, 18 Dec 2021 16:52:11 +0000 (08:52 -0800)]
[HUDI-3064] Fixing a bug in TransactionManager and FileSystemTestLock (#4372)
Manoj Govindassamy [Sat, 18 Dec 2021 16:43:10 +0000 (08:43 -0800)]
[HUDI-3029] Transaction manager: avoid deadlock when doing begin and end transactions (#4373)
Manoj Govindassamy [Sat, 18 Dec 2021 14:43:17 +0000 (06:43 -0800)]
[HUDI-3029] Transaction manager: avoid deadlock when doing begin and end transactions (#4363)
* [HUDI-3029] Transaction manager: avoid deadlock when doing begin and end transactions
- Transaction manager has begin and end transactions as synchronized methods.
Based on the lock provider implementation, this can lead to a deadlock
situation when the underlying lock() calls are blocking or with a long timeout.
- Fixing transaction manager begin and end transactions to not get to deadlock
and to not assume anything on the lock provider implementation.
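One way to picture the problem: if both methods are `synchronized` and `lock()` blocks, a second thread can sit inside `beginTransaction` holding the object monitor while it waits for the lock, and the thread that owns the lock can then never enter `endTransaction` to release it. The sketch below avoids holding the monitor around the blocking call. It is an assumption-laden illustration, not the actual patch: `SimpleTransactionManager` is a hypothetical name and a `ReentrantLock` stands in for the lock provider.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; a ReentrantLock stands in for the pluggable lock provider.
class SimpleTransactionManager {
    private final ReentrantLock lock = new ReentrantLock();
    private volatile String currentOwner;

    // Deliberately not synchronized: a caller blocked in lock() holds no monitor,
    // so the current owner can still reach endTransaction() and release the lock.
    void beginTransaction(String instant) {
        lock.lock(); // may block for a long time
        currentOwner = instant;
    }

    // Also not synchronized; only the thread that holds the lock releases it.
    void endTransaction(String instant) {
        if (lock.isHeldByCurrentThread()) {
            currentOwner = null;
            lock.unlock();
        }
    }

    String currentOwner() {
        return currentOwner;
    }
}
```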
Sivabalan Narayanan [Sat, 18 Dec 2021 02:37:45 +0000 (18:37 -0800)]
[HUDI-3043] De-coupling multi writer tests (#4362)
Manoj Govindassamy [Sat, 18 Dec 2021 01:18:46 +0000 (17:18 -0800)]
[HUDI-2962] InProcess lock provider to guard single writer process with async table operations (#4259)
- Adding Local JVM process based lock provider implementation
- This local lock provider can be used by a single writer process with async
table operations to guard the metadata table against concurrent updates.
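A rough sketch of what a JVM-local lock provider can look like, assuming a single static lock shared by every instance in the process. `JvmLocalLockProvider` and its methods are hypothetical names for illustration and do not reproduce the class added by this commit.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of a process-local lock provider.
class JvmLocalLockProvider {
    // static: every instance in this JVM (writer, async table services) shares one lock.
    private static final ReentrantReadWriteLock LOCK = new ReentrantReadWriteLock();

    // Try to take the write lock, giving up after the timeout.
    boolean tryLock(long time, TimeUnit unit) {
        try {
            return LOCK.writeLock().tryLock(time, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    // Release the lock only if this thread actually holds it.
    void unlock() {
        if (LOCK.writeLock().isHeldByCurrentThread()) {
            LOCK.writeLock().unlock();
        }
    }

    boolean isHeldByCurrentThread() {
        return LOCK.writeLock().isHeldByCurrentThread();
    }
}
```

Because the lock lives in process memory, a design like this only coordinates threads inside one JVM; it offers no protection across separate writer processes.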
xiarixiaoyao [Fri, 17 Dec 2021 13:58:02 +0000 (21:58 +0800)]
[HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType (#4253)
Danny Chan [Fri, 17 Dec 2021 05:57:53 +0000 (13:57 +0800)]
[HUDI-3037] Add back remote view storage config for flink (#4338)
xiarixiaoyao [Thu, 16 Dec 2021 20:36:01 +0000 (04:36 +0800)]
[HUDI-3001] Clean up the marker directory when finish bootstrap operation. (#4298)
zhangyue19921010 [Thu, 16 Dec 2021 19:15:08 +0000 (03:15 +0800)]
[Minor] Catch and ignore all the exceptions in quietDeleteMarkerDir (#4301)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
Danny Chan [Thu, 16 Dec 2021 07:26:16 +0000 (15:26 +0800)]
[HUDI-3015] Implement #reset and #sync for metadata filesystem view (#4307)
Raymond Xu [Wed, 15 Dec 2021 23:33:33 +0000 (15:33 -0800)]
[HUDI-3028] Use blob storage to speed up CI downloads (#4331)
Co-authored-by: Sivabalan Narayanan <n.siva.b@gmail.com>
Y Ethan Guo [Wed, 15 Dec 2021 18:44:42 +0000 (10:44 -0800)]
[HUDI-3025] Add additional wait time for namenode availability during IT tests initialization (#4328)
- Co-authored-by: Sivabalan Narayanan <n.siva.b@gmail.com>
ForwardXu [Wed, 15 Dec 2021 11:38:02 +0000 (19:38 +0800)]
[HUDI-3022] Fix NPE for isDropPartition method (#4319)
* [HUDI-3022] Fix NPE for isDropPartition method
Danny Chan [Tue, 14 Dec 2021 06:08:13 +0000 (14:08 +0800)]
[HUDI-2997] Skip the corrupt meta file for pending rollback action (#4296)
Fugle666 [Tue, 14 Dec 2021 03:31:36 +0000 (11:31 +0800)]
[HUDI-2996] Flink streaming reader 'skip_compaction' option does not work (#4304)
close apache/hudi#4304
WangMinChao [Mon, 13 Dec 2021 12:41:03 +0000 (20:41 +0800)]
[HUDI-2994] Add judgement to existed partitionPath in the catch code block for HU… (#4294)
* [HUDI-2994] Add judgement to existed partition path in the catch code block for HUDI-2743
Co-authored-by: wangminchao <wangminchao@asinking.com>
ForwardXu [Mon, 13 Dec 2021 12:40:06 +0000 (20:40 +0800)]
[HUDI-2990] Sync to HMS when deleting partitions (#4291)
Manoj Govindassamy [Sun, 12 Dec 2021 04:42:36 +0000 (20:42 -0800)]
[HUDI-2938] Metadata table util to get latest file slices for reader/writers (#4218)
wenningd [Sun, 12 Dec 2021 04:18:39 +0000 (23:18 -0500)]
[HUDI-2946] Upgrade maven plugins to be compatible with higher Java versions (#4232)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
Danny Chan [Sat, 11 Dec 2021 08:19:10 +0000 (16:19 +0800)]
[HUDI-2984] Implement #close for AbstractTableFileSystemView (#4285)
Y Ethan Guo [Sat, 11 Dec 2021 08:16:05 +0000 (00:16 -0800)]
[HUDI-2906] Add a repair util to clean up dangling data and log files (#4278)