hudi.git
2 months ago[HUDI-4122] Fix NPE caused by adding kafka nodes (#5632)
wangxianghu [Sat, 21 May 2022 03:12:53 +0000 (07:12 +0400)] 
[HUDI-4122] Fix NPE caused by adding kafka nodes (#5632)

2 months ago[MINOR] Minor fixes to exception log and removing unwanted metrics flush in integ...
Sivabalan Narayanan [Fri, 20 May 2022 23:27:35 +0000 (19:27 -0400)] 
[MINOR] Minor fixes to exception log and removing unwanted metrics flush in integ test (#5646)

2 months ago[HUDI-3985] Refactor DLASyncTool to support read hoodie table as spark datasource...
huberylee [Fri, 20 May 2022 14:25:32 +0000 (22:25 +0800)] 
[HUDI-3985] Refactor DLASyncTool to support read hoodie table as spark datasource table (#5532)

2 months ago[HUDI-4130] Remove the upgrade/downgrade for flink #initTable (#5642)
Danny Chan [Fri, 20 May 2022 13:31:23 +0000 (21:31 +0800)] 
[HUDI-4130] Remove the upgrade/downgrade for flink #initTable (#5642)

2 months ago[HUDI-4119] the first read result is incorrect when Flink upsert- Kafka connector...
aliceyyan [Fri, 20 May 2022 10:10:24 +0000 (18:10 +0800)] 
[HUDI-4119] the first read result is incorrect when Flink upsert- Kafka connector is used in HUDi (#5626)

* HUDI-4119 the first read result is incorrect when Flink upsert- Kafka connector is used in HUDi

Co-authored-by: aliceyyan <aliceyyan@tencent.com>
3 months ago[HUDI-4114] Remove the unnecessary fs view sync for BaseWriteClient#initTable (#5617)
Danny Chan [Thu, 19 May 2022 02:59:05 +0000 (10:59 +0800)] 
[HUDI-4114] Remove the unnecessary fs view sync for BaseWriteClient#initTable (#5617)

No need to call #sync actively: the table instance is freshly instantiated,
so its view manager has no fs view instances yet, and the fs view is synced lazily when
it is requested.

3 months ago[HUDI-4116] Unify clustering/compaction related procedures' output type (#5620)
huberylee [Thu, 19 May 2022 01:48:03 +0000 (09:48 +0800)] 
[HUDI-4116] Unify clustering/compaction related procedures' output type (#5620)

* Unify clustering/compaction related procedures' output type

* Address review comments

3 months agoRevert "[HUDI-3870] Add timeout rollback for flink online compaction (#5314)" (#5622)
Danny Chan [Wed, 18 May 2022 12:30:54 +0000 (20:30 +0800)] 
Revert "[HUDI-3870] Add timeout rollback for flink online compaction (#5314)" (#5622)

This reverts commit 6f9b02decb5bb2b83709b1b6ec04a97e4d102c11.

3 months ago[HUDI-4111] Bump ANTLR runtime version in Spark 3.x (#5606)
cxzl25 [Wed, 18 May 2022 11:18:52 +0000 (19:18 +0800)] 
[HUDI-4111] Bump ANTLR runtime version in Spark 3.x (#5606)

3 months ago[HUDI-3942] [RFC-50] Improve Timeline Server (#5392)
Zhaojing Yu [Wed, 18 May 2022 10:43:48 +0000 (18:43 +0800)] 
[HUDI-3942] [RFC-50] Improve Timeline Server (#5392)

3 months agoClean the marker files for flink compaction (#5611)
luokey [Wed, 18 May 2022 03:21:14 +0000 (11:21 +0800)] 
Clean the marker files for flink compaction (#5611)

Co-authored-by: 854194341@qq.com <loukey_7821>
3 months ago[HUDI-4109] Copy the old record directly when it is chosen for merging (#5603)
Danny Chan [Wed, 18 May 2022 02:17:00 +0000 (10:17 +0800)] 
[HUDI-4109] Copy the old record directly when it is chosen for merging (#5603)

3 months ago[minor] Some code refactoring for LogFileComparator and Instant instantiation (#5600)
Danny Chan [Wed, 18 May 2022 01:30:09 +0000 (09:30 +0800)] 
[minor] Some code refactoring for LogFileComparator and Instant instantiation (#5600)

3 months ago[MINOR] Fixing spark long running yaml for non-partitioned (#5607)
Sivabalan Narayanan [Tue, 17 May 2022 13:58:18 +0000 (09:58 -0400)] 
[MINOR] Fixing spark long running yaml for non-partitioned (#5607)

3 months ago[HUDI-4110] Clean the marker files for flink compaction (#5604)
BruceLin [Tue, 17 May 2022 13:09:27 +0000 (21:09 +0800)] 
[HUDI-4110] Clean the marker files for flink compaction (#5604)

3 months ago[HUDI-4087] Support dropping RO and RT table in DropHoodieTableCommand (#5564)
Jin Xing [Tue, 17 May 2022 06:12:50 +0000 (14:12 +0800)] 
[HUDI-4087] Support dropping RO and RT table in DropHoodieTableCommand (#5564)

* [HUDI-4087] Support dropping RO and RT table in DropHoodieTableCommand

* Set hoodie.query.as.ro.table in serde properties

3 months ago[HUDI-4101] BucketIndexPartitioner should take partition path for better dispersion...
Danny Chan [Tue, 17 May 2022 02:34:57 +0000 (10:34 +0800)] 
[HUDI-4101] BucketIndexPartitioner should take partition path for better dispersion (#5590)

3 months ago[HUDI-4104] DeltaWriteProfile includes the pending compaction file slice when decidin...
Danny Chan [Tue, 17 May 2022 02:34:15 +0000 (10:34 +0800)] 
[HUDI-4104] DeltaWriteProfile includes the pending compaction file slice when deciding small buckets (#5594)

3 months ago[HUDI-3654] Preparations for hudi metastore. (#5572)
Shawy Geng [Tue, 17 May 2022 01:47:10 +0000 (09:47 +0800)] 
[HUDI-3654] Preparations for hudi metastore. (#5572)

* [HUDI-3654] Preparations for hudi metastore.

Co-authored-by: gengxiaoyu <gengxiaoyu@bytedance.com>
3 months ago[HUDI-4103] [HUDI-4001] Filter the properties should not be used when create table...
董可伦 [Mon, 16 May 2022 15:26:23 +0000 (23:26 +0800)] 
[HUDI-4103] [HUDI-4001] Filter the properties should not be used when create table for Spark SQL

3 months ago[HUDI-4098] Metadata table heartbeat for instant has expired, last heartbeat 0 (...
Danny Chan [Mon, 16 May 2022 09:40:08 +0000 (17:40 +0800)] 
[HUDI-4098] Metadata table heartbeat for instant has expired, last heartbeat 0 (#5583)

3 months ago[HUDI-3123] consistent hashing index: basic write path (upsert/insert) (#4480)
Yuwei XIAO [Mon, 16 May 2022 03:07:01 +0000 (11:07 +0800)] 
[HUDI-3123] consistent hashing index: basic write path (upsert/insert) (#4480)

 1. basic write path(insert/upsert) implementation
 2. adapt simple bucket index

3 months agofix hive sync no partition table error (#5585)
陈浩 [Mon, 16 May 2022 01:51:24 +0000 (09:51 +0800)] 
fix hive sync no partition table error (#5585)

3 months ago[HUDI-4001] Filter the properties should not be used when create table for Spark...
董可伦 [Mon, 16 May 2022 01:50:29 +0000 (09:50 +0800)] 
[HUDI-4001] Filter the properties should not be used when create table for Spark SQL (#5495)

3 months ago[HUDI-3980] Support kerberos hbase index (#5464)
xi chaomin [Sat, 14 May 2022 11:37:31 +0000 (19:37 +0800)] 
[HUDI-3980] Support kerberos hbase index (#5464)

- Add configurations in HoodieHBaseIndexConfig.java to support kerberos hbase connection.

Co-authored-by: xicm <xicm@asiainfo.com>
3 months ago[HUDI-4097] add table info to jobStatus (#5529)
wqwl611 [Sat, 14 May 2022 01:01:15 +0000 (09:01 +0800)] 
[HUDI-4097] add table info to jobStatus (#5529)

Co-authored-by: wqwl611 <wqwl611@gmail.com>
3 months ago[HUDI-4072] Fix NULL schema for empty batches in deltastreamer (#5543)
Sivabalan Narayanan [Fri, 13 May 2022 12:26:47 +0000 (08:26 -0400)] 
[HUDI-4072] Fix NULL schema for empty batches in deltastreamer (#5543)

3 months ago[HUDI-3336][HUDI-FLINK]Support custom hadoop config for flink (#5574)
Bo Cui [Fri, 13 May 2022 11:52:55 +0000 (19:52 +0800)] 
[HUDI-3336][HUDI-FLINK]Support custom hadoop config for flink (#5574)

* [HUDI-3336][HUDI-FLINK]Support custom hadoop config for flink

3 months ago[HUDI-4078][HUDI-FLINK]BootstrapOperator contains the pending compact… (#5545)
Bo Cui [Fri, 13 May 2022 06:32:48 +0000 (14:32 +0800)] 
[HUDI-4078][HUDI-FLINK]BootstrapOperator contains the pending compact… (#5545)

* [HUDI-4078][HUDI-FLINK]BootstrapOperator contains the pending compaction files

3 months ago[MINOR] Fix a NPE for Option (#5461)
Xingcan Cui [Fri, 13 May 2022 04:20:40 +0000 (00:20 -0400)] 
[MINOR] Fix a NPE for Option (#5461)

3 months ago[HUDI-3336][HUDI-FLINK]Support custom hadoop config for flink (#5528)
Bo Cui [Fri, 13 May 2022 01:50:11 +0000 (09:50 +0800)] 
[HUDI-3336][HUDI-FLINK]Support custom hadoop config for flink (#5528)

* [HUDI-3336][HUDI-FLINK]Support custom hadoop config for flink

3 months ago[HUDI-4018][HUDI-4027] Adding integ test yamls for immutable use-cases. Added delete...
Sivabalan Narayanan [Fri, 13 May 2022 01:01:55 +0000 (21:01 -0400)] 
[HUDI-4018][HUDI-4027] Adding integ test yamls for immutable use-cases. Added delete partition support to integ tests (#5501)

- Added pure immutable test yamls to integ test framework. Added SparkBulkInsertNode as part of it.
- Added delete_partition support to integ test framework using spark-datasource.
- Added a single yaml to test all non core write operations (insert overwrite, insert overwrite table and delete partitions)
- Added tests for 4 concurrent spark datasource writers (multi-writer tests).
- Fixed readme w/ sample commands for multi-writer.

3 months ago[HUDI-3963][Claim RFC number 53] Use Lock-Free Message Queue Improving Hoodie Writing...
YueZhang [Thu, 12 May 2022 11:26:00 +0000 (19:26 +0800)] 
[HUDI-3963][Claim RFC number 53] Use Lock-Free Message Queue Improving Hoodie Writing Efficiency. (#5562)

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
3 months ago[HUDI-4085] Fixing flakiness with parquet empty batch tests in TestHoodieDeltaStreame...
Sivabalan Narayanan [Wed, 11 May 2022 20:02:54 +0000 (16:02 -0400)] 
[HUDI-4085] Fixing flakiness with parquet empty batch tests in TestHoodieDeltaStreamer (#5559)

3 months ago[HUDI-4079] Supports showing table comment for hudi with spark3 (#5546)
Jin Xing [Wed, 11 May 2022 14:28:58 +0000 (22:28 +0800)] 
[HUDI-4079] Supports showing table comment for hudi with spark3 (#5546)

3 months ago[HUDI-4038] Avoid calling `getDataSize` after every record written (#5497)
Alexey Kudinkin [Wed, 11 May 2022 12:08:31 +0000 (05:08 -0700)] 
[HUDI-4038] Avoid calling `getDataSize` after every record written (#5497)

- getDataSize has non-trivial overhead in the current ParquetWriter impl, requiring traversal of already composed Column Groups in memory. Instead we can sample these calls to getDataSize to amortize its cost.
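
The sampling idea described above can be sketched as follows. This is a minimal Python illustration, not the code from the PR: `SampledSizeTracker`, `FakeWriter`, `write_until_full`, and the fixed `sample_interval` are hypothetical names and assumptions made for this sketch, and the real writer may choose its sampling points differently.

```python
class FakeWriter:
    """Stand-in for a file writer whose size estimate is costly (hypothetical)."""

    def __init__(self):
        self.bytes_written = 0
        self.size_calls = 0  # counts how often the expensive call is made

    def write(self, record):
        self.bytes_written += len(record)

    def get_data_size(self):
        self.size_calls += 1
        return self.bytes_written


class SampledSizeTracker:
    """Calls the expensive size estimate only every `sample_interval` records."""

    def __init__(self, get_data_size, max_bytes, sample_interval=100):
        self.get_data_size = get_data_size
        self.max_bytes = max_bytes
        self.sample_interval = sample_interval
        self.records_written = 0
        self.full = False

    def record_written(self):
        """Register one written record; return False once the file is full."""
        self.records_written += 1
        if self.records_written % self.sample_interval == 0:
            # Only here do we pay for the size computation.
            self.full = self.get_data_size() >= self.max_bytes
        return not self.full


def write_until_full(writer, tracker, records):
    """Write records until the tracker reports the size limit was reached."""
    written = 0
    for record in records:
        writer.write(record)
        written += 1
        if not tracker.record_written():
            break
    return written
```

The trade-off is that the file can overshoot the limit by up to `sample_interval - 1` records, in exchange for far fewer size computations.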

Co-authored-by: sivabalan <n.siva.b@gmail.com>
3 months ago[HUDI-4003] Try to read all the log file to parse schema (#5473)
Lanyuanxiaoyao [Tue, 10 May 2022 22:45:53 +0000 (06:45 +0800)] 
[HUDI-4003] Try to read all the log file to parse schema (#5473)

3 months ago[HUDI-4044] When reading data from flink-hudi to external storage, the … (#5516)
aliceyyan [Tue, 10 May 2022 02:25:13 +0000 (10:25 +0800)] 
[HUDI-4044] When reading data from flink-hudi to external storage, the … (#5516)

Co-authored-by: aliceyyan <aliceyyan@tencent.com>
3 months ago[HUDI-3995] Making perf optimizations for bulk insert row writer path (#5462)
Sivabalan Narayanan [Mon, 9 May 2022 16:40:22 +0000 (12:40 -0400)] 
[HUDI-3995] Making perf optimizations for bulk insert row writer path (#5462)

- Avoid using udf for key generator for SimpleKeyGen and NonPartitionedKeyGen.
- Fixed NonPartitioned Key generator to directly fetch record key from row rather than involving GenericRecord.
- Other minor fixes around using static values instead of looking up hashmap.

3 months ago[HUDI-4053] Flaky ITTestHoodieDataSource.testStreamWriteBatchReadOpti… (#5526)
xicm [Mon, 9 May 2022 08:35:50 +0000 (16:35 +0800)] 
[HUDI-4053] Flaky ITTestHoodieDataSource.testStreamWriteBatchReadOpti… (#5526)

* [HUDI-4053] Flaky ITTestHoodieDataSource.testStreamWriteBatchReadOptimized

Co-authored-by: xicm <xicm@asiainfo.com>
3 months ago[MINOR] Fixing close for HoodieCatalog's test (#5531)
ForwardXu [Mon, 9 May 2022 07:17:24 +0000 (15:17 +0800)] 
[MINOR] Fixing close for HoodieCatalog's test (#5531)

* [MINOR] Fixing close for HoodieCatalog's test

3 months ago[HUDI-4055]refactor ratelimiter to avoid stack overflow (#5530)
guanziyue [Mon, 9 May 2022 02:27:37 +0000 (10:27 +0800)] 
[HUDI-4055]refactor ratelimiter to avoid stack overflow (#5530)

3 months ago[MINOR] fixing flaky tests in deltastreamer tests (#5521)
Sivabalan Narayanan [Sat, 7 May 2022 19:37:20 +0000 (15:37 -0400)] 
[MINOR] fixing flaky tests in deltastreamer tests (#5521)

3 months ago[MINOR] Fixing class not found when using flink and enable metadata table (#5527)
BruceLin [Sat, 7 May 2022 12:03:18 +0000 (20:03 +0800)] 
[MINOR] Fixing class not found when using flink and enable metadata table (#5527)

3 months ago[HUDI-3849] AvroDeserializer supports AVRO_REBASE_MODE_IN_READ configuration (#5287)
cxzl25 [Sat, 7 May 2022 07:39:14 +0000 (15:39 +0800)] 
[HUDI-3849] AvroDeserializer supports AVRO_REBASE_MODE_IN_READ configuration (#5287)

3 months ago[HUDI-3675] Adding post write termination strategy to deltastreamer continuous mode...
Sivabalan Narayanan [Fri, 6 May 2022 13:27:29 +0000 (09:27 -0400)] 
[HUDI-3675] Adding post write termination strategy to deltastreamer continuous mode (#5073)

- Added a postWriteTerminationStrategy to deltastreamer continuous mode. It can be enabled by setting the termination strategy class via DeltastreamerConfig.postWriteTerminationStrategyClass. If none is set, continuous mode is expected to run forever.
- Added one concrete impl for termination strategy as NoNewDataTerminationStrategy which shuts down deltastreamer if there is no new data to consume from source for N consecutive rounds.
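
The behavior described in the two points above can be sketched in Python. This is an illustration only: the class is named after the one in the commit message, but `should_shutdown`, `run_continuous`, and the way batches are represented are assumptions of this sketch, not the deltastreamer API.

```python
class NoNewDataTerminationStrategy:
    """Signals shutdown after N consecutive rounds with no new data."""

    def __init__(self, max_rounds_without_data):
        self.max_rounds_without_data = max_rounds_without_data
        self.rounds_without_data = 0

    def should_shutdown(self, records_in_round):
        if records_in_round > 0:
            # Any new data resets the streak of empty rounds.
            self.rounds_without_data = 0
        else:
            self.rounds_without_data += 1
        return self.rounds_without_data >= self.max_rounds_without_data


def run_continuous(batches, strategy=None):
    """Process batches in order; stop early if the strategy says so.

    Returns the number of rounds executed. With no strategy configured,
    every batch is processed (the "run forever" case, bounded here by
    the finite input).
    """
    rounds = 0
    for batch in batches:
        rounds += 1
        if strategy is not None and strategy.should_shutdown(len(batch)):
            break
    return rounds
```

The strategy is consulted after each round, so a single non-empty batch is enough to restart the count.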

3 months ago[HUDI-4017] Improve spark sql coverage in CI (#5512)
Raymond Xu [Fri, 6 May 2022 12:52:06 +0000 (05:52 -0700)] 
[HUDI-4017] Improve spark sql coverage in CI (#5512)

Add GitHub actions tasks to run spark sql UTs under spark 3.1 and 3.2.

3 months ago[HUDI-4042] Support truncate-partition for Spark-3.2 (#5506)
Jin Xing [Fri, 6 May 2022 07:29:47 +0000 (15:29 +0800)] 
[HUDI-4042] Support truncate-partition for Spark-3.2 (#5506)

3 months ago[HUDI-2875] Make HoodieParquetWriter Thread safe and memory executor exit gracefully...
guanziyue [Thu, 5 May 2022 20:49:34 +0000 (04:49 +0800)] 
[HUDI-2875] Make HoodieParquetWriter Thread safe and memory executor exit gracefully (#4264)

3 months ago[MINOR] Optimize code logic (#5499)
qianchutao [Thu, 5 May 2022 16:33:06 +0000 (00:33 +0800)] 
[MINOR] Optimize code logic (#5499)

3 months ago[HUDI-3667] Run unit tests of hudi-integ-tests in CI (#5078)
Y Ethan Guo [Thu, 5 May 2022 06:39:18 +0000 (23:39 -0700)] 
[HUDI-3667] Run unit tests of hudi-integ-tests in CI (#5078)

3 months ago[HUDI-4031] Avoid clustering update handling when no pending replacecommit (#5487)
Sagar Sumit [Wed, 4 May 2022 14:17:11 +0000 (19:47 +0530)] 
[HUDI-4031] Avoid clustering update handling when no pending replacecommit (#5487)

3 months ago[HUDI-4005] Update release scripts to help validation (#5479)
Raymond Xu [Wed, 4 May 2022 14:15:54 +0000 (07:15 -0700)] 
[HUDI-4005] Update release scripts to help validation (#5479)

3 months ago[MINOR] Update RFC status (#5486)
Sagar Sumit [Tue, 3 May 2022 15:57:18 +0000 (21:27 +0530)] 
[MINOR] Update RFC status (#5486)

3 months ago[HUDI-3211][RFC-44] Add RFC for Hudi Connector for Presto (#4563)
Todd Gao [Mon, 2 May 2022 16:35:23 +0000 (00:35 +0800)] 
[HUDI-3211][RFC-44] Add RFC for Hudi Connector for Presto (#4563)

* Add RFC doc

Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
* Add note regarding catalog naming

Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
3 months ago[MINOR] Update DOAP for release 0.11.0 (#5467)
Raymond Xu [Sat, 30 Apr 2022 17:51:16 +0000 (10:51 -0700)] 
[MINOR] Update DOAP for release 0.11.0 (#5467)

3 months ago[HUDI-3978] Fix use of partition path field as hive partition field in flink (#5434)
Wangyh [Sat, 30 Apr 2022 03:58:54 +0000 (11:58 +0800)] 
[HUDI-3978] Fix use of partition path field as hive partition field in flink (#5434)

* Fix partition path fields as hive sync partition fields error

3 months ago[HUDI-3862] Fix default configurations of HoodieHBaseIndexConfig (#5308)
xicm [Fri, 29 Apr 2022 23:21:52 +0000 (07:21 +0800)] 
[HUDI-3862] Fix default configurations of HoodieHBaseIndexConfig (#5308)

Co-authored-by: xicm <xicm@asiainfo.com>
3 months ago[MINOR] Fix CI by ignoring SparkContext error (#5468)
Y Ethan Guo [Fri, 29 Apr 2022 18:19:07 +0000 (11:19 -0700)] 
[MINOR] Fix CI by ignoring SparkContext error (#5468)

Sets spark.driver.allowMultipleContexts = true when constructing Spark conf in UtilHelpers

3 months ago[HUDI-3758] Fix duplicate fileId error in MOR table type with flink bucket hash Index...
吴祥平 [Fri, 29 Apr 2022 06:10:20 +0000 (14:10 +0800)] 
[HUDI-3758] Fix duplicate fileId error in MOR table type with flink bucket hash Index  (#5185)

* fix duplicate fileId with bucket Index
* replace to load FileGroup from FileSystemView

3 months ago[MINOR] support different cleaning policy for flink (#5459)
Gary Li [Fri, 29 Apr 2022 01:48:44 +0000 (09:48 +0800)] 
[MINOR] support different cleaning policy for flink (#5459)

3 months ago[HUDI-3943] Some description fixes for 0.10.1 docs (#5447)
LiChuang [Thu, 28 Apr 2022 22:18:56 +0000 (06:18 +0800)] 
[HUDI-3943] Some description fixes for 0.10.1 docs (#5447)

3 months ago[HUDI-3815] Fix docs description of metadata.compaction.delta_commits default value...
Ibson [Wed, 27 Apr 2022 23:09:44 +0000 (07:09 +0800)] 
[HUDI-3815] Fix docs description of metadata.compaction.delta_commits default value error (#5368)

Co-authored-by: pusheng.li01 <pusheng.li01@liulishuo.com>
3 months ago[HUDI-3945] After the async compaction operation is complete, the task should exit...
watermelon12138 [Wed, 27 Apr 2022 13:16:09 +0000 (21:16 +0800)] 
[HUDI-3945] After the async compaction operation is complete, the task should exit. (#5391)

Co-authored-by: y00617041 <yangxuan42@huawei.com>
3 months agoClaim RFC 52 for Introduce Secondary Index to Improve HUDI Query Performance (#5441)
huberylee [Wed, 27 Apr 2022 06:07:29 +0000 (14:07 +0800)] 
Claim RFC 52 for Introduce Secondary Index to Improve HUDI Query Performance (#5441)

3 months ago[HUDI-3977] Flink hudi table with date type partition path throws HoodieNotSupportedE...
Danny Chan [Wed, 27 Apr 2022 05:19:55 +0000 (13:19 +0800)] 
[HUDI-3977] Flink hudi table with date type partition path throws HoodieNotSupportedException (#5432)

3 months ago[MINOR] Update alter rename command class type for pattern matching (#5381)
KnightChess [Wed, 27 Apr 2022 02:39:51 +0000 (10:39 +0800)] 
[MINOR] Update alter rename command class type for pattern matching (#5381)

3 months ago[HUDI-3478] Claim RFC 51 For CDC (#5437)
Yann Byron [Tue, 26 Apr 2022 15:26:47 +0000 (23:26 +0800)] 
[HUDI-3478] Claim RFC 51 For CDC (#5437)

3 months ago[HUDI-3972] Fixing hoodie.properties/tableConfig for no preCombine field with writes...
Sivabalan Narayanan [Tue, 26 Apr 2022 03:03:10 +0000 (23:03 -0400)] 
[HUDI-3972] Fixing hoodie.properties/tableConfig for no preCombine field with writes (#5424)

Fixed instantiation of a new table to set preCombine to null if it is not explicitly set by the user.

3 months ago[HUDI-3085] Improve bulk insert partitioner abstraction (#4441)
Yuwei XIAO [Mon, 25 Apr 2022 10:42:17 +0000 (18:42 +0800)] 
[HUDI-3085] Improve bulk insert partitioner abstraction (#4441)

3 months agoRevert "[HUDI-3951]support generan parameter 'sink.parallelism' for flink-hudi (...
ForwardXu [Mon, 25 Apr 2022 04:58:27 +0000 (12:58 +0800)] 
Revert "[HUDI-3951]support generan parameter 'sink.parallelism' for flink-hudi (#5405)" (#5421)

This reverts commit bda3db078e927421c10932cfcb3019cfddb125b6.

3 months ago[HUDI-3946] Validate option path in flink hudi sink (#5397)
Ruguo Yu [Mon, 25 Apr 2022 02:13:47 +0000 (10:13 +0800)] 
[HUDI-3946] Validate option path in flink hudi sink (#5397)

3 months agosupport generan parameter 'sink.parallelism' for flink-hudi (#5405)
hehuiyuan [Sun, 24 Apr 2022 11:09:39 +0000 (19:09 +0800)] 
support generan parameter 'sink.parallelism' for flink-hudi (#5405)

Co-authored-by: hehuiyuan1 <hehuiyuan@jd.com>
3 months ago[HUDI-3923] Fix cast exception while reading boolean type of partitioned field (...
miomiocat [Sat, 23 Apr 2022 12:12:54 +0000 (20:12 +0800)] 
[HUDI-3923] Fix cast exception while reading boolean type of partitioned field (#5373)

3 months ago[HUDI-3948] Fix presto bundle missing HBase classes (#5398)
Y Ethan Guo [Sat, 23 Apr 2022 08:33:55 +0000 (01:33 -0700)] 
[HUDI-3948] Fix presto bundle missing HBase classes (#5398)

3 months ago[HUDI-3950] add parquet-avro to gcp-bundle (#5399)
Raymond Xu [Sat, 23 Apr 2022 03:59:49 +0000 (20:59 -0700)] 
[HUDI-3950] add parquet-avro to gcp-bundle (#5399)

3 months ago[HUDI-3947] Fixing Hive conf usage in HoodieSparkSqlWriter (#5401)
Sivabalan Narayanan [Sat, 23 Apr 2022 02:20:05 +0000 (22:20 -0400)] 
[HUDI-3947] Fixing Hive conf usage in HoodieSparkSqlWriter (#5401)

3 months ago[DOCS] Add commit activity, twitter badges, and Hudi logo in README (#5336)
Y Ethan Guo [Fri, 22 Apr 2022 08:51:07 +0000 (01:51 -0700)] 
[DOCS] Add commit activity, twitter badges, and Hudi logo in README (#5336)

3 months ago[HUDI-3934] Fix `Spark32HoodieParquetFileFormat` not being compatible w/ Spark 3...
Alexey Kudinkin [Fri, 22 Apr 2022 01:00:38 +0000 (18:00 -0700)] 
[HUDI-3934] Fix `Spark32HoodieParquetFileFormat` not being compatible w/ Spark 3.2.0 (#5378)

- Because Spark 3.2.1 is not backward-compatible (non-BWC) with 3.2.0, Spark32HoodieParquetFileFormat has to handle the incompatibilities between them. This PR addresses that.

Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
3 months ago[HUDI-3936] Fix projection for a nested field as pre-combined key (#5379)
Y Ethan Guo [Fri, 22 Apr 2022 00:17:57 +0000 (17:17 -0700)] 
[HUDI-3936] Fix projection for a nested field as pre-combined key (#5379)

This PR fixes the projection logic around a nested field which is used as the pre-combined key field. The fix is to only check and append the root level field for projection, i.e., "a", for a nested field "a.b.c" in the mandatory columns.

- Changes the logic to check and append the root level field for a required nested field in the mandatory columns in HoodieBaseRelation.appendMandatoryColumns
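
The root-level rule described above can be sketched in Python. This is a simplified illustration, not the Scala code of HoodieBaseRelation.appendMandatoryColumns: `root_field` and `append_mandatory_columns` are hypothetical helper names, and columns are modeled as plain strings.

```python
def root_field(column):
    """Return the root-level field of a possibly nested column, e.g. "a" for "a.b.c"."""
    return column.split(".", 1)[0]


def append_mandatory_columns(requested, mandatory):
    """Append the root of each mandatory column that is not already projected."""
    result = list(requested)
    for column in mandatory:
        root = root_field(column)
        if root not in result:
            result.append(root)
    return result
```

For example, with requested columns `["id"]` and a nested pre-combine field `"a.b.c"`, only the root `"a"` is appended to the projection.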

3 months ago[HUDI-3921] Fixed schema evolution cannot work with HUDI-3855 (#5376)
xiarixiaoyao [Thu, 21 Apr 2022 22:27:54 +0000 (06:27 +0800)] 
[HUDI-3921] Fixed schema evolution cannot work with HUDI-3855 (#5376)

- When column names are renamed (schema evolution enabled), renamed columns weren't handled correctly while copying records from the old data file with HoodieMergeHandle.

3 months ago[HUDI-3940] Fix retry count increment in lock manager (#5387)
Sagar Sumit [Thu, 21 Apr 2022 20:52:05 +0000 (02:22 +0530)] 
[HUDI-3940] Fix retry count increment in lock manager (#5387)

3 months ago[MINOR] Increase azure CI timeout to 120m (#5384)
Raymond Xu [Thu, 21 Apr 2022 11:35:44 +0000 (04:35 -0700)] 
[MINOR] Increase azure CI timeout to 120m (#5384)

3 months ago[HUDI-3935] Adding config to fallback to enabled Partition Values extraction from...
Alexey Kudinkin [Thu, 21 Apr 2022 08:36:19 +0000 (01:36 -0700)] 
[HUDI-3935] Adding config to fallback to enabled Partition Values extraction from Partition path (#5377)

3 months ago[HUDI-3938] Fix default value for num retries to acquire lock (#5380)
Sivabalan Narayanan [Thu, 21 Apr 2022 08:08:43 +0000 (04:08 -0400)] 
[HUDI-3938] Fix default value for num retries to acquire lock (#5380)

3 months ago[HUDI-3204] Fixing partition-values being derived from partition-path instead of...
Alexey Kudinkin [Wed, 20 Apr 2022 11:30:27 +0000 (04:30 -0700)] 
[HUDI-3204] Fixing partition-values being derived from partition-path instead of source columns (#5364)

 - Scaffolded `Spark24HoodieParquetFileFormat` extending `ParquetFileFormat` and overriding the behavior of adding partition columns to every row
 - Amended `SparkAdapter`s `createHoodieParquetFileFormat` API to be able to configure whether to append partition values or not
 - Fall back to appending partition values in cases where the source columns are not persisted in the data file
 - Fixing HoodieBaseRelation incorrectly handling mandatory columns

3 months ago[HUDI-3912] Fix lose data when rollback in flink async compact (#5357)
吴祥平 [Wed, 20 Apr 2022 11:23:39 +0000 (19:23 +0800)] 
[HUDI-3912] Fix lose data when rollback in flink async compact (#5357)

* Stop adding events when there is a failed compaction event

Co-authored-by: wxp <wxp4532@outlook.com>
3 months ago[HUDI-3904] Claim RFC number for Improve timeline server (#5354)
Zhaojing Yu [Wed, 20 Apr 2022 06:31:21 +0000 (14:31 +0800)] 
[HUDI-3904] Claim RFC number for Improve timeline server (#5354)

3 months ago[HUDI-3917] Flink write task hangs if last checkpoint has no data input (#5360)
Danny Chan [Wed, 20 Apr 2022 04:48:24 +0000 (12:48 +0800)] 
[HUDI-3917] Flink write task hangs if last checkpoint has no data input (#5360)

3 months ago[HUDI-3920] Fix partition path construction in metadata table validator (#5365)
Y Ethan Guo [Tue, 19 Apr 2022 23:40:09 +0000 (16:40 -0700)] 
[HUDI-3920] Fix partition path construction in metadata table validator (#5365)

3 months ago[HUDI-3905] Add S3 related setup in Kafka Connect quick start (#5356)
Y Ethan Guo [Tue, 19 Apr 2022 22:08:28 +0000 (15:08 -0700)] 
[HUDI-3905] Add S3 related setup in Kafka Connect quick start (#5356)

3 months ago[HUDI-3902] Fallback to `HadoopFsRelation` in cases non-involving Schema Evolution...
Alexey Kudinkin [Tue, 19 Apr 2022 17:40:20 +0000 (10:40 -0700)] 
[HUDI-3902] Fallback to `HadoopFsRelation` in cases non-involving Schema Evolution (#5352)

Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
3 months ago[HUDI-3894] Fix gcp bundle to include HBase dependencies and shading (#5349)
Raymond Xu [Tue, 19 Apr 2022 04:47:10 +0000 (21:47 -0700)] 
[HUDI-3894] Fix gcp bundle to include HBase dependencies and shading (#5349)

3 months ago[HUDI-3899] Drop index to delete pending index instants from timeline if applicable...
Sagar Sumit [Tue, 19 Apr 2022 02:28:46 +0000 (07:58 +0530)] 
[HUDI-3899] Drop index to delete pending index instants from timeline if applicable (#5342)

Co-authored-by: sivabalan <n.siva.b@gmail.com>
3 months ago[HUDI-3903] Fix NoClassDefFoundError with Kafka Connect bundle (#5353)
Y Ethan Guo [Tue, 19 Apr 2022 01:17:53 +0000 (18:17 -0700)] 
[HUDI-3903] Fix NoClassDefFoundError with Kafka Connect bundle (#5353)

4 months ago[HUDI-3894] Fix datahub to include HBase dependencies and shading (#5338)
Y Ethan Guo [Mon, 18 Apr 2022 23:20:50 +0000 (16:20 -0700)] 
[HUDI-3894] Fix datahub to include HBase dependencies and shading (#5338)

Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
4 months ago[HUDI-3895] Fixing file-partitioning seq for base-file only views to make sure we...
Alexey Kudinkin [Mon, 18 Apr 2022 20:06:52 +0000 (13:06 -0700)] 
[HUDI-3895] Fixing file-partitioning seq for base-file only views to make sure we bucket the files efficiently (#5337)

4 months ago[HUDI-3707] Fix target schema handling in HoodieSparkUtils while creating RDD (#5347)
Sagar Sumit [Mon, 18 Apr 2022 17:34:04 +0000 (23:04 +0530)] 
[HUDI-3707] Fix target schema handling in HoodieSparkUtils while creating RDD (#5347)

4 months ago[HUDI-3886] Adding default null for some of the fields in col stats in MDT schema...
Sivabalan Narayanan [Mon, 18 Apr 2022 14:37:03 +0000 (10:37 -0400)] 
[HUDI-3886] Adding default null for some of the fields in col stats in MDT schema (#5329)

4 months agoFixing async clustering job test in TestHoodieDeltaStreamer (#5317)
Sivabalan Narayanan [Mon, 18 Apr 2022 12:08:33 +0000 (08:08 -0400)] 
Fixing async clustering job test in TestHoodieDeltaStreamer (#5317)