Sagar Sumit [Thu, 26 May 2022 05:58:49 +0000 (11:28 +0530)]
[HUDI-4023] Decouple hudi-spark from hudi-utilities-slim-bundle (#5641)
RexAn [Thu, 26 May 2022 05:09:04 +0000 (13:09 +0800)]
[HUDI-4040] Bulk insert Support CustomColumnsSortPartitioner with Row (#5502)
* Along the lines of RDDCustomColumnsSortPartitioner but for Row
Danny Chan [Thu, 26 May 2022 03:21:39 +0000 (11:21 +0800)]
[HUDI-4145] Archives the metadata file in HoodieInstant.State sequence (part2) (#5676)
Sagar Sumit [Wed, 25 May 2022 14:08:56 +0000 (19:38 +0530)]
[HUDI-3193] Decouple hudi-aws from hudi-client-common (#5666)
Move HoodieMetricsCloudWatchConfig to hudi-client-common
冯健 [Wed, 25 May 2022 12:31:39 +0000 (20:31 +0800)]
[HUDI-4146] Claim RFC-55 for Improve Hive/Meta sync class design and hierarchies (#5682)
luoyajun [Tue, 24 May 2022 18:13:18 +0000 (02:13 +0800)]
[MINOR] Fix a potential NPE and some finer points of hudi cli (#5656)
Zhaojing Yu [Tue, 24 May 2022 16:47:28 +0000 (00:47 +0800)]
Merge pull request #3599 from yuzhaojing/HUDI-2207
[HUDI-2207] Support independent flink hudi clustering function
Sivabalan Narayanan [Tue, 24 May 2022 12:17:15 +0000 (08:17 -0400)]
[HUDI-4132] Fixing determining target table schema for delta sync with empty batch (#5648)
喻兆靖 [Sat, 21 May 2022 13:25:15 +0000 (21:25 +0800)]
[HUDI-2207] Support independent flink hudi clustering function
liujinhui [Tue, 24 May 2022 10:56:28 +0000 (18:56 +0800)]
[HUDI-4135] remove netty and netty-all (#5663)
Danny Chan [Tue, 24 May 2022 09:33:30 +0000 (17:33 +0800)]
[HUDI-4145] Archives the metadata file in HoodieInstant.State sequence (#5669)
Sivabalan Narayanan [Tue, 24 May 2022 07:33:21 +0000 (03:33 -0400)]
[HUDI-2473] Fixing compaction write operation in commit metadata (#5203)
Danny Chan [Tue, 24 May 2022 05:07:55 +0000 (13:07 +0800)]
[HUDI-4138] Fix the concurrency modification of hoodie table config for flink (#5660)
* Remove the metadata cleaning strategy for flink, which means the multi-modal index may be affected
* Improve the HoodieTable#clearMetadataTablePartitionsConfig to only update table config when necessary
* Remove the modification of read code path in HoodieTableConfig
Sivabalan Narayanan [Tue, 24 May 2022 03:05:56 +0000 (23:05 -0400)]
[HUDI-4084] Add support to test async table services with integ test suite framework (#5557)
* Add support to test async table services with integ test suite framework
* Make await time for validation configurable
Heap [Mon, 23 May 2022 22:28:48 +0000 (06:28 +0800)]
[HUDI-4134] Fix Method naming consistency issues in FSUtils (#5655)
felixYyu [Mon, 23 May 2022 22:26:36 +0000 (06:26 +0800)]
[MINOR] Removing redundant semicolons and line breaks (#5662)
Y Ethan Guo [Mon, 23 May 2022 13:48:09 +0000 (06:48 -0700)]
[HUDI-3933] Add UT cases to cover different key gen (#5638)
Sagar Sumit [Mon, 23 May 2022 12:40:07 +0000 (18:10 +0530)]
[HUDI-4142] Claim RFC-54 for new table APIs (#5665)
YuangZhang [Mon, 23 May 2022 01:57:34 +0000 (09:57 +0800)]
[HUDI-4129] Initializes a new fs view for WriteProfile#reload (#5640)
Co-authored-by: zhangyuang <zhangyuang@corp.netease.com>
Raymond Xu [Sun, 22 May 2022 07:47:51 +0000 (00:47 -0700)]
[HUDI-4051] Allow nested field as primary key and preCombineField in spark sql (#5517)
* [HUDI-4051] Allow nested field as preCombineField in spark sql
* relax validation for primary key
uday08bce [Sat, 21 May 2022 16:22:55 +0000 (18:22 +0200)]
[HUDI-3890] fix rat plugin issue with sql files (#5644)
Jin Xing [Sat, 21 May 2022 14:41:18 +0000 (22:41 +0800)]
[HUDI-4100] CTAS failed to clean up when given an illegal MANAGED table definition (#5588)
YueZhang [Sat, 21 May 2022 13:16:14 +0000 (21:16 +0800)]
[HUDI-3858] Shade javax.servlet for Spark bundle jar (#5295)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
Raymond Xu [Sat, 21 May 2022 12:34:08 +0000 (05:34 -0700)]
[MINOR] remove unused gson test dependency (#5652)
wangxianghu [Sat, 21 May 2022 03:12:53 +0000 (07:12 +0400)]
[HUDI-4122] Fix NPE caused by adding kafka nodes (#5632)
Sivabalan Narayanan [Fri, 20 May 2022 23:27:35 +0000 (19:27 -0400)]
[MINOR] Minor fixes to exception log and removing unwanted metrics flush in integ test (#5646)
huberylee [Fri, 20 May 2022 14:25:32 +0000 (22:25 +0800)]
[HUDI-3985] Refactor DLASyncTool to support read hoodie table as spark datasource table (#5532)
Danny Chan [Fri, 20 May 2022 13:31:23 +0000 (21:31 +0800)]
[HUDI-4130] Remove the upgrade/downgrade for flink #initTable (#5642)
aliceyyan [Fri, 20 May 2022 10:10:24 +0000 (18:10 +0800)]
[HUDI-4119] The first read result is incorrect when the Flink upsert-kafka connector is used in Hudi (#5626)
Co-authored-by: aliceyyan <aliceyyan@tencent.com>
Danny Chan [Thu, 19 May 2022 02:59:05 +0000 (10:59 +0800)]
[HUDI-4114] Remove the unnecessary fs view sync for BaseWriteClient#initTable (#5617)
No need to #sync actively: the table instance is freshly instantiated, so
its view manager has empty view instances, and the fs view is synced lazily when
it is requested.
huberylee [Thu, 19 May 2022 01:48:03 +0000 (09:48 +0800)]
[HUDI-4116] Unify clustering/compaction related procedures' output type (#5620)
* Unify clustering/compaction related procedures' output type
* Address review comments
Danny Chan [Wed, 18 May 2022 12:30:54 +0000 (20:30 +0800)]
Revert "[HUDI-3870] Add timeout rollback for flink online compaction (#5314)" (#5622)
This reverts commit 6f9b02decb5bb2b83709b1b6ec04a97e4d102c11.
cxzl25 [Wed, 18 May 2022 11:18:52 +0000 (19:18 +0800)]
[HUDI-4111] Bump ANTLR runtime version in Spark 3.x (#5606)
Zhaojing Yu [Wed, 18 May 2022 10:43:48 +0000 (18:43 +0800)]
[HUDI-3942] [RFC-50] Improve Timeline Server (#5392)
luokey [Wed, 18 May 2022 03:21:14 +0000 (11:21 +0800)]
Clean the marker files for flink compaction (#5611)
Co-authored-by: 854194341@qq.com <loukey_7821>
Danny Chan [Wed, 18 May 2022 02:17:00 +0000 (10:17 +0800)]
[HUDI-4109] Copy the old record directly when it is chosen for merging (#5603)
Danny Chan [Wed, 18 May 2022 01:30:09 +0000 (09:30 +0800)]
[minor] Some code refactoring for LogFileComparator and Instant instantiation (#5600)
Sivabalan Narayanan [Tue, 17 May 2022 13:58:18 +0000 (09:58 -0400)]
[MINOR] Fixing spark long running yaml for non-partitioned (#5607)
BruceLin [Tue, 17 May 2022 13:09:27 +0000 (21:09 +0800)]
[HUDI-4110] Clean the marker files for flink compaction (#5604)
Jin Xing [Tue, 17 May 2022 06:12:50 +0000 (14:12 +0800)]
[HUDI-4087] Support dropping RO and RT table in DropHoodieTableCommand (#5564)
* [HUDI-4087] Support dropping RO and RT table in DropHoodieTableCommand
* Set hoodie.query.as.ro.table in serde properties
Danny Chan [Tue, 17 May 2022 02:34:57 +0000 (10:34 +0800)]
[HUDI-4101] BucketIndexPartitioner should take partition path for better dispersion (#5590)
Danny Chan [Tue, 17 May 2022 02:34:15 +0000 (10:34 +0800)]
[HUDI-4104] DeltaWriteProfile includes the pending compaction file slice when deciding small buckets (#5594)
Shawy Geng [Tue, 17 May 2022 01:47:10 +0000 (09:47 +0800)]
[HUDI-3654] Preparations for hudi metastore. (#5572)
Co-authored-by: gengxiaoyu <gengxiaoyu@bytedance.com>
董可伦 [Mon, 16 May 2022 15:26:23 +0000 (23:26 +0800)]
[HUDI-4103] [HUDI-4001] Filter out the properties that should not be used when creating a table for Spark SQL
Danny Chan [Mon, 16 May 2022 09:40:08 +0000 (17:40 +0800)]
[HUDI-4098] Metadata table heartbeat for instant has expired, last heartbeat 0 (#5583)
Yuwei XIAO [Mon, 16 May 2022 03:07:01 +0000 (11:07 +0800)]
[HUDI-3123] consistent hashing index: basic write path (upsert/insert) (#4480)
1. basic write path(insert/upsert) implementation
2. adapt simple bucket index
陈浩 [Mon, 16 May 2022 01:51:24 +0000 (09:51 +0800)]
Fix hive sync error for non-partitioned tables (#5585)
董可伦 [Mon, 16 May 2022 01:50:29 +0000 (09:50 +0800)]
[HUDI-4001] Filter out the properties that should not be used when creating a table for Spark SQL (#5495)
xi chaomin [Sat, 14 May 2022 11:37:31 +0000 (19:37 +0800)]
[HUDI-3980] Support kerberos hbase index (#5464)
- Add configurations in HoodieHBaseIndexConfig.java to support kerberos hbase connection.
Co-authored-by: xicm <xicm@asiainfo.com>
wqwl611 [Sat, 14 May 2022 01:01:15 +0000 (09:01 +0800)]
[HUDI-4097] add table info to jobStatus (#5529)
Co-authored-by: wqwl611 <wqwl611@gmail.com>
Sivabalan Narayanan [Fri, 13 May 2022 12:26:47 +0000 (08:26 -0400)]
[HUDI-4072] Fix NULL schema for empty batches in deltastreamer (#5543)
Bo Cui [Fri, 13 May 2022 11:52:55 +0000 (19:52 +0800)]
[HUDI-3336][HUDI-FLINK]Support custom hadoop config for flink (#5574)
Bo Cui [Fri, 13 May 2022 06:32:48 +0000 (14:32 +0800)]
[HUDI-4078][HUDI-FLINK]BootstrapOperator contains the pending compact… (#5545)
* [HUDI-4078][HUDI-FLINK]BootstrapOperator contains the pending compaction files
Xingcan Cui [Fri, 13 May 2022 04:20:40 +0000 (00:20 -0400)]
[MINOR] Fix a NPE for Option (#5461)
Bo Cui [Fri, 13 May 2022 01:50:11 +0000 (09:50 +0800)]
[HUDI-3336][HUDI-FLINK]Support custom hadoop config for flink (#5528)
Sivabalan Narayanan [Fri, 13 May 2022 01:01:55 +0000 (21:01 -0400)]
[HUDI-4018][HUDI-4027] Adding integ test yamls for immutable use-cases. Added delete partition support to integ tests (#5501)
- Added pure immutable test yamls to integ test framework. Added SparkBulkInsertNode as part of it.
- Added delete_partition support to integ test framework using spark-datasource.
- Added a single yaml to test all non core write operations (insert overwrite, insert overwrite table and delete partitions)
- Added tests for 4 concurrent spark datasource writers (multi-writer tests).
- Fixed readme w/ sample commands for multi-writer.
YueZhang [Thu, 12 May 2022 11:26:00 +0000 (19:26 +0800)]
[HUDI-3963][Claim RFC number 53] Use Lock-Free Message Queue to Improve Hoodie Writing Efficiency. (#5562)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
Sivabalan Narayanan [Wed, 11 May 2022 20:02:54 +0000 (16:02 -0400)]
[HUDI-4085] Fixing flakiness with parquet empty batch tests in TestHoodieDeltaStreamer (#5559)
Jin Xing [Wed, 11 May 2022 14:28:58 +0000 (22:28 +0800)]
[HUDI-4079] Supports showing table comment for hudi with spark3 (#5546)
Alexey Kudinkin [Wed, 11 May 2022 12:08:31 +0000 (05:08 -0700)]
[HUDI-4038] Avoid calling `getDataSize` after every record written (#5497)
- getDataSize has non-trivial overhead in the current ParquetWriter impl, requiring traversal of already composed Column Groups in memory. Instead we can sample these calls to getDataSize to amortize its cost.
Co-authored-by: sivabalan <n.siva.b@gmail.com>
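The sampling idea described in this commit can be illustrated with a minimal sketch. It is not the code from the commit: the class `SampledSizeChecker`, its fields, and the fixed sampling interval are all hypothetical, and the expensive size probe is passed in as a plain `LongSupplier` rather than a real Parquet writer.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper (not Hudi's actual class) that samples an expensive size probe. */
final class SampledSizeChecker {
    private final LongSupplier dataSize;  // stands in for an expensive call such as getDataSize
    private final long maxBytes;          // size limit for the file being written
    private final long sampleInterval;    // probe once every this many records (assumed fixed)
    private long recordsSinceProbe = 0;
    private long lastKnownSize = 0;
    private long probeCount = 0;

    SampledSizeChecker(LongSupplier dataSize, long maxBytes, long sampleInterval) {
        if (sampleInterval <= 0) {
            throw new IllegalArgumentException("sampleInterval must be positive");
        }
        this.dataSize = dataSize;
        this.maxBytes = maxBytes;
        this.sampleInterval = sampleInterval;
    }

    /** Call once per record; returns false once the last sampled size reached the limit. */
    boolean canWrite() {
        if (++recordsSinceProbe >= sampleInterval) {
            // Only every sampleInterval-th call pays for the expensive probe.
            recordsSinceProbe = 0;
            lastKnownSize = dataSize.getAsLong();
            probeCount++;
        }
        return lastKnownSize < maxBytes;
    }

    /** Number of times the expensive probe was actually invoked. */
    long probeCount() {
        return probeCount;
    }
}
```

The trade-off in this sketch is that the file may overshoot the limit by up to one sampling interval of records, in exchange for far fewer probe calls.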
Lanyuanxiaoyao [Tue, 10 May 2022 22:45:53 +0000 (06:45 +0800)]
[HUDI-4003] Try to read all the log file to parse schema (#5473)
aliceyyan [Tue, 10 May 2022 02:25:13 +0000 (10:25 +0800)]
[HUDI-4044] When reading data from flink-hudi to external storage, the … (#5516)
Co-authored-by: aliceyyan <aliceyyan@tencent.com>
Sivabalan Narayanan [Mon, 9 May 2022 16:40:22 +0000 (12:40 -0400)]
[HUDI-3995] Making perf optimizations for bulk insert row writer path (#5462)
- Avoid using udf for key generator for SimpleKeyGen and NonPartitionedKeyGen.
- Fixed NonPartitioned Key generator to directly fetch record key from row rather than involving GenericRecord.
- Other minor fixes around using static values instead of looking up hashmap.
xicm [Mon, 9 May 2022 08:35:50 +0000 (16:35 +0800)]
[HUDI-4053] Flaky ITTestHoodieDataSource.testStreamWriteBatchReadOpti… (#5526)
* [HUDI-4053] Flaky ITTestHoodieDataSource.testStreamWriteBatchReadOptimized
Co-authored-by: xicm <xicm@asiainfo.com>
ForwardXu [Mon, 9 May 2022 07:17:24 +0000 (15:17 +0800)]
[MINOR] Fixing close for HoodieCatalog's test (#5531)
guanziyue [Mon, 9 May 2022 02:27:37 +0000 (10:27 +0800)]
[HUDI-4055]refactor ratelimiter to avoid stack overflow (#5530)
Sivabalan Narayanan [Sat, 7 May 2022 19:37:20 +0000 (15:37 -0400)]
[MINOR] fixing flaky tests in deltastreamer tests (#5521)
BruceLin [Sat, 7 May 2022 12:03:18 +0000 (20:03 +0800)]
[MINOR] Fixing class not found when using flink and enable metadata table (#5527)
cxzl25 [Sat, 7 May 2022 07:39:14 +0000 (15:39 +0800)]
[HUDI-3849] AvroDeserializer supports AVRO_REBASE_MODE_IN_READ configuration (#5287)
Sivabalan Narayanan [Fri, 6 May 2022 13:27:29 +0000 (09:27 -0400)]
[HUDI-3675] Adding post write termination strategy to deltastreamer continuous mode (#5073)
- Added a postWriteTerminationStrategy to deltastreamer continuous mode. One can enable it by setting the appropriate termination strategy class via DeltastreamerConfig.postWriteTerminationStrategyClass. If it is not set, continuous mode is expected to run forever.
- Added one concrete impl for termination strategy as NoNewDataTerminationStrategy which shuts down deltastreamer if there is no new data to consume from source for N consecutive rounds.
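The "N consecutive rounds with no new data" rule can be sketched as a small counter. This is an illustration only, not the NoNewDataTerminationStrategy implementation: the class `NoNewDataTracker` and its method names are hypothetical, and whether a round had new data is passed in as a boolean by the caller.

```java
/** Hypothetical sketch of a "stop after N empty rounds" rule; not Hudi's API. */
final class NoNewDataTracker {
    private final int maxEmptyRounds;      // N: empty rounds tolerated before shutdown
    private int consecutiveEmptyRounds = 0;

    NoNewDataTracker(int maxEmptyRounds) {
        if (maxEmptyRounds <= 0) {
            throw new IllegalArgumentException("maxEmptyRounds must be positive");
        }
        this.maxEmptyRounds = maxEmptyRounds;
    }

    /** Records the outcome of one sync round; returns true if the loop should stop. */
    boolean shouldShutdown(boolean roundHadNewData) {
        // Any round with data resets the streak; an empty round extends it.
        consecutiveEmptyRounds = roundHadNewData ? 0 : consecutiveEmptyRounds + 1;
        return consecutiveEmptyRounds >= maxEmptyRounds;
    }
}
```

A continuous-mode loop would call `shouldShutdown` after each write and exit when it returns true. Resetting the counter on any non-empty round is what makes the rounds "consecutive".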
Raymond Xu [Fri, 6 May 2022 12:52:06 +0000 (05:52 -0700)]
[HUDI-4017] Improve spark sql coverage in CI (#5512)
Add GitHub actions tasks to run spark sql UTs under spark 3.1 and 3.2.
Jin Xing [Fri, 6 May 2022 07:29:47 +0000 (15:29 +0800)]
[HUDI-4042] Support truncate-partition for Spark-3.2 (#5506)
guanziyue [Thu, 5 May 2022 20:49:34 +0000 (04:49 +0800)]
[HUDI-2875] Make HoodieParquetWriter thread-safe and make the memory executor exit gracefully (#4264)
qianchutao [Thu, 5 May 2022 16:33:06 +0000 (00:33 +0800)]
[MINOR] Optimize code logic (#5499)
Y Ethan Guo [Thu, 5 May 2022 06:39:18 +0000 (23:39 -0700)]
[HUDI-3667] Run unit tests of hudi-integ-tests in CI (#5078)
Sagar Sumit [Wed, 4 May 2022 14:17:11 +0000 (19:47 +0530)]
[HUDI-4031] Avoid clustering update handling when no pending replacecommit (#5487)
Raymond Xu [Wed, 4 May 2022 14:15:54 +0000 (07:15 -0700)]
[HUDI-4005] Update release scripts to help validation (#5479)
Sagar Sumit [Tue, 3 May 2022 15:57:18 +0000 (21:27 +0530)]
[MINOR] Update RFC status (#5486)
Todd Gao [Mon, 2 May 2022 16:35:23 +0000 (00:35 +0800)]
[HUDI-3211][RFC-44] Add RFC for Hudi Connector for Presto (#4563)
* Add RFC doc
Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
* Add note regarding catalog naming
Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
Raymond Xu [Sat, 30 Apr 2022 17:51:16 +0000 (10:51 -0700)]
[MINOR] Update DOAP for release 0.11.0 (#5467)
Wangyh [Sat, 30 Apr 2022 03:58:54 +0000 (11:58 +0800)]
[HUDI-3978] Fix use of partition path field as hive partition field in flink (#5434)
* Fix partition path fields as hive sync partition fields error
xicm [Fri, 29 Apr 2022 23:21:52 +0000 (07:21 +0800)]
[HUDI-3862] Fix default configurations of HoodieHBaseIndexConfig (#5308)
Co-authored-by: xicm <xicm@asiainfo.com>
Y Ethan Guo [Fri, 29 Apr 2022 18:19:07 +0000 (11:19 -0700)]
[MINOR] Fix CI by ignoring SparkContext error (#5468)
Sets spark.driver.allowMultipleContexts = true when constructing Spark conf in UtilHelpers
吴祥平 [Fri, 29 Apr 2022 06:10:20 +0000 (14:10 +0800)]
[HUDI-3758] Fix duplicate fileId error in MOR table type with flink bucket hash Index (#5185)
* fix duplicate fileId with bucket Index
* replace to load FileGroup from FileSystemView
Gary Li [Fri, 29 Apr 2022 01:48:44 +0000 (09:48 +0800)]
[MINOR] support different cleaning policy for flink (#5459)
LiChuang [Thu, 28 Apr 2022 22:18:56 +0000 (06:18 +0800)]
[HUDI-3943] Some description fixes for 0.10.1 docs (#5447)
Ibson [Wed, 27 Apr 2022 23:09:44 +0000 (07:09 +0800)]
[HUDI-3815] Fix docs description of metadata.compaction.delta_commits default value error (#5368)
Co-authored-by: pusheng.li01 <pusheng.li01@liulishuo.com>
watermelon12138 [Wed, 27 Apr 2022 13:16:09 +0000 (21:16 +0800)]
[HUDI-3945] After the async compaction operation is complete, the task should exit. (#5391)
Co-authored-by: y00617041 <yangxuan42@huawei.com>
huberylee [Wed, 27 Apr 2022 06:07:29 +0000 (14:07 +0800)]
Claim RFC 52 for Introduce Secondary Index to Improve HUDI Query Performance (#5441)
Danny Chan [Wed, 27 Apr 2022 05:19:55 +0000 (13:19 +0800)]
[HUDI-3977] Flink hudi table with date type partition path throws HoodieNotSupportedException (#5432)
KnightChess [Wed, 27 Apr 2022 02:39:51 +0000 (10:39 +0800)]
[MINOR] Update alter rename command class type for pattern matching (#5381)
Yann Byron [Tue, 26 Apr 2022 15:26:47 +0000 (23:26 +0800)]
[HUDI-3478] Claim RFC 51 For CDC (#5437)
Sivabalan Narayanan [Tue, 26 Apr 2022 03:03:10 +0000 (23:03 -0400)]
[HUDI-3972] Fixing hoodie.properties/tableConfig for no preCombine field with writes (#5424)
Fixed instantiation of a new table to set preCombine to null if it is not explicitly set by the user.
Yuwei XIAO [Mon, 25 Apr 2022 10:42:17 +0000 (18:42 +0800)]
[HUDI-3085] Improve bulk insert partitioner abstraction (#4441)
ForwardXu [Mon, 25 Apr 2022 04:58:27 +0000 (12:58 +0800)]
Revert "[HUDI-3951]support general parameter 'sink.parallelism' for flink-hudi (#5405)" (#5421)
This reverts commit bda3db078e927421c10932cfcb3019cfddb125b6.
Ruguo Yu [Mon, 25 Apr 2022 02:13:47 +0000 (10:13 +0800)]
[HUDI-3946] Validate option path in flink hudi sink (#5397)
hehuiyuan [Sun, 24 Apr 2022 11:09:39 +0000 (19:09 +0800)]
support general parameter 'sink.parallelism' for flink-hudi (#5405)
Co-authored-by: hehuiyuan1 <hehuiyuan@jd.com>
miomiocat [Sat, 23 Apr 2022 12:12:54 +0000 (20:12 +0800)]
[HUDI-3923] Fix cast exception while reading boolean type of partitioned field (#5373)
Y Ethan Guo [Sat, 23 Apr 2022 08:33:55 +0000 (01:33 -0700)]
[HUDI-3948] Fix presto bundle missing HBase classes (#5398)
Raymond Xu [Sat, 23 Apr 2022 03:59:49 +0000 (20:59 -0700)]
[HUDI-3950] add parquet-avro to gcp-bundle (#5399)