hudi.git
6 months ago[HUDI-3721] Delete MDT if necessary when trigger rollback to savepoint (#5173)
YueZhang [Thu, 31 Mar 2022 03:26:37 +0000 (11:26 +0800)] 
[HUDI-3721] Delete MDT if necessary when trigger rollback to savepoint (#5173)

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
6 months ago[HUDI-3750] Fix NPE when build HoodieFileIndex (#5134)
KnightChess [Thu, 31 Mar 2022 02:19:05 +0000 (10:19 +0800)] 
[HUDI-3750] Fix NPE when build HoodieFileIndex (#5134)

Co-authored-by: wulingqi <wulingqi@baijiahulian.com>
6 months ago[MINOR] Fixing flakiness in TestHoodieSparkMergeOnReadTableRollback.testRollbackWithD...
Sivabalan Narayanan [Thu, 31 Mar 2022 02:07:22 +0000 (19:07 -0700)] 
[MINOR] Fixing flakiness in TestHoodieSparkMergeOnReadTableRollback.testRollbackWithDeltaAndCompactionCommit (#5183)

6 months ago[HUDI-3700] Add hudi-utilities-slim-bundle excluding hudi-spark-datasource modules...
Y Ethan Guo [Thu, 31 Mar 2022 01:08:35 +0000 (18:08 -0700)] 
[HUDI-3700] Add hudi-utilities-slim-bundle excluding hudi-spark-datasource modules (#5176)

6 months ago[HUDI-3681] Provision additional hudi-spark-bundle with different versions (#5171)
Y Ethan Guo [Thu, 31 Mar 2022 00:35:56 +0000 (17:35 -0700)] 
[HUDI-3681] Provision additional hudi-spark-bundle with different versions (#5171)

6 months ago[HUDI-3355] Issue with out of order commits in the timeline when ingestion writers...
xiarixiaoyao [Wed, 30 Mar 2022 22:54:25 +0000 (06:54 +0800)] 
[HUDI-3355] Issue with out of order commits in the timeline when ingestion writers using SparkAllowUpdateStrategy (#4962)

6 months ago[HUDI-3736] Fix null pointer when key not specified (#5167)
Nicolas Paris [Wed, 30 Mar 2022 22:11:26 +0000 (00:11 +0200)] 
[HUDI-3736] Fix null pointer when key not specified (#5167)

6 months ago[HUDI-3536] Add hudi-datahub-sync implementation (#5155)
Raymond Xu [Wed, 30 Mar 2022 21:38:02 +0000 (14:38 -0700)] 
[HUDI-3536] Add hudi-datahub-sync implementation (#5155)

6 months ago[MINOR] Repeated execution of update status (#5089)
Bo Cui [Wed, 30 Mar 2022 21:30:06 +0000 (05:30 +0800)] 
[MINOR] Repeated execution of update status (#5089)

6 months ago[HUDI-3635] Fix HoodieMetadataTableValidator around comparison of partition path...
YueZhang [Wed, 30 Mar 2022 21:23:37 +0000 (05:23 +0800)] 
[HUDI-3635] Fix HoodieMetadataTableValidator around comparison of partition path listing (#5100)

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
6 months ago[HUDI-3647] HoodieMetadataTableValidator: check MDT was initialized at first (#5152)
YueZhang [Wed, 30 Mar 2022 21:18:08 +0000 (05:18 +0800)] 
[HUDI-3647] HoodieMetadataTableValidator: check MDT was initialized at first (#5152)

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
6 months ago[HUDI-3653] Cleaning up bespoke Column Stats Index implementation (#5062)
Alexey Kudinkin [Wed, 30 Mar 2022 17:01:43 +0000 (10:01 -0700)] 
[HUDI-3653] Cleaning up bespoke Column Stats Index implementation (#5062)

6 months ago[MINOR] Fix dates as per UTC in TestDataSkippingUtils (#5166)
Sagar Sumit [Wed, 30 Mar 2022 14:33:14 +0000 (20:03 +0530)] 
[MINOR] Fix dates as per UTC in TestDataSkippingUtils (#5166)

* Fix timezone in test

6 months ago[minor] Follow 3178, fix the flink metadata table compaction (#5175)
Danny Chan [Wed, 30 Mar 2022 12:45:29 +0000 (20:45 +0800)] 
[minor] Follow 3178, fix the flink metadata table compaction (#5175)

6 months ago[HUDI-3745] Support for spark datasource options in S3EventsHoodieIncrSource (#5170)
harshal [Wed, 30 Mar 2022 05:34:49 +0000 (11:04 +0530)] 
[HUDI-3745] Support for spark datasource options in S3EventsHoodieIncrSource (#5170)

6 months ago[HUDI-3485] Adding scheduler pool configs for async clustering (#5043)
Sivabalan Narayanan [Wed, 30 Mar 2022 01:27:45 +0000 (18:27 -0700)] 
[HUDI-3485] Adding scheduler pool configs for async clustering (#5043)

6 months ago[HUDI-3741] Fix flink bucket index bulk insert generates too many small files (#5164)
Danny Chan [Wed, 30 Mar 2022 00:18:36 +0000 (08:18 +0800)] 
[HUDI-3741] Fix flink bucket index bulk insert generates too many small files (#5164)

6 months ago[HUDI-2520] Fix CTAS statment issue when sync to hive (#5145)
ForwardXu [Tue, 29 Mar 2022 19:25:31 +0000 (03:25 +0800)] 
[HUDI-2520] Fix CTAS statment issue when sync to hive (#5145)

6 months ago[HUDI-3549] Removing dependency on "spark-avro" (#4955)
Alexey Kudinkin [Tue, 29 Mar 2022 18:44:47 +0000 (11:44 -0700)] 
[HUDI-3549] Removing dependency on "spark-avro"  (#4955)

Hudi commits to keeping its bundles compatible across Spark minor versions (e.g. 2.4, 3.1, 3.2): a single build of Hudi (e.g. "hudi-spark3.2-bundle") will be compatible with ALL patch versions in that minor branch (in this case 3.2.1, 3.2.0, etc.).

To achieve that, we have to remove (and ban) "spark-avro" as a dependency, which on a few occasions was the root cause of incompatibility between consecutive Spark patch versions (most recently 3.2.1 and 3.2.0, due to this PR).

Instead of bundling "spark-avro" as a dependency, we copy over the classes Hudi depends on and maintain them alongside the Hudi code base, so that we can provide the aforementioned guarantee. To work around compatibility issues as they arise, we apply local patches that keep Hudi bundles compatible within each Spark minor version branch.

The following mapping of Hudi modules to Spark minor branches is currently maintained:

"hudi-spark3" -> 3.2.x
"hudi-spark3.1.x" -> 3.1.x
"hudi-spark2" -> 2.4.x

The following class hierarchies (borrowed from "spark-avro") are maintained within these Spark-specific modules to guarantee compatibility with the respective minor version branches:

AvroSerializer
AvroDeserializer
AvroUtils

Each of these classes has been copied from Spark 3.2.1 (for the 3.2.x branch), 3.1.2 (for the 3.1.x branch), and 2.4.4 (for the 2.4.x branch) into its respective module.

The SchemaConverters class, in turn, is shared across all those modules given its relative stability (there are only cosmetic changes from 2.4.4 to 3.2.1).
All of the aforementioned classes have their visibility limited to the corresponding packages (org.apache.spark.sql.avro, org.apache.spark.sql), so that the broader code base does not become dependent on them and instead relies on facades abstracting them.

Additionally, given that Hudi plans to support all patch versions of Spark within the aforementioned minor version branches, additional build steps were added to validate that Hudi compiles against those versions. Testing, however, is performed against the most recent patch versions of Spark with the help of Azure CI.

Brief change log:
- Removing spark-avro bundling from Hudi by default
- Scaffolded Spark 3.2.x hierarchy
- Bootstrapped Spark 3.1.x Avro serializer/deserializer hierarchy
- Bootstrapped Spark 2.4.x Avro serializer/deserializer hierarchy
- Moved ExpressionCodeGen,ExpressionPayload into hudi-spark module
- Fixed AvroDeserializer to stay compatible w/ both Spark 3.2.1 and 3.2.0
- Modified bot.yml to build the full matrix of supported Spark versions
- Removed "spark-avro" dependency from all modules
- Fixed relocation of spark-avro classes in bundles to assist in running integ-tests.
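
The module-to-branch mapping described above can be sketched as a small lookup. This is only an illustration: the class `SparkModuleResolver` and its method `moduleFor` are hypothetical names that do not exist in Hudi, and only the three module names and branches are taken from the commit message.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, NOT part of Hudi: resolves a Spark version string to
// the Hudi module that the commit message lists for its minor branch.
class SparkModuleResolver {
    private static final Map<String, String> MODULE_BY_MINOR = new LinkedHashMap<>();

    static {
        // Mapping copied from the commit message.
        MODULE_BY_MINOR.put("3.2", "hudi-spark3");
        MODULE_BY_MINOR.put("3.1", "hudi-spark3.1.x");
        MODULE_BY_MINOR.put("2.4", "hudi-spark2");
    }

    // Every patch version of a minor branch resolves to the same module,
    // which is the compatibility promise the commit describes.
    static String moduleFor(String sparkVersion) {
        String[] parts = sparkVersion.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Expected major.minor[.patch]: " + sparkVersion);
        }
        String module = MODULE_BY_MINOR.get(parts[0] + "." + parts[1]);
        if (module == null) {
            throw new IllegalArgumentException("Unsupported Spark version: " + sparkVersion);
        }
        return module;
    }

    public static void main(String[] args) {
        System.out.println(moduleFor("3.2.1"));
        System.out.println(moduleFor("3.2.0"));
    }
}
```

Keying the lookup on the minor version alone reflects the stated goal that one build serves every patch release in a branch.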

6 months ago[HUDI-2520] Fix drop partition issue when sync to hive (#5147)
ForwardXu [Tue, 29 Mar 2022 18:28:19 +0000 (02:28 +0800)] 
[HUDI-2520] Fix drop partition issue when sync to hive (#5147)

6 months ago[HUDI-3731] Fixing Column Stats Index record Merging sequence missing `columnName...
Alexey Kudinkin [Tue, 29 Mar 2022 15:39:56 +0000 (08:39 -0700)] 
[HUDI-3731] Fixing Column Stats Index record Merging sequence missing `columnName` (#5159)

* Added `DataSkippingFailureMode` to control how DS handles failures in the flow: either "strict", where an exception is thrown, or "fallback", where it just falls back to the full scan

* Make sure tests execute in `DataSkippingFailureMode.Strict`

* Fixed Column Stats Index record merging sequence missing `columnName`
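
The difference between the "strict" and "fallback" modes mentioned above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `DataSkippingSketch`, `FailureMode` and `filesToScan` are hypothetical names and do not reflect the actual `DataSkippingFailureMode` implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch, NOT the Hudi implementation of DataSkippingFailureMode.
class DataSkippingSketch {
    enum FailureMode { STRICT, FALLBACK }

    // indexLookup stands in for index-based file pruning and may throw.
    // STRICT propagates the failure; FALLBACK scans every file instead.
    static List<String> filesToScan(FailureMode mode,
                                    Supplier<List<String>> indexLookup,
                                    List<String> allFiles) {
        try {
            return indexLookup.get();
        } catch (RuntimeException e) {
            if (mode == FailureMode.STRICT) {
                throw e;
            }
            return allFiles; // full scan
        }
    }

    public static void main(String[] args) {
        List<String> allFiles = Arrays.asList("f1.parquet", "f2.parquet");
        List<String> result = filesToScan(
            FailureMode.FALLBACK,
            () -> { throw new IllegalStateException("index unavailable"); },
            allFiles);
        System.out.println(result);
    }
}
```

Running tests in the strict mode, as the commit does, means an index failure surfaces as a test failure instead of being hidden by a silent full scan.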

6 months ago[MINOR] Move Experiemental to javadoc (#5161)
Raymond Xu [Tue, 29 Mar 2022 04:07:59 +0000 (21:07 -0700)] 
[MINOR] Move Experiemental to javadoc (#5161)

6 months ago[HUDI-3736] Fix default dynamodblock url default value (#4967)
Nicolas Paris [Tue, 29 Mar 2022 03:31:46 +0000 (05:31 +0200)] 
[HUDI-3736] Fix default dynamodblock url default value (#4967)

6 months ago[HUDI-2520] Fix drop table issue when sync to Hive (#5143)
leesf [Tue, 29 Mar 2022 02:34:12 +0000 (10:34 +0800)] 
[HUDI-2520] Fix drop table issue when sync to Hive (#5143)

6 months ago[HUDI-3728] Set the sort operator parallelism for flink bucket bulk insert (#5154)
Danny Chan [Tue, 29 Mar 2022 01:52:35 +0000 (09:52 +0800)] 
[HUDI-3728] Set the sort operator parallelism for flink bucket bulk insert (#5154)

6 months ago[HUDI-3722] Fix truncate hudi table's error (#5140)
ForwardXu [Tue, 29 Mar 2022 01:44:18 +0000 (09:44 +0800)] 
[HUDI-3722] Fix truncate hudi table's error (#5140)

6 months ago[HUDI-2566] Adding multi-writer test support to integ test (#5065)
Sivabalan Narayanan [Mon, 28 Mar 2022 21:05:00 +0000 (14:05 -0700)] 
[HUDI-2566] Adding multi-writer test support to integ test (#5065)

6 months ago[HUDI-2757] Implement Hudi AWS Glue sync (#5076)
Raymond Xu [Mon, 28 Mar 2022 18:54:59 +0000 (11:54 -0700)] 
[HUDI-2757] Implement Hudi AWS Glue sync (#5076)

6 months ago[HUDI-3720] Fix the logic of reattempting pending rollback (#5148)
Y Ethan Guo [Mon, 28 Mar 2022 18:54:31 +0000 (11:54 -0700)] 
[HUDI-3720] Fix the logic of reattempting pending rollback (#5148)

6 months ago[HUDI-3539] Flink bucket index bucketID bootstrap optimization. (#5093)
Shawy Geng [Mon, 28 Mar 2022 11:50:36 +0000 (19:50 +0800)] 
[HUDI-3539] Flink bucket index bucketID bootstrap optimization. (#5093)

* [HUDI-3539] Flink bucket index bucketID bootstrap optimization.

Co-authored-by: gengxiaoyu <gengxiaoyu@bytedance.com>
6 months ago[HUDI-3538] Support Compaction Command Based on Call Procedure Command for Spark...
huberylee [Mon, 28 Mar 2022 06:11:35 +0000 (14:11 +0800)] 
[HUDI-3538] Support Compaction Command Based on Call Procedure Command for Spark SQL (#4945)

* Support Compaction Command Based on Call Procedure Command for Spark SQL

* Addressed review comments

6 months ago[MINOR] Fix call command parser use spark3.2 (#5144)
ForwardXu [Mon, 28 Mar 2022 03:13:44 +0000 (11:13 +0800)] 
[MINOR] Fix call command parser use spark3.2 (#5144)

6 months ago[HUDI-3724] Fixing closure of ParquetReader (#5141)
Sivabalan Narayanan [Mon, 28 Mar 2022 01:36:15 +0000 (18:36 -0700)] 
[HUDI-3724] Fixing closure of ParquetReader (#5141)

6 months ago[HUDI-3719] High performance costs of AvroSerizlizer in DataSource wr… (#5137)
xiarixiaoyao [Sun, 27 Mar 2022 18:01:43 +0000 (02:01 +0800)] 
[HUDI-3719] High performance costs of AvroSerizlizer in DataSource wr… (#5137)

* [HUDI-3719] High performance costs of AvroSerializer in DataSource writing

* Add benchmark framework, modified from Spark's
Add avroSerDerBenchmark

6 months ago[MINOR] Relaxing cleaner and archival configs (#5142)
Sivabalan Narayanan [Sun, 27 Mar 2022 16:26:24 +0000 (09:26 -0700)] 
[MINOR] Relaxing cleaner and archival configs (#5142)

6 months ago[HUDI-3604] Adjust the order of timeline changes in rollbacks (#5114)
Y Ethan Guo [Sun, 27 Mar 2022 05:37:44 +0000 (22:37 -0700)] 
[HUDI-3604] Adjust the order of timeline changes in rollbacks (#5114)

6 months ago[HUDI-3716] OOM occurred when use bulk_insert cow table with flink BUCKET index ...
Danny Chan [Sun, 27 Mar 2022 01:13:58 +0000 (09:13 +0800)] 
[HUDI-3716] OOM occurred when use bulk_insert cow table with flink BUCKET index (#5135)

6 months ago[HUDI-3709] Fixing `ParquetWriter` impls not respecting Parquet Max File Size limit...
Alexey Kudinkin [Sat, 26 Mar 2022 21:51:36 +0000 (14:51 -0700)] 
[HUDI-3709] Fixing `ParquetWriter` impls not respecting Parquet Max File Size limit (#5129)

6 months ago[HUDI-3612] Clustering strategy should create new TypedProperties when modifying...
RexAn [Sat, 26 Mar 2022 10:46:03 +0000 (18:46 +0800)] 
[HUDI-3612] Clustering strategy should create new TypedProperties when modifying it (#5027)

6 months ago[HUDI-3435] Do not throw exception when instant to rollback does not exist in metadat...
Danny Chan [Sat, 26 Mar 2022 03:42:54 +0000 (11:42 +0800)] 
[HUDI-3435] Do not throw exception when instant to rollback does not exist in metadata table active timeline (#4821)

6 months ago[HUDI-3396] Refactoring `MergeOnReadRDD` to avoid duplication, fetch only projected...
Alexey Kudinkin [Fri, 25 Mar 2022 16:32:03 +0000 (09:32 -0700)] 
[HUDI-3396] Refactoring `MergeOnReadRDD` to avoid duplication, fetch only projected columns (#4888)

6 months ago[MINOR] fix QuickstartUtils move (#5133)
ForwardXu [Fri, 25 Mar 2022 14:34:35 +0000 (22:34 +0800)] 
[MINOR] fix QuickstartUtils move (#5133)

6 months ago[HUDI-3563] Make quickstart examples covered by CI tests (#5082)
ForwardXu [Fri, 25 Mar 2022 08:37:17 +0000 (16:37 +0800)] 
[HUDI-3563] Make quickstart examples covered by CI tests (#5082)

6 months ago[HUDI-3711] Fix typo in MaxwellJsonKafkaSourcePostProcessor.Config#PRECOMBINE_FIELD_T...
wangxianghu [Fri, 25 Mar 2022 07:02:54 +0000 (11:02 +0400)] 
[HUDI-3711] Fix typo in MaxwellJsonKafkaSourcePostProcessor.Config#PRECOMBINE_FIELD_TYPE_PROP (#5096)

6 months ago[HUDI-3594] Supporting Composite Expressions over Data Table Columns in Data Skipping...
Alexey Kudinkin [Fri, 25 Mar 2022 05:27:15 +0000 (22:27 -0700)] 
[HUDI-3594] Supporting Composite Expressions over Data Table Columns in Data Skipping flow (#4996)

6 months ago[HUDI-3678] Fix record rewrite of create handle when 'preserveMetadata' is true ...
Danny Chan [Fri, 25 Mar 2022 03:48:50 +0000 (11:48 +0800)] 
[HUDI-3678] Fix record rewrite of create handle when 'preserveMetadata' is true (#5088)

6 months ago[HUDI-3580] Claim RFC number 48 for LogCompaction action RFC (#5128)
Surya Prasanna [Fri, 25 Mar 2022 03:26:04 +0000 (20:26 -0700)] 
[HUDI-3580] Claim RFC number 48 for LogCompaction action RFC (#5128)

6 months ago[HUDI-3703] Reset taskID in restoreWriteMetadata (#5122)
Zhaojing Yu [Fri, 25 Mar 2022 02:18:28 +0000 (10:18 +0800)] 
[HUDI-3703] Reset taskID in restoreWriteMetadata (#5122)

6 months ago[HUDI-1180] Upgrade HBase to 2.4.9 (#5004)
Y Ethan Guo [Fri, 25 Mar 2022 02:04:53 +0000 (19:04 -0700)] 
[HUDI-1180] Upgrade HBase to 2.4.9 (#5004)

Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
6 months ago[HUDI-3701] Flink bulk_insert support bucket hash index (#5118)
Danny Chan [Fri, 25 Mar 2022 01:01:42 +0000 (09:01 +0800)] 
[HUDI-3701] Flink bulk_insert support bucket hash index (#5118)

6 months ago[HUDI-3638] Make ZookeeperBasedLockProvider serializable (#5112)
Y Ethan Guo [Fri, 25 Mar 2022 00:59:47 +0000 (17:59 -0700)] 
[HUDI-3638] Make ZookeeperBasedLockProvider serializable (#5112)

6 months ago[HUDI-3624] Check all instants before starting a commit in metadata table (#5098)
Y Ethan Guo [Fri, 25 Mar 2022 00:13:58 +0000 (17:13 -0700)] 
[HUDI-3624] Check all instants before starting a commit in metadata table (#5098)

6 months ago[HUDI-3689] Disable flaky tests in TestHoodieDeltaStreamer (#5127)
Y Ethan Guo [Thu, 24 Mar 2022 23:42:44 +0000 (16:42 -0700)] 
[HUDI-3689] Disable flaky tests in TestHoodieDeltaStreamer (#5127)

6 months ago[HUDI-3689] Fix delta streamer tests (#5124)
Raymond Xu [Thu, 24 Mar 2022 21:19:53 +0000 (14:19 -0700)] 
[HUDI-3689] Fix delta streamer tests (#5124)

6 months ago[HUDI-3706] Downgrade maven surefire and failsafe version (#5123)
Y Ethan Guo [Thu, 24 Mar 2022 16:31:46 +0000 (09:31 -0700)] 
[HUDI-3706] Downgrade maven surefire and failsafe version (#5123)

6 months ago[HUDI-3689] Fix UT failures in TestHoodieDeltaStreamer (#5120)
Raymond Xu [Thu, 24 Mar 2022 16:10:33 +0000 (09:10 -0700)] 
[HUDI-3689] Fix UT failures in TestHoodieDeltaStreamer (#5120)

6 months ago[HUDI-3689] Remove Azure CI cache (#5121)
Raymond Xu [Thu, 24 Mar 2022 12:39:11 +0000 (05:39 -0700)] 
[HUDI-3689] Remove Azure CI cache (#5121)

6 months ago[HUDI-3684] Fixing NPE in `ParquetUtils` (#5102)
Alexey Kudinkin [Thu, 24 Mar 2022 12:07:38 +0000 (05:07 -0700)] 
[HUDI-3684] Fixing NPE in `ParquetUtils` (#5102)

* Make sure nulls are properly handled in `HoodieColumnRangeMetadata`

6 months ago[HUDI-3689] Fix glob path and hive sync in deltastreamer tests (#5117)
Sagar Sumit [Thu, 24 Mar 2022 10:18:35 +0000 (15:48 +0530)] 
[HUDI-3689] Fix glob path and hive sync in deltastreamer tests (#5117)

* Remove glob pattern basePath from the deltastreamer tests.

* [HUDI-3689] Fix file scheme config for CI failure in TestHoodieRealTimeRecordReader

Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
6 months ago[minor] Checks the data block type for archived timeline (#5106)
Danny Chan [Thu, 24 Mar 2022 06:10:43 +0000 (14:10 +0800)] 
[minor] Checks the data block type for archived timeline (#5106)

6 months agoFixing non partitioned all files record in MDT (#5108)
Sivabalan Narayanan [Thu, 24 Mar 2022 02:26:39 +0000 (19:26 -0700)] 
Fixing non partitioned all files record in MDT (#5108)

6 months ago[HUDI-3642] Handle NPE due to empty requested replacecommit metadata (#5090)
Sagar Sumit [Wed, 23 Mar 2022 19:13:02 +0000 (00:43 +0530)] 
[HUDI-3642] Handle NPE due to empty requested replacecommit metadata (#5090)

6 months ago[HUDI-2883] Refactor hive sync tool / config to use reflection and standardize config...
Rajesh Mahindra [Tue, 22 Mar 2022 02:56:31 +0000 (19:56 -0700)] 
[HUDI-2883] Refactor hive sync tool / config to use reflection and standardize configs (#4175)

- Refactor hive sync tool / config to use reflection and standardize configs

Co-authored-by: sivabalan <n.siva.b@gmail.com>
Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local>
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
6 months ago[HUDI-3640] Set SimpleKeyGenerator as default in 2to3 table upgrade for Spark engine...
Y Ethan Guo [Tue, 22 Mar 2022 00:35:06 +0000 (17:35 -0700)] 
[HUDI-3640] Set SimpleKeyGenerator as default in 2to3 table upgrade for Spark engine (#5075)

6 months ago[HUDI-1436]: Provide an option to trigger clean every nth commit (#4385)
Pratyaksh Sharma [Tue, 22 Mar 2022 00:06:30 +0000 (05:36 +0530)] 
[HUDI-1436]: Provide an option to trigger clean every nth commit (#4385)

- Provided option to trigger clean every nth commit with default number of commits as 1 so that existing users are not affected.
Co-authored-by: sivabalan <n.siva.b@gmail.com>
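
The "clean every nth commit" option in HUDI-1436 can be sketched as a simple threshold check. This is a minimal illustration under assumptions: `CleanTriggerSketch`, `shouldClean` and `DEFAULT_MAX_COMMITS` are hypothetical names, and only the default of 1 comes from the commit message.

```java
// Hypothetical sketch, NOT Hudi code: decides whether a clean should run.
class CleanTriggerSketch {
    // Default taken from the commit message: cleaning after every commit,
    // so existing users see no change in behavior.
    static final int DEFAULT_MAX_COMMITS = 1;

    static boolean shouldClean(int commitsSinceLastClean, int maxCommits) {
        if (maxCommits < 1) {
            throw new IllegalArgumentException("maxCommits must be >= 1");
        }
        return commitsSinceLastClean >= maxCommits;
    }

    public static void main(String[] args) {
        System.out.println(shouldClean(1, DEFAULT_MAX_COMMITS));
        System.out.println(shouldClean(2, 5));
    }
}
```

With the default of 1, a single commit already satisfies the threshold, which matches the statement that existing users are not affected.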
6 months ago[HUDI-3559] Flink bucket index with COW table throws NoSuchElementException
wxp4532 [Fri, 11 Mar 2022 06:07:52 +0000 (14:07 +0800)] 
[HUDI-3559] Flink bucket index with COW table throws NoSuchElementException

The method FlinkWriteHelper#deduplicateRecords does not guarantee the record sequence, but there is an
implicit constraint: all the records in one bucket should have the same bucket type (instant time here).
The BucketStreamWriteFunction breaks this rule and fails to comply with the constraint.

close apache/hudi#5018

6 months ago[MINOR] Fixing sparkUpdateNode for record generation (#5079)
Sivabalan Narayanan [Mon, 21 Mar 2022 04:56:30 +0000 (21:56 -0700)] 
[MINOR] Fixing sparkUpdateNode for record generation (#5079)

6 months ago[HUDI-3665] Support flink multiple versions (#5072)
Danny Chan [Mon, 21 Mar 2022 02:34:50 +0000 (10:34 +0800)] 
[HUDI-3665] Support flink multiple versions (#5072)

6 months ago[MINOR] Remove flaky assert in TestInLineFileSystem (#5069)
Y Ethan Guo [Sun, 20 Mar 2022 22:58:30 +0000 (15:58 -0700)] 
[MINOR] Remove flaky assert in TestInLineFileSystem (#5069)

6 months ago[HUDI-3663] Fixing Column Stats index to properly handle first Data Table commit...
Alexey Kudinkin [Sun, 20 Mar 2022 04:54:13 +0000 (21:54 -0700)] 
[HUDI-3663] Fixing Column Stats index to properly handle first Data Table commit (#5070)

* Fixed metadata conversion util to extract schema from `HoodieCommitMetadata`

* Fixed failure to fetch columns to index in empty table

* Abort indexing seq in case there are no columns to index

* Fall back to indexing at least the primary key columns, in case no writer schema could be obtained to index all columns

* Fixed `getRecordFields` incorrectly ignoring default value

* Make sure Hudi metadata fields are also indexed

6 months ago[HUDI-3457] Refactored Spark DataSource Relations to avoid code duplication (#4877)
Alexey Kudinkin [Sat, 19 Mar 2022 05:32:16 +0000 (22:32 -0700)] 
[HUDI-3457] Refactored Spark DataSource Relations to avoid code duplication (#4877)

Refactoring Spark DataSource Relations to avoid code duplication.

Following Relations were in scope:

- BaseFileOnlyViewRelation
- MergeOnReadSnapshotRelation
- MergeOnReadIncrementalRelation

6 months ago[HUDI-3659] Reducing the validation frequency with integ tests (#5067)
Sivabalan Narayanan [Fri, 18 Mar 2022 16:45:33 +0000 (09:45 -0700)] 
[HUDI-3659] Reducing the validation frequency with integ tests (#5067)

6 months ago[HUDI-3656] Adding medium sized dataset for clustering and minor fixes to integ tests...
Sivabalan Narayanan [Fri, 18 Mar 2022 16:44:56 +0000 (09:44 -0700)] 
[HUDI-3656] Adding medium sized dataset for clustering and minor fixes to integ tests (#5063)

6 months ago[HUDI-3598] Row Data to Hoodie Record Operator parallelism needs to always be consist...
JerryYue-M [Fri, 18 Mar 2022 02:47:29 +0000 (10:47 +0800)] 
[HUDI-3598] Row Data to Hoodie Record Operator parallelism needs to always be consistent with input operator (#5049)

for chaining purpose

Co-authored-by: jerryyue <jerryyue@didiglobal.com>
6 months ago[MINOR] HoodieFileScanRDD could print null path (#5056)
RexAn [Thu, 17 Mar 2022 19:53:45 +0000 (03:53 +0800)] 
[MINOR] HoodieFileScanRDD could print null path (#5056)

Co-authored-by: Rex An <bonean131@gmail.com>
6 months ago[HUDI-2439] Replace RDD with HoodieData in HoodieSparkTable and commit executors...
Raymond Xu [Thu, 17 Mar 2022 11:17:56 +0000 (19:17 +0800)] 
[HUDI-2439] Replace RDD with HoodieData in HoodieSparkTable and commit executors (#4856)

- Adopt HoodieData in Spark action commit executors
- Make Spark independent DeleteHelper, WriteHelper, MergeHelper in hudi-client-common
- Make HoodieTable in WriteClient APIs have raw type to decouple with Client's generic types

6 months ago[HUDI-3645] Fix NPE caused by multiple threads accessing non-thread-safe HashMap...
冯健 [Thu, 17 Mar 2022 08:50:28 +0000 (16:50 +0800)] 
[HUDI-3645] Fix NPE caused by multiple threads accessing non-thread-safe HashMap (#5028)

- Change HashMap in HoodieROTablePathFilter to ConcurrentHashMap
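
The kind of change described above, swapping a shared `HashMap` for a `ConcurrentHashMap`, can be sketched as follows. This is not the code of `HoodieROTablePathFilter`: the class `PathFilterCacheSketch`, its methods, and the acceptance rule are hypothetical and serve only to show a cache that several threads can populate safely.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, NOT the HoodieROTablePathFilter implementation.
class PathFilterCacheSketch {
    // A plain HashMap here could be corrupted by concurrent writers;
    // ConcurrentHashMap makes each lookup-or-insert safe across threads.
    private final Map<String, Boolean> acceptedByFolder = new ConcurrentHashMap<>();

    boolean accept(String folder) {
        // Illustrative rule only: reject folders whose name starts with a dot.
        return acceptedByFolder.computeIfAbsent(folder, f -> !f.startsWith("."));
    }

    int cachedFolders() {
        return acceptedByFolder.size();
    }

    public static void main(String[] args) throws InterruptedException {
        PathFilterCacheSketch filter = new PathFilterCacheSketch();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int t = 0; t < 4; t++) {
            pool.submit(() -> {
                for (int i = 0; i < 1000; i++) {
                    filter.accept("partition=" + (i % 10));
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(filter.cachedFolders());
    }
}
```

`computeIfAbsent` is used so that the check and the insert happen as one atomic step per key.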

6 months ago[HUDI-3494] Consider triggering condition of MOR compaction during archival (#4974)
Y Ethan Guo [Thu, 17 Mar 2022 05:28:11 +0000 (22:28 -0700)] 
[HUDI-3494] Consider triggering condition of MOR compaction during archival (#4974)

6 months ago[HUDI-3404] Automatically adjust write configs based on metadata table and write...
Y Ethan Guo [Thu, 17 Mar 2022 05:25:04 +0000 (22:25 -0700)] 
[HUDI-3404] Automatically adjust write configs based on metadata table and write concurrency mode (#4975)

6 months ago[Hudi-3376] Add an option to skip under deletion files for HoodieMetadataTableValidat...
YueZhang [Thu, 17 Mar 2022 01:31:00 +0000 (09:31 +0800)] 
[Hudi-3376] Add an option to skip under deletion files for HoodieMetadataTableValidator (#4994)

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
6 months ago[HUDI-3607] Support backend switch in HoodieFlinkStreamer (#5032)
that's cool [Wed, 16 Mar 2022 06:07:31 +0000 (14:07 +0800)] 
[HUDI-3607] Support backend switch in HoodieFlinkStreamer (#5032)

* [HUDI-3607] Support backend switch in HoodieFlinkStreamer

* [HUDI-3607] Support backend switch in HoodieFlinkStreamer
1. checkstyle fix

* [HUDI-3607] Support backend switch in HoodieFlinkStreamer
1. change the msg

6 months ago[HUDI-3588] Remove hudi-common and hudi-hadoop-mr jars in Presto Docker image (#4997)
Y Ethan Guo [Wed, 16 Mar 2022 01:49:30 +0000 (18:49 -0700)] 
[HUDI-3588] Remove hudi-common and hudi-hadoop-mr jars in Presto Docker image (#4997)

6 months ago[HUDI-3589] flink sync hive metadata supports table properties and serde properties...
todd5167 [Tue, 15 Mar 2022 19:56:37 +0000 (03:56 +0800)] 
[HUDI-3589] flink sync hive metadata supports table properties and serde properties (#4995)

6 months ago[HUDI-3633] Allow non-string values to be set in TypedProperties (#5045)
Sagar Sumit [Tue, 15 Mar 2022 18:33:22 +0000 (00:03 +0530)] 
[HUDI-3633] Allow non-string values to be set in TypedProperties (#5045)

* [HUDI-3633] Allow non-string values to be set in TypedProperties

* Override getProperty to ignore instanceof string check
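
For context, `java.util.Properties#getProperty` returns null when the stored value is not a `String`, which is why an override is needed to read non-string values back. The sketch below illustrates the idea only; `LenientPropertiesSketch` is a hypothetical class and is not Hudi's `TypedProperties`.

```java
import java.util.Properties;

// Hypothetical sketch, NOT Hudi's TypedProperties.
class LenientPropertiesSketch extends Properties {
    private static final long serialVersionUID = 1L;

    // Skips the instanceof String check of the stock implementation and
    // converts any stored value to its string form.
    @Override
    public String getProperty(String key) {
        Object value = get(key);
        if (value != null) {
            return String.valueOf(value);
        }
        // `defaults` is the protected fallback table inherited from Properties.
        return defaults != null ? defaults.getProperty(key) : null;
    }

    public static void main(String[] args) {
        Properties plain = new Properties();
        plain.put("retries", 3);
        System.out.println(plain.getProperty("retries"));

        LenientPropertiesSketch lenient = new LenientPropertiesSketch();
        lenient.put("retries", 3);
        System.out.println(lenient.getProperty("retries"));
    }
}
```

The two-argument `getProperty(key, defaultValue)` delegates to the single-argument method, so overriding the latter covers both.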

6 months ago[HUDI-3514] Rebase Data Skipping flow to rely on MT Column Stats index (#4948)
Alexey Kudinkin [Tue, 15 Mar 2022 17:38:36 +0000 (10:38 -0700)] 
[HUDI-3514] Rebase Data Skipping flow to rely on MT Column Stats index (#4948)

6 months ago[HUDI-3619] Fix HoodieOperation fromValue using wrong constant value (#5033)
l-shen [Tue, 15 Mar 2022 12:34:31 +0000 (20:34 +0800)] 
[HUDI-3619] Fix HoodieOperation fromValue using wrong constant value (#5033)

Co-authored-by: root <l-shen@localhost.localdomain>
6 months ago[HUDI-3606] Add `org.objenesis:objenesis` to hudi-timeline-server-bundle pom (#5017)
Thinking Chen [Tue, 15 Mar 2022 11:06:50 +0000 (19:06 +0800)] 
[HUDI-3606] Add `org.objenesis:objenesis` to hudi-timeline-server-bundle pom (#5017)

6 months ago[HUDI-3547] Introduce MaxwellSourcePostProcessor to extract data from Maxwell json...
wangxianghu [Tue, 15 Mar 2022 11:06:30 +0000 (15:06 +0400)] 
[HUDI-3547] Introduce MaxwellSourcePostProcessor to extract data from Maxwell json string (#4987)

* [HUDI-3547] Introduce MaxwellSourcePostProcessor to extract data from Maxwell json string

* add ut

* Address comment

6 months ago[HUDI-3620] Adding spark3.2.0 profile (#5038)
Sivabalan Narayanan [Mon, 14 Mar 2022 23:14:00 +0000 (16:14 -0700)] 
[HUDI-3620] Adding spark3.2.0 profile (#5038)

6 months ago[HUDI-3623] Removing hive sync node from non hive yamls (#5040)
Sivabalan Narayanan [Mon, 14 Mar 2022 22:39:26 +0000 (15:39 -0700)] 
[HUDI-3623] Removing hive sync node from non hive yamls (#5040)

6 months ago[HUDI-3621] Fixing NullPointerException in DeltaStreamer (#5039)
Sivabalan Narayanan [Mon, 14 Mar 2022 22:34:17 +0000 (15:34 -0700)] 
[HUDI-3621] Fixing NullPointerException in DeltaStreamer (#5039)

6 months ago[MINODR] Remove repeated kafka-clients dependencies (#5034)
wangxianghu [Mon, 14 Mar 2022 14:24:06 +0000 (18:24 +0400)] 
[MINODR] Remove repeated kafka-clients dependencies (#5034)

6 months agofix NPE when run schdule using spark-sql if the commits time < hoodie.compact.inline...
peanut-chenzhong [Mon, 14 Mar 2022 08:40:38 +0000 (16:40 +0800)] 
fix NPE when run schdule using spark-sql if the commits time < hoodie.compact.inline.max.delta.commits (#4976)

* Update CompactionHoodiePathCommand.scala

Fix NPE when running schedule using spark-sql if the number of commits < hoodie.compact.inline.max.delta.commits

* Update CompactionHoodiePathCommand.scala

Fix IndexOutOfBoundsException when there's no schedule for compaction

* Update CompactionHoodiePathCommand.scala

Fix CI issue

6 months ago[HUDI-3600] Tweak the default cleaning strategy to be more streaming friendly for...
Danny Chan [Mon, 14 Mar 2022 06:22:07 +0000 (14:22 +0800)] 
[HUDI-3600] Tweak the default cleaning strategy to be more streaming friendly for flink (#5010)

6 months ago[HUDI-3613] Adding/fixing yamls for metadata (#5029)
Sivabalan Narayanan [Mon, 14 Mar 2022 01:11:37 +0000 (18:11 -0700)] 
[HUDI-3613] Adding/fixing yamls for metadata (#5029)

6 months ago[HUDI-3501] Support savepoints command based on Call Produce Command (#5025)
ForwardXu [Sun, 13 Mar 2022 12:58:21 +0000 (20:58 +0800)] 
[HUDI-3501] Support savepoints command based on Call Produce Command (#5025)

6 months ago[HUDI-3583] Fix MarkerBasedRollbackStrategy NoSuchElementException (#4984)
liujinhui [Sun, 13 Mar 2022 07:00:50 +0000 (15:00 +0800)] 
[HUDI-3583] Fix MarkerBasedRollbackStrategy NoSuchElementException (#4984)

Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>
6 months ago[HUDI-3593] Restore TypedProperties and flush checksum in table config (#5013)
Sagar Sumit [Sun, 13 Mar 2022 02:28:55 +0000 (07:58 +0530)] 
[HUDI-3593] Restore TypedProperties and flush checksum in table config (#5013)

Create new TypedProperties while performing clustering

Add OrderedProperties and minor refactoring

Add javadoc and remove getters from OrderedProperties

6 months ago[HUDI-3556] Re-use rollback instant for rolling back of clustering and compaction...
Sivabalan Narayanan [Fri, 11 Mar 2022 23:40:13 +0000 (15:40 -0800)] 
[HUDI-3556] Re-use rollback instant for rolling back of clustering and compaction if rollback failed mid-way (#4971)

6 months ago[HUDI-3569] Introduce ChainedJsonKafkaSourePostProcessor to support setting multi...
wangxianghu [Fri, 11 Mar 2022 22:49:30 +0000 (02:49 +0400)] 
[HUDI-3569] Introduce ChainedJsonKafkaSourePostProcessor to support setting multi processors at once (#4969)