hudi.git
7 months agoRevert "[HUDI-2677] Add DFS based message queue for flink writer (#3915)" revert-3915-HUDI-2677 3923/head
Danny Chan [Thu, 4 Nov 2021 12:45:36 +0000 (20:45 +0800)] 
Revert "[HUDI-2677] Add DFS based message queue for flink writer (#3915)"

This reverts commit dbf8c44bdb3019f2ce93d6b1224d9d478c0340fa.

7 months ago[HUDI-2677] Add DFS based message queue for flink writer (#3915)
Danny Chan [Thu, 4 Nov 2021 10:09:00 +0000 (18:09 +0800)] 
[HUDI-2677] Add DFS based message queue for flink writer (#3915)

7 months ago[HUDI-2684] Use DefaultHoodieRecordPayload when precombine field is specified specifi...
Danny Chan [Thu, 4 Nov 2021 08:23:36 +0000 (16:23 +0800)] 
[HUDI-2684] Use DefaultHoodieRecordPayload when precombine field is specified specifically (#3922)

7 months ago[HUDI-2678] flink writer writes huge log file (#3916)
Danny Chan [Wed, 3 Nov 2021 14:12:49 +0000 (22:12 +0800)] 
[HUDI-2678] flink writer writes huge log file (#3916)

7 months ago[HUDI-2676] Hudi should synchronize owner information to hudi _rt/_ro table. (#3911)
xiarixiaoyao [Wed, 3 Nov 2021 12:36:01 +0000 (20:36 +0800)] 
[HUDI-2676] Hudi should synchronize owner information to hudi _rt/_ro table. (#3911)

7 months ago[HUDI-2509] OverwriteNonDefaultsWithLatestAvroPayload doesn`t work when upsert data...
peanut-chenzhong [Wed, 3 Nov 2021 12:23:40 +0000 (20:23 +0800)] 
[HUDI-2509] OverwriteNonDefaultsWithLatestAvroPayload doesn`t work when upsert data with some null value column (#3761)

Co-authored-by: 502395931@qq.com <lzyadam315>
7 months ago[HUDI-2660] Delete the view storage properties first before creation (#3899)
Danny Chan [Wed, 3 Nov 2021 06:30:20 +0000 (14:30 +0800)] 
[HUDI-2660] Delete the view storage properties first before creation (#3899)

7 months ago[HUDI-2674] hudi hive reader should not print read values. (#3910)
xiarixiaoyao [Wed, 3 Nov 2021 03:10:18 +0000 (11:10 +0800)] 
[HUDI-2674] hudi hive reader should not print read values. (#3910)

7 months ago[MINOR] Fixed RAT config for "hudi-utilities-bundle" to ignore transient build-bound...
Alexey Kudinkin [Wed, 3 Nov 2021 03:06:26 +0000 (20:06 -0700)] 
[MINOR] Fixed RAT config for "hudi-utilities-bundle" to ignore transient build-bound artifiacts (#3909)

7 months ago[HUDI-2538] persist some configs to hoodie.properties when the first write (#3823)
Yann Byron [Wed, 3 Nov 2021 02:04:23 +0000 (10:04 +0800)] 
[HUDI-2538] persist some configs to hoodie.properties when the first write (#3823)

7 months ago[HUDI-1869] Upgrading Spark3 To 3.1 (#3844)
Yann Byron [Wed, 3 Nov 2021 01:25:12 +0000 (09:25 +0800)] 
[HUDI-1869] Upgrading Spark3 To 3.1 (#3844)

Co-authored-by: pengzhiwei <pengzhiwei2015@icloud.com>
7 months ago[HUDI-2582] Support concurrent key gen for different tables with row writer path...
Carl-Zhou-CN [Tue, 2 Nov 2021 22:05:09 +0000 (06:05 +0800)] 
[HUDI-2582] Support concurrent key gen for different tables with row writer path (#3817)

Co-authored-by: yao.zhou <yao.zhou@linkflowtech.com>
7 months ago[HUDI-2101][RFC-28] support z-order for hudi (#3330)
xiarixiaoyao [Tue, 2 Nov 2021 16:31:57 +0000 (00:31 +0800)] 
[HUDI-2101][RFC-28] support z-order for hudi (#3330)

* [HUDI-2101]support z-order for hudi

* Renaming some configs for consistency/simplicity.

* Minor code cleanups

Co-authored-by: Vinoth Chandar <vinoth@apache.org>
7 months ago[MINOR] Adding a deprecated constructor to AbstractSyncHoodieClient (#3902)
Sivabalan Narayanan [Tue, 2 Nov 2021 16:16:38 +0000 (12:16 -0400)] 
[MINOR] Adding a deprecated constructor to AbstractSyncHoodieClient (#3902)

7 months ago[HUDI-2515] Add close when producing records failed (#3746)
董可伦 [Tue, 2 Nov 2021 11:43:20 +0000 (19:43 +0800)] 
[HUDI-2515] Add close when producing records failed (#3746)

7 months ago[HUDI-2472] Enabling Metadata table for some of TestCleaner unit tests (#3803)
Manoj Govindassamy [Tue, 2 Nov 2021 10:54:36 +0000 (03:54 -0700)] 
[HUDI-2472] Enabling Metadata table for some of TestCleaner unit tests (#3803)

- Making use of HoodieTableMetadataWriter when constructing the HoodieMetadataTestTable
   instance for the test to enable metadata table usage.

7 months ago[HUDI-2005] Fixing partition path creation in AbstractTableFileSystemView (#3769)
Sivabalan Narayanan [Tue, 2 Nov 2021 04:16:45 +0000 (00:16 -0400)] 
[HUDI-2005] Fixing partition path creation in AbstractTableFileSystemView (#3769)

7 months ago[HUDI-2662] Downloads from Nexus Pentaho repo taking too long (#3901)
Sagar Sumit [Mon, 1 Nov 2021 23:14:48 +0000 (04:44 +0530)] 
[HUDI-2662] Downloads from Nexus Pentaho repo taking too long (#3901)

Co-authored-by: Sivabalan Narayanan <n.siva.b@gmail.com>
7 months ago[HUDI-2643] Remove duplicated hbase-common with tests classifier exists in bundles...
vinoyang [Mon, 1 Nov 2021 12:11:00 +0000 (20:11 +0800)] 
[HUDI-2643] Remove duplicated hbase-common with tests classifier exists in bundles (#3886)

7 months ago[HUDI-2654] Add compaction failed event(part2) (#3896)
Danny Chan [Sun, 31 Oct 2021 09:51:11 +0000 (17:51 +0800)] 
[HUDI-2654] Add compaction failed event(part2) (#3896)

7 months ago[HUDI-2654] Schedules the compaction from earliest for flink (#3891)
Danny Chan [Sat, 30 Oct 2021 00:37:30 +0000 (08:37 +0800)] 
[HUDI-2654] Schedules the compaction from earliest for flink (#3891)

7 months ago[HUDI-1295] Hash ID generator util for Hudi table columns, partition and files (...
Manoj Govindassamy [Fri, 29 Oct 2021 23:19:38 +0000 (16:19 -0700)] 
[HUDI-1295] Hash ID generator util for Hudi table columns, partition and files (#3884)

* [HUDI-1295] Hash ID generator util for Hudi table columns, partition and files

- Adding a new utility class HashID to generate 32,64,128 bits hashes for any
  given message of string or byte array type. This class internally uses
  MessageDigest and xxhash libraries.

- Adding stateful hash holders for Hudi table columns, partition and files to
  pass around for metaindex and to convert to base64encoded strings whenever
  needed
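
A minimal sketch of the 128-bit case described above, assuming MD5 via MessageDigest as the digest. The class and method names (`HashSketch`, `hash128`, `toBase64`) are hypothetical and are not the HashID API added by this commit. The xxhash-based 32/64-bit variants are left out because they need an external library.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public class HashSketch {
    // Hypothetical helper (not Hudi's HashID): 128-bit digest of a byte array.
    // MD5 is an assumption; it always yields 16 bytes (128 bits).
    static byte[] hash128(byte[] message) {
        try {
            return MessageDigest.getInstance("MD5").digest(message);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // String overload: hash the UTF-8 bytes of the message.
    static byte[] hash128(String message) {
        return hash128(message.getBytes(StandardCharsets.UTF_8));
    }

    // Convert a hash to a base64-encoded string for passing around as text.
    static String toBase64(byte[] hash) {
        return Base64.getEncoder().encodeToString(hash);
    }

    public static void main(String[] args) {
        byte[] h = hash128("partition=2021/11/04");
        System.out.println(h.length * 8 + " bits: " + toBase64(h));
    }
}
```

Keeping the raw bytes and encoding to base64 only on demand mirrors the "convert whenever needed" idea in the message; the input string here is made up for illustration.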

7 months ago[HUDI-1500] Support replace commit in DeltaSync with commit metadata preserved (...
Sagar Sumit [Fri, 29 Oct 2021 17:09:09 +0000 (22:39 +0530)] 
[HUDI-1500] Support replace commit in DeltaSync with commit metadata preserved (#3802)

7 months ago[HUDI-2573] Fixing double locking with multi-writers (#3827)
Sivabalan Narayanan [Fri, 29 Oct 2021 16:14:39 +0000 (12:14 -0400)] 
[HUDI-2573] Fixing double locking with multi-writers (#3827)

- Two code paths were taking the lock twice. This was introduced when data table locks were added for updating the metadata table. These flows are fixed so that no lock is taken if a parent transaction has already acquired one.
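
The idea of skipping the lock when a parent transaction already holds it can be sketched as follows. This is an illustration only: an in-process ReentrantLock stands in for whatever lock provider the table uses, and `NestedLockSketch` and `runInTransaction` are hypothetical names, not the code changed by this commit.

```java
import java.util.concurrent.locks.ReentrantLock;

public class NestedLockSketch {
    // Stand-in for the table lock (assumption: a simple in-process lock).
    private final ReentrantLock lock = new ReentrantLock();
    // Counts real acquisitions so the effect is observable.
    int acquisitions = 0;

    // Runs body under the lock, unless the current thread already holds it
    // because an enclosing (parent) transaction acquired it.
    void runInTransaction(Runnable body) {
        if (lock.isHeldByCurrentThread()) {
            body.run();
            return;
        }
        lock.lock();
        acquisitions++;
        try {
            body.run();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        NestedLockSketch txn = new NestedLockSketch();
        // Outer call models the data table commit, inner call the metadata update.
        txn.runInTransaction(() ->
            txn.runInTransaction(() -> System.out.println("metadata table updated")));
        System.out.println("lock acquisitions: " + txn.acquisitions);
    }
}
```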

7 months ago[HUDI-1294] Adding inline read and seek based read(batch get) for hfile log blocks...
Sivabalan Narayanan [Fri, 29 Oct 2021 16:12:44 +0000 (12:12 -0400)] 
[HUDI-1294] Adding inline read and seek based read(batch get) for hfile log blocks in metadata table (#3762)

7 months ago[HUDI-2502] Refactor index in hudi-client module (#3778)
Y Ethan Guo [Thu, 28 Oct 2021 08:16:00 +0000 (01:16 -0700)] 
[HUDI-2502] Refactor index in hudi-client module (#3778)

- Refactor Index to reduce Line of Code and re-use across engines.

7 months ago[HUDI-2633] Make precombine field optional for flink (#3874)
Danny Chan [Thu, 28 Oct 2021 05:52:06 +0000 (13:52 +0800)] 
[HUDI-2633] Make precombine field optional for flink (#3874)

8 months ago[MINOR] Add links to all the existing RFCs in rfc/README.md (#3876)
vinoth chandar [Wed, 27 Oct 2021 12:25:19 +0000 (05:25 -0700)] 
[MINOR] Add links to all the existing RFCs in rfc/README.md (#3876)

8 months ago[HUDI-2632] Schema evolution for flink parquet reader (#3872)
Danny Chan [Wed, 27 Oct 2021 12:00:24 +0000 (20:00 +0800)] 
[HUDI-2632] Schema evolution for flink parquet reader (#3872)

8 months ago[HUDI-1475]: fixed java doc for precombine api (#3867)
Pratyaksh Sharma [Tue, 26 Oct 2021 22:15:20 +0000 (03:45 +0530)] 
[HUDI-1475]: fixed java doc for precombine api (#3867)

8 months ago[MINOR] Fix README for hudi-kafka-connect (#3858)
Y Ethan Guo [Tue, 26 Oct 2021 21:45:52 +0000 (14:45 -0700)] 
[MINOR] Fix README for hudi-kafka-connect (#3858)

8 months ago[HUDI-2614] Remove duplicated hadoop-hdfs with tests classifier exists in bundles...
vinoyang [Tue, 26 Oct 2021 14:36:10 +0000 (22:36 +0800)] 
[HUDI-2614] Remove duplicated hadoop-hdfs with tests classifier exists in bundles (#3864)

8 months ago[HUDI-2625] Revert "[HUDI-2005] Avoiding direct fs calls in HoodieLogFileReader ...
Sivabalan Narayanan [Tue, 26 Oct 2021 01:43:15 +0000 (21:43 -0400)] 
[HUDI-2625] Revert "[HUDI-2005] Avoiding direct fs calls in HoodieLogFileReader (#3757)" (#3863)

This reverts commit 1bb05325637740498cac548872cf7223e34950d0.

8 months ago[MINOR] Fix typo,'deseralized' corrected to 'deserialized' & 'Kyro' corrected to...
董可伦 [Mon, 25 Oct 2021 13:56:47 +0000 (21:56 +0800)] 
[MINOR] Fix typo,'deseralized' corrected to 'deserialized' & 'Kyro' corrected to 'Kryo' (#3846)

8 months ago[HUDI-2600] Remove duplicated hadoop-common with tests classifier exists in bundles...
vinoyang [Mon, 25 Oct 2021 05:45:28 +0000 (13:45 +0800)] 
[HUDI-2600] Remove duplicated hadoop-common with tests classifier exists in bundles (#3847)

8 months ago[HUDI-2005] Avoiding direct fs calls in HoodieLogFileReader (#3757)
Sivabalan Narayanan [Mon, 25 Oct 2021 05:21:08 +0000 (01:21 -0400)] 
[HUDI-2005] Avoiding direct fs calls in HoodieLogFileReader (#3757)

8 months ago[HUDI-2077] Fix TestHoodieDeltaStreamerWithMultiWriter (#3849)
Raymond Xu [Mon, 25 Oct 2021 04:14:39 +0000 (21:14 -0700)] 
[HUDI-2077] Fix TestHoodieDeltaStreamerWithMultiWriter (#3849)

Remove the logic of using deltastreamer to prep test table. Use fixture (compressed test table) instead.

8 months ago[MINOR] Show source table operator details on the flink web when reading hudi table...
mincwang [Sun, 24 Oct 2021 15:18:01 +0000 (23:18 +0800)] 
[MINOR] Show source table operator details on the flink web when reading hudi table (#3842)

8 months ago[HUDI-2468] Metadata table support for rolling back the first commit (#3843)
Manoj Govindassamy [Sat, 23 Oct 2021 14:07:09 +0000 (07:07 -0700)] 
[HUDI-2468] Metadata table support for rolling back the first commit (#3843)

- Fix is to make Metadata table writer creation aware of the currently inflight action so that it can
  make some informed decision about whether bootstrapping is needed for the table and whether
  any pending action on the data timeline can be ignored.

8 months ago[HUDI-2501] Add HoodieData abstraction and refactor compaction actions in hudi-client...
Y Ethan Guo [Fri, 22 Oct 2021 19:58:51 +0000 (12:58 -0700)] 
[HUDI-2501] Add HoodieData abstraction and refactor compaction actions in hudi-client module (#3741)

8 months ago[HUDI-2489]Tuning HoodieROTablePathFilter by caching hoodieTableFileSystemView, aimin...
zhangyue19921010 [Fri, 22 Oct 2021 16:03:58 +0000 (00:03 +0800)] 
[HUDI-2489]Tuning HoodieROTablePathFilter by caching hoodieTableFileSystemView, aiming to reduce unnecessary list/get requests (#3719)

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
8 months ago[HUDI-2592] Fix write empty array when write.precombine.field is decimal type (#3837)
Matrix42 [Fri, 22 Oct 2021 11:42:13 +0000 (19:42 +0800)] 
[HUDI-2592] Fix write empty array when write.precombine.field is decimal type (#3837)

8 months ago[HUDI-2553] Metadata table compaction trigger max delta commits (#3794)
Manoj Govindassamy [Thu, 21 Oct 2021 17:09:37 +0000 (10:09 -0700)] 
[HUDI-2553] Metadata table compaction trigger max delta commits (#3794)

-  Setting the max delta commits default value from 24 to 10 to trigger the compaction in metadata table.

8 months ago[HUDI-2507] Generate more dependency list file for other bundles (#3773)
vinoyang [Thu, 21 Oct 2021 06:10:01 +0000 (14:10 +0800)] 
[HUDI-2507] Generate more dependency list file for other bundles (#3773)

8 months ago[HUDI-2583] Refactor TestWriteCopyOnWrite test cases (#3832)
Danny Chan [Thu, 21 Oct 2021 04:36:41 +0000 (12:36 +0800)] 
[HUDI-2583] Refactor TestWriteCopyOnWrite test cases (#3832)

8 months ago[HUDI-2077] Fix flakiness in TestHoodieDeltaStreamer (#3829)
Raymond Xu [Thu, 21 Oct 2021 03:57:12 +0000 (20:57 -0700)] 
[HUDI-2077] Fix flakiness in TestHoodieDeltaStreamer (#3829)

8 months ago[HUDI-2472] Fix few Cleaner tests with metadata table enabled (#3825)
Manoj Govindassamy [Wed, 20 Oct 2021 22:57:00 +0000 (15:57 -0700)] 
[HUDI-2472] Fix few Cleaner tests with metadata table enabled (#3825)

8 months ago[HUDI-2578] Support merging small files for flink insert operation (#3822)
Danny Chan [Wed, 20 Oct 2021 13:10:07 +0000 (21:10 +0800)] 
[HUDI-2578] Support merging small files for flink insert operation (#3822)

8 months ago[HUDI-2469] [Kafka Connect] Replace json based payload with protobuf for Transaction...
rmahindra123 [Tue, 19 Oct 2021 21:29:48 +0000 (14:29 -0700)] 
[HUDI-2469] [Kafka Connect] Replace json based payload with protobuf for Transaction protocol. (#3694)

* Substitute Control Event with protobuf

* Fix tests

* Fix unit tests

* Add javadocs

* Add javadocs

* Address reviewer comments

Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local>
8 months ago[MINOR] Fix typo,'intance' corrected to 'instance' (#3788)
董可伦 [Tue, 19 Oct 2021 15:16:48 +0000 (23:16 +0800)] 
[MINOR] Fix typo,'intance' corrected to 'instance' (#3788)

8 months ago[HUDI-2482] support 'drop partition' sql (#3754)
Yann Byron [Tue, 19 Oct 2021 14:09:53 +0000 (22:09 +0800)] 
[HUDI-2482] support 'drop partition' sql (#3754)

8 months ago[MINOR] Fix typo, 'upsert' corrected to 'insert' in java write example (#3809)
jaxonzhang [Tue, 19 Oct 2021 12:04:18 +0000 (20:04 +0800)] 
[MINOR] Fix typo, 'upsert' corrected to 'insert' in java write example (#3809)

8 months ago[HUDI-2572] Strength flink compaction rollback strategy (#3819)
Danny Chan [Tue, 19 Oct 2021 02:47:38 +0000 (10:47 +0800)] 
[HUDI-2572] Strength flink compaction rollback strategy (#3819)

* make the events of commit task distinct by file id
* fix the existence check for inflight state file
* make the compaction task fail-safe

8 months ago[HUDI-2561] BitCaskDiskMap - avoiding hostname resolution when logging messages ...
Manoj Govindassamy [Mon, 18 Oct 2021 17:07:53 +0000 (10:07 -0700)] 
[HUDI-2561] BitCaskDiskMap - avoiding hostname resolution when logging messages (#3811)

- InetAddress.getLocalHost() can take 30+ seconds if the network
   configuration is not done right. This might be due to local hostname
   missing IPv6 address mapping in /etc/hosts or network configs slowing down
   any IPv6 name resolutions. If this API is used for logging verbose messages
   and that too in the hot code path, it can lead to order of magnitude
   slowness in the overall task completion.
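
One common way to keep such a lookup out of a hot path is to resolve the hostname at most once and reuse the result. The sketch below shows that pattern under stated assumptions; `CachedHostname` is a hypothetical class, and this is not necessarily how the commit itself avoids the resolution.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class CachedHostname {
    // Lazy holder idiom: NAME is resolved once, on first access, and the
    // JVM guarantees thread-safe initialization of the nested class.
    private static final class Holder {
        static final String NAME = resolve();
    }

    private static String resolve() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            // Fallback value is an assumption made for this sketch.
            return "unknown-host";
        }
    }

    // Cheap after the first call, so it is safe to use in log messages.
    static String get() {
        return Holder.NAME;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            System.out.println("spilling record " + i + " on host " + get());
        }
    }
}
```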

8 months ago[HUDI-2571] Remove include-flink-sql-connector-hive profile from flink bundle (#3818)
Danny Chan [Mon, 18 Oct 2021 09:34:49 +0000 (17:34 +0800)] 
[HUDI-2571] Remove include-flink-sql-connector-hive profile from flink bundle (#3818)

8 months agoHUDI-2569 shaded hive (#3816)
yiduwangkai [Mon, 18 Oct 2021 09:12:13 +0000 (17:12 +0800)] 
HUDI-2569 shaded hive (#3816)

Co-authored-by: wangkai9 <wangkai9@tuhu.cn>
8 months ago[HUDI-2568] Simplify the view storage config properties (#3815)
Danny Chan [Mon, 18 Oct 2021 06:42:33 +0000 (14:42 +0800)] 
[HUDI-2568] Simplify the view storage config properties (#3815)

8 months ago[HUDI-2557] Shade javax.servlet for flink bundle jar (#3807)
yiduwangkai [Mon, 18 Oct 2021 03:26:21 +0000 (11:26 +0800)] 
[HUDI-2557] Shade javax.servlet for flink bundle jar (#3807)

Co-authored-by: wangkai9 <wangkai9@tuhu.cn>
8 months ago[HUDI-2562] Embedded timeline server on JobManager (#3812)
Danny Chan [Mon, 18 Oct 2021 02:45:39 +0000 (10:45 +0800)] 
[HUDI-2562] Embedded timeline server on JobManager (#3812)

8 months ago[MINOR] fix typo,'seprarated' corrected to 'separated' (#3789)
Jimmy.Zhou [Fri, 15 Oct 2021 20:26:16 +0000 (04:26 +0800)] 
[MINOR] fix typo,'seprarated' corrected to 'separated' (#3789)

8 months ago[HUDI-2556] Tweak some default config options for flink (#3800)
Danny Chan [Thu, 14 Oct 2021 11:42:56 +0000 (19:42 +0800)] 
[HUDI-2556] Tweak some default config options for flink (#3800)

* rename write.insert.drop.duplicates to write.precombine and set it as true for COW table
* set index.global.enabled default as true
* set compaction.target_io default as 500GB

8 months ago[HUDI-2551] Support DefaultHoodieRecordPayload for flink (#3792)
Danny Chan [Thu, 14 Oct 2021 05:46:53 +0000 (13:46 +0800)] 
[HUDI-2551] Support DefaultHoodieRecordPayload for flink (#3792)

8 months ago[HUDI-2548] Flink streaming reader misses the rolling over file handles (#3787)
Danny Chan [Thu, 14 Oct 2021 02:36:18 +0000 (10:36 +0800)] 
[HUDI-2548] Flink streaming reader misses the rolling over file handles (#3787)

8 months ago[HUDI-2552] Fixing some test failures to unblock broken CI master (#3793)
Sivabalan Narayanan [Wed, 13 Oct 2021 22:44:43 +0000 (18:44 -0400)] 
[HUDI-2552] Fixing some test failures to unblock broken CI master (#3793)

8 months ago[HUDI-2435][BUG]Fix clustering handle errors (#3666)
zhangyue19921010 [Tue, 12 Oct 2021 22:24:48 +0000 (06:24 +0800)] 
[HUDI-2435][BUG]Fix clustering handle errors (#3666)

* done

* remove unused imports

* code reviewed

* code reviewed

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
8 months ago[HUDI-2494] Fixing glob pattern to skip all hoodie meta paths (#3768)
Sivabalan Narayanan [Tue, 12 Oct 2021 18:06:40 +0000 (14:06 -0400)] 
[HUDI-2494] Fixing glob pattern to skip all hoodie meta paths (#3768)

8 months ago[HUDI-2532] Metadata table compaction trigger max delta commits (#3784)
Manoj Govindassamy [Tue, 12 Oct 2021 13:49:42 +0000 (06:49 -0700)] 
[HUDI-2532] Metadata table compaction trigger max delta commits (#3784)

-  Setting the max delta commits default value from 24 to 10 to trigger the
     compaction in metadata table.

8 months ago[MINOR] Fix typo,'paritition' corrected to 'partition' (#3764)
董可伦 [Mon, 11 Oct 2021 18:07:34 +0000 (02:07 +0800)] 
[MINOR] Fix typo,'paritition' corrected to 'partition' (#3764)

8 months ago[HUDI-2540] Fixed wrong validation for metadataTableEnabled in HoodieTable (#3781)
Roc Marshal [Mon, 11 Oct 2021 17:58:33 +0000 (01:58 +0800)] 
[HUDI-2540] Fixed wrong validation for metadataTableEnabled in HoodieTable (#3781)

8 months ago[HUDI-2542] AppendWriteFunction throws NPE when checkpointing without written data...
Danny Chan [Mon, 11 Oct 2021 08:22:22 +0000 (16:22 +0800)] 
[HUDI-2542] AppendWriteFunction throws NPE when checkpointing without written data (#3777)

8 months ago[HUDI-2496] Insert duplicate records when precombined is deactivated for "insert...
Ilias Antoniou [Mon, 11 Oct 2021 01:33:16 +0000 (04:33 +0300)] 
[HUDI-2496] Insert duplicate records when precombined is deactivated  for "insert" operation (#3740)

8 months ago[HUDI-2537] Fix metadata table for flink (#3774)
Danny Chan [Sun, 10 Oct 2021 01:30:39 +0000 (09:30 +0800)] 
[HUDI-2537] Fix metadata table for flink (#3774)

8 months ago[HUDI-2534] Remove the sort operation when bulk_insert in batch mode (#3772)
Danny Chan [Sat, 9 Oct 2021 10:02:10 +0000 (18:02 +0800)] 
[HUDI-2534] Remove the sort operation when bulk_insert in batch mode (#3772)

8 months ago[HUDI-2530] Adding async compaction support to integ test suite framework (#3750)
Sivabalan Narayanan [Fri, 8 Oct 2021 15:30:48 +0000 (11:30 -0400)] 
[HUDI-2530] Adding async compaction support to integ test suite framework (#3750)

8 months ago[MINOR] Fix typo,'properites' corrected to 'properties' (#3738)
董可伦 [Thu, 7 Oct 2021 00:37:01 +0000 (08:37 +0800)] 
[MINOR] Fix typo,'properites' corrected to 'properties' (#3738)

8 months ago[HUDI-2513] Refactor table upgrade and downgrade actions in hudi-client module (...
Y Ethan Guo [Thu, 7 Oct 2021 00:20:41 +0000 (17:20 -0700)] 
[HUDI-2513] Refactor table upgrade and downgrade actions in hudi-client module (#3743)

8 months ago[HUDI-2456] support 'show partitions' sql (#3693)
Yann Byron [Wed, 6 Oct 2021 07:46:49 +0000 (15:46 +0800)] 
[HUDI-2456] support 'show partitions' sql (#3693)

8 months ago[HUDI-2285][HUDI-2476] Metadata table synchronous design. Rebased and Squashed from...
Sivabalan Narayanan [Wed, 6 Oct 2021 04:17:52 +0000 (00:17 -0400)] 
[HUDI-2285][HUDI-2476] Metadata table synchronous design. Rebased and Squashed from pull/3426 (#3590)

* [HUDI-2285] Adding Synchronous updates to metadata before completion of commits in data timeline.

- This patch adds synchronous updates to the metadata table. In other words, every write is first committed to the metadata table and then to the data table. While reading the metadata table, we ignore any delta commits that are present only in the metadata table and not in the data table timeline.
- Compaction of the metadata table is fenced by the condition that compaction is triggered only when there are no inflight requests in the data table. This ensures that all base files in the metadata table are always in sync with the data table (without any holes); only some extra invalid commits may remain among the delta log files in the metadata table.
- Due to this, archival of the data table also fences itself up to the compacted instant in the metadata table.
- All writes to the metadata table happen within the data table lock, so the metadata table works in single-writer mode only. This might be tough to loosen, since all writers write to the same FILES partition and would conflict anyway.
- As part of this, locks are now acquired in the data table for operations that previously committed without one (rollback, clean, compaction, cluster). Note that no conflict resolution is done; the commit is simply performed while holding a lock, so that writes to the metadata table always come from a single writer.
- Also added a building block for adding buckets to partitions, which will be leveraged by other indexes such as the record level index. For now, the FILES partition has only one bucket. In general, any number of buckets per partition is allowed, and each partition has a fixed fileId prefix with an incremental suffix for each bucket within the partition.
- Fixed [HUDI-2476]. This fix retries a failed compaction if it succeeded in the metadata table the first time but failed in the data table.
- Enabling metadata table by default.
- Adding more tests for metadata table
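
The read-side rule from the first bullet (ignore delta commits that exist only in the metadata table) can be illustrated with a small filter. This is a sketch under simplifying assumptions: instants are plain timestamp strings, and `MetadataInstantFilter` and `validInstants` are hypothetical names, not Hudi classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MetadataInstantFilter {
    // Keep only metadata delta commits whose instant also completed on the
    // data table timeline; anything else is treated as invalid and skipped.
    static List<String> validInstants(List<String> metadataDeltaCommits,
                                      Set<String> completedDataInstants) {
        List<String> valid = new ArrayList<>();
        for (String instant : metadataDeltaCommits) {
            if (completedDataInstants.contains(instant)) {
                valid.add(instant);
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        // "003" was written to the metadata table, but the data table commit
        // never completed, so a reader must not see it.
        List<String> metadata = Arrays.asList("001", "002", "003");
        Set<String> data = new HashSet<>(Arrays.asList("001", "002"));
        System.out.println(validInstants(metadata, data));
    }
}
```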

Co-authored-by: Prashant Wason <pwason@uber.com>
8 months ago[HUDI-2497] Refactor clean and restore actions in hudi-client module (#3734)
Y Ethan Guo [Thu, 30 Sep 2021 22:20:25 +0000 (15:20 -0700)] 
[HUDI-2497] Refactor clean and restore actions in hudi-client module (#3734)

8 months ago[HUDI-2499] Making jdbc-url, user and pass as non-required field for other sync modes...
Vinay Patil [Thu, 30 Sep 2021 15:41:15 +0000 (21:11 +0530)] 
[HUDI-2499] Making jdbc-url, user and pass as non-required field for other sync modes (#3732)

8 months ago[HUDI-2440] Add dependency change diff script for dependency governace (#3674)
vinoyang [Thu, 30 Sep 2021 08:56:11 +0000 (16:56 +0800)] 
[HUDI-2440] Add dependency change diff script for dependency governace (#3674)

8 months ago[MINOR] Support JuiceFileSystem (#3729)
tangyoupeng [Thu, 30 Sep 2021 04:50:46 +0000 (12:50 +0800)] 
[MINOR] Support JuiceFileSystem (#3729)

8 months ago[MINOR] Fix typo Hooodie corrected to Hoodie & reuqired corrected to required (#3730)
董可伦 [Thu, 30 Sep 2021 01:55:32 +0000 (09:55 +0800)] 
[MINOR] Fix typo Hooodie corrected to Hoodie & reuqired corrected to required (#3730)

8 months ago[HUDI-2277] HoodieDeltaStreamer reading ORC files directly using ORCDFSSource (#3413)
zhangyue19921010 [Wed, 29 Sep 2021 15:54:12 +0000 (23:54 +0800)] 
[HUDI-2277] HoodieDeltaStreamer reading ORC files directly using ORCDFSSource (#3413)

* add ORCDFSSource to support reading orc file into hudi format && add UTs

* remove unused import

* simplify tes

* code review

* code review

* code review

* code review

* code review

* code review

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
8 months ago[MINOR] Add a RFC template and folder (#3726)
vinoth chandar [Tue, 28 Sep 2021 16:33:27 +0000 (10:33 -0600)] 
[MINOR] Add a RFC template and folder (#3726)

8 months ago[HUDI-2474] Refreshing timeline for every operation in Hudi when metadata is enabled...
Sivabalan Narayanan [Tue, 28 Sep 2021 09:16:52 +0000 (05:16 -0400)] 
[HUDI-2474] Refreshing timeline for every operation in Hudi when metadata is enabled (#3698)

8 months ago[HUDI-2487] Fix JsonKafkaSource cannot filter empty messages from kafka (#3715)
qianchutao [Tue, 28 Sep 2021 05:47:15 +0000 (13:47 +0800)] 
[HUDI-2487] Fix JsonKafkaSource cannot filter empty messages from kafka (#3715)

9 months ago[MINOR] Fix typo,'Kakfa' corrected to 'Kafka' & 'parquest' corrected to 'parquet...
董可伦 [Sun, 26 Sep 2021 13:53:39 +0000 (21:53 +0800)] 
[MINOR] Fix typo,'Kakfa' corrected to 'Kafka' & 'parquest' corrected to 'parquet' (#3717)

9 months ago[MINOR] fix typo,'SPAKR' corrected to 'SPARK' (#3721)
qianchutao [Sun, 26 Sep 2021 13:52:35 +0000 (21:52 +0800)] 
[MINOR] fix typo,'SPAKR' corrected to 'SPARK' (#3721)

9 months ago[HUDI-2451] On windows client with hdfs server for wrong file separator (#3687)
Carl-Zhou-CN [Sun, 26 Sep 2021 13:51:27 +0000 (21:51 +0800)] 
[HUDI-2451] On windows client with hdfs server for wrong file separator (#3687)

Co-authored-by: yao.zhou <yao.zhou@linkflowtech.com>
9 months ago[HUDI-2484] Fix hive sync mode setting in Deltastreamer (#3712)
Sagar Sumit [Fri, 24 Sep 2021 17:05:42 +0000 (22:35 +0530)] 
[HUDI-2484] Fix hive sync mode setting in Deltastreamer (#3712)

9 months ago[HUDI-2485] Consume as mini-batch for flink stream reader (#3710)
Danny Chan [Fri, 24 Sep 2021 15:44:01 +0000 (23:44 +0800)] 
[HUDI-2485] Consume as mini-batch for flink stream reader (#3710)

9 months ago[HUDI-2483] Infer changelog mode for flink compactor (#3706)
Danny Chan [Fri, 24 Sep 2021 06:52:27 +0000 (14:52 +0800)] 
[HUDI-2483] Infer changelog mode for flink compactor (#3706)

9 months ago[HUDI-2385] Make parquet dictionary encoding configurable (#3578)
Shawy Geng [Fri, 24 Sep 2021 05:33:34 +0000 (13:33 +0800)] 
[HUDI-2385] Make parquet dictionary encoding configurable (#3578)

Co-authored-by: leesf <leesf@apache.org>
9 months ago[HUDI-2248] Fixing the closing of hms client (#3364)
jsbali [Thu, 23 Sep 2021 20:45:24 +0000 (02:15 +0530)] 
[HUDI-2248] Fixing the closing of hms client (#3364)

* [HUDI-2248] Fixing the closing of hms client

* [HUDI-2248] Using Hive.closeCurrent() over client.close()

9 months ago[HUDI-2383] Clean the marker files after compaction (#3576)
Shawy Geng [Thu, 23 Sep 2021 19:40:58 +0000 (03:40 +0800)] 
[HUDI-2383] Clean the marker files after compaction (#3576)

9 months ago[HUDI-2395] Metadata tests rewrite (#3695)
Sagar Sumit [Thu, 23 Sep 2021 19:40:11 +0000 (01:10 +0530)] 
[HUDI-2395] Metadata tests rewrite (#3695)

- Added commit metadata infra to test table so that we can test entire metadata using test table itself. These tests don't care about the contents of files as such and hence we should be able to test all code paths for metadata using test table.

Co-authored-by: Sivabalan Narayanan <n.siva.b@gmail.com>
9 months ago[HUDI-2479] HoodieFileIndex throws NPE for FileSlice with pure log files (#3702)
Danny Chan [Thu, 23 Sep 2021 07:14:30 +0000 (15:14 +0800)] 
[HUDI-2479] HoodieFileIndex throws NPE for FileSlice with pure log files (#3702)

9 months ago[MINOR] Cosmetic changes for flink (#3701)
Danny Chan [Wed, 22 Sep 2021 04:18:02 +0000 (12:18 +0800)] 
[MINOR] Cosmetic changes for flink (#3701)

9 months ago[MINOR] Fix typo."funcitons" corrected to "functions" (#3681)
Jimmy.Zhou [Wed, 22 Sep 2021 00:30:13 +0000 (08:30 +0800)] 
[MINOR] Fix typo."funcitons" corrected to "functions" (#3681)