helix.git
2 months agoEnable helix-front for release
Junkai Xue [Thu, 12 May 2022 02:58:00 +0000 (19:58 -0700)] 
Enable helix-front for release

2 months agoRevert "[maven-release-plugin] prepare release helix-1.0.4"
Junkai Xue [Thu, 12 May 2022 02:56:12 +0000 (19:56 -0700)] 
Revert "[maven-release-plugin] prepare release helix-1.0.4"

This reverts commit ef684139187f2a33c1941e4075e76f8c2c104746.

2 months agoRevert "[maven-release-plugin] prepare for next development iteration"
Junkai Xue [Thu, 12 May 2022 02:55:55 +0000 (19:55 -0700)] 
Revert "[maven-release-plugin] prepare for next development iteration"

This reverts commit 013042e97b82e4d7a4556988b310379f20955987.

2 months ago[maven-release-plugin] prepare for next development iteration
Junkai Xue [Thu, 12 May 2022 01:27:06 +0000 (18:27 -0700)] 
[maven-release-plugin] prepare for next development iteration

2 months ago[maven-release-plugin] prepare release helix-1.0.4
Junkai Xue [Thu, 12 May 2022 01:26:52 +0000 (18:26 -0700)] 
[maven-release-plugin] prepare release helix-1.0.4

2 months agoFix release URL
Junkai Xue [Thu, 12 May 2022 01:18:48 +0000 (18:18 -0700)] 
Fix release URL

2 months agoRefactor bump-up.command script and add helix-view-aggregator (#2086)
Qi (Quincy) Qu [Tue, 10 May 2022 20:25:49 +0000 (13:25 -0700)] 
Refactor bump-up.command script and add helix-view-aggregator (#2086)

Refactor bump-up.command script and add helix-view-aggregator

3 months agoFix race condition between instance drop and participant history update (#2073)
Qi (Quincy) Qu [Tue, 3 May 2022 21:38:40 +0000 (14:38 -0700)] 
Fix race condition between instance drop and participant history update (#2073)

Fix race condition between instance drop and participant history update

3 months agoIssue #1970: allow clients to prevent HelixProperty from cloning the ZNRecord (#2072)
Richard Startin [Thu, 28 Apr 2022 23:42:52 +0000 (00:42 +0100)] 
Issue #1970: allow clients to prevent HelixProperty from cloning the ZNRecord (#2072)

3 months agofix issue #2064: bug where RuntimeJobDag.generateJobList could loop until parallelism...
Richard Startin [Thu, 28 Apr 2022 16:50:14 +0000 (17:50 +0100)] 
fix issue #2064: bug where RuntimeJobDag.generateJobList could loop until parallelism is reached when in JobQueue mode (#2065)

3 months agoMaintenanceManagementService improvement - Allow implementation of OperationInterfac...
xyuanlu [Wed, 27 Apr 2022 22:42:51 +0000 (15:42 -0700)] 
MaintenanceManagementService  improvement - Allow implementation of OperationInterface to return null (#2035)

Allow implementation of OperationInterface to return null

3 months agosupport common input for operational input on Maintenance Management API (#2055)
xyuanlu [Wed, 27 Apr 2022 22:42:34 +0000 (15:42 -0700)] 
support common input for operational input on Maintenance Management API  (#2055)

support common input for operational input on Maintenance Management API

3 months agoDo not proceed with cluster creation if addCluster() fails. (#2068)
Komal Desai [Wed, 27 Apr 2022 22:42:08 +0000 (15:42 -0700)] 
Do not proceed with cluster creation if addCluster() fails. (#2068)

In Helix-Tools, ClusterSetup::addCluster() does not check the return value
of the HelixAdmin::addCluster() method and proceeds even when the call
reports failure.
This change checks the return status of addCluster() and throws an exception if cluster creation fails.
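
The fix described above amounts to checking a boolean result and failing fast. A minimal sketch of that pattern, assuming a hypothetical `Admin` interface in place of HelixAdmin and using `IllegalStateException` as a placeholder (the exception type actually thrown is not stated in this log):

```java
// Hypothetical stand-in for the admin API; not the real HelixAdmin interface.
interface Admin {
  boolean addCluster(String clusterName);
}

class ClusterSetupSketch {
  private final Admin admin;

  ClusterSetupSketch(Admin admin) {
    this.admin = admin;
  }

  // Stops immediately if the cluster could not be created,
  // instead of continuing with follow-up setup steps.
  void addCluster(String clusterName) {
    if (!admin.addCluster(clusterName)) {
      // Placeholder exception type, chosen only for this illustration.
      throw new IllegalStateException("Failed to create cluster: " + clusterName);
    }
    // ... follow-up setup would run only after a successful creation ...
  }
}
```

With this shape, a caller sees the failure at the point of creation rather than from a later step that assumed the cluster exists.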

3 months agoMore term cleanup for tutorial website (#2059)
Qi (Quincy) Qu [Wed, 27 Apr 2022 22:41:06 +0000 (15:41 -0700)] 
More term cleanup for tutorial website (#2059)

3 months agomove to Apache Analytics(Matomo)
Olivier Lamy [Mon, 25 Apr 2022 08:13:54 +0000 (18:13 +1000)] 
move to Apache Analytics(Matomo)

Signed-off-by: Olivier Lamy <olamy@apache.org>

3 months agouse reflow version just released and working with recent version maven-site-plugin
Olivier Lamy [Mon, 25 Apr 2022 08:02:49 +0000 (18:02 +1000)] 
use reflow version just released and working with recent version maven-site-plugin

Signed-off-by: Olivier Lamy <olamy@apache.org>

3 months agofix website deployment
Olivier Lamy [Sat, 23 Apr 2022 04:39:46 +0000 (14:39 +1000)] 
fix website deployment

Signed-off-by: Olivier Lamy <olamy@apache.org>

3 months agoFix release note for Log4j version
Junkai Xue [Fri, 22 Apr 2022 21:35:40 +0000 (14:35 -0700)] 
Fix release note for Log4j version

3 months agoUpdate menu bar
Junkai Xue [Thu, 21 Apr 2022 21:29:45 +0000 (14:29 -0700)] 
Update menu bar

3 months agoAdd bump snapshot script and bump snapshot
Junkai Xue [Thu, 21 Apr 2022 21:19:58 +0000 (14:19 -0700)] 
Add bump snapshot script and bump snapshot

3 months agoupgrade xstream to security issues (#2046)
PJ Fanning [Wed, 20 Apr 2022 21:48:29 +0000 (23:48 +0200)] 
upgrade xstream to security issues (#2046)

3 months agoReplace non-inclusive terms in quickstart (#2050)
Qi (Quincy) Qu [Wed, 20 Apr 2022 20:55:40 +0000 (13:55 -0700)] 
Replace non-inclusive terms in quickstart (#2050)

3 months ago[issue-1728] upgrade guava due to cve (#2042)
PJ Fanning [Tue, 19 Apr 2022 18:21:45 +0000 (20:21 +0200)] 
[issue-1728] upgrade guava due to cve (#2042)

3 months agoupgrade jackson to 2.12.6.1 due to cve (#2043)
PJ Fanning [Tue, 19 Apr 2022 18:21:11 +0000 (20:21 +0200)] 
upgrade jackson to 2.12.6.1 due to cve (#2043)

3 months agoAdd 1.0.3 releasenote
Junkai Xue [Mon, 18 Apr 2022 22:18:22 +0000 (15:18 -0700)] 
Add 1.0.3 releasenote

3 months agoReplace non-inclusive terms in tutorial.md (#2039)
Qi (Quincy) Qu [Mon, 18 Apr 2022 17:57:29 +0000 (10:57 -0700)] 
Replace non-inclusive terms in tutorial.md (#2039)

3 months agoCode refactor and cleanup on instance validation (#2032)
Qi (Quincy) Qu [Mon, 18 Apr 2022 17:57:15 +0000 (10:57 -0700)] 
Code refactor and cleanup on instance validation (#2032)

Unify the checks for whether an instance is enabled or disabled by routing them through InstanceValidationUtil.

3 months agoFixed size history for Scheduled Workflow tasks (#2036)
Komal Desai [Sat, 16 Apr 2022 22:31:19 +0000 (15:31 -0700)] 
Fixed size history for Scheduled Workflow tasks (#2036)

Each time a scheduled workflow task is executed, an entry is appended to the history.
Each entry has the format "<taskname>-<timestamp>".

Old entries were never purged, so the history would eventually hit the ZNode size limit.

This change introduces a fixed-size history of 20 entries and purges all older entries.
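
The capping behaviour described above can be sketched as follows. This is an illustration only: the class, method, and constant names are assumptions, and the history is modelled as a plain list rather than the ZNode field Helix actually writes.

```java
import java.util.ArrayList;
import java.util.List;

class BoundedHistorySketch {
  // Cap taken from the commit message; the real constant name is not shown there.
  static final int MAX_HISTORY_SIZE = 20;

  // Appends "<taskname>-<timestamp>" and drops the oldest entries beyond the cap.
  static List<String> append(List<String> history, String taskName, long timestamp) {
    List<String> updated = new ArrayList<>(history);
    updated.add(taskName + "-" + timestamp);
    int excess = updated.size() - MAX_HISTORY_SIZE;
    if (excess > 0) {
      // Remove the oldest entries from the front of the list.
      updated.subList(0, excess).clear();
    }
    return updated;
  }
}
```

Because entries are appended at the end and trimmed from the front, the list always holds the most recent runs in chronological order.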

3 months agoupgrade snakeyaml to v1.30 due to cve (#2041)
PJ Fanning [Sat, 16 Apr 2022 22:30:31 +0000 (00:30 +0200)] 
upgrade snakeyaml to v1.30 due to cve (#2041)

3 months ago[issue-1727] upgrade to commons-io 2.11.0 due to cve (#2040)
PJ Fanning [Sat, 16 Apr 2022 22:29:52 +0000 (00:29 +0200)] 
[issue-1727] upgrade to commons-io 2.11.0 due to cve (#2040)

3 months agoRemove temp file
Junkai Xue [Fri, 15 Apr 2022 08:10:45 +0000 (01:10 -0700)] 
Remove temp file

3 months agoFix website deployment
Junkai Xue [Fri, 15 Apr 2022 07:24:42 +0000 (00:24 -0700)] 
Fix website deployment

3 months agoRemove archived versions
Junkai Xue [Thu, 14 Apr 2022 19:02:38 +0000 (12:02 -0700)] 
Remove archived versions

3 months agoUpdate dependencies and fix compile errors
Qi (Quincy) Qu [Tue, 5 Apr 2022 23:21:06 +0000 (16:21 -0700)] 
Update dependencies and fix compile errors

Unit test fix and code style improvement.

3 months agoHELIX-1233: fix broken test in helix view aggregator module
Harry Zhang [Fri, 10 Aug 2018 20:26:36 +0000 (13:26 -0700)] 
HELIX-1233: fix broken test in helix view aggregator module

RB=1388194
BUG=HELIX-1233
G=helix-reviewers
R=jxue,lxia,jjwang,hulee
A=lxia

3 months agominor improvements
Harry Zhang [Tue, 8 May 2018 19:27:44 +0000 (12:27 -0700)] 
minor improvements

RB=1304338
G=helix-reviewers
R=lxia,jjwang,jxue,erkim
A=jjwang

3 months agoHELIX-711: implement distributed state model for helix view aggregator
Harry Zhang [Fri, 27 Apr 2018 21:19:19 +0000 (14:19 -0700)] 
HELIX-711: implement distributed state model for helix view aggregator

RB=1295025
BUG=HELIX-711
G=helix-reviewers
R=lxia,jjwang,jxue,erkim
A=lxia,jjwang

3 months agoHELIX-708: adding basic metrics to HelixViewAggregator
hrzhang [Fri, 2 Mar 2018 02:46:54 +0000 (18:46 -0800)] 
HELIX-708: adding basic metrics to HelixViewAggregator

RB=1237803
BUG=HELIX-708
G=helix-reviewers
R=lxia,jjwang,jxue,erkim
A=jjwang

3 months agoHELIX-797: add missing package def to helix-view-aggregator's pom and ivy files
hrzhang [Tue, 27 Feb 2018 19:34:25 +0000 (11:34 -0800)] 
HELIX-797: add missing package def to helix-view-aggregator's pom and ivy files

RB=1233804
BUG=HELIX-797
G=helix-reviewers
R=lxia,jjwang,jxue,erkim
A=jjwang

3 months agoChange RoutingTableProvider to support direct aggregating routing information from...
Lei Xia [Tue, 13 Feb 2018 18:15:55 +0000 (10:15 -0800)] 
Change RoutingTableProvider to support direct aggregating routing information from CurrentStates in each liveinstance. When sourceDataType is set as CurrentState, RoutingTableProvider will listen on CurrentStateChanges and refresh routing table from CurrentStates upon changes.

RB=1221442
G=helix-reviewers
A=jxue

3 months agoHELIX-705: Implement ViewClusterRefresher logic and tests
hrzhang [Tue, 6 Feb 2018 04:30:39 +0000 (20:30 -0800)] 
HELIX-705: Implement ViewClusterRefresher logic and tests

RB=1213810
BUG=HELIX-776
G=helix-reviewers
R=lxia,jjwang,jxue,erkim
A=jjwang,jxue

3 months agoHELIX-705: implemented SourceClusterDataProvider's core logic and related tests
hrzhang [Tue, 6 Feb 2018 19:21:27 +0000 (11:21 -0800)] 
HELIX-705: implemented SourceClusterDataProvider's core logic and related tests

RB=1205694
BUG=HELIX-775
G=helix-reviewers
R=lxia,jjwang,jxue,erkim
A=jjwang

3 months agoHELIX-705: implement view cluster config change related logics and tests
hrzhang [Wed, 24 Jan 2018 01:55:23 +0000 (17:55 -0800)] 
HELIX-705: implement view cluster config change related logics and tests

RB=1205359
BUG=HELIX-705
G=helix-reviewers
R=lxia,jjwang,jxue,erkim
A=lxia

3 months agoHELIX-705: create interfaces and interactions among HelixViewAggregator components
hrzhang [Wed, 24 Jan 2018 01:55:23 +0000 (17:55 -0800)] 
HELIX-705: create interfaces and interactions among HelixViewAggregator components

RB=1205300
BUG=HELIX-705
G=helix-reviewers
R=lxia,jjwang,jxue,erkim
A=jjwang,jxue

3 months agoHELIX-705: created project infrastructure for helix-view-aggregator module
hrzhang [Wed, 24 Jan 2018 01:02:31 +0000 (17:02 -0800)] 
HELIX-705: created project infrastructure for helix-view-aggregator module

RB=1202163
BUG=HELIX-705
G=helix-reviewers
R=lxia,jjwang,jxue,erkim
A=jxue

3 months agoBump jackson-databind in /metadata-store-directory-common (#2029)
dependabot[bot] [Mon, 11 Apr 2022 17:48:08 +0000 (13:48 -0400)] 
Bump jackson-databind in /metadata-store-directory-common (#2029)

Bumps [jackson-databind](https://github.com/FasterXML/jackson) from 2.11.0 to 2.12.6.1.
- [Release notes](https://github.com/FasterXML/jackson/releases)
- [Commits](https://github.com/FasterXML/jackson/commits)

---
updated-dependencies:
- dependency-name: com.fasterxml.jackson.core:jackson-databind
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

3 months agoBump moment from 2.22.2 to 2.29.2 in /helix-front (#2025)
dependabot[bot] [Mon, 11 Apr 2022 14:15:51 +0000 (10:15 -0400)] 
Bump moment from 2.22.2 to 2.29.2 in /helix-front (#2025)

Bumps [moment](https://github.com/moment/moment) from 2.22.2 to 2.29.2.
- [Release notes](https://github.com/moment/moment/releases)
- [Changelog](https://github.com/moment/moment/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/moment/moment/compare/2.22.2...2.29.2)

---
updated-dependencies:
- dependency-name: moment
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

4 months agoImplement DefaultCloudEventCallbackImpl (#1995)
Molly Gao [Wed, 6 Apr 2022 17:53:45 +0000 (10:53 -0700)] 
Implement DefaultCloudEventCallbackImpl (#1995)

Implement a default callback implementation for Helix cloud event listeners.

4 months agoFix TestCloudEventCallbackProperty by bypassing connecting to zk (#2017)
Molly Gao [Wed, 6 Apr 2022 17:53:22 +0000 (10:53 -0700)] 
Fix TestCloudEventCallbackProperty by bypassing connecting to zk (#2017)

Due to a logic change in the ZKHelixManager constructor in #1986, TestCloudEventCallbackProperty broke because the test does not connect to a ZooKeeper server.
To fix the test, MockCloudEventAwareHelixManager (previously called MockEventAwareZKHelixManager and nested inside TestCloudEventCallbackProperty) is separated into its own class that contains only the logic related to cloud events. More specifically, the cloud config object normally retrieved from zk is mocked so that no connection to zk is needed.

4 months agoEnable HelixManager as an event listener (#1978)
Molly Gao [Mon, 21 Mar 2022 20:31:35 +0000 (13:31 -0700)] 
Enable HelixManager as an event listener (#1978)

Make the Helix manager cloud-event aware by registering a cloud event listener when the Helix manager connects.

4 months agoAdd event handler and event listener interface (#1976)
Molly Gao [Thu, 10 Mar 2022 22:05:33 +0000 (14:05 -0800)] 
Add event handler and event listener interface (#1976)

This commit creates a skeleton for event handling framework and adds the following classes:
CloudEventListener interface
CloudEventHandler class
CloudEventHandlerFactory class

4 months agoImprove ZkClientMonitor and ZkClientPathMonitor performance (#2021)
Henri Hagberg [Thu, 7 Apr 2022 14:09:49 +0000 (17:09 +0300)] 
Improve ZkClientMonitor and ZkClientPathMonitor performance (#2021)

Previously, regex matches were used, which was inefficient. This commit does the following:
* Replace String#matches with the more efficient String#contains in ZkClientPathMonitor
* Refactor record* methods in ZkClientMonitor to avoid repetition and simplify matching logic
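
To illustrate the first point, the sketch below contrasts the two `String` methods. The path layout and token used here are hypothetical and are not taken from ZkClientPathMonitor.

```java
import java.util.regex.Pattern;

class PathMatchSketch {
  // Regex form: String#matches compiles a pattern on every call
  // and must match the entire input string.
  static boolean matchesRegex(String path, String token) {
    return path.matches(".*/" + Pattern.quote(token) + "/.*");
  }

  // Substring form: a plain scan with no pattern compilation.
  static boolean containsToken(String path, String token) {
    return path.contains("/" + token + "/");
  }
}
```

Both methods give the same answer for these inputs, so swapping one for the other changes the cost per call rather than the result.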

4 months agoadd type and reason to cluster config (#2006)
xyuanlu [Wed, 6 Apr 2022 01:34:49 +0000 (18:34 -0700)] 
add type and reason to cluster config (#2006)

Add type and reason for batch disable/enable instance

4 months agoRefactor config string processing logic into a util class (#2015)
xyuanlu [Tue, 5 Apr 2022 21:01:02 +0000 (14:01 -0700)] 
Refactor config string processing logic into a util class (#2015)

Refactor config string processing logic into a util class

4 months agoFix TestDropResourceMetricsReset (#2011)
Qi (Quincy) Qu [Tue, 5 Apr 2022 17:41:44 +0000 (10:41 -0700)] 
Fix TestDropResourceMetricsReset (#2011)

Add a temporary workaround that manually triggers a CurrentStageChange event for ExternalViewStage computation so that the resource monitor can be cleaned up in the unit test.

4 months agoPopulate helix cloud property using cloud config (#2005)
Molly Gao [Tue, 5 Apr 2022 17:13:01 +0000 (10:13 -0700)] 
Populate helix cloud property using cloud config (#2005)

Currently, when instantiating a zk Helix manager, we retrieve the cloud config from zk and replace the entire HelixCloudProperty object, which can drop fields that the user passed in through HelixCloudProperty but that are not included in the cloud config. This commit changes the logic to populate only the fields in HelixCloudProperty whose values are present in the cloud config, and to leave the other fields unchanged.
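
The difference between replacing the whole object and overlaying only the fields that are present can be sketched with plain maps. This is an assumption-laden illustration: the real HelixCloudProperty is a typed class, and the key names used in the usage note are made up.

```java
import java.util.HashMap;
import java.util.Map;

class CloudPropertyMergeSketch {
  // Overlays only the values actually present in cloudConfig;
  // user-supplied fields that the config does not mention survive.
  static Map<String, String> populate(Map<String, String> userProperty,
                                      Map<String, String> cloudConfig) {
    Map<String, String> merged = new HashMap<>(userProperty);
    for (Map.Entry<String, String> entry : cloudConfig.entrySet()) {
      if (entry.getValue() != null) {
        merged.put(entry.getKey(), entry.getValue());
      }
    }
    return merged;
  }
}
```

For example, a user-supplied `customField` is kept when the cloud config only carries `cloudProvider`, whereas wholesale replacement would have dropped it.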

4 months agoFix web build and deployment Issue (#2014)
Junkai Xue [Mon, 4 Apr 2022 17:17:49 +0000 (10:17 -0700)] 
Fix web build and deployment Issue (#2014)

4 months agofix Broken Logic in perPartitionHealthCheck (#2012)
xyuanlu [Mon, 4 Apr 2022 17:14:03 +0000 (10:14 -0700)] 
fix Broken Logic in perPartitionHealthCheck (#2012)

fix Broken Logic in perPartitionHealthCheck

4 months agofix reproducible builds issue (#2013)
Hervé Boutemy [Sun, 3 Apr 2022 21:18:07 +0000 (23:18 +0200)] 
fix reproducible builds issue (#2013)

4 months agoUpdate 0.9.9 to be replaced by 0.9.10
Junkai Xue [Sat, 2 Apr 2022 23:29:28 +0000 (16:29 -0700)] 
Update 0.9.9 to be replaced by 0.9.10

4 months agoadd new error message for customized partition check host connection error (#1984)
xyuanlu [Thu, 31 Mar 2022 23:19:00 +0000 (16:19 -0700)] 
add new error message for customized partition check host connection error  (#1984)

Add new error message for customized partition check host connection error

4 months agoauto Exit MM for auto EMM test (#2003)
xyuanlu [Thu, 31 Mar 2022 23:18:25 +0000 (16:18 -0700)] 
auto Exit MM for auto EMM test (#2003)

4 months agoAdd instance disable reason (#1993) (#2004)
xyuanlu [Thu, 31 Mar 2022 23:18:11 +0000 (16:18 -0700)] 
Add instance disable reason  (#1993) (#2004)

Add instance disable reason

4 months agoImplement deactivate rest API (#1988)
Qi (Quincy) Qu [Thu, 24 Mar 2022 16:51:46 +0000 (09:51 -0700)] 
Implement deactivate rest API (#1988)

Implement deactivate REST API
Added a new REST API for deactivating cluster from supercluster.

4 months agoRead cloud config from zk and propagate to HelixManagerProperty in ZkHelixManager...
xyuanlu [Thu, 24 Mar 2022 16:30:00 +0000 (09:30 -0700)] 
Read cloud config from zk and propagate to HelixManagerProperty in ZkHelixManager constructor (#1986)

Read cloud config from zk and propagate to HelixManagerProperty in ZkHelixManager constructor.

4 months agoAdd Authorization Components to helix-rest (#1967) (#1981)
Neal Sun [Mon, 14 Mar 2022 23:30:43 +0000 (16:30 -0700)] 
Add Authorization Components to helix-rest (#1967) (#1981)

* Add Authorization Components to helix-rest

* Address some comments

4 months agoUse ZooKeeper 3.5.9 in zookeeper-api instead (#1977)
Hunter Lee [Sat, 12 Mar 2022 23:22:21 +0000 (18:22 -0500)] 
Use ZooKeeper 3.5.9 in zookeeper-api instead (#1977)

With the upgrade of the Apache ZooKeeper version, snappy-java was missing from the dependencies. This commit adds snappy-java and removes unused imports in the OSGi declaration. It also uses ZooKeeper 3.5.9 in zookeeper-api, because 3.6.0+ causes some tests in zookeeper-api to fail.

5 months agoAdd 3 Zookeeper CreateMode types to AccessOption (#1975)
Ramin Bashizade [Wed, 9 Mar 2022 17:29:31 +0000 (09:29 -0800)] 
Add 3 Zookeeper CreateMode types to AccessOption (#1975)

This commit adds the 3 missing CreateMode types to AccessOption
class in helix-core: CONTAINER, PERSISTENT_WITH_TTL, and
PERSISTENT_SEQUENTIAL_WITH_TTL.

5 months agoUpgrading Zookeeper version to 3.6.13 to enable zk client SSL/TLS
rahulrane50 [Tue, 8 Mar 2022 21:42:44 +0000 (13:42 -0800)] 
Upgrading Zookeeper version to 3.6.13 to enable zk client SSL/TLS

Upgrading Zookeeper version to 3.6.13 to enable zk client SSL/TLS support

5 months ago[HELIX-862] s/maintainence/maintenance docs fix (#1968)
Micah Stubbs [Thu, 3 Mar 2022 22:08:34 +0000 (14:08 -0800)] 
[HELIX-862] s/maintainence/maintenance docs fix (#1968)

5 months agoAdd rest endpoint for virtual topology group (#1958)
Qi (Quincy) Qu [Wed, 16 Feb 2022 17:55:56 +0000 (12:55 -0500)] 
Add rest endpoint for virtual topology group (#1958)

5 months agoImplement java API and utils for virtual topology group (#1935)
Qi (Quincy) Qu [Tue, 8 Feb 2022 21:53:40 +0000 (16:53 -0500)] 
Implement java API and utils for virtual topology group (#1935)

Add comment to VirtualTopologyGroupService.

5 months agoIntroduce VirtualTopologyGroup and its assignment logic with benchmark. (#1948)
Qi (Quincy) Qu [Thu, 3 Feb 2022 20:18:46 +0000 (12:18 -0800)] 
Introduce VirtualTopologyGroup and its assignment logic with benchmark. (#1948)

* Cleanup unused assignment schemes and minor change.

* Further refactor and code cleanup.

5 months agoFix #1946 -- Refactor and move ClusterTopologyConfig
Qi (Quincy) Qu [Sat, 29 Jan 2022 00:14:09 +0000 (16:14 -0800)] 
Fix #1946 -- Refactor and move ClusterTopologyConfig

Move ClusterTopologyConfig from a nested class to a standalone class in helix/model so that it can be used by the virtual topology group logic.

5 months agoUse final remaining capacity when computing weighted score (#1961)
xyuanlu [Wed, 16 Feb 2022 21:52:48 +0000 (13:52 -0800)] 
Use final remaining capacity when computing weighted score (#1961)

WAGED improvement: Use final remaining capacity when computing weighted score

5 months agoRemove WAGED sorting for each assignment (#1959)
xyuanlu [Tue, 15 Feb 2022 00:58:05 +0000 (16:58 -0800)] 
Remove WAGED sorting for each assignment (#1959)

Improve WAGED sorting complexity from O(n^2) to O(n*log(n))

5 months agoremove log before write error message to ZNode (#1955)
xyuanlu [Mon, 7 Feb 2022 23:07:16 +0000 (15:07 -0800)] 
remove log before write error message to ZNode (#1955)

Remove message logging before writing ZNode.

5 months agoFixes #1802 - messages intended for instances that are no longer in the cluster ...
Komal Desai [Mon, 7 Feb 2022 23:06:57 +0000 (15:06 -0800)] 
Fixes #1802 - messages intended for instances that are no longer in the cluster (#1951)

In MessageGenerationPhase.java, the process() method populates the list of live instances from the cache.

However, although the generateMessage() method has the sessionIdMap information, it still iterates over the partition/resource/instance map without checking whether each instance is still part of the cluster.

The cache may hold a stale entry, but that logic needs to be addressed separately. When generating a message, we should check whether the instance is still there.

So this is a simple change. We still need to investigate whether the cache is being invalidated properly.

A separate bug, #1956, has been filed to make sure the cache is properly handled and refreshed when an instance is replaced or deleted.
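
The guard described above can be sketched in isolation. The data shapes and names below are hypothetical simplifications and do not reflect the actual MessageGenerationPhase code; the point is only the liveness check before a message is produced.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

class LiveInstanceFilterSketch {
  // Builds one "partition->instance" message per assignment, skipping any
  // instance that is no longer in the live set.
  static List<String> generateMessages(Map<String, String> partitionToInstance,
                                       Set<String> liveInstances) {
    List<String> messages = new ArrayList<>();
    for (Map.Entry<String, String> entry : partitionToInstance.entrySet()) {
      String instance = entry.getValue();
      if (!liveInstances.contains(instance)) {
        continue; // instance was dropped or replaced; do not send to it
      }
      messages.add(entry.getKey() + "->" + instance);
    }
    return messages;
  }
}
```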

6 months agoLet logging framework format exception stack traces (#1954)
Henri Hagberg [Thu, 3 Feb 2022 21:35:05 +0000 (23:35 +0200)] 
Let logging framework format exception stack traces (#1954)

Where possible, logging calls are changed so that the logging framework handles exception formatting, instead of the stack trace being manually formatted using Throwable#getStackTrace.

6 months agoAdd new metrics to record ZNRecord compression count. (#1943)
Jiajun Wang [Wed, 2 Feb 2022 20:28:14 +0000 (12:28 -0800)] 
Add new metrics to record ZNRecord compression count. (#1943)

This PR determines if a ZK write request is compressed by calling GZipCompressionUtil. This is an indirect method and can be inaccurate. So the decision is based on trade-offs.

Alternatively, the ZkClientMonitor can be passed into the serializer class and then report compressed write internally. However, this will require multiple changes in the serializer interfaces.
Due to the multiple layers (PathBasedZkSerializer, ZkSerializer) of serializer interfaces definition, it would be very costly to implement the alternative without major refactoring.

6 months agoFix for - Stale message redundant logs
desaikomal [Mon, 31 Jan 2022 02:08:42 +0000 (18:08 -0800)] 
Fix for - Stale message redundant logs

Avoid printing redundant log messages for unrelated partitions and resources.

6 months agoFix Issue#1941 - Incorrect condition caused not to log error message
desaikomal [Sat, 29 Jan 2022 00:16:16 +0000 (16:16 -0800)] 
Fix Issue#1941 - Incorrect condition caused not to log error message

Properly populate the error log messages for partitions and resource names whose replica status is in ERROR state.

6 months agoFix CVE dependency issue (#1927)
CVEDetect [Wed, 19 Jan 2022 15:17:11 +0000 (23:17 +0800)] 
Fix CVE dependency issue (#1927)

6 months agoRemove dependency to an old Jackson v1 library (org.codehaus.jackson:jackson-mapper...
Andrzej Hołowko [Wed, 19 Jan 2022 15:15:31 +0000 (16:15 +0100)] 
Remove dependency to an old Jackson v1 library (org.codehaus.jackson:jackson-mapper-asl) affected by the critical vulnerability: CVE-2019-17267 (#1934)

6 months agoFix race condition in scheduler message processing logic. (#1930)
Jiajun Wang [Wed, 19 Jan 2022 01:35:45 +0000 (17:35 -0800)] 
Fix race condition in scheduler message processing logic. (#1930)

This PR fixes a race condition that occurs while processing scheduler messages. The previous logic, which dynamically deleted task partitions in the scheduler message IdealState, could cause conflicts and result in inconsistent message status updates. Since updating the task partitions is not a necessary step, this PR removes the corresponding logic and simplifies the message handling procedure.

This PR will help to stabilize TestSchedulerMessage.java.

6 months agoDaemonize ZkBucketDataAccessor GC_THREAD (#1936)
Henri Hagberg [Tue, 18 Jan 2022 22:30:13 +0000 (00:30 +0200)] 
Daemonize ZkBucketDataAccessor GC_THREAD (#1936)

GC_THREAD (which is actually an ExecutorService, not Thread) is a static field in ZkBucketDataAccessor. The executor is started when ZkBucketDataAccessor class is initialized but it is never shut down. Since ExecutorService threads are generally not daemon threads, not shutting down GC_THREAD prevents JVM from shutting down cleanly.

This commit makes ZkBucketDataAccessor GC_THREAD a daemon thread so it doesn't prevent application shutdown.
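
One common way to get daemon worker threads from an `ExecutorService` is to supply a custom `ThreadFactory`. The sketch below shows that general technique with standard `java.util.concurrent` classes. Whether the commit uses exactly this executor type is an assumption, and the class and method names are illustrative.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

class DaemonExecutorSketch {
  // Creates a single-threaded executor whose worker is a daemon thread,
  // so an idle executor does not keep the JVM alive at shutdown.
  static ExecutorService newDaemonSingleThreadExecutor(String threadName) {
    ThreadFactory factory = runnable -> {
      Thread thread = new Thread(runnable, threadName);
      thread.setDaemon(true);
      return thread;
    };
    return Executors.newSingleThreadExecutor(factory);
  }
}
```

The trade-off is that daemon threads can be stopped mid-task when the JVM exits, which is usually acceptable for background cleanup work but not for tasks that must run to completion.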

6 months agoUpgrade Log4j to 2.16.0 to address CVE-2021-44228 (#1922)
Brent [Fri, 14 Jan 2022 19:57:57 +0000 (11:57 -0800)] 
Upgrade Log4j to 2.16.0 to address CVE-2021-44228 (#1922)

* HELIX-1921: Upgrade Log4j to 2.16.0 to address CVE-2021-44228
- Upgrade SLF4J API version from 1.7.25 to 1.7.32 (latest)
- Remove use of slf4j-log4j12 package
- Add use of log4j-slf4j-impl package
- Remove unused custom log appender class
- Change direct Log4J reference to SLF4J
- Add -Dlog4j2.formatMsgNoLookups flag to scripts.
- Rename properties files to log4j2.properties and change CLI parameter to log4j2.configurationFile for Log4j2's precedence behavior
- Change properties files to use log4j2 syntax
- Add -Dlog4j2.configurationFile=file://"$BASEDIR"/conf/log4j2.properties to CLIs that were missing it

6 months agoImprove helix tutorial and code formatting (#1931) (#1932)
Qi (Quincy) Qu [Tue, 11 Jan 2022 19:48:54 +0000 (11:48 -0800)] 
Improve helix tutorial and code formatting (#1931) (#1932)

Improve helix tutorial and code formatting

7 months agoAvoid NPE when getting property store through Helix-rest API. (#1929)
Jiajun Wang [Fri, 7 Jan 2022 22:21:45 +0000 (14:21 -0800)] 
Avoid NPE when getting property store through Helix-rest API. (#1929)

This PR fixes the ambiguous error message returned when a user requests an empty ZK node through the Helix-rest property store access API.
The server now responds with NO_CONTENT instead of internal_server_error in that scenario.

7 months agoDeclare dependency to Zookeeper in zookeeper-api-*.ivy (#1926)
Ramin Bashizade [Wed, 5 Jan 2022 19:45:54 +0000 (11:45 -0800)] 
Declare dependency to Zookeeper in zookeeper-api-*.ivy (#1926)

Adds a dependency on ZooKeeper to the ivy file in the zookeeper-api module.

7 months agoFix a string operation for custom health check and update test (#1924)
xyuanlu [Mon, 20 Dec 2021 18:28:51 +0000 (10:28 -0800)] 
Fix a string operation for custom health check and update test (#1924)

7 months agoAdd take/free instance implementation and test (#1918)
xyuanlu [Wed, 15 Dec 2021 21:41:36 +0000 (13:41 -0800)] 
Add take/free instance implementation and test (#1918)

* take & free single instance impl

7 months agoMake threadpool shutdown timeout configurable for the HelixTaskExecutor. (#1920)
Jiajun Wang [Tue, 14 Dec 2021 00:51:39 +0000 (16:51 -0800)] 
Make threadpool shutdown timeout configurable for the HelixTaskExecutor. (#1920)

Add TestHelixTaskExecutor.testHandlerResetTimeout() to cover the new changes.
Also refactor the related code to reduce duplicated and confusing code.
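
As a rough illustration of what a configurable thread pool shutdown timeout looks like, the sketch below passes the timeout in as a parameter instead of hard-coding it. It uses only standard `java.util.concurrent` calls. The class and method names are assumptions and do not reflect the HelixTaskExecutor code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

class PoolShutdownSketch {
  // Returns true if the pool terminated within timeoutMs;
  // otherwise forces shutdown and returns false.
  static boolean shutdown(ExecutorService pool, long timeoutMs) {
    pool.shutdown(); // stop accepting new tasks, let queued ones finish
    try {
      if (pool.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS)) {
        return true;
      }
      pool.shutdownNow(); // timeout elapsed: interrupt running tasks
      return false;
    } catch (InterruptedException e) {
      pool.shutdownNow();
      Thread.currentThread().interrupt(); // preserve the interrupt status
      return false;
    }
  }
}
```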

8 months agoImplement RestSnapShot and substitute the kv maps in HelixDataAccessorWrapper to...
xyuanlu [Fri, 3 Dec 2021 23:32:21 +0000 (15:32 -0800)] 
Implement RestSnapShot and substitute the kv maps in HelixDataAccessorWrapper to a RestSnapShot object (#1913)

* implement RestSnapShot and substitute the kv maps in HelixDataAccessorWrapper with RestSnapShot object

8 months agoAdd rest API for take/free instance (#1917)
xyuanlu [Thu, 2 Dec 2021 23:53:12 +0000 (15:53 -0800)] 
Add rest API for take/free instance (#1917)

* add rest API for take/free instance

8 months agorefactor instanceService to clusterMaintenanceService (#1912)
xyuanlu [Thu, 25 Nov 2021 03:05:04 +0000 (19:05 -0800)] 
refactor instanceService to clusterMaintenanceService (#1912)

8 months agoFix No Instance Level Throttling (#1908)
Junkai Xue [Mon, 22 Nov 2021 19:48:09 +0000 (11:48 -0800)] 
Fix No Instance Level Throttling (#1908)

The instance-level throttling quota was never charged. Add the charging logic and tests.

8 months agoSplit BatchGetInstancesStoppableChecks (#1902)
xyuanlu [Mon, 22 Nov 2021 17:45:24 +0000 (09:45 -0800)] 
Split BatchGetInstancesStoppableChecks (#1902)

Split BatchGetInstancesStoppableChecks into 2 private util functions.

8 months agoAdd 0.9.9 to menu bar
Junkai Xue [Sun, 21 Nov 2021 21:15:56 +0000 (13:15 -0800)] 
Add 0.9.9 to menu bar

8 months agoMissing end quote
Junkai Xue [Sun, 21 Nov 2021 20:27:19 +0000 (12:27 -0800)] 
Missing end quote