helix.git
2 months ago[maven-release-plugin] prepare release helix-1.0.3 helix-1.0.3
Junkai Xue [Fri, 15 Apr 2022 04:44:07 +0000 (21:44 -0700)] 
[maven-release-plugin] prepare release helix-1.0.3

2 months agoChange apache source url
Junkai Xue [Fri, 15 Apr 2022 04:19:01 +0000 (21:19 -0700)] 
Change apache source url

2 months agoenable helix-front
Junkai Xue [Fri, 15 Apr 2022 03:27:07 +0000 (20:27 -0700)] 
enable helix-front

2 months agoImplement DefaultCloudEventCallbackImpl (#1995)
Molly Gao [Wed, 6 Apr 2022 17:53:45 +0000 (10:53 -0700)] 
Implement DefaultCloudEventCallbackImpl (#1995)

Implement a default callback implementation for Helix cloud event listeners.

2 months agoFix TestCloudEventCallbackProperty by bypassing connecting to zk (#2017)
Molly Gao [Wed, 6 Apr 2022 17:53:22 +0000 (10:53 -0700)] 
Fix TestCloudEventCallbackProperty by bypassing connecting to zk (#2017)

Because of a logic change in the ZKHelixManager constructor in #1986, TestCloudEventCallbackProperty broke: the test does not connect to a ZooKeeper server.
To fix the test, we extracted MockCloudEventAwareHelixManager (previously called MockEventAwareZKHelixManager and nested inside TestCloudEventCallbackProperty) into its own class, which contains only the logic related to cloud events. Specifically, we mock the cloud config object that would be retrieved from zk, which bypasses the connection to zk.

2 months agoEnable HelixManager as an event listener (#1978)
Molly Gao [Mon, 21 Mar 2022 20:31:35 +0000 (13:31 -0700)] 
Enable HelixManager as an event listener (#1978)

Make the Helix manager cloud-event aware by registering a cloud event listener when the Helix manager connects.

2 months agoAdd event handler and event listener interface (#1976)
Molly Gao [Thu, 10 Mar 2022 22:05:33 +0000 (14:05 -0800)] 
Add event handler and event listener interface (#1976)

This commit creates a skeleton for event handling framework and adds the following classes:
CloudEventListener interface
CloudEventHandler class
CloudEventHandlerFactory class

2 months agoImprove ZkClientMonitor and ZkClientPathMonitor performance (#2021)
Henri Hagberg [Thu, 7 Apr 2022 14:09:49 +0000 (17:09 +0300)] 
Improve ZkClientMonitor and ZkClientPathMonitor performance (#2021)

Previously, regex matches were used, which was inefficient. This commit does the following:
Replace String#matches with more efficient String#contains in ZkClientPathMonitor
Refactor record* methods in ZkClientMonitor to avoid repetition and simplify matching logic
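
The first change can be illustrated with a minimal sketch. The class name and the path fragment below are hypothetical and are not taken from ZkClientPathMonitor; the point is only that String#matches compiles a regular expression on every call, whereas String#contains is a plain substring search.

```java
import java.util.regex.Pattern;

final class PathMatchSketch {
  // Hypothetical path fragment, used for illustration only.
  private static final String FRAGMENT = "/EXTERNALVIEW";

  // Regex-based check: String#matches compiles the pattern on each call.
  static boolean matchesWithRegex(String path) {
    return path.matches(".*" + Pattern.quote(FRAGMENT) + ".*");
  }

  // Substring check that needs no regex machinery.
  static boolean matchesWithContains(String path) {
    return path.contains(FRAGMENT);
  }
}
```

For paths without line terminators, both methods return the same answer, so the substring check can stand in for the regex when the pattern is just "contains this fragment".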

3 months agoadd type and reason to cluster config (#2006)
xyuanlu [Wed, 6 Apr 2022 01:34:49 +0000 (18:34 -0700)] 
add type and reason to cluster config (#2006)

Add type and reason for batch disable/enable instance

3 months agoRefactor config string processing logic into a util class (#2015)
xyuanlu [Tue, 5 Apr 2022 21:01:02 +0000 (14:01 -0700)] 
Refactor config string processing logic into a util class (#2015)

Refactor config string processing logic into a util class

3 months agoFix TestDropResourceMetricsReset (#2011)
Qi (Quincy) Qu [Tue, 5 Apr 2022 17:41:44 +0000 (10:41 -0700)] 
Fix TestDropResourceMetricsReset (#2011)

Add a temp workaround by manually triggering a CurrentStageChange event for ExternalViewStage computation so that resource monitor can be cleaned up in unit test.

3 months agoPopulate helix cloud property using cloud config (#2005)
Molly Gao [Tue, 5 Apr 2022 17:13:01 +0000 (10:13 -0700)] 
Populate helix cloud property using cloud config (#2005)

Currently, when instantiating a zk helix manager, we retrieve the cloud config from zk and replace the entire HelixCloudProperty object, so fields that the user passed in through HelixCloudProperty but that are not included in the cloud config are lost. This commit changes the logic to populate only the fields of HelixCloudProperty whose values are present in the cloud config and to leave the other fields unchanged.
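
The "populate only what is present" behavior can be sketched as follows. This is not the HelixCloudProperty API: plain string maps stand in for the property object and the cloud config, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class PropertyMergeSketch {
  // Copies only the entries that are present (non-null) in cloudConfig into a
  // copy of userProperty; every other user-supplied entry is left untouched.
  static Map<String, String> populate(Map<String, String> userProperty,
                                      Map<String, String> cloudConfig) {
    Map<String, String> merged = new HashMap<>(userProperty);
    cloudConfig.forEach((key, value) -> {
      if (value != null) {
        merged.put(key, value);
      }
    });
    return merged;
  }
}
```

Replacing the whole object would correspond to returning cloudConfig directly, which is how user-supplied fields went missing.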

3 months agoFix web build and deployment Issue (#2014)
Junkai Xue [Mon, 4 Apr 2022 17:17:49 +0000 (10:17 -0700)] 
Fix web build and deployment Issue (#2014)

3 months agofix Broken Logic in perPartitionHealthCheck (#2012)
xyuanlu [Mon, 4 Apr 2022 17:14:03 +0000 (10:14 -0700)] 
fix Broken Logic in perPartitionHealthCheck (#2012)

fix Broken Logic in perPartitionHealthCheck

3 months agofix repoducible builds issue (#2013)
Hervé Boutemy [Sun, 3 Apr 2022 21:18:07 +0000 (23:18 +0200)] 
fix repoducible builds issue (#2013)

3 months agoUpdate 0.9.9 to be replaced by 0.9.10
Junkai Xue [Sat, 2 Apr 2022 23:29:28 +0000 (16:29 -0700)] 
Update 0.9.9 to be replaced by 0.9.10

3 months agoadd new error message for customized partition check host connection error (#1984)
xyuanlu [Thu, 31 Mar 2022 23:19:00 +0000 (16:19 -0700)] 
add new error message for customized partition check host connection error  (#1984)

Add new error message for customized partition check host connection error

3 months agoauto Exit MM for auto EMM test (#2003)
xyuanlu [Thu, 31 Mar 2022 23:18:25 +0000 (16:18 -0700)] 
auto Exit MM for auto EMM test (#2003)

3 months agoAdd instance disable reason (#1993) (#2004)
xyuanlu [Thu, 31 Mar 2022 23:18:11 +0000 (16:18 -0700)] 
Add instance disable reason  (#1993) (#2004)

Add instance disable reason

3 months agoImplement deactivate rest API (#1988)
Qi (Quincy) Qu [Thu, 24 Mar 2022 16:51:46 +0000 (09:51 -0700)] 
Implement deactivate rest API (#1988)

Implement deactivate REST API
Added a new REST API for deactivating a cluster from the super cluster.

3 months agoRead cloud config from zk and propagate to HelixManagerProperty in ZkHelixManager...
xyuanlu [Thu, 24 Mar 2022 16:30:00 +0000 (09:30 -0700)] 
Read cloud config from zk and propagate to HelixManagerProperty in ZkHelixManager constructor (#1986)

Read cloud config from zk and propagate to HelixManagerProperty in ZkHelixManager constructor.

3 months agoAdd Authorization Components to helix-rest (#1967) (#1981)
Neal Sun [Mon, 14 Mar 2022 23:30:43 +0000 (16:30 -0700)] 
Add Authorization Components to helix-rest (#1967) (#1981)

* Add Authorization Components to helix-rest

* Address some comments

3 months agoUse ZooKeeper 3.5.9 in zookeeper-api instead (#1977)
Hunter Lee [Sat, 12 Mar 2022 23:22:21 +0000 (18:22 -0500)] 
Use ZooKeeper 3.5.9 in zookeeper-api instead (#1977)

With the upgrade of the Apache ZooKeeper version, snappy-java was missing from the dependencies. This commit adds snappy-java and removes unused imports in the OSGi declaration. It also uses ZooKeeper 3.5.9 in zookeeper-api, because 3.6.0+ causes some tests in zookeeper-api to fail.

3 months agoAdd 3 Zookeeper CreateMode types to AccessOption (#1975)
Ramin Bashizade [Wed, 9 Mar 2022 17:29:31 +0000 (09:29 -0800)] 
Add 3 Zookeeper CreateMode types to AccessOption (#1975)

This commit adds the 3 missing CreateMode types to AccessOption
class in helix-core: CONTAINER, PERSISTENT_WITH_TTL, and
PERSISTENT_SEQUENTIAL_WITH_TTL.

3 months agoUpgrading Zookeeper version to 3.6.13 to enable zk client SSL/TLS
rahulrane50 [Tue, 8 Mar 2022 21:42:44 +0000 (13:42 -0800)] 
Upgrading Zookeeper version to 3.6.13 to enable zk client SSL/TLS

Upgrading Zookeeper version to 3.6.13 to enable zk client SSL/TLS support

4 months ago[HELIX-862] s/maintainence/maintenance docs fix (#1968)
Micah Stubbs [Thu, 3 Mar 2022 22:08:34 +0000 (14:08 -0800)] 
[HELIX-862] s/maintainence/maintenance docs fix (#1968)

4 months agoAdd rest endpoint for virtual topology group (#1958)
Qi (Quincy) Qu [Wed, 16 Feb 2022 17:55:56 +0000 (12:55 -0500)] 
Add rest endpoint for virtual topology group (#1958)

4 months agoImplement java API and utils for virtual topology group (#1935)
Qi (Quincy) Qu [Tue, 8 Feb 2022 21:53:40 +0000 (16:53 -0500)] 
Implement java API and utils for virtual topology group (#1935)

Add comment to VirtualTopologyGroupService.

4 months agoIntroduce VirtualTopologyGroup and its assignment logic with benchmark. (#1948)
Qi (Quincy) Qu [Thu, 3 Feb 2022 20:18:46 +0000 (12:18 -0800)] 
Introduce VirtualTopologyGroup and its assignment logic with benchmark. (#1948)

* Cleanup unused assignment schemes and minor change.

* Further refactor and code cleanup.

4 months agoFix #1946 -- Refactor and move ClusterTopologyConfig
Qi (Quincy) Qu [Sat, 29 Jan 2022 00:14:09 +0000 (16:14 -0800)] 
Fix #1946 -- Refactor and move ClusterTopologyConfig

Move ClusterTopologyConfig from a nested class to a standalone class in helix/model so that it can be used by the virtual topology group logic.

4 months agoUse final remaining capacity when computing weighted score (#1961)
xyuanlu [Wed, 16 Feb 2022 21:52:48 +0000 (13:52 -0800)] 
Use final remaining capacity when computing weighted score (#1961)

WAGED improvement: Use final remaining capacity when computing weighted score

4 months agoRemove WAGED sorting for each assignment (#1959)
xyuanlu [Tue, 15 Feb 2022 00:58:05 +0000 (16:58 -0800)] 
Remove WAGED sorting for each assignment (#1959)

Improve WAGED sorting from n^2 to n*log(n)

4 months agoremove log before write error message to ZNode (#1955)
xyuanlu [Mon, 7 Feb 2022 23:07:16 +0000 (15:07 -0800)] 
remove log before write error message to ZNode (#1955)

Remove message logging before writing ZNode.

4 months agoFixes #1802 - messages intended for instances that are no longer in the cluster ...
Komal Desai [Mon, 7 Feb 2022 23:06:57 +0000 (15:06 -0800)] 
Fixes #1802 - messages intended for instances that are no longer in the cluster (#1951)

In MessageGenerationPhase.java, the process() method populates the list of live instances from the cache.

Although the generateMessage() method has the sessionIdMap information, it still walks the partition/resource/instance map without checking whether the instance is still part of the cluster.

The cache may hold a stale entry, but that logic needs to be addressed separately. When generating a message, we should still check that the instance is present.

So this is a simple change. We still need to investigate further whether the cache is invalidated properly.

To make sure the cache is handled and refreshed properly when an instance is replaced or deleted, another bug has been filed: #1956
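
The kind of check described above can be sketched as a filter over the target map. The class, method, and parameter names are hypothetical and do not reflect the actual MessageGenerationPhase code; the sketch only shows skipping targets that are no longer live.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class LiveInstanceFilterSketch {
  // Keeps only the partition -> instance entries whose target instance is
  // still in the live set. Entries for departed instances are dropped, so no
  // message would be generated for them.
  static Map<String, String> filterTargets(Map<String, String> partitionToInstance,
                                           Set<String> liveInstances) {
    Map<String, String> result = new LinkedHashMap<>();
    for (Map.Entry<String, String> entry : partitionToInstance.entrySet()) {
      if (liveInstances.contains(entry.getValue())) {
        result.put(entry.getKey(), entry.getValue());
      }
    }
    return result;
  }
}
```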

5 months agoLet logging framework format exception stack traces (#1954)
Henri Hagberg [Thu, 3 Feb 2022 21:35:05 +0000 (23:35 +0200)] 
Let logging framework format exception stack traces (#1954)

Where possible, logging calls are changed so that the logging framework handles exception formatting instead of the stack trace being manually formatted using Throwable#getStackTrace.
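
A minimal sketch of the difference, using java.util.logging as a stand-in so that the example has no external dependencies (this is an assumption made for illustration; it is not the logging API the commit touches). The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.logging.Level;
import java.util.logging.Logger;

final class ExceptionLoggingSketch {
  private static final Logger LOG = Logger.getLogger("ExceptionLoggingSketch");

  // Manual style: the stack trace is flattened into the message by hand, so
  // the framework never sees the Throwable (no cause chain, no formatter).
  static void logManually(String message, Throwable t) {
    LOG.log(Level.SEVERE, message + " " + Arrays.toString(t.getStackTrace()));
  }

  // Framework style: the Throwable is passed as its own argument and the
  // logging framework decides how to render it.
  static void logWithFramework(String message, Throwable t) {
    LOG.log(Level.SEVERE, message, t);
  }
}
```

Passing the Throwable through also keeps it available to handlers and appenders, which the hand-formatted string cannot do.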

5 months agoAdd new metrics to record ZNRecord compression count. (#1943)
Jiajun Wang [Wed, 2 Feb 2022 20:28:14 +0000 (12:28 -0800)] 
Add new metrics to record ZNRecord compression count. (#1943)

This PR determines whether a ZK write request is compressed by calling GZipCompressionUtil. This is an indirect method and can be inaccurate, so the decision is a trade-off.

Alternatively, the ZkClientMonitor can be passed into the serializer class and then report compressed write internally. However, this will require multiple changes in the serializer interfaces.
Due to the multiple layers (PathBasedZkSerializer, ZkSerializer) of serializer interfaces definition, it would be very costly to implement the alternative without major refactoring.

5 months agoFix for - Stale message redundant logs
desaikomal [Mon, 31 Jan 2022 02:08:42 +0000 (18:08 -0800)] 
Fix for - Stale message redundant logs

Avoid printing redundant log messages for unrelated partitions and resources.

5 months agoFix Issue#1941 - Incorrect condition caused not to log error message
desaikomal [Sat, 29 Jan 2022 00:16:16 +0000 (16:16 -0800)] 
Fix Issue#1941 - Incorrect condition caused not to log error message

Properly populate the error log messages for partitions and resource names whose replica status is in ERROR state.

5 months agoFix CVE dependency issue (#1927)
CVEDetect [Wed, 19 Jan 2022 15:17:11 +0000 (23:17 +0800)] 
Fix CVE dependency issue (#1927)

5 months agoRemove dependency to an old Jackson v1 library (org.codehaus.jackson:jackson-mapper...
Andrzej Hołowko [Wed, 19 Jan 2022 15:15:31 +0000 (16:15 +0100)] 
Remove dependency to an old Jackson v1 library (org.codehaus.jackson:jackson-mapper-asl) affected by the critical vulnerability: CVE-2019-17267 (#1934)

5 months agoFix race condition in scheduler message processing logic. (#1930)
Jiajun Wang [Wed, 19 Jan 2022 01:35:45 +0000 (17:35 -0800)] 
Fix race condition in scheduler message processing logic. (#1930)

This PR fixes a race condition that occurs while processing scheduler messages. The previous logic, which dynamically deleted task partitions in the scheduler message IdealState, could cause conflicts and result in inconsistent message status updates. Since updating the task partitions is not a necessary step, this PR removes the corresponding logic and simplifies the message handling procedure.

This PR will help to stabilize TestSchedulerMessage.java.

5 months agoDaemonize ZkBucketDataAccessor GC_THREAD (#1936)
Henri Hagberg [Tue, 18 Jan 2022 22:30:13 +0000 (00:30 +0200)] 
Daemonize ZkBucketDataAccessor GC_THREAD (#1936)

GC_THREAD (which is actually an ExecutorService, not Thread) is a static field in ZkBucketDataAccessor. The executor is started when ZkBucketDataAccessor class is initialized but it is never shut down. Since ExecutorService threads are generally not daemon threads, not shutting down GC_THREAD prevents JVM from shutting down cleanly.

This commit makes ZkBucketDataAccessor GC_THREAD a daemon thread so it doesn't prevent application shutdown.
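
One common way to get daemon threads out of an ExecutorService is a custom ThreadFactory, sketched below. Whether ZkBucketDataAccessor uses a single-thread executor or this exact approach is not stated in the commit message, so treat the executor type and all names here as assumptions.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class DaemonExecutorSketch {
  // Builds a single-thread executor whose worker is a daemon thread, so an
  // executor that is never shut down does not keep the JVM alive.
  static ExecutorService newDaemonSingleThreadExecutor(String threadName) {
    return Executors.newSingleThreadExecutor(runnable -> {
      Thread thread = new Thread(runnable, threadName);
      thread.setDaemon(true);
      return thread;
    });
  }
}
```

The default thread factory creates non-daemon threads when called from a non-daemon thread, which is why an idle, never-shut-down executor can block a clean exit.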

5 months agoUpgrade Log4j to 2.16.0 to address CVE-2021-44228 (#1922)
Brent [Fri, 14 Jan 2022 19:57:57 +0000 (11:57 -0800)] 
Upgrade Log4j to 2.16.0 to address CVE-2021-44228 (#1922)

* HELIX-1921: Upgrade Log4j to 2.16.0 to address CVE-2021-44228
- Upgrade SLF4J API version from 1.7.25 to 1.7.32 (latest)
- Remove use of slf4j-log4j12 package
- Add use of log4j-slf4j-impl package
- Remove unused custom log appender class
- Change direct Log4J reference to SLF4J
- Add -Dlog4j2.formatMsgNoLookups flag to scripts.
- Rename properties files to log4j2.properties and change CLI parameter to log4j2.configurationFile for Log4j2's precedence behavior
- Change properties files to use log4j2 syntax
- Add -Dlog4j2.configurationFile=file://"$BASEDIR"/conf/log4j2.properties to CLIs that were missing it

5 months agoImprove helix tutorial and code formatting (#1931) (#1932)
Qi (Quincy) Qu [Tue, 11 Jan 2022 19:48:54 +0000 (11:48 -0800)] 
Improve helix tutorial and code formatting (#1931) (#1932)

Improve helix tutorial and code formatting

5 months agoAvoid NPE when getting property store through Helix-rest API. (#1929)
Jiajun Wang [Fri, 7 Jan 2022 22:21:45 +0000 (14:21 -0800)] 
Avoid NPE when getting property store through Helix-rest API. (#1929)

This PR fixes the ambiguous error message returned when a user requests an empty ZK node through the Helix-rest property store access API.
The server now responds with NO_CONTENT instead of internal_server_error in the scenario described above.

5 months agoDeclare dependency to Zookeeper in zookeeper-api-*.ivy (#1926)
Ramin Bashizade [Wed, 5 Jan 2022 19:45:54 +0000 (11:45 -0800)] 
Declare dependency to Zookeeper in zookeeper-api-*.ivy (#1926)

Adds dependency to Zookeeper in the ivy file in zookeeper-api module.

6 months agoFix a string operation for custom health check and update test (#1924)
xyuanlu [Mon, 20 Dec 2021 18:28:51 +0000 (10:28 -0800)] 
Fix a string operation for custom health check and update test (#1924)

6 months agoAdd take/free instance implementation and test (#1918)
xyuanlu [Wed, 15 Dec 2021 21:41:36 +0000 (13:41 -0800)] 
Add take/free instance implementation and test (#1918)

* take & free single instance impl

6 months agoMake theadpool shutdown timeout configurable for the HelixTaskExecutor. (#1920)
Jiajun Wang [Tue, 14 Dec 2021 00:51:39 +0000 (16:51 -0800)] 
Make theadpool shutdown timeout configurable for the HelixTaskExecutor. (#1920)

Add TestHelixTaskExecutor.testHandlerResetTimeout() to cover the new changes.
Also refactor the related code to reduce duplicated and confusing code.

7 months agoImplement RestSnapShot and substitute the kv maps in HelixDataAccessorWrapper to...
xyuanlu [Fri, 3 Dec 2021 23:32:21 +0000 (15:32 -0800)] 
Implement RestSnapShot and substitute the kv maps in HelixDataAccessorWrapper to a RestSnapShot object (#1913)

* implement RestSnapShot and substitute the kv maps in HelixDataAccessorWrapper with RestSnapShot object

7 months agoAdd rest API for take/free instance (#1917)
xyuanlu [Thu, 2 Dec 2021 23:53:12 +0000 (15:53 -0800)] 
Add rest API for take/free instance (#1917)

* add rest API for take/free instance

7 months agorefactor instanceService to clusterMaintenanceService (#1912)
xyuanlu [Thu, 25 Nov 2021 03:05:04 +0000 (19:05 -0800)] 
refactor instanceService to clusterMaintenanceService (#1912)

7 months agoFix No Instance Level Throttling (#1908)
Junkai Xue [Mon, 22 Nov 2021 19:48:09 +0000 (11:48 -0800)] 
Fix No Instance Level Throttling (#1908)

The instance-level throttling quota was never charged. Add the charging logic and tests.

7 months agoSplit BatchGetInstancesStoppableChecks (#1902)
xyuanlu [Mon, 22 Nov 2021 17:45:24 +0000 (09:45 -0800)] 
Split BatchGetInstancesStoppableChecks (#1902)

Split BatchGetInstancesStoppableChecks into 2 private util functions.

7 months agoAdd 0.9.9 to menu bar
Junkai Xue [Sun, 21 Nov 2021 21:15:56 +0000 (13:15 -0800)] 
Add 0.9.9 to menu bar

7 months agoMissing end quote
Junkai Xue [Sun, 21 Nov 2021 20:27:19 +0000 (12:27 -0800)] 
Missing end quote

7 months agoFix site.xml head/footer
Junkai Xue [Sun, 21 Nov 2021 20:09:42 +0000 (12:09 -0800)] 
Fix site.xml head/footer

7 months agoRemove 0.9.7 doc folder
Junkai Xue [Sun, 21 Nov 2021 20:03:21 +0000 (12:03 -0800)] 
Remove 0.9.7 doc folder

7 months agoRemove 0.9.1, 0.9.4, 0.9.7, 1.0.0 docs to keep only latest 2 releases doc
Junkai Xue [Sun, 21 Nov 2021 20:01:41 +0000 (12:01 -0800)] 
Remove 0.9.1, 0.9.4, 0.9.7, 1.0.0 docs to keep only latest 2 releases doc

7 months agoRemove 0.8 series doc
Junkai Xue [Sun, 21 Nov 2021 19:58:11 +0000 (11:58 -0800)] 
Remove 0.8 series doc

7 months agoupgrade maven-site-plugin: fix site.xml head/footer (#1910)
Hervé Boutemy [Sun, 21 Nov 2021 10:18:01 +0000 (11:18 +0100)] 
upgrade maven-site-plugin: fix site.xml head/footer (#1910)

7 months agoupdate parent and site plugin (#1909)
Hervé Boutemy [Sat, 20 Nov 2021 11:42:23 +0000 (12:42 +0100)] 
update parent and site plugin (#1909)

7 months agoAdd additional note for release
Junkai Xue [Wed, 17 Nov 2021 21:55:26 +0000 (13:55 -0800)] 
Add additional note for release

7 months agofix input issue for stoppable rest API (#1905)
xyuanlu [Wed, 17 Nov 2021 01:10:57 +0000 (17:10 -0800)] 
fix input issue for stoppable rest API (#1905)

7 months agoAdd 0.9.9 release notes
Junkai Xue [Tue, 16 Nov 2021 22:26:14 +0000 (14:26 -0800)] 
Add 0.9.9 release notes

7 months agoadd take/free instance(s) API (#1899)
xyuanlu [Mon, 15 Nov 2021 19:41:23 +0000 (11:41 -0800)] 
add take/free instance(s) API (#1899)

Create Cluster Maintenance Management service and add API signature and interfaces

8 months agoAdd additional ZK serializer configuration to active ZNRecord compression even the...
Jiajun Wang [Thu, 4 Nov 2021 18:52:05 +0000 (11:52 -0700)] 
Add additional ZK serializer configuration to active ZNRecord compression even the node size is smaller than write size limit.

The property zk.serializer.znrecord.auto-compress.threshold.bytes defines a ZNRecord size threshold in bytes: the ZK serializer auto-compresses the ZNRecord for write requests if its size exceeds the threshold.
If the threshold is not configured or exceeds the ZNRecord write size limit, the default value zk.serializer.znrecord.write.size.limit.bytes (if configured) or 1MB (if not configured) is applied.
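
The fallback rule above can be written out as a small sketch. The class and method names are hypothetical, null stands for "not configured", and reading 1MB as 1024 * 1024 bytes is an assumption; the real serializer may define the constant differently and may also validate non-positive values, which this sketch does not.

```java
final class CompressionThresholdSketch {
  // Assumed interpretation of the "1MB" default from the commit message.
  static final int DEFAULT_WRITE_SIZE_LIMIT_BYTES = 1024 * 1024;

  // Resolves the effective auto-compress threshold. A null argument means the
  // corresponding property is not configured.
  static int resolveThreshold(Integer configuredThreshold, Integer configuredWriteSizeLimit) {
    int writeSizeLimit = configuredWriteSizeLimit != null
        ? configuredWriteSizeLimit
        : DEFAULT_WRITE_SIZE_LIMIT_BYTES;
    if (configuredThreshold == null || configuredThreshold > writeSizeLimit) {
      return writeSizeLimit;
    }
    return configuredThreshold;
  }

  // A record is compressed only when its serialized size exceeds the threshold.
  static boolean shouldCompress(int serializedSizeBytes, int thresholdBytes) {
    return serializedSizeBytes > thresholdBytes;
  }
}
```

With a threshold below the write size limit, records that are still small enough to be written uncompressed are compressed anyway, which is the new behavior the commit title describes.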

8 months agoFix log format in instanceValidationUtil (#1894)
xyuanlu [Mon, 1 Nov 2021 20:11:51 +0000 (13:11 -0700)] 
Fix log format in instanceValidationUtil (#1894)

* fix log format in instanceValidationUtil

9 months agoFix test failure TestInstancesAccessor (#1881)
Junkai Xue [Mon, 27 Sep 2021 20:23:23 +0000 (13:23 -0700)] 
Fix test failure TestInstancesAccessor (#1881)

9 months agoFix inconsistent behavior beween batch stoppable and single stoppable API (#1879)
Junkai Xue [Fri, 24 Sep 2021 17:46:23 +0000 (10:46 -0700)] 
Fix inconsistent behavior beween batch stoppable and single stoppable API (#1879)

9 months agoFix TestClusterAggregateMetrics (#1842)
Neal Sun [Tue, 21 Sep 2021 23:15:55 +0000 (16:15 -0700)] 
Fix TestClusterAggregateMetrics (#1842)

Fix TestClusterAggregateMetrics

9 months agoImprove Purge Offline Instances API (#1870)
Neal Sun [Tue, 21 Sep 2021 23:15:16 +0000 (16:15 -0700)] 
Improve Purge Offline Instances API (#1870)

This commit improves the API such that it will also purge any incomplete instance data, such as instance path without InstanceConfig or ParticipantHistory.

9 months agoFix adding a task to a job after deleting old tasks (#1875)
Ali Reza Zamani Zadeh Najari [Fri, 17 Sep 2021 19:56:20 +0000 (12:56 -0700)] 
Fix adding a task to a job after deleting old tasks (#1875)

In this commit, the issue of dynamically adding a task to a job
in which some of its tasks have been deleted before is being
addressed.

9 months agoFix flaky test testClusterFreezeMode (#1871)
Huizhi Lu [Tue, 14 Sep 2021 18:15:55 +0000 (11:15 -0700)] 
Fix flaky test testClusterFreezeMode (#1871)

The test fails randomly. The root cause is that the test used a random cluster from the cluster set, which contains super clusters and a task cluster, so the chosen cluster could be a task cluster, in which case the test fails. Freeze mode does not apply to the task framework.

This commit fixes it by using the fixed cluster name TestClusters_0 for the test.

9 months agoFix error partition blocks load rebalance (#1867)
Junkai Xue [Tue, 14 Sep 2021 02:43:24 +0000 (19:43 -0700)] 
Fix error partition blocks load rebalance (#1867)

There are four things fixed:
1. State priority: a smaller number means higher priority.
2. When only downward transitions are allowed, any non-downward STs must be removed from the messages and throttled.
3. Even downward STs should respect the throttling, as backward-compatible behavior.
4. Fix the test TestErrorReplicaPersist.

9 months agoChange our release voting process
Junkai Xue [Fri, 10 Sep 2021 00:06:07 +0000 (17:06 -0700)] 
Change our release voting process

With acknowledgment from our Apache Helix VP, we will count committers' votes as binding votes for releases.

9 months agoAdd input validation for getJobContext (#1864)
xyuanlu [Wed, 8 Sep 2021 21:05:03 +0000 (14:05 -0700)] 
Add input validation for getJobContext (#1864)

* add validation for getJobContext

10 months agoFix a potential race condition in MBean unregister logic.
Jiajun Wang [Wed, 1 Sep 2021 17:43:07 +0000 (10:43 -0700)] 
Fix a potential race condition in MBean unregister logic.

The unregister method should rely on the MBeanServer class to validate whether the target MBean has been unregistered, to avoid a race condition.
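
A minimal sketch of this idea using the standard javax.management API: rather than calling isRegistered first and then unregistering (two steps that another thread can interleave with), attempt the unregistration and let the MBeanServer report a missing instance. The class and method names are hypothetical and are not the Helix code.

```java
import javax.management.InstanceNotFoundException;
import javax.management.MBeanRegistrationException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class MBeanUnregisterSketch {
  // Returns true if this call removed the MBean, false if it was already gone.
  // There is no separate isRegistered check, so there is no window between
  // "check" and "unregister" for another thread to slip into.
  static boolean unregister(MBeanServer server, ObjectName name)
      throws MBeanRegistrationException {
    try {
      server.unregisterMBean(name);
      return true;
    } catch (InstanceNotFoundException e) {
      // Another caller unregistered it first; treat this as a no-op.
      return false;
    }
  }
}
```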

10 months agoUpgrade mockito lib version to avoid test case failure due to mock mechanism legacy...
Jiajun Wang [Tue, 31 Aug 2021 21:01:10 +0000 (14:01 -0700)] 
Upgrade mockito lib version to avoid test case failure due to mock mechanism legacy issues.

10 months agoImprove TestControllerLeadershipChange test logic to tolerate longer delay when test...
Jiajun Wang [Tue, 31 Aug 2021 17:24:05 +0000 (10:24 -0700)] 
Improve TestControllerLeadershipChange test logic to tolerate longer delay when test runs. (#1853)

Change the assert condition to account for the actual test script execution delay, to avoid test failures due to slow test runs.

10 months agoRemove unpublished 0.9.9
Junkai Xue [Wed, 1 Sep 2021 19:07:53 +0000 (12:07 -0700)] 
Remove unpublished 0.9.9

10 months agoRemove archived 0.6 & 0.7 version docs
Junkai Xue [Wed, 1 Sep 2021 19:05:17 +0000 (12:05 -0700)] 
Remove archived 0.6 & 0.7 version docs

10 months agoUpdate Menu Bar
Junkai Xue [Wed, 1 Sep 2021 18:36:04 +0000 (11:36 -0700)] 
Update Menu Bar

10 months agoFix management mode history duplicate recording (#1846)
Huizhi Lu [Wed, 1 Sep 2021 17:00:34 +0000 (10:00 -0700)] 
Fix management mode history duplicate recording (#1846)

The management mode history has duplicate entries. It does not impact the normal function, but it's good to get it fixed to avoid confusion. This commit fixes the issue by adding a check for the status in metadata store and the calculated status.

10 months agoRevert "Improve TestControllerLeadershipChange test logic to tolerate longer delay...
Jiajun Wang [Tue, 31 Aug 2021 20:02:33 +0000 (13:02 -0700)] 
Revert "Improve TestControllerLeadershipChange test logic to tolerate longer delay when test runs. (#1853)" (#1860)

This reverts commit a570d0566c42942b6154cb84a3d44f864fde37f0.

10 months agoImprove TestControllerLeadershipChange test logic to tolerate longer delay when test...
Jiajun Wang [Tue, 31 Aug 2021 17:24:05 +0000 (10:24 -0700)] 
Improve TestControllerLeadershipChange test logic to tolerate longer delay when test runs. (#1853)

Change the assert condition to account for the actual test script execution delay, to avoid test failures due to slow test runs.

10 months agoThrow exception when query partition assignment in Maintenance mode (#1855)
xyuanlu [Mon, 30 Aug 2021 18:25:34 +0000 (11:25 -0700)] 
Throw exception when query partition assignment in Maintenance mode (#1855)

Throw exception for partitionAssignment when cluster in Maintenance mode.

10 months agoDisable delayed rebalance as default for partitionAssignment (#1852)
xyuanlu [Mon, 30 Aug 2021 18:25:06 +0000 (11:25 -0700)] 
Disable delayed rebalance as default for partitionAssignment (#1852)

Disable delayed rebalance as default for partitionAssignment.

10 months agoUse activate/DeactivateInstances as keyword in PartitionAssignment API (#1850)
xyuanlu [Tue, 24 Aug 2021 21:23:45 +0000 (14:23 -0700)] 
Use activate/DeactivateInstances as keyword in PartitionAssignment API (#1850)

Use activate/DeactivateInstances as keyword in PartitionAssignment API.

10 months agoAdd stop server wait time to be 10 seconds in MockMetadataStoreDirectoryServer to...
Jiajun Wang [Tue, 24 Aug 2021 00:42:20 +0000 (17:42 -0700)] 
Add stop server wait time to be 10 seconds in MockMetadataStoreDirectoryServer to avoid test failure. (#1848)

This change aims to reduce unexpected test failures caused by the http endpoint being in use. The additional timeout should help the test terminate server instances gracefully.

10 months agoCheck server field state before shutdown to avoid NPE in MockMetadataStoreDirectorySe...
Jiajun Wang [Mon, 23 Aug 2021 23:04:20 +0000 (16:04 -0700)] 
Check server field state before shutdown to avoid NPE in MockMetadataStoreDirectoryServer. (#1847)

Check server field state before shutdown to avoid NPE in MockMetadataStoreDirectoryServer.

10 months agoRemove duplicated notice content
Junkai Xue [Thu, 19 Aug 2021 19:55:36 +0000 (12:55 -0700)] 
Remove duplicated notice content

10 months agoUpdate Lisence and Notice
Junkai Xue [Thu, 19 Aug 2021 19:53:38 +0000 (12:53 -0700)] 
Update Lisence and Notice

10 months agoAdd TF Available Threads Metrics (#1834)
Neal Sun [Mon, 16 Aug 2021 18:22:49 +0000 (11:22 -0700)] 
Add TF Available Threads Metrics (#1834)

Add metrics about Task Framework available threads in the cluster per job type.

10 months agoFix test logic for testExternalViewDiffFromTargetExternalView (#1835)
Junkai Xue [Mon, 9 Aug 2021 18:44:07 +0000 (11:44 -0700)] 
Fix test logic for testExternalViewDiffFromTargetExternalView (#1835)

11 months agoStabilize TestInstancesAccessor (#1828)
Ali Reza Zamani Zadeh Najari [Thu, 5 Aug 2021 18:35:01 +0000 (11:35 -0700)] 
Stabilize TestInstancesAccessor (#1828)

TestInstancesAccessor is unstable because the cluster might not be
stable while the test runs the stoppable check and cluster config and
instance config can be missing. In this commit, the test waits until the
cluster config and instance config get created and then proceeds for
the stoppable checks.

11 months agoFix the JavaDoc about transition priority. (#1831)
Jiajun Wang [Wed, 4 Aug 2021 19:33:45 +0000 (12:33 -0700)] 
Fix the JavaDoc about transition priority. (#1831)

Fix the JavaDoc about transition priority.
In addition, update several instances of old code style for simplicity.

11 months agoFix TestZkConnectionLost (#1824)
Neal Sun [Tue, 3 Aug 2021 17:57:12 +0000 (10:57 -0700)] 
Fix TestZkConnectionLost (#1824)

Fix TestZkConnectionLost by adding a longer connection timeout and fixing incorrect logic.

Co-authored-by: Neal Sun <nesun@nesun-mn1.linkedin.biz>

11 months agoAdd response metadata to response header for partitionAssignment (#1797)
xyuanlu [Tue, 3 Aug 2021 17:53:59 +0000 (10:53 -0700)] 
Add response metadata to response header for partitionAssignment (#1797)

Add response metadata to response header for partitionAssignment.

11 months agoRename property CLUSTER_PAUSE to CLUSTER_FREEZE in PauseSignal (#1820)
Huizhi Lu [Fri, 16 Jul 2021 03:26:11 +0000 (20:26 -0700)] 
Rename property CLUSTER_PAUSE to CLUSTER_FREEZE in PauseSignal (#1820)

Rename property CLUSTER_PAUSE to CLUSTER_FREEZE in PauseSignal.