helix.git
7 months ago[maven-release-plugin] prepare release helix-0.9.9 helix-0.9.9
Junkai Xue [Wed, 17 Nov 2021 18:14:56 +0000 (10:14 -0800)] 
[maven-release-plugin] prepare release helix-0.9.9

7 months agoRevert "[maven-release-plugin] prepare release helix-0.9.9"
Junkai Xue [Wed, 17 Nov 2021 17:17:28 +0000 (09:17 -0800)] 
Revert "[maven-release-plugin] prepare release helix-0.9.9"

This reverts commit 6c8ae611a917753bb59e92146e812066cb64a61c.

7 months agoRevert "[maven-release-plugin] prepare for next development iteration"
Junkai Xue [Wed, 17 Nov 2021 17:17:13 +0000 (09:17 -0800)] 
Revert "[maven-release-plugin] prepare for next development iteration"

This reverts commit ba861a08c8756cbb94ee82ca60ddfa804da2b57a.

7 months agoBackport: Validate data write size limit in ZkClient #1072
Junkai Xue [Tue, 16 Nov 2021 23:28:43 +0000 (15:28 -0800)] 
Backport: Validate data write size limit in ZkClient #1072

7 months ago[maven-release-plugin] prepare for next development iteration
Junkai Xue [Sun, 14 Nov 2021 22:19:00 +0000 (14:19 -0800)] 
[maven-release-plugin] prepare for next development iteration

7 months ago[maven-release-plugin] prepare release helix-0.9.9
Junkai Xue [Sun, 14 Nov 2021 22:17:04 +0000 (14:17 -0800)] 
[maven-release-plugin] prepare release helix-0.9.9

7 months agoBackport patch of Add separate ZK serializer configuration to active ZNRecord compres...
Junkai Xue [Wed, 10 Nov 2021 22:41:01 +0000 (14:41 -0800)] 
Backport patch of Add separate ZK serializer configuration to active ZNRecord compression when size exceeds a threshold. #1901

20 months agoUpdate ivy files.
Jiajun Wang [Mon, 19 Oct 2020 22:51:11 +0000 (15:51 -0700)] 
Update ivy files.

20 months ago[maven-release-plugin] prepare for next development iteration
Jiajun Wang [Mon, 12 Oct 2020 19:13:34 +0000 (12:13 -0700)] 
[maven-release-plugin] prepare for next development iteration

20 months ago[maven-release-plugin] prepare release helix-0.9.8 helix-0.9.8
Jiajun Wang [Mon, 12 Oct 2020 19:13:21 +0000 (12:13 -0700)] 
[maven-release-plugin] prepare release helix-0.9.8

20 months agoEnable helix-front for release
Hunter Lee [Wed, 22 Jan 2020 02:31:51 +0000 (18:31 -0800)] 
Enable helix-front for release

20 months agoFix flaky test testGetChildrenOnLargeNumChildren (#1194)
Huizhi Lu [Wed, 5 Aug 2020 17:56:55 +0000 (10:56 -0700)] 
Fix flaky test testGetChildrenOnLargeNumChildren (#1194)

testGetChildrenOnLargeNumChildren became flaky after more commits were checked in because reflection doesn't work as expected. This commit fixes it by creating 110K children for the test instead of using reflection.

20 months agoZkClient should not keep retrying getChildren() due to large number of children ...
Huizhi Lu [Thu, 23 Jul 2020 18:03:54 +0000 (11:03 -0700)] 
ZkClient should not keep retrying getChildren() due to large number of children (#1109)

For ZkClient's getChildren() operation, if there are a large number of children and the response packet size exceeds the default jute.maxbuffer value of 4MB on the ZK client side, ZkClient gets a ConnectionLossException and keeps retrying the connection to ZK. As a consequence, the infinite retry may cause heavy GC on the ZK server and kill it.

This commit implements a workaround that exits the retry loop for getChildren() if a large number of children causes the connection loss.
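
The workaround can be pictured with the sketch below. It is not the actual ZkClient code: the class, the `MAX_CHILDREN` threshold, the stand-in `ConnectionLossException`, and the retry structure are hypothetical names chosen for illustration. The assumption is that the child count can be read cheaply (for example from the parent's stat) and used to decide that another retry cannot succeed.

```java
import java.util.List;
import java.util.function.IntSupplier;
import java.util.function.Supplier;

final class GetChildrenRetrySketch {
  /** Hypothetical threshold above which a retry is assumed to be pointless. */
  static final int MAX_CHILDREN = 100_000;

  /** Stand-in for ZooKeeper's connection loss error, so the sketch compiles alone. */
  static final class ConnectionLossException extends RuntimeException {}

  static List<String> getChildrenWithBoundedRetry(
      Supplier<List<String>> readChildren, IntSupplier numChildrenFromStat, int maxRetries) {
    for (int attempt = 0; ; attempt++) {
      try {
        return readChildren.get();
      } catch (ConnectionLossException e) {
        // A huge child list will overflow the response buffer again, so stop retrying.
        if (numChildrenFromStat.getAsInt() > MAX_CHILDREN) {
          throw new IllegalStateException("Too many children; not retrying getChildren()", e);
        }
        // Otherwise treat it as a transient failure, up to a bounded number of retries.
        if (attempt >= maxRetries) {
          throw e;
        }
      }
    }
  }
}
```

A transient connection loss is still retried; only the case where the child count explains the failure exits the loop early.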

20 months agoEnforce result check for data accessors batch get calls to prevent partial batch...
Jiajun Wang [Fri, 15 May 2020 00:34:22 +0000 (17:34 -0700)] 
Enforce result check for data accessors batch get calls to prevent partial batch read. (#974)

This will help to ensure the main Helix logic does not calculate based on incomplete input.
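
A minimal sketch of such a result check follows. The method name, the `throwOnFailure` flag, and the use of `IllegalStateException` are assumptions made for illustration, not the data accessor's real signature.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class BatchGetSketch {
  /**
   * Reads every path. When throwOnFailure is set, a single failed read fails the
   * whole batch so that callers never compute on a partial result.
   */
  static <T> List<T> batchGet(
      List<String> paths, Function<String, T> reader, boolean throwOnFailure) {
    List<T> results = new ArrayList<>();
    List<String> failedPaths = new ArrayList<>();
    for (String path : paths) {
      T value;
      try {
        value = reader.apply(path);
      } catch (RuntimeException e) {
        value = null; // placeholder so positions still line up with paths
        failedPaths.add(path);
      }
      results.add(value);
    }
    if (throwOnFailure && !failedPaths.isEmpty()) {
      throw new IllegalStateException("Partial batch read; failed paths: " + failedPaths);
    }
    return results;
  }
}
```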

2 years agoDisable helix-front
Junkai Xue [Mon, 18 May 2020 20:03:19 +0000 (13:03 -0700)] 
Disable helix-front

2 years agoUpdate ivy files
Junkai Xue [Mon, 18 May 2020 20:02:38 +0000 (13:02 -0700)] 
Update ivy files

2 years ago[maven-release-plugin] prepare for next development iteration
Junkai Xue [Mon, 11 May 2020 23:06:05 +0000 (16:06 -0700)] 
[maven-release-plugin] prepare for next development iteration

2 years ago[maven-release-plugin] prepare release helix-0.9.7 helix-0.9.7
Junkai Xue [Mon, 11 May 2020 23:05:26 +0000 (16:05 -0700)] 
[maven-release-plugin] prepare release helix-0.9.7

2 years agoRevert "[maven-release-plugin] prepare release helix-0.9.5"
Junkai Xue [Mon, 11 May 2020 20:03:39 +0000 (13:03 -0700)] 
Revert "[maven-release-plugin] prepare release helix-0.9.5"

This reverts commit c4eafd0c522f5346231540b88b60a4368129fb87.

2 years agoRevert "[maven-release-plugin] prepare for next development iteration"
Junkai Xue [Mon, 11 May 2020 20:03:28 +0000 (13:03 -0700)] 
Revert "[maven-release-plugin] prepare for next development iteration"

This reverts commit 2b1c90bcab34ab310ce4cd8beb21c470f7bce321.

2 years ago[maven-release-plugin] prepare for next development iteration
Junkai Xue [Mon, 11 May 2020 18:44:07 +0000 (11:44 -0700)] 
[maven-release-plugin] prepare for next development iteration

2 years ago[maven-release-plugin] prepare release helix-0.9.5
Junkai Xue [Mon, 11 May 2020 18:43:46 +0000 (11:43 -0700)] 
[maven-release-plugin] prepare release helix-0.9.5

2 years agoAsync write operation should not throw Exception for serializing error (#845) (#999)
Jiajun Wang [Thu, 7 May 2020 17:08:10 +0000 (10:08 -0700)] 
Async write operation should not throw Exception for serializing error (#845) (#999)

This change makes the async write operations return errors through the async callback instead of throwing exceptions. This fixes the batch write/create failure caused by a single node's serialization failure.
In addition, in line with the serializer interface definition, the ZK-related serializers now throw ZkMarshallingError instead of ZkClientException.
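
The behavior change can be illustrated with the following sketch. The result codes, the callback shape, and the method name are hypothetical and do not mirror the ZkClient API; the point is only that a serialization failure for one record is reported through its callback while the rest of the batch proceeds.

```java
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Function;

final class AsyncWriteSketch {
  /** Hypothetical result codes; real ZooKeeper return codes differ. */
  static final int OK = 0;
  static final int MARSHALLING_ERROR = -1;

  static <T> void asyncWriteAll(
      List<T> records, Function<T, byte[]> serializer, BiConsumer<Integer, Integer> callback) {
    for (int i = 0; i < records.size(); i++) {
      try {
        byte[] data = serializer.apply(records.get(i));
        // A real client would hand `data` to the asynchronous ZooKeeper call here.
        callback.accept(i, OK);
      } catch (RuntimeException e) {
        // Report the failure for this record only; do not abort the batch.
        callback.accept(i, MARSHALLING_ERROR);
      }
    }
  }
}
```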

2 years agoDowngrade the log level to INFO when isInstanceSetup() fails. (#870)
Jiajun Wang [Fri, 6 Mar 2020 05:38:51 +0000 (21:38 -0800)] 
Downgrade the log level to INFO when isInstanceSetup() fails. (#870)

Follow the log convention in the ZKUtil class and downgrade the log level to INFO when isInstanceSetup() fails. This change avoids verbose error logs that mislead users.

2 years agoAdd system property options to config write size limit for ZNRecord Serializer (...
Huizhi Lu [Wed, 6 May 2020 22:02:25 +0000 (15:02 -0700)] 
Add system property options to config write size limit for ZNRecord Serializer (#809) (#998)

With the default ZNRecord size limit of 1 MB in the ZNRecord serializers, serialized data may still fail to be written to ZooKeeper. This commit adds system property options to configure the ZNRecord write size limit and whether auto compression is enabled in the ZNRecord serializers.
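
As a rough sketch of a limit driven by a system property: the property key below is made up for this example and is not the key Helix defines; only the 1 MB default comes from the commit message.

```java
final class WriteSizeLimitSketch {
  /** Hypothetical property key, for illustration only. */
  static final String LIMIT_PROPERTY = "example.znrecord.write.size.limit.bytes";
  /** 1 MB default, as stated in the commit message. */
  static final int DEFAULT_LIMIT_BYTES = 1024 * 1024;

  static int writeSizeLimit() {
    // Integer.getInteger returns null when the property is unset or not a number.
    Integer configured = Integer.getInteger(LIMIT_PROPERTY);
    return (configured != null && configured > 0) ? configured : DEFAULT_LIMIT_BYTES;
  }

  static void checkWriteSize(byte[] serialized) {
    int limit = writeSizeLimit();
    if (serialized.length > limit) {
      throw new IllegalArgumentException(
          "Serialized size " + serialized.length + " exceeds write limit " + limit);
    }
  }
}
```

With this shape, an operator could start the JVM with `-Dexample.znrecord.write.size.limit.bytes=2048` to change the limit without a code change.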

Signed-off-by: Huizhi Lu <hulu@linkedin.com>
Signed-off-by: Huizhi Lu <ihuizhi.lu@gmail.com>

2 years agoUpdate website with 0.9.4 helix-0.9.4-release
Hunter Lee [Thu, 23 Jan 2020 20:13:12 +0000 (12:13 -0800)] 
Update website with 0.9.4

2 years ago[maven-release-plugin] prepare for next development iteration
Hunter Lee [Wed, 22 Jan 2020 02:54:37 +0000 (18:54 -0800)] 
[maven-release-plugin] prepare for next development iteration

2 years ago[maven-release-plugin] prepare release helix-0.9.4 helix-0.9.4
Hunter Lee [Wed, 22 Jan 2020 02:54:26 +0000 (18:54 -0800)] 
[maven-release-plugin] prepare release helix-0.9.4

2 years agoEnable helix-front for release
Hunter Lee [Wed, 22 Jan 2020 02:31:51 +0000 (18:31 -0800)] 
Enable helix-front for release

2 years agoRevert "Deep copy for mapFields and listFields in ZNRecord's copy constructor. (...
Huizhi Lu [Mon, 4 Nov 2019 22:12:12 +0000 (14:12 -0800)] 
Revert "Deep copy for mapFields and listFields in ZNRecord's copy constructor. (#552)"

This reverts commit 2a335cf73ac65b53fd2b06c6b1ee8c70553d30b1.

2 years agoDeep copy for mapFields and listFields in ZNRecord's copy constructor. (#552)
Huizhi L [Thu, 31 Oct 2019 18:03:27 +0000 (11:03 -0700)] 
Deep copy for mapFields and listFields in ZNRecord's copy constructor. (#552)

Deep copy for mapFields and listFields in ZNRecord's copy constructor.
Change list:
1. deep copy for mapFields and listFields in ZNRecord's copy constructor.
2. add unit test for the deep copy constructor.
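
For readers unfamiliar with the distinction, the sketch below shows what a deep copy of the two field types involves. It assumes `mapFields` is a map of string maps and `listFields` is a map of string lists, with no null inner values; the helper names are hypothetical and this is not the ZNRecord constructor itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class DeepCopySketch {
  static Map<String, Map<String, String>> deepCopyMapFields(
      Map<String, Map<String, String>> source) {
    Map<String, Map<String, String>> copy = new TreeMap<>();
    for (Map.Entry<String, Map<String, String>> entry : source.entrySet()) {
      // Copy the inner map too, so the copy never shares mutable state with the source.
      copy.put(entry.getKey(), new TreeMap<>(entry.getValue()));
    }
    return copy;
  }

  static Map<String, List<String>> deepCopyListFields(Map<String, List<String>> source) {
    Map<String, List<String>> copy = new TreeMap<>();
    for (Map.Entry<String, List<String>> entry : source.entrySet()) {
      copy.put(entry.getKey(), new ArrayList<>(entry.getValue()));
    }
    return copy;
  }
}
```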

2 years agoFix null response for instance stoppable check when connection refused. (#504)
Huizhi L [Wed, 23 Oct 2019 06:29:09 +0000 (23:29 -0700)] 
Fix null response for instance stoppable check when connection refused. (#504)

Issue: The instance stoppable check endpoint /clusters//instances//stoppable returns null when the connection between the Helix REST server and the storage node is refused.
This diff fixes this by returning a StoppableCheck object when the connection is refused.

2 years agoAdd back the original DataPropagationLatencyGuage (with a typo) and mark it deprecate...
Huizhi L [Wed, 23 Oct 2019 02:54:05 +0000 (19:54 -0700)] 
Add back the original DataPropagationLatencyGuage (with a typo) and mark it deprecated (#517)

If we remove the misspelled name, DataPropagationLatencyGuage, current metrics graphs may not see the old metric, and historical DataPropagationLatencyGuage data might get lost. To preserve backward compatibility, add back DataPropagationLatencyGuage and mark it as deprecated.

2 years agoAdd a null check for StateModel in Participant reset logic (#523)
Hunter Lee [Tue, 22 Oct 2019 18:29:29 +0000 (11:29 -0700)] 
Add a null check for StateModel in Participant reset logic (#523)

It was discovered that sometimes during shutdown/disconnect, this reset() gets called, and due to the partition having been dropped right around the same time, we get an NPE on the state model. Added a null check.

2 years agoFix name typo for DataPropagationLatencyGuage. (#513)
Huizhi L [Fri, 18 Oct 2019 00:15:59 +0000 (17:15 -0700)] 
Fix name typo for DataPropagationLatencyGuage. (#513)

DataPropagationLatencyGuage has a typo, which makes it confusing for Helix clients/products to use. Fix the typo to make the name clearer.

Change list:

Rename DataPropagationLatencyGuage to DataPropagationLatencyGauge.
Replace hard coded names DataPropagationLatencyGuage with enum name DataPropagationLatencyGuage in unit tests.

2 years agoAdd import order for java and javax. (#499)
Huizhi L [Thu, 10 Oct 2019 21:29:34 +0000 (14:29 -0700)] 
Add import order for java and javax. (#499)

* Add import order for java and javax.
* Remove the empty line between java and javax.

2 years agoAdd unit test for setting application name.
Huizhi Lu [Mon, 30 Sep 2019 17:15:09 +0000 (10:15 -0700)] 
Add unit test for setting application name.

2 years ago#493 Set jersey servlet application name with namespace name.
Huizhi Lu [Sat, 28 Sep 2019 03:53:13 +0000 (20:53 -0700)] 
#493 Set jersey servlet application name with namespace name.

2 years agoMake the Java Doc for API more clear
Junkai Xue [Mon, 23 Sep 2019 23:57:15 +0000 (16:57 -0700)] 
Make the Java Doc for API more clear

Some users were confused about the inputs based on the Javadoc. Make it clearer for users.

2 years agoAdd Intellij code style XML file for Helix code style. (#481)
pkuwm [Tue, 24 Sep 2019 21:46:28 +0000 (14:46 -0700)] 
Add Intellij code style XML file for Helix code style. (#481)

Add Intellij code style XML file so we can import it into Intellij to configure java code style for Helix.

2 years agoChange the way Helix triggers rebalance (#472)
Ali Reza Zamani Zadeh Najari [Tue, 24 Sep 2019 17:28:24 +0000 (10:28 -0700)] 
Change the way Helix triggers rebalance (#472)

A method is added which generates OnDemandRebalance event.
This event causes the controller to run the rebalance pipeline for both of the pipelines.
"Touch" logic (as in directly reading and writing to ZNodes) has been removed and replaced by this new method.

2 years agoFilter instances of weight = 0 for any partition assignment (#369)
Yi Wang [Wed, 18 Sep 2019 21:25:46 +0000 (14:25 -0700)] 
Filter instances of weight = 0 for any partition assignment (#369)

1. Fix the available space calculation issue in card dealing algorithm
2. Remove the instances of weight 0 from AbstractEvenDistributionRebalanceStrategy.java#computePartitionAssignment's input parameters
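
The second item amounts to a filter applied before the strategy runs. The sketch below assumes weights are available as a simple name-to-weight map and that an instance without an entry has a default weight of 1; both are assumptions for illustration, as is the method name.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class ZeroWeightFilterSketch {
  static List<String> assignableInstances(List<String> instances, Map<String, Integer> weights) {
    return instances.stream()
        // Hypothetical default weight of 1 for instances with no configured weight.
        .filter(name -> weights.getOrDefault(name, 1) > 0)
        .collect(Collectors.toList());
  }
}
```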

2 years ago[helix-rest] Delete unused default namespace (api "/namespaces/default") (#449)
pkuwm [Tue, 17 Sep 2019 21:33:37 +0000 (14:33 -0700)] 
[helix-rest] Delete unused default namespace (api "/namespaces/default") (#449)

We have a namespace API: /admin/v2/namespaces/{namespace}/. However, the /namespaces/default path is not in use, so we need to delete it.
On the code level, if there is no default namespace, we won't create a DEFAULT_SERVLET.
On the app level, we can configure the app not to add a namespace named "default".
With this change, the endpoint /admin/v2/namespaces/ will be disabled if no namespace sets IS_DEFAULT to true.

2 years agoFix CustomRebalancer's assignment computation (#477)
Hunter Lee [Mon, 16 Sep 2019 21:40:20 +0000 (14:40 -0700)] 
Fix CustomRebalancer's assignment computation (#477)

It was observed that sometimes CustomRebalancer would leave out an instance entirely if an instance is disabled or the partition on the instance was still bootstrapping (current state is null). This would cause a cluster not to converge. This diff fixes this by 1) still including an assignment from IdealState even though the current state is null (maybe due to a pending state transition) 2) putting disabled partitions in InitialState.
Changelist:
1. Fix the issue
2. Add a test: TestCustomRebalancer

2 years agoFix missed callbacks in CurrentStates based RoutingTableProvider. (#458)
pkuwm [Sat, 14 Sep 2019 00:08:17 +0000 (17:08 -0700)] 
Fix missed callbacks in CurrentStates based RoutingTableProvider. (#458)

1. Update BasicClusterDataCache to do refresh with selective update. Only when a change happens, we do the cache refresh only for that change type.
2. Improve RoutingTableProvider.queueEvent() and RoutingTableProvider.handleEvent(). Return instanceConfigs snapshot to callback immediately, instead of waiting for currentStates completion.

2 years agoFix helix-front build failure by downgrading types/lodash version. (#470)
pkuwm [Fri, 13 Sep 2019 00:51:50 +0000 (17:51 -0700)] 
Fix helix-front build failure by downgrading types/lodash version. (#470)

Fix helix-front build failure by downgrading types/lodash version.

2 years agoAdd field for MIN_ACTIVE_REPLICA_NOT_SET
Junkai Xue [Tue, 10 Sep 2019 22:42:42 +0000 (15:42 -0700)] 
Add field for MIN_ACTIVE_REPLICA_NOT_SET

2 years agoMake State Transition Throttling respect MIN_ACTIVE_REPLICA
Junkai Xue [Tue, 10 Sep 2019 00:14:57 +0000 (17:14 -0700)] 
Make State Transition Throttling respect MIN_ACTIVE_REPLICA

There are two phases for improving Helix state transition throttling:
1. Respect MIN_ACTIVE_REPLICA
2. Throttle per replica state transitions.

This commit contains the logic of respecting MIN_ACTIVE_REPLICA in IntermediateCalStage and state transition throttling.

2 years agoFix the issue where JobContext is not updated properly (#435)
Ali Reza Zamani Zadeh Najari [Tue, 10 Sep 2019 18:47:34 +0000 (11:47 -0700)] 
Fix the issue where JobContext is not updated properly (#435)

1. A method has been added that extracts the "prevInstanceToTaskAssignments" information from the context.
2. If the current state is confirmed to be null, the context is set using the target state information.
3. An integration test has been added.

2 years agoFix the order of workflow context update
Ali Reza Zamani Zadeh Najari [Wed, 4 Sep 2019 19:39:45 +0000 (12:39 -0700)] 
Fix the order of workflow context update

* Fix the order of workflow context update

In this commit:
The order that workflow dispatcher updates the workflow status has been changed.
If execution delay is set and job is inflight, the context will get updated.
An integration test has been added.

* minor fixes

2 years agoTASK: Fix forceDelete for jobs in JobQueue
Hunter Lee [Wed, 4 Sep 2019 19:26:10 +0000 (12:26 -0700)] 
TASK: Fix forceDelete for jobs in JobQueue

We observed that the force delete functionality fails when the job is running, reporting that the job is currently running. Force delete should go through regardless of the current job status.
Changelist:
1. Change the semantics in deleteJobFromQueue
2. Add an integration test: TestDeleteJobFromJobQueue

2 years agoremove all unused imports
leesf [Thu, 8 Aug 2019 07:46:02 +0000 (15:46 +0800)] 
remove all unused imports

2 years agoAdd integration test for workflow ForceDelete
Ali Reza Zamani Zadeh Najari [Fri, 30 Aug 2019 16:56:39 +0000 (09:56 -0700)] 
Add integration test for workflow ForceDelete

This commit adds integration tests that check the functionality of ForceDelete.
A comment has been added to ForceDelete that discourages users from using it.
Several workflow states have been considered and checked for ForceDelete.

2 years agoTASK: Fix incorrect counting of numAttempts for tasks (#432)
Hunter Lee [Mon, 26 Aug 2019 20:48:21 +0000 (13:48 -0700)] 
TASK: Fix incorrect counting of numAttempts for tasks (#432)

TASK: Fix incorrect counting of numAttempts for tasks

It was discovered that sometimes the tasks' NUM_ATTEMPTS field in JobContext was getting incremented even without the tasks being retried. This was because the numAttempts field was getting incremented in other (incorrect) places than at scheduling time. The logic for incrementing the number of attempts has been moved to the schedule logic in this diff.
Changelist:
1. Modify tests so that they test for numAttempts more tightly
2. Fix the incrementation logic
3. Add a new integration test: TestTaskNumAttempts

2 years agoMake the reservoir sliding window length used in Helix monitor metrics configurable...
chenboat [Thu, 22 Aug 2019 04:43:01 +0000 (21:43 -0700)] 
Make the reservoir sliding window length used in Helix monitor metrics configurable. #382

2 years agoMake the reservoir sliding window length used in Helix monitor metrics configurable...
chenboat [Wed, 21 Aug 2019 04:21:03 +0000 (21:21 -0700)] 
Make the reservoir sliding window length used in Helix monitor metrics configurable. #382

2 years agoMake the reservoir sliding window length used in Helix monitor metrics configurable...
chenboat [Tue, 20 Aug 2019 06:11:43 +0000 (23:11 -0700)] 
Make the reservoir sliding window length used in Helix monitor metrics configurable. #382

2 years agoMake the reservoir sliding window length used in Helix monitor metrics configurable...
chenboat [Tue, 20 Aug 2019 04:59:59 +0000 (21:59 -0700)] 
Make the reservoir sliding window length used in Helix monitor metrics configurable. #382

2 years agoFix a typo. #382
chenboat [Sun, 18 Aug 2019 02:01:40 +0000 (19:01 -0700)] 
Fix a typo. #382

2 years agoFix a typo. #382
chenboat [Wed, 14 Aug 2019 06:22:33 +0000 (23:22 -0700)] 
Fix a typo. #382

2 years agoAdd a unit test case.. #382
chenboat [Wed, 14 Aug 2019 06:20:59 +0000 (23:20 -0700)] 
Add a unit test case.. #382

2 years agoUse the system property value as the sliding window length. #382
chenboat [Tue, 13 Aug 2019 05:55:22 +0000 (22:55 -0700)] 
Use the system property value as the sliding window length. #382

2 years agoUse the system property value as the sliding window length. #382
chenboat [Tue, 13 Aug 2019 05:47:54 +0000 (22:47 -0700)] 
Use the system property value as the sliding window length. #382

2 years agoUse the system property value as the sliding window length. #382
chenboat [Fri, 9 Aug 2019 07:06:11 +0000 (00:06 -0700)] 
Use the system property value as the sliding window length. #382

2 years agoFix the execution delay for the jobs
Ali Reza Zamani Zadeh Najari [Wed, 14 Aug 2019 15:57:49 +0000 (08:57 -0700)] 
Fix the execution delay for the jobs

In the Task Framework part of Helix, the execution delay for jobs was not respected.
With this commit, when a job is extracted from the inflighJobs queue, the timeline is checked before scheduling.
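
A check of this kind could look like the sketch below. It assumes each in-flight job carries an earliest start time in epoch milliseconds (for example, enqueue time plus execution delay); the method and parameter names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ExecutionDelaySketch {
  /** Returns the jobs whose earliest start time has been reached at nowMs. */
  static List<String> jobsReadyToSchedule(Map<String, Long> earliestStartMsByJob, long nowMs) {
    List<String> ready = new ArrayList<>();
    for (Map.Entry<String, Long> job : earliestStartMsByJob.entrySet()) {
      if (job.getValue() <= nowMs) {
        ready.add(job.getKey());
      }
      // Jobs whose delay has not elapsed stay in the queue for a later pipeline run.
    }
    return ready;
  }
}
```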

2 years agoMove partition health check method into dataAccessor layer
Yi Wang [Thu, 15 Aug 2019 18:53:26 +0000 (11:53 -0700)] 
Move partition health check method into dataAccessor layer

2 years agoUpdate menu bar.
Jiajun Wang [Mon, 19 Aug 2019 21:00:31 +0000 (14:00 -0700)] 
Update menu bar.

2 years agoFix typo for process name
Junkai Xue [Mon, 19 Aug 2019 18:15:05 +0000 (11:15 -0700)] 
Fix typo for process name

2 years agoRevert "Add ChangeDetector interface and ResourceChangeDetector implementation (...
Hunter Lee [Thu, 15 Aug 2019 21:46:31 +0000 (14:46 -0700)] 
Revert "Add ChangeDetector interface and ResourceChangeDetector implementation (#388)"

This reverts commit e0c1c66dd6ed9a01955927ea1828fabcf59eeaad.

2 years agoAdd ChangeDetector interface and ResourceChangeDetector implementation (#388)
Hunter Lee [Thu, 15 Aug 2019 21:33:02 +0000 (14:33 -0700)] 
Add ChangeDetector interface and ResourceChangeDetector implementation (#388)

Add ChangeDetector interface and ResourceChangeDetector implementation

In order to efficiently react to changes happening to the cluster in the new WAGED rebalancer, a new component called ChangeDetector was added.

Changelist:
1. Add ChangeDetector interface
2. Implement ResourceChangeDetector
3. Add ResourceChangeCache, a wrapper for critical cluster metadata
4. Add an integration test, TestResourceChangeDetector

2 years agoFix issue when client only sets ANY at cluster level throttle config
Yi Wang [Fri, 9 Aug 2019 00:34:47 +0000 (17:34 -0700)] 
Fix issue when client only sets ANY at cluster level throttle config

fixes #332
Added unit test for StateTransitionThrottleController
Added integration test for verifying case when only cluster level ANY throttle set to 1

2 years agoFix ZNode does not exist in HealthCheck
Junkai Xue [Wed, 14 Aug 2019 18:23:11 +0000 (11:23 -0700)] 
Fix ZNode does not exist in HealthCheck

If the ZNode of PartitionHealth does not exist, REST returns failed checks due to an NPE. The fix is to add the instance to those that are refreshed entirely, so that REST can check based on the refreshed API result.

2 years agoRevert "Reenable helix-front module for official release." (#406)
Jiajun Wang [Wed, 14 Aug 2019 20:58:22 +0000 (13:58 -0700)] 
Revert "Reenable helix-front module for official release." (#406)

This reverts commit 3c3db0bf797cbc1e0c1aec59395c8632ed6455db.

2 years agoRelease note for 0.9.1.
jiajunwang [Tue, 13 Aug 2019 23:10:56 +0000 (16:10 -0700)] 
Release note for 0.9.1.

2 years agoBump up the snapshot version.
jiajunwang [Tue, 13 Aug 2019 23:54:12 +0000 (16:54 -0700)] 
Bump up the snapshot version.

Also fix the missing helix-agent snapshot update logic in bump-up.command.

2 years agoMerge with the latest optimization on batch get zookeeper properties
Yi Wang [Thu, 8 Aug 2019 19:11:19 +0000 (12:11 -0700)] 
Merge ... the latest optimization on batch get zookeeper properties

2 years agoAdd InstanceServieImpl#batchGetInstancesStoppableChecks to solve performance issue...
Yi Wang [Tue, 6 Aug 2019 01:33:38 +0000 (18:33 -0700)] 
Add InstanceServieImpl#batchGetInstancesStoppableChecks to solve performance issue #366

2 years ago[maven-release-plugin] prepare for next development iteration
Jiajun Wang [Tue, 13 Aug 2019 18:30:45 +0000 (11:30 -0700)] 
[maven-release-plugin] prepare for next development iteration

2 years ago[maven-release-plugin] prepare release helix-0.9.1 helix-0.9.1
Jiajun Wang [Tue, 13 Aug 2019 18:30:35 +0000 (11:30 -0700)] 
[maven-release-plugin] prepare release helix-0.9.1

2 years agoReenable helix-front module for official release.
Jiajun Wang [Mon, 12 Aug 2019 20:57:21 +0000 (13:57 -0700)] 
Reenable helix-front module for official release.

2 years agoRevert "[maven-release-plugin] prepare release helix-0.9.1"
Jiajun Wang [Mon, 12 Aug 2019 20:55:40 +0000 (13:55 -0700)] 
Revert "[maven-release-plugin] prepare release helix-0.9.1"

This reverts commit c7e8e6366f6e5360d416e2fd1867252ebdcd7242.

2 years agoRevert "[maven-release-plugin] prepare for next development iteration"
Jiajun Wang [Mon, 12 Aug 2019 20:55:35 +0000 (13:55 -0700)] 
Revert "[maven-release-plugin] prepare for next development iteration"

This reverts commit f2746c823193991a0dd6152827b7344d66226368.

2 years ago[maven-release-plugin] prepare for next development iteration
Jiajun Wang [Mon, 12 Aug 2019 20:36:09 +0000 (13:36 -0700)] 
[maven-release-plugin] prepare for next development iteration

2 years ago[maven-release-plugin] prepare release helix-0.9.1
Jiajun Wang [Mon, 12 Aug 2019 20:35:57 +0000 (13:35 -0700)] 
[maven-release-plugin] prepare release helix-0.9.1

2 years agoFix the CallbackHandler registration logic in DistributedLeaderElection (#395)
Jiajun Wang [Mon, 12 Aug 2019 17:58:21 +0000 (10:58 -0700)] 
Fix the CallbackHandler registration logic in DistributedLeaderElection (#395)

* Fix the CallbackHandler registration logic in DistributedLeaderElection that may leave a leader node with no callback registered.

Our current initialization logic assumes a strict leader acquire/relinquish events sequence. However, due to the possible carried over ZK events from the previous ZK session, the controller node change event might be triggered in the following sequence:
1. CALLBACK (from the previous session): Create new leader node and add handlers.
2. FINALIZE (Handle the previous session expire): Clean up handlers.
3. INIT (For the new session establishment): Expect to add the handlers back again.
As a result, if the INIT event processing does not recover the handlers, the leader controller won't be able to manage anything. This fix ensures that every acquireLeadership call tries to initialize the leader controller's callback handlers.

Also, add the additional test logic in TestHandleNewSession to verify the fix.

* Improve the leader history update logic so there is no duplicate entry recorded.
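
The CALLBACK, FINALIZE, INIT sequence described in this commit can be modeled with a small sketch. The class, the handler names, and the event enum are hypothetical; the assumption illustrated is that handler registration is made idempotent and is attempted on every leadership acquisition rather than only on the first one.

```java
import java.util.HashSet;
import java.util.Set;

final class LeaderHandlerSketch {
  enum EventType { INIT, CALLBACK, FINALIZE }

  private final Set<String> handlers = new HashSet<>();
  private boolean leader;

  void onControllerChange(EventType type) {
    switch (type) {
      case INIT:
      case CALLBACK:
        acquireLeadership();
        break;
      case FINALIZE:
        // The previous session expired: clean up its handlers.
        handlers.clear();
        break;
    }
  }

  private void acquireLeadership() {
    leader = true; // stand-in for creating or confirming the leader node
    // Always (re)register. Set.add is a no-op for handlers that are already present,
    // so a node that is already the leader still recovers handlers lost to FINALIZE.
    handlers.add("controllerMessageHandler");
    handlers.add("liveInstanceHandler");
  }

  boolean isLeader() {
    return leader;
  }

  int handlerCount() {
    return handlers.size();
  }
}
```

A version that registered handlers only when `leader` was previously false would end the sequence as a leader with zero handlers, which is the failure the commit describes.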

2 years agoTASK: Drop all tasks whose requested states are DROPPED
Hunter Lee [Fri, 9 Aug 2019 23:57:05 +0000 (16:57 -0700)] 
TASK: Drop all tasks whose requested states are DROPPED

Upon a Participant disconnect, the Participant would carry over from the last session. This would copy all previous task states to the current session and set their requested states as DROPPED (for INIT and RUNNING states).

It came to our attention that sometimes these Participants experience connection issues and the tasks happen to be in TASK_ERROR or COMPLETED states. These tasks would get stuck on the Participant and never be dropped. This issue proposes to add the logic that would get all tasks whose requested states are DROPPED to be dropped immediately.
Changelist:
1. Make sure all tasks whose requested state is DROPPED get added to tasksToDrop
2. Add a unit test: TestDropTerminalTasksUponReset
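
The selection rule in item 1 can be sketched as follows. `TaskState` and `tasksToDrop` here are hypothetical stand-ins, not the Task Framework's types; the point is that the decision depends only on the requested state, so tasks in terminal states such as TASK_ERROR or COMPLETED are included.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class DropRequestedTasksSketch {
  static final class TaskState {
    final String current;
    final String requested;

    TaskState(String current, String requested) {
      this.current = current;
      this.requested = requested;
    }
  }

  static List<String> tasksToDrop(Map<String, TaskState> tasks) {
    List<String> toDrop = new ArrayList<>();
    for (Map.Entry<String, TaskState> entry : tasks.entrySet()) {
      // Only the requested state matters; the current state is deliberately ignored.
      if ("DROPPED".equals(entry.getValue().requested)) {
        toDrop.add(entry.getKey());
      }
    }
    return toDrop;
  }
}
```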

2 years agoImprove ZK read with batch call
Junkai Xue [Mon, 5 Aug 2019 23:33:50 +0000 (16:33 -0700)] 
Improve ZK read with batch call

The current HealthReport read is a single call for each participant. Improve it with a batch call to ZK to reduce the number of calls.

2 years agoAdd reviews@helix.apache.org to mailing list
Junkai Xue [Tue, 6 Aug 2019 03:53:13 +0000 (20:53 -0700)] 
Add reviews@helix.apache.org to mailing list

2 years agoStabilize the REST tests
Junkai Xue [Mon, 5 Aug 2019 23:25:03 +0000 (16:25 -0700)] 
Stabilize the REST tests

Stabilize the REST tests with the following changes:
1. Remove the temporary cluster, which impacts the ClusterAccessor test.
2. Add start/end messages to all tests for debugging purposes.
3. Disable the unstable monitoring test for default MBeans. Sometimes we can query them and sometimes not. It is not a critical test path. Let's make it stable later.

2 years agoRead ClusterConfig from ZK selectively
Hunter Lee [Tue, 6 Aug 2019 18:32:16 +0000 (11:32 -0700)] 
Read ClusterConfig from ZK selectively

Previously, ClusterConfig would be read from ZK on every pipeline run. This PR makes it a selective read and also adds ClusterConfig to the set of all changed types, so that the cluster change detector can more easily tell whether ClusterConfig changed without having to store two copies of ClusterConfig objects.

2 years agoFix RoutingTableProvider statePropagationLatency metric reporting bug (#365)
kaisun2000 [Tue, 6 Aug 2019 18:58:16 +0000 (11:58 -0700)] 
Fix RoutingTableProvider statePropagationLatency metric reporting bug (#365)

Issue:

When updating its snapshot, CurrentStateCache would miss all existing partitions that have a state change.

The RoutingTableProvider callback runs on the main event thread, and its time is not accounted for in the log.

Description:
Fix the bug by updating the snapshot with the correct reloadkeys.

Enhance the log to account for user callback code separately.

Tests:
mvn test passed.

2 years agoDynamically change the processor thread name when consuming event
Yi Wang [Tue, 23 Jul 2019 23:27:59 +0000 (16:27 -0700)] 
Dynamically change the processor thread name when consuming event

2 years agoRemove DEFAULT_VIEW_CLUSTER_REFRESH_PERIOD from ClusterConfig
Hunter Lee [Mon, 5 Aug 2019 17:16:27 +0000 (10:16 -0700)] 
Remove DEFAULT_VIEW_CLUSTER_REFRESH_PERIOD from ClusterConfig

This is a constant that is no longer used.

2 years agoRemove .reviewboardrc from the open source repository
Hunter Lee [Mon, 5 Aug 2019 16:19:51 +0000 (09:19 -0700)] 
Remove .reviewboardrc from the open source repository

2 years agoRemove unnecessary touch logic that triggers the pipeline
Ali Reza Zamani Zadeh Najari [Thu, 1 Aug 2019 17:54:28 +0000 (10:54 -0700)] 
Remove unnecessary touch logic that triggers the pipeline

Wherever the ZooKeeper ResourceConfig is updated,
it is no longer necessary to use touch logic to run the pipeline again.
A ResourceConfig update automatically triggers the pipeline.

This commit fixes issue #370.

2 years agoFix the race condition while Helix refreshes the cluster status cache. (#363)
jiajunwang [Tue, 30 Jul 2019 21:41:32 +0000 (14:41 -0700)] 
Fix the race condition while Helix refreshes the cluster status cache. (#363)

* Fix the race condition while Helix refreshes the cluster status cache.

This change fixes issue #331.
The design ensures a single read to avoid locking during the change notification. However, a later update introduced an additional read. As a result, the two reads may return different results because the notification is lock-free. This leaves the cache in an inconsistent state, and the expected rebalance might not happen.

2 years agoRemove TODO NPE log for computeResourceBestPossibleState
Ali Reza Zamani Zadeh Najari [Tue, 23 Jul 2019 22:15:47 +0000 (15:15 -0700)] 
Remove TODO NPE log for computeResourceBestPossibleState

The logs related to the NPE in computeResourceBestPossibleState are not needed anymore.

This commit fixes issue #351.

2 years agoRead Failure while reading non-existent znode
Ali Najari [Wed, 17 Jul 2019 21:21:40 +0000 (14:21 -0700)] 
Read Failure while reading non-existent znode

In this commit, when a NoNodeException is encountered while reading data from a znode that does not exist, the NoNodeException is caught and readfailurecounter is not incremented.
Instead, the related information (read counter, read latency, etc.) is recorded.

This commit fixes issue #345.
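
The accounting rule can be sketched as below. The class, the field names, and the stand-in `NoNodeException` are hypothetical, and the sketch assumes the exception is rethrown after the metrics are recorded.

```java
import java.util.function.Function;

final class ReadMetricsSketch {
  /** Stand-in for ZooKeeper's "no node" error, so the sketch compiles alone. */
  static final class NoNodeException extends RuntimeException {}

  int readCount;
  int readFailureCount;
  long totalReadLatencyNanos;

  byte[] read(Function<String, byte[]> zkRead, String path) {
    long start = System.nanoTime();
    try {
      byte[] data = zkRead.apply(path);
      recordRead(start);
      return data;
    } catch (NoNodeException e) {
      // A missing znode is a completed read, not a failure: record count and latency.
      recordRead(start);
      throw e;
    } catch (RuntimeException e) {
      readFailureCount++;
      throw e;
    }
  }

  private void recordRead(long startNanos) {
    readCount++;
    totalReadLatencyNanos += System.nanoTime() - startNanos;
  }
}
```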

2 years agoImplementation of stateModelDef modification in REST 2.0
Kai Sun [Wed, 17 Jul 2019 01:36:42 +0000 (18:36 -0700)] 
Implementation of stateModelDef modification in REST 2.0

Current implementation of Rest 2.0 does not support stateModelDef modification. Here, we will implement

delete -- remove the stateModelDef with the input id.

put -- create new statemodeldef if no existing one with same input id

set -- replace the content of node with input id

We also add the following test cases:

Test delete model one; expect success
Test delete model one again; expect success
Create the deleted model one; expect success
Create the deleted model one again; expect failure as the same model id exists
Set the model one with modified content; expect success
Read the model one; expect the content would be same as modified content
Set the model one to original content restore original state; expect success
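
The delete/put/set semantics and the test cases listed above can be mirrored with an in-memory sketch. This is not the REST resource code: the class and method names are hypothetical, and the boolean return values stand in for HTTP success and failure. How set behaves for a missing id is not stated in the commit, so the sketch simply writes the content.

```java
import java.util.HashMap;
import java.util.Map;

final class StateModelDefStoreSketch {
  private final Map<String, String> defs = new HashMap<>();

  /** delete: remove the definition; deleting a missing id also succeeds. */
  boolean delete(String id) {
    defs.remove(id);
    return true;
  }

  /** put: create the definition only if none exists with the same id. */
  boolean put(String id, String content) {
    return defs.putIfAbsent(id, content) == null;
  }

  /** set: replace the content stored under the id. */
  boolean set(String id, String content) {
    defs.put(id, content);
    return true;
  }

  String get(String id) {
    return defs.get(id);
  }
}
```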

2 years agoChange IllegalStateException to Helix Exception for CRUSH based rebalance strategy...
Ali Reza Zamani Zadeh Najari [Mon, 15 Jul 2019 22:25:52 +0000 (15:25 -0700)] 
Change IllegalStateException to Helix Exception for CRUSH based rebalance strategy algorithm

In this commit, the IllegalStateException is caught and a HelixException is thrown to the upper layer instead, so the error log shows a more meaningful exception.
A test has been changed accordingly.

This commit fixes issue #322.
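
The exception translation follows a common wrap-and-rethrow pattern, sketched below. The nested `HelixException` is a local stand-in so the example compiles on its own, and `computeAssignment` and its parameters are hypothetical names, not the rebalance strategy's real method.

```java
import java.util.function.Supplier;

final class CrushExceptionSketch {
  /** Local stand-in for Helix's exception type, so the sketch compiles alone. */
  static final class HelixException extends RuntimeException {
    HelixException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  static <T> T computeAssignment(String resource, Supplier<T> crushStrategy) {
    try {
      return crushStrategy.get();
    } catch (IllegalStateException e) {
      // Add context for the upper layer and keep the original exception as the cause.
      throw new HelixException(
          "Failed to compute assignment for resource " + resource + ": " + e.getMessage(), e);
    }
  }
}
```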