Junkai Xue [Wed, 15 Jul 2020 23:03:29 +0000 (16:03 -0700)]
[maven-release-plugin] prepare for next development iteration
Junkai Xue [Wed, 15 Jul 2020 20:09:22 +0000 (13:09 -0700)]
[maven-release-plugin] prepare release helix-1.0.1
Hunter Lee [Wed, 15 Jul 2020 19:11:23 +0000 (12:11 -0700)]
Remove legacy duplicate metric classes in helix-core (#1147)
As part of the ZooKeeper API separation initiative, most metric-related classes were moved to the metrics-common module. Some were left in helix-core for backward compatibility, but they caused class conflicts at runtime. This change removes those classes.
Junkai Xue [Wed, 15 Jul 2020 19:54:05 +0000 (12:54 -0700)]
Revert "[maven-release-plugin] prepare release helix-1.0.1"
This reverts commit
4231a87ae8bdbe3c71b3ed8ab12853955e5d75b7.
Junkai Xue [Wed, 15 Jul 2020 19:53:48 +0000 (12:53 -0700)]
Revert "[maven-release-plugin] prepare for next development iteration"
This reverts commit
e60b8c6247352c601b9b5556a72301cd88c6df2c.
Junkai Xue [Wed, 1 Jul 2020 21:22:04 +0000 (14:22 -0700)]
Change version to 1.0.0
Junkai Xue [Wed, 1 Jul 2020 21:12:36 +0000 (14:12 -0700)]
Fix typo
Junkai Xue [Wed, 1 Jul 2020 21:06:38 +0000 (14:06 -0700)]
Enable 1.0.1 docs
Junkai Xue [Wed, 1 Jul 2020 20:48:53 +0000 (13:48 -0700)]
Create 1.0.1 web and release note
Junkai Xue [Wed, 1 Jul 2020 20:04:23 +0000 (13:04 -0700)]
[maven-release-plugin] prepare for next development iteration
Junkai Xue [Wed, 1 Jul 2020 20:04:11 +0000 (13:04 -0700)]
[maven-release-plugin] prepare release helix-1.0.1
Junkai Xue [Wed, 1 Jul 2020 19:54:25 +0000 (12:54 -0700)]
Enable helix-front for release
Junkai Xue [Wed, 1 Jul 2020 19:52:31 +0000 (12:52 -0700)]
Revert "[maven-release-plugin] prepare release helix-1.0.1"
This reverts commit
0a09f9a8b91bfc7137dde0377ea73b31d352ea73.
Junkai Xue [Wed, 1 Jul 2020 19:51:44 +0000 (12:51 -0700)]
Revert "[maven-release-plugin] prepare for next development iteration"
This reverts commit
e4312f38a7a7d7a2cd2f0ffa079afd6ab2cdc5bc.
Junkai Xue [Wed, 1 Jul 2020 19:50:59 +0000 (12:50 -0700)]
[maven-release-plugin] prepare for next development iteration
Junkai Xue [Wed, 1 Jul 2020 19:50:39 +0000 (12:50 -0700)]
[maven-release-plugin] prepare release helix-1.0.1
kaisun2000 [Fri, 19 Jun 2020 23:18:31 +0000 (16:18 -0700)]
Enhance flaky test TestRemoveUserCbHandlerOnPathRemoval (#1104)
TestRemoveUserCbHandlerOnPathRemoval has a flaky assert. Address the assertion in this diff.
Co-authored-by: Kai Sun <ksun@ksun-mn1.linkedin.biz>
kaisun2000 [Sat, 6 Jun 2020 02:55:06 +0000 (19:55 -0700)]
Fix leaking Zk path watch and Callbackhandler issue (#1035)
Short-term fix for #1034. Get rid of dangling CallbackHandlers and their
related current state parent paths in ZooKeeper. Also stop leaking
current state znode paths, which happened because the deletion of a
current state znode path is asynchronous to the installation of
watchers in various threads in Helix.
Junkai Xue [Wed, 24 Jun 2020 18:24:33 +0000 (11:24 -0700)]
Revert "Remove waiting on message deletion if current state is already updated (#1067)"
This reverts commit
d6d97c819970d5f6b76d9e3ade441ef8017fdb2f.
Hunter Lee [Wed, 24 Jun 2020 06:53:10 +0000 (23:53 -0700)]
Upgrade restlet.jse version
restlet.jse version 2.3.12 contains a security vulnerability fix. Using an older version of this library was causing issues with downloading via certain repositories, so this commit upgrades it.
xyuanlu [Tue, 23 Jun 2020 18:03:30 +0000 (11:03 -0700)]
Add metrics for expired session count (#1101)
This change adds an "ExpiredSessionCounter" metric to ZkClientMonitor.
Meng Zhang [Tue, 23 Jun 2020 00:45:59 +0000 (17:45 -0700)]
Remove waiting on message deletion if current state is already updated (#1068)
Remove the waiting logic on message deletion in the message generation phase if the current state is already updated. This will help increase the rate of P2P messages during mastership handoff.
Jiajun Wang [Tue, 23 Jun 2020 00:04:16 +0000 (17:04 -0700)]
Fix ZkBucketDataAccessor failure due to concurrent modification. (#1107)
Concurrent modification causes two issues.
1. Regular GC task fails due to concurrent list modification and the stale versions are not removed at all.
2. If, by coincidence, the list contains a version newer than the current version, then because the list is modified inside the loop, the final element (the newer version) is not filtered out and is left in the to-be-removed list. The GC task then removes the most recent version. For example,
a) Input, current version "2"
b) Children = [1, 2, 3]
c) The task avoids checking "2", so the list for loop is: [1, 3]
d) When checking "1", it is removed from the list. So the list becomes [3]. Then the loop ends, because the first item has already been looped from the for iteration perspective.
e) The version to be removed is "3"!
This PR fixes the issue by avoiding concurrent modification. Also, it simplifies the logic so as to reduce the pending GC tasks.
The test is also updated accordingly.
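A minimal sketch of the "avoid concurrent modification" idea described above, assuming that a stale version is any version strictly older than the current one. The class and method names are hypothetical and are not the actual ZkBucketDataAccessor code; the point is only that the to-be-removed list is built separately instead of editing the list being iterated.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Helix.
final class StaleVersionFilter {
  private StaleVersionFilter() {}

  /**
   * Returns the versions that are safe to delete, assumed here to be every
   * version strictly older than the current one. The input list is only read,
   * never modified, so the loop cannot skip elements.
   */
  static List<Long> staleVersions(List<Long> children, long currentVersion) {
    List<Long> toRemove = new ArrayList<>();
    for (long version : children) {
      if (version < currentVersion) {
        toRemove.add(version);
      }
    }
    return toRemove;
  }
}
```

With children [1, 2, 3] and current version 2, only version 1 ends up in the returned list, so the newer version 3 is left alone.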
Jiajun Wang [Mon, 22 Jun 2020 23:58:06 +0000 (16:58 -0700)]
Do not ignore the baseline assignment when evaluating in PartitionMovementConstraint. (#1078)
The current implementation of the PartitionMovementConstraint will ignore the baseline assignment completely when the previous best possible assignment has the corresponding record.
Note that since the previous best possible assignment might become invalid, the constraint should refer to the baseline assignment as a secondary option.
This change fixes the issue by prioritizing between the baseline and the best possible assignments instead of simply ignoring the baseline.
This reduces the chance of divergences between the baseline and the best possible assignments. Also, it reduces the possibility of bouncing partition assignments.
xyuanlu [Mon, 22 Jun 2020 20:56:32 +0000 (13:56 -0700)]
Code clean up Topology.java (#1043)
This change does some code refactoring for Topology.java. It extracts the sanity check logic from multiple functions into one single function so it can be used elsewhere as well.
Ali Reza Zamani Zadeh Najari [Thu, 18 Jun 2020 23:34:43 +0000 (16:34 -0700)]
Stabilize TestWorkflowTermination (#1096)
In this PR, TestWorkflowTermination has been stabilized.
Also, testGetStateModelDef which has not been adjusted
in previous commits is fixed.
Jiajun Wang [Thu, 18 Jun 2020 19:56:52 +0000 (12:56 -0700)]
Fix the issue that the instance may not be assigned a replica as expected. (#1098)
This is to fix a regression that was introduced by PR #986
The PR tried to prioritize the preference list to avoid unnecessary top state transitions. However, there was a bug in the prioritizing logic: if a participant is skipped due to low priority, it is not picked up again during the calculation. As a result, this participant is not assigned any replica even though it is originally in the preference list.
This only happens if the state model has been customized so that it has multiple top states and there is an intermediate state with expected count -1 between the top state and the other states.
This fix ensures that the skipped participant is checked again until it gets an assignment.
Ali Reza Zamani Zadeh Najari [Wed, 17 Jun 2020 18:24:46 +0000 (11:24 -0700)]
Stabilize TestTaskThrottling (#1093)
In this PR, the Thread sleeps have been removed from TestTaskThrottling and replaced with TestHelper verify method.
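The general pattern of replacing a fixed sleep with a polling check can be sketched as follows. This is a self-contained stand-in, not Helix's TestHelper: the class name, the 50 ms polling interval, and the signature are assumptions made for illustration.

```java
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for a test helper that polls a condition.
final class PollingVerifier {
  private PollingVerifier() {}

  /**
   * Polls the condition until it holds or the timeout elapses.
   * Returns true as soon as the condition is observed to be true.
   */
  static boolean verify(BooleanSupplier condition, long timeoutMs) {
    long deadline = System.currentTimeMillis() + timeoutMs;
    do {
      if (condition.getAsBoolean()) {
        return true;
      }
      try {
        Thread.sleep(50); // assumed polling interval
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt flag
        return false;
      }
    } while (System.currentTimeMillis() < deadline);
    // One last check in case the condition became true during the final sleep.
    return condition.getAsBoolean();
  }
}
```

A test would then assert on `PollingVerifier.verify(() -> someCondition(), 5000)` instead of sleeping for a fixed time and hoping the state has converged.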
Jiajun Wang [Mon, 15 Jun 2020 18:53:07 +0000 (11:53 -0700)]
Add monitor to record the abnormal states processing. (#1059)
Example ObjectName of the new monitor MBean: Rebalancer:ClusterName=<clusterName>, EntityName=AbnormalStates.<StateModelDefName>
Attributes,
1. AbnormalStatePartitionCounter: record the total count of the partitions that have been found in abnormal status. Note that if one partition has been found to be abnormal twice, then we will record it twice in this counter as well.
2. RecoveryAttemptCounter: record the total count of successful recovery computation that has been done by the resolver.
Jiajun Wang [Thu, 4 Jun 2020 18:50:23 +0000 (11:50 -0700)]
Add ExcessiveTopStateResolver to gracefully fix the double-masters situation. (#1037)
Although the rebalancer will fix the additional master eventually, the default operations are arbitrary and it may cause an older master to survive. This may cause serious application logic issues since many applications require the master to have the latest data.
With this state resolver, the rebalancer will change the default behavior to reset all the master replicas so as to ensure the remaining one is the youngest one. Then the double-masters situation is gracefully resolved.
Jiajun Wang [Thu, 28 May 2020 22:55:33 +0000 (15:55 -0700)]
Add Abnormal States Resolver interface and configuration item. (#1028)
The Abnormal States Resolver defines a generic interface to find and recover if the partition has any abnormal current states. For example,
- double masters
- application data out of sync
The interface shall be implemented according to the requirement.
The resolver is applied in the rebalance process according to the corresponding cluster config item. For example,
"ABNORMAL_STATES_RESOLVER_MAP" : {
"MASTERSLAVE" : "org.apache.helix.api.rebalancer.constraint.MasterSlaveAbnormalStateReslovler"
}
The default behavior without any configuration is not doing any recovery work.
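To illustrate what a resolver for the "double masters" case might check, here is a deliberately simplified sketch. The interface below is hypothetical and much smaller than whatever the real Helix interface looks like; the state names and the map-based input are also assumptions.

```java
import java.util.Map;

// Hypothetical, simplified interface; the actual Helix interface may differ.
interface SimpleAbnormalStateResolver {
  /** Returns true if the partition's current states look healthy. */
  boolean isCurrentStatesValid(Map<String, String> instanceToState);
}

/** Treats a partition as abnormal when more than one replica is MASTER. */
final class DoubleMasterDetector implements SimpleAbnormalStateResolver {
  @Override
  public boolean isCurrentStatesValid(Map<String, String> instanceToState) {
    long masters = instanceToState.values().stream()
        .filter("MASTER"::equals)
        .count();
    return masters <= 1;
  }
}
```

A real implementation would also need a recovery step that decides which replicas to reset; that part is intentionally left out here.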
Hunter Lee [Mon, 15 Jun 2020 21:14:11 +0000 (14:14 -0700)]
Move serializers to zookeeper-api (#1085)
Some serializer implementations (Chained, ByteArray) were never moved to zookeeper-api. This commit moves them out of helix-core and into zookeeper-api.
Note that this might cause older implementations of Helix applications to no longer build. Since this is an API-level change that simply moves the classes, the user can simply update the package path in the import statements and can make it build again.
Ali Reza Zamani Zadeh Najari [Mon, 15 Jun 2020 19:41:46 +0000 (12:41 -0700)]
Fix NPE for RoutingDataCache.refresh (#1087)
This commit fixes an NPE that is thrown in the RoutingDataCache.refresh()
method when CUSTOMIZEDVIEW has not been used as an entry of
sourceDataTypeMap.
Ali Reza Zamani Zadeh Najari [Mon, 15 Jun 2020 17:53:16 +0000 (10:53 -0700)]
Fix waitToStop method in TaskDriver (#1083)
In this commit, waitToStop has been changed to make sure workflow/queue
is in stopped state.
Hunter Lee [Sat, 13 Jun 2020 04:05:10 +0000 (21:05 -0700)]
Implement getStat in ZookeeperAccessor (#1089)
Helix REST doesn't provide a way for users to get the Stat object for a ZNode. This commit enables this with a getStat command. Also, all getData() endpoints embed the stat fields by default.
Hunter Lee [Wed, 10 Jun 2020 22:17:16 +0000 (15:17 -0700)]
Fix ReadOnlyWagedRebalancer so that it computes mapping from scratch (#1058)
Previously, ReadOnlyWagedRebalancer would only read the previously computed best possible mapping and return it. This commit changes it so that it computes things from scratch: it can read the previously computed best possible mapping but shouldn't just return it without doing any calculation.
Molly Gao [Wed, 10 Jun 2020 22:06:56 +0000 (15:06 -0700)]
Add close method to Helix lock (#1077)
Add a close method to Helix lock interface and current implementation to avoid ZK connection leakage.
Hunter Lee [Wed, 10 Jun 2020 18:39:12 +0000 (11:39 -0700)]
Upgrade lodash version to 4.17.12+ for helix-rest (#1081)
There was a security vulnerability found in older versions of lodash. This commit upgrades it to a version containing the fix. For details, see https://snyk.io/blog/snyk-research-team-discovers-severe-prototype-pollution-security-vulnerabilities-affecting-all-versions-of-lodash/
Hunter Lee [Tue, 9 Jun 2020 23:04:19 +0000 (16:04 -0700)]
Add delete for PropertyStore in Helix REST (#1079)
This commit adds the delete endpoint for PropertyStore in Helix REST.
Jiajun Wang [Tue, 9 Jun 2020 21:10:40 +0000 (14:10 -0700)]
Trim non cluster topology related changes in the WAGED rebalancer calculation. (#1065)
Cluster topology fields are considered as the fundamental attributes of the cluster. For example, instance capacity, resource partition count, partition weight, etc.
If the topology related fields are updated, the WAGED rebalancer will trigger the global rebalance process. Otherwise, the rebalancer will only do local (limited) rebalance to avoid unexpected partition movements.
This change introduces HelixProperty Trimmer to filter out the non-topology related fields so the changes in those fields won't trigger the global rebalancing.
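The trimming idea can be sketched roughly as below: keep only the fields that count as topology-related, then compare the trimmed views to decide whether a change should trigger a global rebalance. The class, method names, and the flat string map are assumptions for illustration and do not reflect the actual trimmer classes in Helix.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not the Helix trimmer implementation.
final class TopologyFieldTrimmer {
  private TopologyFieldTrimmer() {}

  /** Returns a copy of the fields that contains only topology-related keys. */
  static Map<String, String> trim(Map<String, String> fields, Set<String> topologyKeys) {
    Map<String, String> trimmed = new HashMap<>();
    for (Map.Entry<String, String> entry : fields.entrySet()) {
      if (topologyKeys.contains(entry.getKey())) {
        trimmed.put(entry.getKey(), entry.getValue());
      }
    }
    return trimmed;
  }

  /** A change matters only if the trimmed (topology-only) views differ. */
  static boolean needsGlobalRebalance(Map<String, String> oldFields,
      Map<String, String> newFields, Set<String> topologyKeys) {
    return !trim(oldFields, topologyKeys).equals(trim(newFields, topologyKeys));
  }
}
```

Under this scheme, editing a field outside the topology key set leaves the trimmed views equal, so only the limited rebalance path would be taken.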
Hunter Lee [Tue, 9 Jun 2020 19:30:36 +0000 (12:30 -0700)]
Deprecate Raw ZkClient in helix-core (#1070)
We have moved the raw ZkClient to zookeeper-api. As such, we need to deprecate the old one in helix-core. We are leaving the class for backward-compatibility purposes.
Ali Reza Zamani Zadeh Najari [Fri, 5 Jun 2020 22:27:58 +0000 (15:27 -0700)]
Avoid adding JobConfig if queue has reached its capacity limit (#1064)
In this commit, the necessary check has been added in TaskDriver side
to fail creation of JobConfig if the queue has reached its capacity limit.
Meng Zhang [Fri, 5 Jun 2020 17:34:31 +0000 (10:34 -0700)]
Add path exists check for customized state (#1033)
Remove stack trace when customized state root does not exist and throws no node exception.
Change logging in Callback handler to be parameterized logging.
Junkai Xue [Fri, 5 Jun 2020 03:43:35 +0000 (20:43 -0700)]
Fix rest tests with prefix
Junkai Xue [Thu, 4 Jun 2020 03:57:28 +0000 (20:57 -0700)]
Change the error message to be meaningful naming (#1041)
xyuanlu [Wed, 3 Jun 2020 21:18:59 +0000 (14:18 -0700)]
Add more accurate error message for resetPartition (#1007)
This commit adds more specific failure-reason messages for resetPartition. It also adds a unit test for resetPartition.
Co-authored-by: Xiaoyuan Lu <xialu@xialu-mn1.linkedin.biz>
Huizhi Lu [Wed, 3 Jun 2020 05:37:21 +0000 (22:37 -0700)]
Fix test testDropInstance (#1053)
Test testDropInstance doesn't assert expected HelixException, so when a live instance is not created, the test still passes. The reason is the test uses a sharedZkClient to create a live instance but fails.
This PR addresses this issue by creating a dedicated zkclient in ZkBaseDataAccessor to create a live instance.
Hunter Lee [Wed, 3 Jun 2020 00:13:56 +0000 (17:13 -0700)]
Add PropertyStore write endpoint to Helix REST (#1049)
This commit adds a write endpoint to Helix REST that allows you to write either a byte array or a ZNRecord to any path under PropertyStore, which is a directory in the cluster metadata in ZK where applications can write custom data.
Molly Gao [Fri, 29 May 2020 21:03:01 +0000 (14:03 -0700)]
Clean up lock module dependencies (#1038)
Hunter Lee [Fri, 29 May 2020 16:20:58 +0000 (09:20 -0700)]
Add getIdealAssignmentForWagedFullAuto in HelixUtil for WAGED rebalancer (#1031)
This commit adds a method, getIdealAssignmentForWagedFullAuto() in HelixUtil that returns to the user the cluster-wide assignment result obtained from running a rebalance using WAGED. The user will be able to use this method to predict how Helix will be rebalancing resources using the WAGED rebalancer.
Huizhi Lu [Wed, 27 May 2020 06:37:40 +0000 (23:37 -0700)]
Fix incorrect name in exception of SharedZkClient (#1024)
There is a typo in the IllegalArgumentException of SharedZkClient which is confusing: DedicatedZkClient should be changed to SharedZkClient.
Huizhi Lu [Wed, 27 May 2020 06:37:16 +0000 (23:37 -0700)]
Fix incorrect exception type and confusing assertion messages (#976)
In TestZNRecordSizeLimit, the assertion message should indicate the data is smaller than 1 MB threshold. HelixException should be ZkMarshallingError because the exception type is changed in ZNRecord serializer.
Meng Zhang [Tue, 26 May 2020 21:50:02 +0000 (14:50 -0700)]
Remove commented out code in participant manager (#1029)
Ali Reza Zamani Zadeh Najari [Tue, 26 May 2020 17:12:54 +0000 (10:12 -0700)]
Fix adding same job multiple times to RuntimeJobDag so parallelJobs config will be honored (#1006)
In this commit, a fix has been implemented to prevent
_readyJobList in RuntimeJobDag from containing multiple
entries of the same job.
The investigation for this fix was motivated by the observation that JobQueues' parallelJobs config wasn't being honored - it was only processing jobs sequentially one by one. This commit fixes this.
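One way to keep a ready list free of duplicate entries is to track queued job names in a set alongside the queue. The sketch below shows that technique only; the class and its methods are hypothetical and are not the RuntimeJobDag code.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of a duplicate-free ready queue.
final class ReadyJobQueue {
  private final Deque<String> readyJobs = new ArrayDeque<>();
  private final Set<String> queued = new HashSet<>();

  /** Adds the job only if it is not already waiting; returns whether it was added. */
  boolean offer(String jobName) {
    if (!queued.add(jobName)) {
      return false; // already in the ready list
    }
    readyJobs.addLast(jobName);
    return true;
  }

  /** Removes and returns the next ready job, or null if none is ready. */
  String poll() {
    String job = readyJobs.pollFirst();
    if (job != null) {
      queued.remove(job);
    }
    return job;
  }

  int size() {
    return readyJobs.size();
  }
}
```

If the same job were queued several times, it could occupy slots that a parallelism limit intends for distinct jobs, which would be consistent with the sequential behavior described above.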
Huizhi Lu [Sat, 23 May 2020 00:12:18 +0000 (17:12 -0700)]
Fix conflicting MonitorDomainNames in metrics-common and helix-core (#1023)
There are two MonitorDomainNames classes, one in helix-core and one in metrics-common. The one in metrics-common is missing the field Rebalancer and would cause a runtime failure for users if the old class is used at runtime: java.lang.NoSuchFieldError: Rebalancer.
This commit fixes this issue by updating the class MonitorDomainNames in metrics-common and removing the duplicate one in helix-core.
Ali Reza Zamani Zadeh Najari [Fri, 22 May 2020 22:19:36 +0000 (15:19 -0700)]
Add timeout for stoppable post request (#1013)
In this commit, an HTTP timeout has been added for the POST requests,
which are mainly used for REST's stoppable check.
Junkai Xue [Wed, 20 May 2020 20:50:13 +0000 (13:50 -0700)]
Deprecate 1.0.0 release
dasahcc [Wed, 20 May 2020 05:13:15 +0000 (22:13 -0700)]
Fix url link for release note
Hunter Lee [Tue, 19 May 2020 04:49:37 +0000 (21:49 -0700)]
Add null check in close() of ZkConnectionManager (#1016)
Java objects may not have been fully initialized when their callback/interface methods are called. This adds a null check to ensure that an NPE does not happen.
Junkai Xue [Mon, 18 May 2020 19:56:43 +0000 (12:56 -0700)]
Update main page
Jiajun Wang [Fri, 15 May 2020 00:34:22 +0000 (17:34 -0700)]
Enforce result check for data accessors batch get calls to prevent partial batch read. (#974)
This will help to ensure the main Helix logic does not calculate based on incomplete input.
Molly Gao [Wed, 13 May 2020 22:44:08 +0000 (15:44 -0700)]
Use updaters to update read messages to ZK (#1002)
Use updaters to update messages with READ state to ZK.
Junkai Xue [Mon, 11 May 2020 23:02:32 +0000 (16:02 -0700)]
Change 0.9.7 release note
Junkai Xue [Mon, 11 May 2020 22:54:52 +0000 (15:54 -0700)]
Add 0.9.7 to pom
Junkai Xue [Mon, 11 May 2020 22:51:27 +0000 (15:51 -0700)]
Change release note to 0.9.7
Junkai Xue [Mon, 11 May 2020 18:25:45 +0000 (11:25 -0700)]
Add 0.9.4, 0.9.5 build
Junkai Xue [Mon, 11 May 2020 18:10:53 +0000 (11:10 -0700)]
Add 0.9.5 release note
Junkai Xue [Mon, 11 May 2020 17:36:28 +0000 (10:36 -0700)]
Update SNAPSHOT to 1.0.1
Junkai Xue [Mon, 11 May 2020 17:31:29 +0000 (10:31 -0700)]
Add 0.9.4 and update main page
Jiajun Wang [Wed, 6 May 2020 22:48:15 +0000 (15:48 -0700)]
Adjust the auto rebalancer state assignment logic to reduce top state transition. (#986)
The old state assignment logic assigns the states to selected nodes according to the priority of the current replica state on each instance. Moreover, the sorting algorithm is designed to prioritize the current top state and the current secondary states equally. The result is a premature mastership handoff to a host currently in a secondary state before the real desired master host is ready.
For example,
1. The current states are: [N1:M, N2:S, N3:S]
2. The desired states are: [N4:M, N2:S, N1:S]
3. Due to the sorting logic based on current states, we will have a transient preference list ordered like: [N2, N1, N4]. In that case, the controller will assign master to N2 before N4 has a slave state replica.
4. When N4 finishes the Offline to Slave transition, the same sorting logic will sort the preference list to be: [N4, N2, N1]. Then we have another mastership handoff.
To be clear, we don't want the handoff in step 3, only the state transition in step 4.
In this PR, we refactor the sorting logic so that it will only move the master whenever the candidate has a "ready" state replica, in which case, only one mastership handoff happens.
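The intended behavior in the example above can be sketched as a small decision function: the master only moves once the desired master already holds a "ready" replica. This is an illustration under assumptions (MASTER and SLAVE both count as ready, and the desired master is the first entry of the desired order); the names are hypothetical and this is not the rebalancer's actual sorting code.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of "move the master only when the candidate is ready".
final class MasterSelector {
  private MasterSelector() {}

  private static boolean isReady(String state) {
    return "MASTER".equals(state) || "SLAVE".equals(state);
  }

  static Optional<String> chooseMaster(List<String> desiredOrder,
      Map<String, String> currentStates) {
    if (desiredOrder.isEmpty()) {
      return Optional.empty();
    }
    // Hand off only if the desired master already has a ready replica.
    String desiredMaster = desiredOrder.get(0);
    if (isReady(currentStates.get(desiredMaster))) {
      return Optional.of(desiredMaster);
    }
    // Otherwise keep the existing master to avoid an extra handoff.
    for (Map.Entry<String, String> entry : currentStates.entrySet()) {
      if ("MASTER".equals(entry.getValue())) {
        return Optional.of(entry.getKey());
      }
    }
    // No master at all: fall back to the first ready replica in desired order.
    for (String instance : desiredOrder) {
      if (isReady(currentStates.get(instance))) {
        return Optional.of(instance);
      }
    }
    return Optional.empty();
  }
}
```

With current states [N1:M, N2:S, N3:S] and desired order [N4, N2, N1], the sketch keeps N1 as master; once N4 reaches the slave state, it selects N4, which gives the single handoff described above.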
Junkai Xue [Tue, 5 May 2020 01:35:26 +0000 (18:35 -0700)]
Disable helix-front
Junkai Xue [Tue, 5 May 2020 00:54:15 +0000 (17:54 -0700)]
Update Ivy file
Junkai Xue [Tue, 5 May 2020 00:19:41 +0000 (17:19 -0700)]
Add release note for 1.0.0
Junkai Xue [Tue, 5 May 2020 00:13:35 +0000 (17:13 -0700)]
Revert "Revert "Add async call retry to resolve the transient ZK connection issue. (#970)""
This reverts commit
370e277966f75a7fba45f5b96f7608c127b2905c.
Junkai Xue [Mon, 4 May 2020 23:13:13 +0000 (16:13 -0700)]
[maven-release-plugin] prepare for next development iteration
Junkai Xue [Mon, 4 May 2020 23:12:53 +0000 (16:12 -0700)]
[maven-release-plugin] prepare release helix-1.0.0
Junkai Xue [Mon, 4 May 2020 21:28:43 +0000 (14:28 -0700)]
Revert "[maven-release-plugin] prepare release helix-1.0.0"
This reverts commit
a71153cfe1e188b0a870a7b113ac3b112cb25a47.
Junkai Xue [Mon, 4 May 2020 20:02:33 +0000 (13:02 -0700)]
Revert "[maven-release-plugin] prepare for next development iteration"
This reverts commit
0fd189e099389ce8859870212f7539124e0b017f.
Junkai Xue [Mon, 4 May 2020 20:02:13 +0000 (13:02 -0700)]
Revert "Add async call retry to resolve the transient ZK connection issue. (#970)"
This reverts commit
96ebb27c23004a7a69dc4799b14586ff82d53c9e.
Hunter Lee [Mon, 4 May 2020 21:28:43 +0000 (14:28 -0700)]
Rearrange zookeeper imports in pom.xml (#995)
This commit makes sure pom.xml is up to date in preparation for the 1.0.X release.
Jiajun Wang [Mon, 4 May 2020 19:36:13 +0000 (12:36 -0700)]
Add async call retry to resolve the transient ZK connection issue. (#970)
If any exceptions happen during the async call, the current design will fail the operation and may eventually return a partial result.
This change makes the ZkClient retry the operation if the error is caused by a temporary ZK connection issue (CONNECTIONLOSS, SESSIONEXPIRED, SESSIONMOVED).
So the async call has a larger chance to finish the operation. Note that if the exception is due to business logic, the async call will still fail and the right return code will be sent to the callback handler.
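A rough sketch of the retry rule described above: retry only on the three transient connection codes and return any other code to the caller unchanged. The enum is a local stand-in rather than ZooKeeper's own result codes, and the attempt limit and the absence of backoff are assumptions; the actual ZkClient logic may differ.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical illustration; not the Helix ZkClient implementation.
final class TransientRetry {
  private TransientRetry() {}

  /** Local stand-in for ZooKeeper result codes. */
  enum ResultCode { OK, CONNECTIONLOSS, SESSIONEXPIRED, SESSIONMOVED, NONODE }

  private static final Set<ResultCode> RETRYABLE = EnumSet.of(
      ResultCode.CONNECTIONLOSS, ResultCode.SESSIONEXPIRED, ResultCode.SESSIONMOVED);

  /**
   * Runs the operation and repeats it while it reports a transient connection
   * code, up to maxAttempts (assumed to be at least 1). Other codes, such as
   * NONODE, are treated as business-logic results and returned immediately.
   */
  static ResultCode callWithRetry(Supplier<ResultCode> operation, int maxAttempts) {
    ResultCode code = operation.get();
    int attempts = 1;
    while (RETRYABLE.contains(code) && attempts < maxAttempts) {
      code = operation.get();
      attempts++;
    }
    return code;
  }
}
```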
Junkai Xue [Mon, 4 May 2020 19:32:57 +0000 (12:32 -0700)]
[maven-release-plugin] prepare for next development iteration
Junkai Xue [Mon, 4 May 2020 19:23:27 +0000 (12:23 -0700)]
[maven-release-plugin] prepare release helix-1.0.0
Junkai Xue [Mon, 4 May 2020 19:13:16 +0000 (12:13 -0700)]
Enable helix-front for release
Meng Zhang [Mon, 4 May 2020 17:38:23 +0000 (10:38 -0700)]
fix version comparison issue in compatibility check stage (#992)
Ali Reza Zamani Zadeh Najari [Fri, 1 May 2020 00:48:20 +0000 (17:48 -0700)]
Stabilizing 4 flaky tests (#981)
Four tests have been stabilized in this commit.
These tests are:
1-TestJobFailure
2-TestRebalanceRunningTask
3-TestTaskRebalancerStopResume
4-TestTaskSchedulingTwoCurrentStates
TestJobFailure was unstable because we get the ExternalView of a resource, and if the ExternalView has not yet been populated by the controller, we hit a NullPointerException.
TestRebalanceRunningTask was unstable. In this PR, we make sure that the master has existed on two different nodes (the master is switched to a new instance) and then we check the assigned participants.
TestTaskRebalancerStopResume was unstable because of Thread.sleep usage. Instead of stopping the workflow after some time, we first make sure that the workflow and job are IN_PROGRESS and then stop the workflow.
TestTaskSchedulingTwoCurrentStates has been stabilized by making sure that the master has been switched to the new instance after modifying the IS. After that, we make sure that the task is assigned to the correct instance, that it is not switched to the new instance, and that cancel is not called incorrectly.
Junkai Xue [Wed, 29 Apr 2020 00:42:21 +0000 (17:42 -0700)]
Add 1.0.0 release folder and release notes
Huizhi Lu [Fri, 24 Apr 2020 23:46:58 +0000 (16:46 -0700)]
Fix failed tests in helix-rest (#966)
In the tests TestZkRoutingDataWriter and TestZkRoutingDataReader, the zkClient tries to read/write ZNRecords; however, the zkClient's serializer is not a ZNRecordSerializer but a BasicZkSerializer. So when a ZNRecord is read or written, a ZkMarshallingError is thrown and causes the tests to fail.
Hunter Lee [Fri, 24 Apr 2020 22:40:09 +0000 (15:40 -0700)]
Improve logging for isClusterSetup (#968)
If isClusterSetup fails, we don't get the errorMsg. This PR improves that behavior by making it a warn log.
Molly Gao [Thu, 23 Apr 2020 19:34:57 +0000 (12:34 -0700)]
Change DistributedLock interface APIs (#961)
Change DistributedLock interface API names to follow Java convention
Molly Gao [Wed, 19 Feb 2020 01:58:19 +0000 (17:58 -0800)]
Rename interface HelixLock to DistributedLock
Molly Gao [Tue, 18 Feb 2020 22:21:15 +0000 (14:21 -0800)]
Remove dependency of LockInfo on HelixProperty
Molly Gao [Sat, 15 Feb 2020 02:51:48 +0000 (18:51 -0800)]
Clean up code
Molly Gao [Fri, 14 Feb 2020 21:28:07 +0000 (13:28 -0800)]
Created LockScope interface
Molly Gao [Tue, 11 Feb 2020 02:31:26 +0000 (18:31 -0800)]
refactor LockInfo and some updates on the HelixLockScope
Molly Gao [Fri, 7 Feb 2020 22:53:03 +0000 (14:53 -0800)]
Added cluster level to HelixLockScope and convert lock path to uppercase
Molly Gao [Fri, 7 Feb 2020 04:40:50 +0000 (20:40 -0800)]
simplified acquireLock logic
Molly Gao [Thu, 6 Feb 2020 23:46:12 +0000 (15:46 -0800)]
Fixed lock path generation
Molly Gao [Thu, 6 Feb 2020 20:02:00 +0000 (12:02 -0800)]
Changed method doc for releaseLock in HelixLock interface
Molly Gao [Thu, 6 Feb 2020 02:07:00 +0000 (18:07 -0800)]
A few fixes on syntax