Tag: rel/0.12.0
John Sirois [Mon, 8 Feb 2016 22:42:44 +0000 (15:42 -0700)] 
Updating .auroraversion to release version 0.12.0.

John Sirois [Mon, 8 Feb 2016 22:42:22 +0000 (15:42 -0700)] 
Fixup release script tag step.

John Sirois [Fri, 5 Feb 2016 22:12:18 +0000 (15:12 -0700)] 
Updating .auroraversion to 0.12.0-rc4.

John Sirois [Fri, 5 Feb 2016 22:12:18 +0000 (15:12 -0700)] 
Incrementing snapshot version to 0.13.0-SNAPSHOT.

John Sirois [Fri, 5 Feb 2016 22:12:18 +0000 (15:12 -0700)] 
Updating CHANGELOG for 0.12.0 release.

Maxim Khutornenko [Fri, 5 Feb 2016 21:55:26 +0000 (13:55 -0800)] 
Backfilling JobConfiguration.Identity

Bugs closed: AURORA-1610

Reviewed at

John Sirois [Fri, 5 Feb 2016 20:00:46 +0000 (13:00 -0700)] 
Reset .auroraversion and CHANGELOG in prep for 0.12.0-rc4.

John Sirois [Fri, 5 Feb 2016 19:06:23 +0000 (12:06 -0700)] 
Incrementing snapshot version to 0.13.0-SNAPSHOT.

John Sirois [Fri, 5 Feb 2016 19:06:23 +0000 (12:06 -0700)] 
Updating CHANGELOG for 0.12.0 release.

John Sirois [Fri, 5 Feb 2016 18:53:29 +0000 (11:53 -0700)] 
Manual prep for 0.12.0-rc3 release.

Reset .auroraversion to 0.12.0-SNAPSHOT and revert to 'Aurora 0.11.0'

John Sirois [Fri, 5 Feb 2016 15:58:10 +0000 (08:58 -0700)] 
Add failed result email protocol.

Hints of this protocol exist down in step 6 when a release succeeds,
but this places the failure action in-line in the step process to make
it more likely the reader does the right thing.

Also kill an incorrect instruction to send the successful release vote
result email to the private@ list.

Testing Done:
I have no clue if the instructions and provided example link are correct.
I did find variation when reading past [RESULT][VOTE] failures, so
guidance on what is required vs what is personal flair is appreciated.

Rendered here:

Reviewed at

Maxim Khutornenko [Thu, 4 Feb 2016 23:18:38 +0000 (15:18 -0800)] 
Add deprecated field storage backfill

Bugs closed: AURORA-1603

Reviewed at

Zameer Manji [Thu, 4 Feb 2016 18:37:07 +0000 (10:37 -0800)] 
Remove unused <result> entry in TaskMapper.

The property `taskConfigRowId` doesn't exist on `DbScheduledTask` so this line
has no use.

Testing Done:
./gradlew test

Reviewed at

Zameer Manji [Wed, 3 Feb 2016 22:28:12 +0000 (14:28 -0800)] 
Expose MyBatis PoolState via stats.

To better understand the MyBatis connection pool, this patch exposes the pool
state via stats.

Reviewed at

Joshua Cohen [Wed, 3 Feb 2016 01:40:26 +0000 (19:40 -0600)] 
Add header to allow bypassing the LeaderRedirectFilter.

Bugs closed: AURORA-1601

Reviewed at

Zhitao Li [Tue, 2 Feb 2016 23:28:25 +0000 (15:28 -0800)] 
Make --announcer-enable optional no-op instead of removing it completely.

Reviewed at

Stephan Erb [Tue, 2 Feb 2016 21:34:20 +0000 (14:34 -0700)] 
Reorganize NEWS into updates and deprecations

I've split all releases with additions and deprecations into two sections. This should make it much easier to track past deprecations.

Reviewed at

Stephan Erb [Tue, 2 Feb 2016 20:55:39 +0000 (12:55 -0800)] 
Map Aurora task metadata to Mesos task labels.

Bugs closed: AURORA-1052

Reviewed at

John Sirois [Tue, 2 Feb 2016 19:30:40 +0000 (12:30 -0700)] 
Upgrade to pants 0.0.70.

This bumps us to last week's regular weekly release.
The changelog is here:

No changes of note directly impacting Aurora, just keeping up
with the release train.

Testing Done:
Locally green:

Reviewed at

Maxim Khutornenko [Tue, 2 Feb 2016 19:07:08 +0000 (11:07 -0800)] 
Reverting deprecated field removal patches.

This reverts commit e1b55fa544765c12251ce6c1736e6352da3f7edb.

This reverts commit 89fad5a8895482b6c3fa45356137aa250d766dfe.

Bugs closed: AURORA-1603

Reviewed at

Maxim Khutornenko [Tue, 2 Feb 2016 04:31:28 +0000 (20:31 -0800)] 
Fixing duplicate instances in the UI.

Bugs closed: AURORA-1604

Reviewed at

Zameer Manji [Mon, 1 Feb 2016 22:48:51 +0000 (14:48 -0800)] 
Add a flag to configure H2 LOCK_TIMEOUT.

Bugs closed: AURORA-1596

Reviewed at

Benjamin Staffin [Mon, 1 Feb 2016 22:24:13 +0000 (14:24 -0800)] 
Improve --read-json to handle multi-job files

Still handles the old --read-json behavior of expecting a single job,
but adds the ability to read files with a {"jobs": [job1, job2, ...]}
schema like the pystachio format.
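A minimal sketch of the two accepted shapes, assuming only that a multi-job file is a JSON object with a top-level "jobs" list and that any other JSON object is a single legacy job. `load_jobs` is a hypothetical helper written in Python (the client's language) for illustration; it is not the Aurora client's loader, which per this commit also goes through the pystachio schema.

```python
import json
from typing import Any, Dict, List


def load_jobs(text: str) -> List[Dict[str, Any]]:
    """Normalize a JSON config into a list of job objects (hypothetical helper)."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    if "jobs" in data:
        # Multi-job shape: {"jobs": [job1, job2, ...]}
        jobs = data["jobs"]
        if not isinstance(jobs, list):
            raise ValueError('"jobs" must be a list')
        return jobs
    # Legacy shape: the whole document describes exactly one job.
    return [data]
```

Under this simple rule a single job that defined its own `jobs` key would be misread as a multi-job file; the real loader may resolve that case differently.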

Also adds --read-json to the `aurora config load` command, as it is
now useful there.

JSON configs are now loaded in a way that is much closer to the
pystachio one, so the config loader will no longer ignore unknown

Bugs closed: AURORA-1577

Reviewed at

Stephan Erb [Mon, 1 Feb 2016 22:16:07 +0000 (14:16 -0800)] 
Allow dots and hyphens in metric names.

This will make sure we won't warn about invalid stat names for valid job identifiers.
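To illustrate the kind of check being relaxed, here is a hypothetical before/after validator. Both regular expressions are assumptions chosen for the example; the scheduler's actual stat-name rule is not shown in this log.

```python
import re

# Hypothetical patterns for illustration; the scheduler's real rule may differ.
OLD_STAT_NAME = re.compile(r"[A-Za-z0-9_]+")     # letters, digits, underscore only
NEW_STAT_NAME = re.compile(r"[A-Za-z0-9_.\-]+")  # additionally allows '.' and '-'


def is_valid_stat_name(name: str, pattern: re.Pattern = NEW_STAT_NAME) -> bool:
    """Return True if the whole name matches the given pattern."""
    return pattern.fullmatch(name) is not None
```

With the relaxed pattern, a name built from a job identifier such as `www-data.devel.hello_world` no longer trips the warning.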

Bugs closed: AURORA-1282

Reviewed at

Zameer Manji [Mon, 1 Feb 2016 22:08:16 +0000 (14:08 -0800)] 
Bump virtualenv version for in-repo tools.

Reviewed at

Zameer Manji [Mon, 1 Feb 2016 21:59:46 +0000 (13:59 -0800)] 
Enable ping query to prevent use of invalid pooled connections.

Bugs closed: AURORA-1596

Reviewed at

Benjamin Staffin [Sat, 30 Jan 2016 02:35:41 +0000 (18:35 -0800)] 
Add Fitbit to the Aurora adopters list

Reviewed at

John Sirois [Thu, 28 Jan 2016 23:51:05 +0000 (16:51 -0700)] 
Incrementing snapshot version to 0.12.1-SNAPSHOT.

John Sirois [Thu, 28 Jan 2016 23:51:05 +0000 (16:51 -0700)] 
Updating CHANGELOG for 0.12.0 release.

John Sirois [Thu, 28 Jan 2016 23:45:18 +0000 (16:45 -0700)] 
Revert "Updating CHANGELOG for 0.12.0 release."

This reverts commit 309ed9968d0aa10d63c66a173c16d7e4f9c552ca.

John Sirois [Thu, 28 Jan 2016 23:44:30 +0000 (16:44 -0700)] 
Revert "Incrementing snapshot version to 0.13.0-SNAPSHOT."

This reverts commit 81722a9e700641f5b435e97c1cbb38d7eed4e98c.

John Sirois [Thu, 28 Jan 2016 23:42:11 +0000 (16:42 -0700)] 
Revert "Updating CHANGELOG for 0.12.0 release."

This reverts commit d34609a2e24b434701347542b9328581acfd829b.

John Sirois [Thu, 28 Jan 2016 23:42:08 +0000 (16:42 -0700)] 
Revert "Incrementing snapshot version to 0.12.1-SNAPSHOT."

This reverts commit 131771c1b1517c0290739d389ad0d504da1dd12e.

John Sirois [Thu, 28 Jan 2016 23:34:56 +0000 (16:34 -0700)] 
Incrementing snapshot version to 0.12.1-SNAPSHOT.

John Sirois [Thu, 28 Jan 2016 23:34:56 +0000 (16:34 -0700)] 
Updating CHANGELOG for 0.12.0 release.

John Sirois [Thu, 28 Jan 2016 23:26:00 +0000 (16:26 -0700)] 
Revert "Updating CHANGELOG for 0.13.0 release."

This reverts commit 38e8237fe91e4fa74cf563a88330571eaf359424.

John Sirois [Thu, 28 Jan 2016 23:25:59 +0000 (16:25 -0700)] 
Revert "Incrementing snapshot version to 0.14.0-SNAPSHOT."

This reverts commit bfca7ae7e0138fe4facd256217e1166b605f97ce.

John Sirois [Thu, 28 Jan 2016 23:25:57 +0000 (16:25 -0700)] 
Revert "Updating CHANGELOG for 0.14.0 release."

This reverts commit 34c676d96fd909283dcb1be79424887b428fe73e.

John Sirois [Thu, 28 Jan 2016 23:25:53 +0000 (16:25 -0700)] 
Revert "Incrementing snapshot version to 0.14.1-SNAPSHOT."

This reverts commit 2365083e1d340f51724caf500380c71a2b0104b3.

John Sirois [Thu, 28 Jan 2016 22:28:22 +0000 (15:28 -0700)] 
Incrementing snapshot version to 0.14.1-SNAPSHOT.

John Sirois [Thu, 28 Jan 2016 22:28:22 +0000 (15:28 -0700)] 
Updating CHANGELOG for 0.14.0 release.

John Sirois [Thu, 28 Jan 2016 22:25:21 +0000 (15:25 -0700)] 
Incrementing snapshot version to 0.14.0-SNAPSHOT.

John Sirois [Thu, 28 Jan 2016 22:25:21 +0000 (15:25 -0700)] 
Updating CHANGELOG for 0.13.0 release.

Joshua Cohen [Thu, 28 Jan 2016 22:19:14 +0000 (16:19 -0600)] 
Revert "Improving job update query performance."

This reverts commit fee5943a95c4f08e148dc5f1366486a8c23d5773.

While deploying this commit we discovered a bug that corrupted the update store.

Reviewed at

John Sirois [Thu, 28 Jan 2016 20:29:01 +0000 (13:29 -0700)] 
Fixup RC VOTE email instructions.

Testing Done:

Reviewed at

John Sirois [Thu, 28 Jan 2016 13:50:34 +0000 (06:50 -0700)] 
Fixup RC email template tag URL.

John Sirois [Thu, 28 Jan 2016 05:29:51 +0000 (22:29 -0700)] 
Incrementing snapshot version to 0.13.0-SNAPSHOT.

John Sirois [Thu, 28 Jan 2016 05:29:51 +0000 (22:29 -0700)] 
Updating CHANGELOG for 0.12.0 release.

John Sirois [Thu, 28 Jan 2016 05:29:34 +0000 (22:29 -0700)] 
Fixup release-candidate script.

Previously the svn add of the dist artifacts failed with:
Publishing release candidate to
Committing transaction...
Committed revision 12061.
Checked out revision 12061.
svn: E155007: /home/jsirois/dev/3rdparty/aurora-origin is not a working copy
ERROR: Looks like something has failed while creating the release candidate.

Testing Done:
Tested with a slightly different diff - this successfully published to svn (since removed):
$ git diff
diff --git a/build-support/release/release-candidate b/build-support/release/release-candidate
index 78e9a4f..91261c0 100755
--- a/build-support/release/release-candidate
+++ b/build-support/release/release-candidate
@@ -93,7 +93,7 @@ git fetch --tags -q
 # Verify that this is a clean repository
 if [[ -n "`git status --porcelain`" ]]; then
   echo "ERROR: Please run from a clean git repository."
-  exit 1
+#  exit 1
 elif [[ "`git rev-parse --abbrev-ref HEAD`" != "master" ]]; then
   echo "ERROR: This script must be run from master."
   exit 1
@@ -219,8 +219,11 @@ if [[ $publish == 1 ]]; then
   echo "Publishing release candidate to ${aurora_svn_rc_url}"
   svn mkdir ${aurora_svn_rc_url} -m "aurora-${current_version} release candidate ${rc_version_tag}"
   svn co --depth=empty ${aurora_svn_rc_url} ${dist_dir}
+  pushd ${dist_dir}
   svn add ${dist_name}*
   svn ci -m "aurora-${current_version} release candidate ${rc_version_tag}"
+  popd
+  exit 0

   echo "Creating tag ${rc_version_tag}"
   git tag -s ${rc_version_tag} \

Reviewed at

Bill Farner [Thu, 28 Jan 2016 02:23:20 +0000 (18:23 -0800)] 
Remove deprecated fields made redundant by JobKey.

Bugs closed: AURORA-1598

Reviewed at

Maxim Khutornenko [Thu, 28 Jan 2016 01:19:30 +0000 (17:19 -0800)] 
Improving job update query performance.

Bugs closed: AURORA-1600

Reviewed at

Joshua Cohen [Wed, 27 Jan 2016 22:49:38 +0000 (16:49 -0600)] 
Fix stray printf style log replacement token when logging triggered cron jobs.

Reviewed at

Zameer Manji [Wed, 27 Jan 2016 21:52:02 +0000 (13:52 -0800)] 
Enable H2 logging to slf4j.

On a test cluster with DbTaskStore enabled there are several lines in the log
that look like:
2016-01-26 13:07:14 jdbc[15]: exception
There is no other information with these lines. This is a result of setting
`TRACE_LEVEL_SYSTEM_OUT` to `1` for H2. This will print out the error message
but not the associated throwable:

The SLF4J implementation of tracing in H2 does not suffer from this restriction.

Reviewed at

John Sirois [Wed, 27 Jan 2016 20:18:00 +0000 (13:18 -0700)] 
Remove deprecated `HealthCheckConfig` fields.

Remove `endpoint`, `expected_response` and `expected_response_code`
which were all deprecated in Aurora 0.11.0 in favor of the same-named
fields in `HttpHealthChecker`.

This also removes health check validation in the client in favor of
leveraging the pystachio schema.  The one difference this allows for is
an empty string for the `ShellHealthChecker.shell_command`.  Since an
empty string is a valid shell command (equivalent to `true`), this
simplification seems justified.
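The claim that an empty shell command is equivalent to `true` is easy to check, because `sh -c ''` exits with status 0. A small Python check (POSIX only; `shell_exit_code` is a hypothetical helper, not part of Aurora):

```python
import subprocess


def shell_exit_code(command: str) -> int:
    """Run `command` through /bin/sh (POSIX only) and return its exit status."""
    return subprocess.run(command, shell=True).returncode
```

Both `shell_exit_code("")` and `shell_exit_code("true")` report success, so treating the empty string as a valid health check command does not introduce a new failure mode.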

Testing Done:
Locally green:

Bugs closed: AURORA-1552, AURORA-1563

Reviewed at

Maxim Khutornenko [Wed, 27 Jan 2016 08:07:31 +0000 (00:07 -0800)] 
Re-purposing addInstances RPC to act as scaleOut

Bugs closed: AURORA-1258

Reviewed at

Bill Farner [Tue, 26 Jan 2016 19:43:20 +0000 (11:43 -0800)] 
Remove the --announcer-enable executor flag.

Reviewed at

John Sirois [Tue, 26 Jan 2016 18:30:09 +0000 (11:30 -0700)] 
Remove job update `maxWaitToInstanceRunningMs` field.

This field in the thrift api `JobUpdateSettings` struct and its sibling
in `UpdateConfig.restart_threshold` on the client side were deprecated
in Aurora 0.11.0.

Testing Done:
Locally green:

Bugs closed: AURORA-1254

Reviewed at

John Sirois [Tue, 26 Jan 2016 18:29:48 +0000 (11:29 -0700)] 
`TaskHistoryPruner` controls Lifecycle directly.

This was the original idea in

Mixing the active scheduler `Service` lifecycle with the `EventBus`
lifecycle proves tricky - prune events are fired before scheduler active
services are started.  Instead of queueing up prune events to wait for
service start or re-engineering service / event bus interaction, this
returns to the original behavior of manipulating the `Lifecycle` directly.

Also kill a confusing unused EventSink discovered during analysis of all
pub-sub event sourcing that might interact with the `TaskHistoryPruner`.

Testing Done:
Locally green:
./gradlew -Pq build
It's the latter - e2e (krb part) - that was the only automated testing
revealing the problem previously.

Bugs closed: AURORA-1593

Reviewed at

Dmitriy Shirchenko [Mon, 25 Jan 2016 21:02:46 +0000 (13:02 -0800)] 
Fixing reference in table of contents to Docker Object(s).

Reviewed at

John Sirois [Mon, 25 Jan 2016 18:53:08 +0000 (11:53 -0700)] 
Upgrade pants to 0.0.69.

This is the regular weekly release/upgrade.
The CHANGELOG can be read here:

No changes of note for Aurora, this just keeps up with latest to make future
upgrades as small and smooth as possible.

Testing Done:
Locally green:

Reviewed at

Bill Farner [Sat, 23 Jan 2016 01:14:28 +0000 (17:14 -0800)] 
Remove most direct uses of deprecated TaskConfig fields.

Reviewed at

Maxim Khutornenko [Fri, 22 Jan 2016 22:40:27 +0000 (14:40 -0800)] 
Deprecating TaskQuery in killTasks.

Bugs closed: AURORA-1583

Reviewed at

Bill Farner [Fri, 22 Jan 2016 22:15:55 +0000 (14:15 -0800)] 
Remove storage backfill and TaskStore mutateTasks.

Reviewed at

John Sirois [Fri, 22 Jan 2016 21:50:54 +0000 (14:50 -0700)] 
Simplify TaskHistoryPruner tie-in to Lifecycle.

This eliminates processing all futures to find the 1st failed one in
favor of directly signalling a Service failure when a unit of async work

Testing Done:
Locally green: `./gradlew -P build`.

Bugs closed: AURORA-1582

Reviewed at

Bill Farner [Fri, 22 Jan 2016 19:30:04 +0000 (11:30 -0800)] 
Remove scheduler flag -extra_modules.

Reviewed at

Bill Farner [Fri, 22 Jan 2016 07:04:35 +0000 (23:04 -0800)] 
Add storage API methods for fetching and mutating a task by ID.

Reviewed at

Zameer Manji [Fri, 22 Jan 2016 01:38:25 +0000 (17:38 -0800)] 
Turn TaskHistoryPruner into a service and trigger shutdown on pruning failure.

Task pruning is key to operating a large cluster and failure to prune should
trigger shutdown to prevent unbounded growth of storage. This patch turns
`TaskHistoryPruner` into a service which propagates failure from failed pruning
attempts towards the `ServiceManager`. This also completes a TODO by removing a
test for behaviour that is very awkward to test.
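The fail-fast pattern described here can be sketched in Python with `concurrent.futures` as an analogy: a done-callback on each submitted future records the first exception and sets an event that a supervisor could use to begin shutdown, rather than scanning every future afterwards. `FailFastWorker` is hypothetical; the scheduler itself uses Java `Service`/`ServiceManager` machinery that this sketch does not reproduce.

```python
import threading
from concurrent.futures import Executor, Future
from typing import Callable, Optional


class FailFastWorker:
    """Hypothetical service-like wrapper: the first failed async task trips a flag."""

    def __init__(self, executor: Executor) -> None:
        self._executor = executor
        self._lock = threading.Lock()
        self.failed = threading.Event()  # a supervisor can wait on this to start shutdown
        self.failure: Optional[BaseException] = None

    def _on_done(self, future: Future) -> None:
        if future.cancelled():
            return
        exc = future.exception()  # does not raise for a completed future
        if exc is None:
            return
        with self._lock:
            if self.failure is None:  # remember only the first failure
                self.failure = exc
        self.failed.set()

    def submit(self, fn: Callable[[], object]) -> Future:
        future = self._executor.submit(fn)
        future.add_done_callback(self._on_done)
        return future
```

Signalling from the callback means the failure is noticed as soon as that unit of work finishes, independent of how many other futures are still outstanding.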

Bugs closed: AURORA-1582

Reviewed at

Maxim Khutornenko [Thu, 21 Jan 2016 23:06:07 +0000 (15:06 -0800)] 
Allowing dual authorizing params to account for thrift API deprecations.

Also, added missing test coverage.

Reviewed at

Bill Farner [Thu, 21 Jan 2016 22:30:28 +0000 (14:30 -0800)] 
Enable READ COMMITTED transaction isolation.

Bugs closed: AURORA-1580

Reviewed at

George Sirois [Wed, 20 Jan 2016 19:08:11 +0000 (13:08 -0600)] 
Fix broken Thrift benchmark.

Issue introduced with:

Reviewed at

George Sirois [Wed, 20 Jan 2016 18:18:32 +0000 (12:18 -0600)] 
Introduces -default_docker_parameters scheduler flag.

This flag allows cluster administrators to set arbitrary
Docker parameters which will apply to all jobs.

Also cleans up some of the existing unit tests around task config.

Bugs closed: AURORA-1575

Reviewed at

Bill Farner [Wed, 20 Jan 2016 01:47:32 +0000 (17:47 -0800)] 
Revert "Shim interfaces to preface args system overhaul."

This reverts commit fe13e4ed52d4dc0a35f9e50b5e49c6e705f64579.

Reviewed at

Bill Farner [Tue, 19 Jan 2016 22:05:48 +0000 (14:05 -0800)] 
Shim interfaces to preface args system overhaul.

Reviewed at

Zhitao Li [Tue, 19 Jan 2016 21:24:19 +0000 (13:24 -0800)] 
Vagrant change to reserve part of the dev cluster's resources to 'aurora-role'.

Bugs closed: AURORA-1109

Reviewed at

John Sirois [Tue, 19 Jan 2016 21:07:18 +0000 (14:07 -0700)] 
Upgrade pants to 0.0.68.

This is the regular weekly release/upgrade.
The CHANGELOG can be read here:

Of interest for Aurora is graceful error handling when running
py.test with `--coverage` enabled.

Testing Done:
Locally green:

Reviewed at

John Sirois [Tue, 19 Jan 2016 20:14:57 +0000 (13:14 -0700)] 
Fixup broken jmh benchmarks.

The `TierConfig` binding fix for the `SchedulingBenchmarks` in was also needed for the
`StatusUpdateBenchmarks` and a binding for `ConfigurationManager` was
missing for the `ThriftApiBenchmarks` as-of

While debugging these failures I noticed a new version of jmh had been
released recently, so also upgraded to that.  No changes of import, but
the changelog can be read here:

Testing Done:
Both of these now work on master:
`./gradlew jmh -Pbenchmarks='StatusUpdateBenchmark.*'`
`./gradlew jmh -Pbenchmarks='ThriftApiBenchmark.*'`

I also checked that a full run of `./gradlew jmh` was now green for
all benchmarks.

Reviewed at

John Sirois [Tue, 19 Jan 2016 16:32:10 +0000 (09:32 -0700)] 
Make required mesos log args required.

Both -native_log_file_path and -native_log_zk_group_path are required
but they were not validated (-native_log_file_path) and validated too
late in a provider (-native_log_zk_group_path) to provide useful
failure messages.  Correct this and make the arguments required in
the arg parsing phase.
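As a loose analogy in Python's `argparse` (the scheduler's own Java arg parser is not reproduced here), marking an option `required=True` makes a missing flag fail during the parsing phase with a usage error, instead of later inside a provider. The parser below is hypothetical and reuses only the two flag names from this commit.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="scheduler-sketch")
    # required=True makes argparse reject a missing flag while parsing,
    # before any application object is constructed.
    parser.add_argument("-native_log_file_path", required=True)
    parser.add_argument("-native_log_zk_group_path", required=True)
    return parser
```

Omitting either flag makes `parse_args` print a usage message naming the missing argument and exit with status 2.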

Testing Done:
./gradlew clean distZip
unzip -qd /tmp/ dist/distributions/
/tmp/aurora-scheduler-0.12.0-SNAPSHOT/bin/aurora-scheduler \
  -mesos_master_address=localhost:5050 \
  -backup_dir=/tmp \
  -serverset_path=/aurora \
  -cluster_name=test -zk_endpoints=localhost:2181
I0115 20:18:37.890 [main, ArgScanner:443] zk_in_proc (org.apache.aurora.scheduler.zookeeper.guice.client.flagged.FlaggedClientConfig.zk_in_proc): false
I0115 20:18:37.890 [main, ArgScanner:443] zk_session_timeout (org.apache.aurora.scheduler.zookeeper.guice.client.flagged.FlaggedClientConfig.zk_session_timeout): (4, secs)
I0115 20:18:37.890 [main, ArgScanner:445] -------------------------------------------------------------------------
Exception in thread "main" java.lang.IllegalStateException: A value for the -native_log_file_path flag must be supplied
at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.getRequiredArg(
at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.<init>(

Bugs closed: AURORA-1587

Reviewed at

Zameer Manji [Fri, 15 Jan 2016 18:30:54 +0000 (10:30 -0800)] 
Add metric for counting uncaught exceptions in async executor.

Add metric "async_executor_uncaught_exceptions" for tracking uncaught exceptions
in async executor.
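One way to count exceptions that escape submitted tasks is to wrap each task, bump a counter in the `except` clause, and re-raise so callers still see the failure. This Python `CountingExecutor` is a hypothetical sketch of that idea; its `uncaught_exceptions` attribute merely stands in for the `async_executor_uncaught_exceptions` stat and says nothing about how the scheduler exports it.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable


class CountingExecutor:
    """Hypothetical executor wrapper that counts exceptions escaping submitted tasks."""

    def __init__(self, max_workers: int = 2) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._lock = threading.Lock()
        self.uncaught_exceptions = 0  # stand-in for an exported counter stat

    def _run(self, fn: Callable[..., Any], *args: Any) -> Any:
        try:
            return fn(*args)
        except Exception:
            with self._lock:
                self.uncaught_exceptions += 1
            raise  # still surface the error through the Future

    def submit(self, fn: Callable[..., Any], *args: Any) -> Future:
        return self._pool.submit(self._run, fn, *args)

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)
```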

Bugs closed: AURORA-1582

Reviewed at

Amol Deshmukh [Thu, 14 Jan 2016 22:26:46 +0000 (16:26 -0600)] 
Allow for plugging in cli-configurable filters that are invoked post shiro filters.

Bugs closed: AURORA-1576

Reviewed at

Anant Vyas [Thu, 14 Jan 2016 21:25:10 +0000 (14:25 -0700)] 
Fix typo in the user guide about Task Updates

Testing Done:
Fix a minor typo in the user guide and add a missing "and"

Reviewed at

Zhitao Li [Thu, 14 Jan 2016 18:43:32 +0000 (10:43 -0800)] 
Accept resource offers from multiple framework roles.

Bugs closed: AURORA-1109

Reviewed at

Zameer Manji [Wed, 13 Jan 2016 00:07:24 +0000 (16:07 -0800)] 
Add `--show-error` to curl when bootstrapping thrift.

From the curl documentation:
-S, --show-error

When used with -s it makes curl show an error message if it fails.

It's possible for curl to fail when grabbing the tarball or patch and this will
show users why it failed.

Testing Done:
Ran `make` in the `build-support/thrift` directory.

Reviewed at

Bill Farner [Tue, 12 Jan 2016 04:57:49 +0000 (23:57 -0500)] 
Replace scheduler log scaffolding with logback.

Reviewed at

Bill Farner [Tue, 12 Jan 2016 04:06:08 +0000 (23:06 -0500)] 
Use tags instead of branches for release candidates.

Reviewed at

Zameer Manji [Mon, 11 Jan 2016 18:23:06 +0000 (10:23 -0800)] 
Enable H2 query statistics collection.

With this enabled operators can visit the H2 console at /h2console and run
MAX_EXECUTION_TIME DESC;` to diagnose slow schedulers.

Testing Done:
MAX_EXECUTION_TIME DESC;` within vagrant and saw query statistics.


Master (c595228):
Benchmark                                                                     (numPendingTasks)   Mode  Cnt      Score      Error  Units
SchedulingBenchmarks.ClusterFullUtilizationBenchmark.runBenchmark                           N/A  thrpt   10  64138.084 ± 6732.130  ops/s
SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark                  N/A  thrpt   10  23863.861 ± 2101.622  ops/s
SchedulingBenchmarks.LimitConstraintMismatchSchedulingBenchmark.runBenchmark                N/A  thrpt   10   2228.883 ±  311.434  ops/s
SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark                                1  thrpt   10     50.914 ±    2.488  ops/s
SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark                               10  thrpt   10     43.729 ±    3.038  ops/s
SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark                              100  thrpt   10     44.409 ±    4.426  ops/s
SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark                             1000  thrpt   10     40.429 ±    7.526  ops/s
SchedulingBenchmarks.ValueConstraintMismatchSchedulingBenchmark.runBenchmark                N/A  thrpt   10  22942.538 ± 1281.331  ops/s

This change:
Benchmark                                                                     (numPendingTasks)   Mode  Cnt      Score      Error  Units
SchedulingBenchmarks.ClusterFullUtilizationBenchmark.runBenchmark                           N/A  thrpt   10  65285.628 ± 2422.816  ops/s
SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark                  N/A  thrpt   10  24573.332 ± 1332.474  ops/s
SchedulingBenchmarks.LimitConstraintMismatchSchedulingBenchmark.runBenchmark                N/A  thrpt   10   2430.402 ±  258.860  ops/s
SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark                                1  thrpt   10     43.810 ±    2.669  ops/s
SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark                               10  thrpt   10     37.378 ±   14.637  ops/s
SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark                              100  thrpt   10     40.180 ±    9.738  ops/s
SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark                             1000  thrpt   10     24.130 ±   15.746  ops/s
SchedulingBenchmarks.ValueConstraintMismatchSchedulingBenchmark.runBenchmark                N/A  thrpt   10  18429.830 ± 3077.426  ops/s

Reviewed at

Bill Farner [Sun, 10 Jan 2016 19:59:23 +0000 (11:59 -0800)] 
Change release script to use rel/ tag prefix.

Reviewed at

John Sirois [Sat, 9 Jan 2016 18:10:38 +0000 (11:10 -0700)] 
Fix flaky `ServerSetImplTest` test.

The `testUnwatchOnException` test method uses a forced
InterruptedException to test that watches are un-registered.  In doing
so, the test inadvertently set the interrupt bit for the test runner
thread, poisoning subsequent tests that invoked blocking code.  The
poisoning was only evident when the test methods were not run in lexical
order, which is the case in the vagrant vm.  This fix explicitly clears
the interrupt bit for the test thread with an explanation of why this is

Testing Done:
Before the fix, this error occurred consistently in the vagrant VM:
vagrant@aurora:~/aurora$ ./gradlew --rerun-tasks commons:test --tests org.apache.aurora.common.zookeeper.ServerSetImplTest
org.apache.aurora.common.zookeeper.ServerSetImplTest > testOrdering FAILED$MonitorException at
        Caused by: org.apache.aurora.common.zookeeper.Group$WatchException at
            Caused by: org.apache.aurora.common.zookeeper.Group$JoinException at
                Caused by: java.lang.InterruptedException at

Green after the fix in the vm and when run normally on my machine.

Bugs closed: AURORA-1574

Reviewed at

John Sirois [Fri, 8 Jan 2016 19:42:16 +0000 (12:42 -0700)] 
Upgrade to pants 0.0.67.

The CHANGELOG can be read here:

Of note for aurora is an upgrade to pex 1.1.2 which
improves artifact resolution times.

Testing Done:
Locally green: `./build-support/jenkins/`

Reviewed at

Zameer Manji [Fri, 8 Jan 2016 19:16:42 +0000 (11:16 -0800)] 
Bump JMH to 1.11.2.

Bump JMH to the latest available release, which is 1.11.2. There isn't a
CHANGELOG but the commit history shows several bug fixes:

Testing Done:
./gradlew jmh -Pbenchmarks='UpdateStoreBenchmarks.*'

Reviewed at

Zameer Manji [Fri, 8 Jan 2016 18:27:50 +0000 (10:27 -0800)] 
Fix exception thrown in SchedulingBenchmarks set up.

SchedulingBenchmarks were broken because of a missing binding to `TierConfig`
and an invalid parameter to `PreemptorModule`.

Testing Done:
./gradlew jmh -Pbenchmarks='SchedulingBenchmarks.*'

Reviewed at

Martin Hrabovcin [Fri, 8 Jan 2016 16:18:11 +0000 (09:18 -0700)] 
Thermos: Add ability to specify process outputs destination

This patch provides a way to **optionally** specify where a running process's output goes. The implementation was built on top of

**What was changed:**

A new `destination` parameter is available at the global cluster level and on each `Process`. Possible options are `file` (the default), `stream` (forward to the parent process's stdout/stderr), `mixed` (write to files and also stream), and `none` (discard any logs produced by the running process).
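The four `destination` values map naturally onto a list of output sinks. The sketch below is hypothetical (`sinks_for` and `forward` are not Thermos functions) and only illustrates the routing described above, with `file` as the default.

```python
from typing import List, TextIO

DESTINATIONS = ("file", "stream", "mixed", "none")


def sinks_for(log_file: TextIO, parent_stream: TextIO, destination: str = "file") -> List[TextIO]:
    """Pick where a process's output should go for a given destination value."""
    if destination not in DESTINATIONS:
        raise ValueError(f"unknown destination: {destination!r}")
    if destination == "file":
        return [log_file]
    if destination == "stream":
        return [parent_stream]
    if destination == "mixed":
        return [log_file, parent_stream]
    return []  # "none": discard output


def forward(chunk: str, sinks: List[TextIO]) -> None:
    """Write one chunk of process output to every selected sink."""
    for sink in sinks:
        sink.write(chunk)
```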

Testing Done:
Unit test coverage is provided for new functionality.

I also tested manually with mesos/docker and confirmed that logs are written to the expected files and that the same output reaches the docker daemon.

Bugs closed: AURORA-1548

Reviewed at

John Sirois [Thu, 7 Jan 2016 21:15:38 +0000 (14:15 -0700)] 
Adding gpg key for

Reviewed at

Bill Farner [Thu, 7 Jan 2016 05:54:19 +0000 (21:54 -0800)] 
Amend install instructions to cover dependency missing from mesos deb.

Reviewed at

Stephan Erb [Sun, 3 Jan 2016 20:37:28 +0000 (21:37 +0100)] 
Update and slightly extend the beginner tutorial

Reviewed at

Bill Farner [Thu, 7 Jan 2016 05:15:01 +0000 (21:15 -0800)] 
Add NEWS entry for "Allow custom announce path."

Kunal Thakar [Thu, 7 Jan 2016 05:07:42 +0000 (21:07 -0800)] 
Allow custom announce path.

Bugs closed: AURORA-1569

Reviewed at

Maxim Khutornenko [Wed, 6 Jan 2016 17:39:15 +0000 (09:39 -0800)] 
Populating and validating task config in getJobUpdateDiff RPC.

Bugs closed: AURORA-1571

Reviewed at

John Sirois [Tue, 5 Jan 2016 17:54:16 +0000 (09:54 -0800)] 
Kill flaky TaskObserverTest.

Previously, a mock threading.Event was waited on in one thread
and the count of waits was read in another thread.  Most thread
memory models do not guarantee that reads are fresh in this scenario
unless there is a memory barrier of some sort forcing per-cpu
caches to be flushed.

Since the test really only verified correct conversion of a poll
interval to fractional seconds - kill the test as not pulling its

Bugs closed: AURORA-1570

Reviewed at

John Sirois [Tue, 5 Jan 2016 17:52:34 +0000 (09:52 -0800)] 
Avoid zk 3.4.7 to fix test hangs.

The commons tests hang under CI after bumping from zk 3.4.2 to 3.4.7.
Although not root-caused, this zk bug introduced in 3.4.7 seems like a
match for this sort of hang:

Downgrade to 3.4.6 with a note about why 3.4.7 should be skipped.

Reviewed at

John Sirois [Tue, 5 Jan 2016 16:55:13 +0000 (08:55 -0800)] 
Upgrade to pants 0.0.66.

The changelog can be read here:

Of note for aurora is the ability to kill

Additionally, add the pants generated .pids/ dir to .gitignore and
normalize all directory ignores to omit the redundant *.

Reviewed at