Bill Farner [Wed, 8 Nov 2017 04:47:30 +0000 (20:47 -0800)]
Updating .auroraversion to 0.19.0-rc0.
Bill Farner [Wed, 8 Nov 2017 04:47:30 +0000 (20:47 -0800)]
Incrementing snapshot version to 0.20.0-SNAPSHOT.
Bill Farner [Wed, 8 Nov 2017 04:47:30 +0000 (20:47 -0800)]
Updating CHANGELOG for 0.19.0 release.
Bill Farner [Wed, 8 Nov 2017 04:47:04 +0000 (20:47 -0800)]
Update release notes in preparation for 0.19.0 release
Bill Farner [Wed, 8 Nov 2017 02:45:05 +0000 (18:45 -0800)]
Use a pair of fields for caching offer resources rather than a Cache
Reviewed at https://reviews.apache.org/r/63454/
David McLaughlin [Wed, 8 Nov 2017 00:22:08 +0000 (16:22 -0800)]
Display pending task reasons in TaskList
Reviewed at https://reviews.apache.org/r/63650/
David McLaughlin [Wed, 8 Nov 2017 00:13:02 +0000 (16:13 -0800)]
Don't show host data when task is Throttled.
PENDING and THROTTLED tasks are considered active, but they don't have hosts. This manifested as "null" host links.
Reviewed at https://reviews.apache.org/r/63648/
Reza Motamedi [Wed, 8 Nov 2017 00:08:13 +0000 (16:08 -0800)]
Poll the updates page in the UI while an update is in progress
Reviewed at https://reviews.apache.org/r/63337/
Stephan Erb [Tue, 7 Nov 2017 07:26:58 +0000 (08:26 +0100)]
Migrate from findbugs to spotbugs
Findbugs [1] is no longer developed and has been superseded by spotbugs [2],
which is mostly a drop-in replacement.
[1] https://github.com/findbugsproject/findbugs
[2] https://mailman.cs.umd.edu/pipermail/findbugs-discuss/2017-September/004383.html
Reviewed at https://reviews.apache.org/r/63564/
Jordan Ly [Thu, 2 Nov 2017 21:49:10 +0000 (14:49 -0700)]
Fixed issue where saved host attributes were not being persisted to the log
A bug was introduced when the old `MemAttributeStore` was revived. Previously,
the `saveHostAttributes` method did not return anything. However, after
migrating to the DB stores, the signature of the interface was changed to return
a `boolean` indicating whether the save modified the previous attributes. The new
changes accidentally inverted the order. The `AbstractAttributeStoreTest` did not
test for this scenario, so it went unnoticed.
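For illustration, a minimal sketch of the intended `saveHostAttributes` contract, assuming a simplified map-backed store in which a host's attributes are a plain `Map<String, String>`. `MemAttributeStoreSketch` is hypothetical and is not the real `MemAttributeStore`; only the method name follows the message above. The save reports `true` exactly when the stored state changed, which a caller that persists only on change presumably relies on.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical, simplified stand-in for a map-based attribute store.
// Real Aurora attribute types are richer than Map<String, String>.
class MemAttributeStoreSketch {
  private final Map<String, Map<String, String>> attributesByHost = new HashMap<>();

  /**
   * Stores the attributes for a host.
   *
   * @return true if the save changed the stored state (new host or different
   *     attributes), false if identical attributes were already stored.
   */
  boolean saveHostAttributes(String host, Map<String, String> attributes) {
    Map<String, String> copy = new HashMap<>(attributes);
    // Compare against the value returned by put(), i.e. the previous value.
    // Storing first and then comparing the map with itself would always
    // report "unchanged" and make a real change look like a no-op.
    Map<String, String> previous = attributesByHost.put(host, copy);
    return !Objects.equals(previous, copy);
  }
}
```

A test that saves changed attributes and checks the returned value is enough to catch a regression in either direction.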
Reviewed at https://reviews.apache.org/r/63521/
Stephan Erb [Thu, 2 Nov 2017 11:01:40 +0000 (12:01 +0100)]
Terminate the executor on unhandled errors
This commit consists of two independent parts:
a) ensure we interrupt the main thread when there are unhandled exceptions
b) ensure the main thread of the executor can be interrupted
Testing Done:
This bug is pretty hard to reproduce and test. I therefore opted for a manual
verification and injected a thrown exception shortly before the last statement
of the `AuroraExecutor._shutdown` method. Without this patch, this resulted in
hanging executors on the host. With this patch, everything is terminated as
expected.
For details of the successful run, please see the executor logs below. Please
note that the `apport.fileutils` ImportError is caused by Ubuntu's modifications
to its Python installation. It is not critical.
```
twitter.common.app debug: Initializing: apache.thermos.common.excepthook (Exception termination handler.)
I1031 15:59:37.188621 25437 exec.cpp:162] Version: 1.2.0
I1031 15:59:37.192201 25429 exec.cpp:237] Executor registered on agent 93259518-14f4-4956-a39c-aa615bff9a5e-S0
Writing log files to disk in /var/lib/mesos/slaves/93259518-14f4-4956-a39c-aa615bff9a5e-S0/frameworks/7b202c2e-8796-4f27-afeb-8b76ba4b3037-0000/executors/thermos-www-data-prod-hello-0-d8d50c2f-e79b-467d-8c65-cca3cb44cf9c/runs/54a5ed51-aa9b-476f-9f75-0b42bd6dfa8d
ERROR] Unhandled error in <StatusManager(Thread-7 [TID=25450], started daemon 139968452134656)>. Interrupting main thread.
Traceback (most recent call last):
  File "/root/.pex/install/twitter.common.exceptions-0.3.7-py2-none-any.whl.f6376bcca9bfda5eba4396de2676af5dfe36237d/twitter.common.exceptions-0.3.7-py2-none-any.whl/twitter/common/exceptions/__init__.py", line 126, in _excepting_run
    self.__real_run(*args, **kw)
  File "apache/aurora/executor/status_manager.py", line 62, in run
  File "apache/aurora/executor/aurora_executor.py", line 236, in _shutdown
RuntimeError: Woops!
Exception in thread Thread-7 [TID=25450]:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/root/.pex/install/twitter.common.decorators-0.3.7-py2-none-any.whl.b23f2874a4392741fca582d9e0528c08e0335c68/twitter.common.decorators-0.3.7-py2-none-any.whl/twitter/common/decorators/threads.py", line 115, in identified
    return instancemethod(self, *args, **kwargs)
  File "/root/.pex/install/twitter.common.exceptions-0.3.7-py2-none-any.whl.f6376bcca9bfda5eba4396de2676af5dfe36237d/twitter.common.exceptions-0.3.7-py2-none-any.whl/twitter/common/exceptions/__init__.py", line 130, in _excepting_run
    sys.excepthook(*sys.exc_info())
  File "apache/thermos/common/excepthook.py", line 41, in teardown_handler
    self._former_hook()(exc_type, value, trace)
  File "/usr/lib/python2.7/dist-packages/apport_python_hook.py", line 63, in apport_excepthook
    from apport.fileutils import likely_packaged, get_recent_crashes
ImportError: No module named apport.fileutils
twitter.common.app debug: main exited with ^C
twitter.common.app debug: Shutting application down.
twitter.common.app debug: Running exit function for apache.thermos.common.excepthook (Exception termination handler.)
twitter.common.app debug: Running exit function for twitter.common.log (Logging subsystem.)
twitter.common.app debug: Finishing up module teardown.
twitter.common.app debug: Active thread: <_MainThread(MainThread, started 139968622749504)>
twitter.common.app debug: Active thread (daemon): <TaskResourceMonitor(TaskResourceMonitor[www-data-prod-hello-0-d8d50c2f-e79b-467d-8c65-cca3cb44cf9c] [TID=25449], started daemon 139967951009536)>
twitter.common.app debug: Active thread (daemon): <_DummyThread(Dummy-13, started daemon 139968485705472)>
twitter.common.app debug: Active thread (daemon): <WaitThread(Thread-9, started daemon 139967934224128)>
twitter.common.app debug: Active thread (daemon): <WaitThread(Thread-12, started daemon 139967942616832)>
twitter.common.app debug: Active thread (daemon): <_DummyThread(Dummy-3, started daemon 139968510883584)>
twitter.common.app debug: Active thread (daemon): <WaitThread(Thread-11, started daemon 139967925831424)>
twitter.common.app debug: Exiting cleanly.
```
Corresponding agent logs, indicating that Mesos knows about the crash on teardown:
```
I1031 15:59:54.692739 1956 slave.cpp:4769] Executor 'thermos-www-data-prod-hello-0-d8d50c2f-e79b-467d-8c65-cca3cb44cf9c' of framework 7b202c2e-8796-4f27-afeb-8b76ba4b3037-0000 exited with status 130
I1031 15:59:54.692834 1956 slave.cpp:4869] Cleaning up executor 'thermos-www-data-prod-hello-0-d8d50c2f-e79b-467d-8c65-cca3cb44cf9c' of framework 7b202c2e-8796-4f27-afeb-8b76ba4b3037-0000 at executor(1)@192.168.33.7:48931
I1031 15:59:54.692996 1956 slave.cpp:4957] Cleaning up framework 7b202c2e-8796-4f27-afeb-8b76ba4b3037-0000
```
Bugs closed: AURORA-1955
Reviewed at https://reviews.apache.org/r/63443/
Jordan Ly [Tue, 31 Oct 2017 17:20:27 +0000 (10:20 -0700)]
Refactor staticallyBannedOffers into an LRU cache
When the new `hold_offers_forever` option is used, `staticallyBannedOffers`
can grow very large, since offers are never released.
1. The current behavior of `staticallyBannedOffers` is (mostly) preserved.
Entries will no longer be removed when the offer is used, but they will be
removed within `maxOfferHoldTime`. This means cluster operators will not
have to think about the new `offer_static_ban_cache_max_size` unless they
are currently affected by the memory leak.
2. Cluster operators that use Aurora as a single framework and hold offers
indefinitely can cap the size of the cache to avoid the memory leak.
3. Using an LRU cache greatly benefits quickly recurring crons and job updates.
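A minimal sketch of a size-capped LRU with write-based expiry, built only on `java.util.LinkedHashMap` in access order. The class name, the key type and the injectable clock are hypothetical; the actual change may well use a library cache instead (Guava's `CacheBuilder` offers `maximumSize` and `expireAfterWrite`). Here `maxSize` and `expireAfterMillis` merely stand in for `offer_static_ban_cache_max_size` and `maxOfferHoldTime`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical sketch: a bounded set of banned keys (e.g. offer/task-group
// pairs) that evicts the least-recently-used entry once maxSize is exceeded
// and treats entries older than expireAfterMillis as no longer banned.
class StaticBanCacheSketch<K> {
  private final long expireAfterMillis;
  private final LongSupplier clock;
  private final LinkedHashMap<K, Long> bannedAt;

  StaticBanCacheSketch(int maxSize, long expireAfterMillis, LongSupplier clock) {
    this.expireAfterMillis = expireAfterMillis;
    this.clock = clock;
    // accessOrder = true: iteration order runs from least to most recently used.
    this.bannedAt = new LinkedHashMap<K, Long>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<K, Long> eldest) {
        // Called after each put(); drop the LRU entry when over capacity.
        return size() > maxSize;
      }
    };
  }

  void ban(K key) {
    bannedAt.put(key, clock.getAsLong());
  }

  boolean isBanned(K key) {
    Long at = bannedAt.get(key); // a hit also refreshes the LRU position
    if (at == null) {
      return false;
    }
    if (clock.getAsLong() - at >= expireAfterMillis) {
      bannedAt.remove(key); // expired: behave as if it was never banned
      return false;
    }
    return true;
  }

  int size() {
    return bannedAt.size();
  }
}
```

Bounding the size addresses the unbounded growth under `hold_offers_forever`, while the expiry keeps roughly the old "released within the hold time" behavior for everyone else.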
Reviewed at https://reviews.apache.org/r/63199/
Stephan Erb [Tue, 31 Oct 2017 16:24:04 +0000 (17:24 +0100)]
Remove inaccurate "Initializing sandbox" message
The message is no longer completely accurate, now that we remain in
`STARTING` until health checks have passed.
Reviewed at https://reviews.apache.org/r/63435/
Bill Farner [Tue, 31 Oct 2017 04:58:13 +0000 (21:58 -0700)]
Remove endpoint.thrift, ServiceInstance is never serialized to thrift
This enables removal of some unnecessary complexity in the build (commons no
longer needs thrift) and the unused Codec abstraction (we always encode in
JSON).
Reviewed at https://reviews.apache.org/r/63418/
David McLaughlin [Mon, 30 Oct 2017 23:26:13 +0000 (16:26 -0700)]
Condense whitespace of navigation and breadcrumb, reduce impact of quota widget
Reviewed at https://reviews.apache.org/r/63406/
David McLaughlin [Mon, 30 Oct 2017 15:37:19 +0000 (08:37 -0700)]
Add resource units to config summary
Reviewed at https://reviews.apache.org/r/63375/
Bill Farner [Mon, 30 Oct 2017 15:03:44 +0000 (08:03 -0700)]
Add support for generating patch RCs from non-master branches
Reviewed at https://reviews.apache.org/r/63401/
Bill Farner [Sun, 29 Oct 2017 17:27:06 +0000 (10:27 -0700)]
Add release notes for 0.18.1
Bill Farner [Sat, 28 Oct 2017 00:26:34 +0000 (17:26 -0700)]
Suppress multiline logging from mesos callbacks
Reviewed at https://reviews.apache.org/r/63383/
Jordan Ly [Fri, 27 Oct 2017 21:34:01 +0000 (14:34 -0700)]
MesosCallbackHandler uses separate eventbus for registered call
`registered` now uses its own eventbus so that it does not get blocked
by other calls.
Bugs closed: AURORA-1953
Reviewed at https://reviews.apache.org/r/63316/
David McLaughlin [Fri, 27 Oct 2017 20:10:00 +0000 (13:10 -0700)]
Revert to old Job Page tab names and add counts
Changing the names of tabs causes unnecessary confusion. Revert to "active tasks/completed tasks" and add the instance count back to both.
Reviewed at https://reviews.apache.org/r/63374/
David McLaughlin [Fri, 27 Oct 2017 20:08:29 +0000 (13:08 -0700)]
Reduce white-space on role and env pages
Reviewed at https://reviews.apache.org/r/63373/
David McLaughlin [Fri, 27 Oct 2017 20:03:38 +0000 (13:03 -0700)]
Revert role searching in UI to old behavior
Move from prefix search to full-text matching.
Reviewed at https://reviews.apache.org/r/63364/
David McLaughlin [Thu, 26 Oct 2017 22:27:05 +0000 (15:27 -0700)]
Support updates with no desiredState on Job and Update pages
When updates only delete instances, desiredState is null.
Reviewed at https://reviews.apache.org/r/63344/
David McLaughlin [Thu, 26 Oct 2017 22:19:22 +0000 (15:19 -0700)]
Search entire job name for query string on JobList
Reviewed at https://reviews.apache.org/r/63339/
David McLaughlin [Thu, 26 Oct 2017 20:37:12 +0000 (13:37 -0700)]
Do not fetch neighbor tasks if no active task
Reviewed at https://reviews.apache.org/r/63333/
David McLaughlin [Thu, 26 Oct 2017 18:45:56 +0000 (11:45 -0700)]
Clean up TaskList component layout.
Reviewed at https://reviews.apache.org/r/63281/
Reza Motamedi [Wed, 25 Oct 2017 22:30:37 +0000 (15:30 -0700)]
Reload instance page when URL changes.
Reviewed at https://reviews.apache.org/r/63221/
David McLaughlin [Wed, 25 Oct 2017 21:51:46 +0000 (14:51 -0700)]
Add release notes for new UI
Reviewed at https://reviews.apache.org/r/63306/
David McLaughlin [Wed, 25 Oct 2017 20:00:04 +0000 (13:00 -0700)]
Add a package.json file in the plugin directory to allow custom dependencies
Problem: if you're using the plugin mechanism of the new UI, you cannot add your own custom dependencies without a fork of the package.json file. This adds a package.json file into the plugin directory that is used to install dependencies into the main node_modules directory. A separate package.json file removes the burden of upstream merge conflicts.
Reviewed at https://reviews.apache.org/r/63262/
Jordan Ly [Wed, 25 Oct 2017 17:21:55 +0000 (10:21 -0700)]
Refactor veto logic to use direct method calls as opposed to pubsub events.
SchedulingFilterNotifier currently publishes veto events to be consumed by various metadata classes (NearestFit and TaskVars). These veto events cause a lot of object allocations and async tasks. We can reduce the number of objects created by calling methods directly instead of using pubsub events.
Reviewed at https://reviews.apache.org/r/63236/
David McLaughlin [Wed, 25 Oct 2017 17:04:47 +0000 (10:04 -0700)]
Remove the old UI and serve the new UI instead
Reviewed at https://reviews.apache.org/r/63282/
Bill Farner [Wed, 25 Oct 2017 06:34:09 +0000 (23:34 -0700)]
Exclusively use Map-based in-memory stores for primary storage
This patch introduces map-based volatile stores, most of which were revived
from git history with minimal changes. The DB storage system is now only
used as temporary storage when replaying a snapshot containing the `dbScript`
field.
Note that this change removes the transactional nature of in-memory storage
operations as well as the `READ COMMITTED` transaction isolation previously
available to some stores (as shown by the necessary changes to
`StorageTransactionTest`). This means some stores will permit dirty reads
when they previously did not. `TaskStore` has always had this non-transactional
behavior by default, as the DB task store was never deemed suitable for
production. Nonetheless, this non-transactional behavior should be considered
safe as the scheduler fails over on a storage operation failure, and relies on
the persistent log storage for transaction atomicity.
Reviewed at https://reviews.apache.org/r/62869/
David McLaughlin [Tue, 24 Oct 2017 22:33:41 +0000 (15:33 -0700)]
Do not reserve agents for updates when constraints change.
Reviewed at https://reviews.apache.org/r/63261/
David McLaughlin [Tue, 24 Oct 2017 21:56:56 +0000 (14:56 -0700)]
Fix alignment of text on JobList
Reviewed at https://reviews.apache.org/r/63260/
Mauricio Garavaglia [Mon, 23 Oct 2017 21:09:47 +0000 (23:09 +0200)]
Move job environment validation to the scheduler
Removed the job environment validation from the command line client. Validation was moved to
the scheduler side through the `allowed_job_environments` option, which by default allows
`devel`, `test`, `production`, and any value matching the regular expression `staging[0-9]*`.
This provides consistent behavior between the CLI and the API.
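A minimal sketch of scheduler-side validation against a configurable regular expression. The pattern is assembled from the defaults listed above and is an assumption; the actual default value of `allowed_job_environments` may be spelled differently. `JobEnvironmentValidatorSketch` is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical validator: the scheduler, rather than each client, decides
// which job environments are acceptable.
class JobEnvironmentValidatorSketch {
  // Illustrative default built from the environments named in the commit message.
  static final String DEFAULT_ALLOWED = "^(devel|test|production|staging[0-9]*)$";

  private final Pattern allowed;

  JobEnvironmentValidatorSketch(String allowedRegex) {
    this.allowed = Pattern.compile(allowedRegex);
  }

  /** matches() requires the whole environment name to match the pattern. */
  boolean isAllowed(String environment) {
    return allowed.matcher(environment).matches();
  }
}
```

Because `staging[0-9]*` allows zero digits, plain `staging` is accepted as well as `staging1`, `staging42`, and so on.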
Reviewed at https://reviews.apache.org/r/62692/
David McLaughlin [Mon, 23 Oct 2017 19:48:20 +0000 (12:48 -0700)]
Add sorting and filtering controls for TaskList
Reviewed at https://reviews.apache.org/r/63188/
Bill Farner [Mon, 23 Oct 2017 16:41:50 +0000 (09:41 -0700)]
Update to shiro 1.2.5
Reviewed at https://reviews.apache.org/r/63217/
David McLaughlin [Mon, 23 Oct 2017 16:41:05 +0000 (09:41 -0700)]
Fix back button issue on Jobs page
Reviewed at https://reviews.apache.org/r/63197/
Bill Farner [Mon, 23 Oct 2017 15:01:50 +0000 (08:01 -0700)]
Update to guava 23.2
Reviewed at https://reviews.apache.org/r/63204/
Bill Farner [Sat, 21 Oct 2017 18:12:26 +0000 (11:12 -0700)]
Add test case for regression of AURORA-1952
Reviewed at https://reviews.apache.org/r/63202/
David McLaughlin [Fri, 20 Oct 2017 21:52:38 +0000 (14:52 -0700)]
Add an example of using the UI plugin mechanism
Reviewed at https://reviews.apache.org/r/63169/
David McLaughlin [Fri, 20 Oct 2017 21:51:42 +0000 (14:51 -0700)]
Display cron time as UTC
Reviewed at https://reviews.apache.org/r/63187/
David McLaughlin [Fri, 20 Oct 2017 21:51:10 +0000 (14:51 -0700)]
Prevent diff line from overflowing container
Reviewed at https://reviews.apache.org/r/63186/
Reza Motamedi [Fri, 20 Oct 2017 17:44:50 +0000 (10:44 -0700)]
Expose list of neighbors in the instance page
Reviewed at https://reviews.apache.org/r/63062/
David McLaughlin [Fri, 20 Oct 2017 17:36:45 +0000 (10:36 -0700)]
Add Cache-Control header to static assets, to allow for cache expiration
Reviewed at https://reviews.apache.org/r/63176/
Bill Farner [Fri, 20 Oct 2017 02:39:02 +0000 (19:39 -0700)]
Provide a formal way to disable offer declining
Increasing the offer hold time to effectively disable offer declines is a trap,
as the queue of asynchronous declines will grow without bound. This introduces
a command line argument to explicitly disable declining.
Reviewed at https://reviews.apache.org/r/63157/
David McLaughlin [Thu, 19 Oct 2017 21:37:56 +0000 (14:37 -0700)]
Refactor Job Page to make it more pluggable
Reviewed at https://reviews.apache.org/r/63135/
Bill Farner [Thu, 19 Oct 2017 21:25:30 +0000 (14:25 -0700)]
Use LockStore only for backwards compatibility
Enter backwards compatibility mode for LockStore. This means we will restore
and acquire locks as before, but will ignore them otherwise.
Following the next release, `LockStore` will be removed.
Please note that `JobUpdateController` already provides one-at-a-time
update semantics, in addition to using the legacy lock system for the same
purpose.
Reviewed at https://reviews.apache.org/r/63130/
David McLaughlin [Thu, 19 Oct 2017 20:48:28 +0000 (13:48 -0700)]
Add banner pointing to new UI
Reviewed at https://reviews.apache.org/r/63165/
David McLaughlin [Thu, 19 Oct 2017 17:15:23 +0000 (10:15 -0700)]
Cosmetic changes to Navigation and task metadata
Reviewed at https://reviews.apache.org/r/63132/
Bill Farner [Thu, 19 Oct 2017 01:14:02 +0000 (18:14 -0700)]
When scheduling, skip offers with no CPU and no mem
There's no reason for us to evaluate offers with no CPUs or memory,
so reject them early in the offer lifecycle.
This is an incremental performance optimization, but it may net significant
improvements based on observations in some very large clusters.
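The title and body suggest that an offer is skipped only when it carries neither CPU nor memory; the sketch below follows that reading. The `Offer` class and method names are hypothetical simplifications of Mesos offers, not the scheduler's actual types.

```java
import java.util.List;
import java.util.stream.Collectors;

class OfferFilterSketch {
  // Hypothetical, simplified offer holding only the resources the filter inspects.
  static final class Offer {
    final String id;
    final double cpus;
    final double memMb;

    Offer(String id, double cpus, double memMb) {
      this.id = id;
      this.cpus = cpus;
      this.memMb = memMb;
    }
  }

  /** True if the offer has neither CPU nor memory and can be rejected early. */
  static boolean hasNoCpuAndNoMem(Offer offer) {
    return offer.cpus <= 0 && offer.memMb <= 0;
  }

  /** Keeps only offers worth passing on to the more expensive scheduling checks. */
  static List<Offer> worthEvaluating(List<Offer> offers) {
    return offers.stream()
        .filter(offer -> !hasNoCpuAndNoMem(offer))
        .collect(Collectors.toList());
  }
}
```

The check is a cheap pre-filter; each offer dropped here avoids running every pending task through the scheduling filter against it.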
Reviewed at https://reviews.apache.org/r/62956/
Bill Farner [Thu, 19 Oct 2017 00:33:50 +0000 (17:33 -0700)]
Remove legacy commons ZK code
Reviewed at https://reviews.apache.org/r/62652/
David McLaughlin [Wed, 18 Oct 2017 23:16:21 +0000 (16:16 -0700)]
Add cron configuration to Job Page
Reviewed at https://reviews.apache.org/r/63125/
David McLaughlin [Wed, 18 Oct 2017 22:09:36 +0000 (15:09 -0700)]
Fix lint build error from Fonts patch.
Reviewed at https://reviews.apache.org/r/63129/
David McLaughlin [Wed, 18 Oct 2017 21:24:52 +0000 (14:24 -0700)]
Add fonts again (with line-endings intact!)
David McLaughlin [Wed, 18 Oct 2017 21:18:36 +0000 (14:18 -0700)]
Remove corrupted fonts and add font files to .gitattributes to prevent line-ending formatting by git
Reviewed at https://reviews.apache.org/r/63122/
David McLaughlin [Wed, 18 Oct 2017 20:15:13 +0000 (13:15 -0700)]
Role and Environment Page fixes:
* Remove env column on environment pages.
* Tidy up CSS that caused names to not be lined up properly with long environment name.
* Allow you to search across job type, tier and environment.
Reviewed at https://reviews.apache.org/r/63117/
David McLaughlin [Wed, 18 Oct 2017 17:26:17 +0000 (10:26 -0700)]
Clean up Job Page CSS.
* Make update list smaller (was too dominant on the page).
* Show update progress/size of history.
* Tidy up whitespace.
* Move expander to end of task list item.
* Wrap the main job overview loading element in a panel group to prevent jarring page change as content loads.
Reviewed at https://reviews.apache.org/r/63098/
David McLaughlin [Wed, 18 Oct 2017 17:18:19 +0000 (10:18 -0700)]
Detect and parse Thermos config in Diff output
Reviewed at https://reviews.apache.org/r/63092/
David McLaughlin [Wed, 18 Oct 2017 16:58:35 +0000 (09:58 -0700)]
Add Source Sans Pro font to project
Reviewed at https://reviews.apache.org/r/63099/
David McLaughlin [Tue, 17 Oct 2017 21:57:27 +0000 (14:57 -0700)]
Add diff viewer to Update Page
Reviewed at https://reviews.apache.org/r/63083/
David McLaughlin [Tue, 17 Oct 2017 21:53:50 +0000 (14:53 -0700)]
Add pointer for pagination links
Reviewed at https://reviews.apache.org/r/63088/
David McLaughlin [Tue, 17 Oct 2017 21:44:11 +0000 (14:44 -0700)]
Fix instance range display
Reviewed at https://reviews.apache.org/r/63087/
David McLaughlin [Tue, 17 Oct 2017 20:29:43 +0000 (13:29 -0700)]
Add URL handling for tab switching on Job page
Reviewed at https://reviews.apache.org/r/62958/
David McLaughlin [Tue, 17 Oct 2017 20:15:44 +0000 (13:15 -0700)]
Hide InstanceHistory when there are no old tasks.
Reviewed at https://reviews.apache.org/r/63082/
David McLaughlin [Tue, 17 Oct 2017 20:15:07 +0000 (13:15 -0700)]
Format constraints on Task Config Summary
Reviewed at https://reviews.apache.org/r/63081/
David McLaughlin [Tue, 17 Oct 2017 20:14:25 +0000 (13:14 -0700)]
Center pagination links when not a table.
Reviewed at https://reviews.apache.org/r/63079/
David McLaughlin [Tue, 17 Oct 2017 20:13:31 +0000 (13:13 -0700)]
Clean up State Machine CSS to handle long messages
Reviewed at https://reviews.apache.org/r/63064/
David McLaughlin [Tue, 17 Oct 2017 17:37:47 +0000 (10:37 -0700)]
Hide pagination links on Role and Job lists when only one page
Reviewed at https://reviews.apache.org/r/63078/
David McLaughlin [Tue, 17 Oct 2017 17:37:01 +0000 (10:37 -0700)]
Fix link on Navigation logo
Reviewed at https://reviews.apache.org/r/63065/
Derek Slager [Tue, 17 Oct 2017 01:12:01 +0000 (21:12 -0400)]
Update list of Companies using Aurora.
Reviewed at https://reviews.apache.org/r/63052/
David McLaughlin [Mon, 16 Oct 2017 23:36:56 +0000 (16:36 -0700)]
Manage Bootstrap with webpack.
Reviewed at https://reviews.apache.org/r/62955/
David McLaughlin [Mon, 16 Oct 2017 23:31:34 +0000 (16:31 -0700)]
Add support for controlling API url in UI without modifying code.
Reviewed at https://reviews.apache.org/r/63040/
David McLaughlin [Mon, 16 Oct 2017 23:25:59 +0000 (16:25 -0700)]
Protect against null value in RoleQuota
Reviewed at https://reviews.apache.org/r/63038/
Stephan Erb [Sun, 15 Oct 2017 16:00:17 +0000 (18:00 +0200)]
Use compatible Curator session and connection timeouts
Curator will warn if used with a session timeout that is lower than
the connection timeout [1]. As it uses a default connection timeout of 15s
[2], this warning is emitted with the Aurora default settings.
This patch remedies this issue in two ways:
* Making the Curator connection timeout configurable
* Bumping the session timeout to 15s. The current default of 4s is
pretty small and could lead to unexpected failovers during long GC
pauses. This is especially problematic as a failover in Aurora can
be lengthy.
[1] https://github.com/apache/curator/blob/15eb063fa22569e797f850fb8d60a0949f52fbf5/curator-client/src/main/java/org/apache/curator/CuratorZookeeperClient.java#L118-L121
[2] https://github.com/apache/curator/blob/6ba4de36d4e8b2b65d45c005a6a92dd85c3c497f/curator-framework/src/main/java/org/apache/curator/framework/CuratorFrameworkFactory.java#L60-L61
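The arithmetic behind the warning, as a small sketch. Curator's check linked in [1] warns when the session timeout is less than the connection timeout. The helper and constant names are hypothetical; when building a client with Curator itself, both values can be set via `CuratorFrameworkFactory.builder().sessionTimeoutMs(...)` and `.connectionTimeoutMs(...)`.

```java
// Hypothetical helper mirroring Curator's warning condition, using the
// figures quoted in this commit message.
class CuratorTimeoutSketch {
  static final int CURATOR_DEFAULT_CONNECTION_TIMEOUT_MS = 15_000;
  static final int OLD_AURORA_SESSION_TIMEOUT_MS = 4_000;
  static final int NEW_AURORA_SESSION_TIMEOUT_MS = 15_000;

  /** Curator warns when the session timeout is shorter than the connection timeout. */
  static boolean curatorWouldWarn(int sessionTimeoutMs, int connectionTimeoutMs) {
    return sessionTimeoutMs < connectionTimeoutMs;
  }
}
```

With the old 4s session timeout against the 15s default connection timeout the condition holds and the warning fires; with a 15s session timeout it no longer does, and a configurable connection timeout lets operators keep the two consistent.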
Reviewed at https://reviews.apache.org/r/62835/
David McLaughlin [Thu, 12 Oct 2017 21:08:39 +0000 (14:08 -0700)]
Implement Job page in React
Reviewed at https://reviews.apache.org/r/62908/
Bill Farner [Wed, 11 Oct 2017 00:24:45 +0000 (17:24 -0700)]
Use a simpler command line argument system
Reviewed at https://reviews.apache.org/r/62623/
Bill Farner [Tue, 10 Oct 2017 22:21:08 +0000 (15:21 -0700)]
Stream backup file from disk
This reduces the memory burden of loading a backup for recovery.
Previously, the backup file would be fully loaded into a `byte[]`, which
may be very large and fail to allocate.
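A generic sketch of the difference, not the scheduler's actual recovery code: instead of `Files.readAllBytes`, which needs a single `byte[]` as large as the file, the file is consumed through an `InputStream` with a fixed-size buffer, so peak memory is bounded by the buffer. `StreamingReadSketch` and its chunk consumer are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.BiConsumer;

class StreamingReadSketch {
  /**
   * Feeds the file to sink in chunks of at most bufferSize bytes. The sink
   * receives the shared buffer and the number of valid bytes in it.
   *
   * @return total number of bytes read
   */
  static long streamFile(Path file, int bufferSize, BiConsumer<byte[], Integer> sink)
      throws IOException {
    byte[] buffer = new byte[bufferSize];
    long total = 0;
    try (InputStream in = Files.newInputStream(file)) {
      int n;
      // read() may return fewer bytes than requested; -1 signals end of file.
      while ((n = in.read(buffer)) != -1) {
        sink.accept(buffer, n);
        total += n;
      }
    }
    return total;
  }
}
```

The sink must consume or copy each chunk before the next `read()` overwrites the shared buffer.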
Reviewed at https://reviews.apache.org/r/62873/
David McLaughlin [Tue, 10 Oct 2017 17:14:50 +0000 (10:14 -0700)]
Implement Update and Updates pages in React.
Reviewed at https://reviews.apache.org/r/62763/
Bill Farner [Tue, 10 Oct 2017 16:59:49 +0000 (09:59 -0700)]
Fix broken end-to-end tests
TContentAwareServlet constrains the supported Content-Type headers,
resulting in test_kerberos_end_to_end.sh failing with the error
`Unsupported Content-Type: application/x-www-form-urlencoded`, which
is the Content-Type header curl chooses when the --data-binary
argument is passed.
Reviewed at https://reviews.apache.org/r/62857/
David McLaughlin [Tue, 10 Oct 2017 15:53:32 +0000 (08:53 -0700)]
Implement Instance pages in React
Reviewed at https://reviews.apache.org/r/62720/
Stephan Erb [Sun, 8 Oct 2017 17:40:53 +0000 (19:40 +0200)]
Run Jenkins tests without the Gradle daemon
This follows the recommendation of Gradle to only use their daemon when
running in local environments but not in CI environments.
We are seeing spurious build failures from time to time on the shared
Apache build server. Disabling the daemon might help to prevent those.
https://docs.gradle.org/current/userguide/gradle_daemon.html#when_should_i_not_use_the_gradle_daemon
Reviewed at https://reviews.apache.org/r/62832/
Stephan Erb [Sun, 8 Oct 2017 17:17:22 +0000 (19:17 +0200)]
Fix documentation of pystachio Volume struct
For details, see
https://github.com/apache/aurora/blob/master/src/main/python/apache/aurora/config/schema/base.py#L145
Reviewed at https://reviews.apache.org/r/62829/
Stephan Erb [Sun, 8 Oct 2017 16:41:35 +0000 (18:41 +0200)]
Switch release checksum to sha512
For our releases we will now be using .sha512 files rather than .sha files
containing sha1 checksums. This change is triggered by a recent update of
the Apache Release Distribution Policy.
Please see this mail for details:
```
Hi PMC,
The Release Distribution Policy[1] changed regarding .sha files.
See under "Cryptographic Signatures and Checksums Requirements" [2].
Old policy :
-- use extension .sha for any SHA checksum (SHA-1, SHA-256, SHA-512)
New policy :
-- use .sha1 for a SHA-1 checksum
-- use .sha256 for a SHA-256 checksum
-- use .sha512 for a SHA-512 checksum
-- [*] .sha should contain a SHA-1
Why this change ?
-- Verifying a checksum under the old policy is/was not handy.
You have to inspect the .sha to find out which algorithm
should be used ; or try them all (SHA-1, SHA256, etc).
The new scheme avoids this ambiguity.
-- The last point[*] was only added for clarity. Most of the
old, stale .sha's contain a SHA-1. The relatively new .sha's
contain a SHA-512. The expectation is that the last category will
disappear, when active projects adapt to the 'new' convention.
Impact :
-- Should be none ; many projects already use the 'new' convention.
-- Please ask your release managers to use .sha1, .sha256, .sha512
instead of the .sha extension.
-- Please fix your build-tools if you have any.
Piggyback :
-- The policy requires a .md5 for every package ;
providing a .sha512 is recommended.
Since MD5 is essentially broken, it is to be expected that
in the future a .sha512 will be required.
Perhaps it is wise to start providing .sha512's
with your releases if you do not already do so.
-- Visit http://mirror-vm.apache.org/checker/
to check the health of your /dist/-area ;
my stuff ; any feedback is most welcome.
Thanks ; regards,
Henk Penning
[1] http://www.apache.org/dev/release-distribution
[2] http://www.apache.org/dev/release-distribution#sigs-and-sums
```
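A minimal sketch of computing the hex SHA-512 digest that a `.sha512` file carries, using the JDK's `MessageDigest`. The release tooling itself is not shown here, and the exact layout of the `.sha512` file it writes is not assumed; `Sha512Sketch` is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Sha512Sketch {
  /** Lower-case hex encoding, two characters per byte. */
  static String hex(byte[] digest) {
    StringBuilder sb = new StringBuilder(digest.length * 2);
    for (byte b : digest) {
      sb.append(String.format("%02x", b)); // negative bytes are formatted unsigned
    }
    return sb.toString();
  }

  static String sha512Hex(byte[] data) throws NoSuchAlgorithmException {
    return hex(MessageDigest.getInstance("SHA-512").digest(data));
  }

  /** Streams the file through the digest, so large release tarballs are not loaded at once. */
  static String sha512Hex(Path file) throws IOException, NoSuchAlgorithmException {
    MessageDigest md = MessageDigest.getInstance("SHA-512");
    try (InputStream in = Files.newInputStream(file)) {
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) != -1) {
        md.update(buf, 0, n);
      }
    }
    return hex(md.digest());
  }
}
```

A SHA-512 digest is 64 bytes, i.e. 128 hex characters; the algorithm is encoded in the file extension, which is the ambiguity the new policy removes.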
Reviewed at https://reviews.apache.org/r/62830/
Jordan Ly [Wed, 4 Oct 2017 20:07:19 +0000 (13:07 -0700)]
Convert Webhook to AbstractIdleService, use async HTTP client
Hijacking https://reviews.apache.org/r/59703
From the above review: "Current code uses a synchronous HTTP client, which can block the EventBus. Switch to an async HTTP client."
Previously, we had an issue where the HTTP client would have a non-daemon thread which caused the Scheduler to fail to shut down. I converted it into an AbstractIdleService and properly closed the client in the shutdown() method. Additionally, I made a small tweak to the original code where we ABORT any response received after the status, since we don't care about it. We just use the response code for stats.
Testing Done:
./gradlew test
Tested proper shutdown occurs in Vagrant.
Scale tested up to 2000 TASK_LOST events with the registered endpoint waiting 5-10 minutes to respond; this does not seem to block scheduling.
Bugs closed: AURORA-1773
Reviewed at https://reviews.apache.org/r/62700/
David McLaughlin [Tue, 3 Oct 2017 19:49:38 +0000 (12:49 -0700)]
Implement Role and Environment pages in Preact.
Reviewed at https://reviews.apache.org/r/62451/
Bill Farner [Sun, 1 Oct 2017 15:49:06 +0000 (08:49 -0700)]
Replace auto-generated forwarding code with manual implementations
Reviewed at https://reviews.apache.org/r/62716/
Bill Farner [Fri, 29 Sep 2017 22:30:06 +0000 (15:30 -0700)]
Remove the rewriteConfigs thrift method
Reviewed at https://reviews.apache.org/r/62601/
Jordan Ly [Fri, 29 Sep 2017 00:23:17 +0000 (17:23 -0700)]
Added additional stop() to prevent errors in run() to stop shutdown in SchedulerMain
Ensure that `SchedulerMain.run()` calls stop in the case of exceptions. This prevents the Scheduler from transitioning to the DEAD state without actually stopping its services.
See the attached ticket for an example of the issue happening.
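The pattern can be sketched with `try`/`finally`. The `Lifecycle` interface and its method names are hypothetical and do not reflect the real `SchedulerMain` wiring; `prepare()` is borrowed from the test description below.

```java
// Hypothetical sketch: stop() runs whether run() completes normally or fails
// part-way (for example in prepare()), so services are not left running after
// the scheduler has been declared DEAD.
class SchedulerMainSketch {
  /** Hypothetical lifecycle hooks. */
  interface Lifecycle {
    void prepare() throws Exception;

    void awaitShutdown() throws InterruptedException;

    void stop();
  }

  static void run(Lifecycle lifecycle) throws Exception {
    try {
      lifecycle.prepare();
      lifecycle.awaitShutdown();
    } finally {
      // Executes on normal return and when prepare()/awaitShutdown() throw;
      // the original exception still propagates to the caller.
      lifecycle.stop();
    }
  }
}
```

The failure is still reported to the caller, but only after the services have been stopped, which is also why an idempotent `stop()` is useful.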
Testing Done:
Added an additional unit test for prepare() failing in `SchedulerLifecycle.java`.
./gradlew test
./build-support/jenkins/build.sh
Bugs closed: AURORA-1950
Reviewed at https://reviews.apache.org/r/62626/
Jordan Ly [Thu, 28 Sep 2017 22:18:14 +0000 (00:18 +0200)]
Allow transitions from any state to STOPPED in CallOrderEnforcingStorage
- Allow transitions from any state to STOPPED in CallOrderEnforcingStorage,
including adding a STOPPED -> STOPPED transition so stop() can be used idempotently.
- Use the StateMachines.checkState method (I wasn't sure if the current checkInState
was designed for anything other than throwing a TransientStorageException)
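A minimal sketch of call-order enforcement in which STOPPED is reachable from every state, including STOPPED itself. The state names and `CallOrderSketch` are assumptions for illustration, and the sketch uses a plain `EnumSet` check rather than Aurora's `StateMachines.checkState`.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical call-order enforcer: prepare() and start() must happen in
// order, while stop() is accepted from any state and may be repeated.
class CallOrderSketch {
  enum State { CONSTRUCTED, PREPARED, READY, STOPPED }

  private State state = State.CONSTRUCTED;

  State state() {
    return state;
  }

  private void transition(Set<State> allowedFrom, State to) {
    if (!allowedFrom.contains(state)) {
      throw new IllegalStateException("Illegal transition " + state + " -> " + to);
    }
    state = to;
  }

  void prepare() {
    transition(EnumSet.of(State.CONSTRUCTED), State.PREPARED);
  }

  void start() {
    transition(EnumSet.of(State.PREPARED), State.READY);
  }

  void stop() {
    // Every state, STOPPED included, may move to STOPPED, so stop() is idempotent.
    transition(EnumSet.allOf(State.class), State.STOPPED);
  }
}
```

Other calls keep their strict ordering: once stopped, an attempt to `start()` is still rejected.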
Bugs closed: AURORA-1950
Reviewed at https://reviews.apache.org/r/62621/
David McLaughlin [Wed, 27 Sep 2017 21:34:02 +0000 (14:34 -0700)]
Replace Preact and custom testing with React + Enzyme
Reviewed at https://reviews.apache.org/r/62607/
Bill Farner [Wed, 27 Sep 2017 18:39:56 +0000 (11:39 -0700)]
Workaround to get pants working in macOS high sierra
This is a cheat to use pants' thrift binary from 10.12.
Reviewed at https://reviews.apache.org/r/62608/
Bill Farner [Wed, 27 Sep 2017 18:38:25 +0000 (11:38 -0700)]
Fix binding issues preventing ./gradle run from working
Reviewed at https://reviews.apache.org/r/62620/
Bill Farner [Wed, 27 Sep 2017 01:51:52 +0000 (18:51 -0700)]
Use a more efficient query for instance ID collision detection
Reviewed at https://reviews.apache.org/r/62604/
Bill Farner [Tue, 26 Sep 2017 23:42:07 +0000 (16:42 -0700)]
Restore scheduler benchmarks to working order
Reviewed at https://reviews.apache.org/r/62558/
Bill Farner [Sat, 23 Sep 2017 15:05:08 +0000 (08:05 -0700)]
Update to gradle 4.2
Reviewed at https://reviews.apache.org/r/62517/
Keisuke Nishimoto [Thu, 21 Sep 2017 21:33:27 +0000 (14:33 -0700)]
Improve in-process test ZooKeeper support
MesosLogStreamModule tries to connect to ZooKeeper servers specified by
-zk_endpoints even when -zk_in_proc=true. I updated the module to use
injected server endpoints which will be based on the ephemeral port assigned
to ZooKeeperTestServer if -zk_in_proc=true. This required making
@ServiceDiscoveryBindings.ZooKeeper public.
I also tweaked the shutdown process of ServiceDiscoveryModule.TestServerService
so that it won't close existing ZooKeeper connections before clients close the
session. While just delaying the execution by 1 second doesn't really
guarantee that behavior, in practice this achieved clean shutdown of the
scheduler with in-process ZooKeeper server.
Testing Done:
1. Launch Mesos master and slave on my laptop.
2. Launch Aurora scheduler with the following arguments:
```
-backup_dir=/var/lib/aurora/backups
-cluster_name=local
-mesos_master_address=localhost:5050
-serverset_path=/aurora/scheduler
-ip=127.0.0.1
-hostname=localhost
-http_port=8081
-zk_in_proc=true
-zk_endpoints=localhost:2181
-native_log_zk_group_path=/aurora/replicated-log
-native_log_file_path=/var/db/aurora
```
3. Observe that there are no ZooKeeper error log outputs caused by missing
endpoint.
4. Create a simple job, observe that it launches normally, and then kill it.
5. Stop the scheduler by sending /quitquitquit.
6. Observe that scheduler process shuts down normally.
Bugs closed: AURORA-1947
Reviewed at https://reviews.apache.org/r/62423/
Robert Allen [Sun, 17 Sep 2017 22:10:36 +0000 (00:10 +0200)]
Add Houghton Mifflin Harcourt to adopters list
Reviewed at https://reviews.apache.org/r/62347/
David McLaughlin [Wed, 13 Sep 2017 18:31:30 +0000 (11:31 -0700)]
HomePage implemented in Preact
Reviewed at https://reviews.apache.org/r/62135/