mesos.git
33 hours agoAdded an operation status update manager to the agent.
Greg Mann [Tue, 11 Dec 2018 21:09:20 +0000 (13:09 -0800)] 
Added an operation status update manager to the agent.

This patch adds an operation status update manager to the agent
in order to handle updates for operations on agent default
resources. A new test is also added which verifies that such
updates are retried.

Later patches will integrate this status update manager with
the agent's checkpointing/recovery code.

Review: https://reviews.apache.org/r/69505/

2 days agoRefactored `LinuxFilesystemIsolator{Test,MesosTest}` tests.
Andrei Budnik [Tue, 11 Dec 2018 00:00:32 +0000 (16:00 -0800)] 
Refactored `LinuxFilesystemIsolator{Test,MesosTest}` tests.

This patch factors out boilerplate code related to initialization of
the agent flags.

Review: https://reviews.apache.org/r/69545/

2 days agoAdded `ROOT_PseudoDevicesWithRootFilesystem` test.
Andrei Budnik [Tue, 11 Dec 2018 00:00:30 +0000 (16:00 -0800)] 
Added `ROOT_PseudoDevicesWithRootFilesystem` test.

This test verifies that pseudo devices like /dev/random are properly
mounted in the container's root filesystem.

Review: https://reviews.apache.org/r/69540/

2 days agoRemoved unnecessary '--all-cpus' option from perf documentation.
Benjamin Mahler [Mon, 10 Dec 2018 22:04:47 +0000 (17:04 -0500)] 
Removed unnecessary '--all-cpus' option from perf documentation.

This flag is the default when no target CPUs are specified, so
there is no need to include it.

3 days agoAdded UCR bridge network for Mesos Mini.
Jie Yu [Mon, 10 Dec 2018 06:08:54 +0000 (22:08 -0800)] 
Added UCR bridge network for Mesos Mini.

This patch adds the UCR bridge network support for Mesos Mini using CNI
bridge plugin and Mesos port mapper CNI plugin.

Review: https://reviews.apache.org/r/69538

3 days agoMade sure containers runtime dir has device file access.
Jie Yu [Sat, 8 Dec 2018 00:51:19 +0000 (16:51 -0800)] 
Made sure containers runtime dir has device file access.

Make sure that the container's runtime dir has device file access. Some
Linux distributions mount `/run` with `nodev`, restricting access to
device files under `/run`. However, Mesos prepares device files for
containers under the container's runtime dir (which is typically under
`/run`) and bind mounts them into container root filesystems. Therefore,
we need to make sure those device files can be accessed by the
container. We need to do a self bind mount and remount with proper
options if necessary. See MESOS-9462 for more details.

Review: https://reviews.apache.org/r/69532
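
The `nodev` check described in this commit can be illustrated with a small Python sketch that parses text in the `/proc/mounts` layout and reports whether the mount covering a path carries `nodev`. The function name `mounted_nodev` and the sample mount table are assumptions made for illustration. This is not the Mesos implementation, which is written in C++.

```python
import posixpath


def mounted_nodev(mounts_text, path):
    """Return True if the mount containing `path` has the `nodev` option.

    `mounts_text` uses the /proc/mounts layout:
    device mountpoint fstype options dump pass
    """
    path = posixpath.normpath(path)
    best_point, best_options = None, []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        point, options = fields[1], fields[3].split(",")
        covers = path == point or path.startswith(point.rstrip("/") + "/")
        # Prefer the longest matching mount point; later entries win ties.
        if covers and (best_point is None or len(point) >= len(best_point)):
            best_point, best_options = point, options
    return "nodev" in best_options


# Made-up mount table for illustration only.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw,relatime 0 0
tmpfs /run tmpfs rw,nosuid,nodev,mode=755 0 0
"""
```

When such a check reports `nodev` for the runtime dir, the commit's remedy is a self bind mount followed by a remount with suitable options.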

3 days agoUsed strings::format in os::shell.
Jie Yu [Mon, 10 Dec 2018 03:14:09 +0000 (19:14 -0800)] 
Used strings::format in os::shell.

Previously, `strings::internal::format` was used. It causes issues when
a `std::string` is passed in as a parameter. Switched to using
`strings::format` instead in the `os::shell` implementation.

Review: https://reviews.apache.org/r/69537

5 days agoFixed a regression in binding GPU container devices.
James Peach [Fri, 7 Dec 2018 21:58:27 +0000 (13:58 -0800)] 
Fixed a regression in binding GPU container devices.

When we changed container devices to be bind mounts, we added an extra
`/dev` path component to the container mount point. This resulted in
devices being mounted as `/dev/dev/nvidia0`.

Review: https://reviews.apache.org/r/69528/
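
The duplicated path component is easy to reproduce in a toy Python sketch. The helper names `buggy_target` and `fixed_target` are hypothetical and only mimic the path arithmetic; they are not the functions used in Mesos.

```python
import posixpath


def buggy_target(device):
    # Adds a "/dev" component even though `device` already starts with it.
    return posixpath.join("/dev", device.lstrip("/"))


def fixed_target(device):
    # The device path is already rooted at /dev, so use it as is.
    return posixpath.normpath(device)
```

With `device = "/dev/nvidia0"`, the first helper yields the `/dev/dev/nvidia0` path mentioned in the commit message, while the second keeps `/dev/nvidia0`.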

6 days agoAdded a benchmark to compare quota and nonquota allocation performance.
Meng Zhu [Sat, 20 Oct 2018 05:44:04 +0000 (22:44 -0700)] 
Added a benchmark to compare quota and nonquota allocation performance.

This benchmark evaluates the performance difference between nonquota
and quota settings. In both settings, the same allocations are made
for fair comparison. In particular, since the agent will always be
allocated as a whole in nonquota settings, we should also avoid
agent chopping in quota settings. Thus in this benchmark,
quotas are only set to be multiples of whole agent resources.
This is also why we have this dedicated benchmark for comparison
rather than extending the existing quota benchmarks (which involve
agent chopping).

Review: https://reviews.apache.org/r/69098

6 days agoAdded an allocator benchmark for quota performance.
Meng Zhu [Sat, 13 Oct 2018 00:28:39 +0000 (17:28 -0700)] 
Added an allocator benchmark for quota performance.

This benchmark evaluates the allocator performance in
the presence of roles with both small quota (which can
be satisfied by half an agent) as well as large quota
(which needs resources from two agents). We set up the cluster,
trigger one allocation cycle and measure the elapsed time.

Review: https://reviews.apache.org/r/69097

6 days agoMoved a few allocator test helpers to `tests/allocator.hpp`.
Meng Zhu [Thu, 18 Oct 2018 17:49:38 +0000 (10:49 -0700)] 
Moved a few allocator test helpers to `tests/allocator.hpp`.

This helps to share the helpers between
`hierarchical_allocator_tests.cpp` and
`hierarchical_allocator_benchmarks.cpp`.

Review: https://reviews.apache.org/r/69096

6 days agoRenamed one allocator benchmark to be more descriptive.
Meng Zhu [Fri, 19 Oct 2018 04:45:14 +0000 (21:45 -0700)] 
Renamed one allocator benchmark to be more descriptive.

Renamed `Allocations` to `MultiFrameworkAllocations`.
Also removed `_TestBase` from the fixture name.

Review: https://reviews.apache.org/r/69094

6 days agoRemoved `used` argument in `AgentProfile` in the allocator benchmark.
Meng Zhu [Sat, 13 Oct 2018 00:23:03 +0000 (17:23 -0700)] 
Removed `used` argument in `AgentProfile` in the allocator benchmark.

Currently, it is not easy to initialize frameworks with used
resources in the fixture because when we specify the
`agentProfile`, no framework has been created yet.

Also, we currently do not have any use case for setting `used`
resources during the initialization. We can revisit this once
it becomes necessary.

Review: https://reviews.apache.org/r/69093

6 days agoAdded default arguments to `FrameworkProfile` in allocator benchmark.
Meng Zhu [Sat, 13 Oct 2018 00:04:25 +0000 (17:04 -0700)] 
Added default arguments to `FrameworkProfile` in allocator benchmark.

For frameworks that do not want to configure task launches, we
should provide some default task launch settings to simplify the
benchmark settings.

Review: https://reviews.apache.org/r/69092

7 days agoAdded MESOS-9293 to the 1.7.1 CHANGELOG.
Chun-Hung Hsiao [Wed, 5 Dec 2018 21:10:13 +0000 (13:10 -0800)] 
Added MESOS-9293 to the 1.7.1 CHANGELOG.

7 days agoSet agent and/or resource provider ID in operation status updates.
Benjamin Bannier [Wed, 5 Dec 2018 21:03:09 +0000 (13:03 -0800)] 
Set agent and/or resource provider ID in operation status updates.

This patch sets the agent and/or resource provider ID in operation
status update messages. This is not always possible, e.g., some operations
might fail validation so that no corresponding IDs can be extracted.

Since operations failing validation are currently directly rejected by
the master without going through a status update manager, they are not
retried either. If a master status update manager for operations is
introduced at a later point it should be possible to forward
acknowledgements for updates to the master's update manager.

Review: https://reviews.apache.org/r/69163/

7 days agoAdded agent and resource provider IDs to operation status messages.
Benjamin Bannier [Wed, 5 Dec 2018 21:03:06 +0000 (13:03 -0800)] 
Added agent and resource provider IDs to operation status messages.

This patch adds agent and resource provider IDs to
`UpdateOperationStatus` and `UpdateOperationStatusMessage`. With that,
frameworks are able to reconcile enough information after failover to
construct operation acknowledgements.

We will add code to populate these fields in a follow-up patch.

Review: https://reviews.apache.org/r/69162/

7 days agoMade agent state consistent with forwarded updates.
Benjamin Bannier [Wed, 5 Dec 2018 21:02:59 +0000 (13:02 -0800)] 
Made agent state consistent with forwarded updates.

When the agent handles an `UpdateOperationStatusMessage` from a resource
provider, it injects its own ID which is (at least conceptually) unknown
to the resource provider before forwarding the message to the master,
and also updates its own tracking for the operation.

This patch makes sure that we first mutate the message before handing it
on for updating the internal operation tracking, while previously we
used the unmodified message. Always using the same message reduces the
potential for errors if future changes introduce, e.g., agent operation
status update managers.

Review: https://reviews.apache.org/r/69458/

7 days agoFixed handling for offer operation updates.
Benjamin Bannier [Wed, 5 Dec 2018 21:01:30 +0000 (13:01 -0800)] 
Fixed handling for offer operation updates.

The handling of offer operation updates introduced in `c946615ec6d`
made use of an update's `latest_status` without making sure that any
value was set. This could lead to a situation where an uninitialized
enum value was switched on, which would have caused a fatal error at
runtime.

This patch replaces uses of `latest_status` with `state` which does
contain the information we care about. We also adjust the error
logging so we log the value that led to the error, not some other
value.

Review: https://reviews.apache.org/r/69157/

7 days agoUsed POSIX.1-2001/pax tar format for distribution tarballs.
James Peach [Wed, 5 Dec 2018 21:19:31 +0000 (13:19 -0800)] 
Used POSIX.1-2001/pax tar format for distribution tarballs.

The default tar format used in `make dist` is v7, which only supports
paths of up to 99 bytes in length. This causes errors when building
the CentOS RPM and adding files from 3rd party packages.

Review: https://reviews.apache.org/r/69454/
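
The effect of the archive format on long paths can be tried out with Python's standard `tarfile` module. This is only a sketch under stated assumptions: `tarfile` cannot write the v7 format named in the commit, so `USTAR_FORMAT` stands in as the older, length-limited format (its limits differ from v7's), and the helper name `archive_names` is made up for illustration.

```python
import io
import tarfile


def archive_names(name, fmt):
    """Write one empty member called `name` using `fmt`, then list the archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        info = tarfile.TarInfo(name=name)
        info.size = 0
        # Raises ValueError if the format cannot represent the name.
        tar.addfile(info)
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        return tar.getnames()
```

A 120-character name without any `/` is rejected with `ValueError` under `tarfile.USTAR_FORMAT`, whereas `tarfile.PAX_FORMAT` stores it in an extended header and reads it back unchanged.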

7 days agoChanged a benign warning log message in slave.cpp to info.
Meng Zhu [Wed, 10 Oct 2018 22:52:45 +0000 (15:52 -0700)] 
Changed a benign warning log message in slave.cpp to info.

Currently, `UpdateFrameworkMessage` is broadcast by the master
to all agents regardless of whether the framework actually exists
on the agent (see: https://bit.ly/2OiPB4F). So ignoring an info
update for a framework that is missing on the agent is not
unexpected. A warning message would falsely alarm the user. This
patch changes the log level to info to reduce noise.

Review: https://reviews.apache.org/r/68984

7 days agoManually copy test reports to host fs.
Vinod Kone [Fri, 16 Nov 2018 19:26:12 +0000 (13:26 -0600)] 
Manually copy test reports to host fs.

Review: https://reviews.apache.org/r/69513

7 days agoAdded tests to ensure correct quota accounting.
Meng Zhu [Tue, 31 Jul 2018 19:55:34 +0000 (12:55 -0700)] 
Added tests to ensure correct quota accounting.

Added two allocator tests to ensure reserving and
unreserving allocated resources do not affect
quota accounting.

Review: https://reviews.apache.org/r/68138

8 days agoImproved the code comments for `getContainerDevicesPath`.
James Peach [Tue, 4 Dec 2018 21:57:31 +0000 (13:57 -0800)] 
Improved the code comments for `getContainerDevicesPath`.

Review: https://reviews.apache.org/r/69211/

8 days agoApplied the `ContainerMountInfo` protobuf helper.
James Peach [Tue, 4 Dec 2018 21:57:24 +0000 (13:57 -0800)] 
Applied the `ContainerMountInfo` protobuf helper.

Now that there is a protobuf helper to manufacture `ContainerMountInfo`
messages, apply it where appropriate.

Review: https://reviews.apache.org/r/69450/

8 days agoRemoved unnecessarily verbose container mount logging.
James Peach [Tue, 4 Dec 2018 21:57:14 +0000 (13:57 -0800)] 
Removed unnecessarily verbose container mount logging.

The logs from making container mounts can be fairly verbose, and
we are primarily interested in failures. This change removes the
default logging, and only logs container mount errors.

Review: https://reviews.apache.org/r/69210/

8 days agoMoved the container root construction to the isolators.
James Peach [Tue, 4 Dec 2018 21:57:00 +0000 (13:57 -0800)] 
Moved the container root construction to the isolators.

Previously, if the container was configured with a root filesystem,
the root was populated by a combination of the `fs::chroot::prepare()`
API and the various isolators. The implementation details of some
isolators had leaked into the chroot code, which had a special case
for adding GPU devices.

This change moves all the responsibility for defining the
root filesystem from the `fs::chroot::prepare()` API to the
`filesystem/linux` isolator. The `filesystem/linux` isolator is
now the single place that captures how to mount the container
pseudo-filesystems as well as how to construct a proper `/dev`
directory.

Since the `filesystem/linux` isolator is now entirely responsible
for creating and mounting the container `/dev`, any other isolators
that enable access to devices should populate device nodes in the
container devices directory and add a corresponding bind mount.

Review: https://reviews.apache.org/r/69086/

8 days agoUpdated 'Makefile.am' to make new CLI build step more reliable.
Armand Grillet [Tue, 4 Dec 2018 15:12:33 +0000 (10:12 -0500)] 
Updated 'Makefile.am' to make new CLI build step more reliable.

We list the files in 'MESOS_CLI_SRCDIR' when building the new CLI.
We were previously using 'find' and then removing the files in
the virtual environment, but this is not enough as we now also
use tools like 'tox' which create files we do not care about.

To filter more while simplifying the build step, we now use 'git
ls-files' to get a list of the files we need when building. Git
is a dependency of Apache Mesos and the files we have in the
repository are the only ones required to build the CLI,
making this solution simpler yet more future-proof.

Review: https://reviews.apache.org/r/69084/

11 days agoAdded MESOS-9411 to 1.5.3 CHANGELOG.
Till Toenshoff [Sat, 1 Dec 2018 13:38:40 +0000 (14:38 +0100)] 
Added MESOS-9411 to 1.5.3 CHANGELOG.

11 days agoAdded MESOS-9411 to 1.6.2 CHANGELOG.
Till Toenshoff [Sat, 1 Dec 2018 13:38:29 +0000 (14:38 +0100)] 
Added MESOS-9411 to 1.6.2 CHANGELOG.

11 days agoAdded MESOS-9411 to 1.7.1 CHANGELOG.
Till Toenshoff [Sat, 1 Dec 2018 13:38:13 +0000 (14:38 +0100)] 
Added MESOS-9411 to 1.7.1 CHANGELOG.

11 days agoFixed thread safety issue in jwt signature validation.
Alexander Rojas [Sat, 1 Dec 2018 13:28:14 +0000 (14:28 +0100)] 
Fixed thread safety issue in jwt signature validation.

Fixes the implementation of the OpenSSL utilities which computed an
HMAC 256 signature by making a non-thread-safe call to the OpenSSL
library.

Review: https://reviews.apache.org/r/69412/

2 weeks agoUsed `OperationID` instead of `string` in test helpers.
Chun-Hung Hsiao [Thu, 15 Nov 2018 09:27:39 +0000 (01:27 -0800)] 
Used `OperationID` instead of `string` in test helpers.

This change makes the helper more consistent with other helpers, and is
more future-proof to changes in `OperationID`.

Review: https://reviews.apache.org/r/69366

2 weeks agoRecovered disk through `CREATE_DISK` in test `AgentRegisteredWithNewId`.
Chun-Hung Hsiao [Thu, 15 Nov 2018 04:27:09 +0000 (20:27 -0800)] 
Recovered disk through `CREATE_DISK` in test `AgentRegisteredWithNewId`.

Test `AgentRegisteredWithNewId` is now improved to exercise the code
path for recovering disks created by the last agent through
`CREATE_DISK`.

Review: https://reviews.apache.org/r/69365

2 weeks agoAdded the `--create_parameters` flag to the test CSI plugin.
Chun-Hung Hsiao [Mon, 19 Nov 2018 23:26:21 +0000 (15:26 -0800)] 
Added the `--create_parameters` flag to the test CSI plugin.

When the flag is specified, `CreateVolume` and `GetCapacity` work only
if the `parameters` argument matches this flag. This will be used to
test the checkpointing of create parameters of CSI volumes in SLRP.

Review: https://reviews.apache.org/r/69364

2 weeks agoFixed `CreateVolume` of the test CSI plugin.
Chun-Hung Hsiao [Mon, 19 Nov 2018 23:18:11 +0000 (15:18 -0800)] 
Fixed `CreateVolume` of the test CSI plugin.

This patch makes sure that `CreateVolume` is idempotent, and checks if
the specified volume capability is supported.

Review: https://reviews.apache.org/r/69402

2 weeks agoRefactored the test CSI plugin.
Chun-Hung Hsiao [Mon, 19 Nov 2018 23:10:58 +0000 (15:10 -0800)] 
Refactored the test CSI plugin.

This patch does not introduce any functional change to the test CSI
plugin. It simply refactors the check for the default mount volume
capability and the parsing of the `--volumes` flag.

Review: https://reviews.apache.org/r/69400

2 weeks agoAdded MESOS-9275 to the 1.7.1 CHANGELOG.
Chun-Hung Hsiao [Wed, 21 Nov 2018 03:28:47 +0000 (19:28 -0800)] 
Added MESOS-9275 to the 1.7.1 CHANGELOG.

2 weeks agoCheckpointed creation parameters for CSI volumes.
Chun-Hung Hsiao [Fri, 16 Nov 2018 00:39:59 +0000 (16:39 -0800)] 
Checkpointed creation parameters for CSI volumes.

The parameters of CSI volumes created by SLRPs are now checkpointed, and
used to validate volumes created from previous SLRP runs.

Review: https://reviews.apache.org/r/69362

2 weeks agoCleaned up `include/mesos/type_utils.hpp`.
Chun-Hung Hsiao [Thu, 15 Nov 2018 04:13:53 +0000 (20:13 -0800)] 
Cleaned up `include/mesos/type_utils.hpp`.

This patch does the following cleanups:

1. Moved `google::protobuf::Map` equality operator to `type_utils.hpp`.
2. Moved the type helper templates for the protobuf library that do not
   involve mesos protobufs into the `google::protobuf` namespaces so ADL
   works appropriately.
3. Removed the type helper templates for the protobuf library from
   `mesos/v1/mesos.hpp` to avoid redefinition.

Review: https://reviews.apache.org/r/69363

2 weeks agoAdded validation for `Offer.Operation.CreateDisk.target_profile`.
Chun-Hung Hsiao [Thu, 15 Nov 2018 04:34:51 +0000 (20:34 -0800)] 
Added validation for `Offer.Operation.CreateDisk.target_profile`.

Review: https://reviews.apache.org/r/69356

2 weeks agoImplemented the new `CREATE_DISK`/`DESTROY_DISK` semantics in SLRP.
Chun-Hung Hsiao [Thu, 15 Nov 2018 20:25:18 +0000 (12:25 -0800)] 
Implemented the new `CREATE_DISK`/`DESTROY_DISK` semantics in SLRP.

The default mount/block volume capabilities are removed from SLRP.
Instead, `CREATE_DISK` will convert a preprovisioned RAW disk to a
profile disk, and `DESTROY_DISK` will always deprovision a profile disk
as long as the CSI plugin is capable of deprovisioning volumes.

Review: https://reviews.apache.org/r/69361

2 weeks agoRewrote test `ConvertPreExistingVolume` for `CREATE_DISK`.
Chun-Hung Hsiao [Sun, 11 Nov 2018 00:56:36 +0000 (16:56 -0800)] 
Rewrote test `ConvertPreExistingVolume` for `CREATE_DISK`.

Due to the changes of the `CREATE_DISK` semantics, this test is
rewritten to convert a preprovisioned volume to a profile volume, and
then to destroy it to return the space back to the storage pool.

NOTE: The updated test will fail unless r/69361 (which implements the
new `CREATE_DISK` semantics) is also applied.

Review: https://reviews.apache.org/r/69360

2 weeks agoRewrote test `ReconcileDroppedOperation` for `CREATE_DISK`.
Chun-Hung Hsiao [Thu, 15 Nov 2018 05:03:19 +0000 (21:03 -0800)] 
Rewrote test `ReconcileDroppedOperation` for `CREATE_DISK`.

Previously the `ReconcileDroppedOperation` test relied on converting
preprovisioned volumes. To adapt to the new semantics for `CREATE_DISK`,
this test is rewritten to create two disks from a storage pool, with one
operation dropped.

Review: https://reviews.apache.org/r/69359

2 weeks agoAdded profiles to storage pools in tests for `CREATE_DISK`.
Chun-Hung Hsiao [Thu, 15 Nov 2018 09:18:04 +0000 (01:18 -0800)] 
Added profiles to storage pools in tests for `CREATE_DISK`.

This patch adds a new `targetProfile` parameter to the `CREATE_DISK`
test helper, and adds profiles to all storage pools in tests, to adhere
to the new semantics of `CREATE_DISK`.

Review: https://reviews.apache.org/r/69357

2 weeks agoChanged the semantics of `CREATE_DISK` and `DESTROY_DISK` operations.
Chun-Hung Hsiao [Tue, 16 Oct 2018 03:15:41 +0000 (20:15 -0700)] 
Changed the semantics of `CREATE_DISK` and `DESTROY_DISK` operations.

The semantics of these two operations has been updated to provide
primitives to import CSI volumes and recover CSI volumes against agent
ID changes and metadata loss.

Review: https://reviews.apache.org/r/69036

2 weeks agoFixed Mesos-Tidy warning: use of redundant 'get'.
Chun-Hung Hsiao [Thu, 29 Nov 2018 03:43:50 +0000 (19:43 -0800)] 
Fixed Mesos-Tidy warning: use of redundant 'get'.

2 weeks agoFixed a flake in test `MasterTest.ExecutorMessageToRecoveredHttpFramework`.
Chun-Hung Hsiao [Thu, 29 Nov 2018 02:43:36 +0000 (18:43 -0800)] 
Fixed a flake in test `MasterTest.ExecutorMessageToRecoveredHttpFramework`.

2 weeks agoAdded MESOS-9419 to the 1.4.3 CHANGELOG.
Chun-Hung Hsiao [Wed, 28 Nov 2018 18:21:01 +0000 (10:21 -0800)] 
Added MESOS-9419 to the 1.4.3 CHANGELOG.

2 weeks agoAdded MESOS-9419 to the 1.5.3 CHANGELOG.
Chun-Hung Hsiao [Wed, 28 Nov 2018 18:20:29 +0000 (10:20 -0800)] 
Added MESOS-9419 to the 1.5.3 CHANGELOG.

2 weeks agoAdded MESOS-9419 to the 1.6.2 CHANGELOG.
Chun-Hung Hsiao [Wed, 28 Nov 2018 18:18:52 +0000 (10:18 -0800)] 
Added MESOS-9419 to the 1.6.2 CHANGELOG.

2 weeks agoAdded MESOS-9418 to the 1.6.2 CHANGELOG.
James Peach [Tue, 27 Nov 2018 19:11:47 +0000 (11:11 -0800)] 
Added MESOS-9418 to the 1.6.2 CHANGELOG.

2 weeks agoAdded MESOS-9419 to the 1.7.1 CHANGELOG.
Chun-Hung Hsiao [Wed, 28 Nov 2018 18:18:20 +0000 (10:18 -0800)] 
Added MESOS-9419 to the 1.7.1 CHANGELOG.

2 weeks agoAdded MESOS-9418 to the 1.7.1 CHANGELOG.
James Peach [Tue, 27 Nov 2018 17:59:33 +0000 (09:59 -0800)] 
Added MESOS-9418 to the 1.7.1 CHANGELOG.

2 weeks agoAdded a test for executors sending messages to recovered frameworks.
Chun-Hung Hsiao [Tue, 27 Nov 2018 21:53:03 +0000 (13:53 -0800)] 
Added a test for executors sending messages to recovered frameworks.

This patch adds the `ExecutorMessageToRecoveredHttpFramework` test to
ensure that a master will not crash when forwarding an executor message
to a recovered framework to prevent any future regression of MESOS-9419.

Review: https://reviews.apache.org/r/69452

2 weeks agoMade the `createTask` helper work for both v0 and v1 API.
Chun-Hung Hsiao [Tue, 27 Nov 2018 21:50:37 +0000 (13:50 -0800)] 
Made the `createTask` helper work for both v0 and v1 API.

The patched `createTask` helper originally used the `slave_id` accessor,
which does not apply to the v1 API. This patch fixes this problem.

Review: https://reviews.apache.org/r/69464

2 weeks agoFixed master crash when executors send messages to recovered frameworks.
Chun-Hung Hsiao [Tue, 27 Nov 2018 04:12:36 +0000 (20:12 -0800)] 
Fixed master crash when executors send messages to recovered frameworks.

The `Framework::send` function assumes that either `http` or `pid` is
set, which is not true for a framework that hasn't reregistered yet
but was recovered from a reregistered agent. As a result, the master would
crash when a recovered executor tries to send a message to such a
framework (see MESOS-9419). This patch fixes this crash bug.

Review: https://reviews.apache.org/r/69451
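
The failure mode can be pictured with a toy Python model of a framework record whose `http` and `pid` may both be unset. The commit message does not say how the patch handles that case, so dropping the message is an assumption here, and the `Framework` class below is a hypothetical stand-in rather than the Mesos C++ class.

```python
class Framework:
    """Toy stand-in for a framework record held by the master."""

    def __init__(self, http=None, pid=None):
        self.http = http  # list collecting messages for an HTTP scheduler
        self.pid = pid    # list collecting messages for a PID-based scheduler

    def send(self, message):
        """Deliver `message` if possible; return False if it was dropped."""
        if self.http is not None:
            self.http.append(message)
            return True
        if self.pid is not None:
            self.pid.append(message)
            return True
        # Recovered but not yet reregistered: nowhere to send, so drop
        # the message instead of assuming one of the two is set.
        return False
```

A recovered framework created as `Framework()` returns `False` from `send` instead of failing, while `Framework(http=[])` delivers as usual.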

2 weeks agoFixed some incorrect CentOS RPM build settings.
se choi [Wed, 28 Nov 2018 03:39:11 +0000 (19:39 -0800)] 
Fixed some incorrect CentOS RPM build settings.

Updated 'mesos-init-wrapper' execute permission, and specified
the master and agent work directories correctly.

This closes #319

2 weeks agoAdded the `DISCARD` blkio cgroup operation.
James Peach [Tue, 27 Nov 2018 17:08:40 +0000 (09:08 -0800)] 
Added the `DISCARD` blkio cgroup operation.

The Linux 4.19 kernel added a new `Discard` operation to the blkio
cgroup statistics. Added this new operation so that the containerizer
won't fail to parse blkio statistics on the latest kernels.

Review: https://reviews.apache.org/r/69449/
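
Why an unrecognized operation breaks parsing can be seen in a short Python sketch of a strict parser for per-device blkio statistics lines of the form `major:minor Operation value`. The parser, the `KNOWN_OPERATIONS` set, and the sample data are illustrative assumptions, not the Mesos code.

```python
KNOWN_OPERATIONS = {"Read", "Write", "Sync", "Async", "Total", "Discard"}


def parse_blkio(text, known=KNOWN_OPERATIONS):
    """Parse lines such as '8:0 Read 120' into {device: {operation: value}}."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "Total":
            # Trailing grand-total line carries no device number.
            stats.setdefault("total", {})["Total"] = int(fields[1])
            continue
        if len(fields) != 3:
            raise ValueError("unexpected line: %r" % line)
        device, operation, value = fields
        if operation not in known:
            raise ValueError("unknown operation: %r" % operation)
        stats.setdefault(device, {})[operation] = int(value)
    return stats


# Made-up sample resembling statistics from a kernel that reports Discard.
SAMPLE = "8:0 Read 120\n8:0 Write 30\n8:0 Discard 0\n8:0 Total 150\nTotal 150\n"
```

Calling `parse_blkio(SAMPLE, known=KNOWN_OPERATIONS - {"Discard"})` raises `ValueError`, which mirrors the parse failure this commit avoids by adding the new operation.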

2 weeks agoUpdated '--min_allocatable_resources' recommendation to 0.1 cpus.
Benjamin Mahler [Mon, 26 Nov 2018 16:48:28 +0000 (11:48 -0500)] 
Updated '--min_allocatable_resources' recommendation to 0.1 cpus.

2 weeks agoAdded '--all' flag to 'mesos task list'.
Armand Grillet [Mon, 26 Nov 2018 13:55:03 +0000 (08:55 -0500)] 
Added '--all' flag to 'mesos task list'.

With this option, the command is able to show all the tasks that have
ever been run. This makes the command's behavior closer to the one of
'docker ps -a'.

Review: https://reviews.apache.org/r/69395/

2 weeks agoReplaced CLI test helper function 'running_tasks' by 'wait_for_task'.
Armand Grillet [Mon, 26 Nov 2018 13:47:25 +0000 (08:47 -0500)] 
Replaced CLI test helper function 'running_tasks' by 'wait_for_task'.

Replaces 'running_tasks(master)', a function that was neither generic nor
explicit, with 'wait_for_task(master, name, state, delay)'. This helper
function waits a 'delay' for a task with a given 'name' to be in a
certain 'state'.

All uses of 'running_tasks' have been replaced by the new function.

Review: https://reviews.apache.org/r/69426/
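
A polling helper of this shape might look like the following Python sketch. It is not the code from the Mesos CLI tests: the real helper takes the master, whereas this version takes a hypothetical `get_tasks` callable so it runs on its own, and treating `delay` as an upper bound in seconds, as well as the `interval` parameter, are assumptions.

```python
import time


def wait_for_task(get_tasks, name, state, delay=5.0, interval=0.1):
    """Poll until a task called `name` is in `state`, or `delay` seconds pass.

    `get_tasks` is a hypothetical callable returning dicts with "name" and
    "state" keys; the real helper takes the master instead.
    """
    deadline = time.monotonic() + delay
    while True:
        for task in get_tasks():
            if task["name"] == name and task["state"] == state:
                return task
        if time.monotonic() >= deadline:
            raise TimeoutError("task %r did not reach %s" % (name, state))
        time.sleep(interval)
```

Taking the name and the state as arguments is what makes the helper reusable across tests that wait for different tasks or states.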

2 weeks agoFixed name of task created when running mesos-cli-tests.
Armand Grillet [Mon, 26 Nov 2018 13:22:58 +0000 (08:22 -0500)] 
Fixed name of task created when running mesos-cli-tests.

Review: https://reviews.apache.org/r/69425/

2 weeks agoUpdated 'mesos task list' to only display running tasks.
Armand Grillet [Mon, 26 Nov 2018 13:14:42 +0000 (08:14 -0500)] 
Updated 'mesos task list' to only display running tasks.

Review: https://reviews.apache.org/r/69394/

2 weeks agoRemoved some unnecessary intermediate build variables.
James Peach [Thu, 22 Nov 2018 18:12:40 +0000 (10:12 -0800)] 
Removed some unnecessary intermediate build variables.

Review: https://reviews.apache.org/r/69324/

2 weeks agoRemoved separate automake variables for header sources.
James Peach [Thu, 22 Nov 2018 18:06:19 +0000 (10:06 -0800)] 
Removed separate automake variables for header sources.

In some cases, we were using separate automake sources variables
to list source and header files, and in other cases, we just added
both to the same variable. Standardize on the latter form since it
makes it easier to see where files are listed.

Review: https://reviews.apache.org/r/69323/

2 weeks agoAllowed for unbundled leveldb in CMake builds.
Till Toenshoff [Sun, 25 Nov 2018 19:52:32 +0000 (20:52 +0100)] 
Allowed for unbundled leveldb in CMake builds.

Review: https://reviews.apache.org/r/69444/

2 weeks agoUpdated 'REPOSITORY_URL' in 'support/reviewboardrc' to use gitbox.
Armand Grillet [Sat, 24 Nov 2018 13:30:54 +0000 (14:30 +0100)] 
Updated 'REPOSITORY_URL' in 'support/reviewboardrc' to use gitbox.

The Apache Mesos repository was moved from the "git-wip" git server to
the new "gitbox" server earlier in 2018. This change is now also made
in 'support/reviewboardrc', a file used to generate the file
'.reviewboardrc' at the root of the project directory.

Review: https://reviews.apache.org/r/69442/

2 weeks agoSkipped an rlimit test if the environment is incompatible.
Benjamin Bannier [Sat, 24 Nov 2018 01:18:45 +0000 (02:18 +0100)] 
Skipped an rlimit test if the environment is incompatible.

Review: https://reviews.apache.org/r/69438/

2 weeks agoAdded a function to get rlimits.
Benjamin Bannier [Sat, 24 Nov 2018 01:18:35 +0000 (02:18 +0100)] 
Added a function to get rlimits.

Review: https://reviews.apache.org/r/67136/

2 weeks agoAdded guards around convertStringToInt to prevent warning.
Till Toenshoff [Sat, 24 Nov 2018 00:33:49 +0000 (01:33 +0100)] 
Added guards around convertStringToInt to prevent warning.

Review: https://reviews.apache.org/r/69439/

2 weeks agoUpdated 'mesos task list' to display a 'State' field.
Armand Grillet [Thu, 22 Nov 2018 16:17:43 +0000 (11:17 -0500)] 
Updated 'mesos task list' to display a 'State' field.

Review: https://reviews.apache.org/r/69393/

2 weeks agoChanged 'docs/cli.md' to include comments from the review request.
Kevin Klues [Thu, 22 Nov 2018 16:14:00 +0000 (11:14 -0500)] 
Changed 'docs/cli.md' to include comments from the review request.

Commit c132007 was accidentally pushed before applying these changes to
the diff. We are adding this follow-up commit to fix that.

2 weeks agoAdded docs describing how to use the new CLI.
Armand Grillet [Thu, 22 Nov 2018 15:33:28 +0000 (10:33 -0500)] 
Added docs describing how to use the new CLI.

The documentation describes the main commands of the new CLI, how to
activate it, how to build Mesos including this component, and how to
write a configuration file for it.

Review: https://reviews.apache.org/r/69390/

3 weeks agoAdded MESOS-9317 to the 1.5.3 CHANGELOG.
Till Toenshoff [Tue, 20 Nov 2018 16:35:52 +0000 (17:35 +0100)] 
Added MESOS-9317 to the 1.5.3 CHANGELOG.

3 weeks agoAdded MESOS-9317 to the 1.6.2 CHANGELOG.
Till Toenshoff [Tue, 20 Nov 2018 16:35:15 +0000 (17:35 +0100)] 
Added MESOS-9317 to the 1.6.2 CHANGELOG.

3 weeks agoAdded MESOS-9317 to the 1.7.1 CHANGELOG.
Till Toenshoff [Tue, 20 Nov 2018 15:55:59 +0000 (16:55 +0100)] 
Added MESOS-9317 to the 1.7.1 CHANGELOG.

3 weeks agoAdded MasterActorResponsiveness_BENCHMARK_Test.
Alexander Rukletsov [Sun, 18 Nov 2018 04:09:39 +0000 (05:09 +0100)] 
Added MasterActorResponsiveness_BENCHMARK_Test.

See summary.

Review: https://reviews.apache.org/r/68131/

3 weeks agoAdded test for ACCESS_MESOS_LOG authorization.
Till Toenshoff [Tue, 20 Nov 2018 13:46:25 +0000 (14:46 +0100)] 
Added test for ACCESS_MESOS_LOG authorization.

Review: https://reviews.apache.org/r/69386/

3 weeks agoRefactored createSubject and authorizeLogAccess to common/authorization.
Till Toenshoff [Tue, 20 Nov 2018 13:46:11 +0000 (14:46 +0100)] 
Refactored createSubject and authorizeLogAccess to common/authorization.

Moves 'createSubject' out of common/http into common/authorization.

Removes duplicate 'authorizeLogAccess' out of master.cpp and slave.cpp.
Introduces 'authorizeLogAccess' within common/authorization.

Review: https://reviews.apache.org/r/69385/

3 weeks agoIntroduced common/authorization and refactored collectAuthorizations.
Till Toenshoff [Tue, 20 Nov 2018 13:46:00 +0000 (14:46 +0100)] 
Introduced common/authorization and refactored collectAuthorizations.

Adds a new collection of authorization-specific helpers to reduce code
duplication and make test coverage more efficient.

Moves the newly introduced 'collectAuthorizations' helper into this new
authorization source unit.

Review: https://reviews.apache.org/r/69384/

3 weeks agoAdded collectAuthorizations helper to master.hpp.
Till Toenshoff [Tue, 20 Nov 2018 13:45:50 +0000 (14:45 +0100)] 
Added collectAuthorizations helper to master.hpp.

Adds the helper function 'collectAuthorizations' to master.hpp. This
function allows for a simple way to collect authorization futures and
only if all supplied futures result in an approved authorization will
the returned future return true.

All identified areas that were formerly triggering MESOS-9317 are
being updated to make use of this new helper.

A helper function has been chosen and preferred over copying this
pattern into the areas that needed a fix to allow for an efficient and
complete test coverage.

Additionally we are adding a test validating that new helper.

Review: https://reviews.apache.org/r/69369/
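
The all-or-nothing behavior described for this helper can be sketched in Python with `asyncio` standing in for the futures Mesos uses. The function `collect_authorizations` and the three toy coroutines are illustrative assumptions, not the actual helper.

```python
import asyncio


async def collect_authorizations(authorizations):
    """Resolve to True only if every authorization resolves to True.

    If any authorization fails with an exception, the exception propagates
    to the caller instead of being read as a result.
    """
    results = await asyncio.gather(*authorizations)
    return all(results)


async def approved():
    return True


async def denied():
    return False


async def backend_failure():
    # Simulates an authorization backend that fails outright.
    raise RuntimeError("authorizer unavailable")
```

Here `asyncio.run(collect_authorizations([approved(), denied()]))` yields `False`, and a failing backend surfaces as an exception the caller has to handle rather than as a result that is read from a failed future.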

3 weeks agoAdded test reproducing crash on authorization failure.
Till Toenshoff [Tue, 20 Nov 2018 13:45:38 +0000 (14:45 +0100)] 
Added test reproducing crash on authorization failure.

This test reproduces the scenario as described in MESOS-9317. The test
attempts to create a persistent volume by a web request to the
authorized V1 operator endpoint. The test ensures that the underlying
authorization request fails as it can in production due to failures in
the authorization backend.

Without fixing MESOS-9317, this test crashes the master process as the
code-path involved will attempt to access the contents of the awaited
future even though the future had failed.

Review: https://reviews.apache.org/r/69368/

3 weeks agoAdded --min_allocatable_resources to the multi-scheduler documentation.
Benjamin Mahler [Mon, 19 Nov 2018 20:43:04 +0000 (12:43 -0800)] 
Added --min_allocatable_resources to the multi-scheduler documentation.

3 weeks agoAdded blog post for Mesos Mini.
Jie Yu [Mon, 19 Nov 2018 05:44:50 +0000 (21:44 -0800)] 
Added blog post for Mesos Mini.

Review: https://reviews.apache.org/r/69377/

3 weeks agoUpdated PyInstaller requirement for new CLI to support Python 3.7.
Armand Grillet [Mon, 19 Nov 2018 16:30:43 +0000 (11:30 -0500)] 
Updated PyInstaller requirement for new CLI to support Python 3.7.

Review: https://reviews.apache.org/r/69255/

3 weeks agoUpdated configuration docs describing how to build the new CLI.
Armand Grillet [Mon, 19 Nov 2018 15:50:47 +0000 (10:50 -0500)] 
Updated configuration docs describing how to build the new CLI.

Review: https://reviews.apache.org/r/69381/

3 weeks agoAdded configuration docs describing how to use Python 3.
Armand Grillet [Mon, 19 Nov 2018 15:49:28 +0000 (10:49 -0500)] 
Added configuration docs describing how to use Python 3.

For Autotools, this means how to use 'PYTHON_3' and 'PYTHON_3_VERSION'.
For CMake, this means how to use '-DPYTHON_3'.

Review: https://reviews.apache.org/r/69380/

3 weeks agoUpdated new CLI test step to use binary created by PyInstaller.
Armand Grillet [Mon, 19 Nov 2018 15:41:13 +0000 (10:41 -0500)] 
Updated new CLI test step to use binary created by PyInstaller.

The integration tests for the new CLI that run while building Mesos now
directly use the binary created during the build. That way we make sure
that the binary created using PyInstaller is usable, which is the
artifact that we want to distribute to users in the future.

Previously, we were only activating the virtual environment to run the
tests, so the binary created by PyInstaller was never properly tested.
To use the binary created by PyInstaller, we simply update the PATH
before running 'mesos-cli-tests'.

Review: https://reviews.apache.org/r/69374/

3 weeks agoAdded a test `ROOT_UNPRIVILEGED_USER_SandboxOwnership`.
Qian Zhang [Mon, 19 Nov 2018 03:49:17 +0000 (11:49 +0800)] 
Added a test `ROOT_UNPRIVILEGED_USER_SandboxOwnership`.

Review: https://reviews.apache.org/r/69389

3 weeks agoAdded unit tests for Stout `path::normalize` function in POSIX.
Jason Lai [Mon, 19 Nov 2018 05:12:28 +0000 (21:12 -0800)] 
Added unit tests for Stout `path::normalize` function in POSIX.

Review: https://reviews.apache.org/r/68832/

3 weeks agoAdded Stout `path::normalize` function for POSIX paths.
Jason Lai [Mon, 19 Nov 2018 05:12:06 +0000 (21:12 -0800)] 
Added Stout `path::normalize` function for POSIX paths.

Added `path::normalize` to normalize a given pathname and remove
redundant separators and up-level references.

This function follows the rules described in `path_resolution(7)`
for Linux. However, it only performs pure lexical processing without
touching the actual filesystem.
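
A minimal sketch of what purely lexical normalization looks like is shown
below. It is not Stout's implementation: the name `normalizeSketch` is
hypothetical, and the handling of edge cases (an empty input becoming
".", and ".." at the root of an absolute path being dropped) is an
assumption of this sketch.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for illustration only; not Stout's
// `path::normalize`. Works on the string alone and never touches
// the filesystem, so symlinks are not resolved.
std::string normalizeSketch(const std::string& path)
{
  if (path.empty()) {
    return ".";
  }

  const bool absolute = path[0] == '/';
  std::vector<std::string> components;

  std::string::size_type start = 0;
  while (start <= path.size()) {
    std::string::size_type end = path.find('/', start);
    if (end == std::string::npos) {
      end = path.size();
    }

    const std::string component = path.substr(start, end - start);
    start = end + 1;

    // Redundant separators yield empty components; skip them and ".".
    if (component.empty() || component == ".") {
      continue;
    }

    if (component == "..") {
      if (!components.empty() && components.back() != "..") {
        // Up-level reference cancels the previous component.
        components.pop_back();
      } else if (!absolute) {
        // A relative path may legitimately start with "..".
        components.push_back("..");
      }
      // For an absolute path, ".." at the root is dropped.
      continue;
    }

    components.push_back(component);
  }

  std::string result = absolute ? "/" : "";
  bool first = true;
  for (const std::string& component : components) {
    if (!first) {
      result += '/';
    }
    result += component;
    first = false;
  }

  return result.empty() ? "." : result;
}
```

Under these assumptions, `normalizeSketch("/a//b/./c/../d")` yields
`/a/b/d` and `normalizeSketch("a/../../b")` yields `../b`.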

Review: https://reviews.apache.org/r/65811/

3 weeks agoFixed an issue about inheriting user for nested containers.
Qian Zhang [Sat, 17 Nov 2018 09:28:48 +0000 (17:28 +0800)] 
Fixed an issue about inheriting user for nested containers.

Previously we inherited the user from the parent container for nested
containers in `MesosContainerizerProcess::_launch`, but that is too
late: the nested container is launched as a non-root user while its
sandbox directory is created with root as owner (assuming no user is
specified in the nested container's `commandInfo` and the default
executor is launched as a non-root user), so the nested container
does not have permission to write to its own sandbox.

In this patch, we inherit the user for nested containers earlier
(i.e., in `MesosContainerizerProcess::launch`) to avoid the above
issue.

Review: https://reviews.apache.org/r/69376

3 weeks agoReplaced a log consensus `CHECK()` with `CHECK_GE()`.
James Peach [Fri, 16 Nov 2018 22:32:19 +0000 (14:32 -0800)] 
Replaced a log consensus `CHECK()` with `CHECK_GE()`.

In scale testing, this `CHECK` failed with the following message:

  F1116 17:50:04.868387 53766 consensus.cpp:771] Check failed:
    highestNackProposal >= proposal

Emitting the values for `highestNackProposal` and `proposal` may
help in debugging this failure.
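
The benefit of a value-printing comparison check can be sketched with a
small stand-in. The helper `describeCheckGe` below is hypothetical and is
not part of glog or Mesos; it only mimics the kind of message such a
check reports, and the numbers in the usage note are made up for
illustration.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper for illustration only; not glog's `CHECK_GE`.
// Returns an empty string when `lhs >= rhs` holds, otherwise a message
// that includes both operand values.
std::string describeCheckGe(
    const std::string& lhsName,
    unsigned long lhs,
    const std::string& rhsName,
    unsigned long rhs)
{
  if (lhs >= rhs) {
    return "";
  }

  std::ostringstream out;
  out << "Check failed: " << lhsName << " >= " << rhsName
      << " (" << lhs << " vs. " << rhs << ")";
  return out.str();
}
```

With the made-up values 3 and 5,
`describeCheckGe("highestNackProposal", 3, "proposal", 5)` returns
`Check failed: highestNackProposal >= proposal (3 vs. 5)`, whereas a
plain boolean check can only report that the condition was false.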

Review: https://reviews.apache.org/r/69373/

3 weeks agoFixed xml test report path.
Vinod Kone [Fri, 16 Nov 2018 19:54:17 +0000 (13:54 -0600)] 
Fixed xml test report path.

3 weeks agoImproved log messages in master when adding/removing tasks/executors.
Vinod Kone [Thu, 15 Nov 2018 21:01:18 +0000 (15:01 -0600)] 
Improved log messages in master when adding/removing tasks/executors.

Made the log messages and the calling sites consistent and also added
one for adding an executor.

Review: https://reviews.apache.org/r/61128/

3 weeks agoReverted xml output file location.
Vinod Kone [Thu, 15 Nov 2018 23:57:21 +0000 (17:57 -0600)] 
Reverted xml output file location.

Having Google Test write XML reports directly into the mounted /SRC
directory causes intermittent "Unable to open file" errors.

This change will have Google Test write XML output into the build
directory inside the container for now. Note that since the build
directory is cleaned up as part of the docker image, the output reports
won't be accessible to Jenkins.

4 weeks agoAdded MESOS-7574 to 1.5.3, 1.6.2, and 1.7.1 CHANGELOGs.
Joseph Wu [Wed, 14 Nov 2018 21:17:39 +0000 (13:17 -0800)] 
Added MESOS-7574 to 1.5.3, 1.6.2, and 1.7.1 CHANGELOGs.

4 weeks agoFixed flaky agent reconfiguration test.
Benno Evers [Wed, 14 Nov 2018 20:15:58 +0000 (12:15 -0800)] 
Fixed flaky agent reconfiguration test.

Removed some flakiness from the test
SlaveRecoveryTest.AgentReconfigurationWithRunningTask
by removing the `refuse_offers` filter and by pausing
the clock during the test.

Review: https://reviews.apache.org/r/69273/

4 weeks agoAdded `FetcherCacheTest.LocalCachedMissing` test.
Andrei Budnik [Wed, 14 Nov 2018 19:53:33 +0000 (11:53 -0800)] 
Added `FetcherCacheTest.LocalCachedMissing` test.

This test verifies that the fetcher retries downloading the URI when
the cache file is missing.

Review: https://reviews.apache.org/r/69172/