Ash Berlin-Taylor [Mon, 17 May 2021 08:29:23 +0000 (09:29 +0100)]
Switch to using the built AMI instead of cloud-init
We have packer configured to build an AMI that has everything
pre-installed, so all we have left to do in the cloud init is drop the
env var so that we know which region we are in (without having to query
it every time) and start the runner and log shipper services.
Ash Berlin-Taylor [Mon, 17 May 2021 08:35:46 +0000 (09:35 +0100)]
Start vector log-shipping later, once env var is configured (#28)
Since we have to restart it in the cloud-init anyway (once we know which
region we are in), I have disabled it in the packer build scripts so it
doesn't try to start up too early on boot.
Ash Berlin-Taylor [Mon, 10 May 2021 10:47:42 +0000 (11:47 +0100)]
Perform a docker login before starting the actions runner script (#27)
This was done in the cloud-init, but missed from the migration to packer
build scripts.
Ash Berlin-Taylor [Mon, 10 May 2021 08:40:45 +0000 (09:40 +0100)]
Install node in the AMI (#22)
Despite not being in the cloud-init script, it was _somehow_ not causing
a problem, but it not being present in the AMI made production builds
fail.
Ash Berlin-Taylor [Fri, 7 May 2021 17:57:04 +0000 (18:57 +0100)]
Send logs to Cloudwatch in the same region, not always to Frankfurt (#25)
Ash Berlin-Taylor [Fri, 7 May 2021 17:56:48 +0000 (18:56 +0100)]
Make AMI available in eu-central-1 and us-east-2 regions (#26)
Ash Berlin-Taylor [Fri, 7 May 2021 14:00:50 +0000 (15:00 +0100)]
Cleanup logs and "build state" from the AMI (#23)
Not doing this doesn't cause any harm, but it is cleaner to not have
this state included in the AMI
Ash Berlin-Taylor [Fri, 7 May 2021 14:00:29 +0000 (15:00 +0100)]
Use the cheaper ASG in Ohio (#24)
Jarek Potiuk [Thu, 6 May 2021 09:34:16 +0000 (11:34 +0200)]
Merge pull request #21 from apache/fix-custom-metric-cron
Fix the custom-Cloudwatch metric cron job in the AMI
Ash Berlin-Taylor [Thu, 6 May 2021 09:22:51 +0000 (10:22 +0100)]
Fix the custom-Cloudwatch metric cron job in the AMI
Jarek Potiuk [Tue, 4 May 2021 11:30:40 +0000 (13:30 +0200)]
Update requirements (#18)
Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
Ash Berlin-Taylor [Fri, 23 Apr 2021 10:46:49 +0000 (11:46 +0100)]
Don't encrypt the AMI's root snapshot (#17)
We are an open-source project, so we don't need to pay the cost or
complexity of encryption. More importantly, having an ASG launch an
encrypted AMI means we need to set up a more complex "Service-Linked" IAM
role, which is complexity we just don't need.
Ash Berlin-Taylor [Fri, 23 Apr 2021 09:25:04 +0000 (10:25 +0100)]
Fix runner AMI so it (#16)
- Update to the latest runner version
- Install the vector.toml config file
- Install stop-runner-if-no-job into the correct path
- Don't enable actions.runner service at boot (do it slightly later in
user data)
Mike Hewitt [Thu, 22 Apr 2021 10:41:02 +0000 (06:41 -0400)]
Use Packer to build a pre-built AMI with everything we need (#15)
* initial packer and tf
* packer added files and scripts from Ash's repo
* add new folder structure and terraform
* updating packer files
* added dependencies file permission and apt source repos
* bootstrap and user data
* prepare packer provisioners and set up all files to be executed
* update tinder
* terraform to create packer roles, starting to fill in packer variables
* packer roles added aws backends, terraform reformed and added iam roles as well as autoscaling cloudwatch alarm and policy
* fixed iam role and removed policy attachments
* first run of packer_roles, terraform add gitignore for terraform
* update packer code from results of validate
* update runner max size of asg
* packer updated to run and terraform roles for packer updated
* Apply suggestions from code review
* Update for pre-commit checks
Add licenses, and remove trailing whitespace
* archive lambda before upload
* remove terraform for ci infra
* Make the packer build produce a working image.
Summary of changes:
- Files need to be copied to a "staging" folder and then moved in place
- Use the built-in upload ability of the shell provisioner
- Have shell provisioner run scripts with sudo, rather than using sudo
10s of times in the scripts
- Don't set up tmpfs mounts in the AMI -- these have to happen at
instance boot time, not AMI creation
- Preseed the install options for iptables-persistent so that it
installs without asking questions or replacing the rules we already
placed.
- Install the runner-supervisor script from local file, not S3.
Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
Jarek Potiuk [Fri, 2 Apr 2021 17:26:37 +0000 (19:26 +0200)]
Do not pre-bake images in the instance (#13)
The images are cleaned with docker system prune --all anyway
and we save very little (10-20 seconds) and no cost (it's free)
to pull the images as needed from the registry.
Jarek Potiuk [Tue, 23 Mar 2021 11:35:15 +0000 (12:35 +0100)]
Runners more resilient to docker login failure (#12)
Login to docker registry is now done in PreExec and in case it
fails, it also fails the whole service (leading to subsequent
service restart).
Also added `set -eu -o pipefail` to be better protected against
any silent failures.
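
A minimal sketch of what `set -eu -o pipefail` protects against. This is a hypothetical demonstration, not the runner's actual script: the `run_snippet` helper and the `NOT_DEFINED_ANYWHERE` variable are made up for illustration, and the snippets are run in a child `bash` so that a failure does not abort the demonstration itself.

```shell
# Hypothetical helper: run a snippet in a child bash and print its exit status.
run_snippet() {
    bash -c "$1" >/dev/null 2>&1
    echo $?
}

# Without strict mode the failing `false` is hidden by the pipeline,
# the script carries on, and the overall status is that of the last command.
lax_status=$(run_snippet 'false | cat; echo done')

# With strict mode, pipefail makes the pipeline report the failure of `false`
# and -e stops the script at that point with a non-zero status.
strict_status=$(run_snippet 'set -eu -o pipefail; false | cat; echo done')

# With -u, expanding an unset variable is an error rather than an empty string.
# NOT_DEFINED_ANYWHERE is assumed to be unset in the environment.
unset_status=$(run_snippet 'set -eu -o pipefail; echo "$NOT_DEFINED_ANYWHERE"')

echo "lax=$lax_status strict=$strict_status unset=$unset_status"
```

Only the first snippet should report success; the other two stop with a non-zero status instead of continuing silently.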
Ash Berlin-Taylor [Fri, 19 Mar 2021 21:52:10 +0000 (21:52 +0000)]
Update actions.runner to 2.277.1-airflow3 (#11)
This included extra logging and uses `github.actor`, rather than
`github.pull_request.author` for decisions (to match what we use in our
CI.yml file).
Ash Berlin-Taylor [Fri, 19 Mar 2021 21:52:00 +0000 (21:52 +0000)]
Increase logging from actions.runner-supervisor service (#10)
This allows us to have in the logs (and thus searchable in the
CloudWatch Logs) the InstanceId
Ash Berlin-Taylor [Fri, 19 Mar 2021 21:51:52 +0000 (21:51 +0000)]
Strip ANSI escape codes from logs in CloudWatch (#9)
Now that we are including step logs, we need to strip the colour escape
sequences.
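
As an illustration only, colour escape sequences can be stripped from a log stream with `sed`. This is a sketch under assumptions: the commit does not say how the stripping is actually implemented (it may well be done in the log shipper's configuration), and the `strip_ansi` name is made up here. The pattern covers the common `ESC [ ... letter` sequences, not every possible terminal escape.

```shell
# Hypothetical filter: remove "ESC [ <digits and semicolons> <letter>" sequences
# (colours, bold, cursor movement) from standard input.
strip_ansi() {
    # printf '\033' produces the literal ESC byte, which keeps this portable
    # across sed implementations that do not understand \x1b.
    sed "s/$(printf '\033')\[[0-9;]*[A-Za-z]//g"
}

# Example: a red "ERROR" followed by a colour reset.
printf '\033[31mERROR\033[0m build failed\n' | strip_ansi
```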
Ash Berlin-Taylor [Mon, 15 Mar 2021 14:47:09 +0000 (14:47 +0000)]
Upload job output logs to Cloudwatch too (#8)
We have some cases where logs aren't being uploaded to GitHub, which
makes debugging failures hard.
This is a problem with GitHub's hosted runners too, but for self-hosted
runners we can at least do something about it.
Ash Berlin-Taylor [Thu, 11 Mar 2021 12:18:07 +0000 (12:18 +0000)]
Add an environment variable to let runners know where they are running (#7)
This makes it easier to set runs-on in our ci.yml workflow
Jarek Potiuk [Wed, 10 Mar 2021 10:09:57 +0000 (11:09 +0100)]
Adds gnu parallel - required to implement semaphores for parallel tests (#6)
Ash Berlin-Taylor [Mon, 1 Mar 2021 11:10:58 +0000 (11:10 +0000)]
Remove left-over docker containers before fixing permissions (#5)
If the docker container is still running and creating files (as might be
the case for the prod image builds) then some files could be left
uncleaned, causing the next job to fail.
Ash Berlin-Taylor [Mon, 1 Mar 2021 10:56:24 +0000 (10:56 +0000)]
User-data script to bootstrap self-hosted runner on ASG (#4)
This runner-supervisor script has been manually uploaded to S3 (it was too big
to include in the userdata).
The cloud-init script has been manually uploaded by running the command
below. The ASG is already configured to pick the Latest version, so new
instances will start using the new script.
```
aws --profile airflow ec2 create-launch-template-version \
--launch-template-name GithubRunner \
--launch-template-data UserData="$(base64 -w0 cloud-init.yml)" \
--source-version='$Latest'
```
Ash Berlin-Taylor [Thu, 18 Feb 2021 09:55:49 +0000 (09:55 +0000)]
Lambda function to scale ASG based on Github webhooks (#2)
Ash Berlin-Taylor [Fri, 15 Jan 2021 14:29:26 +0000 (14:29 +0000)]
Merge pull request #1 from apache/register-runner-script
Add script to help store self-hosted runner creds in AWS SSM
Ash Berlin-Taylor [Fri, 15 Jan 2021 12:47:28 +0000 (12:47 +0000)]
fixup! Add script to help store self-hosted runner creds in AWS SSM
Ash Berlin-Taylor [Tue, 12 Jan 2021 11:54:51 +0000 (11:54 +0000)]
Add script to help store self-hosted runner creds in AWS SSM
We can't create self-hosted runners "on-demand", so we need to
pre-create a "pool" of them for use by the auto-scaled nodes.
This script automates the process of converting the short-lived token
into long-lived credentials (by using the runner binaries in a temporary
directory) and then storing the resulting files in AWS's ParameterStore.
Ash Berlin-Taylor [Mon, 4 Jan 2021 16:10:59 +0000 (16:10 +0000)]
Add readme