This is the legacy workflow for production Crater. It's going to be replaced with bot-controlled Crater, and it's meant to be used only by infra team members.
There are three 'official' Crater machines:
- cargobomb-test (54.177.234.51) - 1 core, 4GB RAM, for experimenting
- cargobomb-try (54.241.86.211) - 8 core, 30GB RAM, for doing PR runs
- cargobomb-prod (54.177.126.219) - 8 core, 30GB RAM, for doing beta runs (but can do PR runs if free)
These can only be accessed via the bastion: you ssh to the bastion, then ssh to the Crater machine. The bastion has restricted access, and you will need two things:
- a static IP address (if you have a long-running server in the cloud, that's usually fine);
- a public SSH key. Add the key to GitHub and then link to https://github.com/yourusername.keys. Once you have access to the bastion, you can manage your own keys.
With these two pieces of information in hand, ask acrichto to add you to the bastion and all three machines, and they'll let you know the bastion IP. You can now edit the `~/.ssh/config` on your static IP machine to contain:

    Host rust-bastion
        # Bastion IP below
        HostName 0.0.0.0
        User bastionusername

    Host cargobomb-test
        HostName 54.177.234.51
        ProxyCommand ssh -q rust-bastion nc -q0 %h 22
        User ec2-user

    # [...and so on for cargobomb-try and cargobomb-prod...]

This will let you run `ssh cargobomb-test` (and so on) from your static IP machine. If you have a recent OpenSSH (7.3 or later), you can use `ProxyJump rust-bastion` instead of the `ProxyCommand` line.
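Rather than copying the stanza by hand for each machine, a small script can print one for all three. The sketch below is hypothetical and not part of Crater. It uses the machine IPs listed above. The bastion IP and username are placeholders you must replace. With `use_proxyjump=True` it emits `ProxyJump` in place of the `ProxyCommand` line.

```python
# Hypothetical helper (not part of Crater): prints ~/.ssh/config stanzas for
# the Crater machines. The bastion IP and username passed in are placeholders.

MACHINES = {
    "cargobomb-test": "54.177.234.51",
    "cargobomb-try": "54.241.86.211",
    "cargobomb-prod": "54.177.126.219",
}


def render_ssh_config(bastion_ip: str, bastion_user: str, use_proxyjump: bool = False) -> str:
    lines = [
        "Host rust-bastion",
        f"    HostName {bastion_ip}",
        f"    User {bastion_user}",
        "",
    ]
    for host, ip in MACHINES.items():
        lines.append(f"Host {host}")
        lines.append(f"    HostName {ip}")
        if use_proxyjump:
            # ProxyJump needs OpenSSH 7.3 or later.
            lines.append("    ProxyJump rust-bastion")
        else:
            # Tunnel through the bastion with netcat, as in the stanza above.
            lines.append("    ProxyCommand ssh -q rust-bastion nc -q0 %h 22")
        lines.append("    User ec2-user")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_ssh_config("0.0.0.0", "bastionusername"))
```

Substitute the real bastion IP and username before appending the output to `~/.ssh/config`.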
The Crater servers use a terminal multiplexer (a way to keep multiple terminals running on a server). Enter the multiplexer by logging onto a server and running `byobu`. You'll notice a bit of text along the bottom saying something like "0:master 1:tc1 2:tc2": these are the 'windows' in the terminal multiplexer. The highlighted one, with a `*` next to it, is the current window. To send a command to the multiplexer, press Ctrl+Z and then another key.
Some useful operations:
- Ctrl+Z d - detach from the multiplexer (or you can just close your terminal)
- Ctrl+Z 0 - switch to window 0 (or any other number)
- Ctrl+Z PageUp - scroll upwards on the terminal. This will enter a sort of 'scrolling mode', so you can use PageUp and PageDown freely (to the limit of terminal scrollback). To return to normal terminal mode, hit Ctrl+C - be sure to only press it once, or you risk returning to normal mode and then killing the process running in the current terminal!
- Ctrl+Z c - create a new window, useful if you accidentally closed one
- Ctrl+Z , - rename a window, useful after recreating an accidentally closed window (hit enter to accept new name)
On your day for Crater triage, open the sheet. Click the top left cell to open the GitHub search. Then check two things:
- every PR in that search has an entry on the sheet;
- every row on the sheet without 'Complete' or 'Failed' status appears in the GitHub search.

You may need to update PR tags or add rows to the sheet as appropriate.
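The cross-check above amounts to two set differences. Here is a minimal sketch, not a Crater command. It assumes you have copied the PR numbers from the GitHub search into a set, and the sheet into a mapping from PR number to status. Both inputs are hypothetical, and keying sheet rows by PR number is a simplification.

```python
from __future__ import annotations

# Hypothetical sketch of the triage cross-check. The inputs are copied by
# hand from the GitHub search and the sheet; nothing here calls GitHub.

DONE_STATUSES = {"Complete", "Failed"}


def reconcile(github_prs: set[int], sheet: dict[int, str]) -> tuple[set[int], set[int]]:
    """Return (PRs with no row on the sheet, open sheet rows not in the search)."""
    missing_from_sheet = github_prs - set(sheet)
    open_rows = {pr for pr, status in sheet.items() if status not in DONE_STATUSES}
    missing_from_search = open_rows - github_prs
    return missing_from_sheet, missing_from_search


if __name__ == "__main__":
    # Made-up PR numbers and statuses, for illustration only.
    search = {101, 102}
    sheet = {101: "Running", 103: "Pending", 104: "Complete"}
    print(reconcile(search, sheet))
```

With the made-up data shown, the first set contains PR 102, which needs a sheet row. The second contains PR 103, which is open on the sheet but missing from the search.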
Next, follow the steps below for each requested run on the sheet that does not have a status of 'Complete' or 'Failed'.
- Pending
  - Log onto the appropriate box and connect to the multiplexer by running `byobu`.
    - Double-check each multiplexer window to make sure nothing is running.
  - Switch to the `master` multiplexer window.
    - Run `docker ps` to make sure no containers are running.
    - Run `df -h /home/ec2-user/crater/work`. Disk usage should be under 250GB of the 1TB disk (a full run may consume 600GB).
      - If disk usage is higher, there are probably target directories left over from a previous run. Run `du -sh work/local/target-dirs/*` and find the culprit (likely a directory over 100GB).
        - The directory name is the name of an experiment, e.g. `MY_EX`, so run `cargo run --release -- delete-all-target-dirs --ex MY_EX`.
    - Run `docker ps -aq | xargs --no-run-if-empty docker rm` to clean up all terminated Docker containers.
    - Run `git stash && git pull && git stash pop` to get the latest Crater changes. If this fails, it usually means there were local changes that conflict with upstream changes. Ping aidanhs and tomprince on IRC.
    - Run `cargo run --release -- prepare-local`. This may take between 5s and 5min, depending on what needs doing.
    - Log `EX_NAME`, `EX_START` and `EX_END` in the spreadsheet, where:
      - If doing a run for PR 12345:
        - `EX_NAME` is `pr-12345`.
        - `EX_END` is `try#deadbeef2...`. `deadbeef2` is in the bors comment "Trying commit `abcdef` with merge `deadbeef2`". Click through and copy the full commitish from the URL.
        - `EX_START` is `master#deadbeef1...`. `deadbeef1` is on the page you clicked through to get `deadbeef2...`. Just below the commit message, it is the left-hand commit of "2 parents `deadbeef1` and `bcdef1`". Click through and copy the full commitish from the URL, and make sure the commit is an auto merge from bors.
        - Just to emphasise, the second commitish you copied goes in `EX_START`.
      - If doing a beta run:
        - `EX_NAME` is `stable-STABLE_VERSION-beta-BETA_VERSION`, `EX_START` is `LAST_STABLE` and `EX_END` is `BETA_DATE`.
        - `STABLE_VERSION` is the version number from `curl -sSL static.rust-lang.org/dist/channel-rust-stable.toml | grep -A1 -F '[pkg.rust]'`.
        - `BETA_VERSION` is the version number from `curl -sSL static.rust-lang.org/dist/channel-rust-beta.toml | grep -A1 -F '[pkg.rust]'`.
        - `BETA_DATE` is the date from `curl -sSL static.rust-lang.org/dist/channel-rust-beta.toml | grep '^date ='`. It is not necessarily the same date as the one in the `BETA_VERSION` output.
    - Run `cargo run --release -- define-ex --crate-select=full --ex EX_NAME EX_START EX_END`. This will complete in a few seconds.
    - Run `cargo run --release -- run-graph --threads 8 --ex EX_NAME`.
  - Change status to 'Running'.
  - Update either the PR or the person requesting the run to let them know the run has started.
  - Go to next run.
- Running
  - Log onto the appropriate box and connect to the multiplexer.
  - Switch to the `master` multiplexer window.
    - If the run is ongoing, go to the next run.
    - Run `du -sh work/ex/EX_NAME`. The output should be under 2GB. If it isn't:
      - Run `find work/ex/EX_NAME -type f -size +100M | xargs --no-run-if-empty du -sh`. There will likely be only a couple of files listed, and they should be in the `res` directory.
      - Run `find work/ex/EX_NAME -type f -size +100M | xargs truncate --size='<100M'`.
      - Check that `du -sh work/ex/EX_NAME` now reports an appropriate size.
    - Run `cargo run --release -- publish-report --ex EX_NAME s3://cargobomb-reports/EX_NAME`.
  - Change status to 'Uploading'.
  - (Optional but much appreciated: come back to this run in 30 minutes, by which time the upload should be complete.)
  - Go to next run.
- Uploading
  - Switch to the `master` multiplexer window.
    - If the upload is ongoing, go to the next run.
    - If the upload failed, fix it. Known errors:
      - `<Error><Code>InternalError</Code><Message>...`: probably an S3 failure. Try running the upload again.
    - Run `cargo run --release -- delete-all-target-dirs --ex EX_NAME`. This will take about 2 minutes.
  - Change status to 'Complete' and add the results link, `http://cargobomb-reports.s3.amazonaws.com/EX_NAME/index.html`.
  - Update either the PR or the person requesting the beta run. Template is:

    Hi X (crater requester), Y (PR reviewer)! Crater results are at: <url>. 'Blacklisted' crates (spurious failures etc) can be found [here](https://github.com/rust-lang/crater/blob/master/config.toml). If you see any spurious failures not on the list, please make a PR against that file.

    (interested observers: Crater is a tool for testing the impact of changes on parts of the Rust ecosystem. You can find out more at the [repo](https://github.com/rust-lang/crater/) if you're curious)

  - Give yourself a pat on the back! Good job!
  - Go to next run.
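Several steps above follow fixed patterns: the naming rules for `EX_NAME`, the version and date lookups for beta runs, and the results link. That makes them easy to script. Below is a hypothetical Python sketch that is not part of Crater; `pr_ex_name`, `beta_ex_name`, `parse_channel` and `results_url` are names made up for this example. The sketch makes two assumptions about the channel manifest, the same ones the `grep` commands above rely on. First, the line right after the `[pkg.rust]` header is its `version = "..."` entry. Second, there is a top-level `date = "..."` line.

```python
from __future__ import annotations

# Hypothetical helpers mirroring the naming rules in the steps above. They
# work on manifest text you have already fetched (e.g. with the curl commands
# above); nothing here downloads anything.


def pr_ex_name(pr: int) -> str:
    """EX_NAME for a PR run, e.g. pr-12345."""
    return f"pr-{pr}"


def beta_ex_name(stable_version: str, beta_version: str) -> str:
    """EX_NAME for a beta run: stable-STABLE_VERSION-beta-BETA_VERSION."""
    return f"stable-{stable_version}-beta-{beta_version}"


def _quoted_value(line: str) -> str:
    # 'key = "value"' -> 'value'
    return line.split("=", 1)[1].strip().strip('"')


def parse_channel(manifest: str) -> tuple[str, str]:
    """Return (version number, date) from a channel-rust-*.toml manifest.

    Like `grep -A1 -F '[pkg.rust]'`, this reads the line after the
    [pkg.rust] header and keeps the first word of its version string;
    like `grep '^date ='`, it reads the first line starting with "date =".
    """
    lines = manifest.splitlines()
    version = date = None
    for i, line in enumerate(lines):
        if date is None and line.startswith("date ="):
            date = _quoted_value(line)
        if line.strip() == "[pkg.rust]" and i + 1 < len(lines):
            version = _quoted_value(lines[i + 1]).split()[0]
    if version is None or date is None:
        raise ValueError("manifest is missing the [pkg.rust] version or the date")
    return version, date


def results_url(ex_name: str) -> str:
    """Results link to record on the sheet once the upload is complete."""
    return f"http://cargobomb-reports.s3.amazonaws.com/{ex_name}/index.html"
```

For example, `results_url(pr_ex_name(12345))` gives `http://cargobomb-reports.s3.amazonaws.com/pr-12345/index.html`. `EX_START` for a beta run (`LAST_STABLE`) is left out, since the steps above don't spell out how to derive it.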
(The runs can be stopped and restarted at any time. Really? How? asks aidanhs.)
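The workflow above has two size checks:
- before a run, the work directory's disk should have under 250GB used;
- before publishing, no file in `work/ex/EX_NAME` should be over 100M.

Both can also be expressed in Python. This is a hypothetical sketch, not part of Crater, and the function names are made up for this example:
- `disk_usage_ok` uses `shutil.disk_usage`, which, like `df`, reports on the whole filesystem. It reads 250GB as 250 GiB, since `df -h` reports powers of 1024.
- `truncate_large_files` mirrors the `find -size +100M | xargs truncate --size='<100M'` pipeline. It treats M as 1024 × 1024 bytes, as GNU `find` and `truncate` do.

```python
from __future__ import annotations

import os
import shutil

GIB = 1024 ** 3
MIB = 1024 ** 2


def disk_usage_ok(path: str, limit_bytes: int = 250 * GIB) -> bool:
    """True if the filesystem holding `path` has less than `limit_bytes` used."""
    return shutil.disk_usage(path).used < limit_bytes


def truncate_large_files(root: str, limit_bytes: int = 100 * MIB) -> list[str]:
    """Cut every regular file under `root` that is larger than `limit_bytes`
    down to `limit_bytes`, like
    `find ROOT -type f -size +100M | xargs truncate --size='<100M'`.
    Returns the sorted list of paths that were truncated."""
    truncated = []
    # os.walk does not descend into symlinked directories by default,
    # matching find's default behaviour.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Like `find -type f`, only touch regular files, not symlinks.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            if os.path.getsize(path) > limit_bytes:
                os.truncate(path, limit_bytes)
                truncated.append(path)
    return sorted(truncated)
```

On a Crater box you might call `disk_usage_ok("/home/ec2-user/crater/work")` before a run. You might call `truncate_large_files("work/ex/EX_NAME")` (with the real experiment name) before publishing. As with the shell pipeline, truncation discards data, so check the listed files first.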