"Simplicity is prerequisite for reliability."
— Edsger W. Dijkstra
$ higgsfield init my_llama_project
It creates a folder named my_llama_project with the following structure:
my_llama_project
├── src
│ ├── __init__.py
│ ├── experiment.py
│ └── config.py
├── Dockerfile
├── env
├── requirements.txt
└── README.md
Get into the project folder:
$ cd my_llama_project
Then edit the env file. It should contain the path to a valid SSH key for your training nodes. Make sure the key exists at that path on your machine.
For example:
$ cat env
SSH_KEY=~/.ssh/id_rsa
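To double-check this before going further, here is a minimal Python sketch that reads the env file and confirms the key file exists. It assumes the file holds simple KEY=VALUE lines as in the example above; read_env and ssh_key_exists are hypothetical helpers written for illustration, not part of higgsfield.

```python
from pathlib import Path


def read_env(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments (hypothetical helper)."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def ssh_key_exists(env_values):
    """Return True if SSH_KEY is set and points to an existing file (a leading ~ is expanded)."""
    key_path = env_values.get("SSH_KEY")
    if not key_path:
        return False
    return Path(key_path).expanduser().is_file()
```

From the project folder, ssh_key_exists(read_env("env")) should return True before you move on.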
Great! Now edit the src/config.py file. It contains your experiments' configuration.
Example
import os

NAME = "my_llama_project"

# Fill this list with the IP addresses of your training nodes.
HOSTS = [
    "1.2.3.4",
]

# The user name on your training nodes.
# It must be the same on all nodes,
# and it might be something other than 'ubuntu'.
HOSTS_USER = "ubuntu"

# The SSH port of your training nodes, the same for all nodes.
HOSTS_PORT = 22

# Number of processes per node. Depends on the number of GPUs on each node.
NUM_PROCESSES = 4

# You can list other environment variables here.
WAN_DB_TOKEN = os.environ.get("WAN_DB_TOKEN", None)
Fill in these fields with your own configuration.
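A typo in these values usually surfaces later as a confusing setup failure, so a quick local check can save time. The sketch below is a hypothetical validator, not something higgsfield provides; it assumes HOSTS holds IP addresses (as the comment in config.py says) and returns a list of problems, empty when everything looks fine.

```python
import ipaddress


def validate_config(hosts, hosts_user, hosts_port, num_processes):
    """Return a list of problems found in the config values (empty list means OK).

    Hypothetical helper: higgsfield may validate these fields differently.
    """
    problems = []
    if not hosts:
        problems.append("HOSTS is empty")
    for host in hosts:
        try:
            ipaddress.ip_address(host)  # accepts IPv4 or IPv6 literals only
        except ValueError:
            problems.append(f"HOSTS entry {host!r} is not a valid IP address")
    if not hosts_user:
        problems.append("HOSTS_USER is empty")
    if not (isinstance(hosts_port, int) and 1 <= hosts_port <= 65535):
        problems.append(f"HOSTS_PORT {hosts_port!r} is not a valid TCP port")
    if not (isinstance(num_processes, int) and num_processes >= 1):
        problems.append(f"NUM_PROCESSES {num_processes!r} must be a positive integer")
    return problems
```

From the project root you could run it as validate_config(config.HOSTS, config.HOSTS_USER, config.HOSTS_PORT, config.NUM_PROCESSES) after from src import config.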
Next, create a new repository on GitHub. Make sure you don't create any README, .gitignore, or LICENSE files.
Then follow the instructions on the GitHub page for pushing an existing repository from the command line.
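GitHub offers the repository URL in two forms, SSH and HTTPS, and which one you add as origin matters if you hit problems later. If you are unsure which one you used (git remote get-url origin prints it), this small Python sketch tells them apart. classify_remote is a hypothetical helper that only recognizes github.com URLs.

```python
import re

# GitHub shows two URL styles for the same repository:
#   SSH:   git@github.com:OWNER/REPO.git
#   HTTPS: https://github.com/OWNER/REPO.git
_SSH_RE = re.compile(
    r"^(?:ssh://)?git@github\.com[:/](?P<owner>[^/]+)/(?P<repo>[^/]+?)(?:\.git)?/?$"
)
_HTTPS_RE = re.compile(
    r"^https://github\.com/(?P<owner>[^/]+)/(?P<repo>[^/]+?)(?:\.git)?/?$"
)


def classify_remote(url):
    """Return ("ssh" or "https", owner, repo) for a GitHub remote URL, or None if unrecognized."""
    for kind, pattern in (("ssh", _SSH_RE), ("https", _HTTPS_RE)):
        match = pattern.match(url.strip())
        if match:
            return kind, match.group("owner"), match.group("repo")
    return None
```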
Now set up your nodes by running:
$ higgsfield setup-nodes
This installs all the required tools on your nodes. It might take a while, but don't worry: it's a one-time process. The output looks like this:
$ higgsfield setup-nodes
INSTALLING DOCKER
...
INSTALLING INVOKER
...
SETTING UP DEPLOY KEY
...
PULLING DOCKER IMAGE
If you get stuck on this step because you haven't added your git origin, try toggling between the SSH and HTTPS options at the top of the GitHub page, then run the git remote add origin command again.
If that isn't the cause, make sure your SSH key is set up properly in the env file and that src/config.py is filled in correctly.
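When setup-nodes keeps failing, it can help to confirm that plain SSH works with the same settings. The sketch below builds one ssh command per node from the values in env and src/config.py. The helper names are hypothetical; -i, -p and -o BatchMode=yes are standard OpenSSH options (BatchMode makes ssh fail instead of prompting for a password).

```python
import os
import shlex


def ssh_check_commands(hosts, user, port, key_path):
    """Build one ssh command per node that only runs `true` (hypothetical connectivity check)."""
    key = os.path.expanduser(key_path)  # SSH_KEY from env may start with ~
    commands = []
    for host in hosts:
        commands.append([
            "ssh",
            "-i", key,               # private key from the env file
            "-p", str(port),         # HOSTS_PORT from src/config.py
            "-o", "BatchMode=yes",   # fail instead of asking for a password
            f"{user}@{host}",        # HOSTS_USER@host from src/config.py
            "true",                  # remote command that does nothing and exits 0
        ])
    return commands


def as_shell_lines(commands):
    """Render each command as a copy-pasteable shell line."""
    return [shlex.join(cmd) for cmd in commands]
```

Paste each printed line into a terminal; an exit status of 0 means that node accepted your key, user and port.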
You're very close to running your first experiment. Take a look at src/experiment.py:
@experiment("llama")
@param("size", options=["70b", "13b", "7b"])
def train_llama(params):
    print(f"Training llama with size {params.size}")
    ...
That's how you will define experiments from now on. No need for hydra, argparse, or any other boilerplate code. Just define your experiment, then run the following command:
$ higgsfield build-experiments
Notice anything new? There's a new folder named .github/workflows with the following structure:
.github
└── workflows
├── run_llama.yml
└── deploy.yml
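How build-experiments discovers your experiments isn't described here. Purely as a mental model, here is a hypothetical Python sketch in which @experiment registers the function under a name, @param records the allowed options, and each registered name maps to a run_<name>.yml file alongside deploy.yml. These decorators are stand-ins written for illustration, not higgsfield's own, and train_llama returns its message instead of printing it so the result is easy to check.

```python
from types import SimpleNamespace

# Hypothetical registry; higgsfield's real decorators may work differently.
EXPERIMENTS = {}


def param(name, options):
    """Record the allowed values of one parameter on the decorated function."""
    def decorate(func):
        # Decorators apply bottom-up, so @param runs before @experiment.
        func.__dict__.setdefault("_params", {})[name] = list(options)
        return func
    return decorate


def experiment(name):
    """Register the decorated function under an experiment name."""
    def decorate(func):
        EXPERIMENTS[name] = func
        return func
    return decorate


def workflow_files():
    """One run_<name>.yml per registered experiment, plus a shared deploy.yml."""
    return sorted(f"run_{name}.yml" for name in EXPERIMENTS) + ["deploy.yml"]


def run(name, **values):
    """Check values against the declared options, then call the experiment."""
    func = EXPERIMENTS[name]
    for key, allowed in getattr(func, "_params", {}).items():
        if values.get(key) not in allowed:
            raise ValueError(f"{key} must be one of {allowed}, got {values.get(key)!r}")
    return func(SimpleNamespace(**values))


@experiment("llama")
@param("size", options=["70b", "13b", "7b"])
def train_llama(params):
    return f"Training llama with size {params.size}"
```

In this model, adding a second @experiment("mistral") function would add a run_mistral.yml entry, which matches the one-workflow-per-experiment layout shown in the tree.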
Curious about them?
These files are your entry point to a simplified deployment of your experiments. Now you can just push your code to GitHub, and it will be deployed to your nodes automatically. On top of that, the workflows let you run your training experiments and save the checkpoints!
Before that, add the contents of your SSH key (the private key file that SSH_KEY points to in env) to your GitHub secrets. Go to your GitHub repository page, open the Settings tab, then Secrets (Secrets and variables > Actions in the current GitHub UI), and click the New repository secret button. Add the key contents as a secret named SSH_KEY.
Then add your deploy key under the repository's Settings > Deploy keys. You can print it by running:
$ higgsfield show-deploy-key
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5A000THERESHOULDBEYOURDEPLOYKEYHEREso83os//
Copy the output and add it as a new deploy key. You can name it DEPLOY_KEY.
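Before pasting, you can sanity-check that the copied line is a well-formed OpenSSH public key. In that format, the base64 field decodes to a blob that starts with a 4-byte big-endian length followed by the key type, which should repeat the first field of the line. The helper below is hypothetical and checks structure only; it cannot tell whether the key is the right one.

```python
import base64
import struct


def looks_like_openssh_public_key(line):
    """Structurally check a one-line OpenSSH public key such as 'ssh-ed25519 AAAA... comment'."""
    fields = line.strip().split()
    if len(fields) < 2:
        return False
    key_type, blob_b64 = fields[0], fields[1]
    try:
        blob = base64.b64decode(blob_b64, validate=True)
    except ValueError:  # binascii.Error (bad base64) is a subclass of ValueError
        return False
    if len(blob) < 4:
        return False
    (type_len,) = struct.unpack(">I", blob[:4])  # length prefix of the embedded key type
    return blob[4:4 + type_len] == key_type.encode("utf-8")
```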
Push your code:
$ git add .
$ git commit -m "Make the way for the LLAMA!"
$ git push origin main
Now open the Actions tab on your GitHub repository page, where the workflow triggered by your push should be running.
As soon as it turns green (which means it's done), select the run_llama.yml workflow in the left sidebar, click Run workflow, enter any name you like, and confirm with the Run workflow button.
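The Run workflow button is the simplest route. If you would rather trigger the same workflow from a script, GitHub's REST API has a workflow_dispatch endpoint, and the sketch below builds that request with the standard library without sending it. The owner, repository, token and inputs are placeholders you supply; the token needs permission to run Actions on the repository, and the input names accepted by run_llama.yml are an assumption you should check in the generated file.

```python
import json
import urllib.request

API = "https://api.github.com"


def dispatch_request(owner, repo, workflow_file, ref, token, inputs=None):
    """Build (but do not send) a POST that triggers a workflow_dispatch run."""
    url = f"{API}/repos/{owner}/{repo}/actions/workflows/{workflow_file}/dispatches"
    body = {"ref": ref}  # branch or tag to run the workflow on
    if inputs:
        body["inputs"] = inputs  # must match the inputs declared in the workflow file
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )
```

Sending it with urllib.request.urlopen(request) queues the run; a 2xx response means GitHub accepted it, and the new run appears in the Actions tab.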