Setup tutorial

"Simplicity is prerequisite for reliability."
— Edsger W. Dijkstra

Initialize the project

$ higgsfield init my_llama_project

It creates a folder named my_llama_project with the following structure:

my_llama_project
├── src 
│   ├── __init__.py
│   ├── experiment.py
│   └── config.py
├── Dockerfile
├── env
├── requirements.txt
└── README.md

Set up the environment

Get into the project folder:

$ cd my_llama_project

Then edit the env file. It should contain the path to a valid SSH key for your training nodes. Make sure the key exists at that path on your machine. For example:

$ cat env
SSH_KEY=~/.ssh/id_rsa
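
As a quick sanity check before moving on, the sketch below reads the env file and confirms that the key it names exists on your machine. It assumes the simple KEY=VALUE layout shown above. The helper names read_env and ssh_key_exists are illustrative and are not part of higgsfield.

```python
from pathlib import Path


def read_env(path):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env_file = Path(path)
    if not env_file.is_file():
        return {}
    values = {}
    for line in env_file.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def ssh_key_exists(env_path="env"):
    """Return True if SSH_KEY in the env file points at an existing file."""
    key_path = read_env(env_path).get("SSH_KEY")
    if not key_path:
        return False
    # expanduser() resolves a leading ~ to the home directory.
    return Path(key_path).expanduser().is_file()


if __name__ == "__main__":
    if ssh_key_exists():
        print("SSH key found")
    else:
        print("SSH key missing; check the env file")
```

Run it from inside the project folder so that the relative path env resolves to the file you just edited.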

Great! Now edit the src/config.py file. It contains the configuration for your experiments.

Example

import os

NAME = "my_llama_project"

# Fill this list with the IP addresses of your training nodes.
HOSTS = [
    "1.2.3.4",
]

# The user name on your training nodes.
# It should be the same for all nodes,
# and it might be different from 'ubuntu'.
HOSTS_USER = "ubuntu"

# The SSH port of your training nodes, same for all nodes.
HOSTS_PORT = 22

# Number of processes per node. Depends on the number of GPUs you have on each node.
NUM_PROCESSES = 4

# You can list other environment variables here.
WAN_DB_TOKEN = os.environ.get("WAN_DB_TOKEN", None)

Replace these values with your own configuration.
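
Before continuing, it can help to check the values for obvious mistakes. The following is a minimal sketch, not something higgsfield provides: check_config is a hypothetical helper, and it assumes every HOSTS entry is a literal IP address, as in the example above.

```python
import ipaddress


def check_config(hosts, hosts_user, hosts_port, num_processes):
    """Return a list of problems found in the config values (empty if none)."""
    problems = []
    if not hosts:
        problems.append("HOSTS is empty")
    for host in hosts:
        try:
            # Raises ValueError for anything that is not an IPv4/IPv6 literal.
            ipaddress.ip_address(host)
        except ValueError:
            problems.append(f"HOSTS entry {host!r} is not a valid IP address")
    if not hosts_user:
        problems.append("HOSTS_USER is empty")
    if not 1 <= hosts_port <= 65535:
        problems.append(f"HOSTS_PORT {hosts_port} is outside 1-65535")
    if num_processes < 1:
        problems.append("NUM_PROCESSES must be at least 1")
    return problems


# The example values from src/config.py pass every check.
print(check_config(["1.2.3.4"], "ubuntu", 22, 4))  # prints []
```

An empty list means the values look plausible. It does not prove that the nodes are reachable.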

Set up git

Create a new git repository on GitHub. Make sure you do not create any README, .gitignore, or LICENSE files.

Just an empty repository.

Then follow the first option on the GitHub page to push an existing repository from your terminal.


Time to set up your nodes!

Now set up your nodes by running:

$ higgsfield setup-nodes

This installs all the required tools on your nodes. It may take a while, but don't worry, it's a one-time process. The output looks like this:

$ higgsfield setup-nodes
INSTALLING DOCKER
...
INSTALLING INVOKER
...
SETTING UP DEPLOY KEY
...
PULLING DOCKER IMAGE

But if you're stuck...

If you're stuck on this step because you haven't added your git origin, try toggling between the SSH | HTTPS options at the top of the GitHub page, then run the git remote add origin command again. If that is not the cause, check that your SSH key is set correctly in the env file and that src/config.py is configured properly.
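
The SSH | HTTPS toggle switches between two spellings of the same repository address. The sketch below shows how the two forms relate. toggle_remote is a hypothetical helper written for this tutorial, and the owner name user in the example is a placeholder.

```python
import re

# The two remote URL forms GitHub offers under the SSH | HTTPS toggle.
_SSH = re.compile(r"^git@github\.com:(?P<owner>[^/]+)/(?P<repo>.+?)(?:\.git)?$")
_HTTPS = re.compile(r"^https://github\.com/(?P<owner>[^/]+)/(?P<repo>.+?)(?:\.git)?$")


def toggle_remote(url):
    """Convert a GitHub SSH remote URL to its HTTPS form, or the reverse."""
    match = _SSH.match(url)
    if match:
        return f"https://github.com/{match['owner']}/{match['repo']}.git"
    match = _HTTPS.match(url)
    if match:
        return f"git@github.com:{match['owner']}/{match['repo']}.git"
    raise ValueError(f"not a recognized GitHub remote URL: {url!r}")


print(toggle_remote("git@github.com:user/my_llama_project.git"))
# prints https://github.com/user/my_llama_project.git
```

Whichever form you choose is the one to pass to git remote add origin.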

Run your very first experiment

You're very close to running your first experiment. Take a look at src/experiment.py:

@experiment("llama")
@param("size", options=["70b", "13b", "7b"])
def train_llama(params):
    print(f"Training llama with size {params.size}")
    ...

This is how you will define experiments from now on. There is no need for hydra, argparse, or any other boilerplate code. Just define your experiment, then run the following command:

$ higgsfield build-experiments

Notice anything new? There is a new folder named .github/workflows with the following structure:

.github
└── workflows
    ├── run_llama.yml
    └── deploy.yml

Curious about them?

These files are your entry point to a simplified deployment of your experiments. Now you can just push your code to GitHub, and it will automatically be deployed to your nodes. The workflows also let you run your training experiments and save the checkpoints!
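
To make the decorator style used in src/experiment.py more concrete, here is a toy sketch of how a pair of decorators can register an experiment and validate its parameters without argparse. It is not higgsfield's implementation: experiment, param, REGISTRY, and run are stand-ins defined locally, and the run_<name>.yml pattern is only inferred from the run_llama.yml file shown above.

```python
from types import SimpleNamespace

REGISTRY = {}  # experiment name -> (function, {param name: allowed options})


def param(name, options):
    """Record a parameter and its allowed options on the decorated function."""
    def decorator(func):
        # Decorators apply bottom-up, so prepend to keep source order.
        func._params = {name: options, **getattr(func, "_params", {})}
        return func
    return decorator


def experiment(name):
    """Register the decorated function under an experiment name."""
    def decorator(func):
        REGISTRY[name] = (func, getattr(func, "_params", {}))
        return func
    return decorator


def run(name, **chosen):
    """Validate the chosen values against the options, then call the experiment."""
    func, params = REGISTRY[name]
    for key, options in params.items():
        if chosen.get(key) not in options:
            raise ValueError(f"{key} must be one of {options}")
    return func(SimpleNamespace(**chosen))


@experiment("llama")
@param("size", options=["70b", "13b", "7b"])
def train_llama(params):
    return f"Training llama with size {params.size}"


print(run("llama", size="7b"))  # prints Training llama with size 7b
print([f"run_{name}.yml" for name in REGISTRY])  # prints ['run_llama.yml']
```

The point of the pattern is that the function signature and its decorators carry everything a tool needs to generate one runnable entry per experiment.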

Fasten your seatbelt, it's time to deploy!

Add the contents of your SSH key to your GitHub secrets. Go to your GitHub repository page, click the Settings tab, then the Secrets tab, then the New repository secret button. Add the contents of the key file that SSH_KEY points to as a secret named SSH_KEY.


Then add your deploy key to the repository's deploy keys. You can get it by running the following command:

$ higgsfield show-deploy-key
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5A000THERESHOULDBEYOURDEPLOYKEYHEREso83os//

Copy the output and add it as a deploy key. You can name it DEPLOY_KEY.


Push your code:

$ git add .
$ git commit -m "Make the way for the LLAMA!"
$ git push origin main

Now go to the Actions tab of your GitHub repository page. You should see a workflow run triggered by your push.

As soon as it turns green (which means it's done), select the run_llama.yml workflow on the left side, enter any name you like, and click the Run workflow button.

Once that run finishes, your experiment is training on your nodes!
