Main dev infctx #19

Merged
merged 16 commits into from
Aug 23, 2023
2 changes: 2 additions & 0 deletions .gitattributes
@@ -0,0 +1,2 @@
# Disable code % attribution of the project from notebooks
notebook/* linguist-vendored
100 changes: 98 additions & 2 deletions .github/workflows/docker-build.yml
@@ -15,7 +15,7 @@ env:
   IMAGE_NAME: ${{ github.repository }}

 jobs:
-  build:
+  build_env:
     name: Docker Env Image (cuda-11-8)

     runs-on: ubuntu-latest
@@ -98,7 +98,7 @@ jobs:

       # Build and push Docker image with Buildx (don't push on PR)
       # https://github.com/docker/build-push-action
-      - name: Build and push Docker image
+      - name: Build and push Docker image (env-cuda-11-8)
         id: build-and-push
         uses: docker/build-push-action@v4
         with:
@@ -109,3 +109,99 @@ jobs:
           labels: ${{ steps.meta.outputs.labels }}
           cache-from: type=gha,src=docker/env-cuda-11-8
           cache-to: type=gha,mode=max
+
+  build_runner:
+    name: Docker Env Image (github-worker-11-8)
+
+    needs: build_env
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      packages: write
+      # This is used to complete the identity challenge
+      # with sigstore/fulcio when running outside of PRs.
+      id-token: write
+
+    steps:
+      # Get and log the free space
+      - name: Get system free space (Before reclaim)
+        run: |
+          echo "Free space:"
+          df -h
+
+      # Due to the docker image being > available space on the runner
+      # we need to do some optimization, to create more space.
+      # https://github.com/marketplace/actions/disk-space-reclaimer
+      # https://stackoverflow.com/questions/76294509/github-actions-docker-service-container-25gb-cannot-be-loaded
+      - name: Maximize build space
+        uses: insightsengineering/disk-space-reclaimer@v1
+        with:
+          # this might remove tools that are actually needed,
+          # if set to "true" but frees about 6 GB
+          tools-cache: true
+
+          # all of these default to true, but feel free to set to
+          # "false" if necessary for your workflow
+          android: true
+          dotnet: true
+          haskell: true
+          large-packages: true
+          swap-storage: true
+          docker-images: true
+
+      # Get and log the free space
+      - name: Get system free space (After reclaim)
+        run: |
+          echo "Free space:"
+          df -h
+
+      - name: Checkout repository
+        uses: actions/checkout@v3
+
+      # Install the cosign tool except on PR
+      # https://github.com/sigstore/cosign-installer
+      - name: Install cosign
+        if: github.event_name != 'pull_request'
+        uses: sigstore/cosign-installer@f3c664df7af409cb4873aa5068053ba9d61a57b6 #v2.6.0
+        with:
+          cosign-release: 'v1.11.0'
+
+      # Workaround: https://github.com/docker/build-push-action/issues/461
+      - name: Setup Docker buildx
+        uses: docker/setup-buildx-action@v2
+
+      # Login against a Docker registry except on PR
+      # https://github.com/docker/login-action
+      - name: Log into registry ${{ env.REGISTRY }}
+        if: github.event_name != 'pull_request'
+        uses: docker/login-action@28218f9b04b4f3f62068d7b6ce6ca5b26e35336c
+        with:
+          registry: ${{ env.REGISTRY }}
+          username: ${{ github.actor }}
+          password: ${{ secrets.GITHUB_TOKEN }}
+
+      # Extract metadata (tags, labels) for Docker
+      # https://github.com/docker/metadata-action
+      - name: Extract Docker metadata
+        id: meta
+        uses: docker/metadata-action@98669ae865ea3cffbcbaa878cf57c20bbf1c6c38
+        with:
+          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
+
+      - name: downcase IMAGE_NAME
+        run: |
+          echo "IMAGE_NAME_LC=${IMAGE_NAME,,}" >>${GITHUB_ENV}
+
+      # Build and push Docker image with Buildx (don't push on PR)
+      # https://github.com/docker/build-push-action
+      - name: Build and push Docker image (github-worker-cuda-11-8)
+        id: build-and-push
+        uses: docker/build-push-action@v4
+        with:
+          context: "{{defaultContext}}:docker/github-worker-cuda-11-8"
+          push: ${{ github.event_name != 'pull_request' }} # Don't push on PR
+          tags: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME_LC }}:github-worker-cuda-11-8
+          # tags: ${{ steps.meta.outputs.tags }}
+          labels: ${{ steps.meta.outputs.labels }}
+          cache-from: type=gha,src=docker/github-worker-cuda-11-8
+          cache-to: type=gha,mode=max
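Container image repository names must be lowercase, while `github.repository` keeps the owner's original casing (the repository URL used elsewhere in this PR is `RWKV/RWKV-infctx-trainer`). That is why `build_runner` lowercases `IMAGE_NAME` with Bash's `${IMAGE_NAME,,}` before using it in `tags:`. A minimal Python sketch of the image reference the job pushes, assuming `REGISTRY` is `ghcr.io` (the env block defining it is outside this diff; the worker Dockerfile pulls its base image from ghcr.io). `runner_image_ref` and `should_push` are hypothetical helpers.

```python
# Sketch of what the build_runner job computes. ASSUMPTION: REGISTRY is
# "ghcr.io"; the workflow's real value is set in an env block not shown here.

def runner_image_ref(repository: str, registry: str = "ghcr.io",
                     tag: str = "github-worker-cuda-11-8") -> str:
    """Image reference used in `tags:` after the downcase IMAGE_NAME step."""
    # Same effect as ${IMAGE_NAME,,} for ASCII repository names.
    return f"{registry}/{repository.lower()}:{tag}"


def should_push(event_name: str) -> bool:
    """Mirror `push: ${{ github.event_name != 'pull_request' }}`."""
    return event_name != "pull_request"


if __name__ == "__main__":
    print(runner_image_ref("RWKV/RWKV-infctx-trainer"))
    print(should_push("push"), should_push("pull_request"))
```

Keeping the lowercase name in a separate `IMAGE_NAME_LC` variable leaves `IMAGE_NAME` untouched for the metadata step, which still receives the original casing.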
19 changes: 19 additions & 0 deletions .github/workflows/notebook-run.yml
@@ -0,0 +1,19 @@
name: Notebook-Run

on:
  workflow_dispatch:
    inputs:
      cudaVersion:
        description: 'cuda version'
        required: true
        default: '11-8'

env:

jobs:
  build_env:
    name: Notebook-Run
    runs-on: cuda-${{github.event.inputs.cudaVersion}}
    steps:
      - run: |
          echo "Cuda Version: ${{github.event.inputs.cudaVersion}}"
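With the default `cudaVersion` input `11-8`, `runs-on` resolves to the label `cuda-11-8`. This is the same `CUDA_VER` label that docker/github-worker-cuda-11-8/entrypoint.sh registers for every lane, so a manual dispatch is routed to one of those self-hosted workers. A self-hosted runner only picks up a job if it carries every label listed in `runs-on`. The Python sketch below models that rule. The helper names are hypothetical, and dropping empty label entries is a simplification made by this sketch, not documented `config.sh` behaviour.

```python
# Sketch: match the Notebook-Run job's runs-on label against the labels the
# worker entrypoint passes to config.sh --labels. Helper names are hypothetical.

def runs_on_label(cuda_version: str = "11-8") -> str:
    """Mirror `runs-on: cuda-${{github.event.inputs.cudaVersion}}`."""
    return f"cuda-{cuda_version}"


def runner_labels(lane: str, cuda_ver: str = "cuda-11-8", extra: str = "") -> set[str]:
    """Labels from "<lane>,<CUDA_VER>,<RUNNER_LABELS>" as set in entrypoint.sh."""
    # SIMPLIFICATION: an empty RUNNER_LABELS leaves a trailing empty entry;
    # this sketch simply ignores empty entries.
    return {label for label in f"{lane},{cuda_ver},{extra}".split(",") if label}


def can_run(requested: set[str], available: set[str]) -> bool:
    """A runner is eligible only if it has all requested labels."""
    return requested <= available


if __name__ == "__main__":
    print(can_run({runs_on_label("11-8")}, runner_labels("nolane")))  # True
    print(can_run({runs_on_label("12-1")}, runner_labels("lane1")))   # False
```

Dispatching with a version that no registered worker advertises (such as `12-1` above) would leave the job waiting for a matching runner.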
4 changes: 4 additions & 0 deletions .gitignore
@@ -165,8 +165,12 @@ notebook/scratch/*
*/lightning_logs/
*/config.yaml

# Add back script files used in docker images
!docker/**/*.sh

# Add .github items
!.github/
!.git*

# Add back requirements.txt
!requirements.txt
2 changes: 1 addition & 1 deletion RWKV-v4neo/src/model.py
@@ -1256,7 +1256,7 @@ def __init__(
             tokenizer_file = os.path.join(SCRIPT_DIR,"./dataflow/20B_tokenizer.json")
             tokenizer = PreTrainedTokenizerFast(tokenizer_file=tokenizer_file)
             self.fastTokenizer = tokenizer
-        elif vocab_size == 65529 or vocab_size == 65536:
+        elif vocab_size == 65536:
             # Use the world tokenizer
             from .dataflow.trie_tokenizer import MT_TRIE_TOKENIZER
             world_tokenizer = MT_TRIE_TOKENIZER(os.path.join(SCRIPT_DIR, "./dataflow/rwkv_vocab_v20230424.txt"))
2 changes: 1 addition & 1 deletion RWKV-v5/src/model.py
@@ -1395,7 +1395,7 @@ def __init__(
             tokenizer_file = os.path.join(SCRIPT_DIR,"./dataflow/20B_tokenizer.json")
             tokenizer = PreTrainedTokenizerFast(tokenizer_file=tokenizer_file)
             self.fastTokenizer = tokenizer
-        elif vocab_size == 65529 or vocab_size == 65536:
+        elif vocab_size == 65536:
             # Use the world tokenizer
             from .dataflow.trie_tokenizer import MT_TRIE_TOKENIZER
             world_tokenizer = MT_TRIE_TOKENIZER(os.path.join(SCRIPT_DIR, "./dataflow/rwkv_vocab_v20230424.txt"))
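Both model.py hunks (RWKV-v4neo and RWKV-v5) make the same change: a `vocab_size` of 65529 no longer selects the world (trie) tokenizer, and only 65536 does. A minimal Python sketch of the before/after condition follows. The constant and function names are hypothetical, and what a 65529-sized model falls through to depends on branches outside these hunks.

```python
# Sketch of the condition guarding the world-tokenizer branch in __init__,
# before and after this PR. Names are hypothetical; other branches not modelled.

WORLD_VOCAB_SIZES_BEFORE = {65529, 65536}
WORLD_VOCAB_SIZES_AFTER = {65536}


def takes_world_branch(vocab_size: int, after_pr: bool = True) -> bool:
    """True if `elif vocab_size == ...` would select MT_TRIE_TOKENIZER."""
    sizes = WORLD_VOCAB_SIZES_AFTER if after_pr else WORLD_VOCAB_SIZES_BEFORE
    return vocab_size in sizes


if __name__ == "__main__":
    for size in (65529, 65536):
        # prints "65529 True False", then "65536 True True"
        print(size, takes_world_branch(size, after_pr=False), takes_world_branch(size))
```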
5 changes: 4 additions & 1 deletion docker/env-cuda-11-8/Dockerfile
@@ -19,4 +19,7 @@ RUN pip3 install lightning==2.0.5 deepspeed==0.10.0 \

 # Install the misc packages we might need for the various experiments
 RUN pip3 install \
-    papermill aiocsv aiofiles
+    papermill aiocsv aiofiles
+
+# Configure default dir, to the home directory
+WORKDIR /root
41 changes: 41 additions & 0 deletions docker/github-worker-cuda-11-8/Dockerfile
@@ -0,0 +1,41 @@
# Temporary, until the rwkv package is public
FROM ghcr.io/picocreator/rwkv-lm-lora:env-cuda-11-8
# FROM ghcr.io/rwkv/rwkv-infctx-trainer:env-cuda-11-8

# Install the github runner
RUN cd / && mkdir actions-runner && cd actions-runner && \
    curl -o actions-runner-linux-x64-2.308.0.tar.gz \
    -L https://github.com/actions/runner/releases/download/v2.308.0/actions-runner-linux-x64-2.308.0.tar.gz && \
    tar xzf ./actions-runner-linux-x64-2.308.0.tar.gz && \
    rm ./actions-runner-linux-x64-2.308.0.tar.gz

# Clone the runner, for lane2 track
RUN cd / && cp -r /actions-runner /actions-runner-lane2

# Install dependencies
RUN cd /actions-runner && ./bin/installdependencies.sh && \
    cd /actions-runner-lane2 && ./bin/installdependencies.sh

# Copy the entrypoint script, and set it up
COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]

# Configure default env variables
ENV RUNNER_LABELS=""
ENV RUNNER_NAME=""
ENV RUNNER_TOKEN=""
ENV RUNNER_REPO_URL="https://github.com/RWKV/RWKV-infctx-trainer"

# Runner with lane2 track
# ---
# this helps setup dual runs on the same machine
# to help ensure better utilization of GPUs.
#
# In general DS2/3_offload should be used.
#
# Tags should be adjusted to be half their original spec
# to account for the fact that we are running two runners
#
# This is only useful for high GPU, and high ram count machines
ENV RUNNER_LANE2="false"
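The pinned runner version `2.308.0` appears several times in the install `RUN` instruction (download URL, output file name, `tar` and `rm`), so bumping it means editing every occurrence. A minimal Python sketch of the naming pattern those strings follow, using only the `linux-x64` asset name confirmed by the Dockerfile. `runner_tarball` and `runner_download_url` are hypothetical helpers.

```python
# Sketch of the actions/runner release naming used in the worker Dockerfile.
# Only the linux-x64 asset name is taken from the source; helpers are hypothetical.

RUNNER_VERSION = "2.308.0"  # version pinned in the Dockerfile


def runner_tarball(version: str = RUNNER_VERSION) -> str:
    """File name used by curl -o, tar xzf and rm in the RUN instruction."""
    return f"actions-runner-linux-x64-{version}.tar.gz"


def runner_download_url(version: str = RUNNER_VERSION) -> str:
    """Release asset URL following the pattern in the Dockerfile."""
    return ("https://github.com/actions/runner/releases/download/"
            f"v{version}/{runner_tarball(version)}")


if __name__ == "__main__":
    print(runner_download_url())
```

Inside the Dockerfile itself, the equivalent single point of change would be a build `ARG` holding the version and interpolated into each of these strings.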
73 changes: 73 additions & 0 deletions docker/github-worker-cuda-11-8/entrypoint.sh
@@ -0,0 +1,73 @@
#!/bin/bash

export RUNNER_ALLOW_RUNASROOT="1"
cd /actions-runner

# CUDA version for label
CUDA_VER="cuda-11-8"

# Check the URL, token, and name of the runner from the container ENV vars
# and if they are not set, provide default values
if [[ -z "${RUNNER_NAME}" ]]; then
    export RUNNER_NAME=$(hostname)
fi
if [[ -z "${RUNNER_TOKEN}" ]]; then
    echo "# [WARNING] RUNNER_TOKEN is missing, skipping github runner setup"
else
    echo "# [INFO] lane1 starting up ... "

    # If lane2 runner is enabled, start it
    # this is enabled with RUNNER_LANE2=true
    if [ "$RUNNER_LANE2" != true ]; then

        # Configure unattended
        ./config.sh \
            --unattended \
            --url "${RUNNER_REPO_URL}" \
            --token "${RUNNER_TOKEN}" \
            --name "${RUNNER_NAME}" \
            --replace \
            --labels "nolane,${CUDA_VER},${RUNNER_LABELS}"

        # Run it in background, and get the PID
        ./run.sh &

        echo "# [INFO] lane2 runner is disabled"
    else
        # Configure unattended
        ./config.sh \
            --unattended \
            --url "${RUNNER_REPO_URL}" \
            --token "${RUNNER_TOKEN}" \
            --name "${RUNNER_NAME}-lane1" \
            --replace \
            --labels "lane1,${CUDA_VER},${RUNNER_LABELS}"

        # Run it in background, and get the PID
        ./run.sh &

        echo "# [INFO] lane2 starting up ... "

        cd /actions-runner-lane2
        ./config.sh \
            --unattended \
            --url "${RUNNER_REPO_URL}" \
            --token "${RUNNER_TOKEN}" \
            --name "${RUNNER_NAME}-lane2" \
            --replace \
            --labels "lane2,${CUDA_VER},${RUNNER_LABELS}"

        # Run it in background, and get the PID
        ./run.sh &
    fi
fi

# Follow up on any forwarded command args
if [[ $# -gt 0 ]]; then
    cd /root
    exec "$@"
fi

# Wait for everything to exit
# wait $RUNNER_PID
wait
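The lane logic in entrypoint.sh comes down to zero, one, or two `./config.sh` registrations, depending on `RUNNER_TOKEN` and `RUNNER_LANE2`. A minimal Python sketch that expresses them as data follows. The environment variable names come from the worker Dockerfile. The function itself is hypothetical, and `socket.gethostname()` stands in for `$(hostname)`. Because `RUNNER_LABELS` defaults to an empty string, the label string keeps its trailing comma exactly as the shell script builds it. This sketch does not guess how `config.sh` treats that comma.

```python
# Sketch: the (working dir, --name, --labels) triples entrypoint.sh passes to
# ./config.sh. Function name is hypothetical; env var names match the Dockerfile.
import socket

CUDA_VER = "cuda-11-8"


def runner_registrations(env: dict[str, str]) -> list[tuple[str, str, str]]:
    # Empty or unset RUNNER_NAME falls back to the hostname, like `-z` + $(hostname).
    name = env.get("RUNNER_NAME") or socket.gethostname()
    extra = env.get("RUNNER_LABELS", "")
    if not env.get("RUNNER_TOKEN"):
        return []  # the script logs a warning and skips runner setup
    if env.get("RUNNER_LANE2") != "true":
        # Single runner, tagged "nolane".
        return [("/actions-runner", name, f"nolane,{CUDA_VER},{extra}")]
    # Dual-lane mode: two runners from two copies of the runner directory.
    return [
        ("/actions-runner", f"{name}-lane1", f"lane1,{CUDA_VER},{extra}"),
        ("/actions-runner-lane2", f"{name}-lane2", f"lane2,{CUDA_VER},{extra}"),
    ]


if __name__ == "__main__":
    env = {"RUNNER_NAME": "gpu01", "RUNNER_TOKEN": "example-token", "RUNNER_LANE2": "true"}
    for registration in runner_registrations(env):
        print(registration)
```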