OpenRouter Runner is a monolithic inference engine built with Modal. It provides a robust way to deploy the many open-source models that are hosted in a fallback capacity on openrouter.ai.
✨ If you can make the Runner run faster and cheaper, we'll route to your services!
- Adding Models To OpenRouter (Video)
- Prerequisites
- Quickstart
- Adding New Models
- Configuration and Testing
- Deploying
- Contributions
Before you begin, ensure you have the necessary accounts and tools:
- Modal Account: Set up your environment on Modal as this will be your primary deployment platform.
- Hugging Face Account: Obtain a token from Hugging Face for accessing models and libraries.
- Poetry Installed: Make sure you have poetry installed on your machine.
This section is for those who are already familiar with the OpenRouter Runner and want to deploy it quickly. It assumes you have already set up the prerequisites and can start deploying.
-
Navigate to the modal directory.
cd path/to/modal
-
Setup Poetry
poetry install
poetry shell
modal token new
ℹ️ For intellisense, it's recommended to run vscode via the poetry shell:
poetry shell
code .
-
Create dev environment
modal environment create dev
ℹ️ If you have already created a dev environment, there is no need to create another one. Just select it in the next step.
-
Configure dev environment
modal config set-environment dev
⚠️ We are using our dev environment right now. Switch to main when deploying to production.
-
Configure secret keys
-
HuggingFace Token: Create a Modal secret group with your Hugging Face token. Replace <your huggingface token> with the actual token.
modal secret create huggingface HUGGINGFACE_TOKEN=<your huggingface token>
-
Runner API Key: Create a Modal secret group for the runner API key. Replace <generate a random key> with a strong, random key you've generated. Be sure to save this key somewhere, as you'll need it later!
modal secret create ext-api-key RUNNER_API_KEY=<generate a random key>
-
Sentry Configuration: Create a Modal secret group for the Sentry error tracking storage. Replace <optional SENTRY_DSN> with your DSN from sentry.io, or leave it blank to disable Sentry (e.g. SENTRY_DSN=). You can also add an environment by adding SENTRY_ENVIRONMENT=<environment name> to the command.
modal secret create sentry SENTRY_DSN=<optional SENTRY_DSN>
-
Datadog Configuration: Create a Modal secret group for Datadog log persistence. Replace <optional DD_API_KEY> with your Datadog API key, or leave it blank to disable Datadog (e.g. DD_API_KEY=). You can also add an environment by adding DD_ENV=<environment name> to the command, and a site by adding DD_SITE=<site name>.
modal secret create datadog DD_API_KEY=<optional DD_API_KEY> DD_SITE=<site name>
-
Download Models
modal run runner::download
-
Deploy Runner
modal deploy runner
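The Runner API Key step above asks for a strong, random key but does not say how to produce one. As a minimal sketch, Python's standard `secrets` module can generate a suitable value; the helper name `generate_runner_api_key` is hypothetical and not part of the Runner codebase.

```python
import secrets


def generate_runner_api_key(num_bytes: int = 32) -> str:
    """Return a URL-safe random string that could be used as RUNNER_API_KEY.

    Hypothetical helper for illustration; any cryptographically secure
    random generator would work equally well.
    """
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    # Print the key once so it can be saved and passed to
    # `modal secret create ext-api-key RUNNER_API_KEY=<key>`.
    print(generate_runner_api_key())
```

Store the printed value somewhere safe, since the same key is needed again when testing.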
With your environment now fully configured, you're ready to dive into deploying the OpenRouter Runner. This section guides you through deploying the Runner, adding new models or containers, and initiating tests to ensure everything is functioning as expected.
Adding new models to OpenRouter Runner is straightforward, especially when using models from Hugging Face that are compatible with existing containers. Here's how to do it:
-
Find and Copy the Model ID: Browse Hugging Face for the model you wish to deploy. For example, let's use "mistralai/Mistral-7B-Instruct-v0.2".
-
Update Model List: Open the runner/containers/__init__.py file. Add your new model ID to the DEFAULT_CONTAINER_TYPES dictionary, mapped to the container definition you want to use:
DEFAULT_CONTAINER_TYPES = {
    "Intel/neural-chat-7b-v3-1": ContainerType.VllmContainer_7B,
    "mistralai/Mistral-7B-Instruct-v0.2": ContainerType.VllmContainer_7B,
    ...
}
-
Handle Access Permissions: If you plan to deploy a model like "meta-llama/Llama-2-13b-chat-hf" and you don't yet have access, visit here for instructions on how to request access. In the meantime, you can comment out this model in the list to proceed with deployment.
-
Download and Prepare Models: Use the CLI to execute the runner::download function within your application. This command downloads and prepares the required models for your containerized app.
modal run runner::download
This step does not deploy your app but ensures all necessary models are downloaded and ready for when you do deploy. After running this command, you can check the specified storage location or logs to confirm that the models have been successfully downloaded. Note that depending on the size and number of models, this process can take some time.
-
Start testing the Models: Now you can go to the Configuration and Testing section to start testing your models!
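The model list update described above amounts to adding one key to a dictionary that maps model IDs to container types. The sketch below illustrates that lookup in isolation. The `ContainerType` enum here is a stand-in with a single member (the real one lives in the Runner's source), and `container_type_for` is a hypothetical helper, not a function from the repository.

```python
from enum import Enum


class ContainerType(Enum):
    # Stand-in for the project's real ContainerType; only the member
    # named in this README is reproduced here.
    VllmContainer_7B = "VllmContainer_7B"


# Same shape as the dictionary in runner/containers/__init__.py.
DEFAULT_CONTAINER_TYPES = {
    "Intel/neural-chat-7b-v3-1": ContainerType.VllmContainer_7B,
    "mistralai/Mistral-7B-Instruct-v0.2": ContainerType.VllmContainer_7B,
}


def container_type_for(model_id: str) -> ContainerType:
    """Look up the container type registered for a model ID (hypothetical helper)."""
    try:
        return DEFAULT_CONTAINER_TYPES[model_id]
    except KeyError:
        raise KeyError(
            f"{model_id!r} is not registered in DEFAULT_CONTAINER_TYPES"
        ) from None


if __name__ == "__main__":
    print(container_type_for("mistralai/Mistral-7B-Instruct-v0.2"))
```

A model ID that is missing from the dictionary fails fast with a descriptive error, which is a convenient way to catch typos in the ID before downloading anything.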
Sometimes the model you want to deploy requires an environment or configurations that aren't supported by the existing containers. This might be due to special software requirements, different machine types, or other model-specific needs. In these cases, you'll need to create a new container.
-
Understand the Requirements: Before creating a new container, make sure you understand the specific requirements of your model. This might include special libraries, hardware needs, or environment settings.
-
Copy a Container File: Start by copying an existing container file from runner/containers. This gives you a template that's already integrated with the system.
cp runner/containers/existing_container.py runner/containers/new_container.py
-
Customize the Container: Modify the new container file. Change the class name to something unique, and adjust the image, machine type, engine, and any other settings to meet your model's needs. Remember to install any additional libraries or tools required by your model.
-
Register the Container: Open ./containers/__init__.py. Add an import statement for your new container class at the top of the file, then create a new list of model IDs or update an existing one to include your model.
from .new_container import NewContainerClass

new_model_ids = [
    "your-model-id",
    # Add more model IDs as needed.
]
-
Associate Models: Add a ContainerType for your model in modal/shared/protocol.py and define how to build it in get_container(model_path: Path, container_type: ContainerType) in modal/runner/containers/__init__.py.
-
Download and Prepare Models: Use the CLI to execute the runner::download function within your application. This command downloads and prepares the required models for your containerized app.
modal run runner::download
This step does not deploy your app but ensures all necessary models are downloaded and ready for when you do deploy. After running this command, you can check the specified storage location or logs to confirm that the models have been successfully downloaded. Note that depending on the size and number of models, this process can take some time.
-
Start Testing: With your new container registered and its models downloaded, proceed to the Configuration and Testing section to begin testing your model!
Note
Creating a new container can be complex and requires a good understanding of the model's needs and the system's capabilities. If you encounter difficulties, consult the detailed documentation, or seek support from the community or help forums.
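The registration and association steps above can be pictured as a small dispatch from a `ContainerType` value to a container class. The following is only a sketch under stated assumptions: the enum members, the two dataclasses, and the body of `get_container` are all stand-ins written for illustration, and the real definitions in modal/shared/protocol.py and modal/runner/containers/__init__.py may look quite different.

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import Path


class ContainerType(Enum):
    # Stand-in enum; NewContainer is a hypothetical member added for a new model.
    VllmContainer_7B = "VllmContainer_7B"
    NewContainer = "NewContainer"


@dataclass
class VllmContainer7B:
    # Stand-in for an existing container class.
    model_path: Path


@dataclass
class NewContainerClass:
    # Stand-in for the class defined in runner/containers/new_container.py.
    model_path: Path


def get_container(model_path: Path, container_type: ContainerType):
    """Build the container object for a given container type (illustrative only)."""
    if container_type is ContainerType.VllmContainer_7B:
        return VllmContainer7B(model_path)
    if container_type is ContainerType.NewContainer:
        return NewContainerClass(model_path)
    raise ValueError(f"Unsupported container type: {container_type}")


if __name__ == "__main__":
    container = get_container(Path("/models/your-model-id"), ContainerType.NewContainer)
    print(type(container).__name__)
```

The point of the sketch is the shape of the change: one new enum member, one new class, and one new branch that connects them.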
Before diving into testing your models and endpoints, it's essential to properly configure your environment and install all necessary dependencies. This section guides you through setting up your environment, running test scripts, and ensuring everything is functioning correctly.
-
Create a .env.dev File: In the root of your project, create a .env.dev file to store your environment variables. This file should include:
API_URL=<MODAL_API_ENDPOINT_THAT_WAS_DEPLOYED>
RUNNER_API_KEY=<CUSTOM_KEY_YOU_CREATED_EARLIER>
MODEL=<MODEL_YOU_ADDED_OR_WANT_TO_TEST>
- API_URL: The URL of your deployed Modal API endpoint. You can find this on your Modal dashboard.
- RUNNER_API_KEY: The custom key you created earlier.
- MODEL: The identifier of the model you wish to test.
-
Install Dependencies: If you haven't already, install the following dependencies.
- If you're working with TypeScript scripts, you'll likely need to install Node.js packages. Use the appropriate package manager for your project:
npm install
# or
pnpm install
-
Ensure the Runner is Active: Make sure your OpenRouter Runner is running. From the openrouter-runner/modal directory, you can start it with:
modal serve runner
This command will keep your app running and ready for testing.
-
Open Another Terminal for Testing: While keeping the runner active, open a new terminal window. Navigate to the /openrouter-runner path to be in the correct directory for running scripts.
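Before running any tests, it can help to confirm that the .env.dev file described above defines all three variables. The sketch below is a hypothetical checker, not part of the Runner repository: it parses simple KEY=VALUE lines and reports which of API_URL, RUNNER_API_KEY, and MODEL are missing or empty.

```python
from pathlib import Path

# The three variables this README says .env.dev should contain.
REQUIRED_KEYS = ("API_URL", "RUNNER_API_KEY", "MODEL")


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Remove any surrounding quote characters from the value.
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def missing_keys(env: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    env_file = Path(".env.dev")
    if env_file.exists():
        print("Missing keys:", missing_keys(parse_env_text(env_file.read_text())))
    else:
        print(".env.dev not found in the current directory")
```

This handles only the plain KEY=VALUE form shown in this README; shell features such as `export` prefixes or variable expansion are deliberately out of scope.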
Now that your environment is set up and your app is running, you're ready to start testing models.
-
Navigate to Project Root: Ensure you're in the root directory of your project.
cd path/to/openrouter-runner
-
Load Environment Variables: Source your .env.dev file to load the environment variables.
source .env.dev
-
Choose a Test Script: In the scripts directory, you'll find various scripts for testing different aspects of your models. For a simple test, you might start with test-simple.ts.
-
Run the Test Script: Execute the script with your model identifier using the command below. Replace YourModel/Identifier with the specific model you want to test.
pnpm x scripts/test-simple.ts YourModel/Identifier
Note
If you wish to make the results more legible, especially for initial tests, consider setting stream: false in your script to turn off streaming.
-
Viewing Results: After running the script, you'll see JSON-formatted output in your terminal. It provides the generated text along with the number of tokens used in the prompt and completion. If you've set stream: false, the text will be displayed in its entirety, making it easier to review the model's output.
Example Response:
{
  "text": "Project A119 was a top-secret program run by the United States government... U.S. nuclear and military policies.",
  "prompt_tokens": 23,
  "completion_tokens": 770,
  "done": true
}
Note: The response has been truncated for brevity.
-
Troubleshooting: If you encounter errors related to Hugging Face models, ensure you've installed huggingface_hub and have the correct access permissions for the models you're trying to use.
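The example response shown above has four fields: text, prompt_tokens, completion_tokens, and done. As a rough sketch, a non-streamed response of that shape could be summarized as follows. The function name `summarize_response` is hypothetical, and the code assumes only the field names visible in the example; the Runner's actual response schema may include more.

```python
import json


def summarize_response(raw: str) -> dict:
    """Parse a non-streamed response and total its token counts.

    Assumes the field names from the README's example response.
    """
    data = json.loads(raw)
    prompt_tokens = int(data.get("prompt_tokens", 0))
    completion_tokens = int(data.get("completion_tokens", 0))
    return {
        "text": data.get("text", ""),
        "total_tokens": prompt_tokens + completion_tokens,
        "done": bool(data.get("done", False)),
    }


if __name__ == "__main__":
    # Example payload mirroring the truncated response shown in this README.
    example = (
        '{"text": "Project A119 was a top-secret program...", '
        '"prompt_tokens": 23, "completion_tokens": 770, "done": true}'
    )
    summary = summarize_response(example)
    print(summary["total_tokens"], summary["done"])
```

Totaling the two token counts gives a quick sanity check on how much of each request is spent on the prompt versus the completion.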
By following these steps, you should be able to set up your environment, deploy your app, and start testing various models and endpoints. Remember to consult the detailed documentation and seek support from the community if you face any issues.
Deploying your model is the final step in making your AI capabilities accessible for live use. Here's how to deploy and what to expect:
-
Deploy to Modal: When you feel confident with your setup and testing, deploy your runner to Modal with the following command:
modal deploy runner
This command deploys your runner to Modal, packaging your configurations and models into a live, accessible application.
-
View Your Deployment: After deployment, visit your dashboard on Modal. You should see your newly deployed model listed there. This dashboard provides useful information and controls for managing your deployment.
-
Interact with Your Live Model: With your model deployed, you can now call the endpoints live. Use the API URL provided in your .env.dev file (or found on your Modal dashboard) to send requests and receive AI-generated responses. This is where you see the real power of your OpenRouter Runner in action.
-
Monitor and Troubleshoot: Keep an eye on your application's performance and logs through the Modal dashboard. If you encounter any issues or unexpected behavior, consult the logs for insights and adjust your configuration as necessary.
By following these steps, your OpenRouter Runner will be live and ready to serve!
We'd love to see you add more models to the Runner! If you're interested in contributing, please follow the Adding New Models section to start adding more open-source models to OpenRouter. In addition, please adhere to our code of conduct to maintain a healthy and welcoming community.