adding example script for hosting an OpenAI API server for OLMo 2 on Modal.com #761
Conversation
Can you also add a section in the README describing how you use this?
# https://github.com/modal-labs/modal-examples/blob/ed89980d7288cd35c57f23861ba1b1c8d198f68d/06_gpu_and_ml/llm-serving/vllm_inference.py

import os
import modal
This is an extra dependency. Is there a place where you can explain what needs to be done to get this working?
Added setup information to the README
scripts/olmo2_modal_openai.py (Outdated)
.pip_install(
    "git+https://github.com/vllm-project/vllm.git@9db713a1dca7e1bc9b6ecf5303c63c7352c52a13",
)
.pip_install(
    "git+https://github.com/huggingface/transformers.git@9121ab8fe87a296e57f9846e70153b1a3c555d75",
)
Do we still need to depend on shas? I think both vllm and huggingface now have OLMo in main, if not in the latest released version.
huggingface/transformers has a tagged version that includes OLMo 2 as of 5 hours ago, but vLLM still requires a build. I'll try it with main instead of the specific SHA, which should work, but it'll take an hour or two of build time to confirm.
👍
After running this for a while, I'm going to swing back to advocating for pinning, either to a SHA or to a tagged version once that catches up. Otherwise we risk unacceptable delays rebuilding the image whenever we spin up new containers in response to a surge in traffic and a new version has come out since the last time we brought up the container.
With that in mind, do we want to go with the pinned SHA we've got, or wait until there's a tagged vLLM version with the OLMo 2 architecture, which I'm guessing is mid-month?
Let me know when I should look again!
…ge setup since no vllm build required, plus docs cleanup
OK, this should be cleaned up. It's using a version of vLLM pinned to a specific SHA per my discussion above, but installed from a prebuilt wheel per https://docs.vllm.ai/en/latest/getting_started/installation.html#install-the-latest-code instead of built from source, which is much faster and simplifies the image setup.
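For reference, a minimal sketch of what that style of image setup can look like in Modal, assuming a prebuilt per-commit wheel is available. The wheel URL and the transformers requirement below are illustrative placeholders, not the exact values from this PR's script; see the vLLM installation docs linked above for the current per-commit wheel location.

```python
import modal

# SHA discussed above; the wheel URL below is a hypothetical placeholder --
# check the vLLM "install the latest code" docs for the real per-commit wheel path.
VLLM_COMMIT = "9db713a1dca7e1bc9b6ecf5303c63c7352c52a13"

vllm_image = (
    modal.Image.debian_slim(python_version="3.12")
    .pip_install(
        # Installing from a prebuilt wheel avoids a long build-from-source step
        # every time a fresh container image is built.
        f"https://vllm-wheels.s3.us-west-2.amazonaws.com/{VLLM_COMMIT}/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl",
    )
    # Assumed: a transformers release that already includes the OLMo 2 architecture.
    .pip_install("transformers>=4.47")
)
```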
@dirkgr Should be ready for you to look again! I expect the most controversial aspect remaining is the vLLM version, which I want to pin so it's not constantly pausing to pull minor new versions whenever they get released. But if you'd rather wait until there's an official tagged release that includes the OLMo 2 architecture rather than continuing to tie to a git hash, I'm open to that (especially if you think we're about due for a new official release).
Just some minor comments about the instructions
README.md (Outdated)
@@ -154,6 +154,48 @@ The quantized model is sensitive to input types and CUDA handling. To avoid pote

Additional tools for evaluating OLMo models are available at the [OLMo Eval](https://github.com/allenai/OLMo-eval) repo.

## Modal.com Hosting

An example script is provided for hosting an OLMo 2 model on Modal.com using a the OpenAI API in ./scripts/olmo2_modal_openai.py.
typo:
An example script is provided for hosting an OLMo 2 model on Modal.com using a the OpenAI API in ./scripts/olmo2_modal_openai.py.
An example script is provided for hosting an OLMo 2 model on Modal.com using the OpenAI API in ./scripts/olmo2_modal_openai.py.
Good catch!
README.md (Outdated)
An example script is provided for hosting an OLMo 2 model on Modal.com using a the OpenAI API in ./scripts/olmo2_modal_openai.py.
To run that:
<ol>
Why not just use markdown list syntax?
Good question; will change that.
scripts/olmo2_modal_openai.py (Outdated)
APP_NAME = "OLMo-2-1124-13B-Instruct-openai"
APP_LABEL = APP_NAME.lower()

MINUTES = 60  # seconds
Is it minutes or seconds?
The 60 is measured in seconds. The intent is to create a constant representing one minute (60 seconds), so that timeouts measured in seconds can later be written as "5 * MINUTES" for 300 seconds.
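A short sketch of the pattern being described; the app and function names below are hypothetical, not the ones from this script.

```python
import modal

MINUTES = 60  # one minute, expressed in seconds

app = modal.App("timeout-example")  # hypothetical app name

# Modal's timeout parameter is given in seconds, so 5 * MINUTES passes 300
# seconds to the API while reading as "5 minutes" at the call site.
@app.function(timeout=5 * MINUTES)
def long_running_task():
    ...
```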
Thanks @cnewell, LGTM
@dirkgr said he's cool with it based on Pete's approval
This is a sample script for Modal.com to stand up a standard OpenAI API server for OLMo 2, as a way for folks to get their own copy running quickly using Modal.
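Once deployed with `modal deploy scripts/olmo2_modal_openai.py`, the server can be queried with any OpenAI-compatible client. A rough sketch using the standard OpenAI Python client is below; the base URL, API key, and served model name are placeholders that depend on your Modal workspace and how the script is configured.

```python
from openai import OpenAI

# Placeholder values -- the real endpoint URL is printed when the app is
# deployed, and the key depends on how the server is configured.
client = OpenAI(
    base_url="https://<your-workspace>--olmo-2-1124-13b-instruct-openai-serve.modal.run/v1",
    api_key="<your-api-key>",
)

response = client.chat.completions.create(
    model="allenai/OLMo-2-1124-13B-Instruct",  # assumed served model name
    messages=[{"role": "user", "content": "Tell me about OLMo 2."}],
)
print(response.choices[0].message.content)
```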