This guide will help you set up and use the Cerebras API to interact with generative AI models. You’ll learn how to configure your environment, install the necessary library, and get a simple chatbot running!
- Setting up your developer environment
- Installing the Cerebras Inference library
- Running a simple chatbot script powered by Cerebras
-
Obtain Your API Key: Log in to your Cerebras account, navigate to the “API Keys” section, and generate a new API key.
-
Set the API Key as an Environment Variable: For security, store your API key as a secret in your repl! Here's more information on how to do that.
This ensures that your API key is available to your script without hardcoding it directly.
Let's make sure we have all of the requirements for this project installed!
pip install -r requirements.txt
With the API key set and the library installed, you’re ready to run the chatbot. This script will allow you to interact with the Cerebras API and process chat completions.
-
API Key Initialization:
client = Cerebras(api_key=os.environ.get("CEREBRAS_API_KEY"))
The
Cerebras
client is initialized with an API key fetched from environment variables. This key is necessary for authenticating requests to the Cerebras API. -
User Input and Response Handling:
user_input = input("User: ") user_message = {"role": "user", "content": user_input}
User input is collected and formatted into a message object.
-
Model Interaction:
response = client.chat.completions.create( messages=[user_message], model="llama3.1-8b" )
This API call sends the user message to the specified model (e.g., "llama3.1-8b") and retrieves the assistant’s response.
-
Performance Metrics Calculation:
total_tokens = response.usage.total_tokens total_time = response.time_info.total_time tokens_per_second = total_tokens / total_time
After receiving the response, the total tokens used and the total processing time are extracted from the response object. Tokens per second are then calculated by dividing the total tokens by the total time.