Build on helicone (#2692)
colegottdank authored Sep 26, 2024
1 parent 4ee6645 commit 5110e38
Showing 4 changed files with 324 additions and 0 deletions.
10 changes: 10 additions & 0 deletions bifrost/app/blog/blogs/replaying-llm-sessions/metadata.json
@@ -0,0 +1,10 @@
{
  "title": "Replaying LLM Sessions for Iterative AI Agent Improvement",
  "title1": "Replaying LLM Sessions for Iterative AI Agent Improvement",
  "title2": "Replaying LLM Sessions for Iterative AI Agent Improvement",
  "description": "Learn how to enhance your AI agents by replaying and modifying LLM sessions using Helicone. Apply changes directly to real user interactions to gain authentic context, reveal hidden effects, and accelerate iteration.",
  "images": "/static/blog/replaying-llm-sessions/sessions.webp",
  "time": "15 minute read",
  "author": "Cole Gottdank",
  "date": "September 26, 2024"
}
309 changes: 309 additions & 0 deletions bifrost/app/blog/blogs/replaying-llm-sessions/src.mdx
@@ -0,0 +1,309 @@
![sessions](/static/blog/replaying-llm-sessions/sessions.webp)
Experimenting with prompts in isolation **limits your understanding**. To truly grasp how a prompt change impacts an entire session, you need to **apply changes directly to real user interactions**. **<span style={{color: '#0ea5e9'}}>Replaying LLM sessions with Helicone unlocks this capability</span>**, providing insights unattainable through isolated testing.

**Why is this powerful?**

- **Authentic Context**: By leveraging actual production data, you see how changes affect real user experiences.
- **Unveiling Hidden Effects**: Discover unintended consequences that only emerge over full sessions.
- **Accelerated Iteration**: Automate testing with real inputs, streamlining your optimization process.

**<span style={{color: '#0ea5e9'}}>Helicone empowers you to replay any complex session</span>**, a capability no other platform offers. Because Helicone is adaptable, more mature product teams often build bespoke solutions on top of it to store, aggregate, and analyze their AI workflows, improving performance with genuine user data without reinventing the wheel.

In this guide, we'll **<span style={{color: '#0ea5e9'}}>demonstrate how to leverage Helicone to replay LLM sessions</span>**. You'll learn how to set up an initial session, query session data, and replay sessions with modifications. We'll also share tips on customizing this approach for your unique needs.

---

## Overview of the Replay Process With Helicone

The process of replaying LLM sessions with Helicone involves three main steps:

1. **<span style={{color: '#0ea5e9'}}>Setting Up the Initial Session</span>**: Instrument your LLM calls to include Helicone session metadata so that they can be tracked and logged.
2. **<span style={{color: '#0ea5e9'}}>Querying Helicone for Session Data</span>**: Use Helicone's API to retrieve the logs of past sessions that you want to replay.
3. **<span style={{color: '#0ea5e9'}}>Replaying the Session with Modifications</span>**: Programmatically modify the retrieved session data as needed and send requests to the LLM to observe the effects.

Let's explore each of these steps in detail by following an example.

## Example: AI Debate Application

We'll walk through an example of a debate session between a user and an assistant. Between each argument, an impartial assistant scores the argument from 1 to 10.

### Step 1: Setting Up the Initial Session

Before you can replay sessions, you need to log them properly in Helicone. By adding **<span style={{color: '#0ea5e9'}}>only 3 headers</span>** to your LLM API requests, you can tag and group them into sessions.

#### Instrumenting Your LLM Calls

1. **Configure the OpenAI API client with Helicone**

Set up the session headers, configure the OpenAI API client with Helicone, and include the necessary headers when making a request.

```javascript
const { OpenAI } = require("openai");
const { randomUUID } = require("crypto");

// Generate unique session identifiers
const sessionId = randomUUID();
const sessionName = "AI Debate";
const sessionPath = "/debate/climate-change";

// Initialize OpenAI client with Helicone baseURL and auth header
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://oai.helicone.ai/v1",
  defaultHeaders: {
    "Helicone-Auth": `Bearer ${process.env.HELICONE_API_KEY}`,
  },
});

// Include Helicone session headers when making requests.
// `conversation` is the message array defined in the next step.
const response = await openai.chat.completions.create(
  {
    model: "gpt-4o-mini",
    messages: conversation,
  },
  {
    headers: {
      "Helicone-Session-Id": sessionId,
      "Helicone-Session-Name": sessionName,
      "Helicone-Session-Path": sessionPath,
    },
  }
);
```

2. **Initialize the conversation with the assistant**

```javascript
const topic = "The impact of climate change on global economies";

const conversation = [
  {
    role: "system",
    content:
      "You're a debating professional. You're engaging in a structured debate with the user. Each of you will present arguments for or against the topic. Keep responses concise and to the point.",
  },
  {
    role: "assistant",
    content: `Welcome to our debate! Today's topic is: "${topic}". I will argue in favor, and you will argue against. Please present your opening argument.`,
  },
];
```

3. **Loop through the debate turns**

```javascript
// MAX_TURNS is assumed to be defined elsewhere as the number of debate rounds
let turn = 1;

while (turn <= MAX_TURNS) {
  // Get user's argument
  const userArgument = await promptUser("Your argument: ");
  conversation.push({ role: "user", content: userArgument });

  // Score the user's argument
  await evaluateArgument(
    userArgument,
    "Your Argument",
    sessionId,
    sessionName,
    sessionPath
  );

  // Assistant responds with a counter-argument
  const assistantResponse = await generateAssistantResponse(
    conversation,
    sessionId,
    sessionName,
    sessionPath
  );
  conversation.push(assistantResponse);

  // Score the assistant's argument
  await evaluateArgument(
    assistantResponse.content,
    "Assistant's Argument",
    sessionId,
    sessionName,
    sessionPath
  );

  turn++;
}
```

**Note:** The functions `promptUser`, `evaluateArgument`, and `generateAssistantResponse` handle user input, argument evaluation, and generating assistant responses, respectively.
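
The helper functions themselves are not shown in this post. As a rough sketch, assuming the evaluator is tagged with the `argument-evaluation` prompt ID that Step 3 filters on, the arguments for one scoring call could be assembled as follows. The function name `buildEvaluationRequest`, the evaluator's system prompt, and the `/evaluation` sub-path are illustrative assumptions, not the code from the repository.

```javascript
// Hypothetical helper: builds the arguments for one scoring call.
// The system prompt wording and the "/evaluation" sub-path are assumptions.
function buildEvaluationRequest(argument, sessionId, sessionName, sessionPath) {
  const body = {
    model: "gpt-4o-mini",
    messages: [
      {
        role: "system",
        content:
          "You are an impartial judge. Score the following debate argument from 1 to 10 and briefly justify the score.",
      },
      { role: "user", content: argument },
    ],
  };

  // Same three session headers as above, plus a prompt ID for later filtering
  const options = {
    headers: {
      "Helicone-Session-Id": sessionId,
      "Helicone-Session-Name": sessionName,
      "Helicone-Session-Path": `${sessionPath}/evaluation`,
      "Helicone-Prompt-Id": "argument-evaluation",
    },
  };

  return { body, options };
}
```

Inside `evaluateArgument`, the result would then be passed to the client as `openai.chat.completions.create(body, options)`.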

**After setting up and running your session through Helicone, you can view it in Helicone:**

_Go fullscreen for the best experience._

<video width="100%" controls>
<source
src="https://marketing-assets-helicone.s3.us-west-2.amazonaws.com/session_debate.mp4"
type="video/mp4"
/>
Your browser does not support the video tag.
</video>

_Read more about how to implement Helicone sessions [here](https://docs.helicone.ai/features/sessions)._

### Step 2: Querying the Session Data from Helicone

```javascript
const response = await fetch("https://api.helicone.ai/v1/request/query", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${HELICONE_API_KEY}`,
  },
  body: JSON.stringify({
    filter: {
      properties: {
        "Helicone-Session-Id": {
          equals: SESSION_ID_TO_REPLAY,
        },
      },
    },
  }),
});
const data = await response.json();
```

Read more about Helicone's API [here](https://docs.helicone.ai/rest/request/post-v1requestquery).
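
The snippet above assumes the call succeeds. A slightly more defensive sketch checks the parsed payload before using it. The `data` array matches what Step 3 reads; the `error` field and the helper name `unwrapQueryResult` are assumptions about the response shape, so verify them against the API reference.

```javascript
// Hypothetical helper: validates the parsed JSON payload of the query call.
// Assumes a { data: [...], error: ... } envelope; only `data` is confirmed by this guide.
function unwrapQueryResult(payload) {
  if (payload == null || typeof payload !== "object") {
    throw new Error("Unexpected response: payload is not an object");
  }
  if (payload.error) {
    throw new Error(`Helicone query failed: ${JSON.stringify(payload.error)}`);
  }
  if (!Array.isArray(payload.data)) {
    throw new Error("Unexpected response: `data` is not an array");
  }
  if (payload.data.length === 0) {
    throw new Error("No requests found for this session ID");
  }
  return payload.data;
}
```

With this in place, the last line of the query could become `const sessionRequests = unwrapQueryResult(await response.json());`, which fails early if the session ID is wrong or the API key is rejected.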

### Step 3: Processing and Modifying the Session Data

Now that you have the session data, you'll need to process it.

1. **Parse and sort the requests**

**<span style={{ color: "#0ea5e9" }}>Sorting session data can be complex</span>** because
each use case is unique and may require custom logic. For our debate session example,
we simply sort the requests by their `created_at` timestamp.

```javascript
const requests = data.data.map((request) => ({
  created_at: request.request_created_at,
  session: request.request_properties["Helicone-Session-Id"],
  signed_body_url: request.signed_body_url,
  request_path: request.request_path,
  path: request.request_properties["Helicone-Session-Path"],
  prompt_id: request.request_properties["Helicone-Prompt-Id"],
  body: request.body,
}));
requests.sort((a, b) => new Date(a.created_at) - new Date(b.created_at));
```
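
When several requests share a timestamp, sorting on time alone may not reproduce the original order. One possible refinement, shown here as a sketch, breaks ties on the session path so that shorter paths (for example, a parent before its children) come first. The comparator name `compareRequests` and the tie-breaking rule are assumptions chosen for illustration; your own sessions may need different logic.

```javascript
// Hypothetical comparator: order by timestamp, then by session path.
// On equal timestamps, shorter paths sort first.
function compareRequests(a, b) {
  const timeDiff = new Date(a.created_at) - new Date(b.created_at);
  if (timeDiff !== 0) return timeDiff;

  const pathA = a.path ?? "";
  const pathB = b.path ?? "";
  if (pathA.length !== pathB.length) return pathA.length - pathB.length;
  return pathA.localeCompare(pathB);
}

// Made-up sample data in the shape produced by the mapping above
const example = [
  { created_at: "2024-09-26T10:00:01Z", path: "/debate/climate-change" },
  { created_at: "2024-09-26T10:00:00Z", path: "/debate/climate-change/evaluation" },
  { created_at: "2024-09-26T10:00:00Z", path: "/debate/climate-change" },
];
const sorted = [...example].sort(compareRequests);
```
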

2. **Modify the request bodies as needed**

For example, we can adjust the system prompts to change the assistant's argument or the argument evaluation response.

```javascript
function modifyRequestBody(request) {
  if (request.prompt_id === "argument-evaluation") {
    const systemMessage = request.body.messages.find(
      (msg) => msg.role === "system"
    );
    if (systemMessage) {
      systemMessage.content += " Keep the feedback short and concise.";
    }
  } else if (request.prompt_id === "assistant-argument") {
    const systemMessage = request.body.messages.find(
      (msg) => msg.role === "system"
    );
    if (systemMessage) {
      systemMessage.content +=
        " Take the persona of a genius in this field when responding.";
    }
  }
  return request;
}
```
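
Note that `modifyRequestBody` mutates the request it receives. If you want to keep the original bodies around for comparison, a non-mutating variant is one option. This sketch assumes request bodies are plain JSON, so a JSON round trip is a sufficient deep copy; the name `withSystemSuffix` is hypothetical.

```javascript
// Hypothetical non-mutating helper: returns a copy of the request whose
// system message has `suffix` appended. The input request is left unchanged.
function withSystemSuffix(request, suffix) {
  // Bodies originate from JSON, so a JSON round trip works as a deep copy here.
  const copy = JSON.parse(JSON.stringify(request));
  const systemMessage = copy.body.messages.find((msg) => msg.role === "system");
  if (systemMessage) {
    systemMessage.content += suffix;
  }
  return copy;
}
```
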

3. **Replay the modified session**

```javascript
// Create a new session for the replay
const replaySessionId = randomUUID();
for (const request of requests) {
  const modifiedRequest = modifyRequestBody(request);

  // Replay under the new session ID, reusing the original name, path, and prompt ID
  await handleChatCompletion(modifiedRequest);
}

async function handleChatCompletion(modifiedRequest) {
  const { body, path, prompt_id, request_path } = modifiedRequest;

  // Send the modified request to the LLM
  const response = await fetch(request_path, {
    method: "POST",
    headers: {
      // Required because the body is sent as a JSON string
      "Content-Type": "application/json",
      Authorization: `Bearer ${OPENAI_API_KEY}`,
      "Helicone-Auth": `Bearer ${HELICONE_API_KEY}`,
      // Log under the replay session, keeping the original name, path, and prompt ID
      "Helicone-Session-Id": replaySessionId,
      "Helicone-Session-Name": sessionName,
      "Helicone-Session-Path": path,
      "Helicone-Prompt-Id": prompt_id,
    },
    body: JSON.stringify(body),
  });
  if (!response.ok) {
    throw new Error(`Replay request failed with status ${response.status}`);
  }
  return response.json();
}

**Note:** In the `handleChatCompletion` function, we send the modified request to the LLM, reusing the `session-name`, `session-path`, `prompt-id`, and `request path` from the original requests. Because the replay uses a new session ID, it appears in Helicone as a separate session that shares the original's name, paths, and prompt IDs. This keeps the original and replayed sessions side by side, making it easier to compare and analyze the effects of your modifications.

### After running the replay, you can view it in Helicone:

_Go fullscreen for the best experience._

<video width="100%" controls>
<source
src="https://marketing-assets-helicone.s3.us-west-2.amazonaws.com/session_debate_replay.mp4"
type="video/mp4"
/>
Your browser does not support the video tag.
</video>

With the replayed session now visible in Helicone, you can observe how the modifications impact the AI's responses throughout the session. This visualization shows how changes in prompts or configurations affect subsequent interactions.
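
For a programmatic side-by-side view, you could also query the replayed session with the same request as in Step 2 and pair its requests with the originals. The sketch below assumes both lists are already sorted and that the replay issued the same number of requests in the same order; `pairRequests` is a hypothetical helper.

```javascript
// Hypothetical helper: zips original and replayed requests by position.
// Assumes both arrays are sorted by creation time and have equal length.
function pairRequests(originalRequests, replayedRequests) {
  if (originalRequests.length !== replayedRequests.length) {
    throw new Error(
      `Request count mismatch: ${originalRequests.length} vs ${replayedRequests.length}`
    );
  }
  return originalRequests.map((original, index) => ({
    path: original.path,
    original,
    replayed: replayedRequests[index],
  }));
}
```

Each pair can then be inspected or scored to see where the modified prompts changed the outcome.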

---

## Optional: Evaluations & Prompt Versioning

### Evaluations

To assess the impact of your changes quantitatively, use Helicone's [evaluation features](https://docs.helicone.ai/features/evaluation) to assign scores to both the original and replayed sessions. **<span style={{ color: "#0ea5e9" }}>Comparing these scores helps you understand the effects of your modifications</span>** and refine your prompts more effectively.
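
As a simple illustration of such a comparison, suppose you have exported the scores for each session as plain arrays of numbers. How you obtain them depends on your setup; the score arrays and the helper `meanScoreDelta` below are assumptions made for the sake of the example.

```javascript
// Hypothetical helper: average of a list of numeric scores.
function mean(scores) {
  if (scores.length === 0) throw new Error("No scores to average");
  return scores.reduce((sum, score) => sum + score, 0) / scores.length;
}

// A positive result means the replayed session scored higher on average.
function meanScoreDelta(originalScores, replayScores) {
  return mean(replayScores) - mean(originalScores);
}

// Made-up scores on the 1 to 10 scale used in the debate example
const delta = meanScoreDelta([6, 7, 5, 6], [7, 8, 6, 7]);
console.log(delta); // 1
```
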

### Prompt Versioning

Helicone's [prompt versioning](https://docs.helicone.ai/features/prompts) feature allows you to manage and compare different versions of your prompts effectively. By maintaining multiple versions, **<span style={{ color: "#0ea5e9" }}>you can test various combinations within your sessions to identify which prompts yield the best results</span>**. To retrieve specific prompt versions, use Helicone's [Prompt API](https://docs.helicone.ai/rest/prompt/post-v1prompt-query).

Here's how you can iterate over prompt versions to test different compositions in your sessions:

```pseudo
Define a list of prompt versions for each prompt.
For each combination in the cartesian product of prompt versions:
    Generate a new session ID.
    For each prompt in the combination:
        Fetch the prompt content using Helicone's Prompt API.
    Run the session using the retrieved prompts.
```
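
The cartesian product step in the pseudocode can be written in a few lines of JavaScript. This is a sketch only: the prompt IDs come from this guide, the version labels are made up, and fetching the prompt content and running the session are left out.

```javascript
// Returns every combination that picks one version per prompt.
// Input:  { promptId: [version, ...], ... }
// Output: [{ promptId: version, ... }, ...]
function promptVersionCombinations(versionsByPrompt) {
  return Object.entries(versionsByPrompt).reduce(
    (combinations, [promptId, versions]) =>
      combinations.flatMap((combination) =>
        versions.map((version) => ({ ...combination, [promptId]: version }))
      ),
    [{}]
  );
}

// Version labels are hypothetical placeholders.
const combinations = promptVersionCombinations({
  "argument-evaluation": ["v1", "v2"],
  "assistant-argument": ["v1", "v2", "v3"],
});
console.log(combinations.length); // 6
```

Each entry in `combinations` corresponds to one replay run, so the number of runs grows multiplicatively with the number of versions per prompt.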

_Alternatively, as described above, you can manually modify the prompts after retrieving the session, and Helicone will automatically log the new prompt versions for you. With the pseudocode approach, by contrast, you create the prompt versions first._

### Conclusion

By replaying and modifying LLM sessions with Helicone, you gain deeper insights into how changes affect the entire workflow. This method provides context-rich, real-world data that leads to more effective optimizations and a comprehensive understanding of your AI's behavior.

---

### Helicone Plug 🧊

**<span style={{ color: "#0ea5e9" }}>Helicone is the ultimate open-source LLM observability platform built for developers to monitor, debug and improve their LLM applications</span>**. We're fully open source and built to handle high usage use-cases seamlessly. With over **1.8 billion requests** and **1.6 trillion tokens** logged, it's proven at scale. **Check out our open-source GitHub repository [here](https://github.com/Helicone/helicone).**

**Trusted by Industry Leaders**: Companies like **Sunrun**, **Michelin**, **QAWolf**, **Slate Magazine**, and many more rely on Helicone to enhance their AI models and workflows.

You can view the full code used in this guide on [GitHub](https://github.com/colegottdank/helicone-replay-session-blog).

---
5 changes: 5 additions & 0 deletions bifrost/app/blog/page.tsx
@@ -190,6 +190,11 @@ export type BlogStructure =
};

const blogContent: BlogStructure[] = [
  {
    dynmaicEntry: {
      folderName: "replaying-llm-sessions",
    },
  },
  {
    dynmaicEntry: {
      folderName: "helicone-recap",
Binary file not shown.
