The Tim Ferriss Show (TFS) is one of the most popular podcasts, focused on "deconstructing world-class performers from eclectic areas (investing, chess, pro sports, etc.), digging deep to find the tools, tactics, and tricks that listeners can use". After 10 years and over 750 episodes, the library has grown intimidating to read and search for the gems.
The TFS Archivist is a conversational AI that helps users find relevant ideas from a specific guest or episode, saving them from manually skimming through the library and hour-long transcripts.
This is my final project for DataTalks.Club's LLM Zoomcamp, a free course about LLMs and RAG.
- 1. The Tim Ferriss Show Archivist
- 2. Notes
- 3. Progress
- 4. Points
- 5. Overview
- 6. Dataset
- 7. App Architecture
- 8. How to Run the App
- 9. Code
- 10. Evaluations
- 11. Monitoring
- 12. Acknowledgements
The app was developed on GitHub Codespaces with a disk constraint of 32 GB. If the app is set up normally, the machine crashes due to disk overflow. To circumvent this, the `docker-compose` file points the Postgres volume to `/tmp/postgres_data`, a system folder outside the Codespace working directory that is not counted towards the 32 GB quota. If you are running the app outside of GitHub Codespaces, you may want to change this path back to just `postgres_data`.
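For illustration, the relevant volume mapping in `docker-compose.yml` might look like the sketch below (the service and image names are assumptions; only the `/tmp/postgres_data` host path is the point):

```yaml
services:
  postgres:
    image: postgres:16  # assumed image tag
    env_file: .env
    volumes:
      # Host path outside the Codespace working directory,
      # so it does not count towards the 32 GB disk quota
      - /tmp/postgres_data:/var/lib/postgresql/data
    ports:
      - "${POSTGRES_PORT}:5432"
```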
- Scrape the data.
- Chunk the data.
- Tokenize the data.
- Ingest the data into an Elasticsearch Docker container
- Perform RAG trial with Groq API & Phi-3 (Ollama)
- Build a UI for the app
- Perform Evaluations with GPT-4o
- Build a dashboard for evaluation
- Best practices
To save you the trouble of looking for the project criteria, I put my marks here. You can double-check them while reading through the repo and running it.
Problem description
- 2 points: The problem is well-described and it's clear what problem the project solves
RAG flow
- 2 points: Both a knowledge base and an LLM are used in the RAG flow
Retrieval evaluation
- 2 points: Multiple retrieval approaches are evaluated, and the best one is used
RAG evaluation
- 2 points: Multiple RAG approaches are evaluated, and the best one is used
Interface
- 2 points: UI (e.g., Streamlit), web application (e.g., Django), or an API (e.g., built with FastAPI)
Ingestion pipeline
- 2 points: Automated ingestion with a Python script or a special tool (e.g., Mage, dlt, Airflow, Prefect)
Monitoring
- 2 points: User feedback is collected and there's a dashboard with at least 5 charts
Containerization
- 2 points: Everything is in docker-compose
Reproducibility
- 2 points: Instructions are clear, the dataset is accessible, it's easy to run the code, and it works. The versions for all dependencies are specified.
Best practices
- Hybrid search: combining both text and vector search (at least evaluating it) (1 point)
- Document re-ranking (1 point)
- User query rewriting (1 point)
The TFS Archivist lets users search for specific content from an episode of The Tim Ferriss Show.
Example use cases include:
- Search for background information about a guest.
- Search for the episode a guest appears in.
- Search for a specific idea that a guest mentioned in the show.
The dataset is the show transcripts up to episode 766, scraped from https://tim.blog/2018/09/20/all-transcripts-from-the-tim-ferriss-show/. The notebook to process the data is in the `scrape` folder. The notebook was run on Colab (to make use of the GPU) across different sessions, so it can be messy. The basic steps:
- Get all the transcripts, in legacy format (PDF) and current format (web content).
- Process them to extract the episode content itself.
- Chunk each episode into chunks of 700 words with a 20-word overlap.
- Use SentenceTransformer to embed each chunk into a 768-dimensional dense vector.
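As a rough sketch of the chunking and embedding steps (the exact model name and file path are assumptions; any 768-dimensional SentenceTransformer such as `all-mpnet-base-v2` fits the description):

```python
from sentence_transformers import SentenceTransformer

# Assumed model: any 768-dimensional SentenceTransformer matches the description
model = SentenceTransformer("all-mpnet-base-v2")

def chunk_words(text: str, size: int = 700, overlap: int = 20) -> list[str]:
    """Split a transcript into 700-word chunks with a 20-word overlap."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]

episode_text = open("transcripts/episode_001.txt").read()  # hypothetical path
chunks = chunk_words(episode_text)
embeddings = model.encode(chunks)  # shape: (num_chunks, 768)
```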
After processing, the data has the following fields:
- `id`: The episode number.
- `chunk_id`: The chunk ID in the format `id_{auto-increment number}`.
- `title`: The episode title.
- `chunk`: The text in the chunk.
- `embedding`: The embedding vector of the text chunk.
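A minimal sketch of a matching Elasticsearch index mapping (the index name is an assumption; the field names follow the list above):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

mappings = {
    "properties": {
        "id": {"type": "keyword"},        # episode number
        "chunk_id": {"type": "keyword"},  # id_{auto-increment number}
        "title": {"type": "text"},
        "chunk": {"type": "text"},
        "embedding": {
            "type": "dense_vector",
            "dims": 768,
            "index": True,
            "similarity": "cosine",  # matches the cosine similarity used in evaluation
        },
    }
}

es.indices.create(index="tfs-transcripts", mappings=mappings)  # assumed index name
```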
Note: Based on the copyright notice prominently displayed on his website (e.g., here), commercial usage of the transcripts is disallowed. This means you cannot take an app like this and deploy it to the cloud for commercial use.
Technologies used:
- Python 3.12
- Docker and Docker Compose for containerization
- Elasticsearch for full-text search (and semantic search during evaluation)
- Streamlit as both the app backend and frontend
- PostgreSQL as the backend for monitoring
- Grafana as monitoring dashboard
- OpenAI and Groq as LLM providers
Prepare a `.env` file with the following format:
GROQ_API_KEY=your_api_key
OPENAI_API_KEY=your_api_key
TZ=Asia/Singapore
# PostgreSQL Configuration
POSTGRES_HOST=postgres
POSTGRES_DB=tfs_archivist
POSTGRES_USER=admin
POSTGRES_PASSWORD=admin
POSTGRES_PORT=5432
# Grafana Configuration
GRAFANA_ADMIN_USER=admin
GRAFANA_ADMIN_PASSWORD=admin
GRAFANA_SECRET_KEY=SECRET_KEY
# Elasticsearch Configuration
ELASTIC_URL_LOCAL=http://127.0.0.1:9200
ELASTIC_URL=http://elasticsearch:9200
ELASTIC_PORT=9200
# Streamlit Configuration
STREAMLIT_PORT=8501
Get your Groq and OpenAI API keys from their respective websites.
The Elasticsearch and PostgreSQL databases need to be initialized before running the app.
First, run only the postgres and elasticsearch containers:
docker-compose up postgres elasticsearch -d
Second, prepare the Python environment and run the `prep.py` and `ingestion.py` scripts:
conda create -n llm python=3.12
conda activate llm
pip install -r requirements.txt
export POSTGRES_HOST=localhost
python prep.py
python ingestion.py
The easiest way is to use Docker Compose. After database initialization, run
docker-compose up
If you want to run the application locally, after database initialization, instead of `docker-compose up`, run:
export POSTGRES_HOST=localhost
bash streamlit.sh
If you want to run the application using only Docker for development, after database initialization, build the image and run it:
docker build -t streamlit .
docker run -it --rm \
--network="llm-zoomcamp-tf-show-archivist_default" \
--env-file=".env" \
-e OPENAI_API_KEY=${OPENAI_API_KEY} \
-e GROQ_API_KEY=${GROQ_API_KEY} \
-p 8501:8501 \
streamlit
Navigate to http://127.0.0.1:8501/ to use the app via the Streamlit UI.
Demo can be viewed at
https://www.loom.com/share/1c3e150ea6c04e9bb21f13c295e201d3
- `grafana` - initialization and dashboard settings for the Grafana dashboards.
- `notebooks` - experiment notebooks and the first prototype.
- `scrape` - the notebook used to scrape and process the data.
- `utils` - utility functions.
- `app.py` - the main app logic.
- `assistant.py` - the main RAG logic for retrieving the data and building the prompt.
- `ingestion.py` - loading the data into the knowledge base.
- `db.py` - the logic for logging requests and responses to the Postgres database.
- `prep.py` - the script for initializing the database.
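As a hedged sketch of the retrieve-then-prompt flow described for `assistant.py` (the function names, index name, and prompt wording are illustrative, not the actual implementation):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def search(query: str, top_k: int = 5) -> list[dict]:
    """Keyword search over the chunk and title fields."""
    resp = es.search(
        index="tfs-transcripts",  # assumed index name
        query={"multi_match": {"query": query, "fields": ["chunk", "title"]}},
        size=top_k,
    )
    return [hit["_source"] for hit in resp["hits"]["hits"]]

def build_prompt(query: str, docs: list[dict]) -> str:
    """Assemble the retrieved chunks into a grounded prompt for the LLM."""
    context = "\n\n".join(f"{d['title']}:\n{d['chunk']}" for d in docs)
    return (
        "Answer the question about The Tim Ferriss Show using only the CONTEXT.\n\n"
        f"QUESTION: {query}\n\nCONTEXT:\n{context}"
    )
```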
Note: Due to a gross mistake on my part during a transfer between different GitHub Codespaces (I broke the last one), the experiment data were lost. Only the data output by the notebooks remain 😔.
There are 2 Jupyter notebooks in the `notebooks` folder:
- `evaluation_data_generation.ipynb` - ground truth dataset generation.
- `evaluation_rag.ipynb` - the retrieval and RAG evaluation.
Approximate vector search (10,000 candidates (the max setting for Elasticsearch), top-5, cosine similarity):
- Chunk Hit Rate: 0.3821
- Chunk MRR: 0.4408
- Document Hit Rate: 0.6316
- Document MRR: 0.9489
Keyword search (chunk and title, no boosting, top-5):
- Chunk Hit Rate: 0.7891
- Chunk MRR: 1.0369
- Document Hit Rate: 0.8723
- Document MRR: 1.5714
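For reference, a standard single-hit implementation of these two metrics looks like the sketch below. Note that the MRR values above exceed 1, which suggests the evaluation notebook sums reciprocal ranks over multiple relevant chunks per query rather than stopping at the first hit; the standard variant shown here is bounded by 1.

```python
def hit_rate(relevance: list[list[bool]]) -> float:
    """Fraction of queries with at least one relevant result in the top-k."""
    return sum(any(row) for row in relevance) / len(relevance)

def mrr(relevance: list[list[bool]]) -> float:
    """Mean reciprocal rank of the first relevant result (0 if none)."""
    total = 0.0
    for row in relevance:
        for rank, rel in enumerate(row, start=1):
            if rel:
                total += 1 / rank
                break
    return total / len(relevance)

# relevance[i][j] is True if the j-th result for query i is a ground-truth match
relevance = [[False, True, False], [True, False, False], [False, False, False]]
print(hit_rate(relevance), mrr(relevance))  # 0.666..., 0.5
```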
Keyword search performed better. Due to time constraints, I did not test boosting for keyword search. To do so, we could use `minsearch.py` as an approximation to perform a simple optimization, and then transfer the resulting settings to Elasticsearch; see the sketch below.
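A sketch of that boosting optimization with `minsearch` (the random search and parameter ranges follow the course material; `documents` and `ground_truth` are assumed to be loaded elsewhere):

```python
import random
import minsearch  # single-file search engine from the LLM Zoomcamp materials

index = minsearch.Index(text_fields=["chunk", "title"], keyword_fields=["id"])
index.fit(documents)  # documents: list of dicts with chunk, title, and id fields

def hit_rate_for(boost: dict) -> float:
    """Top-5 document hit rate for a given set of boost weights."""
    hits = 0
    for q in ground_truth:  # list of {"question": ..., "id": ...} pairs
        results = index.search(q["question"], boost_dict=boost, num_results=5)
        hits += any(d["id"] == q["id"] for d in results)
    return hits / len(ground_truth)

# Simple random search over boost weights; transfer the best setting to Elasticsearch
candidates = [{"chunk": random.uniform(0, 3), "title": random.uniform(0, 3)} for _ in range(20)]
best = max(candidates, key=hit_rate_for)
```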
I evaluated the newer `llama-3.1-8b-instant` and the older `llama3-8b-8192` from Groq, using GPT-4o-mini as a judge on 103 samples.
The odd number of samples is due to Groq's rate limits!
| relevance | Llama 3 | Llama 3.1 |
| --- | --- | --- |
| RELEVANT | 0.7379 | 0.7184 |
| PARTLY_RELEVANT | 0.1748 | 0.1650 |
| NON_RELEVANT | 0.0874 | 0.1165 |
Based on the 103 samples, GPT-4o-mini judged that Llama-3 8B has a slight edge over the newer Llama-3.1 8B, though the difference amounts to just 1-2 questions. Since both are free, I used both.
A further evaluation would be to try Llama-3 70B, to see whether the increased size leads to better performance and whether it is worth the cost.
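A minimal sketch of the LLM-as-a-judge call (the prompt wording is illustrative, not the actual notebook; the labels match the table above):

```python
from openai import OpenAI

client = OpenAI()  # uses OPENAI_API_KEY from the environment

JUDGE_PROMPT = """You are an expert evaluator for a RAG system.
Classify the relevance of the ANSWER to the QUESTION as one of:
RELEVANT, PARTLY_RELEVANT, NON_RELEVANT. Reply with the label only.

QUESTION: {question}
ANSWER: {answer}"""

def judge(question: str, answer: str) -> str:
    """Ask GPT-4o-mini to grade one question/answer pair."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(question=question, answer=answer)}],
    )
    return resp.choices[0].message.content.strip()
```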
A Postgres DB was set up as the backend for monitoring, storing the conversations as well as user feedback. Grafana visualizes this data.
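A hedged sketch of what the logging in `db.py` might do (the table and column names are assumptions inferred from the dashboard panels; the connection values come from the `.env` above):

```python
import psycopg2

conn = psycopg2.connect(
    host="localhost", dbname="tfs_archivist", user="admin", password="admin"
)

def log_conversation(question, answer, model, relevance, tokens, response_time):
    """Insert one conversation turn into the assumed `conversations` table."""
    with conn.cursor() as cur:
        cur.execute(
            """INSERT INTO conversations
               (question, answer, model, relevance, tokens, response_time, ts)
               VALUES (%s, %s, %s, %s, %s, %s, NOW())""",
            (question, answer, model, relevance, tokens, response_time),
        )
    conn.commit()
```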
When the app is running, access it at localhost:3000:
- Login: "admin"
- Password: "admin"
The dashboard follows the template in the course, with 7 panels:
- Last 5 Conversations (Table): Displays a table showing the five most recent conversations, including details such as the question, answer, relevance, and timestamp. This panel helps monitor recent interactions with users.
- +1/-1 (Pie Chart): A pie chart that visualizes the feedback from users, showing the count of positive (thumbs up) and negative (thumbs down) feedback received. This panel helps track user satisfaction.
- Relevancy (Gauge): A gauge chart representing the relevance of the responses provided during conversations. The chart categorizes relevance and indicates thresholds using different colors to highlight varying levels of response quality.
- Tokens Cost (Time Series): A time series line chart depicting the cost associated with API usage over time for both Groq and OpenAI. This panel helps monitor and analyze the expenditure linked to the AI model's usage.
- Tokens (Time Series): Another time series chart that tracks the number of tokens used in conversations over time. This helps to understand the usage patterns and the volume of data processed.
- Model Used (Bar Chart): A bar chart displaying the count of conversations based on the different models used. This panel provides insights into which AI models are most frequently used.
- Response Time (Time Series): A time series chart showing the response time of conversations over time. This panel is useful for identifying performance issues and ensuring the system's responsiveness.
All Grafana configurations are in the `grafana` folder:
- `init.py` - for initializing the datasource and the dashboard.
- `dashboard.json` - the actual dashboard (taken from LLM Zoomcamp without changes).
To initialize the dashboard, first ensure Grafana is running (it starts automatically when you do `docker-compose up`).
Then run:
export POSTGRES_HOST=localhost
python init.py
Then go to localhost:3000:
- Login: "admin"
- Password: "admin"
I would like to thank DataTalks.Club and all the guests and sponsors for the quality content of the course, all totally free.
And I hope you, the reviewer, enjoyed doing the course as much as I did ⸜(。˃ ᵕ ˂ )⸝♡