This app lets you run blind comparisons of Ollama models and vote for whichever one answered the prompt better. It’s inspired by the LMSYS Chatbot Arena, which lets you do the same thing for a wide variety of hosted models.
Make sure that Ollama is running and that it can load multiple models at the same time. You can do this by running the following command:
OLLAMA_MAX_LOADED_MODELS=4 ollama serve
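If you want to confirm that the server is reachable before starting the app, here is a minimal sketch that asks Ollama which models are available locally. It assumes Ollama is listening on its default address, http://localhost:11434, and uses the `/api/tags` endpoint. The helper names `parse_model_names` and `list_local_models` are made up for this example and are not part of the arena app.

```python
import json
import urllib.request

# Assumption: Ollama's default address. Change this if your server runs elsewhere.
OLLAMA_URL = "http://localhost:11434"


def parse_model_names(payload: str) -> list[str]:
    """Extract model names from the JSON body returned by /api/tags."""
    data = json.loads(payload)
    return [model["name"] for model in data.get("models", [])]


def list_local_models(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> list[str]:
    """Ask the Ollama server which models are available locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return parse_model_names(resp.read().decode("utf-8"))


if __name__ == "__main__":
    try:
        models = list_local_models()
    except (OSError, ValueError) as exc:
        # Connection failures and timeouts are OSError subclasses;
        # ValueError covers a response body that is not valid JSON.
        print(f"Could not reach Ollama at {OLLAMA_URL}: {exc}")
    else:
        print(f"Ollama is running with {len(models)} model(s): {models}")
```

A blind comparison needs at least two models, so pull a few with `ollama pull` first if the list comes back short.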
Clone the repository:
git clone git@github.com:mneedham/chatbot-arena.git
cd chatbot-arena
And then run it using Poetry:
poetry run streamlit run Ollama_Chatbot_Arena.py --server.headless True
Navigate to http://localhost:8501 and you should see the following: