Benchmark the search performance of Approximate Nearest Neighbor (ANN) algorithms implemented in various systems. This repository contains notebooks and scripts to evaluate and compare the efficiency and accuracy of ANN searches across different platforms.
Approximate Nearest Neighbor (ANN) search algorithms are essential for handling high-dimensional data spaces, enabling fast and resource-efficient retrieval of similar items from large datasets. This benchmarking suite aims to provide an empirical basis for comparing the performance of several popular ANN-enabled search systems.
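To make the accuracy side of that trade-off concrete, the snippet below shows how recall@k — the fraction of a system's top-k results that also appear in the exact brute-force top-k — can be computed. This is a minimal illustration on synthetic data, not code from this repository:

```python
# Illustrative sketch (not part of this repository) of the accuracy
# metric ANN benchmarks revolve around: recall@k, the fraction of an
# ANN system's top-k results that appear in the exact top-k neighbors.
import numpy as np

def exact_top_k(dataset: np.ndarray, query: np.ndarray, k: int) -> set:
    """Ground-truth neighbors by exhaustive Euclidean distance."""
    distances = np.linalg.norm(dataset - query, axis=1)
    return set(np.argsort(distances)[:k])

def recall_at_k(ann_ids: list, true_ids: set, k: int) -> float:
    """Fraction of the ANN result list that matches the exact top-k."""
    return len(set(ann_ids[:k]) & true_ids) / k

rng = np.random.default_rng(42)
vectors = rng.standard_normal((10_000, 128)).astype(np.float32)
query = rng.standard_normal(128).astype(np.float32)

truth = exact_top_k(vectors, query, k=10)
# `approximate_ids` would come from a system under test; here we fake
# it by perturbing the exact result, purely for illustration.
approximate_ids = list(truth)[:9] + [0]
print(f"recall@10 = {recall_at_k(approximate_ids, truth, k=10):.2f}")
```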
Before running the benchmarks, ensure you have the following installed:
- Docker
- Python 3.10 or higher
- **Prepare the Environment:**
  Create directories for datasets and output files, then download the necessary datasets using the provided script (a sketch of the equivalent steps follows this list):

  ```bash
  /bin/bash ./scripts/setup.sh
  ```

- **Install Dependencies:**
  Install all required Python libraries:

  ```bash
  pip install -r requirements.txt
  ```
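For orientation, the sketch below shows what the environment-preparation step amounts to in Python. The directory names and the example dataset URL are assumptions made for illustration; they are not taken from `scripts/setup.sh`:

```python
# Hypothetical equivalent of the environment-preparation step: create
# working directories and fetch one benchmark dataset. The directory
# names and dataset URL below are illustrative assumptions, not the
# actual contents of scripts/setup.sh.
import urllib.request
from pathlib import Path

DATASET_URL = "http://ann-benchmarks.com/glove-100-angular.hdf5"  # assumed example dataset

for directory in ("datasets", "output"):
    Path(directory).mkdir(parents=True, exist_ok=True)

target = Path("datasets") / DATASET_URL.rsplit("/", 1)[-1]
if not target.exists():
    print(f"Downloading {DATASET_URL} ...")
    urllib.request.urlretrieve(DATASET_URL, target)
print(f"Dataset available at {target}")
```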
The repository includes the following Jupyter notebooks for conducting benchmarks:
| Notebook | GitHub Actions |
|---|---|
| Chroma | |
| Elasticsearch | |
| Milvus | |
| OpenSearch | |
| pgvector | |
| Qdrant | |
| Vespa | |
| Weaviate | |
Each notebook guides you through the process of setting up the test environment, loading the dataset, executing the search queries, and analyzing the results.
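At its core, each benchmark reduces to a loop like the following sketch: issue queries, time each one, and score the returned ids against precomputed ground truth. Here `search_fn` is a hypothetical placeholder for whichever client call a given notebook uses (a Qdrant, Elasticsearch, or other system query); none of these names come from the repository:

```python
# Illustrative skeleton of a per-notebook benchmark loop: time each
# query, collect latencies, and score recall against precomputed
# ground truth. `search_fn(query, k) -> list of ids` is a hypothetical
# placeholder for the system under test.
import time
import numpy as np

def run_benchmark(search_fn, queries, ground_truth, k=10):
    latencies, recalls = [], []
    for query, true_ids in zip(queries, ground_truth):
        start = time.perf_counter()
        result_ids = search_fn(query, k)  # call into the system under test
        latencies.append(time.perf_counter() - start)
        recalls.append(len(set(result_ids) & set(true_ids)) / k)
    latencies_ms = np.array(latencies) * 1000.0
    return {
        "mean_recall": float(np.mean(recalls)),
        "p50_latency_ms": float(np.percentile(latencies_ms, 50)),
        "p95_latency_ms": float(np.percentile(latencies_ms, 95)),
    }
```

Reporting percentile latencies alongside mean recall keeps the speed/accuracy trade-off visible in a single pass, which is the comparison the results page below summarizes.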
For a side-by-side comparison of the results, including response times and precision metrics across the different ANN implementations, see the Benchmark Results page.
We welcome contributions! If you have suggestions for additional benchmarks, improvements to existing ones, or fixes for any issues, please feel free to open an issue or submit a pull request.
This project is licensed under the Apache License 2.0.