SWANN: Searching With Approximate Nearest Neighbor

Welcome to the SWANN repository! This project focuses on developing a highly efficient indexing and retrieval system for high-dimensional binary vectors using Locality-Sensitive Hashing (LSH). It incorporates trie-based indexing and efficient bucket distribution techniques to enhance search performance.

Example: Run a single query

This basic example covers loading, building and querying of the LSH index.

int main() {
  
    // Load dataset
    vector<bitset<D>> dataset = ...;
    
    // Specify hyperparameters
    const float recall = 0.9f;
    const uint32 depth = ...;
    const uint32 count = ...;
    const uint32 nns = 10;

    // Instantiate hash family
    HashFamily<D> pool = HashFamilyFactory<D>::createRandomBits();
    
    // Instantiate maps
    const uint32 steps = 8;
    auto maps = LSHMapFactory<D>::create_optimized(
        dataset,
        pool,
        depth,
        count,
        steps
    );

    // Create and build index
    Index<D> *index = new LSHForest<D>(
        maps,
        dataset,
        SingleBitFailure<D>
    );
    index->build();

    // Run query on query point q
    bitset<D> q = ...;
    auto result = index->query(
        q,
        nns,
        recall
    );
}

Example: Run multiple queries

This example covers the fully optimized loading, building and querying of a list of queries. It uses multithreading to speed up the process.

int main() {
  
    // Load dataset
    vector<bitset<D>> dataset = ...;
    vector<bitset<D>> queries = ...;

    // results[i] : indices of the 10 nearest neighbours to queries[i]
    std::vector<std::vector<ui32>> results;

    // Specify hyperparameters
    const float depth_val = std::min(
        std::ceil(log(dataset.size()) / log(1 / P2)),
        30.0
    );
    // Probabilities of collision 
    const float P1 = 0.860f, P2 = 0.535f;   
    
    // Precision of the query
    const float recall = 0.9f;              
    
    // Number of nearest neighbors to query
    const ui32 nrToQuery = 10;              
    
    // Depth of the trie, we recommend the above value
    const ui32 depth = depth_val;           
    
    // Number of LSH Tries to use in the forest
    const ui32 count = std::ceil(std::pow(P1, -depth_val)); 
    
    // Number of optimizations steps to perform  
    const uint32 steps = 20;                

    // Instantiate hash family
    HashFamily<D> pool = HashFamilyFactory<D>::createRandomBits(D*4);
    
    // Instantiate maps
    auto maps = LSHMapFactory<D>::mthread_create_optimized(
        dataset,
        pool,
        depth,
        count,
        steps
    );

    // Create and build index
    LSHForest<D> *index = new LSHForest<D>(
        maps,
        dataset,
        SingleBitFailure<D>
    );
    index->build();

    // Run queries simoultaneously
    results = index->mthread_queries(
        queries,
        nrToQuery,
        recall
    );
}

Report

The project report provides a comprehensive study of our LSH algorithm, covering its theory, implementation, and performance analysis. It explores fundamental LSH concepts such as hash families, bucket distributions, and failure probabilities.

Running

If you want to compile and run:

$ docker compose up

If you want to only compile write:

$ docker compose up compile

If you want to only compile and test write:

$ docker compose up test

To run benchmarking:

$ docker compose up benchmark

Name		Name	Last commit message	Last commit date
Latest commit History 237 Commits
.github/workflows		.github/workflows
benchmark		benchmark
data		data
eval @ 0a6f90d		eval @ 0a6f90d
notebooks		notebooks
result		result
scripts		scripts
src		src
.gitattributes		.gitattributes
.gitignore		.gitignore
.gitmodules		.gitmodules
Dockerfile		Dockerfile
README.md		README.md
SWANN_Big-ANN-Challenge_Bachelor_Thesis.pdf		SWANN_Big-ANN-Challenge_Bachelor_Thesis.pdf
docker-compose.yml		docker-compose.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SWANN: Searching With Approximate Nearest Neighbor

Example: Run a single query

Example: Run multiple queries

Report

Running

About

Releases

Packages

Contributors 3

Languages

flopper123/swann

Folders and files

Latest commit

History

Repository files navigation

SWANN: Searching With Approximate Nearest Neighbor

Example: Run a single query

Example: Run multiple queries

Report

Running

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages