This semantic search application uses a single-vector embedding model for retrieval and ColBERT (a multi-token vector representation) for re-ranking. The app demonstrates the colbert-embedder and the tensor expressions for ColBERT MaxSim.
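For intuition, MaxSim can be sketched in a few lines of Python. This illustrates the math only, not the Vespa tensor expression the app uses; the array shapes and NumPy usage are assumptions for the sketch:

```python
import numpy as np

def max_sim(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """ColBERT MaxSim: dot every query token with every document token,
    take the max over document tokens, then sum over query tokens."""
    # query_tokens: (num_query_tokens, dim), doc_tokens: (num_doc_tokens, dim)
    sim = query_tokens @ doc_tokens.T     # token-to-token similarity matrix
    return float(sim.max(axis=1).sum())   # best doc token per query token, summed
```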
It also features reciprocal rank fusion (RRF) to combine the different rankings.
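RRF scores each document by the sum of its reciprocal ranks across the input rankings. A minimal sketch of the standard formula follows; the constant k = 60 is the conventional default and an assumption here, not necessarily what the app's rank profile configures:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of doc ids: score(d) = sum over lists of 1 / (k + rank(d))."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)
```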
Requires at least Vespa 8.338.38.
Follow the Vespa getting started guide through the `vespa deploy` step, cloning `colbert` instead of `album-recommendation`.
Feed documents (this includes embed inference in Vespa):
```sh
vespa feed ext/*.json
```
Example queries:
```sh
vespa query 'yql=select * from doc where userQuery() or ({targetHits: 100}nearestNeighbor(embedding, q))' \
  'input.query(q)=embed(e5, @query)' \
  'input.query(qt)=embed(colbert, @query)' \
  'query=space contains many suns'
```

```sh
vespa query 'yql=select * from doc where userQuery() or ({targetHits: 100}nearestNeighbor(embedding, q))' \
  'input.query(q)=embed(e5, @query)' \
  'input.query(qt)=embed(colbert, @query)' \
  'query=shipping stuff over the sea'
```

```sh
vespa query 'yql=select * from doc where userQuery() or ({targetHits: 100}nearestNeighbor(embedding, q))' \
  'input.query(q)=embed(e5, @query)' \
  'input.query(qt)=embed(colbert, @query)' \
  'query=exchanging information by sound'
```
See the `model2onnx.py` script for exporting a ColBERT model from Hugging Face to ONNX format.
Notice that the three models below use different embedding dimensionalities. The document-side tensors are binarized, packing 8 bits into each int8 value, so a model exported with `--dims N` pairs with a field of `x[N/8]`.
Example usage:
This is the recommended ColBERT model for this application, as it is optimized for both speed and accuracy; see the blog post.
```sh
python3 model2onnx.py --hf_model answerdotai/answerai-colbert-small-v1 --dims 96
```
Can be used with:

```
field colbert type tensor<int8>(dt{}, x[12])
```
```sh
python3 model2onnx.py --hf_model mixedbread-ai/mxbai-colbert-large-v1 --dims 128
```
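Following the pattern of the other two models (128 bits packed into 16 int8 values), this should pair with a field like:

```
field colbert type tensor<int8>(dt{}, x[16])
```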
```sh
python3 model2onnx.py --hf_model vespa-engine/col-minilm --dims 32
```
Can be used with:

```
field colbert type tensor<int8>(dt{}, x[4])
```
Remove the container after use:
```sh
$ docker rm -f vespa
```