This semantic search application demonstrates Long-Context ColBERT, which applies ColBERT's multi-token vector representation over extended context windows for long-document retrieval.
It shows how to use the colbert-embedder together with the tensor expressions that implement two variants of extended ColBERT late interaction for long-context retrieval: context-level and cross-context MaxSim.
See Announcing Vespa Long-Context ColBERT for details on this application.
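To make the difference between the two late-interaction variants concrete, here is a minimal pure-Python sketch. It assumes that the context-level variant scores each context (chunk) separately with MaxSim and keeps the best context, while the cross-context variant lets each query token match its best document token across all contexts. That reading is inferred from the ranking profile names; the tensor expressions in the application's schema are authoritative. The function names and toy vectors are illustrative only, and the sketch ignores any vector compression the app may apply.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product between two token vectors."""
    return sum(x * y for x, y in zip(a, b))


def max_sim_context_level(query: List[Vector], contexts: List[List[Vector]]) -> float:
    """Assumed context-level variant: MaxSim per context, then the best context wins."""
    best = float("-inf")
    for ctx in contexts:
        # For each query token, take its best match inside this context only.
        score = sum(max(dot(q, d) for d in ctx) for q in query)
        best = max(best, score)
    return best


def max_sim_cross_context(query: List[Vector], contexts: List[List[Vector]]) -> float:
    """Assumed cross-context variant: each query token may match a token in any context."""
    all_tokens = [d for ctx in contexts for d in ctx]
    return sum(max(dot(q, d) for d in all_tokens) for q in query)


# Toy data: two query tokens, two contexts with two token vectors each.
query = [[1.0, 0.0], [0.0, 1.0]]
contexts = [
    [[1.0, 0.0], [0.5, 0.0]],  # only matches the first query token
    [[0.0, 1.0], [0.0, 0.5]],  # only matches the second query token
]

context_level = max_sim_context_level(query, contexts)
cross_context = max_sim_cross_context(query, contexts)
print(context_level, cross_context)
```

In this toy case no single context covers both query tokens, so the cross-context score is higher than the context-level score. That is the kind of situation where the two variants can rank documents differently.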
Requires at least Vespa 8.311.28.
Follow Vespa getting started through the vespa deploy step, cloning colbert-long instead of album-recommendation.
Feed documents (Vespa runs embedding inference as part of feeding):
vespa feed ext/sample-docs.jsonl
Example query using BM25:
vespa query 'yql=select * from doc where userQuery()' \
  'ranking=bm25' 'hits=1' \
  'query=What is the frequency of Radio KP?'
Example queries using ColBERT, one for each of the two ranking profiles:
vespa query 'yql=select * from doc where userQuery()' \
  'ranking=colbert-max-sim-context-level' 'hits=1' \
  'query=What is the frequency of Radio KP?' \
  'input.query(qt)=embed(colbert, @query)'
vespa query 'yql=select * from doc where userQuery()' \
  'ranking=colbert-max-sim-cross-context' 'hits=1' \
  'query=What is the frequency of Radio KP?' \
  'input.query(qt)=embed(colbert, @query)'
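The same requests can also be sent over HTTP to the query API. The sketch below only builds the request URL with the parameters used in the commands above; it does not send anything. The endpoint http://localhost:8080/search/ is an assumption for a local deployment, and build_query_url is a hypothetical helper, not part of any Vespa client.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_query_url(endpoint: str, query_text: str, ranking: str, hits: int = 1) -> str:
    """Build a GET URL carrying the same parameters as the CLI examples."""
    params = {
        "yql": "select * from doc where userQuery()",
        "ranking": ranking,
        "hits": str(hits),
        "query": query_text,
    }
    # The ColBERT profiles additionally need the query tensor, embedded server-side.
    if ranking.startswith("colbert"):
        params["input.query(qt)"] = "embed(colbert, @query)"
    return endpoint + "?" + urlencode(params)


# Assumed local endpoint; replace with your own deployment's endpoint.
url = build_query_url(
    "http://localhost:8080/search/",
    "What is the frequency of Radio KP?",
    "colbert-max-sim-cross-context",
)
print(url)
```

The resulting URL can be passed to any HTTP client. For a Vespa Cloud endpoint the client would also need the mTLS key and certificate.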
Install external dependencies:
pip3 install datasets langchain
Run the script below to download the MLDR English data split and generate three files; this takes a few minutes, depending on bandwidth.
The script writes the feed file to /tmp/vespa_feed_file_en.json:
python3 scripts/convert.py
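For orientation, the sketch below shows roughly what producing one feed operation for a long document could look like. It is not the logic of scripts/convert.py: the word-window splitter is a simple stand-in for whatever text splitter the script uses, and the namespace "example" and the field name "contexts" are hypothetical. Only the general shape of a Vespa put operation (a document id plus a fields object) is taken as given.

```python
import json
from typing import Dict, List


def split_into_contexts(text: str, max_words: int = 5) -> List[str]:
    """Stand-in splitter: cut the text into consecutive windows of max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def to_feed_operation(doc_id: str, text: str, namespace: str = "example") -> Dict:
    """Wrap the contexts in a put operation for document type 'doc'.

    The namespace and the 'contexts' field name are hypothetical placeholders.
    """
    return {
        "put": f"id:{namespace}:doc::{doc_id}",
        "fields": {"contexts": split_into_contexts(text)},
    }


op = to_feed_operation("doc-1", "one two three four five six seven")
line = json.dumps(op)  # one JSON object per line, as in a JSONL feed file
print(line)
```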
Index the dataset. Note that if you are running this on CPU, or with longer documents, you should increase the default operation timeout to avoid retrying document operations that can never succeed within the default feed operation timeout:
vespa feed /tmp/vespa_feed_file_en.json --timeout 600 --connections 1
Run the queries (replace the endpoint and the mTLS key and certificate paths with your own):
python3 evaluate.py --endpoint https://b5af15f0.e2b4d78d.z.vespa-app.cloud/search/ \
  --ranking colbert-max-sim-context-level --dataset ext/test_queries.tsv --rank_count 10 \
  --key $HOME/.vespa/samples.long-colbert.default/data-plane-private-key.pem \
  --cert $HOME/.vespa/samples.long-colbert.default/data-plane-public-cert.pem
Then evaluate effectiveness using, for example, trec_eval. The command above creates a .run file named after the ranking argument.
trec_eval -mndcg_cut.10 ext/test_en_qrels.tsv colbert-max-sim-context-level.run
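To show what the ndcg_cut.10 measure computes, here is a small Python sketch of nDCG@10 over TREC-style files. It assumes the qrels lines have the form "qid iteration docid relevance" and the run lines the form "qid Q0 docid rank score tag", uses the relevance label directly as gain with a log2 rank discount, and averages over all judged queries. trec_eval remains the reference implementation; its handling of ties and of queries missing from the run may differ from this sketch.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def parse_qrels(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Parse 'qid iteration docid relevance' lines into qid -> {docid: relevance}."""
    qrels: Dict[str, Dict[str, int]] = defaultdict(dict)
    for line in lines:
        qid, _iteration, docid, rel = line.split()
        qrels[qid][docid] = int(rel)
    return qrels


def parse_run(lines: Iterable[str]) -> Dict[str, List[Tuple[float, str]]]:
    """Parse 'qid Q0 docid rank score tag' lines into qid -> [(score, docid)]."""
    run: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for line in lines:
        qid, _q0, docid, _rank, score, _tag = line.split()
        run[qid].append((float(score), docid))
    return run


def dcg(gains: List[int]) -> float:
    """Discounted cumulative gain with a log2(rank + 1) discount, rank starting at 1."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_k(qrels, run, k: int = 10) -> float:
    """Mean nDCG@k over all judged queries (unanswered queries count as 0)."""
    scores = []
    for qid, judged in qrels.items():
        ranked = sorted(run.get(qid, []), key=lambda pair: -pair[0])[:k]
        gains = [judged.get(docid, 0) for _score, docid in ranked]
        ideal_dcg = dcg(sorted(judged.values(), reverse=True)[:k])
        scores.append(dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Toy data: q1 has its relevant document at rank 1, q2 at rank 2.
qrels = parse_qrels(["q1 0 d1 1", "q2 0 d3 1"])
run = parse_run([
    "q1 Q0 d1 1 0.9 demo", "q1 Q0 d2 2 0.5 demo",
    "q2 Q0 d4 1 0.8 demo", "q2 Q0 d3 2 0.7 demo",
])
result = ndcg_at_k(qrels, run, k=10)
print(round(result, 4))
```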
Remove the container after use (only relevant for our automatic testing of this sample app):
docker rm -f vespa