feat: Qdrant support #730
Signed-off-by: Anush008 <[email protected]>
Branch updated from dadc7d4 to 9836e09.
Hey @jamesbraza, could you please approve the CI?
Weird that the mailmap pre-commit hook doesn't complain locally. I've tried to patch it.
Alright, that's through.
A few more comments; looking good so far.
I believe these are the same OpenAI failures.
Should be fine.
Nice work @Anush008, thanks for this!
@Anush008 I managed to create a `Docs` object and push it to Qdrant using the following:
```python
from paperqa import QdrantVectorStore, Docs
from qdrant_client import QdrantClient
import nest_asyncio

nest_asyncio.apply()

client = QdrantClient(url="localhost", port=6333)
vectorstore = QdrantVectorStore(client=client, collection_name="test-collection")
docs = Docs(texts_index=vectorstore)
docs.add("testpaper.pdf")
docs.texts_index.add_texts_and_embeddings(docs.texts)
```
My question now is:
I think not, as of yet. We could add something like
For now I am using this, which seems to work:
```python
from paperqa import QdrantVectorStore, Docs, Text, Doc
from qdrant_client import QdrantClient
import nest_asyncio
import asyncio

nest_asyncio.apply()


async def recreate_docs_from_qdrant(client: QdrantClient, collection_name: str) -> Docs:
    # Initialize empty Docs with the existing vector store
    vectorstore = QdrantVectorStore(
        client=client,
        collection_name=collection_name
    )
    docs = Docs(texts_index=vectorstore)

    # Get points from the collection (a single scroll call returns at most `limit` points)
    points = client.scroll(
        collection_name=collection_name,
        with_payload=True,
        with_vectors=True,
        limit=100  # adjust based on your needs
    )[0]

    # Reconstruct the texts and docs
    for point in points:
        payload = point.payload
        doc = payload['doc']
        if doc['dockey'] not in docs.docs:
            docs.docs[doc['dockey']] = Doc(
                docname=doc['docname'],
                citation=doc['citation'],
                dockey=doc['dockey']
            )
            docs.docnames.add(doc['docname'])

        # Reconstruct Text object
        text = Text(
            text=payload['text'],
            name=payload['name'],
            doc=docs.docs[doc['dockey']],
            embedding=point.vector
        )
        docs.texts.append(text)

    return docs


# Usage:
client = QdrantClient(url="localhost", port=6333)
docs = asyncio.run(recreate_docs_from_qdrant(client, "test-collection"))
```
However, I think it's clunky to reload the entire vector store into RAM. I wonder if we could just use the Qdrant store as the
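A single `scroll` call in the reload snippet returns only the first page (at most `limit` points), so a larger collection would come back partially. `QdrantClient.scroll` returns a `(points, next_page_offset)` tuple whose offset is `None` once the last page has been read, so one way to fetch everything is to loop on that offset. The sketch below assumes that behaviour; `iter_all_points` and `ScrollableClient` are hypothetical helper names, not part of paper-qa or qdrant-client.

```python
from typing import Any, Iterator, Protocol


class ScrollableClient(Protocol):
    """Hypothetical structural type: anything with a QdrantClient-like `scroll`."""

    def scroll(self, collection_name: str, **kwargs: Any) -> tuple[list[Any], Any]: ...


def iter_all_points(
    client: ScrollableClient, collection_name: str, page_size: int = 100
) -> Iterator[Any]:
    """Hypothetical helper: yield every point in a collection, one scroll page at a time."""
    offset = None  # None asks for the first page
    while True:
        points, offset = client.scroll(
            collection_name=collection_name,
            with_payload=True,
            with_vectors=True,
            limit=page_size,
            offset=offset,
        )
        yield from points
        # Qdrant reports no next page by returning None as the offset.
        if offset is None:
            break
```

Inside `recreate_docs_from_qdrant`, the loop would then read `for point in iter_all_points(client, collection_name):`. This only removes the 100-point cap; every chunk is still materialized in memory, so it does not address the RAM concern.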
Description
This PR adds support for using Qdrant (https://qdrant.tech) as an external database for vector search.
Qdrant can be run with:
A dashboard will be accessible at http://localhost:6333/dashboard.
Testing
I've Q&A-tested the `QdrantVectorStore` implementation externally.