Add vespa as Vectordb #1676

Closed
gnanesh-16 wants to merge 5 commits

@gnanesh-16 commented Jan 2, 2025

VespaDb Vector Database Implementation

Description

  • Summary of changes: A new VectorDb implementation using Vespa to support vector and hybrid search capabilities. This class integrates embedding and reranking functionalities, supports document upsert, and uses Vespa's querying system for search operations.
  • Related issues: Fixes issue #1504 (Vespa as VectorDB).
  • Motivation and context: This feature allows efficient vector-based, keyword-based, and hybrid search operations, addressing the need for scalable and flexible vector databases in the application.
  • Environment or dependencies: Requires the Vespa Python client (the pyvespa package, imported as vespa) and its dependencies. Ensure the Vespa instance is running locally or accessible at the specified URI.
  • Impact on AI/ML components: Enhances search capabilities by leveraging vector embeddings and hybrid query strategies. Result quality depends on the embedding model's accuracy, and latency depends on Vespa query efficiency.

Setup Instructions

  1. Ensure Vespa is installed and running.
  2. Install required Python dependencies:
    pip install pyvespa phidata
  3. Verify the Vespa application is accessible at http://localhost:8080 or configure the correct URI.
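
Step 3 can be checked from Python before creating a VespaDb instance. The sketch below queries Vespa's /state/v1/health endpoint, which reports a status code of "up" once the container is serving. The helper names (is_up, check_vespa) are illustrative and not part of this PR, and port 8080 typically answers only after an application package has been deployed.

```python
import json
from urllib.error import URLError
from urllib.request import urlopen


def is_up(payload: dict) -> bool:
    """Return True if a /state/v1/health payload reports status code 'up'."""
    status = payload.get("status")
    return isinstance(status, dict) and status.get("code") == "up"


def check_vespa(uri: str = "http://localhost:8080", timeout: float = 5.0) -> bool:
    """Return True if the Vespa container at `uri` reports itself healthy."""
    url = f"{uri.rstrip('/')}/state/v1/health"
    try:
        with urlopen(url, timeout=timeout) as resp:
            return is_up(json.load(resp))
    except (URLError, OSError, ValueError):
        # Unreachable host, refused connection, timeout, or non-JSON body.
        return False
```

Calling check_vespa() with no arguments targets the default URI from step 3; pass a different uri if Vespa runs elsewhere.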

Usage

Creating a VespaDb Instance

from vespa_db import VespaDb

vespa_db = VespaDb(
    uri="http://localhost:8080",
    app_name="my_vespa_app"
)
vespa_db.create()

Inserting Documents

from phi.document import Document

documents = [
    Document(name="Doc1", content="Sample content for document 1"),
    Document(name="Doc2", content="Sample content for document 2"),
]
vespa_db.insert(documents)

Performing Searches

results = vespa_db.search(query="Sample query", limit=10)
for result in results:
    print(result.name, result.content)

Hybrid Search

results = vespa_db.hybrid_search(query="Sample hybrid query", limit=5)
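
For context, a hybrid query in Vespa usually combines userQuery() (keyword matching) with nearestNeighbor() (vector matching) in a single YQL statement. The sketch below builds such a request body. It is not taken from this PR: the function name, the "embedding" field, and the "hybrid" rank profile are assumptions, and the rank profile would need to declare the query tensor q as an input.

```python
from typing import Any, Dict, List


def build_hybrid_query(
    text: str,
    embedding: List[float],
    limit: int = 5,
    field: str = "embedding",      # assumed tensor field name in the schema
    rank_profile: str = "hybrid",  # assumed rank profile name
) -> Dict[str, Any]:
    """Build a Vespa query body that retrieves by keywords OR nearest neighbors."""
    yql = (
        "select * from sources * where userQuery() or "
        f"({{targetHits:{limit}}}nearestNeighbor({field}, q))"
    )
    return {
        "yql": yql,
        "query": text,                # consumed by userQuery()
        "input.query(q)": embedding,  # query tensor referenced as q in the YQL
        "ranking": rank_profile,
        "hits": limit,
    }
```

The resulting dictionary can be sent as the JSON body of a POST to the /search/ endpoint of a running Vespa application.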

Dropping the Database

vespa_db.drop()

Development Notes

  • Test cases should validate:
    • Vector search accuracy
    • Hybrid search behavior
    • Document upsertion and retrieval
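
One way to make "vector search accuracy" testable is to compare the names returned by the database against an exact brute-force ranking and report recall@k. The helpers below are a minimal sketch under that assumption; they are hypothetical and do not call VespaDb, so the returned names would come from whatever search method is under test.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exact_top_k(query: Sequence[float], corpus: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the k document names most similar to the query, by brute force."""
    ranked = sorted(corpus, key=lambda name: cosine(query, corpus[name]), reverse=True)
    return ranked[:k]


def recall_at_k(returned: Sequence[str], expected: Sequence[str]) -> float:
    """Fraction of the expected names that appear in the returned names."""
    if not expected:
        return 1.0
    return len(set(returned) & set(expected)) / len(expected)
```

A test could then assert that recall_at_k(names_from_search, exact_top_k(query_vector, corpus, k)) stays above a chosen threshold.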

@manthanguptaa
Contributor

Can you also add a cookbook for this?

@gnanesh-16
Author

> Can you also add a cookbook for this?

I haven’t added the cookbook for this yet. I will include it in a subsequent PR. Thank you for bringing this to my attention, @manthanguptaa.

Contributor

Please make it consistent with the other cookbooks. Here is an example
https://github.com/phidatahq/phidata/blob/main/cookbook/vectordb/chroma_db.py

Author

OK @manthanguptaa, I will raise another pull request for the cookbook that follows these conventions.

Contributor

Add steps at the top of the file on how to run Vespa.

Author

@manthanguptaa, I will add all the steps on how to run Vespa at the top of the file. I will ensure the instructions are clear and easy to follow.

try:
    import vespa  # type: ignore
except ImportError:
    raise ImportError("`vespa` not installed.")

Contributor

raise ImportError("`vespa` not installed. Please install using `pip install vespa`")

or whatever the correct way is
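
A small helper can produce this kind of message for any optional dependency. The sketch below is hypothetical (the require name is not from the PR) and assumes the distribution to install is pyvespa, which provides the vespa module; substitute the correct name if that assumption does not hold.

```python
import importlib
from types import ModuleType


def require(module_name: str, pip_name: str) -> ModuleType:
    """Import a module, or raise ImportError with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"`{module_name}` not installed. "
            f"Please install using `pip install {pip_name}`"
        ) from exc


# Usage (assumes the `vespa` module ships in the `pyvespa` distribution):
# vespa = require("vespa", "pyvespa")
```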

@@ -0,0 +1,30 @@
# install vespa - `pip install phi-vespa`

Contributor

what's phi-vespa?

Comment on lines +8 to +18
vector_db = VespaDb(
    app_name="recipes",
    url="http://localhost:8080",
    schema={
        "fields": {
            "text": {"type": "string"},
            "embedding": {"type": "tensor(x[384])", "attribute": True},
            "metadata": {"type": "string", "attribute": True}
        }
    }
)

Contributor

Your VespaDb class takes uri as a parameter, not url. Please make sure to thoroughly test your code before raising a PR.

Contributor

Also, there is no schema field in your VespaDb class.
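
Mismatches like these can be caught before review by checking a cookbook's keyword arguments against the constructor signature. The sketch below uses a hypothetical stand-in class that mirrors only the two parameters shown in the PR description (uri and app_name); the real VespaDb signature may differ.

```python
import inspect
from typing import Any, Dict, List


class VespaDbStub:
    """Hypothetical stand-in with only the parameters shown in the PR description."""

    def __init__(self, uri: str, app_name: str):
        self.uri = uri
        self.app_name = app_name


def unexpected_kwargs(cls: type, kwargs: Dict[str, Any]) -> List[str]:
    """Return the keyword names that the constructor of `cls` does not accept."""
    params = inspect.signature(cls).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []  # a **kwargs constructor accepts any keyword
    return [name for name in kwargs if name not in params]


# The keyword arguments used in the cookbook snippet under review.
cookbook_kwargs = {"app_name": "recipes", "url": "http://localhost:8080", "schema": {}}
print(unexpected_kwargs(VespaDbStub, cookbook_kwargs))
```

Running the same check against the real class in a unit test would flag url and schema if the constructor does not declare them.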

Contributor

@manthanguptaa left a comment

Your code isn't working at all. Please test before raising a PR. It will save a lot of to and fro on both ends. It is okay to use AI to code but you will have to test it on your end as well.

@gnanesh-16
Author

> Your code isn't working at all. Please test before raising a PR. It will save a lot of to and fro on both ends. It is okay to use AI to code but you will have to test it on your end as well.

Thank you for your feedback, @manthanguptaa. I apologize for the oversight. I will correct these issues and test the code on my end before raising a PR going forward.

@manthanguptaa
Contributor

Closing due to inactivity
