feat: add multi vector support (#11)
## Description

This PR introduces multi-vector support!

@efriis @zc277584121 I've moved and modified the PR previously submitted
to the main langchain repo:
langchain-ai/langchain#26500

Milvus 2.4 introduced the option for [multi-vector
support](https://milvus.io/blog/milvus-2-4-nvidia-cagra-gpu-index-multivector-search-sparse-vector-support.md),
which is becoming increasingly popular, especially for use cases like
hybrid search (dense + sparse embeddings).

Recently, @ohadeytan introduced the option to use sparse embeddings in
[this PR](langchain-ai/langchain#25284).

Additionally, @zc277584121 already introduced the
`MilvusCollectionHybridSearchRetriever`, which enables hybrid search
against pre-defined collections directly via pymilvus.

However, this method doesn't take full advantage of the many useful
features offered by `langchain_milvus` when building a collection, such as
automatic schema creation, indexing, and search parameter creation.

This PR intends to make developers' lives easier by allowing them to use
single-vector or multi-vector collections through a single `langchain`
interface that creates, connects to, and searches `Milvus`. For example,
at IBM and IBM Research this feature is requested by many of our
developers and researchers and will be very useful for us.

## Changes

This PR addresses the limitations described above by introducing the
following changes:

1. Allows passing multiple embedding functions with optional matching
indexing parameters, search parameters, and vector field names.
2. Dynamically creates the collection using these functions, similar to
how it's done for a single embedding function.
3. Adds multiple tests to validate this new feature.
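
As a rough illustration of item 1, the sketch below pairs each embedding function with its optional per-field settings. It is not the code from this PR: `build_vector_configs`, `VectorFieldConfig`, and the default field names `vector` / `vector_{i}` are hypothetical names chosen for the example, and the real `langchain_milvus` argument names may differ.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

EmbedFn = Callable[[str], object]


@dataclass
class VectorFieldConfig:
    """Hypothetical container: one entry per vector field in the collection."""
    field_name: str
    embed: EmbedFn
    index_params: Optional[dict]
    search_params: Optional[dict]


def _broadcast(value, n, label):
    """Return a list of length n: validate lists, repeat single values or None."""
    if isinstance(value, (list, tuple)):
        if len(value) != n:
            raise ValueError(f"{label}: expected {n} entries, got {len(value)}")
        return list(value)
    return [value] * n


def build_vector_configs(
    embedding_functions,
    vector_fields=None,
    index_params=None,
    search_params=None,
) -> List[VectorFieldConfig]:
    """Accept one embedding function or a list, plus optional matching settings."""
    if isinstance(embedding_functions, (list, tuple)):
        fns = list(embedding_functions)
    else:
        fns = [embedding_functions]
    n = len(fns)

    if vector_fields is None:
        # Assumed default naming, for illustration only.
        names = ["vector"] if n == 1 else [f"vector_{i}" for i in range(n)]
    else:
        names = _broadcast(vector_fields, n, "vector_fields")
    if len(set(names)) != n:
        raise ValueError("vector field names must be unique")

    idx = _broadcast(index_params, n, "index_params")
    srch = _broadcast(search_params, n, "search_params")
    return [
        VectorFieldConfig(name, fn, i, s)
        for name, fn, i, s in zip(names, fns, idx, srch)
    ]
```

With a single function the helper yields one config, so single-vector and multi-vector callers can share the same code path; a list of settings whose length does not match the number of functions is rejected early.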

We are eager to have this merged into `langchain-milvus`, as we rely on
many of LangChain's features, particularly `langchain-milvus`, and want to
continue benefiting from what this package provides. We'll make any
required changes and will be glad to get any guidance to make it happen!

Twitter handle: @EliyahuOmri, @ohadeytan
omriel1 authored Oct 9, 2024
1 parent 8e42b34 commit 02eb7be
Showing 5 changed files with 477 additions and 158 deletions.
15 changes: 11 additions & 4 deletions libs/milvus/langchain_milvus/retrievers/milvus_hybrid_search.py
@@ -146,16 +146,23 @@ def _process_search_result(
             documents.append(doc)
         return documents
 
+    def hybrid_search(
+        self,
+        query: str,
+    ) -> List[SearchResult]:
+        requests = self._build_ann_search_requests(query)
+        search_result = self.collection.hybrid_search(
+            requests, self.rerank, limit=self.top_k, output_fields=self.output_fields
+        )
+        return search_result
+
     def _get_relevant_documents(
         self,
         query: str,
         *,
         run_manager: CallbackManagerForRetrieverRun,
         **kwargs: Any,
     ) -> List[Document]:
-        requests = self._build_ann_search_requests(query)
-        search_result = self.collection.hybrid_search(
-            requests, self.rerank, limit=self.top_k, output_fields=self.output_fields
-        )
+        search_result = self.hybrid_search(query)
         documents = self._process_search_result(search_result)
         return documents
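
The change above extracts the raw search into a public `hybrid_search` method and has the retriever entry point delegate to it. The following self-contained sketch mirrors that shape with stand-ins: `FakeCollection` and `SketchRetriever` are hypothetical classes written for this example (no Milvus connection, plain strings instead of `AnnSearchRequest` and `Document` objects), so it only illustrates the delegation pattern, not the real retriever.

```python
from typing import Any, Dict, List


class FakeCollection:
    """Hypothetical stand-in for a pymilvus collection; returns canned hits."""

    def hybrid_search(self, requests, rerank, limit, output_fields=None):
        hits = [{"text": f"hit for {r}"} for r in requests]
        return hits[:limit]


class SketchRetriever:
    """Simplified retriever showing raw search and post-processing as two steps."""

    def __init__(self, collection: Any, top_k: int = 2):
        self.collection = collection
        self.top_k = top_k
        self.rerank = None  # placeholder; the real retriever holds a ranker here
        self.output_fields = ["text"]

    def _build_ann_search_requests(self, query: str) -> List[str]:
        # One request per vector field (dense and sparse in this sketch).
        return [f"dense:{query}", f"sparse:{query}"]

    def hybrid_search(self, query: str) -> List[Dict[str, str]]:
        """Public method: return the raw search result without post-processing."""
        requests = self._build_ann_search_requests(query)
        return self.collection.hybrid_search(
            requests, self.rerank, limit=self.top_k, output_fields=self.output_fields
        )

    def _process_search_result(self, search_result) -> List[str]:
        return [hit["text"] for hit in search_result]

    def get_relevant_documents(self, query: str) -> List[str]:
        # Delegates to hybrid_search instead of duplicating the search call.
        return self._process_search_result(self.hybrid_search(query))
```

Callers that need raw hits can call `hybrid_search` directly, while the document-returning path reuses the same search logic.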