feat: add multi vector support (#11)

## Description

This PR introduces multi-vector support!

@efriis @zc277584121 I've moved and modified the PR previously submitted to the main langchain repo: langchain-ai/langchain#26500

Milvus 2.4 introduced the option for [multi-vector support](https://milvus.io/blog/milvus-2-4-nvidia-cagra-gpu-index-multivector-search-sparse-vector-support.md), which is becoming increasingly popular, especially for use cases like hybrid search (dense + sparse embeddings).

Recently, @ohadeytan introduced the option to use sparse embeddings in [this PR](langchain-ai/langchain#25284). Additionally, @zc277584121 already introduced the `MilvusCollectionHybridSearchRetriever`, which enables hybrid search against pre-defined collections directly via pymilvus. However, this method doesn't take full advantage of the many useful features offered by `langchain_milvus` when building a collection: automatic schema creation, indexing, search parameter creation, etc.

This PR intends to make developers' lives easier by allowing them to use single-vector or multi-vector collections through a single `langchain` interface that creates, connects to, and searches `Milvus`. For example, at IBM and IBM Research this feature has been requested by many of our developers and researchers and will be very useful for us.

## Changes

This PR addresses the limitations described above by introducing the following changes:

1. Allows passing multiple embedding functions with optional matching indexing parameters, search parameters, and vector field names.
2. Dynamically creates the collection using these functions, similar to how it's done for a single embedding function.
3. Adds multiple tests to validate this new feature.

We are eager to have this merged into `langchain-milvus`, as we rely on many langchain features, particularly `langchain-milvus`, and want to keep benefiting from what this package provides. We'll make any required changes and will be glad to get any guidance to make it happen!

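
As a rough illustration of changes 1 and 2, the sketch below shows one way single-or-list arguments could be paired up per vector field before a schema is built. It is not the code in this PR: `plan_vector_fields`, `as_list`, the default field names, and the toy embedding functions are all hypothetical, and no Milvus connection is involved.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence, Union

EmbedFn = Callable[[str], List[float]]


def as_list(value: Any) -> List[Any]:
    """Wrap a single value in a list; copy lists and tuples unchanged."""
    return list(value) if isinstance(value, (list, tuple)) else [value]


def plan_vector_fields(
    embedding_functions: Union[EmbedFn, Sequence[EmbedFn]],
    vector_fields: Optional[Union[str, Sequence[str]]] = None,
    index_params: Optional[Union[dict, Sequence[Optional[dict]]]] = None,
) -> List[Dict[str, Any]]:
    """Pair each embedding function with a field name and index params.

    Hypothetical helper: the default names below are an assumption made
    for this sketch, not the defaults used by langchain_milvus.
    """
    fns = as_list(embedding_functions)

    if vector_fields is None:
        # One function keeps a single name; several get numbered names.
        names = ["vector"] if len(fns) == 1 else [
            f"vector_{i}" for i in range(len(fns))
        ]
    else:
        names = as_list(vector_fields)

    # A missing index_params means "no explicit params" for every field.
    params = [None] * len(fns) if index_params is None else as_list(index_params)

    if not (len(fns) == len(names) == len(params)):
        raise ValueError(
            "embedding functions, vector fields, and index params "
            "must have the same length"
        )
    if len(set(names)) != len(names):
        raise ValueError("vector field names must be unique")

    return [
        {"field": name, "embed": fn, "index_params": p}
        for name, fn, p in zip(names, fns, params)
    ]


# Toy stand-ins for real embedding models (assumptions for the example).
def dense_embed(text: str) -> List[float]:
    return [float(len(text)), 1.0]


def keyword_embed(text: str) -> List[float]:
    return [float(text.count(" ") + 1)]


plan = plan_vector_fields(
    [dense_embed, keyword_embed],
    vector_fields=["dense", "sparse"],
    index_params=[{"index_type": "HNSW"}, None],
)
print([p["field"] for p in plan])  # ['dense', 'sparse']
```

Accepting either a single value or a list keeps the single-vector call site unchanged, while mismatched list lengths fail early instead of at collection creation time.
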
Twitter handle: @EliyahuOmri, @ohadeytan
langchain-ai · Oct 9, 2024 · 02eb7be
1 parent 8e42b34
commit 02eb7be
Showing 5 changed files with 477 additions and 158 deletions.
diff --git a/libs/milvus/langchain_milvus/retrievers/milvus_hybrid_search.py b/libs/milvus/langchain_milvus/retrievers/milvus_hybrid_search.py
@@ -146,16 +146,23 @@ def _process_search_result(
             documents.append(doc)
         return documents
 
+    def hybrid_search(
+        self,
+        query: str,
+    ) -> List[SearchResult]:
+        requests = self._build_ann_search_requests(query)
+        search_result = self.collection.hybrid_search(
+            requests, self.rerank, limit=self.top_k, output_fields=self.output_fields
+        )
+        return search_result
+
     def _get_relevant_documents(
         self,
         query: str,
         *,
         run_manager: CallbackManagerForRetrieverRun,
         **kwargs: Any,
     ) -> List[Document]:
-        requests = self._build_ann_search_requests(query)
-        search_result = self.collection.hybrid_search(
-            requests, self.rerank, limit=self.top_k, output_fields=self.output_fields
-        )
+        search_result = self.hybrid_search(query)
         documents = self._process_search_result(search_result)
         return documents