feat: add multi vector support #11

omriel1 · 2024-10-02T12:20:49Z

Description

This PR introduces multi-vector support!

@efriis @zc277584121 I've moved and modified the PR previously submitted to the main langchain repo: langchain-ai/langchain#26500

Milvus 2.4 introduced the option for multi-vector support, which is becoming increasingly popular, especially for use cases like hybrid search (dense + sparse embeddings).

Lately, @ohadeytan introduced the option to use sparse embeddings in this PR.

Additionally, @zc277584121 already introduced the MilvusCollectionHybridSearchRetriever, which enables hybrid search against pre-defined collections directly via pymilvus.

However, this method doesn't take full advantage of the many useful features langchain_milvus offers when building a collection: automatic schema creation, indexing, search parameter creation, and so on.

This PR intends to make developers' lives easier by letting them use single-vector or multi-vector collections through a single LangChain interface that creates, connects to, and searches Milvus. For example, at IBM and IBM Research, many of our developers and researchers have requested this feature, and it will be very useful for us.

Changes

This PR addresses the limitations described above by introducing the following changes:

- Allows passing multiple embedding functions with optional matching indexing parameters, search parameters, and vector field names.
- Dynamically creates the collection using these functions, similar to how it's done for a single embedding function.
- Adds multiple tests to validate this new feature.
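
As a rough illustration of the first item, the sketch below shows one way several embedding functions could be paired with their vector field names and index parameters. It is not the code from this PR: `normalize_multi_vector_args` and the `vector_{i}` default naming are hypothetical and only show the idea of aligning per-field arguments.

```python
from typing import Any, List, Optional, Sequence, Union


def normalize_multi_vector_args(
    embedding: Union[Any, Sequence[Any]],
    vector_field: Optional[Union[str, Sequence[str]]] = None,
    index_params: Optional[Union[dict, Sequence[Optional[dict]]]] = None,
) -> List[dict]:
    """Pair each embedding function with a field name and optional index params."""
    # Accept either a single embedding function or a list/tuple of them.
    if isinstance(embedding, (list, tuple)):
        embeddings = list(embedding)
    else:
        embeddings = [embedding]

    if vector_field is None:
        # Hypothetical default naming; the real package may choose differently.
        fields = [f"vector_{i}" for i in range(len(embeddings))]
    elif isinstance(vector_field, str):
        fields = [vector_field]
    else:
        fields = list(vector_field)

    if index_params is None:
        # None means "let the store pick a default index for this field".
        params: List[Optional[dict]] = [None] * len(embeddings)
    elif isinstance(index_params, dict):
        params = [index_params]
    else:
        params = list(index_params)

    if not (len(embeddings) == len(fields) == len(params)):
        raise ValueError(
            "embedding, vector_field and index_params must have matching lengths"
        )
    if len(set(fields)) != len(fields):
        raise ValueError("vector field names must be unique")

    return [
        {"embedding": e, "vector_field": f, "index_params": p}
        for e, f, p in zip(embeddings, fields, params)
    ]
```

Passing a single embedding function yields a one-element list, so under this assumption the single-vector and multi-vector paths can share the same collection-creation loop.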

We are eager to have this merged into langchain-milvus, as we rely on many of LangChain's features, particularly langchain-milvus, and want to keep benefiting from what this package provides. We'll make any required changes and would be glad to get any guidance to make it happen!

Twitter handle: @EliyahuOmri, @ohadeytan

codingjaguar · 2024-10-02T17:33:58Z

Hi @omriel1, thanks for the PR! FYI, the reviewer @zc277584121 is out for a week. Is this PR urgent? We'd prefer to wait until the reviewer is back, or we could ask @efriis to take a look first.

omriel1 · 2024-10-02T23:27:20Z

Hi! Thanks for the reply. This is important for us (it was delayed due to the repo separation), but I completely understand and agree that we should wait for @zc277584121. But in the meantime, if you or @efriis have any comments, I'd be happy to start addressing them!

Thanks!

zc277584121 · 2024-10-08T03:51:28Z

libs/milvus/tests/integration_tests/vectorstores/test_milvus.py

+            dense_embeddings_func_2,
+        ],
+        texts=fake_texts,
+        connection_args={"uri": "./milvus_demo.db"},


Can this uri be passed the same way as in the unit tests above, using temp_milvus_db? Just for code consistency.
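
For context, a minimal sketch of the idea behind this request: rather than hard-coding "./milvus_demo.db", each test gets a throwaway database path that is cleaned up afterwards. The helper name `throwaway_db_path` is hypothetical and is not the fixture used in this repository; it only illustrates the pattern with the standard library.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def throwaway_db_path(suffix: str = ".db") -> Iterator[str]:
    """Yield a path for a temporary database file and clean it up afterwards."""
    # The directory (and anything written inside it) is removed on exit.
    with tempfile.TemporaryDirectory() as tmp_dir:
        yield os.path.join(tmp_dir, "milvus_test" + suffix)


if __name__ == "__main__":
    with throwaway_db_path() as path:
        # A test would pass this instead of a hard-coded file name.
        connection_args = {"uri": path}
        print(connection_args)
```

With this approach no test leaves a database file behind in the working directory, and tests cannot share state through a common file.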

zc277584121 · 2024-10-08T03:51:38Z

libs/milvus/tests/integration_tests/vectorstores/test_milvus.py

+        embedding=[FakeEmbeddings(), FakeEmbeddings()],
+        index_params=[index_param_1, index_param_2],
+        vector_field=["vec_field_1", "vec_field_2"],
+        connection_args={"uri": "./milvus_demo.db"},


zc277584121 · 2024-10-08T03:51:44Z

libs/milvus/tests/integration_tests/vectorstores/test_milvus.py

+        embedding=[embedding_1, embedding_2],
+        texts=fake_texts,
+        index_params=[index_param_1, index_param_2],
+        connection_args={"uri": "./milvus_demo.db"},


zc277584121 · 2024-10-08T03:55:07Z

libs/milvus/langchain_milvus/vectorstores/zilliz.py

+            for i, embeddings_func in enumerate(embeddings_functions):
+                if not self._get_index(vector_fields[i]):
+                    try:
+                        # If no index params, use a default HNSW based one


Maybe we should keep the comment "If no index params, use a default *AutoIndex* based one" unchanged.

zc277584121 · 2024-10-08T03:55:29Z

libs/milvus/langchain_milvus/vectorstores/zilliz.py

+                                using=self.alias,
+                            )
+
+                        # If default did not work, most likely on Zilliz Cloud


zc277584121 · 2024-10-08T03:55:40Z

libs/milvus/langchain_milvus/vectorstores/zilliz.py

+
+                        # If default did not work, most likely on Zilliz Cloud
+                        except MilvusException:
+                            # Use AUTOINDEX based index


zc277584121 · 2024-10-08T03:57:57Z

@omriel1 Sorry for the delay, and thanks for your great contribution. I have left some small comments here; everything else LGTM.

omriel1 added 2 commits October 2, 2024 15:04

Add multi-vector support

02e59d5

Add tests for multi-vector support

30b05a4

omriel1 mentioned this pull request Oct 2, 2024

partners/milvus: multi-vector support langchain-ai/langchain#26500

Closed

codingjaguar requested a review from zc277584121 October 2, 2024 17:28

zc277584121 reviewed Oct 8, 2024

omriel1 added 3 commits October 8, 2024 13:38

Merge branch 'main' into feat/multi-vector-support

b44940f

Use temp_db_path fixture

1ee5a5f

fix comments

f2273b8

omriel1 requested a review from zc277584121 October 8, 2024 11:24

zc277584121 merged commit 02eb7be into langchain-ai:main Oct 9, 2024
8 checks passed

zc277584121 mentioned this pull request Oct 9, 2024

feat: Support for add_embeddings method #13

Merged

3 tasks
