Add support for asymmetric embedding models #710

br3no · 2024-04-25T19:45:38Z

Description

This PR adds support for asymmetric embedding models such as https://huggingface.co/intfloat/multilingual-e5-small to the neural-search plugin.

It builds on the work done in opensearch-project/ml-commons#1799.

Asymmetric embedding models behave differently when embedding passages and queries. To that end, the model must "know" at inference time what kind of data it is embedding.

The changes are:

1. `src/main/java/org/opensearch/neuralsearch/processor/TextEmbeddingProcessor.java`

The processor signals that it is embedding passages by passing the new AsymmetricTextEmbeddingParameters with the content type EmbeddingContentType.PASSAGE.

2. `src/main/java/org/opensearch/neuralsearch/query/NeuralQueryBuilder.java`

Analogously, the query builder uses EmbeddingContentType.QUERY.

3. `src/main/java/org/opensearch/neuralsearch/ml/MLCommonsClientAccessor.java`

Here is where most of the work was done. The class has been extended in a backwards-compatible way with inference methods that allow one to pass MLAlgoParams objects. Usage of AsymmetricTextEmbeddingParameters (which implements MLAlgoParams) is mandatory for asymmetric models. At the same time, symmetric models do not accept them.

The only way to know whether a model is asymmetric or symmetric is by reading its model configuration: if the model's configuration contains a passage_prefix and/or a query_prefix, the model is asymmetric; otherwise it is symmetric.

The src/main/java/org/opensearch/neuralsearch/ml/MLCommonsClientAccessor.java class deals with this, keeping the complexity in one place and not requiring any API change to the neural-search plugin (as proposed in #620). When calling the inference methods, clients (such as the TextEmbeddingProcessor) may pass the AsymmetricTextEmbeddingParameters object without caring if the model they are using is symmetric or asymmetric. The accessor class will first read the model's configuration (by calling the getModel API of the mlClient) and deal appropriately.

To avoid adding this extra roundtrip to every inference call, the asymmetry information is kept in a cache in memory.
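A minimal sketch of the caching idea, not the code in this PR. `AsymmetryCache`, the `configFetcher` function, and the map-shaped configuration are hypothetical stand-ins (the real accessor reads the configuration through the getModel API of the mlClient). Only the prefix rule and the in-memory cache are taken from the description above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sketch: remembers per model id whether the model is asymmetric. */
class AsymmetryCache {
    private final Map<String, Boolean> cache = new ConcurrentHashMap<>();
    // Stand-in for the round trip that reads a model's configuration.
    private final Function<String, Map<String, String>> configFetcher;

    AsymmetryCache(Function<String, Map<String, String>> configFetcher) {
        this.configFetcher = configFetcher;
    }

    /** Fetches the configuration only on the first call for a given model id. */
    boolean isAsymmetric(String modelId) {
        return cache.computeIfAbsent(modelId, id -> {
            Map<String, String> config = configFetcher.apply(id);
            // Rule from the description: either prefix marks the model as asymmetric.
            return config.containsKey("passage_prefix") || config.containsKey("query_prefix");
        });
    }
}
```

With a cache like this, repeated inference calls against the same model pay for the configuration lookup once.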

Issues Resolved

#620

Check List

- New functionality includes testing.
  - All tests pass
- New functionality has been documented.
  - New functionality has javadoc added
- Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

navneet1v · 2024-04-26T06:10:22Z

@br3no can you add an entry to the changelog?

navneet1v · 2024-04-26T06:13:03Z

@br3no Thanks for raising the PR. I am wondering whether we require this change. In the MLCommons repository, a generic MLInference processor is being launched, which is supposed to run inference for any kind of model during both ingestion and search. RFC: opensearch-project/ml-commons#2173

That capability is being built right now. Do you think we still need this feature then?

br3no · 2024-04-26T07:23:28Z

@navneet1v I have been loosely following the discussions in the mentioned RFC. It's a large change that I don't expect to be stable soon – the PR is very much in flux. Also, I don't see the use-case of asymmetric embedding models being addressed.

This PR here is much smaller in comparison and is not in any way in conflict with the RFC work. Once the work on the ML Inference Processors is finished and the use-case is addressed there as well, we can deprecate and eventually remove the functionality again.

Until then, this PR offers users the chance to use more modern local embeddings. I'm eager to take this for a spin, tbh.

navneet1v · 2024-04-26T08:16:50Z

> Also, I don't see the use-case of asymmetric embedding models being addressed.

If that is the case I would recommend posting the same on the RFC to ensure that your use case is handled.

On the other hand, I do agree this is an interesting feature. I would like to get some eyes on this change, mainly on whether it should be added given that a more generic processor is around the corner. As far as my opinion is concerned, the main reason for the generic processor was to avoid creating or updating processors to support new model types, which is what is happening in this PR.

Thoughts? @jmazanec15 , @martin-gaievski , @vamshin , @vibrantvarun .

Let me also add some PMs for Opensearch-project to get their thoughts. @dylan-tong-aws

codecov · 2024-04-26T08:19:22Z

Codecov Report

Attention: Patch coverage is 87.12871% with 13 lines in your changes missing coverage. Please review.

Project coverage is 84.41%. Comparing base (7c54c86) to head (44f14ec).
Report is 12 commits behind head on main.

❗ Current head 44f14ec differs from pull request most recent head 6d3dba6

Please upload reports for the commit 6d3dba6 to get more accurate results.

| Files | Patch % | Lines |
|---|---|---|
| ...earch/neuralsearch/ml/MLCommonsClientAccessor.java | 85.22% | 9 Missing and 4 partials ⚠️ |

Additional details and impacted files

@@             Coverage Diff              @@
##               main     #710      +/-   ##
============================================
- Coverage     85.02%   84.41%   -0.61%     
+ Complexity      790      785       -5     
============================================
  Files            60       59       -1     
  Lines          2430     2464      +34     
  Branches        410      409       -1     
============================================
+ Hits           2066     2080      +14     
- Misses          202      215      +13     
- Partials        162      169       +7


br3no · 2024-04-26T09:41:52Z

@navneet1v I have added a comment earlier today to the RFC (cf. opensearch-project/ml-commons#2173 (comment)).

Sure, let's open the discussion and get some PMs into it.

I really don't mind leaving this out if the support is introduced in another PR in 2.14. I'm concerned opensearch-project/ml-commons#2173 is a much larger effort, that won't be ready that quickly...

It's not about my contribution – I need the feature. 🙃

navneet1v · 2024-04-26T17:01:03Z

> I really don't mind leaving this out if the support is introduced in another PR in 2.14. I'm concerned opensearch-project/ml-commons#2173 is a much larger effort, that won't be ready that quickly...

I can see the feature is marked for 2.14 release of Opensearch. Let me add maintainers from ML team too. @mingshl , @ylwu-amzn

br3no · 2024-04-29T14:05:46Z

@mingshl @ylwu-amzn, I'd really like to have this feature in 2.14.

Do you think this use-case will be fully supported with opensearch-project/ml-commons#2173? Cf. opensearch-project/ml-commons#2173 (comment)

If not, I'd be happy to help this PR get merged as an interim solution! Let me know what you think!

mingshl · 2024-04-29T17:29:14Z

@br3no The ML inference processor is at first targeting support for remote models only. How do you usually connect this model? Is it local or remote?

If remote, can you please provide a SageMaker deployment code snippet so that I can quickly test it in a 2.14 test cluster? Thanks

br3no · 2024-05-13T09:41:48Z

@mingshl sorry for taking so long to answer!

The use-case for now is to use a local, asymmetric model such as https://huggingface.co/intfloat/multilingual-e5-small.

This PR here is the last puzzle piece to allow one to use these kinds of models and should in principle also work with remote models. It makes sure that the neural-search plugin uses the correct inference parameters when embedding passages and queries with asymmetric models. Regardless of whether the model is local or remote, if you are using asymmetric models, you will need to provide this information anyway.

The thing is that asymmetric models need to know at inference time what exactly they are embedding. OpenSearch currently treats embedding models as symmetric, meaning that a given text is embedded the same way regardless of whether it is a query or a passage. Asymmetric models require content "hints" attached to the text being embedded; the model exemplified above uses the string prefixes `passage: ` and `query: `. These models perform better than similarly sized symmetric models.
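To make the "hints" concrete, here is a small sketch of how the same text ends up as two different model inputs. `ContentType` and `PrefixingExample` are hypothetical names used only for illustration (they are not the ml-commons classes); the prefixes are the ones the e5 model mentioned above expects, and other asymmetric models may define different ones.

```java
/** Hypothetical stand-in for the content type passed at inference time. */
enum ContentType { QUERY, PASSAGE }

class PrefixingExample {
    // Prefixes used by the e5 model family; treat them as model-specific configuration.
    static final String QUERY_PREFIX = "query: ";
    static final String PASSAGE_PREFIX = "passage: ";

    /** Prepends the hint matching the content type to the raw text. */
    static String withHint(String text, ContentType type) {
        return (type == ContentType.QUERY ? QUERY_PREFIX : PASSAGE_PREFIX) + text;
    }
}
```

A symmetric model would receive the raw text in both cases, which is why the caller has to state whether it is embedding a query or a passage.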

In opensearch-project/ml-commons#1799 we have added the concept of asymmetric models into ml-commons, introducing the AsymmetricTextEmbeddingParameters class, used at inference time to signal if the text being embedded is a query or a passage. So this PR is only using this new infrastructure.

I would really be happy to get this merged as an interim solution until the ml inference processor fully supports this use-case.

reuschling · 2024-05-15T13:24:44Z

I also vote for this PR, as I need this functionality.

navneet1v · 2024-05-15T17:03:06Z

@br3no would it be possible for you to contribute local model support to the MLInference processor? Is that even an option?

br3no · 2024-05-15T17:42:07Z

@navneet1v you mean making sure this works there as well? Sure, I can commit to that. I'd propose then to merge this PR now and then start the work to eventually replace this once the MLInference processor supports this use case...

br3no · 2024-11-06T08:30:39Z

I have addressed all open issues with the exception of the BWC test. I'll try to have a look in the coming days.

yuye-aws · 2024-11-06T10:05:23Z

Pinging @martin-gaievski to review the cache.

yuye-aws · 2024-11-06T10:22:37Z

> I have addressed all open issues with the exception of the BWC test. I'll try to have a look in the coming days.

For the BWC test, you just need to create two classes for TextEmbeddingProcessorIT.java. One for rolling upgrade and another for restart upgrade:

qa/rolling-upgrade/src/test/java/org/opensearch/neuralsearch/bwc/
qa/restart-upgrade/src/test/java/org/opensearch/neuralsearch/bwc/TextChunkingProcessorIT.java

br3no · 2024-11-06T16:02:05Z

@yuye-aws aren't the test cases covered completely by the HybridSearchIT classes? There the text embedding processor is created, documents are indexed and a hybrid query is executed.

br3no · 2024-11-06T16:40:09Z

I think I will need some assistance in understanding what to do exactly regarding the BWC tests. I'm not even sure I understand where to look for errors.

The error scenario you folks are seeing is:

1. a cluster is updated
2. a query is sent to a node running an older version of OS
3. the query contains an AsymmetricTextEmbeddingParameters
4. the legacy version cannot deserialize this class

Did I get this right? Or is it another scenario you are concerned with?

The class AsymmetricTextEmbeddingParameters has been part of ml-commons for about 3 months. This would then be a problem for all versions older than 2.14.

yuye-aws · 2024-11-07T00:40:03Z

> The class AsymmetricTextEmbeddingParameters has been part of ml-commons for about 3 months. This would then be a problem for all versions older than 2.14.

You can exclude versions older than 2.14 just like here.

yuye-aws · 2024-11-07T00:41:34Z

> @yuye-aws aren't the test cases covered completely by the HybridSearchIT classes? There the text embedding processor is created, documents are indexed and a hybrid query is executed.

Do you mean the BWC tests for HybridSearchIT? If so, you can fill in the gap between BWC tests and integration tests for TextEmbeddingProcessorIT.

br3no · 2024-11-07T11:43:15Z

@yuye-aws

What I meant by this comment is that I don't see the need to implement a different BWC test class for the change in this PR. The existing BWC tests (e.g. the HybridQueryIT, but others as well) test the complete process of creating an index, creating a pipeline for document embedding and issuing neural queries. There is no new process introduced with this PR that requires different BWC test code, from my point of view. Please correct me if I'm wrong.

BTW the rolling-upgrade BWC tests are failing on the main branch.

I ran

./gradlew :qa:rolling-upgrade:testRollingUpgrade -D'tests.bwc.version=2.18.0-SNAPSHOT'

without success.

This makes it hard to add new tests.

yuye-aws · 2024-11-08T00:03:23Z

> ./gradlew :qa:rolling-upgrade:testRollingUpgrade

@vibrantvarun The comment makes sense to me. Can you check whether BWC tests are needed and help him?


br3no · 2024-11-08T10:10:14Z

@martin-gaievski I have addressed all your latest comments. Hope this PR can now be approved.

martin-gaievski

It looks good to me, thank you

heemin32 · 2024-11-09T05:31:51Z

src/main/java/org/opensearch/neuralsearch/ml/MLCommonsClientAccessor.java

-        }));
+
+        Consumer<Boolean> predictConsumer = isAsymmetricModel -> {
+            MLInput mlInput = createMLMultimodalInput(targetResponseFilters, inputObjects, isAsymmetricModel ? mlAlgoParams : null);


There will be a case where we might want to pass mlAlgoParams for a symmetric model in the future. Shouldn't we check whether the model is asymmetric before constructing the request?

// Check here if model is symmetric or asymmetric
InferenceRequest.builder().modelId(this.modelId).inputTexts(inferenceList).mlAlgoParams(PASSAGE_PARAMETERS).build(),

Modifying the request internally will lead to confusion later, when we need to pass mlAlgoParams to a symmetric model but the accessor silently omits them before calling the model.
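One way to read this suggestion is sketched below. It is an illustration under assumptions, not code from the PR: `SketchRequest`, `RequestFactory`, and the string-valued `algoParams` are hypothetical simplifications of InferenceRequest and MLAlgoParams. The point is that the caller decides up front whether parameters apply, so the request that reaches the accessor is sent unchanged.

```java
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical request DTO; algoParams is null when no parameters apply. */
final class SketchRequest {
    final String modelId;
    final List<String> texts;
    final String algoParams;

    SketchRequest(String modelId, List<String> texts, String algoParams) {
        this.modelId = modelId;
        this.texts = texts;
        this.algoParams = algoParams;
    }
}

class RequestFactory {
    // Stand-in for the asymmetry lookup based on the model configuration.
    private final Predicate<String> isAsymmetric;

    RequestFactory(Predicate<String> isAsymmetric) {
        this.isAsymmetric = isAsymmetric;
    }

    /** Attaches passage parameters only for asymmetric models, before the request is built. */
    SketchRequest passageRequest(String modelId, List<String> texts) {
        String params = isAsymmetric.test(modelId) ? "PASSAGE" : null;
        return new SketchRequest(modelId, texts, params);
    }
}
```

The trade-off is that every caller now needs access to the asymmetry lookup, whereas the approach in the PR keeps that knowledge inside the accessor.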

zane-neo · 2024-11-11T08:33:21Z

> @br3no How are we ensuring BWC test compatibility?

This is a valid concern. I took a look at the code, and it could have a BWC issue during serde (serialization/deserialization). The latest version serializes the AsymmetricTextEmbeddingParameters. If a node is running a legacy version of OS, then on deserialization it fetches the class from the internal cache map here, and since a legacy version doesn't have this class, an IllegalArgumentException will be thrown. @br3no Can you test this case to confirm whether this is true? Thanks.

@br3no Please take a look at this comment. Serde is the only risk I see between old nodes and new nodes, and I don't think HybridSearchIT covers the asymmetric case, as it only tests with a text embedding model (a symmetric model). Those tests don't use the new configuration introduced for asymmetric models, so they hit no serde issue either. I would suggest you create an old cluster with two nodes (ml-node), replace one node with the latest code, and test two cases:

1. Send a request to the old node and make the old node dispatch the request to the new node.
2. Send a request to the new node and make the new node dispatch the request to the old node.

You don't need to manually configure anything to enable the dispatch, as ml-commons automatically dispatches requests to different nodes in a round-robin fashion. By controlling which node receives the request and triggering the request twice, one of the requests will be dispatched to the other node. If you don't see any serde exception, then it's good to merge the PR. Thanks.
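The suspected failure mode can be illustrated with a toy registry. This is a sketch under assumptions, not OpenSearch's actual serialization code: `ReaderRegistry` and the entry name used with it are hypothetical. It only shows why a node that never registered a reader for a type cannot decode a payload of that type.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for a name-keyed reader registry used during deserialization. */
class ReaderRegistry {
    private final Map<String, Function<String, Object>> readers = new HashMap<>();

    void register(String name, Function<String, Object> reader) {
        readers.put(name, reader);
    }

    /** Looks up the reader by name and decodes the payload with it. */
    Object read(String name, String payload) {
        Function<String, Object> reader = readers.get(name);
        if (reader == null) {
            // A node that never registered this name has no way to decode the payload.
            throw new IllegalArgumentException("Unknown type name: " + name);
        }
        return reader.apply(payload);
    }
}
```

In the mixed-cluster test proposed above, the old node plays the role of a registry without the entry; whether the real code path behaves this way is exactly what the test is meant to confirm.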

br3no · 2024-11-11T15:32:57Z

@zane-neo so if I get this right, I should create a new test that uses the asymmetric model feature. This test should only run for OS versions >= 2.19. Is this right?

Your concern is about making sure future releases will not break compatibility with this feature.

br3no · 2024-11-13T13:43:50Z

Ping.

zane-neo · 2024-11-14T02:00:09Z

> @zane-neo so if I get this right, I should create a new test that uses the asymmetric model feature. This test should only run for OS versions >= 2.19. Is this right?
>
> Your concern is about making sure future releases will not break compatibility with this feature.

@br3no That's right: use the asymmetric model feature for OS >= 2.19. But the thing is a little bit different. You should test a mixed cluster with OS 2.19 and OS 2.18. Since 2.18 has a different code base of neural search, a request sent to a 2.19 node and then dispatched to a 2.18 node could encounter serde issues. My guess is you can test this and, based on the result:

- No serde error: you can add BWC tests with OS >= 2.19
- Serde error: fix that and add BWC tests with OS >= 2.19

martin-gaievski · 2024-11-14T16:15:07Z

btw, if you're working on BWC for 2.19+ and want to check how the results look, better rebase on the latest main: we've switched 2.18-snapshot to 2.18/2.19-snapshot. Otherwise this branch will keep failing when running the BWC tests.

br3no requested review from heemin32, navneet1v, VijayanB, vamshin, jmazanec15, naveentatikonda, junqiu-lei, martin-gaievski, sean-zheng-amazon, model-collapse, zane-neo, ylwu-amzn, jngz-es, vibrantvarun and zhichao-aws as code owners April 25, 2024 19:45

br3no mentioned this pull request Apr 25, 2024

[PROPOSAL] Add support for asymmetric embedding models to neural-search #620

Open

br3no mentioned this pull request Apr 26, 2024

[BUG] asymmetric model inference ignores ModelResultFilter opensearch-project/ml-commons#2366

Closed

martin-gaievski reviewed Nov 8, 2024


martin-gaievski reviewed Nov 8, 2024


br3no added 9 commits November 8, 2024 10:57

adding support for asymmetric embedding models

6dcbd00

Signed-off-by: br3no <[email protected]>

adding changelog entry

760c1db

Signed-off-by: br3no <[email protected]>

missing paramter

e819b13

Signed-off-by: br3no <[email protected]>

Adapt new tests to asymmetric model inference

9a920a0

Signed-off-by: br3no <[email protected]>

revert accidental removal of tests from

2932b84

Signed-off-by: br3no <[email protected]>

Another silly mistake corrected...

4f5c0cd

Signed-off-by: br3no <[email protected]>

refactor test

8b8f871

Signed-off-by: br3no <[email protected]>

After further review round

2ac1f09

Signed-off-by: br3no <[email protected]>

using lombok in InferenceRequest DTO

61347b0

Signed-off-by: br3no <[email protected]>

br3no force-pushed the asymmetric-embeddings-620 branch from 390d04c to 61347b0 Compare November 8, 2024 10:05

make max cache entries private

65a4b39

Signed-off-by: br3no <[email protected]>

martin-gaievski approved these changes Nov 8, 2024


heemin32 reviewed Nov 9, 2024
