Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update default engine to FAISS #2221

Merged
merged 5 commits into from
Nov 7, 2024

Conversation

VijayanB
Copy link
Member

@VijayanB VijayanB commented Oct 20, 2024

Description

  • Update default engine to FAISS.
  • Fixed tests that assumed default engine to be nmslib to either use nmslib explicitly or update test assertion to look for faiss specific artifacts like file extension instead of nmslib artifacts
  • removed tests that are just validating default (provided it is nmslib) doesn't support certain features

Related Issues

#2163

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@VijayanB
Copy link
Member Author

Copy link
Member

@vibrantvarun vibrantvarun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Will wait for other engineers review

@@ -203,7 +203,7 @@ static KNNMethodContext createKNNMethodContextFromLegacy(
? topLevelSpaceType
: KNNVectorFieldMapperUtil.getSpaceType(indexSettings);
return new KNNMethodContext(
KNNEngine.NMSLIB,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep as NMSLIB - nmslib is only legacy support

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add comment to like I should have done haha

@jmazanec15
Copy link
Member

@@ -29,7 +29,7 @@ public enum KNNEngine implements KNNLibrary {
FAISS(FAISS_NAME, Faiss.INSTANCE),
LUCENE(LUCENE_NAME, Lucene.INSTANCE);

public static final KNNEngine DEFAULT = NMSLIB;
public static final KNNEngine DEFAULT = FAISS;
Copy link
Collaborator

@navneet1v navneet1v Oct 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How we are handling the backward compatibility? Also can we validate if we have proper BWC tests to test this default engine change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just curious to know , why do we need to handle BWC in this case , So i created a Index , without specifying any engine so it picked NMSLIB , so now this information is already there in Cluster State and Serialised and passed to all Nodes, Now During query time , we do get it information from there and it does not depend on the new Default.

Copy link
Collaborator

@navneet1v navneet1v Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So i created a Index , without specifying any engine so it picked NMSLIB , so now this information is already there in Cluster State and Serialised and passed to all Nodes,

The engine value gets persisted in the cluster state if you define the key method while creating the index. But if you don't specify the key method while creating the index(also called as LegacyFieldMapping) and lets say create an index like this where my_vector1 doesn't have the engine defined:

PUT my-knn-index-1
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
        "my_vector1": {
          "type": "knn_vector",
          "dimension": 2
        },
        "my_vector2": {
          "type": "knn_vector",
          "dimension": 2,
          "method": {
            "name": "hnsw"
          }
        }
    }
  }
}

then k-NN plugin infers the engine. Ref: https://github.com/opensearch-project/k-NN/blob/HEAD/src/main/java/org/opensearch/knn/index/mapper/KNNVectorFieldMapperUtil.java#L196-L219 for more details. If you try doing a getMapping on the index after creating an index like this you will never get the engine value in return for my_vector1 but you will get engine as nmslib for my_vector2. Same is true for cluster state.

@@ -114,7 +115,7 @@ public void testGetAllEngineFileContexts() throws IOException, ExecutionExceptio
engineFileContexts = knnIndexShard.getAllEngineFileContexts(searcher.getIndexReader());
assertEquals(1, engineFileContexts.size());
List<String> paths = engineFileContexts.stream().map(KNNIndexShard.EngineFileContext::getIndexPath).collect(Collectors.toList());
assertTrue(paths.get(0).contains("hnsw") || paths.get(0).contains("hnswc"));
assertTrue(paths.get(0).contains(FAISS_HNSW_EXTENSION));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the compound extension covered?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated test to use nmslib

@@ -263,29 +263,6 @@ public void testByteVectorDataTypeWithNmslibEngine() {
assertTrue(ex.getMessage().contains("is not supported for vector data type"));
}

@SneakyThrows
public void testByteVectorDataTypeWithLegacyFieldMapperKnnIndexSetting() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checking if this case is covered when engine is explicitly NMSLIB

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is covered explicitly for nmslib engine

@@ -146,7 +146,7 @@ public void testTypeParser_build_fromKnnMethodContext() throws IOException {
// Check that knnMethodContext takes precedent over both model and legacy
ModelDao modelDao = mock(ModelDao.class);

SpaceType spaceType = SpaceType.COSINESIMIL;
SpaceType spaceType = SpaceType.DEFAULT;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, why this change?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vamshin is it okay default engine doesnt support cosine?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could add this by taking a normalization and then setting to innerproduct. But this would take some effort.

Copy link
Member

@vamshin vamshin Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah good catch! I think still we need to make faiss as default based on all the advanced capabilities like nested fields, efficient filters, disk optimized search, in future GPUs. In long term makes sense to build the normalization to enable cosine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shatejas FAISS doesn't support cosine yet.

@VijayanB VijayanB force-pushed the update-default-engine branch 2 times, most recently from 78527e6 to 1ae649e Compare October 23, 2024 06:18
@VijayanB
Copy link
Member Author

Tests are failing not related to this PR. This PR #2236 will fix those issue in main. Will rebase once the other PR is merged.

Since faiss supports more features than nmslib, and, we had seen
data points that there are more number of vector search
users are interesed in faiss, we will be updating default
engine to be faiss. This will benefit users who preffered
to use defaults while working with vector search.

Signed-off-by: Vijayan Balasubramanian <[email protected]>
Signed-off-by: Vijayan Balasubramanian <[email protected]>
Signed-off-by: Vijayan Balasubramanian <[email protected]>
Signed-off-by: Vijayan Balasubramanian <[email protected]>
Signed-off-by: Vijayan Balasubramanian <[email protected]>
@VijayanB
Copy link
Member Author

VijayanB commented Nov 7, 2024

Test failure is not due to this change.

@VijayanB VijayanB merged commit 7d34456 into opensearch-project:main Nov 7, 2024
29 of 30 checks passed
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-2221-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 7d3445631591296d00c37ea16351a07ca08ffbd3
# Push it to GitHub
git push --set-upstream origin backport/backport-2221-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-2221-to-2.x.

@VijayanB VijayanB deleted the update-default-engine branch November 7, 2024 19:41
@VijayanB
Copy link
Member Author

VijayanB commented Nov 7, 2024

Failure is due to waiting for this PR to be merged #2248

opensearch-trigger-bot bot pushed a commit that referenced this pull request Nov 7, 2024
* Update default engine to FAISS

Since faiss supports more features than nmslib, and, we had seen
data points that there are more number of vector search
users are interesed in faiss, we will be updating default
engine to be faiss. This will benefit users who preffered
to use defaults while working with vector search.

Signed-off-by: Vijayan Balasubramanian <[email protected]>

* Update legacy mapping

Signed-off-by: Vijayan Balasubramanian <[email protected]>

* Create legacy mapping only up to V_2_17_2

Signed-off-by: Vijayan Balasubramanian <[email protected]>

* Update test engine

Signed-off-by: Vijayan Balasubramanian <[email protected]>

* Update test method

Signed-off-by: Vijayan Balasubramanian <[email protected]>

---------

Signed-off-by: Vijayan Balasubramanian <[email protected]>
(cherry picked from commit 7d34456)
navneet1v added a commit to navneet1v/k-NN that referenced this pull request Nov 7, 2024
VijayanB added a commit that referenced this pull request Nov 8, 2024
* Update default engine to FAISS

Since faiss supports more features than nmslib, and, we had seen
data points that there are more number of vector search
users are interesed in faiss, we will be updating default
engine to be faiss. This will benefit users who preffered
to use defaults while working with vector search.

Signed-off-by: Vijayan Balasubramanian <[email protected]>

* Update legacy mapping

Signed-off-by: Vijayan Balasubramanian <[email protected]>

* Create legacy mapping only up to V_2_17_2

Signed-off-by: Vijayan Balasubramanian <[email protected]>

* Update test engine

Signed-off-by: Vijayan Balasubramanian <[email protected]>

* Update test method

Signed-off-by: Vijayan Balasubramanian <[email protected]>

---------

Signed-off-by: Vijayan Balasubramanian <[email protected]>
(cherry picked from commit 7d34456)

Co-authored-by: Vijayan Balasubramanian <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants