Update default engine to FAISS #2221
97240f3 to daf89f6
Looks good! I will wait for the other engineers' reviews.
@@ -203,7 +203,7 @@ static KNNMethodContext createKNNMethodContextFromLegacy(
? topLevelSpaceType
: KNNVectorFieldMapperUtil.getSpaceType(indexSettings);
return new KNNMethodContext(
KNNEngine.NMSLIB,
Keep as NMSLIB - nmslib is only legacy support
Maybe add a comment too, like I should have done, haha.
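To make the "legacy support only" point concrete, here is a minimal sketch of how a version gate could keep NMSLIB for older indices while newer ones pick up the new default. Everything in it is hypothetical: the class, the `Version` type, and the method name are illustrative stand-ins, and the cutoff is only assumed from the commit title "Create legacy mapping only up to V_2_17_2" in this PR. It is not the plugin's actual implementation.

```java
// Hypothetical, simplified stand-ins; not the k-NN plugin's real classes.
class LegacyEngineGateSketch {
    enum Engine { NMSLIB, FAISS, LUCENE }

    // Minimal semantic version that compares by major, then minor, then patch.
    static final class Version implements Comparable<Version> {
        final int major;
        final int minor;
        final int patch;

        Version(int major, int minor, int patch) {
            this.major = major;
            this.minor = minor;
            this.patch = patch;
        }

        @Override
        public int compareTo(Version other) {
            if (major != other.major) return Integer.compare(major, other.major);
            if (minor != other.minor) return Integer.compare(minor, other.minor);
            return Integer.compare(patch, other.patch);
        }
    }

    // Assumed cutoff, taken from the commit title "Create legacy mapping only up to V_2_17_2".
    static final Version LAST_LEGACY_VERSION = new Version(2, 17, 2);

    // New default introduced by this PR.
    static final Engine DEFAULT_ENGINE = Engine.FAISS;

    // Engine for a knn_vector field that was declared without a method.
    // Assumption: indices created up to the cutoff keep NMSLIB, later ones use the default.
    static Engine engineForFieldWithoutMethod(Version indexCreatedVersion) {
        return indexCreatedVersion.compareTo(LAST_LEGACY_VERSION) <= 0
            ? Engine.NMSLIB
            : DEFAULT_ENGINE;
    }

    public static void main(String[] args) {
        System.out.println(engineForFieldWithoutMethod(new Version(2, 17, 2)));
        System.out.println(engineForFieldWithoutMethod(new Version(2, 18, 0)));
    }
}
```

Keying the decision on the version the index was created with, rather than on the running node's version, is what would keep existing indices stable across an upgrade in this sketch.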
@@ -29,7 +29,7 @@ public enum KNNEngine implements KNNLibrary {
FAISS(FAISS_NAME, Faiss.INSTANCE),
LUCENE(LUCENE_NAME, Lucene.INSTANCE);

- public static final KNNEngine DEFAULT = NMSLIB;
+ public static final KNNEngine DEFAULT = FAISS;
How are we handling backward compatibility? Also, can we validate that we have proper BWC tests covering this default engine change?
I was just curious to know why we need to handle BWC in this case. Say I created an index without specifying any engine, so it picked NMSLIB. That information is already in the cluster state, serialized and passed to all nodes. At query time we read the engine from there, so it does not depend on the new default.
"So I created an index without specifying any engine, so it picked NMSLIB, so now this information is already there in the cluster state, serialized and passed to all nodes,"
The engine value gets persisted in the cluster state only if you define the key method while creating the index. If you don't specify the key method while creating the index (also called LegacyFieldMapping), for example by creating an index like this, where my_vector1 has no method and therefore no engine defined:
PUT my-knn-index-1
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector1": {
        "type": "knn_vector",
        "dimension": 2
      },
      "my_vector2": {
        "type": "knn_vector",
        "dimension": 2,
        "method": {
          "name": "hnsw"
        }
      }
    }
  }
}
then the k-NN plugin infers the engine. See https://github.com/opensearch-project/k-NN/blob/HEAD/src/main/java/org/opensearch/knn/index/mapper/KNNVectorFieldMapperUtil.java#L196-L219 for more details. If you call getMapping on an index created like this, the engine value is never returned for my_vector1, but engine nmslib is returned for my_vector2. The same is true for the cluster state.
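The difference between a persisted and an inferred engine can be sketched roughly as follows. This is a simplified model under stated assumptions, not the plugin's code: `storedMapping` and `resolveEngine` are hypothetical helpers, and the mapping is represented as a plain `Map`. The point it illustrates is that a field with a method carries its engine in the stored mapping, while a legacy field carries none and depends on whatever the code infers at read time.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified model of what gets stored for a knn_vector field.
class EnginePersistenceSketch {
    // Default engine at index creation time (nmslib before this PR, per the comment above).
    static final String DEFAULT_ENGINE_AT_CREATION = "nmslib";

    // Builds the mapping that would be stored for one field.
    // Pass methodName = null to model a legacy field mapping without a method.
    static Map<String, Object> storedMapping(int dimension, String methodName) {
        Map<String, Object> mapping = new LinkedHashMap<>();
        mapping.put("type", "knn_vector");
        mapping.put("dimension", dimension);
        if (methodName != null) {
            Map<String, Object> method = new LinkedHashMap<>();
            method.put("name", methodName);
            // The default engine is filled in and persisted together with the method.
            method.put("engine", DEFAULT_ENGINE_AT_CREATION);
            mapping.put("method", method);
        }
        return mapping;
    }

    // Resolves the engine when the mapping is read back.
    // A persisted engine wins; otherwise the caller's inferred legacy engine is used.
    static String resolveEngine(Map<String, Object> mapping, String inferredLegacyEngine) {
        Object method = mapping.get("method");
        if (method instanceof Map) {
            return String.valueOf(((Map<?, ?>) method).get("engine"));
        }
        return inferredLegacyEngine;
    }

    public static void main(String[] args) {
        Map<String, Object> myVector1 = storedMapping(2, null);
        Map<String, Object> myVector2 = storedMapping(2, "hnsw");
        // Simulate reading both fields after the inferred default has changed to faiss.
        System.out.println("my_vector1 -> " + resolveEngine(myVector1, "faiss"));
        System.out.println("my_vector2 -> " + resolveEngine(myVector2, "faiss"));
    }
}
```

In this model only my_vector2 is immune to a change of default, which is why the legacy path needs explicit backward-compatibility handling.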
@@ -114,7 +115,7 @@ public void testGetAllEngineFileContexts() throws IOException, ExecutionExceptio
engineFileContexts = knnIndexShard.getAllEngineFileContexts(searcher.getIndexReader());
assertEquals(1, engineFileContexts.size());
List<String> paths = engineFileContexts.stream().map(KNNIndexShard.EngineFileContext::getIndexPath).collect(Collectors.toList());
- assertTrue(paths.get(0).contains("hnsw") || paths.get(0).contains("hnswc"));
+ assertTrue(paths.get(0).contains(FAISS_HNSW_EXTENSION));
Is the compound extension covered?
Updated test to use nmslib
@@ -263,29 +263,6 @@ public void testByteVectorDataTypeWithNmslibEngine() {
assertTrue(ex.getMessage().contains("is not supported for vector data type"));
}

- @SneakyThrows
- public void testByteVectorDataTypeWithLegacyFieldMapperKnnIndexSetting() {
Just checking if this case is covered when engine is explicitly NMSLIB
Yes, it is covered explicitly for nmslib engine
@@ -146,7 +146,7 @@ public void testTypeParser_build_fromKnnMethodContext() throws IOException {
// Check that knnMethodContext takes precedent over both model and legacy
ModelDao modelDao = mock(ModelDao.class);

- SpaceType spaceType = SpaceType.COSINESIMIL;
+ SpaceType spaceType = SpaceType.DEFAULT;
Out of curiosity, why this change?
@vamshin is it okay that the default engine doesn't support cosine?
We could add this by normalizing the vectors and then setting the space type to innerproduct. But this would take some effort.
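The idea behind that suggestion is that the inner product of two unit-length vectors equals their cosine similarity. A minimal sketch of the math, with hypothetical helper names and no claim about how the plugin or FAISS would actually wire it in:

```java
// Hypothetical helpers illustrating cosine similarity via normalization plus inner product.
class CosineViaInnerProductSketch {
    // Returns a copy of v scaled to unit L2 norm.
    static float[] normalize(float[] v) {
        double sumOfSquares = 0.0;
        for (float x : v) {
            sumOfSquares += (double) x * x;
        }
        double norm = Math.sqrt(sumOfSquares);
        if (norm == 0.0) {
            throw new IllegalArgumentException("zero vector cannot be normalized");
        }
        float[] unit = new float[v.length];
        for (int i = 0; i < v.length; i++) {
            unit[i] = (float) (v[i] / norm);
        }
        return unit;
    }

    // Plain dot product of two vectors of equal length.
    static double innerProduct(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    // Reference cosine similarity computed directly from the raw vectors.
    static double cosine(float[] a, float[] b) {
        return innerProduct(a, b) / Math.sqrt(innerProduct(a, a) * innerProduct(b, b));
    }

    public static void main(String[] args) {
        float[] a = { 3f, 4f };
        float[] b = { 4f, 3f };
        System.out.println("cosine(a, b)            = " + cosine(a, b));
        System.out.println("ip(norm(a), norm(b))    = " + innerProduct(normalize(a), normalize(b)));
    }
}
```

The two values agree up to floating-point rounding. In practice both indexed vectors and query vectors would need to be normalized, and zero vectors would need a defined behavior, which is part of the effort mentioned above.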
Ah, good catch! I still think we need to make faiss the default, based on its advanced capabilities such as nested fields, efficient filters, disk-optimized search, and, in the future, GPUs. In the long term, it makes sense to build the normalization to enable cosine.
@shatejas FAISS doesn't support cosine yet.
aa34e58 to b5c37b9
78527e6 to 1ae649e
93f9c24 to 0d1db53
The failing tests are not related to this PR. PR #2236 will fix those issues in main. I will rebase once that PR is merged.
3b56639 to 21f1ebd
21f1ebd to 73bd246
Since faiss supports more features than nmslib, and we have seen data points showing that more vector search users are interested in faiss, we will update the default engine to faiss. This will benefit users who prefer to use defaults while working with vector search.
Signed-off-by: Vijayan Balasubramanian <[email protected]>
Signed-off-by: Vijayan Balasubramanian <[email protected]>
Signed-off-by: Vijayan Balasubramanian <[email protected]>
Signed-off-by: Vijayan Balasubramanian <[email protected]>
73bd246 to c0a0ccc
Signed-off-by: Vijayan Balasubramanian <[email protected]>
c0a0ccc to c79575a
Test failure is not due to this change.
The backport to
To backport manually, run these commands in your terminal:
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-2221-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 7d3445631591296d00c37ea16351a07ca08ffbd3
# Push it to GitHub
git push --set-upstream origin backport/backport-2221-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x
Then, create a pull request where the
The failure occurs because PR #2248 has not been merged yet.
* Update default engine to FAISS
Since faiss supports more features than nmslib, and we have seen data points showing that more vector search users are interested in faiss, we will update the default engine to faiss. This will benefit users who prefer to use defaults while working with vector search.
Signed-off-by: Vijayan Balasubramanian <[email protected]>
* Update legacy mapping
Signed-off-by: Vijayan Balasubramanian <[email protected]>
* Create legacy mapping only up to V_2_17_2
Signed-off-by: Vijayan Balasubramanian <[email protected]>
* Update test engine
Signed-off-by: Vijayan Balasubramanian <[email protected]>
* Update test method
Signed-off-by: Vijayan Balasubramanian <[email protected]>
---------
Signed-off-by: Vijayan Balasubramanian <[email protected]>
(cherry picked from commit 7d34456)
This reverts commit 7d34456.
* Update default engine to FAISS
Since faiss supports more features than nmslib, and we have seen data points showing that more vector search users are interested in faiss, we will update the default engine to faiss. This will benefit users who prefer to use defaults while working with vector search.
Signed-off-by: Vijayan Balasubramanian <[email protected]>
* Update legacy mapping
Signed-off-by: Vijayan Balasubramanian <[email protected]>
* Create legacy mapping only up to V_2_17_2
Signed-off-by: Vijayan Balasubramanian <[email protected]>
* Update test engine
Signed-off-by: Vijayan Balasubramanian <[email protected]>
* Update test method
Signed-off-by: Vijayan Balasubramanian <[email protected]>
---------
Signed-off-by: Vijayan Balasubramanian <[email protected]>
(cherry picked from commit 7d34456)
Co-authored-by: Vijayan Balasubramanian <[email protected]>
Description
Related Issues
#2163
Check List
--signoff.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.