Some optimizations can be performed to massively speed up HNSW by utilizing linear search on clusters and on NSW levels that are small enough for linear search to be faster. This would also lower memory consumption. In the current architecture, these optimizations require a linear-time operation to:
1. Order every level so that node indices are consistent across levels.
   This allows an incremental linear search over every level until reaching a level where the number of features is too large for linear search to pay off.
2. Cluster each feature below a certain level onto its nearest neighbor in a higher level, so that the resulting cluster can be searched linearly.
   This allows linear search within the clusters at the bottom of the graph.
This makes the "small worlds" in the NSW into clusters that can be scanned linearly, which is incredibly fast on modern computers.
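As a rough sketch of what that per-cluster search looks like (Rust, with a placeholder `Feature` type and squared-Euclidean `distance` standing in for whatever metric the index is actually parameterized over):

```rust
// Placeholder feature type and metric; the real index is generic over both.
type Feature = [f32; 4];

fn distance(a: &Feature, b: &Feature) -> f32 {
    a.iter().zip(b.iter()).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Brute-force nearest neighbor within a single contiguous cluster.
/// For small clusters, this one cache-friendly pass over a flat slice is
/// the linear search the optimizations above are trying to enable.
fn nearest_in_cluster(query: &Feature, cluster: &[Feature]) -> Option<(usize, f32)> {
    cluster
        .iter()
        .enumerate()
        .map(|(i, f)| (i, distance(query, f)))
        .min_by(|a, b| a.1.partial_cmp(&b.1).unwrap())
}
```

Because the features are contiguous and the loop is branch-light, this scan vectorizes and prefetches well, which is where the speedup over pointer-chasing graph traversal would come from.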
It is not possible to maintain this efficiently during operation, because every insertion would require a linear-time pass to invalidate all directed edges pointing into a node that must be moved (even though the node itself can be moved in log(N) time). We could make an alternate version of HNSW that does allow the first optimization in log(N) time by storing, on the heap, the set of neighbors pointing into each node. This wouldn't affect the performance of normal searching, but it would slow insertion somewhat (although the overall speed improvement may still make insertion faster on balance).
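One way to realize that heap-allocated bookkeeping is a per-node backlink list. A minimal sketch, assuming the destination slot is already vacant (the names and layout here are illustrative, not the crate's actual API):

```rust
/// Node that also records which nodes point *into* it. `neighbors` are the
/// ordinary outgoing HNSW edges; `backlinks` is the extra heap-allocated
/// list of incoming edges proposed above.
struct Node {
    neighbors: Vec<u32>,
    backlinks: Vec<u32>,
}

/// Relocate the node in slot `from` to the vacant slot `to`, retargeting
/// every edge that referenced it. The cost is proportional to the node's
/// degree, not to the total number of nodes.
fn move_node(nodes: &mut [Node], from: u32, to: u32) {
    // Retarget the outgoing edges of every node that pointed at `from`.
    let backlinks = nodes[from as usize].backlinks.clone();
    for b in backlinks {
        for n in &mut nodes[b as usize].neighbors {
            if *n == from {
                *n = to;
            }
        }
    }
    // Fix the backlink entries held by `from`'s own out-neighbors.
    let neighbors = nodes[from as usize].neighbors.clone();
    for n in neighbors {
        for b in &mut nodes[n as usize].backlinks {
            if *b == from {
                *b = to;
            }
        }
    }
    nodes.swap(from as usize, to as usize);
}
```

With backlinks, invalidating the edges into a moved node touches only its in-neighborhood rather than requiring a scan of the whole graph, which is what makes the reordering maintainable incrementally.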
For the second optimization, it isn't clear exactly what the semantics of insertion would be if clusters were maintained at the bottom of the HNSW for speed, but it is clear how to convert an existing HNSW into a clustered version. Supporting insertion into the clustered version would be game-changing, but it requires knowing how to split a cluster back up into an NSW once it becomes too big. One idea is to store both clusters and nodes at each level of the graph; once a cluster (which is always a leaf of the graph) needs to be broken up, one random feature is chosen from the cluster to become a new cluster center, and all features are separated into two clusters based on which of the two centers they are closest to (see the sketch below). The performance characteristics of this approach are unknown and must be benchmarked.
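A sketch of that splitting step, reusing the same placeholder `Feature`/`distance` as above; `split_cluster` and `center_ix` are hypothetical names, and the caller is assumed to pick `center_ix` uniformly at random (e.g. with the `rand` crate):

```rust
// Same placeholder feature type and metric as the earlier sketch.
type Feature = [f32; 4];

fn distance(a: &Feature, b: &Feature) -> f32 {
    a.iter().zip(b.iter()).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Split an oversized cluster in two. `center_ix` is the index of the
/// randomly chosen feature that becomes the new center. Returns the
/// features that stay with the old center and those assigned to the new one.
fn split_cluster(
    old_center: &Feature,
    cluster: Vec<Feature>,
    center_ix: usize,
) -> (Vec<Feature>, Vec<Feature>) {
    let new_center = cluster[center_ix];
    cluster
        .into_iter()
        .partition(|f| distance(f, old_center) <= distance(f, &new_center))
}
```

This is a single linear pass over the cluster, so the split itself stays cheap; the open question is how the two resulting clusters get wired back into the level above, which is what the benchmarking would need to evaluate.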
We should attempt to create both a static version of HNSW that is built from a dynamic version in linear time and a dynamic version of HNSW that contains both of the above optimizations in their dynamic form. The two resulting data structures should be benchmarked to see whether the static version has any benefit; otherwise the dynamic version should be preferred, since the difference in memory consumption should be negligible, leaving performance as the main factor.