Commit

Update GitHub organization to Intel (#380)
GitOrigin-RevId: 5746e50b30f0dcd05ecbb0ab099a9f6345c60e68
mihaic committed Jul 31, 2024
1 parent 60fa5b9 commit 4d6e58c
Showing 10 changed files with 22 additions and 22 deletions.
2 changes: 1 addition & 1 deletion NEWS.md
@@ -213,7 +213,7 @@ Finally, if environment variable based initialization is not desired, it can be

* The implementation of two-level LVQ has changed from bitwise extension to true cascaded
application of scalar quantization. See the discussion on
-  [this PR](https://github.com/IntelLabs/ScalableVectorSearch/pull/28).
+  [this PR](https://github.com/intel/ScalableVectorSearch/pull/28).

Consequently, previously saved two-level LVQ datasets have had their serialization version
incremented from `v0.0.2` to `v0.0.3` and will need to be regenerated.
8 changes: 4 additions & 4 deletions README.md
@@ -16,7 +16,7 @@ SVS is written in C++ to facilitate its integration into performance-critical ap
## Performance

SVS provides state-of-the-art performance and accuracy [[ABHT23]](#1) for billion-scale similarity search on
-[standard benchmarks](https://intellabs.github.io/ScalableVectorSearch/benchs/index.html).
+[standard benchmarks](https://intel.github.io/ScalableVectorSearch/benchs/index.html).

For example, for the standard billion-scale [Deep-1B](http://sites.skoltech.ru/compvision/noimi/) dataset,
different configurations of SVS yield significantly increased performance (measured in queries per second, QPS) with a smaller memory footprint (horizontal axis) than the alternatives[^1]:
@@ -26,7 +26,7 @@ different configurations of SVS yield significantly increased performance (measu
</p>

SVS is primarily optimized for large-scale similarity search but it still offers [state-of-the-art performance
-at million-scale](https://intellabs.github.io/ScalableVectorSearch/benchs/small_scale_benchs.html).
+at million-scale](https://intel.github.io/ScalableVectorSearch/benchs/small_scale_benchs.html).

Best performance is obtained with 4th generation Intel&reg; Xeon&reg; processors (Sapphire Rapids) by making use of AVX-512 instructions,
with excellent results also on 2nd and 3rd generation Intel&reg; Xeon&reg; processors (Cascade Lake
@@ -47,12 +47,12 @@ SVS supports:
- 3rd generation (Ice Lake)
- 4th generation (Sapphire Rapids)

-See [Roadmap](https://intellabs.github.io/ScalableVectorSearch/roadmap.html) for upcoming features.
+See [Roadmap](https://intel.github.io/ScalableVectorSearch/roadmap.html) for upcoming features.


## Documentation

-[SVS documentation](https://intellabs.github.io/ScalableVectorSearch) includes getting started tutorials with [installation instructions for Python](https://intellabs.github.io/ScalableVectorSearch/start.html#installation) and [C++](https://intellabs.github.io/ScalableVectorSearch/start_cpp.html#building) and step-by-step search examples, an API reference, as well as several guides and benchmark comparisons.
+[SVS documentation](https://intel.github.io/ScalableVectorSearch) includes getting started tutorials with [installation instructions for Python](https://intel.github.io/ScalableVectorSearch/start.html#installation) and [C++](https://intel.github.io/ScalableVectorSearch/start_cpp.html#building) and step-by-step search examples, an API reference, as well as several guides and benchmark comparisons.

## References
Reference to cite when you use SVS in a research paper:
2 changes: 1 addition & 1 deletion docs/advanced/build.rst
@@ -123,7 +123,7 @@ To include the C++ portion of the library in a CMake based project, follow the t
include(FetchContent)
FetchContent_Declare(
svs
-    GIT_REPOSITORY https://github.com/IntelLabs/ScalableVectorSearch.git
+    GIT_REPOSITORY https://github.com/intel/ScalableVectorSearch.git
GIT_TAG main
)
4 changes: 2 additions & 2 deletions docs/benchs/static/previous/large_scale_benchs.rst
@@ -74,7 +74,7 @@ Parameters Setting
^^^^^^^^^^^^^^^^^^^

We used the following versions of each method:
-SVS `commit ad821d8 <https://github.com/IntelLabs/ScalableVectorSearch/commit/ad821d8c94cb69a67c8744b98ee1c79d3e3a299c>`_,
+SVS `commit ad821d8 <https://github.com/intel/ScalableVectorSearch/commit/ad821d8c94cb69a67c8744b98ee1c79d3e3a299c>`_,
Vamana `commit 647f68f <https://github.com/microsoft/DiskANN/commit/647f68fe5aa7b45124ae298c219fe909d46a1834>`_,
HNSWlib `commit 4b2cb72 <https://github.com/nmslib/hnswlib/commit/4b2cb72c3c1bbddee55535ec6f360a0b2e40a81e>`_,
ScaNN `commit d170ac5 <https://github.com/google-research/google-research/commit/d170ac58ce1d071614b2813b056afa292f5e490c>`_,
@@ -214,4 +214,4 @@ In all cases where several parameter settings are evaluated, the results show th
.. [#ft3] All experimental results were completed by April 30th 2023.
.. [#ft2] NGT [IwMi18]_ is included in the :ref:`small_scale_benchs` and not in the large scale evaluation because the algorithm designed for
large-scale datasets (NGT-QBG) achieves low accuracy saturating at 0.86 recall even for a small 1-million vectors dataset.
4 changes: 2 additions & 2 deletions docs/benchs/static/previous/small_scale_benchs.rst
@@ -32,7 +32,7 @@ Comparison to Other Implementations
We compare SVS to five widely adopted approaches: Vamana [SDSK19]_, HNSWlib [MaYa18]_, FAISS-IVFPQfs [JoDJ19]_, ScaNN
[GSLG20]_, and NGT [IwMi18]_. We use the implementations available through `ANN-Benchmarks <https://github.com/erikbern/ann-benchmarks>`_
(`commit 167f129 <https://github.com/erikbern/ann-benchmarks/commit/167f1297b21789d13a9fa82646c522011df8c163>`_ , October 4th 2022)
-and for SVS we use `commit ad821d8 <https://github.com/IntelLabs/ScalableVectorSearch/commit/ad821d8c94cb69a67c8744b98ee1c79d3e3a299c>`_.
+and for SVS we use `commit ad821d8 <https://github.com/intel/ScalableVectorSearch/commit/ad821d8c94cb69a67c8744b98ee1c79d3e3a299c>`_.
See :ref:`param_setting_bench_small_scale` for details on the evaluated configurations for
each method. We run the evaluation in the three different :ref:`query modes <query_batch_size>`. [#ft2]_

@@ -185,4 +185,4 @@ LVQ-compressed vectors are padded to half cache lines (``padding`` = 32).
.. [#ft3] All experimental results were completed by April 30th 2023.
.. [#ft2] NGT-qg is not included in the query batch mode evaluation because the available implementation does not support
multi-query processing.
4 changes: 2 additions & 2 deletions docs/howtos.rst
@@ -281,7 +281,7 @@ data format (e.g. images, text).

We will walk through a simple example below. For complete examples, please
see our `VectorSearchDatasets repository <https://github
-.com/intellabs/vectorsearchdatasets>`_, which contains code to generate compatible
+.com/IntelLabs/VectorSearchDatasets>`_, which contains code to generate compatible
vector embeddings for common datasets such as `open-images <https://storage.googleapis.com/openimages/web/index.html>`_.

Example: vector embeddings of images
@@ -370,5 +370,5 @@ with :py:func:`svs.write_vecs`. A description of the ``*vecs`` file format is gi
svs.write_vecs(embeddings, out_file)
Other data format helper functions are described in our `I/O and Conversion Tools
-<https://intellabs.github
+<https://intel.github
.io/ScalableVectorSearch/io.html>`_ documentation.
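The howtos diff above refers to :py:func:`svs.write_vecs` and the ``*vecs`` file format. As a point of reference, the widely used fvecs convention stores each vector as a little-endian int32 dimension count followed by that many float32 values; treat the stdlib-only sketch below (with hypothetical helper names ``write_fvecs``/``read_fvecs``) as an illustration of that layout under this assumption, not as the SVS implementation — the I/O documentation linked above is authoritative.

```python
import struct

def write_fvecs(path, vectors):
    """Write float vectors in the fvecs layout: for each vector,
    a little-endian int32 dimension followed by d float32 values."""
    with open(path, "wb") as f:
        for vec in vectors:
            f.write(struct.pack("<i", len(vec)))
            f.write(struct.pack(f"<{len(vec)}f", *vec))

def read_fvecs(path):
    """Read vectors back from the same layout until end of file."""
    vectors = []
    with open(path, "rb") as f:
        while (header := f.read(4)):
            (dim,) = struct.unpack("<i", header)
            vectors.append(list(struct.unpack(f"<{dim}f", f.read(4 * dim))))
    return vectors

# Round-trip two small embeddings (values exactly representable in float32).
write_fvecs("/tmp/example.fvecs", [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(read_fvecs("/tmp/example.fvecs"))  # → [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
```

In practice, ``svs.write_vecs`` (shown in the diff) is the supported way to produce these files from Python.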
2 changes: 1 addition & 1 deletion docs/start.rst
@@ -34,7 +34,7 @@ To build and install the SVS Python module, clone the repo and run the following
.. code-block:: sh
# Clone the repository
-git clone https://github.com/IntelLabs/ScalableVectorSearch
+git clone https://github.com/intel/ScalableVectorSearch
cd ScalableVectorSearch
# Install svs using pip
2 changes: 1 addition & 1 deletion examples/cpp/README.md
@@ -1,3 +1,3 @@
# SVS C++ examples

-The examples provided here showcase SVS features. `vamana.cpp` shows search features; see the [getting started tutorial](https://intellabs.github.io/ScalableVectorSearch/start_cpp.html) for more details. `types.cpp` shows the types supported. `saveload.cpp` shows data structure saving and loading. `dispatcher.cpp` shows compile-time specialization with generic fallbacks.
+The examples provided here showcase SVS features. `vamana.cpp` shows search features; see the [getting started tutorial](https://intel.github.io/ScalableVectorSearch/start_cpp.html) for more details. `types.cpp` shows the types supported. `saveload.cpp` shows data structure saving and loading. `dispatcher.cpp` shows compile-time specialization with generic fallbacks.
2 changes: 1 addition & 1 deletion examples/python/README.md
@@ -1,3 +1,3 @@
# SVS Python examples

-The examples provided here showcase SVS features. `example_vamana.py` shows search features; see the [getting started tutorial](https://intellabs.github.io/ScalableVectorSearch/start.html) for more details. `example_vamana_dynamic.py` shows the search using dynamic types. `example_vamana_leanvec.py` shows search using dimensionality reduction.
+The examples provided here showcase SVS features. `example_vamana.py` shows search features; see the [getting started tutorial](https://intel.github.io/ScalableVectorSearch/start.html) for more details. `example_vamana_dynamic.py` shows the search using dynamic types. `example_vamana_leanvec.py` shows search using dimensionality reduction.
14 changes: 7 additions & 7 deletions reproducibility/VLDB2023.md
@@ -7,7 +7,7 @@ SOTA for both small and large scale datasets. For the small and large scale expe
[ANN-benchmarks](https://ann-benchmarks.com/) and [Big-ANN-benchmarks](https://github.com/harsha-simhadri/big-ann-benchmarks)
protocols respectively.

-The **code to run OG-LVQ** can be found [here](https://github.com/IntelLabs/ScalableVectorSearch), and it was included in the
+The **code to run OG-LVQ** can be found [here](https://github.com/intel/ScalableVectorSearch), and it was included in the
ANN-benchmarks and Big-ANN-benchmarks evaluation codes following their
[guidelines](https://github.com/erikbern/ann-benchmarks/#including-your-algorithm). See the corresponding sections below
for details on the code and setup used for the compared implementations at [small](#small-scale-experiments) and
@@ -24,7 +24,7 @@ elements, data types and metrics as described in the table below.
| [deep-96-10M](http://sites.skoltech.ru/compvision/noimi/) | 96 | 10M | float32 | cosine similarity | 10000 | 3.6 |
| [glove-50-1.2M](https://nlp.stanford.edu/projects/glove/) | 50 | 1.2M | float32 | cosine similarity | 10000 | 0.2 |
| [glove-25-1.2M](https://nlp.stanford.edu/projects/glove/) | 25 | 1.2M | float32 | cosine similarity | 10000 | 0.1 |
-| [DPR-768-10M](https://github.com/IntelLabs/DPR-dataset-generator) | 768 | 10M | float32 | inner product | 10000 | 28.6 |
+| [DPR-768-10M](https://github.com/IntelLabs/VectorSearchDatasets) | 768 | 10M | float32 | inner product | 10000 | 28.6 |
| [t2i-200-100M](https://research.yandex.com/blog/benchmarks-for-billion-scale-similarity-search) | 200 | 100M | float32 | inner product | 10000 | 74.5 |
| [deep-96-100M](http://sites.skoltech.ru/compvision/noimi/) | 96 | 100M | float32 | cosine similarity | 10000 | 35.8 |
| [deep-96-1B](http://sites.skoltech.ru/compvision/noimi/) | 96 | 1B | float32 | cosine similarity | 10000 | 257.6 |
@@ -36,7 +36,7 @@ elements, data types and metrics as described in the table below.
DPR is a dataset containing 10 million 768-dimensional embeddings generated with the dense passage retriever (DPR)
[[KOML20]](#2) model. Text snippets from the C4 dataset [[RSRL20]](#3) are used to generate: 10 million context DPR embeddings
(base set) and 10000 question DPR embeddings (query set).
-The code to generate the dataset can be found [here](https://github.com/IntelLabs/DPR-dataset-generator).
+The code to generate the dataset can be found [here](https://github.com/IntelLabs/VectorSearchDatasets).

### Evaluation Metrics

@@ -72,7 +72,7 @@ for details about these datasets):
We compare OG-LVQ to five widely adopted approaches: Vamana [[SDSK19]](#4), HNSWlib [[MaYa18]](#5), FAISS-IVFPQfs
[[JoDJ19]](#6), ScaNN [[GSLG20]](#7), and NGT [[IwMi18]](#8). We use the implementations available through [ANN-Benchmarks](https://github.com/erikbern/ann-benchmarks)
([commit 167f129](https://github.com/erikbern/ann-benchmarks/commit/167f1297b21789d13a9fa82646c522011df8c163) , October 4th 2022)
-and for OG-LVQ we use [commit ad821d8](https://github.com/IntelLabs/ScalableVectorSearch/commit/ad821d8c94cb69a67c8744b98ee1c79d3e3a299c).
+and for OG-LVQ we use [commit ad821d8](https://github.com/intel/ScalableVectorSearch/commit/ad821d8c94cb69a67c8744b98ee1c79d3e3a299c).

#### Parameters Setting
For the graph-based methods (HSNWlib, Vamana, OG-LVQ) we use the same ``graph_max_degree`` values (32, 64 and 128).
@@ -90,7 +90,7 @@ We ran all experiments in a single socket (using ``numactl``) to avoid introduci
remote NUMA memory accesses.

We use the ``hugeadm`` Linux utility to preallocate a sufficient number of 1GB huge pages for each algorithm.
-[OG-LVQ explicitly uses huge pages](https://intellabs.github.io/ScalableVectorSearch/performance/hugepages.html) to reduce the virtual memory overheads.
+[OG-LVQ explicitly uses huge pages](https://intel.github.io/ScalableVectorSearch/performance/hugepages.html) to reduce the virtual memory overheads.
For a fair comparison, we run other methods with system flags enabled to automatically use huge pages for large allocations.

We consider datasets that are large scale because of their total footprint (see [Datasets and Metrics](#datasets-and-metrics)
Expand All @@ -106,7 +106,7 @@ We compare the performance of OG-LVQ vs. four widely adopted approaches: Vamana
[[JoDJ19]](#6), and ScaNN [[GSLG20]](#7).

We used the following versions of each method:
-OG-LVQ [commit ad821d8](https://github.com/IntelLabs/ScalableVectorSearch/commit/ad821d8c94cb69a67c8744b98ee1c79d3e3a299c),
+OG-LVQ [commit ad821d8](https://github.com/intel/ScalableVectorSearch/commit/ad821d8c94cb69a67c8744b98ee1c79d3e3a299c),
Vamana [commit 647f68f](https://github.com/microsoft/DiskANN/commit/647f68fe5aa7b45124ae298c219fe909d46a1834),
HNSWlib [commit 4b2cb72](https://github.com/nmslib/hnswlib/commit/4b2cb72c3c1bbddee55535ec6f360a0b2e40a81e),
ScaNN [commit d170ac5](https://github.com/google-research/google-research/commit/d170ac58ce1d071614b2813b056afa292f5e490c),
@@ -203,4 +203,4 @@ Performance results are based on testing as of dates shown in configurations and
available updates. No product or component can be absolutely secure. Your costs and results may vary. Intel
technologies may require enabled hardware, software or service activation. &copy; Intel Corporation. Intel,
the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and
brands may be claimed as the property of others.
