-

The voyager Python API#

+

The voyager Python API#

This module provides classes and functions for creating indices of vector data.

A quick example on how to get started:

import numpy as np
@@ -233,7 +233,7 @@ 

The voyager
-class voyager.Index(space: Space, num_dimensions: int, M: int = 12, ef_construction: int = 200, random_seed: int = 1, max_elements: int = 1, storage_data_type: StorageDataType = StorageDataType.Float32)#
+class voyager.Index(space: Space, num_dimensions: int, M: int = 12, ef_construction: int = 200, random_seed: int = 1, max_elements: int = 1, storage_data_type: StorageDataType = StorageDataType.Float32)#

A nearest-neighbor search index containing vector data (i.e. lists of floating-point values, each list labeled with a single integer ID).
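To make this description concrete, here is a minimal brute-force sketch in plain Python. It is not the voyager implementation: `ToyIndex` and its methods are hypothetical names that only model the "integer ID to vector" idea, and the search is exhaustive, whereas the real library trades some recall for speed.

```python
import math
from typing import Dict, List, Optional, Tuple


class ToyIndex:
    """Hypothetical stand-in for an index: a dict of ID -> vector, searched exhaustively."""

    def __init__(self, num_dimensions: int) -> None:
        self.num_dimensions = num_dimensions
        self._vectors: Dict[int, List[float]] = {}

    def add_item(self, vector: List[float], id: Optional[int] = None) -> int:
        if len(vector) != self.num_dimensions:
            raise ValueError("vector has the wrong number of dimensions")
        if id is None:
            # Assumption for this toy only: pick the next unused integer ID.
            id = max(self._vectors, default=-1) + 1
        self._vectors[id] = list(vector)
        return id

    def query(self, vector: List[float], k: int = 1) -> Tuple[List[int], List[float]]:
        # Exact search: score every stored vector, then keep the k closest.
        scored = sorted(
            (math.dist(vector, stored), id) for id, stored in self._vectors.items()
        )[:k]
        return [id for _, id in scored], [dist for dist, _ in scored]


index = ToyIndex(num_dimensions=2)
index.add_item([0.0, 0.0])         # assigned ID 0
index.add_item([3.0, 4.0])         # assigned ID 1
index.add_item([1.0, 1.0], id=10)  # explicit ID
neighbor_ids, distances = index.query([0.0, 0.1], k=2)
```

The toy returns IDs and distances as two parallel lists, loosely mirroring the pair of arrays that `query()` is documented to return.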

Think of a Voyager Index as a Dict[int, List[float]]
@@ -293,7 +293,7 @@

The voyager
-__contains__(id: int) bool#
+__contains__(id: int) bool#

Check to see if a provided vector’s ID is present in this index.

Returns True if and only if the provided integer ID has a corresponding (non-deleted) vector in this index. Use the in operator to call this method:

@@ -304,7 +304,7 @@

The voyager
-__len__() int#
+__len__() int#

Returns the number of non-deleted vectors in this index.

Use the len operator to call this method:

len(index) # => 1234
@@ -318,7 +318,7 @@ 

The voyager
-add_item(vector: ndarray[Any, dtype[float32]], id: int | None = None) int#
+add_item(vector: ndarray[Any, dtype[float32]], id: Optional[int] = None) int#

Add a vector to this index.

Parameters:
@@ -347,7 +347,7 @@

The voyager
-add_items(vectors: ndarray[Any, dtype[float32]], ids: List[int] | None = None, num_threads: int = -1) List[int]#
+add_items(vectors: ndarray[Any, dtype[float32]], ids: Optional[list[int]] = None, num_threads: int = -1) list[int]#

Add multiple vectors to this index simultaneously.

This method may be faster than calling add_item() multiple times, as passing a batch of vectors helps avoid Python’s Global Interpreter Lock.

@@ -378,7 +378,7 @@

The voyager
-as_bytes() bytes#
+as_bytes() bytes#

Returns the contents of this index as a bytes object. The resulting object will contain the same data as if this index was serialized to disk and then read back into memory again.

@@ -400,13 +400,13 @@

The voyager
-get_distance(a: List[float], b: List[float]) float#
+get_distance(a: list[float], b: list[float]) float#

Get the distance between two provided vectors. The vectors must share the dimensionality of the index.

-get_vector(id: int) ndarray[Any, dtype[float32]]#
+get_vector(id: int) ndarray[Any, dtype[float32]]#

Get the vector stored in this index at the provided integer ID. If no such vector exists, a KeyError will be raised.

@@ -424,7 +424,7 @@

The voyager
-get_vectors(ids: List[int]) ndarray[Any, dtype[float32]]#
+get_vectors(ids: list[int]) ndarray[Any, dtype[float32]]#

Get one or more vectors stored in this index at the provided integer IDs. If one or more of the provided IDs cannot be found in the index, a KeyError will be raised.

@@ -438,7 +438,7 @@

The voyager
-static load(filename: str, space: Space, num_dimensions: int, storage_data_type: StorageDataType = StorageDataType.Float32) Index#
+static load(filename: str, space: Space, num_dimensions: int, storage_data_type: StorageDataType = StorageDataType.Float32) Index#
static load(filename: str) Index
@@ -466,7 +466,7 @@

The voyager
-mark_deleted(id: int) None#
+mark_deleted(id: int) None#

Mark an ID in this index as deleted.

Deleted IDs will not show up in the results of calls to query(), but will still take up space in the index, and will slow down queries.

@@ -503,7 +503,7 @@

The voyager
-query(vectors: ndarray[Any, dtype[float32]], k: int = 1, num_threads: int = -1, query_ef: int = -1) Tuple[ndarray[Any, dtype[uint64]], ndarray[Any, dtype[float32]]]#
+query(vectors: ndarray[Any, dtype[float32]], k: int = 1, num_threads: int = -1, query_ef: int = -1) tuple[numpy.ndarray[Any, numpy.dtype[numpy.uint64]], numpy.ndarray[Any, numpy.dtype[numpy.float32]]]#

Query this index to retrieve the k nearest neighbors of the provided vectors.

Parameters:
@@ -556,11 +556,18 @@

The voyager
print(f" {i}-th closest neighbor is {neighbor_id}, {distance} away")

+
+

Warning

+

If using E4M3 storage with the Cosine Space, some queries may return negative distances due to the reduced floating-point precision of the storage data type. While confusing, these negative distances still result in a correct ordering between results.

+

-resize(new_size: int) None#
+resize(new_size: int) None#

Resize this index, allocating space for up to new_size elements to be stored. This changes the max_elements property and may cause this Index object to use more memory. This is a fairly
@@ -578,7 +585,7 @@

The voyager
-save(output_path: str) None#
+save(output_path: str) None#
save(file_like: BinaryIO) None

Save this index to the provided file path or file-like object.

@@ -590,7 +597,7 @@

The voyager
-unmark_deleted(id: int) None#
+unmark_deleted(id: int) None#

Unmark an ID in this index as deleted.

Once unmarked as deleted, an existing ID will show up in the results of calls to query() again.

@@ -598,7 +605,7 @@

The voyager
-property M: int#
+property M: int#

The number of connections between nodes in the tree’s internal data structure.

Larger values give better recall, but use more memory. This parameter cannot be changed after the index is instantiated.

@@ -606,7 +613,7 @@

The voyager
-property ef: int#
+property ef: int#

The default number of vectors to search through when calling query().

Higher values make queries slower, but give better recall.

@@ -621,7 +628,7 @@

The voyager
-property ef_construction: int#
+property ef_construction: int#

The number of vectors that this index searches through when inserting a new vector into the index. Higher values make index construction slower, but give better recall. This parameter cannot be changed after the index is instantiated.

@@ -629,7 +636,7 @@

The voyager
-property ids: LabelSetView#
+property ids: LabelSetView#

A set-like object containing the integer IDs stored as ‘keys’ in this index.

Use these indices to iterate over the vectors in this index, or to check for inclusion of a specific integer ID in this index:

@@ -649,7 +656,7 @@

The voyager
-property max_elements: int#
+property max_elements: int#

The maximum number of elements that can be stored in this index.

If max_elements is much larger than num_elements, this index may use more memory
@@ -665,13 +672,13 @@

The voyager
-property num_dimensions: int#
+property num_dimensions: int#

The number of dimensions in each vector stored by this index.

-property num_elements: int#
+property num_elements: int#

The number of elements (vectors) currently stored in this index.

Note that the number of elements will not decrease if any elements are deleted from the index; those deleted elements simply become invisible.
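The difference between `len(index)`, `num_elements`, and the deletion markers can be sketched with a small toy model. This is an illustration of the documented behaviour only; `TombstoneStore` is a hypothetical class, not part of voyager, and raising `KeyError` for an unknown ID is a choice made for this toy.

```python
from typing import Dict, List, Set


class TombstoneStore:
    """Hypothetical model: deleted IDs are hidden, but their storage is kept."""

    def __init__(self) -> None:
        self._vectors: Dict[int, List[float]] = {}
        self._deleted: Set[int] = set()

    def add_item(self, id: int, vector: List[float]) -> None:
        self._vectors[id] = vector

    def mark_deleted(self, id: int) -> None:
        if id not in self._vectors:
            raise KeyError(id)  # toy choice, not confirmed library behaviour
        self._deleted.add(id)   # the vector itself stays in memory

    def unmark_deleted(self, id: int) -> None:
        self._deleted.discard(id)

    def __contains__(self, id: int) -> bool:
        return id in self._vectors and id not in self._deleted

    def __len__(self) -> int:
        # Counts only the visible (non-deleted) vectors.
        return len(self._vectors) - len(self._deleted)

    @property
    def num_elements(self) -> int:
        # Counts every stored vector, deleted or not.
        return len(self._vectors)


store = TombstoneStore()
store.add_item(1, [0.0, 1.0])
store.add_item(2, [1.0, 0.0])
store.mark_deleted(2)
visible, stored = len(store), store.num_elements
```

After the deletion, `visible` is 1 while `stored` remains 2, which matches the note that deleted elements simply become invisible.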

@@ -679,39 +686,39 @@

The voyager
-property space: Space#
+property space: Space#

Return the Space used to calculate distances between vectors.

-property storage_data_type: StorageDataType#
+property storage_data_type: StorageDataType#

The StorageDataType used to store vectors in this Index.

-

Enums#

+

Enums#

-class voyager.Space(value)#
+class voyager.Space(value)#

The method used to calculate the distance between vectors.

-Euclidean = 0#
+Euclidean = 0#

Euclidean distance; also known as L2 distance. The square root of the sum of the squared differences between each element of each vector.

-InnerProduct = 1#
+InnerProduct = 1#

Inner product distance.

-Cosine = 2#
+Cosine = 2#

Cosine distance; also known as normalized inner product.
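As a rough reference for the three spaces, the following plain-Python sketch computes each distance for a pair of vectors. The Euclidean formula follows the description above; the definitions of inner product distance and cosine distance as "one minus the (normalized) dot product" are assumptions made for this sketch, since this page does not spell them out.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Square root of the sum of squared element-wise differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumption: 1 - <a, b>, so a larger dot product means "closer".
    return 1.0 - dot(a, b)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumption: 1 - <a, b> / (|a| * |b|), i.e. inner product on normalized vectors.
    return 1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0]
b = [4.0, 3.0]
results = {
    "Euclidean": euclidean_distance(a, b),
    "InnerProduct": inner_product_distance(a, b),
    "Cosine": cosine_distance(a, b),
}
```

Under these assumed definitions, the inner product distance can be negative for vectors that are not normalized, while the cosine distance of exactly represented vectors stays within [0, 2].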

@@ -719,7 +726,7 @@

Enums#
-class voyager.StorageDataType(value)#
+class voyager.StorageDataType(value)#

The data type used to store vectors in memory and on-disk.

The StorageDataType used for an Index directly determines its memory usage, disk space usage, and recall. Both Float8 and
@@ -727,30 +734,34 @@

Enums#
memory usage and index size by a factor of 4 compared to Float32.

-Float8 = 16#
+Float8 = 16#

8-bit fixed-point decimal values. All values must be within [-1, 1.00787402].

-Float32 = 32#
+Float32 = 32#

32-bit floating point (default).

-E4M3 = 48#
+E4M3 = 48#

8-bit floating point with a range of [-448, 448], from the paper “FP8 Formats for Deep Learning” by Micikevicius et al.

+
+

Warning

+

Using E4M3 with the Cosine Space may cause some queries to return negative distances due to the reduced floating-point precision. While confusing, these negative distances still result in a correct ordering between results.
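One way to see how a negative distance can arise is the sketch below. It assumes that cosine distance is computed as one minus the dot product of a unit-length query with the stored vector, and it imitates reduced precision by rounding each stored value to three explicit mantissa bits. Both are assumptions for illustration; `round_to_3_mantissa_bits` is a hypothetical helper that ignores E4M3's exponent range and special values.

```python
import math


def round_to_3_mantissa_bits(x: float) -> float:
    """Round x to 3 explicit mantissa bits (a rough imitation of 8-bit float storage)."""
    if x == 0.0:
        return 0.0
    # x == mantissa * 2**exponent with 0.5 <= |mantissa| < 1.
    mantissa, exponent = math.frexp(x)
    # One implicit leading bit plus 3 explicit bits gives steps of 1/16 here.
    return math.ldexp(round(mantissa * 16) / 16, exponent)


query = [0.6, 0.8]  # already unit length
stored = [round_to_3_mantissa_bits(v) for v in query]

# Rounding pushed both components up, so the dot product exceeds 1.
similarity = sum(q * s for q, s in zip(query, stored))
distance = 1.0 - similarity
```

In this toy case the stored vector is slightly longer than unit length after rounding, so the computed distance dips just below zero even though the two vectors point in nearly the same direction.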

+

-

Utilities#

+

Utilities#

-class voyager.LabelSetView#
+class voyager.LabelSetView#

A read-only set-like object containing 64-bit integers. Use this object like a regular Python set object, by either iterating through it, or checking for membership with the in operator.

@@ -869,11 +880,14 @@

Utilities

\ No newline at end of file
diff --git a/docs/python/search.html b/docs/python/search.html
index 8ef82d9f..125dcee3 100644
--- a/docs/python/search.html
+++ b/docs/python/search.html
@@ -1,14 +1,14 @@
-Search - Voyager 2.0.8 Python Documentation
+Search - Voyager 2.0.8 Python Documentation
@@ -142,7 +142,7 @@
-