-
-
Notifications
You must be signed in to change notification settings - Fork 751
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #2360 from drizzle-team/pgvector
Pgvector
- Loading branch information
Showing
27 changed files
with
1,685 additions
and
63 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,138 @@ | ||
## Breaking changes | ||
|
||
### PostgreSQL indexes API was changed | ||
|
||
The previous Drizzle+PostgreSQL indexes API was incorrect and was not aligned with the PostgreSQL documentation. The good thing is that it was not used in queries, and drizzle-kit didn't support all properties for indexes. This means we can now change the API to the correct one and provide full support for it in drizzle-kit | ||
|
||
Previous API | ||
|
||
- No way to define SQL expressions inside `.on`. | ||
- `.using` and `.on` in our case are the same thing, so the API is incorrect here. | ||
- `.asc()`, `.desc()`, `.nullsFirst()`, and `.nullsLast()` should be specified for each column or expression on indexes, but not on an index itself. | ||
|
||
```ts | ||
// Index declaration reference | ||
index('name') | ||
.on(table.column1, table.column2, ...) or .onOnly(table.column1, table.column2, ...) | ||
.concurrently() | ||
.using(sql``) // sql expression | ||
.asc() or .desc() | ||
.nullsFirst() or .nullsLast() | ||
.where(sql``) // sql expression | ||
``` | ||
|
||
Current API | ||
|
||
```ts | ||
// First example, with `.on()` | ||
index('name') | ||
.on(table.column1.asc(), table.column2.nullsFirst(), ...) or .onOnly(table.column1.desc().nullsLast(), table.column2, ...) | ||
.concurrently() | ||
.where(sql``) | ||
.with({ fillfactor: '70' }) | ||
|
||
// Second Example, with `.using()` | ||
index('name') | ||
.using('btree', table.column1.asc(), sql`lower(${table.column2})`, table.column1.op('text_ops')) | ||
.where(sql``) // sql expression | ||
.with({ fillfactor: '70' }) | ||
``` | ||
|
||
## New Features | ||
|
||
### 🎉 "pg_vector" extension support | ||
|
||
> There is no specific code to create an extension inside the Drizzle schema. We assume that if you are using vector types, indexes, and queries, you have a PostgreSQL database with the `pg_vector` extension installed. | ||
You can now specify indexes for `pg_vector` and utilize `pg_vector` functions for querying, ordering, etc. | ||
|
||
Let's take a few examples of `pg_vector` indexes from the `pg_vector` docs and translate them to Drizzle | ||
|
||
#### L2 distance, Inner product and Cosine distance | ||
|
||
```ts | ||
// CREATE INDEX ON items USING hnsw (embedding vector_l2_ops); | ||
// CREATE INDEX ON items USING hnsw (embedding vector_ip_ops); | ||
// CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops); | ||
|
||
const table = pgTable('items', { | ||
embedding: vector('embedding', { dimensions: 3 }) | ||
}, (table) => ({ | ||
l2: index('l2_index').using('hnsw', table.embedding.op('vector_l2_ops')) | ||
ip: index('ip_index').using('hnsw', table.embedding.op('vector_ip_ops')) | ||
cosine: index('cosine_index').using('hnsw', table.embedding.op('vector_cosine_ops')) | ||
})) | ||
``` | ||
|
||
#### L1 distance, Hamming distance and Jaccard distance - added in pg_vector 0.7.0 version | ||
|
||
```ts | ||
// CREATE INDEX ON items USING hnsw (embedding vector_l1_ops); | ||
// CREATE INDEX ON items USING hnsw (embedding bit_hamming_ops); | ||
// CREATE INDEX ON items USING hnsw (embedding bit_jaccard_ops); | ||
|
||
const table = pgTable('table', { | ||
embedding: vector('embedding', { dimensions: 3 }) | ||
}, (table) => ({ | ||
l1: index('l1_index').using('hnsw', table.embedding.op('vector_l1_ops')) | ||
hamming: index('hamming_index').using('hnsw', table.embedding.op('bit_hamming_ops')) | ||
bit: index('bit_jaccard_index').using('hnsw', table.embedding.op('bit_jaccard_ops')) | ||
})) | ||
``` | ||
|
||
For queries, you can use predefined functions for vectors or create custom ones using the SQL template operator. | ||
|
||
You can also use the following helpers: | ||
|
||
```ts | ||
import { l2Distance, l1Distance, innerProduct, | ||
cosineDistance, hammingDistance, jaccardDistance } from 'drizzle-orm' | ||
|
||
l2Distance(table.column, [3, 1, 2]) // table.column <-> '[3, 1, 2]' | ||
l1Distance(table.column, [3, 1, 2]) // table.column <+> '[3, 1, 2]' | ||
|
||
innerProduct(table.column, [3, 1, 2]) // table.column <#> '[3, 1, 2]' | ||
cosineDistance(table.column, [3, 1, 2]) // table.column <=> '[3, 1, 2]' | ||
|
||
hammingDistance(table.column, '101') // table.column <~> '101' | ||
jaccardDistance(table.column, '101') // table.column <%> '101' | ||
``` | ||
|
||
If `pg_vector` has some other functions to use, you can replicate implimentation from existing one we have. Here is how it can be done | ||
|
||
```ts | ||
export function l2Distance( | ||
column: SQLWrapper | AnyColumn, | ||
value: number[] | string[] | TypedQueryBuilder<any> | string, | ||
): SQL { | ||
if (is(value, TypedQueryBuilder<any>) || typeof value === 'string') { | ||
return sql`${column} <-> ${value}`; | ||
} | ||
return sql`${column} <-> ${JSON.stringify(value)}`; | ||
} | ||
``` | ||
|
||
Name it as you wish and change the operator. This example allows for a numbers array, strings array, string, or even a select query. Feel free to create any other type you want or even contribute and submit a PR | ||
|
||
#### Examples | ||
|
||
Let's take a few examples of `pg_vector` queries from the `pg_vector` docs and translate them to Drizzle | ||
|
||
```ts | ||
import { l2Distance } from 'drizzle-orm'; | ||
|
||
// SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5; | ||
db.select().from(items).orderBy(l2Distance(items.embedding, [3,1,2])) | ||
|
||
// SELECT embedding <-> '[3,1,2]' AS distance FROM items; | ||
db.select({ distance: l2Distance(items.embedding, [3,1,2]) }) | ||
|
||
// SELECT * FROM items ORDER BY embedding <-> (SELECT embedding FROM items WHERE id = 1) LIMIT 5; | ||
const subquery = db.select({ embedding: items.embedding }).from(items).where(eq(items.id, 1)); | ||
db.select().from(items).orderBy(l2Distance(items.embedding, subquery)).limit(5) | ||
|
||
// SELECT (embedding <#> '[3,1,2]') * -1 AS inner_product FROM items; | ||
db.select({ innerProduct: sql`(${maxInnerProduct(items.embedding, [3,1,2])}) * -1` }).from(items) | ||
|
||
// and more! | ||
``` |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.