Skip to content

Commit

Permalink
Merge pull request #2360 from drizzle-team/pgvector
Browse files Browse the repository at this point in the history
Pgvector
  • Loading branch information
AndriiSherman authored May 23, 2024
2 parents 2c59442 + c7963ca commit 7a05232
Show file tree
Hide file tree
Showing 27 changed files with 1,685 additions and 63 deletions.
138 changes: 138 additions & 0 deletions changelogs/drizzle-orm/0.31.0-beta.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,138 @@
## Breaking changes

### PostgreSQL indexes API was changed

The previous Drizzle+PostgreSQL indexes API was incorrect and was not aligned with the PostgreSQL documentation. The good thing is that it was not used in queries, and drizzle-kit didn't support all properties for indexes. This means we can now change the API to the correct one and provide full support for it in drizzle-kit

Previous API

- No way to define SQL expressions inside `.on`.
- `.using` and `.on` in our case are the same thing, so the API is incorrect here.
- `.asc()`, `.desc()`, `.nullsFirst()`, and `.nullsLast()` should be specified for each column or expression on indexes, but not on an index itself.

```ts
// Index declaration reference
index('name')
.on(table.column1, table.column2, ...) or .onOnly(table.column1, table.column2, ...)
.concurrently()
.using(sql``) // sql expression
.asc() or .desc()
.nullsFirst() or .nullsLast()
.where(sql``) // sql expression
```

Current API

```ts
// First example, with `.on()`
index('name')
.on(table.column1.asc(), table.column2.nullsFirst(), ...) or .onOnly(table.column1.desc().nullsLast(), table.column2, ...)
.concurrently()
.where(sql``)
.with({ fillfactor: '70' })

// Second Example, with `.using()`
index('name')
.using('btree', table.column1.asc(), sql`lower(${table.column2})`, table.column1.op('text_ops'))
.where(sql``) // sql expression
.with({ fillfactor: '70' })
```

## New Features

### 🎉 "pg_vector" extension support

> There is no specific code to create an extension inside the Drizzle schema. We assume that if you are using vector types, indexes, and queries, you have a PostgreSQL database with the `pg_vector` extension installed.
You can now specify indexes for `pg_vector` and utilize `pg_vector` functions for querying, ordering, etc.

Let's take a few examples of `pg_vector` indexes from the `pg_vector` docs and translate them to Drizzle

#### L2 distance, Inner product and Cosine distance

```ts
// CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);
// CREATE INDEX ON items USING hnsw (embedding vector_ip_ops);
// CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops);

const table = pgTable('items', {
embedding: vector('embedding', { dimensions: 3 })
}, (table) => ({
l2: index('l2_index').using('hnsw', table.embedding.op('vector_l2_ops'))
ip: index('ip_index').using('hnsw', table.embedding.op('vector_ip_ops'))
cosine: index('cosine_index').using('hnsw', table.embedding.op('vector_cosine_ops'))
}))
```

#### L1 distance, Hamming distance and Jaccard distance - added in pg_vector 0.7.0 version

```ts
// CREATE INDEX ON items USING hnsw (embedding vector_l1_ops);
// CREATE INDEX ON items USING hnsw (embedding bit_hamming_ops);
// CREATE INDEX ON items USING hnsw (embedding bit_jaccard_ops);

const table = pgTable('table', {
embedding: vector('embedding', { dimensions: 3 })
}, (table) => ({
l1: index('l1_index').using('hnsw', table.embedding.op('vector_l1_ops'))
hamming: index('hamming_index').using('hnsw', table.embedding.op('bit_hamming_ops'))
bit: index('bit_jaccard_index').using('hnsw', table.embedding.op('bit_jaccard_ops'))
}))
```

For queries, you can use predefined functions for vectors or create custom ones using the SQL template operator.

You can also use the following helpers:

```ts
import { l2Distance, l1Distance, innerProduct,
cosineDistance, hammingDistance, jaccardDistance } from 'drizzle-orm'

l2Distance(table.column, [3, 1, 2]) // table.column <-> '[3, 1, 2]'
l1Distance(table.column, [3, 1, 2]) // table.column <+> '[3, 1, 2]'

innerProduct(table.column, [3, 1, 2]) // table.column <#> '[3, 1, 2]'
cosineDistance(table.column, [3, 1, 2]) // table.column <=> '[3, 1, 2]'

hammingDistance(table.column, '101') // table.column <~> '101'
jaccardDistance(table.column, '101') // table.column <%> '101'
```

If `pg_vector` has some other functions to use, you can replicate implimentation from existing one we have. Here is how it can be done

```ts
export function l2Distance(
column: SQLWrapper | AnyColumn,
value: number[] | string[] | TypedQueryBuilder<any> | string,
): SQL {
if (is(value, TypedQueryBuilder<any>) || typeof value === 'string') {
return sql`${column} <-> ${value}`;
}
return sql`${column} <-> ${JSON.stringify(value)}`;
}
```

Name it as you wish and change the operator. This example allows for a numbers array, strings array, string, or even a select query. Feel free to create any other type you want or even contribute and submit a PR

#### Examples

Let's take a few examples of `pg_vector` queries from the `pg_vector` docs and translate them to Drizzle

```ts
import { l2Distance } from 'drizzle-orm';

// SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
db.select().from(items).orderBy(l2Distance(items.embedding, [3,1,2]))

// SELECT embedding <-> '[3,1,2]' AS distance FROM items;
db.select({ distance: l2Distance(items.embedding, [3,1,2]) })

// SELECT * FROM items ORDER BY embedding <-> (SELECT embedding FROM items WHERE id = 1) LIMIT 5;
const subquery = db.select({ embedding: items.embedding }).from(items).where(eq(items.id, 1));
db.select().from(items).orderBy(l2Distance(items.embedding, subquery)).limit(5)

// SELECT (embedding <#> '[3,1,2]') * -1 AS inner_product FROM items;
db.select({ innerProduct: sql`(${maxInnerProduct(items.embedding, [3,1,2])}) * -1` }).from(items)

// and more!
```
2 changes: 1 addition & 1 deletion drizzle-orm/package.json
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
{
"name": "drizzle-orm",
"version": "0.30.10",
"version": "0.31.0",
"description": "Drizzle ORM package for SQL databases",
"type": "module",
"scripts": {
Expand Down
23 changes: 22 additions & 1 deletion drizzle-orm/src/column-builder.ts
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
import { entityKind } from '~/entity.ts';
import type { Column } from './column.ts';
import type { MySqlColumn } from './mysql-core/index.ts';
import type { PgColumn } from './pg-core/index.ts';
import type { ExtraConfigColumn, PgColumn } from './pg-core/index.ts';
import type { SQL } from './sql/sql.ts';
import type { SQLiteColumn } from './sqlite-core/index.ts';
import type { Simplify } from './utils.ts';
Expand Down Expand Up @@ -235,6 +235,17 @@ export type BuildColumn<
: TDialect extends 'common' ? Column<MakeColumnConfig<TBuilder['_'], TTableName>>
: never;

export type BuildIndexColumn<
TDialect extends Dialect,
> = TDialect extends 'pg' ? ExtraConfigColumn : never;

// TODO
// try to make sql as well + indexRaw

// optional after everything will be working as expected
// also try to leave only needed methods for extraConfig
// make an error if I pass .asc() to fk and so on

export type BuildColumns<
TTableName extends string,
TConfigMap extends Record<string, ColumnBuilderBase>,
Expand All @@ -245,6 +256,16 @@ export type BuildColumns<
}
& {};

export type BuildExtraConfigColumns<
_TTableName extends string,
TConfigMap extends Record<string, ColumnBuilderBase>,
TDialect extends Dialect,
> =
& {
[Key in keyof TConfigMap]: BuildIndexColumn<TDialect>;
}
& {};

export type ChangeColumnTableName<TColumn extends Column, TAlias extends string, TDialect extends Dialect> =
TDialect extends 'pg' ? PgColumn<MakeColumnConfig<TColumn['_'], TAlias>>
: TDialect extends 'mysql' ? MySqlColumn<MakeColumnConfig<TColumn['_'], TAlias>>
Expand Down
7 changes: 6 additions & 1 deletion drizzle-orm/src/mysql-core/table.ts
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
import type { BuildColumns } from '~/column-builder.ts';
import type { BuildColumns, BuildExtraConfigColumns } from '~/column-builder.ts';
import { entityKind } from '~/entity.ts';
import { Table, type TableConfig as TableConfigBase, type UpdateTableConfig } from '~/table.ts';
import type { CheckBuilder } from './checks.ts';
Expand Down Expand Up @@ -89,6 +89,11 @@ export function mysqlTableWithSchema<
const table = Object.assign(rawTable, builtColumns);

table[Table.Symbol.Columns] = builtColumns;
table[Table.Symbol.ExtraConfigColumns] = builtColumns as unknown as BuildExtraConfigColumns<
TTableName,
TColumnsMap,
'mysql'
>;

if (extraConfig) {
table[MySqlTable.Symbol.ExtraConfigBuilder] = extraConfig as unknown as (
Expand Down
102 changes: 102 additions & 0 deletions drizzle-orm/src/pg-core/columns/common.ts
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@ import type { ForeignKey, UpdateDeleteAction } from '~/pg-core/foreign-keys.ts';
import { ForeignKeyBuilder } from '~/pg-core/foreign-keys.ts';
import type { AnyPgTable, PgTable } from '~/pg-core/table.ts';
import { iife } from '~/tracing-utils.ts';
import type { PgIndexOpClass } from '../indexes.ts';
import { uniqueKeyName } from '../unique-constraint.ts';
import { makePgArray, parsePgArray } from '../utils/array.ts';

Expand Down Expand Up @@ -105,6 +106,13 @@ export abstract class PgColumnBuilder<
abstract build<TTableName extends string>(
table: AnyPgTable<{ name: TTableName }>,
): PgColumn<MakeColumnConfig<T, TTableName>>;

/** @internal */
buildExtraConfigColumn<TTableName extends string>(
table: AnyPgTable<{ name: TTableName }>,
): ExtraConfigColumn {
return new ExtraConfigColumn(table, this.config);
}
}

// To understand how to use `PgColumn` and `PgColumn`, see `Column` and `AnyColumn` documentation.
Expand All @@ -126,6 +134,100 @@ export abstract class PgColumn<
}
}

export type IndexedExtraConfigType = { order?: 'asc' | 'desc'; nulls?: 'first' | 'last'; opClass?: string };

export class ExtraConfigColumn<
T extends ColumnBaseConfig<ColumnDataType, string> = ColumnBaseConfig<ColumnDataType, string>,
> extends PgColumn<T, IndexedExtraConfigType> {
static readonly [entityKind]: string = 'ExtraConfigColumn';

override getSQLType(): string {
return this.getSQLType();
}

indexConfig: IndexedExtraConfigType = {
order: this.config.order ?? 'asc',
nulls: this.config.nulls ?? 'last',
opClass: this.config.opClass,
};
defaultConfig: IndexedExtraConfigType = {
order: 'asc',
nulls: 'last',
opClass: undefined,
};

asc(): Omit<this, 'asc' | 'desc'> {
this.indexConfig.order = 'asc';
return this;
}

desc(): Omit<this, 'asc' | 'desc'> {
this.indexConfig.order = 'desc';
return this;
}

nullsFirst(): Omit<this, 'nullsFirst' | 'nullsLast'> {
this.indexConfig.nulls = 'first';
return this;
}

nullsLast(): Omit<this, 'nullsFirst' | 'nullsLast'> {
this.indexConfig.nulls = 'last';
return this;
}

/**
* ### PostgreSQL documentation quote
*
* > An operator class with optional parameters can be specified for each column of an index.
* The operator class identifies the operators to be used by the index for that column.
* For example, a B-tree index on four-byte integers would use the int4_ops class;
* this operator class includes comparison functions for four-byte integers.
* In practice the default operator class for the column's data type is usually sufficient.
* The main point of having operator classes is that for some data types, there could be more than one meaningful ordering.
* For example, we might want to sort a complex-number data type either by absolute value or by real part.
* We could do this by defining two operator classes for the data type and then selecting the proper class when creating an index.
* More information about operator classes check:
*
* ### Useful links
* https://www.postgresql.org/docs/current/sql-createindex.html
*
* https://www.postgresql.org/docs/current/indexes-opclass.html
*
* https://www.postgresql.org/docs/current/xindex.html
*
* ### Additional types
* If you have the `pg_vector` extension installed in your database, you can use the
* `vector_l2_ops`, `vector_ip_ops`, `vector_cosine_ops`, `vector_l1_ops`, `bit_hamming_ops`, `bit_jaccard_ops`, `halfvec_l2_ops`, `sparsevec_l2_ops` options, which are predefined types.
*
* **You can always specify any string you want in the operator class, in case Drizzle doesn't have it natively in its types**
*
* @param opClass
* @returns
*/
op(opClass: PgIndexOpClass): Omit<this, 'op'> {
this.indexConfig.opClass = opClass;
return this;
}
}

export class IndexedColumn {
static readonly [entityKind]: string = 'IndexedColumn';
constructor(
name: string | undefined,
type: string,
indexConfig: IndexedExtraConfigType,
) {
this.name = name;
this.type = type;
this.indexConfig = indexConfig;
}

name: string | undefined;
type: string;
indexConfig: IndexedExtraConfigType;
}

export type AnyPgColumn<TPartial extends Partial<ColumnBaseConfig<ColumnDataType, string>> = {}> = PgColumn<
Required<Update<ColumnBaseConfig<ColumnDataType, string>, TPartial>>
>;
Expand Down
4 changes: 4 additions & 0 deletions drizzle-orm/src/pg-core/columns/index.ts
Original file line number Diff line number Diff line change
Expand Up @@ -25,3 +25,7 @@ export * from './time.ts';
export * from './timestamp.ts';
export * from './uuid.ts';
export * from './varchar.ts';
export * from './vector_extension/bit.ts';
export * from './vector_extension/halfvec.ts';
export * from './vector_extension/sparsevec.ts';
export * from './vector_extension/vector.ts';
Loading

0 comments on commit 7a05232

Please sign in to comment.