feat: support pgvector extension

pgvector is an open-source PostgreSQL extension that provides a vector
data type and vector similarity search. It makes storing and querying
vector data in PostgreSQL more efficient and convenient.

Key features of pgvector:

1. Vector data type: pgvector introduces a new data type called
"vector" for storing and manipulating high-dimensional vector data
(up to 16000 dimensions per vector as of 0.4.0).

2. Vector indexing: pgvector supports indexing vector columns to speed
up approximate nearest neighbor search. The bundled version (0.5.1)
provides the IVFFlat and HNSW index types, and an index can cover
vectors of up to 2000 dimensions.

3. Vector operations: pgvector provides vector addition, subtraction,
element-wise multiplication, inner (dot) product, and norm (length)
calculation, plus the `sum` and `avg` aggregates.

4. Similarity search: pgvector supports nearest neighbor search ordered
by L2 distance, inner product, or cosine distance, and also provides an
`l1_distance` function.

5. High performance: distance calculations are written so that the
compiler can vectorize them with SIMD instructions, and IVFFlat indexes
can be built in parallel.

6. Extensibility: pgvector is open source under the PostgreSQL license,
so users can modify and extend it to fit their specific needs.
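
The three distance metrics named in the feature list can be sketched outside the database as follows. This is a minimal pure-Python illustration, not pgvector's C implementation; the function names (`l2_distance`, `inner_product`, `cosine_distance`, `nearest`) and the brute-force search are assumptions made for this example. In SQL, pgvector exposes these metrics through the `<->` (L2 distance), `<#>` (negative inner product), and `<=>` (cosine distance) operators.

```python
import math


def _check_dims(a, b):
    # Vectors must have the same number of dimensions to be compared.
    if len(a) != len(b):
        raise ValueError(f"different vector dimensions {len(a)} and {len(b)}")


def l2_distance(a, b):
    """Euclidean (L2) distance between two vectors."""
    _check_dims(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    """Inner (dot) product of two vectors."""
    _check_dims(a, b)
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity. Assumes neither vector is all zeros."""
    _check_dims(a, b)
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)


def nearest(query, rows, k, distance):
    """Exact (brute-force) k-nearest-neighbor search over a list of vectors.

    Hypothetical helper: it scans every row, which is what an index such as
    IVFFlat or HNSW is meant to avoid, at the cost of approximate results.
    """
    return sorted(rows, key=lambda row: distance(query, row))[:k]


if __name__ == "__main__":
    rows = [[1.0, 1.0], [5.0, 5.0], [0.0, 0.5]]
    print(nearest([0.0, 0.0], rows, 2, l2_distance))
```

The brute-force `nearest` helper always returns exact results. An index-backed search trades some of that accuracy for speed, which is why the index types are described as approximate.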
Tilendlesa authored and mrdrivingduck committed Nov 8, 2023
1 parent 6071c25 commit e691aeb
Showing 107 changed files with 11,794 additions and 1 deletion.
1 change: 1 addition & 0 deletions external/Makefile
@@ -10,6 +10,7 @@ SUBDIRS += polar_monitor_preload

# NB: those will be ignored in minimal mode.
ifeq ($(enable_polar_minimal),no)
SUBDIRS += pgvector
SUBDIRS += polar_worker
SUBDIRS += polar_tde_utils
SUBDIRS += polar_parameter_check
8 changes: 8 additions & 0 deletions external/pgvector/.dockerignore
@@ -0,0 +1,8 @@
/.git/
/dist/
/results/
/tmp_check/
/sql/vector--?.?.?.sql
regression.*
*.o
*.so
6 changes: 6 additions & 0 deletions external/pgvector/.editorconfig
@@ -0,0 +1,6 @@
root = true

[*.{c,h,pl,pm,sql}]
indent_style = tab
indent_size = tab
tab_width = 4
102 changes: 102 additions & 0 deletions external/pgvector/.github/workflows/build.yml
@@ -0,0 +1,102 @@
name: build
on: [push, pull_request]
jobs:
  ubuntu:
    runs-on: ${{ matrix.os }}
    if: ${{ !startsWith(github.ref_name, 'mac') && !startsWith(github.ref_name, 'windows') }}
    strategy:
      fail-fast: false
      matrix:
        include:
          - postgres: 17
            os: ubuntu-22.04
          - postgres: 16
            os: ubuntu-22.04
          - postgres: 15
            os: ubuntu-22.04
          - postgres: 14
            os: ubuntu-22.04
          - postgres: 13
            os: ubuntu-20.04
          - postgres: 12
            os: ubuntu-20.04
          - postgres: 11
            os: ubuntu-20.04
    steps:
      - uses: actions/checkout@v4
      - uses: ankane/setup-postgres@v1
        with:
          postgres-version: ${{ matrix.postgres }}
          dev-files: true
      - run: make
        env:
          PG_CFLAGS: -Wall -Wextra -Werror -Wno-unused-parameter -Wno-sign-compare
      - run: |
          export PG_CONFIG=`which pg_config`
          sudo --preserve-env=PG_CONFIG make install
      - run: make installcheck
      - if: ${{ failure() }}
        run: cat regression.diffs
      - run: |
          sudo apt-get update
          sudo apt-get install libipc-run-perl
      - run: make prove_installcheck
  mac:
    runs-on: macos-latest
    if: ${{ !startsWith(github.ref_name, 'windows') }}
    steps:
      - uses: actions/checkout@v4
      - uses: ankane/setup-postgres@v1
        with:
          postgres-version: 14
      - run: make
        env:
          PG_CFLAGS: -Wall -Wextra -Werror -Wno-unused-parameter
      - run: make install
      - run: make installcheck
      - if: ${{ failure() }}
        run: cat regression.diffs
      - run: |
          brew install cpanm
          cpanm --notest IPC::Run
          wget -q https://github.com/postgres/postgres/archive/refs/tags/REL_14_5.tar.gz
          tar xf REL_14_5.tar.gz
      - run: make prove_installcheck PROVE_FLAGS="-I ./postgres-REL_14_5/src/test/perl" PERL5LIB="/Users/runner/perl5/lib/perl5"
      - run: make clean && /usr/local/opt/llvm@15/bin/scan-build --status-bugs make
  windows:
    runs-on: windows-latest
    if: ${{ !startsWith(github.ref_name, 'mac') }}
    steps:
      - uses: actions/checkout@v4
      - uses: ankane/setup-postgres@v1
        with:
          postgres-version: 14
      - run: |
          call "C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Auxiliary\Build\vcvars64.bat" && ^
          nmake /NOLOGO /F Makefile.win && ^
          nmake /NOLOGO /F Makefile.win install && ^
          nmake /NOLOGO /F Makefile.win installcheck && ^
          nmake /NOLOGO /F Makefile.win clean && ^
          nmake /NOLOGO /F Makefile.win uninstall
        shell: cmd
  i386:
    if: ${{ !startsWith(github.ref_name, 'mac') && !startsWith(github.ref_name, 'windows') }}
    runs-on: ubuntu-latest
    container:
      image: debian:11
      options: --platform linux/386
    steps:
      - run: apt-get update && apt-get install -y build-essential git libipc-run-perl postgresql-13 postgresql-server-dev-13 sudo
      - run: service postgresql start
      - run: |
          git clone https://github.com/${{ github.repository }}.git pgvector
          cd pgvector
          git fetch origin ${{ github.ref }}
          git reset --hard FETCH_HEAD
          make
          make install
          chown -R postgres .
          sudo -u postgres make installcheck
          sudo -u postgres make prove_installcheck
        env:
          PG_CFLAGS: -Wall -Wextra -Werror -Wno-unused-parameter -Wno-sign-compare
13 changes: 13 additions & 0 deletions external/pgvector/.gitignore
@@ -0,0 +1,13 @@
/dist/
/log/
/results/
/tmp_check/
/sql/vector--?.?.?.sql
*.o
*.so
*.bc
*.dll
*.dylib
*.obj
*.lib
*.exp
148 changes: 148 additions & 0 deletions external/pgvector/CHANGELOG.md
@@ -0,0 +1,148 @@
## 0.5.1 (2023-10-10)

- Improved performance of HNSW index builds
- Added check for MVCC-compliant snapshot for index scans

## 0.5.0 (2023-08-28)

- Added HNSW index type
- Added support for parallel index builds for IVFFlat
- Added `l1_distance` function
- Added element-wise multiplication for vectors
- Added `sum` aggregate
- Improved performance of distance functions
- Fixed out of range results for cosine distance
- Fixed results for NULL and NaN distances for IVFFlat

## 0.4.4 (2023-06-12)

- Improved error message for malformed vector literal
- Fixed segmentation fault with text input
- Fixed consecutive delimiters with text input

## 0.4.3 (2023-06-10)

- Improved cost estimation
- Improved support for spaces with text input
- Fixed infinite and NaN values with binary input
- Fixed infinite values with vector addition and subtraction
- Fixed infinite values with list centers
- Fixed compilation error when `float8` is pass by reference
- Fixed compilation error on PowerPC
- Fixed segmentation fault with index creation on i386

## 0.4.2 (2023-05-13)

- Added notice when index created with little data
- Fixed dimensions check for some direct function calls
- Fixed installation error with Postgres 12.0-12.2

## 0.4.1 (2023-03-21)

- Improved performance of cosine distance
- Fixed index scan count

## 0.4.0 (2023-01-11)

If upgrading with Postgres < 13, see [this note](https://github.com/pgvector/pgvector#040).

- Changed text representation for vector elements to match `real`
- Changed storage for vector from `plain` to `extended`
- Increased max dimensions for vector from 1024 to 16000
- Increased max dimensions for index from 1024 to 2000
- Improved accuracy of text parsing for certain inputs
- Added `avg` aggregate for vector
- Added experimental support for Windows
- Dropped support for Postgres 10

## 0.3.2 (2022-11-22)

- Fixed `invalid memory alloc request size` error

## 0.3.1 (2022-11-02)

If upgrading from 0.2.7 or 0.3.0, [recreate](https://github.com/pgvector/pgvector#031) all `ivfflat` indexes after upgrading to ensure all data is indexed.

- Fixed issue with inserts silently corrupting `ivfflat` indexes (introduced in 0.2.7)
- Fixed segmentation fault with index creation when lists > 6500

## 0.3.0 (2022-10-15)

- Added support for Postgres 15
- Dropped support for Postgres 9.6

## 0.2.7 (2022-07-31)

- Fixed `unexpected data beyond EOF` error

## 0.2.6 (2022-05-22)

- Improved performance of index creation for Postgres < 12

## 0.2.5 (2022-02-11)

- Reduced memory usage during index creation
- Fixed index creation exceeding `maintenance_work_mem`
- Fixed error with index creation when lists > 1600

## 0.2.4 (2022-02-06)

- Added support for parallel vacuum
- Fixed issue with index not reusing space

## 0.2.3 (2022-01-30)

- Added indexing progress for Postgres 12+
- Improved interrupt handling during index creation

## 0.2.2 (2022-01-15)

- Fixed compilation error on Mac ARM

## 0.2.1 (2022-01-02)

- Fixed `operator is not unique` error

## 0.2.0 (2021-10-03)

- Added support for Postgres 14

## 0.1.8 (2021-09-07)

- Added cast for `vector` to `real[]`

## 0.1.7 (2021-06-13)

- Added cast for `numeric[]` to `vector`

## 0.1.6 (2021-06-09)

- Fixed segmentation fault with `COUNT`

## 0.1.5 (2021-05-25)

- Reduced memory usage during index creation

## 0.1.4 (2021-05-09)

- Fixed kmeans for inner product
- Fixed multiple definition error with GCC 10

## 0.1.3 (2021-05-06)

- Added Dockerfile
- Fixed version

## 0.1.2 (2021-04-26)

- Vectorized distance calculations
- Improved cost estimation

## 0.1.1 (2021-04-25)

- Added binary representation for `COPY`
- Marked functions as `PARALLEL SAFE`

## 0.1.0 (2021-04-20)

- First release
20 changes: 20 additions & 0 deletions external/pgvector/Dockerfile
@@ -0,0 +1,20 @@
ARG PG_MAJOR=15
FROM postgres:$PG_MAJOR
ARG PG_MAJOR

COPY . /tmp/pgvector

RUN apt-get update && \
    apt-mark hold locales && \
    apt-get install -y --no-install-recommends build-essential postgresql-server-dev-$PG_MAJOR && \
    cd /tmp/pgvector && \
    make clean && \
    make OPTFLAGS="" && \
    make install && \
    mkdir /usr/share/doc/pgvector && \
    cp LICENSE README.md /usr/share/doc/pgvector && \
    rm -r /tmp/pgvector && \
    apt-get remove -y build-essential postgresql-server-dev-$PG_MAJOR && \
    apt-get autoremove -y && \
    apt-mark unhold locales && \
    rm -rf /var/lib/apt/lists/*
20 changes: 20 additions & 0 deletions external/pgvector/LICENSE
@@ -0,0 +1,20 @@
Portions Copyright (c) 1996-2023, PostgreSQL Global Development Group

Portions Copyright (c) 1994, The Regents of the University of California

Permission to use, copy, modify, and distribute this software and its
documentation for any purpose, without fee, and without a written agreement
is hereby granted, provided that the above copyright notice and this
paragraph and the following two paragraphs appear in all copies.

IN NO EVENT SHALL THE UNIVERSITY OF CALIFORNIA BE LIABLE TO ANY PARTY FOR
DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.

THE UNIVERSITY OF CALIFORNIA SPECIFICALLY DISCLAIMS ANY WARRANTIES,
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS
ON AN "AS IS" BASIS, AND THE UNIVERSITY OF CALIFORNIA HAS NO OBLIGATIONS TO
PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
49 changes: 49 additions & 0 deletions external/pgvector/META.json
@@ -0,0 +1,49 @@
{
  "name": "vector",
  "abstract": "Open-source vector similarity search for Postgres",
  "description": "Supports L2 distance, inner product, and cosine distance",
  "version": "0.5.1",
  "maintainer": [
    "Andrew Kane <[email protected]>"
  ],
  "license": {
    "PostgreSQL": "http://www.postgresql.org/about/licence"
  },
  "prereqs": {
    "runtime": {
      "requires": {
        "PostgreSQL": "11.0.0"
      }
    }
  },
  "provides": {
    "vector": {
      "file": "sql/vector.sql",
      "docfile": "README.md",
      "version": "0.5.1",
      "abstract": "Open-source vector similarity search for Postgres"
    }
  },
  "resources": {
    "homepage": "https://github.com/pgvector/pgvector",
    "bugtracker": {
      "web": "https://github.com/pgvector/pgvector/issues"
    },
    "repository": {
      "url": "https://github.com/pgvector/pgvector.git",
      "web": "https://github.com/pgvector/pgvector",
      "type": "git"
    }
  },
  "generated_by": "Andrew Kane",
  "meta-spec": {
    "version": "1.0.0",
    "url": "http://pgxn.org/meta/spec.txt"
  },
  "tags": [
    "vectors",
    "datatype",
    "nearest neighbor search",
    "approximate nearest neighbors"
  ]
}