* First updates. * Update about section. * Updates on review. * First updates. * Update about section. * Updates on review. * chore: first draft. * chore: first draft. * chore: first draft. * chore: align about. * chore: align about. * chore: align about. * First updates. * Updates on review. * First updates. * chore: weird rebase stuff. * chore: put license back. * Fix spelling of PostgreSQL Signed-off-by: Mike Freedman <[email protected]> --------- Signed-off-by: Mike Freedman <[email protected]> Co-authored-by: Mike Freedman <[email protected]> Co-authored-by: Matvey Arye <[email protected]>
1 parent 5eae3d7, commit b083782
Showing 6 changed files with 249 additions and 56 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
You can find the Timescale Code of Conduct at <https://www.timescale.com/code-of-conduct>. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,109 @@ | ||
# Contributing to pgvectorscale | ||
|
||
We appreciate any help the community can provide to make pgvectorscale better! | ||
|
||
You can help in different ways: | ||
|
||
* Open an [issue](https://github.com/timescale/pgvectorscale/issues) with a | ||
bug report, build issue, feature request, suggestion, etc. | ||
|
||
* Fork this repository and submit a pull request | ||
|
||
For any particular improvement you want to make, it can be beneficial to | ||
begin discussion on the GitHub issues page. This is the best place to | ||
discuss your proposed improvement (and its implementation) with the core | ||
development team. | ||
|
||
Before we accept any code contributions, pgvectorscale contributors need to | ||
sign the [Contributor License Agreement](https://cla-assistant.io/timescale/pgvectorscale) (CLA). When you sign the CLA, we can | ||
ensure that the community is free and confident in its ability to use your | ||
contributions. | ||
|
||
## Development | ||
|
||
Please follow our DEVELOPMENT doc for [instructions on how to develop and test](https://github.com/timescale/pgvectorscale/blob/main/DEVELOPMENT.md). | ||
|
||
## Code review workflow | ||
|
||
* Sign the [Contributor License Agreement](https://cla-assistant.io/timescale/pgvectorscale) (CLA) if you're a new contributor. | ||
|
||
* Develop on your local branch: | ||
|
||
* Fork the repository and create a local feature branch to do work on, | ||
ideally on one thing at a time. Don't mix bug fixes with unrelated | ||
feature enhancements or stylistic changes. | ||
|
||
* Hack away. Add tests for non-trivial changes. | ||
|
||
* Run the [test suite](#testing) and make sure everything passes. | ||
|
||
* When committing, be sure to write good commit messages according to [these | ||
seven rules](https://chris.beams.io/posts/git-commit/#seven-rules). Doing | ||
`git commit` prints a message if any of the rules is violated. | ||
Stylistically, | ||
we use commit message titles in the imperative mood, e.g., `Add | ||
merge-append query optimization for time aggregate`. In the case of | ||
non-trivial changes, include a longer description in the commit message | ||
body explaining and detailing the changes. That is, a commit message | ||
should have a short title, followed by an empty line, and then | ||
followed by the longer description. | ||
|
||
* When committing, state which GitHub issue of [this | ||
repository](https://github.com/timescale/pgvectorscale/issues) is fixed or | ||
closed by the commit with a [linking keyword recognized by | ||
GitHub](https://docs.github.com/en/github/managing-your-work-on-github/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword). | ||
For example, if the commit fixes bug 123, add a line at the end of the | ||
commit message with `Fixes #123`; if the commit implements feature | ||
request 321, add a line at the end of the commit message with `Closes #321`. | ||
GitHub recognizes these keywords, closes the corresponding issue, | ||
and turns the issue number into a hyperlink. | ||
|
||
* Push your changes to an upstream branch: | ||
|
||
* Make sure that each commit in the pull request represents a | ||
logical change to the code, compiles, and passes tests. | ||
|
||
* Make sure that the pull request message contains all important | ||
information from the commit messages including which issues are | ||
fixed and closed. If a pull request contains one commit only, then | ||
repeating the commit message is preferred, which is done automatically | ||
by GitHub when it creates the pull request. | ||
|
||
* Rebase your local feature branch against main (`git fetch origin`, | ||
then `git rebase origin/main`) to make sure you're | ||
submitting your changes on top of the newest version of our code. | ||
|
||
* When finalizing your PR (i.e., it has been approved for merging), | ||
aim for the fewest number of commits that | ||
make sense. That is, squash any "fix up" commits into the commit they | ||
fix rather than keep them separate. Each commit should represent a | ||
clean, logical change and include a descriptive commit message. | ||
|
||
* Push your commit to your upstream feature branch: `git push -u <yourfork> my-feature-branch` | ||
|
||
* Create and manage the pull request: | ||
|
||
* [Create a pull request using GitHub](https://help.github.com/articles/creating-a-pull-request). | ||
If you know a core developer well suited to reviewing your pull | ||
request, either mention them (preferably by GitHub name) in the PR's | ||
body or [assign them as a reviewer](https://help.github.com/articles/assigning-issues-and-pull-requests-to-other-github-users/). | ||
|
||
* Address feedback by amending your commit(s). If your change contains | ||
multiple commits, address each piece of feedback by amending the | ||
commit that the feedback targets. | ||
|
||
* The PR is marked as accepted when the reviewer thinks it's ready to be | ||
merged. Most new contributors aren't allowed to merge themselves; in | ||
that case, we'll do it for you. | ||
|
||
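The commit message conventions in the workflow above can be checked mechanically. The following is a minimal sketch, not the check that this repository runs on `git commit`: `check_commit_message` and `linked_issues` are hypothetical helper names, the 50-character subject limit comes from the linked seven rules rather than from pgvectorscale itself, and the keyword pattern is a rough approximation of GitHub's linking keywords.

```shell
#!/bin/sh
# check_commit_message: hypothetical helper, not part of pgvectorscale.
# Reads a commit message on stdin, prints any violations it finds, and
# returns non-zero if at least one check fails.
check_commit_message() {
    msg="$(cat)"
    subject="$(printf '%s\n' "$msg" | sed -n '1p')"
    second="$(printf '%s\n' "$msg" | sed -n '2p')"
    status=0

    # Rule: limit the subject line to 50 characters.
    if [ "${#subject}" -gt 50 ]; then
        echo "subject is longer than 50 characters"
        status=1
    fi

    # Rule: capitalize the subject line.
    case "$subject" in
        [A-Z]*) ;;
        *) echo "subject does not start with a capital letter"; status=1 ;;
    esac

    # Rule: do not end the subject line with a period.
    case "$subject" in
        *.) echo "subject ends with a period"; status=1 ;;
    esac

    # Rule: separate the subject from the body with a blank line.
    if [ -n "$second" ]; then
        echo "second line is not blank"
        status=1
    fi

    return "$status"
}

# linked_issues: hypothetical helper. Prints every "keyword #number"
# pair found on stdin, using an approximate, case-insensitive pattern
# for close/closes/closed, fix/fixes/fixed, resolve/resolves/resolved.
linked_issues() {
    grep -Eio '(close[sd]?|fix(e[sd])?|resolve[sd]?) #[0-9]+'
}
```

For example, `git log -1 --format=%B | check_commit_message` checks the most recent commit, and `git log -1 --format=%B | linked_issues` lists the issues it claims to fix or close. The pattern does not check word boundaries, so treat its output as a hint rather than as what GitHub will actually link.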
## Testing | ||
|
||
Every non-trivial change to the code base should be accompanied by a | ||
relevant addition to or modification of the test suite. | ||
|
||
Please check that the full test suite (including your test additions | ||
or changes) passes successfully on your local machine **before you | ||
open a pull request**. | ||
|
||
See our [testing](https://github.com/timescale/pgvectorscale/blob/main/DEVELOPMENT.md#testing) | ||
instructions for help with how to test. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,63 @@ | ||
# Set up your pgvectorscale developer environment | ||
|
||
You build pgvectorscale from source, then integrate the extension into each database in your PostgreSQL environment. | ||
|
||
## pgvectorscale prerequisites | ||
|
||
To create a pgvectorscale developer environment, you need the following on your local machine: | ||
|
||
* [PostgreSQL v16](https://docs.timescale.com/self-hosted/latest/install/installation-linux/#install-and-configure-timescaledb-on-postgresql) | ||
* [pgvector](https://github.com/pgvector/pgvector/blob/master/README.md) | ||
* Development packages: | ||
```shell | ||
sudo apt-get install make gcc pkg-config clang postgresql-server-dev-16 libssl-dev | ||
``` | ||
|
||
* [Rust][rust-language]: | ||
```shell | ||
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh | ||
``` | ||
|
||
* [Cargo-pgrx][cargo-pgrx]: | ||
```shell | ||
cargo install --locked cargo-pgrx | ||
``` | ||
You must reinstall cargo-pgrx whenever you update Rust, because cargo-pgrx must | ||
be built with the same compiler as pgvectorscale. | ||
|
||
* The pgrx development environment: | ||
```shell | ||
cargo pgrx init --pg16 pg_config | ||
``` | ||
|
||
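Before building, it can help to confirm that the commands provided by the prerequisites above are on your `PATH`. This is a minimal sketch under assumptions: `missing_tools` is a hypothetical helper, not something shipped with pgvectorscale, and it only checks that each command exists, not that its version is suitable.

```shell
#!/bin/sh
# missing_tools: hypothetical helper, not part of pgvectorscale.
# Prints each named command that cannot be found on PATH and
# returns non-zero if any of them is missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            status=1
        fi
    done
    return "$status"
}

# Commands installed by the prerequisite steps: the build tools from
# the development packages, cargo from Rust, the cargo-pgrx binary,
# and pg_config from the PostgreSQL development package.
if missing_tools make gcc pkg-config clang cargo cargo-pgrx pg_config; then
    echo "all build prerequisites found"
fi
```

If the script reports `missing: pg_config`, the `cargo pgrx init --pg16 pg_config` step is unlikely to find your PostgreSQL installation, so fix that first.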
## Build and install pgvectorscale on your database | ||
|
||
1. In Terminal, clone this repository and switch to the extension subdirectory: | ||
|
||
```shell | ||
git clone https://github.com/timescale/pgvectorscale && \ | ||
cd pgvectorscale/pgvectorscale | ||
``` | ||
|
||
1. Build pgvectorscale: | ||
|
||
```shell | ||
cargo pgrx install --release | ||
``` | ||
|
||
1. Connect to the database: | ||
|
||
```bash | ||
psql -d "postgres://<username>:<password>@<host>:<port>/<database-name>" | ||
``` | ||
|
||
1. Add pgvectorscale to your database: | ||
|
||
```postgresql | ||
CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE; | ||
``` | ||
|
||
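The connection string passed to `psql` is a URI of the form `postgres://user:password@host:port/database`. As a minimal sketch, the hypothetical helper `build_pg_uri` below assembles one from its parts; the values shown are placeholders, not defaults of pgvectorscale.

```shell
#!/bin/sh
# build_pg_uri: hypothetical helper, not part of pgvectorscale.
# Usage: build_pg_uri USER PASSWORD HOST PORT DATABASE
# Prints a PostgreSQL connection URI built from the five arguments.
build_pg_uri() {
    printf 'postgres://%s:%s@%s:%s/%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Placeholder values; substitute the details of your own database.
uri="$(build_pg_uri postgres secret localhost 5432 postgres)"
echo "$uri"
```

You could then run `psql -d "$uri" -c "SELECT extname, extversion FROM pg_extension WHERE extname = 'vectorscale';"` to confirm that the extension was created. The helper does no escaping, so a password containing characters such as `@` or `/` needs to be percent-encoded first.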
|
||
[pgvector]: https://github.com/pgvector/pgvector/blob/master/README.md | ||
[rust-language]: https://www.rust-lang.org/ | ||
[cargo-pgrx]: https://lib.rs/crates/cargo-pgrx |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,76 +1,97 @@ | ||
# pgvectorscale | ||
|
||
A vector index for speeding up ANN search in `pgvector`. | ||
<p></p> | ||
<div align=center> | ||
<picture align=center> | ||
<source media="(prefers-color-scheme: dark)" srcset="https://assets.timescale.com/docs/images/timescale-logo-dark-mode.svg"> | ||
<source media="(prefers-color-scheme: light)" srcset="https://assets.timescale.com/docs/images/timescale-logo-light-mode.svg"> | ||
<img alt="Timescale logo" > | ||
</picture> | ||
|
||
## 💾 Building and Installing pgvectorscale | ||
<h3>Use pgvectorscale to build scalable AI applications with higher-performance | ||
embedding search and cost-efficient storage. </h3> | ||
|
||
### From source | ||
[![Docs](https://img.shields.io/badge/Read_the_Timescale_docs-black?style=for-the-badge&logo=readthedocs&logoColor=white)](https://docs.timescale.com/) | ||
[![SLACK](https://img.shields.io/badge/Ask_the_Timescale_community-black?style=for-the-badge&logo=slack&logoColor=white)](https://timescaledb.slack.com/archives/C4GT3N90X) | ||
[![Try Timescale for free](https://img.shields.io/badge/Try_Timescale_for_free-black?style=for-the-badge&logo=timescale&logoColor=white)](https://console.cloud.timescale.com/signup) | ||
</div> | ||
|
||
#### Prerequisites | ||
|
||
Building the extension requires valid rust, along with the postgres headers for whichever version of postgres you are running, and pgrx. We recommend installing rust using the official instructions: | ||
```shell | ||
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh | ||
``` | ||
|
||
You should install the appropriate build tools and postgres headers in the preferred manner for your system. You may also need to install OpenSSL. For Ubuntu you can follow the postgres install instructions then run | ||
pgvectorscale complements [pgvector][pgvector], the open-source vector data extension for PostgreSQL, and introduces the following key innovations: | ||
- A DiskANN index: based on research from Microsoft | ||
- Statistical Binary Quantization: developed by Timescale researchers, this feature improves on standard | ||
Binary Quantization. | ||
|
||
```shell | ||
sudo apt-get install make gcc pkg-config clang postgresql-server-dev-16 libssl-dev | ||
``` | ||
Timescale’s benchmarks reveal that with pgvectorscale, PostgreSQL achieves **28x lower p95 latency**, and | ||
**16x higher query throughput** for approximate nearest neighbor queries at 99% recall. | ||
|
||
Next you need cargo-pgrx, which can be installed with | ||
```shell | ||
cargo install --locked cargo-pgrx | ||
``` | ||
<div align=center> | ||
|
||
You must reinstall cargo-pgrx whenever you update your Rust compiler, since cargo-pgrx needs to be built with the same compiler as pgvectorscale. | ||
![Benchmarks](https://assets.timescale.com/docs/images/benchmark-comparison-pgvectorscale-pinecone.png) | ||
|
||
Finally, set up the pgrx development environment with | ||
```shell | ||
cargo pgrx init --pg16 pg_config | ||
``` | ||
PostgreSQL costs are 21% those of Pinecone s1, just saying. | ||
</div> | ||
|
||
#### Building and installing the extension | ||
In contrast to pgvector, which is written in C, pgvectorscale is developed in [Rust][rust-language], | ||
offering the PostgreSQL community a new avenue for contributing to vector support. | ||
|
||
Download or clone this repository, and switch to the extension subdirectory, e.g. | ||
```shell | ||
git clone https://github.com/timescale/pgvectorscale && \ | ||
cd pgvectorscale/pgvectorscale | ||
``` | ||
Timescale offers the following high performance journeys: | ||
|
||
Then run | ||
```shell | ||
cargo pgrx install --release | ||
``` | ||
* **App developer and DBA**: try out pgvectorscale functionality in Timescale Cloud. | ||
* [Enable pgvectorscale in a Timescale service](#enable-pgvectorscale-in-a-timescale-service) | ||
* **Extension contributor**: contribute to pgvectorscale. | ||
* [Build pgvectorscale from source in a developer environment](./DEVELOPMENT.md) | ||
* **Everyone**: check the benchmark results for yourself. | ||
* [Test pgvectorscale performance](#test-pgvectorscale-performance) | ||
|
||
To initialize the extension after installation, enter the following into psql: | ||
## Enable pgvectorscale in a Timescale service | ||
|
||
```sql | ||
CREATE EXTENSION vectorscale; | ||
``` | ||
To enable pgvectorscale: | ||
|
||
## ✏️ Get Involved | ||
1. Create a new [Timescale Service](https://console.cloud.timescale.com/dashboard/create_services). | ||
|
||
The pgvectorscale project is still in its early stage as we decide our priorities and what to implement. As such, now is a great time to help shape the project's direction! Have a look at the list of features we're thinking of working on and feel free to comment on the features, expand the list, or hop on the Discussions forum for more in-depth discussions. | ||
If you want to use an existing service, pgvectorscale is added as an available extension on the first maintenance window | ||
after the pgvectorscale release date. | ||
|
||
### 🔨 Testing | ||
See above for prerequisites and installation instructions. | ||
1. Connect to your Timescale service: | ||
```bash | ||
psql -d "postgres://<username>:<password>@<host>:<port>/<database-name>" | ||
``` | ||
|
||
You can run tests against a specific postgres version, for example pg16, using | ||
```shell | ||
cargo pgrx test ${postgres_version} | ||
``` | ||
1. Create the pgvectorscale extension: | ||
|
||
To run all tests run: | ||
```shell | ||
cargo test -- --ignored && cargo pgrx test ${postgres_version} | ||
``` | ||
```postgresql | ||
CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE; | ||
``` | ||
|
||
### 🐯 About Timescale | ||
The `CASCADE` automatically installs the dependencies. | ||
|
||
TimescaleDB is a distributed time-series database built on PostgreSQL that scales to over 10 million metrics per second, supports native compression, handles high cardinality, and offers native time-series capabilities, such as data retention policies, continuous aggregate views, downsampling, data gap-filling and interpolation. | ||
## Test pgvectorscale performance | ||
|
||
TimescaleDB also supports full SQL, a variety of data types (numerics, text, arrays, JSON, booleans), and ACID semantics. Operationally mature capabilities include high availability, streaming backups, upgrades over time, roles and permissions, and security. | ||
To check the Timescale benchmarks in your pgvectorscale environment: | ||
|
||
TimescaleDB has a large and active user community (tens of millions of downloads, hundreds of thousands of active deployments, Slack channels with thousands of members). | ||
1. Jonetas, this is for you :-). | ||
|
||
## Get involved | ||
|
||
pgvectorscale is still at an early stage. Now is a great time to help shape the | ||
direction of this project; we are currently deciding priorities. Have a look at the | ||
list of features we're thinking of working on. Feel free to comment, expand | ||
the list, or hop on the Discussions forum. | ||
## About Timescale | ||
Timescale Cloud is a high-performance, developer-focused cloud that provides PostgreSQL services | ||
enhanced with our blazing fast vector search. Timescale services are built using TimescaleDB and | ||
PostgreSQL extensions, like this one. Timescale Cloud provides high availability, streaming | ||
backups, upgrades over time, roles and permissions, and great security. | ||
TimescaleDB is an open-source time-series database designed for scalability and performance, | ||
built on top of PostgreSQL. It provides SQL support for time-series data, allowing users to | ||
leverage PostgreSQL's rich ecosystem while optimizing for high ingest rates and fast query | ||
performance. TimescaleDB includes features like automated data retention policies, compression | ||
and continuous aggregates, making it ideal for applications like monitoring, IoT, AI and | ||
real-time analytics. | ||
|
||
|
||
[pgvector]: https://github.com/pgvector/pgvector/blob/master/README.md | ||
[rust-language]: https://www.rust-lang.org/ |