feat: Add example python project #197
Draft: Askir wants to merge 3 commits into main from jascha/sample-python-project
+2,594
−0
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
# Python-generated files | ||
__pycache__/ | ||
*.py[oc] | ||
build/ | ||
dist/ | ||
wheels/ | ||
*.egg-info | ||
|
||
.venv | ||
.idea | ||
|
||
.env | ||
|
||
data/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
3.11 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
# Code LLM Sync | ||
|
||
This is an example project that uses pgai in Python with FastAPI, SQLAlchemy, Alembic, and pytest, as well as python-pgvector. Its purpose is to figure out how we can better integrate with existing Python frameworks and tooling. | ||
|
||
## Idea | ||
The idea is to provide a small service that tracks a code base through file watchers, stores the files in Postgres, and embeds them. You can then use these embeddings to find the code files relevant to an LLM query about improving that code base, without manually copying all the related code each time. | ||
|
||
Changes you make based on those results then propagate immediately into the store, and the cycle repeats. | ||
|
||
Status: | ||
Currently there is a single API endpoint that lets you send a query and retrieve the code files relevant to it (see the tests for how it works). | ||
|
||
# Installation | ||
This project uses `uv`, so `uv sync` and `uv run pytest` should get you going. | ||
|
||
## Useful commands | ||
### Run migrations | ||
```bash | ||
uv run alembic upgrade head | ||
``` | ||
|
||
### Run the server | ||
```bash | ||
uv run fastapi dev main.py | ||
``` | ||
|
||
### Run tests | ||
```bash | ||
uv run pytest | ||
``` | ||
|
||
### Run linting | ||
```bash | ||
uv run ruff format | ||
uv run ruff check --fix | ||
``` | ||
|
||
### Run type checking | ||
```bash | ||
uv run pyright | ||
``` |
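The README's idea of finding relevant files by embedding similarity can be illustrated with a toy ranking. This is only a sketch under stated assumptions: the vectors are made up, the `rank_files` helper is hypothetical, and in the actual project the similarity search would presumably run inside Postgres via pgvector rather than in Python.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_files(query: list[float], files: dict[str, list[float]]) -> list[str]:
    """Hypothetical helper: file names ordered from most to least similar."""
    return sorted(files, key=lambda name: cosine_similarity(query, files[name]), reverse=True)


if __name__ == "__main__":
    # Toy two-dimensional "embeddings"; real ones have hundreds of dimensions.
    embeddings = {"db/engine.py": [1.0, 0.1], "README.md": [0.0, 1.0]}
    print(rank_files([1.0, 0.0], embeddings))
```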
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,115 @@ | ||
# A generic, single database configuration. | ||
|
||
[alembic] | ||
# path to migration scripts. | ||
# Use forward slashes (/) also on windows to provide an os agnostic path | ||
script_location = alembic | ||
|
||
# template used to generate migration file names; The default value is %%(rev)s_%%(slug)s | ||
# Uncomment the line below if you want the files to be prepended with date and time | ||
# file_template = %%(year)d_%%(month).2d_%%(day).2d_%%(hour).2d%%(minute).2d-%%(rev)s_%%(slug)s | ||
|
||
# sys.path path, will be prepended to sys.path if present. | ||
# defaults to the current working directory. | ||
prepend_sys_path = . | ||
file_template = %%(year)d_%%(month).2d_%%(day).2d_%%(hour).2d%%(minute).2d-%%(rev)s_%%(slug)s | ||
# timezone to use when rendering the date within the migration file | ||
# as well as the filename. | ||
# If specified, requires the python>=3.9 or backports.zoneinfo library. | ||
# Any required deps can be installed by adding `alembic[tz]` to the pip requirements | ||
# string value is passed to ZoneInfo() | ||
# leave blank for localtime | ||
# timezone = | ||
|
||
# max length of characters to apply to the "slug" field | ||
# truncate_slug_length = 40 | ||
|
||
# set to 'true' to run the environment during | ||
# the 'revision' command, regardless of autogenerate | ||
# revision_environment = false | ||
|
||
# set to 'true' to allow .pyc and .pyo files without | ||
# a source .py file to be detected as revisions in the | ||
# versions/ directory | ||
# sourceless = false | ||
|
||
# version location specification; This defaults | ||
# to alembic/versions. When using multiple version | ||
# directories, initial revisions must be specified with --version-path. | ||
# The path separator used here should be the separator specified by "version_path_separator" below. | ||
# version_locations = %(here)s/bar:%(here)s/bat:alembic/versions | ||
|
||
# version path separator; As mentioned above, this is the character used to split | ||
# version_locations. The default within new alembic.ini files is "os", which uses os.pathsep. | ||
# If this key is omitted entirely, it falls back to the legacy behavior of splitting on spaces and/or commas. | ||
# Valid values for version_path_separator are: | ||
# | ||
# version_path_separator = : | ||
# version_path_separator = ; | ||
# version_path_separator = space | ||
# version_path_separator = newline | ||
version_path_separator = os # Use os.pathsep. Default configuration used for new projects. | ||
|
||
# set to 'true' to search source files recursively | ||
# in each "version_locations" directory | ||
# new in Alembic version 1.10 | ||
# recursive_version_locations = false | ||
|
||
# the output encoding used when revision files | ||
# are written from script.py.mako | ||
# output_encoding = utf-8 | ||
|
||
sqlalchemy.url = postgresql+asyncpg://postgres:postgres@localhost/postgres | ||
|
||
|
||
[post_write_hooks] | ||
# post_write_hooks defines scripts or Python functions that are run | ||
# on newly generated revision scripts. See the documentation for further | ||
# detail and examples | ||
|
||
# format using "black" - use the console_scripts runner, against the "black" entrypoint | ||
# hooks = black | ||
# black.type = console_scripts | ||
# black.entrypoint = black | ||
# black.options = -l 79 REVISION_SCRIPT_FILENAME | ||
|
||
# lint with attempts to fix using "ruff" - use the exec runner, execute a binary | ||
# hooks = ruff | ||
# ruff.type = exec | ||
# ruff.executable = %(here)s/.venv/bin/ruff | ||
# ruff.options = --fix REVISION_SCRIPT_FILENAME | ||
|
||
# Logging configuration | ||
[loggers] | ||
keys = root,sqlalchemy,alembic | ||
|
||
[handlers] | ||
keys = console | ||
|
||
[formatters] | ||
keys = generic | ||
|
||
[logger_root] | ||
level = WARN | ||
handlers = console | ||
qualname = | ||
|
||
[logger_sqlalchemy] | ||
level = WARN | ||
handlers = | ||
qualname = sqlalchemy.engine | ||
|
||
[logger_alembic] | ||
level = INFO | ||
handlers = | ||
qualname = alembic | ||
|
||
[handler_console] | ||
class = StreamHandler | ||
args = (sys.stderr,) | ||
level = NOTSET | ||
formatter = generic | ||
|
||
[formatter_generic] | ||
format = %(levelname)-5.5s [%(name)s] %(message)s | ||
datefmt = %H:%M:%S |
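The `file_template` setting in this file controls how migration files are named. As a rough illustration, plain Python `%` formatting of the same pattern (with the ini-level `%%` escaping collapsed to `%`) reproduces the name of the first migration in this PR. The `render_name` helper is hypothetical; Alembic's own rendering does more, such as building and truncating the slug.

```python
from datetime import datetime

# Same pattern as file_template in alembic.ini, with "%%" collapsed to "%".
FILE_TEMPLATE = "%(year)d_%(month).2d_%(day).2d_%(hour).2d%(minute).2d-%(rev)s_%(slug)s"


def render_name(created: datetime, rev: str, slug: str) -> str:
    """Hypothetical helper: fill the template tokens from a timestamp."""
    return FILE_TEMPLATE % {
        "year": created.year,
        "month": created.month,
        "day": created.day,
        "hour": created.hour,
        "minute": created.minute,
        "rev": rev,
        "slug": slug,
    }


if __name__ == "__main__":
    created = datetime(2024, 11, 4, 11, 47, 57)
    print(render_name(created, "497e69a2bca9", "add_code_files_table"))
```

The `.2d` precision pads single-digit months, days, hours, and minutes to two digits, which keeps the files sorted chronologically in a directory listing.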
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
Generic single-database configuration with an async dbapi. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,99 @@ | ||
import asyncio | ||
from logging.config import fileConfig | ||
|
||
from sqlalchemy import pool | ||
from sqlalchemy.engine import Connection | ||
from sqlalchemy.ext.asyncio import async_engine_from_config | ||
|
||
from alembic import context | ||
from db.engine import Base | ||
|
||
# this is the Alembic Config object, which provides | ||
# access to the values within the .ini file in use. | ||
config = context.config | ||
|
||
# Interpret the config file for Python logging. | ||
# This line sets up loggers basically. | ||
if config.config_file_name is not None: | ||
    fileConfig(config.config_file_name) | ||
|
||
# add your model's MetaData object here | ||
# for 'autogenerate' support | ||
# from myapp import mymodel | ||
# target_metadata = mymodel.Base.metadata | ||
target_metadata = Base.metadata | ||
|
||
# other values from the config, defined by the needs of env.py, | ||
# can be acquired: | ||
# my_important_option = config.get_main_option("my_important_option") | ||
# ... etc. | ||
|
||
|
||
def run_migrations_offline() -> None: | ||
    """Run migrations in 'offline' mode. | ||
|
||
    This configures the context with just a URL | ||
    and not an Engine, though an Engine is acceptable | ||
    here as well. By skipping the Engine creation | ||
    we don't even need a DBAPI to be available. | ||
|
||
    Calls to context.execute() here emit the given string to the | ||
    script output. | ||
|
||
    """ | ||
    url = config.get_main_option("sqlalchemy.url") | ||
    context.configure( | ||
        url=url, | ||
        target_metadata=target_metadata, | ||
        literal_binds=True, | ||
        dialect_opts={"paramstyle": "named"}, | ||
    ) | ||
|
||
    with context.begin_transaction(): | ||
        context.run_migrations() | ||
|
||
|
||
def do_run_migrations(connection: Connection) -> None: | ||
    context.configure(connection=connection, target_metadata=target_metadata) | ||
|
||
    with context.begin_transaction(): | ||
        context.run_migrations() | ||
|
||
|
||
async def run_async_migrations() -> None: | ||
    """In this scenario we need to create an Engine | ||
    and associate a connection with the context. | ||
|
||
    """ | ||
|
||
    connectable = async_engine_from_config( | ||
        config.get_section(config.config_ini_section, {}), | ||
        prefix="sqlalchemy.", | ||
        poolclass=pool.NullPool, | ||
    ) | ||
|
||
    async with connectable.connect() as connection: | ||
        await connection.run_sync(do_run_migrations) | ||
|
||
    await connectable.dispose() | ||
|
||
|
||
def run_migrations_online() -> None: | ||
    """Run migrations in 'online' mode.""" | ||
    connectable = config.attributes.get("connection", None) | ||
|
||
    if connectable is None: | ||
        try: | ||
            # get_running_loop() raises RuntimeError when no loop is running; | ||
            # unlike get_event_loop(), it never creates a loop implicitly. | ||
            loop = asyncio.get_running_loop() | ||
        except RuntimeError: | ||
            # No event loop running, create one | ||
            asyncio.run(run_async_migrations()) | ||
        else: | ||
            # We're already in an event loop (e.g., pytest-asyncio). | ||
            # Note: the task is only scheduled here, not awaited. | ||
            loop.create_task(run_async_migrations()) | ||
    else: | ||
        do_run_migrations(connectable) | ||
|
||
|
||
if context.is_offline_mode(): | ||
    run_migrations_offline() | ||
else: | ||
    run_migrations_online() |
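The loop detection in `run_migrations_online` can be tried out in isolation. The following is a minimal sketch, not the PR's code: `run_async_migrations` is replaced by a stand-in coroutine, and the hypothetical `run_or_schedule` helper returns the scheduled task so that async callers can await it instead of leaving it fire-and-forget.

```python
import asyncio


async def run_async_migrations() -> str:
    """Stand-in for the real coroutine; it only yields control once."""
    await asyncio.sleep(0)
    return "migrated"


def run_or_schedule():
    """Run the coroutine now, or schedule it if a loop is already running."""
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        # No loop in this thread: asyncio.run creates one and blocks until done.
        return asyncio.run(run_async_migrations())
    # Already inside a loop: hand the task back so the caller can await it.
    return loop.create_task(run_async_migrations())


async def called_from_async_code() -> str:
    """Simulates a caller such as an async test that already owns a loop."""
    task = run_or_schedule()
    return await task


if __name__ == "__main__":
    print(run_or_schedule())
    print(asyncio.run(called_from_async_code()))
```

Returning the task matters because a task that nobody references or awaits may not finish before the caller moves on, which for migrations means queries could run against a schema that is not yet up to date.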
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
"""${message} | ||
|
||
Revision ID: ${up_revision} | ||
Revises: ${down_revision | comma,n} | ||
Create Date: ${create_date} | ||
|
||
""" | ||
from typing import Sequence, Union | ||
|
||
from alembic import op | ||
import sqlalchemy as sa | ||
${imports if imports else ""} | ||
|
||
# revision identifiers, used by Alembic. | ||
revision: str = ${repr(up_revision)} | ||
down_revision: Union[str, None] = ${repr(down_revision)} | ||
branch_labels: Union[str, Sequence[str], None] = ${repr(branch_labels)} | ||
depends_on: Union[str, Sequence[str], None] = ${repr(depends_on)} | ||
|
||
|
||
def upgrade() -> None: | ||
    ${upgrades if upgrades else "pass"} | ||
|
||
|
||
def downgrade() -> None: | ||
    ${downgrades if downgrades else "pass"} |
38 changes: 38 additions & 0 deletions
examples/code-llm-sync/alembic/versions/2024_11_04_1147-497e69a2bca9_add_code_files_table.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
"""empty message | ||
|
||
Revision ID: 497e69a2bca9 | ||
Revises: | ||
Create Date: 2024-11-04 11:47:57.345379 | ||
|
||
""" | ||
|
||
from typing import Sequence, Union | ||
|
||
from alembic import op | ||
import sqlalchemy as sa | ||
|
||
|
||
# revision identifiers, used by Alembic. | ||
revision: str = "497e69a2bca9" | ||
down_revision: Union[str, None] = None | ||
branch_labels: Union[str, Sequence[str], None] = None | ||
depends_on: Union[str, Sequence[str], None] = None | ||
|
||
|
||
def upgrade() -> None: | ||
    # ### commands auto generated by Alembic - please adjust! ### | ||
    op.create_table( | ||
        "code_files", | ||
        sa.Column("id", sa.Integer(), nullable=False), | ||
        sa.Column("file_name", sa.String(length=255), nullable=False), | ||
        sa.Column("updated_at", sa.DateTime(timezone=True), nullable=True), | ||
        sa.Column("contents", sa.Text(), nullable=True), | ||
        sa.PrimaryKeyConstraint("id"), | ||
    ) | ||
    # ### end Alembic commands ### | ||
|
||
|
||
def downgrade() -> None: | ||
    # ### commands auto generated by Alembic - please adjust! ### | ||
    op.drop_table("code_files") | ||
    # ### end Alembic commands ### |
54 changes: 54 additions & 0 deletions
...es/code-llm-sync/alembic/versions/2024_11_04_1148-bb79790304f7_add_embeddings_to_table.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,54 @@ | ||
"""Add embeddings to table | ||
|
||
Revision ID: bb79790304f7 | ||
Revises: 497e69a2bca9 | ||
Create Date: 2024-11-04 11:48:35.807278 | ||
|
||
""" | ||
|
||
from typing import Sequence, Union | ||
|
||
from alembic import op | ||
|
||
|
||
# revision identifiers, used by Alembic. | ||
revision: str = "bb79790304f7" | ||
down_revision: Union[str, None] = "497e69a2bca9" | ||
branch_labels: Union[str, Sequence[str], None] = None | ||
depends_on: Union[str, Sequence[str], None] = None | ||
|
||
|
||
def upgrade() -> None: | ||
    # Enable required extensions | ||
    op.execute("CREATE EXTENSION IF NOT EXISTS vector CASCADE;") | ||
    op.execute("CREATE EXTENSION IF NOT EXISTS ai CASCADE;") | ||
|
||
    # Create the vectorizer | ||
    op.execute(""" | ||
        SELECT ai.create_vectorizer( | ||
            'code_files'::regclass, | ||
            destination => 'code_files_embeddings', | ||
            embedding => ai.embedding_openai('text-embedding-3-small', 768), | ||
            chunking => ai.chunking_recursive_character_text_splitter( | ||
                'contents', | ||
                chunk_size => 1000, | ||
                chunk_overlap => 200 | ||
            ), | ||
            formatting => ai.formatting_python_template( | ||
                'File: $file_name\n\nContents:\n$chunk' | ||
            ) | ||
        ); | ||
    """) | ||
Comment on lines +28 to +41: This is how I am creating a vectorizer via migrations. I guess it works, but it's not as pretty as the native Alembic functions. It's probably possible to have nice wrappers around those somehow. |
||
|
||
|
||
def downgrade() -> None: | ||
    # Drop the vectorizer | ||
    op.execute(""" | ||
        SELECT ai.drop_vectorizer( | ||
            (SELECT id FROM ai.vectorizer WHERE target_table = 'code_files_embeddings_store') | ||
        ); | ||
    """) | ||
|
||
    # Drop the created views and tables | ||
    op.execute("DROP VIEW IF EXISTS code_files_embeddings;") | ||
    op.execute("DROP TABLE IF EXISTS code_files_embeddings_store;") |
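One possible shape for the "nice wrappers" mentioned in the review comment on this migration is a small function that renders the `ai.create_vectorizer` call as SQL text. This is a sketch only: `create_vectorizer_sql` and `preview_chunk` are hypothetical helpers, not part of pgai or Alembic, and `preview_chunk` assumes that pgai fills `$file_name` and `$chunk` the way Python's `string.Template` does, as the name `formatting_python_template` suggests.

```python
from string import Template


def create_vectorizer_sql(
    source_table: str,
    destination: str,
    model: str,
    dimensions: int,
    chunk_column: str,
    template: str,
    chunk_size: int = 1000,
    chunk_overlap: int = 200,
) -> str:
    """Hypothetical helper: render the ai.create_vectorizer call as SQL text.

    Values are interpolated directly, so pass only trusted, hard-coded
    values without single quotes, as is usual in migration scripts.
    """
    return (
        "SELECT ai.create_vectorizer(\n"
        f"    '{source_table}'::regclass,\n"
        f"    destination => '{destination}',\n"
        f"    embedding => ai.embedding_openai('{model}', {dimensions}),\n"
        "    chunking => ai.chunking_recursive_character_text_splitter(\n"
        f"        '{chunk_column}',\n"
        f"        chunk_size => {chunk_size},\n"
        f"        chunk_overlap => {chunk_overlap}\n"
        "    ),\n"
        f"    formatting => ai.formatting_python_template('{template}')\n"
        ");"
    )


def preview_chunk(template: str, file_name: str, chunk: str) -> str:
    """Hypothetical helper: show the text that would be embedded for one chunk."""
    return Template(template).substitute(file_name=file_name, chunk=chunk)


if __name__ == "__main__":
    template = "File: $file_name\n\nContents:\n$chunk"
    print(create_vectorizer_sql(
        "code_files", "code_files_embeddings",
        "text-embedding-3-small", 768, "contents", template,
    ))
    print(preview_chunk(template, "main.py", "print('hi')"))
```

In a migration, the rendered string would be passed to `op.execute(...)` exactly like the hand-written SQL. A fuller integration might register custom Alembic operations instead, but that goes beyond this sketch.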
This file was generated with `alembic revision --autogenerate`, which compares the database to the SQLAlchemy models and generates a naive diff. This works quite well for adding new tables, but of course it doesn't work for our views.
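A related concern with autogenerate is that objects created by the vectorizer exist in the database but not in the SQLAlchemy models, so a later autogenerate run may propose dropping them. Alembic's `include_object` hook can filter such objects out. The sketch below assumes the store table is named `code_files_embeddings_store`, as in the downgrade step of the second migration; the `EXCLUDED_TABLES` set is a hypothetical name.

```python
# Tables managed by the vectorizer rather than by the SQLAlchemy models.
EXCLUDED_TABLES = {"code_files_embeddings_store"}


def include_object(object, name, type_, reflected, compare_to):
    """Tell autogenerate to skip tables it should not manage.

    The parameter list follows Alembic's include_object hook; returning
    False removes the object from the comparison.
    """
    if type_ == "table" and name in EXCLUDED_TABLES:
        return False
    return True


if __name__ == "__main__":
    print(include_object(None, "code_files_embeddings_store", "table", True, None))
    print(include_object(None, "code_files", "table", True, None))
```

To take effect, the function would be passed as `include_object=include_object` to `context.configure(...)` in `env.py`.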