Epic: Embedded Python VM, Plugins and UDFs #25537

Open
pauldix opened this issue Nov 12, 2024 · 0 comments

This is an umbrella issue for the work involved in adding a Python VM to the database. It will require changes to the API, the CLI, and the internals. Users should have an easy way to define Python-based plugins that run inside the database and can receive data, process it, interact with third-party APIs and services, and send data back into the database. Ideally, the runtime would be able to import and use libraries from the broader Python ecosystem.

This issue is by no means exhaustive, but it can serve as a jumping-off point for further refinement and detail.

Here are the contexts in which we'd want plugins to run:

  1. On write (or rather, on WAL flush: send the flushed data to the VM)
  2. On Parquet persist (when persistence is triggered, we'll want to either persist and then run the persisted data through the plugin, or run through the plugin and persist the output)
  3. On a schedule (like cron)
  4. Ad-hoc (submitted like a query)
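To make the four contexts concrete, here is a minimal sketch of how plugins might be registered against a trigger type and invoked when that trigger fires. Everything in it (`Trigger`, `PluginRegistry`, the payload shape) is a hypothetical name chosen for illustration, not an existing or planned API.

```python
from enum import Enum
from typing import Any, Callable, Dict, List


class Trigger(Enum):
    """Hypothetical names for the four run contexts listed above."""
    WAL_FLUSH = "wal_flush"
    PARQUET_PERSIST = "parquet_persist"
    SCHEDULE = "schedule"
    AD_HOC = "ad_hoc"


class PluginRegistry:
    """Maps each trigger to the plugin callables registered for it."""

    def __init__(self) -> None:
        self._handlers: Dict[Trigger, List[Callable[[Any], Any]]] = {
            t: [] for t in Trigger
        }

    def register(self, trigger: Trigger, handler: Callable[[Any], Any]) -> None:
        self._handlers[trigger].append(handler)

    def fire(self, trigger: Trigger, payload: Any) -> List[Any]:
        # Run every plugin registered for this trigger, in registration order.
        return [handler(payload) for handler in self._handlers[trigger]]


registry = PluginRegistry()
# A toy plugin that counts the rows in a flushed batch (payload shape is assumed).
registry.register(Trigger.WAL_FLUSH, lambda rows: len(rows))
results = registry.fire(Trigger.WAL_FLUSH, [{"cpu": 0.4}, {"cpu": 0.9}])
```

A real implementation would also need to decide how errors in one plugin affect the others, and whether the persist trigger runs before or after the data is written.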

Within the plugin context, the Python script should have some API automatically imported that allows it to make queries to the database, or write data out to the database. We'll also want to have an in-memory key/value store accessible by the script. Each script should have its own sandboxed store.
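As a rough sketch of what that could look like from the script's side, the example below gives each plugin a context object with `query`, `write`, and a private key/value store. The class name, method names, and the callables passed in are all assumptions made for illustration; the actual API is still to be designed.

```python
from typing import Any, Callable, Dict, List


class PluginContext:
    """Hypothetical per-plugin handle exposing query, write, and a private store."""

    def __init__(
        self,
        plugin_name: str,
        query_fn: Callable[[str], List[dict]],
        write_fn: Callable[[str], None],
        stores: Dict[str, Dict[str, Any]],
    ) -> None:
        self.plugin_name = plugin_name
        self._query_fn = query_fn
        self._write_fn = write_fn
        # Each plugin only ever sees the dict stored under its own name.
        self.store: Dict[str, Any] = stores.setdefault(plugin_name, {})

    def query(self, sql: str) -> List[dict]:
        return self._query_fn(sql)

    def write(self, line_protocol: str) -> None:
        self._write_fn(line_protocol)


# Stand-ins for the database side, used only to exercise the sketch.
written: List[str] = []
all_stores: Dict[str, Dict[str, Any]] = {}
fake_query = lambda sql: [{"host": "a", "usage": 0.93}]

ctx_a = PluginContext("alerts", fake_query, written.append, all_stores)
ctx_b = PluginContext("rollups", fake_query, written.append, all_stores)

ctx_a.store["last_alert"] = "a"
for row in ctx_a.query("SELECT host, usage FROM cpu"):
    ctx_a.write(f"alerts,host={row['host']} usage={row['usage']}")
```

Here `ctx_b.store` stays empty even though `ctx_a` has written to its own store, which is the isolation property described above.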

We'll need to have an API and CLI for submitting new Python scripts to the DB. We'll also want to collect logs from the scripts and make those accessible via system tables in the query API.
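One way the log collection might work is to hand each script a standard `logging` logger whose handler appends rows to an in-memory buffer that a system table could later expose. The sketch below assumes that approach; `TableLogHandler`, `make_plugin_logger`, and the row fields are hypothetical.

```python
import logging
from typing import Any, Dict, List


class TableLogHandler(logging.Handler):
    """Appends each log record as a row, ready to back a system table."""

    def __init__(self, plugin_name: str, rows: List[Dict[str, Any]]) -> None:
        super().__init__()
        self.plugin_name = plugin_name
        self.rows = rows

    def emit(self, record: logging.LogRecord) -> None:
        self.rows.append(
            {
                "plugin": self.plugin_name,
                "level": record.levelname,
                "message": record.getMessage(),
                "time": record.created,  # seconds since the epoch
            }
        )


def make_plugin_logger(plugin_name: str, rows: List[Dict[str, Any]]) -> logging.Logger:
    logger = logging.getLogger(f"plugin.{plugin_name}")
    logger.handlers.clear()   # avoid duplicate rows if called twice for one plugin
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep plugin output out of the host's own logs
    logger.addHandler(TableLogHandler(plugin_name, rows))
    return logger


log_rows: List[Dict[str, Any]] = []
log = make_plugin_logger("kafka_consumer", log_rows)
log.info("consumed %d messages", 42)
log.debug("this is below the INFO level and is not recorded")
```

The rows in `log_rows` could then be surfaced through the query API, filtered by plugin name or level.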

We'll need to have a method for storing and accessing secrets in plugins for connecting to services.

We ultimately want these scripts to be user-defined plugins or functions. We'll run a service (like crates.io) for hosting these and will want a method to quickly and easily bring plugins or functions from that service into the database.

Some plugin ideas:

  1. Consume data from Kafka to write into the DB
  2. Collect system metrics to write into the DB
  3. Monitor and alert for specific conditions (threshold values, deadman alerts, etc.)
  4. Write to an Iceberg Catalog (e.g. on Parquet persist)
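For the monitoring idea, the core checks are small enough to sketch directly. The two functions below are illustrative only: the function names, the `(series, value)` input shape, and the use of plain epoch seconds are assumptions, and how a plugin would receive this data or deliver the alert is left open.

```python
from typing import Dict, Iterable, List, Tuple


def threshold_alerts(
    points: Iterable[Tuple[str, float]], limit: float
) -> List[Tuple[str, float]]:
    """Return the (series, value) pairs whose value exceeds the limit."""
    return [(series, value) for series, value in points if value > limit]


def deadman_alerts(
    last_seen: Dict[str, float], now: float, timeout_s: float
) -> List[str]:
    """Return the series that have not reported within timeout_s seconds."""
    return sorted(
        series for series, ts in last_seen.items() if now - ts > timeout_s
    )


hot = threshold_alerts([("cpu,host=a", 0.97), ("cpu,host=b", 0.41)], limit=0.9)
silent = deadman_alerts({"cpu": 100.0, "mem": 40.0}, now=110.0, timeout_s=30.0)
```

With the sample inputs, `hot` contains only the `host=a` point and `silent` contains only `"mem"`. A deadman check like this fits the scheduled context, while the threshold check could run on WAL flush.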

We should create these plugins ourselves to test drive the developer experience of plugin creation, operation, and debugging.

@pauldix pauldix changed the title Epic: Plugins and UDFs Epic: Embedded Pythong VM, Plugins and UDFs Nov 12, 2024
@philjb philjb changed the title Epic: Embedded Pythong VM, Plugins and UDFs Epic: Embedded Python VM, Plugins and UDFs Nov 12, 2024