docs(python): Add IO plugins to Python API reference #21028

Merged: 6 commits, merged Jan 31, 2025
2 changes: 1 addition & 1 deletion docs/source/user-guide/plugins/index.md
@@ -1,6 +1,6 @@
# Plugins

Polars allows you to extend it's functionality with either Expression plugins or IO plugins.
Polars allows you to extend its functionality with either Expression plugins or IO plugins.

- [Expression plugins](./expr_plugins.md)
- [IO plugins](./io_plugins.md)
10 changes: 5 additions & 5 deletions docs/source/user-guide/plugins/io_plugins.md
@@ -1,10 +1,10 @@
# IO Plugins

Besides [expression plugins](./index.md), we also support IO plugins. These allow you to register
different file formats as sources to the Polars engines. Because sources can move data zero copy via
Arrow FFI and sources can produce large chunks of data before returning, we've decided to interface
to IO plugins via Python for now, as we don't think the short time the GIL is needed should lead to
any contention.
Besides [expression plugins](./expr_plugins.md), we also support IO plugins. These allow you to
register different file formats as sources to the Polars engines. Because sources can move data zero
copy via Arrow FFI and sources can produce large chunks of data before returning, we've decided to
interface to IO plugins via Python for now, as we don't think the short time the GIL is needed
should lead to any contention.

For example, an IO source can read its DataFrames in Rust and move the data zero-copy only at the
rendezvous, so the GIL is needed for only a short time.
48 changes: 40 additions & 8 deletions py-polars/docs/source/reference/plugins.rst
@@ -3,13 +3,45 @@ Plugins
=======
.. currentmodule:: polars

Plugins allow for extending Polars' functionality. See the
`user guide <https://docs.pola.rs/user-guide/plugins/>`_ for more information
and resources.
Polars allows you to extend its functionality with either Expression plugins or IO plugins.
See the `user guide <https://docs.pola.rs/user-guide/plugins/>`_ for more information and resources.

Available plugin utility functions are:
Expression plugins
------------------

.. automodule:: polars.plugins
    :members:
    :autosummary:
    :autosummary-no-titles:
Expression plugins are the preferred way to create user-defined functions. They allow you to compile
a Rust function and register it as an expression in the Polars library. The Polars engine will
dynamically link your function at runtime, and your expression will run almost as fast as native
expressions. Note that this works without any involvement from Python, and thus without GIL contention.

See the `expression plugins section of the user guide <https://docs.pola.rs/user-guide/plugins/expr_plugins/>`_
for more information.

.. autosummary::
    :toctree: api/

    plugins.register_plugin_function


IO plugins
------------------

IO plugins allow you to register different file formats as sources to the Polars engines.

See the `IO plugins section of the user guide <https://docs.pola.rs/user-guide/plugins/io_plugins/>`_
for more information.

.. note::

    The ``io.plugins`` module is not imported by default in order to optimise import speed of
    the primary ``polars`` module. Either import ``polars.io.plugins`` and *then* use that
    namespace, or import ``register_io_source`` from the full module path, e.g.:

    .. code-block:: python

        from polars.io.plugins import register_io_source

.. autosummary::
    :toctree: api/

    io.plugins.register_io_source
14 changes: 13 additions & 1 deletion py-polars/polars/io/plugins.py
@@ -26,6 +26,14 @@ def register_io_source(
"""
Register your IO plugin and initialize a LazyFrame.

See the `user guide <https://docs.pola.rs/user-guide/plugins/io_plugins>`_
for more information about plugins.

.. warning::
    This functionality is considered **unstable**. It may be changed
    at any point without it being considered a breaking change.


Parameters
----------
callable
@@ -36,17 +44,21 @@ def register_io_source(
    predicate
        Polars expression. The reader must filter
        their rows accordingly.
    n_rows:
    n_rows
        Materialize only n rows from the source.
        The reader can stop when `n_rows` are read.
    batch_size
        A hint of the ideal batch size the reader's
        generator must produce.

    The function should return a DataFrame batch
    (an iterator over individual DataFrames).
schema
    Schema that the reader will produce before projection pushdown.

Returns
-------
LazyFrame
"""

def wrap(
2 changes: 1 addition & 1 deletion py-polars/polars/plugins.py
@@ -36,7 +36,7 @@ def register_plugin_function(
"""
Register a plugin function.
See the `user guide <https://docs.pola.rs/user-guide/plugins/>`_
See the `user guide <https://docs.pola.rs/user-guide/plugins/expr_plugins>`_
for more information about plugins.
Parameters