
feat(datasets): Created table_args to pass to create_table, create_view, and table methods #909

Open · wants to merge 8 commits into base: main
Conversation

@mark-druffel commented Oct 25, 2024

Description

Development notes

Checklist

  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the relevant RELEASE.md file
  • Added tests to cover my changes
  • Received approvals from at least half of the TSC (required for adding a new, non-experimental dataset)

@mark-druffel mark-druffel changed the title Created table_args to pass to create_table, create_view, and table methods Fix(datasets): Created table_args to pass to create_table, create_view, and table methods Oct 25, 2024
@mark-druffel mark-druffel changed the title Fix(datasets): Created table_args to pass to create_table, create_view, and table methods fix(datasets): Created table_args to pass to create_table, create_view, and table methods Oct 25, 2024
@deepyaman (Member) left a comment:
Just leaving initial comments; happy to review later once it's ready.

kedro-datasets/RELEASE.md (outdated, resolved)

```diff
 def save(self, data: ir.Table) -> None:
     if self._table_name is None:
         raise DatasetError("Must provide `table_name` for materialization.")

     writer = getattr(self.connection, f"create_{self._materialized}")
-    writer(self._table_name, data, **self._save_args)
+    writer(self._table_name, data, **self._table_args)
```
@deepyaman (Member) commented:
Is this right? I think the table args should only apply to the table call, but haven't looked into it deeply before commenting now.

@mark-druffel (Author) replied Oct 28, 2024:

@deepyaman Sorry this is a little confusing so just adding a bit more context.

This PR

The table method takes the database argument, but the create_table and create_view methods both take database and overwrite arguments. The overwrite argument is already in save_args, and I'm assuming save_args will be removed from TableDataset in version 6. To avoid breaking changes, while also minimizing churn between this release and version 6, I just added the new parameter (database) to table_args and left the old parameters alone.

To avoid breaking changes but still allow create_table and create_view arguments to flow through, I combined _save_args and _table_args here.
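As a minimal sketch of that combination (assuming table_args wins on key collisions, which the diff doesn't confirm):

```python
# Sketch: merge the legacy save_args (which still carries `overwrite`)
# with the new table_args (which carries `database`) before calling the
# ibis writer. table_args taking precedence is an assumption.
save_args = {"overwrite": True}
table_args = {"database": "spotify.silver"}

writer_kwargs = {**save_args, **table_args}
# writer_kwargs now holds both, and would be splatted into
# create_table / create_view as **writer_kwargs.
print(writer_kwargs)
```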

Version 6

I am assuming that save_args and load_args will be dropped from TableDataset in version 6. When that happens, I'd expect the arguments still used from load_args and save_args to be moved into table_args. To make TableDataset and FileDataset look and feel similar, we could consider adding a commensurate file_args. I haven't used 5.1 enough yet to say with certainty, but I can't think of a reason a user would want different values in load_args than in save_args now that file handling is split from TableDataset (i.e. the filepath, file_type, sep, etc. would be the same for load and save). I may be totally overlooking some things though 🤷‍♂️

bronze_tracks:
  type: ibis.FileDataset # use `to_<file_format>` (write) & `read_<file_format>` (read)
  connection:
    backend: pyspark
  file_args:
    filepath: hf://datasets/maharshipandya/spotify-tracks-dataset/dataset.csv
    file_format: csv
    materialized: view
    overwrite: True
    table_name: tracks #`to_<file_format>` in ibis has no database parameter so there's no ability to write to a specific catalog / db schema atm, `to_<file_format>` just writes to w/e is active
    sep: "," 

silver_tracks:
  type: ibis.TableDataset # would use `create_<materialized>` (write) & `table` (read)
  connection:
    backend: pyspark
  table_args:
    name: tracks
    database: spotify.silver
    overwrite: True
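Reading the silver_tracks config above, here is a hypothetical sketch of how a single table_args dict could drive both the `table` call (read) and the `create_<materialized>` call (write). The method names (`table`, `create_table`, `create_view`) follow the ibis backend API; the class itself is illustrative, not the PR's implementation:

```python
# Hypothetical sketch (not the PR's actual code): one table_args dict
# feeds both read and write paths of a TableDataset-like class.
class TableDatasetSketch:
    def __init__(self, connection, materialized="table", table_args=None):
        self._connection = connection
        self._materialized = materialized          # "table" or "view"
        self._table_args = dict(table_args or {})  # e.g. name, database, overwrite

    def load(self):
        # The `table` method only takes the name/database subset of table_args.
        return self._connection.table(
            self._table_args["name"],
            database=self._table_args.get("database"),
        )

    def save(self, data):
        # create_table / create_view additionally accept `overwrite`.
        writer = getattr(self._connection, f"create_{self._materialized}")
        kwargs = dict(self._table_args)
        name = kwargs.pop("name")
        writer(name, data, **kwargs)
```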

@mark-druffel mark-druffel changed the title fix(datasets): Created table_args to pass to create_table, create_view, and table methods feat(datasets): Created table_args to pass to create_table, create_view, and table methods Oct 28, 2024
@mark-druffel mark-druffel deleted the fix/datasets/ibis-TableDataset branch October 28, 2024 19:39
@deepyaman deepyaman reopened this Oct 28, 2024
@mark-druffel mark-druffel marked this pull request as ready for review November 1, 2024 22:37
@mark-druffel (Author) commented:

@deepyaman I changed this to ready for review, but I'm failing a bunch of steps. I tried to follow the guidelines, but when I run the make tests they all fail saying No rule. Any chance you can take a look and give me a bit of guidance? Sorry just not sure where to go from here 😬

[screenshot of the failing checks]

Aside from the failing checks, I tested this version of table_dataset.py on a duckdb pipeline, a pyspark pipeline, and a pyspark pipeline on Databricks, and it seems to be working. My only open question is my musing above about the expected shape of TableDataset and FileDataset.

@mark-druffel (Author) commented:

@jakepenzak For visibility
