-
-
Notifications
You must be signed in to change notification settings - Fork 338
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ability to export tables #4090
base: develop
Are you sure you want to change the base?
Conversation
@ghislaineguerin @kgodey I've assigned you to this PR to get a sign off on the UX. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The overall method seems reasonable. Nice work on that!
However, I think most of the SQL code and a large part of the Python code could be removed if you can get ServerCursor.fetchmany
working (or just use the builtin itersize
param and loop through rows one at a time clientside).
If there's a really good reason that doesn't work, what is it?
Also, please just pull apart the msar.list_records_from_table
result to get the actual records to return. I don't want to have to maintain that formatted query in multiple places if we can avoid it. Alternatively, you could extract the CTE-generating code and call it from both the record lister and the iterable record lister SQL functions. I prefer the former approach, since it prioritizes the efficiency of the hot path, i.e., listing records for display normally.
while True: | ||
cursor.execute(fetch_query) | ||
rows = cursor.fetchall() | ||
if not rows: | ||
break | ||
yield rows |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use the fetchmany
functionality of the psycopg.ServerCursor
class? This seems like it's reinventing the wheel.
See the docs for ServerCursor.
I think it has all the pieces you need, and it'd be more convenient than rolling our own if it works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- We need a SQL call to build the select query which accepts filters and sorting in the exact format as
list
does. - We need a SQL call to fetch the names of the columns (as the existing functions rely on oids).
- We need a server side cursor for the records.
If we use ServerCursor
class for the records, we'd need to make additional calls for build_select_query
and fetch_selectable_column_names
. We'd also need to have additional logic to ensure that the column names are in the same order as the columns in the records result.
It's much simpler to fetch everything via a single plpgsql function.
'SELECT %1$s FROM %2$I.%3$I %7$s %6$s LIMIT %4$L OFFSET %5$L', | ||
COALESCE(columns_sql, 'NULL'), | ||
msar.get_relation_schema_name(tab_id), | ||
msar.get_relation_name(tab_id), | ||
limit_, | ||
offset_, | ||
msar.build_order_by_expr(tab_id, order_), | ||
msar.build_where_clause(tab_id, filter_) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code replication is avoidable by just calling the record lister, then pulling apart the result. Given that this is not exactly a "Hot Path", I think the very minor inefficiency involved is well worth the increase in maintainability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code replication is avoidable by just calling the record lister, then pulling apart the result.
The record lister queries all the records and stores it into a single json, so calling that function means that we cannot use a cursor to iterate through the results. That would make the entire process inefficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regardless, maintaining that SQL construction in multiple different functions isn't a good idea. We'll need to find a better way.
Update for anyone following the PR: @mathemancer and I had a call discussing the approach to take and we'll be continuing the discussion after the holidays. This PR might remain idle until then. |
Fixes #4067
This PR implements downloads for tables using the following approach:
Querying data:
Transferring:
Approaches tested
Querying data:
COPY (SELECT * FROM table) TO
instead ofCOPY table TO
(since we need filtering & sorting and support for explorations in the future). This syntax has negligible performance difference compared to reading from a cursor.COPY
is not supported inside a db function.Transferring:
What is not implemented in this PR:
UX changes from the design
Screen.Recording.2024-12-18.at.7.32.06.PM.mov
Checklist
Update index.md
).develop
branch of the repositoryvisible errors.
Developer Certificate of Origin
Developer Certificate of Origin