Avoid normalising identifiers #30

Open
wants to merge 3 commits into
base: main


Dietr1ch

Normalising identifiers for column and table names leads to surprising behaviour (#29, apache/datafusion#13649), and while there are workarounds (quoting identifiers, renaming the data), they are hard to discover for people who are just starting to use bdt.

I think that a tool shouldn't have surprising behaviour like this.

The tracked `ahash` no longer builds in nightly as some SIMD features were
dropped.
This is awfully surprising to new users, and I think it's a bad thing to do even
if people more familiar with SQL DB engines won't be too surprised about it.
@Dietr1ch
Author

This is a breaking change, but given the nature of the tool it's an improvement too, as this normalisation prevents querying non-snake_case tables and fields coming from imported files (CSV, Parquet, etc.).
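
To make the failure mode concrete, here is a minimal, self-contained sketch of the behaviour described above. It is not bdt's or DataFusion's actual code: `resolve_ident`, `find_column`, and the column name `SepalLength` are hypothetical names chosen for illustration, and the sketch assumes that normalisation means lowercasing unquoted identifiers while double-quoted ones are taken verbatim.

```rust
/// Hypothetical helper (not part of bdt or DataFusion): mimics how a SQL
/// planner might resolve an identifier before looking up a column.
fn resolve_ident(raw: &str, normalize: bool) -> String {
    // A double-quoted identifier is taken verbatim, with the quotes stripped.
    // Escaped quotes inside the identifier are ignored in this sketch.
    if raw.len() >= 2 && raw.starts_with('"') && raw.ends_with('"') {
        return raw[1..raw.len() - 1].to_string();
    }
    // An unquoted identifier is lowercased only when normalisation is enabled.
    if normalize {
        raw.to_lowercase()
    } else {
        raw.to_string()
    }
}

/// Looks up a column by exact, case-sensitive name, as a file schema would.
fn find_column<'a>(columns: &[&'a str], raw: &str, normalize: bool) -> Option<&'a str> {
    let wanted = resolve_ident(raw, normalize);
    columns.iter().copied().find(|c| *c == wanted)
}
```

With normalisation on, `find_column(&["SepalLength"], "SepalLength", true)` returns `None`, because the lookup is done with `sepallength`. Quoting the identifier, or passing `false` for `normalize`, finds the column. Under these assumptions, that is the difference between the current behaviour, the quoting workaround, and what this PR proposes.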

This is also untested, as I didn't find existing tests to mimic. I did try a local build and it runs as expected. I am now able to query the sample Parquet files for the Iris data from https://www.tablab.app/parquet/sample, which is the kind of thing that bdt should help with but couldn't because of normalisation (which is "expected" in the true DB/SQL world, but not in the real world of data).
