"native" table semantics are confusing #2778
Comments
I agree in theory: I think our storage engine integration could be more clear, and we've colloquially been using the word "native" (because it's the name of the module in the code, mostly). I just edited the word native out of the docs page, because I think it ends up being confusing. So if the desired outcome of the issue is "change the language we use to talk about this feature," then I think we can make this change. There are other layers here too:
That seems like a bigger effort, and one that we should undertake (and are starting to work on even this week), but I'm not sure that this issue is the best place to plan and track that work.
I think it's this, and also renaming the "native" modules to "delta" in the code to reflect it. I'm not suggesting any new features or additional storage engine support here, just a rename to provide more transparency around our usage of delta as a storage engine.
There's a |
I don't think there would need to be any functional changes, but on further inspection, I think it may be easier to do the rename after #2744. That should put us in a much better spot for moving around datasource logic.
Would it be reasonable to have this issue be "combine native datasource into delta, and rebrand as 'default'", or something along those lines? As written now, this isn't particularly actionable, and it seems like we've agreed on a solution.
There's also #2053, which is related to this (potentially) larger effort.
Description
I think it's quite confusing that "delta" tables are referred to as both "native" tables and "delta" tables. If anything, "native" should mean arrow, as it's what we actually use natively. "delta" doesn't have native support for a lot of data types, and that causes a lot of issues with things that should just work (such as #2777).
I'm not suggesting we change the implementation details, just the semantics around what it means to be "native". To me, "native" implies that everything should work out of the box, and it should support all datatypes that are supported in the query engine. Delta does neither of these, and we have to jump through a lot of hoops to get it to work in many cases. Our entire engine is built around the arrow data model, so "arrow" is conceptually our native format.
"delta" is our default storage format, but not the native one.
I propose we rename the "native" table formats to simply "delta", and then specify that "delta" is the default storage engine. Rephrasing this from "native" to "default" implies that delta is not the native engine, just the default due to its popularity over alternatives.
I also think this'll make things conceptually a lot clearer when we add deeper support for alternative storage engines (lance, iceberg).
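To make the proposed distinction concrete, here is a minimal sketch of the naming only. It is not taken from the codebase: `StorageEngine`, `NATIVE_DATA_MODEL`, `DEFAULT_STORAGE_ENGINE`, `LEGACY_ALIASES`, and `resolve_storage_engine` are all hypothetical names, and keeping "native" as a legacy alias for "delta" is an assumption about how a rename might stay backwards compatible.

```python
from enum import Enum
from typing import Optional


class StorageEngine(Enum):
    """Hypothetical set of storage engines; only delta exists today per the issue."""
    DELTA = "delta"
    LANCE = "lance"      # possible future alternative mentioned in the issue
    ICEBERG = "iceberg"  # possible future alternative mentioned in the issue


# The engine is built around the arrow data model, so "native" would mean arrow.
NATIVE_DATA_MODEL = "arrow"

# Delta is the default place tables are stored, not the native format.
DEFAULT_STORAGE_ENGINE = StorageEngine.DELTA

# Assumed compatibility shim: the old "native" name keeps resolving to delta.
LEGACY_ALIASES = {"native": StorageEngine.DELTA}


def resolve_storage_engine(name: Optional[str] = None) -> StorageEngine:
    """Map a user-facing engine name to a StorageEngine, falling back to the default."""
    if name is None:
        return DEFAULT_STORAGE_ENGINE
    key = name.strip().lower()
    if key in LEGACY_ALIASES:
        return LEGACY_ALIASES[key]
    return StorageEngine(key)  # raises ValueError for unknown names
```

With this shape, "default" is a single constant that could later point at a different engine, while "native" stays reserved for the data model.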