"native" table semantics are confusing #2778

Open
universalmind303 opened this issue Mar 13, 2024 · 6 comments
Comments

@universalmind303
Contributor

Description

I think it's quite confusing that "delta" tables are referred to as both "native" tables and "delta" tables. If anything, "native" should mean arrow, as it's what we actually use natively. "delta" doesn't have native support for a lot of datatypes, and that causes a lot of issues with things that should just work (such as #2777).

I'm not suggesting we change the implementation details, just the semantics around what it means to be "native". To me, "native" implies that everything should work out of the box, and it should support all datatypes that are supported in the query engine. Delta does neither of these, and we have to jump through a lot of hoops to get it to work in many cases. Our entire engine is built around the arrow data model, so "arrow" is conceptually our native format.

"delta" is our default storage format, but not our native one.

I propose we rename the "native" table formats to simply "delta", and then specify that "delta" is the default storage engine. Rephrasing this from "native" to "default" implies that delta is not the native engine, just the default due to its popularity over alternatives.

I also think this'll make things conceptually a lot clearer when we add deeper support for alternative storage engines (lance, iceberg).

universalmind303 added the feat (New feature or request) and question ❔ labels, then removed the feat (New feature or request) label, Mar 13, 2024
@tychoish
Contributor

tychoish commented Mar 13, 2024

I agree in theory: I think our storage engine integration could be clearer, and we've colloquially been using the word "native" (mostly because it's the name of the module in the code). I just edited the word native out of the docs page, because I think it ends up being confusing.

So if the desired outcome of the issue is "change the language we use to talk about this feature," then I think we can make this change.

There are other layers here too:

  • getting first-class support for other storage engines (updates, inserts, etc.)
  • making the delta data source feature-compatible with the default storage engine (inserts, vacuum, updates, etc.)
  • making "create table" (particularly in cloud) support other table formats.

That seems like a bigger effort, and one that we should undertake (and are starting to work on, even this week), but I'm not sure that this issue is the best place to plan and track that work.

@universalmind303
Contributor Author

> So if the desired outcome of the issue is "change the language we use to talk about this feature," then I think we can make this change.

I think it's this and also to rename the "native" modules to "delta" in the code to reflect this. I'm not suggesting any features or adding additional storage engine support here, just a rename to provide more transparency around our usage of delta as a storage engine.

@tychoish
Contributor

> I think it's this and also to rename the "native" modules to "delta" in the code to reflect this...

There's a delta module and a native module already, so that's probably not quite a straight rename operation. You've touched this code more recently than anyone, though: is there anything that's cloud-specific about the "native" implementation, or other handling of "external" delta tables, that our native storage couldn't handle?

@universalmind303
Contributor Author

I don't think there would need to be any functional changes, but on further inspection, I think it may be easier to do the rename after #2744. That should put us in a much better spot for moving around datasource logic.

@tychoish
Contributor

Would it be reasonable to retitle this issue "combine native datasource into delta, and rebrand as 'default'" or something along those lines? As written now, this isn't particularly actionable, and it seems like we've agreed on a solution.

@tychoish
Contributor

There's also #2053, which is related to this (potentially) larger effort.
