Type Coercion fails for List with inner type struct which has large/view types #14154
Comments
I tested this in a fork of the deltalake repo but could not reproduce the error:
@kosiew you are testing against an older version of deltalake, from what I can see in the commit: https://github.com/delta-io/delta-rs/blob/d8080b13f5724aa09fb268b17f507dfd8559255f/python/pyproject.toml In that commit it was DataFusion v43; we are now at DataFusion v44.
Good catch @ion-elgreco ☝! I investigated the delta-rs repo because I had earlier tested the coercion in DataFusion v44 but could not trigger the error:
Given the description on this PR it appears it is a regression (something that used to work, but now does not). Is this the case? I'll put it on the list of things to investigate before the 45 release.
@alamb I'll have to double check, but I think it worked in DF43
@alamb yes this indeed worked fine on DF43. I tested it against deltalake v0.23.3
Possibly related:
So the next step for this PR is to find a DataFusion-only reproducer that works in DF 43 but not in DF 44. I will try to do so tomorrow.
I tried to make a DataFusion-only reproducer, but it turns out I can't create LargeLists of structs via SQL 🤔 I'll have to think about how to do so a bit more tomorrow...
Well, I found another bug while working on this one. I still haven't found a reproducer for this one, but I have another idea.
Ok, I have a DataFusion-only reproducer:
create or replace table t as values
(
100, -- column1 int (so the case isn't constant folded)
[{ 'foo': arrow_cast('baz', 'Utf8View') }], -- column2 has List of Struct w/ Utf8View
[{ 'foo': 'bar' }], -- column3 has List of Struct w/ Utf8
[{ 'foo': 'blarg' }] -- column4 has List of Struct w/ Utf8
);
SELECT column2, column3, column4 FROM t;
SELECT
case
when column1 > 0 then column2
when column1 < 0 then column3
else column4
end
FROM t;
Fails with
It works with
> SELECT
case
when column1 > 0 then column2
when column1 < 0 then column3
else column4
end
FROM t;
+-----------------------------------------------------------------------------------------------------------+
| CASE WHEN t.column1 > Int64(0) THEN t.column2 WHEN t.column1 < Int64(0) THEN t.column3 ELSE t.column4 END |
+-----------------------------------------------------------------------------------------------------------+
| [{foo: baz}] |
+-----------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
This works fine with only structs, just not with structs in lists 🤔
I think this PR fixes the issue (it is related, but not the same as #14383)
Describe the bug
A LargeList(Struct({"foo": LargeUtf8})) cannot be coerced to List(Struct({"foo": Utf8})). However, it works fine for LargeList(LargeUtf8) -> List(Utf8) and Struct({"foo": LargeUtf8}) -> Struct({"foo": Utf8}).
To Reproduce
Expected behavior
Be able to coerce Large/view and normal arrow types in deeply nested types.
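To make "coerce in deeply nested types" concrete, here is a minimal toy model in Python of what recursing through lists and structs could look like. It is only a sketch and not DataFusion's or Arrow's implementation: the classes Scalar, ListOf, Struct, the STRING_RANK preference order, and the coerce function are all hypothetical names invented for illustration, and the choice to widen to the "larger" string type is an assumption.

```python
from dataclasses import dataclass
from typing import Tuple

# Toy stand-ins for Arrow data types (hypothetical, NOT the arrow/DataFusion API).
@dataclass(frozen=True)
class Scalar:
    name: str  # e.g. "Utf8", "LargeUtf8", "Utf8View", "Int64"

@dataclass(frozen=True)
class ListOf:
    item: object
    large: bool = False  # True models LargeList

@dataclass(frozen=True)
class Struct:
    fields: Tuple[Tuple[str, object], ...]  # ordered (name, type) pairs

# Assumed preference order among string types; the real rules may differ.
STRING_RANK = {"Utf8": 0, "LargeUtf8": 1, "Utf8View": 2}

def coerce(lhs, rhs):
    """Return a common type for lhs and rhs, or None if none exists."""
    if isinstance(lhs, Scalar) and isinstance(rhs, Scalar):
        if lhs == rhs:
            return lhs
        if lhs.name in STRING_RANK and rhs.name in STRING_RANK:
            # Pick the higher-ranked string type under the assumed order.
            return max(lhs, rhs, key=lambda t: STRING_RANK[t.name])
        return None
    if isinstance(lhs, ListOf) and isinstance(rhs, ListOf):
        # Recurse into the element type instead of requiring equality.
        item = coerce(lhs.item, rhs.item)
        return None if item is None else ListOf(item, lhs.large or rhs.large)
    if isinstance(lhs, Struct) and isinstance(rhs, Struct):
        if [n for n, _ in lhs.fields] != [n for n, _ in rhs.fields]:
            return None
        merged = []
        for (name, lt), (_, rt) in zip(lhs.fields, rhs.fields):
            t = coerce(lt, rt)  # recurse into each field type
            if t is None:
                return None
            merged.append((name, t))
        return Struct(tuple(merged))
    return None

large = ListOf(Struct((("foo", Scalar("LargeUtf8")),)), large=True)
small = ListOf(Struct((("foo", Scalar("Utf8")),)))
common = coerce(large, small)
```

In this toy model the list case and the struct case both delegate to coerce for their children, so a list of structs is handled by the same recursion that handles a bare struct or a bare list.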
Additional context
Luckily, we can still downcast in Python using large_dtypes=False, but DataFusion should be able to coerce any deeply nested dtype.