-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create datafusion-functions-array
crate and move ArrayToString
function into it
#9113
Conversation
200508b
to
cc325e6
Compare
/// * `DOC`: documentation string for the function | ||
/// * `SCALAR_UDF_FUNC`: name of the function to create (just) the `ScalarUDF` | ||
/// * `GNAME`: name for the single static instance of the `ScalarUDF` | ||
macro_rules! make_udf_function { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this going to be copied into every function crate or is this just here for this example migration? Seems perhaps something that could be put into a common/util/etc location
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should have mentioned that -- there is some version of this macro in the datafusion-functions
crate - https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions/src/macros.rs
However it isn't quite the same (as the feature flag is on the entire datafusion-functions-array
crate rather than on subsets of functions)
My thinking was that we would only have 2 function crates -- datatfusion_functions
and datafusion_functions_arrays
and thus the replication was ok
We could potentially move the macros into datafusion_expr or datafusion_common perhaps 🤔
a07d74b
to
1089df9
Compare
// specific language governing permissions and limitations | ||
// under the License. | ||
|
||
//! implementation kernels for array functions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this code in this module was just moved from array_expressions.rs
use std::any::type_name; | ||
use std::sync::Arc; | ||
|
||
macro_rules! downcast_arg { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we may want to move into utils,math_expressions.rs
has very similar macros, I'll do it in following PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @comphead -- once this PR is approved and merged, that would be great
I suspect as we go about porting over the old to new code, we will find other examples that can be cleaned up (some of the functions were implemented very early in DataFusion's life)
@@ -641,7 +641,7 @@ enum ScalarFunction { | |||
ArrayPrepend = 94; | |||
ArrayRemove = 95; | |||
ArrayReplace = 96; | |||
ArrayToString = 97; | |||
// 97 was ArrayToString |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why it was removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be removed because it is proto for ScalarFunction.
But it seems we need to implement another version for UDF?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is one of the nice things about making these functions UDF -- they can be serialized / deserialized without any special code.
Here is a test showing it working for encode
/decode
: https://github.com/apache/arrow-datafusion/blob/8413da8803665c07c2250a0d015f6b67fb7ef743/datafusion/proto/tests/cases/roundtrip_logical_plan.rs#L573
I will add a test case for round tripping array functions too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added in 97a789a
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @alamb
Very needed initiative
@Weijun-H @jayzhan211 cc
Should we include the package/feature in |
// specific language governing permissions and limitations | ||
// under the License. | ||
|
||
//! [`ScalarUDFImpl`] definitions for array functions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the API defined for the function definitions
@@ -1870,27 +1830,6 @@ pub fn array_replace_all(args: &[ArrayRef]) -> Result<ArrayRef> { | |||
} | |||
} | |||
|
|||
macro_rules! to_string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved to the datafusion-functions-array crate
@@ -1462,10 +1460,6 @@ pub fn parse_expr( | |||
parse_expr(&args[2], registry)?, | |||
parse_expr(&args[3], registry)?, | |||
)), | |||
ScalarFunction::ArrayToString => Ok(array_to_string( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point of this PR is to remove this enum (and set the pattern on how to do so for the rest of the array functionality
@@ -641,7 +641,7 @@ enum ScalarFunction { | |||
ArrayPrepend = 94; | |||
ArrayRemove = 95; | |||
ArrayReplace = 96; | |||
ArrayToString = 97; | |||
// 97 was ArrayToString |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is one of the nice things about making these functions UDF -- they can be serialized / deserialized without any special code.
Here is a test showing it working for encode
/decode
: https://github.com/apache/arrow-datafusion/blob/8413da8803665c07c2250a0d015f6b67fb7ef743/datafusion/proto/tests/cases/roundtrip_logical_plan.rs#L573
I will add a test case for round tripping array functions too
Since ❯ select array_to_string([1,2,3], '---');
+---------------------------------------------------------------------+
| array_to_string(make_array(Int64(1),Int64(2),Int64(3)),Utf8("---")) |
+---------------------------------------------------------------------+
| 1---2---3 |
+---------------------------------------------------------------------+
1 row in set. Query took 0.006 seconds.
They will get a "no such function" error. It may be worth considering a way to hint / add stub functions that could be registered that say something like "no such function, but there is a function named |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm thanks @alamb
Epic work
Which issue does this PR close?
Part of #8045
See #9100 for discussion
Rationale for this change
We are trying to make DataFusion more modular, and as more array functionality
gets added, it is important to be support:
What changes are included in this PR?
datafusion-functions-array
crateArrayToString
fromBuiltInScalarFunction
enum anddatafusion-physical-expr
todatafusion-functions-array
,array_expressions
If this pattern looks good, I think we can then move the other functions over in a similar way
Are these changes tested?
Yes, by existing and new tests
Are there any user-facing changes?
A new feature flag