docs: Improve user documentation for supported operators and expressions #520
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -19,99 +19,192 @@ | |
|
||
# Supported Spark Expressions | ||
|
||
The following Spark expressions are currently available: | ||
|
||
- Literals | ||
- Arithmetic Operators | ||
- UnaryMinus | ||
- Add/Minus/Multiply/Divide/Remainder | ||
- Conditional functions | ||
- Case When | ||
- If | ||
- Cast | ||
- Coalesce | ||
- BloomFilterMightContain | ||
- Boolean functions | ||
- And | ||
- Or | ||
- Not | ||
- EqualTo | ||
- EqualNullSafe | ||
- GreaterThan | ||
- GreaterThanOrEqual | ||
- LessThan | ||
- LessThanOrEqual | ||
- IsNull | ||
- IsNotNull | ||
- In | ||
- String functions | ||
- Substring | ||
- Coalesce | ||
- StringSpace | ||
- Like | ||
- Contains | ||
- Startswith | ||
- Endswith | ||
- Ascii | ||
- Bit_length | ||
- Octet_length | ||
- Upper | ||
- Lower | ||
- Chr | ||
- Initcap | ||
- Trim/Btrim/Ltrim/Rtrim | ||
- Concat_ws | ||
- Repeat | ||
- Length | ||
- Reverse | ||
- Instr | ||
- Replace | ||
- Translate | ||
- Bitwise functions | ||
- Shiftright/Shiftleft | ||
- Date/Time functions | ||
- Year/Hour/Minute/Second | ||
- Hash functions | ||
- Md5 | ||
- Sha2 | ||
- Hash | ||
- Xxhash64 | ||
- Math functions | ||
- Abs | ||
- Acos | ||
- Asin | ||
- Atan | ||
- Atan2 | ||
- Cos | ||
- Exp | ||
- Ln | ||
- Log10 | ||
- Log2 | ||
- Pow | ||
- Round | ||
- Signum | ||
- Sin | ||
- Sqrt | ||
- Tan | ||
- Ceil | ||
- Floor | ||
- Aggregate functions | ||
- Count | ||
- Sum | ||
- Max | ||
- Min | ||
- Avg | ||
- First | ||
- Last | ||
- BitAnd | ||
- BitOr | ||
- BitXor | ||
- BoolAnd | ||
- BoolOr | ||
- CovPopulation | ||
- CovSample | ||
- VariancePop | ||
- VarianceSamp | ||
- StddevPop | ||
- StddevSamp | ||
- Corr | ||
The following Spark expressions are currently available. Any known compatibility issues are noted in the following tables. | ||
|
||
// Licensed to the Apache Software Foundation (ASF) under one | ||
// or more contributor license agreements. See the NOTICE file | ||
// distributed with this work for additional information | ||
// regarding copyright ownership. The ASF licenses this file | ||
// to you under the Apache License, Version 2.0 (the | ||
// "License"); you may not use this file except in compliance | ||
// with the License. You may obtain a copy of the License at | ||
// | ||
// http://www.apache.org/licenses/LICENSE-2.0 | ||
// | ||
// Unless required by applicable law or agreed to in writing, | ||
// software distributed under the License is distributed on an | ||
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
// KIND, either express or implied. See the License for the | ||
// specific language governing permissions and limitations | ||
// under the License. | ||
|
||
## Literal Values | ||
|
||
| Expression | Notes | | ||
| -------------------------------------- | ----- | | ||
| Literal values of supported data types | | | ||
|
||
## Unary Arithmetic | ||
|
||
| Expression | Notes | | ||
| ---------------- | ----- | | ||
| UnaryMinus (`-`) | | | ||
|
||
## Binary Arithmetic | ||
|
||
| Expression | Notes | | ||
| --------------- | --------------------------------------------------- | | ||
| Add (`+`) | | | ||
| Subtract (`-`) | | | ||
| Multiply (`*`) | | | ||
| Divide (`/`) | | | ||
| Remainder (`%`) | Comet produces `NaN` instead of `NULL` for `% -0.0` | | ||
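The `NaN` versus `NULL` note for Remainder is consistent with IEEE 754 arithmetic: a floating-point remainder with a zero divisor (including `-0.0`) evaluates to `NaN`, and `-0.0` compares equal to `0.0`. The following is a minimal Rust sketch of that difference. The function names `native_remainder` and `spark_style_remainder` are hypothetical and are not taken from Comet or Spark; `None` simply stands in for SQL `NULL`.

```rust
/// Plain IEEE 754 remainder: a zero divisor (`0.0` or `-0.0`) yields NaN.
/// Hypothetical helper for illustration, not Comet's implementation.
pub fn native_remainder(a: f64, b: f64) -> f64 {
    a % b
}

/// Remainder that maps a zero divisor to `None`, standing in for the SQL
/// `NULL` that the table above attributes to Spark.
/// `-0.0 == 0.0` is true for floats, so both zero divisors are caught.
pub fn spark_style_remainder(a: f64, b: f64) -> Option<f64> {
    if b == 0.0 {
        None
    } else {
        Some(a % b)
    }
}
```

Under these assumptions, `native_remainder(1.0, -0.0)` is `NaN`, while `spark_style_remainder(1.0, -0.0)` is `None`.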
|
||
## Conditional Expressions | ||
|
||
| Expression | Notes | | ||
| ---------- | ----- | | ||
| CaseWhen | | | ||
| If | | | ||
|
||
## Comparison | ||
|
||
| Expression | Notes | | ||
| ------------------------- | ----- | | ||
| EqualTo (`=`) | | | ||
| EqualNullSafe (`<=>`) | | | ||
| GreaterThan (`>`) | | | ||
| GreaterThanOrEqual (`>=`) | | | ||
| LessThan (`<`) | | | ||
| LessThanOrEqual (`<=`) | | | ||
| IsNull (`IS NULL`) | | | ||
| IsNotNull (`IS NOT NULL`) | | | ||
| In (`IN`) | | | ||
|
||
## String Functions | ||
|
||
| Expression | Notes | | ||
| --------------- | ----------------------------------------------------------------------------------------------------------- | | ||
| Ascii | | | ||
| BitLength | | | ||
| Chr | | | ||
| ConcatWs | | | ||
| Contains | | | ||
| EndsWith | | | ||
| InitCap | | | ||
| Instr | | | ||
| Length | | | ||
| Like | | | ||
| Lower | | | ||
| OctetLength | | | ||
| Repeat | Negative argument for number of times to repeat causes exception | | ||
| Replace | | | ||
| Reverse | | | ||
| RLike | Disabled by default. Uses Rust regular expression engine which is not compatible with Java's regexp engine. | | ||
Review comment: nit:
Reply: I have removed this and will add it back in the RLike PR after this one is merged.
||
| StartsWith | | | ||
| StringSpace | | | ||
| StringTrim | | | ||
| StringTrimBoth | | | ||
| StringTrimLeft | | | ||
| StringTrimRight | | | ||
| Substring | | | ||
| Translate | | | ||
| Upper | | | ||
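The Repeat note above says a negative repeat count causes an exception. A rough Rust sketch of two ways a repeat function could treat a negative count is shown here. Both function names are hypothetical, neither is Comet's code, and the clamping variant (treating a negative count as zero) is only an assumption about what a more lenient implementation might do.

```rust
/// Rejects a negative count with an error, similar in spirit to the
/// exception described in the table. Hypothetical helper for illustration.
pub fn repeat_checked(s: &str, times: i64) -> Result<String, String> {
    if times < 0 {
        Err(format!("negative repeat count: {}", times))
    } else {
        Ok(s.repeat(times as usize))
    }
}

/// Treats a negative count as zero and returns an empty string.
/// This lenient behaviour is an assumption made for the sketch.
pub fn repeat_clamped(s: &str, times: i64) -> String {
    s.repeat(times.max(0) as usize)
}
```

Callers who cannot rule out negative counts in their data could apply a guard like the clamping variant before the expression is evaluated.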
|
||
## Date/Time Functions | ||
|
||
| Expression | Notes | | ||
| -------------- | ------------------------ | | ||
| DatePart | Only `year` is supported | | ||
| Extract | Only `year` is supported | | ||
| Hour | | | ||
| Minute | | | ||
| Second | | | ||
| TruncDate | | | ||
| TruncTimestamp | | | ||
| Year | | | ||
|
||
## Math Expressions | ||
|
||
| Expression | Notes | | ||
| ---------- | ------------------------------------------------------------------- | | ||
| Abs | | | ||
| Acos | | | ||
| Asin | | | ||
| Atan | | | ||
| Atan2 | | | ||
| Ceil | | | ||
| Cos | | | ||
| Exp | | | ||
| Floor | | | ||
| Log | log(0) will produce `-Infinity` unlike Spark which returns `null` | | ||
| Log2 | log2(0) will produce `-Infinity` unlike Spark which returns `null` | | ||
| Log10 | log10(0) will produce `-Infinity` unlike Spark which returns `null` | | ||
| Pow | | | ||
| Round | | | ||
| Signum | Signum does not differentiate between `0.0` and `-0.0` | | ||
| Sin | | | ||
| Sqrt | | | ||
| Tan | | | ||
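The `-Infinity` results noted for Log, Log2, and Log10 match what IEEE 754 floating point returns for a logarithm of zero. The Rust sketch below illustrates this and shows one way to model the `null` result that the table attributes to Spark. The helper `null_on_zero` is a hypothetical name, only the zero-input case from the table is modelled, and `None` stands in for SQL `null`.

```rust
/// Applies a logarithm function but returns `None` for a zero input,
/// modelling the `null` that the table above attributes to Spark.
/// Hypothetical helper for illustration, not Comet or Spark code.
pub fn null_on_zero(x: f64, log_fn: fn(f64) -> f64) -> Option<f64> {
    if x == 0.0 {
        None
    } else {
        Some(log_fn(x))
    }
}

/// Plain floating-point natural logarithm: ln(0) is negative infinity.
pub fn native_ln(x: f64) -> f64 {
    x.ln()
}
```

For example, `native_ln(0.0)` is negative infinity, while `null_on_zero(0.0, f64::ln)` is `None`; the same wrapper works with `f64::log2` and `f64::log10`.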
|
||
## Hashing Functions | ||
|
||
| Expression | Notes | | ||
| ---------- | ----- | | ||
| Md5 | | | ||
| Hash | | | ||
| Sha2 | | | ||
| XxHash64 | | | ||
|
||
## Boolean Expressions | ||
|
||
| Expression | Notes | | ||
| ---------- | ----- | | ||
| And | | | ||
| Or | | | ||
| Not | | | ||
|
||
## Bitwise Expressions | ||
|
||
| Expression | Notes | | ||
| -------------------- | ----- | | ||
| ShiftLeft (`<<`) | | | ||
| ShiftRight (`>>`) | | | ||
| BitAnd (`&`) | | | ||
| BitOr (`\|`) | | | ||
| BitXor (`^`) | | | ||
| BitwiseNot (`~`) | | | ||
| BoolAnd (`bool_and`) | | | ||
| BoolOr (`bool_or`) | | | ||
|
||
## Aggregate Expressions | ||
|
||
| Expression | Notes | | ||
| ------------- | ----- | | ||
| Avg | | | ||
| BitAndAgg | | | ||
| BitOrAgg | | | ||
| BitXorAgg | | | ||
| Corr | | | ||
| Count | | | ||
| CovPopulation | | | ||
| CovSample | | | ||
| First | | | ||
| Last | | | ||
| Max | | | ||
| Min | | | ||
| StddevPop | | | ||
| StddevSamp | | | ||
| Sum | | | ||
| VariancePop | | | ||
| VarianceSamp | | | ||
|
||
## Other | ||
|
||
| Expression | Notes | | ||
| ----------------------- | ------------------------------------------------------------------------------- | | ||
| Cast | See compatibility guide for list of supported cast expressions and known issues | | ||
| BloomFilterMightContain | | | ||
| ScalarSubquery | | | ||
| Coalesce | | | ||
| NormalizeNaNAndZero | | |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -19,15 +19,18 @@ | |
|
||
# Supported Spark Operators | ||
|
||
The following Spark operators are currently available: | ||
The following Spark operators are currently replaced with native versions. Query stages that contain any operators | ||
not supported by Comet will fall back to regular Spark execution. | ||
|
||
- FileSourceScanExec/BatchScanExec for Parquet | ||
- Projection | ||
- Filter | ||
- Sort | ||
- Hash Aggregate | ||
- Limit | ||
- Sort-merge Join | ||
- Hash Join | ||
- Shuffle | ||
- Expand | ||
| Operator | Notes | | ||
| -------------------------------------------- | -------------------------------------- | | ||
| FileSourceScanExec/BatchScanExec for Parquet | | | ||
| Projection | | | ||
| Filter | | | ||
| Sort | | | ||
| Hash Aggregate | | | ||
| Limit | | | ||
| Sort-merge Join | Sort-merge join is disabled by default | | ||
Review comment: Oh, is SMJ disabled by default?
Reply: It looks like I misunderstood this. I have updated this.
||
| Hash Join | | | ||
| Shuffle | | | ||
| Expand | | |
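The fallback rule stated above (a query stage containing any unsupported operator runs in regular Spark) can be modelled as a simple all-or-nothing check. The Rust sketch below is a toy model under that reading, not Comet's planner logic; the function names are hypothetical and the operator names are copied from the table as plain strings.

```rust
use std::collections::HashSet;

/// Operator names copied from the table above. Hypothetical helper.
pub fn supported_operators() -> HashSet<&'static str> {
    [
        "FileSourceScanExec/BatchScanExec for Parquet",
        "Projection",
        "Filter",
        "Sort",
        "Hash Aggregate",
        "Limit",
        "Sort-merge Join",
        "Hash Join",
        "Shuffle",
        "Expand",
    ]
    .iter()
    .copied()
    .collect()
}

/// Returns true only when every operator in the stage is supported;
/// a single unsupported operator means the whole stage falls back.
pub fn stage_runs_natively(stage_ops: &[&str], supported: &HashSet<&str>) -> bool {
    stage_ops.iter().all(|op| supported.contains(op))
}
```

In this model, a stage of `["Filter", "Projection"]` runs natively, while adding any operator name that is absent from the table makes the whole stage fall back.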
Review comment: I renamed this because it was inconsistent with Spark naming.