docs: Improve user documentation for supported operators and expressions #520

Merged
merged 12 commits into from
Jun 7, 2024
2 changes: 1 addition & 1 deletion core/src/execution/datafusion/planner.rs
@@ -563,7 +563,7 @@ impl PhysicalPlanner {
let child = self.create_expr(expr.child.as_ref().unwrap(), input_schema)?;
Ok(Arc::new(NotExpr::new(child)))
}
-ExprStruct::Negative(expr) => {

I renamed this because it was inconsistent with Spark naming

+ExprStruct::UnaryMinus(expr) => {
let child: Arc<dyn PhysicalExpr> =
self.create_expr(expr.child.as_ref().unwrap(), input_schema.clone())?;
let result = negative::create_negate_expr(child, expr.fail_on_error);

4 changes: 2 additions & 2 deletions core/src/execution/proto/expr.proto
@@ -65,7 +65,7 @@ message Expr {
CaseWhen caseWhen = 38;
In in = 39;
Not not = 40;
-Negative negative = 41;
+UnaryMinus unary_minus = 41;
BitwiseShiftRight bitwiseShiftRight = 42;
BitwiseShiftLeft bitwiseShiftLeft = 43;
IfExpr if = 44;
@@ -452,7 +452,7 @@ message Not {
Expr child = 1;
}

-message Negative {
+message UnaryMinus {
Expr child = 1;
bool fail_on_error = 2;
}

267 changes: 171 additions & 96 deletions docs/source/user-guide/expressions.md
@@ -19,99 +19,174 @@

# Supported Spark Expressions

The following Spark expressions are currently available:

- Literals
- Arithmetic Operators
- UnaryMinus
- Add/Minus/Multiply/Divide/Remainder
- Conditional functions
- Case When
- If
- Cast
- Coalesce
- BloomFilterMightContain
- Boolean functions
- And
- Or
- Not
- EqualTo
- EqualNullSafe
- GreaterThan
- GreaterThanOrEqual
- LessThan
- LessThanOrEqual
- IsNull
- IsNotNull
- In
- String functions
- Substring
- Coalesce
- StringSpace
- Like
- Contains
- Startswith
- Endswith
- Ascii
- Bit_length
- Octet_length
- Upper
- Lower
- Chr
- Initcap
- Trim/Btrim/Ltrim/Rtrim
- Concat_ws
- Repeat
- Length
- Reverse
- Instr
- Replace
- Translate
- Bitwise functions
- Shiftright/Shiftleft
- Date/Time functions
- Year/Hour/Minute/Second
- Hash functions
- Md5
- Sha2
- Hash
- Xxhash64
- Math functions
- Abs
- Acos
- Asin
- Atan
- Atan2
- Cos
- Exp
- Ln
- Log10
- Log2
- Pow
- Round
- Signum
- Sin
- Sqrt
- Tan
- Ceil
- Floor
- Aggregate functions
- Count
- Sum
- Max
- Min
- Avg
- First
- Last
- BitAnd
- BitOr
- BitXor
- BoolAnd
- BoolOr
- CovPopulation
- CovSample
- VariancePop
- VarianceSamp
- StddevPop
- StddevSamp
- Corr
The following Spark expressions are currently available. Any known compatibility issues are noted in the following tables.

## Literal Values

| Expression | Notes |
| -------------------------------------- | ----- |
| Literal values of supported data types | |

## Unary Arithmetic

| Expression | Notes |
| ---------------- | ----- |
| UnaryMinus (`-`) | |

## Binary Arithmetic

| Expression | Notes |
| --------------- | --------------------------------------------------- |
| Add (`+`) | |
| Subtract (`-`) | |
| Multiply (`*`) | |
| Divide (`/`) | |
| Remainder (`%`) | Comet produces `NaN` instead of `NULL` for `% -0.0` |
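
The `NaN` noted for `Remainder` is what plain IEEE 754 floating-point remainder returns for a zero divisor. The following is a minimal Rust sketch of that arithmetic only; `remainder` is a hypothetical helper written for illustration and is not taken from Comet's implementation.

```rust
/// Hypothetical helper: plain IEEE 754 floating-point remainder.
/// This is not Comet's code; it only illustrates the arithmetic.
fn remainder(dividend: f64, divisor: f64) -> f64 {
    dividend % divisor
}

fn main() {
    // A zero divisor, positive or negative, yields NaN rather than an error.
    println!("{}", remainder(1.0, -0.0));
    println!("{}", remainder(1.0, 0.0));
    // A non-zero divisor behaves as usual.
    println!("{}", remainder(5.5, 2.0));
}
```

An engine that wants Spark's `NULL` result would have to check for a zero divisor before applying the operator, which is presumably where the difference noted in the table comes from.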

## Conditional Expressions

| Expression | Notes |
| ---------- | ----- |
| CaseWhen | |
| If | |

## Comparison

| Expression | Notes |
| ------------------------- | ----- |
| EqualTo (`=`) | |
| EqualNullSafe (`<=>`) | |
| GreaterThan (`>`) | |
| GreaterThanOrEqual (`>=`) | |
| LessThan (`<`) | |
| LessThanOrEqual (`<=`) | |
| IsNull (`IS NULL`) | |
| IsNotNull (`IS NOT NULL`) | |
| In (`IN`) | |

## String Functions

| Expression | Notes |
| --------------- | ----------------------------------------------------------------------------------------------------------- |
| Ascii | |
| BitLength | |
| Chr | |
| ConcatWs | |
| Contains | |
| EndsWith | |
| InitCap | |
| Instr | |
| Length | |
| Like | |
| Lower | |
| OctetLength | |
| Repeat | A negative repeat count causes an exception |
| Replace | |
| Reverse | |
| StartsWith | |
| StringSpace | |
| StringTrim | |
| StringTrimBoth | |
| StringTrimLeft | |
| StringTrimRight | |
| Substring | |
| Translate | |
| Upper | |

## Date/Time Functions

| Expression | Notes |
| -------------- | ------------------------ |
| DatePart | Only `year` is supported |
| Extract | Only `year` is supported |
| Hour | |
| Minute | |
| Second | |
| TruncDate | |
| TruncTimestamp | |
| Year | |

## Math Expressions

| Expression | Notes |
| ---------- | ------------------------------------------------------------------- |
| Abs | |
| Acos | |
| Asin | |
| Atan | |
| Atan2 | |
| Ceil | |
| Cos | |
| Exp | |
| Floor | |
| Log | log(0) will produce `-Infinity` unlike Spark which returns `null` |
| Log2 | log2(0) will produce `-Infinity` unlike Spark which returns `null` |
| Log10 | log10(0) will produce `-Infinity` unlike Spark which returns `null` |
| Pow | |
| Round | |
| Signum | Signum does not differentiate between `0.0` and `-0.0` |
| Sin | |
| Sqrt | |
| Tan | |
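
The `-Infinity` results noted for `Log`, `Log2`, and `Log10` match what standard floating-point logarithms return for zero. A small Rust sketch of that behavior follows; the `logs` helper is hypothetical and only demonstrates the standard library functions, not Comet's code path.

```rust
/// Hypothetical helper returning the natural, base-2 and base-10
/// logarithms of `x`, using only the Rust standard library.
fn logs(x: f64) -> (f64, f64, f64) {
    (x.ln(), x.log2(), x.log10())
}

fn main() {
    // All three logarithms of zero are negative infinity, not an error or null.
    let (ln, log2, log10) = logs(0.0);
    println!("{} {} {}", ln, log2, log10);
}
```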

## Hashing Functions

| Expression | Notes |
| ---------- | ----- |
| Md5 | |
| Hash | |
| Sha2 | |
| XxHash64 | |

## Boolean Expressions

| Expression | Notes |
| ---------- | ----- |
| And | |
| Or | |
| Not | |

## Bitwise Expressions

| Expression | Notes |
| -------------------- | ----- |
| ShiftLeft (`<<`) | |
| ShiftRight (`>>`) | |
| BitAnd (`&`) | |
| BitOr (`\|`) | |
| BitXor (`^`) | |
| BitwiseNot (`~`) | |
| BoolAnd (`bool_and`) | |
| BoolOr (`bool_or`) | |

## Aggregate Expressions

| Expression | Notes |
| ------------- | ----- |
| Avg | |
| BitAndAgg | |
| BitOrAgg | |
| BitXorAgg | |
| Corr | |
| Count | |
| CovPopulation | |
| CovSample | |
| First | |
| Last | |
| Max | |
| Min | |
| StddevPop | |
| StddevSamp | |
| Sum | |
| VariancePop | |
| VarianceSamp | |

## Other

| Expression | Notes |
| ----------------------- | ------------------------------------------------------------------------------- |
| Cast | See compatibility guide for list of supported cast expressions and known issues |
| BloomFilterMightContain | |
| ScalarSubquery | |
| Coalesce | |
| NormalizeNaNAndZero | |
27 changes: 16 additions & 11 deletions docs/source/user-guide/operators.md
@@ -19,15 +19,20 @@

# Supported Spark Operators

The following Spark operators are currently available:
The following Spark operators are currently replaced with native versions. Query stages that contain any operators
not supported by Comet will fall back to regular Spark execution.
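
The fallback rule can be read as an all-or-nothing check per query stage. The sketch below is only an illustration of that rule under stated assumptions: `NATIVE_OPERATORS`, `stage_runs_natively`, and the operator name `SomeUnsupportedOp` are hypothetical, and Comet's real planning logic is not shown here.

```rust
/// Hypothetical subset of operator names with native versions,
/// listed here purely for illustration.
const NATIVE_OPERATORS: &[&str] = &["Projection", "Filter", "Sort", "Limit", "Expand", "Union"];

/// Returns true only if every operator in the stage has a native version.
/// A single unsupported operator sends the whole stage back to Spark.
fn stage_runs_natively(stage: &[&str]) -> bool {
    stage.iter().all(|op| NATIVE_OPERATORS.contains(op))
}

fn main() {
    // Every operator is in the supported list, so the stage can run natively.
    println!("{}", stage_runs_natively(&["Filter", "Projection"]));
    // One made-up unsupported operator forces a fallback for the stage.
    println!("{}", stage_runs_natively(&["Filter", "SomeUnsupportedOp"]));
}
```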

- FileSourceScanExec/BatchScanExec for Parquet
- Projection
- Filter
- Sort
- Hash Aggregate
- Limit
- Sort-merge Join
- Hash Join
- Shuffle
- Expand
| Operator | Notes |
| -------------------------------------------- | ----- |
| FileSourceScanExec/BatchScanExec for Parquet | |
| Projection | |
| Filter | |
| Sort | |
| Hash Aggregate | |
| Limit | |
| Sort-merge Join | |
| Hash Join | |
| BroadcastHashJoinExec | |
| Shuffle | |
| Expand | |
| Union | |

@@ -1987,13 +1987,13 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde with CometExprShim
case UnaryMinus(child, failOnError) =>
val childExpr = exprToProtoInternal(child, inputs)
if (childExpr.isDefined) {
-val builder = ExprOuterClass.Negative.newBuilder()
+val builder = ExprOuterClass.UnaryMinus.newBuilder()
builder.setChild(childExpr.get)
builder.setFailOnError(failOnError)
Some(
ExprOuterClass.Expr
.newBuilder()
-.setNegative(builder)
+.setUnaryMinus(builder)
.build())
} else {
withInfo(expr, child)