Merge branch 'main' into add-sample-top-rare-command
YANG-DB committed Nov 8, 2024
2 parents 2197eab + 4303057 commit 1f2ae52
Showing 55 changed files with 2,894 additions and 179 deletions.
3 changes: 2 additions & 1 deletion build.sbt
@@ -238,7 +238,8 @@ lazy val integtest = (project in file("integ-test"))
inConfig(IntegrationTest)(Defaults.testSettings ++ Seq(
IntegrationTest / javaSource := baseDirectory.value / "src/integration/java",
IntegrationTest / scalaSource := baseDirectory.value / "src/integration/scala",
IntegrationTest / parallelExecution := false,
IntegrationTest / resourceDirectory := baseDirectory.value / "src/integration/resources",
IntegrationTest / parallelExecution := false,
IntegrationTest / fork := true,
)),
inConfig(AwsIntegrationTest)(Defaults.testSettings ++ Seq(
43 changes: 29 additions & 14 deletions docs/ppl-lang/PPL-Example-Commands.md
@@ -308,7 +308,11 @@ source = table | where ispresent(a) |
- `source = table1 | left semi join left = l right = r on l.a = r.a table2`
- `source = table1 | left anti join left = l right = r on l.a = r.a table2`
- `source = table1 | join left = l right = r [ source = table2 | where d > 10 | head 5 ]`
- `source = table1 | inner join on table1.a = table2.a table2 | fields table1.a, table2.a, table1.b, table1.c` (refers to table names directly)
- `source = table1 | inner join on a = c table2 | fields a, b, c, d` (side aliases can be omitted as long as there is no ambiguity)
- `source = table1 as t1 | join left = l right = r on l.a = r.a table2 as t2 | fields l.a, r.a` (side alias overrides table alias)
- `source = table1 as t1 | join left = l right = r on l.a = r.a table2 as t2 | fields t1.a, t2.a` (error, side alias overrides table alias)
- `source = table1 | join left = l right = r on l.a = r.a [ source = table2 ] as s | fields l.a, s.a` (error, side alias overrides subquery alias)
#### **Lookup**
[See additional command details](ppl-lookup-command.md)
@@ -439,8 +443,30 @@ Assumptions: `a`, `b` are fields of table outer, `c`, `d` are fields of table in
_- **Limitation: the other use of a (relation) subquery is in the `appendcols` command, which is unsupported**_
---
#### Experimental Commands:
#### **fillnull**
[See additional command details](ppl-fillnull-command.md)
```sql
- `source=accounts | fillnull fields status_code=101`
- `source=accounts | fillnull fields request_path='/not_found', timestamp='*'`
- `source=accounts | fillnull using field1=101`
- `source=accounts | fillnull using field1=concat(field2, field3), field4=2*pi()*field5`
- `source=accounts | fillnull using field1=concat(field2, field3), field4=2*pi()*field5, field6 = 'N/A'`
```
#### **expand**
[See additional command details](ppl-expand-command.md)
```sql
- `source = table | expand field_with_array as array_list`
- `source = table | expand employee | stats max(salary) as max by state, company`
- `source = table | expand employee as worker | stats max(salary) as max by state, company`
- `source = table | expand employee as worker | eval bonus = salary * 3 | fields worker, bonus`
- `source = table | expand employee | parse description '(?<email>.+@.+)' | fields employee, email`
- `source = table | eval array=json_array(1, 2, 3) | expand array as uid | fields name, occupation, uid`
- `source = table | expand multi_valueA as multiA | expand multi_valueB as multiB`
```
#### Correlation Commands:
[See additional command details](ppl-correlation-command.md)
```sql
@@ -452,14 +478,3 @@ _- **Limitation: another command usage of (relation) subquery is in `appendcols`
> ppl-correlation-command is an experimental command - it may be removed in future versions
---
### Planned Commands:
#### **fillnull**
[See additional command details](ppl-fillnull-command.md)
```sql
- `source=accounts | fillnull fields status_code=101`
- `source=accounts | fillnull fields request_path='/not_found', timestamp='*'`
- `source=accounts | fillnull using field1=101`
- `source=accounts | fillnull using field1=concat(field2, field3), field4=2*pi()*field5`
- `source=accounts | fillnull using field1=concat(field2, field3), field4=2*pi()*field5, field6 = 'N/A'`
```
6 changes: 6 additions & 0 deletions docs/ppl-lang/README.md
@@ -71,6 +71,8 @@ For additional examples see the next [documentation](PPL-Example-Commands.md).
- [`correlation commands`](ppl-correlation-command.md)

- [`trendline commands`](ppl-trendline-command.md)

- [`expand commands`](ppl-expand-command.md)

* **Functions**

@@ -104,6 +106,10 @@ For additional examples see the next [documentation](PPL-Example-Commands.md).
### Example PPL Queries
See samples of [PPL queries](PPL-Example-Commands.md)

---
### TPC-H PPL Query Rewriting
See samples of [TPC-H PPL query rewriting](ppl-tpch.md)

---
### Planned PPL Commands

73 changes: 59 additions & 14 deletions docs/ppl-lang/functions/ppl-json.md
@@ -4,11 +4,11 @@

**Description**

`json(value)` Evaluates whether a value can be parsed as JSON. Returns the json string if valid, null otherwise.
`json(value)` Evaluates whether a string can be parsed as JSON format. Returns the string value if valid, null otherwise.

**Argument type:** STRING/JSON_ARRAY/JSON_OBJECT
**Argument type:** STRING

**Return type:** STRING
**Return type:** STRING/NULL

A STRING expression of a valid JSON object format.
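
The contract described above (return the input string when it parses as JSON, null otherwise) can be modeled with a minimal Python sketch. This is not the PPL or Spark implementation: `json_or_null` is a hypothetical helper name, and Python's parser may accept inputs (for example bare scalars) that the real function treats differently.

```python
import json
from typing import Optional


def json_or_null(value: Optional[str]) -> Optional[str]:
    """Hypothetical model of `json(value)`: echo valid JSON text, else None."""
    if value is None:
        return None
    try:
        # Parse only to validate; the parsed result is discarded.
        json.loads(value)
    except (TypeError, ValueError):
        # json.JSONDecodeError is a subclass of ValueError.
        return None
    return value


if __name__ == "__main__":
    print(json_or_null('{"key": 123.45}'))
    print(json_or_null('{"key": '))
```

The point of the sketch is that the function returns the original string rather than a parsed object, which matches the STRING/NULL return type listed above.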

@@ -47,15 +47,15 @@ A StructType expression of a valid JSON object.

Example:

os> source=people | eval result = json(json_object('key', 123.45)) | fields result
os> source=people | eval result = json_object('key', 123.45) | fields result
fetched rows / total rows = 1/1
+------------------+
| result |
+------------------+
| {"key":123.45} |
+------------------+

os> source=people | eval result = json(json_object('outer', json_object('inner', 123.45))) | fields result
os> source=people | eval result = json_object('outer', json_object('inner', 123.45)) | fields result
fetched rows / total rows = 1/1
+------------------------------+
| result |
@@ -81,29 +81,58 @@ Example:

os> source=people | eval `json_array` = json_array(1, 2, 0, -1, 1.1, -0.11)
fetched rows / total rows = 1/1
+----------------------------+
| json_array |
+----------------------------+
| 1.0,2.0,0.0,-1.0,1.1,-0.11 |
+----------------------------+
+------------------------------+
| json_array |
+------------------------------+
| [1.0,2.0,0.0,-1.0,1.1,-0.11] |
+------------------------------+

os> source=people | eval `json_array_object` = json(json_object("array", json_array(1, 2, 0, -1, 1.1, -0.11)))
os> source=people | eval `json_array_object` = json_object("array", json_array(1, 2, 0, -1, 1.1, -0.11))
fetched rows / total rows = 1/1
+----------------------------------------+
| json_array_object |
+----------------------------------------+
| {"array":[1.0,2.0,0.0,-1.0,1.1,-0.11]} |
+----------------------------------------+

### `TO_JSON_STRING`

**Description**

`to_json_string(jsonObject)` Returns the JSON string representation of a given JSON object value.

**Argument type:** JSON_OBJECT (Spark StructType/ArrayType)

**Return type:** STRING

Example:

os> source=people | eval `json_string` = to_json_string(json_array(1, 2, 0, -1, 1.1, -0.11)) | fields json_string
fetched rows / total rows = 1/1
+--------------------------------+
| json_string |
+--------------------------------+
| [1.0,2.0,0.0,-1.0,1.1,-0.11] |
+--------------------------------+

os> source=people | eval `json_string` = to_json_string(json_object('key', 123.45)) | fields json_string
fetched rows / total rows = 1/1
+-----------------+
| json_string |
+-----------------+
| {"key":123.45}  |
+-----------------+
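
As a rough illustration of what serializing a structured value to a compact JSON string looks like, here is a small Python sketch. It only models the idea: the local `to_json_string` function is a stand-in written for this example, not the PPL function, and it assumes the compact (no whitespace) form used by the other examples in this file.

```python
import json
from typing import Any


def to_json_string(value: Any) -> str:
    """Stand-in model: serialize a list or dict to compact JSON text."""
    # separators=(",", ":") drops the spaces Python adds by default.
    return json.dumps(value, separators=(",", ":"))


if __name__ == "__main__":
    print(to_json_string([1.0, 2.0, 0.0, -1.0, 1.1, -0.11]))
    print(to_json_string({"key": 123.45}))
```

Note that an object serializes with a colon between key and value and with double-quoted keys.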


### `JSON_ARRAY_LENGTH`

**Description**

`json_array_length(jsonArray)` Returns the number of elements in the outermost JSON array.
`json_array_length(jsonArrayString)` Returns the number of elements in the outermost JSON array string.

**Argument type:** STRING/JSON_ARRAY
**Argument type:** STRING

A STRING expression of a valid JSON array format, or JSON_ARRAY object.
A STRING expression of a valid JSON array format.

**Return type:** INTEGER

@@ -119,6 +148,21 @@ Example:
| 4 | 5 | null |
+-----------+-----------+-------------+
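
The behavior described for `json_array_length` (count the elements of the outermost array in a JSON string, null when the string is not a JSON array) can be sketched in Python as follows. This is a model of the documented contract under that reading, not the actual implementation; the input strings in the usage lines are illustrative.

```python
import json
from typing import Optional


def json_array_length(text: Optional[str]) -> Optional[int]:
    """Model: length of the outermost JSON array, or None if not an array."""
    if text is None:
        return None
    try:
        parsed = json.loads(text)
    except ValueError:
        return None  # invalid JSON
    # Only the outermost array counts; nested arrays are single elements.
    return len(parsed) if isinstance(parsed, list) else None


if __name__ == "__main__":
    print(json_array_length('[1,2,3,4]'))
    print(json_array_length('[1,2,3,{"f1":1,"f2":[5,6]},4]'))
    print(json_array_length('{"key": 1}'))
```

Because the argument type is now STRING only, an already-built array value would go through `array_length` instead.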


### `ARRAY_LENGTH`

**Description**

`array_length(jsonArray)` Returns the number of elements in the outermost array.

**Argument type:** ARRAY

ARRAY or JSON_ARRAY object.

**Return type:** INTEGER

Example:

os> source=people | eval `json_array` = array_length(json_array(1,2,3,4)), `empty_array` = array_length(json_array())
fetched rows / total rows = 1/1
+--------------+---------------+
@@ -127,6 +171,7 @@ Example:
| 4 | 0 |
+--------------+---------------+


### `JSON_EXTRACT`

**Description**
45 changes: 45 additions & 0 deletions docs/ppl-lang/ppl-expand-command.md
@@ -0,0 +1,45 @@
## PPL `expand` command

### Description
Use the `expand` command to flatten a field of type:
- `Array<Any>`
- `Map<Any>`


### Syntax
`expand <field> [as alias]`

* field: The field to be expanded (exploded). The field must be of a supported type.
* alias: Optional. The name to use for the expanded values instead of the original field name.

### Usage Guidelines
The expand command produces a row for each element in the specified array or map field, where:
- Array elements become individual rows.
- Map key-value pairs are broken into separate rows, with each key-value represented as a row.

- When an alias is provided, the exploded values are represented under the alias instead of the original field name.
- This can be used in combination with other commands, such as stats, eval, and parse to manipulate or extract data post-expansion.

### Examples:
- `source = table | expand employee | stats max(salary) as max by state, company`
- `source = table | expand employee as worker | stats max(salary) as max by state, company`
- `source = table | expand employee as worker | eval bonus = salary * 3 | fields worker, bonus`
- `source = table | expand employee | parse description '(?<email>.+@.+)' | fields employee, email`
- `source = table | eval array=json_array(1, 2, 3) | expand array as uid | fields name, occupation, uid`
- `source = table | expand multi_valueA as multiA | expand multi_valueB as multiB`

- The expand command can be used in combination with other commands such as `eval`, `stats` and more
- Using multiple expand commands creates a Cartesian product of the elements of each expanded array or map
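
To make the Cartesian-product remark concrete, here is a minimal Python sketch that models rows as dictionaries. The `expand` function below is a hypothetical model written for this example, not the command's implementation; it handles arrays only and assumes that a missing, null, or empty field yields no output rows.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def expand(rows: List[Row], field: str, alias: Optional[str] = None) -> List[Row]:
    """Model of `expand <field> [as alias]` for array-valued fields."""
    name = alias or field
    out: List[Row] = []
    for row in rows:
        values = row.get(field)
        if not values:  # assumption: None or empty array produces no rows
            continue
        for value in values:
            # Copy every other column and attach one element per output row.
            new_row = {k: v for k, v in row.items() if k != field}
            new_row[name] = value
            out.append(new_row)
    return out


if __name__ == "__main__":
    table = [{"id": 1, "multi_valueA": [1, 2], "multi_valueB": ["x", "y", "z"]}]
    step1 = expand(table, "multi_valueA", "multiA")
    step2 = expand(step1, "multi_valueB", "multiB")
    for r in step2:
        print(r)  # 2 x 3 = 6 rows for the single input row
```

Chaining two expansions multiplies the row count per input row, so expanding several large arrays in one query can grow the result quickly.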

### Effective SQL push-down query
The expand command is translated into an equivalent SQL operation using LATERAL VIEW explode, allowing for efficient exploding of arrays or maps at the SQL query level.

```sql
SELECT customer, exploded_productId
FROM table
LATERAL VIEW explode(productId) AS exploded_productId
```
The `explode` function offers the following functionality:
- It is a column operation that returns a new column.
- It creates a new row for every element in the exploded column.
- Internal `null`s are ignored as part of the exploded field (no row is created/exploded for null).
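
The properties listed above can be illustrated with a small Python model of an explode-style lateral view. This is a sketch under assumptions, not Spark's implementation: `explode` and `lateral_view_explode` are local helper names, the null rule is read as "a row whose field is null yields no output rows", and the handling of null elements inside an array is not modeled.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def explode(value: Any) -> List[Any]:
    """Model: one item per array element or per map key-value pair."""
    if value is None:
        return []  # assumption: a null field contributes no rows
    if isinstance(value, dict):
        return list(value.items())
    return list(value)


def lateral_view_explode(rows: List[Row], column: str, alias: str) -> List[Row]:
    """Pair each input row with every exploded element under a new column."""
    return [
        {**row, alias: element}
        for row in rows
        for element in explode(row[column])
    ]


if __name__ == "__main__":
    table = [
        {"customer": "a", "productId": [10, 20]},
        {"customer": "b", "productId": None},
    ]
    for r in lateral_view_explode(table, "productId", "exploded_productId"):
        print(r["customer"], r["exploded_productId"])
```

In this model customer `b` disappears from the result, which is the practical consequence to keep in mind when a later `stats` command counts rows per customer.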
23 changes: 12 additions & 11 deletions docs/ppl-lang/ppl-join-command.md
@@ -65,8 +65,8 @@ WHERE t1.serviceName = `order`
SEARCH source=<left-table>
| <other piped command>
| [joinType] JOIN
leftAlias
rightAlias
[leftAlias]
[rightAlias]
[joinHints]
ON joinCriteria
<right-table>
@@ -79,12 +79,12 @@ SEARCH source=<left-table>

**leftAlias**
- Syntax: `left = <leftAlias>`
- Required
- Optional
- Description: The subquery alias to use with the left join side, to avoid ambiguous naming.

**rightAlias**
- Syntax: `right = <rightAlias>`
- Required
- Optional
- Description: The subquery alias to use with the right join side, to avoid ambiguous naming.

**joinHints**
@@ -138,11 +138,11 @@ Rewritten by PPL Join query:
```sql
SEARCH source=customer
| FIELDS c_custkey
| LEFT OUTER JOIN left = c, right = o
ON c.c_custkey = o.o_custkey AND o_comment NOT LIKE '%unusual%packages%'
| LEFT OUTER JOIN
ON c_custkey = o_custkey AND o_comment NOT LIKE '%unusual%packages%'
orders
| STATS count(o_orderkey) AS c_count BY c.c_custkey
| STATS count(1) AS custdist BY c_count
| STATS count(o_orderkey) AS c_count BY c_custkey
| STATS count() AS custdist BY c_count
| SORT - custdist, - c_count
```
_- **Limitation: sub-searches are unsupported on the right side of a join**_
@@ -151,14 +151,15 @@ If sub-searches is supported, above ppl query could be rewritten as:
```sql
SEARCH source=customer
| FIELDS c_custkey
| LEFT OUTER JOIN left = c, right = o ON c.c_custkey = o.o_custkey
| LEFT OUTER JOIN
ON c_custkey = o_custkey
[
SEARCH source=orders
| WHERE o_comment NOT LIKE '%unusual%packages%'
| FIELDS o_orderkey, o_custkey
]
| STATS count(o_orderkey) AS c_count BY c.c_custkey
| STATS count(1) AS custdist BY c_count
| STATS count(o_orderkey) AS c_count BY c_custkey
| STATS count() AS custdist BY c_count
| SORT - custdist, - c_count
```
