
Commit

update docs
jonathanc-n committed Nov 13, 2024
1 parent 54ab128 commit bc88136
Showing 4 changed files with 110 additions and 313 deletions.
10 changes: 2 additions & 8 deletions dev/update_function_docs.sh
@@ -113,7 +113,7 @@ npx [email protected] --write "$TARGET_FILE"

echo "'$TARGET_FILE' successfully updated!"

TARGET_FILE="docs/source/user-guide/sql/window_functions_new.md"
TARGET_FILE="docs/source/user-guide/sql/window_functions.md"
PRINT_WINDOW_FUNCTION_DOCS_COMMAND="cargo run --manifest-path datafusion/core/Cargo.toml --bin print_functions_docs -- window"

echo "Inserting header"
@@ -146,13 +146,7 @@ dev/update_function_docs.sh file for updating surrounding text.
-->
# Window Functions (NEW)
Note: this documentation is in the process of being migrated to be [automatically created from the codebase].
Please see the [Window Functions (Old)](window_functions.md) page for
the rest of the documentation.
[automatically created from the codebase]: https://github.com/apache/datafusion/issues/12740
# Window Functions
A _window function_ performs a calculation across a set of table rows that are somehow related to the current row.
This is comparable to the type of calculation that can be done with an aggregate function.
1 change: 0 additions & 1 deletion docs/source/user-guide/sql/index.rst
@@ -31,7 +31,6 @@ SQL Reference
operators
aggregate_functions
window_functions
window_functions_new
scalar_functions
special_functions
sql_status
122 changes: 108 additions & 14 deletions docs/source/user-guide/sql/window_functions.md
@@ -12,22 +12,25 @@
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

<!---
This file was generated by the dev/update_function_docs.sh script.
Do not edit it manually as changes will be overwritten.
Instead, edit the WindowUDFImpl's documentation() function to
update documentation for an individual UDF or the
dev/update_function_docs.sh file for updating surrounding text.
-->

# Window Functions

A _window function_ performs a calculation across a set of table rows that are somehow related to the current row.

Note: this documentation is in the process of being migrated to be [automatically created from the codebase].
Please see the [Window Functions (new)](window_functions_new.md) page for
the rest of the documentation.

[automatically created from the codebase]: https://github.com/apache/datafusion/issues/12740

Window functions are comparable to the type of calculation that can be done with an aggregate function.
However, window functions do not cause rows to become grouped into a single output row like non-window aggregate calls would.
Instead, the rows retain their separate identities. Behind the scenes, the window function is able to access more than just the current row of the query result.

Here is an example that shows how to compare each employee's salary with the average salary in his or her department:

@@ -146,45 +149,136 @@ RANGE and GROUPS modes require an ORDER BY clause (with RANGE the ORDER BY must

All [aggregate functions](aggregate_functions.md) can be used as window functions.
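
To illustrate an aggregate used as a window function, here is a minimal sketch. It assumes Python's built-in `sqlite3` module as a stand-in SQL engine (not DataFusion itself) and a hypothetical `employees` table; the query text uses only standard window syntax.

```python
import sqlite3

# Hypothetical table and data; SQLite stands in for DataFusion here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("ann", "eng", 300), ("bob", "eng", 200), ("cat", "ops", 150), ("dan", "ops", 100)],
)

# avg() is an ordinary aggregate; the OVER clause makes it a window function,
# so every input row is kept and annotated with its department's average.
rows = conn.execute(
    """
    SELECT name, dept, salary, avg(salary) OVER (PARTITION BY dept) AS dept_avg
    FROM employees
    ORDER BY dept, name
    """
).fetchall()

for row in rows:
    print(row)
```

All four input rows appear in the result, each carrying the average of its own department, whereas a `GROUP BY dept` query would return only one row per department.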

## Ranking Functions

- [cume_dist](#cume_dist)
- [dense_rank](#dense_rank)
- [ntile](#ntile)
- [percent_rank](#percent_rank)
- [rank](#rank)
- [row_number](#row_number)

### `cume_dist`

Returns the relative rank of the current row: (number of rows preceding or peer with the current row) / (total rows in the partition).

```
cume_dist()
```
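
The formula above can be checked on a small input. This is a sketch only: it assumes Python's built-in `sqlite3` as a stand-in engine and a hypothetical `scores` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")  # hypothetical table
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("a", 10), ("b", 20), ("c", 20), ("d", 30)],
)

# For each row: (rows with score <= this row's score) / (total rows).
rows = conn.execute(
    """
    SELECT name, score, cume_dist() OVER (ORDER BY score) AS cd
    FROM scores
    ORDER BY score, name
    """
).fetchall()

print(rows)
```

Rows `b` and `c` are peers, so both count three rows (`a`, `b`, `c`) out of four and receive 0.75.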

### `dense_rank`

Returns the rank of the current row without gaps. Rows with identical values receive the same rank, and the next distinct value receives the next consecutive rank.

```
dense_rank()
```
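
A short sketch of the gap-free behavior, assuming Python's built-in `sqlite3` as a stand-in engine and a hypothetical `scores` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")  # hypothetical table
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("a", 10), ("b", 20), ("c", 20), ("d", 30)],
)

# Tied scores share a rank; the next distinct score gets the next integer.
rows = conn.execute(
    """
    SELECT name, score, dense_rank() OVER (ORDER BY score) AS dr
    FROM scores
    ORDER BY score, name
    """
).fetchall()

print(rows)
```

The tied rows `b` and `c` both get rank 2, and `d` follows with rank 3 rather than 4.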

### `ntile`

Returns an integer ranging from 1 to the argument value, dividing the partition as equally as possible.

```
ntile(expression)
```

#### Arguments

- **expression**: An integer describing the number of groups the partition should be split into
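
As a sketch of how rows are spread across groups, the following assumes Python's built-in `sqlite3` as a stand-in engine and a hypothetical `readings` table with five rows split into two groups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (v INTEGER)")  # hypothetical table
conn.executemany("INSERT INTO readings VALUES (?)", [(10,), (20,), (30,), (40,), (50,)])

# Five rows cannot be split evenly into two groups,
# so the first group receives the extra row.
rows = conn.execute(
    "SELECT v, ntile(2) OVER (ORDER BY v) AS bucket FROM readings ORDER BY v"
).fetchall()

print(rows)
```

The values in the `ORDER BY` are distinct here on purpose: with ties, which tied row lands in which group is not determined by the query.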

### `percent_rank`

Returns the percentage rank of the current row within its partition. The value ranges from 0 to 1 and is computed as `(rank - 1) / (total_rows - 1)`.

```
percent_rank()
```
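
The formula can be verified on a small input. This sketch assumes Python's built-in `sqlite3` as a stand-in engine and a hypothetical `scores` table with five rows, so the denominator `total_rows - 1` is 4.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (score INTEGER)")  # hypothetical table
conn.executemany("INSERT INTO scores VALUES (?)", [(10,), (20,), (20,), (30,), (40,)])

# rank() over these rows is 1, 2, 2, 4, 5, so (rank - 1) / 4 gives the values below.
rows = conn.execute(
    """
    SELECT score, percent_rank() OVER (ORDER BY score) AS pr
    FROM scores
    ORDER BY score
    """
).fetchall()

print(rows)
```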

### `rank`

Returns the rank of the current row within its partition, allowing gaps between ranks. Unlike `row_number`, rows with identical values receive the same rank, and the ranks that follow are skipped accordingly.

```
rank()
```
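
To make the gaps visible, this sketch places `rank` next to `row_number`. It assumes Python's built-in `sqlite3` as a stand-in engine and a hypothetical `scores` table; `name` is added to the `row_number` ordering only to make its output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")  # hypothetical table
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("a", 10), ("b", 20), ("c", 20), ("d", 30)],
)

rows = conn.execute(
    """
    SELECT name,
           rank()       OVER (ORDER BY score)       AS rnk,
           row_number() OVER (ORDER BY score, name) AS rn
    FROM scores
    ORDER BY score, name
    """
).fetchall()

print(rows)
```

The tied rows `b` and `c` share rank 2, and `d` receives rank 4 because rank 3 is skipped, while `row_number` keeps counting 1 through 4.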

### `row_number`

Number of the current row within its partition, counting from 1.

```
row_number()
```
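
A sketch of per-partition numbering, assuming Python's built-in `sqlite3` as a stand-in engine and a hypothetical `employees` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER)")  # hypothetical
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("ann", "eng", 300), ("bob", "eng", 200), ("cat", "ops", 150), ("dan", "ops", 100)],
)

# Numbering restarts at 1 in every department, ordered by descending salary.
rows = conn.execute(
    """
    SELECT dept, name,
           row_number() OVER (PARTITION BY dept ORDER BY salary DESC) AS rn
    FROM employees
    ORDER BY dept, rn
    """
).fetchall()

print(rows)
```

Filtering an outer query on `rn = 1` is a common way to pick the top row of each partition.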

## Analytical Functions

- [first_value](#first_value)
- [lag](#lag)
- [last_value](#last_value)
- [lead](#lead)
- [nth_value](#nth_value)

### `first_value`

Returns value evaluated at the row that is the first row of the window frame.

```
first_value(expression)
```

#### Arguments

- **expression**: Expression to operate on

### `lag`

Returns the value of expression evaluated at the row that is offset rows before the current row within the partition; if there is no such row, returns default instead (which must be of the same type as expression).

```
lag(expression, offset, default)
```

#### Arguments

- **expression**: Expression to operate on
- **offset**: Integer. Specifies how many rows back the value of expression should be retrieved. Defaults to 1.
- **default**: The default value if the offset is not within the partition. Must be of the same type as expression.

### `last_value`

Returns value evaluated at the row that is the last row of the window frame.

```
last_value(expression)
```

#### Arguments

- **expression**: Expression to operate on

### `lead`

Returns the value of expression evaluated at the row that is offset rows after the current row within the partition; if there is no such row, returns default instead (which must be of the same type as expression).

```
lead(expression, offset, default)
```

#### Arguments

- **expression**: Expression to operate on
- **offset**: Integer. Specifies how many rows forward the value of expression should be retrieved. Defaults to 1.
- **default**: The default value if the offset is not within the partition. Must be of the same type as expression.

### `nth_value`

Returns value evaluated at the row that is the nth row of the window frame (counting from 1); null if no such row.

```
nth_value(expression, n)
```

#### Arguments

- **expression**: The column or expression from which to retrieve the nth value
- **n**: Integer. Specifies which row of the window frame to return, counting from 1