Fix ignore errors in DataFrame section (#338)
prrao87 authored Jan 21, 2025
1 parent 9f4d9d0 commit 604954a
Showing 2 changed files with 66 additions and 4 deletions.
57 changes: 56 additions & 1 deletion src/content/docs/import/copy-from-dataframe.md
@@ -76,4 +76,59 @@ conn.execute("COPY Person FROM pa_table")

## Ignore erroneous rows

See the [Ignore erroneous rows](/import#ignore-erroneous-rows) section for more details.
When copying from DataFrames, you can skip rows that have a duplicate, null,
or missing primary key.

:::note[Note]
Currently, you cannot ignore parsing or type-casting errors when copying from DataFrames (the
underlying data must be parseable and type-castable).
:::

Let's understand this with an example.

```py
import pandas as pd

persons = ["Rhea", "Alice", "Rhea", None]
age = [25, 23, 25, 24]

df = pd.DataFrame({"name": persons, "age": age})
print(df)
```
The given DataFrame is as follows:
```
    name  age
0   Rhea   25
1  Alice   23
2   Rhea   25
3   None   24
```
As can be seen, the Pandas DataFrame has a duplicate name, "Rhea", and a null value (`None`)
in the `name` column, which is the desired primary key field. We can ignore the erroneous rows during import
by setting the `ignore_errors` parameter to `true` in the `COPY FROM` command.

```py
import kuzu

db = kuzu.Database("test_db")
conn = kuzu.Connection(db)

# Create a Person node table with name as the primary key
conn.execute("CREATE NODE TABLE Person(name STRING PRIMARY KEY, age INT64)")
# Enable the `ignore_errors` parameter below to ignore the erroneous rows
conn.execute("COPY Person FROM df (ignore_errors=true)")

# Display results
res = conn.execute("MATCH (p:Person) RETURN p.name, p.age")
print(res.get_as_df())
```
This is the resulting DataFrame after ignoring errors:
```
   p.name  p.age
0    Rhea     25
1   Alice     23
```
If the `ignore_errors` parameter is not set, the import operation will fail with an error.
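
To preview which rows an import like the one above would skip, the primary-key checks can be approximated in plain Python before calling `COPY FROM`. This is only a sketch and not how Kùzu implements the check: `split_valid_rows` is a hypothetical helper, and it assumes the first occurrence of a duplicated key is the one that is kept, which matches the output shown above.

```python
def split_valid_rows(rows, key):
    """Split rows into (kept, skipped) using simple primary-key checks.

    Hypothetical helper for illustration only; assumes the first row
    seen for a given key is kept and later duplicates are skipped.
    """
    seen = set()
    kept, skipped = [], []
    for row in rows:
        value = row.get(key)  # None if the key is null or missing
        if value is None:
            skipped.append((row, "null or missing primary key"))
        elif value in seen:
            skipped.append((row, "duplicate primary key"))
        else:
            seen.add(value)
            kept.append(row)
    return kept, skipped


rows = [
    {"name": "Rhea", "age": 25},
    {"name": "Alice", "age": 23},
    {"name": "Rhea", "age": 25},
    {"name": None, "age": 24},
]

kept, skipped = split_valid_rows(rows, key="name")
print("kept:", kept)
for row, reason in skipped:
    print("skipped:", row, "->", reason)
```

Running a check like this first makes it easier to tell whether rows dropped by `ignore_errors` were expected.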

See the [Ignore erroneous rows](/import#ignore-erroneous-rows) section for details on
which kinds of errors can be ignored when copying from Pandas or Polars DataFrames.
13 changes: 10 additions & 3 deletions src/content/docs/import/index.mdx
@@ -208,6 +208,13 @@ If the error is not skippable for a specific source, `COPY/LOAD FROM` will inste
Below is a table that shows the errors that are skippable by each source.

||Parsing Errors|Casting Errors|Duplicate/Null/Missing Primary-Key errors|
|---|:---:|:---:|:---:|
|CSV|X|X|X|
|JSON|||X|
|Numpy|||X|
|Parquet|||X|
|PyArrow tables|||X|
|Pandas DataFrames|||X|
|Polars DataFrames|||X|
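
As a rough illustration, the skippable-error table can be expressed as a lookup in Python. The dictionary, the short error-kind labels (`"parsing"`, `"casting"`, `"primary_key"`), and the `is_skippable` function are hypothetical names chosen for this sketch; they are not part of Kùzu's API.

```python
# Hypothetical encoding of the skippable-error table above.
PK_ONLY = frozenset({"primary_key"})

SKIPPABLE_ERRORS = {
    "CSV": frozenset({"parsing", "casting", "primary_key"}),
    "JSON": PK_ONLY,
    "Numpy": PK_ONLY,
    "Parquet": PK_ONLY,
    "PyArrow": PK_ONLY,
    "Pandas": PK_ONLY,
    "Polars": PK_ONLY,
}


def is_skippable(source, error_kind):
    """Return True if the table marks this error kind as skippable for the source."""
    return error_kind in SKIPPABLE_ERRORS.get(source, frozenset())


print(is_skippable("CSV", "casting"))         # True
print(is_skippable("Pandas", "casting"))      # False
print(is_skippable("Pandas", "primary_key"))  # True
```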
