Fix ignore errors in DataFrame section (#338)

kuzudb · Jan 21, 2025 · 604954a
1 parent 9f4d9d0

Showing 2 changed files with 66 additions and 4 deletions.
diff --git a/src/content/docs/import/copy-from-dataframe.md b/src/content/docs/import/copy-from-dataframe.md
@@ -76,4 +76,59 @@ conn.execute("COPY Person FROM pa_table")
 
 ## Ignore erroneous rows
 
-See the [Ignore erroneous rows](/import#ignore-erroneous-rows) section for more details.
+When copying from DataFrames, you can ignore rows that contain duplicate, null,
+or missing primary keys.
+
+:::note[Note]
+Currently, you cannot ignore parsing or type-casting errors when copying from DataFrames (the
+underlying data must be parseable and type-castable).
+:::
+
+Let's understand this with an example.
+
+```py
+import pandas as pd
+
+persons = ["Rhea", "Alice", "Rhea", None]
+age = [25, 23, 25, 24]
+
+df = pd.DataFrame({"name": persons, "age": age})
+print(df)
+```
+The given DataFrame is as follows:
+```
+    name  age
+0   Rhea   25
+1  Alice   23
+2   Rhea   25
+3   None   24
+```
+As can be seen, the Pandas DataFrame has a duplicate name "Rhea" and a null value (`None`)
+for `name`, which is the desired primary key field. We can ignore the erroneous rows during import
+by setting the `ignore_errors` parameter to `true` in the `COPY FROM` command.
+
+```py
+import kuzu
+
+db = kuzu.Database("test_db")
+conn = kuzu.Connection(db)
+
+# Create a Person node table with name as the primary key
+conn.execute("CREATE NODE TABLE Person(name STRING PRIMARY KEY, age INT64)")
+# Enable the `ignore_errors` parameter below to ignore the erroneous rows
+conn.execute("COPY Person FROM df (ignore_errors=true)")
+
+# Display results
+res = conn.execute("MATCH (p:Person) RETURN p.name, p.age")
+print(res.get_as_df())
+```
+This is the resulting DataFrame after ignoring errors:
+```
+  p.name  p.age
+0   Rhea     25
+1  Alice     23
+```
+If the `ignore_errors` parameter is not set, the import operation will fail with an error.
+
+See the [Ignore erroneous rows](/import#ignore-erroneous-rows) section for details on
+which kinds of errors can be ignored when copying from Pandas or Polars DataFrames.
diff --git a/src/content/docs/import/index.mdx b/src/content/docs/import/index.mdx
@@ -208,6 +208,13 @@ If the error is not skippable for a specific source, `COPY/LOAD FROM` will inste
 Below is a table that shows the errors that are skippable by each source.
 
 ||Parsing Errors|Casting Errors|Duplicate/Null/Missing Primary-Key errors|
-|----|----|----|----|
-|CSV| X | X | X |
-|JSON/Numpy/Parquet/PyArrow/Pandas/Polars Dataframes|||X|
+|---|:---:|:---:|:---:|
+|CSV| ✅ | ✅ | ✅ |
+|JSON| ❌ | ❌ | ✅ |
+|Numpy| ❌ | ❌ | ✅ |
+|Parquet| ❌ | ❌ | ✅ |
+|PyArrow tables| ❌ | ❌ | ✅ |
+|Pandas DataFrames| ❌ | ❌ | ✅ |
+|Polars DataFrames| ❌ | ❌ | ✅ |
+
+
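
Review note on the `copy-from-dataframe.md` example: the skipping rule it describes can be sketched in plain Python to check the expected result. This is only an illustration, not Kuzu's implementation. The helper name `skip_bad_primary_keys` is hypothetical, and the sketch assumes the first occurrence of a duplicate key is kept, which is consistent with the output shown in the docs example (`Rhea`, `Alice`).

```python
def skip_bad_primary_keys(rows, key):
    """Split rows into kept and skipped, based on the primary key column.

    A row is skipped when its key is null/missing or was already seen.
    Assumption: the first occurrence of a duplicate key is the one kept.
    """
    seen = set()
    kept, skipped = [], []
    for row in rows:
        value = row.get(key)  # a missing key is treated like a null key
        if value is None or value in seen:
            skipped.append(row)
        else:
            seen.add(value)
            kept.append(row)
    return kept, skipped


# Same data as the DataFrame in the docs example
rows = [
    {"name": "Rhea", "age": 25},
    {"name": "Alice", "age": 23},
    {"name": "Rhea", "age": 25},
    {"name": None, "age": 24},
]

kept, skipped = skip_bad_primary_keys(rows, "name")
print(kept)
print(skipped)
```

Under these assumptions, `kept` holds the two rows the docs show after `COPY ... (ignore_errors=true)`, and `skipped` holds the duplicate and the null-key row.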