Skip to content

Commit

Permalink
Cast types on read
Browse files Browse the repository at this point in the history
`COPY FROM parquet` is too strict when matching Postgres tupledesc schema to the parquet file schema.
e.g. `INT32` type in the parquet schema cannot be read into a Postgres column with `int64` type.
We can avoid this situation by casting arrow array to the array that is expected by the tupledesc
schema, if the cast is possible. We can make use of `arrow-cast` crate, which is in the same project
with `arrow`. Its public api lets us check if a cast possible between 2 arrow types and perform the cast.

To make sure the cast is possible, we need to do 2 checks:
1. arrow-cast allows the cast from "arrow type at the parquet file" to "arrow type at the schema that is
   generated for tupledesc",
2. the cast is meaningful at Postgres. We check if there is an explicit cast from "Postgres type that corresponds
   for the arrow type at Parquet file" to "Postgres type at tupledesc".

With that we can cast between many castable types as shown below:
- INT16 => INT32
- UINT32 => INT64
- FLOAT32 => FLOAT64
- LargeUtf8 => UTF8
- LargeBinary => Binary
- Struct, Array, and Map with castable fields, e.g. [UINT16] => [INT64] or struct {'x': UINT16} => struct {'x': INT64}

**NOTE**: Struct fields must match by name and position to be cast.

Closes #67.
  • Loading branch information
aykut-bozkurt committed Nov 17, 2024
1 parent 518a5ac commit c8af064
Show file tree
Hide file tree
Showing 12 changed files with 1,518 additions and 292 deletions.
1 change: 1 addition & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@ pg_test = []

[dependencies]
arrow = {version = "53", default-features = false}
arrow-cast = {version = "53", default-features = false}
arrow-schema = {version = "53", default-features = false}
aws-config = { version = "1.5", default-features = false, features = ["rustls"]}
aws-credential-types = {version = "1.2", default-features = false}
Expand Down
Loading

0 comments on commit c8af064

Please sign in to comment.