Skip to content

Commit

Permalink
Cast types on read
Browse files Browse the repository at this point in the history
`COPY FROM parquet` is too strict when matching Postgres tupledesc schema to the schema from parquet file.
e.g. `INT32` type in the parquet schema cannot be read into a Postgres column with `int64` type.
We can avoid this situation by adding a `is_coercible(from_type, to_type)` check while matching the expected schema
from the parquet file.

With that we can coerce as shown below from parquet source type to Postgres destination types:
- INT16 => {int32, int64}
- INT32 => {int64}
- UINT16 => {int16, int32, int64}
- UINT32 => {int32, int64}
- UINT64 => {int64}
- FLOAT32 => {double}

As we use arrow as intermediate format, it might be the case that `LargeUtf8` or `LargeBinary` types are used by the external writer instead of `Utf8` and `Binary`.
That is why we also need to support below coercions for arrow source types:
- `Utf8 | LargeUtf8` => {text}
- `Binary | LargeBinary` => {bytea}

Closes #67.
  • Loading branch information
aykut-bozkurt committed Nov 14, 2024
1 parent 518a5ac commit 857989e
Show file tree
Hide file tree
Showing 10 changed files with 779 additions and 266 deletions.
1 change: 1 addition & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@ pg_test = []

[dependencies]
arrow = {version = "53", default-features = false}
arrow-cast = {version = "53", default-features = false}
arrow-schema = {version = "53", default-features = false}
aws-config = { version = "1.5", default-features = false, features = ["rustls"]}
aws-credential-types = {version = "1.2", default-features = false}
Expand Down
Loading

0 comments on commit 857989e

Please sign in to comment.