Hadoop now supports a vectored read API optimized for `seek()`-heavy workloads; to use it, your file system must implement a new method, `readVectored`, in its positional stream class. Parquet 1.14+ supports vectored IO via the `parquet.hadoop.vectored.io.enabled` configuration option, and gcs-connector implements `readVectored` for GCS file streams in 3.0.1+.
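For anyone unfamiliar with why vectored reads help seek-heavy workloads: the reader hands over all of its byte ranges up front, so the stream can merge nearby ranges and issue fewer, larger reads. Below is a minimal stdlib-only sketch of that idea. It does not use Hadoop's classes and is not how Hadoop or gcs-connector implement `readVectored`; `VectoredReadSketch`, `Range`, `coalesce`, `readRanges`, and the `maxGap` parameter are all hypothetical names chosen for illustration.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class VectoredReadSketch {
    /** Hypothetical range type: a byte offset and a length. */
    public static final class Range {
        public final long offset;
        public final int length;
        public Range(long offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /** Merge ranges separated by at most maxGap bytes, so each merged span costs one read. */
    public static List<Range> coalesce(List<Range> ranges, int maxGap) {
        List<Range> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingLong((Range r) -> r.offset));
        List<Range> merged = new ArrayList<>();
        for (Range r : sorted) {
            if (!merged.isEmpty()) {
                Range last = merged.get(merged.size() - 1);
                long lastEnd = last.offset + last.length;
                if (r.offset - lastEnd <= maxGap) {
                    // Close enough: extend the previous span instead of seeking again.
                    long end = Math.max(lastEnd, r.offset + r.length);
                    merged.set(merged.size() - 1, new Range(last.offset, (int) (end - last.offset)));
                    continue;
                }
            }
            merged.add(r);
        }
        return merged;
    }

    /** Read all requested ranges, returning results in the order the ranges were given. */
    public static List<byte[]> readRanges(FileChannel ch, List<Range> ranges, int maxGap)
            throws IOException {
        List<Range> spans = coalesce(ranges, maxGap);
        List<byte[]> spanData = new ArrayList<>();
        for (Range span : spans) {
            ByteBuffer buf = ByteBuffer.allocate(span.length);
            long pos = span.offset;
            while (buf.hasRemaining()) {
                int n = ch.read(buf, pos); // positional read, does not move the channel position
                if (n < 0) {
                    throw new EOFException("range extends past end of file");
                }
                pos += n;
            }
            spanData.add(buf.array());
        }
        // Slice each requested range back out of the merged span that contains it.
        List<byte[]> out = new ArrayList<>();
        for (Range r : ranges) {
            for (int i = 0; i < spans.size(); i++) {
                Range s = spans.get(i);
                if (r.offset >= s.offset && r.offset + r.length <= s.offset + s.length) {
                    int from = (int) (r.offset - s.offset);
                    out.add(Arrays.copyOfRange(spanData.get(i), from, from + r.length));
                    break;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("vectored", ".bin");
        byte[] data = new byte[200];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        Files.write(tmp, data);
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
            List<Range> ranges = Arrays.asList(new Range(100, 8), new Range(0, 4), new Range(6, 4));
            List<byte[]> out = readRanges(ch, ranges, 4);
            System.out.println("requested ranges: " + ranges.size());
            System.out.println("merged spans read: " + coalesce(ranges, 4).size());
            System.out.println("bytes returned for first range: " + out.get(0).length);
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

On object stores like GCS, each avoided seek is potentially an avoided request, which is why the benefit should be measured on real Parquet reads rather than inferred from a local sketch like this one.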
Currently we're blocked from upgrading to gcs-connector 3.x until Beam does (I ran into a NoSuchMethodError on the Beam side when I tried upgrading only Scio). Once Beam is released with gcs-connector 3.x, we should benchmark vectored IO, particularly on Parquet SMB reads.
Another note: gcs-connector 3.x drops Java 8 support, so we're blocked on #5067. It also has some breaking API changes, so we need Beam to upgrade first: apache/beam#31896