S3AsyncClient getObject write file directly to disk #5660
Labels
crt-client
feature-request
A feature should be added or improved.
p2
This is a standard priority issue
Describe the feature
Allow S3AsyncClient.getObject to write downloaded objects directly to disk, rather than buffering in ByteBuffers via an AsyncResponseTransformer.
aws-crt-java recently added support for this under the hood: awslabs/aws-crt-java#825
Use Case
When dealing with large objects (10 GB+) and high speeds (10 Gb/s), the Java heap is quickly exhausted when downloading files via GetObject, even when the destination is on disk, e.g. via:
client.getObject(req, AsyncResponseTransformer.toFile(file))
This causes gigabytes of unnecessary allocations and GCs, to the point that the AWS Java SDK is not feasible for my application, which deals with large files.
My current solution is to call a standalone native binary to perform this download to disk, which adds plenty of extra complexity and loses the many benefits of using your SDK.
Another advantage was stated in the crt-java repo: awslabs/aws-crt-java#825 (comment)
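To make the allocation concern above concrete, here is a minimal sketch of the I/O pattern this request is after: bytes go from a source channel to a file through one reused, fixed-size buffer, so memory use does not grow with object size. It uses only the JDK and does not touch the SDK or the CRT. The class name DirectToDiskSketch, the method streamToFile, and the 64 KiB buffer size are illustrative assumptions, not part of any AWS API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class DirectToDiskSketch {

    /**
     * Hypothetical helper: drains the source channel into the destination file
     * using a single reused off-heap buffer, and returns the bytes written.
     */
    static long streamToFile(ReadableByteChannel source, Path destination, int bufferSize)
            throws IOException {
        // One direct buffer for the whole transfer; nothing is allocated per chunk.
        ByteBuffer buffer = ByteBuffer.allocateDirect(bufferSize);
        long total = 0;
        try (FileChannel out = FileChannel.open(destination,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            while (source.read(buffer) != -1) {
                buffer.flip();
                // A single write may be partial, so loop until the buffer is drained.
                while (buffer.hasRemaining()) {
                    total += out.write(buffer);
                }
                buffer.clear();
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a network response body; a real download would be far larger.
        byte[] payload = new byte[1 << 20];
        Path target = Files.createTempFile("download-sketch", ".bin");
        try (ReadableByteChannel in = Channels.newChannel(new ByteArrayInputStream(payload))) {
            long written = streamToFile(in, target, 64 * 1024);
            System.out.println("bytes written: " + written);
        } finally {
            Files.deleteIfExists(target);
        }
    }
}
```

The point of the sketch is only that the buffer count stays constant regardless of payload size; how the SDK or aws-crt-java would expose an equivalent path is exactly what this issue leaves open.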
Proposed Solution
I have no preference on how this is implemented: either a standalone S3AsyncClient::getObjectToFile interface method or an option on GetObjectRequest.
Other Information
I don't have any issue when calling PutObject for very large files from disk. The JVM heap usage stays very low.
Acknowledgements
AWS Java SDK version used
2.28.20
JDK version used
21
Operating System and version
Ubuntu 24.04