S3AsyncClient getObject write file directly to disk #5660

earlybard · 2024-10-11T03:28:14Z

Describe the feature

Allow S3AsyncClient.getObject to write downloaded objects directly to disk, rather than buffering in ByteBuffers via an AsyncResponseTransformer.

aws-crt-java recently added support for this under the hood: awslabs/aws-crt-java#825

Use Case

When dealing with large objects (10 GB+) at high throughput (10 Gb/s), the Java heap is quickly exhausted when downloading via GetObject, even when the destination is a file on disk, e.g. client.getObject(req, AsyncResponseTransformer.toFile(file)).

This causes gigabytes of unnecessary allocations and GCs, to the point that the AWS Java SDK is not feasible for my application, which deals with large files.

My current solution is to call a standalone native binary to perform this download to disk, which adds plenty of extra complexity and loses the many benefits of using your SDK.

Another advantage was stated in the crt-java repo: awslabs/aws-crt-java#825 (comment)

It would lower latency, by removing an additional copy from C -> Java, and improve memory usage (no need to allocate a ByteBuffer to hold the additional copy).
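To make the extra-copy point concrete, here is a rough analogy using only java.nio. It is not how the SDK or aws-crt-java works internally; the class and method names (CopyPathSketch, copyThroughHeapBuffer, copyChannelToChannel) are made up for illustration. The first method stages every byte in a heap ByteBuffer before writing it out, the second hands the transfer to the channel layer so this code allocates no buffer of its own.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class CopyPathSketch {

    private static FileChannel openForWrite(Path dest) throws IOException {
        return FileChannel.open(dest, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING);
    }

    /** Copies through an intermediate heap ByteBuffer: every byte passes through JVM heap memory. */
    static long copyThroughHeapBuffer(Path source, Path dest) {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = openForWrite(dest)) {
            ByteBuffer buffer = ByteBuffer.allocate(64 * 1024); // heap allocation
            long total = 0;
            while (in.read(buffer) != -1) {
                buffer.flip();
                while (buffer.hasRemaining()) {
                    total += out.write(buffer);
                }
                buffer.clear();
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Transfers channel to channel; the OS may move the bytes without a user-space copy. */
    static long copyChannelToChannel(Path source, Path dest) {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = openForWrite(dest)) {
            long size = in.size();
            long position = 0;
            while (position < size) {
                position += in.transferTo(position, size - position, out);
            }
            return position;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path source = Files.createTempFile("copy-src", ".bin");
        Path viaHeap = Files.createTempFile("copy-heap", ".bin");
        Path direct = Files.createTempFile("copy-direct", ".bin");
        Files.write(source, new byte[1 << 20]); // 1 MiB of zeros as a stand-in payload
        System.out.println("heap-buffered copy: " + copyThroughHeapBuffer(source, viaHeap) + " bytes");
        System.out.println("channel transfer:   " + copyChannelToChannel(source, direct) + " bytes");
        Files.delete(source);
        Files.delete(viaHeap);
        Files.delete(direct);
    }
}
```

Both methods produce identical files; the difference is only in where the bytes are staged on the way, which is the cost the quoted comment is about.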

Proposed Solution

No preference on how this is implemented: either a standalone S3AsyncClient::getObjectToFile interface method, or an option on GetObjectRequest.
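For discussion, the two shapes could look roughly like the sketch below. Everything in it is hypothetical: FileDownloadClient, DownloadRequest and FakeClient are invented names, not SDK types, and FakeClient is an in-memory stand-in for S3 so the sketch runs without the SDK or network access. Only the method name getObjectToFile comes from the proposal above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

public class ProposedApiSketch {

    /** Option A (hypothetical): a dedicated client method that takes the destination path. */
    interface FileDownloadClient {
        CompletableFuture<Path> getObjectToFile(String bucket, String key, Path destination);
    }

    /** Option B (hypothetical): the destination travels on the request object. */
    static final class DownloadRequest {
        final String bucket;
        final String key;
        final Path destinationFile;

        DownloadRequest(String bucket, String key, Path destinationFile) {
            this.bucket = bucket;
            this.key = key;
            this.destinationFile = destinationFile;
        }
    }

    /** In-memory stand-in for S3; a real client would stream from the network instead. */
    static final class FakeClient implements FileDownloadClient {
        private final Map<String, byte[]> objects;

        FakeClient(Map<String, byte[]> objects) {
            this.objects = objects;
        }

        @Override
        public CompletableFuture<Path> getObjectToFile(String bucket, String key, Path destination) {
            return CompletableFuture.supplyAsync(() -> {
                byte[] body = objects.get(bucket + "/" + key);
                if (body == null) {
                    throw new IllegalArgumentException("no such object: " + bucket + "/" + key);
                }
                try {
                    return Files.write(destination, body);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }

        /** Option B entry point, delegating to the same code path as Option A. */
        CompletableFuture<Path> getObject(DownloadRequest request) {
            return getObjectToFile(request.bucket, request.key, request.destinationFile);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dest = Files.createTempFile("sketch", ".bin");
        FakeClient client = new FakeClient(
                Map.of("my-bucket/big.bin", "hello".getBytes(StandardCharsets.UTF_8)));
        client.getObjectToFile("my-bucket", "big.bin", dest).join();
        System.out.println(Files.size(dest) + " bytes written to " + dest);
        Files.delete(dest);
    }
}
```

Either shape lets the caller hand over only a path, so the implementation is free to write to disk without surfacing ByteBuffers to Java code.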

Other Information

I don't have any issue when calling PutObject for very large files from disk. The JVM heap usage stays very low.

Acknowledgements

I may be able to implement this feature request
This feature might incur a breaking change

AWS Java SDK version used

2.28.20

JDK version used

21

Operating System and version

Ubuntu 24.04


debora-ito · 2024-10-11T18:31:54Z

@earlybard feature request acknowledged, we'll work on the integration with aws-crt-java.

earlybard added the feature-request and needs-triage labels · Oct 11, 2024

debora-ito added the crt-client and p2 (standard priority) labels and removed the needs-triage label · Oct 11, 2024
