Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

There is no possibility to omit double file reading caused by checksum calculation #2968

Closed
ThatEmbeddedGuy opened this issue May 23, 2024 · 4 comments
Labels
bug This issue is a bug. duplicate This issue is a duplicate. p3 This is a minor priority issue

Comments

@ThatEmbeddedGuy
Copy link

ThatEmbeddedGuy commented May 23, 2024

Describe the bug

Even if
Aws::S3::Model::ChecksumAlgorithm::NOT_SET

object_request.WithBucket(bucket).WithKey(s3key).WithChecksumAlgorithm(Aws::S3::Model::ChecksumAlgorithm::NOT_SET);
              object_request.SetBody(stdstream);

SDK rewrites it to MD5

Aws::String PutObjectRequest::GetChecksumAlgorithmName() const
{
  if (m_checksumAlgorithm == ChecksumAlgorithm::NOT_SET)
  {
    return "md5";
  }
  else
  {
    return ChecksumAlgorithmMapper::GetNameForChecksumAlgorithm(m_checksumAlgorithm);
  }
} 

https://github.com/aws/aws-sdk-cpp/blob/main/generated/src/aws-cpp-sdk-s3/source/model/PutObjectRequest.cpp#L329

It leads to checksum calculation which is passed in the header

content-md5: aOEJ8PQMpyoV4FzCJ4b45g==

full log from curl :

[DEBUG] CURL: (HeaderOut) PUT /XXXXXXXX HTTP/1.1
Host: XXXXX.s3.ap-southeast-1.amazonaws.com
Accept: /
amz-sdk-invocation-id: E01AF2D1-E5F1-4469-B02F-76304CFB417F
amz-sdk-request: attempt=1
authorization: AWS4-HMAC-SHA256 Credential=XXX/XXX/ap-southeast-1/s3/aws4_request, SignedHeaders=amz-sdk-invocation-id;amz-sdk-request;content->length;content-md5;content-type;host;x-amz-content-sha256;x-amz-date, Signature=35038b4005c0f8ec52b6084c2cd84461408ae0ed1056e4b3788543e7009e5360
content-length: 10
content-md5: aOEJ8PQMpyoV4FzCJ4b45g==
content-type: binary/octet-stream
user-agent: aws-sdk-cpp/1.9.310 Linux/5.15.0-202.135.2.el9uek.x86_64 x86_64 GCC/11.4.1
x-amz-content-sha256: UNSIGNED-PAYLOAD
x-amz-date: 20240425T125503Z
Expect: 100-continue

Headers are calculated before factual sending of file, therefore data has been read 2 times. One time for calculating checksumm and one for factual sending

Related issue was considered as bug there
#711

Expected Behavior

I should have possibility to disable twice data readings caused by checksum calculations

Current Behavior

Checksum calculation is passed in the header and is calculated before request

content-md5: aOEJ8PQMpyoV4FzCJ4b45g==

full log from curl :

[DEBUG] CURL: (HeaderOut) PUT /XXXXXXXX HTTP/1.1
Host: XXXXX.s3.ap-southeast-1.amazonaws.com
Accept: /
amz-sdk-invocation-id: E01AF2D1-E5F1-4469-B02F-76304CFB417F
amz-sdk-request: attempt=1
authorization: AWS4-HMAC-SHA256 Credential=XXX/XXX/ap-southeast-1/s3/aws4_request, SignedHeaders=amz-sdk-invocation-id;amz-sdk-request;content->length;content-md5;content-type;host;x-amz-content-sha256;x-amz-date, Signature=35038b4005c0f8ec52b6084c2cd84461408ae0ed1056e4b3788543e7009e5360
content-length: 10
content-md5: aOEJ8PQMpyoV4FzCJ4b45g==
content-type: binary/octet-stream
user-agent: aws-sdk-cpp/1.9.310 Linux/5.15.0-202.135.2.el9uek.x86_64 x86_64 GCC/11.4.1
x-amz-content-sha256: UNSIGNED-PAYLOAD
x-amz-date: 20240425T125503Z
Expect: 100-continue

Headers are calculated before factual sending of file, therefore data has been read 2 times. One for calculating checksumm, one for factual sending

Reproduction Steps

Standard example from aws sdk with ChecksumAlgorithm::NOT_SET

bool AwsDoc::S3::PutObject(const Aws::String &bucketName,
                           const Aws::String &fileName,
                           const Aws::Client::ClientConfiguration &clientConfig) {
    Aws::S3::S3Client s3_client(clientConfig);

object_request.WithBucket(bucket).WithKey(s3key).WithChecksumAlgorithm(Aws::S3::Model::ChecksumAlgorithm::NOT_SET);

    std::shared_ptr<Aws::IOStream> inputData =
            Aws::MakeShared<Aws::FStream>("SampleAllocationTag",
                                          fileName.c_str(),
                                          std::ios_base::in | std::ios_base::binary);

    if (!*inputData) {
        std::cerr << "Error unable to read file " << fileName << std::endl;
        return false;
    }

    request.SetBody(inputData);

    Aws::S3::Model::PutObjectOutcome outcome =
            s3_client.PutObject(request);

    if (!outcome.IsSuccess()) {
        std::cerr << "Error: PutObject: " <<
                  outcome.GetError().GetMessage() << std::endl;
    }
    else {
        std::cout << "Added object '" << fileName << "' to bucket '"
                  << bucketName << "'.";
    }

    return outcome.IsSuccess();
}

Possible Solution

Doesn't rewrite checksum algorythm as MD5 when is passed as ChecksumAlgorithm::NOT_SET

Additional Information/Context

No response

AWS CPP SDK version used

1.9.310

Compiler and Version used

GCC/11.4.1

Operating System and version

Oracle Linux 9

@ThatEmbeddedGuy ThatEmbeddedGuy added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels May 23, 2024
@ThatEmbeddedGuy
Copy link
Author

I suppose original issue is caused by misinterpretation of NOT_SET value. It can be interpreted as explicitly not set and not set just in case, therefore we need to set default value

I propose to add something like ChecksumAlgorithm::DISABLED
which is explicitly disables checksum calculation

@DmitriyMusatkin
Copy link
Contributor

Yes, this is a known issue and something we are trying to figure out how to address. Related ticket is #2933.
In short, with the release of additional checksums for S3 in 1.9.x, checksums were made required by default on put, so either old md5 checksum or the new additional checksum must be present. ChecksumAlgorithm::NOT_SET adds some confusion, as it does not mean that no checksum should be used, it means that additional checksum is not set and SDK should fallback to md5.
With that said, all additional checksums are calculated as body is streamed and are sent in the trailer, so the body will only be read once.

@jmklix jmklix added p3 This is a minor priority issue duplicate This issue is a duplicate. and removed needs-triage This issue or PR still needs to be triaged. labels May 24, 2024
@jmklix
Copy link
Member

jmklix commented May 28, 2024

Closing as duplicate of this feature request: #2933

@jmklix jmklix closed this as not planned Won't fix, can't repro, duplicate, stale May 28, 2024
Copy link

This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue is a bug. duplicate This issue is a duplicate. p3 This is a minor priority issue
Projects
None yet
Development

No branches or pull requests

3 participants