Slower Fetch Times for S3 Objects in INTELLIGENT_TIERING compared to STANDARD Tier #3192

Open
1 task
Manjunathagopi opened this issue Nov 14, 2024 · 15 comments
Labels
bug This issue is a bug. p2 This is a standard priority issue

Comments

@Manjunathagopi

Describe the bug

We are downloading S3 objects in parts of 24MB. The fetch time for each 24MB part is noticeably slower for objects in the INTELLIGENT_TIERING storage class than for objects in the STANDARD storage class.

Regression Issue

  • Select this option if this issue appears to be a regression.

Expected Behavior

We understand that the initial fetch from Intelligent-Tiering may be slow, but once the first download is complete, fetch times for the rest of the download should be consistent.

Current Behavior

We understand that the initial fetch from Intelligent-Tiering may be slow, but once the first download is complete, fetch times for the rest of the download should be consistent. This is not what we observe with the AWS C++ SDK.

Reproduction Steps

To reproduce the issue, download S3 objects from the INTELLIGENT_TIERING storage class in small chunks and compare the timings with the same download from the STANDARD storage class. The fetch time for each part is significantly slower in INTELLIGENT_TIERING.

Possible Solution

No response

Additional Information/Context

No response

AWS CPP SDK version used

1.11.408

Compiler and Version used

gcc (GCC) 4.8.5

Operating System and version

CentOS Linux and version 7

@Manjunathagopi Manjunathagopi added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Nov 14, 2024
@jmklix
Member

jmklix commented Nov 15, 2024

Can you include trace-level logs of the GetObject requests that you are making? The response should include a header with some information about the tier your objects are currently in.

@jmklix jmklix added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. p2 This is a standard priority issue and removed needs-triage This issue or PR still needs to be triaged. labels Nov 15, 2024
@jmklix jmklix self-assigned this Nov 15, 2024
@Manjunathagopi
Author

@jmklix please find the trace level logs for both intelligent tiering and standard tiering below.
Intelligent-tiering logs, Standard-tiering logs

@jmklix jmklix removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. label Nov 18, 2024
@jmklix
Member

jmklix commented Nov 20, 2024

Sorry, I was mistaken. The logs only state that the storage class is INTELLIGENT_TIERING rather than telling us which tier each object is currently in:

[TRACE] 2024-11-18 11:01:37.792 http-stream [140011056908032] id=0x7f56d001e680: Incoming header: x-amz-storage-class: INTELLIGENT_TIERING

It looks like S3 might not have your object in the tier that you are expecting. This might be because something is wrong on the S3 side, S3 is taking longer than expected to change the tier, or the S3 documentation is unclear about how Intelligent-Tiering is supposed to work. Can you try checking which access tier some objects are in before and after you access them? You can do this with S3 Inventory by looking at the "S3 Intelligent-Tiering access tier" field.
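
When scanning many trace lines like the one quoted above, a small parser can pull out what S3 reported per response. This is only a sketch: `parse_incoming_header` is a hypothetical helper (not part of the SDK), it assumes the `Incoming header: <name>: <value>` layout seen in the log excerpt, and it matches the header name case-sensitively. As noted above, the value is the storage class, not the Intelligent-Tiering access tier.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: extracts the value of `header` from a trace line of
// the form "... Incoming header: <name>: <value>".
// Returns an empty string when the header is not present on the line.
std::string parse_incoming_header(const std::string& line, const std::string& header) {
    const std::string marker = "Incoming header: " + header + ": ";
    const std::size_t at = line.find(marker);
    if (at == std::string::npos) {
        return "";
    }
    std::string value = line.substr(at + marker.size());
    // Trim trailing whitespace or line endings left over from the log file.
    while (!value.empty() &&
           (value.back() == ' ' || value.back() == '\t' ||
            value.back() == '\r' || value.back() == '\n')) {
        value.pop_back();
    }
    return value;
}
```

An empty result for the STANDARD logs would not necessarily indicate a problem, since S3 documents that it omits `x-amz-storage-class` for objects in the Standard storage class.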

@jmklix jmklix added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. label Nov 20, 2024
@Manjunathagopi
Author

@jmklix But the aws s3 cp CLI command takes the same time to download the file regardless of whether the object is in STANDARD or INTELLIGENT_TIERING.

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. label Nov 23, 2024
@DmitriyMusatkin
Contributor

Is CLI and C++ SDK performance similar for the standard tier?
I'm wondering if the CLI is equally slow for both tiers, while something in the C++ SDK makes the standard tier faster but not the intelligent tier.

In general there should be no tier-specific code in the SDKs. To the SDK it is all just an endpoint, and it does not care what data it is pulling. My initial guess is that it might have something to do with DNS resolution or connection pooling. S3 has supported MVA DNS for over a year now, but maybe something in how the C++ SDK chooses an IP or how it reuses connections causes Intelligent-Tiering to be slower.

@jmklix jmklix added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. label Nov 27, 2024
@Manjunathagopi
Author

@DmitriyMusatkin CLI performance is about the same for STANDARD and INTELLIGENT_TIERING, so why is it not the same with the C++ SDK?

@DmitriyMusatkin
Contributor

Hard to tell offhand without a deeper dive. What we know is that on the SDK side there is no difference between the tiers; the SDK ends up calling the same endpoints regardless of tier. This will require some bandwidth from someone on the SDK team to investigate.

Some potential theories:

  • The CLI is simply slow for both tiers, while some optimization on the C++ SDK side makes the standard tier faster but not the intelligent tier. I'm not an expert on how Intelligent-Tiering is implemented, but I think it's reasonable to assume that S3 cannot make the object hot on all hosts after the initial download. So it's possible that the C++ SDK is more eager to connect to a new host on which the object is not hot.
  • Maybe the CLI and the C++ SDK send slightly different requests and that affects performance. Things like transfer-encoding or some defaults might differ.

Note: the CLI does not save anything on the client side between runs, so whatever improves performance on subsequent runs must be server-side.

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. label Dec 3, 2024
@jmklix
Member

jmklix commented Dec 11, 2024

What timings are you seeing for STANDARD vs INTELLIGENT_TIERING when using the C++ SDK?

@jmklix jmklix added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. label Dec 11, 2024
@Manjunathagopi
Author

@jmklix Fetching 24MB from Intelligent-Tiering with the C++ SDK takes around 1200ms, while fetching the same 24MB from STANDARD takes around 400ms, a difference of around 800ms.
Yes, we see this with the C++ SDK.
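
For comparison with the client's throughput target, the reported timings can be converted to effective throughput. This is a minimal sketch that assumes "24MB" means 24*1024*1024 bytes (the part size used in the reporter's code) and that 1 Mbit is 10^6 bits; `throughput_mbps` is a name chosen here for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Effective throughput in megabits per second for `bytes` transferred in
// `elapsed_ms` milliseconds (1 Mbit = 1e6 bits).
double throughput_mbps(std::uint64_t bytes, double elapsed_ms) {
    const double bits = static_cast<double>(bytes) * 8.0;
    const double seconds = elapsed_ms / 1000.0;
    return bits / seconds / 1e6;
}
```

Under those assumptions, 24MB in 1200ms works out to roughly 168 Mbit/s and 24MB in 400ms to roughly 503 Mbit/s, both below a 1 Gbit/s target.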

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. label Dec 17, 2024
@Manjunathagopi
Author

Hi,
Any update on this issue?
Thanks in advance.

@sbera87
Contributor

sbera87 commented Jan 9, 2025

Could you please share the code snippet that produces the 800ms difference, including which region is configured?
I profiled GetObject on a 24MB file in US_WEST_2 and found the standard tier to be anywhere from ~37ms slower to 30ms faster, which doesn't match your observation.

@jmklix jmklix removed their assignment Jan 13, 2025
@jmklix jmklix added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. label Jan 13, 2025

Greetings! It looks like this issue hasn’t been active in longer than a week. We encourage you to check if this is still an issue in the latest release. Because it has been longer than a week since the last update on this, and in the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment or add an upvote to prevent automatic closure, or if the issue is already closed, please feel free to open a new one.

@github-actions github-actions bot added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Jan 20, 2025
@Manjunathagopi
Author

@sbera87 I have the following code snippet for configuring the S3 CRT client. Regarding the region, it is automatically set to the instance's region because we are using an IAM role for the instance. The tests I conducted were in the ap-south-1 region. Additionally, we are manually switching the storage tier between STANDARD and INTELLIGENT_TIERING using the AWS console.

    Aws::S3Crt::ClientConfiguration config;
    config.throughputTargetGbps = 1.0;
    config.partSize = 8*1024*1024;
    config.httpRequestTimeoutMs = 1000;
    config.connectTimeoutMs = 1000;
    config.requestTimeoutMs = 1000;  
    s3_crt_client_ = Aws::New<Aws::S3Crt::S3CrtClient>("test", config);

@github-actions github-actions bot removed the closing-soon This issue will automatically close in 4 days unless further comments are made. label Jan 21, 2025
@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. label Jan 21, 2025
@sbera87
Contributor

sbera87 commented Jan 21, 2025

I tried to replicate this over 40+ iterations using the following test code, and I don't see the Intelligent tier being 400ms slower than standard. Across the many runs, the Intelligent tier is still faster.

    void DownloadFile(const Aws::String& bucket_name, const Aws::String& object_key, const Aws::String& destination_file)
    {
        // Rate limiter is created but not passed to the client below;
        // ALLOCATION_TAG is not shown in this snippet.
        auto Limiter = Aws::MakeShared<Aws::Utils::RateLimits::DefaultRateLimiter<>>(ALLOCATION_TAG, 50000000);

        Aws::S3Crt::ClientConfiguration config;
        config.throughputTargetGbps = 1.0;
        config.partSize = 8*1024*1024;
        config.httpRequestTimeoutMs = 1000;
        config.connectTimeoutMs = 1000;
        config.requestTimeoutMs = 1000;

        Aws::S3Crt::S3CrtClient s3_client(config);

        // Open the destination file for writing
        std::ofstream output_file(destination_file.c_str(), std::ios::binary);

        if (!output_file) {
            std::cerr << "Failed to open destination file." << std::endl;
            return;
        }

        // Create a GetObjectRequest for the whole object (no byte range is set)
        Aws::S3Crt::Model::GetObjectRequest get_object_request;
        get_object_request.SetBucket(bucket_name);
        get_object_request.SetKey(object_key);

        // Time only the GetObject call, not the write to disk
        auto start = std::chrono::high_resolution_clock::now();
        auto get_object_outcome = s3_client.GetObject(get_object_request);
        auto stop = std::chrono::high_resolution_clock::now();
        auto duration = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);

        if (get_object_outcome.IsSuccess()) {
            // Write the response body to the destination file opened above
            auto& retrieved_file = get_object_outcome.GetResultWithOwnership().GetBody();
            output_file << retrieved_file.rdbuf();
            std::cout << "File downloaded to " << destination_file << std::endl;
        } else {
            std::cerr << "Failed to download file: " << get_object_outcome.GetError().GetMessage() << std::endl;
        }

        std::cout << "took " << duration.count() << " microseconds" << std::endl;
    }
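
Because individual runs vary by tens of milliseconds, it can help to summarize the timings collected over many iterations rather than compare single values. The following is a rough sketch of such a summary; `TimingSummary` and `summarize` are hypothetical names introduced here, and the input is assumed to be one latency sample per request, in milliseconds.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

struct TimingSummary {
    double min_ms;
    double median_ms;
    double mean_ms;
    double max_ms;
};

// Summarizes per-request latencies in milliseconds.
// `samples` must be non-empty; it is taken by value so it can be sorted.
TimingSummary summarize(std::vector<double> samples) {
    assert(!samples.empty());
    std::sort(samples.begin(), samples.end());
    const std::size_t n = samples.size();
    // Median: middle element, or the average of the two middle elements.
    const double median = (n % 2 == 1)
        ? samples[n / 2]
        : (samples[n / 2 - 1] + samples[n / 2]) / 2.0;
    const double mean =
        std::accumulate(samples.begin(), samples.end(), 0.0) / static_cast<double>(n);
    return TimingSummary{samples.front(), median, mean, samples.back()};
}
```

Comparing the median of each storage class makes a single slow outlier (for example, a first request on a cold connection) less likely to dominate the result than comparing means.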

@Manjunathagopi
Author

Hi @sbera87, thanks for the update.
I tried downloading the whole S3 object at once and observed the same results as you: Intelligent-Tiering performed slightly faster than the Standard storage class.

However, the issue arises when fetching data in parts (using SetRange) instead of downloading the entire object at once. For example, consider a 250MB object that is downloaded sequentially in 10MB chunks by specifying byte ranges such as 0-10MB, 10-20MB, and so on. In this scenario, Intelligent-Tiering is significantly slower than the Standard storage class.

Please try this approach on your end to confirm whether you observe the same behavior. I am also attaching the code snippet for this.

    Aws::S3Crt::S3CrtClient *s3_crt_client;
    Aws::S3Crt::Model::GetObjectRequest object_request;
    Aws::SDKOptions options;
    Aws::InitAPI(options);
    Aws::S3Crt::ClientConfiguration config;
    config.httpRequestTimeoutMs = 100;
    config.connectTimeoutMs = 100;
    config.requestTimeoutMs = 100;  
    config.throughputTargetGbps = 1;
    config.partSize = 8*1024*1024;
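    uint64_t read_size = 0; // total object size, taken from HeadObject below
    uint64_t part_size = 24*1024*1024;
    s3_crt_client = Aws::New<Aws::S3Crt::S3CrtClient>("test", config);

    char bucket[128] = {0}; // bucket name, to be filled in
    char key[128] = {0};    // object key, to be filled in
    Aws::S3Crt::Model::HeadObjectRequest head_object_request;
    head_object_request.SetBucket(bucket);
    head_object_request.SetKey(key);
    Aws::S3Crt::Model::HeadObjectOutcome head_outcome = s3_crt_client->HeadObject(head_object_request);
    if(head_outcome.IsSuccess())
    {
        read_size = head_outcome.GetResult().GetContentLength();
    }
    object_request.SetBucket(bucket);
    object_request.SetKey(key);
    // Loop state: current offset, bytes remaining, and per-iteration counters
    uint64_t pos = 0, to_read = 0, ret = 0, iter = 0;
    char *buf = new char[part_size];
    uint64_t read_left = read_size;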
    while(read_left != 0)
    {
        ret = 0;
        to_read = part_size <= read_left ? part_size : read_left;
        iter ++;
        uint64_t start,end;
        Aws::S3Crt::Model::GetObjectOutcome outcome;
        object_request.SetRange(std::string("bytes=") + std::to_string(pos) + "-" + std::to_string(pos+to_read-1));
        object_request.SetResponseStreamFactory(
                [buf, to_read]()
                {
                std::unique_ptr<Aws::StringStream>
                stream(Aws::New<Aws::StringStream>("test"));
                stream->rdbuf()->pubsetbuf(static_cast<char*>(buf),
                        to_read);

                return stream.release();
                });
        start = get_curr_time_ms();
        outcome = s3_crt_client->GetObject(object_request);
        end = get_curr_time_ms();
        printf("Time taken for GetObject %lums\n",end-start);
        if(outcome.IsSuccess())
        {
            ret = outcome.GetResult().GetContentLength();

            if(ret <= to_read)
            {
                pos += ret;
                //log_info("Testing data %02x %02x", ((uint8_t*)buf)[0], ((uint8_t*)buf)[1]);
            }
            else
            {
                printf("This should never happen, pos_:%lu, requested:%lu, read:%lu\n",
                        pos, to_read, ret);
                break;
            }
        }
        else
        {
            printf("Failed to read, pos_:%lu, requested:%lu, error:%s, error_type:%d, error_code:%d\n",
                    pos, to_read, outcome.GetError().GetMessage().c_str(), outcome.GetError().GetErrorType(), outcome.GetError().GetResponseCode()) ;
            break;
        }
        if(ret == 0)
        {
            printf("Nothing read, read_left:%lubytes, iter:%lu\n", read_left, iter);
            break;
        }
        else if(ret < to_read)
        {
            printf("Read %ldbytes instead of %ldbytes, iter:%lu\n", ret, to_read, iter);
        }
        else
        {
            printf("Read %ld bytes, iter:%lu\n", ret, iter);
        }
        read_left -= ret;
    }

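
The sequential byte ranges described above can be generated independently of the SDK, which makes the off-by-one in the inclusive range end easy to check. This is a minimal sketch: `make_byte_ranges` is a hypothetical helper, and each returned string is the kind of value that would be passed to `SetRange`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Builds HTTP Range header values covering [0, object_size) in chunks of at
// most chunk_size bytes. Range ends are inclusive, hence the "- 1".
std::vector<std::string> make_byte_ranges(std::uint64_t object_size, std::uint64_t chunk_size) {
    std::vector<std::string> ranges;
    if (chunk_size == 0) {
        return ranges; // avoid an infinite loop on a zero chunk size
    }
    for (std::uint64_t pos = 0; pos < object_size;) {
        // The final chunk may be shorter than chunk_size.
        const std::uint64_t len = std::min(chunk_size, object_size - pos);
        ranges.push_back("bytes=" + std::to_string(pos) + "-" + std::to_string(pos + len - 1));
        pos += len;
    }
    return ranges;
}
```

For a 250MB object (taken here as 250*1024*1024 bytes) read in 10MB chunks, this yields 25 ranges, starting with `bytes=0-10485759`.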
