-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
added ListObjectsV2 call #64
base: master
Are you sure you want to change the base?
Conversation
CC @pszufe |
ListObjects is limited to 1,000 objects.
I think this makes the most sense. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just did a first pass on this CR. I'm in favour of just marking list_objects
as deprecated and getting people to use v2.
Also, could you bump the version number in the Project.toml
, and add some testing for this?
if delimiter != "" | ||
q["delimiter"] = delimiter | ||
end | ||
if contoken ≠ "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency we should probably keep using !=
instead of ≠
# Add each object from the response and update our object count / marker | ||
if haskey(r, "Contents") | ||
l = isa(r["Contents"], Vector) ? r["Contents"] : [r["Contents"]] | ||
for object in l |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of iterating over these objects, could we not just put!
the vector into the channel? (Not sure if this is possible).
And then just do num_objects += size(l)[1]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't the xml_dict() function still need to be applied to each element in that vector before put!
-ing the vector to the channel?
In production systems an S3 bucket can contain millions of items and S3 has no directory structure (this what you see as directories are in fact name prefixes so there is not such thing as directory index). I just noticed that you have a |
Note that there is (and has been) an option for limiting the number of items returned, so I'd assume it would work fine as is. Is there a consensus on deprecating the older If so I will eliminate the old function and try to make sure we don't lose any functionality in the new one (might have to add a couple of keys, I will check). |
The AWS S# REST API docs reads (https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html):
So it seems to be reasonable to depreciate the old version. |
I am in favour of adding the |
Any concensus about what should be done here so I can update the PR? Are we ok with deprecating the old version? |
The only reason I can think of for keeping both is that the pagination arguments are different between calls. ListObjects uses |
In the V1 ListObjects API, the "marker" parameter causes the API to return objects that follow the marker lexicographically: https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjects.html#API_ListObjects_RequestSyntax In the ListObjectsV2 API, the "start-after" parameter causes the API to return objects that follow the start-after parameter lexicographically: My feeling is that exposing the "start-after" parameter in the s3_list_objects_v2 function would allow anyone relying on the old "marker" parameter to do what they need to do. Thus you could deprecate the old ListObjects / s3_list_objects. Do others agree? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By adding the start-after
query parameter, this should allow people to achieve with the ListObjectsV2 API what they were able to do with the deprecated ListObjects API.
|
||
""" | ||
s3_list_objects_v2([::AWSConfig], bucket, [path_prefix]; delimiter="/", max_items=1000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s3_list_objects_v2([::AWSConfig], bucket, [path_prefix]; delimiter="/", max_items=1000) | |
s3_list_objects_v2([::AWSConfig], bucket, [path_prefix]; delimiter="/", start_after="", max_items=1000) |
|
||
This uses the `ListObjectV2` function call. | ||
""" | ||
function s3_list_objects_v2(aws::AWSS3.AWSConfig, bucket, path_prefix=""; delimiter="/", max_items=nothing) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
function s3_list_objects_v2(aws::AWSS3.AWSConfig, bucket, path_prefix=""; delimiter="/", max_items=nothing) | |
function s3_list_objects_v2(aws::AWSS3.AWSConfig, bucket, path_prefix=""; delimiter="/", start_after="", max_items=nothing) |
end | ||
if contoken ≠ "" | ||
q["continuation-token"] = contoken | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
end | |
end | |
if start_after != "" | |
q["start-after"] = start_after | |
end |
if delimiter != "" | ||
q["delimiter"] = delimiter | ||
end | ||
if contoken ≠ "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if contoken ≠ "" | |
if contoken != "" |
Just bumping here to see if there's anything more I can do to help move this along. |
If it is useful, I have collected I think all of the feedback, plus tests into the master branch of my fork: https://github.com/pschmied/AWSS3.jl |
I'm currently quite busy with other work, but at some point I will have time to dedicate to this. If someone else could take the lead on reviewing / giving feedback that'd be greatly appreciated! |
This adds the ListObjectsV2 call. As far as I can tell, this is the only way to actually list all objects in a bucket. No matter how I try, the older version never lists more than 1000 objects for me.
Right now the way I have this written gives lots of code duplication. I left it this way for now because merging
s3_list_objects
ands3_list_objects_v2
was a little bit messy. As I see it we have 3 options:s3_list_objects
in favor of the new version. I can't see any reason not to do this as the older version seems like a subset of the new one.