Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Give AWSConfig an abstract supertype #212

Closed
wants to merge 12 commits into from

Conversation

meggart
Copy link
Contributor

@meggart meggart commented Oct 7, 2020

This is in response to the problems which came up in JuliaIO/Zarr.jl#11 and which would partially be solved in #211 . The basic idea is to type the functions of the API a bit more loosely so that users can create their own subtype of AbstractAWSConfig which can modify the behavior of creating a request in several places. For example, in case a downstream package only wants to access public S3 buckets, they can define the following:

using AWS
struct AnonymousAWS <: AbstractAWSConfig
    region::String
end
struct NoCredentials end
AWS.region(aws::AnonymousAWS) = aws.region
AWS.credentials(aws::AnonymousAWS) = NoCredentials()
AWS.check_credentials(c::NoCredentials) = c
AWS._sign!(aws::AnonymousAWS, ::AWS.Request) = nothing

This simply overloads the _sign! function by skipping the step, leaving the rest unchanged and we can access the data without credentials:

@service S3
r = S3.list_objects_v2("/zarr-demo", Dict("prefix"=>"store/foo/", "delimiter"=>"/"))

Another example, suppose a downstream lib wants to access anonymous data from GCS using this package, they do it in a similar way:

struct AnonymousGCS <:AbstractAWSConfig end
AWS.region(aws::AnonymousGCS) = ""
AWS.credentials(aws::AnonymousGCS) = NoCredentials()
AWS._sign!(aws::AnonymousGCS, ::AWS.Request) = nothing
function AWS._generate_service_url(aws::AnonymousGCS, service::String, resource::String)
    service == "s3" || throw(ArgumentError("Can only handle s3 requests to GCS"))

    return string("https://storage.googleapis.com.", resource)
end
AWS.global_aws_config(AnonymousGCS())

This modifies the way the service urls are generated and the following works:

r = S3.list_objects("cmip6", Dict("prefix"=>"ScenarioMIP/DKRZ/MPI-ESM1-2-HR/ssp370/r4i1p1f1/Amon/", "delimiter"=>"/"))

The nice thing is that this way we don't need to anticipate the exact way service urls might be constructed by a yet unknown service, or in which way they are signed, we can always overload the functions without having to modify the AWS.jl code.

I know this is quite a big changeand more intrusive than #211 would solver the problem, but for me would be a huge step in being able to use Zarr with different S3 backends.

I have not added tests for this yet, since I wanted to get initial feedback on the design first, please let me know, if I can move ahead.

@mattBrzezinski mattBrzezinski self-requested a review October 7, 2020 14:37
@mattBrzezinski
Copy link
Member

@meggart These changes look good to me! I have a few minor things with the changes (I'll add comments when this is near completion) but overall this makes sense to me.
The big thing here I think would be to have a test case for uploading / retrieving files from GCS (or some other provider). Similar to how they're done for S3;

https://github.com/JuliaCloud/AWS.jl/blob/master/test/AWS.jl#L746-L761

I can handle setting up an account, encrypting credentials, and writing those tests.

@meggart
Copy link
Contributor Author

meggart commented Oct 8, 2020

An additional test that I tried because it can be run locally is a small Minio instance. If you just launch a server with the following settings:

export MINIO_SECRET_KEY=12345678
export MINIO_REGION_NAME="jena"
export MINIO_ACCESS_KEY=fgans
minio server data

it is possible to do read and write operations using this package:

using AWS
struct MinioConfig <: AbstractAWSConfig 
endpoint::String
region::String
creds
end
AWS.region(c::MinioConfig) = c.region
AWS.credentials(c::MinioConfig) = c.creds
struct SimpleCredentials
    access_key_id::String
    secret_key::String
    token::String
end
AWS.check_credentials(c::SimpleCredentials) = c
function AWS._generate_service_url(aws::MinioConfig, service::String, resource::String)
    service == "s3" || throw(ArgumentError("Can only handle s3 requests to Minio"))
    return string(aws.endpoint, resource)
end

AWS.global_aws_config(MinioConfig("http://127.0.0.1:9000","jena",SimpleCredentials("fgans", "12345678","")))

@service S3

S3.create_bucket("anewbucket")

S3.put_object("anewbucket","myobject",Dict("body"=>"Hi from Minio"))

S3.get_object("anewbucket","myobject") |> String

This successfully prevents AWS from checking the credentials with the AWS server while still signing the message and connects to the custom endpoint. Would that qualify as a test case?

@mattBrzezinski
Copy link
Member

This successfully prevents AWS from checking the credentials with the AWS server while still signing the message and connects to the custom endpoint. Would that qualify as a test case?

Yes that works. I would want to include this for the Travis CI system we have in place. I actually created an issue for this a long time ago back in AWSCore.jl

JuliaCloud/AWSCore.jl#100

Would need to change the .travis.yml to include a before_script which downloads, and starts the Minio server. Then just include your test case above in the Julia package.

@meggart meggart changed the title RFC: Give AWSConfig an abstract supertype WIP: Give AWSConfig an abstract supertype Oct 9, 2020
@iamed2
Copy link
Member

iamed2 commented Oct 26, 2020

This is how minio-go used to test on Travis: https://github.com/minio/minio-go/blob/c5ac8248d5e1f0a70329451f0a90209e9c1d5e62/.travis.yml#L24

(they use GitHub Actions now)

@meggart
Copy link
Contributor Author

meggart commented Oct 30, 2020

Thanks for this and sorry for the long silence. Ok, then I will launch the minio server through the travis script. As an alternative I was thinking about defining an artifact for the minio executable, which would have the advantage that tests would also "just work" locally and not only on github, but this would have other problems like we having to update the artifact every time a new version of minio is released.

So I will copy the approach form here https://github.com/minio/minio-go/blob/c5ac8248d5e1f0a70329451f0a90209e9c1d5e62/.travis.yml#L24

@meggart
Copy link
Contributor Author

meggart commented Nov 2, 2020

A short question: Is there any way to trigger a travis build? It looks like this is not happening automatically

@mattBrzezinski
Copy link
Member

A short question: Is there any way to trigger a travis build? It looks like this is not happening automatically

This package like other AWS ones are protected for triggering Travis builds. This is because we at Invenia are using our credentials for testing CI, we don't want someone to maliciously commit AWS requests which might leak, or mess up the account. We use bors to protect the repo so only maintainers can trigger CI pipelines.

@mattBrzezinski
Copy link
Member

bors try

@bors
Copy link
Contributor

bors bot commented Nov 2, 2020

try

Merge conflict.

@meggart meggart changed the title WIP: Give AWSConfig an abstract supertype Give AWSConfig an abstract supertype Nov 5, 2020
@meggart
Copy link
Contributor Author

meggart commented Nov 5, 2020

Hey all,

I have now rebased my changes, some basic minio unit tests are added and I have added a section to the Readme, basically giving the short examples that I posted in this PR. I would be happy about a review and some hints how to proceed. My main problem was that I could not re-generate the high-level API files as I did in a previous version of this PR, I would need some Github credentials and I have not looked into this yet.

Any hints about how you handle this auto-generation inside PRs would be most welcome.

@meggart
Copy link
Contributor Author

meggart commented Nov 13, 2020

Bump, can one of the package maintainers give a hint how to re-generate the API functions, maybe in an automated way?

@mattBrzezinski
Copy link
Member

Bump, can one of the package maintainers give a hint how to re-generate the API functions, maybe in an automated way?

I'm so sorry I forgot to respond to your last post here! If you're on the Julia Slack and ever have a question just ping me there and I can get back to you ASAP.

To regenerate the definitions:

julia --project -e "using AWS; AWS.AWSMetadata.parse_aws_metadata()"

@meggart
Copy link
Contributor Author

meggart commented Nov 13, 2020

This is exactly what I tried but I get

ERROR: KeyError: key "GITHUB_AUTH" not found
Stacktrace:
 [1] (::Base.var"#477#478")(::String) at ./env.jl:79
 [2] access_env at ./env.jl:43 [inlined]
 [3] getindex at ./env.jl:79 [inlined]
 [4] parse_aws_metadata() at /home/fgans/julia_depots/dev/AWS/src/AWSMetadata.jl:23
 [5] top-level scope at REPL[3]:1

which worked a few weeks ago but you seem to have changed something in the way the metadata is downloaded which requires some github authorization now.

@mattBrzezinski
Copy link
Member

This is exactly what I tried but I get

ERROR: KeyError: key "GITHUB_AUTH" not found
Stacktrace:
 [1] (::Base.var"#477#478")(::String) at ./env.jl:79
 [2] access_env at ./env.jl:43 [inlined]
 [3] getindex at ./env.jl:79 [inlined]
 [4] parse_aws_metadata() at /home/fgans/julia_depots/dev/AWS/src/AWSMetadata.jl:23
 [5] top-level scope at REPL[3]:1

which worked a few weeks ago but you seem to have changed something in the way the metadata is downloaded which requires some github authorization now.

Right. The issue came from the aws-sdk-js having more than 1,000 files in it's repo and subsequently us hitting the GH API Rate limit when making unauthorized requests. I believe all you need to do is create a Personal Access Token in GitHub, I believe all you need is read repo permissions, and set it in your environment.

I added a line in my ~/.bash_profile:

export GITHUB_AUTH={token}

@meggart
Copy link
Contributor Author

meggart commented Nov 14, 2020

Thanks for the hints. I have re-generated the API files

@mattBrzezinski
Copy link
Member

Thanks for the hints. I have re-generated the API files

Logged an issue for creating a contributors guide so clarify these points #249, will get around to this in the next month or so.

Copy link
Member

@mattBrzezinski mattBrzezinski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm so sorry this has taken this long to review. I've just been busy with non-open source work for the last month :(

src/AWS.jl Outdated Show resolved Hide resolved
src/AWSConfig.jl Outdated Show resolved Hide resolved
src/AWS.jl Show resolved Hide resolved
@mattBrzezinski
Copy link
Member

@meggart just the one comment about the Ref type, and regeneration of API definitions and I think this is good to go!

@meggart
Copy link
Contributor Author

meggart commented Dec 9, 2020

@mattBrzezinski I regenerated the API definitions and was about to rebase when I realized that .travis.yml does not exist on master anymore, so the minio tests will probably not work. Shall I remove the tests for now or is there a way to launch the server using bors?

@mattBrzezinski
Copy link
Member

@mattBrzezinski I regenerated the API definitions and was about to rebase when I realized that .travis.yml does not exist on master anymore, so the minio tests will probably not work. Shall I remove the tests for now or is there a way to launch the server using bors?

Not sure if you have been following along with the whole Travis CI debacle, to make a long story short Travis is now using a credit based system even for open source projects. If you run out of credits (like the JuliaCloud organization) has quite quickly, you need to request more. TravisCI is not providing more credits at this time, so effectively no more CI through them.

I've since ported away from Travis onto GitHub Actions. The CI file for this can be found in: .github/workflows/CI.yml. It would be nice to add in the Min.IO feature, I believe it's possible to do so w/ GHA. Although I have not looked in-depth to this, a starting point could be here.

@mattBrzezinski
Copy link
Member

bors try

bors bot added a commit that referenced this pull request Dec 9, 2020
@bors
Copy link
Contributor

bors bot commented Dec 9, 2020

try

Build failed:

@mattBrzezinski
Copy link
Member

bors try

bors bot added a commit that referenced this pull request Dec 9, 2020
@bors
Copy link
Contributor

bors bot commented Dec 9, 2020

try

Build failed:

@mattBrzezinski
Copy link
Member

Looks like there's an issue connecting to the Min.IO server on the CI.

Also, feel free to just delete the generate high-level and generate low-level tests. Honestly I don't think they really provide anything useful other than bumping the non-existant code coverage.

@meggart
Copy link
Contributor Author

meggart commented Dec 9, 2020

I think not all test failures are related to Minio.

I looks like some tests on _generate_service_url fail because this PR changed the signature.
This is actually something that was bothering me as well. The function starts with an underscore, so should be more or less private. However, in all the examples above we add new methods to this function, which is not good style if it is deemed "private".

Would you mind if I added a new function generate_service_url(::AbstractAWSConfig, String, String) (without underscore) which by default calls the internal _generate_service_url? So users add methods to the former and tests will still pass on the latter.

@mattBrzezinski
Copy link
Member

I think not all test failures are related to Minio.

I looks like some tests on _generate_service_url fail because this PR changed the signature.
This is actually something that was bothering me as well. The function starts with an underscore, so should be more or less private. However, in all the examples above we add new methods to this function, which is not good style if it is deemed "private".

Would you mind if I added a new function generate_service_url(::AbstractAWSConfig, String, String) (without underscore) which by default calls the internal _generate_service_url? So users add methods to the former and tests will still pass on the latter.

You should be able to just do the following in these tests, link to tests

region = "us-east-2"
config = AWSConfig()
config.region = region

@testset "regionless endpoints" for regionless_endpoint in ("iam", "route53")
  # blah blah blah
  result = AWS._generate_service_url(config, request.service, request.resource)
...

@meggart
Copy link
Contributor Author

meggart commented Dec 9, 2020

Yes, that's simpler. I have tried as you suggested.

@ExpandingMan
Copy link
Contributor

I just wanted to comment that because AWSS3 now depends on AWS, the inability to specify a custom URL now has me blocked from updating on AWSS3, this will be true for anyone using min.io, even for tests. So it would be really nice to merge either this or #235 as a temporary measure.

Also, I really don't think there should be an interface that has other packages adding methods to _ methods such as _sign! or _generate_service_url. The convention in which those functions are considered "private" and this is true for a number of programming languages including Julia, so their inclusion here is quite confusing. Could we eliminate the preceding _ from the names?

@mattBrzezinski
Copy link
Member

bors try

bors bot added a commit that referenced this pull request Dec 16, 2020
@mattBrzezinski
Copy link
Member

I just wanted to comment that because AWSS3 now depends on AWS, the inability to specify a custom URL now has me blocked from updating on AWSS3, this will be true for anyone using min.io, even for tests. So it would be really nice to merge either this or #235 as a temporary measure.

The change to AWSS3 doesn't really affect functionality, it's just changing the backend of the package. I'm going to spend some of my free time getting this PRs functionality in, hopefully before holidays start off.

Also, I really don't think there should be an interface that has other packages adding methods to _ methods such as _sign! or _generate_service_url. The convention in which those functions are considered "private" and this is true for a number of programming languages including Julia, so their inclusion here is quite confusing. Could we eliminate the preceding _ from the names?

Agreed. I think if we're opening up AWSConfig, we'll need to be explicit with what it does under the hood and having that functionality exported.

@bors
Copy link
Contributor

bors bot commented Dec 16, 2020

try

Build failed:

@ExpandingMan
Copy link
Contributor

The change to AWSS3 doesn't really affect functionality, it's just changing the backend of the package

Not quite, the use of AWSConfig makes it harder to hack it so that it uses a custom URL endpoint.

@mattBrzezinski
Copy link
Member

The change to AWSS3 doesn't really affect functionality, it's just changing the backend of the package

Not quite, the use of AWSConfig makes it harder to hack it so that it uses a custom URL endpoint.

Yeah I suppose it's not really possible since in AWSCore it's a SymDict while here it's a struct.

I'm at the tail end of wrapping up a project for non-OSS work. But I'll make it a priority when I get an e-mail through about this thread to follow up on what's happening.

@mattBrzezinski
Copy link
Member

Coming back around to this, I think that @ExpandingMan has some good points around,

Also, I really don't think there should be an interface that has other packages adding methods to _ methods such as _sign! or _generate_service_url. The convention in which those functions are considered "private" and this is true for a number of programming languages including Julia, so their inclusion here is quite confusing. Could we eliminate the preceding _ from the names?

If you're able to make these changes @meggart that'd be great. If you're busy, I can double check this merge it in, and make the changes in a separate PR before tagging a new version.

@meggart
Copy link
Contributor Author

meggart commented Jan 19, 2021

Thanks for checking back. I got hit by another lockdown here in Germany with three kids at home, so my time is very limited right now and doesn't look like getting better soon. I would be happy if you could merge and then add the remaining points later, otherwise we would have to wait until schools open again and I hopefully find some time here again.

@mattBrzezinski
Copy link
Member

Thanks for checking back. I got hit by another lockdown here in Germany with three kids at home, so my time is very limited right now and doesn't look like getting better soon. I would be happy if you could merge and then add the remaining points later, otherwise we would have to wait until schools open again and I hopefully find some time here again.

No worries! Hopefully you're able to stay safe and healthy. Would you be able to re-generate the definitions once more and I will merge this in? I can tackle the remaining tasks in a new PR at a future time.

@mattBrzezinski
Copy link
Member

Unfortunately I don't have permissions to push to your fork. :( I have taken this PR over in this one,

#274

Closing this one to take the other over the line. Thanks for all the initial work here @meggart ❤️

bors bot added a commit that referenced this pull request Jan 28, 2021
274: AbstractConfig 2, (take over from 212) r=mattBrzezinski a=mattBrzezinski

@meggart is quite busy currently with personal stuff so I have decided to take over their original PR #212. I do not have permissions to write to their fork of `AWS.jl` so instead I'm opening this merge request here to take it across the file steps.

I reviewed the original PR and it LGTM, only changes between this one and #212 are:
- Rebased and re-generated service definitions off of `origin/master`
- Locked the MinIO testing behind an ENV VAR `TEST_MINIO`, as these will fail locally unless a server is running
- @ExpandingMan suggestion of `exporting` and removal of underscores in functions dealing with `AbstractAWSConfig` such as `sign!()`, etc.
- CI changed Julia 1.3 -> 1.5 testing
- Version bump to `v1.25.0`

Co-authored-by: Fabian Gans <[email protected]>
Co-authored-by: Matt Brzezinski <[email protected]>
Co-authored-by: mattBrzezinski <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants