-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[improvement] add cloud-init boostrap via cluster object store #586
Conversation
8473efd
to
6201a39
Compare
0e849d4
to
753c02c
Compare
if mscope == nil { | ||
return "", errors.New("machine scope can't be nil") | ||
} | ||
|
||
if mscope.S3Client == nil { | ||
return "", errors.New("nil S3 client in machine scope") | ||
} | ||
|
||
if mscope.LinodeCluster.Spec.ObjectStore == nil { | ||
return "", errors.New("nil cluster object store") | ||
} | ||
|
||
if len(data) == 0 { | ||
return "", errors.New("got empty data") | ||
} | ||
|
||
bucket, err := mscope.GetBucketName(ctx) | ||
if err != nil || bucket == "" { | ||
return "", errors.New("no bucket name") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we define these errors in a var block up top similar to internal/controller/linodemachine_controller_helpers.go? Might be useful for tests to have these defined. Also can then be reused in DeleteObject below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might also make sense to create a helper function to check all the required variables are set and that we can get the bucket since there seems to be some overlap in the DeleteObject function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
define these errors in a var block
QQ: Why do we want(?) this level of insight into the implementation details in our unit tests?
create a helper function to check all the required variables are set
Sounds good! 👮
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to see the test errored for the right reason rather than that it just threw an error in general.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't want to litter the package with one-off error globals here, so I just matched the error string in the tests 👼
// In the case that the IAM policy does not have sufficient | ||
// permissions to get the object, we will attempt to delete it | ||
// anyway for backwards compatibility reasons. | ||
case errors.As(err, &ae) && ae.ErrorCode() == "Forbidden": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised this is a string instead of an int. Is there at least some kind of const for this from smithy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the github.com/aws/smithy-go.APIError
type here is just a super generic error container. The actual error formats and any values are unique to the service themselves, e.g. Amazon S3.
// Clean up after instance creation. | ||
if linodeInstance.Status == linodego.InstanceRunning && machineScope.Machine.Status.Phase == "Running" { | ||
if err := deleteBootstrapData(ctx, machineScope); err != nil { | ||
logger.Error(err, "Fail to bootstrap data") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we might want to have deleteBootstrapData function log the deletion error. We might also not want to log it as an error if we're not doing anything with that error like requeuing to retry the deletion (though we might want to do that in the case we get some kind of error like rate-limiting?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want to log it in the caller instead of the deleteBootstrapData
function so the caller can determine what to do if the deleteBootstrap
data function errors out.
Like, for now, I'm just logging it as an error in the caller(s), but not failing the reconciliation itself since I don't think object storage issues here should block reconciliation.
if err != nil { | ||
logger.Info(fmt.Sprintf("Failed to compress bootstrap data: %v", err)) | ||
logger.Info("Falling back to Object Storage") | ||
goto obj |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we instead have a helper function called something like storeMetadataInObj
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I can pull out the logic after this entire switch into it's own helper function to make it easier to read.
However, are you asking to remove the goto
statement here? I think we still should keep this since the aim of this switch
(including the goto
statements) is to bail out early when we know the delivery "method" to use for the bootstrap data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I meant doing something like this:
case machineScope.LinodeMachine.Status.CloudinitMetadataSupport && gzipCompressionEnabled:
compressed, err = compressUserData(bootstrapdata)
// Fallback to Object Storage workaround on compression failure.
if err != nil {
logger.Info(fmt.Sprintf("Failed to compress bootstrap data: %v", err))
logger.Info("Falling back to Object Storage")
return storeMetadataInObj(ctx, machineScope, logger, bootstrapdata)
}
size = len(compressed)
if size < limit {
return compressed, nil
}
return storeMetadataInObj(ctx, machineScope, logger, bootstrapdata)
// Worst case: Object Storage workaround.
default:
logger.Info("decoded bootstrap data exceeds size limit", "limit", limit, "size", size)
return storeMetadataInObj(ctx, machineScope, logger, bootstrapdata)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reworked the logic inside the switch statement to:
- Reduce repetition between all the different path while preserving the overall "layered" visual structure
- Make the (now single)
goto
statement function more like abreak
out of the nested switch logic and the continue executing the rest of the function
753c02c
to
4107799
Compare
28936db
to
bf02585
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
a5ddd3c
to
46acf03
Compare
46acf03
to
19ed8fa
Compare
Adds .BucketName and .S3Endpoint as templateable values to the generated Secret format of the LinodeObjectStorageKey resource.
19ed8fa
to
f591736
Compare
f591736
to
4e9378f
Compare
What this PR does / why we need it:
This change introduces an optional Cluster Object Store in the
LinodeCluster
resource definition. The Cluster Object Store definition references an object storage bucket used for internal cluster operations, e.g. bootstrapping.Currently, for all Cluster API bootstrap providers (e.g. Cluster API Provider RKE2) , CAPL uses cloud-init as the default configuration engine to bootstrap workload cluster machines. Cloud-init bootstrap configurations (cloud-configs) are deployed to the cluster's Linodes via the Metadata service as user data. Unfortunately, the Metadata service currently the limits amount of data the can be transmitted to a Linode. The restrictions are:
This limitation restricts the adoption of CAPL, as users may need to include custom configuration to in the bootstrap configuration of their Linodes, which can result in a cloud-config that exceeds this 16kB Metadata service limit. To bypass this limitation, this submission introduces the following Linode creation workflow changes:
LinodeMachine
creation, check the Cluster API bootstrap Secret to determine if the generated bootstrap cloud-config fits within the Metadata or Stackscript service limits.Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #
Special notes for your reviewer:
Verification:
Machine
(s) andLindoeMachine
(s) are able to successfully provision:TODOs: