feat: add BoundServiceAccountToken trigger authentication type #6272

Open

maxcao13 wants to merge 7 commits into main from bound-serviceacctoken-trigauth
Conversation

maxcao13 (Contributor) commented on Oct 24, 2024

Provide a description of what has been changed
Proposes to add a new [Cluster]TriggerAuthentication type called BoundServiceAccountToken which allows users to bind a ServiceAccount token to a [Cluster]TriggerAuthentication object. You can specify it like so:

apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-trigger-auth-prometheus
  namespace: openshift-ingress-operator
spec:
  boundServiceAccountToken:
  - parameter: bearerToken
    serviceAccountName: thanos 
    expiry: 15m

You could already inject Kubernetes service account tokens into trigger auth refs by using the Secret trigger auth type, but that requires manually embedding the token in a long-lived secret. With this change you can reference the service account directly: the keda-operator embeds the service account token in an annotation on the [Cluster]TriggerAuthentication object and auto-rotates it once at least 50% of its expiry has elapsed. You specify which trigger parameter to populate with parameter, the serviceAccountName (which must be in the same namespace as the TriggerAuthentication CR), and the expiry as a duration. Note that Kubernetes doesn't allow token expiries shorter than 10m. If you use a ClusterTriggerAuthentication, this works the same way as the Secret trigger auth type: the service account has to be in the KEDA_CLUSTER_OBJECT_NAMESPACE namespace.
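
For context, here is a minimal, illustrative client-go sketch of how such a short-lived token can be requested from the Kubernetes TokenRequest API. This is not the code in this PR; the package and helper names are placeholders.

// Illustrative sketch only (not part of this PR): request a short-lived
// ServiceAccount token through the Kubernetes TokenRequest API.
package sketch

import (
    "context"
    "fmt"

    authenticationv1 "k8s.io/api/authentication/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// requestToken is a hypothetical helper; the PR's resolver does something similar internally.
func requestToken(ctx context.Context, cs kubernetes.Interface, namespace, serviceAccountName string, expirySeconds int64) (string, error) {
    tr := &authenticationv1.TokenRequest{
        Spec: authenticationv1.TokenRequestSpec{
            // Kubernetes rejects expirations shorter than 10 minutes (600 seconds).
            ExpirationSeconds: &expirySeconds,
        },
    }
    resp, err := cs.CoreV1().ServiceAccounts(namespace).CreateToken(ctx, serviceAccountName, tr, metav1.CreateOptions{})
    if err != nil {
        return "", fmt.Errorf("requesting token for %s/%s: %w", namespace, serviceAccountName, err)
    }
    return resp.Status.Token, nil
}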

Before I write any tests, I'd like to get feedback first! :)

Checklist

Fixes: #6136

maxcao13 requested a review from a team as a code owner on October 24, 2024.
maxcao13 force-pushed the bound-serviceacctoken-trigauth branch from 538290e to ea01caa on October 24, 2024.
@@ -58,6 +58,7 @@ import (
// +kubebuilder:rbac:groups=autoscaling,resources=horizontalpodautoscalers,verbs="*"
// +kubebuilder:rbac:groups="",resources=configmaps;configmaps/status,verbs=get;list;watch
// +kubebuilder:rbac:groups="",resources=events,verbs="*"
// +kubebuilder:rbac:groups="",resources=serviceaccounts/token,verbs=create;get
Member
Should we include this as the default, or disable it by default and explain how to enable it, @kedacore/keda-core-contributors? Having this as the default allows KEDA's service account to request tokens on behalf of any other service account in the cluster, which could mean a privilege escalation.

Users should define resourceNames if they want to reduce the exposure of this RBAC rule -> https://kubernetes.io/docs/reference/access-authn-authz/rbac/#referring-to-resources
We can (and we must) handle this option in Helm as we do with the * for the custom resources, but I'm not sure about the best approach here.

Member
Yeah, I agree, this is not ideal.

Contributor Author
Maybe we can restrict it via an environment variable? I know that for secrets there is the KEDA_CLUSTER_OBJECT_NAMESPACE variable, but I'm not exactly sure how it works. Is there existing code that would allow us to use something like WATCH_NAMESPACE as a comma-separated list of namespaces that KEDA is allowed to create/get service account tokens for?

Member
I think we can just limit this using RBAC, and KEDA's code doesn't need to know about it. We added the flag for secrets because we fetch secrets across the whole cluster and the lister/informer builds a cluster-scoped cache. In this case, since TokenRequest isn't a cached API, we don't need to tell KEDA about limited access; the request will simply fail if RBAC blocks it.
The RBAC protection can be achieved by just removing the default permission and documenting how to grant it for a given service account (using resourceNames) or globally. Since the end users are the ones deploying that RBAC, the decision will be their own.
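
For illustration, the kind of namespace-scoped restriction being described could look roughly like this, expressed with the rbacv1 Go types (the Role name, namespace, and service account name are placeholders; this is not a manifest shipped by the PR or the Helm chart):

// Hypothetical example of the RBAC restriction discussed above: the
// serviceaccounts/token permission is limited to one named service account
// via resourceNames instead of being granted for every service account.
package sketch

import (
    rbacv1 "k8s.io/api/rbac/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

var kedaTokenRequestRole = &rbacv1.Role{
    ObjectMeta: metav1.ObjectMeta{
        Name:      "keda-token-request",         // placeholder name
        Namespace: "openshift-ingress-operator", // namespace of the TriggerAuthentication
    },
    Rules: []rbacv1.PolicyRule{{
        APIGroups:     []string{""},
        Resources:     []string{"serviceaccounts/token"},
        ResourceNames: []string{"thanos"}, // only this service account's tokens
        Verbs:         []string{"create"},
    }},
}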

Contributor Author
Cool, makes sense to me, thanks!

Comment on lines 645 to 697
func resolveBoundServiceAccountToken(ctx context.Context, client client.Client, logger logr.Logger, namespace string, bsat *kedav1alpha1.BoundServiceAccountToken, triggerAuthRef *kedav1alpha1.AuthenticationRef, acs *authentication.AuthClientSet) string {
serviceAccountName, expiry := bsat.ServiceAccountName, bsat.Expiry
if serviceAccountName == "" {
logger.Error(fmt.Errorf("error trying to get token"), "serviceAccountName is required")
return ""
}
var err error
expirySeconds := ptr.Int64(3600)
if expiry != "" {
duration, err := time.ParseDuration(expiry)
if err != nil {
logger.Error(err, "error trying to parse expiry duration", "expiry", expiry)
return ""
}
// convert duration to seconds
expirySeconds = ptr.Int64(int64(duration.Seconds()))
}

triggerAuth, _, err := getTriggerAuth(ctx, client, triggerAuthRef, namespace)
if err != nil {
logger.Error(err, "error trying to get [cluster]triggerAuth", "TriggerAuth.Namespace", namespace, "TriggerAuth.Name", triggerAuthRef.Name)
return ""
}
currentAnnotations := triggerAuth.GetAnnotations()
encodedToken := currentAnnotations["keda-serviceAccountToken"]

// check if service account exists in the namespace
serviceAccount := &corev1.ServiceAccount{}
err = client.Get(ctx, types.NamespacedName{Name: serviceAccountName, Namespace: namespace}, serviceAccount)
if err != nil {
logger.Error(err, "error trying to get service account from namespace", "ServiceAccount.Namespace", namespace, "ServiceAccount.Name", serviceAccountName)
return ""
}

// check if token is already referenced in the TriggerAuthentication annotation
if encodedToken != "" {
tokenValid := checkTokenValidity(ctx, logger, encodedToken, expirySeconds, acs)
switch tokenValid {
case tokenStatusInvalid:
// token is invalid, or if more than 50% of the token's expiry has passed, create new token
return generateAndAnnotateNewToken(ctx, client, logger, serviceAccountName, namespace, expirySeconds, triggerAuth, acs)
case tokenStatusValid:
return encodedToken
default:
// tokenStatusUnknown
return ""
}
} else {
// token doesn't exist; create new token and embed it in the TriggerAuth
logger.Info("Token doesn't exist; creating new token and embedding it in the triggerauth", "ServiceAccount.Namespace", namespace, "ServiceAccount.Name", serviceAccountName)
return generateAndAnnotateNewToken(ctx, client, logger, serviceAccountName, namespace, expirySeconds, triggerAuth, acs)
}
}
Member
Currently, KEDA only re-executes the resolver when a scaler is created. That means the token's expiration won't be checked until it has already expired, triggering the error in the scaler and re-evaluating this code.
About storing the token in the TriggerAuthentication: I don't like this approach too much, because the token can be exposed just by accessing the TriggerAuthentication CR, which isn't treated as a sensitive API.
Considering these two things together, I'd not store the token anywhere and just request a fresh token each time, keeping it in the memory of the scaler. That solves the storage issue and removes the verification code (which, in the common scenario, wouldn't be executed until the token had already expired).
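
To make that suggestion concrete, a rough sketch of the in-memory approach might look like the following (it reuses the hypothetical requestToken helper and imports from the earlier sketch and is not KEDA's actual resolver code):

// Rough illustration of the approach suggested above: request a fresh token
// whenever the scaler is (re)built and keep it only in process memory,
// never in the [Cluster]TriggerAuthentication object.
type scalerAuth struct {
    bearerToken string // lives only in the scaler's memory
}

func buildScalerAuth(ctx context.Context, cs kubernetes.Interface, namespace, serviceAccountName string, expirySeconds int64) (*scalerAuth, error) {
    token, err := requestToken(ctx, cs, namespace, serviceAccountName, expirySeconds)
    if err != nil {
        return nil, err
    }
    return &scalerAuth{bearerToken: token}, nil
}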

maxcao13 (Contributor Author) commented on Nov 4, 2024
Okay, makes sense. Should users still be able to set an expiry, then? I would imagine a query failing every expiry period might be annoying to some users for availability reasons, but it might help that it fails at predictable intervals? Plus, it will mean an extra TokenRequest API call every interval.

Or maybe it would be better to register scheduled goroutines with a Timer to auto-rotate the token every half expiry for each auth ref?

Member
Can we try to handle the expired token? Then we can try to refresh the token and fail -> error if we are not able to do that.

JorTurFer (Member) commented on Nov 5, 2024
> I would imagine a failing query every expiry period might be annoying to some users for availability reasons

Actually, the query won't fail, because it is silently retried once after the first failure, refreshing the cache and forcing a fresh token from the TokenRequest API.
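
Purely as an illustration of that retry pattern (not KEDA's actual cache or retry code; it builds on the hypothetical types from the earlier sketches), it amounts to something like:

// Hypothetical sketch of "silently retry once after the first failure,
// refreshing the token": doQuery stands in for whatever upstream query the
// scaler performs with the bearer token.
func queryWithTokenRetry(ctx context.Context, cs kubernetes.Interface, auth *scalerAuth, namespace, serviceAccountName string, expirySeconds int64, doQuery func(token string) error) error {
    if err := doQuery(auth.bearerToken); err == nil {
        return nil
    }
    // The first attempt failed (for example because the token expired):
    // request a fresh token and retry exactly once.
    freshToken, err := requestToken(ctx, cs, namespace, serviceAccountName, expirySeconds)
    if err != nil {
        return err
    }
    auth.bearerToken = freshToken
    return doQuery(auth.bearerToken)
}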

maxcao13 (Contributor Author) commented on Nov 5, 2024
Thanks for the feedback everyone! Does my current implementation in 6c9e605 make sense then?

maxcao13 force-pushed the bound-serviceacctoken-trigauth branch 3 times, most recently from d1785b0 to 44d840c, on November 5, 2024.
maxcao13 force-pushed the bound-serviceacctoken-trigauth branch from 67740fd to 48ccb5f on November 5, 2024.
Signed-off-by: Max Cao <[email protected]>