feat: add BoundServiceAccountToken trigger authentication type #6272

Open

maxcao13 wants to merge 7 commits into main from bound-serviceacctoken-trigauth
Conversation

maxcao13 (Contributor) commented on Oct 24, 2024

Provide a description of what has been changed
Proposes to add a new [Cluster]TriggerAuthentication type called BoundServiceAccountToken which allows users to bind a ServiceAccount token to a [Cluster]TriggerAuthentication object. You can specify it like so:

apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-trigger-auth-prometheus
  namespace: openshift-ingress-operator
spec:
  boundServiceAccountToken:
  - parameter: bearerToken
    serviceAccountName: thanos 
    expiry: 15m

You could already inject Kubernetes service account tokens into trigger auth refs by using the Secret trigger auth type, but that requires manually embedding the token in a long-lived secret. With this change you can reference the service account directly: the keda-operator embeds the service account token in an annotation on the [Cluster]TriggerAuthentication object and auto-rotates it once at least 50% of its expiry has elapsed. You specify which trigger parameter to populate with parameter, the serviceAccountName (which must be in the same namespace as the TriggerAuthentication CR), and the expiry as a duration. Note that Kubernetes doesn't allow token expiries shorter than 10m. If you use a ClusterTriggerAuthentication, this works the same way as the Secret trigger auth type: the service account has to be in the KEDA_CLUSTER_OBJECT_NAMESPACE namespace.
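
For context, here is a minimal, illustrative client-go sketch of how such a short-lived token can be requested from the Kubernetes TokenRequest API. This is not the code in this PR; the package and helper names are placeholders.

// Illustrative sketch only (not part of this PR): request a short-lived
// ServiceAccount token through the Kubernetes TokenRequest API.
package sketch

import (
    "context"
    "fmt"

    authenticationv1 "k8s.io/api/authentication/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// requestToken is a hypothetical helper; the PR's resolver does something similar internally.
func requestToken(ctx context.Context, cs kubernetes.Interface, namespace, serviceAccountName string, expirySeconds int64) (string, error) {
    tr := &authenticationv1.TokenRequest{
        Spec: authenticationv1.TokenRequestSpec{
            // Kubernetes rejects expirations shorter than 10 minutes (600 seconds).
            ExpirationSeconds: &expirySeconds,
        },
    }
    resp, err := cs.CoreV1().ServiceAccounts(namespace).CreateToken(ctx, serviceAccountName, tr, metav1.CreateOptions{})
    if err != nil {
        return "", fmt.Errorf("requesting token for %s/%s: %w", namespace, serviceAccountName, err)
    }
    return resp.Status.Token, nil
}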

Before I write any tests, I'd like to get feedback first! :)

Checklist

Fixes: #6136

maxcao13 requested a review from a team as a code owner on October 24, 2024.
maxcao13 force-pushed the bound-serviceacctoken-trigauth branch from 538290e to ea01caa on October 24, 2024.
@@ -58,6 +58,7 @@ import (
// +kubebuilder:rbac:groups=autoscaling,resources=horizontalpodautoscalers,verbs="*"
// +kubebuilder:rbac:groups="",resources=configmaps;configmaps/status,verbs=get;list;watch
// +kubebuilder:rbac:groups="",resources=events,verbs="*"
// +kubebuilder:rbac:groups="",resources=serviceaccounts/token,verbs=create;get
Member
Should we include this as the default, or disable it by default and explain how to enable it, @kedacore/keda-core-contributors? Having this as the default allows KEDA's service account to request tokens on behalf of any other service account in the cluster, which could mean a privilege escalation.

Users should define resourceNames if they want to reduce the exposure of this RBAC rule -> https://kubernetes.io/docs/reference/access-authn-authz/rbac/#referring-to-resources
We can (and we must) handle this option in Helm as we do with the * for the custom resources, but I'm not sure about the best approach here.

Member
Yeah, I agree, this is not ideal.

Contributor Author
Maybe we can restrict it via an environment variable? I know that for secrets there is the KEDA_CLUSTER_OBJECT_NAMESPACE variable, but I'm not exactly sure how it works. Is there existing code that would allow us to use something like WATCH_NAMESPACE as a comma-separated list of namespaces that KEDA is allowed to create/get service account tokens for?

Member
I think we can just limit this using RBAC, and KEDA's code doesn't need to know about it. We added the flag for secrets because we fetch secrets across the whole cluster and the lister/informer builds a cluster-scoped cache. In this case, since TokenRequest isn't a cached API, we don't need to tell KEDA about limited access; the request will simply fail if RBAC blocks it.
The RBAC protection can be achieved by just removing the default permission and documenting how to grant it for a given service account (using resourceNames) or globally. Since the end users are the ones deploying that RBAC, the decision will be their own.
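
For illustration, the kind of namespace-scoped restriction being described could look roughly like this, expressed with the rbacv1 Go types (the Role name, namespace, and service account name are placeholders; this is not a manifest shipped by the PR or the Helm chart):

// Hypothetical example of the RBAC restriction discussed above: the
// serviceaccounts/token permission is limited to one named service account
// via resourceNames instead of being granted for every service account.
package sketch

import (
    rbacv1 "k8s.io/api/rbac/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

var kedaTokenRequestRole = &rbacv1.Role{
    ObjectMeta: metav1.ObjectMeta{
        Name:      "keda-token-request",         // placeholder name
        Namespace: "openshift-ingress-operator", // namespace of the TriggerAuthentication
    },
    Rules: []rbacv1.PolicyRule{{
        APIGroups:     []string{""},
        Resources:     []string{"serviceaccounts/token"},
        ResourceNames: []string{"thanos"}, // only this service account's tokens
        Verbs:         []string{"create"},
    }},
}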

Contributor Author
Cool, makes sense to me, thanks!

Comment on lines 645 to 697
func resolveBoundServiceAccountToken(ctx context.Context, client client.Client, logger logr.Logger, namespace string, bsat *kedav1alpha1.BoundServiceAccountToken, triggerAuthRef *kedav1alpha1.AuthenticationRef, acs *authentication.AuthClientSet) string {
serviceAccountName, expiry := bsat.ServiceAccountName, bsat.Expiry
if serviceAccountName == "" {
logger.Error(fmt.Errorf("error trying to get token"), "serviceAccountName is required")
return ""
}
var err error
expirySeconds := ptr.Int64(3600)
if expiry != "" {
duration, err := time.ParseDuration(expiry)
if err != nil {
logger.Error(err, "error trying to parse expiry duration", "expiry", expiry)
return ""
}
// convert duration to seconds
expirySeconds = ptr.Int64(int64(duration.Seconds()))
}

triggerAuth, _, err := getTriggerAuth(ctx, client, triggerAuthRef, namespace)
if err != nil {
logger.Error(err, "error trying to get [cluster]triggerAuth", "TriggerAuth.Namespace", namespace, "TriggerAuth.Name", triggerAuthRef.Name)
return ""
}
currentAnnotations := triggerAuth.GetAnnotations()
encodedToken := currentAnnotations["keda-serviceAccountToken"]

// check if service account exists in the namespace
serviceAccount := &corev1.ServiceAccount{}
err = client.Get(ctx, types.NamespacedName{Name: serviceAccountName, Namespace: namespace}, serviceAccount)
if err != nil {
logger.Error(err, "error trying to get service account from namespace", "ServiceAccount.Namespace", namespace, "ServiceAccount.Name", serviceAccountName)
return ""
}

// check if token is already referenced in the TriggerAuthentication annotation
if encodedToken != "" {
tokenValid := checkTokenValidity(ctx, logger, encodedToken, expirySeconds, acs)
switch tokenValid {
case tokenStatusInvalid:
// token is invalid, or if more than 50% of the token's expiry has passed, create new token
return generateAndAnnotateNewToken(ctx, client, logger, serviceAccountName, namespace, expirySeconds, triggerAuth, acs)
case tokenStatusValid:
return encodedToken
default:
// tokenStatusUnknown
return ""
}
} else {
// token doesn't exist; create new token and embed it in the TriggerAuth
logger.Info("Token doesn't exist; creating new token and embedding it in the triggerauth", "ServiceAccount.Namespace", namespace, "ServiceAccount.Name", serviceAccountName)
return generateAndAnnotateNewToken(ctx, client, logger, serviceAccountName, namespace, expirySeconds, triggerAuth, acs)
}
}
Member
Currently, KEDA only re-executes the resolver when a scaler is created. That means the token's expiration won't be checked until it has already expired, triggering the error in the scaler and re-evaluating this code.
About storing the token in the TriggerAuthentication: I don't like this approach too much, because the token can be exposed just by accessing the TriggerAuthentication CR, which isn't treated as a sensitive API.
Considering these two things together, I'd not store the token anywhere and just request a fresh token each time, keeping it in the memory of the scaler. That solves the storage issue and removes the verification code (which, in the common scenario, wouldn't be executed until the token had already expired).
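
To make that suggestion concrete, a rough sketch of the in-memory approach might look like the following (it reuses the hypothetical requestToken helper and imports from the earlier sketch and is not KEDA's actual resolver code):

// Rough illustration of the approach suggested above: request a fresh token
// whenever the scaler is (re)built and keep it only in process memory,
// never in the [Cluster]TriggerAuthentication object.
type scalerAuth struct {
    bearerToken string // lives only in the scaler's memory
}

func buildScalerAuth(ctx context.Context, cs kubernetes.Interface, namespace, serviceAccountName string, expirySeconds int64) (*scalerAuth, error) {
    token, err := requestToken(ctx, cs, namespace, serviceAccountName, expirySeconds)
    if err != nil {
        return nil, err
    }
    return &scalerAuth{bearerToken: token}, nil
}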

maxcao13 (Contributor Author) commented on Nov 4, 2024
Okay, makes sense. Should users still be able to set an expiry, then? I would imagine a query failing every expiry period might be annoying to some users for availability reasons, but it might help that it fails at predictable intervals? Plus, it will mean an extra TokenRequest API call every interval.

Or maybe it would be better to register scheduled goroutines with a Timer to auto-rotate the token every half expiry for each auth ref?

Member
Can we try to handle the expired token? Then we can try to refresh the token and fail -> error if we are not able to do that.

JorTurFer (Member) commented on Nov 5, 2024
> I would imagine a failing query every expiry period might be annoying to some users for availability reasons

Actually, the query won't fail, because it is silently retried once after the first failure, refreshing the cache and forcing a fresh token from the TokenRequest API.
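
Purely as an illustration of that retry pattern (not KEDA's actual cache or retry code; it builds on the hypothetical types from the earlier sketches), it amounts to something like:

// Hypothetical sketch of "silently retry once after the first failure,
// refreshing the token": doQuery stands in for whatever upstream query the
// scaler performs with the bearer token.
func queryWithTokenRetry(ctx context.Context, cs kubernetes.Interface, auth *scalerAuth, namespace, serviceAccountName string, expirySeconds int64, doQuery func(token string) error) error {
    if err := doQuery(auth.bearerToken); err == nil {
        return nil
    }
    // The first attempt failed (for example because the token expired):
    // request a fresh token and retry exactly once.
    freshToken, err := requestToken(ctx, cs, namespace, serviceAccountName, expirySeconds)
    if err != nil {
        return err
    }
    auth.bearerToken = freshToken
    return doQuery(auth.bearerToken)
}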

maxcao13 (Contributor Author) commented on Nov 5, 2024
Thanks for the feedback everyone! Does my current implementation in 6c9e605 make sense then?

maxcao13 force-pushed the bound-serviceacctoken-trigauth branch 3 times, most recently from d1785b0 to 44d840c, on November 5, 2024.
maxcao13 force-pushed the bound-serviceacctoken-trigauth branch from 67740fd to 48ccb5f on November 5, 2024.
Signed-off-by: Max Cao <[email protected]>