[Design Document] Spire CSI Driver component #2693
Thank you for opening this @loveyana! I have spent some time reviewing it, and am still thinking about the appropriateness of the project supporting a file-based delivery mechanism in general. As I've mentioned before, it's been a controversial topic. In the meantime, I have a few questions 🤗
These both feel like things that can be accomplished without relying on a file based api. I do think that a SPIRE CSI driver makes sense similar to what's been described here, but the file api bit is the harder pill to swallow and only one of the four "Why" bullet points justifies it
This is a great idea! I think it fits better in the "Overview" section though
As an observation, the challenges this project has faced with regard to file-based SVID delivery are not so much about the reliability of a file API or support for file-based certs etc ... instead, most of the problems come in the areas of management, software support for hot reload and root rotation, and support for federation... all of which are core tenets of the SPIRE project.
😮 if you have something you can share, I'd love to learn more about their concerns. It's the first time I'm hearing of it.
I read through this issue but I didn't see anything talking about the risk related to gRPC calls to the SPIRE Agent? Anything else you may be able to share here would be great. In the end, I see two arguments here .. the first is that it's hard to get SPIFFE stuff into my k8s pods, and the second is that speaking to the workload API isn't fun. I feel we have a lot of options on the first problem, and the design you have here is a great start towards that IMO. The second one is harder to address, and I'm not sure that a file-based API is the answer... but I continue to think about it :)
Thank you @evan2645 for your reply. Today is the first day of the Chinese New Year, so receiving your reply is also a New Year gift for me. I will reply to the questions here later and modify some contents of the document accordingly. 🤗
@evan2645 I'm sorry it took so long to reply; I just finished my holiday. First of all, I would like to share my view of this project. I see it as built on a simple model: all it needs is a Delegated Identity API request from the orchestration system's agent to obtain identity information and provide it to the workload pods. In Kubernetes, I think a csi-driver is an implementation well suited to this working mode, and file storage of certificates is only one of the usual implementations. We can update the tlsconfig based on fsnotify events on the certificate files, thus achieving automatic certificate renewal. Hopefully, in this mode of work, we can reach a consensus and let this project continue. Below I answer each question specifically; some views may be subjective and we may need further communication:
These two things are really problems solved not by the file-based API itself, but by the Orchestrate mode we mentioned above. I had not described this accurately in the "Why Spire CSI Driver" section, so I have made some additions below.
I agree with you that this can go in "Overview". Because no other project in the current organization mentions this implementation, I had written it in this section as a feature.
As you said, hot reload and root rotation of certificates are common problems for applications, but as mentioned earlier, Envoy supports local tlsconfig updates, which is why we think this solution is viable. At the same time, most open-source projects support configuring certificate files in their TLS options, and the cost of hot reload is smaller when watching the certificate files. The implementation in most programming languages is similar to tls.Config in Go. I have learned something about SDKs like go-spiffe these days; I think if the SDK needs to support local certificates, our changes will not be too big (refer to spiffetls/tlsconfig/config.go). In addition, after thinking about it more, I could not figure out why it is more difficult to support federation; at the same time, I did not see its verification implementation in go-spiffe, so I hope you can add this part of the information.
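To make the file-watching point above concrete, here is a minimal Go sketch of a server-side `tls.Config` that re-reads the certificate pair from disk at handshake time. The file paths and the fall-back-to-cached behavior are illustrative choices of this sketch, not part of the proposal:

```go
package main

import (
	"crypto/tls"
	"fmt"
	"sync"
)

// certReloader reloads the key pair from disk on every TLS handshake,
// so an external writer (e.g. a CSI driver) can rotate the files
// without restarting the server.
type certReloader struct {
	mu       sync.RWMutex
	certFile string
	keyFile  string
	cached   *tls.Certificate
}

func (r *certReloader) getCertificate(*tls.ClientHelloInfo) (*tls.Certificate, error) {
	// Re-read the files on each handshake; a production version would
	// cache and only reload on an fsnotify event or mtime change.
	cert, err := tls.LoadX509KeyPair(r.certFile, r.keyFile)
	if err != nil {
		r.mu.RLock()
		defer r.mu.RUnlock()
		if r.cached != nil {
			return r.cached, nil // fall back to the last good cert
		}
		return nil, err
	}
	r.mu.Lock()
	r.cached = &cert
	r.mu.Unlock()
	return &cert, nil
}

func newServerConfig(certFile, keyFile string) *tls.Config {
	r := &certReloader{certFile: certFile, keyFile: keyFile}
	// GetCertificate defers certificate selection to handshake time,
	// which is what makes file-based rotation transparent to callers.
	return &tls.Config{GetCertificate: r.getCertificate}
}

func main() {
	cfg := newServerConfig("/run/spire/certs/tls.crt", "/run/spire/certs/tls.key")
	fmt.Println(cfg.GetCertificate != nil)
}
```

This is the same shape as `GetCertificate`-based rotation commonly used with Go servers; an SDK supporting local certificates would mainly need to plug a loader like this into its existing `tls.Config` construction.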
The standard Istio SPIFFE ID pattern is spiffe://trustdomain/ns/default/sa/example; see Istio CA integration through Envoy SDS. In the current integration scheme there is a limitation that only the Istio spiffeID format can be used, which is not enough to meet our needs. It is also worth mentioning that that design notes that, in the absence of a UDS socket, certificate files are monitored instead; the spire-csi-driver meets this requirement well. You can also refer to the current Istio SPIFFE implementation, tracking the call to the
I followed the document further and did not see this problem. I think I made a mistake and I will correct it.
Finally, I would like to express my opinion that in Kubernetes, the options for providing SPIFFE stuff to pods include, but are not limited to, a customized kube-scheduler, a csi-driver, and a mountable CRD like a secret or configmap. I think the csi-driver is the one with the fewest changes to the cluster and higher security; we do not need to set up a separate scheduler for some pods. We hope to enrich the community's solutions and make SPIRE more popular and easy to use, which is also our goal.
Hi @loveyana, very sorry for the delay in my response here.. and thank you for the very detailed information 🤗 Before I start:
@maxlambrecht is working with upstream Istio on a fully supported and documented Istio integration, I think you should talk with him about this interest. It sounds like he is making good progress there. With that out of the way, and in an effort to speed this along, I'll try to summarize my thoughts here rather than respond point by point. I think the following statement serves as a good starting point for that:
A CSI driver that is powered by the Delegated Identity API is a great idea. I am only one SPIRE maintainer vote, but I would definitely vote to do this. The point of contention is not this, but rather how to get the SVIDs and bundles into the container. I think maybe you missed some of the complexities around federation (which go-spiffe does support), and you can find some of that here and here. I know that the current shape of the Delegated Identity API is such that what is returned is an SVID .. however, this project has had bad experiences in trying to support file based SVID and bundle delivery. We have the workload API for exactly this reason. The current Delegated Identity API was designed to support in-memory proxies, so even though we are giving SVID data, the consumer that this API was written for is not writing these to disk. In my opinion, the ideal solution here is to extend the Delegated Identity API to provide an RPC that allows the caller to provide a set of selectors, and a file path. The agent will then bind a UDS at the specified file path, and this UDS will be statically bound to the set of selectors that was specified. This allows us to skip workload attestation, and negate the need for hostNetwork, hostPID, and also the 100+ ms overhead of performing workload attestation in Kubernetes. Of course, this option does not solve the pain of "now I have to use workload API and gRPC"... however it does solve all of the other problems that we have discussed here. My recommendation is this:
Finally, I'll just repeat that I am only one vote here, and this is my ideal solution, maybe not representative of the other maintainers. I know that the SPIRE maintainers have a general distaste for file APIs based on our experiences there, but I will ultimately defer to a consensus-based vote on whether or not folks want to bring file API support under the SPIFFE umbrella.
Hi @loveyana, did you ever get around to publishing something like what you've described in this proposal? It would be nice to see a solution for file-based ingestion of identity, but we haven't seen activity on this issue in a while. Just checking if it's something you'd like to discuss further.
I'm looking for something like this as well. Doing something like a k8s Job with curl in it, talking to some other service using a JWT created by SPIRE, is a lot of work currently.
This issue is stale because it has been open for 365 days with no activity. |
I've got something like this being developed under the helm charts umbrella: |
As mentioned in the previous #2684, the following is the design documentation for the CSI driver component.
We temporarily call it Spire CSI Driver, and we hope to get feedback from you all.
@azdagron @evan2645
Authors: @loveyana
creation-date: 2022-01-20
last-updated: 2022-01-24
status: draft
Acknowledgement
This project started as a fork of cert-manager/csi-driver. And most of the inspiration also comes from it.
At the same time, I referred to the spiffe/spiffe-csi documents and related discussions, as well as the cilium-spire integration.
Introduction
Overview
Spire CSI Driver provides a method for injecting SVIDs into pods in the Orchestrate mode, based on the Delegated Identity API and Kubernetes CSI, and for updating them periodically.

When a pod mounts the `csi.spire.io` CSI driver, the Kubelet provides the pod's information to the Spire CSI Driver during the startup phase. The Spire CSI Driver generates selectors and subscribes to SVIDs from the Spire Agent admin socket.

You can use the Spire CSI Driver together with k8s-workload-registrar or spire-controller-manager, which automate workload entry registration; this makes it easy to use SPIRE on your Kubernetes.
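As a rough sketch of the subscription step described above, the driver might map the pod info received from the Kubelet to selector strings like the following. The exact selector set and string format are assumptions of this sketch, not a confirmed part of the design:

```go
package main

import "fmt"

// PodInfo mirrors the fields the Kubelet passes to the driver in
// NodePublishVolume (field names here are illustrative).
type PodInfo struct {
	Name, UID, Namespace, ServiceAccount, NodeName string
}

// selectorsFor builds the selectors the driver would subscribe with
// over the Delegated Identity API. The "k8s:" strings follow the shape
// of SPIRE's Kubernetes workload attestor selectors, but the exact set
// used by the driver is an assumption of this sketch.
func selectorsFor(p PodInfo) []string {
	return []string{
		"k8s:pod-name:" + p.Name,
		"k8s:pod-uid:" + p.UID,
		"k8s:ns:" + p.Namespace,
		"k8s:sa:" + p.ServiceAccount,
		"k8s:node-name:" + p.NodeName,
	}
}

func main() {
	sel := selectorsFor(PodInfo{
		Name: "web-0", UID: "1234", Namespace: "default",
		ServiceAccount: "example", NodeName: "node-a",
	})
	for _, s := range sel {
		fmt.Println(s)
	}
}
```

Registration automation (k8s-workload-registrar or spire-controller-manager) would create entries whose selectors match these, so the subscription returns the pod's SVIDs without workload attestation.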
Why Spire CSI Driver
Requirements
This CSI driver plugin makes use of the 'CSI inline volume' feature, Alpha as of `v1.15` and Beta in `v1.16`. Kubernetes versions `v1.16` and higher require no extra configuration; however, `v1.15` requires the following feature gate to be set:

The Spire CSI Driver needs Spire Agent version v1.1.2 or above, because it relies on the Delegated Identity API.

FsGroup

In addition, this CSI driver plugin makes optional use of CSI volume fsGroup support, Alpha as of `v1.19` and Beta in `v1.20`. Kubernetes versions `v1.20` and higher require no extra configuration; however, `v1.19` requires the following feature gate to be set:

Reference: https://kubernetes-csi.github.io/docs/support-fsgroup.html
Limitations
Like all CSI drivers, the Spire CSI Driver is registered as a plugin and interacts with the Kubelet, which requires several `hostPath` volumes. As such, this driver cannot be used in environments where `hostPath` volumes are forbidden.

Normally you need the following three paths for the `hostPath` mounts.
The Spire CSI Driver also needs to be started as a privileged container, so you need to check whether the deployed cluster has Pod Security Policies enabled and is configured to be unprivileged by default.
Windows
According to the official Kubernetes CSI documents, you can use `CSI Proxy` on Windows, but I haven't used it yet, so you will need to explore it yourself. Refer to the following two documents:
Motivation
Practice
In ByteDance, we now have more than 600k hosts with Spire Agent installed, running services based on Workload API calls, but at the same time we have even more Kubernetes nodes based on self-developed container platforms.
On a self-developed container platform, a custom Kubernetes scheduler issues and injects JWT tokens and certificates when scheduling containers.
This integration method has been running stably inside our company for about two years, so I think the model based on file injection is stable enough and universally applicable.
In order to solve the problem of certificate injection for native Kubernetes clusters, I decided to develop Spire CSI Driver.
Usability
For applications that can use the SPIRE SDK, developers need to modify their code. Usually this is very difficult: we need to develop SDKs for various versions and solve the usage and dependency problems caused by gRPC.
In addition, for C++ services that are very sensitive to latency, the latency caused by gRPC calls is difficult to accept.
In addition, for open-source software, or closed-source commercial software whose code cannot be modified, we need to use `spiffe-helper` to load certificates on their behalf so that they can have a SPIFFE identity and use the SPIRE certificate system. In Kubernetes, we usually start a `spiffe-helper` sidecar container. This approach has a certain impact on the deployment of services, and it also gives every service a dependency component outside its own monitoring system.

Alternatively, we can use an Envoy sidecar (such as https://spiffe.io/docs/latest/microservices/envoy/ ) to directly perform mTLS proxying for the service's external connections, using SDS to fetch and refresh its certificates from the `Spire Agent`.

In this scenario, however, introducing the Spire CSI Driver lets Envoy switch from the SDS interface of the `Spire Agent` to Envoy's local filesystem SDS capability instead. Please note that Envoy also supports file monitoring to achieve certificate rotation. Reference: https://www.envoyproxy.io/docs/envoy/latest/configuration/security/secret#key-rotation
Integration
I think that providing file-based certificate injection is also beneficial for integration with other applications.
While using Cilium, I noticed that @rscampos is advancing a gradual integration of Cilium and SPIFFE (cilium/cilium#17335).
I think it can be roughly divided into two steps:
I think that in the second stage, letting Envoy switch to SDS with certificates in the local filesystem is a smaller and less dependent change, which can have effects similar to Cilium's original L7 proxy secret injection method (https://docs.cilium.io/en/stable/gettingstarted/tls-visibility/#gs-tls-inspection).
At the same time, I would like to mention integration with Istio. I think the Istio community has some concerns about the spiffeID format, but a large part of the reason integration is difficult is that each istio-proxy would rely on a gRPC call to the Spire Agent, which is a very risky change (istio/istio#30762).
The above are all my personal understanding, welcome any discussion.
Design
Definitions
Meta Data File
In order to retain information about the volume attributes, even if the driver has been restarted, data needs to be persisted on local storage. Without that, we must query the API server for all pods in the cluster, then sort the pods that have a Spire CSI Driver volume attached and match by node. By keeping a local store, we eliminate our need for API server access to pods and reduce complexity. Volume mount attributes are immutable so the data shall not change during a pod's life cycle.
The contents of the metadata include the volume attributes (`map[string]string`), the volume ID, and the target path. With this, the full information for each volume can be recreated.

The file will be stored as a `.json` file. This is a widely used and understood file type, and Go has good support for converting between Go structs and JSON strings.

SVID Data File
SVID data file represents a certificate file written to disk storage.
The CSI Driver uses some CSI attributes to let users decide the file names of the certificate public and private keys, the name of the CA certificate, and their fsGroup.

The CSI Driver calls the Spire Agent `delegatedidentityv1.SubscribeToX509Bundles` and `delegatedidentityv1.SubscribeToX509SVIDs` interfaces so that the workload SVIDs and the CA certificate are updated regularly, and each new SVID data file is written into the pod using an `atomic write` operation.

Volume Attributes
Workflow
Publish
Above is the architecture diagram from the perspective of the CSI driver when a workload pod starts.
The steps are as follows:
When a pod is created with a volume whose driver is `csi.spire.io` and `readonly` is true, the Kubelet sends a `NodePublishVolume` gRPC request to the Spire CSI Driver, including the Pod Info and the container path allocated to the pod.

The Spire CSI Driver checks whether the `Meta Data File` already exists. If it does not, the driver creates the `Meta Data File` path and writes the Meta Data File, registering the corresponding volume of this pod in the Spire CSI Driver, similar to a PVC declaration.

The driver opens a `SubscribeToX509SVIDs` gRPC stream with the Spire Agent admin socket, builds selectors from the Pod Info submitted by the Kubelet, including `Pod Name`, `Pod Uid`, `ServiceAccount`, `Namespace`, and `Node Name`, and writes the SVID files to the hostPath of the SVID Data File once it receives them.

Updates use an `atomic write` operation, so if you use fsnotify to watch for file changes, you need to handle the remove event and re-add a watch for the new file.

Unpublish
As with creation, the Kubelet sends a `NodeUnpublishVolume` request to the Spire CSI Driver, which removes the Meta Data File and SVID Data File for the volume and closes the `SubscribeToX509SVIDs` gRPC stream.

Discovery
Meta Data Files save state locally, and SVID Data Files exist in the kubelet local folder.
A restart of the Spire CSI Driver will not affect the running state of the pods; based on the Meta Data File volumes, the driver will take over the certificate updates in the pods again.
Follow-up
Example
If the community can support this component, I hope we can create a repository; I can provide the current code and examples so it can enter the dev phase.
I have completed the project documents that the dev phase requires, with reference to https://github.com/spiffe/spiffe/blob/main/MATURITY.md, but I still need a little time to extract them from our internal codebase.
Other ideas
My team and I are working on turning SPIRE into a cloud service, and this project is one small idea of ours. We are currently working on Istio-SPIRE integration and would like to communicate with the community. At the same time, we hope everything about the SPIRE cloud service stays consistent with the community.
If the current plan passes, I will also be happy to share our other `secrets-store-csi-driver` provider implementation, which is also the currently recommended method for secret providers such as `Vault`.

I am also wondering if the Spire Agent could provide JWT tokens through the same pipeline as SubscribeToX509SVIDs.