In the `NodePublishVolume` call, we have a `defer` that calls `UnmanageVolume` (and deletes metadata from the storage backend) if initial issuance fails (csi-lib/driver/nodeserver.go, lines 51 to 61 at 031cdcd):
```go
// clean up after ourselves if provisioning fails.
// this is required because if publishing never succeeds, unpublish is not
// called which leaves files around (and we may continue to renew if so).
success := false
defer func() {
	if !success {
		ns.manager.UnmanageVolume(req.GetVolumeId())
		_ = ns.mounter.Unmount(req.GetTargetPath())
		_ = ns.store.RemoveVolume(req.GetVolumeId())
	}
}()
```
If the driver is stopped during this initial 30s issuance window, and the pod is also deleted whilst the driver is stopped, then because the publish step never succeeded, the `UnpublishVolume` step will never be called in future.
Upon the driver starting up again, it will read the `metadata.json` file and then attempt to request the certificate for that pod again.
Because the pod no longer exists, the `UnpublishVolume` step will never be called, and therefore the certificate data will never be cleaned up on disk. The driver will continue to process renewals for the volume indefinitely, until an administrator manually cleans up the metadata file on disk and triggers a restart of the driver.
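For illustration, the startup behaviour described above might look roughly like the following. This is a minimal sketch, not the actual csi-lib code: the `Storage` and `Manager` interfaces and the `ListVolumes`/`ManageVolume` names are assumptions.

```go
package driver

// Storage is an assumed subset of the driver's metadata storage backend.
type Storage interface {
	// ListVolumes returns the IDs of all volumes with metadata on disk.
	ListVolumes() ([]string, error)
}

// Manager is an assumed subset of the issuance manager.
type Manager interface {
	// ManageVolume begins issuance/renewal processing for a volume.
	ManageVolume(volumeID string)
}

// resumeManagedVolumes re-manages every volume found in the metadata
// store on startup, regardless of whether its pod still exists. If the
// pod was deleted while the driver was down, nothing will ever call
// NodeUnpublishVolume for that volume, so renewals continue indefinitely.
func resumeManagedVolumes(store Storage, manager Manager) error {
	volumeIDs, err := store.ListVolumes()
	if err != nil {
		return err
	}
	for _, id := range volumeIDs {
		manager.ManageVolume(id)
	}
	return nil
}
```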
We should do whatever we can to avoid this state occurring, as it causes excessive churn on nodes, in the apiserver, and for signers.
One option would be: on startup of the driver, if `metadata.json` files are found that do not have a `nextIssuanceTime` set (which implies issuance has never succeeded), delete/clean up this data on disk and await `NodePublishVolume` being called again (for the case where the pod does still exist and is waiting to start up). There may be some edge cases we have not thought of, however, whereby the pod does still exist and is provisioned and for some reason this field is not set, though I don't think that is a possible state to get into...
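A minimal sketch of what that startup cleanup could look like, assuming a `Metadata` struct with a `NextIssuanceTime` pointer field and hypothetical `ListVolumes`/`ReadMetadata`/`RemoveVolume` helpers on the storage backend (these names are illustrative, not the exact csi-lib API):

```go
package driver

import (
	"fmt"
	"time"
)

// Metadata mirrors only the fields relevant to this check; the exact
// shape of the real metadata.json is assumed here.
type Metadata struct {
	VolumeID         string     `json:"volumeID"`
	NextIssuanceTime *time.Time `json:"nextIssuanceTime,omitempty"`
}

// MetadataStore is an assumed subset of the storage backend.
type MetadataStore interface {
	ListVolumes() ([]string, error)
	ReadMetadata(volumeID string) (Metadata, error)
	RemoveVolume(volumeID string) error
}

// cleanupIncompleteVolumes deletes on-disk state for volumes whose
// initial issuance never completed. If the pod still exists, the kubelet
// will retry NodePublishVolume and the state will be recreated; if the
// pod is gone, this stops the driver from renewing forever.
func cleanupIncompleteVolumes(store MetadataStore) error {
	volumeIDs, err := store.ListVolumes()
	if err != nil {
		return fmt.Errorf("listing volumes: %w", err)
	}
	for _, id := range volumeIDs {
		meta, err := store.ReadMetadata(id)
		if err != nil {
			return fmt.Errorf("reading metadata for %q: %w", id, err)
		}
		// A nil nextIssuanceTime implies issuance never succeeded.
		if meta.NextIssuanceTime == nil {
			if err := store.RemoveVolume(id); err != nil {
				return fmt.Errorf("removing volume %q: %w", id, err)
			}
		}
	}
	return nil
}
```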
We may also want to consider using a separate field in the metadata to represent this. Using `nextIssuanceTime` is a bit indirect and surprising, whereas a dedicated `mounted: true` (not the best name) is far clearer and has little downside AFAIK.
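For illustration, a dedicated field might look something like this; the `Mounted` name, the JSON tags, and the surrounding fields are all hypothetical:

```go
package driver

import "time"

// Metadata is an assumed shape for the on-disk metadata.json, extended
// with the dedicated flag suggested above.
type Metadata struct {
	VolumeID         string     `json:"volumeID"`
	TargetPath       string     `json:"targetPath"`
	NextIssuanceTime *time.Time `json:"nextIssuanceTime,omitempty"`

	// Mounted is set to true only once NodePublishVolume has fully
	// succeeded. Startup cleanup can then key off this field directly
	// instead of inferring publish state from nextIssuanceTime.
	Mounted bool `json:"mounted"`
}
```

The startup cleanup sketched earlier would then check `!meta.Mounted` rather than `meta.NextIssuanceTime == nil`.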