
fix: pod crashed when creating init pod, new pod always fail because the init pod already exists #226

Open · wants to merge 6 commits into base: develop

Conversation

@styshoo commented Dec 24, 2024

Pull Request template

Why is this PR required? What issue does it fix?:
The dynamic-localpv-provisioner pod crashed while creating the init pod (for example, because renewing its lease failed). A new provisioner pod starts and continues creating the volume, but it always fails because the init pod already exists, so the PVC stays in Pending status forever.

What this PR does?:
When creating init pods, the provisioner now ignores the pod-already-exists error (see the sketch below).
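For illustration, the pattern looks roughly like the following. This is a minimal sketch, assuming a typed client-go clientset; `launchInitPod` and its wiring are illustrative stand-ins, not the PR's exact code.

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	k8serror "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// launchInitPod creates the init pod, but tolerates the case where a
// previous provisioner instance crashed after already creating it.
func launchInitPod(ctx context.Context, kubeClient kubernetes.Interface, pod *corev1.Pod) error {
	_, err := kubeClient.CoreV1().Pods(pod.Namespace).Create(ctx, pod, metav1.CreateOptions{})
	// "Already exists" means the old instance got this far; reuse the pod
	// instead of failing the whole volume-provisioning workflow.
	if err != nil && !k8serror.IsAlreadyExists(err) {
		return fmt.Errorf("failed to launch init pod %s/%s: %w", pod.Namespace, pod.Name, err)
	}
	return nil
}
```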

Does this PR require any upgrade changes?:
No.

If the changes in this PR are manually verified, list down the scenarios covered:

Any additional information for your reviewer?:
Mention if this PR is part of any design or a continuation of previous PRs

Checklist:

  • Fixes #225
  • PR Title follows the convention of <type>(<scope>): <subject>
  • Has the change log section been updated?
  • Commit has unit tests
  • Commit has integration tests
  • (Optional) Are upgrade changes included in this PR? If not, mention the issue/PR to track:
  • (Optional) If documentation changes are required, which issue on https://github.com/openebs/openebs-docs is used to track them:

@tiagolobocastro (Contributor) left a comment:


Could you please DCO sign the commits?

-	iPod, err := p.launchPod(ctx, config)
-	if err != nil {
+	_, err := p.launchPod(ctx, config)
+	if err != nil && !k8serror.IsAlreadyExists(err) {
@tiagolobocastro (Contributor):

Do we need to check if the config is compatible with the existing pod?

@styshoo (Author):

I'm not sure we need to check the config.
But if we do, I have two questions:
1. When do we need to check? Only when `k8serror.IsAlreadyExists` happens?
2. What kinds of config should be checked? Compare the pod commands?

@tiagolobocastro (Contributor):

I would probably say we need to compare the entire pod spec? I wonder if the error code here already has such information?
CC @niladrih

@styshoo (Author):

After some research, I find that it's difficult to compare the entire pod spec between the helperPod and the existingPod, because Kubernetes adds lots of default fields to the existingPod.

For example, the two pods' volumes would be as follows; the Type field is modified.

helperPod.Spec.Volumes[i]: {data {&HostPathVolumeSource{Path:/opt/storage,Type:nil,} nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil}}
existingPod.Spec.Volumes[i]: {data {&HostPathVolumeSource{Path:/opt/storage,Type:*,} nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil}}

By the way, do we really need to check the compatibility? Or do we need to compare the entire pod spec? I didn't find a good way to check it. Are there any ideas?
@tiagolobocastro
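One possible middle ground, purely as a sketch and not something this PR implements: compare only the fields the provisioner itself sets, which side-steps the API server's defaulting. `helperPodMatches` is a hypothetical helper:

```go
package main

import (
	"reflect"

	corev1 "k8s.io/api/core/v1"
)

// helperPodMatches reports whether an existing pod looks like the helper pod
// we would have created, comparing only provisioner-controlled fields so that
// server-side defaulting (e.g. HostPathVolumeSource.Type) is ignored.
func helperPodMatches(want, got *corev1.Pod) bool {
	if len(want.Spec.Containers) == 0 || len(got.Spec.Containers) == 0 {
		return false
	}
	w, g := want.Spec.Containers[0], got.Spec.Containers[0]
	return w.Image == g.Image &&
		reflect.DeepEqual(w.Command, g.Command) &&
		reflect.DeepEqual(w.Args, g.Args) &&
		reflect.DeepEqual(want.Spec.NodeSelector, got.Spec.NodeSelector)
}
```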


@niladrih (Member) commented Jan 19, 2025:

This could work, I think:
Add labels containing the PVC name and namespace to the init pod and the cleanup pod, e.g.

openebs.io/pvc-name: <pvc-name>
openebs.io/pvc-namespace: <pvc-namespace>

During launchPod(), check to see if there exists a pod with the LabelSelector openebs.io/pvc-name=<pvc-name>,openebs.io/pvc-namespace=<pvc-namespace>.

WDYT @styshoo @tiagolobocastro?
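For illustration, the lookup could use a label selector like this; `findExistingHelperPod`, the clientset wiring, and the namespace handling are placeholders, not the PR's actual code:

```go
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/client-go/kubernetes"
)

// findExistingHelperPod returns a helper pod left behind by a previous
// provisioner instance, matched by the PVC labels suggested above, or nil.
func findExistingHelperPod(ctx context.Context, kubeClient kubernetes.Interface,
	namespace, pvcName, pvcNamespace string) (*corev1.Pod, error) {
	selector := labels.Set{
		"openebs.io/pvc-name":      pvcName,
		"openebs.io/pvc-namespace": pvcNamespace,
	}.AsSelector().String()

	podList, err := kubeClient.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{
		LabelSelector: selector,
	})
	if err != nil {
		return nil, err
	}
	if len(podList.Items) > 0 {
		return &podList.Items[0], nil
	}
	return nil, nil // no existing helper pod; the caller should create one
}
```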

@styshoo (Author):

I'm fine with that.
And I've modified the code; please help me review it.

@styshoo (Author):

In order to prevent confusion between the init and cleanup pods, do we need to add an extra label? Such as:

openebs.io/helper-pod-name: <pvc-name>

or

openebs.io/helper-pod-type: "init"

@tiagolobocastro @niladrih

@niladrih (Member):

Sounds like a good idea to me. How about calling it

openebs.io/helper-type: hostpath-init

... in case the helper shows up in other openebs projects later.

Comment on lines 318 to 322
if podList.Items != nil && len(podList.Items) != 0 {
	return &podList.Items[0], nil
@niladrih (Member):

Might be a good idea to klog.Info the slice of Pods.

@tiagolobocastro (Contributor):

Hmm, what should the log say?

@styshoo (Author):

I've modified it to this:

	if podList.Items != nil && len(podList.Items) != 0 {
		klog.Infof("existing helper podList length: %d", len(podList.Items))
		klog.Infof("existing helper pod: %v", podList.Items[0])
		return &podList.Items[0], nil
	}

However, it would print a lot.

@tiagolobocastro (Contributor):

Yeah, I don't think we need to do it.

@styshoo (Author):

So, what should we do next? Remove the log code, or change it from "Info" to "debug"?


@niladrih (Member):

We could put it behind a higher level of verbosity, e.g. klog.V(2).Info(). Also maybe all that is required is the name of the pod and the namespace (or maybe not even that). We need not %v print the whole Pod object, I feel.

The reason for this log is to be able to debug contention introduced by more than one init container, by reviewing the logs of the provisioner. Maybe even just printing the number of Pods found at level 2 verbosity is good enough.

@styshoo (Author):

I've logged the pod count and the pod namespace/name; I think level 2 verbosity won't print too much.

	if podList.Items != nil && len(podList.Items) != 0 {
		klog.V(2).Infof("existing helper podList length: %d", len(podList.Items))
		klog.V(2).Infof("existing helper pod: %s/%s", podList.Items[0].Namespace, podList.Items[0].Name)
		return &podList.Items[0], nil
	}

@@ -2,6 +2,7 @@ package app

import (
	"context"
+	"k8s.io/apimachinery/pkg/labels"
@niladrih (Member):

This import order seems off. Please run go fmt cmd/provisioner-localpv/app/helper_hostpath.go.

@styshoo (Author):

I have modified the code, please review it for me again. Thanks.

@codecov-commenter commented Jan 20, 2025:

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 37.35%. Comparing base (cab53c4) to head (bea5825).
Report is 18 commits behind head on develop.


Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #226      +/-   ##
===========================================
- Coverage    37.91%   37.35%   -0.57%     
===========================================
  Files           36        1      -35     
  Lines         3373      779    -2594     
===========================================
- Hits          1279      291     -988     
+ Misses        2012      479    -1533     
+ Partials        82        9      -73     
Flag               Coverage Δ
integrationtests   37.35% <ø> (?)

Flags with carried forward coverage won't be shown.


Comment on lines +308 to +312
matchLabels := map[string]string{
	"openebs.io/pvc-name":      config.pOpts.name,
	"openebs.io/pvc-namespace": p.namespace,
	"openebs.io/helper-type":   "hostpath-" + config.podName,
}
@tiagolobocastro (Contributor):

Thinking about this further, this may be problematic for upgrades, if we crash between createPod and exitPod?

Since we haven't changed any of the pod definition in this commit, I think it's probably fine to keep what you had initially; we're not making changes to the Pod spec in this PR itself, so we can take this up when needed again.
Otherwise I'd suggest using Apply instead of Create, which should succeed when the pod already exists and is compatible: https://github.com/kubernetes/client-go/blob/v0.27.2/dynamic/interface.go#L43 ?
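For reference, a server-side apply version with client-go's typed client might look like the sketch below, assuming the provisioner fully owns the helper pod's fields; the function name, labels, and container wiring are placeholders:

```go
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	corev1apply "k8s.io/client-go/applyconfigurations/core/v1"
	"k8s.io/client-go/kubernetes"
)

// applyHelperPod uses server-side apply, which succeeds whether or not the
// pod already exists, as long as no other field manager owns conflicting
// fields (Force overrides such conflicts).
func applyHelperPod(ctx context.Context, kubeClient kubernetes.Interface,
	namespace, name, image string, command []string) error {
	pod := corev1apply.Pod(name, namespace).
		WithLabels(map[string]string{"openebs.io/helper-type": "hostpath-init"}).
		WithSpec(corev1apply.PodSpec().
			WithRestartPolicy("Never").
			WithContainers(corev1apply.Container().
				WithName("helper").
				WithImage(image).
				WithCommand(command...)))

	_, err := kubeClient.CoreV1().Pods(namespace).Apply(ctx, pod, metav1.ApplyOptions{
		FieldManager: "localpv-provisioner",
		Force:        true,
	})
	return err
}
```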

@biscout42:
> But it always fails because the init pod already exists

Could you help me understand: is it because the pod or the PVC exists? If it is the PVC, would it make sense to query for the PVC and, in case it is there, skip the init pod creation?
