Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix KCP remediation & add machine deployment remediation test #109

Merged
merged 1 commit into from
May 14, 2024

Conversation

nasusoba
Copy link
Contributor

  • Fix machine health check for controlplane
    • fix init lock release logic
    • add handling for remediation annotation and status
    • modify machine_controller logic to handle scenarios when machine.Status.NodeRef=nil
  • Add 2 e2e tests:
    • kcp_remediation for controlplane remediation
    • md_remediation for machine deployment remediation
  • Code cleanup
    • fix TODO for reconcileKubeConfig

Fixed #108

Signed-off-by: nasusoba <[email protected]>

fix lint

Signed-off-by: nasusoba <[email protected]>

fix comment
@@ -41,7 +42,7 @@ import (
// reconcileUnhealthyMachines tries to remediate KThreesControlPlane unhealthy machines
// based on the process described in https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20191017-kubeadm-based-control-plane.md#remediation-using-delete-and-recreate
// taken from the kubeadm codebase and adapted for the k3s provider.
func (r *KThreesControlPlaneReconciler) reconcileUnhealthyMachines(ctx context.Context, controlPlane *k3s.ControlPlane) (ret ctrl.Result, retErr error) {
func (r *KThreesControlPlaneReconciler) reconcileUnhealthyMachines(ctx context.Context, controlPlane *k3s.ControlPlane) (ret ctrl.Result, retErr error) { //nolint:gocyclo
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's ok to have nolint here only because we want to keep code structure same with kubeadmControlplane, so that it's easier for maintain.

@@ -14,33 +14,34 @@ See the License for the specific language governing permissions and
limitations under the License.
*/

// Package locking implements locking functionality.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is from kcp, and copied everywhere, e.g. k3s provider, etcdadm-bootstrap-provider, cluster-api-provider-rke2

Shall we raise an issue to CAPI, ask if it's feasible to set this as public?

Copy link
Contributor Author

@nasusoba nasusoba May 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have raised an issue

@mogliang mogliang merged commit 5c72dca into k3s-io:main May 14, 2024
5 checks passed
@nasusoba nasusoba deleted the etcd4 branch May 14, 2024 07:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remediation is failing when machine bootstrap failed
2 participants