Seed job fails after Jenkins restart with backups enabled #607

Open
Bakies opened this issue Jul 29, 2021 · 10 comments

@Bakies

Bakies commented Jul 29, 2021

Describe the bug
After my Jenkins controller restarts, the seed job fails to start. I think there's a race condition between the restore job and the seed job starting. Probably, because the seed job is configured in JCasC, the job is set up before the restore and creates the file nextBuildNumber containing 1, and the restore may not overwrite it? It takes a long time, if it happens at all, before the seed job runs and restores the config for the rest of the jobs.

I'm currently thinking I will just exclude the seed jobs from backups. I don't think I particularly care about their history.
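
The suspected mismatch can be checked on disk. The following is a minimal Python sketch, not part of the operator: it assumes the usual Jenkins layout of a `nextBuildNumber` file next to a `builds/` directory with one numerically named subdirectory per build, and it assumes a missing or unreadable counter behaves like 1. The function names are hypothetical.

```python
from pathlib import Path


def highest_build_number(job_dir: Path) -> int:
    """Return the largest numeric build directory under builds/, or 0 if none."""
    builds = job_dir / "builds"
    if not builds.is_dir():
        return 0
    numbers = [
        int(entry.name)
        for entry in builds.iterdir()
        if entry.is_dir() and entry.name.isdecimal()
    ]
    return max(numbers, default=0)


def read_next_build_number(job_dir: Path) -> int:
    """Read nextBuildNumber; assume 1 when the file is missing or malformed."""
    try:
        return int((job_dir / "nextBuildNumber").read_text().strip())
    except (FileNotFoundError, ValueError):
        return 1


def has_stale_counter(job_dir: Path) -> bool:
    """True if the next build would collide with an existing build directory."""
    return read_next_build_number(job_dir) <= highest_build_number(job_dir)
```

For a job directory such as `/var/lib/jenkins/jobs/github-job-dsl-seed`, `has_stale_counter` returning `True` would be consistent with the "already existed; will not overwrite" errors in the logs.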

To Reproduce
Configure a seed job in JCasC.
Run it a few times.
Delete the Jenkins pod.

Additional information

Kubernetes version: 1.19
Jenkins Operator version: v0.5.0

jenkins-master container logs:

2021-07-28 19:59:25.874+0000 [id=146]   WARNING j.model.lazy.LazyBuildMixIn#newBuild: A new build could not be created in job github-job-dsl-seed
java.lang.IllegalStateException: JENKINS-23152: /var/lib/jenkins/jobs/github-job-dsl-seed/builds/1 already existed; will not overwrite with github-job-dsl-seed #1
        at hudson.model.RunMap.put(RunMap.java:189)
        at jenkins.model.lazy.LazyBuildMixIn.newBuild(LazyBuildMixIn.java:182)
        at hudson.model.AbstractProject.newBuild(AbstractProject.java:963)
        at hudson.model.AbstractProject.createExecutable(AbstractProject.java:1139)
        at hudson.model.AbstractProject.createExecutable(AbstractProject.java:138)
        at hudson.model.Executor$1.call(Executor.java:365)
        at hudson.model.Executor$1.call(Executor.java:347)
        at hudson.model.Queue._withLock(Queue.java:1443)
        at hudson.model.Queue.withLock(Queue.java:1304)
        at hudson.model.Executor.run(Executor.java:347)
2021-07-28 19:59:25.875+0000 [id=146]   SEVERE  hudson.model.Executor#run: Executor #4 for seed-job-agent: Unexpected executor death
java.lang.IllegalStateException: JENKINS-23152: /var/lib/jenkins/jobs/github-job-dsl-seed/builds/1 already existed; will not overwrite with github-job-dsl-seed #1
        at hudson.model.RunMap.put(RunMap.java:189)
        at jenkins.model.lazy.LazyBuildMixIn.newBuild(LazyBuildMixIn.java:182)
Caused: java.lang.Error
        at jenkins.model.lazy.LazyBuildMixIn.newBuild(LazyBuildMixIn.java:190)
        at hudson.model.AbstractProject.newBuild(AbstractProject.java:963)
        at hudson.model.AbstractProject.createExecutable(AbstractProject.java:1139)
        at hudson.model.AbstractProject.createExecutable(AbstractProject.java:138)
        at hudson.model.Executor$1.call(Executor.java:365)
        at hudson.model.Executor$1.call(Executor.java:347)
        at hudson.model.Queue._withLock(Queue.java:1443)
        at hudson.model.Queue.withLock(Queue.java:1304)
        at hudson.model.Executor.run(Executor.java:347)
2021-07-28 19:59:54.148+0000 [id=144]   WARNING j.model.lazy.LazyBuildMixIn#newBuild: A new build could not be created in job github-job-dsl-seed
java.lang.IllegalStateException: JENKINS-23152: /var/lib/jenkins/jobs/github-job-dsl-seed/builds/2 already existed; will not overwrite with -job-dsl-seed #2
        at hudson.model.RunMap.put(RunMap.java:189)
        at jenkins.model.lazy.LazyBuildMixIn.newBuild(LazyBuildMixIn.java:182)
        at hudson.model.AbstractProject.newBuild(AbstractProject.java:963)
        at hudson.model.AbstractProject.createExecutable(AbstractProject.java:1139)
        at hudson.model.AbstractProject.createExecutable(AbstractProject.java:138)
        at hudson.model.Executor$1.call(Executor.java:365)
        at hudson.model.Executor$1.call(Executor.java:347)
        at hudson.model.Queue._withLock(Queue.java:1443)
        at hudson.model.Queue.withLock(Queue.java:1304)
        at hudson.model.Executor.run(Executor.java:347)
@Bakies added the bug ("Something isn't working") label Jul 29, 2021
@Bakies
Author

Bakies commented Jul 29, 2021

Doesn't seem totally consistent, probably some race condition somewhere.

@justyns

justyns commented Sep 2, 2021

We also noticed a similar issue. The seed job starts before the restore job finishes. As a result, Jenkins tries to re-index repos that it doesn't need to.

If you go to Manage Jenkins and click the "Reload Configuration from Disk" button, it fixes the error @Bakies posted, but a better solution imo would be for the seed job to wait until the restore process is finished before triggering.
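
For anyone who wants to script that workaround instead of clicking the button, here is a rough Python sketch. It assumes the button corresponds to a POST to the `/reload` path of the Jenkins root URL and that an admin user's API token is accepted via HTTP basic auth; the base URL, user name, and token are placeholders, and `build_reload_request` is a hypothetical helper. It only builds the request, it does not send it.

```python
import base64
import urllib.request


def build_reload_request(base_url: str, user: str, api_token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the reload endpoint."""
    credentials = base64.b64encode(f"{user}:{api_token}".encode()).decode()
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/reload",  # assumed endpoint path
        data=b"",  # empty body; the method is what matters here
        method="POST",
        headers={"Authorization": f"Basic {credentials}"},
    )
```

Sending it would be `urllib.request.urlopen(build_reload_request(...))`. This only automates the manual workaround and does nothing about the ordering of restore and seed job.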

@Bakies
Author

Bakies commented Sep 28, 2021

Ah, thanks for that workaround, that button should be helpful to me : )

@stale

stale bot commented Apr 16, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this issue is still affecting you, just comment with any updates and we'll keep it open. Thank you for your contributions.

@stale bot added the stale label Apr 16, 2022
@cwitthaus

I encountered this recently as well. I have solved it for now by slightly changing the backup.sh script. I added --exclude jobs/*job-dsl-seed to the tar command and overwrote the scripts in the default backup container by following the process in https://jenkinsci.github.io/kubernetes-operator/docs/getting-started/latest/custom-backup-and-restore/.
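
To get a feel for which paths a pattern like `jobs/*job-dsl-seed` would drop from the backup, here is a small Python sketch. It is only an approximation built on `fnmatch`, not GNU tar's actual matching logic (which also depends on options such as anchoring and whether wildcards match `/`), and `is_excluded` is a hypothetical helper, not something from backup.sh.

```python
import fnmatch
from pathlib import PurePosixPath


def is_excluded(member: str, pattern: str = "jobs/*job-dsl-seed") -> bool:
    """True if the path or any of its parent directories matches the pattern.

    Excluding a directory also drops everything below it, so each leading
    prefix of the path is checked, not just the full path.
    """
    parts = PurePosixPath(member).parts
    return any(
        fnmatch.fnmatchcase("/".join(parts[:i]), pattern)
        for i in range(1, len(parts) + 1)
    )
```

Under these assumptions, `is_excluded("jobs/github-job-dsl-seed/builds/1")` is true while `is_excluded("jobs/my-app/builds/1")` is false, so the seed job's build history would be left out and other jobs' history kept.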

@stale bot removed the stale label Apr 28, 2022
@Harguer

Harguer commented Jun 13, 2022

Hi! I was wondering if there is an update on this. I'm facing the same issue: when my Jenkins pod dies, the new pod doesn't trigger the seed jobs and fails with that error.

2022-06-13 13:37:53.596+0000 [id=97]	WARNING	j.model.lazy.LazyBuildMixIn#newBuild: A new build could not be created in job seed-jobs-job-dsl-seed
java.lang.IllegalStateException: JENKINS-23152: /var/lib/jenkins/jobs/seed-jobs-job-dsl-seed/builds/1 already existed; will not overwrite with seed-jobs-job-dsl-seed #1
	at hudson.model.RunMap.put(RunMap.java:194)

@emyes

emyes commented Aug 1, 2022

Are there any updates on this issue? We are facing similar issues on our end.

@Bakies
Author

Bakies commented Oct 11, 2022 via email

@michalgoldys

I can confirm that we've got the same problem:

2022-10-21 13:43:15.839+0000 [id=108]	WARNING	j.model.lazy.LazyBuildMixIn#newBuild: A new build could not be created in job jenkins-operator-seed-job-dsl-seed
java.lang.IllegalStateException: JENKINS-23152: /var/lib/jenkins/jobs/jenkins-operator-seed-job-dsl-seed/builds/1 already existed; will not overwrite with jenkins-operator-seed-job-dsl-seed #1

Reloading the configuration from disk via the corresponding option in the Jenkins settings solves the problem, but this failure shouldn't happen after restarting the jenkins-master pod in the first place.

Image: jenkins/jenkins:2.346.2-lts-alpine
Kubernetes operator Helm chart version: 0.6.2

@brokenpip3
Collaborator

> I encountered this recently as well. I have solved it for now by slightly changing the backup.sh script. I added --exclude jobs/*job-dsl-seed to the tar command and overwrote the scripts in the default backup container by following the process in https://jenkinsci.github.io/kubernetes-operator/docs/getting-started/latest/custom-backup-and-restore/.

This ^ should be the solution here: excluding the seed jobs (with a regex) from the history backup.

Adding good-first-issue.

@brokenpip3 added the good first issue ("Good for newcomers") label and removed the bug ("Something isn't working") label Mar 7, 2023
@github-actions bot added the stale label May 8, 2023
@github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) May 19, 2023
@brokenpip3 reopened this May 19, 2023
@brokenpip3 added this to the 0.9 milestone Jul 7, 2024