Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP/RFC fix windows containers - ensureWaiting/injection of the agent #872

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

stephan48
Copy link

@stephan48 stephan48 commented Jan 4, 2022

This is a draft PR which should not be merged just yet. Hence i ignored the template for now, this should be for discussions/strategy planing first, then I will rework this PR into a proper state.

We run a Jenkins instance with a Docker Cloud configured with windows containers, we discovered that there are two blocking issues preventing windows containers from being started when Hyper-V Virtualization is enabled:

  • ensureWaiting -> unless you define a custom command to keep the container in a permanent running state running the container will fail with an obscure Hyper-V message because /bin/sh is not available
  • Hyper-V does not permit the copying of files into a running container. Currently the injection of the Agent Jar is done after the Containers are started, I implemented a crude detection if we have a Windows container and then do it before we initially start the container

The branch in this PR solves both problems in a "hacky" way.

Detection if it is a windows container is done by checking the workdir for existence of "^.:" indicating a structure similar to C:\bla.
This I find not too nice but I could not devise a better scheme yet. This should maybe moved to a more global State for a given Container, so we have a single Place to check this.

The second part is the handling of injection, currently in my PR when I see its a Windows container I inject before start other whise after. The question here is.. The beforeContainerStarted callback explicitly advises that it is a good place for injecting the Jar, why is it done in afterContainerStarted? I did not change this because i assume there must be a reason for this.

I attached TODO markers at all points where i seek clarification.

My tests where done on default unchanged linux(jenkins/agent:latest) and windows(jenkins/agent:jdk11-windowsservercore-ltsc2019) containers.

Thank you for your time.

  • Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
  • Ensure that the pull request title represents the desired changelog entry
  • Please describe what you did
  • Link to relevant issues in GitHub or Jira
  • Link to relevant pull requests, esp. upstream and downstream changes
  • Ensure you have provided tests - that demonstrates feature works or fixes the issue

@pjdarton
Copy link
Member

Hmm, yes, that admission from Hyper-V that filesystem operations against a running Hyper-V container are not supported is going to be a problem.
I mean, ideally, Microsoft would actually finish implementing Hyper-V and support it but as Microsoft do like to "improve" upon standards and be a bit different, I suspect that hell will freeze over before they publicly admit that it's a deficiency, let alone fix it.
So that means that the only realistic option is dodging around the issue instead.

...hmm... that said ... it may be as simple as switching the code from doing the injection in the afterContainerStarted callback to doing it in the beforeContainerStarted method.
According to the Javadocs for those methods, the beforeContainerStarted method is the better place to do that injection.
It may be complete chance that it worked at all - it may be that you've actually found a (fairly long standing) bug in the plugin, and that it should be beforeContainerStarted for both Windows and Linux containers.
(by all means go digging in the history to see what it did before I refactored it - it's quite possible my most recent refactor in this area accidentally changed when things happened)

So, what I'd suggest is that you make a PR (or change this PR) that switches the jar injection to beforeContainerStarted regardless of it being a Linux or Windows container.
i.e. no hacky "are we on windows?" checking, just make the change for both Linux and Windows.
If that passes the automated tests (as reported by GitHub) then it hasn't broken the Linux container functionality (we actually test that - it's the Windows functionality that isn't tested) and it should be safe to merge.
Note: The automated tests are temperamental and prone to weird failures that I've never got to the bottom of. I've found that they work just fine on the VM I use to dev this plugin but they merely "often work" on the official Jenkins CI builds.

...and, if you're interested in having the docker-plugin work on Windows then, ideally, you'd then go on to submit a PR to improve the unit-tests so that they don't skip all the io.jenkins.docker.connector tests when on Windows, so that way we can get automated tests for Windows too (assuming that the Jenkins CI system gives us Windows VMs that have docker capability, that is).
Right now, the automated tests just skip anything that needs a docker daemon when running on Windows, and that (plus Microsoft's tendency to be different) is why Windows container support is so poor even though there's really no good reason why this plugin shouldn't be just as able to work on Windows as it does on Linux.

@pjdarton pjdarton added bug An issue reporting a bug or a PR fixing one. WIP PR that is *not* ready for merging (but hopefully being worked on by the author). labels May 18, 2022
.withAttachStdin(false);
// on windows we can't rely on /bin/sh being present, so we use cmd.exe to ensure a process is waiting
// TODO: same here, refined detection mechanism if possible
if(workdir.matches("^.:.*")) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pjdarton What about this bit here? Maybe we can work around this by allowing the "waiting" cmd to be customized? This way we could default to /bin/sh and the admin/user of the plugin can change it to $whatever.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CMD field can be set in the template definition.
While it might be nice to have the code set a Windows-compatible default, it's not essential as the user can specify one themselves.
i.e. this can be addressed in the help text.

@stephan48
Copy link
Author

Thank you very much for the review! I totally agree on your course of action - i will try to look at the unittests too, but can't promise anything.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug An issue reporting a bug or a PR fixing one. stale WIP PR that is *not* ready for merging (but hopefully being worked on by the author).
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants