
Agent instance fails to connect to master despite port being open #669

Open
pkaramol opened this issue Jan 28, 2020 · 28 comments
@pkaramol

pkaramol commented Jan 28, 2020

Installing Jenkins on GKE using the official Helm chart.

Have used jnlp images with tags both 3.27-1 and 3.40-1

When starting a simple (shell execution) job, the agent pod starts running but then gets terminated with an error.
Its error logs are the following:

jenkins-agent-5j324 jnlp java.io.IOException: Failed to connect to http://jenkins-inception.jenkins.svc.cluster.local:8080/jenkins/tcpSlaveAgentListener/: Connection refused (Connection refused)
jenkins-agent-5j324 jnlp 	at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:196)
jenkins-agent-5j324 jnlp 	at hudson.remoting.Engine.innerRun(Engine.java:523)
jenkins-agent-5j324 jnlp 	at hudson.remoting.Engine.run(Engine.java:474)
jenkins-agent-5j324 jnlp Caused by: java.net.ConnectException: Connection refused (Connection refused)
jenkins-agent-5j324 jnlp 	at java.net.PlainSocketImpl.socketConnect(Native Method)
jenkins-agent-5j324 jnlp 	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
jenkins-agent-5j324 jnlp 	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
jenkins-agent-5j324 jnlp 	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
jenkins-agent-5j324 jnlp 	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
jenkins-agent-5j324 jnlp 	at java.net.Socket.connect(Socket.java:589)
jenkins-agent-5j324 jnlp 	at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
jenkins-agent-5j324 jnlp 	at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
jenkins-agent-5j324 jnlp 	at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
jenkins-agent-5j324 jnlp 	at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
jenkins-agent-5j324 jnlp 	at sun.net.www.http.HttpClient.New(HttpClient.java:339)
jenkins-agent-5j324 jnlp 	at sun.net.www.http.HttpClient.New(HttpClient.java:357)
jenkins-agent-5j324 jnlp 	at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1220)
jenkins-agent-5j324 jnlp 	at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1156)
jenkins-agent-5j324 jnlp 	at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1050)
jenkins-agent-5j324 jnlp 	at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:984)
jenkins-agent-5j324 jnlp 	at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:193)
jenkins-agent-5j324 jnlp 	... 2 more
jenkins-agent-5j324 jnlp

I have created a test pod within the same master/agent namespace and no connectivity issue seems to exist:

/ # dig +short jenkins-inception.jenkins.svc.cluster.local
10.14.203.189
/ # nc -zv -w 3 jenkins-inception.jenkins.svc.cluster.local 8080
jenkins-inception.jenkins.svc.cluster.local (10.14.203.189:8080) open
/ # curl http://jenkins-inception.jenkins.svc.cluster.local:8080/jenkins/tcpSlaveAgentListener/


  Jenkins

Environment:

  • cloud provider: GCP
  • master tag: lts
  • agent tag: 3.27-1 and 3.40-1
  • helm version:
Client: &version.Version{SemVer:"v2.9.1", GitCommit:"20adb27c7c5868466912eebdf6664e7390ebe710", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.11.0", GitCommit:"2e55dbe1fdb5fdb96b75ff144a339489417b146b", GitTreeState:"clean"}
  • kubernetes version:
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.3", GitCommit:"5e53fd6bc17c0dec8434817e69b04a25d8ae0ff0", GitTreeState:"clean", BuildDate:"2019-06-06T01:44:30Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.11-gke.14", GitCommit:"56d89863d1033f9668ddd6e1c1aea81cd846ef88", GitTreeState:"clean", BuildDate:"2019-11-07T19:12:22Z", GoVersion:"go1.12.11b4", Compiler:"gc", Platform:"linux/amd64"}
  • istio version: 1.4.0
@timmyers

timmyers commented Feb 28, 2020

I believe this is happening because the envoy proxy is taking some time to set things up and the jnlp container tries to make a connection while this is still happening. I have had similar issues with recent versions of istio. Unfortunately I don't have a fix yet.

One solution would be for jnlp-slave to retry this connection instead of giving up on the first failure.
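The retry suggested here can be sketched in plain shell. This is a hypothetical wrapper, not part of the official image; the probe command, attempt count, and delay are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: keep probing the controller before launching the agent instead of
# failing on the first refused connection. All values here are examples.
retry() {
  max=$1; shift
  attempt=0
  until "$@"; do
    attempt=$((attempt + 1))
    # Give up after $max failed attempts
    [ "$attempt" -ge "$max" ] && return 1
    sleep 1
  done
}

# A wrapper entrypoint might then do something like:
# retry 30 curl -sSf "$JENKINS_URL/tcpSlaveAgentListener/" >/dev/null && exec jenkins-agent
```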

@pkaramol
Author

I can also confirm that this occurs on a GKE cluster using istio 1.4.0 but NOT on another one using an older version of istio, e.g. 1.1.15

@aspring

aspring commented Apr 12, 2020

Following up on @timmyers' comment: this is exactly what I was observing. I built a custom jnlp image that leverages wait-for-it to make sure the pod can connect to Jenkins before launching jenkins-agent. This solved the connectivity issue; from my testing, it's about a 3-second delay on our cluster before the connection becomes available.
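The wait-for-it idea can be approximated in plain POSIX shell as a timeout-based gate. This is a sketch only; the nc probe and 60-second timeout mirror the connectivity test earlier in the thread and are assumptions, not the actual custom image described above.

```shell
#!/bin/sh
# Sketch: poll a probe command until it succeeds or a timeout (in seconds)
# elapses, like wait-for-it does for a TCP endpoint.
wait_for() {
  timeout=$1; shift
  waited=0
  while ! "$@" 2>/dev/null; do
    waited=$((waited + 1))
    # Fail once the timeout budget is spent
    [ "$waited" -ge "$timeout" ] && return 1
    sleep 1
  done
}

# A custom entrypoint could then gate the agent launch:
# wait_for 60 nc -z jenkins-inception.jenkins.svc.cluster.local 8080 && exec jenkins-agent
```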

@slide slide changed the title Slave instance fails to connect to master despite port being open Agent instance fails to connect to master despite port being open Jun 8, 2020
@abhishekkarigar

Guys, I am facing this issue when running the Jenkins service (a Windows service on 127.0.0.1:8080) outside the Minikube cluster.

@yogesh9391

@aspring could you please share the details of how you made the custom image and how you added the wait?

@deepan10

Guys, I am facing this issue when running the Jenkins service (a Windows service on 127.0.0.1:8080) outside the Minikube cluster.

If your slave is outside the cluster, then you have to use a NodePort to expose the master's service.
After that, you can connect the slave outside the cluster to the master inside the cluster.
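For reference, a minimal NodePort Service for the controller could look like the sketch below. All names, the selector, ports, and the namespace are placeholders for illustration, not values taken from this thread.

```yaml
# Hypothetical NodePort Service exposing the controller's HTTP UI and the
# inbound (JNLP) agent port outside the cluster.
apiVersion: v1
kind: Service
metadata:
  name: jenkins-nodeport
  namespace: jenkins
spec:
  type: NodePort
  selector:
    app: jenkins
  ports:
  - name: http
    port: 8080
    targetPort: 8080
    nodePort: 30080
  - name: agent-listener
    port: 50000
    targetPort: 50000
    nodePort: 30050
```

An external agent would then use `<node-ip>:30080` as its JENKINS_URL host and port 30050 for the TCP agent listener.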

@anthonyGuo

anthonyGuo commented Nov 9, 2020

I'm facing this issue too! Is there any idea other than modifying the jnlp image?
istio: 1.6.8
jnlp: 4.3-4

I tried to modify the ConfigMap for jenkins-agent and add "sleep 10; jenkins-agent" to the command, but it did not work:
<command>sh -c "sleep 10; jenkins-agent"</command>

logs:

SEVERE: Failed to connect to https://xxxx-jenkins.xxxx.svc:8080/jenkins/tcpSlaveAgentListener/: Connection refused (Connection refused)
java.io.IOException: Failed to connect to https://xxxx-jenkins.xxxx.svc:8080/jenkins/tcpSlaveAgentListener/: Connection refused (Connection refused)
at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:217)

@839928622

839928622 commented Jun 1, 2021

Facing the same issue on istio 1.2.0. If you run Jenkins and the Jenkins slave on plain Kubernetes, everything works fine.

root@ubuntu:~# kubectl logs po/jenkins-slave-jrz8f -n jenkins

Warning: JnlpProtocol3 is disabled by default, use JNLP_PROTOCOL_OPTS to alter the behavior
Jun 01, 2021 1:00:07 PM hudson.remoting.jnlp.Main createEngine
INFO: Setting up agent: jenkins-slave-jrz8f
Jun 01, 2021 1:00:07 PM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Jun 01, 2021 1:00:07 PM hudson.remoting.Engine startEngine
INFO: Using Remoting version: 3.20
Jun 01, 2021 1:00:07 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
INFO: Using /home/jenkins/agent/remoting as a remoting work directory
Both error and output logs will be printed to /home/jenkins/agent/remoting
Jun 01, 2021 1:00:07 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://jenkins.jenkins.svc.cluster.local/]
Jun 01, 2021 1:00:07 PM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: Failed to connect to http://jenkins.jenkins.svc.cluster.local/tcpSlaveAgentListener/: Connection refused (Connection refused)
java.io.IOException: Failed to connect to http://jenkins.jenkins.svc.cluster.local/tcpSlaveAgentListener/: Connection refused (Connection refused)
	at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:192)
	at hudson.remoting.Engine.innerRun(Engine.java:518)
	at hudson.remoting.Engine.run(Engine.java:469)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:607)
	at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
	at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
	at sun.net.www.http.HttpClient.New(HttpClient.java:339)
	at sun.net.www.http.HttpClient.New(HttpClient.java:357)
	at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1226)
	at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1162)
	at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1056)
	at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:990)
	at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:189)
	... 2 more

@mb250315

I am getting the same issue

Oct 27, 2021 4:23:18 PM hudson.remoting.jnlp.Main createEngine
INFO: Setting up agent: agent-pkznn
Oct 27, 2021 4:23:18 PM hudson.remoting.jnlp.Main$CuiListener
INFO: Jenkins agent is running in headless mode.
Oct 27, 2021 4:23:18 PM hudson.remoting.Engine startEngine
INFO: Using Remoting version: 4.11
Oct 27, 2021 4:23:18 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
INFO: Using /home/jenkins/agent/remoting as a remoting work directory
Oct 27, 2021 4:23:18 PM org.jenkinsci.remoting.engine.WorkDirManager setupLogging
INFO: Both error and output logs will be printed to /home/jenkins/agent/remoting
Oct 27, 2021 4:23:18 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://fin-orchestration-jenkins-service.fssre.svc.cluster.local:8080/]
Oct 27, 2021 4:23:18 PM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: Failed to connect to http://fin-orchestration-jenkins-service.fssre.svc.cluster.local:8080/tcpSlaveAgentListener/: Connection refused (Connection refused)
java.io.IOException: Failed to connect to http://fin-orchestration-jenkins-service.fssre.svc.cluster.local:8080/tcpSlaveAgentListener/: Connection refused (Connection refused)
at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:214)
at hudson.remoting.Engine.innerRun(Engine.java:724)
at hudson.remoting.Engine.run(Engine.java:540)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
at java.base/java.net.AbstractPlainSocketImpl.connect(Unknown Source)
at java.base/java.net.Socket.connect(Unknown Source)
at java.base/sun.net.NetworkClient.doConnect(Unknown Source)
at java.base/sun.net.www.http.HttpClient.openServer(Unknown Source)
at java.base/sun.net.www.http.HttpClient.openServer(Unknown Source)
at java.base/sun.net.www.http.HttpClient.(Unknown Source)
at java.base/sun.net.www.http.HttpClient.New(Unknown Source)
at java.base/sun.net.www.http.HttpClient.New(Unknown Source)
at java.base/sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(Unknown Source)
at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect0(Unknown Source)
at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect(Unknown Source)
at java.base/sun.net.www.protocol.http.HttpURLConnection.connect(Unknown Source)
at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:211)
... 2 more

@mb250315

I can also confirm that this occurs on a GKE cluster using istio 1.4.0 but NOT on another one using an older version of istio, e.g. 1.1.15

I am getting it on istio 1.7.3 and GKE version 1.20.10-gke.301

@timja
Member

timja commented Oct 27, 2021

The recommendation appears to be to add a bit of a sleep / wait-for-it / a retry.

Happy for a fix in either this repo, or in say https://github.com/jenkinsci/remoting

cc @jeffret-b

@jmcastellote

jmcastellote commented Nov 18, 2021

Thanks @timja for the workaround. Indeed it worked for us by modifying the agent's entrypoint in the k8s pod template.
Adding the following to the Windows JNLP agent (jenkins/inbound-agent):

    command:
    - "powershell.exe"
    args:
    - "Start-Sleep"
    - "-s"
    - "5"
    - ";"
    - "powershell.exe"
    - "-f"
    - "C:/ProgramData/Jenkins/jenkins-agent.ps1"

And it all works fine again (well... 5s slower).

Just FYI, this started happening on a new EKS 1.21 cluster with mixed ARM and AMD instances, plus Windows nodes. It only happens on the Windows nodes, which have no kube-proxy and depend on VPC webhooks, so perhaps that would explain the istio-like network behaviour of the pod.

@hg13190

hg13190 commented Dec 12, 2021

(quoting @jmcastellote's Windows workaround above)

How to do this if I'm not using kubernetes? How to add sleep?

@timja
Member

timja commented Dec 12, 2021

@sasha-bachurin

Updating the pod template might help as well:

spec:
  containers:
  - name: jnlp
    image: jenkins/inbound-agent:4.3-4-jdk11
    command: ["/bin/sh","-c"]
    args: ["sleep 30; /usr/local/bin/jenkins-agent"]

@psimms-r7

(quoting @jmcastellote's Windows workaround above)

We are seeing similar issues, only for Windows nodes as well.
Could we add a readiness probe to the pod template, I wonder? If so, what would that look like?

@dduportal
Contributor

(quoting @jmcastellote's workaround and @psimms-r7's readiness-probe question above)

Hi @psimms-r7, as per https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/, readiness probes are not usable here:

The kubelet uses readiness probes to know when a container is ready to start accepting traffic.

=> The inbound agents connect to the Jenkins controller, not the other way around. Unless you meant a readiness probe for the Jenkins controller itself in Kubernetes? (If yes, then look at the Helm chart values: https://github.com/jenkinsci/helm-charts/blob/48f2acfaeec059de23d5b1065757ba8bb4621e0a/charts/jenkins/VALUES_SUMMARY.md#kubernetes-health-probes.)

=> You could use a startup probe, though (with a Kubernetes version supporting it): https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-startup-probes.
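A startup probe on the jnlp container could be sketched as below. The curl command, URL, image tag, and timings are illustrative assumptions; note that a startup probe gates the other probes and pod readiness, it does not delay the agent process itself from starting.

```yaml
# Hypothetical startupProbe on the jnlp container; URL and timings are examples.
spec:
  containers:
  - name: jnlp
    image: jenkins/inbound-agent:4.3-4-jdk11
    startupProbe:
      exec:
        command:
        - sh
        - -c
        - curl -sf http://jenkins.jenkins.svc.cluster.local:8080/tcpSlaveAgentListener/
      # Allow up to 12 x 5s = 60s for the listener to become reachable
      periodSeconds: 5
      failureThreshold: 12
```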

@psimms-r7

(quoting the exchange above about readiness vs. startup probes)

Apologies, you're right, something like a startup probe. Could we just curl the agent listener?

@psimms-r7

The error we are seeing is slightly different actually - UnknownHostException

Error

INFO: Locating server among [http://jenkins.jenkins.svc.cluster.local:8080/]
Oct 25, 2022 11:25:51 AM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: Failed to connect to http://jenkins.jenkins.svc.cluster.local:8080/tcpSlaveAgentListener/: jenkins.jenkins.svc.cluster.local
java.io.IOException: Failed to connect to http://jenkins.jenkins.svc.cluster.local:8080/tcpSlaveAgentListener/: jenkins.jenkins.svc.cluster.local
	at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:217)
	at hudson.remoting.Engine.innerRun(Engine.java:693)
	at hudson.remoting.Engine.run(Engine.java:518)
Caused by: java.net.UnknownHostException: jenkins.jenkins.svc.cluster.local
	at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:220)
	at java.base/java.net.Socket.connect(Socket.java:609)
	at java.base/sun.net.NetworkClient.doConnect(NetworkClient.java:177)
	at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:474)
	at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:569)
	at java.base/sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
	at java.base/sun.net.www.http.HttpClient.New(HttpClient.java:341)
	at java.base/sun.net.www.http.HttpClient.New(HttpClient.java:362)
	at java.base/sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1253)
	at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1187)
	at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1081)
	at java.base/sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:1015)
	at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:214)
	... 2 more

I am experimenting with our custom inbound agent image, tweaking the jenkins-agent.ps1 script with the code below wrapped around the Start-Process call. This appears to improve things. Note: I rarely use PowerShell, so I am sure this can be much improved, but would something like this make sense to be merged to master?

    $attempt = 6
    $success = $false
    while ($attempt -gt 0 -and -not $success) {
        try {
            $Response = Invoke-WebRequest -UseBasicParsing -Uri "$env:JENKINS_URL/tcpSlaveAgentListener"
            if ($?) {
                Write-Host "AgentListener active"
                Start-Process -FilePath $JAVA_BIN -Wait -NoNewWindow -ArgumentList $AgentArguments
                # Mark success so the loop exits once the agent process has run
                $success = $true
            }
            else {
                Write-Host "AgentListener failed"
                # Count this as a failed attempt too, so the loop cannot spin forever
                $attempt--
                Start-Sleep -s 10
            }
        }
        catch {
            $attempt--
            Start-Sleep -s 10
            Write-Host "Failed"
            Write-Host $_
        }
    }

@dduportal
Contributor

Apologies, you're right, something like a startup probe - could we just do a curl on the agent listener?

I never played around with startup probes, but it looks like the right way to achieve this. Your idea looks really good: a startup probe that curls the Jenkins controller listener.
Alternatively, an initContainer added to the pod.
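The initContainer alternative could look like the sketch below. The image, URL, and timing are placeholder assumptions; also note that with an Istio sidecar, init containers run before the sidecar is up, so this variant fits the non-mesh cases discussed in this thread.

```yaml
# Hypothetical initContainer that blocks pod startup until the controller's
# agent listener answers.
spec:
  initContainers:
  - name: wait-for-jenkins
    image: curlimages/curl
    command:
    - sh
    - -c
    # Retry the listener endpoint every 2s until it responds successfully
    - until curl -sf http://jenkins.jenkins.svc.cluster.local:8080/tcpSlaveAgentListener/; do sleep 2; done
  containers:
  - name: jnlp
    image: jenkins/inbound-agent:4.3-4-jdk11
```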

@dduportal
Contributor

(quoting @psimms-r7's UnknownHostException report and PowerShell retry snippet above)

The error comes from DNS resolution in your case. The UnknownHostException is pretty clear: it is NOT related to the image itself or your PowerShell code.

  • It could be worth checking DNS resolution from an interactive shell in your Windows Jenkins agent pod: can it resolve an external domain such as google.com?
  • Can you confirm that your Jenkins controller is running behind a Service named jenkins in the namespace jenkins?
  • If you have a Linux pod, can you try a Linux Jenkins agent with the same JENKINS_URL to see if it works?

=> It reminds me of microsoft/Windows-Containers#61 (if it helps)

@abcdefstar

abcdefstar commented Dec 13, 2022

(quoting @psimms-r7's UnknownHostException report and PowerShell retry snippet above)

Hi @psimms-r7, were you able to solve this issue? I'm facing the exact same issue in my cluster as well.

@psimms-r7

psimms-r7 commented Dec 13, 2022

Hi @psimms-r7 , Were you able to solve this issue? Im facing the exact same issue in my cluster as well

@abcdefstar

I put that snippet of code into the jenkins-agent.ps1 script and bundled it into our custom jnlp image, overwriting the original. That seems to make it more reliable; I haven't seen the issue since.

@abcdefstar

(quoting @psimms-r7's reply above)

Thank you so much! Let me give it a try.

@abcdefstar

abcdefstar commented Jun 21, 2023

@jawadqur The issue is mostly seen on Windows nodes. The initContainer option works for Linux.

@RiyazM3

RiyazM3 commented Nov 16, 2023

This issue occurs on an OKE cluster (v1.25.12) using istio 1.15.1. To get it to work, I had to disable istio injection for the agent namespace.

lemeurherve referenced this issue in lemeurherve/jenkinsci-docker-inbound-agent Nov 19, 2023
Fix more issues with password expiry
@lemeurherve lemeurherve transferred this issue from jenkinsci/docker-inbound-agent Jan 16, 2024
@felipecrs
Contributor

Should a retry mechanism be implemented on top of the agent.jar? In the entrypoint?

I have found references suggesting that the agent connection should be retried in case of failures:

https://issues.jenkins.io/browse/JENKINS-49956?focusedId=331180&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-331180

So, I wonder why docker-agent doesn't do it.

@felipecrs
Contributor

Answering my own question:

  1. Probably because agent.jar itself is responsible for reconnection, which is why it has a -noReconnect parameter to disable that mechanism.
  2. The reference I cited is very old; perhaps back then agent.jar was not prepared for reconnecting.
