[WIP] Enable k8s testing. #347

Open - wants to merge 48 commits into base: dev

Conversation

@jmchilton (Collaborator)

This is taking forever but it will be wonderful when it works :).

@bgruening (Owner)

We have a small problem here. It seems that kompose does not like the 2.1 docker-compose format. But I need to specify 2.1 in order to get the ENV vars, afaik. I have already started to convert everything to the compose 3 format, which is (as far as I have read) the preferred format now -> #333

I have compiled kubernetes/kompose#600 and it converts my v3 files with a few warnings - what do you think? Go all in with v3, or spend time on hacky ways to get v2 working?
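
For reference, the conversion step itself is just one command; a minimal sketch, assuming a kompose binary built from that PR and the compose file in the current directory (the output directory name is made up):

```bash
# Convert the v3 compose file into Kubernetes manifests with the locally built kompose.
./kompose convert -f docker-compose.yml -o k8s/
```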

@bgruening (Owner)

In #348 I ported the compose file to v3 and I'm at least able to convert it with kompose using this PR: kubernetes/kompose#600

However I get these warnings:

WARN Volume mount on the host "/export/" isn't supported - ignoring path on the host 
WARN Volume mount on the host "/var/run/docker.sock" isn't supported - ignoring path on the host 
WARN Volume mount on the host "/export/" isn't supported - ignoring path on the host 
WARN Volume mount on the host "/export/" isn't supported - ignoring path on the host 
WARN Volume mount on the host "/var/run/docker.sock" isn't supported - ignoring path on the host 
WARN Volume mount on the host "/export/" isn't supported - ignoring path on the host 
WARN Volume mount on the host "/var/run/docker.sock" isn't supported - ignoring path on the host 
WARN Volume mount on the host "/export/" isn't supported - ignoring path on the host 
WARN Volume mount on the host "/export/postgres/" isn't supported - ignoring path on the host 
WARN Volume mount on the host "/export/" isn't supported - ignoring path on the host 
WARN Volume mount on the host "/export/" isn't supported - ignoring path on the host 
WARN Volume mount on the host "/export/rabbitmq" isn't supported - ignoring path on the host 

@jmchilton (Collaborator, Author)

I think I got those warnings too - for my kube setup it wouldn't mount the local directories, and that made sense because the volumes were living on the VM running the Docker host, so Kubernetes wouldn't even have access to them. But it shared them across the cluster correctly despite the warning, so I think that is fine. Local directories shared out like that are good for local and development machines, but if you are going to use Kubernetes you need to set up some mounts and such ahead of time - that makes sense to me, at least for now.

@bgruening (Owner)

@jmchilton maybe you can deactivate some tests from the build matrix for the time being - that will hopefully be faster.

@bgruening (Owner)

We have a real error message :)
FATA Error while deploying application: Deployment.apps "galaxy" is invalid: spec.template.spec.containers[0].securityContext.privileged: Forbidden: disallowed by cluster policy
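
One direction would be to allow privileged containers on the test cluster (the exact flags depend on how the cluster is set up); the other is to drop the flag from the generated manifest before creating it. A rough sketch of the latter, assuming the Galaxy container actually runs unprivileged and that kompose wrote the usual galaxy-deployment.yaml (the file name is a guess):

```bash
# Delete the privileged securityContext line from the converted Deployment and
# create it explicitly; only viable if the Galaxy container works unprivileged.
sed -i '/privileged: true/d' galaxy-deployment.yaml
kubectl create -f galaxy-deployment.yaml
```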

@jmchilton (Collaborator, Author)

😄 I don't even know why I try @bgruening - you are so much better at this than me. I'll look into the cluster policy.

@bgruening (Owner)

I'm walking in the dark just as you probably are :)
But this error does not tell me anything :(

@bgruening (Owner)

I found this:

logging error output:

[+]ping ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[-]poststarthook/bootstrap-controller failed: reason withheld
[+]poststarthook/extensions/third-party-resources ok
[-]poststarthook/ca-registration failed: reason withheld
[+]poststarthook/start-kube-apiserver-informers ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[-]autoregister-completion failed: reason withheld
healthz check failed

@bgruening (Owner)

@jmchilton I think we made a huge step forward: https://travis-ci.org/bgruening/docker-galaxy-stable/builds/244074377

galaxy-2032667927-cnpsb                         1/1       Running   0          2m
galaxy-htcondor-1739486422-tr9p2                1/1       Running   0          2m
galaxy-htcondor-executor-2366528963-1sjvd       1/1       Running   0          2m
galaxy-htcondor-executor-big-4172325638-tnw76   1/1       Running   0          2m
galaxy-init-2851784748-bdr6t                    1/1       Running   0          2m
galaxy-postgres-3393713554-x1lbl                1/1       Running   0          2m
galaxy-proftpd-391410106-3t3k3                  1/1       Running   0          2m
galaxy-slurm-3416039136-s54fr                   1/1       Running   0          2m
pgadmin4-2349333086-8cgbk                       1/1       Running   0          2m
rabbitmq-1890260688-43jh4                       1/1       Running   0          2m

@bgruening (Owner)

I think the following points are open:

  • create the persistent volume (PV) - see the sketch after this list
  • start the containers in the correct order
  • find the correct URL to run bioblend tests against it
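
For the PV point, a minimal sketch assuming a single-node, minikube-style cluster where a hostPath volume is acceptable; the name and size are made up:

```bash
# Create a hostPath-backed PersistentVolume for /export so the PVCs generated
# by kompose have something to bind to; values are purely illustrative.
kubectl create -f - << 'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: export-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /export
EOF
```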

@jmchilton (Collaborator, Author)

Holy crap - amazing work @bgruening! I'll see if I can catch up.

@jmchilton (Collaborator, Author)

I guess the way the volumes are set up right now it is never going to work - I think we need to switch to a global volumes definition in the compose file, right?

Three persistent volume definitions seem to be needed - export, rabbitmq, and postgres. I'm also dropping the Docker host mounting stuff - I don't think that will fly in K8S (I can throw it back in later if I'm wrong).
Seems to randomly fail - or is that just me?
If multiple services re-use the same global definition for a volume, they all generate the same file during conversion (not a problem per se), but during up they all attempt to create the PVC :(. I'm adding a script that works around this by manually creating all the services, deployments, PVCs, etc. with kubectl create -f (post convert) - see the sketch after the error output below.
Feel free to replace this with something less hacky...
```
error: error validating "export-persistentvolumeclaim.yaml": error validating data: open /home/travis/.kube/schema/v0.0.0-master+a57c33bd28173/api/v1/schema.json: permission denied; if you choose to ignore these errors, turn validation off with --validate=false
```
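
A rough sketch of what that post-convert script might look like, assuming the kompose output sits in a k8s/ directory (the directory name, the file-name globs, and the --validate=false workaround for the schema-cache permission error above are assumptions):

```bash
#!/bin/bash
# Create the converted resources one by one instead of using `kompose up`, so the
# shared PVC file is only applied once and creation order stays under our control.
# --validate=false works around the unreadable schema cache seen on Travis above.
for manifest in k8s/*-persistentvolumeclaim.yaml k8s/*-service.yaml k8s/*-deployment.yaml; do
    kubectl create -f "$manifest" --validate=false || true   # tolerate already-created resources
done
```
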
@jmchilton (Collaborator, Author)

> start the containers in the correct order

Rather than trying to implement that, I think we should just make all containers wait on their dependencies - the way galaxy-web waits for the database, for instance. Kompose doesn't support depends_on, and I guess it wouldn't feel very Kubernetes-ish if it did. Since this all works for me regardless, I think we are pretty close.
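
As a sketch of that wait-on-dependency pattern - the host, port, and script name are made up, and this is not the actual galaxy-web startup logic:

```bash
#!/bin/bash
# wait-for-postgres.sh: block until the database answers, then exec the real command.
until nc -z galaxy-postgres 5432; do
    echo "waiting for galaxy-postgres:5432 ..."
    sleep 5
done
exec "$@"
```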

@jmchilton (Collaborator, Author)

jmchilton commented Jun 18, 2017

I have it running locally, so I have a few more open issues that would have taken forever to debug from Travis:

  • I could be wrong but it really seems like the infrastructure only starts up half the time.
  • Some of the images are published - minikube is auto-pulling these rather than using what is in the cache 😦. I'm not sure what to do about this - it only does this for the latest tag (docs at https://kubernetes.io/docs/concepts/configuration/overview/) - so maybe we should use a dev tag or something?
  • The .env files are not really respected, it seems to me - which I guess makes sense since we haven't specified which one to grab. As a result, the default Galaxy that comes up tries to use slurm, but it isn't ready.

Update:
Spent some more time on these last two problems. 813ec46 is a terrible hack that should force the test infrastructure to use the locally built containers. I spent more time trying to get a simple .env file working with kompose and it isn't working out. I'm not sure what to do about that at all - seems like a serious Kompose bug.

This is a terrible hack - feel free to revert and work around it a different way.
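
For the image-pull problem, a common alternative to the hack in 813ec46 would be to build straight into minikube's Docker daemon under a non-latest tag (a sketch; the build args from the Travis setup are omitted):

```bash
# Point docker at minikube's daemon, then build with a :dev tag so Kubernetes'
# default imagePullPolicy (IfNotPresent for non-latest tags) uses the local image.
eval $(minikube docker-env)
docker build -t quay.io/bgruening/galaxy-base:dev ./galaxy-base/
docker build -t quay.io/bgruening/galaxy-init:dev ./galaxy-init/
```
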
@bgruening (Owner)

Regarding your second point I found this: https://kubernetes.io/docs/concepts/configuration/overview/#container-images
and
https://kubernetes.io/docs/concepts/containers/images/#updating-images

I think it is safe to convert everything to :dev or :master.


docker build --build-arg ANSIBLE_REPO=$ANSIBLE_REPO --build-arg ANSIBLE_RELEASE=$ANSIBLE_RELEASE -t quay.io/bgruening/galaxy-base ./galaxy-base/
docker build --build-arg GALAXY_REPO=$GALAXY_REPO --build-arg GALAXY_RELEASE=$GALAXY_RELEASE -t quay.io/bgruening/galaxy-init ./galaxy-init/
: ${TAG_SUFFIX:=""}
@bgruening (Owner)
I'm fine with hard-coding everything to dev, but this also seems fine. Defaulting to dev probably makes sense so that people who build this locally without a special tag don't run into the same problems.

Another idea is to use docker tag to give these images multiple tags, one master and one latest or dev, but I think that is even more confusing.
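
For completeness, the docker tag variant would look roughly like this (image name taken from the build lines above; the extra tags are illustrative):

```bash
# Give the already-built image additional tags without rebuilding it.
docker tag quay.io/bgruening/galaxy-base quay.io/bgruening/galaxy-base:dev
docker tag quay.io/bgruening/galaxy-base quay.io/bgruening/galaxy-base:master
```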

@bgruening (Owner)

I tried to tackle the env substitution problem. First I filed an issue at kubernetes/kompose#650

Then I tried to use some kind of pre-processor. At first I tried envsubst, but it does not replace fancy stuff like ${foo:-bar}. Then I tried os.path.expandvars, but it has the same limitation. So I ended up with a small bash script that can do the trick until upstream is fixed:

#!/bin/bash
# Expand environment variables (including ${foo:-bar} style defaults) in $1
# and write the result to $2, by wrapping the file in a generated heredoc
# and letting a fresh bash perform the expansion.
( echo 'cat << EOF'; cat "$1"; echo 'EOF' ) | bash > "$2"

usage: bash cat.sh docker-compose.yml dest.yml. Too hacky?
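
A quick way to sanity-check the substitution (file names and the image line are made up; assumes TAG is unset in the environment):

```bash
printf 'image: quay.io/bgruening/galaxy-base:${TAG:-latest}\n' > in.yml
bash cat.sh in.yml out.yml
cat out.yml   # expected: image: quay.io/bgruening/galaxy-base:latest
```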

@bgruening (Owner)

The env var handling might be fixed upstream.
