[WIP] Enable k8s testing. #347

Open - wants to merge 48 commits into base: dev

Conversation

@jmchilton (Collaborator)

This is taking forever but it will be wonderful when it works :).

@bgruening (Owner)

We have a small problem here. It seems that kompose does not like the 2.1 docker-compose format. But I need to specify 2.1 in order to get the ENV vars, afaik. I have already started to convert everything to the compose 3 format, which is (as far as I have read) the preferred format now -> #333

I have compiled kubernetes/kompose#600 and it converts my v3 files with a few warnings - what do you think? Go all in with v3, or spend time on hacky ways to get v2 working?
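
For reference, the conversion step itself is just one command; a minimal sketch, assuming a kompose binary built from that PR and the compose file in the current directory (the output directory name is made up):

```bash
# Convert the v3 compose file into Kubernetes manifests with the locally built kompose.
./kompose convert -f docker-compose.yml -o k8s/
```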

@bgruening (Owner)

In #348 I ported the compose file to v3 and I'm at least able to convert it with kompose using this PR: kubernetes/kompose#600

However I get these warnings:

WARN Volume mount on the host "/export/" isn't supported - ignoring path on the host 
WARN Volume mount on the host "/var/run/docker.sock" isn't supported - ignoring path on the host 
WARN Volume mount on the host "/export/" isn't supported - ignoring path on the host 
WARN Volume mount on the host "/export/" isn't supported - ignoring path on the host 
WARN Volume mount on the host "/var/run/docker.sock" isn't supported - ignoring path on the host 
WARN Volume mount on the host "/export/" isn't supported - ignoring path on the host 
WARN Volume mount on the host "/var/run/docker.sock" isn't supported - ignoring path on the host 
WARN Volume mount on the host "/export/" isn't supported - ignoring path on the host 
WARN Volume mount on the host "/export/postgres/" isn't supported - ignoring path on the host 
WARN Volume mount on the host "/export/" isn't supported - ignoring path on the host 
WARN Volume mount on the host "/export/" isn't supported - ignoring path on the host 
WARN Volume mount on the host "/export/rabbitmq" isn't supported - ignoring path on the host 

@jmchilton (Collaborator, Author)

I think I got those warnings too - for my kube setup it wouldn't mount the local directories, and that made sense because the volumes were living on the VM running the Docker host, so Kubernetes wouldn't even have access to them. But it shared them across the cluster correctly despite the warning, so I think that is fine. Local directories shared out like that are good for local and development machines, but if you are going to use Kubernetes you need to set up some mounts and such ahead of time - that makes sense to me, at least for now.

@bgruening (Owner)

@jmchilton maybe you can deactivate some tests from the build matrix for the time being - that will hopefully be faster.

@bgruening (Owner)

We have a real error message :)
FATA Error while deploying application: Deployment.apps "galaxy" is invalid: spec.template.spec.containers[0].securityContext.privileged: Forbidden: disallowed by cluster policy
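
One direction would be to allow privileged containers on the test cluster (the exact flags depend on how the cluster is set up); the other is to drop the flag from the generated manifest before creating it. A rough sketch of the latter, assuming the Galaxy container actually runs unprivileged and that kompose wrote the usual galaxy-deployment.yaml (the file name is a guess):

```bash
# Delete the privileged securityContext line from the converted Deployment and
# create it explicitly; only viable if the Galaxy container works unprivileged.
sed -i '/privileged: true/d' galaxy-deployment.yaml
kubectl create -f galaxy-deployment.yaml
```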

@jmchilton (Collaborator, Author)

😄 I don't even know why I try @bgruening - you are so much better at this than me. I'll look into the cluster policy.

@bgruening (Owner)

I'm walking in the dark just as you probably are :)
But this error does not tell me anything :(

@bgruening (Owner)

I found this:

logging error output:

[+]ping ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[-]poststarthook/bootstrap-controller failed: reason withheld
[+]poststarthook/extensions/third-party-resources ok
[-]poststarthook/ca-registration failed: reason withheld
[+]poststarthook/start-kube-apiserver-informers ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[-]autoregister-completion failed: reason withheld
healthz check failed

@bgruening (Owner)

@jmchilton I think we made a huge step forward: https://travis-ci.org/bgruening/docker-galaxy-stable/builds/244074377

galaxy-2032667927-cnpsb                         1/1       Running   0          2m
galaxy-htcondor-1739486422-tr9p2                1/1       Running   0          2m
galaxy-htcondor-executor-2366528963-1sjvd       1/1       Running   0          2m
galaxy-htcondor-executor-big-4172325638-tnw76   1/1       Running   0          2m
galaxy-init-2851784748-bdr6t                    1/1       Running   0          2m
galaxy-postgres-3393713554-x1lbl                1/1       Running   0          2m
galaxy-proftpd-391410106-3t3k3                  1/1       Running   0          2m
galaxy-slurm-3416039136-s54fr                   1/1       Running   0          2m
pgadmin4-2349333086-8cgbk                       1/1       Running   0          2m
rabbitmq-1890260688-43jh4                       1/1       Running   0          2m

@bgruening (Owner)

I think the following points are open:

  • create the persistent volume (PV) - see the sketch after this list
  • start the containers in the correct order
  • find the correct URL to run bioblend tests against it
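
For the PV point, a minimal sketch assuming a single-node, minikube-style cluster where a hostPath volume is acceptable; the name and size are made up:

```bash
# Create a hostPath-backed PersistentVolume for /export so the PVCs generated
# by kompose have something to bind to; values are purely illustrative.
kubectl create -f - << 'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: export-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /export
EOF
```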

@jmchilton (Collaborator, Author)

Holy crap - amazing work @bgruening! I'll see if I can catch up.

@jmchilton (Collaborator, Author)

I guess the way the volumes are set up right now it is never going to work - I think we need to switch to a global volumes definition in the compose file, right?

Three persistent volume definitions seem to be needed - export, rabbitmq, and postgres. I'm also dropping the Docker host mounting stuff - I don't think that will fly in K8S (I can throw it back in later if I'm wrong).
Seems to randomly fail - or is that just me?
If multiple services re-use the same global definition for a volume, they all generate the same file during conversion (not a problem per se), but during up they all attempt to create the PVC :(. I'm adding a script that works around this by manually creating all the services, deployments, PVCs, etc. with kubectl create -f (post convert) - see the sketch after the error output below.
Feel free to replace this with something less hacky...
```
error: error validating "export-persistentvolumeclaim.yaml": error validating data: open /home/travis/.kube/schema/v0.0.0-master+a57c33bd28173/api/v1/schema.json: permission denied; if you choose to ignore these errors, turn validation off with --validate=false
```
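
A rough sketch of what that post-convert script might look like, assuming the kompose output sits in a k8s/ directory (the directory name, the file-name globs, and the --validate=false workaround for the schema-cache permission error above are assumptions):

```bash
#!/bin/bash
# Create the converted resources one by one instead of using `kompose up`, so the
# shared PVC file is only applied once and creation order stays under our control.
# --validate=false works around the unreadable schema cache seen on Travis above.
for manifest in k8s/*-persistentvolumeclaim.yaml k8s/*-service.yaml k8s/*-deployment.yaml; do
    kubectl create -f "$manifest" --validate=false || true   # tolerate already-created resources
done
```
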
@jmchilton (Collaborator, Author)

> start the containers in the correct order

Rather than trying to implement that, I think we should just make all containers wait on their dependencies - the way galaxy-web waits for the database, for instance. Kompose doesn't support depends_on, and I guess it wouldn't feel very Kubernetes-ish if it did. Since this all works for me regardless, I think we are pretty close.
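
As a sketch of that wait-on-dependency pattern - the host, port, and script name are made up, and this is not the actual galaxy-web startup logic:

```bash
#!/bin/bash
# wait-for-postgres.sh: block until the database answers, then exec the real command.
until nc -z galaxy-postgres 5432; do
    echo "waiting for galaxy-postgres:5432 ..."
    sleep 5
done
exec "$@"
```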

@jmchilton (Collaborator, Author)

jmchilton commented Jun 18, 2017

I have it running locally, so I have a few more open issues that would have taken forever to debug from Travis:

  • I could be wrong but it really seems like the infrastructure only starts up half the time.
  • Some of the images are published - minikube is auto-pulling these rather than using what is in the cache 😦. I'm not sure what to do about this - it only does this for the latest tag (docs at https://kubernetes.io/docs/concepts/configuration/overview/) - so maybe we should use a dev tag or something?
  • The .env files are not really respected, it seems to me - which I guess makes sense since we haven't specified which one to grab. As a result, the default Galaxy that comes up tries to use slurm, but it isn't ready.

Update:
Spent some more time on these last two problems. 813ec46 is a terrible hack that should force the test infrastructure to use the locally built containers. I spent more time trying to get a simple .env file working with kompose and it isn't working out. I'm not sure what to do about that at all - seems like a serious Kompose bug.

This is a terrible hack - feel free to revert and work around it a different way.
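
For the image-pull problem, a common alternative to the hack in 813ec46 would be to build straight into minikube's Docker daemon under a non-latest tag (a sketch; the build args from the Travis setup are omitted):

```bash
# Point docker at minikube's daemon, then build with a :dev tag so Kubernetes'
# default imagePullPolicy (IfNotPresent for non-latest tags) uses the local image.
eval $(minikube docker-env)
docker build -t quay.io/bgruening/galaxy-base:dev ./galaxy-base/
docker build -t quay.io/bgruening/galaxy-init:dev ./galaxy-init/
```
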
@bgruening (Owner)

Regarding your second point I found this: https://kubernetes.io/docs/concepts/configuration/overview/#container-images
and
https://kubernetes.io/docs/concepts/containers/images/#updating-images

I think it is safe to convert everything to :dev or :master.


docker build --build-arg ANSIBLE_REPO=$ANSIBLE_REPO --build-arg ANSIBLE_RELEASE=$ANSIBLE_RELEASE -t quay.io/bgruening/galaxy-base ./galaxy-base/
docker build --build-arg GALAXY_REPO=$GALAXY_REPO --build-arg GALAXY_RELEASE=$GALAXY_RELEASE -t quay.io/bgruening/galaxy-init ./galaxy-init/
: ${TAG_SUFFIX:=""}
@bgruening (Owner)
I'm fine with hard-coding everything to dev, but this also seems fine. Defaulting to dev probably makes sense so that people who build this locally without a special tag don't run into the same problems.

Another idea is to use docker tag to give these images multiple tags, one master and one latest or dev, but I think that is even more confusing.
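
For completeness, the docker tag variant would look roughly like this (image name taken from the build lines above; the extra tags are illustrative):

```bash
# Give the already-built image additional tags without rebuilding it.
docker tag quay.io/bgruening/galaxy-base quay.io/bgruening/galaxy-base:dev
docker tag quay.io/bgruening/galaxy-base quay.io/bgruening/galaxy-base:master
```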

@bgruening (Owner)

I tried to tackle the env substitution problem. First I filed an issue at kubernetes/kompose#650

Then I tried to use some kind of pre-processor. At first I tried envsubst, but it does not replace fancy stuff like ${foo:-bar}. Then I tried os.path.expandvars, but it has the same limitation. So I ended up with a small bash script that can do the trick until upstream is fixed:

#!/bin/bash
# Expand environment variables (including ${foo:-bar} style defaults) in $1
# and write the result to $2, by wrapping the file in a generated heredoc
# and letting a fresh bash perform the expansion.
( echo 'cat << EOF'; cat "$1"; echo 'EOF' ) | bash > "$2"

usage: bash cat.sh docker-compose.yml dest.yml. Too hacky?
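
A quick way to sanity-check the substitution (file names and the image line are made up; assumes TAG is unset in the environment):

```bash
printf 'image: quay.io/bgruening/galaxy-base:${TAG:-latest}\n' > in.yml
bash cat.sh in.yml out.yml
cat out.yml   # expected: image: quay.io/bgruening/galaxy-base:latest
```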

@bgruening (Owner)

The env var handling might be fixed upstream.
