feat(ingress) Experimental Native Ingress #1243
Conversation
Codecov Report
```
@@           Coverage Diff            @@
##           master    #1243    +/-  ##
==========================================
- Coverage   87.26%   86.71%   -0.56%
==========================================
  Files          44       45       +1
  Lines        3887     3928      +41
  Branches      675      681       +6
==========================================
+ Hits         3392     3406      +14
- Misses        327      354      +27
  Partials      168      168
```

Continue to review full report at Codecov.
README.md (Outdated)

```diff
@@ -21,7 +21,7 @@ The Controller is the central API server for [Deis Workflow][workflow]. It is in
 * Configure an application
 * Create a new user

-# Development
+# Contributing
```
These clarifications to the docs are great, but should be in a different PR ideally since they're not related to ingress changes.
.gitignore (Outdated)

```diff
@@ -13,6 +13,7 @@ develop-eggs
 .installed.cfg
 lib
 lib64
+.idea*
```
As @mboersma has noted on other PRs, I would suggest this be a separate PR, just to keep it tractable.
```diff
@@ -0,0 +1,21 @@
+{{ if .Values.global.experimental_native_ingress }}
```
I'm almost certain that the "parent" chart's global configuration isn't visible to this chart. i.e. There's no "looking up." Remember also that each Workflow component's own chart is installable independent of the "parent" Workflow chart and all its sibling components. With that in mind, you can see why "looking up" isn't a thing.
I've made a similar comment on some other PRs related to this same feature.
It's clear to me that we're going to have to explore some more creative patterns for how, in the "parent" chart, one flag can be set that enables this feature and have that setting propagate down to dependent charts like the router chart and controller chart. This might be easier said than done, but we're going to have to find a way.
@krancour So I think your assumption isn't entirely accurate. I was able to test this under the following situation:

- Define `global.experimental_native_ingress` in the `values.yaml` for workflow
- Neglect to define `global.experimental_native_ingress` in the `values.yaml` for controller
- Nest the two charts together on my local filesystem in the same fashion they are hosted at charts.deis.com/workflow
```
kris-nova:workflow kris$ tree
.
├── Chart.yaml
├── charts
│   ├── builder
│   │   ├── Chart.yaml
│   │   ├── templates
│   │   │   ├── builder-controller-secret-key-auth.yaml
│   │   │   ├── builder-deployment.yaml
│   │   │   ├── builder-secret-ssh-private-keys.yaml
│   │   │   ├── builder-service-account.yaml
│   │   │   └── builder-service.yaml
│   │   └── values.yaml
│   ├── controller
│   │   ├── Chart.yaml
│   │   ├── templates
│   │   │   ├── controller-deployment.yaml
│   │   │   ├── controller-ingress-rules.yaml
│   │   │   ├── controller-secret-django-secret-key.yaml
│   │   │   ├── controller-service-account.yaml
│   │   │   ├── controller-service.yaml
│   │   │   └── deploy-hook-secret.yaml
│   │   └── values.yaml
```
And run a `helm install` locally:

```
helm install . --namespace deis --set global.experimental_native_ingress=true,controller.platform_domain=fabulous.af
```
And a quick check will demonstrate that the flag was in fact inherited from the parent (see below, where the controller was creating ingresses, thus respecting the value):
```
kris-nova:controller kris$ kubectl get ing --all-namespaces
NAMESPACE         NAME                            HOSTS                         ADDRESS          PORTS     AGE
deis              controller-api-server-ingress   deis.fabulous.af              35.186.230.119   80        4d
fabled-lungfish   fabled-lungfish                 fabled-lungfish.fabulous.af                    80        17s
```
I think we should add the directive to both charts (which is already in the latest commit) so that they can be installed independently of each other. But even after a second test with the directive defined in both charts, I was able to demonstrate that the value was passed down, and that the parent chart's value took precedence over the child chart's.

> I'm almost certain that the "parent" chart's global configuration isn't visible to this chart

A quick demonstration was able to debunk this assumption, which is why I always prefer to test things before making assumptions.
Helm version:

```
kris-nova:workflow kris$ helm version
Client: &version.Version{SemVer:"v2.1.3", GitCommit:"5cbc48fb305ca4bf68c26eb8d2a7eb363227e973", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.1.3", GitCommit:"5cbc48fb305ca4bf68c26eb8d2a7eb363227e973", GitTreeState:"clean"}
```
Indeed, global variables are documented here:
https://github.com/kubernetes/helm/blob/master/docs/charts.md#global-values

In short, with normal variables child charts see variables set by the parent chart, but without the namespace. So for example `.Values.foo.bar` is seen as `.Values.bar` from the `foo` subchart. The `global` namespace is special because it's not stripped when passing variables to child subcharts. Also:

> If a subchart declares a global variable, that global will be passed downward (to the subchart's subcharts), but not upward to the parent chart. There is no way for a subchart to influence the values of the parent chart.
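To make the propagation concrete, here is a minimal sketch (chart layout and values are illustrative, not the actual Workflow charts) of how a global set in the parent chart's `values.yaml` becomes visible to a subchart template:

```yaml
# parent/values.yaml (hypothetical parent chart)
global:
  experimental_native_ingress: true

# parent/charts/child/templates/example.yaml (hypothetical subchart template)
# The child can read the global directly; a plain parent value such as
# .Values.child.foo would instead appear to the child as .Values.foo.
{{ if .Values.global.experimental_native_ingress }}
# ...ingress resources rendered here...
{{ end }}
```

This matches the test above: the child never declares the value itself, yet `helm install --set global.experimental_native_ingress=true` makes it render.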
`flake8` linting is currently failing in Travis CI with this:

```
$ make test
cd rootfs && flake8 --show-source
./api/models/app.py:36:1: E302 expected 2 blank lines, found 1
def is_experimental_native_ingress():
^
./api/models/app.py:42:1: E302 expected 2 blank lines, found 1
def get_experimental_native_ingress_hostname():
^
./api/models/app.py:241:100: E501 line too long (114 > 99 characters)
self._scheduler.ingress.create(ingress, namespace, get_experimental_native_ingress_hostname())
^
./api/models/app.py:684:100: E501 line too long (121 > 99 characters)
if (('response' in locals() and response.status_code == 404) or failed) and not is_experimental_native_ingress():
^
make: *** [test-style] Error 1
```
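For reference, a sketch of how the flagged lines could be made flake8-clean (E302 wants two blank lines between top-level definitions; E501 wants lines under 100 characters). `ServiceUnavailable` is stubbed here for illustration, and the wrapped call mirrors the flagged line rather than the full `App` method:

```python
import os


class ServiceUnavailable(Exception):
    """Stand-in for the DRF exception the real code raises."""


# E302 fix: two blank lines before each top-level definition.
def is_experimental_native_ingress():
    return bool(os.getenv('EXPERIMENTAL_NATIVE_INGRESS'))


def get_experimental_native_ingress_hostname():
    host = os.getenv('EXPERIMENTAL_NATIVE_INGRESS_HOSTNAME')
    if not host:
        raise ServiceUnavailable("Unable to find hostname")
    return host


# E501 fix: break the over-long scheduler call across lines.
def create_ingress(scheduler, ingress, namespace):
    scheduler.ingress.create(
        ingress, namespace, get_experimental_native_ingress_hostname())
```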
README.md (Outdated)

```bash
$ run -p 127.0.0.1:8000:8000 -it <image_id> sudo -E -u deis gunicorn -c /app/deis/gunicorn/config.py api.wsgi
```
Should additions such as this one be in a separate PR?
README.md (Outdated)

````diff
@@ -98,6 +106,8 @@ export IMAGE_PREFIX=arschles
 make deploy
 ```

+The `make deploy` target is sensitive to deltas in version control. In order for you to test local code, you will first need to `git add` and `git commit` the code.
````
Same question applies re: whether this should be in a separate PR. However... I have a solution for this (which I recently applied to router's `make deploy`) that resolves this adequately such that it doesn't even need to be documented. Let's chat about it later.
```diff
@@ -24,7 +24,7 @@ spec:
       serviceAccount: deis-controller
       containers:
         - name: deis-controller
-          image: quay.io/{{.Values.org}}/controller:{{.Values.docker_tag}}
+          image: quay.io/knova/controller-dev:canary
```
I assume this change was meant to be undone?
```yaml
- name: "EXPERIMENTAL_NATIVE_INGRESS"
  value: "{{ .Values.global.experimental_native_ingress }}"
- name: "EXPERIMENTAL_NATIVE_INGRESS_HOSTNAME"
  value: "{{ .Values.global.hostname }}"
```
I think that upstream in the Workflow chart, you changed this to `controller.platform_domain`, so here, it should just be `{{ .Values.platform_domain }}`.
```yaml
backend:
  serviceName: deis-controller
  servicePort: 2222
{{- end }}
```
Does this ingress rule end up having the intended effect? afaik, the builder speaks ssh, meaning the traffic flowing to it is raw TCP... whereas ingress only works for HTTP/S. 😢 (How I wish that weren't so.)

I think that when experimental native ingress is enabled, we have to make builder's service one of `type: LoadBalancer` to compensate for the fact that deis-router won't be around handling the non-HTTP builder traffic as a special case.
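To make that concrete, here is a rough sketch of what such a builder service might look like. All field values here are illustrative assumptions, not the actual deis/builder chart:

```yaml
# Hypothetical sketch: expose builder's SSH port directly via a cloud
# load balancer, since ingress only routes HTTP/S and git-push traffic
# is raw TCP on 2222.
apiVersion: v1
kind: Service
metadata:
  name: deis-builder
  namespace: deis
spec:
  type: LoadBalancer
  selector:
    app: deis-builder
  ports:
  - name: ssh
    port: 2222
    targetPort: 2222
```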
I took the rule from here: https://github.com/deis/router/blob/756f31e19b9e2c2d31a002e7d457c5120f65cf06/charts/router/templates/router-service.yaml#L24
Is that not correct?
> we have to make builder's service one of `type: LoadBalancer`

I think so too. This rule won't work since it needs raw TCP and ingress doesn't understand that (and my testing shows it's not routing to builder, although the controller API does work). So another PR in deis/builder is probably required for everything to work.
The `LoadBalancer` in deis-builder does seem to do the trick, so this ingress rule should be removed.
```yaml
backend:
  serviceName: deis-controller
  servicePort: 9090
{{- end }}
```
Not sure I understand this rule. afaik, the only thing that (optionally) uses 9090 is the router.
(TMI: Router is set up to guarantee PROXY protocol is never used on 9090, regardless of whether it's used on 80 and 443. This is to allow load balancers to perform HTTP-based healthchecks. Healthchecks, in an ELB at least, don't support PROXY protocol.)
Back to the matter at hand... I could be wrong, but I am pretty sure controller's service doesn't listen on 9090 at all.
> I am pretty sure controller's service doesn't listen on 9090 at all.

@krancour is correct AFAIK.
I took the rule from here https://github.com/deis/router/blob/756f31e19b9e2c2d31a002e7d457c5120f65cf06/charts/router/templates/router-service.yaml#L27
Is that not correct?
It is correct that deis-router listens on 9090 for optional healthchecks, but I think for the experimental ingress feature we can just drop that behavior. @krancour agreed?
@kris-nova let's drop this ingress rule too, especially since we're planning on just getting the core (experimental) functionality out there and then filling in the gaps we find.
charts/controller/values.yaml (Outdated)

```yaml
# Experimental feature to toggle using kubernetes ingress instead of the Deis router.
#
# Valid values are:
# - true: The Deis router will NOT be deployed. Inherently workflow will not be usable until a Kubernetes ingress controller is installed.
```
The controller chart would never install the router anyway. Maybe this sentence can be reworded to articulate controller's behavior differences when this is set-- i.e. it will create ingress resources for each app in addition to the usual "routable services."
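One possible rewording along those lines (suggested phrasing only, not final copy):

```yaml
# Experimental feature to toggle using Kubernetes ingress instead of the Deis router.
#
# Valid values are:
# - true: the controller creates an ingress resource for each app, in addition
#   to the usual routable services. Workflow will not be usable until a
#   Kubernetes ingress controller is installed in the cluster.
# - false: the controller relies on the Deis router for traffic (the default).
experimental_native_ingress: false
```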
rootfs/api/models/app.py (Outdated)

```diff
@@ -240,6 +263,7 @@ def delete(self, *args, **kwargs):

         try:
             self._scheduler.ns.delete(self.id)
+            self._scheduler.ingress.delete(self.id, self.id)
```
Since the line above deletes the application's namespace, is this line redundant?
rootfs/api/models/app.py (Outdated)

```diff
@@ -657,7 +681,7 @@ def verify_application_health(self, **kwargs):
             time.sleep(1)

         # Endpoint did not report healthy in time
-        if ('response' in locals() and response.status_code == 404) or failed:
+        if (('response' in locals() and response.status_code == 404) or failed) and not is_experimental_native_ingress():
```
Ha. This is funny. This would wait for the app to become available, right? But if experimental ingress is enabled, the possibility exists that no ingress controller is installed... if so, apps might never respond...
@mboersma should make the call on this, but I think it might be better to leave this the way it was and instead make the documentation crystal clear that if you want to BYO ingress controller, you need to have that installed before you can rightfully expect Workflow to start working.
I'm just worried about what happens if we totally give up this idea of waiting for the app to be up and responding.
> make the documentation crystal clear that if you want to BYO ingress controller, you need to have that installed before you can rightfully expect Workflow to start working.

I think that's a reasonable approach for an experimental feature. We can change it if this is tripping users up.
I think something that is missing here is that when custom/vanity domains are either added or removed from an application, the corresponding ingress resources also need to be added or removed.
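To illustrate the shape such a hook could take, here is a sketch using a stub in place of the real scheduler. `sync_domain` is a hypothetical helper and the per-domain ingress naming is an assumption; only the create/delete call signatures mirror the ones used elsewhere in this PR:

```python
class StubIngress:
    """Stand-in for self._scheduler.ingress; records calls for illustration."""
    def __init__(self):
        self.created = []
        self.deleted = []

    def create(self, name, namespace, hostname):
        self.created.append((name, namespace, hostname))

    def delete(self, name, namespace):
        self.deleted.append((name, namespace))


def sync_domain(ingress_api, app_id, domain, added):
    # Hypothetical hook: called when a custom/vanity domain is attached to or
    # detached from an app, keeping one ingress per domain in the app namespace.
    if added:
        ingress_api.create(domain, app_id, domain)
    else:
        ingress_api.delete(domain, app_id)


stub = StubIngress()
sync_domain(stub, 'fabled-lungfish', 'www.example.com', added=True)
sync_domain(stub, 'fabled-lungfish', 'www.example.com', added=False)
```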
I had an error when trying to install this according to the instructions in deis/workflow#732:

```
$ helm install ./workflow --namespace deis --set global.experimental_native_ingress=true,controller.platform_domain=192.168.99.100.nip.io
Error: render error in "workflow/charts/controller/templates/controller-deployment.yaml": template: workflow/charts/controller/templates/controller-deployment.yaml:65:32: executing "workflow/charts/controller/templates/controller-deployment.yaml" at <.Values.controller.p...>: can't evaluate field platform_domain in type interface {}
```

I had to remove `controller` from the EXPERIMENTAL_NATIVE_INGRESS_HOSTNAME value for things to install: `"{{ .Values.platform_domain }}"`
rootfs/api/models/app.py (Outdated)

```python
    """
    Will determine the value of the experimental_native_ingress flag
    """
    is_ingress = os.getenv('EXPERIMENTAL_NATIVE_INGRESS', False)
```
This function could be simplified:

```python
return bool(os.getenv('EXPERIMENTAL_NATIVE_INGRESS'))
```
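One caveat with that simplification: if the deployment sets the variable unconditionally, the chart renders it as the literal string "true" or "false", and `bool()` of any non-empty string (including `"false"`) is `True`. A stricter parse could look like this (the helper name is illustrative):

```python
import os


def env_flag(name):
    # Treat only an explicit "true" (any case) as enabled, so a rendered
    # "false" from the chart does not count as truthy.
    return os.getenv(name, '').strip().lower() == 'true'
```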
rootfs/api/models/app.py (Outdated)

```python
def get_experimental_native_ingress_hostname():
    """
    Will return the experimental native ingress platform_domain value, or an empty string
```
This docstring doesn't match the code. Instead maybe:
"Returns the experimental native ingress hostname or raises ServiceUnavailable if it's not set."
rootfs/api/models/app.py (Outdated)

```python
    """
    Will return the experimental native ingress platform_domain value, or an empty string
    """
    host = os.getenv('EXPERIMENTAL_NATIVE_INGRESS_HOSTNAME', "")
```
It would be a little simpler to test for truthy/falsey rather than an empty string explicitly IMHO:

```python
host = os.getenv('EXPERIMENTAL_NATIVE_INGRESS_HOSTNAME')
if not host:
    raise ServiceUnavailable("Unable to find hostname")
return host
```
I wasn't able to do a

Is that a requirement for the core feature here? Or can we try to get that in in a follow-up PR after the feature goes out? Can I steal some of your time to talk about this builder PR you think we need?
On the note of unit tests, here are my concerns after looking at our dev cycle:

```
Traceback (most recent call last):
  File "/app/manage.py", line 12, in <module>
    execute_from_command_line(sys.argv)
  File "/usr/local/lib/python3.5/dist-packages/django/core/management/__init__.py", line 367, in execute_from_command_line
    utility.execute()
  File "/usr/local/lib/python3.5/dist-packages/django/core/management/__init__.py", line 359, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/local/lib/python3.5/dist-packages/django/core/management/commands/test.py", line 29, in run_from_argv
    super(Command, self).run_from_argv(argv)
  File "/usr/local/lib/python3.5/dist-packages/django/core/management/base.py", line 286, in run_from_argv
    parser = self.create_parser(argv[0], argv[1])
  File "/usr/local/lib/python3.5/dist-packages/django/core/management/base.py", line 260, in create_parser
    self.add_arguments(parser)
  File "/usr/local/lib/python3.5/dist-packages/django/core/management/commands/test.py", line 56, in add_arguments
    test_runner_class = get_runner(settings, self.test_runner)
  File "/usr/local/lib/python3.5/dist-packages/django/test/utils.py", line 146, in get_runner
    test_module = __import__(test_module_name, {}, {}, force_str(test_path[-1]))
ImportError: No module named 'api.tests'
```

So my proposal here is that I go ahead and fix the concerns, but do it in another PR. Right now the Controller documentation is pretty vague, and requires the user to install a lot of noisy things on their workstation to run the tool. Frankly this seems wrong, and I am not too keen on running apps like this directly on my workstation. I would like to dockerize the app so we can run tests easily, but that seems like a different PR.

Option 1) Hold the experimental ingress things on the new test PR

Thoughts?
We actually just run all the unit tests locally since they just rely on postgres, but I can understand the point about it being a cliff to get set up and that it requires installing tools on local workstations. We've been pretty spoiled with Docker packaging all the things. 📦 For what it's worth, here's my local setup:

I'm probably missing a few things, but that should be all that's required to get unit tests running locally and get hacking. I'll leave it to @mboersma to make the call, but I'm happy with deferring tests if we're in a rush to get this in.
Part of the "Submitting a Pull Request" doc is "Include Tests" (and docs). So the goal is for any PR to document and test itself before it gets merged.

We can of course be flexible, and merge this with a promise to backfill the unit tests immediately afterward. I just worry that could be derailed or delayed--you never know what might come up next week. And generally we won't merge anything that's going to drop controller's unit test coverage noticeably, which this PR probably will without its own unit tests. So I would rather hold these PRs out until this one has unit tests. I am happy to help with that!
It's not that I have anything against writing tests, I just honestly can't run the tests without putting in a large amount of work as it stands right now. So it just seems like getting the tests in general up to a more usable state is another effort, which naturally feels like another PR. I am hard pressed to alter my workstation just to run a unit test suite. We really should have ways of running it in the cloud, or at least in a containerized environment. So my point here was mostly

and not

Does that make sense? I guess I don't really know what I am asking here. Do I need a blessing to do this or can I just do this?
So true. We do
This PR is currently failing because of some minor

@kris-nova once this is passing, if I'm still working on the unit test refactor then perhaps we can merge this anyway and backfill the tests as the next task. I wouldn't usually recommend that, but this is an experimental feature and we want to get it out there for feedback soon.
Agreed. Considering that the dockerized controller stuff is a blocker for unit tests, is this good to review again?
@kris-nova sorry about that! The fix to e2e is in deis/e2e-runner#92.
@bacongobbler sorry, you will have to bear with me here. What do you mean fix? Was there something amiss in

Basically I am just trying to get this PR merged in ASAP, and am wondering what the next steps are 🤷‍♀️
@bacongobbler ohh just kidding - the jenkins build passed 🎉
I can help shed a bit of light on what's going on in e2e. Basically, e2e-runner is the core component that runs all e2e jobs. It defines what happens during an e2e run. Before deis/e2e-runner#90 was merged, we would gather pod logs and delete the deis namespace, purging the cluster and returning it to a "clean" slate for the next job.

When deis/e2e-runner#90 was merged, we accidentally removed that cleanup step from the run_e2e script, so what you saw here was an e2e run on a cluster that had already run through e2e but was never cleaned up. Because the cluster was in a dirty state from the beginning of the run, it caused the internal server errors from the controller's point of view.

I'm assuming @vdice went ahead and cleaned up the k8s clusters' states, which allowed your run to go green, but now the cluster is in a dirty state once again. It will likely need to be cleaned up once more after deis/e2e-runner#92 is merged so we can get everything back to normal. Does that help clear things up?
```diff
@@ -191,6 +191,7 @@ def create(self, *args, **kwargs):  # noqa

         # create required minimum resources in k8s for the application
         namespace = self.id
+        ingress = self.id
```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't have to happen in this PR, but a neat python trick is to set these all to self.id in one shot:

```python
namespace = ingress = service = self.id
```

As shown here:

```
><> python3
Python 3.5.2 (default, Aug  8 2016, 12:14:07)
[GCC 4.9.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> a = b = c = 3
>>> a
3
>>> b
3
>>> c
3
>>>
```
I blame my time spent working in Scala, but I have never been much of a fan of syntactic sugar. I often find it introduces complexity without changing the behavior of the program. Just my 2 cents 😄
rootfs/bin/test (Outdated)

```diff
@@ -0,0 +1,3 @@
+#!/bin/bash
+
+python /app/manage.py test ingress
```
I think this should be removed in favour of #1049. Is this something required for this PR or would you agree?
Indeed, looks like e2e failed once again. I think we can consider this OK though since we've already seen it go green once.
Adding ingress support to controller
Adding changes to chart to support feature
Adding ingress rules for controller API, and SSH on TCP 2222
Adding ingress support in general
Requires deis/workflow#732
Requires deis/builder#495
Requires deis/router#316
Technically a non-breaking change, as the user must opt in to the feature
Instead of raising an HTTP exception, let's raise ServiceUnavailable. Non-breaking change.
Jenkins, test this please.
Replaced by #1274.
Work for experimental native ingress. Follow the documentation in deis/workflow#732 to see how to use the new feature.
This PR handles the changes necessary for the controller to start managing Kubernetes ingresses.
Requires deis/workflow#732
Requires deis/builder#495
Requires deis/router#316