tls certificate issuer and updater #12

Open
balupton opened this issue Apr 29, 2018 · 6 comments

balupton commented Apr 29, 2018

Summary:

Generating the nomad certs on origin did not work: the resulting certs did not include each nomad machine's private_ip in the cert's ip_sans, so the cert was rejected by the local instance.

I then tried generating the nomad certs on the nomad machines themselves. That fixes the ip_sans issue, but prevents nomad-to-nomad communication, as each nomad instance then has a different cert.

Solving this seems to require a certificate issuer and update service.

Possible Solutions:

Local polling + local issuance:

  1. Create a poll service on each machine that polls a vault secret (containing the issued pki combo json) every 30 seconds. If the secret changes, reconfigure the local nomad service.
  2. When a new nomad service is required, append the new private_ip to another vault secret, then generate a new pki combo with all the private_ips from that secret, and put that combo json into the secret from step 1.
  3. To set up the vault secrets, vault policies and tokens would need to be created for the polling and writing requirements. Or just use the cluster_token in memory.
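
The polling in step 1 could be sketched roughly as below. This is only an illustration under assumptions: `fetch_bundle` and `apply_bundle` are hypothetical callables (one would read the pki combo json from vault, the other would write the cert files and reload nomad), and neither is part of any existing script in this repo.

```python
import hashlib
import json
import time
from typing import Callable, Optional


def bundle_digest(bundle: dict) -> str:
    """Fingerprint a PKI bundle so that key order does not matter."""
    encoded = json.dumps(bundle, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def poll_once(
    fetch_bundle: Callable[[], dict],
    apply_bundle: Callable[[dict], None],
    last_digest: Optional[str],
) -> str:
    """Fetch the bundle and apply it only if it differs from the last one seen.

    fetch_bundle and apply_bundle are hypothetical hooks supplied by the caller.
    Returns the digest of the bundle that was just fetched.
    """
    bundle = fetch_bundle()
    digest = bundle_digest(bundle)
    if digest != last_digest:
        apply_bundle(bundle)
    return digest


def poll_forever(
    fetch_bundle: Callable[[], dict],
    apply_bundle: Callable[[dict], None],
    interval_seconds: float = 30.0,
) -> None:
    """Poll every interval_seconds (30 seconds, as proposed in step 1)."""
    last_digest: Optional[str] = None  # None means the first poll always applies
    while True:
        last_digest = poll_once(fetch_bundle, apply_bundle, last_digest)
        time.sleep(interval_seconds)
```

Comparing digests rather than timestamps means a rewrite of the secret with identical content does not trigger a needless nomad reload.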

Developer issuance in pre:

  1. For each new server that has just been created but not yet configured, the terraform script remotes into the existing servers and updates their TLS certs to include the new server's private_ip.
  2. Generation of the PKI bundle could occur locally or on origin, then propagated.

Developer issuance in post:

  1. All services have TLS off at the start.
  2. Once all servers are deployed and running, remote into origin, generate the certs containing all their private_ips, then remote into each server, inject the cert, and reconfigure its services.
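
Gathering every private_ip into a single cert request could look something like the sketch below. It assumes the cert is requested from a vault pki role, whose issue endpoint accepts `common_name`, `ip_sans` (comma-delimited), and `ttl`; the helper name `build_issue_payload` and the default TTL are made up for illustration.

```python
import ipaddress
from typing import Iterable


def build_issue_payload(
    common_name: str,
    private_ips: Iterable[str],
    ttl: str = "720h",  # assumed default, pick whatever the role allows
) -> dict:
    """Build the request body for a vault pki issue call.

    Validates each address, drops duplicates, and sorts numerically so the
    same set of servers always produces the same ip_sans string.
    Raises ValueError if any entry is not a valid IP address.
    """
    unique = {ipaddress.ip_address(ip.strip()) for ip in private_ips}
    ordered = sorted(unique, key=lambda addr: (addr.version, int(addr)))
    return {
        "common_name": common_name,
        "ip_sans": ",".join(str(addr) for addr in ordered),
        "ttl": ttl,
    }
```

For example, `build_issue_payload("server.global.nomad", ["10.0.0.12", "10.0.0.5"])` yields an `ip_sans` of `10.0.0.5,10.0.0.12`; the addresses here are placeholders.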

Abandon local TLS entirely for Cloudflare Argo Tunnel:

  1. Cloudflare Argo Tunnel only allows connections from cloudflare servers and from users you give access to via Cloudflare Access. Argo Tunnel also encrypts all traffic by generating a local certificate on the machine, which then interfaces with the Cloudflare endpoint. Accomplished by "add cloudflare's argo tunnel" #8

Assessment:

Local polling allows short TTL on local TLS. Accomplishes #4

Local polling and dev issuance in pre would both involve reloading all existing servers each time a new server is added.

Dev issuance in post would involve reloading all servers, but only once, in post.

Reloading may induce downtime if not timed to be simultaneous.

Conclusion:

Argo Tunnel should be explored. It could turn out to be the easiest and most secure option, and it may also turn out to be usable alongside service TLS.

At a later point, implement service TLS. By estimate, it would take 1-3 weeks to get the options for it going.

balupton added this to the Local TLS milestone Apr 29, 2018
balupton added a commit that referenced this issue Apr 29, 2018
Summary at #12

Changes:

- streamlined service installation so that services can now be installed, but not as a system service (such as vault exec on nomad machine)
- better logging when a service fails to start
- nomad cert generation now happens on the nomad machines
- allow for parallel deployment by no longer using ../data locally
- gossip keys now happen via configuration and keyring, as it seemed they would get out of sync on restarts otherwise
- fixed numerous firewall issues, described in #11
- vault_env now supports foreign vault locations, as well as outputs the token it is using
- added missing vault port 8201 to firewall rules
- added server tags without type
- disabled security group rules as they were not needed, local firewalls seem better
- add cloudflare dns servers to resolv.conf on consul installations
- curl no longer outputs progress bars
@balupton
Member Author

If I install a nomad agent on origin and the masters, then I could use nomad jobs to:

  1. run the poller
  2. generate the certs on the appropriate hosts (perhaps consul will give the ips needed then)

@balupton
Member Author

It seems https://github.com/hashicorp/consul-template/blob/master/README.md is the official answer; it even includes a vault cert gen example.

@balupton
Member Author

Two recent progressions to make this easier.

Progression One

Generating the nomad certs on origin did not work as the nomad machines would then have certs which did not include their private_ips in the certs ip_sans, which would cause the cert to be rejected from the local instance.

As vault 0.10.3 supports

URI SANs in PKI: You can now configure URI Subject Alternate Names in the pki backend. Roles can limit which SANs are allowed via globbing.

Found via https://www.vaultproject.io/api/secret/pki/index.html#uri_sans-1 and https://www.vaultproject.io/api/auth/cert/index.html#allowed_uri_sans

Then perhaps this issue can now be worked around, rather than implementing Consul Template.

Consul Template does offer the advantage of short lived certificates that can update on the fly, but at the expense of a lot more complexity.
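
To illustrate what "limit which SANs are allowed via globbing" means in practice, here is a rough sketch that checks requested URI SANs against a list of glob patterns. It uses Python's `fnmatch` as a stand-in; vault's own glob matching may behave differently, and the function name and the `spiffe://` URIs in the usage note are invented for the example.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def disallowed_uri_sans(
    requested: Iterable[str],
    allowed_globs: Iterable[str],
) -> List[str]:
    """Return the requested URI SANs that match none of the allowed globs.

    An empty result means every requested SAN would be permitted under this
    approximation of glob matching.
    """
    patterns = list(allowed_globs)
    return [
        uri
        for uri in requested
        if not any(fnmatchcase(uri, pattern) for pattern in patterns)
    ]
```

With an allowed pattern of `spiffe://example.internal/nomad/*`, a request for `spiffe://example.internal/nomad/server-1` passes, while `spiffe://example.internal/vault/server-1` is returned as disallowed.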

Progression Two

Consul 1.2 introduces a new feature called Consul Connect, which automatically provides TLS for Consul Services (not consul, vault, and nomad themselves).

However, in the docs for its various features, it includes these hints:

https://www.consul.io/docs/guides/connect-production.html

Configure Agent Transport Encryption

Consul's gossip (UDP) and RPC (TCP) communications need to be encrypted otherwise attackers may be able to see ACL tokens while in flight between the server and client agents (RPC) or between client agent and application (HTTP). Certificate private keys never leave the host they are used on but are delivered to the application or proxy over local HTTP so local agent traffic should be encrypted where potentially untrusted parties might be able to observe localhost agent API traffic.

Follow the encryption documentation to ensure both gossip encryption and RPC/HTTP TLS are configured securely.

For now client and server TLS certificates are still managed by manual configuration. In the future we plan to automate more of that with the same mechanisms Connect offers to user applications.

https://www.consul.io/docs/connect/platform/nomad.html

Connect on Nomad

Connect can be used with Nomad to provide secure service-to-service communication between Nomad jobs and task groups. The ability to use the dynamic port feature of Nomad makes Connect particularly easy to use.

Using Connect with Nomad today requires manually specifying the Connect sidecar proxy and managing intentions directly via Consul (outside of Nomad). The Consul and Nomad teams are working together towards a more automatic and unified solution in an upcoming Nomad release.

Which hopefully means that HashiCorp are working on a way to make TLS automatic, not just for Consul services which Consul Connect already supports, but also for the HashiSuite itself.

Relevant links:

Conclusion

With these developments, it seems that a combination of:

  1. Consul Connect (for TLS on services/apps), and
  2. Long-lived certificates that have URI SANs (for TLS on HashiSuite)

should be the missing pieces for a TLS-enabled cluster with minimum complexity for the current day.

If option (2) proves not to work, then Consul Template will be required for this use case. However, Consul Template for that use case has a limited life expectancy, as it seems HashiCorp are working to provide an automated and built-in alternative. As such, if (2) fails, then the options are:

  1. Implement Consul Template for HashiSuite encryption
  2. Wait for HashiCorp to provide their updates to their suite

If we do Consul Template, then a few months (or years) later we would end up having to upgrade to the automated updates anyway, moving away from Consul Template. As such, my thinking is that if the URI SANs option fails, we just proceed without HashiSuite TLS encryption in the meantime, until the updates occur.

@ghost

ghost commented Jul 19, 2018

Is the intent to be able to run the equivalent of Argo Tunnels over TLS because IP addresses are not static?

Your solution is something I have also been thinking about, because I am in a non-static IP address environment. The new Consul Connect looks interesting. I think a hacky setup should be tried to see if it works.

@balupton
Member Author

Seems HashiCorp is finally working to make this easier.

These would be essential reading for anyone who wants to continue this work.

@balupton
Member Author

balupton commented Nov 7, 2019

There is now consul connect support in nomad 0.10, which also seems to assist with this:

https://www.hashicorp.com/blog/consul-connect-integration-in-hashicorp-nomad/

https://www.consul.io/docs/connect/index.html

https://www.consul.io/docs/connect/ca/vault.html
