Enable debugger support for containers #315
base: master
Thanks a lot for working on enabling @jld could you please confirm that adding @rocallahan does adding |
HostConfig: {
  CapAdd: ['SYS_PTRACE'],
  PortBindings: {},
  SecurityOpt: ['seccomp=unconfined']
I don't know much about seccomp, but isn't unconfined a bit drastic? Couldn't we instead tweak the seccomp profile (a.k.a. the seccomp policy JSON file) to enable just the required syscalls and nothing more, as suggested in #232 (comment)?
ptrace allows seccomp to be bypassed, so constraining anything doesn't improve security.
ptrace allows seccomp to be bypassed
That used to be true but I don't think it is anymore.
Looks like you're right; the hole was patched for each architecture and got testing with torvalds/linux@58d0a86.
To sum up: I don't see any significant security issues with unconfined, and maintaining a policy file is hard.
Yes. |
Before we move forward with this, I'd like to be certain that adding Ping @jld ? |
I'd really like to find a solution where we only enable the syscalls actually needed for The default
{
  "names": [
    "kcmp",
    "process_vm_readv",
    "process_vm_writev",
    "ptrace"
  ],
  "action": "SCMP_ACT_ALLOW",
  "args": [],
  "comment": "",
  "includes": {
    "caps": [
      "CAP_SYS_PTRACE"
    ]
  },
  "excludes": {}
}
Am I interpreting these correctly if I conclude that adding @Martiusweb speculates that My preferred outcome would be to fork the default
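For illustration, forking the default profile could be as small as appending one unconditional allow rule. This is an untested sketch: `allowSyscalls` and `baseProfile` are hypothetical names, the base profile is a tiny stand-in for Docker's real default, and the syscall list is simply copied from the rule quoted above rather than being a confirmed minimal set.

```javascript
// Hypothetical helper: returns a copy of a seccomp profile with one extra
// allow rule appended. The rule has empty "includes", so it applies
// regardless of which capabilities the container holds.
function allowSyscalls(profile, names) {
  const extraRule = {
    names: names,
    action: 'SCMP_ACT_ALLOW',
    args: [],
    comment: 'added for debugger support',
    includes: {},
    excludes: {}
  };
  // Build a new object so the input profile is left unmodified.
  return Object.assign({}, profile, {
    syscalls: (profile.syscalls || []).concat([extraRule])
  });
}

// Tiny stand-in for Docker's default profile (assumption: the real one is
// much longer but has the same top-level shape).
const baseProfile = {
  defaultAction: 'SCMP_ACT_ERRNO',
  syscalls: [
    {
      names: ['read', 'write'],
      action: 'SCMP_ACT_ALLOW',
      args: [],
      comment: '',
      includes: {},
      excludes: {}
    }
  ]
};

// Syscall names taken from the capability-gated rule quoted in this thread.
const debugProfile = allowSyscalls(baseProfile, [
  'kcmp',
  'process_vm_readv',
  'process_vm_writev',
  'ptrace'
]);

console.log(JSON.stringify(debugProfile, null, 2));
```

Appending a rule instead of editing the existing capability-gated one keeps the fork easy to diff against upstream, at the cost of a slightly redundant profile.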
That should work. We don't need |
So here are the syscalls used by
So I conclude that to make
(I have no idea what |
I don't think you need to worry about Don't worry about 446 (and rr's other internal syscalls, 442 to 447). rr uses them internally and I think blocking them via seccomp will not affect rr's usage. |
Though you might need to whitelist them. |
Awesome, thanks @rocallahan for clarifying! I think we can reasonably tweak Docker's default |
There is a large obstacle here: passing a custom seccomp profile requires a file on the Docker host, but currently we operate remotely without an agent.
Well, we do have a Janitor clone on each Docker host, which we use to properly configure Docker (and also run an authenticated Docker proxy). Maybe we can commit a custom |
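One possible way around the file requirement, offered as an assumption rather than something verified in this thread: as far as I know, the docker CLI reads the profile file locally and sends its JSON content inline in `SecurityOpt`, so a remote API client may be able to do the same. The sketch below only builds the `HostConfig` object; `hostConfigWithSeccomp` and `exampleProfile` are hypothetical names, and the profile content is a placeholder.

```javascript
// Hypothetical helper: builds a HostConfig like the one in this pull request,
// but with the seccomp profile serialized inline instead of 'unconfined'.
// Assumption: the Docker Engine API accepts 'seccomp=<profile JSON>' here.
function hostConfigWithSeccomp(profile) {
  return {
    CapAdd: ['SYS_PTRACE'],
    PortBindings: {},
    SecurityOpt: ['seccomp=' + JSON.stringify(profile)]
  };
}

// Placeholder profile; a real one would be a fork of Docker's default.
const exampleProfile = {
  defaultAction: 'SCMP_ACT_ERRNO',
  syscalls: [
    {
      names: ['ptrace'],
      action: 'SCMP_ACT_ALLOW',
      args: [],
      comment: '',
      includes: {},
      excludes: {}
    }
  ]
};

const hostConfig = hostConfigWithSeccomp(exampleProfile);
console.log(hostConfig.SecurityOpt[0]);
```

If this turns out not to work against our Docker hosts, committing the profile to the Janitor clone on each host remains the fallback.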
@jankeromnes has there been any movement on this? Having rr support would be fantastic, but what I currently really need is the ability to set breakpoints in gdb, which unfortunately also requires |
@tschneidereit Hello! We can already add Unfortunately, we're still blocked on creating a fork of Docker's default seccomp profile which just enables these two additional syscalls:
Note: This pull request would indeed allow |
Hi @jankeromnes, this seems to be the most relevant resource on the net about how safe SYS_PTRACE is for containers. Kubernetes does let you manually apply any capabilities, either through profiles or directly on pods/containers, bypassing naughty Docker. But is there a definitive answer on the safety of allowing SYS_PTRACE for all containers of a cluster? Are there any security risks? Also, is there any difference between CAP_SYS_PTRACE and SYS_PTRACE?
Hi @nikopen,
Hm, that's quite unfortunate, as it seems nobody here really understands the security implications of enabling these capabilities and syscalls in third-party controlled containers on shared infrastructure.
No, I believe these are the same thing (i.e. the capability or unit of Linux superuser privileges called "PTRACE").
Thank you for this information. I guess this means that Kubernetes disables Docker's seccomp because it already has its own permission checking mechanism (presumably with similar "safe defaults" regarding allowed or forbidden capabilities, in which case I'd love to know if they allow the
Unfortunately, we don't have a definitive answer for this. From my (limited) understanding, it seems that in the past, someone made a proof of concept using the PTRACE capability to "escape" a Docker container. I also believe (without proof) that this vulnerability has since been fixed, because the security of Docker's unprivileged containers has been improved, and I'm not aware of any recent PTRACE-related container escapes. However, I could be awfully wrong here, and I'd love to hear from someone more knowledgeable about this. |
Indeed, k8s disables Docker's seccomp, and it can be re-enabled via a specific annotation. The default ServiceAccount on an unconfigured 'vanilla' k8s installation is very open - allows pretty much everything, and then granular capabilities can be defined either via securityContext in a pod/deployment or via PodSecurityPolicies, to have unprivileged pods but allowing SYS_PTRACE et cetera.
That's interesting, is it related to this commit? torvalds/linux@58d0a86
As the commit message says, in the past a process could use ptrace() to negate any seccomp() policy imposed on it, by changing the syscall number after the seccomp policy had run. Therefore if you imposed any seccomp policy at all, you had to block ptrace() as well. That particular issue has been fixed since 4.8, so it is no longer a reason to block ptrace() on those kernels.
Another reason to block ptrace() is that it's just a really complicated API which provides a considerable amount of attack surface through which an attacker might be able to exploit zero-day kernel bugs. It makes sense that if your containers don't need ptrace(), you should block it. This will always be true I guess.
I don't know of any bugs or features of ptrace() that enable container escapes or other security policy violations. I think if any were known, they would be treated as security bugs and fixed. At this point I think it should be treated like any other API that is thought to be safe but adds significant attack surface.
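To make the 4.8 cutoff concrete, here is a minimal sketch of the version comparison a host-side check might perform before allowing ptrace() under a seccomp policy. `kernelAtLeast` is a hypothetical helper, not part of Docker or Janitor, and it only looks at the major and minor numbers of the release string.

```javascript
// Hypothetical helper: true if a kernel release string such as
// '4.15.0-112-generic' is at least the given 'major.minor' version.
function kernelAtLeast(release, minKernel) {
  const parse = (s) => {
    const m = /^(\d+)\.(\d+)/.exec(s);
    if (!m) throw new Error('unrecognized kernel version: ' + s);
    return [Number(m[1]), Number(m[2])];
  };
  const [major, minor] = parse(release);
  const [minMajor, minMinor] = parse(minKernel);
  // Compare numerically, so that 4.10 counts as newer than 4.8.
  return major > minMajor || (major === minMajor && minor >= minMinor);
}

// Example release strings (illustrative values, not taken from our hosts).
const ptraceSafeOnOldHost = kernelAtLeast('4.4.0-21-generic', '4.8');
const ptraceSafeOnNewHost = kernelAtLeast('4.15.0-112-generic', '4.8');
console.log(ptraceSafeOnOldHost, ptraceSafeOnNewHost);
```

On a Node.js host the release string could come from `os.release()`; a plain string comparison would get this wrong because '4.10' sorts before '4.8' lexically.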
Update: Docker's default seccomp profile now seems to authorize the
{
  "names": [
    "ptrace"
  ],
  "action": "SCMP_ACT_ALLOW",
  "args": null,
  "comment": "",
  "includes": {
    "minKernel": "4.8"
  },
  "excludes": {}
},
(Source.) If my interpretation is correct, this means that in order to allow the However, as far as I know, you still need Also, a custom
(untested)