Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Relax /sys/dev/block restrictions for volumes and devices #13708

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

grooverdan
Copy link
Contributor

User space programs want to access information about the block
devices they are operating on. E.g. the block size is an important
aspect if doing O_DIRECT filesystem calls.

On the other hand, rhbz#1772993 wants to keep the host information
as hidden from the container running processes as possible.

We expose only the volumes and devices that are mounted into the
container by re-generating the symlinks in /sys/dev/block using
temporary volume.

Closes containers/common#2277

$  bin/podman run --rm --device=/dev/loop0:/dev/loop0:rwm   -v .:/vol --security-opt=unmask=ALL debian:latest ls -la /sys/dev/block /dev/loop0
ls: cannot access '/dev/loop0': Permission denied
/sys/dev/block:
total 0
drwxr-xr-x. 2 nobody nogroup 0 Mar 30 04:32 .
drwxr-xr-x. 4 nobody nogroup 0 Mar 30 04:32 ..
lrwxrwxrwx. 1 nobody nogroup 0 Mar 30 04:32 252:0 -> ../../devices/virtual/block/zram0
lrwxrwxrwx. 1 nobody nogroup 0 Mar 30 04:32 253:0 -> ../../devices/virtual/block/dm-0
lrwxrwxrwx. 1 nobody nogroup 0 Mar 30 04:32 253:1 -> ../../devices/virtual/block/dm-1
lrwxrwxrwx. 1 nobody nogroup 0 Mar 30 04:32 253:2 -> ../../devices/virtual/block/dm-2
lrwxrwxrwx. 1 nobody nogroup 0 Mar 30 04:32 253:3 -> ../../devices/virtual/block/dm-3
lrwxrwxrwx. 1 nobody nogroup 0 Mar 30 04:32 259:0 -> ../../devices/pci0000:00/0000:00:1d.4/0000:07:00.0/nvme/nvme0/nvme0n1
lrwxrwxrwx. 1 nobody nogroup 0 Mar 30 04:32 259:1 -> ../../devices/pci0000:00/0000:00:1d.4/0000:07:00.0/nvme/nvme0/nvme0n1/nvme0n1p1
lrwxrwxrwx. 1 nobody nogroup 0 Mar 30 04:32 259:2 -> ../../devices/pci0000:00/0000:00:1d.4/0000:07:00.0/nvme/nvme0/nvme0n1/nvme0n1p2
lrwxrwxrwx. 1 nobody nogroup 0 Mar 30 04:32 259:3 -> ../../devices/pci0000:00/0000:00:1d.4/0000:07:00.0/nvme/nvme0/nvme0n1/nvme0n1p3
lrwxrwxrwx. 1 nobody nogroup 0 Mar 30 04:32 7:0 -> ../../devices/virtual/block/loop0
lrwxrwxrwx. 1 nobody nogroup 0 Mar 30 04:32 7:1 -> ../../devices/virtual/block/loop1
lrwxrwxrwx. 1 nobody nogroup 0 Mar 30 04:32 7:2 -> ../../devices/virtual/block/loop2
lrwxrwxrwx. 1 nobody nogroup 0 Mar 30 04:32 7:3 -> ../../devices/virtual/block/loop3
lrwxrwxrwx. 1 nobody nogroup 0 Mar 30 04:32 7:4 -> ../../devices/virtual/block/loop4
lrwxrwxrwx. 1 nobody nogroup 0 Mar 30 04:32 7:5 -> ../../devices/virtual/block/loop5
lrwxrwxrwx. 1 nobody nogroup 0 Mar 30 04:32 7:6 -> ../../devices/virtual/block/loop6

$  bin/podman run --rm --device=/dev/loop0:/dev/loop0:rwm   -v .:/vol --security-opt=mask=/sys/dev/block debian:latest ls -la /sys/dev/block /dev/loop0
ls: cannot access '/dev/loop0': Permission denied
/sys/dev/block:
total 0
drwxrwxrwt. 2 root   root    40 Mar 30 04:33 .
drwxr-xr-x. 4 nobody nogroup  0 Mar 30 04:33 ..

$  bin/podman run --rm --device=/dev/loop0:/dev/loop0:rwm   -v .:/vol  debian:latest ls -la /sys/dev/block /dev/loop0
ls: cannot access '/dev/loop0': Permission denied
/sys/dev/block:
total 4
drwxr-xr-x. 2 root   root    4096 Mar 30 04:33 .
drwxr-xr-x. 4 nobody nogroup    0 Mar 30 04:33 ..
lrwxrwxrwx. 1 root   root      32 Mar 30 04:33 253:3 -> ../../devices/virtual/block/dm-3

First POC for review. Excuse any poor golang style, its been years since I touched it.

TODO:

  • tests
  • docs
  • remove volume, especially if we didn't put anything on it.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 30, 2022
@github-actions
Copy link

A friendly reminder that this PR had no activity for 30 days.

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 30, 2022
@rhatdan
Copy link
Member

rhatdan commented May 16, 2022

Please rebase

@rhatdan
Copy link
Member

rhatdan commented May 16, 2022

LGTM after rebase.

@github-actions github-actions bot removed the stale-pr label May 17, 2022
@grooverdan
Copy link
Contributor Author

it still leaks anon volumes. I've got a tmpfs based rework almost ready. Will push in the next few days.

@github-actions
Copy link

A friendly reminder that this PR had no activity for 30 days.

@rhatdan
Copy link
Member

rhatdan commented Jun 17, 2022

Needs a rebase. or to be closed?

@github-actions github-actions bot removed the stale-pr label Jun 18, 2022
@vrothberg
Copy link
Member

Friendly ping @grooverdan

@grooverdan
Copy link
Contributor Author

I got a lot closer today https://github.com/grooverdan/podman/tree/expose-sys-dev-block-12746_v2 but progressing faster with a working dlv.

I've got it generating:

$ ls -la /run/user/1000/containers/overlay-containers/e6426c823d7b7e8f1a37458fdabe329b6deb1bd0ba11cfb33d6e7909d37ea57d/userdata/sysdevblock
total 0
drwx------. 2 dan dan 60 Jul  7 17:11 .
drwx------. 3 dan dan 60 Jul  7 17:11 ..
lrwxrwxrwx. 1 dan dan 32 Jul  7 17:11 253:3 -> ../../devices/virtual/block/dm-3

Correctly, just need to complete the bind mounts in the startup.

@grooverdan grooverdan force-pushed the expose-sys-dev-block-12746 branch from 141ce9a to 0c71250 Compare July 13, 2022 08:14
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 13, 2022
@grooverdan
Copy link
Contributor Author

Basic working:

$ bin/podman run -v /home/dan/out:/var/lib/mysql:z --device /dev/loop1 --device /dev/loop2 -it mariadb:10.7 ls -la /sys/dev/block
total 0
drwx------. 2 root   root    100 Jul 13 08:14 .
drwxr-xr-x. 4 nobody nogroup   0 Jul 13 08:14 ..
lrwxrwxrwx. 1 root   root     32 Jul 13 08:14 253:3 -> ../../devices/virtual/block/dm-3
lrwxrwxrwx. 1 root   root     33 Jul 13 08:14 7:1 -> ../../devices/virtual/block/loop1
lrwxrwxrwx. 1 root   root     33 Jul 13 08:14 7:2 -> ../../devices/virtual/block/loop2

@grooverdan grooverdan marked this pull request as ready for review July 13, 2022 08:15
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 13, 2022
ctr.config.Spec.Mounts = ctr.config.Spec.Mounts[:len(ctr.config.Spec.Mounts)-1]
}
}
ctr.state.BindMounts[sysDevBlock] = dir // ??????????????
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to removing comments here and formatting:

Is moving from a Spec to a BindMount here the right way?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mheon PTAL

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not safe. It may work for the first time the container is run, but it will be cleared for subsequent runs. Bind mounts are regenerated each time the container is run - they're only meant to handle files Podman creates and manages itself. Why do you want to use them here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bind mount is for the symlinks that Podman creates itself. This seemed like a good location for a few symlinks derived from the device and volume mounts passed from the user in a place which I assume is auto-cleaned up.

$ ls -la /run/user/1000/containers/overlay-containers/cc8ca97825f48662209987fc11c34ebbea6a7c7a5ce39dde9a17cff036794b5b/userdata/sysdevblock
total 0
drwx------. 2 dan dan 100 Jul 14 17:09 .
drwx------. 4 dan dan 220 Jul 14 17:09 ..
lrwxrwxrwx. 1 dan dan  32 Jul 14 17:09 253:3 -> ../../devices/virtual/block/dm-3
lrwxrwxrwx. 1 dan dan  33 Jul 14 17:09 7:1 -> ../../devices/virtual/block/loop1
lrwxrwxrwx. 1 dan dan  33 Jul 14 17:09 7:2 -> ../../devices/virtual/block/loop2

I'm effectively using the blkMnt created in SpecGenToOCI (where visibility of Mask/unmask options exists) as a communication mechanism (was there another better var available?) to here, setupContainer, where ctr.state.RunDir exists so I can create the above dir, and makeBindMounts to mount it like resolv.conf/hosts etc (maybe c.bindMountRootFile should have been used).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, so this is a Podman-managed mount - this does make sense, then. You should move to makeBindMounts, though. We do not guarantee stability of ctr.state.RunDir (practically speaking it should not change, but that's not a guarantee) so we force regeneration of all bind mounts after a fresh reboot - doing this in makeBindMounts accounts for that possibility, and ensures that your new symlinks aren't wiped from the DB on fresh boot.

@grooverdan grooverdan force-pushed the expose-sys-dev-block-12746 branch 3 times, most recently from 7e415ea to cc20c2c Compare July 15, 2022 13:16
@grooverdan
Copy link
Contributor Author

Basic golang question:
how to satisfy https://cirrus-ci.com/task/4599929409634304
and https://github.com/containers/podman/pull/13708/checks?check_run_id=7357415851

Is adding to the runtime spec the way to preserve the state?

@mheon
Copy link
Member

mheon commented Jul 15, 2022

The OCI spec is saved to the DB, so that's certainly a way to do things. Generally, anything in c.config is persisted through the DB, anything in c.state is persisted for a single boot only. The OCI spec is one of the more prominent parts of c.config. The current approach in the PR of using c.config.Spec.Linux.SysDevBlock seems fine.

@grooverdan
Copy link
Contributor Author

I don't suppose there's a way to get the CI here to run without pulling over the generated file vendor/github.com/opencontainers/runtime-spec/specs-go/config.go?

While I prepare a OCI spec update PR (if that the right way to go still) can I get help with:

How to resolve mips having a uint32 type for Stat.Dev

libpod/util_linux.go:149:30: cannot use statT.Dev (variable of type uint32) as type uint64 in argument to unix.Major
libpod/util_linux.go:149:53: cannot use statT.Dev (variable of type uint32) as type uint64 in argument to unix.Minor
libpod/util_linux.go:152:30: cannot use statT.Rdev (variable of type uint32) as type uint64 in argument to unix.Major
libpod/util_linux.go:152:54: cannot use statT.Rdev (variable of type uint32) as type uint64 in argument to unix.Minor

And golangci-lint objecting to casting it to uin64.

libpod/util_linux.go:149:36: unnecessary conversion (unconvert)
		major, minor := unix.Major(uint64(statT.Dev)), unix.Minor(uint64(statT.Dev))
		                                 ^
libpod/util_linux.go:152:36: unnecessary conversion (unconvert)
			major, minor = unix.Major(uint64(statT.Rdev)), unix.Minor(uint64(statT.Rdev))
			                                ^
make: *** [Makefile:238: golangci-lint] Error 1

Is there a way to construct this small bit of code to keep these two compilers happy?

@giuseppe
Copy link
Member

giuseppe commented Aug 8, 2022

The implementation seems to be in Podman. Why do we need a change to the OCI runtime specs?

@grooverdan grooverdan force-pushed the expose-sys-dev-block-12746 branch from cc20c2c to 5b29214 Compare August 9, 2022 01:51
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 9, 2022

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: grooverdan
Once this PR has been reviewed and has the lgtm label, please assign lsm5 for approval by writing /assign @lsm5 in a comment. For more information see:The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@grooverdan
Copy link
Contributor Author

The implementation seems to be in Podman. Why do we need a change to the OCI runtime specs?

I'm trying to following @mheon's advice to store the state in c.config. I've reworked it so it isn't in the spec part of the config.

User space programs want to access information about the block
devices they are operating on. E.g. the block size is an important
aspect if doing O_DIRECT filesystem calls.

On the other hand, rhbz#1772993 wants to keep the host information
as hidden from the container running processes as possible.

We expose only the volumes and devices that are mounted into the
container by re-generating the symlinks in /sys/dev/block for the
block devices that have host based symlinks. These are generated on
ctr.state.RunDir/sysdevblock as a mountpoint and mounted ro into the
container.

The default visibility can changed by the user with
--security-opt={u,}mask=/sys/dev/block

Consolidate the libpod.mountBind implementation.

Closes #12746

Signed-off-by: Daniel Black <[email protected]>
@grooverdan grooverdan force-pushed the expose-sys-dev-block-12746 branch from 5b29214 to 190b6e2 Compare August 9, 2022 08:12
Signed-off-by: Daniel Black <[email protected]>
@grooverdan grooverdan force-pushed the expose-sys-dev-block-12746 branch from 190b6e2 to f8f8276 Compare August 9, 2022 08:13
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 3, 2022
@openshift-merge-robot
Copy link
Collaborator

@grooverdan: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@giuseppe
Copy link
Member

@grooverdan are you still working on this?

@grooverdan
Copy link
Contributor Author

I got really stuck on the checkpoint/restore and how it was meant to be implemented. If you feel like taking over you have my heart felt gratitude, otherwise I do hope to return to it within 6 months.

@giuseppe
Copy link
Member

could containers/common#2278 replace this one?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

remove mask on /sys/dev/block
6 participants