Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC]: IMA Namespace support #3639

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

IlyaHanov
Copy link

The Linux kernel community is now working on supporting IMA namespaces
and it is almost done. This is a new kernel feature that allows isolation of Platform Configuration Register (PCR) values, Measurement Logs (ML), etc. So this patch presents an initial support for IMA
namespaces. The related issue is #3638.

In the Linux kernel IMA namespace is created via securityFS instead of
using clone() or unshare() system calls, e.g.:

PATH=/bin unshare --user --map-root-user --mount-proc --pid --fork --root rootfs busybox sh -c \
     "busybox mount -t securityfs /mnt /mnt; \
      busybox echo 1 > /mnt/ima/active; \
      busybox ls -la /mnt/ima/; \
      busybox cat /mnt/ima/vpcr; \
      busybox echo 'dont_measure fsmagic=0x43415d53' > /mnt/ima/policy; busybox echo ''; \
      busybox cat /mnt/ima/policy;

It is important to note that IMA namespace has no CLONE_* or other
system call flags, therefor "Non-clone flags" was presented.
Sure, this does not look like a good architecture solution, we can
discuss it anyway.

Another important side is that IMA namespace has no procFS entry, so
runC needs to know how to check IMA namespace, e.g. during validation processes.
So, the solution here is to redirect this check to USER namespace, but because runC is not
aware of namespaces which are not in procFS this solution looks buggy too, we should discuss it as well.

Now, IMA namespace creation scheme looks like:

+----------------------------+
| mount temporary securityFS |
+----------------------------+
             |
+----------------------------+
| write("ima/active", '1', 1)|
+----------------------------+
             |
+----------------------------+
|unmount temporary secuirtyFS|
+----------------------------+

1) For temporary securityFS current container's rootFS
/mnt directory has been chosen, so there're none conflicts with a Host File System because
container's rootFS is not used now, and nothing is mounted there. The
actual path for the mount directory is ".../bundle/rootfs/mnt" for a
container.

2) According to the Linux kernel's security/integrity/ima/ima_fs.c:ima_write_active()
it is only allowed to write '1', '1\0' or '1\n' to create IMA
namespace. The actual path to so-called "active" file is "securityfs/integrity/ima/active".

3) The last step is just to unmount this temporary securityFS.

4) If some of these steps fail runC will bail with appropriate error
message.

As already mentioned IMA namespace must be created in a new USER
namespace, so the order is important for USER namespace, it should be
unshared first, but this should not affect the creation's order of
remain namespaces. Now IMA namespace created after unsharing USER
namespace and other namespaces except Cgroup namespace. We can discuss
it anyway.

Here is an example of creating IMA namespace:

1) First off, configuration file is required
		$ runc spec --rootless

2) In configuration file config.json it is needed to add securityfs as a
mount point to be able to validate that IMA namespace was created
successfully. "ima" namespace type must also be specified, e.g.:

		...
                    "mounts": [ 
                        ...
		    {
	                    "destination": "/sys/kernel/security",
                            "type": "securityfs",
			    "source": "securityfs",
			    "options": [
				    "nosuid",
				    "nodev",
				    "noexec",
				    "relatime",
				    "rw"
			    ]
		    },
                        ...
                    ],
		...
		"namespaces": [
			...
			{
				"type": "user"
			},
			{
				"type": "ima"
			}
		],
		...

3) Use runc create and runc start (or runc run) as usual to create and then
communicate to the container.

4) To validate the IMA namespace was created successfully, check inside a container:
    # cat /sys/kernel/security/ima/active
    1

Since this is RFC unit tests and documentation weren't touched.

Signed-off-by: Ilya Hanov [email protected]

Advanced Software Technology Lab
Huawei

@AkihiroSuda AkihiroSuda marked this pull request as draft October 24, 2022 15:33
@@ -82,10 +91,23 @@ type Namespace struct {
Path string `json:"path"`
}

// GetPath() concatenates given pid and namespace type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: GetPath returns a procfs path for the given POD and namespace type.

@@ -82,10 +91,23 @@ type Namespace struct {
Path string `json:"path"`
}

// GetPath() concatenates given pid and namespace type
// into procfs path. A few namespaces may not have procfs entry,
// call HasProcfs() first to make sure this namespace has one.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/HasProcfs()/HasProcfs/

func (n *Namespace) GetPath(pid int) string {
return fmt.Sprintf("/proc/%d/ns/%s", pid, NsName(n.Type))
}

// HasProcfs() checks if the given namespace has procfs entry.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/HasProcfs() checks if/HasProcfs tells whether/

// These flags must be the same according to
// libcontainer/nsenter/namespace.h
var namespaceNonCloneInfo = map[NamespaceType]int{
NEWIMA: 0x80000000,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. cross-i386 CI job complains:

cannot use 0x80000000 (untyped int constant 2147483648) as int value in map literal (overflows)

So I guess we have to use explicit uint64 here.

  1. I'd rather not use a map here, but write a function with a switch statement, somewhat similar to HasProcfs above, mostly because a map initialization incurs some init overhead, and in case of a single element it is not worth it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe uint32, or uint (depending on what type the kernel is actually using for this).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... and modify bootstrapData (both sides) and data structures accordingly

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

depending on what type the kernel is actually using for this

The kernel uses nothing for this (as I've written there's no CLONE_* or any other similar flags for IMA). This value (0x80000000) is just the next bit after CLONE_NEWNET. runC can decide to use any flag or even not use a flag.

I'd rather not use a map here, but write a function with a switch statement

Interesting idea, I'll try to modify code this way.

Thanks!

return errors.New("IMA namespace cannot be created without USER namespace")
}

if config.Namespaces.Contains(configs.NEWUSER) || config.Namespaces.Contains(configs.NEWIMA) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check for NEWIMA here is redundant, as we error out above if NEWIMA is set and NEWUSER is not.

Also note that Contains is kinda expensive since it does a linear search, so try to optimize the code to not call it 4 times here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I think it is possible to move it to the else statement for NEWUSER check like this:

if NEWUSER {
    // ...
} else {
    if NEWIMA {
        return errors.New(...)
    }
}

This way Contains will be called only one more time if NEWIMA is specified and NEWUSER isn't. I guess this will be sufficient.

Thanks!


strncat(ima_active, IMA_SECURITYFS_REL_MNT, PATH_MAX);
mntflags = MS_NOATIME | MS_NODEV | MS_NOEXEC | MS_NOSUID | MS_RELATIME;
err = mount("securityfs", ima_active, "securityfs", mntflags , NULL);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this code assumes that there is a mnt directory under container rootfs? This depends on the distro and might not always be the case.

Copy link
Author

@IlyaHanov IlyaHanov Nov 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right this depends on a distro, but most of them follow FHS (Filesystem Hierarchy Standard) which includes /mnt directory. Anyway, now I see such a solution:

  1. Check whether rootfs contains /mnt directory (may be any other directory)
  2. If not - create it
  3. Do mount securityfs and enable IMA namespace
  4. Unmount securityfs
  5. Remove /mnt directory
  6. otherwise use /mnt as it is already presented

Another way I see is to create a temporary directory like .secfs within current rootfs, and after enabling namespace remove it. But I guess this may affect perfomance as we invoke disk I/O.

Thanks!

Comment on lines 692 to 693
if ((strnlen(ima_active, PATH_MAX) + ima_active_path_len) > PATH_MAX)
bail("ima active path is too long");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps move this check to before doing mount()? Also, since we can assume (and do a run-time assert in the code) that ima_active_path_len > strlen(IMA_SECURITYFS_REL_MNT), we can omit the first length check.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that we can completely omit the first check len(ima_active) + len(IMA_SECURITYFS_REL_MNT) > PATH_MAX, replacing it with len(ima_active) + ima_active_path_len > PATH_MAX, because before write_file we'll do strncat(ima_active, IMA_ACTIVE_PATH), where ima_active includes IMA_SECURITYFS_REL_MNT.

But you're right it is possible for the second check to do it before mount like this:

if ((len(ima_active) + len(IMA_SECURITYFS_REL_MNT) + ima_active_path_len) > PATH_MAX)
    bail(...)

This will tell us whether len('/path/to/rootfs') + len('/mnt/') + len('integrity/ima/active') is more than PATH_MAX. So, no need to do any length check after mount.

Thanks!

strncat(ima_active, IMA_ACTIVE_PATH, PATH_MAX);
err = write_file(&data, 1, ima_active);
if (err < 0)
bail("cannot set up IMA namespace");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here (and above) you are bailing without an unmount. This might or might not be OK to do, need to check.

Copy link
Author

@IlyaHanov IlyaHanov Nov 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be OK, because we can mount procfs, sysfs, etc. including secfs from newly created USER namespace, but we also must have mount namespace to be able to automatically unmount all mount points when MOUNT namespace is being destroyed (that means when runC bails it will kill the process that's holding MOUNT namespace). Otherwise mount points will be in the init_mnt_ns, so no unmount will be done for those points.

Since, it's not possible to mount securityfs without new MOUNT namespace (mount will return EPERM) inside a new USER namespace all mount points will be automatically unmounted during namespace destruction process.

Thanks for pointing this out! Here may be the case that IMA namespace need to check MOUNT namespace is also specified, otherwise mount will return an error. I will post updates if this is really the case, also need to check.

Added new "ima" namespace type for config.json configuration
file for runc, e.g.:
  ...
    "namespaces": [
      ...
      {
        "type": "ima"
      },
      ...
    ],
  ...
IMA namespace will be created only if USER namespace is,
too, specified, otherwise runc returns error, so the
only allowed scheme for enabling IMA namespace is:
  ...
    "namespaces": [
      ...
      {
        "type": "user"
      },
      {
        "type": "ima"
      },
      ...
    ],
  ...
otherwise runc returns an error.

Signed-off-by: Ilya Hanov <[email protected]>
@IlyaHanov
Copy link
Author

Review changes

According to @kolyshkin comments above:

  • Move check for IMA ns under USER ns to reduce Contains() calls for validator.
  • NEWIMA flag was changed to 0x400000000. Use uint64 for this explicitly to prevent i386 from integer overflow.
  • Remove map for non-clone flags and use function with switch statement to reduce overhead on initializing map with only one value.
  • Use .ima_secfs directory instead of /mnt as a temporary mount point for securityFS. Create it during enabling IMA ns and remove when IMA ns enabled or a failure occured.
  • Move all path length checks before doing mount() operation to prevent mounting if the path is not valid.
  • Fix comments

Please, let me know if I missed something. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants