Docker and io_uring don't work together - add prebuilt binary without liburing #1435

Closed
lowski opened this issue Aug 4, 2024 · 16 comments · Fixed by #1445

@lowski

lowski commented Aug 4, 2024

Motivation

When mediasoup is installed in a Docker container (on the linux/amd64 platform), the mediasoup worker crashes with this error: failure exit: io_uring_queue_init() failed: Operation not permitted.

Containerd has removed the io_uring syscalls from its default seccomp profile because io_uring has been deemed unsafe. The rationale for this decision is explained here: containerd/containerd#9048

Current Workarounds

There are two options to work around this currently: (1) force the local build of mediasoup without io_uring, and (2) override the default seccomp profile.

Local build

This is the only viable option at the moment. We can force the npm postinstall to build the worker locally by adding these two environment variables before the npm install command:

# don't download the prebuilt binary
ENV MEDIASOUP_SKIP_WORKER_PREBUILT_DOWNLOAD="true"
# build without io_uring
ENV MESON_ARGS="-Dms_disable_liburing=true"

However, this slows down the build significantly, because the worker has to be compiled from source. On the other hand, it works and has no security cost.

Seccomp Profile (BAD idea)

We can manually override the seccomp profile by passing --security-opt with the docker run command. I have not been able to find the exact syscalls that cause the issue, but allowing all syscalls by default works (as mentioned here moby/moby#39396 (comment)).

This will make mediasoup work again, but at a great security cost -- the syscalls were disallowed for a reason.

I DON'T THINK THIS IS A GOOD IDEA. I'm just adding this in case someone stumbles across this solution as I did.

Feature Request

Even though the first method works, it is less than ideal. However, I understand that io_uring brings some performance benefit which justifies its use in general. Therefore, another prebuilt binary should exist without io_uring when using Docker.

I don't know if it's possible to detect a docker build automatically and reliably (maybe this helps?) but that would obviously be the best case. Otherwise an environment variable to switch between the version with and without io_uring would be the next best thing.

@lowski lowski added the feature label Aug 4, 2024
@ibc
Member

ibc commented Aug 4, 2024

Please add proper rationale. That commit doesn't explain anything to me.

@lowski
Author

lowski commented Aug 4, 2024

@ibc Sorry, the keybinding for submitting an issue without further confirmation got in the way -- I've updated the description!

@lowski lowski changed the title Docker and io_uring Docker and io_uring don't work together - add prebuilt binary without liburing Aug 4, 2024
@ibc
Member

ibc commented Aug 4, 2024

Ok, will think about this during this week.

@amontalban

Just faced this as well (Running mediasoup with a Kernel 6 inside Docker), maybe https://unix.stackexchange.com/a/596284 helps to detect if io_uring is enabled.

@ibc
Member

ibc commented Aug 9, 2024

So here is the thing:

  • The mediasoup-worker binary is built on a GH CI Linux machine with Kernel >= 6 with io-uring support.
  • However, later that binary is used in a Linux kernel 6 host (a Docker container) which has io-uring disabled.

In theory this should not crash: although the mediasoup-worker prebuilt binary was built with io-uring support, when we later run it on another machine we check at runtime whether the host supports io-uring.

Well, that's the theory... The problem is exactly that: the way in which we check io-uring support at runtime:

bool DepLibUring::IsRuntimeSupported()
{
	// clang-format off
	struct utsname buffer{};
	// clang-format on

	auto err = uname(std::addressof(buffer));

	if (err != 0)
	{
		// uname() returns -1 on failure and reports the cause in errno.
		MS_THROW_ERROR("uname() failed: %s", std::strerror(errno));
	}

	MS_DEBUG_TAG(info, "kernel version: %s", buffer.version);

	auto* kernelMayorCstr = buffer.release;
	auto kernelMayorLong  = strtol(kernelMayorCstr, &kernelMayorCstr, 10);

	// liburing `sendto` capabilities are supported for kernel versions greater
	// than or equal to 6.
	return kernelMayorLong >= 6;
}

That is, it's literally checking whether the kernel version is >= 6 :) Hence the problem: we are on a host that has kernel >= 6 but doesn't have io-uring support.

So we need a better way to know at runtime whether the host supports io-uring or not.
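As a rough illustration of why the check above is not enough, here is a minimal standalone sketch of the same idea. The helper names `ParseKernelMajor` and `KernelMajorFromUname` are hypothetical and not part of mediasoup. The parsed number only says how recent the kernel is; it carries no information about a seccomp profile or a kernel setting that blocks io_uring.

```cpp
#include <cstdlib>
#include <sys/utsname.h>

// Hypothetical helper (not part of mediasoup): extracts the major version
// from a kernel release string such as "6.5.0-41-generic".
long ParseKernelMajor(const char* release)
{
	char* end = nullptr;

	// strtol() stops at the first character that is not a digit (the first '.').
	return std::strtol(release, &end, 10);
}

// Hypothetical helper: returns the major version of the running kernel,
// or -1 if uname() fails.
long KernelMajorFromUname()
{
	struct utsname buffer{};

	// uname() returns 0 on success and -1 (with errno set) on failure.
	if (uname(&buffer) != 0)
	{
		return -1;
	}

	// A value >= 6 only means the kernel is recent enough to include io_uring.
	// It does not mean the current process is allowed to use it.
	return ParseKernelMajor(buffer.release);
}
```

Inside a Docker container with kernel 6 on the host, `KernelMajorFromUname()` would still report 6 even though the io_uring setup call is rejected, which matches the crash described in this issue.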

@ibc
Member

ibc commented Aug 9, 2024

Off-topic comment, ignore please

In addition to this, I don't really think we are ready to properly run mediasoup-worker binaries built with io-uring support on hosts that do not support io-uring (even if they have kernel 5).

Imagine we are in Linux kernel 5 using a worker with io-uring support. Eventually this code in DepUsrSctp.cpp will be executed:

void DepUsrSCTP::Checker::OnTimer(TimerHandle* /*timer*/)
{
	MS_TRACE();

	auto nowMs          = DepLibUV::GetTimeMs();
	const int elapsedMs = this->lastCalledAtMs ? static_cast<int>(nowMs - this->lastCalledAtMs) : 0;

#ifdef MS_LIBURING_SUPPORTED
	// Activate liburing usage.
	// 'usrsctp_handle_timers()' will synchronously call the send/recv
	// callbacks for the pending data. If there are multiple messages to be
	// sent over the network then we will send those messages within a single
	// system call.
	DepLibUring::SetActive();
#endif

To be clear:

  • This worker was compiled with macro MS_LIBURING_SUPPORTED defined.
  • So it will end up calling DepLibUring::SetActive() even though the host doesn't support io-uring and even though DepLibUring::IsRuntimeSupported() returns false. I would call this a bug almost everywhere.

Yes, I know: DepLibUring::SetActive() does "nothing" because DepLibUring::liburing is not set (the !DepLibUring::liburing case). And the same goes for all the other methods... Ok, it's a bit confusing but in theory it is ok.
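For readers unfamiliar with this pattern, here is a minimal sketch of a wrapper whose static methods silently do nothing when no instance was created. All names (`Ring`, `UringWrapper`, `ClassInit`, `IsActive`) are hypothetical and only illustrate the idea; this is not the actual DepLibUring code.

```cpp
#include <memory>

// Hypothetical state object that would own the ring in a real implementation.
struct Ring
{
	bool active{ false };
};

// Hypothetical wrapper, loosely modelled on the behaviour described above.
class UringWrapper
{
public:
	// Creates the instance only when the runtime check passed.
	static void ClassInit(bool runtimeSupported)
	{
		ring.reset(runtimeSupported ? new Ring() : nullptr);
	}

	// No-op when there is no instance, so callers do not need their own check.
	static void SetActive()
	{
		if (!ring)
		{
			return;
		}

		ring->active = true;
	}

	static bool IsActive()
	{
		return ring != nullptr && ring->active;
	}

private:
	static std::unique_ptr<Ring> ring;
};

std::unique_ptr<Ring> UringWrapper::ring;
```

The trade-off is the one mentioned above: call sites stay simple, but it is not obvious from the call site that the call may have no effect.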

@ibc
Member

ibc commented Aug 9, 2024

So we need a reliable way to know if io-uring is supported or not.

  1. Currently, at build time (in the meson.build file) we check if it's Linux with kernel >= 6. This is not good since, as explained above, there are kernel 6 hosts with io-uring disabled.
  2. Currently, assuming the worker was built on a machine with io-uring support, at runtime we check if the current host supports io-uring by doing the same check (kernel must be >= 6).

So we need to fix both cases.

Additionally, a new option WorkerSettings.disableLiburing may help.

Some related info:

@ibc
Member

ibc commented Aug 9, 2024

NOTE: I'm doing some improvements (but not yet addressing this problem) in this PR: #1440

@nazar-pc
Collaborator

nazar-pc commented Aug 9, 2024

io_uring is often disabled due to numerous security issues found in it over the years, so while supported by the kernel, it may not be allowed for use by apps

@ibc
Member

ibc commented Aug 9, 2024

io_uring is often disabled due to numerous security issues found in it over the years, so while supported by the kernel, it may not be allowed for use by apps

And that's exactly what I meant above: we need a way, both at build time (in meson.build) and at runtime (in DepLibUring.cpp), to check if io-uring is enabled on the current host even though it's included in the kernel.

We must only build the worker with io-uring support if it is supported and enabled on the current host, and we must only enable our DepLibUring class if io-uring is supported and enabled on the current host.

ibc added a commit that referenced this issue Aug 9, 2024
### Details

- `createWorker({ disableLiburing: true })` disables LibUring usage even if it's supported by the worker and the current host.
- Related (still to be fixed) issue which brings a lot of context: #1435
@ibc
Member

ibc commented Aug 9, 2024

I'm adding a WorkerSettings option to disable liburing: #1442

That PR is not intended to fix this issue.

ibc added a commit that referenced this issue Aug 12, 2024
- Fixes #1435

### Details

- Having Kernel >= 6 doesn't guarantee that `io_uring` is enabled. Some systems disable it at kernel layer.
- This PR checks at launch time whether `io_uring` initialization works; otherwise `io_uring` is disabled.

### TODO

- This is another story, but we should rename `WorkerSettings.disableLiburing` to `WorkerSettings.disableIoUring`. That's what we are disabling and not the `io_uring` C wrapper library called `liburing`.
@ibc
Member

ibc commented Aug 12, 2024

So here is a PR that tries to fix this problem in a reliable way:

#1445

@lowski thoughts?

@ibc
Member

ibc commented Aug 13, 2024

So we need a reliable way to know if io-uring is supported or not.

  1. Currently, at build time (in the meson.build file) we check if it's Linux with kernel >= 6. This is not good since, as explained above, there are kernel 6 hosts with io-uring disabled.

This is not really needed AFAIS. Let me explain: if the kernel is >= 6 then the machine (on which mediasoup-worker is being built) has io_uring capabilities, so mediasoup can be built on this machine with io_uring support. Whether such a machine has io_uring disabled doesn't change anything here. It just means that when running mediasoup, io_uring won't be used (thanks to PR #1445) even though the built mediasoup-worker supports it.

  2. Currently, assuming the worker was built on a machine with io-uring support, at runtime we check if the current host supports io-uring by doing the same check (kernel must be >= 6).

This is done in PR #1445.

@ibc ibc closed this as completed in #1445 Aug 13, 2024
ibc added a commit that referenced this issue Aug 13, 2024
Fixes #1435

### Details

- Having Kernel >= 6 doesn't guarantee that `io_uring` is enabled. Some systems disable it at kernel layer.
- This PR checks at launch time whether `io_uring` initialization works; otherwise `io_uring` is disabled.
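
The approach described in this commit message (attempt the initialization once and fall back if it fails) can be sketched roughly as follows. This is not the code from PR #1445; `ProbeIoUring`, `SelectBackend` and `IoBackend` are hypothetical names, and the actual initialization is passed in as a callable so the sketch stays independent of liburing. With liburing, that callable could wrap `io_uring_queue_init()`, which returns 0 on success and a negative errno value on failure (the "Operation not permitted" error reported in this issue).

```cpp
#include <functional>

// Hypothetical result type: which I/O path the worker would end up using.
enum class IoBackend
{
	IoUring,
	Fallback
};

// Hypothetical helper: runs one real initialization attempt.
// `tryInit` is expected to return 0 on success and a negative value on failure.
bool ProbeIoUring(const std::function<int()>& tryInit)
{
	return tryInit() == 0;
}

// Hypothetical helper: `userDisabled` plays the role of an explicit opt-out
// such as the `disableLiburing` worker setting.
IoBackend SelectBackend(bool userDisabled, const std::function<int()>& tryInit)
{
	// An explicit opt-out wins and skips the probe entirely.
	if (userDisabled)
	{
		return IoBackend::Fallback;
	}

	// Otherwise trust the outcome of the real initialization attempt,
	// not the kernel version number.
	return ProbeIoUring(tryInit) ? IoBackend::IoUring : IoBackend::Fallback;
}
```

The point of probing this way is that it covers every reason io_uring may be unavailable (old kernel, seccomp profile, kernel setting) without having to detect each one separately.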
@BRAiNCHiLD95

@ibc A little confused by what the way forward is

The solution (as of v3.14.11) is to use disableLiburing: true in WorkerSettings object passed to createWorker()

And, in the future, #1445 (which will probably be deployed with v3.14.12) io_uring won't be used in runtime checks? Does that mean the disableLiburing: true won't be needed or is that going to remain a requirement for containers running on the linux/amd64 platform?

@ibc
Member

ibc commented Aug 22, 2024

The solution (as of v3.14.11) is to use disableLiburing: true in WorkerSettings object passed to createWorker()

This is to disable io_uring even if it works.

And, in the future, #1445 (which will probably be deployed with v3.14.12) io_uring won't be used in runtime checks?

This is to not fail if io_uring is supported by the kernel but is disabled at kernel layer.


5 participants