seccomp: Take unshare() out of CAP_SYS_ADMIN gate #5
I'm a maintainer and author of passt (https://passt.top/), a user-mode networking implementation that's used to connect containers, with pasta(1), and virtual machines, with passt(1), in an unprivileged way, without creating network interfaces.
By the way, Moby optionally uses pasta(1) to connect rootless containers via rootlesskit:
https://github.com/rootless-containers/rootlesskit/blob/236f31ec2258a1da1b1a9b62b168dd5f9a840f83/pkg/network/pasta/pasta.go
Given that these tools deal with network packets from untrusted workloads, we pay particular attention to their security posture.
The project implements a rather substantial sandboxing mechanism, so that, once the initialisation phase completes, passt(1) and pasta(1) only have access to an empty filesystem with a zero-size limit, and relinquish access to any resources they don't need, by means of detaching namespaces:
https://passt.top/passt/tree/isolation.c
https://passt.top/#security
Users report that they can't use passt(1) in Docker containers, with one notable example at:
https://bugs.passt.top/show_bug.cgi?id=116
and resort to running modified builds of passt:
https://bugs.passt.top/show_bug.cgi?id=116#c6
with sandboxing features entirely disabled. This is of course not something we support, so it's not a particular concern in terms of maintainability, but it still forces users to disable important security features, and it's a rather alarming trend.
As a side note, Flatpak has a similar issue:
flatpak/flatpak#5921
and, same there, users routinely run custom builds of applications that ship strict native sandboxing features (including passt, Chromium, and Firefox) with those features disabled. This is not in the best interest of security and surely not in the best interest of those users.
To fix this, enable unshare() regardless of the CAP_SYS_ADMIN capability, so that unprivileged applications can perform appropriate sandboxing.
I'm well aware of CVE-2022-0185 and CVE-2022-0492, but, since then, there have been significant hardening efforts in the affected portions of the kernel, and the current situation appears substantially different.
Despite the original intention, a blanket ban on unprivileged unshare() nowadays appears to be detrimental to the security of containerised applications, instead of contributing to it, as an increasing number of applications finally start using namespaces for their own sandboxing, which is generally stricter than what any container runtime can provide.
Link: https://bugs.passt.top/show_bug.cgi?id=116
Reported-by: simonvanderlans@gmail.com
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
I just found #4 as I moved this merge request to the right repository. I'm not sure what to do with this one, as it's partially a duplicate, but passt(1) and pasta(1) need unshare(2) flags that are not covered by that one.
"uname",
"unlink",
"unlinkat",
"unshare",
Can we have this as a non-default built-in profile like --security-opt seccomp=allow-unshare-user? Or, if we are going to have this as the default, we will need to provide a seccomp=disallow-unshare-user option.
Originally posted by @AkihiroSuda in #42441
Thanks, I wasn't aware of moby/moby#42441.
I would argue that unshare() should be the default, otherwise container developers will hit https://bugs.passt.top/show_bug.cgi?id=116#c0 and keep distributing less secure builds of software because they have no practical way to ask users to add options when they run containers. See also https://bugs.passt.top/show_bug.cgi?id=116#c9.
I can take care of adjusting this pull request (if it makes sense at all) along the lines of moby/moby#42455, which already implemented your suggestion.
Note: this is the corrected version of moby/moby#51130, which I opened against the wrong repository. I'm just copying over the whole description from there.
I'm a maintainer and author of passt (https://passt.top/), a user-mode networking implementation, that's used to connect containers, with pasta(1), and virtual machines, with passt(1), in an unprivileged way, without creating network interfaces.
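By the way, Moby optionally uses pasta(1) to connect rootless containers via rootlesskit:
https://github.com/rootless-containers/rootlesskit/blob/236f31ec2258a1da1b1a9b62b168dd5f9a840f83/pkg/network/pasta/pasta.go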
Given that these tools deal with network packets from untrusted workloads, we pay particular attention to their security posture.
The project implements a rather substantial sandboxing mechanism, so that, once the initialisation phase completes, passt(1) and pasta(1) only have access to an empty filesystem with a zero-size limit, and relinquish access to any resources they don't need, by means of detaching namespaces:
https://passt.top/passt/tree/isolation.c
https://passt.top/#security
Users report that they can't use passt(1) in Docker containers, with one notable example at:
https://bugs.passt.top/show_bug.cgi?id=116
and resort to running modified builds of passt:
https://bugs.passt.top/show_bug.cgi?id=116#c6
with sandboxing features entirely disabled. This is of course not something we support, so it's not a particular concern in terms of maintainability, but it still forces users to disable important security features, and it's a rather alarming trend.
As a side note, Flatpak has a similar issue:
flatpak/flatpak#5921
and, same there, users routinely run custom builds of applications that ship strict native sandboxing features (including passt, Chromium, and Firefox) with those features disabled. This is not in the best interest of security and surely not in the best interest of those users.
To fix this, enable unshare() regardless of the CAP_SYS_ADMIN capability, so that unprivileged applications can perform appropriate, strict sandboxing.
I'm well aware of CVE-2022-0185 and CVE-2022-0492, but, since then, there have been significant hardening efforts in the affected portions of the kernel, and the current situation appears substantially different.
Despite the original intention, a blanket ban on unprivileged unshare() nowadays appears to be detrimental to the security of containerised applications, instead of contributing to it, as an increasing number of applications finally start using namespaces for their own sandboxing, which is generally stricter than what any container runtime can provide.
Link: https://bugs.passt.top/show_bug.cgi?id=116
Reported-by: simonvanderlans@gmail.com
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
- What I did
I took unshare(2), the system call, out of the CAP_SYS_ADMIN gate in the default seccomp profile.
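As a rough illustration of what "taking a system call out of the gate" means, here is a minimal Python sketch. It is not the actual diff: the profile below is a hypothetical, heavily shortened excerpt in the general shape of the Moby seccomp profile, and `allowed_syscalls` is an invented helper that approximates how capability-conditional rules are selected.

```python
import copy

# Hypothetical, heavily shortened excerpt in the shape of a Moby-style
# seccomp profile; the real profile lists many more syscalls and rules.
BEFORE = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [
        # Unconditional rule: always allowed.
        {"names": ["uname", "unlink", "unlinkat"], "action": "SCMP_ACT_ALLOW"},
        # Gated rule: only applied if the container holds CAP_SYS_ADMIN.
        {
            "names": ["mount", "setns", "unshare"],
            "action": "SCMP_ACT_ALLOW",
            "includes": {"caps": ["CAP_SYS_ADMIN"]},
        },
    ],
}


def allowed_syscalls(profile, caps):
    """Approximate the set of syscalls allowed for a given capability set."""
    names = set()
    for rule in profile["syscalls"]:
        if rule["action"] != "SCMP_ACT_ALLOW":
            continue
        required = rule.get("includes", {}).get("caps", [])
        # A gated rule applies only if every required capability is held.
        if all(cap in caps for cap in required):
            names.update(rule["names"])
    return names


# The change: move "unshare" from the CAP_SYS_ADMIN-gated rule to the
# unconditional rule.
AFTER = copy.deepcopy(BEFORE)
AFTER["syscalls"][1]["names"].remove("unshare")
AFTER["syscalls"][0]["names"].append("unshare")

if __name__ == "__main__":
    for label, profile in (("before", BEFORE), ("after", AFTER)):
        ok = "unshare" in allowed_syscalls(profile, caps=set())
        print(f"{label}: unshare allowed without CAP_SYS_ADMIN: {ok}")
```

In this sketch, other gated calls such as mount stay behind CAP_SYS_ADMIN; only unshare moves.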
- How I did it
I did it proudly, with a keyboard. I used so-called shortcuts that allowed me to conceptually cut one line of text file and paste it to another location.
- How to verify it
Run passt in a Docker container.
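If passt isn't at hand, a lighter check is to probe unshare(2) directly from inside the container. The following is a minimal sketch, assuming a Linux container with Python 3 available; `probe_unshare` is a hypothetical helper, and the probe runs in a forked child so that the calling process keeps its namespaces. It is not a substitute for running passt itself, which needs more unshare(2) flags than the one tried here.

```python
import ctypes
import errno
import os

CLONE_NEWUSER = 0x10000000  # value from <linux/sched.h>


def probe_unshare(flags=CLONE_NEWUSER):
    """Return 0 if unshare(flags) succeeds in a child process, else an errno.

    255 is a marker chosen for this sketch: the probe itself failed
    (for example, no unshare symbol in libc) or the child was killed.
    """
    pid = os.fork()
    if pid == 0:
        code = 255
        try:
            libc = ctypes.CDLL(None, use_errno=True)
            rc = libc.unshare(flags)
            code = 0 if rc == 0 else ctypes.get_errno()
        finally:
            # Report the outcome through the exit status; never return
            # into the parent's code path from the child.
            os._exit(code)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else 255


if __name__ == "__main__":
    err = probe_unshare()
    if err == 0:
        print("unshare(CLONE_NEWUSER) allowed")
    else:
        print(f"unshare(CLONE_NEWUSER) refused: {errno.errorcode.get(err, err)}")
```

With a seccomp profile that filters unshare(2) the call would typically be refused with EPERM, but other factors (user namespace limits, other security modules) can also make it fail, so a refusal alone doesn't prove the seccomp profile is the cause.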
- Human readable description for the release notes
- A picture of a cute animal (not mandatory but encouraged)
Inspired from a submission at https://user.xmission.com/~emailbox/ascii_cats.htm: