On Mon, Jun 10, 2024 at 02:46:06AM -0700, Jonathan Calmels wrote:
On Sun, Jun 09, 2024 at 09:33:01PM GMT, Serge E. Hallyn wrote:
On Sun, Jun 09, 2024 at 03:43:35AM -0700, Jonathan Calmels wrote:
This patch adds a new capability security bit designed to constrain a task’s userns capability set to its bounding set. The reason for this is twofold:
- This serves as a quick and easy way to lock down a set of capabilities for a task, thus ensuring that any namespace it creates will never be more privileged than the task itself.
- This helps userspace transition to more secure defaults by not requiring specific logic for the userns capability set, or libcap support.
Example:
  # capsh --secbits=$((1 << 8)) --drop=cap_sys_rawio -- \
          -c 'unshare -r grep Cap /proc/self/status'
  CapInh: 0000000000000000
  CapPrm: 000001fffffdffff
  CapEff: 000001fffffdffff
  CapBnd: 000001fffffdffff
  CapAmb: 0000000000000000
  CapUNs: 000001fffffdffff
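For readers checking the masks in the example, the arithmetic can be sketched as follows. This is only an illustration: the capability numbers (CAP_SYS_RAWIO = 17, CAP_LAST_CAP = 40) are assumed from the uapi capability header of recent kernels and are not stated in this thread, and the helper names are made up for the sketch.

```python
# Assumed values, not taken from this thread; check
# include/uapi/linux/capability.h for the kernel in use.
CAP_SYS_RAWIO = 17
CAP_LAST_CAP = 40  # assumption: 41 capabilities are defined


def full_set(last_cap: int = CAP_LAST_CAP) -> int:
    """Mask with one bit set for every capability up to last_cap."""
    return (1 << (last_cap + 1)) - 1


def drop(mask: int, cap: int) -> int:
    """Clear a single capability bit from a mask."""
    return mask & ~(1 << cap)


def fmt(mask: int) -> str:
    """Format a mask like the Cap* lines in /proc/<pid>/status."""
    return f"{mask:016x}"


# Equivalent of --drop=cap_sys_rawio applied to a full bounding set.
bounding = drop(full_set(), CAP_SYS_RAWIO)
print("CapBnd:", fmt(bounding))
```

Under those assumptions the printed mask has bit 17 cleared, which is the single `d` nibble seen in CapPrm, CapEff, CapBnd and CapUNs above.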
But you are not (that I can see, in this or the previous patch) keeping SECURE_USERNS_STRICT_CAPS in securebits on the next level unshare. Though I think it's ok, because by then both cap_userns and cap_bset are reduced and cap_userns can't be expanded. (Sorry, just thinking aloud here)
Right, this is safe to reset, but maybe we should keep it if the secbit is locked? This is kind of a special case compared to the other bits.
I don't think it would be worth the extra complication in the secbits code, and it's semantically very different from the cap_userns.
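The argument that resetting the secbit is safe can be illustrated with a toy model. This is a sketch and not the kernel code: the `Cred` class and `unshare_userns` helper are hypothetical, the bit position comes from the `1 << 8` in the capsh example, and the model assumes that a new user namespace starts with permitted and bounding sets equal to the userns set and with securebits cleared.

```python
from dataclasses import dataclass

# Bit position taken from the capsh example; the constant name is
# shortened for the sketch.
STRICT = 1 << 8


@dataclass(frozen=True)
class Cred:
    bset: int        # bounding set
    userns: int      # userns capability set
    permitted: int   # permitted set
    securebits: int = 0


def unshare_userns(c: Cred) -> Cred:
    """Toy model of creating a user namespace (assumed semantics)."""
    userns = c.userns
    if c.securebits & STRICT:
        # Limit the userns set to the bounding set.
        userns &= c.bset
    # Assumption: the new namespace starts from the userns set and
    # securebits are reset, so the strict bit is not inherited.
    return Cred(bset=userns, userns=userns, permitted=userns, securebits=0)


full = (1 << 41) - 1
start = Cred(bset=full & ~(1 << 17), userns=full, permitted=full,
             securebits=STRICT)
first = unshare_userns(start)    # strict bit applies here
second = unshare_userns(first)   # strict bit already cleared
print(f"{second.permitted:016x}")
```

In this model the second-level namespace still cannot regain the dropped bit, because both sets were already reduced at the first unshare and nothing in the model expands the userns set afterwards.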
- /* Limit userns capabilities to our parent's bounding set. */
In the case of userns_install(), it will be the target user namespace creator's bounding set, right? Not "our parent's"?
Good point, I should reword this comment.