@@ -63,6 +63,38 @@ Since `pivot_root()` only takes path arguments the new rootfs would need to
 be passed via `/proc/<pid>/fd/<nr>`. In the long run we should add a new
 `pivot_root()` syscall operating on file descriptors instead of paths.
 
+### Create mount namespace with custom rootfs via `open_tree()` and `fsmount()`
+
+Add an `OPEN_TREE_NAMESPACE` flag to `open_tree()` and an `FSMOUNT_NAMESPACE` flag
+to `fsmount()` that create a new mount namespace with the specified mount tree
+as the rootfs mounted on top of a copy of the real rootfs. With these flags the
+calls return a namespace file descriptor instead of a mount file descriptor.
+
+This allows `OPEN_TREE_NAMESPACE` to function as a combined
+`unshare(CLONE_NEWNS)` and `pivot_root()`.
+
+When creating containers the setup usually involves using `CLONE_NEWNS` via
+`clone3()` or `unshare()`. This copies the caller's complete mount namespace.
+The runtime then assembles a new rootfs and uses `pivot_root()` to swap the
+old mount tree for the new rootfs. Afterward it recursively unmounts the old
+mount tree, discarding all of the mounts that were just copied.
+
+Copying all of these mounts only to get rid of them later is wasteful. With a
+large mount table and a system where thousands of containers are spawned in
+parallel, this quickly becomes a bottleneck that increases contention on the
+namespace semaphore.
+
+**Use-Case:** Container runtimes can create an extremely minimal rootfs
+directly:
+
+```c
+fd_mntns = open_tree(-EBADF, "/var/lib/containers/wootwoot", OPEN_TREE_NAMESPACE);
+```
+
+This creates a mount namespace in which "wootwoot" has become the rootfs. The
+caller can `setns()` into this new mount namespace and assemble additional
+mounts without copying and destroying the entire parent mount table.
+
 ### Query mount information via file descriptor with `statmount()`
 
 Extend `struct mnt_id_req` to accept a file descriptor and introduce
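A minimal sketch of how a runtime might consume the flag proposed in the hunk above, following the `open_tree()` then `setns()` sequence the new section describes. The helper name `enter_rootfs_namespace()` is hypothetical, and the fallback value for `OPEN_TREE_NAMESPACE` is an assumption used only when the installed headers do not define it. On a kernel without the flag the first call is expected to fail, so this only illustrates the intended call sequence. Raw syscalls are used so the sketch does not depend on libc wrappers being available.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <sys/syscall.h>
#include <unistd.h>

/* ASSUMPTION: illustrative fallback value; the authoritative definition
 * belongs to the kernel UAPI headers that ship the flag. */
#ifndef OPEN_TREE_NAMESPACE
#define OPEN_TREE_NAMESPACE (1 << 1)
#endif

/* Fallbacks for older userspace headers (generic syscall numbering). */
#ifndef SYS_open_tree
#define SYS_open_tree 428
#endif
#ifndef CLONE_NEWNS
#define CLONE_NEWNS 0x00020000
#endif

/*
 * Hypothetical helper: ask the kernel for a new mount namespace rooted at
 * `rootfs`, then switch the calling thread into it.
 * Returns 0 on success or a negative errno value on failure.
 */
static int enter_rootfs_namespace(const char *rootfs)
{
	int fd_mntns, err;

	/* With an absolute path the dirfd is ignored; -EBADF mirrors the
	 * call shown in the patch. O_CLOEXEC is OPEN_TREE_CLOEXEC. */
	fd_mntns = (int)syscall(SYS_open_tree, -EBADF, rootfs,
				OPEN_TREE_NAMESPACE | O_CLOEXEC);
	if (fd_mntns < 0)
		return -errno;

	/* Join the namespace the returned file descriptor refers to. */
	if (syscall(SYS_setns, fd_mntns, CLONE_NEWNS) < 0) {
		err = errno;
		close(fd_mntns);
		return -err;
	}

	close(fd_mntns);
	return 0;
}
```

A caller would invoke `enter_rootfs_namespace("/var/lib/containers/wootwoot")` and, on success, continue assembling additional mounts inside the new namespace. A negative return carries the errno from whichever step failed, for example when the kernel rejects the flag or the caller lacks the required privileges.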