Commit 55503c9

wishlist: add create mount namespaces with custom rootfs extension
Signed-off-by: Christian Brauner <brauner@kernel.org>
1 parent 3af682a commit 55503c9

1 file changed

Lines changed: 32 additions & 0 deletions

README.md

@@ -63,6 +63,38 @@ Since `pivot_root()` only takes path arguments the new rootfs would need to
be passed via `/proc/<pid>/fd/<nr>`. In the long run we should add a new
`pivot_root()` syscall operating on file descriptors instead of paths.

### Create mount namespace with custom rootfs via `open_tree()` and `fsmount()`

Add `OPEN_TREE_NAMESPACE` flag to `open_tree()` and `FSMOUNT_NAMESPACE` flag
to `fsmount()` that create a new mount namespace with the specified mount tree
as the rootfs mounted on top of a copy of the real rootfs. These return a
namespace file descriptor instead of a mount file descriptor.

This allows `OPEN_TREE_NAMESPACE` to function as a combined
`unshare(CLONE_NEWNS)` and `pivot_root()`.

When creating containers the setup usually involves using `CLONE_NEWNS` via
`clone3()` or `unshare()`. This copies the caller's complete mount namespace.
The runtime will also assemble a new rootfs and then use `pivot_root()` to
switch the old mount tree with the new rootfs. Afterward it will recursively
unmount the old mount tree thereby getting rid of all mounts.
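
For comparison, the conventional sequence described above can be sketched roughly as follows. This is a minimal illustration, not code from any particular runtime: the helper name `classic_switch_root()` is hypothetical, error handling is reduced to returning `-1`, and the `pivot_root(".", ".")` idiom is the one documented in pivot_root(2). It needs `CAP_SYS_ADMIN` and a caller that does not share filesystem attributes with other threads.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <sys/mount.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: the conventional "copy everything, then discard it"
 * setup. Returns 0 on success, -1 on error with errno set.
 */
int classic_switch_root(const char *new_root)
{
	assert(new_root != NULL);

	/* Copy the caller's entire mount namespace. */
	if (unshare(CLONE_NEWNS) < 0)
		return -1;

	/* Keep mount events from propagating back to the parent namespace. */
	if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) < 0)
		return -1;

	/* pivot_root() requires the new root to be a mount point. */
	if (mount(new_root, new_root, NULL, MS_BIND | MS_REC, NULL) < 0)
		return -1;

	if (chdir(new_root) < 0)
		return -1;

	/* glibc has no wrapper; this stacks the old root on top of the new one. */
	if (syscall(SYS_pivot_root, ".", ".") < 0)
		return -1;

	/* Lazily detach the old root together with every mount below it. */
	if (umount2(".", MNT_DETACH) < 0)
		return -1;

	return chdir("/");
}
```

Every mount copied by `unshare()` in the first step is torn down again by the final `umount2()`, which is the overhead the proposed flags aim to avoid.
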

Copying all of these mounts only to get rid of them later is wasteful. With a
large mount table and a system where thousands of containers are spawned in
parallel this quickly becomes a bottleneck, increasing contention on the
mount namespace semaphore (`namespace_sem`).

**Use-Case:** Container runtimes can create an extremely minimal rootfs
directly:

```c
fd_mntns = open_tree(-EBADF, "/var/lib/containers/wootwoot", OPEN_TREE_NAMESPACE);
```

This creates a mount namespace where "wootwoot" has become the rootfs. The
caller can `setns()` into this new mount namespace and assemble additional
mounts without copying and destroying the entire parent mount table.
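
Entering the returned namespace and adding mounts might look like the sketch below. It only uses calls that exist today (`setns()` and `mount()`); `fd_mntns` is assumed to be a mount namespace file descriptor such as the one the proposed `OPEN_TREE_NAMESPACE` flag would return, which is a wishlist item and may not be available on a given kernel. The helper name `enter_and_populate()` and the tmpfs on `/tmp` are illustrative assumptions, and `/tmp` is assumed to exist in the new rootfs.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/mount.h>
#include <unistd.h>

/*
 * Hypothetical helper: join the mount namespace referred to by fd_mntns and
 * add one example mount. Returns 0 on success, -1 on error with errno set.
 */
int enter_and_populate(int fd_mntns)
{
	/*
	 * Switch into the new mount namespace. This requires CAP_SYS_ADMIN
	 * and CAP_SYS_CHROOT, and fails if the caller shares filesystem
	 * attributes (CLONE_FS) with another thread.
	 */
	if (setns(fd_mntns, CLONE_NEWNS) < 0)
		return -1;

	/* Example mount only: a private tmpfs on /tmp inside the new rootfs. */
	if (mount("tmpfs", "/tmp", "tmpfs", MS_NOSUID | MS_NODEV, "mode=1777") < 0)
		return -1;

	return 0;
}
```

Only the mounts the runtime explicitly adds end up in the namespace, so nothing from the parent mount table has to be unmounted afterward.
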

### Query mount information via file descriptor with `statmount()`

Extend `struct mnt_id_req` to accept a file descriptor and introduce
