diff --git a/docs/reference/commandline/daemon.md b/docs/reference/commandline/daemon.md index a4606266c9..8bc30a8eca 100644 --- a/docs/reference/commandline/daemon.md +++ b/docs/reference/commandline/daemon.md @@ -62,6 +62,7 @@ weight = -1 --tlscert="~/.docker/cert.pem" Path to TLS certificate file --tlskey="~/.docker/key.pem" Path to TLS key file --tlsverify Use TLS and verify the remote + --userns-remap="default" Enable user namespace remapping --userland-proxy=true Use userland proxy for loopback traffic Options with [] may be specified multiple times. @@ -632,6 +633,133 @@ For information about how to create an authorization plugin, see [authorization plugin](../../extend/authorization.md) section in the Docker extend section of this documentation. +## Daemon user namespace options + +The Linux kernel [user namespace support](http://man7.org/linux/man-pages/man7/user_namespaces.7.html) provides additional security by enabling +a process, and therefore a container, to have a unique range of user and +group IDs which are outside the traditional user and group range utilized by +the host system. Potentially the most important security improvement is that, +by default, container processes running as the `root` user will have expected +administrative privilege (with some restrictions) inside the container but will +effectively be mapped to an unprivileged `uid` on the host. + +When user namespace support is enabled, Docker creates a single daemon-wide mapping +for all containers running on the same engine instance. The mappings will +utilize the existing subordinate user and group ID feature available on all modern +Linux distributions. +The [`/etc/subuid`](http://man7.org/linux/man-pages/man5/subuid.5.html) and +[`/etc/subgid`](http://man7.org/linux/man-pages/man5/subgid.5.html) files will be +read for the user, and optional group, specified to the `--userns-remap` +parameter. If you do not wish to specify your own user and/or group, you can +provide `default` as the value to this flag, and a user will be created on your behalf +and provided subordinate uid and gid ranges. This default user will be named +`dockremap`, and entries will be created for it in `/etc/passwd` and +`/etc/group` using your distro's standard user and group creation tools. + +> **Note**: The single mapping per-daemon restriction is in place for now +> because Docker shares image layers from its local cache across all +> containers running on the engine instance. Since file ownership must be +> the same for all containers sharing the same layer content, the decision +> was made to map the file ownership on `docker pull` to the daemon's user and +> group mappings so that there is no delay for running containers once the +> content is downloaded. This design preserves the same performance for `docker +> pull`, `docker push`, and container startup as users expect with +> user namespaces disabled. + +### Starting the daemon with user namespaces enabled + +To enable user namespace support, start the daemon with the +`--userns-remap` flag, which accepts values in the following formats: + + - uid + - uid:gid + - username + - username:groupname + +If numeric IDs are provided, translation back to valid user or group names +will occur so that the subordinate uid and gid information can be read, given +these resources are name-based, not id-based. If the numeric ID information +provided does not exist as entries in `/etc/passwd` or `/etc/group`, daemon +startup will fail with an error message. + +*Example: starting with default Docker user management:* + +``` + $ docker daemon --userns-remap=default +``` +When `default` is provided, Docker will create - or find the existing - user and group +named `dockremap`. If the user is created, and the Linux distribution has +appropriate support, the `/etc/subuid` and `/etc/subgid` files will be populated +with a contiguous 65536 length range of subordinate user and group IDs, starting +at an offset based on prior entries in those files. For example, Ubuntu will +create the following range, based on an existing user named `user1` already owning +the first 65536 range: + +``` + $ cat /etc/subuid + user1:100000:65536 + dockremap:165536:65536 +``` + +> **Note:** On a fresh Fedora install, we had to `touch` the +> `/etc/subuid` and `/etc/subgid` files to have ranges assigned when users +> were created. Once these files existed, range assignment on user creation +> worked properly. + +If you have a preferred/self-managed user with subordinate ID mappings already +configured, you can provide that username or uid to the `--userns-remap` flag. +If you have a group that doesn't match the username, you may provide the `gid` +or group name as well; otherwise the username will be used as the group name +when querying the system for the subordinate group ID range. + +### Detailed information on `subuid`/`subgid` ranges + +Given potential advanced use of the subordinate ID ranges by power users, the +following paragraphs define how the Docker daemon currently uses the range entries +found within the subordinate range files. + +The simplest case is that only one contiguous range is defined for the +provided user or group. In this case, Docker will use that entire contiguous +range for the mapping of host uids and gids to the container process. This +means that the first ID in the range will be the remapped root user, and the +IDs above that initial ID will map host ID 1 through the end of the range. + +From the example `/etc/subid` content shown above, the remapped root +user would be uid 165536. + +If the system administrator has set up multiple ranges for a single user or +group, the Docker daemon will read all the available ranges and use the +following algorithm to create the mapping ranges: + +1. The range segments found for the particular user will be sorted by *start ID* ascending. +2. Map segments will be created from each range in increasing value with a length matching the length of each segment. Therefore the range segment with the lowest numeric starting value will be equal to the remapped root, and continue up through host uid/gid equal to the range segment length. As an example, if the lowest segment starts at ID 1000 and has a length of 100, then a map of 1000 -> 0 (the remapped root) up through 1100 -> 100 will be created from this segment. If the next segment starts at ID 10000, then the next map will start with mapping 10000 -> 101 up to the length of this second segment. This will continue until no more segments are found in the subordinate files for this user. +3. If more than five range segments exist for a single user, only the first five will be utilized, matching the kernel's limitation of only five entries in `/proc/self/uid_map` and `proc/self/gid_map`. + +### User namespace known restrictions + +The following standard Docker features are currently incompatible when +running a Docker daemon with user namespaces enabled: + + - sharing PID or NET namespaces with the host (`--pid=host` or `--net=host`) + - sharing a network namespace with an existing container (`--net=container:*other*`) + - sharing an IPC namespace with an existing container (`--ipc=container:*other*`) + - A `--readonly` container filesystem (this is a Linux kernel restriction against remounting with modified flags of a currently mounted filesystem when inside a user namespace) + - external (volume or graph) drivers which are unaware/incapable of using daemon user mappings + - Using `--privileged` mode flag on `docker run` + +In general, user namespaces are an advanced feature and will require +coordination with other capabilities. For example, if volumes are mounted from +the host, file ownership will have to be pre-arranged if the user or +administrator wishes the containers to have expected access to the volume +contents. + +Finally, while the `root` user inside a user namespaced container process has +many of the expected admin privileges that go along with being the superuser, the +Linux kernel has restrictions based on internal knowledge that this is a user namespaced +process. The most notable restriction that we are aware of at this time is the +inability to use `mknod`. Permission will be denied for device creation even as +container `root` inside a user namespace. + ## Miscellaneous options IP masquerading uses address translation to allow containers without a public diff --git a/experimental/README.md b/experimental/README.md index d2eff37d8d..659780e3fa 100644 --- a/experimental/README.md +++ b/experimental/README.md @@ -72,7 +72,7 @@ to build a Docker binary with the experimental features enabled: ## Current experimental features * [External graphdriver plugins](plugins_graphdriver.md) - * [User namespaces](userns.md) + * The user namespaces feature has graduated from experimental. ## How to comment on an experimental feature diff --git a/experimental/userns.md b/experimental/userns.md deleted file mode 100644 index cb713f7d65..0000000000 --- a/experimental/userns.md +++ /dev/null @@ -1,119 +0,0 @@ -# Experimental: User namespace support - -Linux kernel [user namespace support](http://man7.org/linux/man-pages/man7/user_namespaces.7.html) provides additional security by enabling -a process--and therefore a container--to have a unique range of user and -group IDs which are outside the traditional user and group range utilized by -the host system. Potentially the most important security improvement is that, -by default, container processes running as the `root` user will have expected -administrative privilege (with some restrictions) inside the container but will -effectively be mapped to an unprivileged `uid` on the host. - -In this experimental phase, the Docker daemon creates a single daemon-wide mapping -for all containers running on the same engine instance. The mappings will -utilize the existing subordinate user and group ID feature available on all modern -Linux distributions. -The [`/etc/subuid`](http://man7.org/linux/man-pages/man5/subuid.5.html) and -[`/etc/subgid`](http://man7.org/linux/man-pages/man5/subgid.5.html) files will be -read for the user, and optional group, specified to the `--userns-remap` -parameter. If you do not wish to specify your own user and/or group, you can -provide `default` as the value to this flag, and a user will be created on your behalf -and provided subordinate uid and gid ranges. This default user will be named -`dockremap`, and entries will be created for it in `/etc/passwd` and -`/etc/group` using your distro's standard user and group creation tools. - -> **Note**: The single mapping per-daemon restriction exists for this experimental -> phase because Docker shares image layers from its local cache across all -> containers running on the engine instance. Since file ownership must be -> the same for all containers sharing the same layer content, the decision -> was made to map the file ownership on `docker pull` to the daemon's user and -> group mappings so that there is no delay for running containers once the -> content is downloaded--exactly the same performance characteristics as with -> user namespaces disabled. - -## Starting the daemon with user namespaces enabled -To enable this experimental user namespace support for a Docker daemon instance, -start the daemon with the aforementioned `--userns-remap` flag, which accepts -values in the following formats: - - - uid - - uid:gid - - username - - username:groupname - -If numeric IDs are provided, translation back to valid user or group names -will occur so that the subordinate uid and gid information can be read, given -these resources are name-based, not id-based. If the numeric ID information -provided does not exist as entries in `/etc/passwd` or `/etc/group`, daemon -startup will fail with an error message. - -*An example: starting with default Docker user management:* - -``` - $ docker daemon --userns-remap=default -``` -In this case, Docker will create--or find the existing--user and group -named `dockremap`. If the user is created, and the Linux distribution has -appropriate support, the `/etc/subuid` and `/etc/subgid` files will be populated -with a contiguous 65536 length range of subordinate user and group IDs, starting -at an offset based on prior entries in those files. For example, Ubuntu will -create the following range, based on an existing user already having the first -65536 range: - -``` - $ cat /etc/subuid - user1:100000:65536 - dockremap:165536:65536 -``` - -> **Note:** On a fresh Fedora install, we found that we had to `touch` the -> `/etc/subuid` and `/etc/subgid` files to have ranges assigned when users -> were created. Once these files existed, range assignment on user creation -> worked properly. - -If you have a preferred/self-managed user with subordinate ID mappings already -configured, you can provide that username or uid to the `--userns-remap` flag. -If you have a group that doesn't match the username, you may provide the `gid` -or group name as well; otherwise the username will be used as the group name -when querying the system for the subordinate group ID range. - -## Detailed information on `subuid`/`subgid` ranges - -Given there may be advanced use of the subordinate ID ranges by power users, we will -describe how the Docker daemon uses the range entries within these files under the -current experimental user namespace support. - -The simplest case exists where only one contiguous range is defined for the -provided user or group. In this case, Docker will use that entire contiguous -range for the mapping of host uids and gids to the container process. This -means that the first ID in the range will be the remapped root user, and the -IDs above that initial ID will map host ID 1 through the end of the range. - -From the example `/etc/subid` content shown above, that means the remapped root -user would be uid 165536. - -If the system administrator has set up multiple ranges for a single user or -group, the Docker daemon will read all the available ranges and use the -following algorithm to create the mapping ranges: - -1. The ranges will be sorted by *start ID* ascending -2. Maps will be created from each range with where the host ID will increment starting at 0 for the first range, 0+*range1* length for the second, and so on. This means that the lowest range start ID will be the remapped root, and all further ranges will map IDs from 1 through the uid or gid that equals the sum of all range lengths. -3. Ranges segments above five will be ignored as the kernel ignores any ID maps after five (in `/proc/self/{u,g}id_map`) - -## User namespace known restrictions - -The following standard Docker features are currently incompatible when -running a Docker daemon with experimental user namespaces enabled: - - - sharing namespaces with the host (--pid=host, --net=host, etc.) - - sharing namespaces with other containers (--net=container:*other*) - - A `--readonly` container filesystem (a Linux kernel restriction on remount with new flags of a currently mounted filesystem when inside a user namespace) - - external (volume/graph) drivers which are unaware/incapable of using daemon user mappings - - Using `--privileged` mode containers - - volume use without pre-arranging proper file ownership in mounted volumes - -Additionally, while the `root` user inside a user namespaced container -process has many of the privileges of the administrative root user, the -following operations will fail: - - - Use of `mknod` - permission is denied for device creation by the container root - - others will be listed here when fully tested diff --git a/man/docker-daemon.8.md b/man/docker-daemon.8.md index 8001c72d65..8e4a3acc0b 100644 --- a/man/docker-daemon.8.md +++ b/man/docker-daemon.8.md @@ -53,6 +53,7 @@ docker-daemon - Enable daemon mode [**--tlskey**[=*~/.docker/key.pem*]] [**--tlsverify**] [**--userland-proxy**[=*true*]] +[**--userns-remap**[=*default*]] # DESCRIPTION **docker** has two distinct functions. It is used for starting the Docker @@ -223,6 +224,9 @@ unix://[/path/to/socket] to use. **--userland-proxy**=*true*|*false* Rely on a userland proxy implementation for inter-container and outside-to-container loopback communications. Default is true. +**--userns-remap**=*default*|*uid:gid*|*user:group*|*user*|*uid* + Enable user namespaces for containers on the daemon. Specifying "default" will cause a new user and group to be created to handle UID and GID range remapping for the user namespace mappings used for contained processes. Specifying a user (or uid) and optionally a group (or gid) will cause the daemon to lookup the user and group's subordinate ID ranges for use as the user namespace mappings for contained processes. + # STORAGE DRIVER OPTIONS Docker uses storage backends (known as "graphdrivers" in the Docker