cf3f902df4
full diff: https://github.com/opencontainers/runc/compare/v1.0.0-rc8...3e425f80a8c931f88e6d94a8c831b9d5aa481657 - opencontainers/runc#2010 criu image path permission error when checkpoint rootless container - opencontainers/runc#2028 Update to Go 1.12 and drop obsolete versions - opencontainers/runc#2029 Update dependencies - opencontainers/runc#2034 Support for logging from children processes - opencontainers/runc#2035 specconv: always set "type: bind" in case of MS_BIND - opencontainers/runc#2038 `r.destroy` can defer exec in `runner.run` method - opencontainers/runc#2041 Change the permissions of the notify listener socket to rwx for everyone - opencontainers/runc#2042 libcontainer: intelrdt: add missing destroy handler in defer func - opencontainers/runc#2047 Move systemd.Manager initialization into a function in that module - opencontainers/runc#2057 main: not reopen /dev/stderr - closes opencontainers/runc#2056 Runc + podman|cri-o + systemd issue with stderr - closes kubernetes/kubernetes#77615 kubelet fails starting CRI-O containers (Ubuntu 18.04 + systemd cgroups driver) - closes cri-o/cri-o#2368 Joining worker node not starting flannel or kube-proxy / CRI-O error "open /dev/stderr: no such device or address" - opencontainers/runc#2061 libcontainer: fix TestGetContainerState to check configs.NEWCGROUP - opencontainers/runc#2065 Fix cgroup hugetlb size prefix for kB - opencontainers/runc#2067 libcontainer: change seccomp test for clone syscall - opencontainers/runc#2074 Update dependency libseccomp-golang - opencontainers/runc#2081 Bump CRIU to 3.12 - opencontainers/runc#2089 doc: First process in container needs `Init: true` - opencontainers/runc#2094 Skip searching /dev/.udev for device nodes - closes opencontainers/runc#2093 HostDevices() race with older udevd versions - opencontainers/runc#2098 man: fix man-pages - opencontainers/runc#2103 cgroups/fs: check nil pointers in cgroup manager - opencontainers/runc#2107 Make get devices function public - opencontainers/runc#2113 libcontainer: initial support for cgroups v2 - opencontainers/runc#2116 Avoid the dependency on cgo through go-systemd/util package - removes github.com/coreos/pkg as dependency - opencontainers/runc#2117 Remove libcontainer detection for systemd features - fixes opencontainers/runc#2117 Cache the systemd detection results - opencontainers/runc#2119 libcontainer: update masked paths of /proc - relates to #36368 Add /proc/keys to masked paths - relates to #38299 Masked /proc/asound - relates to #37404 Add /proc/acpi to masked paths (CVE-2018-10892) - opencontainers/runc#2122 nsenter: minor fixes - opencontainers/runc#2123 Bump x/sys and update syscall for initial Risc-V support - opencontainers/runc#2125 cgroup: support mount of cgroup2 - opencontainers/runc#2126 libcontainer/nsenter: Don't import C in non-cgo file - opencontainers/runc#2129 Only allow proc mount if it is procfs - addresses opencontainers/runc#2129 AppArmor can be bypassed by a malicious image that specifies a volume at /proc (CVE-2019-16884) Signed-off-by: Sebastiaan van Stijn <github@gone.nl> |
||
---|---|---|
.. | ||
README.md | ||
cloned_binary.c | ||
namespace.h | ||
nsenter.go | ||
nsenter_gccgo.go | ||
nsenter_unsupported.go | ||
nsexec.c |
README.md
nsenter
The nsenter
package registers a special init constructor that is called before
the Go runtime has a chance to boot. This provides us the ability to setns
on
existing namespaces and avoid the issues that the Go runtime has with multiple
threads. This constructor will be called if this package is registered,
imported, in your go application.
The nsenter
package will import "C"
and it uses cgo
package. In cgo, if the import of "C" is immediately preceded by a comment, that comment,
called the preamble, is used as a header when compiling the C parts of the package.
So every time we import package nsenter
, the C code function nsexec()
would be
called. And package nsenter
is only imported in init.go
, so every time the runc
init
command is invoked, that C code is run.
Because nsexec()
must be run before the Go runtime in order to use the
Linux kernel namespace, you must import
this library into a package if
you plan to use libcontainer
directly. Otherwise Go will not execute
the nsexec()
constructor, which means that the re-exec will not cause
the namespaces to be joined. You can import it like this:
import _ "github.com/opencontainers/runc/libcontainer/nsenter"
nsexec()
will first get the file descriptor number for the init pipe
from the environment variable _LIBCONTAINER_INITPIPE
(which was opened
by the parent and kept open across the fork-exec of the nsexec()
init
process). The init pipe is used to read bootstrap data (namespace paths,
clone flags, uid and gid mappings, and the console path) from the parent
process. nsexec()
will then call setns(2)
to join the namespaces
provided in the bootstrap data (if available), clone(2)
a child process
with the provided clone flags, update the user and group ID mappings, do
some further miscellaneous setup steps, and then send the PID of the
child process to the parent of the nsexec()
"caller". Finally,
the parent nsexec()
will exit and the child nsexec()
process will
return to allow the Go runtime take over.
NOTE: We do both setns(2)
and clone(2)
even if we don't have any
CLONE_NEW*
clone flags because we must fork a new process in order to
enter the PID namespace.