DockerCLI/vendor/github.com/opencontainers/runc/libcontainer
Sebastiaan van Stijn cf3f902df4
update runc to v1.0.0-rc8-92-g84373aaa (CVE-2019-16884)
full diff: https://github.com/opencontainers/runc/compare/v1.0.0-rc8...3e425f80a8c931f88e6d94a8c831b9d5aa481657

  - opencontainers/runc#2010 criu image path permission error when checkpoint rootless container
  - opencontainers/runc#2028 Update to Go 1.12 and drop obsolete versions
  - opencontainers/runc#2029 Update dependencies
  - opencontainers/runc#2034 Support for logging from children processes
  - opencontainers/runc#2035 specconv: always set "type: bind" in case of MS_BIND
  - opencontainers/runc#2038 `r.destroy` can defer exec in `runner.run` method
  - opencontainers/runc#2041 Change the permissions of the notify listener socket to rwx for everyone
  - opencontainers/runc#2042 libcontainer: intelrdt: add missing destroy handler in defer func
  - opencontainers/runc#2047 Move systemd.Manager initialization into a function in that module
  - opencontainers/runc#2057 main: not reopen /dev/stderr
      - closes opencontainers/runc#2056 Runc + podman|cri-o + systemd issue with stderr
      - closes kubernetes/kubernetes#77615 kubelet fails starting CRI-O containers (Ubuntu 18.04 + systemd cgroups driver)
      - closes cri-o/cri-o#2368 Joining worker node not starting flannel or kube-proxy / CRI-O error "open /dev/stderr: no such device or address"
  - opencontainers/runc#2061 libcontainer: fix TestGetContainerState to check configs.NEWCGROUP
  - opencontainers/runc#2065 Fix cgroup hugetlb size prefix for kB
  - opencontainers/runc#2067 libcontainer: change seccomp test for clone syscall
  - opencontainers/runc#2074 Update dependency libseccomp-golang
  - opencontainers/runc#2081 Bump CRIU to 3.12
  - opencontainers/runc#2089 doc: First process in container needs `Init: true`
  - opencontainers/runc#2094 Skip searching /dev/.udev for device nodes
      - closes opencontainers/runc#2093 HostDevices() race with older udevd versions
  - opencontainers/runc#2098 man: fix man-pages
  - opencontainers/runc#2103 cgroups/fs: check nil pointers in cgroup manager
  - opencontainers/runc#2107 Make get devices function public
  - opencontainers/runc#2113 libcontainer: initial support for cgroups v2
  - opencontainers/runc#2116 Avoid the dependency on cgo through go-systemd/util package
      - removes github.com/coreos/pkg as dependency
  - opencontainers/runc#2117 Remove libcontainer detection for systemd features
      - fixes opencontainers/runc#2117 Cache the systemd detection results
  - opencontainers/runc#2119 libcontainer: update masked paths of /proc
      - relates to #36368 Add /proc/keys to masked paths
      - relates to #38299 Masked /proc/asound
      - relates to #37404 Add /proc/acpi to masked paths (CVE-2018-10892)
  - opencontainers/runc#2122 nsenter: minor fixes
  - opencontainers/runc#2123 Bump x/sys and update syscall for initial Risc-V support
  - opencontainers/runc#2125 cgroup: support mount of cgroup2
  - opencontainers/runc#2126 libcontainer/nsenter: Don't import C in non-cgo file
  - opencontainers/runc#2129 Only allow proc mount if it is procfs
      - addresses opencontainers/runc#2129 AppArmor can be bypassed by a malicious image that specifies a volume at /proc (CVE-2019-16884)

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-12-12 21:21:27 +01:00
..
nsenter update runc to v1.0.0-rc8-92-g84373aaa (CVE-2019-16884) 2019-12-12 21:21:27 +01:00
system update runc to v1.0.0-rc8-92-g84373aaa (CVE-2019-16884) 2019-12-12 21:21:27 +01:00
user Update containerd to 1.2.1 2019-02-02 13:30:55 +01:00
README.md update runc to v1.0.0-rc8-92-g84373aaa (CVE-2019-16884) 2019-12-12 21:21:27 +01:00

README.md

libcontainer

GoDoc

Libcontainer provides a native Go implementation for creating containers with namespaces, cgroups, capabilities, and filesystem access controls. It allows you to manage the lifecycle of the container performing additional operations after the container is created.

Container

A container is a self contained execution environment that shares the kernel of the host system and which is (optionally) isolated from other containers in the system.

Using libcontainer

Because containers are spawned in a two step process you will need a binary that will be executed as the init process for the container. In libcontainer, we use the current binary (/proc/self/exe) to be executed as the init process, and use arg "init", we call the first step process "bootstrap", so you always need a "init" function as the entry of "bootstrap".

In addition to the go init function the early stage bootstrap is handled by importing nsenter.

import (
	_ "github.com/opencontainers/runc/libcontainer/nsenter"
)

func init() {
	if len(os.Args) > 1 && os.Args[1] == "init" {
		runtime.GOMAXPROCS(1)
		runtime.LockOSThread()
		factory, _ := libcontainer.New("")
		if err := factory.StartInitialization(); err != nil {
			logrus.Fatal(err)
		}
		panic("--this line should have never been executed, congratulations--")
	}
}

Then to create a container you first have to initialize an instance of a factory that will handle the creation and initialization for a container.

factory, err := libcontainer.New("/var/lib/container", libcontainer.Cgroupfs, libcontainer.InitArgs(os.Args[0], "init"))
if err != nil {
	logrus.Fatal(err)
	return
}

Once you have an instance of the factory created we can create a configuration struct describing how the container is to be created. A sample would look similar to this:

defaultMountFlags := unix.MS_NOEXEC | unix.MS_NOSUID | unix.MS_NODEV
config := &configs.Config{
	Rootfs: "/your/path/to/rootfs",
	Capabilities: &configs.Capabilities{
                Bounding: []string{
                        "CAP_CHOWN",
                        "CAP_DAC_OVERRIDE",
                        "CAP_FSETID",
                        "CAP_FOWNER",
                        "CAP_MKNOD",
                        "CAP_NET_RAW",
                        "CAP_SETGID",
                        "CAP_SETUID",
                        "CAP_SETFCAP",
                        "CAP_SETPCAP",
                        "CAP_NET_BIND_SERVICE",
                        "CAP_SYS_CHROOT",
                        "CAP_KILL",
                        "CAP_AUDIT_WRITE",
                },
                Effective: []string{
                        "CAP_CHOWN",
                        "CAP_DAC_OVERRIDE",
                        "CAP_FSETID",
                        "CAP_FOWNER",
                        "CAP_MKNOD",
                        "CAP_NET_RAW",
                        "CAP_SETGID",
                        "CAP_SETUID",
                        "CAP_SETFCAP",
                        "CAP_SETPCAP",
                        "CAP_NET_BIND_SERVICE",
                        "CAP_SYS_CHROOT",
                        "CAP_KILL",
                        "CAP_AUDIT_WRITE",
                },
                Inheritable: []string{
                        "CAP_CHOWN",
                        "CAP_DAC_OVERRIDE",
                        "CAP_FSETID",
                        "CAP_FOWNER",
                        "CAP_MKNOD",
                        "CAP_NET_RAW",
                        "CAP_SETGID",
                        "CAP_SETUID",
                        "CAP_SETFCAP",
                        "CAP_SETPCAP",
                        "CAP_NET_BIND_SERVICE",
                        "CAP_SYS_CHROOT",
                        "CAP_KILL",
                        "CAP_AUDIT_WRITE",
                },
                Permitted: []string{
                        "CAP_CHOWN",
                        "CAP_DAC_OVERRIDE",
                        "CAP_FSETID",
                        "CAP_FOWNER",
                        "CAP_MKNOD",
                        "CAP_NET_RAW",
                        "CAP_SETGID",
                        "CAP_SETUID",
                        "CAP_SETFCAP",
                        "CAP_SETPCAP",
                        "CAP_NET_BIND_SERVICE",
                        "CAP_SYS_CHROOT",
                        "CAP_KILL",
                        "CAP_AUDIT_WRITE",
                },
                Ambient: []string{
                        "CAP_CHOWN",
                        "CAP_DAC_OVERRIDE",
                        "CAP_FSETID",
                        "CAP_FOWNER",
                        "CAP_MKNOD",
                        "CAP_NET_RAW",
                        "CAP_SETGID",
                        "CAP_SETUID",
                        "CAP_SETFCAP",
                        "CAP_SETPCAP",
                        "CAP_NET_BIND_SERVICE",
                        "CAP_SYS_CHROOT",
                        "CAP_KILL",
                        "CAP_AUDIT_WRITE",
                },
        },
	Namespaces: configs.Namespaces([]configs.Namespace{
		{Type: configs.NEWNS},
		{Type: configs.NEWUTS},
		{Type: configs.NEWIPC},
		{Type: configs.NEWPID},
		{Type: configs.NEWUSER},
		{Type: configs.NEWNET},
		{Type: configs.NEWCGROUP},
	}),
	Cgroups: &configs.Cgroup{
		Name:   "test-container",
		Parent: "system",
		Resources: &configs.Resources{
			MemorySwappiness: nil,
			AllowAllDevices:  nil,
			AllowedDevices:   configs.DefaultAllowedDevices,
		},
	},
	MaskPaths: []string{
		"/proc/kcore",
		"/sys/firmware",
	},
	ReadonlyPaths: []string{
		"/proc/sys", "/proc/sysrq-trigger", "/proc/irq", "/proc/bus",
	},
	Devices:  configs.DefaultAutoCreatedDevices,
	Hostname: "testing",
	Mounts: []*configs.Mount{
		{
			Source:      "proc",
			Destination: "/proc",
			Device:      "proc",
			Flags:       defaultMountFlags,
		},
		{
			Source:      "tmpfs",
			Destination: "/dev",
			Device:      "tmpfs",
			Flags:       unix.MS_NOSUID | unix.MS_STRICTATIME,
			Data:        "mode=755",
		},
		{
			Source:      "devpts",
			Destination: "/dev/pts",
			Device:      "devpts",
			Flags:       unix.MS_NOSUID | unix.MS_NOEXEC,
			Data:        "newinstance,ptmxmode=0666,mode=0620,gid=5",
		},
		{
			Device:      "tmpfs",
			Source:      "shm",
			Destination: "/dev/shm",
			Data:        "mode=1777,size=65536k",
			Flags:       defaultMountFlags,
		},
		{
			Source:      "mqueue",
			Destination: "/dev/mqueue",
			Device:      "mqueue",
			Flags:       defaultMountFlags,
		},
		{
			Source:      "sysfs",
			Destination: "/sys",
			Device:      "sysfs",
			Flags:       defaultMountFlags | unix.MS_RDONLY,
		},
	},
	UidMappings: []configs.IDMap{
		{
			ContainerID: 0,
			HostID: 1000,
			Size: 65536,
		},
	},
	GidMappings: []configs.IDMap{
		{
			ContainerID: 0,
			HostID: 1000,
			Size: 65536,
		},
	},
	Networks: []*configs.Network{
		{
			Type:    "loopback",
			Address: "127.0.0.1/0",
			Gateway: "localhost",
		},
	},
	Rlimits: []configs.Rlimit{
		{
			Type: unix.RLIMIT_NOFILE,
			Hard: uint64(1025),
			Soft: uint64(1025),
		},
	},
}

Once you have the configuration populated you can create a container:

container, err := factory.Create("container-id", config)
if err != nil {
	logrus.Fatal(err)
	return
}

To spawn bash as the initial process inside the container and have the processes pid returned in order to wait, signal, or kill the process:

process := &libcontainer.Process{
	Args:   []string{"/bin/bash"},
	Env:    []string{"PATH=/bin"},
	User:   "daemon",
	Stdin:  os.Stdin,
	Stdout: os.Stdout,
	Stderr: os.Stderr,
	Init:   true,
}

err := container.Run(process)
if err != nil {
	container.Destroy()
	logrus.Fatal(err)
	return
}

// wait for the process to finish.
_, err := process.Wait()
if err != nil {
	logrus.Fatal(err)
}

// destroy the container.
container.Destroy()

Additional ways to interact with a running container are:

// return all the pids for all processes running inside the container.
processes, err := container.Processes()

// get detailed cpu, memory, io, and network statistics for the container and
// it's processes.
stats, err := container.Stats()

// pause all processes inside the container.
container.Pause()

// resume all paused processes.
container.Resume()

// send signal to container's init process.
container.Signal(signal)

// update container resource constraints.
container.Set(config)

// get current status of the container.
status, err := container.Status()

// get current container's state information.
state, err := container.State()

Checkpoint & Restore

libcontainer now integrates CRIU for checkpointing and restoring containers. This let's you save the state of a process running inside a container to disk, and then restore that state into a new process, on the same machine or on another machine.

criu version 1.5.2 or higher is required to use checkpoint and restore. If you don't already have criu installed, you can build it from source, following the online instructions. criu is also installed in the docker image generated when building libcontainer with docker.

Code and documentation copyright 2014 Docker, inc. The code and documentation are released under the Apache 2.0 license. The documentation is also released under Creative Commons Attribution 4.0 International License. You may obtain a copy of the license, titled CC-BY-4.0, at http://creativecommons.org/licenses/by/4.0/.