Add support for user-defined healthchecks

This PR adds support for user-defined health-check probes for Docker
containers. It adds a `HEALTHCHECK` instruction to the Dockerfile syntax plus
some corresponding "docker run" options. It can be used with a restart policy
to automatically restart a container if the check fails.

The `HEALTHCHECK` instruction has two forms:

* `HEALTHCHECK [OPTIONS] CMD command` (check container health by running a command inside the container)
* `HEALTHCHECK NONE` (disable any healthcheck inherited from the base image)

The `HEALTHCHECK` instruction tells Docker how to test a container to check that
it is still working. This can detect cases such as a web server that is stuck in
an infinite loop and unable to handle new connections, even though the server
process is still running.

When a container has a healthcheck specified, it has a _health status_ in
addition to its normal status. This status is initially `starting`. Whenever a
health check passes, it becomes `healthy` (whatever state it was previously in).
After a certain number of consecutive failures, it becomes `unhealthy`.

The options that can appear before `CMD` are:

* `--interval=DURATION` (default: `30s`)
* `--timeout=DURATION` (default: `30s`)
* `--retries=N` (default: `1`)

The health check will first run **interval** seconds after the container is
started, and then again **interval** seconds after each previous check completes.

If a single run of the check takes longer than **timeout** seconds then the check
is considered to have failed.

It takes **retries** consecutive failures of the health check for the container
to be considered `unhealthy`.

There can only be one `HEALTHCHECK` instruction in a Dockerfile. If you list
more than one then only the last `HEALTHCHECK` will take effect.

The command after the `CMD` keyword can be either a shell command (e.g. `HEALTHCHECK
CMD /bin/check-running`) or an _exec_ array (as with other Dockerfile commands;
see e.g. `ENTRYPOINT` for details).

The command's exit status indicates the health status of the container.
The possible values are:

- 0: success - the container is healthy and ready for use
- 1: unhealthy - the container is not working correctly
- 2: starting - the container is not ready for use yet, but is working correctly

If the probe returns 2 ("starting") when the container has already moved out of the
"starting" state then it is treated as "unhealthy" instead.

For example, to check every five minutes or so that a web-server is able to
serve the site's main page within three seconds:

    HEALTHCHECK --interval=5m --timeout=3s \
      CMD curl -f http://localhost/ || exit 1

To help debug failing probes, any output text (UTF-8 encoded) that the command writes
on stdout or stderr will be stored in the health status and can be queried with
`docker inspect`. Such output should be kept short (only the first 4096 bytes
are stored currently).

When the health status of a container changes, a `health_status` event is
generated with the new status. The health status is also displayed in the
`docker ps` output.

Signed-off-by: Thomas Leonard <thomas.leonard@docker.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This commit is contained in:
Thomas Leonard 2016-04-18 10:48:13 +01:00 committed by Tibor Vass
parent cceb74311b
commit 51ddea93a2
2 changed files with 127 additions and 0 deletions

View File

@ -1470,6 +1470,73 @@ The `STOPSIGNAL` instruction sets the system call signal that will be sent to th
This signal can be a valid unsigned number that matches a position in the kernel's syscall table, for instance 9,
or a signal name in the format SIGNAME, for instance SIGKILL.
## HEALTHCHECK
The `HEALTHCHECK` instruction has two forms:
* `HEALTHCHECK [OPTIONS] CMD command` (check container health by running a command inside the container)
* `HEALTHCHECK NONE` (disable any healthcheck inherited from the base image)
The `HEALTHCHECK` instruction tells Docker how to test a container to check that
it is still working. This can detect cases such as a web server that is stuck in
an infinite loop and unable to handle new connections, even though the server
process is still running.
When a container has a healthcheck specified, it has a _health status_ in
addition to its normal status. This status is initially `starting`. Whenever a
health check passes, it becomes `healthy` (whatever state it was previously in).
After a certain number of consecutive failures, it becomes `unhealthy`.
The options that can appear before `CMD` are:
* `--interval=DURATION` (default: `30s`)
* `--timeout=DURATION` (default: `30s`)
* `--retries=N` (default: `1`)
The health check will first run **interval** seconds after the container is
started, and then again **interval** seconds after each previous check completes.
If a single run of the check takes longer than **timeout** seconds then the check
is considered to have failed.
It takes **retries** consecutive failures of the health check for the container
to be considered `unhealthy`.
There can only be one `HEALTHCHECK` instruction in a Dockerfile. If you list
more than one then only the last `HEALTHCHECK` will take effect.
The command after the `CMD` keyword can be either a shell command (e.g. `HEALTHCHECK
CMD /bin/check-running`) or an _exec_ array (as with other Dockerfile commands;
see e.g. `ENTRYPOINT` for details).
The command's exit status indicates the health status of the container.
The possible values are:
- 0: success - the container is healthy and ready for use
- 1: unhealthy - the container is not working correctly
- 2: starting - the container is not ready for use yet, but is working correctly
If the probe returns 2 ("starting") when the container has already moved out of the
"starting" state then it is treated as "unhealthy" instead.
For example, to check every five minutes or so that a web-server is able to
serve the site's main page within three seconds:
HEALTHCHECK --interval=5m --timeout=3s \
CMD curl -f http://localhost/ || exit 1
To help debug failing probes, any output text (UTF-8 encoded) that the command writes
on stdout or stderr will be stored in the health status and can be queried with
`docker inspect`. Such output should be kept short (only the first 4096 bytes
are stored currently).
When the health status of a container changes, a `health_status` event is
generated with the new status.
The `HEALTHCHECK` feature was added in Docker 1.12.
## Dockerfile examples
Below you can see some examples of Dockerfile syntax. If you're interested in

View File

@ -1250,6 +1250,7 @@ Dockerfile instruction and how the operator can override that setting.
#entrypoint-default-command-to-execute-at-runtime)
- [EXPOSE (Incoming Ports)](#expose-incoming-ports)
- [ENV (Environment Variables)](#env-environment-variables)
- [HEALTHCHECK](#healthcheck)
- [VOLUME (Shared Filesystems)](#volume-shared-filesystems)
- [USER](#user)
- [WORKDIR](#workdir)
@ -1398,6 +1399,65 @@ above, or already defined by the developer with a Dockerfile `ENV`:
Similarly the operator can set the **hostname** with `-h`.
### HEALTHCHECK
```
--health-cmd Command to run to check health
--health-interval Time between running the check
--health-retries Consecutive failures needed to report unhealthy
--health-timeout Maximum time to allow one check to run
--no-healthcheck Disable any container-specified HEALTHCHECK
```
Example:
$ docker run --name=test -d \
--health-cmd='stat /etc/passwd || exit 1' \
--health-interval=2s \
busybox sleep 1d
$ sleep 2; docker inspect --format='{{.State.Health.Status}}' test
healthy
$ docker exec test rm /etc/passwd
$ sleep 2; docker inspect --format='{{json .State.Health}}' test
{
"Status": "unhealthy",
"FailingStreak": 3,
"Log": [
{
"Start": "2016-05-25T17:22:04.635478668Z",
"End": "2016-05-25T17:22:04.7272552Z",
"ExitCode": 0,
"Output": " File: /etc/passwd\n Size: 334 \tBlocks: 8 IO Block: 4096 regular file\nDevice: 32h/50d\tInode: 12 Links: 1\nAccess: (0664/-rw-rw-r--) Uid: ( 0/ root) Gid: ( 0/ root)\nAccess: 2015-12-05 22:05:32.000000000\nModify: 2015..."
},
{
"Start": "2016-05-25T17:22:06.732900633Z",
"End": "2016-05-25T17:22:06.822168935Z",
"ExitCode": 0,
"Output": " File: /etc/passwd\n Size: 334 \tBlocks: 8 IO Block: 4096 regular file\nDevice: 32h/50d\tInode: 12 Links: 1\nAccess: (0664/-rw-rw-r--) Uid: ( 0/ root) Gid: ( 0/ root)\nAccess: 2015-12-05 22:05:32.000000000\nModify: 2015..."
},
{
"Start": "2016-05-25T17:22:08.823956535Z",
"End": "2016-05-25T17:22:08.897359124Z",
"ExitCode": 1,
"Output": "stat: can't stat '/etc/passwd': No such file or directory\n"
},
{
"Start": "2016-05-25T17:22:10.898802931Z",
"End": "2016-05-25T17:22:10.969631866Z",
"ExitCode": 1,
"Output": "stat: can't stat '/etc/passwd': No such file or directory\n"
},
{
"Start": "2016-05-25T17:22:12.971033523Z",
"End": "2016-05-25T17:22:13.082015516Z",
"ExitCode": 1,
"Output": "stat: can't stat '/etc/passwd': No such file or directory\n"
}
]
}
The health status is also displayed in the `docker ps` output.
### TMPFS (mount tmpfs filesystems)
```bash