Service update failure thresholds and rollback

This adds support for two enhancements to swarm service rolling updates:

- Failure thresholds: In Docker 1.12, a service update could be set up
  to either pause or continue after a single failure occurs. This adds
  an --update-max-failure-ratio flag that controls how many tasks need to
  fail to update for the update as a whole to be considered a failure. A
  counterpart flag, --update-monitor, controls how long to monitor each
  task for a failure after starting it during the update.

- Rollback flag: service update --rollback reverts the service to its
  previous version. If a service update encounters task failures, or
  fails to function properly for some other reason, the user can roll back
  the update.

SwarmKit also has the ability to roll back updates automatically after
hitting the failure thresholds, but we've decided not to expose this in
the Docker API/CLI for now, favoring a workflow where the decision to
roll back is always made by an admin. Depending on user feedback, we may
add a "rollback" option to --update-failure-action in the future.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
This commit is contained in:
Aaron Lehmann 2016-09-02 14:12:05 -07:00 committed by Tibor Vass
parent 3ba4b59233
commit 8c03c1201b
4 changed files with 78 additions and 67 deletions

View File

@ -1809,9 +1809,12 @@ _docker_service_update() {
--restart-delay
--restart-max-attempts
--restart-window
--rollback
--stop-grace-period
--update-delay
--update-failure-action
--update-max-failure-ratio
--update-monitor
--update-parallelism
--user -u
--workdir -w

View File

@ -1108,6 +1108,8 @@ __docker_service_subcommand() {
"($help)--stop-grace-period=[Time to wait before force killing a container]:grace period: "
"($help)--update-delay=[Delay between updates]:delay: "
"($help)--update-failure-action=[Action on update failure]:mode:(pause continue)"
"($help)--update-max-failure-ratio=[Failure rate to tolerate during an update]:fraction: "
"($help)--update-monitor=[Duration after each task update to monitor for failure]:window: "
"($help)--update-parallelism=[Maximum number of tasks updated simultaneously]:number: "
"($help -u --user)"{-u=,--user=}"[Username or UID]:user:_users"
"($help)--with-registry-auth[Send registry authentication details to swarm agents]"
@ -1185,6 +1187,7 @@ __docker_service_subcommand() {
"($help)*--container-label-rm=[Remove a container label by its key]:label: " \
"($help)*--group-rm=[Remove previously added user groups from the container]:group:_groups" \
"($help)--image=[Service image tag]:image:__docker_repositories" \
"($help)--rollback[Rollback to previous specification]" \
"($help -)1:service:__docker_complete_services" && ret=0
;;
(help)

View File

@ -38,6 +38,8 @@ Options:
--stop-grace-period value Time to wait before force killing a container (default none)
--update-delay duration Delay between updates
--update-failure-action string Action on update failure (pause|continue) (default "pause")
--update-max-failure-ratio value Failure rate to tolerate during an update
--update-monitor duration Duration after each task update to monitor for failure (default 0s)
--update-parallelism uint Maximum number of tasks updated simultaneously (0 to update all at once) (default 1)
-u, --user string Username or UID (format: <name|uid>[:<group|gid>])
--with-registry-auth Send registry authentication details to Swarm agents

View File

@ -42,9 +42,12 @@ Options:
--restart-delay value Delay between restart attempts (default none)
--restart-max-attempts value Maximum number of restarts before giving up (default none)
--restart-window value Window used to evaluate the restart policy (default none)
--rollback Rollback to previous specification
--stop-grace-period value Time to wait before force killing a container (default none)
--update-delay duration Delay between updates
--update-failure-action string Action on update failure (pause|continue) (default "pause")
--update-max-failure-ratio value Failure rate to tolerate during an update
--update-monitor duration Duration after each task update to monitor for failure (default 0s)
--update-parallelism uint Maximum number of tasks updated simultaneously (0 to update all at once) (default 1)
-u, --user string Username or UID (format: <name|uid>[:<group|gid>])
--with-registry-auth Send registry authentication details to Swarm agents