Health Checking Your Docker Containers
Are your containers feeling under the weather? Struggling to get out of bed? See how you can build in health checks to make sure your containers are fighting fit.
One of the new features in Docker 1.12 is that a health check for a container can be baked into the image definition. It can also be overridden at the command line.
Just like the CMD instruction, there can be multiple HEALTHCHECK instructions in a Dockerfile, but only the last one takes effect.
This is a great addition because a container reporting its status as Up 1 hour may still be returning errors. The container may be up, but without a health check there is no way for the application inside it to report its own status. This instruction fixes that.
The Dockerfile that builds the arungupta/couchbase image is:
FROM couchbase:latest
COPY configure-node.sh /opt/couchbase
HEALTHCHECK --interval=5s --timeout=3s CMD curl --fail http://localhost:8091/pools || exit 1
CMD ["/opt/couchbase/configure-node.sh"]
It uses a configure-node.sh script to configure the server using the Couchbase REST API. The new instruction to notice here is HEALTHCHECK.
This instruction can be specified as:
HEALTHCHECK <options> CMD <command>
The <options> can be:
--interval=DURATION (default 30s).
--timeout=DURATION (default 30s).
--retries=N (default 3).
The <command> is the command that runs inside the container to check the health.
If a health check is enabled, then the container can have three states:
Starting: Initial status when the container is still starting.
Healthy: If the command succeeds, then the container is healthy.
Unhealthy: A single run of the <command> that takes longer than the specified timeout counts as a failed check. The container is declared unhealthy once the check has failed retries times in a row.
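The state transitions above can be sketched as a small shell function. This is only an illustration of the rules described here, not Docker's actual implementation; the function name health_status and its arguments are made up for the example. It takes the retries value followed by the exit codes of successive checks and prints the resulting state.

```shell
#!/bin/sh
# Hypothetical helper: replay a sequence of health check exit codes
# and print the state the container would end up in.
# Usage: health_status RETRIES EXIT_CODE...
health_status() {
  retries=$1
  shift
  status=starting   # initial state before any check has succeeded
  streak=0          # number of consecutive failed checks
  for code in "$@"; do
    if [ "$code" -eq 0 ]; then
      # A successful check resets the streak and marks the container healthy.
      streak=0
      status=healthy
    else
      streak=$((streak + 1))
      # Only a run of RETRIES consecutive failures flips the state.
      if [ "$streak" -ge "$retries" ]; then
        status=unhealthy
      fi
    fi
  done
  echo "$status"
}
```

For example, health_status 3 0 1 1 prints healthy, because two failures in a row are below the threshold of three, while health_status 3 0 1 1 1 prints unhealthy.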
The command's exit status indicates the health status of the container. The following values are allowed:
0: container is healthy.
1: container is not healthy.
2: reserved; do not use this exit code.
In our instruction, the /pools REST API is invoked using curl. If the command fails, an exit status of 1 is returned, and that check counts as failed. The command is invoked every 5 seconds, and a check that does not return within 3 seconds also counts as failed. Since --retries is not specified, the container is marked unhealthy after the default of 3 consecutive failed checks.
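The curl check can also be moved into a small script, which keeps the Dockerfile line short and keeps curl's progress meter out of the health log. The following is a minimal sketch rather than part of the arungupta/couchbase image: the function name check_health and the default URL argument are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical health check helper (not part of the original image).
# Returns 0 when the endpoint answers with a success status, 1 otherwise.
check_health() {
  url=${1:-http://localhost:8091/pools}
  # --fail: treat HTTP error responses (4xx/5xx) as a failure
  # --silent: suppress the progress meter
  # --output /dev/null: discard the response body
  if curl --fail --silent --output /dev/null "$url"; then
    return 0
  fi
  # Collapse every curl error code into 1, since 2 is reserved by Docker.
  return 1
}
```

If this were saved as a script that ends with a call to check_health "$@", the instruction could become HEALTHCHECK --interval=5s --timeout=3s CMD /opt/couchbase/healthcheck.sh, where the script path is again hypothetical.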
Run the container as:
docker run -d --name db arungupta/couchbase:latest
Check the status:
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
55b14302671e arungupta/couchbase:latest "/entrypoint.sh /opt/" 2 seconds ago Up 1 seconds (health: starting) 8091-8094/tcp, 11207/tcp, 11210-11211/tcp, 18091-18093/tcp db
Notice how the health: starting status is reported in the STATUS column. Checking again after a few seconds shows:
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
55b14302671e arungupta/couchbase:latest "/entrypoint.sh /opt/" About a minute ago Up About a minute (healthy) 8091-8094/tcp, 11207/tcp, 11210-11211/tcp, 18091-18093/tcp db
And now it's reported healthy.
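Scripts that start other services often need to wait for this transition instead of checking docker ps by hand. A minimal sketch of such a wait loop follows; the function name wait_for_healthy, its arguments, and its return codes are choices made for this example, and it assumes the container defines a health check.

```shell
#!/bin/sh
# Hypothetical helper: poll a container's health status until it settles.
# Usage: wait_for_healthy CONTAINER [ATTEMPTS] [DELAY_SECONDS]
# Returns 0 if healthy, 1 if unhealthy, 2 if still undecided after ATTEMPTS polls.
wait_for_healthy() {
  container=$1
  attempts=${2:-30}
  delay=${3:-2}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Ask Docker for just the health status string; treat errors as "unknown".
    status=$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null) || status=""
    case "$status" in
      healthy) return 0 ;;
      unhealthy) return 1 ;;
    esac
    i=$((i + 1))
    sleep "$delay"
  done
  return 2
}
```

With the container started above, wait_for_healthy db 30 2 would poll up to 30 times, two seconds apart, before giving up.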
More details about this HEALTHCHECK instruction can be found on docs.docker.com.
Now, if you are running an image that does not have a HEALTHCHECK instruction, the docker run command can be used to specify similar values. An equivalent runtime command would be:
docker run -d --name db --health-cmd "curl --fail http://localhost:8091/pools || exit 1" --health-interval=5s --health-timeout=3s arungupta/couchbase
The last five health checks for a container can be obtained using the docker inspect command:
docker inspect --format='{{json .State.Health}}' db
The output is shown as:
{
"Status": "healthy",
"FailingStreak": 0,
"Log": [
{
"Start": "2016-11-12T03:23:03.351561Z",
"End": "2016-11-12T03:23:03.422176171Z",
"ExitCode": 0,
"Output": " % Total % Received % Xferd Average Speed Time Time Time Current\n Dload Upload Total Spent Left Speed\n\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r100 768 100 768 0 0 595k 0 --:--:-- --:--:-- --:--:-- 750k\n{\"isAdminCreds\":true,\"isROAdminCreds\":false,\"isEnterprise\":true,\"pools\":[{\"name\":\"default\",\"uri\":\"/pools/default?uuid=1b84cdbd136e4e8466049dd062dd6969\",\"streamingUri\":\"/poolsStreaming/default?uuid=1b84cdbd136e4e8466049dd062dd6969\"}],\"settings\":{\"maxParallelIndexers\":\"/settings/maxParallelIndexers?uuid=1b84cdbd136e4e8466049dd062dd6969\",\"viewUpdateDaemon\":\"/settings/viewUpdateDaemon?uuid=1b84cdbd136e4e8466049dd062dd6969\"},\"uuid\":\"1b84cdbd136e4e8466049dd062dd6969\",\"implementationVersion\":\"4.5.1-2844-enterprise\",\"componentsVersion\":{\"lhttpc\":\"1.3.0\",\"os_mon\":\"2.2.14\",\"public_key\":\"0.21\",\"asn1\":\"2.0.4\",\"kernel\":\"2.16.4\",\"ale\":\"4.5.1-2844-enterprise\",\"inets\":\"5.9.8\",\"ns_server\":\"4.5.1-2844-enterprise\",\"crypto\":\"3.2\",\"ssl\":\"5.3.3\",\"sasl\":\"2.3.4\",\"stdlib\":\"1.19.4\"}}"
},
{
"Start": "2016-11-12T03:23:08.423558928Z",
"End": "2016-11-12T03:23:08.510122392Z",
"ExitCode": 0,
"Output": " % Total % Received % Xferd Average Speed Time Time Time Current\n Dload Upload Total Spent Left Speed\n\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r100 768 100 768 0 0 309k 0 --:--:-- --:--:-- --:--:-- 375k\n{\"isAdminCreds\":true,\"isROAdminCreds\":false,\"isEnterprise\":true,\"pools\":[{\"name\":\"default\",\"uri\":\"/pools/default?uuid=1b84cdbd136e4e8466049dd062dd6969\",\"streamingUri\":\"/poolsStreaming/default?uuid=1b84cdbd136e4e8466049dd062dd6969\"}],\"settings\":{\"maxParallelIndexers\":\"/settings/maxParallelIndexers?uuid=1b84cdbd136e4e8466049dd062dd6969\",\"viewUpdateDaemon\":\"/settings/viewUpdateDaemon?uuid=1b84cdbd136e4e8466049dd062dd6969\"},\"uuid\":\"1b84cdbd136e4e8466049dd062dd6969\",\"implementationVersion\":\"4.5.1-2844-enterprise\",\"componentsVersion\":{\"lhttpc\":\"1.3.0\",\"os_mon\":\"2.2.14\",\"public_key\":\"0.21\",\"asn1\":\"2.0.4\",\"kernel\":\"2.16.4\",\"ale\":\"4.5.1-2844-enterprise\",\"inets\":\"5.9.8\",\"ns_server\":\"4.5.1-2844-enterprise\",\"crypto\":\"3.2\",\"ssl\":\"5.3.3\",\"sasl\":\"2.3.4\",\"stdlib\":\"1.19.4\"}}"
},
{
"Start": "2016-11-12T03:23:13.511446818Z",
"End": "2016-11-12T03:23:13.58141325Z",
"ExitCode": 0,
"Output": " {\"isAdminCreds\":true,\"isROAdminCreds\":false,\"isEnterprise\":true,\"pools\":[{\"name\":\"default\",\"uri\":\"/pools/default?uuid=1b84cdbd136e4e8466049dd062dd6969\",\"streamingUri\":\"/poolsStreaming/default?uuid=1b84cdbd136e4e8466049dd062dd6969\"}],\"settings\":{\"maxParallelIndexers\":\"/settings/maxParallelIndexers?uuid=1b84cdbd136e4e8466049dd062dd6969\",\"viewUpdateDaemon\":\"/settings/viewUpdateDaemon?uuid=1b84cdbd136e4e8466049dd062dd6969\"},\"uuid\":\"1b84cdbd136e4e8466049dd062dd6969\",\"implementationVersion\":\"4.5.1-2844-enterprise\",\"componentsVersion\":{\"lhttpc\":\"1.3.0\",\"os_mon\":\"2.2.14\",\"public_key\":\"0.21\",\"asn1\":\"2.0.4\",\"kernel\":\"2.16.4\",\"ale\":\"4.5.1-2844-enterprise\",\"inets\":\"5.9.8\",\"ns_server\":\"4.5.1-2844-enterprise\",\"crypto\":\"3.2\",\"ssl\":\"5.3.3\",\"sasl\":\"2.3.4\",\"stdlib\":\"1.19.4\"}} % Total % Received % Xferd Average Speed Time Time Time Current\n Dload Upload Total Spent Left Speed\n\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r100 768 100 768 0 0 248k 0 --:--:-- --:--:-- --:--:-- 375k\n"
},
{
"Start": "2016-11-12T03:23:18.583512367Z",
"End": "2016-11-12T03:23:18.677727356Z",
"ExitCode": 0,
"Output": " % Total % Received % Xferd Average Speed Time Time Time Current\n Dlo{\"isAdminCreds\":true,\"isROAdminCreds\":false,\"isEnterprise\":true,\"pools\":[{\"name\":\"default\",\"uri\":\"/pools/default?uuid=1b84cdbd136e4e8466049dd062dd6969\",\"streamingUri\":\"/poolsStreaming/default?uuid=1b84cdbd136e4e8466049dd062dd6969\"}],\"settings\":{\"maxParallelIndexers\":\"/settings/maxParallelIndexers?uuid=1b84cdbd136e4e8466049dd062dd6969\",\"viewUpdateDaemon\":\"/settings/viewUpdateDaemon?uuid=1b84cdbd136e4e8466049dd062dd6969\"},\"uuid\":\"1b84cdbd136e4e8466049dd062dd6969\",\"implementationVersion\":\"4.5.1-2844-enterprise\",\"componentsVersion\":{\"lhttpc\":\"1.3.0\",\"os_mon\":\"2.2.14\",\"public_key\":\"0.21\",\"asn1\":\"2.0.4\",\"kernel\":\"2.16.4\",\"ale\":\"4.5.1-2844-enterprise\",\"inets\":\"5.9.8\",\"ns_server\":\"4.5.1-2844-enterprise\",\"crypto\":\"3.2\",\"ssl\":\"5.3.3\",\"sasl\":\"2.3.4\",\"stdlib\":\"1.19.4\"}}ad Upload Total Spent Left Speed\n\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r100 768 100 768 0 0 307k 0 --:--:-- --:--:-- --:--:-- 375k\n"
},
{
"Start": "2016-11-12T03:23:23.679661467Z",
"End": "2016-11-12T03:23:23.782372291Z",
"ExitCode": 0,
"Output": " % Total % Received % Xferd Average Speed Time Time Time Current\n Dload Upload Total Spent Left{\"isAdminCreds\":true,\"isROAdminCreds\":false,\"isEnterprise\":true,\"pools\":[{\"name\":\"default\",\"uri\":\"/pools/default?uuid=1b84cdbd136e4e8466049dd062dd6969\",\"streamingUri\":\"/poolsStreaming/default?uuid=1b84cdbd136e4e8466049dd062dd6969\"}],\"settings\":{\"maxParallelIndexers\":\"/settings/maxParallelIndexers?uuid=1b84cdbd136e4e8466049dd062dd6969\",\"viewUpdateDaemon\":\"/settings/viewUpdateDaemon?uuid=1b84cdbd136e4e8466049dd062dd6969\"},\"uuid\":\"1b84cdbd136e4e8466049dd062dd6969\",\"implementationVersion\":\"4.5.1-2844-enterprise\",\"componentsVersion\":{\"lhttpc\":\"1.3.0\",\"os_mon\":\"2.2.14\",\"public_key\":\"0.21\",\"asn1\":\"2.0.4\",\"kernel\":\"2.16.4\",\"ale\":\"4.5.1-2844-enterprise\",\"inets\":\"5.9.8\",\"ns_server\":\"4.5.1-2844-enterprise\",\"crypto\":\"3.2\",\"ssl\":\"5.3.3\",\"sasl\":\"2.3.4\",\"stdlib\":\"1.19.4\"}} Speed\n\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r100 768 100 768 0 0 439k 0 --:--:-- --:--:-- --:--:-- 750k\n"
}
]
}
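When only the outcome of the recent checks matters, the ExitCode fields can be picked out of this JSON. The sketch below does that with grep and awk so it needs no extra tools; it is a rough text match rather than a real JSON parser, and the function name count_failed_checks is made up for the example. A tool such as jq would be more robust where it is available.

```shell
#!/bin/sh
# Hypothetical helper: read the JSON produced by
#   docker inspect --format='{{json .State.Health}}' <container>
# on stdin and print how many logged checks have a non-zero exit code.
count_failed_checks() {
  # Pick out each "ExitCode": N pair (with or without a space after the colon),
  # then count the ones whose value is not 0.
  { grep -o '"ExitCode": *-\{0,1\}[0-9][0-9]*' || true; } |
    awk -F': *' '$2 != 0 { failed++ } END { print failed + 0 }'
}
```

It could be used as docker inspect --format='{{json .State.Health}}' db | count_failed_checks. For a log like the one above, where every ExitCode is 0, the count is 0.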
Published at DZone with permission of Arun Gupta, DZone MVB. See the original article here.