Erlang Liveness Checks in Kubernetes
Tue, Jun 12, 2018

I have an ever-increasing number of small projects and deployments that I use either internally or with some availability to the public, and have been relying on Kubernetes to make managing them easy. Not too long ago, I started adding a liveness probe to each pod definition as a contingency against a hung runtime. My pod definition at the time looked like this example from my Aflame project:
containers:
- name: aflame
  image: docker-registry:5000/erlang-aflame
  livenessProbe:
    exec:
      command:
      - /deploy/bin/aflame
      - ping
    initialDelaySeconds: 5
This worked fine, and I was able to verify that nodes that failed the ping test would be taken down. However, some weeks later I was poking around and realized that CPU utilization on my server was significantly higher than I would expect.
Looking at htop, it was difficult to see a precise culprit, except for a large number of processes called erl_child_setup coming into existence, pegging a CPU core, and disappearing again. After googling around, I landed on the source code for this task and found this section of main:
/* We close all fds except the uds from beam.
   All other fds from now on will have the
   CLOEXEC flags set on them. This means that we
   only have to close a very limited number of fds
   after we fork before the exec. */
#if defined(HAVE_CLOSEFROM)
    closefrom(4);
#else
    for (i = 4; i < max_files; i++)
#if defined(__ANDROID__)
        if (i != system_properties_fd())
#endif
            (void) close(i);
#endif
According to the ps output on my system, max_files was getting set to 1,048,576, so every time this program ran, it hot-looped over a million possible file descriptors and called close on each! No wonder it was resulting in so much system time. But what was actually causing all of these erl_child_setup invocations? I had initially suspected misbehaving deployment code, but the spike in load lined up with when I added the liveness checks. It turns out that the relx ping command is surprisingly heavyweight; or at least it ends up that way when run in an environment with a very high max file count, which was the case under docker.
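To get a sense of how expensive that fallback path is, here is a minimal standalone sketch (my own illustration, not part of the OTP source) that times a close loop over the same 1,048,576 descriptors:

#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(void) {
    /* 1,048,576 matches the max_files value I saw under docker */
    long max_files = 1048576;
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 4; i < max_files; i++)
        (void) close((int) i);   /* nearly every call fails with EBADF */
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed = (double)(end.tv_sec - start.tv_sec)
                   + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
    printf("attempted to close %ld fds in %.3f seconds\n",
           max_files - 4, elapsed);
    return 0;
}

Even though almost every close fails immediately with EBADF, a million syscalls per probe adds up quickly when the probe fires every few seconds.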
The fix
In order to fix this, I altered my liveness probe to run the ping check with a low ulimit on the number of open files. We still need to loop, but we are at least looping over a considerably more restrained number of descriptors. I also took the opportunity to increase the interval between checks, since the default check period is rather quick.
containers:
- name: aflame
  image: docker-registry:5000/erlang-aflame
  livenessProbe:
    exec:
      command:
+     - softlimit
+     - -o
+     - "128"
      - /deploy/bin/aflame
      - ping
    initialDelaySeconds: 5
+   periodSeconds: 60
Note that in order to get softlimit in your container, you may need to rebuild the image with the daemontools package installed, or with another source that provides a limit utility.
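If pulling in daemontools is inconvenient, the relevant behaviour is easy to approximate. The sketch below (my own, loosely modelled on softlimit -o 128, with minimal error handling) lowers the soft limit on open files and then execs the real command, which inherits the lowered limit:

#include <stdio.h>
#include <sys/resource.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
        return 1;
    }

    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }
    rl.rlim_cur = 128;  /* lower only the soft limit; hard limit untouched */
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("setrlimit");
        return 1;
    }

    execvp(argv[1], &argv[1]);  /* the exec'd command inherits the limit */
    perror("execvp");
    return 1;
}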
With these changes, system load rapidly dropped from nearly 20 to a more reasonable 3. The default performance of this wrapper would likely be helped by the adoption of a closefrom syscall into the Linux kernel, but unfortunately the only references I can find to this are a pessimistic ticket from 2009 and an unmerged patch from Zheng Liu in 2014.