Reap zombie processes using a SIGCHLD handler
Tested on
- Debian (Etch, Lenny, Squeeze)
- Ubuntu (Hardy, Intrepid, Jaunty, Karmic, Lucid, Maverick, Natty, Oneiric, Precise, Quantal)
Objective
To install a SIGCHLD handler for reaping zombie processes
Background
When a child process terminates it does not disappear entirely. Instead it becomes a ‘zombie process’ which is no longer capable of executing, but which still has a PID and an entry in the process table so that its parent can later collect its exit status. This is indicated by the state code Z in ps or top.
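As a rough illustration of the zombie state on Linux, the following sketch forks a child that exits immediately, waits for it to terminate without reaping it (using waitid with WNOWAIT), and then reads its state letter from /proc/<pid>/stat. The helper names process_state and state_of_unreaped_child are invented for this example, and the use of /proc is a Linux-specific assumption rather than anything portable.

```c
#define _XOPEN_SOURCE 700  /* request POSIX.1-2008 declarations */
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: return the state letter of a process as reported by
 * /proc/<pid>/stat (Linux-specific), or '?' if it cannot be determined. */
char process_state(pid_t pid) {
    char path[64];
    char buf[512];
    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL) {
        return '?';
    }
    char *line = fgets(buf, sizeof buf, f);
    fclose(f);
    if (line == NULL) {
        return '?';
    }
    /* The line reads "pid (command) state ..."; the command may itself
     * contain parentheses, so search for the last ')'. */
    char *p = strrchr(buf, ')');
    if (p == NULL || p[1] != ' ' || p[2] == '\0') {
        return '?';
    }
    return p[2];
}

/* Hypothetical demo: fork a child that exits at once, then report the state
 * it is in before the parent reaps it. */
char state_of_unreaped_child(void) {
    pid_t pid = fork();
    if (pid == -1) {
        return '?';
    }
    if (pid == 0) {
        _exit(0);
    }
    /* Wait until the child has terminated, but leave it unreaped. */
    siginfo_t info;
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) == -1) {
        return '?';
    }
    char state = process_state(pid);
    waitpid(pid, NULL, 0);  /* now reap the zombie */
    return state;
}
```

If /proc is mounted and SIGCHLD has its default disposition, state_of_unreaped_child() would be expected to return 'Z'. It returns '?' when the state cannot be read.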
The presence of a moderate number of zombie processes is not particularly harmful, but they add unnecessary clutter that can be confusing to the administrator. In extreme cases they could exhaust the number of available process table slots. For these reasons, well-behaved programs should ensure that zombie processes are removed in a timely manner.
The process of eliminating zombie processes is known as ‘reaping’. The simplest method is to call wait, but this will block the parent process if the child has not yet terminated. Alternatives are to use waitpid to poll or SIGCHLD to reap asynchronously. The method described here uses SIGCHLD.
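For comparison, the polling alternative might look something like the sketch below. It is not the method used in the rest of this page. The function name poll_demo, the pipe used to hold the child open, and the 1 ms polling interval are all assumptions made for the sake of a self-contained example.

```c
#define _XOPEN_SOURCE 700  /* request POSIX.1-2008 declarations */
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical demo of polling with waitpid and WNOHANG. Returns 1 if the
 * first poll reported that the child was still running and a later poll
 * reaped it, 0 otherwise, or -1 if the demo could not be set up. */
int poll_demo(void) {
    int fds[2];
    if (pipe(fds) == -1) {
        return -1;
    }
    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: block until the parent closes its end of the pipe. */
        char c;
        close(fds[1]);
        while (read(fds[0], &c, 1) > 0) {}
        _exit(0);
    }
    close(fds[0]);

    /* The child is still running, so a non-blocking poll returns 0. */
    pid_t first = waitpid(pid, NULL, WNOHANG);

    close(fds[1]);  /* let the child terminate */

    /* Poll until the child has been reaped, sleeping 1 ms between polls. */
    struct timespec delay = {0, 1000000L};
    pid_t result;
    while ((result = waitpid(pid, NULL, WNOHANG)) == 0) {
        nanosleep(&delay, NULL);
    }
    return first == 0 && result == pid;
}
```

A real program would do useful work between polls instead of sleeping. The drawback is that the zombie persists until the next poll.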
Scenario
Suppose you have written a network server which spawns a separate child process to handle each connection. The child process terminates itself when the connection closes, without any involvement from the parent process. It would be unacceptable for the parent process to block, so calling wait immediately after fork is not an option.
Method
Overview
The method described here has two steps:
- Define a handler for SIGCHLD that calls waitpid.
- Register the SIGCHLD handler.
Note that the signal is named SIGCHLD with an H, as opposed to SIGCLD (which has a similar function, but potentially different semantics and is non-portable).
The following header files are used:
Header | Used by |
---|---|
<signal.h> | sigaction, sigemptyset, struct sigaction, SIGCHLD, SA_RESTART, SA_NOCLDSTOP |
<stdio.h> | perror |
<stdlib.h> | exit |
<sys/wait.h> | waitpid, pid_t, WNOHANG |
<errno.h> | errno |
Define a handler for SIGCHLD that calls waitpid
The operations that can be safely performed within a signal handler are very limited, but they include use of the waitpid function:
    void handle_sigchld(int sig) {
        int saved_errno = errno;
        while (waitpid((pid_t)(-1), 0, WNOHANG) > 0) {}
        errno = saved_errno;
    }
The reason for calling waitpid as opposed to wait is to allow use of the WNOHANG option, which prevents the handler from blocking. This allows for the possibility of SIGCHLD being raised for reasons other than the termination of a child process.
(SIGCHLD has three conventional uses: to indicate that a child process has terminated, stopped or continued. The latter two conditions can be suppressed using SA_NOCLDSTOP as described below, but that would not prevent a process with the right permissions from raising SIGCHLD for any reason using the kill function or an equivalent.)
The reason for placing waitpid within a loop is to allow for the possibility that multiple child processes could terminate while one is in the process of being reaped. At most one instance of SIGCHLD can be pending at any time, so it may be necessary to reap several zombie processes during one invocation of the handler function.
The loop ensures that any zombies which existed prior to invocation of the handler function will be reaped. If any further zombies come into being after that moment in time then they may or may not be reaped by that invocation of the handler function (depending on the timing), but they should leave behind a pending SIGCHLD that will result in the handler being called again.
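The need for the loop can be seen by letting several children terminate while SIGCHLD is blocked, then unblocking it. The sketch below does this with a copy of the handler that has two counters added. The names counting_handler, coalescing_demo, handler_calls and children_reaped are invented for the example, as is the use of waitid with WNOWAIT to make sure each child has terminated before the signal is unblocked.

```c
#define _XOPEN_SOURCE 700  /* request POSIX.1-2008 declarations */
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t handler_calls = 0;
static volatile sig_atomic_t children_reaped = 0;

/* Same shape as the handler above, with counters added for the demo. */
static void counting_handler(int sig) {
    (void)sig;
    int saved_errno = errno;
    handler_calls++;
    while (waitpid((pid_t)(-1), 0, WNOHANG) > 0) {
        children_reaped++;
    }
    errno = saved_errno;
}

/* Hypothetical demo: let up to n children terminate while SIGCHLD is
 * blocked, then unblock it. Returns the number of children reaped by the
 * handler, or -1 if the demo could not be set up. */
int coalescing_demo(int n) {
    struct sigaction sa, old_sa;
    sigset_t block, old_mask;

    handler_calls = 0;
    children_reaped = 0;

    sa.sa_handler = counting_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    if (sigaction(SIGCHLD, &sa, &old_sa) == -1) {
        return -1;
    }

    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, &old_mask) == -1) {
        sigaction(SIGCHLD, &old_sa, NULL);
        return -1;
    }

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == -1) {
            break;
        }
        if (pid == 0) {
            _exit(0);
        }
        /* Wait for the child to terminate, but leave it as a zombie. */
        siginfo_t info;
        waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT);
    }

    /* Restoring the mask delivers the pending SIGCHLD (assuming it was not
     * already blocked by the caller) before sigprocmask returns. */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);

    sigaction(SIGCHLD, &old_sa, NULL);
    return (int)children_reaped;
}
```

Calling coalescing_demo(3) should reap all three children. On Linux handler_calls would be expected to end up at 1, because the three signals collapse into a single pending SIGCHLD. A handler that called waitpid only once per invocation would leave two zombies behind.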
There is a possibility that waitpid could alter errno. Saving errno then restoring it afterwards prevents any change from interfering with code outside the handler.
Register the SIGCHLD handler
The POSIX-recommended method for registering a signal handler is to use the sigaction function:
    struct sigaction sa;
    sa.sa_handler = &handle_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    if (sigaction(SIGCHLD, &sa, 0) == -1) {
        perror(0);
        exit(1);
    }
You should do this before any child processes terminate, which in practice means registering before any are spawned. (POSIX neither requires nor prohibits SIGCHLD being raised in respect of a child that had already terminated when the handler was registered, so a program which relied on this happening might work but would not be portable.)
When an operating system function is interrupted by a signal the default behaviour is to return immediately (either with the error EINTR, or reporting partial completion if that is possible). This creates a need for such functions to be wrapped in a loop for the purpose of handling EINTR, which is both inconvenient and error-prone. Setting the SA_RESTART flag when the handler is registered makes this unnecessary in most cases, and is recommended unless you have a good reason not to.
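For reference, the kind of wrapper that would otherwise be needed looks roughly like this. The name read_retrying is made up for the example, and read is used only as a representative of the functions that can fail with EINTR.

```c
#define _XOPEN_SOURCE 700  /* request POSIX.1-2008 declarations */
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: call read, retrying if the call was interrupted by
 * a signal before any data was transferred. */
ssize_t read_retrying(int fd, void *buf, size_t count) {
    ssize_t n;
    do {
        n = read(fd, buf, count);
    } while (n == -1 && errno == EINTR);
    return n;
}
```

Every call site for every interruptible function would need similar treatment, which is why SA_RESTART is usually the more convenient choice.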
Setting the SA_NOCLDSTOP flag prevents SIGCHLD from being raised when a child process stops or continues (as opposed to terminating). Since our interest is confined to processes that have terminated, there is no harm in this and it may prevent the handler being invoked unnecessarily. It does not obviate the need to use WNOHANG within the handler because it does not prevent SIGCHLD from being raised in some other way.
Alternatives
Explicitly set the SIGCHLD handler to SIG_IGN
If (as in the example above) the signal handler does nothing beyond calling waitpid then an alternative is available. Setting the SIGCHLD handler to SIG_IGN will cause zombie processes to be reaped automatically:
    struct sigaction sa;
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGCHLD, &sa, 0) == -1) {
        perror(0);
        exit(1);
    }
This can be implemented portably and somewhat more concisely with the signal function if you prefer:
    if (signal(SIGCHLD, SIG_IGN) == SIG_ERR) {
        perror(0);
        exit(1);
    }
Note that it is not sufficient for SIGCHLD to have a disposition that causes it to be ignored (as the default, SIG_DFL, would do): it is only by setting it to SIG_IGN that this behaviour is obtained.
One drawback of this method is that it is slightly less portable than explicitly calling waitpid: the behaviour it depends on is required by POSIX.1-2001, and previously by the Single Unix Specification, but not by POSIX.1-1990.
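On a system that implements the POSIX.1-2001 behaviour, the effect can be checked by calling wait after a child has been spawned: since the child never becomes a zombie, wait blocks until it has terminated and then fails with ECHILD. The sketch below assumes such a system. The function name sig_ign_demo is invented, and the previous disposition is restored afterwards only so that the demo does not disturb its caller.

```c
#define _XOPEN_SOURCE 700  /* request POSIX.1-2001 or later declarations */
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if wait failed with ECHILD (the child was
 * reaped automatically), 0 if it behaved otherwise, or -1 if the demo could
 * not be set up. */
int sig_ign_demo(void) {
    struct sigaction sa, old_sa;
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGCHLD, &sa, &old_sa) == -1) {
        return -1;
    }

    int result = -1;
    pid_t pid = fork();
    if (pid == 0) {
        _exit(0);
    }
    if (pid > 0) {
        /* Blocks until the child has terminated, then reports that there
         * is nothing left to wait for. */
        pid_t w = wait(NULL);
        result = (w == -1 && errno == ECHILD);
    }

    sigaction(SIGCHLD, &old_sa, NULL);
    return result;
}
```

A consequence worth noting is that the exit status of the child is discarded, so this alternative is unsuitable if the parent needs to know how its children ended.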
Set the SA_NOCLDWAIT flag
Another way to achieve the same outcome is to set the SA_NOCLDWAIT flag when installing the signal handler:
    struct sigaction sa;
    sa.sa_handler = &handle_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP | SA_NOCLDWAIT;
    if (sigaction(SIGCHLD, &sa, 0) == -1) {
        perror(0);
        exit(1);
    }
Unfortunately this is not as useful as it could be, because it is implementation-defined whether SIGCHLD is raised in response to process termination when SA_NOCLDWAIT is set. Since you cannot rely on the handler function being invoked, it follows that the handler cannot actually do anything if you want its behaviour to be portable. At that point you may as well set the handler to SIG_IGN, in which case there is arguably no need to set SA_NOCLDWAIT.
There is one small advantage to using SA_NOCLDWAIT: if it is supported at all then you can be reasonably confident that it will have the desired behaviour, whereas for SIG_IGN this is assured only if the operating system declares conformance to an appropriate version of POSIX or SUS.
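The following sketch shows one way to find out what a given system does when SA_NOCLDWAIT is combined with a handler. It records whether the handler ran, and checks that the child left no zombie behind by confirming that wait fails with ECHILD. The names note_sigchld, handler_invoked and nocldwait_demo are assumptions made for this example.

```c
#define _XOPEN_SOURCE 700  /* request POSIX.1-2008 declarations */
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set to 1 if the handler runs; whether it does is implementation-defined. */
static volatile sig_atomic_t handler_invoked = 0;

static void note_sigchld(int sig) {
    (void)sig;
    handler_invoked = 1;
}

/* Hypothetical demo: returns 1 if wait failed with ECHILD (no zombie was
 * left behind), 0 if it behaved otherwise, or -1 if the demo could not be
 * set up. */
int nocldwait_demo(void) {
    struct sigaction sa, old_sa;
    handler_invoked = 0;
    sa.sa_handler = note_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP | SA_NOCLDWAIT;
    if (sigaction(SIGCHLD, &sa, &old_sa) == -1) {
        return -1;
    }

    int result = -1;
    pid_t pid = fork();
    if (pid == 0) {
        _exit(0);
    }
    if (pid > 0) {
        pid_t w;
        /* Retry in case the call is interrupted rather than restarted. */
        do {
            w = wait(NULL);
        } while (w == -1 && errno == EINTR);
        result = (w == -1 && errno == ECHILD);
    }

    sigaction(SIGCHLD, &old_sa, NULL);
    return result;
}
```

After a call to nocldwait_demo, inspecting handler_invoked shows whether the system in question raises SIGCHLD in this configuration. Portable code should not depend on either answer.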
Further reading
- wait, waitpid, Base Specifications Issue 7, The Open Group, 2008
- <signal.h>, Base Specifications Issue 7, The Open Group, 2008