Reap zombie processes using a SIGCHLD handler

Tested on

Debian (Etch, Lenny, Squeeze)
Ubuntu (Hardy, Intrepid, Jaunty, Karmic, Lucid, Maverick, Natty, Oneiric, Precise, Quantal)

Objective

To install a SIGCHLD handler for reaping zombie processes

Background

When a child process terminates it does not disappear entirely. Instead it becomes a ‘zombie process’ which is no longer capable of executing, but which still has a PID and an entry in the process table. The entry is retained so that the parent can later collect the child's exit status. A zombie is indicated by the state code Z in ps or top.
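
As a minimal sketch, assuming a POSIX system, the following shows that a child's exit status is still available to the parent some time after the child has exited, which is only possible because something (the zombie) remained in the process table. The function name zombie_persists_until_reaped and the 200 ms delay are illustrative choices, not part of the method described here.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if the terminated child's status could
   still be collected after a delay, 0 if not, -1 on error. */
int zombie_persists_until_reaped(void) {
  /* Under the default disposition, terminated children become zombies. */
  if (signal(SIGCHLD, SIG_DFL) == SIG_ERR) return -1;

  pid_t pid = fork();
  if (pid == -1) return -1;
  if (pid == 0) _exit(42);                   /* child: terminate at once */

  struct timespec delay = {0, 200000000L};   /* 200 ms */
  nanosleep(&delay, NULL);                   /* child is now a zombie */

  int status;
  /* WNOHANG: succeed only if the child has already terminated. */
  if (waitpid(pid, &status, WNOHANG) != pid) return 0;
  return WIFEXITED(status) && WEXITSTATUS(status) == 42;
}
```

Between the child's exit and the waitpid call, running ps in another terminal would show the child in state Z.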

The presence of a moderate number of zombie processes is not particularly harmful, but they add unnecessary clutter that can be confusing to the administrator. In extreme cases they could exhaust the number of available process table slots. For these reasons, well-behaved programs should ensure that zombie processes are removed in a timely manner.

The process of eliminating zombie processes is known as ‘reaping’. The simplest method is to call wait, but this will block the parent process if the child has not yet terminated. Alternatives are to use waitpid to poll or SIGCHLD to reap asynchronously. The method described here uses SIGCHLD.
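
For comparison with the SIGCHLD approach, the polling alternative can be sketched as a function that the parent calls periodically, for example once per iteration of its main loop. It relies only on waitpid with WNOHANG; the name reap_terminated_children is hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Reap every child that has already terminated, without blocking.
   Returns the number of zombies reaped by this call. */
int reap_terminated_children(void) {
  int count = 0;
  /* waitpid returns a PID for each zombie collected, 0 when children
     exist but none has terminated, and -1 (ECHILD) when there are no
     children at all; the loop stops on either of the latter two. */
  while (waitpid((pid_t)(-1), NULL, WNOHANG) > 0) {
    count++;
  }
  return count;
}
```

Because it never blocks it can be called as often as is convenient. The cost is that zombies persist until the next call, which is the delay the SIGCHLD method avoids.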

Scenario

Suppose you have written a network server which spawns a separate child process to handle each connection. The child process terminates itself when the connection closes, without any involvement from the parent process. It would be unacceptable for the parent process to block, therefore calling wait immediately after fork is not an option.

Method

Overview

The method described here has two steps:

  1. Define a handler for SIGCHLD that calls waitpid.
  2. Register the SIGCHLD handler.
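
Putting both steps together, a self-contained sketch might look like the following. It assumes a POSIX system. The reaped_count counter and the functions install_sigchld_handler and spawn_and_count_reaped are hypothetical additions made only so that the effect can be observed; a real program would not need them.

```c
#define _XOPEN_SOURCE 700  /* exposes SA_RESTART on glibc */
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical counter, present only to make reaping observable. */
static volatile sig_atomic_t reaped_count = 0;

/* Step 1: a SIGCHLD handler that reaps every available zombie. */
static void handle_sigchld(int sig) {
  (void)sig;
  int saved_errno = errno;
  while (waitpid((pid_t)(-1), 0, WNOHANG) > 0) {
    reaped_count++;
  }
  errno = saved_errno;
}

/* Step 2: register the handler, before any children are spawned. */
static void install_sigchld_handler(void) {
  struct sigaction sa;
  sa.sa_handler = &handle_sigchld;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
  if (sigaction(SIGCHLD, &sa, 0) == -1) {
    perror(0);
    exit(1);
  }
}

/* Spawn n children that exit immediately, then idle (never calling
   wait directly) until the handler has reaped them all or ~2 s pass. */
int spawn_and_count_reaped(int n) {
  install_sigchld_handler();
  for (int i = 0; i < n; i++) {
    pid_t pid = fork();
    if (pid == -1) { perror("fork"); exit(1); }
    if (pid == 0) _exit(0);
  }
  struct timespec tick = {0, 10000000L};   /* 10 ms */
  for (int i = 0; i < 200 && reaped_count < n; i++) {
    nanosleep(&tick, NULL);   /* may return early when SIGCHLD arrives */
  }
  return reaped_count;
}
```

spawn_and_count_reaped(3) should return 3 once all three children have exited and been reaped by the handler, without the main code ever waiting for them itself.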

Note that the signal is named SIGCHLD with an H, as opposed to SIGCLD (which has a similar function, but potentially different semantics and is non-portable).

The following header files are used:

Header          Used by
<signal.h>      sigaction, sigemptyset, signal, struct sigaction, SIGCHLD, SIG_IGN, SIG_ERR, SA_RESTART, SA_NOCLDSTOP, SA_NOCLDWAIT
<stdio.h>       perror
<stdlib.h>      exit
<sys/wait.h>    waitpid, pid_t, WNOHANG
<errno.h>       errno

Define a handler for SIGCHLD that calls waitpid

The operations that can be safely performed within a signal handler are very limited, but they include use of the waitpid function:

#include <errno.h>
#include <sys/wait.h>

void handle_sigchld(int sig) {
  int saved_errno = errno;  /* waitpid may change errno */
  while (waitpid((pid_t)(-1), 0, WNOHANG) > 0) {}  /* reap every zombie */
  errno = saved_errno;
}

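The reason for calling waitpid as opposed to wait is to allow use of the WNOHANG option, which prevents the handler from blocking. Without it, once every zombie had been reaped, the next call would block until some still-running child terminated. WNOHANG also allows for the possibility of SIGCHLD being raised for reasons other than the termination of a child process, in which case there may be nothing to reap.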

(SIGCHLD has three conventional uses: to indicate that a child process has terminated, stopped or continued. The latter two conditions can be suppressed using SA_NOCLDSTOP as described below, but that would not prevent a process with the right permissions from raising SIGCHLD for any reason using the kill function or an equivalent.)
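
The value of WNOHANG can be illustrated by raising SIGCHLD by hand while a child is still running, as any process with the right permissions could do with kill. In this sketch the reaped counter and the function spurious_sigchld_reaps are hypothetical additions for observation. The handler finds nothing to reap and returns at once; a blocking wait in its place would hang until the child exited.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t reaped = 0;   /* hypothetical counter */

static void handle_sigchld(int sig) {
  (void)sig;
  int saved_errno = errno;
  while (waitpid((pid_t)(-1), 0, WNOHANG) > 0) {
    reaped++;
  }
  errno = saved_errno;
}

/* Raise SIGCHLD while the only child is still alive and return how many
   children that spurious invocation of the handler reaped. */
int spurious_sigchld_reaps(void) {
  struct sigaction sa;
  sa.sa_handler = &handle_sigchld;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
  if (sigaction(SIGCHLD, &sa, 0) == -1) return -1;

  pid_t child = fork();
  if (child == -1) return -1;
  if (child == 0) { pause(); _exit(0); }   /* child: wait for a signal */

  raise(SIGCHLD);          /* no child has terminated yet */
  int result = reaped;     /* WNOHANG let the handler return at once */

  kill(child, SIGKILL);    /* clean up; the handler reaps it on exit */
  return result;
}
```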

The reason for placing waitpid within a loop is to allow for the possibility that multiple child processes could terminate while one is in the process of being reaped. SIGCHLD is not queued: at most one instance can be pending at a time, and any further instances raised while it is pending are merged into it. It may therefore be necessary to reap several zombie processes during one invocation of the handler function.
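
The merging of signals can be made visible with a sketch that blocks SIGCHLD, lets several children terminate, and then unblocks it. It uses waitid with WNOWAIT to learn that each child has exited without reaping it. The counters and the function name coalesced_reap are hypothetical additions for observation.

```c
#define _XOPEN_SOURCE 700  /* exposes waitid, WNOWAIT, SA_RESTART */
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t handler_calls = 0;
static volatile sig_atomic_t reaped = 0;

static void handle_sigchld(int sig) {
  (void)sig;
  int saved_errno = errno;
  handler_calls++;
  while (waitpid((pid_t)(-1), 0, WNOHANG) > 0) {
    reaped++;
  }
  errno = saved_errno;
}

/* Let n children terminate while SIGCHLD is blocked, then unblock it.
   Returns the number of zombies the handler reaped. */
int coalesced_reap(int n) {
  struct sigaction sa;
  sa.sa_handler = &handle_sigchld;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
  if (sigaction(SIGCHLD, &sa, 0) == -1) return -1;

  sigset_t chld;
  sigemptyset(&chld);
  sigaddset(&chld, SIGCHLD);
  sigprocmask(SIG_BLOCK, &chld, NULL);

  for (int i = 0; i < n; i++) {
    pid_t pid = fork();
    if (pid == -1) return -1;
    if (pid == 0) _exit(0);
    /* Wait until this child has terminated, but leave it a zombie. */
    siginfo_t info;
    waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT);
  }

  /* n terminations left a single pending SIGCHLD; POSIX requires it
     to be delivered before sigprocmask returns. */
  sigprocmask(SIG_UNBLOCK, &chld, NULL);
  return reaped;
}
```

With n = 4, the handler should run once (handler_calls == 1) and reap all four zombies, which is exactly the situation the loop exists for.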

The loop ensures that any zombies which existed prior to invocation of the handler function will be reaped. If any further zombies come into being after that moment in time then they may or may not be reaped by that invocation of the handler function (depending on the timing), but they should leave behind a pending SIGCHLD that will result in the handler being called again.

There is a possibility that waitpid could alter errno: for example, it sets errno to ECHILD when the loop calls it after the last child has been reaped. Saving errno then restoring it afterwards prevents any change from interfering with code outside the handler.
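
The effect of saving and restoring errno can be shown by delivering SIGCHLD synchronously with raise when there are no children, so that waitpid fails with ECHILD inside the handler. The careless_handler variant and the errno_after_sigchld function are hypothetical, for comparison only.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

/* The handler as described above: saves and restores errno. */
static void careful_handler(int sig) {
  (void)sig;
  int saved_errno = errno;
  while (waitpid((pid_t)(-1), 0, WNOHANG) > 0) {}
  errno = saved_errno;
}

/* Hypothetical variant without the save/restore, for comparison. */
static void careless_handler(int sig) {
  (void)sig;
  while (waitpid((pid_t)(-1), 0, WNOHANG) > 0) {}
}

/* Set errno to a known value, deliver SIGCHLD, report errno afterwards. */
int errno_after_sigchld(void (*handler)(int)) {
  struct sigaction sa;
  sa.sa_handler = handler;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
  if (sigaction(SIGCHLD, &sa, 0) == -1) return -1;

  errno = EAGAIN;    /* pretend the main program just saw EAGAIN */
  raise(SIGCHLD);    /* the handler runs before raise returns */
  return errno;
}
```

In a process with no children, errno_after_sigchld(careful_handler) should return EAGAIN, whereas errno_after_sigchld(careless_handler) returns ECHILD: the interrupted code would see an error it never caused.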

Register the SIGCHLD handler

The POSIX-recommended method for registering a signal handler is to use the sigaction function:

struct sigaction sa;
sa.sa_handler = &handle_sigchld;
sigemptyset(&sa.sa_mask);
sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
if (sigaction(SIGCHLD, &sa, 0) == -1) {
  perror(0);
  exit(1);
}

You should do this before any child processes terminate, which in practice means registering before any are spawned. (POSIX neither requires nor prohibits SIGCHLD being raised in respect of a child that had already terminated when the handler was registered, so a program which relied on this happening might work but would not be portable.)

When an operating system function is interrupted by a signal the default behaviour is to return immediately (either with the error EINTR, or reporting partial completion if that is possible). This creates a need for such functions to be wrapped in a loop for the purpose of handling EINTR, which is both inconvenient and error-prone. Setting the SA_RESTART flag when the signal is registered makes this unnecessary in most cases, and is recommended unless you have a good reason not to.
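
The difference SA_RESTART makes can be observed with a read on a pipe that is interrupted by SIGCHLD. In this sketch one child exits after about 50 ms, raising SIGCHLD while the parent is blocked in read, and a second child writes a byte after about 500 ms. The function names and timings are assumptions chosen for illustration, so the outcome depends on the children running roughly on schedule.

```c
#define _XOPEN_SOURCE 700  /* exposes SA_RESTART on glibc */
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static void handle_sigchld(int sig) {
  (void)sig;
  int saved_errno = errno;
  while (waitpid((pid_t)(-1), 0, WNOHANG) > 0) {}
  errno = saved_errno;
}

/* Sleep for ms milliseconds, resuming after signal interruptions. */
static void sleep_ms(long ms) {
  struct timespec ts = {ms / 1000, (ms % 1000) * 1000000L};
  while (nanosleep(&ts, &ts) == -1 && errno == EINTR) {}
}

/* Returns read()'s result; *err receives errno if read() failed. */
ssize_t read_during_sigchld(int flags, int *err) {
  struct sigaction sa;
  sa.sa_handler = &handle_sigchld;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = flags;
  if (sigaction(SIGCHLD, &sa, 0) == -1) return -2;

  int fds[2];
  if (pipe(fds) == -1) return -2;

  pid_t early = fork();                  /* exits first: raises SIGCHLD */
  if (early == -1) return -2;
  if (early == 0) { sleep_ms(50); _exit(0); }

  pid_t writer = fork();                 /* supplies the byte later */
  if (writer == -1) return -2;
  if (writer == 0) {
    sleep_ms(500);
    _exit(write(fds[1], "x", 1) == 1 ? 0 : 1);
  }
  close(fds[1]);                         /* the parent only reads */

  char c;
  ssize_t n = read(fds[0], &c, 1);       /* SIGCHLD arrives meanwhile */
  *err = (n == -1) ? errno : 0;

  sleep_ms(700);                         /* let the writer exit and be reaped */
  close(fds[0]);
  return n;
}
```

With flags set to SA_RESTART | SA_NOCLDSTOP the read should be resumed after the handler returns and eventually return 1. With SA_NOCLDSTOP alone it should fail with EINTR as soon as the first child exits, leaving the caller to retry.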

Setting the SA_NOCLDSTOP flag prevents SIGCHLD from being raised when a child process stops or continues (as opposed to terminating). Since our interest is confined to processes that have terminated, there is no harm in this and it may prevent the handler being invoked unnecessarily. It does not obviate the need to use WNOHANG within the handler because it does not prevent SIGCHLD from being raised in some other way.
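
Whether SA_NOCLDSTOP has taken effect can be checked deterministically by keeping SIGCHLD blocked and inspecting the set of pending signals with sigpending after a child has been stopped. The function name sigchld_on_stop is hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static void handle_sigchld(int sig) {
  (void)sig;
  int saved_errno = errno;
  while (waitpid((pid_t)(-1), 0, WNOHANG) > 0) {}
  errno = saved_errno;
}

/* Stop a child with SIGSTOP while SIGCHLD is blocked, then report
   whether that made SIGCHLD pending (1) or not (0). */
int sigchld_on_stop(int flags) {
  struct sigaction sa;
  sa.sa_handler = &handle_sigchld;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = flags;
  if (sigaction(SIGCHLD, &sa, 0) == -1) return -1;

  sigset_t chld;
  sigemptyset(&chld);
  sigaddset(&chld, SIGCHLD);
  sigprocmask(SIG_BLOCK, &chld, NULL);

  pid_t child = fork();
  if (child == -1) return -1;
  if (child == 0) { for (;;) pause(); }

  int status;
  kill(child, SIGSTOP);
  waitpid(child, &status, WUNTRACED);   /* returns once the child stopped */

  sigset_t pending;
  sigpending(&pending);
  int result = sigismember(&pending, SIGCHLD);

  kill(child, SIGKILL);                 /* clean up */
  waitpid(child, &status, 0);
  sigprocmask(SIG_UNBLOCK, &chld, NULL); /* pending SIGCHLD handled here */
  return result;
}
```

sigchld_on_stop(SA_RESTART | SA_NOCLDSTOP) should return 0, while sigchld_on_stop(SA_RESTART) should return 1.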

Alternatives

Explicitly set the SIGCHLD handler to SIG_IGN

If (as in the example above) the signal handler does nothing beyond calling waitpid then an alternative is available. Setting the SIGCHLD handler to SIG_IGN will cause zombie processes to be reaped automatically:

struct sigaction sa;
sa.sa_handler = SIG_IGN;
sigemptyset(&sa.sa_mask);
sa.sa_flags = 0;
if (sigaction(SIGCHLD, &sa, 0) == -1) {
  perror(0);
  exit(1);
}

This can be implemented portably and somewhat more concisely with the signal function if you prefer:

if (signal(SIGCHLD, SIG_IGN) == SIG_ERR) {
  perror(0);
  exit(1);
}

Note that it is not sufficient for SIGCHLD to have a disposition that causes it to be ignored (as the default, SIG_DFL, would do): it is only by setting it to SIG_IGN that this behaviour is obtained.
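
The difference between the default disposition and an explicit SIG_IGN can be shown by letting a child exit and then waiting for it. This sketch uses the signal function; collect_after_exit is a hypothetical name, and the ECHILD outcome assumes a system that implements the POSIX.1-2001 behaviour described here.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set the SIGCHLD disposition, let a child exit, then try to collect it.
   Returns the child's PID if a zombie was left behind, or -1 (errno
   ECHILD) if the child was reaped automatically; -2 on setup failure. */
pid_t collect_after_exit(void (*disposition)(int)) {
  if (signal(SIGCHLD, disposition) == SIG_ERR) return -2;

  pid_t child = fork();
  if (child == -1) return -2;
  if (child == 0) _exit(0);

  /* With SIG_IGN this blocks until the child has gone, then fails. */
  return waitpid(child, NULL, 0);
}
```

collect_after_exit(SIG_DFL) should return the child's PID, because a zombie was left to collect, whereas collect_after_exit(SIG_IGN) should return -1 with errno set to ECHILD.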

One drawback of this method is that it is slightly less portable than explicitly calling waitpid: the behaviour it depends on is required by POSIX.1-2001, and previously by the Single Unix Specification, but not by POSIX.1-1990.

Set the SA_NOCLDWAIT flag

Another way to achieve the same outcome is to set the SA_NOCLDWAIT flag when installing the signal handler:

struct sigaction sa;
sa.sa_handler = &handle_sigchld;
sigemptyset(&sa.sa_mask);
sa.sa_flags = SA_RESTART | SA_NOCLDSTOP | SA_NOCLDWAIT;
if (sigaction(SIGCHLD, &sa, 0) == -1) {
  perror(0);
  exit(1);
}

Unfortunately this is not as useful as it could be, because it is implementation-defined whether SIGCHLD is raised in response to process termination when SA_NOCLDWAIT is set. Since you cannot rely on the handler function being invoked, it follows that the handler cannot actually do anything if you want its behaviour to be portable. At that point you may as well set the handler to SIG_IGN, in which case there is arguably no need to set SA_NOCLDWAIT.
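
Since the handler cannot portably be relied on when SA_NOCLDWAIT is set, the following sketch keeps the default action and relies solely on the flag to prevent zombies. It assumes an implementation that supports SA_NOCLDWAIT; the function name nocldwait_autoreaps is hypothetical.

```c
#define _XOPEN_SOURCE 700  /* exposes SA_NOCLDWAIT on glibc */
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the exited child was reaped automatically (waitpid fails
   with ECHILD), 0 if a zombie remained, -1 on setup failure. */
int nocldwait_autoreaps(void) {
  struct sigaction sa;
  sa.sa_handler = SIG_DFL;          /* no handler to rely on */
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_NOCLDWAIT;       /* terminated children leave no zombie */
  if (sigaction(SIGCHLD, &sa, 0) == -1) return -1;

  pid_t child = fork();
  if (child == -1) return -1;
  if (child == 0) _exit(0);

  /* Blocks until the child has terminated, then fails with ECHILD. */
  return waitpid(child, NULL, 0) == -1 && errno == ECHILD;
}
```

On such a system nocldwait_autoreaps() should return 1.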

There is one small advantage to using SA_NOCLDWAIT: if it is supported at all then you can be reasonably confident that it will have the desired behaviour, whereas for SIG_IGN this is assured only if the operating system declares conformance to an appropriate version of POSIX or SUS.

Tags: c | posix | process | signal