Mar 26, 2017 - 4 Minute Read

Signals from the past

It’s just common knowledge that pressing Ctrl-C will usually terminate a long running command in a shell. How cool is that? But, how does it work?

Every developer that has done some little system programming on Linux knows that the answer is signals. They are defined in the POSIX standard and are a (very) limited form of inter-process communication. The idea is really simple: a signal is sent to a program to notify it that an event has occurred. It could be the Ctrl-C event(aka SIGINT) or the request of program termination(SIGTERM).

The really interesting part is that programs can react to those events providing really cool features. It’s extremely convenient to abort the compilation(without corrupting anything) of the Linux kernel just by pressing Ctrl-C, isn’t it? Therefore one would think that such a useful thing to have would be trivial to implement, right? Wrong. It’s really a pain. Let’s see why.

At the beginning there was signal

As defined in the POSIX standard, the primitive for registering a callback to a signal is signal. The semantic is pretty easy, just call it with the signal you’re interested in and provide the proper callback.

It turns out writing a correct signal handler is way more harder than writing a multi-threaded program. The problem is that your program stops as soon as it catches the signal, no matter what it was doing. That means that everything could possibly be in an inconsistent state(yes, even if you protect your code with locks). At this point I’m hearing you saying: “hey, what can I do in the handler then?”. Well, there are some system calls that are guaranteed to work(open for example), but they’re really just a few and even basic functions like printf are not in there because they acquire locks internally.

The probably safest way to write a signal handler is to set a sig_atomic_t flag(that is guaranteed to work) once a signal arrives and then poll that flag and do something useful when it’s set. Polling is always your friend.

But let’s ignore the above for now, there is another pitfall in using signal: the signal handler gets reset after the signal is received. Ok, you say, I’m gonna register the handler again in the handler itself and that’s it. Unfortunately no, because it’s possible that another signal arrives in the meantime and that’s a race condition.

Ok, signal is broken, let’s build something new.

Then sigaction came

signal was replaced by sigaction and obviously that works perfectly…</sarcasm> It suffers the same usability problem of signal. However, it introduced some interesting ideas. For example it’s possible to retrieve some useful information about the signal like the sender process pid or the user id(see the siginfo_t struct and the sa_sigaction attribute in sigaction).

…along with sigwait

Then some clever folks thought that we could just block the current thread until a signal arrives; sigwait was born. It turns out that’s a pretty usable and easy API. Just spawn another thread that blocks until one of a given set of signals arrives and do something according to the signal received. This way the code would run “normally” without interrupting the normal execution of the code eliminating the limitation of the signal handler. That’s awesome.

Though, with threads come great responsibility. You need to properly protect shared variables with locks and so forth, but we’re more confident with these kind of problems than with weird interrupt ones. Moreover, threads have their own sigmask and that can cause some problems, because a thread could ignore some signals that are worth catching.

This works, but we can do better.

…and signalfd eventually killed them all

Some other folks then thought that we could just expose the events via a file descriptor following the UNIX philosophy. That rocks, UNIX rocks. Once we have a file descriptor we can rule the world! We can integrate with existing event loops such as epoll or select or we can just call the sync functions. Everything that’s left to us is just reading the siginfo_t from the file descriptor and call the appropriate function for the signal received. That’s what I mean for “easy”.


Funny things

  • signals are aggregated, that means that on the imaginary “signals queue” there will be at most one instance of any signals. This problem affects signalfd as well as signal and sigaction, but I don’t really see where the problem is, because you eventually get the signal…;

  • most system calls can be interrupted by a signal and in those cases they return EINTR that is “the call was interrupted, please retry but consider what we’ve done so far”; I always wondered why there are that big warnings in several programming languages docs saying that the API could return less things than it was asked for. This is certainly a reason.

Literature