A small event loop library
Event loop libraries are optimized for programs with a huge number of sockets, timers, etc. That’s great, big programs need performance, but what about small programs?
What is a “small application”? It’s hard to say, but I’d define it as a program that needs less than, say, 50 sockets at runtime. Something like that. The definition is fuzzy on purpose. A server is probably not a “small application” even if the size of the codebase is modest, because it may need to handle a huge number of events. A client may have a decently sized codebase, yet not need to handle more than a bunch of sockets at the same time.
When I’m working on an application that needs to monitor just a few connections and maybe some timers, libevent feels a bit overkill, even if it has handy APIs.
In this post I’m going to describe a small event loop built for small applications instead. I wrote it originally for amused, my music player, and then used it also for telescope, my “small internet” browser.
amused: minimalistic music player
telescope: multi-protocol browser for the small internet
It’s a small event loop library that supports callbacks for file descriptor activity, signals and timers, all in about 500 lines of code, counting comments and blanks. It doesn’t provide any buffered I/O facility like libevent’s bufferevents, but it’s easy to plug another layer on top: telescope uses a ‘bufio’ layer that supports TLS too, and it’s, again, less than 500 lines (blanks and comments included).
It uses poll(2) because it’s an event loop for small programs and because poll is portable. It is still possible to provide the same API using a different backend, so select(2), kqueue(2), or epoll(2) could be used if needed.
Like most small things, it’s compact and written with the actual usage in mind. It is still simple and flexible enough to be easily adapted to other usages as well. When applicable, I’ll also discuss some different implementation choices. Finally, and most importantly, it’s been both educational and fun to write.
Monitoring file descriptors
I’ll show one bit at a time, starting from the fundamentals: registering callbacks for file descriptor events.
This is the public interface:
#define EV_READ     0x1
#define EV_WRITE    0x2

int     ev_init(void);
int     ev_add(int, int, void (*)(int, int, void *), void *);
int     ev_del(int);
int     ev_loop(void);
void    ev_break(void);
One note on the naming and usage is due: without doing it on purpose, I’ve ended up using some of the names used by libev, so, as-is, it’s impossible to use the two together in the same application. I’m also using some constants that conflict with libevent, so it’s likewise impossible to use them both in the same source file. I don’t consider it an issue since an application generally uses only one event loop library; otherwise, the code is small enough that doing a rename is not problematic.
Since I don’t use threads, there’s only one implicit global event loop and these functions operate on it. If thread support is needed, having ‘ev_init’ return a struct representing the event loop, and passing it to the other functions instead of relying on global state, would be straightforward.
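For instance, a hypothetical thread-friendly variant of the interface (not part of the library, just a sketch of the idea) might be declared along these lines:

struct evbase;      /* opaque to the callers */

struct evbase   *ev_init(void);
int              ev_add(struct evbase *, int, int,
                    void (*)(int, int, void *), void *);
int              ev_del(struct evbase *, int);
int              ev_loop(struct evbase *);
void             ev_break(struct evbase *);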
The functions should behave as follows (a small usage sketch follows the list):
- ‘ev_init’: initializes the event loop.
- ‘ev_add’: registers a (persistent) callback for a file descriptor.
- ‘ev_del’: removes the monitoring of the file descriptor.
- ‘ev_loop’: starts the event loop.
- ‘ev_break’: stops the event loop, to be used from a callback.
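As a quick illustration, here’s a hypothetical program that uses this interface to echo back whatever arrives on standard input; the header name ‘ev.h’ is an assumption of mine, not something defined by the library text above.

#include <err.h>
#include <stdio.h>
#include <unistd.h>

#include "ev.h"     /* assumed header name */

static void
on_readable(int fd, int ev, void *udata)
{
    char    buf[BUFSIZ];
    ssize_t n;

    /* stop the loop on EOF or error, otherwise echo the data back */
    if ((n = read(fd, buf, sizeof(buf))) <= 0) {
        ev_del(fd);
        ev_break();
        return;
    }
    (void) write(1, buf, n);
}

int
main(void)
{
    if (ev_init() == -1 ||
        ev_add(0, EV_READ, on_readable, NULL) == -1)
        err(1, "ev_init/ev_add");
    if (ev_loop() == -1)
        err(1, "ev_loop");
    return 0;
}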
Internally, for convenience, we’ll use the struct ‘evcb’ to store the tuple of the event callback and its data:
struct evcb {
    void    (*cb)(int, int, void *);
    void    *udata;
};
The struct ‘evbase’ is the state of the event loop; it could be exposed (as an opaque type) in the public API in case the application uses threads and needs multiple event loops:
struct evbase {
    size_t           len;

    struct pollfd   *pfds;
    size_t           pfdlen;

    struct evcb     *cbs;
    size_t           cblen;

    /* more to follow for signals and timers */
};
Lastly, the global state is “just” a struct ‘evbase’ and a flag for stopping.
static struct evbase    *base;
static int               ev_stop;
To track the file descriptors I’m using two arrays, one for the struct pollfd needed by poll(2), and one for the callbacks. Later, we’ll use the file descriptor number as the index into the arrays: this gives us a nice O(1) ‘ev_add’ and ‘ev_del’, and an O(1) way to map poll(2) results to the event callbacks.
In short: poll(2) is a system call that accepts an array of structs ‘pollfd’ that are fundamentally tuples of a file descriptor and the events to report for it; when it returns, these array items are updated to report which events were triggered. If a file descriptor is -1 it is ignored.
Now, the actual implementation of the interfaces. ‘ev_init’ is straightforward:
int
ev_init(void)
{
    if (base != NULL) {
        errno = EINVAL;
        return -1;
    }

    if ((base = calloc(1, sizeof(*base))) == NULL)
        return -1;

    if (ev_resize(16) == -1) {
        free(base->pfds);
        free(base->cbs);
        free(base);
        base = NULL;
        return -1;
    }

    return 0;
}
‘ev_resize’ is a small helper function that resizes the two arrays, ‘pfds’ and ‘cbs’, to the given size. I’ll leave it as an exercise to the reader.
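If you want a starting point, a possible implementation could look like this; this is my sketch, not necessarily the original code, and it assumes recallocarray(3), which the library already uses later for the timers. Note how new pollfd entries get their fd set to -1 so that poll(2) ignores them:

static int
ev_resize(size_t len)
{
    void    *t;
    size_t   i;

    if ((t = recallocarray(base->pfds, base->pfdlen, len,
        sizeof(*base->pfds))) == NULL)
        return -1;
    base->pfds = t;
    base->pfdlen = len;

    /* poll(2) skips entries whose fd is set to -1 */
    for (i = base->len; i < len; ++i)
        base->pfds[i].fd = -1;

    if ((t = recallocarray(base->cbs, base->cblen, len,
        sizeof(*base->cbs))) == NULL)
        return -1;
    base->cbs = t;
    base->cblen = len;

    base->len = len;
    return 0;
}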
‘ev_add’ is similarly very simple. I’m also using a small helper function to convert between ‘EV_READ’/‘EV_WRITE’ and the poll(2) flags.
static inline int
ev2poll(int ev)
{
    int ret = 0;

    if (ev & EV_READ)
        ret |= POLLIN;
    if (ev & EV_WRITE)
        ret |= POLLOUT;

    return (ret);
}

int
ev_add(int fd, int ev, void (*cb)(int, int, void *), void *udata)
{
    if (fd < 0) {
        errno = EBADF;
        return -1;
    }

    /* make room if necessary */
    if ((size_t)fd >= base->len) {
        if (ev_resize(fd + 1) == -1)
            return -1;
    }

    /* set the pollfd structure */
    base->pfds[fd].fd = fd;
    base->pfds[fd].events = ev2poll(ev);
    base->pfds[fd].revents = 0;

    /* ...and save the callback/data as well */
    base->cbs[fd].cb = cb;
    base->cbs[fd].udata = udata;

    return 0;
}
‘ev_del’ is also left as an exercise for the reader (a possible version is sketched below, together with ‘ev_break’). I’m showing ‘ev_loop’ however, which is also (un)surprisingly straightforward, like most of the rest of the code. All it does is loop over poll(2) and then invoke the callbacks for the events that fired.
int
ev_loop(void)
{
    size_t i;
    int n;

    while (!ev_stop) {
        if ((n = poll(base->pfds, base->len, -1)) == -1) {
            if (errno != EINTR)
                return -1;
        }

        /*
         * iterate on the array until we know we still have unseen
         * events
         */
        for (i = 0; i < base->len && n > 0 && !ev_stop; ++i) {
            if (base->pfds[i].fd == -1)
                continue;
            if (base->pfds[i].revents & (POLLIN|POLLOUT|POLLHUP)) {
                n--;
                base->cbs[i].cb(base->pfds[i].fd,
                    poll2ev(base->pfds[i].revents),
                    base->cbs[i].udata);
            }
        }
    }

    return 0;
}
‘poll2ev’ is another helper to convert poll(2) flags to ev’s ‘EV_READ’ and ‘EV_WRITE’ flags.
I’m not showing the code for ‘ev_break’ either, but it just sets the flag so that ev_loop() exits. I never had the need to restart an event loop, nor to run only one “step”, so for the sake of simplicity I’m not providing these functionalities.
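For completeness, here’s roughly how ‘ev_del’ and ‘ev_break’ might look; this is a sketch of mine, not necessarily the original code:

int
ev_del(int fd)
{
    if (fd < 0 || (size_t)fd >= base->len) {
        errno = EBADF;
        return -1;
    }

    /* poll(2) ignores entries whose fd is -1 */
    base->pfds[fd].fd = -1;
    base->pfds[fd].events = 0;

    base->cbs[fd].cb = NULL;
    base->cbs[fd].udata = NULL;

    return 0;
}

void
ev_break(void)
{
    ev_stop = 1;
}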
This is enough to watch sockets with a libevent-esque interface in just a handful of lines of code. However, real applications also need to react to (some) signals.
Catching signals
Now that we have basic file descriptor monitoring in place, we can build some signal handling on top of it.
While Linux has a specific interface for treating signals as files, we’re aiming to be portable and will use the self-pipe trick. The idea is to set up a pipe with, well, ourselves, and have the signal handler write to one end of the pipe. Later, in the event loop, we’ll see the write and know a signal was caught. The reason for this indirection is that there is very little we can do safely in a signal handler context, so we limit ourselves to just a write(2).
First, we’ll need to extend the public interface to accommodate signals:
/* ... */
#define EV_SIGNAL   0x4
/* ... */
int     ev_signal(int, void (*)(int, int, void *), void *);
The interface is similar to ‘ev_add’ but takes the signal number instead of a file descriptor.
The pipe is also “allocated” on the evbase:
struct evbase {
    /* ... */

    int          sigpipe[2];
    struct evcb  sigcb;
};
Since we’re not always going to need signal handling, the pipe is created on demand in ‘ev_signal’:
int
ev_init(void)
{
    /* ... */

    base->sigpipe[0] = -1;
    base->sigpipe[1] = -1;

    return 0;
}

int
ev_signal(int sig, void (*cb)(int, int, void *), void *udata)
{
    int flags;

    if (base->sigpipe[0] == -1) {
        if (pipe(base->sigpipe) == -1)
            return -1;

        /* mark the pipe as non-blocking */
        if ((flags = fcntl(base->sigpipe[1], F_GETFL)) == -1 ||
            fcntl(base->sigpipe[1], F_SETFL, flags | O_NONBLOCK) == -1)
            return -1;

        if (ev_add(base->sigpipe[0], EV_READ, ev_sigdispatch, NULL) == -1)
            return -1;
    }

    base->sigcb.cb = cb;
    base->sigcb.udata = udata;

    signal(sig, ev_sigcatch);
    return 0;
}
The write end of the pipe is marked as non-blocking as a precaution against deadlocks: I don’t want the signal handler to wait on a full pipe! If it means that under catastrophic conditions it may lose some signals, that’s tolerable.
To make this work we actually need two more helper functions. The first is the “real” signal handler:
static void
ev_sigcatch(int signo)
{
    unsigned char s;
    int err;

    err = errno;

    /*
     * We should be able to write up to PIPE_BUF bytes without
     * blocking.
     */
    s = signo;
    (void) write(base->sigpipe[1], &s, sizeof(s));

    errno = err;
}
Note how ‘errno’ is preserved in case the write fails. I’m also assuming that the signal number fits in a char; I doubt there are UNIX variants with more than 256 signals, and I’m worried about partial writes of larger integers.
When a signal is caught, the event loop will see activity on the other end of the pipe and the ‘ev_sigdispatch’ callback will be fired:
static void
ev_sigdispatch(int fd, int ev, void *data)
{
    unsigned char signo;

    if (read(fd, &signo, sizeof(signo)) != sizeof(signo))
        return;

    base->sigcb.cb(signo, EV_SIGNAL, base->sigcb.udata);
}
To keep the implementation simple, ‘ev_signal’ has only one callback for all the signals. This is admittedly a footgun in the API design, since it may seem possible to use a different callback per signal. Usually, an application uses (or can use) just one signal handler, but if needed it’s not difficult to accommodate different callbacks per signal.
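For example, an application can register the same callback for every signal it cares about and dispatch on the signal number itself (a hypothetical usage sketch):

static void
on_signal(int sig, int ev, void *udata)
{
    switch (sig) {
    case SIGHUP:
        /* e.g. reload the configuration */
        break;
    case SIGINT:
    case SIGTERM:
        ev_break();
        break;
    }
}

Registering it is then just a matter of calling ev_signal(SIGHUP, on_signal, NULL), ev_signal(SIGINT, on_signal, NULL), and so on.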
We could use an array to map signal numbers to callback functions. There is no portable way to determine the maximum number of signals, but a number like 64 or 128 should be practically fine across all UNIX variants, although I haven’t checked.
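A sketch of that variant, assuming 64 signals is enough; this is hypothetical code to illustrate the idea, not what the library does:

#define EV_NSIG 64

struct evbase {
    /* ... */
    int          sigpipe[2];
    struct evcb  sigcbs[EV_NSIG];   /* one callback per signal */
};

int
ev_signal(int sig, void (*cb)(int, int, void *), void *udata)
{
    if (sig < 0 || sig >= EV_NSIG) {
        errno = EINVAL;
        return -1;
    }

    /* ... set up the self-pipe as before ... */

    base->sigcbs[sig].cb = cb;
    base->sigcbs[sig].udata = udata;

    signal(sig, ev_sigcatch);
    return 0;
}

static void
ev_sigdispatch(int fd, int ev, void *data)
{
    unsigned char signo;

    if (read(fd, &signo, sizeof(signo)) != sizeof(signo))
        return;

    if (signo < EV_NSIG && base->sigcbs[signo].cb != NULL)
        base->sigcbs[signo].cb(signo, EV_SIGNAL,
            base->sigcbs[signo].udata);
}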
That’s it for the signals: it’s just a very tiny layer over the file descriptor monitoring.
Handling timers
The ability to schedule events after a given amount of time is incredibly useful. For example, a client may want to close its connection if the server doesn’t reply in a decent amount of time instead of waiting indefinitely. Or maybe an animation has to be updated. Or maybe we want to throttle a heavy operation that is triggered often, like handling window resizes.
There are different ways to handle timers. One way could be with Linux’s timerfd to get a file descriptor representing the timer, but it’s non-portable and wastes a file descriptor per timer.
Another way of handling timers is to use setitimer(2), which is portable but relies on SIGALRM, and only a few timers can be created. Relying on signals is also a bit gross, so I’m using a different technique.
poll(2) optionally accepts the number of milliseconds to wait for events. Millisecond granularity is good enough for my use cases: the smallest timer telescope needs is a quarter of a second. Otherwise there’s ppoll(2), a recent addition to POSIX, which provides a finer granularity.
To design the timer API I’ve taken inspiration from the JavaScript ‘setInterval’ and ‘setTimeout’ functions: each timer is associated with a unique integer identifier, a timer id, that can be used to cancel it. Unlike JavaScript (and libevent), only one “flavor” is provided: one-shot timers. For repeating timers, or intervals, the user callback will have to schedule itself again; not a big deal.
The public API is:
/* ... */
#define EV_TIMEOUT  0x8
/* ... */
unsigned int    ev_timer(const struct timeval *,
                    void (*)(int, int, void *), void *);
int             ev_timer_pending(unsigned int);
int             ev_timer_cancel(unsigned int);
So three new functions: one to create a timer, one to check if it’s still pending, and finally one to cancel a scheduled timer.
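As an example of the “reschedule yourself” idiom for intervals, a repeating one-second tick could be obtained by having the callback re-arm itself; this is a hypothetical usage sketch, and the ‘ev.h’ header name is an assumption:

#include <sys/time.h>

#include <err.h>
#include <stdio.h>

#include "ev.h"     /* assumed header name */

static void
tick(int fd, int ev, void *udata)
{
    struct timeval tv = { 1, 0 };   /* one second */

    /* do the periodic work... */
    puts("tick");

    /* ...then re-arm the timer; zero means the timer couldn't be allocated */
    if (ev_timer(&tv, tick, udata) == 0)
        warn("ev_timer");
}

Kicking it off is just a matter of calling ev_timer(&tv, tick, NULL) once before entering ev_loop().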
The internal state also needs some additions for handling timers:
struct evtimer {
    unsigned int     id;
    struct timeval   tv;
    struct evcb      cb;
};

struct evbase {
    /* ... */

    unsigned int     tid;
    struct evtimer  *timers;
    size_t           ntimers;
    size_t           reserve_from;
    size_t           reserve_till;
    size_t           timerscap;
};
‘tid‘ is our timer id counter. We use it to generate new “unique” timer ids, eventually wrapping around. As we’ll see later, zero is reserved as a special value to indicate a failure in allocating the timer.
Then there’s probably the most interesting bit seen so far. The timers are stored in a binary heap, which is by the way my favourite data structure. It’s very simple to implement, yet has some good algorithmic properties.
Reminder: a heap is a tree-based data structure that can be represented as an array. It does not have the strong properties of more complex tree-based data structures but in turn is very easy to implement.
There’s one twist to the implementation however: the reserve zone. This is just a term I've come up with. The idea is to reserve a scratch area after the end of the heap to add new timers. Later, before using the heap, we do one heapify on the whole array (existing heap plus this reserve zone) instead of bubbling up on each timer addition. As a bonus, we only need to implement heapify and bubbledown and can leave out bubbleup.
Reminder: bubbledown and bubbleup are two basic operations on heaps, used to “sift up” or down elements in a heap. They’re usually employed for insertion and deletion purposes.
Edit: cage (thanks!) suggested the “nursery” name which I strongly prefer. I’m still keeping scratch/reserve term to avoid updating the code right now but future revisions of ‘ev.c’ will change the nomenclature.
The nice thing about using simple data structures is that the implementation is also easy.
‘ev_timer’ is straightforward: it just needs to make sure there’s enough space in the scratch zone and fill the entry for the timer.
unsigned int
ev_timer(const struct timeval *tv, void (*cb)(int, int, void *), void *udata)
{
    struct evtimer  *evt;
    void            *t;
    size_t           newcap;
    unsigned int     nextid;

    if (tv == NULL) {
        errno = EINVAL;
        return 0;
    }

    if (base->reserve_till == base->timerscap) {
        newcap = base->timerscap + 8;
        t = recallocarray(base->timers, base->timerscap, newcap,
            sizeof(*base->timers));
        if (t == NULL)
            return 0;
        base->timers = t;
        base->timerscap = newcap;
    }

    if ((nextid = ++base->tid) == 0)
        nextid = ++base->tid;

    evt = &base->timers[base->reserve_till];
    evt->id = nextid;
    memcpy(&evt->tv, tv, sizeof(*tv));
    evt->cb.cb = cb;
    evt->cb.udata = udata;

    base->reserve_till++;

    return (nextid);
}
Since we have two places where timers may be, the heap and the scratch zone, a helper function to find the offset of a timer given its id will come in handy very soon:
static int
find_timer(unsigned int id, size_t *pos)
{
    size_t i;

    if (id == 0)
        return (0);

    for (i = 0; i < base->ntimers; ++i) {
        if (base->timers[i].id == id) {
            *pos = i;
            return (1);
        }
    }

    for (i = base->reserve_from; i < base->reserve_till; ++i) {
        if (base->timers[i].id == id) {
            *pos = i;
            return (1);
        }
    }

    return (0);
}
Note that there may be a gap between the end of the heap and the start of the reserve zone.
In some cases, it could be helpful to know whether a timer is still pending. We do so via the public ‘ev_timer_pending’, which is just a wrapper around ‘find_timer’, so its implementation is not interesting enough to show. This is not a cheap operation since it’s linear in the number of pending timers, but it’s useful to provide.
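For reference, such a wrapper could be as small as this (a sketch, not necessarily the original code):

int
ev_timer_pending(unsigned int id)
{
    size_t i;

    /* we only care whether the timer was found, not where */
    return (find_timer(id, &i));
}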
We’ll also need a small helper to remove a timer from the heap:
static inline void
cancel_timer(size_t i)
{
    base->ntimers--;
    if (i != base->ntimers) {
        memcpy(&base->timers[i], &base->timers[base->ntimers],
            sizeof(*base->timers));
        bubbledown(i);
    }
}
I’m not showing the code for bubbledown; it’s left as an exercise for the reader. It’s just a matter of looking up the implementation in a textbook.
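A textbook sift-down over the timers array could look roughly like this; this is my sketch, assuming a min-heap ordered by the ‘tv’ field and the timercmp(3) macro from <sys/time.h>:

static void
bubbledown(size_t i)
{
    struct evtimer   tmp;
    size_t           l, r, s;

    for (;;) {
        l = 2 * i + 1;
        r = 2 * i + 2;

        /* find the smallest among the node and its children */
        s = i;
        if (l < base->ntimers &&
            timercmp(&base->timers[l].tv, &base->timers[s].tv, <))
            s = l;
        if (r < base->ntimers &&
            timercmp(&base->timers[r].tv, &base->timers[s].tv, <))
            s = r;
        if (s == i)
            return;

        /* swap and keep sifting down */
        tmp = base->timers[i];
        base->timers[i] = base->timers[s];
        base->timers[s] = tmp;
        i = s;
    }
}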
‘cancel_timer’ needs an offset inside the heap, so it’s not suitable for external usage. We’ll provide a reserve zone aware ‘ev_timer_cancel’ function instead:
int
ev_timer_cancel(unsigned int id)
{
    size_t i;

    if (!find_timer(id, &i))
        return (-1);

    if (i < base->ntimers) {
        cancel_timer(i);
        return (0);
    }

    base->reserve_till--;
    if (i != base->reserve_till)
        memcpy(&base->timers[i], &base->timers[base->reserve_till],
            sizeof(*base->timers));

    return (0);
}
We’re almost there. We just need one more helper, a way to merge the reserve zone into the heap, and this will just be the heapify algorithm:
Reminder: heapify is an operation that builds a heap from an array in place.
static void
timerheapify(void)
{
    size_t i, reserve, gap;

    reserve = base->reserve_till - base->reserve_from;
    if (reserve == 0)
        return;

    gap = base->reserve_from - base->ntimers;
    if (gap != 0) {
        memmove(&base->timers[base->ntimers],
            &base->timers[base->reserve_from],
            reserve * sizeof(*base->timers));
        base->reserve_from -= gap;
        base->reserve_till -= gap;
    }

    base->ntimers = base->reserve_till;
    if (base->ntimers < 2)
        return;

    i = base->ntimers / 2 - 1;
    for (;;) {
        bubbledown(i);
        if (i == 0)
            break;
        i--;
    }
}
When adding signal support we didn’t have to change the main loop since it was just a layer on top of the existing file descriptor event tracking. For timers, however, we’ll have to make some changes.
First of all, we’ll need a few more variables:
int
ev_loop(void)
{
    struct timespec  elapsed, beg, end;
    struct timeval   tv, sub, *min;
    struct evcb      cb;
    int              n, msec;
    size_t           i;

    while (!ev_stop) {
Then we also have to make sure to heapify and reset the reserve zone for ‘ev_timer’ to keep working:
        timerheapify();
        base->reserve_from = base->ntimers;
        base->reserve_till = base->ntimers;
Until now, we called poll(2) with a timeout of -1 to wait indefinitely. Now we have to compute the maximum time we’re allowed to wait:
        min = NULL;
        msec = -1;
        if (base->ntimers) {
            min = &base->timers[0].tv;
            msec = min->tv_sec * 1000 + (min->tv_usec + 999) / 1000;
        }
poll doesn’t provide a way to know how much time it waited, so we have to manually track the time:
        clock_gettime(CLOCK_MONOTONIC, &beg);
        if ((n = poll(base->pfds, base->len, msec)) == -1) {
            if (errno != EINTR)
                return -1;
        }

        if (n == 0 && min)
            memcpy(&tv, min, sizeof(tv));
        else {
            clock_gettime(CLOCK_MONOTONIC, &end);
            timespecsub(&end, &beg, &elapsed);
            TIMESPEC_TO_TIMEVAL(&tv, &elapsed);
        }
There’s a small optimization in here: if poll returned zero, we know it slept for the whole msec milliseconds, so we don’t have to call clock_gettime() again.
Note also how we use CLOCK_MONOTONIC to avoid issues with clock jumps.
Now it’s just a matter of traversing the heap and firing the expired timers.
        for (i = 0; i < base->ntimers && !ev_stop; /* nop */) {
            timersub(&base->timers[i].tv, &tv, &sub);
            /* expired? i.e. no time left at all */
            if (sub.tv_sec < 0 ||
                (sub.tv_sec == 0 && sub.tv_usec <= 0)) {
Actually, there’s one problem here: timers can delete themselves. We have to protect ourselves from the heap being changed under our feet.
It’s not a problem if a timer schedules another one: at worst the timers array will be reallocated if the reserve zone doesn’t have enough space. If a timer deletes itself, however, the items are moved in place. Worse yet, we don’t have a simple way to notice it!
The solution is to cancel the timer before firing it, and to pay attention to how we iterate through the array:
                /*
                 * Delete the timer before calling its
                 * callback; this protects against timers
                 * that attempt to delete themselves.
                 */
                memcpy(&cb, &base->timers[i].cb, sizeof(cb));
                cancel_timer(i);
                cb.cb(-1, EV_TIMEOUT, cb.udata);
                continue;
            }

            memcpy(&base->timers[i].tv, &sub, sizeof(sub));
            i++;
        }
If a timer hasn’t fired yet, we still have to decrement its expiration.
This could be optimized by stopping once we’ve fired the expired timers, and by keeping track of the deadline when each timer should fire instead of how many seconds it has left to live.
The rest of ‘ev_loop()’ is unchanged: the file descriptors are traversed and their callbacks fired as shown before.
Wrapping up
This is it! As promised, the implementation is just below 500 lines of code, counting comments and blanks, and barely goes over that when you count the public interface too.
There are a few things you may find in “production-grade” event loops that are missing here, like combining events or priorities. By “combining events” I mean how libevent allows specifying a timer together with an event: if the event fires before the timer, the timer is implicitly canceled; otherwise, if the timer fires first, the event is canceled. It is not hard to work around this omission in the consumer code, although the library could be extended to provide this facility by itself.
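For instance, a consumer could pair a read callback with a timeout along these lines; this is a hypothetical sketch on top of the API described above, with names of my own invention:

struct conn {
    int          fd;
    unsigned int timer;     /* timeout timer id, 0 if none */
};

static void
on_timeout(int fd, int ev, void *udata)
{
    struct conn *c = udata;

    /* the peer didn't reply in time: stop watching and give up */
    ev_del(c->fd);
    c->timer = 0;
    /* ... tear down the connection ... */
}

static void
on_read(int fd, int ev, void *udata)
{
    struct conn *c = udata;

    /* data arrived first: cancel the pending timeout */
    if (c->timer != 0) {
        ev_timer_cancel(c->timer);
        c->timer = 0;
    }
    /* ... handle the reply ... */
}

static int
watch_with_timeout(struct conn *c, const struct timeval *tv)
{
    if (ev_add(c->fd, EV_READ, on_read, c) == -1)
        return -1;
    if ((c->timer = ev_timer(tv, on_timeout, c)) == 0) {
        ev_del(c->fd);
        return -1;
    }
    return 0;
}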
Priorities are meant to make sure that some events fire before others, although I’m not convinced this is a very useful feature. I think it’s more robust to handle ordering in the application code.
Another possibly useful thing is user-defined events. Sometimes it could be useful to dispatch a custom event and have another bit of code respond to it. The library would have to be modified to accommodate these, but it should be trivial.