Friday, September 9, 2016

Input threads in the X server

A great new feature has been merged during this 1.19 X server development cycle: we're now using threads for input [1]. Previously, there were two options for how an input driver would pass on events to the X server: polling or from within a signal handler. Polling simply adds all input devices' file descriptors to a select(2) loop that is processed in the mainloop of the server. The downside here is that if the server is busy rendering something, your input is delayed until that rendering is complete. Historically, polling was primarily used by the keyboard driver because it just doesn't matter much when keystrokes are delayed, both because you need the client to render them anyway (which it can't when it's busy) and possibly also because we're just so bloody used to typing delays.
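
To make the polling approach concrete, here is a minimal sketch of one mainloop iteration. It is not the actual server code: poll_devices_once() is a hypothetical helper, and reading raw bytes stands in for whatever a real driver would do with its device.

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper (not an X server function): wait up to timeout_ms
 * for input on any of the nfds device file descriptors and read from
 * those that are ready. Returns the total number of bytes read,
 * 0 on timeout, or -1 on error. */
int poll_devices_once(const int *fds, int nfds, int timeout_ms)
{
    fd_set readable;
    int maxfd = -1;

    FD_ZERO(&readable);
    for (int i = 0; i < nfds; i++) {
        /* select(2) can only handle descriptors below FD_SETSIZE */
        assert(fds[i] >= 0 && fds[i] < FD_SETSIZE);
        FD_SET(fds[i], &readable);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }

    struct timeval tv = {
        .tv_sec = timeout_ms / 1000,
        .tv_usec = (timeout_ms % 1000) * 1000,
    };
    int ready = select(maxfd + 1, &readable, NULL, NULL, &tv);
    if (ready <= 0)
        return ready;

    int total = 0;
    for (int i = 0; i < nfds; i++) {
        if (!FD_ISSET(fds[i], &readable))
            continue;
        char buf[64];
        ssize_t n = read(fds[i], buf, sizeof(buf));
        if (n > 0)
            total += (int)n;
    }
    return total;
}
```

The delay described above falls out of the structure: nothing is read until the mainloop gets around to calling poll_devices_once() again, so any long-running work between two calls holds up the input.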

The signal handler approach circumvented the delays by installing a SIGIO handler for each input device fd and calling that when any input occurs. This effectively interrupts the process until the signal handler completes, regardless of what the server is currently busy with. A great solution to provide immediate visible cursor movement (hence it is used by evdev, synaptics, wacom, and most of the now-retired legacy drivers) but it comes with a few side effects. First of all, because the main process is interrupted, the bit where we read the events must be completely separate from the bit where we process the events. That's easy enough, we've had an input event queue in the server for as long as I've been involved with X.Org development (~2006). The drivers push events into the queue during the signal handler; in the main loop the server reads them and processes them. In a busy server that may be several seconds after the pointer motion was performed on the screen but hey, it still feels responsive.

The bigger issue with the use of a signal handler is: you can't use malloc [2]. Or anything else useful. Look at the man page for signal(7): it literally has a list of allowed functions. This leads to two weird side-effects: one is that you have to pre-allocate everything you may ever need for event processing, the other is that you need to re-implement any function that is not currently async-signal-safe. The server actually has its own implementation of printf for this reason (for error logging). Let's just say this is ... suboptimal. Coincidentally, libevdev is mostly async-signal-safe for that reason too. It also means you can't use any libraries, because no-one [3] is insane enough to make libraries async-signal-safe.
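
To give a flavour of what "re-implement printf" means in practice, here is a small sketch of signal-safe logging. It is not the server's implementation: format_uint() and sigsafe_log_uint() are hypothetical names, and the only system call used is write(2), which is on the allowed list.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Format v as decimal into buf (capacity len) without printf.
 * Returns the number of digits written (a NUL is appended after them),
 * or 0 if the buffer is too small. */
size_t format_uint(char *buf, size_t len, unsigned long v)
{
    char tmp[3 * sizeof(v)];    /* more than enough digits */
    size_t n = 0;

    do {                        /* digits come out least significant first */
        tmp[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);

    if (n + 1 > len)
        return 0;
    for (size_t i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
    return n;
}

/* Write "<prefix><value>\n" to fd from a fixed stack buffer.
 * Over-long prefixes are truncated. Returns 0 on success, -1 on failure. */
int sigsafe_log_uint(int fd, const char *prefix, unsigned long value)
{
    char line[128];
    size_t pos = 0;

    /* copy the prefix by hand instead of calling into libc */
    while (*prefix != '\0' && pos < sizeof(line) - 1)
        line[pos++] = *prefix++;
    pos += format_uint(line + pos, sizeof(line) - 1 - pos, value);
    line[pos++] = '\n';
    return write(fd, line, pos) == (ssize_t)pos ? 0 : -1;
}
```

Calling sigsafe_log_uint(2, "events: ", 42) writes "events: 42" and a newline to stderr. Everything lives in fixed-size stack buffers, which is the pre-allocation constraint in miniature.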

We were still mostly "happy" with it until libinput came along. libinput is a full input stack and expecting it to work within a signal handler is somewhere between optimistic, masochistic and sadistic. The xf86-input-libinput driver doesn't use the signal handler and the side effect of this was that a desktop with libinput didn't feel as responsive when the server was busy rendering.

Keith Packard stepped in and switched the server from the signal handler to using input threads. Or more specifically: one input thread on top of the main thread. That thread controls all the input devices' file descriptors and continuously reads events off them. It otherwise provides the same functionality the signal handler did before: visible pointer movement and shoving events into the event queue for the main thread to process them later. But of course, once you switch to threads, problems have 2 you now. A signal handler is "threading light", only one code path can be interrupted and you know you continue where you left off. So synchronisation primitives are easier than in threads where both code paths continue independently. Keith replaced the previous xf86BlockSIGIO() calls with corresponding input_lock() and input_unlock() calls and all the main drivers have been switched over. Some interesting race conditions kept happening, but as of today we think most of these are solved.
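
The shape of the threaded version, very roughly, looks like the sketch below. It is an illustration only: input_lock() and input_unlock() are stand-ins that borrow the names from the server but are implemented here as a plain pthread mutex, the queue and the other function names are made up, and a single fd with one-byte "events" replaces real devices.

```c
#include <pthread.h>
#include <string.h>
#include <unistd.h>

#define EVENT_QUEUE_SIZE 64

static pthread_mutex_t input_mutex = PTHREAD_MUTEX_INITIALIZER;
static unsigned char event_queue[EVENT_QUEUE_SIZE];
static int queue_len;

/* Stand-ins for the server calls of the same name (assumption: a
 * single mutex is enough for this sketch). */
static void input_lock(void)   { pthread_mutex_lock(&input_mutex); }
static void input_unlock(void) { pthread_mutex_unlock(&input_mutex); }

/* Input thread: block in read(2) and queue each byte until EOF. */
static void *input_thread(void *arg)
{
    int fd = *(int *)arg;
    unsigned char byte;

    while (read(fd, &byte, 1) == 1) {
        input_lock();
        if (queue_len < EVENT_QUEUE_SIZE)   /* drop events when full */
            event_queue[queue_len++] = byte;
        input_unlock();
    }
    return NULL;
}

/* Start the input thread on *fd; fd must stay valid until the thread
 * is joined. Returns 0 on success, like pthread_create(). */
int start_input_thread(pthread_t *thread, int *fd)
{
    return pthread_create(thread, NULL, input_thread, fd);
}

/* Main thread: take up to max queued events under the lock.
 * Returns the number of events copied into out. */
int drain_events(unsigned char *out, int max)
{
    if (max < 0)
        max = 0;
    input_lock();
    int n = queue_len < max ? queue_len : max;
    memcpy(out, event_queue, (size_t)n);
    memmove(event_queue, event_queue + n, (size_t)(queue_len - n));
    queue_len -= n;
    input_unlock();
    return n;
}
```

Unlike a signal handler, the input thread may call anything it likes, malloc included. The price is that both sides now run independently, so every access to shared state has to sit between input_lock() and input_unlock(); a missed one is exactly the kind of race condition mentioned above.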

The best test we have at this point is libinput's internal test suite. It creates roughly 5000 devices within about 4 minutes and thus triggers most code paths to do with device addition and removal, especially the overlaps between devices sending events before/during/after they get added and/or removed. This is the largest source of possible errors as these are the code paths with the largest amount of actual simultaneous access to the input devices by both threads. But what the test suite can't test is normal everyday use. So until we get some more code maturity, expect the occasional crash and please do file bug reports. They'll be hard to reproduce and detect, but don't expect us to run into the same race conditions by accident.

[1] Yes, your calendar is right, it is indeed 2016, not the 90s or so
[2] Historical note: we actually mostly ignored this until about 2010 or so when glibc changed the malloc implementation and the server was just randomly hanging whenever we tried to malloc from within the signal handler. Users claimed this was bad UX, but I think it's right up there with motif.
[3] yeah, yeah, I know, there's always exceptions.

5 comments:

timofonic said...

Does it make sense to still put effort into X.Org instead of accelerating Wayland deployment? I know it's easier to say things than to do them, and the planning of a future-proof replacement for X may be too hard.

Another thing that always worries me is that we can never predict the future. These days I see iterative design as a lot better than trying to do something that will last forever, just like life evolves to adapt to a new environment. The X architects probably thought *A LOT* about it, and these days it's obsolete as hell, but probably most software is obsolete these days too.

What's that about responsive-feeling software? I have yet to find any these days; all of it falls short of using resources fairly and efficiently, for many reasons (multithreading and the ability to use multiple CPU cores, making use of the GPU, efficient memory usage...).

Web browsers, word processors and other members of office suites, media players...

Even emacs still freezes on a busy multi-core system! They still live in the past, like tons of software in these 2016+ days, and that's very sad. I feel like progress has stopped :/

Sergey Bugaev said...

Is it multithreaded under Wayland/Mutter as well, or what's the story there?

Related: https://bugzilla.gnome.org/show_bug.cgi?id=745032

druedain said...

@timofonic Most TODOs are on the DEs side currently, not Wayland's. Because of that and because of legacy apps we will be using Xorg even in Wayland days (XWayland), so I'm really happy that people are still putting effort in Xorg.

Thank you Peter!

miasma said...

I had the impression that sharing data between the two threads might cause some additional latency due to occasional locking, or are you using lock-free data structures? Using a threaded approach might mean that the kernel's scheduler has a bigger role now? Also, in case the system runs out of memory, wouldn't an approach with pre-allocated input buffers perform in a more reliable manner? I can't really say that's the case now with X. If the system runs out of memory, the whole desktop slows down and it's really hard to close windows or do anything.

Unknown said...

Thanks for this post, anything to make my desktop feel more responsive is fantastic. However, can you describe situations where threaded input really shines from a user's perspective?

ps.
"problems have 2 you now" I see what you did there ;)