Friday, September 9, 2016

Input threads in the X server

A great new feature has been merged during this 1.19 X server development cycle: we're now using threads for input [1]. Previously, there were two options for how an input driver would pass on events to the X server: polling or from within the signal handler. Polling simply adds all input devices' file descriptors to a select(2) loop that is processed in the mainloop of the server. The downside here is that if the server is busy rendering something, your input is delayed until that rendering is complete. Historically, polling was primarily used by the keyboard driver because it just doesn't matter much when key strokes are delayed. Both because you need the client to render them anyway (which it can't when it's busy) and possibly also because we're just so bloody used to typing delays.

The signal handler approach circumvented the delays by installing a SIGIO handler for each input device fd and calling that when any input occurs. This effectively interrupts the process until the signal handler completes, regardless of what the server is currently busy with. A great solution to provide immediate visible cursor movement (hence it is used by evdev, synaptics, wacom, and most of the now-retired legacy drivers) but it comes with a few side effects. First of all, because the main process is interrupted, the bit where we read the events must be completely separate to the bit where we process the events. That's easy enough, we've had an input event queue in the server for as long as I've been involved with X.Org development (~2006). The drivers push events into the queue during the signal handler, in the main loop the server reads them and processes them. In a busy server that may be several seconds after the pointer motion was performed on the screen but hey, it still feels responsive.

The bigger issue with the use of a signal handler is: you can't use malloc [2]. Or anything else useful. Look at the man page for signal(7), it literally has a list of allowed functions. This leads to two weird side-effects: one is that you have to pre-allocate everything you may ever need for event processing, the other is that you need to re-implement any function that is not currently async signal safe. The server actually has its own implementation of printf for this reason (for error logging). Let's just say this is ... suboptimal. Coincidentally, libevdev is mostly async signal safe for that reason too. It also means you can't use any libraries, because no-one [3] is insane enough to make libraries async signal-safe.

We were still mostly "happy" with it until libinput came along. libinput is a full input stack and expecting it to work within a signal handler is somewhere between optimistic, masochistic and sadistic. The xf86-input-libinput driver doesn't use the signal handler and the side effect of this is that a desktop with libinput didn't feel as responsive when the server was busy rendering.

Keith Packard stepped in and switched the server from the signal handler to using input threads. Or more specifically: one input thread on top of the main thread. That thread controls all the input devices' file descriptors and continuously reads events off them. It otherwise provides the same functionality the signal handler did before: visible pointer movement and shoving events into the event queue for the main thread to process them later. But of course, once you switch to threads, problems have 2 you now. A signal handler is "threading light", only one code path can be interrupted and you know you continue where you left off. So synchronisation primitives are easier than in threads where both code paths continue independently. Keith replaced the previous xf86BlockSIGIO() calls with corresponding input_lock() and input_unlock() calls and all the main drivers have been switched over. Some interesting race conditions kept happening, but as of today we think most of these are solved.
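To make the division of labour a bit more concrete, here is a minimal Python sketch of the same idea. It is not the server's code: the pipe standing in for a device node, the stop pipe, and the names input_thread(), drain_events() and demo() are all made up for illustration, and a plain threading.Lock plays the role of input_lock()/input_unlock(). It assumes a POSIX system where pipes can be polled.

```python
import os
import selectors
import threading
from collections import deque

input_lock = threading.Lock()   # stand-in for input_lock()/input_unlock()
event_queue = deque()           # events waiting for the main thread


def input_thread(device_fds, stop_fd):
    """Own all device fds: read events and push them into the queue."""
    sel = selectors.DefaultSelector()
    for fd in list(device_fds) + [stop_fd]:
        sel.register(fd, selectors.EVENT_READ)
    running = True
    while running:
        for key, _ in sel.select():
            if key.fd == stop_fd:
                running = False      # finish this batch, then exit
                continue
            data = os.read(key.fd, 64)
            with input_lock:         # the main thread may be draining right now
                event_queue.append(data)
    sel.close()


def drain_events():
    """Main thread: take everything queued so far and process it later."""
    with input_lock:
        events = list(event_queue)
        event_queue.clear()
    return events


def demo():
    dev_r, dev_w = os.pipe()         # a pipe plays the role of a device node
    stop_r, stop_w = os.pipe()       # hypothetical shutdown channel
    thread = threading.Thread(target=input_thread, args=([dev_r], stop_r))
    thread.start()
    os.write(dev_w, b"motion")       # the "hardware" sends one event
    os.write(stop_w, b"x")           # ask the input thread to shut down
    thread.join()
    for fd in (dev_r, dev_w, stop_r, stop_w):
        os.close(fd)
    return drain_events()
```

The point of the sketch is only that the reader thread never waits for the main thread to finish rendering, and that every access to the shared queue sits between a lock and an unlock. Unlike a signal handler, the reader is free to allocate memory and call into libraries.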

The best test we have at this point is libinput's internal test suite. It creates roughly 5000 devices within about 4 minutes and thus triggers most code paths to do with device addition and removal, especially the overlaps between devices sending events before/during/after they get added and/or removed. This is the largest source of possible errors as these are the code paths with the most actual simultaneous access to the input devices by both threads. But what the test suite can't test is normal everyday use. So until we get some more code maturity, expect the occasional crash and please do file bug reports. They'll be hard to reproduce and detect, but don't expect us to run into the same race conditions by accident.

[1] Yes, your calendar is right, it is indeed 2016, not the 90s or so
[2] Historical note: we actually mostly ignored this until about 2010 or so when glibc changed the malloc implementation and the server was just randomly hanging whenever we tried to malloc from within the signal handler. Users claimed this was bad UX, but I think it's right up there with motif.
[3] yeah, yeah, I know, there's always exceptions.

Tuesday, September 6, 2016

Fedora: Cinnamon, MATE and the broken GNOME touchpad panel

On Fedora, if you have mate-desktop or cinnamon-desktop installed, your GNOME touchpad configuration panel won't work (see Bug 1338585). Both packages install a symlink to assign the synaptics driver to the touchpad. But GNOME's control-center does not support synaptics anymore, so no touchpad is detected. Note that the issue occurs regardless of whether you use MATE/Cinnamon, merely installing it is enough.

Unfortunately, there is no good solution to this issue. Long-term both MATE and Cinnamon should support libinput but someone needs to step up and implement it. We don't support run-time driver selection in the X server, so an xorg.conf.d snippet is the only way to assign a touchpad driver. And this means that you have to decide whether GNOME's or MATE/Cinnamon's panel is broken at X start-up time.

If you need the packages installed but you're not actually using MATE/Cinnamon itself, remove the following symlinks (whichever is present on your system):

# rm /etc/X11/xorg.conf.d/99-synaptics-mate.conf
# rm /etc/X11/xorg.conf.d/99-synaptics-cinnamon.conf
# rm /usr/share/X11/xorg.conf.d/99-synaptics-mate.conf
# rm /usr/share/X11/xorg.conf.d/99-synaptics-cinnamon.conf
The /usr/share paths are the old ones and have been replaced with the /etc/ symlinks in cinnamon-desktop-3.0.2-2.fc25 and mate-desktop-1.15.1-4.fc25 and their F24 equivalents.

libinput and the Lenovo T450 and T460 series touchpads

I'm using T450 and T460 as reference but this affects all laptops from the Lenovo *50 and *60 series. The Lenovo T450 and T460 have the same touchpad hardware, but unfortunately it suffers from what is probably a firmware issue. On really slow movements, the pointer has a halting motion. That effect disappears when the finger moves faster.

The observable effect is that of a pointer stalling, then jumping by 20 or so pixels. We have had a quirk for this in libinput since March 2016 (see commit a608d9) and detect this at runtime for selected models. In particular, what we do is look for a sequence of events that only update the pressure values but not the x/y position of the finger. This is a good indication that the bug triggers. While it's possible to trigger pressure changes alone, triggering several in a row without a change in the x/y coordinates is extremely unlikely. Remember that these touchpads have a resolution of ~40 units per mm - you cannot hold your finger that still while changing pressure [1]. Once we see those pressure changes only we reset the motion history we keep for each touch. The next event with an x/y coordinate will thus not calculate the delta to the previous position and not trigger a move. The event after that is handled normally again. This avoids the extreme jumps but there isn't anything we can do about the stalling - we never get the event from the kernel. [2]
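A rough Python sketch of that quirk, purely as an illustration: the function name, the frame format (x, y, pressure) and the limit of three pressure-only events are assumptions made for this example, not what libinput actually uses.

```python
PRESSURE_ONLY_LIMIT = 3   # hypothetical: pressure-only events before we reset


def motion_deltas(frames):
    """Turn (x, y, pressure) frames into motion deltas, applying the quirk."""
    deltas = []
    history = None        # last position used for deltas, None after a reset
    previous = None       # previous raw frame
    pressure_only = 0
    for x, y, pressure in frames:
        if (previous is not None and (x, y) == previous[:2]
                and pressure != previous[2]):
            pressure_only += 1       # pressure changed, position did not
        else:
            pressure_only = 0
        previous = (x, y, pressure)

        if pressure_only >= PRESSURE_ONLY_LIMIT:
            history = None           # the bug triggered: drop the motion history
            continue
        if history is not None:
            dx, dy = x - history[0], y - history[1]
            if dx or dy:
                deltas.append((dx, dy))
        history = (x, y)             # after a reset this only seeds the history
    return deltas
```

Feeding it a slow movement followed by three pressure-only frames and then a 20 unit jump yields no delta for the jump itself. The first event after the reset only re-seeds the history and the event after that is handled normally again.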

Anyway. This bug popped up again elsewhere so this time I figured I'll analyse the data more closely. Specifically, I wrote a script that collected all x/y coordinates of a touchpad recording [3] and produced a black and white image of all device coordinates sent. This produces a graphic that's interesting but not overly useful:


Roughly 37000 touchpad events. You'll have to zoom in to see the actual pixels.
I modified the script to assume a white background and colour any x/y coordinate that was never hit black. So an x coordinate of 50 would now produce a vertical 1 pixel line at 50, a y coordinate of 70 a horizontal line at 70, etc. Any pixel that remains white is a coordinate that is hit at some point, anything black was unreachable. This produced more interesting results. Below is the graphic of a short, slow movement right to left.

A single short slow finger movement
You can clearly see the missing x coordinates. More specifically, there are some events, then a large gap, then events again. That gap is the stalling cursor where we didn't get any x coordinates. My first assumption was that it may be a sensor issue and that some areas on the touchpad just don't trigger. So what I did was move my finger around the whole touchpad to try to capture as many x and y coordinates as possible.
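For reference, the second version of the script boils down to something like the Python sketch below. This is not the original script: the function names and the plain-text PBM output are choices made for this example, and it reads the description above as "a pixel is black if its x column or its y row was never hit".

```python
def coverage_map(points, width, height):
    """Return rows of pixels: 0 (white) where x and y were both hit, else 1."""
    hit_x = {x for x, _ in points}
    hit_y = {y for _, y in points}
    return [
        [0 if (x in hit_x and y in hit_y) else 1 for x in range(width)]
        for y in range(height)
    ]


def to_pbm(rows):
    """Serialise the rows as a plain-text PBM image (1 is black, 0 is white)."""
    height = len(rows)
    width = len(rows[0]) if rows else 0
    lines = ["P1", f"{width} {height}"]
    lines += [" ".join(str(pixel) for pixel in row) for row in rows]
    return "\n".join(lines) + "\n"
```

The points would come from a recording of the kernel events, with width and height taken from the axis ranges of the device. An x coordinate that never shows up in the recording then turns into a black vertical line, which is exactly the gap visible in the graphic.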

Let's have a look at the recording from a T440 first because it doesn't suffer from this issue:


Sporadic black lines indicating unused coordinates but the center is purely white, indicating every device unit was hit at some point
Ok, looks roughly ok. The black areas are irregular, on the edges and likely caused by me just not covering those areas correctly. In the center it's white almost everywhere, that's where the most events were generated. And now let's compare this to a T450:

A visible grid of unreachable device units
The difference is quite noticeable, especially if you consider that the T440 recording had under 15000 events, the T450 recording had almost 37000. The T450 has a patterned grid of unreachable positions. But why? We currently use the PS/2 protocol to talk to the device but we should be using RMI4 over SMBus instead (which is what Windows has done for a while and luckily the RMI4 patches are on track for kernel 4.9). Once we talk to the device in its native protocol we see a resolution of ~20 units/mm and it looks like the T440 output:

With RMI4, the grid disappears
Ok, so the problem is not missing coordinates in the sensor and besides, at the resolution the touchpad has a single 'pixel' not triggering shouldn't be much of a problem anyway.

Maybe the issue had to do with horizontal movements or something? The next approach was for me to move my finger slowly from one side to the other. That's actually hard to do consistently when you're not a robot, so the results are bound to be slightly different. On the T440:


The x coordinates are sporadic with many missing ones, but the y coordinates are all covered
You can clearly see where the finger moved left to right. The big black gaps on the x coordinates mostly reflect me moving too fast but you can see how the distance narrows, indicating slower movements. Most importantly: vertically, the strip is uniformly white, meaning that within that range I hit every y coordinate at least once. And the recording from the T450:

Only one gap in the y range, sporadic gaps in the x range
Well, still looks mostly the same, so what is happening here? Ok, last test: This time an extremely slow motion left to right. It took me 87 seconds to cover the touchpad. In theory this should render the whole strip white if all x coordinates are hit. But look at this:

An extremely slow finger movement
Ok, now we see the problem. This motion was slow enough that almost every x coordinate should have been hit at least once. But there are large gaps and most notably: larger gaps than in the recording above that was a faster finger movement. So what we have here is not an actual hardware sensor issue but that the firmware is working against us here, filtering things out. Unfortunately, that's also the worst result because while hardware issues can usually be worked around, firmware issues are a lot more subtle and less predictable. We've also verified that newer firmware versions don't fix this and trying out some tweaks in the firmware didn't change anything either.

Windows is affected by this too and so is the synaptics driver. But it's not really noticeable on either and all reports so far were against libinput, with some even claiming that it doesn't manifest with synaptics. But each time we investigated in more detail it turns out that the issue is still there (synaptics uses the same kernel data after all) but because of different acceleration methods users just don't trigger it. So my current plan is to change the pointer acceleration to match something closer to what synaptics does on these devices. That's hard because synaptics is mostly black magic (e.g. synaptics' pointer acceleration depends on screen resolution) and hard to reproduce. Either way, until that is sorted at least this post serves as a link to point people to.

Many thanks to Andrew Duggan from Synaptics and Benjamin Tissoires for helping out with the analysis and testing of all this.

[1] Because pressing down on a touchpad flattens your finger and thus changes the shape slightly. While you can hold a finger still, you cannot control that shape
[2] Yes, predictive movement would be possible but it's very hard to get this right
[3] These are events as provided by the kernel and unaffected by anything in the userspace stack

Wednesday, August 31, 2016

New xserver driver sort order - evdev < libinput < (synaptics|wacom|...)

In the X server, the input driver assignment is handled by xorg.conf.d snippets. Each driver assigns itself to the type of devices it can handle and the driver that actually loaded is simply the one that sorts last. Historically, we've had the evdev driver sort low and assign itself to everything. synaptics, wacom and the other few drivers that matter sorted higher than evdev and thus assigned themselves to the respective device.

When xf86-input-libinput first came out 2 years ago, we used a higher sort order than all other drivers to assign it to (almost) all devices. This was of course intentional because we believe that libinput is the best input stack around, the odd bug notwithstanding. Now it has matured a fair bit and we had a lot more exposure to various types of hardware. We've been quirking and fixing things like crazy and libinput is much better for it.

Two things were an issue with this approach though. First, overriding xf86-input-libinput required manual intervention, usually either copying or symlinking an xorg.conf.d snippet. Second, even though we were overriding the default drivers, we still had them installed everywhere. Now it's time to start properly retiring the old drivers.

The upstream approach for this is fairly simple: the xf86-input-libinput xorg.conf.d snippet will drop in sort order to sit above evdev. evdev remains as the fallback driver for miscellaneous devices where evdev's blind "forward everything" approach is sufficient. All other drivers will sort higher than xf86-input-libinput and will thus override the xf86-input-libinput assignment. The new sort order is thus:

  • evdev
  • libinput
  • synaptics, wacom, vmmouse, joystick
evdev and libinput are generic drivers, the others are for specific devices or use-cases. To use a specific driver other than xf86-input-libinput, you now only have to install it. To fall back to xf86-input-libinput, you uninstall it. No more manual xorg.conf.d snippets symlinking.
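The "last snippet that matches wins" rule can be modelled in a few lines of Python. This is a toy model only: the file names, their numeric prefixes and the device types each driver claims are assumptions for the example and may not match what your distribution ships.

```python
# Hypothetical snippets: file name -> (driver, device types it assigns itself to)
GENERIC = {"keyboard", "pointer", "touchpad", "tablet", "touchscreen"}
SNIPPETS = {
    "10-evdev.conf": ("evdev", GENERIC),
    "40-libinput.conf": ("libinput", GENERIC),
    "70-synaptics.conf": ("synaptics", {"touchpad"}),
    "70-wacom.conf": ("wacom", {"tablet"}),
}


def assigned_driver(installed, device_type):
    """Return the driver of the last-sorting installed snippet that matches."""
    driver = None
    for name in sorted(installed):
        candidate, types = SNIPPETS[name]
        if device_type in types:
            driver = candidate       # a later snippet overrides an earlier one
    return driver
```

With all four snippets installed a touchpad ends up on synaptics. Remove 70-synaptics.conf from the installed set, which is what uninstalling the driver package does, and the same touchpad falls back to libinput.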

This has an impact on distributions and users. Distributions should ensure that other drivers are never installed by default unless requested by the user (or some software). And users need to be aware that having a driver other than xf86-input-libinput installed may break things. For example, recent GNOME does not support the synaptics driver anymore, installing it will cause the control panel's touchpad bits to stop working. So there'll be a messy transition period but once things are settled, the solution to most input-related driver bugs will be "install/remove driver $foo" as opposed to the current symlink/copy/write an xorg.conf.d snippet.

Friday, August 5, 2016

libinput and disable-while-typing

A common issue with users typing on a laptop is that the user's palms will inadvertently get in contact with the touchpad at some point, causing the cursor to move and/or click. In the best case it's annoying, in the worst case you're now typing your password into the newly focused twitter application. While this provides some general entertainment and thus makes the world a better place for a short while, here at the libinput HQ [1] we strive to keep life as boring as possible and avoid those situations.

The best way to avoid accidental input is to detect palm touches and simply ignore them. That works ok-ish on some touchpads and fails badly on others. Lots of hardware is barely able to provide an accurate touch location, let alone enough information to decide whether a touch is a palm. libinput's palm detection largely works by using areas on the touchpad that are likely to be touched by the palms.

The second-best way to avoid accidental input is to disable the touchpad while a user is typing. The libinput marketing department [2] has decided to name this feature "disable-while-typing" (DWT) and it's been in libinput for quite a while. In this post I'll describe how exactly DWT works in libinput.

Back in the olden days of roughly two years ago we all used the synaptics X.Org driver and were happy with it [3]. Disable-while-typing was featured there through the use of a tool called syndaemon. This synaptics daemon [4] has two modes. One was to poll the keyboard state every few milliseconds and check whether a key was down. If so, syndaemon sends a command to the driver to tell it to disable itself. After a timeout when the keyboard state is neutral again syndaemon tells the driver to re-enable itself. This causes a lot of wakeups, especially during those 95% of the time when the user isn't actually typing. Or missed keys if the press + release occurs between two polls. Hence the second mode, using the RECORD extension, where syndaemon opens a second connection to the X server and checks for key events [5]. If it sees one float past, it tells the driver to disable itself, and so on and so forth. Either way, you had a separate process that did that job. syndaemon had a couple of extra options and features that I'm not going to discuss here, but we've replicated the useful ones in libinput.

libinput has no external process, DWT is integrated into the library with a couple of smart extra features. This is made easier by libinput controlling all the devices, so all keyboard events are passed around internally to the touchpad backend. That backend then decides whether it should stop sending events. And this is where the interesting bits come in.

First, we have different timeouts: if you only hit a single key, the touchpad will re-enable itself quicker than after a period of typing. So if you use the touchpad, hit a key to trigger some UI the pointer only stops moving for a very short time. But once you type, the touchpad disables itself longer. Since your hand is now in a position over the keyboard, moving back to the touchpad takes time anyway so a longer timeout doesn't hurt. And as typing is interrupted by pauses, a longer timeout bridges over those to avoid accidental movement of the cursor.

Second, any touch started while you were typing is permanently ignored, so it's safe to rest the palm on the touchpad while typing and leave it there. But we keep track of the start time of each touch so any touch started after the last key event will work normally once the DWT timeout expires. You may feel a short delay but it should be well in the acceptable range of tens of ms.

Third, libinput is smart enough to detect which keyboard to pair with. If you have an external touchpad like the Apple Magic Trackpad or a Logitech T650, DWT will never enable on those. Likewise, typing on an external keyboard won't disable the internal touchpad. And in the rare case of two internal touchpads [6], both of them will do the right thing. As of systemd v231 the information of whether a touchpad is internal or external is available in the ID_INPUT_TOUCHPAD_INTEGRATION udev property and thus available to everyone, not just libinput.

Finally, modifier keys are ignored for DWT, so using the touchpad to do shift-clicks works unimpeded. This also goes for the F-Key row and the numpad if you have any. These keys are usually out of the range of the touchpad anyway so interference is not an issue here. As of today, modifier key combos work too. So hitting Ctrl+S to save a document won't disable the touchpad (or any other modifiers + key combination). But once you are typing DWT activates and if you now type Shift+S to type the letter 'S' the touchpad remains disabled.
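Put together, the timeout, touch tracking and modifier rules above look roughly like the following Python sketch. It is a simplified model and not libinput's implementation: the class, the two timeout values and the modifier list are assumptions picked for illustration, and keyboard pairing is left out entirely.

```python
SHORT_TIMEOUT_MS = 200    # hypothetical: after a single key press
LONG_TIMEOUT_MS = 500     # hypothetical: while the user keeps typing
MODIFIERS = {"shift", "ctrl", "alt", "meta"}


class DisableWhileTyping:
    def __init__(self):
        self.last_key_ms = None
        self.timeout_ms = 0

    def active(self, now_ms):
        """True while the touchpad should stay disabled."""
        return (self.last_key_ms is not None
                and now_ms - self.last_key_ms < self.timeout_ms)

    def key_press(self, key, now_ms, held_modifiers=()):
        if key in MODIFIERS:
            return                   # modifiers alone never trigger DWT
        typing = self.active(now_ms)
        if held_modifiers and not typing:
            return                   # Ctrl+S and friends don't start DWT
        # The first key gets the short timeout, follow-up keys the long one.
        self.timeout_ms = LONG_TIMEOUT_MS if typing else SHORT_TIMEOUT_MS
        self.last_key_ms = now_ms

    def touch_begin(self, now_ms):
        """Remember when the touch started and whether DWT was active then."""
        return {"start_ms": now_ms, "during_dwt": self.active(now_ms)}

    def touch_allowed(self, touch, now_ms):
        if self.active(now_ms):
            return False
        if touch["during_dwt"] and touch["start_ms"] < self.last_key_ms:
            return False             # started mid-typing: ignored for good
        return True
```

A touch that lands between two key presses stays ignored even after the timeout expires, while a touch that starts after the last key press starts working as soon as the timeout is over.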

So in summary: what we've gained from switching to libinput is one external process less that causes wakeups and the ability to be a lot smarter about when we disable the touchpad. Coincidentally, libinput has similar code to avoid touchpad interference when the trackpoint is in use.

[1] that would be me
[2] also me
[3] uphill, both ways, snow, etc.
[4] nope. this one wasn't my fault
[5] Yes, syndaemon is effectively a keylogger, except it doesn't do any of the "logging" bit a keylogger would be expected to do to live up to its name
[6] This currently happens on some Dell laptops using hid-i2c. We get two devices, one named "DLL0704:01 06CB:76AE Touchpad" or similar and one "SynPS/2 Synaptics TouchPad". The latter one will never send events unless hid-i2c is disabled in the kernel

Wednesday, July 20, 2016

libinput is done

Don't panic. Of course it isn't. Stop typing that angry letter to the editor and read on. I just picked that title because it's clickbait and these days that's all that matters, right?

With the release of libinput 1.4 and the newest feature to add tablet pad mode switching, we've now finished the TODO list we had when libinput was first conceived. Let's see what we have in libinput right now:

  • keyboard support (actually quite boring)
  • touchscreen support (actually quite boring too)
  • support for mice, including middle button emulation where needed
  • support for trackballs including the ability to use them rotated and to use button-based scrolling
  • touchpad support, most notably:
    • proper multitouch support on touchpads [1]
    • two-finger scrolling and edge scrolling
    • tapping, tap-to-drag and drag-lock (all configurable)
    • pinch and swipe gestures
    • built-in palm and thumb detection
    • smart disable-while-typing without the need for an external process like syndaemon
    • more predictable touchpad behaviours because everything is based on physical units [2]
    • a proper API to allow for kinetic scrolling on a per-widget basis
  • tracksticks work with middle button scrolling and communicate with the touchpad where needed
  • tablet support, most notably:
    • each tool is a separate entity with its own capabilities
    • the pad itself is a separate entity with its own capabilities and events
    • mode switching is exported by the libinput API and should work consistently across callers
  • a way to identify if multiple kernel devices belong to the same physical device (libinput device groups)
  • a reliable test suite
  • Documentation!
The side-effect of libinput is that we are also trying to fix the rest of the stack where appropriate. Mostly this meant pushing stuff into systemd/udev so far, with the odd kernel fix as well. Specifically the udev bits mean we
  • know the DPI density of a mouse
  • know whether a touchpad is internal or external
  • fix up incorrect axis ranges on absolute devices (mostly touchpads)
  • try to set the trackstick sensitivity to something sensible
  • know when the wheel click is less/more than the default 15 degrees
And of course, the whole point of libinput is that it can be used from any Wayland compositor and take away most of the effort of implementing an input stack. GNOME, KDE and Enlightenment already use libinput, and so does Canonical's Mir. And some distributions use libinput as the default driver in X through xf86-input-libinput (Fedora 22 was the first to do this). So overall libinput is already quite a success.

The hard work doesn't stop of course, there are still plenty of areas where we need to be better. And of course, new features come as HW manufacturers bring out new hardware. I already have touch arbitration on my todo list. But it's nice to wave at this big milestone as we pass it on the way to the glorious future of perfect, bug-free input. At this point, I'd like to extend my thanks to all our contributors: Andreas Pokorny, Benjamin Tissoires, Caibin Chen, Carlos Garnacho, Carlos Olmedo Escobar, David Herrmann, Derek Foreman, Eric Engestrom, Friedrich Schöller, Gilles Dartiguelongue, Hans de Goede, Jackie Huang, Jan Alexander Steffens (heftig), Jan Engelhardt, Jason Gerecke, Jasper St. Pierre, Jon A. Cruz, Jonas Ådahl, JoonCheol Park, Kristian Høgsberg, Krzysztof A. Sobiecki, Marek Chalupa, Olivier Blin, Olivier Fourdan, Peter Frühberger, Peter Hutterer, Peter Korsgaard, Stephen Chandler Paul, Thomas Hindoe Paaboel Andersen, Tomi Leppänen, U. Artie Eoff, Velimir Lisec.

Finally: libinput was started by Jonas Ådahl in late 2013, so it's already over 2.5 years old. And the git log shows we're approaching 2000 commits and a simple LOCC says over 60000 lines of code. I would also like to point out that the vast majority of commits were done by Red Hat employees, I've been working on it pretty much full-time since 2014 [3]. libinput is another example of Red Hat putting money, time and effort into the less press-worthy plumbing layers that keep our systems running. [4]

[1] Ironically, that's also the biggest cause of bugs because touchpads are terrible. synaptics still only does single-finger with a bit of icing and on bad touchpads that often papers over hardware issues. We now do that in libinput for affected hardware too.
[2] The synaptics driver uses absolute numbers, mostly based on the axis ranges for Synaptics touchpads making them unpredictable or at least different on other touchpads.
[3] Coincidentally, if you see someone suggesting that input is easy and you can "just do $foo", their assumptions may not match reality
[4] No, Red Hat did not require me to add this. I can pretty much write what I want in this blog and these opinions are my own anyway and don't necessarily reflect Red Hat yadi yadi ya. The fact that I felt I had to add this footnote to counteract whatever wild conspiracy comes up next is depressing enough.

Friday, July 15, 2016

Why synclient does not work anymore

More and more distros are switching to libinput by default. That's a good thing but one side-effect is that the synclient tool does not work anymore [1], it just complains that "Couldn't find synaptics properties. No synaptics driver loaded?"

What is synclient? A bit of history first. Many years ago the only way to configure input devices was through xorg.conf options, there was nothing that allowed for run-time configuration. The Xorg synaptics driver found a solution to that: the driver would initialize a shared memory segment that kept the configuration options and a little tool, synclient (synaptics client), would know about that segment. Calling synclient with options would write to that SHM segment and thus toggle the various options at runtime. Driver and synclient had to be of the same version to know the layout of the segment and it's about as secure as you expect it to be. In 2008 I added input device properties to the server (X Input Extension 1.5 and it's part of 2.0 as well of course). Rather than the SHM segment we now had a generic API to talk to the driver. The API is quite simple, you effectively have two keys (device ID and property number) and you can set any value(s). Properties literally support just about anything but drivers restrict what they allow on their properties and which value maps to what. For example, to enable left-finger tap-to-click in synaptics you need to set the 5th byte of the "Synaptics Tap Action" property to 1.

xinput, a commandline tool and debug helper, has a generic API to change those properties so you can do things like xinput set-prop "device name" "property name" 1 [2]. It does a little bit under the hood but generally it's pretty stupid. You can run xinput set-prop and try to set a value that's out of range, or try to switch from int to float, or just generally do random things.

We were able to keep backwards compatibility in synclient, so where before it would use the SHM segment it would now use the property API, without the user interface changing (except the error messages are now standard Xlib errors complaining about BadValue, BadMatch or BadAccess). But synclient and xinput use the same API to talk to the server and the server can't tell the difference between the two.

Fast forward 8 years and now we have libinput, wrapped by the xf86-input-libinput driver. That driver does the same as synaptics, the config toggles are exported as properties and xinput can read and change them. Because really, you do the smart work by selecting the right property names and values and xinput just passes on the data. But synclient is broken now, simply because it requires the synaptics driver and won't work with anything else. It checks for a synaptics-specific property ("Synaptics Edges") and if that doesn't exist it complains with "Couldn't find synaptics properties. No synaptics driver loaded?". libinput doesn't initialise that property, it has its own set of properties. We did look into whether it's possible to have property-compatibility with synaptics in the libinput driver but it turned out to be a huge effort, flaky reliability at best (not all synaptics options map into libinput options and vice versa) and the benefit was quite limited. Because, as we've been saying since about 2009 - your desktop environment should take over configuration of input devices, hand-written scripts are dodo-esque.

So if you must insist on shellscripts to configure your input devices use xinput instead. synclient is like fsck.ext2, on that glorious day you switch to btrfs it won't work because it was only designed with one purpose in mind.
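If the script happens to be Python rather than shell, a thin wrapper around xinput is all it takes. The sketch below only builds the command line. The helper name is made up, the device name is just an example, and "libinput Tapping Enabled" is one of the properties the xf86-input-libinput driver exports, so check xinput list-props for what your device actually has.

```python
import shlex


def set_prop_command(device, prop, *values):
    """Build the argument list for an xinput set-prop call."""
    return ["xinput", "set-prop", device, prop, *[str(v) for v in values]]


# Example device name only, yours will differ (see "xinput list").
cmd = set_prop_command("SynPS/2 Synaptics TouchPad",
                       "libinput Tapping Enabled", 1)
print(shlex.join(cmd))    # the equivalent shell command line
```

To actually apply the setting, pass the list to subprocess.run(cmd, check=True) from within a running X session. As with xinput on the command line, nothing stops you from sending values the driver will reject.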

[1] Neither does syndaemon btw but its functionality is built into libinput so that doesn't matter.
[2] xinput set-prop --type=int --format=32 "device name" "hey I have a banana" 1 2 3 4 5 6 and congratulations, you've just created a new property for all X clients to see. It doesn't do anything, but you could use those to attach info to devices. If anything was around to read that.