Thursday, February 20, 2020

A tale of missing touches

libinput 1.15.1 had a new feature: it matched the expected touch count with the one actually seen as opposed to the one advertised by the kernel. That is good news for ALPS devices whose kernel driver lies about their capabilities because these days who doesn't. However, in some cases that feature had the side-effect of reducing the touch count to zero - meaning libinput would ignore any touch. This caused a slight UX degradation.

After a bit of debugging and/or cursing, the issue was identified as a libevdev issue, specifically - the way libevdev replays events after a SYN_DROPPED event. And after several days of fixing things, adding stuff to the CI and adding meson support for libevdev so the CI can actually run a few useful things, it's time for a blog post to brain-dump and possibly entertain the occasional reader such as you are. Congratulations, I guess.

The Linux kernel's evdev protocol is a serial protocol where all events have a type, a code and a value. Events are grouped by EV_SYN.SYN_REPORT events, so the event type is EV_SYN (0), the event code is SYN_REPORT (also 0). The value is usually (but not always), you guessed it, zero. A SYN_REPORT signals that the current event sequence (also called a "frame") is to be interpreted as one hardware event [0]. In the simplest case, two hardware events from a mouse could look like this:

EV_REL   REL_X        1
EV_SYN   SYN_REPORT   0
EV_REL   REL_X        1
EV_REL   REL_Y        1
EV_SYN   SYN_REPORT   0
While we have five evdev events here, those represent one hardware event with an x movement of 1 and a second hardware event with a diagonal movement by 1/1. Glorious, we all understand evdev now (if not, read this and immediately afterwards this, although that second post will be rather reinforced by this post).

Life as software developer would be quite trivial but our universe hates us and we need an extra event code called SYN_DROPPED. This event is used by the kernel when events from the device come in faster than you're reading them. This shouldn't happen given that most input devices scan out at the casual rate of every 7ms or slower and we're not exactly running on carrier pigeons here. But your compositor has been a busy bee rendering all these browser windows containing kitten videos and thus completely neglected to check whether you've moved the finger on the touchpad recently. So the kernel sort-of clears the current event buffer and positions a shiny steaming SYN_DROPPED in there to notify the compositor of its wrongdoing. [1]

Now, we could assume that every evdev client (libinput, every Xorg driver, ...) knows how to handle SYN_DROPPED events correctly but we're self-aware enough that we don't. So SYN_DROPPED handling is wrapped via libevdev, in a way that lets the clients use almost exactly the same processing paths they use for normal events. libevdev gives you a notification that a SYN_DROPPED occured, then you fetch the events one-by-one until libevdev tells you have the complete current state of the device, and back to kittens you go. In pseudo-code, your input stack's event loop works like this:

while (user_wants_kittens):
   event = libevdev_get_event()

   if event is a SYN_DROPPED:
     while (libevdev_is_still_synchronizing):
        event = libevdev_get_event()
        process_event(event)
   else:
        process_event(event)
Now, this works great for keys where you get the required events to release or press new keys. This works great for relative axes because meh, who cares [2]. This works great for absolute axes because you just get the current state of the device and done. This works great for touch because, no wait, that bit is awful.

You see, the multi-touch protocol is ... special. It uses the absolute axes, but it also multiplexes over those axes via the slot protocol. A normal two-touch event looks like this:

EV_ABS ABS_MT_SLOT         0
EV_ABS ABS_MT_POSITION_X   123
EV_ABS ABS_MT_SLOT         1
EV_ABS ABS_MT_POSITION_X   456
EV_ABS ABS_MT_POSITION_Y   789
EV_ABS ABS_X               123
EV_SYN SYN_REPORT          0
The first two evdev events are slot 0 (first touch [3]), the second two evdev events are slot 1 (second touch [3]). Both touches update their X position but the second touch also updates its Y position. But for single-touch emulation we also get the normal absolute axis event [3]. Which is equivalent to the first touch [3] and can be ignored if you're handling the MT axes [3] (I'm getting a lot of mileage out of that footnote). And because things aren't confusing enough: events within an evdev frame are position-independent except the ABS_MT axes which need to be processed in sequence. So that ABS_X events could be anywhere within that frame, but the ABS_MT axes need to be grouped by slot.

About that single-touch emulation... We also have a single-touch multi-touch protocol via EV_KEY. For devices that can only track N fingers but can detect N+M fingers, we have a set of BTN_TOOL defines. Two fingers down sets BTN_TOOL_DOUBLETAP, three fingers down sets BTN_TOOL_TRIPLETAP, etc. Those are just a bitfield though, so no position data is available. And it tops out at BTN_TOOL_QUINTTAP but then again, that's a good maximum backed by a lot of statistical samples from users hands. On many devices, we have to combine that single-touch MT protocol with the real MT protocol. Synaptics touchpads on PS/2 only support 2 finger positions but detect up 5 touches otherwise [4]. And remember the ALPS devices? They say they have 4 slots but may only send data for two or three, so we have to detect this at runtime and switch to the BTN_TOOL bits for some touches.

So anyway, now that we unfortunately all understand the MT protocol(s), let's look at that libevdev bug. libevdev checks the slot states after SYN_DROPPED to detect whether any touch has stopped or started during SYN_DROPPED. It also detects whether a touch has changed, i.e. the user lifted the finger(s) and put the finger(s) down again while SYN_DROPPED was happening. For those touches it generates the events to stop the original touch, then events to start the new touch. This needs to be done over two event frames, i.e. with a SYN_REPORT in between [5]. But the implementation ended up splitting those changes - any touch that changed was terminated in the first event frame, any touch that outright stopped was terminated in the second event frame. That in itself wasn't the problem yet, the problem was that libevdev didn't emulate the single-touch multi-touch protocol with those emulated frames. So we ended up with event frames where slots would terminate but the single-touch protocol didn't update until a frame later.

This doesn't matter for most users. Both protocols were still correct-enough in their own bubble, only once you start mixing protocols was where it all started getting wonky. libinput does this because it has to, too many devices out there only track two fingers. So if you want three-finger tapping and pinch gestures, you need to handle both protocols. Despite this we didn't notice until we added the quirk for ALPS devices. Because now libinput sometimes noticed that after a SYN_DROPPED there were no fingers on the touchpad (because they all stopped/changed) but the BTN_TOOL bits were still on so clearly we have a touchpad that cannot track all fingers it detects - in this case zero. [6]

So to recap: libinput's auto-adjustment of the touch count for buggy touchpad devices failed thanks to libevdev's buggy workaround of the device sync. The device sync we need because we can't rely on userspace handling touches correctly across SYN_DROPPED. An event which only gets triggered because the compositor is too buggy to read input events in time. I don't know how to describe it exactly, but what I can see all the way down are definitely not turtles.

And the sad thing about it: if we didn't try to correct for the firmware and accepted that gestures are just broken on ALPS devices because the kernel driver is lying to us, none of the above would have mattered. Likewise, the old xorg synaptics driver won't be affected by this because it doesn't handle multitouch properly anyway, so it doesn't need to care about these discrepancies. Or, in other words and much like real life: the better you try to be, the worse it all gets.

And as the take-home lesson: do upgrade to libinput 1.15.2 and do upgrade to libevdev 1.9.0 when it's out. Your kittens won't care but at least that way it won't make me feel like I've done all this work in vain.

[0] Unless the SYN_REPORT value is nonzero but let's not confuse everyone more than necessary
[1] A SYN_DROPPED is per userspace client, so a debugging tool reading from the same event node may not see that event unless it too is busy with feline renderings.
[2] yes, you'll get pointer jumps because event data is missing but since you've been staring at those bloody cats anyway, you probably didn't even notice
[3] usually, but not always
[4] on those devices, identifying a 3-finger pinch gesture only works if you put the fingers down in the correct order
[5] historical reasons: in theory a touch could change directly but most userspace can't handle it and it's too much effort to add now
[6] libinput 1.15.2 leaves you with 1 finger in that case and that's good enough until libevdev is released

2 comments:

R. said...

Peter,

Long time reader (and Fedora contributor, plus you helped me with integration of libinput for KDE Touchpad configuration) here.

Just want to say I enjoy reading your blog posts for its wealth of technical details *and* the humour.

Thanks for writing them!

António said...

I was going to leave a comment just to state the same. The «possibly entertain the occasional reader such as you are» passage likely made us self aware.

Thanks for the enlightening entertainment. Or entertaining enlightenment.