Tuesday, January 28, 2020

libinput's timer offset errors

Let's say you have a friend (this wouldn't happen to you, of course, just that friend) who is staring at their system logs and wonder why it is full of messages similar to this:

 libinput error: client bug: timer event5 debounce short: offset negative (-7ms)
And the question is of course - what is going on here and why hasn't this been fixed yet. Now, the libinput documentation explains this already but it's always worthwhile to fire out a blog post into the void in the hope someone reads it.

libinput uses a specific model to communicate with the Wayland compositor (or the X server). There is a single epoll file descriptor and that fd will trigger whenever something happens that's of interest to libinput. When that fd triggers, the compositor is expected to call libinput_dispatch() which is the main "do stuff" function of libinput.

The actual trigger doesn't matter, it could be an event from a device but it could be something else. The caller doesn't have to care. All that matters is that there is code like this:

if (libinput_fd_triggered_in_select)
   libinput_dispatch();
And then libinput will do the right thing. Whether you also want events from libinput is almost orthogonal to this.

libinput uses timerfd internally so any timeouts also trigger the epoll fd. Timeouts are scheduled based on the event's time stamp, so if you get an event with timestamp T, a timeout of 180ms will be scheduled for time T + 180ms. So the process looks something like this:

T(0): kernel button event
T(0): libinput_dispatch(): schedule timeout for T(0+180)
...
T(180): epoll fd triggers
T(180): libinput_dispatch(): process timeout
...
This works generally fine. Even with some delays we don't generally need to worry about the timeouts and they still trigger as expected. But some of the timeouts are "short", as in 8ms short. And this is where these warnings may trigger.

Let's say your compositor is busy doing some rendering. The epoll fd triggers with a button event but the compositor is too busy to handle it immediately. Instead, it finishes whatever it's doing and only then calls libinput_dispatch():

T(0): kernel button event
...
T(12): libinput_dispatch(): schedule timeout for T(0+8)

libinput error: client bug: timer event5 debounce short: offset negative (-4ms)
libinput will still use the event's timestamp instead of the wall clock time so the scheduled timeout is no longer in the future. And that is when the error message will be printed. This isn't a libinput bug, it's always a bug in the compositor. Especially gnome-shell is still struggling with these instances and while great strives have been made to make it more responsive, it can still happen.

The error message may seem cryptic, but it provides a bunch of useful information: event5 is your event node, "debounce short" is the timer name so we know where we got stuck. And 4ms gives us an indication how much we got delayed.

And for the record: the other end of this issue - delayed libinput_dispatch() after a timeout should have triggered is handled quietly by libinput. For example, if you have a physical event queued and a timeout expiry, we will process the earlier one first to make sure the sequences are handled correctly.