Thursday, March 21, 2019

Using hexdump to print binary protocols

I had to work on an image yesterday where I couldn't install anything and the amount of pre-installed tools was quite limited. And I needed to debug an input device, usually done with libinput record. So eventually I found that hexdump supports formatting of the input bytes but it took me a while to figure out the right combination. The various resources online only got me partway there. So here's an explanation which should get you to your results quickly.

By default, hexdump prints identical input lines as a single line with an asterisk ('*'). To avoid this, use the -v flag as in the examples below.

hexdump's format string is single-quote-enclosed string that contains the count, element size and double-quote-enclosed printf-like format string. So a simple example is this:

$ hexdump -v -e '1/2 "%d\n"' 
-11643
23698
0
0
-5013
6
0
0
This prints 1 element ('iteration') of 2 bytes as integer, followed by a linebreak. Or in other words: it takes two bytes, converts it to int and prints it. If you want to print the same input value in multiple formats, use multiple -e invocations.
$ hexdump -v -e '1/2 "%d "' -e '1/2 "%x\n"' 
-11568 d2d0
23698 5c92
0 0
0 0
6355 18d3
1 1
0 0
This prints the same 2-byte input value, once as decimal signed integer, once as lowercase hex. If we have multiple identical things to print, we can do this:
$ hexdump -v -e '2/2 "%6d "' -e '" hex:"' -e '4/1 " %x"' -e '"\n"'
-10922  23698 hex: 56 d5 92 5c
     0      0 hex: 0 0 0 0
 14879      1 hex: 1f 3a 1 0
     0      0 hex: 0 0 0 0
     0      0 hex: 0 0 0 0
     0      0 hex: 0 0 0 0
Which prints two elements, each size 2 as integers, then the same elements as four 1-byte hex values, followed by a linebreak. %6d is a standard printf instruction and documented in the manual.

Let's go and print our protocol. The struct representing the protocol is this one:

struct input_event {
#if (__BITS_PER_LONG != 32 || !defined(__USE_TIME_BITS64)) && !defined(__KERNEL__)
        struct timeval time;
#define input_event_sec time.tv_sec
#define input_event_usec time.tv_usec
#else
        __kernel_ulong_t __sec;
#if defined(__sparc__) && defined(__arch64__)
        unsigned int __usec;
#else
        __kernel_ulong_t __usec;
#endif
#define input_event_sec  __sec
#define input_event_usec __usec
#endif
        __u16 type;
        __u16 code;
        __s32 value;
};
So we have two longs for sec and usec, two shorts for type and code and one signed 32-bit int. Let's print it:
$ hexdump -v -e '"E: " 1/8 "%u." 1/8 "%06u" 2/2 " %04x" 1/4 "%5d\n"' /dev/input/event22 
E: 1553127085.097503 0002 0000    1
E: 1553127085.097503 0002 0001   -1
E: 1553127085.097503 0000 0000    0
E: 1553127085.097542 0002 0001   -2
E: 1553127085.097542 0000 0000    0
E: 1553127085.108741 0002 0001   -4
E: 1553127085.108741 0000 0000    0
E: 1553127085.118211 0002 0000    2
E: 1553127085.118211 0002 0001  -10
E: 1553127085.118211 0000 0000    0
E: 1553127085.128245 0002 0000    1
And voila, we have our structs printed in the same format evemu-record prints out. So with nothing but hexdump, I can generate output I can then parse with my existing scripts on another box.

Friday, March 15, 2019

libinput's internal building blocks

Ho ho ho, let's write libinput. No, of course I'm not serious, because no-one in their right mind would utter "ho ho ho" without a sufficient backdrop of reindeers to keep them sane. So what this post is instead is me writing a nonworking fake libinput in Python, for the sole purpose of explaining roughly how libinput's architecture looks like. It'll be to the libinput what a Duplo car is to a Maserati. Four wheels and something to entertain the kids with but the queue outside the nightclub won't be impressed.

The target audience are those that need to hack on libinput and where the balance of understanding vs total confusion is still shifted towards the latter. So in order to make it easier to associate various bits, here's a description of the main building blocks.

libinput uses something resembling OOP except that in C you can't have nice things unless what you want is a buffer overflow\n\80xb1001af81a2b1101. Instead, we use opaque structs, each with accessor methods and an unhealthy amount of verbosity. Because Python does have classes, those structs are represented as classes below. This all won't be actual working Python code, I'm just using the syntax.

Let's get started. First of all, let's create our library interface.

class Libinput:
   @classmethod
   def path_create_context(cls):
        return _LibinputPathContext()

   @classmethod
   def udev_create_context(cls):
       return _LibinputUdevContext()

   # dispatch() means: read from all our internal fds and
   # call the dispatch method on anything that has changed
   def dispatch(self):
        for fd in self.epoll_fd.get_changed_fds():
           self.handlers[fd].dispatch()

   # return whatever the next event is
   def get_event(self):
        return self._events.pop(0)

   # the various _notify functions are internal API
   # to pass things up to the context
   def _notify_device_added(self, device):
        self._events.append(LibinputEventDevice(device))
        self._devices.append(device)

   def _notify_device_removed(self, device):
        self._events.append(LibinputEventDevice(device))
        self._devices.remove(device)

   def _notify_pointer_motion(self, x, y):
        self._events.append(LibinputEventPointer(x, y))



class _LibinputPathContext(Libinput):
   def add_device(self, device_node):
       device = LibinputDevice(device_node)
       self._notify_device_added(device)

   def remove_device(self, device_node):
       self._notify_device_removed(device)


class _LibinputUdevContext(Libinput):
   def __init__(self):
       self.udev = udev.context()

   def udev_assign_seat(self, seat_id):
       self.seat_id = seat.id

       for udev_device in self.udev.devices():
          device = LibinputDevice(udev_device.device_node)
          self._notify_device_added(device)


We have two different modes of initialisation, udev and path. The udev interface is used by Wayland compositors and adds all devices on the given udev seat. The path interface is used by the X.Org driver and adds only one specific device at a time. Both interfaces have the dispatch() and get_events() methods which is how every caller gets events out of libinput.

In both cases we create a libinput device from the data and create an event about the new device that bubbles up into the event queue.

But what really are events? Are they real or just a fidget spinner of our imagination? Well, they're just another object in libinput.

class LibinputEvent:
     @property
     def type(self):
        return self._type

     @property
     def context(self):
         return self._libinput
       
     @property
     def device(self):
        return self._device

     def get_pointer_event(self):
        if instanceof(self, LibinputEventPointer):
            return self  # This makes more sense in C where it's a typecast
        return None

     def get_keyboard_event(self):
        if instanceof(self, LibinputEventKeyboard):
            return self  # This makes more sense in C where it's a typecast
        return None


class LibinputEventPointer(LibinputEvent):
     @property
     def time(self)
        return self._time/1000

     @property
     def time_usec(self)
        return self._time

     @property
     def dx(self)
        return self._dx

     @property
     def absolute_x(self):
        return self._x * self._x_units_per_mm

     @property
     def absolute_x_transformed(self, width):
        return self._x *  width/ self._x_max_value
You get the gist. Each event is actually an event of a subtype with a few common shared fields and a bunch of type-specific ones. The events often contain some internal value that is calculated on request. For example, the API for the absolute x/y values returns mm, but we store the value in device units instead and convert to mm on request.

So, what's a device then? Well, just another I-cant-believe-this-is-not-a-class with relatively few surprises:

class LibinputDevice:
   class Capability(Enum):
       CAP_KEYBOARD = 0
       CAP_POINTER  = 1
       CAP_TOUCH    = 2
       ...

   def __init__(self, device_node):
      pass  # no-one instantiates this directly

   @property
   def name(self):
      return self._name

   @property
   def context(self):
      return self._libinput_context

   @property
   def udev_device(self):
      return self._udev_device

   @property
   def has_capability(self, cap):
      return cap in self._capabilities

   ...
Now we have most of the frontend API in place and you start to see a pattern. This is how all of libinput's API works, you get some opaque read-only objects with a few getters and accessor functions.

Now let's figure out how to work on the backend. For that, we need something that handles events:

class EvdevDevice(LibinputDevice):
    def __init__(self, device_node):
       fd = open(device_node)
       super().context.add_fd_to_epoll(fd, self.dispatch)
       self.initialize_quirks()

    def has_quirk(self, quirk):
        return quirk in self.quirks

    def dispatch(self):
       while True:
          data = fd.read(input_event_byte_count)
          if not data:
             break

          self.interface.dispatch_one_event(data)

    def _configure(self):
       # some devices are adjusted for quirks before we 
       # do anything with them
       if self.has_quirk(SOME_QUIRK_NAME):
           self.libevdev.disable(libevdev.EV_KEY.BTN_TOUCH)


       if 'ID_INPUT_TOUCHPAD' in self.udev_device.properties:
          self.interface = EvdevTouchpad()
       elif 'ID_INPUT_SWITCH' in self.udev_device.properties:
          self.interface = EvdevSwitch()
       ...
       else:
          self.interface = EvdevFalback()


class EvdevInterface:
    def dispatch_one_event(self, event):
        pass

class EvdevTouchpad(EvdevInterface):
    def dispatch_one_event(self, event):
        ...

class EvdevTablet(EvdevInterface):
    def dispatch_one_event(self, event):
        ...


class EvdevSwitch(EvdevInterface):
    def dispatch_one_event(self, event):
        ...

class EvdevFallback(EvdevInterface):
    def dispatch_one_event(self, event):
        ...
Our evdev device is actually a subclass (well, C, *handwave*) of the public device and its main function is "read things off the device node". And it passes that on to a magical interface. Other than that, it's a collection of generic functions that apply to all devices. The interfaces is where most of the real work is done.

The interface is decided on by the udev type and is where the device-specifics happen. The touchpad interface deals with touchpads, the tablet and switch interface with those devices and the fallback interface is that for mice, keyboards and touch devices (i.e. the simple devices).

Each interface has very device-specific event processing and can be compared to the Xorg synaptics vs wacom vs evdev drivers. If you are fixing a touchpad bug, chances are you only need to care about the touchpad interface.

The device quirks used above are another simple block:

class Quirks:
   def __init__(self):
       self.read_all_ini_files_from_directory('$PREFIX/share/libinput')

   def has_quirk(device, quirk):
       for file in self.quirks:
          if quirk.has_match(device.name) or
             quirk.has_match(device.usbid) or
             quirk.has_match(device.dmi):
             return True
       return False

   def get_quirk_value(device, quirk):
       if not self.has_quirk(device, quirk):
           return None

       quirk = self.lookup_quirk(device, quirk)
       if quirk.type == "boolean":
           return bool(quirk.value)
       if quirk.type == "string":
           return str(quirk.value)
       ...
A system that reads a bunch of .ini files, caches them and returns their value on demand. Those quirks are then used to adjust device behaviour at runtime.

The next building block is the "filter" code, which is the word we use for pointer acceleration. Here too we have a two-layer abstraction with an interface.

class Filter:
   def dispatch(self, x, y):
      # converts device-unit x/y into normalized units
      return self.interface.dispatch(x, y)

   # the 'accel speed' configuration value
   def set_speed(self, speed):
       return self.interface.set_speed(speed)

   # the 'accel speed' configuration value
   def get_speed(self):
       return self.speed

   ...


class FilterInterface:
   def dispatch(self, x, y):
       pass

class FilterInterfaceTouchpad:
   def dispatch(self, x, y):
       ...
       
class FilterInterfaceTrackpoint:
   def dispatch(self, x, y):
       ...

class FilterInterfaceMouse:
   def dispatch(self, x, y):
      self.history.push((x, y))
      v = self.calculate_velocity()
      f = self.calculate_factor(v)
      return (x * f, y * f)

   def calculate_velocity(self)
      for delta in self.history:
          total += delta
      velocity = total/timestamp  # as illustration only

   def calculate_factor(self, v):
      # this is where the interesting bit happens,
      # let's assume we have some magic function
      f = v * 1234/5678
      return f
So libinput calls filter_dispatch on whatever filter is configured and passes the result on to the caller. The setup of those filters is handled in the respective evdev interface, similar to this:
class EvdevFallback:
    ...
    def init_accel(self):
         if self.udev_type == 'ID_INPUT_TRACKPOINT':
             self.filter = FilterInterfaceTrackpoint()
         elif self.udev_type == 'ID_INPUT_TOUCHPAD':
             self.filter = FilterInterfaceTouchpad()
         ...
The advantage of this system is twofold. First, the main libinput code only needs one place where we really care about which acceleration method we have. And second, the acceleration code can be compiled separately for analysis and to generate pretty graphs. See the pointer acceleration docs. Oh, and it also allows us to easily have per-device pointer acceleration methods.

Finally, we have one more building block - configuration options. They're a bit different in that they're all similar-ish but only to make switching from one to the next a bit easier.

class DeviceConfigTap:
    def set_enabled(self, enabled):
       self._enabled = enabled

    def get_enabled(self):
        return self._enabled

    def get_default(self):
        return False

class DeviceConfigCalibration:
    def set_matrix(self, matrix):
       self._matrix = matrix

    def get_matrix(self):
        return self._matrix

    def get_default(self):
        return [1, 0, 0, 0, 1, 0, 0, 0, 1]
And then the devices that need one of those slot them into the right pointer in their structs:
class  EvdevFallback:
   ...
   def init_calibration(self):
      self.config_calibration = DeviceConfigCalibration()
      ...

   def handle_touch(self, x, y):
       if self.config_calibration is not None:
           matrix = self.config_calibration.get_matrix

       x, y = matrix.multiply(x, y)
       self.context._notify_pointer_abs(x, y)

And that's basically it, those are the building blocks libinput has. The rest is detail. Lots of it, but if you understand the architecture outline above, you're most of the way there in diving into the details.

libinput and location-based touch arbitration

One of the features in the soon-to-be-released libinput 1.13 is location-based touch arbitration. Touch arbitration is the process of discarding touch input on a tablet device while a pen is in proximity. Historically, this was provided by the kernel wacom driver but libinput has had userspace touch arbitration for quite a while now, allowing for touch arbitration where the tablet and the touchscreen part are handled by different kernel drivers.

Basic touch arbitratin is relatively simple: when a pen goes into proximity, all touches are ignored. When the pen goes out of proximity, new touches are handled again. There are some extra details (esp. where the kernel handles arbitration too) but let's ignore those for now.

With libinput 1.13 and in preparation for the Dell Canvas Dial Totem, the touch arbitration can now be limited to a portion of the screen only. On the totem (future patches, not yet merged) that portion is a square slightly larger than the tool itself. On normal tablets, that portion is a rectangle, sized so that it should encompass the users's hand and area around the pen, but not much more. This enables users to use both the pen and touch input at the same time, providing for bimanual interaction (where the GUI itself supports it of course). We use the tilt information of the pen (where available) to guess where the user's hand will be to adjust the rectangle position.

There are some heuristics involved and I'm not sure we got all of them right so I encourage you to give it a try and file an issue where it doesn't behave as expected.

Thursday, February 14, 2019

Adding entries to the udev hwdb

In this blog post, I'll explain how to update systemd's hwdb for a new device-specific entry. I'll focus on input devices, as usual.

What is the hwdb and why do I need to update it?

The hwdb is a binary database sitting at /etc/udev/hwdb.bin and /usr/lib/udev/hwdb.d. It is usually used to apply udev properties to specific devices, those properties are then picked up by other processes (udev builtins, libinput, ...) to apply device-specific behaviours. So you'll need to update the hwdb if you need a specific behaviour from the device.

One of the use-cases I commonly deal with is that some touchpad announces wrong axis ranges or resolutions. With the correct hwdb entry (see the example later) udev can correct these at device initialisation time and every process sees the right axis ranges.

The database is compiled from the various .hwdb files you have sitting on your system, usually in /etc/udev/hwdb.d and /usr/lib/hwdb.d. The terser format of the hwdb files makes them easier to update than, say, writing a udev rule to set those properties.

The full process goes something like this:

  • The various .hwdb files are installed or modified
  • The hwdb.bin file is generated from the .hwdb files
  • A udev rule triggers the udev hwdb builtin. If a match occurs, the builtin prints the to-be properties, and udev captures the output and applies it as udev properties to the device
  • Some other process (often a different udev builtin) reads the udev property value and does something.
On its own, the hwdb is merely a lookup tool though, it does not modify devices. Think of it as a virtual filing cabinet, something will need to look at it, otherwise it's just dead weight.

An example for such a udev rule from 60-evdev.rules contains:

IMPORT{builtin}="hwdb --subsystem=input --lookup-prefix=evdev:", \
  RUN{builtin}+="keyboard", GOTO="evdev_end"
The IMPORT statement translates as "look up the hwdb, import the properties". The RUN statement runs the "keyboard" builtin which may change the device based on the various udev properties now set. The GOTO statement goes to skip the rest of the file.

So again, on its own the hwdb doesn't do anything, it merely prints to-be udev properties to stdout, udev captures those and applies them to the device. And then those properties need to be processed by some other process to actually apply changes.

hwdb file format

The basic format of each hwdb file contains two types of entries, match lines and property assignments (indented by one space). The match line defines which device it is applied to.

For example, take this entry from 60-evdev.hwdb:

# Lenovo X230 series
evdev:name:SynPS/2 Synaptics TouchPad:dmi:*svnLENOVO*:pn*ThinkPad*X230*
 EVDEV_ABS_01=::100
 EVDEV_ABS_36=::100
The match line is the one starting with "evdev", the other two lines are property assignments. Property values are strings, any interpretation to numeric values or others is to be done in the process that requires those properties. Noteworthy here: the hwdb can overwrite previously set properties, but it cannot unset them.

The match line is not specified by the hwdb beyond "it's a glob". The format to use is defined by the udev rule that invokes the hwdb builtin. Usually the format is:

someprefix:search criteria:
For example, the udev rule that applies for the match above is this one in 60-evdev.rules:
KERNELS=="input*", \
  IMPORT{builtin}="hwdb 'evdev:name:$attr{name}:$attr{[dmi/id]modalias}'", \
  RUN{builtin}+="keyboard", GOTO="evdev_end"
What does this rule do? $attr entries get filled in by udev with the sysfs attributes. So on your local device, the actual lookup key will end up looking roughly like this:
evdev:name:Some Device Name:dmi:bvnWhatever:bvR112355:bd02/01/2018:...
If that string matches the glob from the hwdb, you have a match.

Attentive readers will have noticed that the two entries from 60-evdev.rules I posted here differ. You can have multiple match formats in the same hwdb file. The hwdb doesn't care, it's just a matching system.

We keep the hwdb files matching the udev rules names for ease of maintenance so 60-evdev.rules keeps the hwdb files in 60-evdev.hwdb and so on. But this is just for us puny humans, the hwdb will parse all files it finds into one database. If you have a hwdb entry in my-local-overrides.hwdb it will be matched. The file-specific prefixes are just there to not accidentally match against an unrelated entry.

Applying hwdb updates

The hwdb is a compiled format, so the first thing to do after any changes is to run

$ systemd-hwdb update
This command compiles the files down to the binary hwdb that is actually used by udev. Without that update, none of your changes will take effect.

The second thing is: you need to trigger the udev rules for the device you want to modify. Either you do this by physically unplugging and re-plugging the device or by running

$ udevadm trigger
or, better, trigger only the device you care about to avoid accidental side-effects:
$ udevadm trigger /sys/class/input/eventXYZ
In case you also modified the udev rules you should re-load those too. So the full quartet of commands after a hwdb update is:
$ systemd-hwdb update
$ udevadm control --reload-rules
$ udevadm trigger
$ udevadm info /sys/class/input/eventXYZ
That udevadm info command lists all assigned properties, these should now include the modified entries.

Adding new entries

Now let's get down to what you actually want to do, adding a new entry to the hwdb. And this is where it also get's tricky to have a generic guide because every hwdb file has its own custom match rules.

The best approach is to open the .hwdb files and the matching .rules file and figure out what the match formats are and which one is best. For USB devices there's usually a match format that uses the vendor and product ID. For built-in devices like touchpads and keyboards there's usually a dmi-based match format (see /sys/class/dmi/id/modalias). In most cases, you can just take an existing entry and copy and modify it.

My recommendation is: add an extra property that makes it easy to verify the new entry is applied. For example do this:

# Lenovo X230 series
evdev:name:SynPS/2 Synaptics TouchPad:dmi:*svnLENOVO*:pn*ThinkPad*X230*
 EVDEV_ABS_01=::100
 EVDEV_ABS_36=::100
 FOO=1
Now run the update commands from above. If FOO=1 doesn't show up, then you know it's the hwdb entry that's not yet correct. If FOO=1 does show up in the udevadm info output, then you know the hwdb matches correctly and any issues will be in the next layer.

Increase the value with every change so you can tell whether the most recent change is applied. And before your submit a pull request, remove the FOO entry.

Oh, and once it applies correctly, I recommend restarting the system to make sure everything is in order on a freshly booted system.

Troubleshooting

The reason for adding hwdb entries is always because we want the system to handle a device in a custom way. But it's hard to figure out what's wrong when something doesn't work (though 90% of the time it's a typo in the hwdb match).

In almost all cases, the debugging sequence is the following:

  • does the FOO property show up?
  • did you run systemd-hwdb update?
  • did you run udevadm trigger?
  • did you restart the process that requires the new udev property?
  • is that process new enough to have support for that property?
If the answer to all these is "yes" and it still doesn't work, you may have found a bug. But 99% of the time, at least one of those is a sound "no. oops.".

Your hwdb match may run into issues with some 'special' characters. If your device has e.g. an ® in its device name (some Microsoft devices have this), a bug in systemd caused the match to fail. That bug is fixed now but until it's available in your distribution, replace with an asterisk ('*') in your match line.

Greybeards who have been around since before 2014 (systemd v219) may remember a different tool to update the hwdb: udevadm hwdb --update. This tool still exists, but it does not have the exact same behaviour as systemd-hwdb update. I won't go into details but the hwdb generated by the udevadm tool can provide unexpected matches if you have multiple matches with globs for the same device. A more generic glob can take precedence over a specific glob and so on. It's a rare and niche case and fixed since systemd v233 but the udevadm behaviour remained the same for backwards-compatibility.

Happy updating and don't forget to add Database Administrator to your CV when your PR gets merged.

Wednesday, December 12, 2018

High resolution wheel scrolling on Linux v4.21

Disclaimer: this is pending for v4.21 and thus not yet in any kernel release.

Most wheel mice have a physical feature to stop the wheel from spinning freely. That feature is called detents, notches, wheel clicks, stops, or something like that. On your average mouse that is 24 wheel clicks per full rotation, resulting in the wheel rotating by 15 degrees before its motion is arrested. On some other mice that angle is 18 degrees, so you get 20 clicks per full rotation.

Of course, the world wouldn't be complete without fancy hardware features. Over the last 10 or so years devices have added free-wheeling scroll wheels or scroll wheels without distinct stops. In many cases wheel behaviour can be configured on the device, e.g. with Logitech's HID++ protocol. A few weeks back, Harry Cutts from the chromium team sent patches to enable Logitech high-resolution wheel scrolling in the kernel. Succinctly, these patches added another axis next to the existing REL_WHEEL named REL_WHEEL_HI_RES. Where available, the latter axis would provide finer-grained scroll information than the click-by-click REL_WHEEL. At the same time I accidentally stumbled across the documentation for the HID Resolution Multiplier Feature. A few patch revisions later and we now have everything queued up for v4.21. Below is a summary of the new behaviour.

The kernel will continue to provide REL_WHEEL as axis for "wheel clicks", just as before. This axis provides the logical wheel clicks, (almost) nothing changes here. In addition, a REL_WHEEL_HI_RES axis is available which allows for finer-grained resolution. On this axis, the magic value 120 represents one logical traditional wheel click but a device may send a fraction of 120 for a smaller motion. Userspace can either accumulate the values until it hits a full 120 for one wheel click or it can scroll by a few pixels on each event for a smoother experience. The same principle is applied to REL_HWHEEL and REL_HWHEEL_HI_RES for horizontal scroll wheels (which these days is just tilting the wheel). The REL_WHEEL axis is now emulated by the kernel and simply sent out whenever we have accumulated 120.

Important to note: REL_WHEEL and REL_HWHEEL are now legacy axes and should be ignored by code handling the respective high-resolution version.

The magic value of 120 is taken directly from Windows. That value was chosen because it has a good number of integer factors, so dividing 120 by whatever multiplier the mouse uses gives you a integer fraction of 120. And because HW manufacturers want it to work on Windows, we can rely on them doing it right, provided we use the same approach.

There are two implementations that matter. Harry's patches enable the high-resolution scrolling on Logitech mice which seem to mostly have a multiplier of 8 (i.e. REL_WHEEL_HI_RES will send eight events with a value of 15 before REL_WHEEL sends 1 click). There are some interesting side-effects with e.g. the MX Anywhere 2S. In high-resolution mode with a multiplier of 8, a single wheel movement does not always give us 8 events, the firmware does its own magic here. So we have some emulation code in place with the goal of making the REL_WHEEL event happen on the mid-point of a wheel click motion. The exact point can shift a bit when the device sends 7 events instead of 8 so we have a few extra bits in place to reset after timeouts and direction changes to make sure the wheel behaviour is as consistent as possible.

The second implementation is for the generic HID protocol. This was all added for Windows Vista, so we're only about a decade behind here. Microsoft got the Resolution Multiplier feature into the official HID documentation (possibly in the hope that other HW manufacturers implement it which afaict didn't happen). This feature effectively provides a fixed value multiplier that the device applies in hardware when enabled. It's basically the same as the Logitech one except it's set through a HID feature instead of a vendor-specific protocol. On the devices tested so far (all Microsoft mice because no-one else seems to implement this) the multipliers vary a bit, ranging from 4 to 12. And the exact behaviour varies too. One mouse behaves correctly (Microsoft Comfort Optical Mouse 3000) and sends more events than before. Other mice just send the multiplied value instead of the normal value, so nothing really changes. And at least one mouse (Microsoft Sculpt Ergonomic) sends the tilt-wheel values more frequently and with a higher value. So instead of one event with value 1 every X ms, we now get an event with value 3 every X/4 ms. The mice tested do not drop events like the Logitech mice do, so we don't need fancy emulation code here. Either way, we map this into the 120 range correctly now, so userspace gets to benefit.

As mentioned above, the Resolution Multiplier HID feature was introduced for Windows Vista which is... not the most recent release. I have a strong suspicion that Microsoft dumped this feature as well, the most recent set of mice I have access to don't provide the feature anymore (they have vendor-private protocols that we don't know about instead). So the takeaway for all this is: if you have a Logitech mouse, you'll get higher-resolution scrolling on v4.21. If you have a Microsoft mouse a few years old, you may get high-resolution wheel scrolling if the device supports it. Any other vendor or a new Microsoft mouse, you don't get it.

Coincidentally, if you know anyone at Microsoft who can provide me with the specs for their custom protocol, I'd appreciate it. We'd love to have support for it both in libratbag and in the kernel. Or any other vendor, come to think of it.

Tuesday, December 11, 2018

Understanding HID report descriptors

This time we're digging into HID - Human Interface Devices and more specifically the protocol your mouse, touchpad, joystick, keyboard, etc. use to talk to your computer.

Remember the good old days where you had to install a custom driver for every input device? Remember when PS/2 (the protocol) had to be extended to accommodate for mouse wheels, and then again for five button mice. And you had to select the right protocol to make it work. Yeah, me neither, I tend to suppress those memories because the world is awful enough as it is.

As users we generally like devices to work out of the box. Hardware manufacturers generally like to add bits and bobs because otherwise who would buy that new device when last year's device looks identical. This difference in needs can only be solved by one superhero: Committee-man, with the superpower to survive endless meetings and get RFCs approved.

Many many moons ago, when USB itself was in its infancy, Committee man and his sidekick Caffeine boy got the USB consortium agree on a standard for input devices that is so self-descriptive that operating systems (Win95!) can write one driver that can handle this year's device, and next year's, and so on. No need to install extra drivers, your device will just work out of the box. And so HID was born. This may only be an approximate summary of history.

Originally HID was designed to work over USB. But just like Shrek the technology world is obsessed with layers so these days HID works over different transport layers. HID over USB is what your mouse uses, HID over i2c may be what your touchpad uses. HID works over Bluetooth and it's celebrity-diet version BLE. Somewhere, someone out there is very slowly moving a mouse pointer by sending HID over carrier pigeons just to prove a point. Because there's always that one guy.

HID is incredibly simple in that the static description of the device can just be bytes burnt into the ROM like the Australian sun into unprepared English backpackers. And the event frames are often an identical series of bytes where every bit is filled in by the firmware according to the axis/buttons/etc.

HID is incredibly complicated because parsing it is a stack-based mental overload. Each individual protocol item is simple but getting it right and all into your head is tricky. Luckily, I'm here for you to make this simpler to understand or, failing that, at least more entertaining.

As said above, the purpose of HID is to make devices describe themselves in a generic manner so that you can have a single driver handle any input device. The idea is that the host parses that standard protocol and knows exactly how the device will behave. This has worked out great, we only have around 200 files dealing with vendor- and hardware-specific HID quirks as of v4.20.

HID messages are Reports. And to know what a Report means and how to interpret it, you need a Report Descriptor. That Report Descriptor is static and contains a series of bytes detailing "what" and "where", i.e. what a sequence of bits represents and where to find those bits in the Report. So let's try and parse one of Report Descriptors, let's say for a fictional mouse with a few buttons. How exciting, we're at the forefront of innovation here.

The Report Descriptor consists of a bunch of Items. A parser reads the next Item, processes the information within and moves on. Items are small (1 byte header, 0-4 bytes payload) and generally only apply exactly one tiny little bit of information. You need to accumulate several items to build up enough information to actually know what's happening.

The "what" question of the Report Descriptor is answered with the so-called Usage. This could be something simple like X or Y (0x30 and 0x31) or something more esoteric like System Menu Exit (0x88). A Usage is 16 bits but all Usages are grouped into so-called Usage Pages. A Usage Page too is a 16 bit value and together they form the 32-bit value that tells us what the device can do. Examples:

     0001 0031 # Generic Desktop, Y
     0001 0088 # Generic Desktop, System Menu Exit
     0003 0005 # VR Controls, Head Tracker
     0003 0006 # VR Controls, Head Mounted Display
     0004 0031 # Keyboard, Keyboard \ and | 
Note how the Usage in the last item is the same as the first one, without the Usage Page you will mix things up. It helps if you always think of as the Usage as a 32-bit number. For your kids' bed-time story time, here are the HID Usage Tables from 2004 and the approved HID Usage Table Review Requests of the last decade. Because nothing puts them to sleep quicker than droning on about hex numbers associated with remote control buttons.

To successfully interpret a Report from the device, you need to know which bits have which Usage associated with them. So let's go back to our innovative mouse. We would want a report descriptor with 6 items like this:

Usage Page (Generic Desktop)
Usage (X)
Report Size (16)
Usage Page (Generic Desktop)
Usage (Y)
Report Size (16)
This basically tells the host: X and Y both have 16 bits. So if we get a 4-byte Report from the device, we know two bytes are for X, two for Y.

HID was invented when a time when bits were more expensive than printer ink, so we can't afford to waste any bits (still the case because who would want to spend an extra penny on more ROM). HID makes use of so-called Global items, once those are set their value applies to all following items until changed. Usage Page and Report Size are such Global items, so the above report descriptor is really implemented like this:

Usage Page (Generic Desktop)
Usage (X)
Usage (Y)
Report Count (2)
Report Size (16)
Input (Data,Var,Rel)
The Report Count just tells us that 2 fields of the current Report Size are coming up. We have two usages, two fields, and 16 bits each so we know what to do. The Input item is sort-of the marker for the end of the stack, it basically tells us "process what you've seen so far", together with a few flags. Rel in this case means that the Usages are relative. Oh, and Input means that this is data from device to host. Output would be data from host to device, e.g. to set LEDs on a keyboard. There's also Feature which indicates configurable items.

Buttons on a device are generally just numbered so it'd be monumental 16-bits-at-a-time waste to have HID send Usage (Button1), Usage (Button2), etc. for every button on the device. HID instead provides a Usage Minimum and Usage Maximum to sequentially order them. This looks like this:

Usage Page (Button)
Usage Minimum (1)
Usage Maximum (5)
Report Count (5)
Report Size (1)
Input (Data,Var,Abs)
So we have 5 buttons here and each button has one bit. Note how the buttons are Abs because a button state is not a relative value, it's either down or up. HID is quite intolerant to Schrödinger's thought experiments.

Let's put the two things together and we have an almost-correct Report descriptor:

Usage Page (Button)
Usage Minimum (1)
Usage Maximum (5)
Report Count (5)
Report Size (1)
Input (Data,Var,Abs)

Report Size (3)
Report Count (1)
Input (Cnst,Arr,Abs)

Usage Page (Generic Desktop)
Usage (X)
Usage (Y)
Report Count (2)
Report Size (16)
Input (Data,Var,Rel)
New here is Cnst. This signals that the bits have a constant value, thus don't need a Usage and basically don't matter (haha. yeah, right. in theory). Linux does indeed ignore those. Cnst is used for padding to align on byte boundaries - 5 bits for buttons plus 3 bits padding make 8 bits. Which makes one byte as everyone agrees except for granddad over there in the corner. I don't know how he got in.

Were we to get a 5-byte Report from the device, we'd parse it approximately like this:

  button_state = byte[0] & 0x1f
  x = bytes[1] | (byte[2] << 8)
  y = bytes[3] | (byte[4] << 8)
Hooray, we're almost ready. Except not. We may need more info to correctly interpret the data within those reports.

The Logical Minimum and Logical Maximum specify the value range of the actual data. We need this to tell us whether the data is signed and what the allowable range is. Together with the Physical Minimum and the Physical Maximum they specify what the values really mean. In the simple case:

Usage Page (Generic Desktop)
Usage (X)
Usage (Y)
Report Count (2)
Report Size (16)
Logical Minimum (-32767)
Logical Maximum (32767)
Input (Data,Var,Rel)
This just means our x/y data is signed. Easy. But consider this combination:
...
Logical Minimum (0)
Logical Maximum (1)
Physical Minimum (1)
Physical Maximum (12)
This means that if the bit is 0, the effective value is 1. If the bit is 1, the effective value is 12.

Note that the above is one report only. Devices may have multiple Reports, indicated by the Report ID. So our Report Descriptor may look like this:

Report ID (01)
Usage Page (Button)
Usage Minimum (1)
Usage Maximum (5)
Report Count (5)
Report Size (1)
Input (Data,Var,Abs)
Report Size (3)
Report Count (1)
Input (Cnst,Arr,Abs)

Report ID (02)
Usage Page (Generic Desktop)
Usage (X)
Usage (Y)
Report Count (2)
Report Size (16)
Input (Data,Var,Rel)
If we were to get a Report now, we need to check byte 0 for the Report ID so we know what this is. i.e. our single-use hard-coded parser would look like this:
  if byte[0] == 0x01:
    button_state = byte[1] & 0x1f
  else if byte[0] == 0x02:
    x = bytes[2] | (byte[3] << 8)
    y = bytes[4] | (byte[5] << 8)
A device may use multiple Reports if the hardware doesn't gather all data within the same hardware bits. Now, you may ask: if I get fifteen reports, how should I know what belongs together? Good question, and lucky for you the HID designers are miles ahead of you. Report IDs are grouped into Collections.

Collections can have multiple types. An Application Collection describes a set of inputs that make sense as a whole. Usually, every Report Descriptor must define at least one Application Collection but you may have two or more. For example, a a keyboard with integrated trackpoint should and/or would use two. This is how the kernel knows it needs to create two separate event nodes for the device. Application Collections have a few reserved Usages that indicate to the host what type of device this is. These are e.g. Mouse, Joystick, Consumer Control. If you ever wondered why you have a device named like "Logitech G500s Laser Gaming Mouse Consumer Control" this is the kernel simply appending the Application Collection's Usage to the device name.

A Physical Collection indicates that the data is collected at one physical point though what a point is is a bit blurry. Theoretical physicists will disagree but a point can be "a mouse". So it's quite common for all reports on a mouse to be wrapped in one Physical Collections. If you have a device with two sets of sensors, you'd have two collections to illustrate which ones go together. Physical Collections also have reserved Usages like Pointer or Head Tracker.

Finally, a Logical Collection just indicates that some bits of data belong together, whatever that means. The HID spec uses the example of buffer length field and buffer data but it's also common for all inputs from a mouse to be grouped together. A quick check of my mice here shows that Logitech doesn't wrap the data into a Logical Collection but Microsoft's firmware does. Because where would we be if we all did the same thing...

Anyway. Now that we know about collections, let's look at a whole report descriptor as seen in the wild:

Usage Page (Generic Desktop)
Usage (Mouse)
Collection (Application)
 Usage Page (Generic Desktop)
 Usage (Mouse)
 Collection (Logical)
  Report ID (26)
  Usage (Pointer)
  Collection (Physical)
   Usage Page (Button)
   Usage Minimum (1)
   Usage Maximum (5)
   Report Count (5)
   Report Size (1)
   Logical Minimum (0)
   Logical Maximum (1)
   Input (Data,Var,Abs)
   Report Size (3)
   Report Count (1)
   Input (Cnst,Arr,Abs)
   Usage Page (Generic Desktop)
   Usage (X)
   Usage (Y)
   Report Count (2)
   Report Size (16)
   Logical Minimum (-32767)
   Logical Maximum (32767)
   Input (Data,Var,Rel)
   Usage (Wheel)
   Physical Minimum (0)
   Physical Maximum (0)
   Report Count (1)
   Report Size (16)
   Logical Minimum (-32767)
   Logical Maximum (32767)
   Input (Data,Var,Rel)
  End Collection
 End Collection
End Collection
We have one Application Collection (Generic Desktop, Mouse) that contains one Logical Collection (Generic Desktop, Mouse). That contains one Physical Collection (Generic Desktop, Pointer). Our actual Report (and we have only one but it has the decimal ID 26) has 5 buttons, two 16-bit axes (x and y) and finally another 16 bit axis for the Wheel. This device will thus send 8-byte reports and our parser will do:
   if byte[0] != 0x1a: # it's decimal in the above descriptor
        error, should be 26
   button_state = byte[1] & 0x1f
   x = byte[2] | (byte[3] << 8)
   y = byte[4] | (byte[5] << 8)
   wheel = byte[6] | (byte[7] << 8)
That's it. Now, obviously, you can't write a parser for every HID descriptor out there so your actual parsing code needs to be generic. The Linux kernel does exactly that and so does everything else that needs to parse HID. There's a huge variety in devices out there, all with HID descriptors that may or may not be correct. As with so much in life, correct HID implementations are often defined by "whatever Windows accepts" so if you like playing catch, Linux development is for you.

Oh, in case you just got a bit too optimistic about the state of the world: HID allows for vendor-defined usages. Which does exactly what you'd think it does, it hides vendor-specific protocol inside what should be a generic protocol. There are devices with hidden report IDs that you can only unlock by sending the right magic sequence to the report and/or by defeating the boss on Level 4. Usually those devices present themselves as basic/normal devices over HID but if you know the magic sequence you get to use *gasp* all buttons. Or access the device-specific configuration features. Logitech's HID++ is just one example here but at least that's one where we have most of the specs available.

The above describes how to parse the HID report descriptor and interpret the reports. But what happens once you have a HID report correctly parsed? In the case of the Linux kernel, once the report descriptor is parsed evdev nodes are created (one per Application Collection, more or less). As the Reports come in, they are mapped to evdev codes and the data appears on the evdev node. That's where userspace like libinput can pick it up. That bit is actually quite simple (mostly anyway).

The above output was generated with the tools from the hid-tools repository. Go forth and hid-record.

Monday, December 3, 2018

ggkbdd is a generic gaming keyboard daemon

Last week while reviewing a patch I read that some gaming keyboards have two modes - keyboard mode and gaming mode. When in gaming mode, the keys send out pre-recorded macros when pressed. Presumably (I am not a gamer) this is to record keyboard shortcuts to have quicker access to various functionalities. The macros are stored in the hardware and are thus relatively independent of the host system. Pprovided you have access to the custom protocol, which you probably don't when you're on Linux. But I digress.

I reckoned this could be done in software and work with any 5 dollar USB keyboard. A few hours later, I have this working now: ggkbdd. It sits directly above the kernel and waits for key events. Once the 'mode key' is hit, the keyboard will send pre-configured key sequences for the respective keys. Hitting the mode key again (or ESC) switches back to normal mode.

There's a lot of functionality that is missing such as integration with the desktop (probably via DBus), better security (dropping privs, masking the fd to avoid accidental key logging), better system integration (request fds from logind, possibly through the compositor). And error handling, etc. I think the total time on this spent is somewhere between 3 and 4h, and that includes the time to write this blog post and debug the systemd unit autostartup. There are likely other projects that solve it the same way, or at least in a similar manner. I didn't check.

This was done as proof-of-concept and

  • I don't know if it's useful and if so, what the use-cases are
  • I don't know if I will have any time to fix things on this
  • I don't know if other (better developed) projects already occupy that space
In the grand glorious future and provided this is indeed something generally useful, this would need compositor integration. Not sure we'll ever get to that point. Meanwhile, consider this a code drop for a proof-of-concept and expect that you'll have to fix any bugs yourself.