Tuesday, August 25, 2020

libei - a library to support emulated input

Let's talk about eggs. X has always supported XSendEvent() which allows anyone to send any event to any client [1]. However, this event had a magic bit to make it detectable, so clients detect and subsequently ignore it. Spoofing input that just gets ignored is of course not productive, so in the year 13 BG [2] the XTest extension was conceived. XTest has a few requests that allow you to trigger a keyboard event (press and release, imagine the possibilities), buttons and pointer motion. The name may seem odd until someone explains to you that it was primarily written to support automated testing of X servers. But no-one has the time to explain that.

Having a separate extension worked around the issue of detectability and thus any client could spoof input events. Security concerns were addressed with "well, just ifdef out that extension then" which worked great until other applications started using it for input emulation. Since around ~2008 XTest events are emulated through special XTest devices in the server but that is solely to make the implementation less insane. Technically this means that XTest events are detectable again, except that no-one bothers to actually do that. Having said that, these devices only make it possible to detect an XTest event, but not which client sent that event. And, due to how the device hierarchy works, it's really hard to filter out those events anyway.

Now it's 2020 and we still have an X server that basically allows access to anything and any client to spoof input. This level of security is industry standard for IoT devices but we are trying to be more restrictive than that on your desktop, lest the stuff you type actually matters. The accepted replacement for X is of course Wayland, but Wayland-the-protocol does not provide an extension to emulate input. This makes sense, emulating input is not exactly a display server job [3] but it does leaves us with a bunch of things no longer working.


So let's talk about libei. This new library for Emulated Input consists of two components: libei, the client library to be used in applications, and libeis, the part to be used by an Emulated Input Server, to be used in the compositors. libei provides the API to connect to an EIS implementation and send events. libeis provides the API to receive those events and pass them on to the compositor. Let's see how this looks like in the stack:

    +--------------------+             +------------------+
    | Wayland compositor |---wayland---| Wayland client B |
    +--------------------+\            +------------------+
    | libinput | libeis  | \_wayland______
    +----------+---------+                \
        |          |           +-------+------------------+
 /dev/input/       +---brei----| libei | Wayland client A |
"brei" is the communication "Bridge for EI" and is a private protocol between libei and libeis.

Emulated input is not particularly difficult, mostly because when you're emulating input, you know exactly what you are trying to do. There is no interpretation of bad hardware data like libinput has to do, it's all straightforward. Hence the communication between libei and libeis is relatively trivial, it's all button, keyboard, pointer and touch events. Nothing fancy here and I won't go into details because that part will work as you expect it to. The selling point of libei is elsewhere, primarily in separation, distinction and control.

libei is a separate library to libinput or libwayland or Xlib or whatever else may have. By definition, it is thus a separate input stream in both the compositor and the client. This means the compositor can do things like display a warning message while emulated input is active. Or it can re-route the input automatically, e.g. assign a separate wl_seat to the emulated input devices so they can be independently focused, etc. Having said that, the libeis API is very similar to libinput's API so integration into compositors is quite trivial because most of the hooks to process incoming events are already in place.

libei distinguishes between different clients, so the compositor is always aware of who is currently emulating input. You have synergy, xdotool and a test suite running at the same time? The compositor is aware of which client is sending which events and can thus handle those accordingly.

Finally, libei provides control over the emulated input. This works on multiple levels. A libei client has to request device capabilities (keyboard, touch, pointer) and the compositor can restrict to a subset of those (e.g. "no keyboard for you"). Second, the compositor can suspend/resume a device at any time. And finally, since the input events go through the compositor anyway, it can discard events it doesn't like. For example, even where the compositor allowed keyboards and the device is not suspended, it may still filter out Escape key presses. Or rewrite those to be Caps Lock because we all like a bit of fun, don't we?

The above isn't technically difficult either, libei itself is not overly complex. The interaction between an EI client and an EIS server is usually the following:

  • client connects to server and says hello
  • server disconnects the client, or accepts it
  • client requests one or more devices with a set of capabilities
  • server rejects those devices or allows those devices and/or limits their capabilities
  • server resumes the device (because they are suspended by default)
  • client sends events
  • client disconnects, server disconnects the client, server suspends the device, etc.
Again, nothing earth-shattering here. There is one draw-back though: the server must approve of a client and its devices, so a client is not able to connect, send events and exit. It must wait until the server has approved the devices before it can send events. This means tools like xdotool need to be handled in a special way, more on that below.

Flatpaks and portals

With libei we still have the usual difficulties: a client may claim it's synergy when really it's bad-hacker-tool. There's not that much we can do outside a sandbox, but once we are guarded by a portal, things look different:

    | Wayland compositor |_
    +--------------------+  \
    | libinput | libeis  |   \_wayland______
    +----------+---------+                  \
        |     [eis-0.socket]                 \
 /dev/input/     /   \\       +-------+------------------+
                |      ======>| libei | Wayland client A |
                |      after    +-------+------------------+
         initial|     handover   /
      connection|               / initial request
                |              /  dbus[org.freedesktop.portal.EmulatedInput]
        | xdg-desktop-portal |
The above shows the interaction when a client is run inside a sandbox with access to the org.freedesktop.portal.Desktop bus. The libei client connects to the portal (it won't see the EIS server otherwise), the portal hands it back a file descriptor to communicate on and from then on it's like a normal EI session. What the portal implementation can do though is write several Restrictions for EIS on the server side of the file descriptor using libreis. Usually, the portal would write the app-id of the client (thus guaranteeing a reliable name) and maybe things like "don't allow keyboard capabilities for any device". Once the fd is handed to the libei client, the restrictions cannot be loosened anymore so a client cannot overwrite its own name.

So the full interaction here becomes:

  • Client connects to org.freedesktop.portal.EmulatedInput
  • Portal implementation verifies that the client is allowed to emulate input
  • Portal implementation obtains a socket to the EIS server
  • Portal implementation sends the app id and any other restrictions to the EIS server
  • Portal implementation returns the socket to the client
  • Client creates devices, sends events, etc.
For a client to connect to the portal instead of the EIS server directly is currently a one line change.

Note that the portal implementation is still in its very early stages and there are changes likely to happen. The general principle is unlikely to change though.


Turns out we still have a lot of X clients around so somehow we want to be able to use those. Under Wayland, those clients connect to Xwayland which translates X requests to Wayland requests. X clients will use XTest to emulate input which currently goes to where the dodos went. But we can add libei support to Xwayland and connect XTest, at which point the diagram looks like this:

    +--------------------+             +------------------+
    | Wayland compositor |---wayland---| Wayland client B |
    +--------------------+\            +------------------+
    | libinput | libeis  | \_wayland______
    +----------+---------+                \
        |          |           +-------+------------------+
 /dev/input/       +---brei----| libei |     XWayland     |
                                                | XTEST
                                         |  X client |
XWayland is basically just another Wayland client but it knows about XTest requests, and it knows about the X clients that request those. So it can create a separate libei context for each X client, with pointer and keyboard devices that match the XTest abilities. The end result of that is you can have a xdotool libei context, a synergy libei context, etc. And of course, where XWayland runs in a sandbox it could use the libei portal backend. At which point we have a sandboxed X client using a portal to emulate input in the Wayland compositor. Which is pretty close to what we want.

Where to go from here?

So, at this point the libei repo is still sitting in my personal gitlab namespace. Olivier Fourdan has done most of the Xwayland integration work and there's a basic Weston implementation. The portal work (tracked here) is the one needing the most attention right now, followed by the implementations in the real compositors. I think I have tentative agreement from the GNOME and KDE developers that this all this is a good idea. Given that the goal of libei is to provide a single approach to emulate input that works on all(-ish) compositors [4], that's a good start.

Meanwhile, if you want to try it, grab libei from git, build it, and run the eis-demo-server and ei-demo-client tools. And for portal support, run the eis-fake-portal tool, just so you don't need to mess with the real one. At the moment, those demo tools will have a client connecting a keyboard and pointer and sending motion, button and 'qwerty' keyboard events every few seconds. The latter with client and/or server-set keymaps because that's possible too.


What does all this have to do with eggs? "Ei", "Eis", "Brei", and "Reis" are, respectively, the German words for "egg", "ice" or "ice cream", "mush" (think: porridge) and "rice". There you go, now you can go on holidays to a German speaking country and sustain yourself on a nutritionally imbalanced diet.

[1] The whole "any to any" is a big thing in X and just shows that in the olden days you could apparently trust, well, apparently anyone
[2] "before git", i.e. too bothersome to track down so let's go with the Copyright notice in the protocol specification
[3] XTest the protocol extension exists presumably because we had a hammer and a protocol extension thus looked like a nail
[4] Let's not assume that every compositor having their own different approach to emulation is a good idea

No comments: