This is a higher-level explanation of the inputfd protocol RFC I sent to the list on March 31. Note that this is a first draft, this protocol may never see the light of the day or may be significantly altered before it lands.
First, what is it? inputfd is a protocol for a Wayland compositor to pass a file descriptor (fd) for an input device directly to the client. The client can then read events off this fd and process them, without any additional buffering in between. In the ideal case, the compositor doesn't care (or even notice) that events are flowing between the kernel and the client. Because the compositor sets up the fd, the client does not need any special privileges.
Why is this needed? There are a few input devices that will not be handled by libinput. libinput is the stack for those devices that have a direct interaction with the desktop - mice, keyboards, touchpads, ... But plenty of devices do not want or require desktop interactions. Joysticks/gamepads are the prime example here, but 3D mice or VR input devices also come to mind. These devices don't control the cursor on the desktop, so the compositor doesn't care about the device's events. Note that the current draft only caters for joysticks, 3D mice etc. are TBD.
Why not handle these devices in libinput? Joysticks in particular are so varied that a library like libinput would have to increase the API surface area massively to cater for every possibility. And in the end it is likely that libinput would merely buffer events and pass them on almost as-is anyway. From a maintenance POV - I don't want to deal with that. And from a technical POV - there's little point to have libinput in between anyway. Furthermore, it's already the case that gaming devices are opened directly by the application rather than going through X. So just connecting the clients with the device directly has a advantages.
What does the compositor do? The compositor has two jobs: device filtering and focus management. In the current draft, a client can say "give me all gaming devices" but it's the compositor that decides which device is classified as such (see below for details). Depending on seat assignment or other policies, a client may only see a fraction of the devices currently connected.
Focus management is relatively trivial: the compositor decides that a client has focus now (e.g. by clicking into the window) and hands that client an fd. On focus out (e.g. alt-tab) the fd is revoked and the client cannot read any more events. Rinse, wash, repeat as the focus changes. Note that the protocol does not define how the focus is decided, that task is up to the compositor. Having focus management in the compositor gives us a simple benefit: the client can't hog the device or read events when it's out of focus. This makes it easy to have multiple clients running that need access to the same device without messing up state in a backgrounded client. The security benefit is minimal, few people enter their password with a joystick. But it's still a warm fuzzy feeling to know that a client cannot read events when it's not supposed to.
What devices are applicable devices? This is one of the TBD of the protocol. The compositor needs to know whether a device is a gaming device or not, but the default udev tag ID_INPUT_JOYSTICK is too crude and provides false positives (e.g. some Wacom tablets have that tag set). The conversion currently goes towards having a database that can define this better, but this point is far from decided on.
There are a couple of other points that are still being discussed. If you have knowledge in game (framework) development, do join the discussion, we need input. Personally I am far from a game developer, so I cannot fathom all the weird corner cases we may have to deal with here. Flying blind while designing a protocol is fun, but not so much when you then have to maintain it for the next 10 years. So some navigational input would be helpful.
And just to make sure you join the right discussion, I've disabled comments here :)