Friday, January 20, 2012

XKB breaking grabs - CVE-2012-0064

Given that there is a copious amount of misinformation being spread, here is a summary of CVE-2012-0064, straight from the horse's mouth.

Outline of the issue

The bug allows users to work around screen locking (e.g. gnome-screensaver) by hitting Control+Alt+keypad multiply or Control+Alt+keypad divide. This terminates the input grab the screensaver has and thus allows a user to interact with the desktop, skipping the password entry.

Affected versions

Anyone running X server 1.11 or later (or release candidates thereof) is affected. So if "Xorg -version" shows an older version on your box, stop worrying. I doubt any distribution would have back-ported the patches.

In Fedora/Red Hat land, the only distributions affected are Fedora 16 and current Fedora Rawhide. Both have been fixed; the F16 update is available here. Note that the update is to xkeyboard-config, not to the server itself.

Fedora 15 is not affected. RHEL 4, 5, 6 and thus CentOS 4, 5, 6 and other derivatives are not affected. I believe that most other distributions have now pushed out updates as well. If you want to link to the respective updates, please do so in the comments.

Sergey has also pushed out xkeyboard-config 2.5 today with the fix included.

History of the feature

The X protocol does not allow the server to break grabs. Once a client has a grab, the server must wait for that client to release the grab, terminate, or the grab window to become unviewable. This is an issue when debugging applications - if your client has a keyboard grab, you cannot use the debugger since all key events will go to the client being debugged. To avoid this issue, the X server has had two combinations to break grabs: Control+Alt+Keypad multiply and Control+Alt+Keypad divide. They forced grab termination inside the X server and although against the protocol it made debugging possible. The option required explicit enabling in the xorg.conf.

These options were removed in server 1.4 and have been unavailable since, which made debugging hard. So last year we merged a patch to bring them back, together with some other features. They are triggered by XKB actions (as they used to be). The plan was to remove the XKB actions from the default keymap so that the action is available on request but not enabled by default. This is where a miscommunication happened: the removal from the default keymap never happened. So server 1.11 and vanilla xkeyboard-config ship with both the actions available and present in the current keymap. As a result, any user can break a grab from any application and thus get around screen locking.

Outline of the fix

To shoot yourself in the foot, you need two items: a gun and a trigger. We have removed the trigger. The fix we've now pushed into xkeyboard-config moves the actions out of the default keymap and into an XKB option instead. So the fix does not remove the gun, but it requires users to screw the trigger in themselves before trying to hurt themselves. In a default configuration, it is thus no longer possible to break the grab of your screensaver.

To re-enable grab debugging, run setxkbmap with "-option grab:break_actions" or enable "Allow breaking grabs with keyboard actions (warning: security risk)" in the "Miscellaneous compatibility options" in your keyboard layout configuration tool of choice.

Tuesday, January 3, 2012

Multitouch in X - Touch grab handling

This post is part of a series on multi-touch support in the X.Org X server.
  1. Multitouch in X - Getting events
  2. Multitouch in X - Pointer emulation
In this post, I'll outline how grabs on touch events work. This post assumes basic knowledge of the XI2 Xlib interfaces.

Passive grabs

The libXi interface has one new passive grab call: XIGrabTouchBegin, which works pretty much like the existing passive grab APIs. As with event selection, you must set all three event masks XI_TouchBegin, XI_TouchUpdate and XI_TouchEnd or a BadValue error occurs. Once a passive grab activates in response to a touch, the client must choose to either accept or reject a touch. Details on that below.

Grabs activate on a TouchBegin event and due to the nature of multitouch, multiple touch grabs may be active at any time - some of them for different clients.

Active grabs

Active grabs do not have a new call, they are handled through the event masks of the existing XIGrabDevice(3) call. If a client has an active touch grab on the device, it is automatically the owner of the touch sequence (ownership is described below). If a client has an active pointer or keyboard grab on the device, it is the owner of the touch sequence for pointer emulated touch events only. Other touch events are unaffected by the grab and are processed normally.

Acceptance and rejection

Pointer grabs provide exclusive access to the device, but to some degree a client can opt to replay the event it received on the next client. We expect that touch sequences will often trigger gesture recognition, and a client may realise after a few events that it doesn't actually want that touch sequence. So we expanded the replay semantics. Clients with a touch grab must choose to either accept or reject a touch.

Accepting a touch signals to the server that the touch sequence is meant for this client and no-one else. The server then exclusively delivers to that client until the terminating TouchEnd.

Rejecting a touch sequence signals that the touch sequence is not meant for this client. Once a client rejects a touch sequence, the server sends the TouchEnd event to that client (if the touch is still active) and replays the full touch sequence [1] on the next grab or window. We use the term owner of a touch sequence to talk about the current recipient.

The order of events for two clients Cg and Cw, with Cg having a grab and Cw having a regular touch event selection on a window, is thus:
TouchBegin to Cg    → 
TouchUpdate to Cg   → 
TouchUpdate to Cg   → 
                    ← Cg rejects touch
                    ← Cw becomes new owner
TouchEnd+ to Cg     →
TouchBegin* to Cw   → 
TouchUpdate* to Cw  → 
TouchUpdate* to Cw  → 
#### physical touch ends #### 
TouchEnd to Cw      →
Events with + mark an event created by the server, * mark events replayed by the server

For nested grabs, this sequence simply repeats for each client until either a grabbing client accepts the touch or the client with the event selection becomes the owner.
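
The hand-off above can be sketched as a tiny model. This is not server code: touch_model, touch_init, touch_reject and touch_accept are hypothetical names made up for illustration, and the model only tracks who owns the sequence, not the replayed events themselves.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_LISTENERS 8

/* Hypothetical model of one touch sequence. Listeners are ordered the way
 * the server would try them: grabbing clients first, the client with the
 * event selection last. */
struct touch_model {
    int  nlisteners;   /* grabbing clients plus the selecting client */
    int  owner;        /* index of the current owner */
    bool accepted;     /* true once the owner accepted the touch */
};

static void touch_init(struct touch_model *t, int nlisteners)
{
    assert(nlisteners > 0 && nlisteners <= MAX_LISTENERS);
    t->nlisteners = nlisteners;
    t->owner = 0;
    t->accepted = false;
}

/* The owner rejects: ownership passes to the next listener, which would
 * then get the replayed sequence. Returns the new owner's index, or -1 if
 * there is nobody left to take the touch. */
static int touch_reject(struct touch_model *t)
{
    assert(!t->accepted);   /* an accepted touch cannot be rejected */
    if (t->owner + 1 >= t->nlisteners)
        return -1;
    return ++t->owner;
}

/* The owner accepts: the sequence is exclusively theirs until TouchEnd. */
static int touch_accept(struct touch_model *t)
{
    t->accepted = true;
    return t->owner;
}
```

With two grabbing clients and one selecting client, two calls to touch_reject() move ownership from index 0 to index 2, which matches the "repeat for each client" behaviour described above.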

In the above case, the touch ended after Cg rejected the touch. If the touch ends before the current owner accepted or rejected it, the owner gets the TouchEnd event and the touch is left hanging until the owner accepts or rejects it. If accepted, that's it. If rejected, the new owner gets the full sequence in one go, including the TouchEnd event. The sequence is thus:
TouchBegin to Cg    → 
TouchUpdate to Cg   → 
TouchUpdate to Cg   → 
#### physical touch ends #### 
TouchEnd to Cg      →
                    ← Cg rejects touch
                    ← Cw becomes new owner
TouchBegin* to Cw   → 
TouchUpdate* to Cw  → 
TouchUpdate* to Cw  → 
TouchEnd* to Cw     →

Touch ownership handling

One additional event type that XI 2.2 introduces is the XI_TouchOwnership event. Clients selecting for this event signal that they need to receive touch events before they're the owner of the touch sequence. This event type can be selected on both grabs and event selections.

First up: there are specific use-cases where you need this. If you don't fall into them, you're better off just skipping on ownership events, they make everything more complicated. And whether you need ownership events depends not only on you, but also the stack you're running under. On normal touch event selection, touch events are only delivered to the current owner of the touch. With multiple grabs, the delivery is sequential and delivery of touch events may be delayed.

Clients selecting for touch ownership events get the events as they occur, even if they are not the current owner. The XI_TouchOwnership event is delivered if and when they become the current owner. The last part is important: if you select for ownership events, you may receive touch events but you may not become the owner of that sequence. So while you can start reacting to that sequence, anything your app does must be undo-able in case, e.g., the window manager claims the touch sequence.
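
One way to keep early reactions undo-able is to separate tentative work from committed work. The sketch below is only a client-side bookkeeping model under that assumption; demo_client and its functions are hypothetical and do not call into libXi.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical client-side bookkeeping for one touch sequence. */
struct demo_client {
    bool owner;       /* set once TouchOwnership arrived */
    int  tentative;   /* updates handled before ownership, must be undo-able */
    int  committed;   /* updates that are final */
};

static void client_init(struct demo_client *c)
{
    c->owner = false;
    c->tentative = 0;
    c->committed = 0;
}

/* TouchUpdate: only an owner may treat its reaction as final. */
static void client_touch_update(struct demo_client *c)
{
    if (c->owner)
        c->committed++;
    else
        c->tentative++;
}

/* TouchOwnership: everything done so far becomes final. */
static void client_touch_ownership(struct demo_client *c)
{
    assert(!c->owner);   /* ownership arrives at most once per sequence */
    c->owner = true;
    c->committed += c->tentative;
    c->tentative = 0;
}

/* TouchEnd: returns how many tentative updates had to be undone because
 * this client never became the owner. */
static int client_touch_end(struct demo_client *c)
{
    int undone = c->owner ? 0 : c->tentative;

    c->tentative = 0;
    c->owner = false;
    return undone;
}
```

A client that sees two updates, then TouchOwnership, commits both. A client that sees updates and then a TouchEnd without ever owning the sequence has to roll them back.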

If we look at the same sequence as above with two clients selecting for ownership, the sequence looks like this:
TouchBegin to Cg     → 
TouchBegin to Cw     → 
TouchOwnership to Cg →
TouchUpdate to Cg    → 
TouchUpdate to Cw    → 
TouchUpdate to Cg    → 
TouchUpdate to Cw    → 
                     ← Cg rejects touch
                     ← Cw becomes new owner
TouchEnd+ to Cg      →
TouchOwnership to Cw →
#### physical touch ends #### 
TouchEnd to Cw      →
Note: TouchOwnership events do not correspond to any physical event, they are always generated by the server

If a touch ends before the owner accepts, the current owner gets the TouchEnd, all others get a TouchUpdate event instead. That TouchUpdate has a flag XITouchPendingEnd set, signalling that no more actual events will arrive from this touch but the touch is still waiting for owner acceptance.
TouchBegin to Cg     → 
TouchBegin to Cw     → 
TouchOwnership to Cg →
TouchUpdate to Cg    → 
TouchUpdate to Cw    → 
TouchUpdate to Cg    → 
TouchUpdate to Cw    → 
#### physical touch ends #### 
TouchEnd to Cg       →
TouchUpdate to Cw    →  (XITouchPendingEnd flag set)
                     ← Cg rejects touch
                     ← Cw becomes new owner
TouchOwnership to Cw →
TouchEnd to Cw       →
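
From the point of view of a non-owner, the pending-end case is a small state machine. A rough sketch, with two loud assumptions: DEMO_TOUCH_PENDING_END is a placeholder bit for this example only (real code would test the XITouchPendingEnd flag from the XI2 headers), and the enum names are made up.

```c
#include <assert.h>

/* Placeholder for this sketch; use XITouchPendingEnd in real code. */
#define DEMO_TOUCH_PENDING_END (1u << 16)

enum demo_touch_state {
    DEMO_ACTIVE,        /* finger is down, more updates may follow */
    DEMO_PENDING_END,   /* finger lifted, owner has not decided yet */
    DEMO_ENDED          /* TouchEnd received, the sequence is over */
};

enum demo_touch_event { DEMO_EV_UPDATE, DEMO_EV_END };

/* Advance a non-owner's view of a touch sequence by one event. */
static enum demo_touch_state
demo_next_state(enum demo_touch_state cur, enum demo_touch_event ev,
                unsigned int flags)
{
    assert(cur != DEMO_ENDED);   /* nothing may follow a TouchEnd */

    if (ev == DEMO_EV_END)
        return DEMO_ENDED;
    if (flags & DEMO_TOUCH_PENDING_END)
        return DEMO_PENDING_END; /* no more physical events will arrive */
    return cur;
}
```

In the pending state the client knows the finger is gone but must still wait for either TouchOwnership or TouchEnd before it can treat the sequence as settled.
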
In both cases, we dealt with a rejecting owner. For an accepting owner, the sequences look like this:
TouchBegin to Cg     → 
TouchBegin to Cw     → 
TouchOwnership to Cg →
TouchUpdate to Cg    → 
TouchUpdate to Cw    → 
TouchUpdate to Cg    → 
TouchUpdate to Cw    → 
                     ← Cg accepts touch
TouchEnd+ to Cw      →
TouchUpdate to Cg    → 
#### physical touch ends #### 
TouchEnd to Cg      →
or
TouchBegin to Cg     → 
TouchBegin to Cw     → 
TouchOwnership to Cg →
TouchUpdate to Cg    → 
TouchUpdate to Cw    → 
TouchUpdate to Cg    → 
TouchUpdate to Cw    → 
#### physical touch ends #### 
TouchEnd to Cg       →
TouchUpdate to Cw    →  (XITouchPendingEnd flag set)
                     ← Cg accepts touch
TouchEnd* to Cw      →
In the case of multiple grabs, the same strategy applies in order of grab activation. Ownership events may be selected by some clients but not others. In that case, each client is treated as requested, so the event sequence the server deals with may actually look like this:
TouchBegin to C1     → 
TouchBegin to C3     → 
TouchOwnership to C1 →
TouchUpdate to C1    → 
TouchUpdate to C3    → 
TouchUpdate to C1    → 
TouchUpdate to C3    → 
                     ← C1 rejects touch
                     ← C2 becomes new owner
TouchEnd+ to C1      →
TouchBegin* to C2    → 
TouchUpdate* to C2   → 
TouchUpdate* to C2   → 
                     ← C2 rejects touch
                     ← C3 becomes new owner
TouchEnd+ to C2      →
TouchOwnership to C3 →
#### physical touch ends #### 
TouchEnd to C3       →


[1] obviously we need to store these events so "full sequence" really means all events until the buffer was full

Thursday, December 22, 2011

Multitouch in X - Pointer emulation

This post is part of a series on multi-touch support in the X.Org X server.
  1. Multitouch in X - Getting events
In this post, I'll outline how pointer emulation on touch events works. This post assumes basic knowledge of the XI2 Xlib interfaces.

Why pointer emulation?

One of the base requirements of adding multitouch support to the X server was that traditional, non-multitouch applications can still be used. Multitouch should be a transparent addition, available where needed, not required where not supported.

So we do pointer emulation for multitouch events, and it's actually specified in the protocol how we do it. Mainly so it's reliable and predictable for clients.

What is pointer emulation in X

Pointer emulation simply means that for specific touch sequences, we generate pointer events. The conditions for emulation are that the touch sequence is eligible for pointer emulation (details below) and that no other client has a touch selection on that window/grab.

The second condition is important: if your client selects for both touch and pointer events on a window, you will never see the emulated pointer events. If you are an XI 2.2 client and you select for pointer but not touch events, you will see pointer events. These events are marked with the XIPointerEmulated flag so that you know they come from an emulated source.

Emulation on direct-touch devices

For direct-touch devices, we emulate pointer events for a touch sequence provided the touch is the first touch on the device, i.e. no other touch sequences were active for this device when the touch started. The touch sequence is emulated until it ends, even if other touches start and end while that sequence is active.

Emulation on dependent-touch devices

Dependent touch devices do not emulate pointer events. Rather, we send the normal mouse movements from the device as regular pointer events.

Button events and button state

Pointer emulation triggers motion events and, more importantly, button events. The button number for touches is hardcoded to 1 (any more specific handling such as long-click for right buttons should be handled by touch-aware clients instead), so the detail field of an emulated button event is 1 (unless the button is logically mapped).

The button state field on emulated pointer events adjusts for pointer emulation as it would for regular button events. The button state is thus (usually) 0x0 for the emulated ButtonPress and 0x100 for the MotionNotify and ButtonRelease events.
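
The 0x0 / 0x100 values fall out of the usual rule that the state field reports the state before the event is applied. A minimal model of that rule, assuming no physical buttons or modifiers are down; demo_pointer and demo_deliver are hypothetical names, and 0x100 is simply the value quoted above for button 1.

```c
#include <assert.h>

#define DEMO_BUTTON1_MASK 0x100u   /* value quoted above for button 1 held */

enum demo_ptr_event { DEMO_PRESS, DEMO_MOTION, DEMO_RELEASE };

struct demo_pointer {
    unsigned int state;   /* current button state of the emulated pointer */
};

/* Deliver one emulated event and return the state field it would carry:
 * the state as it was before this event took effect. */
static unsigned int demo_deliver(struct demo_pointer *p, enum demo_ptr_event ev)
{
    unsigned int reported = p->state;

    if (ev == DEMO_PRESS) {
        assert(!(p->state & DEMO_BUTTON1_MASK));   /* not already pressed */
        p->state |= DEMO_BUTTON1_MASK;
    } else if (ev == DEMO_RELEASE) {
        assert(p->state & DEMO_BUTTON1_MASK);      /* must be pressed */
        p->state &= ~DEMO_BUTTON1_MASK;
    }
    return reported;
}
```

Feeding press, motion, release through demo_deliver() yields 0x0, 0x100, 0x100, and the state is back to 0x0 afterwards.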

Likewise, any request that returns the button state will have the appropriate state set, even if no emulated event actually got sent.

Grab handling works as for regular pointer events, though the interactions between touch grabs and emulated pointer grabs are somewhat complex. I'll get to that in a later post.

The confusing bit

There is one behaviour about the pointer emulation that may be confusing, even though the specs may seem logical and the behaviour is within the specs.

If you put one finger down, it will emulate pointer events. If you then put another finger down, the first finger will continue to emulate pointer events. If you now lift the first finger (keeping the second down) and put the first finger down again, that finger will not generate events. This is noticeable mainly in bi-manual or multi-user interaction.

The reason this doesn't work is simple: to the X server, putting the first finger down just looks like another touchpoint appearing when there is already one present. The server does not know that this is the same finger again, it doesn't know that your intention was to emulate again with that finger. Most of the semantics for such interaction is in your head alone and hard to guess. Guessing it wrong can be quite bad, since that new touchpoint may have been part of a two-finger gesture with the second finger and whoops - instead of scrolling you just closed a window, pasted your password somewhere or killed a kitten. So we err on the side of caution, because, well, think of the kittens.
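
The rule is easier to see as bookkeeping. The following is a simplified model, not the server's actual data structures; emu_device and the emu_* functions are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-device bookkeeping for pointer emulation. */
struct emu_device {
    int active_touches;   /* touches currently down on the device */
    int emulating_id;     /* touch ID emulating the pointer, -1 if none */
};

static void emu_init(struct emu_device *d)
{
    d->active_touches = 0;
    d->emulating_id = -1;
}

/* A touch begins. It emulates pointer events only if no other touch was
 * active on the device at that moment. Returns true if it emulates. */
static bool emu_touch_begin(struct emu_device *d, int touch_id)
{
    bool first = (d->active_touches == 0);

    d->active_touches++;
    if (first)
        d->emulating_id = touch_id;
    return first;
}

/* A touch ends. If it was the emulating touch, nothing emulates until the
 * device has been completely touch-free again. */
static void emu_touch_end(struct emu_device *d, int touch_id)
{
    assert(d->active_touches > 0);
    d->active_touches--;
    if (d->emulating_id == touch_id)
        d->emulating_id = -1;
}
```

Replaying the scenario: touch 1 begins (emulates), touch 2 begins (does not), touch 1 ends, and a new touch 3 begins while touch 2 is still down. The active count is 1 at that point, so touch 3 does not emulate, which is exactly the confusing bit.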

Multitouch in X - Getting events

This post is part of a series on multi-touch support in the X.Org X server. I recommend re-reading Thoughts on Linux multitouch from last year for some higher-level comments.
In this post, I'll outline how to identify touch devices and register for touch events.

This post assumes basic knowledge of the XI2 Xlib interfaces. Code examples should not be scrutinised for language-correctness.

New event types

XI 2.2 defines four new event types: XI_TouchBegin, XI_TouchUpdate, XI_TouchEnd are the standard events that most applications will be using. The fourth event, XI_TouchOwnership is mainly for handling specific situations where reaction speed is at a premium and gesture processing when grabs are active. I won't be covering those in this post.

Identifying touch devices

To use multitouch functionality from a client application, the client must announce support for the X Input Extension version 2.2 through the XIQueryVersion(3) request.
int major = 2, minor = 2;
XIQueryVersion(dpy, &major, &minor);
if (major * 1000 + minor < 2002)
    printf("Server does not support XI 2.2\n");
Once announced, an XIQueryDevice(3) call may return a new class type, the XITouchClass. If this class is present on a device, the device supports multitouch. The class struct itself is defined like this:
typedef struct
{
    int         type;
    int         sourceid;
    int         mode;
    int         num_touches;
} XITouchClassInfo;
The num_touches field specifies the number of simultaneous touches supported by the device. If the number is 0, we simply don't know (likely) or the device supports an unlimited number of touches (less likely). Regardless of the value expect that some devices lie, so it's best to treat this value as a guide only.

The mode field specifies the type of touch devices. We currently define two types and the server behaviour differs depending on the type:
  • XIDirectTouch for direct-input touch devices (e.g. your average touchscreen or tablet). For this type of device, the touch events will be delivered to the window at the position of the touch point. Again, similar to what you would expect from a tablet interface - you press top left and the application top-left responds.
  • XIDependentTouch for indirect input devices with multi-touch functionality. Touchpads are the prime example here. Touch events on such devices will be sent to the window underneath the cursor and clients are expected to interpret the touchpoints as (semantically) relative to the cursor position. For example, if your cursor is inside a Firefox window and you touch with two fingers on the top-left corner of the touchpad, Firefox will get those events. It can then decide on how to interpret those touchpoints.
A device that has a TouchClass may send touch events, but these events use the same axes as pointer events. Having said that, a touch device may still send pointer events as well - if the physical device generates both.
Your code to identify touch devices could roughly look like this:
XIDeviceInfo *info;
int ndevices;
int i, j;

/* dpy is the Display* used for the XIQueryVersion() call above */
info = XIQueryDevice(dpy, XIAllDevices, &ndevices);

for (i = 0; i < ndevices; i++)
{
    XIDeviceInfo *dev = &info[i];
    printf("Device name %s\n", dev->name);
    for (j = 0; j < dev->num_classes; j++)
    {
        XIAnyClassInfo *class = dev->classes[j];
        XITouchClassInfo *t = (XITouchClassInfo*)class;

        /* skip button, valuator, key and scroll classes */
        if (class->type != XITouchClass)
            continue;

        printf("%s touch device, supporting %d touches.\n",
               (t->mode == XIDirectTouch) ?  "direct" : "dependent",
               t->num_touches);
    }
}

/* the device list is allocated by libXi and must be freed by the caller */
XIFreeDeviceInfo(info);

Selecting for touch events

Selecting for touch events on a window is mostly identical to pointer events. A client creates an event mask and submits it with XISelectEvents(3). One exception applies: a client must always select for all three touch events [1], XI_TouchBegin, XI_TouchUpdate, XI_TouchEnd. Selecting for one or two only will result in a BadValue error.
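
The all-or-nothing rule can be checked before the request goes out. A sketch only: the DEMO_* bits below are stand-ins invented for this example and are not the XI2 event numbers; real code would build the mask with XISetMask() and the XI_Touch* constants.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in bits for this sketch, NOT the real XI2 event numbers. */
enum {
    DEMO_TOUCH_BEGIN  = 1 << 0,
    DEMO_TOUCH_UPDATE = 1 << 1,
    DEMO_TOUCH_END    = 1 << 2,
    DEMO_MOTION       = 1 << 3    /* some unrelated pointer event */
};

#define DEMO_TOUCH_ALL \
    (DEMO_TOUCH_BEGIN | DEMO_TOUCH_UPDATE | DEMO_TOUCH_END)

/* A selection is acceptable if it contains none or all three of the
 * touch events. Anything in between would earn a BadValue. */
static bool touch_mask_valid(unsigned int mask)
{
    unsigned int touch = mask & DEMO_TOUCH_ALL;

    return touch == 0 || touch == DEMO_TOUCH_ALL;
}

/* Add touch events to an existing selection, always as a set of three. */
static unsigned int touch_mask_add(unsigned int mask)
{
    mask |= DEMO_TOUCH_ALL;
    assert(touch_mask_valid(mask));
    return mask;
}
```

Routing every touch selection through a helper like touch_mask_add() makes it hard to end up with a partial mask by accident.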

As with button events, only one client may select for touch events on any given window and the event delivery attempts traverse from the bottom-most window in the window tree up to the root window. Where a matching event selection is found, the event is delivered and the traversal stops.
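
The traversal itself is a plain walk up the parent chain. A simplified model, assuming each window only records whether some client selected touch events on it; demo_window and find_touch_target are hypothetical names, not server internals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down window record. */
struct demo_window {
    const char *name;
    struct demo_window *parent;   /* NULL for the root window */
    bool selects_touch;           /* a client selected touch events here */
};

/* Start at the bottom-most window under the touch and walk towards the
 * root. The first window with a touch selection gets the event and the
 * traversal stops. Returns NULL if nobody selected for touch events. */
static struct demo_window *find_touch_target(struct demo_window *w)
{
    for (; w != NULL; w = w->parent) {
        assert(w->parent != w);   /* a window is never its own parent */
        if (w->selects_touch)
            return w;
    }
    return NULL;
}
```

A selection on a toplevel window therefore shadows one on the root window for touches that land inside that toplevel, and a child without a selection simply lets the event bubble up.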

Handling touch events

The three event types [1] are XIDeviceEvents like pointer and keyboard events. So from a client's point of view, in essence all we added was new event types.

The detail field of touch events specifies the touch ID, a unique ID for this particular touch for the lifetime of the touch sequence. Each touch sequence consists of a TouchBegin event, zero or more TouchUpdate events and one TouchEnd event. Since multiple touch sequences may be ongoing at any time, keeping track of the ID is important. The server guarantees that the touch ID is unique per device and that it will not be re-used [2]. Note that while touch IDs increase, they increase by an implementation-defined amount. Don't rely on the next touch ID to be the current ID + 1.
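
Since IDs are opaque, a client is best off looking touches up by ID rather than using the ID as an index. A rough sketch of such a table; demo_tracker and the tracker_* functions are made-up names, and the fixed slot count is an arbitrary choice for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_TOUCHES 16   /* arbitrary limit for this sketch */

struct demo_touch {
    bool in_use;
    unsigned int id;   /* the detail field of the touch events */
    double x, y;       /* last known position */
};

struct demo_tracker {
    struct demo_touch slots[DEMO_MAX_TOUCHES];
};

static void tracker_init(struct demo_tracker *t)
{
    for (int i = 0; i < DEMO_MAX_TOUCHES; i++)
        t->slots[i].in_use = false;
}

/* Look a touch up by ID. IDs are compared, never used as an index. */
static struct demo_touch *tracker_find(struct demo_tracker *t, unsigned int id)
{
    for (int i = 0; i < DEMO_MAX_TOUCHES; i++)
        if (t->slots[i].in_use && t->slots[i].id == id)
            return &t->slots[i];
    return NULL;
}

/* TouchBegin: claim a free slot. Returns NULL if the table is full. */
static struct demo_touch *tracker_begin(struct demo_tracker *t,
                                        unsigned int id, double x, double y)
{
    assert(tracker_find(t, id) == NULL);   /* IDs are unique while active */
    for (int i = 0; i < DEMO_MAX_TOUCHES; i++) {
        struct demo_touch *s = &t->slots[i];
        if (!s->in_use) {
            s->in_use = true;
            s->id = id;
            s->x = x;
            s->y = y;
            return s;
        }
    }
    return NULL;
}

/* TouchUpdate: returns false for an ID we are not tracking. */
static bool tracker_update(struct demo_tracker *t, unsigned int id,
                           double x, double y)
{
    struct demo_touch *s = tracker_find(t, id);

    if (s == NULL)
        return false;
    s->x = x;
    s->y = y;
    return true;
}

/* TouchEnd: release the slot. */
static bool tracker_end(struct demo_tracker *t, unsigned int id)
{
    struct demo_touch *s = tracker_find(t, id);

    if (s == NULL)
        return false;
    s->in_use = false;
    return true;
}
```

Nothing here cares whether the second touch has ID 101 or 107, which is the point.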

The button state in a touch event is the state of the physical buttons only. A TouchUpdate or TouchEnd event will thus usually have a zero button state. [3]

That's pretty much it, otherwise the handling of touch events is identical to pointer or keyboard events. Touch event handling should be straightforward and the significant deviations from the current protocol are in the grab handling, something I'll handle in a future post.

[1] I know, it's four. Good that you're paying attention.
[2] Technically ID collision may occur. For that to happen, you'd need to hold at least one touch down while triggering enough touches to exhaust a 32 bit ID range. And hope that after the wraparound you will get the same ID. There are better ways to spend your weekend.
[3] pointer emulation changes this, but I'll get to that some other time.

Thursday, December 15, 2011

Multitouch patches posted

After pulling way too many 12+ hour days, I've finally polished the patchset for native multitouch support in the X.Org server into a reasonable state. The full set of patches is now on the list. And I'm still expecting this to get merged for 1.12 (and thus in time for Fedora 17).

The code is available from the multitouch branches of the following repositories:
  git://people.freedesktop.org/~whot/xserver
  git://people.freedesktop.org/~whot/inputproto
  git://people.freedesktop.org/~whot/xf86-input-evdev
  git://people.freedesktop.org/~whot/libXi
Here's a screencast running Fedora 16 with the modified X server and a little multitouch event debugging application.


Below is a short summary of what multitouch in X actually means, but one thing is important: being the windowing system, X provides multitouch support. That does not mean that every X application now supports multitouch, it merely means that they can now use multitouch if they want to. That also includes gestures, they need application support.

A car analogy: X provides a new road, the applications still have to opt to drive on it.

Multitouch events

XI 2.2 adds three main event types: XI_TouchBegin, XI_TouchUpdate and XI_TouchEnd. These three make up a touch sequence. X clients must subscribe to all three events at once and will then receive the events as they come in from the device (more or less, grabs can interfere here). Each touch event has a unique touch ID so clients can track the touches over time.

We support two device types: XIDirectTouch includes tablets and touchscreens where the events are delivered to the position the touch occurs at. XIDependentTouch includes multitouch-capable touchpads. Such devices still control a normal pointer by default, but multi-finger gestures are possible. For such devices, the touchpoints are delivered to the window underneath the pointer.

That is pretty much the gist of it. I'll post more information over time as the release gets closer, so stay tuned.

Pointer emulation

Multitouch can be a compelling interaction method but as said above, X only provides support for multitouch. It will take a while for applications to pick it up (Carlos Garnacho is working on GTK3) and some never will. Since we still need to interact with those applications, we provide backwards-compatible pointer emulation. Again, the details are in the protocol but the gist of it is that for the first touchpoint we emulate pointer events.

That's the really nasty bit, because you now have to sync up the grab event semantics of the core, XI 1.x and XI2 protocols and wrap it all around the new grab semantics. So that if you have a multitouch app running under a window manager without multitouch support everything still works as expected.
That framework is now in place too though I expect it to still have bugs, especially in the hairier corner cases.

But other than that, it should work just as intended. I can interact with my GNOME3 desktop quite well and I get multitouch events to my test applications.

[edit Dec 20: typo fix]

Tuesday, December 6, 2011

A short update on multitouch

For the last couple of weeks I've been pretty much working full-time on getting multitouch/XI 2.2 ready for the merge (well, I was on holidays for a bit too). So first of all - sorry if I've been ignoring bugs or emails, I'm working to a few deadlines here. Anyway, here's a bit of a status update.

Right now, it looks like touch event delivery is working, including nested grabs. Chase Douglas started on the pointer emulation while I was away and we're now at the point where emulation works, except that pointer grabs on top of multitouch clients aren't handled yet. I'm still rather optimistic to get this into 1.12, though it's getting a bit unwieldy. Carlos Garnacho has already sent me some patches, so he's testing the lot against the GTK branches.

However, since touch support cannot simply be bolted on top and needs to be integrated properly, this has triggered some extra rewrites here and there. I'm currently some 200 commits ahead of master sync-point. I'm planning to get this number down to something sane before merging but meanwhile, sorry, I'll have to keep ignoring you until this is done.

Friday, December 2, 2011

Improving code readability through temporary variables

We don't always have the luxury of using library interfaces that are sensibly designed and self-explanatory enough to enforce readability (see this presentation). A fictional function may look like this:

extern void foo(struct foo *f,
                Bool check_device,
                int max_devices,
                Bool emulate);

But calls often end up like this:

foo(mystruct, TRUE, 0, FALSE);

Or, even worse, the function call could be:

foo(mystruct, !dev->invalid, 0, list_is_first_entry(dev->list));

The above is virtually unreadable and to understand it one needs to look at the function definition and the caller at the same time. The simple use of a temporary variable can do wonders here:

Bool check_device = !dev->invalid;
Bool emulate_pointer = list_is_first_entry(dev->list);

foo(mystruct, check_device, 0, emulate_pointer);

It adds a few lines of code (though I suspect the compiler will mostly reduce the output anyway) but it improves readability. Especially in the cases where the source field name is vastly different to the implied effect. In the second example, it's immediately obvious that pointer emulation should happen for the first entry.