Wednesday, October 28, 2009

X11R7.5 released - but what is it?

Thanks to Alan Coopersmith's efforts, X11R7.5 was released a few days ago. Except - what does that mean?

This post intends to shed some light on the components of the X11R7.5 release and where the version number comes from.

X Window System


If you're running a desktop system other than Windows or OS X, you're most likely running some instance of the X Window System, also referred to as "X11" or simply "X". X consists of several components that all make up the "X Window System", yet some of them are more visible than others.


X Protocol

The core component of X is the X Protocol. This is what defines X; it is essentially the API.
The X Protocol consists of the core protocol, dating back to the 1980s, and a number of protocol extensions, essentially additions to the core protocol. If you hear terms like X Input, XRandR, RENDER, etc., all of these are protocol extensions.


X Server and the drivers

The X Server is the process that talks to the hardware drivers and listens to requests from applications to draw things. It also handles input events and passes them on to the right application. Depending on your hardware, you have a number of drivers. These days many setups have evdev and synaptics for input, and intel, ATI or nvidia for graphics.

The X Server supports the core protocol and most protocol extensions, but different X servers may support different versions. Generally, the most recent X Server supports the latest version of the protocol.


Xlib and friends

Xlib (or libX11) is the library that allows applications to talk X Protocol to the server. It wraps the low-level protocol into a slightly higher-level API. These days, most applications that display a GUI use Xlib at some point - though Xlib is usually abstracted away by a saner toolkit such as GTK or Qt.

Xlib has been the single library to talk X Protocol for ages, but in recent years XCB has been gaining some traction (and in fact recent versions of Xlib use XCB at the lowest level).


X applications

A number of applications are traditionally part of the X Window System. One of the well known ones is xeyes, but other, crucial tools such as setxkbmap and xkbcomp are part of these applications as well.


Misc other stuff

There are a number of other packages that include fonts, misc utils, data packages etc. I'll skip the details, it's just important to know they're there.



X11R7.what?


A few years ago, all the above components were part of one repository. To build one of the components, you also had to build the others. To release one, you'd have to release the whole lot. Over time, the version numbers crept up to 6.9 for this so-called monolithic tree.

X11R6.9 (X11 Release 6.9) was the last monolithic release. Around 2005, the monolithic tree was split up into separate repositories for each component. This also reset the version numbers for most of the components - those that inherited the 6.9 version numbers (or even 7.0) were reset to 1.0.

Since then, the X11R7.x releases (referred to as "katamari") are quite like distributions. They cherry-pick a bunch of module versions known to work together and combine them into one set. The modules themselves move mostly independently of the katamaris and thus their version numbers may skip between katamaris. For example, X11R7.4 had X Server 1.5, while X11R7.5 has X Server 1.7.

This is where much confusion comes from. Many users don't know whether they're running 1.7, 7.5, 1.0 or 6.8. The intent of a katamari is simply to provide a set of modules that are sufficient to get a basic GUI running. That's why over time modules get added or removed from the katamari as well. A module that was part of X11R7.5 may not be part of X11R7.6 and of course the other way round (a full list of which versions are included is at the top of the X11R7.5 Changelog).

Which version actually matters?


Katamaris matter mostly for distributors. They represent a set of versions known to work together and make for easy picking. A distribution is free to start out with a katamari and then update to newer modules as they are released. The katamari is merely a starting point, nothing more.

For this reason, it rarely matters to an individual user whether a module they're running is part of a katamari. For bug reporting, developers need to know the versions of the individual modules affected so they can narrow down which bug may be triggered.

To get the versions for the X Server and the drivers, look at /var/log/Xorg.0.log. The first line states the version of the X server. Drivers are loaded dynamically, so you need to search for them in the log. For example, my log says:


X.Org X Server 1.7.0

[...]

(II) Module intel: vendor="X.Org Foundation"
compiled for 1.6.99.903, module version = 2.9.0
Module class: X.Org Video Driver
ABI class: X.Org Video Driver, version 6.0

[...]

(II) Module evdev: vendor="X.Org Foundation"
compiled for 1.7.0, module version = 2.3.0
Module class: X.Org XInput Driver
ABI class: X.Org XInput driver, version 7.0

[...]

(II) Loading /usr/lib/xorg/modules/input/synaptics_drv.so
(II) Module synaptics: vendor="X.Org Foundation"
compiled for 1.6.99.900, module version = 1.1.99
Module class: X.Org XInput Driver
ABI class: X.Org XInput driver, version 7.0


So right now I'm running X Server 1.7.0, with evdev 2.3.0, intel 2.9.0 and synaptics 1.1.99. Whether these versions are part of a katamari doesn't matter.
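
As a rough illustration of that lookup, the Python sketch below pulls the server version and the module versions out of log text. The sample is abridged from the excerpt above; the function name parse_versions and the regular expressions are assumptions about the log layout, not an official description of the format, so treat it as a starting point only.

```python
import re

# Abridged from the log excerpt above (real logs indent the "compiled for" line).
SAMPLE_LOG = """\
X.Org X Server 1.7.0
(II) Module intel: vendor="X.Org Foundation"
\tcompiled for 1.6.99.903, module version = 2.9.0
(II) Module evdev: vendor="X.Org Foundation"
\tcompiled for 1.7.0, module version = 2.3.0
(II) Module synaptics: vendor="X.Org Foundation"
\tcompiled for 1.6.99.900, module version = 1.1.99
"""

# Assumed patterns, derived only from the excerpt shown in this post.
SERVER_RE = re.compile(r"X\.Org X Server (\d[\w.]*)")
MODULE_RE = re.compile(r"\(II\) Module (\S+): vendor=")
VERSION_RE = re.compile(r"module version = (\S+)")


def parse_versions(log_text):
    """Return (server_version, {module_name: module_version})."""
    server = None
    modules = {}
    current = None  # module whose version line we are still waiting for
    for line in log_text.splitlines():
        if server is None:
            match = SERVER_RE.search(line)
            if match:
                server = match.group(1)
                continue
        match = MODULE_RE.search(line)
        if match:
            current = match.group(1)
            continue
        match = VERSION_RE.search(line)
        if match and current is not None:
            modules[current] = match.group(1)
            current = None
    return server, modules
```

To run it against a real log, pass in the file contents, e.g. parse_versions(open("/var/log/Xorg.0.log").read()).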

X apps usually have some -version switch. For libraries, it's best to use the distribution's packaging system (e.g. rpm -q libX11) to get the version number.

Tuesday, October 6, 2009

Bugzilla in Firefox

Something interesting I found last week: you can add the freedesktop.org bugzilla to the Firefox search engines.

You may laugh now that I didn't know that already.

Anyway, since I know at least two more people who didn't know that either, there are bound to be others out there who don't know it:
  • Go to http://bugs.freedesktop.org
  • Click on the google logo in the searchbar
  • Select "Add FreeDesktop Bugzilla"
  • Go to Manage Search Engines
  • Enter a keyword (e.g. "fdo") for the new engine.
Voila, now you can just type "fdo 20500" in the address bar and it'll take you straight to the bug number. Alternatively, you can type in a searchword too.

Same works for a number of bugzilla instances. Useful.

Friday, October 2, 2009

XI2 and MPX released!

It finally happened! After nearly 4 years of development, MPX has been released as part of XI2 in the new X Server 1.7.

The whole thing started when I started my PhD in late 2004. The problem I found was that there was no support for collaboration on a single shared display. All the solutions at the time were hacks at the toolkit or application level. I found that the only way we can get truly collaborative interfaces is by adding it into the windowing system itself. So I started hacking on X in late 2005. I went from scratching my head and wondering how some of the stuff could compile (I had never heard of K&R function declarations) to rewriting large parts of the input subsystem and even ended up as release manager. Not in a single day though.

Now we're done. MPX is out, and we have generic low-level support for multiple input devices. You know the whole one keyboard-one mouse paradigm we've had since Doug Engelbart invented the mouse? It's over, you don't have to restrict yourself anymore when writing an app.

Of course, this is a low-level change and when you wake up tomorrow, not a lot will have actually changed. We still need the toolkits to support it, we need apps to pick it up, we need the desktop environments to start thinking about what can be made useful. Nonetheless, basic collaboration features are already there and it can only get better from here.

Let's see what will happen.

Thursday, August 20, 2009

Re-designing input methods with XKB

I've had an interesting meeting with Jens Petersen yesterday about input methods. Jens is one of the i18n guys working for Red Hat.

Input methods are a way of merging several typed symbols into one actual symbol. Western languages rarely use them (the compose key isn't quite the same), but many eastern languages rely on them. To give one (made up) example, an IM setup allows you to type "qqq" and converts it into the Chinese symbol for tree.

Unfortunately, IM implementations are somewhat broken and rely on a multitude of hacks. Right now, IM implementations often need to hook onto keycodes instead of keysyms. Keycodes are a numerical value that is usually the same for a key (except when it isn't). So "q" will always be the same keycode (except when it isn't). In X, a keycode has no meaning other than being an index into the keysym table.

Keysyms are the actual symbols that are to be displayed. So while the "q" key may have a keycode of 24, it will have the keysym for "q" in qwerty and the keysym for "a" in azerty.

And here's where everything goes wrong for IM. If you listen for keycodes, and you switch drivers, then keycode 24 isn't the same key anymore. If you listen for keysyms and you switch layout, keysym "q" isn't the same key anymore. Oops.
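
To make the layout half of that problem concrete, here is a toy Python sketch of the keycode-to-keysym lookup. The tables are heavily truncated: keycode 24 is the value from the text, the other entries and the function names are illustrative assumptions. The driver half (keycodes changing underneath you) is not modelled.

```python
# Truncated keycode -> keysym tables for two layouts. Keycode 24 is the
# value used in the text; 25 and 38 are further illustrative entries.
KEYSYM_TABLES = {
    "us": {24: "q", 25: "w", 38: "a"},  # qwerty
    "fr": {24: "a", 25: "z", 38: "q"},  # azerty
}


def keysym_for(layout, keycode):
    """What the server reports for a physical key under a given layout."""
    return KEYSYM_TABLES[layout][keycode]


def keycodes_for(layout, keysym):
    """Which physical keys produce a given keysym under a given layout."""
    table = KEYSYM_TABLES[layout]
    return sorted(kc for kc, ks in table.items() if ks == keysym)
```

An IM that listens for the keysym "q" is following keycode 24 under us but keycode 38 under fr, which is exactly the "isn't the same key anymore" problem.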

During a previous meeting and the one yesterday, we came up with a solution to fix them properly.

Let's take a step back and look at keyboard input. The user hits a physical key, usually because of what is printed on that key. That key generates a keycode, which represents a keysym. That keysym is usually the same symbol as what is printed on the keyboard. (Of course, there are exceptions to that, the prime example being a dvorak layout on a qwerty physical keyboard.)
In the end, IM should aim to provide the same functionality, with the added step of combining multiple symbols into one.

For IM implementations, we can distinguish between two approaches:
In the first approach, a set of keysyms should combine to a final symbol. For example, typing "tree" should result in a tree symbol. This case can be fixed easily by the IM implementation only ever dealing with keysyms. Where the key is located doesn't matter and it works equally well with us(qwerty) and fr(dvorak). As a mental bridge: if the symbols come in via morse code and you can convert to the correct final symbol, then your IM is in this category. This approach is easy to deal with, so we can close the case on it.

In the second approach, a set of key presses should combine to a final symbol. For example, typing the top left key four times should result in a tree symbol. In this case, we can't hook onto keysyms because they may change with the layout. But we can't hook onto keycodes either because they are essentially random.

Wait. What? Why does the keysym change with the layout?

Because we have the wrong layout selected. If you're trying to type Chinese, you shouldn't have a us layout. If you're trying to type Japanese, you shouldn't have a French layout, because these keysyms don't represent what the key is supposed to do. The keysyms are supposed to represent what is printed on the keyboard, and those symbols are Chinese, Japanese, Indic, etc. So the solution is to fix up the keysyms. Instead of trying to listen for a "q", the keyboard layout should generate a "tree" keysym. The IM implementation can then listen for this symbol and combine to the final symbol as required.

This essentially means that for each keyboard with intermediate symbols there should be an appropriate keyboard layout - just as there is for western languages. And once these keysyms are available, the second approach becomes identical to the first approach and it doesn't matter anymore where the physical key is located.
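
A minimal Python sketch of that idea, under loud assumptions: the keysym names ("stroke_1", "stroke_2"), both layouts and the composition table are made up for illustration, just like the "qqq" example earlier. The point is only that the IM sees keysyms, so the physical key position stops mattering.

```python
# Two hypothetical layouts that put the same intermediate keysyms on
# different physical keys (keycodes).
LAYOUT_A = {24: "stroke_1", 25: "stroke_2"}
LAYOUT_B = {38: "stroke_1", 39: "stroke_2"}

# Made-up composition table: sequence of intermediate keysyms -> final symbol.
COMPOSE = {("stroke_1", "stroke_1", "stroke_1"): "木"}


class KeysymIM:
    """Toy input method that only ever looks at keysyms."""

    def __init__(self, table):
        self.table = table
        self.pending = []

    def feed(self, keysym):
        """Consume one keysym; return a final symbol once a sequence completes."""
        self.pending.append(keysym)
        seq = tuple(self.pending)
        if seq in self.table:
            self.pending.clear()
            return self.table[seq]
        # Drop the buffer if no table entry starts with what we have so far.
        if not any(key[:len(seq)] == seq for key in self.table):
            self.pending.clear()
        return None


def type_keys(layout, keycodes, im):
    """Translate key presses to keysyms via the layout and run them through the IM."""
    output = []
    for keycode in keycodes:
        symbol = im.feed(layout[keycode])
        if symbol is not None:
            output.append(symbol)
    return output
```

Pressing keycode 24 three times under LAYOUT_A and keycode 38 three times under LAYOUT_B go through the same table entry, so the IM needs no layout-specific code.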

The good thing about this approach is that users and developers can leverage existing tools for selecting and changing between different layouts. (bonus points for using the word "leverage") It also means that a more unified configuration between standard DE tools and IM tools is possible.

For the IM implementation, this simplifies things quite a bit. First of all, it can listen to the XKB group state to adjust automatically whether IM is needed or not. For example, if us(qwerty) and traditional Chinese are configured as layouts, the IM implementation can kick in whenever the group toggles to Chinese. As long as it is on us(qwerty), it can slumber in the background.

Second, no layout-specific hacks are required. The physical location of the key, the driver, they all don't matter anymore. Even morse-code is supported now ;)
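
The first point, following the XKB group state, could look roughly like the Python sketch below. The class, the layout names and the on_group_change hook are hypothetical; a real IM would get the group from XKB state notifications rather than from a direct method call.

```python
XKB_MAX_GROUPS = 4  # XKB allows at most four groups at a time


class GroupAwareIM:
    """Toy IM that wakes up only while an IM-requiring layout is the active group."""

    def __init__(self, layouts, im_layouts):
        if len(layouts) > XKB_MAX_GROUPS:
            raise ValueError("XKB supports at most 4 groups")
        self.layouts = list(layouts)        # index in this list == XKB group
        self.im_layouts = set(im_layouts)   # layouts that need composition
        self.active = False
        self.on_group_change(0)

    def on_group_change(self, group):
        """Called whenever the active XKB group changes."""
        self.active = self.layouts[group] in self.im_layouts
        return self.active
```

With layouts ["us", "cn_traditional"] (names assumed) and only the second one marked as needing IM, the object stays inactive on group 0 and activates on group 1.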

Talking to Jens, his main concern is that XKB limits to 4 groups at a time. This restriction is built into the protocol and won't disappear completely anytime soon. Though XI2 and XKB2 address this issue, it will take a while to get a meaningful adoption rate. Nonetheless, the approach above should make IM for the large majority of users more robust and predictable, without the issues coming up whenever hacks are involved.

I think this is the right approach; Jens agrees, and so does Sergey Udaltsov, the xkeyboard-config maintainer. So now we just need to get this implemented, but it will take a while to sort out all the details and move all languages over.

Saturday, August 8, 2009

The case for zsh

A few months back, in January or February, I decided to switch to zsh as my default shell and it has made my work a lot more effective. So I encourage you to try it; it has a number of features that are quite useful. Towards the bottom of this post is my own setup; feel free to use it.

Disclaimer: some or all of the features below are probably available in other shells. This is not a "$SHELL is so much better than $OTHERSHELL" posting, this is about how a particular setup has made my work more effective.

The main features I found useful, in no particular order:

  • history size of 5000 with duplicate removal means I type most commands now with Ctrl+R. Most of what I do is repetitive enough that if I have typed some weird command a few months back it will still be in the history.

  • merged histories. ever had 15 terminals open and then found out that the history of one is not available in the others, and on closing only the last one is added to the history? not a problem anymore.

  • commandline completion - just beautiful. includes host completion for ssh commands, man page completion, rpm and CVS module completion, git command/tag/branch completion, etc.

  • completion exclusion: if you type rm foo.c it won't suggest foo.c again since it's already in the list.

  • app-specific completion. You can simply add filetypes to complete for your program (e.g. only pdfs for the pdf reader, etc.)

  • vim/emacs key bindings. whatever you fancy. It's nice to use the vim commands for delete word, replace word, etc. Especially for multi-line commands.

  • git branch display - one of the scripts makes my prompt display the git branch if i'm in a git directory. since I frequently work with 5+ branches, that's really handy. So for example, my prompt looks like this:

    :: whot@dingo:~/xorg/xserver (xi2-protocol-tests)>

    indicating that the xserver repo is on branch xi2-protocol-tests. It also displays whether I have commits queued up or local changes, so I don't forget to commit something before pushing. Type disable-git-prompt to disable this again if your repo is _really_ big (e.g. the kernel), otherwise it takes forever to get the prompt to display.

  • "GUI" selection for tab-completion. Hit Tab and below the line you get a list of all files, which you can step through using Tab. Like this:

    [screenshot of the completion menu]

So anyway, have a look at my zsh files and use them as you will. Save them as $HOME/.zshrc and $HOME/.zsh/ to get started.

Thursday, July 30, 2009

XI2 ready for the final mile

As announced on the xorg-devel list, I think XI2 is ready for branching to 1.7.

This means that I consider the protocol stable enough and I will focus only on bugfixing. In a few days time I'll cut a 901 release (release candidate 1) for inputproto and libXi.

Any testing is appreciated, feel free to file bugs for anything that's broken.

Thursday, July 23, 2009

XI2 Recipes, Part 6

This post is part of a mini-series of various recipes on how to deal with the new functionality in XI2. The examples here are merely snippets, full example programs to summarize each part are available here.

In the first five parts, I covered how to get and manipulate the device hierarchy, how to select for events, how to get extended device information, the common event types and active and passive grabs. In this part, I will focus on the client pointer.

The ClientPointer principle


The ClientPointer (CP) principle is only partly interesting for normal applications since no XI2 client should ever need it. The exception is the window manager. About one quarter of core protocol requests and replies are ambiguous in the presence of multiple master devices. The best example is XQueryPointer(3). If there are two or more master pointers, XQueryPointer has at best a 50% chance of returning the right data.

The ClientPointer is a dedicated master pointer assigned to each application, either implicitly or explicitly. This pointer is then used for any ambiguous requests the application may send. This adds predictability as the data returned is always from the same device. Given the above example, XQueryPointer requests from one client will always return the same pointer's coordinates. Thinking of xeyes this means that the eyes will follow the same cursor. For any requests or replies that require keyboard data, the master keyboard paired with the CP is used.

The CP is implicitly assigned whenever an application sends an ambiguous request. Then the server picks the first master pointer and assigns it to the client. This of course happens only when the client doesn't have an assigned CP yet.

Alternatively, the CP can be explicitly assigned. The XISetClientPointer(3) and XIGetClientPointer(3) calls set and query the current ClientPointer for a client.


Status XISetClientPointer(
Display* dpy,
Window win,
int deviceid
);

Bool XIGetClientPointer(
Display* dpy,
Window win,
int* deviceid
);


Both calls take a window and a deviceid. If the window is a valid window, the client owning this window will have the CP set to the given device. The window parameter may also be just a pure client ID. Finally, the window parameter may be None, in which case the requesting client's CP is set to the given device. This is not useful beyond debugging: if the client understands enough XI2 to set the CP, it should be able to handle multiple devices properly.

Getting the CP takes the same parameters, but here deviceid is an output parameter. XIGetClientPointer returns True and stores the device in deviceid if the CP has been set for the target client, regardless of whether it was set implicitly or explicitly. If no CP is set yet, XIGetClientPointer returns False.
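
The assignment rules can be summarised in a small Python model. This is only a toy simulation of the behaviour described above, not server code and not Xlib bindings; the class, the method names and the device IDs 2 and 10 are assumptions for illustration.

```python
class ToyServer:
    """Models only the ClientPointer bookkeeping described in the text."""

    def __init__(self, master_pointers):
        self.master_pointers = list(master_pointers)  # device IDs, first one first
        self.client_pointer = {}                      # client -> device ID

    def set_client_pointer(self, client, deviceid):
        """Explicit assignment, in the spirit of XISetClientPointer."""
        if deviceid not in self.master_pointers:
            raise ValueError("not a master pointer")
        self.client_pointer[client] = deviceid

    def get_client_pointer(self, client):
        """Like XIGetClientPointer: (True, deviceid) if set, else (False, None)."""
        if client in self.client_pointer:
            return True, self.client_pointer[client]
        return False, None

    def query_pointer(self, client):
        """An ambiguous core request: assigns the first master pointer
        implicitly if the client has no CP yet, then answers from the CP."""
        if client not in self.client_pointer:
            self.client_pointer[client] = self.master_pointers[0]
        return self.client_pointer[client]
```

In this model a client that never sets a CP gets the first master pointer on its first ambiguous request and keeps it, while an explicitly set CP is never overwritten by later requests.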

Event delivery, XI2 and grabs


The CP setting does not affect event delivery in any way. Regardless of which master pointer is the ClientPointer, any device can still interact with the client. This also means that the CP has no effect whatsoever on XI2 or XI1 requests since they are not ambiguous.

Grabs are a different matter. Since the activation of a grab is ambiguous in the core protocol (XGrabPointer - well, which pointer?) a grab will by default activate on the CP. This can be a bit iffy since an application that just grabs the pointer may not grab the one currently within the window boundaries. So the grabbing code has two exceptions. One, if a device is already grabbed by the client, a grab request will act on the already-grabbed device instead of the CP. Two, if a passive grab activates it will activate on the device triggering the grab, not on the CP.

In practice, this means that if a client has a passive grab on a button and any device presses this button, the passive grab activates on this device. If the client then requests an active grab (which toolkits such as GTK do), the active grab is set on the already-grabbed device.
The result: in most cases the grab happens on the "correct" device for the current situation.
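
Expressed as code, the device selection for core grabs boils down to two small rules. The Python sketch below is a toy restatement of the rules above with hypothetical function names and device IDs; it is not how the server implements them.

```python
def passive_grab_device(triggering_device):
    """A passive grab activates on the device that triggered it, not on the CP."""
    return triggering_device


def core_grab_device(client_pointer, already_grabbed=None):
    """Device that a core grab request (e.g. XGrabPointer) acts on.

    If the client already holds a grab on some device, that device wins;
    otherwise the request falls back to the ClientPointer.
    """
    if already_grabbed is not None:
        return already_grabbed
    return client_pointer
```

For a client whose CP is device 2: if device 10 presses the passively grabbed button and the client follows up with an active grab, core_grab_device(2, passive_grab_device(10)) picks device 10, whereas a plain grab with nothing grabbed yet picks device 2.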

How to use the ClientPointer


As said above, the only application that should really need to know about the CP is the window manager, which manages core applications as well as XI apps. The most straightforward way of managing an application is to set the CP whenever a pointer clicks into a client window. This ensures that if the application requests some ambiguous data, a pointer that is interacting with the application is used.

I have used this method in a custom window manager written for a user study several moons ago and it works well enough. Of course, you are free to contemplate situations where such a simple approach is not sufficient.