Who-T: August 2009

I've had an interesting meeting with Jens Petersen yesterday about input methods. Jens is one of the i18n guys working for Red Hat.

Input methods are a way of merging several typed symbols into one actual symbols. Western languages rarely use them (the compose key isn't quite the same), but many eastern languages rely on them. To give one (made up) example, an IM setup allows you to type "qqq" and converts it into the chinese symbol for tree.

Unfortunately, IM implementations are somewhat broken and rely on a multitude of hacks. Right now, IM implementations often need to hook onto keycodes instead of keysyms. Keycodes are a numerical value that is usually the same for a key (except when it isn't). So "q" will always be the same keycode (except when it isn't). In X, a keycode has no meaning other than being an index into the keysym table.

Keysyms are the actual symbols that are to be displayed. So while the "q" key may have a keycode of 24, it will have the keysym for "q" in qwerty and the keysym for "a" in azerty.

And here's where everything goes wrong for IM. If you listen for keycodes, and you switch drivers, then keycode 24 isn't the same key anymore. If you listen for keysyms and you switch layout, keysym "q" isn't the same key anymore. Oops.

During a previous meeting and the one yesterday, we came up with a solution to fix them properly.

Let's take a step back and look at keyboard input. The user hits a physical key, usually because of what is printed on that key. That key generates a keycode, which represents a keysym. That keysym is usually the same symbol as what is printed on the keyboard. (Of course, there are exceptions to that with the prime example being dvorak layout on a qwerty physical keyboard)
In the end, IM should aim to provide the same functionality, with the added step of combining multiple symbols into one.

For IM implementations, we can differ between two approaches:
In the first approach, a set of keysyms should combine to a final symbol. For example, typing "tree" should result in a tree symbol. This case can be fixed easily by the IM implementation only ever dealing with keysyms. Where the key is located doesn't matter and it works equally well with us(qwerty) and fr(dvorak). As a mental bridge: if the symbols come in via morse code and you can convert to the correct final symbol, then your IM is in this category. This approach is easy to deal with, so we can close the case on it.

In the second approach, a set of key presses should combine to a final symbol. For example, typing the top left key four times should result in a tree symbol. In this case, we can't hook onto keysyms because they may change with the layout. But we can't hook onto keycodes either because they are essentially random.

Wait. What? Why does the keysym change with the layout?

Because we have the wrong layout selected. If you're trying to type Chinese, you shouldn't have a us layout. If you're trying to type Japanese, you shouldn't have a french layout. Because these keysyms don't represent what the key is supposed to do. The keysyms are supposed to represent what is printed on the keyboard, and those symbols are Chinese, Japanese, Indic, etc. So the solution is to fix up the keysyms. Instead of trying to listen for a "q", the keyboard layout should generate a "tree" keysym. The IM implementation can then listen for this symbol and combine to the final symbol as required.

This essentially means that for each keyboard with intermediate symbols there should be an appropriate keyboard layout - just as there is for western languages. And once these keysyms are available, the second approach becomes identical to the first approach and it doesn't matter anymore where the physical key is located.

The good thing about this approach are that users and developers can leverage existing tools for selecting and changing between different layouts. (bonus points for using the word "leverage") It also means that a more unified configuration between standard DE tools and IM tools is possible.

For the IM implementation, this simplifies things by a bit. First of all, it can listen to the XKB group state to adjust automatically whether IM is needed or not. For example, if us(qwerty) and traditional chinese are configured as layouts, the IM implementation can kick in whenever the group toggles to chinese. As long as it is on us(qwerty), it can slumber in the background.

Second, no layout-specific hacks are required. The physical location of the key, the driver, they all don't matter anymore. Even morse-code is supported now ;)

Talking to Jens, his main concern is that XKB limits to 4 groups at a time. This restriction is built into the protocol and won't disappear completely anytime soon. Though XI2 and XKB2 address this issue, it will take a while to get a meaningful adoption rate. Nonetheless, the approach above should make IM for the large majority of users more robust and predictable, without the issues coming up whenever hacks are involved.

I think this is the right approach, Jens agrees and Sergey Udaltsov, the xkeyboard-config maintainer too. So now we just need to get this implemented, but it will take a while to sort out all the details and move all languages over.

A few months back, in January or February I decided to switch to zsh as default shell and it has made my work a lot more effective. So I encourage you to try it, it has a number of features that are quite useful. Towards the bottom of this post is my own setup, feel free to use it.

Disclaimer: some or all of the features below are probably available in other shells. This is not a "$SHELL is so much better than $OTHERSHELL" posting, this is about how a particular setup has made my work more effective.

The main features I found useful, in no particular order:

history size of 5000 with duplicate removal means I type most commands now with Ctrl+R. Most of what I do is repetitive enough that if I have typed some weird command a few months back it will still be in the history.

merged histories. ever had 15 terminals open and then found out that the history of one is not available in the others, and on closing only the last one is added to the history? not a problem anymore.

commandline completion - just beautiful. includes host completion for ssh commands, man page completion, rpm and CVS module completion, git command/tag/branch completion, etc.

completion exclusion: if you type rm foo.c it won't suggest foo.c again since it's already in the list.

app-specific completion. You can simply add filetypes to complete for your program (e.g. only pdfs for the pdf reader, etc.)

vim/emacs key bindings. whatever you fancy. It's nice to use the vim commands for delete word, replace word, etc. Especially for multi-line commands.

git branch display - one of the scripts makes my prompt display the git branch if i'm in a git directory. since I frequently work with 5+ branches, that's really handy. So for example, my prompt looks like this:
```
  :: whot@dingo:~/xorg/xserver (xi2-protocol-tests)>
```
indicating that the xserver repo is on branch xi2-protocol-tests. It also displays whether I have commits queued up or local changes, so I don't forget to commit something before pushing. Type disable-git-prompt to disable this again if your repo is _really_ big (e.g. the kernel), otherwise it takes forever to get the prompt to display.

"GUI" selection for tab-completion. hit Tab and below the line you get a list of all files and you can go through with them using Tab. Like this:

So anyway, have a look at my zsh files and use them as you will. Save them as $HOME/.zshrc and $HOME/.zsh/ to get started.

Who-T

Thursday, August 20, 2009

Re-designing input methods with XKB

Saturday, August 8, 2009

The case for zsh