Thursday, August 20, 2009

Re-designing input methods with XKB

I had an interesting meeting with Jens Petersen yesterday about input methods. Jens is one of the i18n guys working for Red Hat.

Input methods are a way of merging several typed symbols into one actual symbol. Western languages rarely use them (the compose key isn't quite the same), but many eastern languages rely on them. To give one (made up) example, an IM setup allows you to type "qqq" and converts it into the Chinese symbol for tree.

Unfortunately, IM implementations are somewhat broken and rely on a multitude of hacks. Right now, IM implementations often need to hook onto keycodes instead of keysyms. Keycodes are a numerical value that is usually the same for a key (except when it isn't). So "q" will always be the same keycode (except when it isn't). In X, a keycode has no meaning other than being an index into the keysym table.

Keysyms are the actual symbols that are to be displayed. So while the "q" key may have a keycode of 24, it will have the keysym for "q" in qwerty and the keysym for "a" in azerty.
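To make the distinction concrete, here is a minimal Python sketch of that lookup. The table contents and the `keysym_for` helper are made up for illustration (keycode 38 for the neighbouring key is an assumption), and a real XKB keymap is far richer than a flat dictionary.

```python
# Hypothetical, heavily simplified keysym tables: one keycode -> keysym
# mapping per layout. Only the keycode 24 entries come from the text above.
LAYOUTS = {
    "us": {24: "q", 38: "a"},
    "fr": {24: "a", 38: "q"},
}


def keysym_for(layout, keycode):
    """Return the keysym that a keycode produces under the given layout."""
    return LAYOUTS[layout][keycode]


print(keysym_for("us", 24))  # q
print(keysym_for("fr", 24))  # a
```

The same keycode yields a different keysym as soon as the layout changes, which is the root of the problem described next.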

And here's where everything goes wrong for IM. If you listen for keycodes, and you switch drivers, then keycode 24 isn't the same key anymore. If you listen for keysyms and you switch layout, keysym "q" isn't the same key anymore. Oops.

During a previous meeting and the one yesterday, we came up with a solution to fix them properly.

Let's take a step back and look at keyboard input. The user hits a physical key, usually because of what is printed on that key. That key generates a keycode, which represents a keysym. That keysym is usually the same symbol as what is printed on the keyboard. (Of course, there are exceptions to that, with the prime example being a dvorak layout on a qwerty physical keyboard.)
In the end, IM should aim to provide the same functionality, with the added step of combining multiple symbols into one.

For IM implementations, we can distinguish between two approaches:
In the first approach, a set of keysyms should combine to a final symbol. For example, typing "tree" should result in a tree symbol. This case can be fixed easily by the IM implementation only ever dealing with keysyms. Where the key is located doesn't matter and it works equally well with us(qwerty) and fr(dvorak). As a mental bridge: if the symbols come in via morse code and you can convert to the correct final symbol, then your IM is in this category. This approach is easy to deal with, so we can close the case on it.
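A rough sketch of what an IM in this first category could look like, written in Python. The `KeysymIM` class and the choice of 木 as the "tree symbol" are my own illustration, not taken from any existing IM implementation.

```python
class KeysymIM:
    """Toy input method that only ever sees keysyms, never keycodes."""

    def __init__(self, sequences):
        # Maps a full keysym sequence (as a string) to its final symbol.
        self.sequences = sequences
        self.buffer = ""

    def feed(self, keysym):
        """Buffer one keysym; return the final symbol once a sequence completes."""
        self.buffer += keysym
        if self.buffer in self.sequences:
            result = self.sequences[self.buffer]
            self.buffer = ""
            return result
        # Drop the buffer when no known sequence can match it anymore.
        if not any(seq.startswith(self.buffer) for seq in self.sequences):
            self.buffer = ""
        return None


im = KeysymIM({"tree": "木"})
results = [im.feed(keysym) for keysym in "tree"]
print(results[-1])  # 木
```

Nothing in this class knows where a key is located or which driver produced the event. It would work just as well if the keysyms arrived via morse code.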

In the second approach, a set of key presses should combine to a final symbol. For example, typing the top left key four times should result in a tree symbol. In this case, we can't hook onto keysyms because they may change with the layout. But we can't hook onto keycodes either because they are essentially random.

Wait. What? Why does the keysym change with the layout?

Because we have the wrong layout selected. If you're trying to type Chinese, you shouldn't have a us layout. If you're trying to type Japanese, you shouldn't have a french layout. Because these keysyms don't represent what the key is supposed to do. The keysyms are supposed to represent what is printed on the keyboard, and those symbols are Chinese, Japanese, Indic, etc. So the solution is to fix up the keysyms. Instead of trying to listen for a "q", the keyboard layout should generate a "tree" keysym. The IM implementation can then listen for this symbol and combine to the final symbol as required.

This essentially means that for each keyboard with intermediate symbols there should be an appropriate keyboard layout - just as there is for western languages. And once these keysyms are available, the second approach becomes identical to the first approach and it doesn't matter anymore where the physical key is located.
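As a sketch of why the key location stops mattering, assume two hypothetical layouts that place the same intermediate keysym on different keys. The layouts, the keycodes and the combination rule below are all made up for illustration.

```python
# Two hypothetical layouts that put the same intermediate keysym on
# different physical keys (keycodes chosen arbitrarily).
LAYOUT_A = {24: "木"}
LAYOUT_B = {38: "木"}

# Made-up combination rule: two "tree" keysyms combine into one symbol.
COMBINE = {("木", "木"): "林"}


def type_keys(layout, keycodes):
    """Translate keycodes to keysyms via the layout, then let the IM combine them."""
    keysyms = tuple(layout[keycode] for keycode in keycodes)
    return COMBINE.get(keysyms)


print(type_keys(LAYOUT_A, [24, 24]))  # 林
print(type_keys(LAYOUT_B, [38, 38]))  # 林
```

The combination table never mentions a keycode. Only the layout does, and layouts are something we already know how to configure and switch.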

The good thing about this approach is that users and developers can leverage existing tools for selecting and changing between different layouts. (bonus points for using the word "leverage") It also means that a more unified configuration between standard DE tools and IM tools is possible.

For the IM implementation, this simplifies things a bit. First of all, it can listen to the XKB group state to decide automatically whether IM is needed or not. For example, if us(qwerty) and traditional Chinese are configured as layouts, the IM implementation can kick in whenever the group toggles to Chinese. As long as it is on us(qwerty), it can slumber in the background.
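The group-based activation could be sketched like this. The `GroupAwareIM` class and the layout names are hypothetical; a real implementation would presumably react to XKB state change notifications, whereas here the group index is simply passed in by hand.

```python
class GroupAwareIM:
    """Toy IM that wakes up only while an IM-relevant layout group is active."""

    def __init__(self, groups, im_groups):
        self.groups = groups            # configured layouts, in group order
        self.im_groups = set(im_groups)  # layouts that need the IM
        self.active = False

    def on_group_change(self, group_index):
        """Update and return whether the IM should be active for this group."""
        self.active = self.groups[group_index] in self.im_groups
        return self.active


im = GroupAwareIM(groups=["us", "chinese"], im_groups=["chinese"])
print(im.on_group_change(1))  # True
print(im.on_group_change(0))  # False
```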

Second, no layout-specific hacks are required. The physical location of the key, the driver, they all don't matter anymore. Even morse-code is supported now ;)

Talking to Jens, his main concern is that XKB is limited to 4 groups at a time. This restriction is built into the protocol and won't disappear completely anytime soon. Though XI2 and XKB2 address this issue, it will take a while to get a meaningful adoption rate. Nonetheless, the approach above should make IM more robust and predictable for the large majority of users, without the issues that come up whenever hacks are involved.

I think this is the right approach. Jens agrees, and so does Sergey Udaltsov, the xkeyboard-config maintainer. So now we just need to get this implemented, but it will take a while to sort out all the details and move all languages over.

13 comments:

glandium said...

Actually, Japanese is a kind of special case. There are actually 2 ways of inputting Japanese, each of which would imho fit in each scheme.
There are Japanese hiragana written on keyboards, and they can be inputted directly, then IM would convert them to kanji if deemed necessary by the writer.
The second way involves combining latin characters to first form the hiragana, then possibly kanji. There is no reason this method should require a Japanese layout. There is no reason for a french keyboard owner to type 'hirqgqnq' to write 'ひらがな' while a us keyboard owner can type 'hiragana'.

khc said...

I think it depends on what input method you are using as well. Some input methods (those that are pinyin based, for example) use alphabets and of course you want that to follow keyboard layout changes. Others (Bopomofo, Cangjie) that assign a symbol based on keyboard location probably want those symbols to stay the same regardless of keyboard layout.

Peter Hutterer said...

@khc:
anything that uses alphabets is already covered by keysym tables anyway. and in the end, these keysym tables are simply mappings from the phys. key location to the symbol. So for the second type, we just need to get the symbol tables as well.

@glandium:
it's the first type you describe that needs a keyboard layout. and this layout simply specifies where each hiragana is located. the IM can then combine them as they are typed as required.
the second one is covered since for latin characters we have location-independent keymaps for all sorts of drivers and languages.


note that not being able to speak Japanese myself, I might miss some details there.

Unknown said...

User interface unification is very right. Switching between Japanese and English shouldn't be completely different from switching between Russian and English. But on a technological level it seems a lot more dubious to try and bind input methods to keyboard layouts. It's not uncommon to have input methods that work by having the user type the text phonetically on a Latin keyboard. That has to work whatever Latin keyboard is being used - qwerty, azerty, wherever the punctuation is placed, etc. I don't see how you can make a hard association between keyboard layout and input method... the keyboard layout has its role to play, which is to translate the user's keypresses into what is printed on the key (the keysym), and can't be repurposed to handle knowing what method should be used of going from that keysym to the input text.

The same complaint as the above applies to non-phonetic input methods - in my somewhat limited understanding, Chinese users are often not using a keyboard specifically intended for the input method that they are using. We can't assume that there is a single "Wubi" keyboard layout (http://en.wikipedia.org/wiki/Wubi_method - picked at random from Wikipedia's list of Chinese input methods). Now, the expectation of typing on an azerty keyboard is slightly different - the user actually is thinking of a physical organization of keys - so it's *more* like a different layout - but the rest of the keyboard still needs to follow whatever the keyboard model is that the user is using. (I doubt azerty keyboards have much usage in China anyways.)

It's a complex area, no doubt, and there are some interesting interactions. Hopefully a division of labor can be worked out where the input method doesn't have to worry about the details of the layout - where if the input method wants a q it can listen for that, and if it wants the second key on the second row it can listen for that, but I don't think "If you're trying to type Chinese, you shouldn't have a us layout" really captures the situation.

Peter Hutterer said...

@Owen:
Thanks. I realise now that my original blog wasn't precise enough.

The division of IM and layouts should be that layouts specify what symbol is produced on a specific key. IM then takes these symbols and combines them according to the current setting - whatever that may be.

So if IM wants latin characters for phonetic translation, then the layout should be a latin layout.

If IM needs other characters as baseline, these should be represented by the layout too.
From the wikipedia wubi page: "The A key's shortcut character is 工.". I think that this should be rephrased that "The left-most key in the second row is 工". This is exactly what the layout should represent, in the same way as the us layout specifies "the left-most key in the second row is 'a'".

The IM can then combine these symbols as they should be combined. The key though is that - when using Wubi - the IM doesn't listen for an "a", it listens for "工".

At no point has the layout any control over what IM is used and when this IM activates. What can be implemented this way is a dual layout of "us,wubi". If 'us' is the currently active one, IM either disables itself or automatically switches to phonetic translation. If 'wubi' is the active one, IM translates from the wubi characters.

Trying to summarise it again: if you need to explain how to type the word "FOO", you would say "Hit F, hit O, hit O". This is independent of the layout.

If you explain how to type '勹', you shouldn't need to say "hit third key from left, fourth key from right, top key from bottom". Instead, you should be able to say "Hit 金, hit 钅, hit 用" (I realize this isn't how you type '勹'). This again is independent of the layout.

Unknown said...

Since you say "anything that uses alphabets is already covered by keysym tables anyway", I assume that you aren't thinking that keyboard layouts are 1-1 to input methods, but are somehow managed behind the scenes? That if the user picks an input method that *does* require a different layout, then we go off and add that layout to the user's keyboard?

(It's clearly not OK for the user to have to select things in two different dialogs and have to have them correspond for things to work correctly.)

I guess the interesting question here is whether the interesting non-phonetic layouts correspond to physical hardware being sold somewhere. Whether there's a finite process of standardization of what keysymbols are used and adding the layouts to xkb-config.

Or is there going to be a continual stream of new input methods using the keyboard in new different ways?

[ It's interesting to note that unlike normal XKB usage, the keysyms don't really correspond to what's printed on the key. A key on a physical Wubi keyboard will have multiple radicals printed on it, and which one the user meant has to be disambiguated by the input method. It can't be done at the XKB level. ]

Peter Hutterer said...

@Owen:
The UI is tricky to get right but it should - as you say - not be separate.
So yes, if a user selects a new IM, then this new layout should be reflected in the normal configuration tools as well and vice versa.

If I look at the xkb configuration files, there are many that represent one specific piece of hardware. This is simplified by the Linux kernel these days but they still exist.
Right now, an IM implementation needs to know about all possible variations of the hardware. There is a high chance that at least some of these variations can be moved into keyboard layouts - similar to those already present.

If I understood the Wubi layout correctly, there is one main component for each key. This could be the one stored in the layout. The IM method then receives this main component and combines it to the actual symbol based on previous or future symbols.

Much in the same manner that the base component '3' is combined to a '♥' if the compose key and the < sign were pressed beforehand.
Wubi is more complex, but the principle looks the same to me.


The layout alone would be of no use for those needing proper characters, it is only in combination with the IM that the right symbol results.
The benefit from it is though that - if needed - a special layout can shuffle the physical location of these keys around without changes in the IM. The IM would still receive the same base characters and convert them accordingly.

Unknown said...

After reading Kristian's LPC presentation draft (http://www.linuxplumbersconf.org/ocw/proposals/57), I wonder how the work you're doing on input (both keyboard and pointer) could be reusable in Wayland?

Jens Petersen said...

I hear real physical Wubi keyboards don't exist, so in that sense something like Zhuyin (for Chewing) is probably a better example, but I agree that once ibus-table (and ibus-m17n) supports non-ascii input it should be possible to handle Wubi, say, cleanly with an xkb layout. Of course that still leaves the question of what exactly to do in Linux consoles (ibus-fbterm, etc).

Unknown said...

Hi,

Sorry if this is a little off topic, but this interesting post reminds me of my wildest dreams, and I wonder if this will be possible with XInput2.

I possess and always dreamed to be able to use at the same time the following devices on my computer:
- 1 wireless mouse (9 buttons, 1 wheel)
- a wacom tablet
- 2 keyboards (1 wireless with a uk layout, 1 wired with a french layout)

Most of the time I use an English keyboard since, Unix being very qwerty friendly, that is what is most efficient for me.

But sometimes I have to type long French documents with a lot of accents and in that case it's more efficient to use a French keyboard.

Since I'm using a qwerty layout most of the time, I tend to forget where the less used characters are. So it's more practical to switch to a different physical keyboard than changing the logical layout of the uk keyboard, as I have the keys right in front of my eyes.

Of course I guess in a perfect world I should be able to type with several layouts without having a look at the keys, but I'm just human :)

As a side note, I am not sure at what level the keyboard layout is supposed to be managed: i.e. is it a property of the master device or of the slave device?

To make things worse, I'm currently making a FTIR/wiimote device and I'd really love to make use of up to 4 fingers to control my htpc...

So, will some of that be possible with XI2? What will be missing? What depends on other layers?

Cheers,
Gildas

definite said...

Actually, ibus-table has a compose table to deal with the compose key.

BTW, I've posted some opinions on http://dingyichen.livejournal.com/15676.html.

Mind checking it out?

Jens Petersen said...

> physical Wubi keyboards don't exist

Actually Ding-Yi Chen just showed me a photo of a Wubi keyboard and it seems Wubi stickers are also available, so I stand corrected - though they are pretty uncommon I think.

definite said...

I think "inputting English on a mobile phone number pad" is a good analogy for what we CJK input method developers encounter.

Suppose you want to enter 'define' with T9 input method. This can be done by:
1. Switch to T9
2. Type 333463
3. You may need to use cursor keys to select correct candidate, i.e. 'define'.

You mentioned that only numbers will be in the layout definition. But where shall we put these English alphabets?

I did have some ideas in my blog. But they are far from perfect, so I am looking forward to hearing your opinion about how we should put the English alphabet over a mobile phone keypad.