Jan
31
2008

Dealing with NSTextInput Protocol

Since MacHeist 2 finished, over 40,000 Mac users have been happy with their new software, and I’m one of them. However, I found that some apps in the bundle, such as Pixelmator, VectorDesigner and Cha-Ching, didn’t support multiple languages very well. Why is that? Isn’t Mac OS X a Unicode-savvy environment in which Cocoa apps are born with multi-language capability? The answer is yes, but only to some degree.

Cocoa apps can simply store or display Unicode strings by using NSString, whose internal representation is UTF-16. In fact, the apps mentioned above have no problem storing or displaying Unicode strings. They just don’t work well with input methods. Now you may ask, “Well, what is an input method? Why should I care about it? Shouldn’t that kind of thing be handled by Cocoa?” Mostly you don’t have to care, because NSTextField will do the right thing for you. But if you have special requirements, like laying out the input text along a path (which is what VectorDesigner currently can’t do with Japanese or Chinese), then you must take care when implementing your NSTextField replacement. You have to make sure your view conforms to the NSTextInput protocol.

In this article, I’ll first introduce some facts about how languages like Chinese or Japanese differ from English. Then I’ll explain what an input method is and how it works on Mac OS X.

The significant difference between Eastern languages (Chinese/Japanese/Korean, a.k.a. CJK) and Western languages (English/German/French/Spanish) is the size of the character set. English has only 26 letters, but the CJK languages have far more than that. They share a huge base of characters called Hanzi (漢字; Kanji in Japanese; Hanja in Korean). Hanzi were first used in China, then spread to Japan and Korea with minor modifications. For example:

Hanzi

Unlike Chinese, Japanese and Korean do not consist entirely of Hanzi. Japanese has about 50 basic letters, which are called “kana.” One or more kana may form a meaningful element, which can be written as a kanji.

kana

There are about 5,000 frequently used hanzi in Chinese and 2,000 in Japanese. It makes no sense to assign each CJK character to a key on the keyboard. (Otherwise you’re likely to break the Guinness record for the world’s largest keyboard.) Hence, to input CJK characters, we encode each character as multiple keystrokes in a logical way. It’s just like how you input English letters on a cell phone with only 10 buttons. The way CJK characters are encoded as multiple keystrokes is called an “input method.”
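
The cell-phone analogy can be made concrete with a small sketch. The Python below is a toy multi-tap decoder, not a real input method: the keypad table is the common phone layout, and the names KEYPAD and decode_multitap are made up for illustration. It only shows the core idea that several keystrokes can stand for one character.

```python
# Toy multi-tap decoder: several presses of one key select one letter.
# KEYPAD and decode_multitap are illustrative names, not part of any real API.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}


def decode_multitap(presses):
    """Decode space-separated groups of repeated digits, e.g. '44 33'."""
    letters = []
    for group in presses.split():
        key, count = group[0], len(group)
        if key not in KEYPAD or set(group) != {key}:
            raise ValueError(f"invalid key group: {group!r}")
        choices = KEYPAD[key]
        # Pressing past the last letter wraps around to the first one.
        letters.append(choices[(count - 1) % len(choices)])
    return "".join(letters)


print(decode_multitap("44 33 555 555 666"))  # prints "hello"
```

A CJK input method applies the same principle with a much larger table, and usually has to offer several candidates for one keystroke sequence.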

Input methods on Mac OS X maintain a buffer of keystrokes, which will be mapped to one or more characters. The following is an example of a Chinese input method.

a.png
a. I’m going to input a Hanzi. Let’s see how the input method works.


b.png
b. I press the “G” key and the input method converts it to a phonemic symbol. Notice the underline, which means the phonemic symbol is still in the temporary buffer and hasn’t been committed to TextEdit.app yet.


c.png
c. Now I press the “J” key. Another phonemic symbol appears.


d.png
d. Press space. The input method now converts the phonemic symbols into a Hanzi. Notice the underline is still there.


e.png
e. Usually many different Hanzi share the same pronunciation. Now I select the correct one from a candidate list.


f.png
f. This one is what I want.


g.png
g. Press return to commit the buffer to TextEdit.app. Now the underline is gone.
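
Steps a through g can be modeled with a few lines of Python. This is only a sketch of the buffer behavior, not how any real input method is implemented: the class ToyInputMethod and both tables are hypothetical, the “g” mapping comes from the walkthrough, and the “j” mapping and the candidate list are assumptions made for the example.

```python
# Toy model of the walkthrough: keystrokes fill a buffer, space converts it,
# the user picks a candidate, and return commits. All tables are illustrative.
KEY_TO_SYMBOL = {"g": "ㄕ", "j": "ㄨ"}      # "g" is from the article; "j" is assumed
CANDIDATES = {"ㄕㄨ": ["書", "輸", "舒"]}    # hypothetical candidate list


class ToyInputMethod:
    def __init__(self):
        self.buffer = ""        # uncommitted (underlined) text
        self.candidates = []    # Hanzi sharing the buffered pronunciation
        self.committed = ""     # text already handed to the app

    def press(self, key):
        if key == "space":
            # Convert the phonemic symbols to the first candidate, if any.
            self.candidates = CANDIDATES.get(self.buffer, [])
            if self.candidates:
                self.buffer = self.candidates[0]
        elif key == "return":
            # Commit: the buffer moves to the app and the underline goes away.
            self.committed += self.buffer
            self.buffer, self.candidates = "", []
        else:
            self.buffer += KEY_TO_SYMBOL.get(key, key)

    def choose(self, index):
        """Replace the converted Hanzi with another candidate."""
        self.buffer = self.candidates[index]


im = ToyInputMethod()
for step in ["g", "j", "space"]:
    im.press(step)
im.choose(1)          # pick 輸 from the candidate list
im.press("return")
print(im.committed)   # prints 輸
```

Until return is pressed, everything lives in buffer, which corresponds to the underlined text in the screenshots.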

Now you have an idea of what an input method looks like. Next, let’s talk about how the input method communicates with Cocoa apps.

inputmethod.png

When an NSTextInput object receives a keyDown: message, it forwards the raw event to the input method by calling interpretKeyEvents:. The input method then converts the raw keystrokes to a string using its own logic (e.g. converting “G” to “ㄕ” in the above example, or option-e to an accent mark). If the string is ready to commit, it will call insertText: or doCommandBySelector:, depending on a mechanism called key bindings. In most cases, your key events will follow this path if you use an English input method. That is, you fire a key-down event by pressing the “G” key, the NSTextInput object invokes -[NSResponder interpretKeyEvents:], which delivers the raw keystrokes to the current input method, the English input method converts them to the string “G”, and finally the string is inserted back via insertText:.
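
The path described above can be traced with a toy model. The Python below does not call Cocoa: the method names simply mirror the selectors named in the text, the classes ToyTextView and ToyEnglishInputMethod are hypothetical, and the two key bindings are an assumed subset chosen for illustration.

```python
# Toy model of the key-event path. Method names mirror the Cocoa selectors
# named in the text, but nothing here calls Cocoa.
KEY_BINDINGS = {"return": "insertNewline:", "left": "moveLeft:"}  # assumed subset


class ToyEnglishInputMethod:
    def handle(self, key, client):
        if key in KEY_BINDINGS:
            # Bound keys become commands rather than text.
            client.do_command_by_selector(KEY_BINDINGS[key])
        else:
            # Plain keys are ready to commit immediately.
            client.insert_text(key)


class ToyTextView:
    def __init__(self, input_method):
        self.input_method = input_method
        self.text = ""
        self.commands = []

    def key_down(self, key):
        # Do not touch the text here; let the input method decide.
        self.interpret_key_events([key])

    def interpret_key_events(self, keys):
        for key in keys:
            self.input_method.handle(key, self)

    def insert_text(self, string):
        self.text += string

    def do_command_by_selector(self, selector):
        self.commands.append(selector)


view = ToyTextView(ToyEnglishInputMethod())
for key in ["G", "return"]:
    view.key_down(key)
print(view.text, view.commands)
```

The point of the indirection is that key_down never decides what a key means. That decision belongs to the input method and the key bindings, so the same view works unchanged when the user switches to a Chinese or Japanese input method.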

What if the string isn’t ready to be committed? In that case, the input method puts the string into a temporary buffer.

On Mac OS X, it’s the app’s responsibility, rather than the input method’s, to draw the uncommitted string. The input method informs your NSTextInput object by calling setMarkedText:selectedRange: whenever the temporary buffer changes. The first argument, the “marked text,” is the uncommitted string in the temporary buffer; the second argument, the “selected range,” denotes the caret and selection within the marked text.

1.png
marked text: 輸入大法好 selected range: (0, 0)

2.png
marked text: 輸入大法好 selected range: (1, 0)

3.png
marked text: 輸入大法好 selected range: (2, 2)
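
To make the three cases above easier to read, here is a small Python sketch that prints the marked text with the selected range made visible. The function render_marked is hypothetical and exists only for this illustration. It treats the range as a (location, length) pair and indexes by Python characters, which matches NSString’s UTF-16 indexing for these characters but would not for characters outside the Basic Multilingual Plane.

```python
def render_marked(marked, selected):
    """Show the caret as '|' and a selection as '[...]' inside marked text."""
    location, length = selected
    if location < 0 or length < 0 or location + length > len(marked):
        raise ValueError("selected range lies outside the marked text")
    if length == 0:
        # An empty range is just a caret position.
        return marked[:location] + "|" + marked[location:]
    end = location + length
    return marked[:location] + "[" + marked[location:end] + "]" + marked[end:]


for rng in [(0, 0), (1, 0), (2, 2)]:
    print(rng, render_marked("輸入大法好", rng))
# (0, 0) |輸入大法好
# (1, 0) 輸|入大法好
# (2, 2) 輸入[大法]好
```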

The following screenshot is my implementation of marked text drawing in Nally. Notice that the selected range is independent of the caret or selection of the normal text. You must maintain them separately. Don’t get confused.

nally.png

In most cases, the recommended way to draw marked text is as if it were inserted at the current caret position, in the same style as the surrounding normal text plus an extra underline. When your view has marked text, hide the normal caret and draw the marked text’s caret.

draw.png
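
The drawing rule can be sketched as a pure function that works out what to draw, leaving the actual rendering aside. This is one possible way to organize it, not the approach used in Nally: compose_display and its return shape are assumptions for illustration, and all ranges are (location, length) pairs measured in Python characters.

```python
def compose_display(text, caret, marked, selected):
    """Return (display string, underline range, selection range to draw).

    text and caret describe the normal (committed) text; marked and selected
    describe the uncommitted text. The returned ranges index into the
    display string.
    """
    if not marked:
        # No marked text: nothing to underline, draw the normal caret.
        return text, (caret, 0), (caret, 0)
    # Splice the marked text in at the normal caret position.
    display = text[:caret] + marked + text[caret:]
    underline = (caret, len(marked))
    # The normal caret is hidden; the marked text's own caret or selection
    # is drawn instead, shifted into display coordinates.
    selection = (caret + selected[0], selected[1])
    return display, underline, selection


print(compose_display("Hello world", 6, "輸入", (1, 0)))
# ('Hello 輸入world', (6, 2), (7, 0))
```

Keeping the marked text out of the stored string until it is committed means that cancelling the composition needs no undo logic: the view just stops drawing it.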

Here are some tips that may help you support multi-language in your app:

  • Use insertText: instead of keyDown: unless what you want is not “text.” For example, in game programming you may want the player to press the A, S, D and W keys to move.
  • Implement the NSTextInput protocol if you want to support input methods, which means your software can be sold in more countries.
  • Read these documents first: Overview of Text Editing, Creating Custom (Text) Views.
  • Be sure to clear the marked text when receiving insertText:, which has the semantics of “commit marked text.”
  • Test your work. To turn on an input method, go to System Preferences > International > Input Menu and select your input method.

    pref.png

    After that, change your input method from the menu located in the upper-right corner. Make sure caps lock is off. If you select Traditional Chinese > Hanin, you can test with these keystrokes: G-J-Space-B-J-4 on a US-layout keyboard. It should result in “輸入法.”

    menu.png
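
The tip about clearing the marked text on insertText: is easy to get wrong, so here is a minimal Python model of the state involved. It is a sketch and not Cocoa code: the class ToyMarkedTextView is hypothetical, and its methods only mirror the protocol’s setMarkedText:selectedRange:, hasMarkedText and insertText: by name.

```python
class ToyMarkedTextView:
    def __init__(self):
        self.text = ""          # committed text
        self.marked = ""        # uncommitted text, drawn underlined
        self.selected = (0, 0)  # caret/selection inside the marked text

    def set_marked_text(self, string, selected_range):
        # Called whenever the input method's buffer changes.
        self.marked = string
        self.selected = selected_range

    def has_marked_text(self):
        return bool(self.marked)

    def insert_text(self, string):
        # Committing replaces whatever was marked, so clear it first.
        self.marked = ""
        self.selected = (0, 0)
        self.text += string


view = ToyMarkedTextView()
view.set_marked_text("ㄕㄨ", (2, 0))   # still composing
view.set_marked_text("輸", (1, 0))     # converted, not yet committed
view.insert_text("輸")                 # commit
print(view.text, view.has_marked_text())  # prints: 輸 False
```

If insert_text left marked untouched, the view would keep drawing the underlined 輸 next to the committed one, and the character would appear twice on screen.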

(Correct me if you find any errors in the above article, including the technical detail, grammar, punctuation, etc.)

 
 
