The “autocomplete” function was born in Chinese computing


This is an excerpt from The Chinese Computer: A Global History of the Information Age by Thomas S. Mullaney, published May 28 by The MIT Press. It has been lightly edited.

ymiw2

klt4

pwyy1

wdy6

o1

dfb2

wdv2

fypw3

uet5

dm2

dlu1…

A young Chinese man sat at his QWERTY keyboard and entered an enigmatic string of letters and numbers.

Was it a code? Child’s play? Confusion? It was Chinese.

The beginning of Chinese, at least. These 44 keystrokes marked the first steps of a process known as “input” or shuru: the act of making Chinese characters appear on the monitor of a computer or other digital device using a QWERTY keyboard or touchpad.

Frames taken from a screencast of the 2013 Chinese input contest. COURTESY OF MIT PRESS

In all computing and digital media, Chinese text input depends on software programs known as “input method editors,” better known as “IMEs” or simply “input methods” (shurufa). IMEs are a form of “middleware,” so called because they operate between the hardware of the user’s device and the software of its program or application. Whether a person is composing a Chinese document in Microsoft Word, searching the internet, sending text messages, or doing anything else, an IME is always at work, intercepting all of the user’s keystrokes and trying to figure out which Chinese characters the user wants to produce. Input, simply put, is how ymiw2klt4pwyy… is converted into a string of Chinese characters.

IMEs are restless creatures. From the moment a key is pressed or a stroke is swiped, they launch a dynamic, iterative process, collecting the data entered by the user and searching the computer’s memory for possible matches with Chinese characters. The most popular IMEs today are based on Chinese phonetics; that is, they use the letters of the Latin alphabet to describe the sound of Chinese characters, with operators in mainland China using the country’s official romanization system, Hanyu pinyin.
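The loop described above (collect keystrokes, search memory for matches, let the user confirm one) can be sketched in a few lines of Python. This is only a toy illustration, not how any real IME is implemented: the three-entry pinyin lexicon and the names `LEXICON`, `candidates`, and `type_keys` are assumptions made up for this example, as is the convention that a number key picks a candidate.

```python
# Toy lexicon: an assumption for illustration, not data from any real IME.
LEXICON = {
    "chaoxi": ["抄袭", "潮汐"],   # "plagiarism", "tide"
    "hanzi": ["汉字"],            # "Chinese characters"
    "zhongguo": ["中国"],         # "China"
}


def candidates(buffer):
    """Return every word whose romanization starts with the typed buffer."""
    found = []
    for pinyin in sorted(LEXICON):
        if pinyin.startswith(buffer):
            found.extend(LEXICON[pinyin])
    return found


def type_keys(keys):
    """Turn raw keystrokes into committed Chinese text."""
    buffer, output = "", ""
    for key in keys:
        if key.isdigit():
            # Confirmation: a number key picks a candidate (1-based).
            if buffer:
                options = candidates(buffer)
                index = int(key) - 1
                if 0 <= index < len(options):
                    output += options[index]
            buffer = ""
        else:
            # Letters accumulate as the search criteria.
            buffer += key
    return output


print(candidates("chaoxi"))   # the pop-up menu for "chaoxi"
print(type_keys("chaoxi1"))   # first candidate confirmed
```

Even in this toy version, the keys pressed ("chaoxi1") and the text produced ("抄袭") are entirely different strings.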


Input Method Editor pop-up menu example in Chinese (抄袭 / “plagiarism”). COURTESY OF MIT PRESS

This young man was Huang Zhenyu (also known by his pseudonym, Yu Shi). He was one of about 60 contestants that day, each wearing a bright red sash over the shoulder, as in an old-time ticker-tape parade or beauty pageant. A sign at the front of the room read “Love Chinese Characters” (Ai Hanzi) in golden yellow. The contestants’ task was to transcribe a speech by outgoing Chinese President Hu Jintao as quickly and accurately as possible. “Let us hold high the Great Flag of Socialism with Chinese Characteristics,” it began, or in the original: 高举中国特色社会主义伟大旗帜为夺取全面建设小康社会新胜利而奋斗. However, Huang’s QWERTY keyboard did not allow him to enter these characters directly, so he instead entered a string of letters and numbers that looked like near-gibberish: ymiw2klt4pwyy1wdy6…

With these 44 keystrokes, Huang was well on his way to not only winning the 2013 National Chinese Character Typing Competition, but also reaching one of the fastest typing speeds ever recorded anywhere in the world.

ymiw2klt4pwyy1wdy6… is not the same as 高举中国特色社会主义… The keys Huang actually pressed on his QWERTY keyboard (his “primary transcript,” as we might call it) were completely different from the symbols that finally appeared on his computer screen, that is, the “secondary transcript” of Hu Jintao’s speech. This is true for each and every one of the world’s more than one billion Chinese-speaking computer users. In Chinese computing, what you type is never what you get.

For readers accustomed to word processing and computing in English, this should come as a surprise. For example, if you compared the paragraph you’re reading right now with a keylog showing exactly which keys I pressed to type it, the exercise would be unilluminating (to put it mildly). “For -_- readers -_- accustomed…,” it would read (forgiving any typos or edits). In English-language typing and computer input, a typist’s primary and secondary transcripts are, in principle, identical. The symbols on the keys and those on the screen are the same.

The same is not true of Chinese computing. When entering Chinese, the symbols that a person sees on a QWERTY keyboard are always different from the symbols that ultimately appear on the monitor or on paper. Each and every user of computers and new media in the Sinophone world, whether extremely fast or extremely slow, uses their devices in exactly the same way as Huang Zhenyu, constantly immersed in this iterative process of criteria-candidacy-confirmation, using one IME or another. Not some Chinese-speaking users, but all. This is the first and most basic characteristic of Chinese computing: Chinese human-computer interaction (HCI) requires users to operate entirely in code at all times.

If Huang Zhenyu’s mastery of a complex alphanumeric code were not impressive enough, consider the astonishing speed of his performance. He transcribed the first 31 Chinese characters of Hu Jintao’s speech in about five seconds, with an extrapolated speed of 372 Chinese characters per minute. At the end of the grueling 20-minute competition, which spanned thousands of characters, he crossed the finish line at an almost incredible speed of 221.9 characters per minute.

That is, 3.7 Chinese characters per second.
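The figures above follow from simple unit conversion. As a quick check, here is a minimal Python sketch using only the numbers quoted in the text; the helper name `per_minute` is an assumption for illustration.

```python
def per_minute(count, seconds):
    """Extrapolate a count achieved in `seconds` to a per-minute rate."""
    return count * 60 / seconds


# Opening burst: 31 characters in about five seconds.
opening_rate = per_minute(31, 5)

# Full competition: 221.9 characters per minute, expressed per second.
overall_per_second = round(221.9 / 60, 1)

print(opening_rate)         # 372.0
print(overall_per_second)   # 3.7
```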

In the context of English, Huang’s first five seconds would have been equivalent to about 375 English words per minute, and his overall speed in the competition easily exceeded 200 words per minute, a breakneck pace that has not been matched by anyone in the English-speaking world (at least using QWERTY). In 1985, Barbara Blackburn achieved a record, verified by the Guinness Book of Records, of 170 English words per minute (on a typewriter, no less). Later, speed demon Sean Wrona surpassed Blackburn’s mark with a performance of 174 words per minute (on a computer keyboard, it should be noted). As impressive as these milestones are, the truth is that if Huang’s performance had taken place in the English-speaking world, his name would be enshrined in the Guinness Book of Records as the new mark to beat.

Huang’s speed also had special historical significance.

For a person living between 1850 and 1950 (the period examined in the book The Chinese Typewriter), the idea of producing Chinese by mechanical means at a speed of more than 200 characters per minute would have been virtually unimaginable. Throughout the history of Chinese telegraphy, dating back to the 1870s, operators maxed out at a few dozen characters per minute. In the heyday of Chinese typewriting, from the 1920s to the 1970s, the fastest recorded speeds were just 80 characters per minute (most typists worked at much slower speeds). When it came to modern information technologies, that is, Chinese was consistently one of the slowest writing systems in the world.

What changed? How is it that a script so long dismissed as cumbersome and hopelessly complex suddenly rivaled, and even surpassed, computer typing speeds recorded in other parts of the world? Even if we accept that Chinese computer users are somehow capable of “real-time” coding, shouldn’t Chinese IMEs result in a lower overall “ceiling” for Chinese text processing compared to English? After all, Chinese users have to jump through many more hoops throughout a cumbersome multi-step process: the IME has to intercept the user’s keystrokes, search memory for a match, present possible candidates, and wait for the user’s confirmation. Meanwhile, English-speaking computer users simply press the key they want to see printed on the screen. What could be simpler than the “immediacy” of “Q equals Q,” “W equals W,” and so on?

Tom Mullaney
COURTESY OF TOM MULLANEY

To unravel this apparent paradox, we will examine the first Chinese computer ever designed: the Sinotype, also known as the Ideographic Composing Machine. Presented in 1959 by MIT professor Samuel Hawks Caldwell and the Graphic Arts Research Foundation, this machine had a QWERTY keyboard, which the operator used to enter not the phonetic values of Chinese characters but the strokes of which the characters are composed. The Sinotype’s goal was not to “build” Chinese characters on the page, the way a user builds English words by successively adding letters. Instead, each “spelled” sequence of strokes served as the electronic address that the Sinotype’s logic circuit used to retrieve a Chinese character from memory. In other words, the first-ever Chinese computer relied on the same kind of “extra steps” seen in Huang Zhenyu’s award-winning 2013 performance.

In the course of his research, Caldwell discovered unexpected benefits of all these additional steps, benefits entirely unknown in the context of Anglophone human-machine interaction at the time. The Sinotype, he found, needed far fewer keystrokes to find a Chinese character in memory than to compose one through conventional means of inscription. By analogy, “spelling” a nine-letter word like “crocodile” takes far longer than retrieving that same word from memory (“crocod” would be enough for a computer to make an unambiguous match, after all, given the absence of other words with similar or identical spellings). Caldwell called his discovery “minimum spelling,” making it an essential part of the first Chinese computer ever built.
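The crocodile analogy can be made concrete with a short Python sketch that finds the shortest prefix of a word that no other entry in memory shares. This illustrates the general idea only, not Caldwell’s actual circuit design; the function name `shortest_unique_prefix` and the small English word list are assumptions chosen for this example.

```python
# Hypothetical "memory": a handful of English words chosen for illustration.
VOCABULARY = [
    "crochet", "crockery", "crocodile", "crocoite", "crocus", "cross", "crow",
]


def shortest_unique_prefix(word, vocabulary):
    """Return the shortest prefix of `word` that no other entry shares."""
    others = [entry for entry in vocabulary if entry != word]
    for length in range(1, len(word) + 1):
        prefix = word[:length]
        if not any(other.startswith(prefix) for other in others):
            return prefix
    # The whole word is itself a prefix of another entry, so no shortcut exists.
    return word


prefix = shortest_unique_prefix("crocodile", VOCABULARY)
print(prefix)                           # crocod
print(len("crocodile") - len(prefix))   # keystrokes saved: 3
```

With this particular word list, six keystrokes are enough to retrieve a nine-letter word; a different list would give a different cutoff.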

Today we know this technique by another name: “autocomplete,” a human-computer interaction strategy in which additional layers of mediation result in faster textual input than the “unmediated” act of typing. Decades before its rediscovery in the English-speaking world, autocompletion was invented in Chinese computing.

 