Wednesday, July 30, 2003

Telirati Newsletter #4

In the following, the fourth of my Telirati newsletters, I rake speech technology and IP telephony people over the coals for their failure to visualize how customers would actually use these technologies. That was in 1997. The situation today is scarcely better. Speech recognition is, as predicted, better than ever. It is, however, just as far from being a viable tool for interacting with a computer, or even with limited computing devices like handhelds or smartphones. The reason it has not progressed is the same as it ever was: the lack of a gestalt for a voice user interface comparable to the desktop metaphor.

IP telephony is nearly as badly off. Cisco sells a decent IP phone, as does 3Com. And after that, well... I'd have to Google it. I have no idea who else sells IP phones. One would have hoped that by now VTech or Siemens would sell a cordless phone that plugs into Ethernet and comes bundled with a software PBX. Still science fiction.

No doubt this pattern, the telecom industry's inability to overcome the obvious, will recur throughout the rest of the archive of Telirati Newsletters as I dig them up and dust them off. That is one of the big reasons I enjoy the game business so much. In wireless mobile games especially, the industry changes in big ways from day to day. Some changes are good, some less so, but the field is so large and dynamic that something commercially interesting will happen somewhere in the world.

Newsletter #4: One of these days…

I'm a firm believer in the power of visualization, or at least in the discipline of it. I used to work in speech recognition, a field full of vision but with little discipline of visualization. On the BBC today there was a story on IBM's new Chinese dictation software, and a nice woman from IBM was enthusing about how revolutionary it would be, because typing in Chinese is such a bother. The first thing I noticed was that the woman was French. The BBC reporter, probably having heard "Speech recognition, this time for sure!" before, asked whether any Chinese users had been consulted about the usefulness of the system, and she assured him that IBM had a lab in China dedicated to all the Chinese-specific aspects of the project. I neither speak Chinese nor have used any of the keyboarding tools available for typing in Chinese, but I do know some things from my experience in speech recognition:

First, you can't talk all day. My voice is a bit hoarse right now from reading to my daughter for an hour and a half, because she wanted to plow all the way through to the end of C. S. Lewis's The Voyage of the Dawn Treader tonight. Talking to my computer all day would be even more tiring, and tiresome besides. As unnatural as a keyboard is, a speech-based user interface just trades one perversion of nature for another.

Second, assume you have perfect speech recognition: how much of the speech user interface problem have you solved? Not much. It's a bit like saying that the fellow who invented the mouse invented the modern graphical user interface, or that the inventor of the bit-mapped display did. The graphical user interface was an interdisciplinary solution that wove together technologies and disciplines as disparate as CAD drawing systems, typography, modeling, and visual language.

The development of the GUI must have been guided by a visualization in which knowledge workers manipulate documents directly: by typing them, writing on them, and moving them physically in ways that organize them. This immensely successful and fundamental development in user interfaces has become so ingrained that we no longer think about it. In fact, there has been little intellectualizing about what the Web means for user interface design, even though it has, in effect, liberated users from button-down (no pun intended) models of physical papers on desks in offices and moved them into a world that resembles colorful magazines. In this new world, the previously uniform user interface language of gray buttons of a certain exact size, all a certain exact distance from one another, and so on, is replaced by a world of few if any conventions. If the cursor turns into a finger that seems ready to click, go ahead and click, and see where it takes you.

So if visualization has not saved speech recognition from the perennial proclamation of Real Soon Now, how will it help telephony developers? In my previous newsletter I described a problem with Internet telephony: the lack of terminal devices. There is no lack of computers; it's just that computers make lousy phones. Let's visualize:

I have one line for a modem connection and another for voice calls. The Lucent speakerphone next to my PC can access both. I would prefer to have neither, or at least to scale back to the one voice line. My computer should be connected through a high-speed Internet access device. So far, so good: MediaOne will probably be in my town soon enough. But there's a problem: current MediaOne consumer-grade access allows for only one device, with one IP address. That, as we shall see, is a problem.

But back to the phone set for a moment. I like my Lucent speakerphone, but it would need better audio to serve as my PC audio device as well. A headset jack would be nice, too, not just for phone calls but for PC audio, so make it a stereo headset jack with a separate mic jack. Now connect the "telephone set" to the PC: USB, please. No phone line to the phone? Nope. But that means either a) the PC has to have a two-line "modem" device, or b) the phone itself becomes an Internet phone, with routing software in the PC it is attached to (and routing in the set-top box to reach the cable-based Internet access).

My house has grotty old phone wiring, installed by the pot-addled hippies who built the place and who economized by using ancient four-prong sockets. If I want to upgrade, I'll have to start by sawing bigger holes in the walls. Ick! Think wireless: to begin with, make the receiver on the telephone set on my desk a wireless handset.

Now visualize this: I get a phone call and pick it up on the speakerphone at my desk. My wife wants to make another call from another room, because all this consultant talk gets on her nerves. She takes the wireless handset, flips it on, and dials her call. Underneath this simple activity, here is what happens: the phone's base station starts a second Internet phone call, which the PC routes onto the household LAN, and which the set-top box routes onto the cable company's line.

Copyright 1997 Zigurd Mednieks. May be reproduced and redistributed with attribution and this notice intact.