Monday, October 17, 2011

Voice Recognition Is About to Re-Wire Our Brains

Voice-based data input to a computer is not a new idea. While the keyboard, the mouse, and, more recently, gestures have been the primary ways of interacting with computers, the idea of voice-based interaction is as old as HAL 9000, the talking computer from 2001: A Space Odyssey. Software such as Dragon NaturallySpeaking (owned by Nuance since 2005) or IBM’s ViaVoice has been around for almost two decades, and let’s not forget the often infuriating Interactive Voice Response (IVR) systems used by most telephone support departments today.

Voice recognition software stole the spotlight last week when Apple released its new iPhone 4S with built-in Siri software. Embedding voice recognition directly into the operating system is a major milestone, and including it in a mobile device makes perfect sense, as Apple’s video commercial shows. If any device needs voice control even more, it is the TV set. I am still waiting for the kind of interaction Marty McFly (Michael J. Fox) was using in the movie Back to the Future Part II. In fact, Siri for Apple TV is rumored to be on the way, and Microsoft recently demonstrated voice-based movie search on the Xbox 360.

So, why haven’t we been talking to our computers for the last decade if the technology was already there? Part of the answer is recognition accuracy. When I used NaturallySpeaking back in the ’90s, I had to train the software to understand me, which was a lot of work for meager results. We all know the frustration of any IVR-based system: “Sorry, I didn’t quite catch that. Could you please try it again?” And while Siri represents the next generation of voice recognition, plenty of stories about its funny misunderstandings circulated on the Web immediately after the new iPhone was released.
With increased computing power and better software algorithms, quality is becoming less of an issue. One day, the software might even understand dialects or foreign accents like mine. But I suspect that’s only part of the adoption challenge. The other part lies in our ability to express our thoughts verbally to a computer. Most of our verbal communication is not very straightforward, and we even enjoy taking our time before coming to the point. In settings where communication has to be clear and precise, such as military orders, radio protocol, or business negotiations, that precision comes only after many hours of training. Naturally, people don’t speak that way.

Yet just 30 years ago, typical managers didn’t have computers on their desks. They would spend several hours each day responding to correspondence by dictating letters onto tape, which their secretaries would later transcribe on a typewriter, later on a word processor, and eventually on a PC using WordPerfect. Ten years before that, dictation was done in real time, and the secretary had to know shorthand to keep up. It took years before the PC made it to the manager’s desk. What amazes me today is that those managers were able to dictate complete letters in full, well-articulated sentences.

For most of us, that’s not so easy anymore. Today, we are a generation of PC users spoiled by the editing power at our fingertips. Most of us knowledge workers formulate our sentences as we write them, and since it is so easy to rephrase any sentence or start over, we do it all the time. I’ve observed many people doing this, and I know that I am not alone. Most people, even professional writers, would have a difficult time dictating in complete sentences. Giving the computer commands such as search requests is one thing, but authoring text via voice recognition requires a skill set that is underdeveloped in most of us today. We know from the past that we humans are capable of such skills, but the last 30 years of the PC revolution have re-wired our brains differently.

Now Siri and other voice recognition software may be starting a new era: an era in which we can, and perhaps must, express ourselves verbally in a new way. Let’s see how it goes. [computer, strike last sentence] Ehm…

By the way, when is Siri going to be available on iPhone 4?

1 comment:

  1. "By the way, when is Siri going to be available on iPhone 4?"

    Probably soon: