Kurt Fuqua, vice president of SVOX USA, explores the technical challenges involved in the future of interactive voice communication in mobile devices during a panel discussion at swissnex San Francisco. With moderator Dylan Tweney, Wired.com senior technology editor, Fuqua and fellow panelists Jim Larson of W3C Voice Browser Working Group and Bill Meisel, president of speech-industry consulting and publishing firm TMA Associates, discuss just what it takes for a device to understand the meaning of language, how to make multiple applications work together, and how voice commands will soon allow smartphones to go way beyond clicks and menus to allow for truly natural conversations with the devices in our pockets.
Bio
Kurt Fuqua
Kurt Fuqua is Vice President of SVOX USA. A computational linguist who has created natural-language understanding software for mobile phones and who specializes in conversational systems, Fuqua was project manager for Pico, the speech synthesis engine in Android that is part of Google's Nexus One phone.
He created and maintains the Scalable Language API, an industry standard for natural-language applications. He has also created comprehensive grammars for multiple languages.
James Larson
James Larson is co-chair of the W3C Voice Browser Working Group, which creates language standards for developing speech applications. He is Program Chair of SpeechTEK, the world's largest speech technology conference, and the new SpeechTEK Europe Conference to be held in London this May.
He also teaches courses in user interface design, voice user interfaces, and XML languages at Portland State University and Oregon Health & Science University in Portland, Oregon. He has written several books, including VoiceXML: Introduction to Developing Speech Applications.
William Meisel
William "Bill" Meisel is president of TMA Associates, a speech-industry consulting and publishing firm. Meisel also writes the Speech Strategy News newsletter, is co-organizer of the Mobile Voice Conference, executive director of the Applied Voice Input Output Society, and edited a recent book, VUI Visions. In the 1980s, he founded and ran a speech recognition technology company that did early work in automating customer service and continuous-speech dictation of medical reports, after managing the Computer Science Division of an engineering firm eventually acquired by a large defense company.
Meisel began his career as a university professor and wrote the first textbook on Computer Pattern Recognition. He has published over 20 papers and holds a B.S. in engineering from Caltech and a Ph.D. in electrical engineering from USC.
Dylan Tweney
Dylan Tweney is a writer and editor based in San Mateo, California. He specializes in creating clear, compelling copy about science and technology. He has worked for magazines, Web sites, podcasts, online video, corporate communications, and marketing programs, and is currently a senior editor at Wired.com covering the technology beat, where he manages a staff of five reporters and editors and oversees Wired's hardware blog, Gadget Lab.
Tweney has 15 years of experience as a journalist. His work has been published in WIRED, Business 2.0, PC World, and dozens of other publications. His corporate clients have included RLG, Cisco, Deloitte & Touche, World Book Encyclopedia, and TeaLeaf Technology. You can learn more at his blog, the Tweney Review.
Ability of computer systems to accept speech input and act on it or transcribe it into written language. Current research efforts are directed toward applications of automatic speech recognition (ASR), where the goal is to transform the content of speech into knowledge that forms the basis for linguistic or cognitive tasks, such as translation into another language. Practical applications include database-query systems, information retrieval systems, and speaker identification and verification systems, as in telebanking. Speech recognition has promising applications in robotics, particularly development of robots that can hear. See also pattern recognition.
I'd agree. The example, "Do you remember the Italian restaurant I went to last week?" is a bad example. It would be so easy for anyone to find the name of the place without having to bother with this new app. Plus, imagine texting Chris yourself: Dude. Let's meet at that Italian place we went to last week.
What Italian place? I wasn't with you.
Oh yeah. Hold on. (checks Google Maps, finds name of place by checking the street view to read the sign) It's called Olive Garden. Meet me there dude.
Okay. What time?
My point is that this app they're working on isn't necessary. But I can see that there may be a need for apps to interact as they get more complex. His example is just bad.