The Decade of Machines that Understand Speech

Plenary / Panel
German and English language

Alex Acero


Speech recognition has been an active area of research for the last 40 years. While it is starting to be used in some commercial applications, it is still far from the Star Trek computer we all want. Many of the predictions in science fiction movies like "2001: A Space Odyssey" have proved correct, but not the prediction of the intelligent computer that talks. In this talk I will give a brief historical overview and then describe some of the challenges this technology faces. Demos that illustrate the state of the art will be provided. Finally, I'll describe opportunities for speech technology during this decade.

Artificial Intelligence is the set of disciplines that tackle problems that humans find easy to solve but machines find hard. Speech recognition is the holy grail of artificial intelligence. Many users are perplexed that computers can beat a chess grandmaster yet cannot do something as simple as recognize speech reliably. This mismatch in expectations has caused many problems in the field.

Humans do much better than machines at recognizing speech because they don't simply transcribe the words; they also understand the message, and can therefore use context to guess a missing word (lost perhaps to background noise or to a lack of clarity on the speaker's part). Understanding and transcription go hand in hand for humans, yet this is not the case with computers, which have a very limited capacity for understanding. Much of the work that will happen this decade to break this status quo will have to do with improving this context model by adding domain-independent knowledge as well as personalization.

The "cocktail party effect" refers to the ability of humans to follow one conversation when several are taking place simultaneously. This is not possible with today's speech recognition technology. Scene analysis should take the incoming signal and interpret it as the sum of two signals that has the highest likelihood; a more powerful spectral analysis is needed for this to happen. In addition, the context model will be needed to break up the signal into two or more independent signals. This poses tremendous computational and algorithmic challenges that will need to be resolved before we can successfully talk to our smartphones in the train station or cafeteria.

Speech recognition works reasonably well when a speaker trains the system and articulates his or her speech carefully. When the user speaks in a more spontaneous manner, the error rate of recognition systems increases to the point of making them useless. A new paradigm is needed to better model this spontaneous style.


Alejandro ACERO, Senior Researcher and Manager, Speech Technology Group
Dr. Peter F. KROGH, Dean emeritus and distinguished Professor of International Affairs, Georgetown University, Washington, D.C. (Chair)

Alejandro ACERO

Senior Researcher and Manager, Speech Technology Group

 Before joining Microsoft in 1994, I worked in the speech groups of Apple Computer and Telefonica Investigacion y Desarrollo. I received a Ph.D. from Carnegie Mellon University in 1990, a Master's from Rice University in 1987 and an engineering degree from the Universidad Politecnica de Madrid in 1985, all in Electrical Engineering. I'm also an affiliate Professor of Electrical Engineering at University of Washington.
 Research interests:
 Speech Recognition: robustness to noise, rapid adaptation, acoustic modeling, signal processing.
 Spoken Language Systems: rapid prototyping of speech understanding systems.
 Speech Synthesis: automatically trained concatenative synthesis and distribution-based synthesis.

Dr. Peter F. KROGH

Dean emeritus and distinguished Professor of International Affairs, Georgetown University, Washington, D.C.

 Studied Arts in Law and Diplomacy and Philosophy at Tufts University
1958-1960 Trainee and Acting Assistant Branch Manager, The New England Merchants Bank, Boston
1961-1962 Instructor in Government, Tufts University
1962-1967 Assistant Dean, Fletcher School of Law and Diplomacy, Tufts University
1963-1967 Host, television interview program, "Backgrounds" - WGBH-TV, Boston
1965 Visiting Scholar, The Brookings Institution
1967-1968 White House Fellow, Special Assistant to the Secretary of State
1968-1970 Associate Dean, Fletcher School of Law and Diplomacy, Tufts University
1970-1995 Dean and Professor of International Affairs, School of Foreign Service
1982-1988 Moderator, weekly PBS television program on foreign affairs "American Interests"
1988-2005 Moderator, PBS television foreign affairs series: "Great Decisions"
since 1995 Dean Emeritus and Distinguished Professor of International Affairs, Georgetown University, Washington, D.C.

Technology Symposium

