The pioneering speech processing methods developed by Hynek Hermansky are some of the most widely used in speech recognition, speech and audio coding, and speaker and language recognition. Known for innovative signal processing approaches that make speech parameters insensitive to channel variability and noise, his many accomplishments in machine recognition of speech include the Perceptual Linear Prediction (PLP) signal representation, Relative Spectral Analysis (RASTA) processing, and the TRAP and TANDEM neural-network-based feature extraction techniques. PLP is one of the two primary feature sets used internationally for speech recognition. RASTA reduces the effects of linear spectral distortion when the microphone used for recognition has a different frequency response from the one used during training. TRAP was one of the first techniques to explicitly exploit very large temporal contexts of speech, and TANDEM provides an interface between modern neural-network features and conventional speech recognition systems.
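The channel-robustness idea behind RASTA can be illustrated in a few lines: a fixed linear channel adds a near-constant offset to each log-spectral trajectory, so band-pass filtering each trajectory over time suppresses it. This is a minimal sketch, not Hermansky's implementation; the filter coefficients below follow one commonly cited form (a ramp-shaped FIR numerator and a single pole near 0.94), and the input data are synthetic.

```python
import numpy as np

def rasta_filter(log_spec):
    """Apply a RASTA-style band-pass filter along time (axis 0).

    log_spec: (num_frames, num_bands) array of log spectral energies.
    """
    numer = np.array([0.2, 0.1, 0.0, -0.1, -0.2])  # FIR part: derivative-like ramp
    pole = 0.94                                     # leaky-integrator pole
    out = np.zeros_like(log_spec)
    for b in range(log_spec.shape[1]):              # filter each band's trajectory
        x, y_prev = log_spec[:, b], 0.0
        for t in range(len(x)):
            fir = sum(numer[k] * x[t - k] for k in range(5) if t - k >= 0)
            y_prev = fir + pole * y_prev
            out[t, b] = y_prev
    return out

# A constant per-band offset (a stationary channel in the log domain) is
# suppressed once the filter settles, because the FIR taps sum to zero:
frames = np.random.randn(200, 5)
shifted = frames + 3.0                  # simulate a fixed channel gain
diff = rasta_filter(frames) - rasta_filter(shifted)
residual = np.max(np.abs(diff[100:]))   # small after the startup transient
print(residual)
```

Because the filter is linear, the random part of the two signals cancels exactly in `diff`, leaving only the filtered constant offset, which decays toward zero.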
An IEEE Fellow, Hermansky is the Julian S. Smith Professor of Electrical Engineering and director of the Center for Language and Speech Processing at Johns Hopkins University, Baltimore, MD, USA.
The groundbreaking innovations of Hermann Ney have led to major advances in speech and language processing and sparked a new era of research in automatic recognition and translation of spoken language. His dynamic programming-based techniques made search/decoding processes computationally feasible for large-vocabulary continuous speech recognition. He developed a statistical smoothing technique, now known as Kneser-Ney smoothing, which is one of the most effective and widely used methods for language model smoothing. Ney also created phrase-based machine translation, which outperformed conventional linguistic systems in translating spoken language. His publicly available GIZA++ toolkit for statistical word alignment has become the standard for building state-of-the-art machine translation systems. Ney’s contributions have greatly impacted today’s speech interfaces for mobile devices, among other applications.
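To make the smoothing idea concrete, here is a minimal sketch of interpolated Kneser-Ney for a bigram model: seen bigram counts are discounted by a fixed amount d, and the reserved mass is redistributed according to a continuation probability (how many distinct contexts a word follows), rather than raw unigram frequency. The toy corpus and the fixed discount d=0.75 are illustrative assumptions; production systems estimate discounts from count-of-count statistics.

```python
from collections import Counter

def kneser_ney_bigram(tokens, d=0.75):
    """Return an interpolated Kneser-Ney bigram probability function."""
    big_c = Counter(zip(tokens, tokens[1:]))
    uni_c = Counter(tokens[:-1])                  # history counts c(v)
    cont = Counter(w for (_, w) in big_c)         # distinct left-contexts per word
    total_types = len(big_c)                      # distinct bigram types
    follow = Counter(v for (v, _) in big_c)       # distinct words following v

    def prob(v, w):
        p_cont = cont[w] / total_types            # continuation probability
        if uni_c[v] == 0:
            return p_cont                         # unseen history: back off fully
        discounted = max(big_c[(v, w)] - d, 0) / uni_c[v]
        lam = d * follow[v] / uni_c[v]            # mass reserved for backoff
        return discounted + lam * p_cont

    return prob

corpus = "the cat sat on the mat the cat ate".split()
p = kneser_ney_bigram(corpus)
# sanity check: probabilities over the vocabulary sum to 1 for a seen history
vocab = set(corpus)
total = sum(p("the", w) for w in vocab)
print(round(total, 6))
```

The check at the end confirms the discounted mass and the backoff mass together form a proper distribution, which is the property that makes the method usable directly inside a recognizer's language model.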
An IEEE Fellow, Ney is a professor with the RWTH Aachen University of Technology, Aachen, Germany.
Mari Ostendorf has advanced the field of speech and language processing by combining pioneering advances in statistical modeling and machine learning with an understanding of the deeper structures in speech. In acoustic modeling, she developed stochastic segment models to overcome the limitations of hidden Markov models, allowing the use of richer feature sets. Ostendorf helped standardize the method for annotating prosodic events (the properties of syllables and larger units of speech such as intonation, tone, and rhythm), as a major contributor to the now standard “ToBI” method for labeling speech prosody, and demonstrated its utility for improving spoken language technology. She also created novel language modeling frameworks that are effective in separating different contextual factors, significantly impacting language modeling for speech recognition.
An IEEE Fellow, Ostendorf is a professor of electrical engineering at the University of Washington, Seattle, WA, USA.
Mark Yoffe Liberman’s trailblazing efforts in creating the Linguistic Data Consortium (LDC) have fueled the development and advancement of human language technologies (HLTs) including speech and speaker recognition, machine translation, and semantic analyses. Founded at the University of Pennsylvania in 1992, the LDC became the largest developer of shared language resources, distributing more than 120,000 copies of over 2,000 databases covering 91 different languages to more than 3,600 organizations in over 70 countries. Liberman has also helped to create technologies that have daily impact on HLTs, including a speech activity detector that regularly processes LDC speech data to reduce annotation cost and increase accuracy and a forced aligner that was integrated into the Forced Alignment and Vowel Extraction service that has revolutionized phonetic and sociolinguistic research.
Liberman is the Christopher H. Browne Distinguished Professor of Linguistics and director of the Linguistic Data Consortium at the University of Pennsylvania, Philadelphia, PA, USA.
Considered one of the most innovative speech and audio coding researchers in the world, Takehiro Moriya has developed technologies and standards used internationally for the practical transmission of speech signals over digital networks. His many contributions to standardized coding techniques include the pitch synchronous innovation scheme adopted by the Japanese digital cellular phone system, which is recognized as the lowest bit-rate standard for public telephone service in the world. His speech coder implemented in the ITU-T G.729 standard has become the default technique for VoIP systems, cellular phones, and voice recording devices. Moriya also made a significant contribution to establishing MPEG audio lossless coding (ALS), which has been selected as one of the techniques for the next generation of super-high-definition broadcasting in Japan.
An IEEE Fellow and NTT Fellow, Moriya is head of the Moriya Research Lab, NTT Communication Science Laboratories, Atsugi, Kanagawa, Japan.
A highly respected leader in speech and language processing, Prof. Steve Young has developed innovations that continue to advance the state of the art in translating spoken words into text. Prof. Young developed the HTK Toolkit for hidden Markov model-based speech recognition, which has become indispensable software in research laboratories worldwide and has served as the basis for training many commercial speech recognition systems. His techniques, including decision-tree state clustering and parallel model combination, have enabled speech recognition systems that are more robust to additive and convolutional noise. Prof. Young’s research in applying statistical techniques to spoken dialogue systems aims to revolutionize the development of systems that integrate natural conversational speech interactions between users and information systems such as in call centers, mobile devices, and other applications.
An IEEE Fellow, Prof. Young is a professor with the University of Cambridge, Cambridgeshire, UK.
Biing-Hwang Juang has developed speech recognition and coding technologies found in practically all of today's cell phones, smartphones, and voice-enabled applications. He was one of the first to realize the potential of Hidden Markov Models (HMMs) for speech recognition systems, which enable computers to identify spoken words and sentences. His "mixture density HMM" became the dominant technology for speech recognition systems used in cell phones, voice-enabled navigation systems, and customer-service telecom lines. Dr. Juang was one of the first to develop vector quantization methods for low-bit-rate speech coding. He also co-developed an efficient speech representation method that is at the heart of speech coders used in cell phones and voice-over-internet applications.
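The essence of the mixture density HMM can be sketched in a few lines: each state emits observations from a mixture of Gaussians rather than a single Gaussian, so multimodal acoustic variation (different speakers, accents, pronunciations) can be captured within one state. This is a toy one-dimensional illustration under assumed parameters, not Dr. Juang's actual formulation.

```python
import numpy as np

def gauss(x, mu, var):
    """Univariate Gaussian density."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def emission(x, weights, means, vars_):
    # mixture emission density: b_j(x) = sum_m c_m N(x; mu_m, var_m)
    return sum(c * gauss(x, m, v) for c, m, v in zip(weights, means, vars_))

def forward_loglik(obs, pi, A, states):
    """Standard HMM forward recursion, with mixture emissions per state."""
    alpha = np.array([pi[j] * emission(obs[0], *states[j]) for j in range(len(pi))])
    for x in obs[1:]:
        alpha = (alpha @ A) * np.array([emission(x, *states[j]) for j in range(len(pi))])
    return np.log(alpha.sum())

# two states, each defined by (mixture weights, means, variances) -- illustrative
states = [([0.5, 0.5], [-2.0, 2.0], [0.5, 0.5]),   # bimodal state
          ([1.0], [0.0], [1.0])]                   # unimodal state
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.1, 0.9]])
obs = [-2.1, 2.0, 0.1]
ll = forward_loglik(obs, pi, A, states)
print(ll)
```

The bimodal first state assigns high likelihood to observations near either -2 or +2, something a single-Gaussian state cannot do without inflating its variance.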
An IEEE Fellow, Dr. Juang is the Motorola Foundation Chair Professor and GRA Eminent Scholar at the Georgia Institute of Technology, Atlanta, GA, USA.
Considered a world expert on acoustic phonetics, Victor Zue has pioneered spoken-language interfaces that make human-computer interactions easier and more natural, helping people communicate with electronic devices much the same way we converse with each other. The speech systems developed by Dr. Zue while head of the Massachusetts Institute of Technology’s Spoken Language Systems Group from 1989 to 2001 became the forerunners of today’s speech applications, such as the Siri application featured on Apple’s iPhones. He and his team created multilanguage conversation systems integrating speech processing with dialog-based problem solving. These interfaces provided information such as driving directions, weather information, and flight schedules through web-browser-based systems and as speech-only systems over publicly available telephone lines.
An IEEE member and member of the US National Academy of Engineering, Dr. Zue is the Delta Electronics Professor of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology, Cambridge, MA, USA.
James K. Baker and Janet MacIver Baker have reshaped the field of automatic speech recognition with innovations that allow speech to be accurately converted to text and computers to be controlled with spoken commands. In 1982, the husband-and-wife team founded Dragon Systems, Inc., famous for its highly accurate commercial voice dictation systems and many industry firsts. Released in 1997, Dragon NaturallySpeaking™ was the first general-purpose continuous dictation system, quickly establishing worldwide market leadership and winning numerous industry awards. Today’s applications range from general-purpose dictation, voice transcription, and audio search to speech interfaces for smartphones and automotive systems. Individually, Dr. James Baker is best known for his pioneering work in introducing the efficacy and power of stochastic processing and Hidden Markov Models (a statistical method for the recognition of speech) to automatic speech recognition. In addition to technical contributions ranging from signal processing to audio search engines, Dr. Janet Baker has also helped initiate international projects to create large nonproprietary databases and to foster rigorous performance evaluation methodologies.
Dr. James Baker is currently a Distinguished Career Professor at Carnegie Mellon, Pittsburgh, PA, and focuses on new paradigms for automatic learning, especially to help develop speech technologies for all the languages of the world.
An IEEE member, Dr. Janet Baker is currently affiliated with MIT Media Lab and Harvard Medical School, and conducts neuroscience research applying machine learning to speech and language processing in the brain.
Julia Hirschberg is an innovator in building viable computational models of human prosody for use in speech synthesis. Her goal has been to make human–computer interaction in spoken dialog systems more natural and effective. She was one of the architects of the Tone and Break Indices (ToBI) system for labeling human prosodic contours that is used in many text-to-speech systems and is widely used in prosody research. Initially used for intonational description of standard American English, ToBI has been extended to model other languages. Professor Hirschberg has also pioneered work in emotional and deceptive speech and in audio browsing and retrieval research, developing techniques both to improve audio search and to make audio search engines more usable.
An IEEE Senior member, Dr. Hirschberg is a professor in the Department of Computer Science at Columbia University, New York, NY.
A leader in the field of speech processing for nearly two decades, Sadaoki Furui continues to play an important role in improving natural communication between humans and machines in today’s “voice-activated” world. Best known for investigating human perception of transient sounds during the 1980s, gaining an important understanding of human hearing, Dr. Furui provided the first quantifiable measurement of the importance of spectral transition upon intelligibility. The incorporation of spectral derivatives, also known as “delta cepstra,” greatly improved the accuracy of speech recognition systems; most of today’s practical speech recognition, speaker identification, and verification systems incorporate this concept. Dr. Furui continues to be a leader in the field, having recently overseen a Japanese national project whose goal was to develop a system for automatic understanding and summarization of spontaneous speech.
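Delta cepstra are simple to compute: each cepstral coefficient's local time derivative, estimated by linear regression over a window of 2K+1 frames, is appended to the feature vector. The sketch below uses the standard regression form found in most front ends; the frame values themselves are synthetic.

```python
import numpy as np

def delta(feats, K=2):
    """Regression-based delta features.

    feats: (num_frames, num_coeffs) array. Returns a same-shape array of
    deltas, with edge frames padded by repetition.
    """
    T = len(feats)
    denom = 2 * sum(k * k for k in range(1, K + 1))   # normalizer: 2*sum k^2
    padded = np.concatenate([feats[:1].repeat(K, 0), feats, feats[-1:].repeat(K, 0)])
    return np.array([
        sum(k * (padded[t + K + k] - padded[t + K - k]) for k in range(1, K + 1)) / denom
        for t in range(T)
    ])

# a linear ramp in every coefficient: the delta should recover its slope
cepstra = np.linspace(0, 1, 10)[:, None] * np.ones((10, 3))
d = delta(cepstra)
print(d[5])   # interior frames recover the ramp's slope of 1/9 per frame
```

Appending `d` (and often a second pass for delta-deltas) to the static cepstra is what lets a frame-by-frame recognizer see the spectral transitions Dr. Furui showed to be perceptually critical.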
An IEEE Fellow, Dr. Furui is currently a professor with the Department of Computer Science at the Tokyo Institute of Technology, Japan.
John Makhoul, chief scientist at BBN Technologies, Cambridge, MA, has made a number of significant contributions to the mathematical modeling of speech signals. Prominent among these contributions are his papers on linear prediction, which models the evolution of a signal over time, and vector quantization, which allows for the efficient coding of signals and parameters. Dr. Makhoul is recognized in the field for his vital role in the areas of speech and language processing, including speech analysis, speech coding, speech recognition, and speech understanding. His patented work on the direct application of speech recognition techniques for accurate, language-independent optical character recognition (OCR) has had a dramatic impact on the ability to create OCR systems in multiple languages relatively quickly. He has been leading research and development in speech systems and more recently in applying existing and new technology to the area of language translation.
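Linear prediction, the first of the contributions named above, models each speech sample as a weighted sum of the previous p samples. A common way to solve for the weights is the autocorrelation method with the Levinson-Durbin recursion, sketched below on a synthetic second-order autoregressive signal (chosen so the recovered coefficients can be checked against the generating ones); this is a textbook illustration, not Dr. Makhoul's specific formulation.

```python
import numpy as np

def lpc(x, p):
    """Order-p linear prediction coefficients via Levinson-Durbin.

    Returns a such that x[n] is predicted by sum_i a[i] * x[n-1-i].
    """
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(p + 1)])
    a = np.zeros(p)
    err = r[0]                                   # prediction error energy
    for i in range(p):
        k = (r[i + 1] - np.dot(a[:i], r[i::-1][:i])) / err   # reflection coeff
        a_new = a.copy()
        a_new[i] = k
        a_new[:i] = a[:i] - k * a[:i][::-1]      # update lower-order coeffs
        a, err = a_new, err * (1 - k * k)
    return a

# synthesize an AR(2) signal: x[n] = 1.3 x[n-1] - 0.6 x[n-2] + noise
rng = np.random.default_rng(0)
x = np.zeros(5000)
e = rng.standard_normal(5000) * 0.1
for n in range(2, 5000):
    x[n] = 1.3 * x[n - 1] - 0.6 * x[n - 2] + e[n]
a = lpc(x, 2)
print(a)   # close to the generating coefficients [1.3, -0.6]
```

For speech, the same recursion applied frame by frame yields the all-pole vocal-tract models that underlie both analysis and low-bit-rate coding.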
An IEEE Life Fellow, Dr. Makhoul has received several awards from IEEE.
Raj Reddy’s work on efficient algorithms gave rise to modern speech recognition systems, demonstrating speaker independence and the ability to handle large vocabularies. Dr. Reddy, currently Mozah Bint Nasser University Professor of Computer Science and Robotics in the School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, has been a member of the faculty since 1969. Under Dr. Reddy’s guidance, the Sphinx recognition system was developed; it serves as the basis of commercial speech recognition systems found today in automated phone centers and in computers and robots that respond to a user's voice.
An IEEE Fellow, he holds ten honorary doctorates and is the recipient of the 1994 Association for Computing Machinery Turing Award. Dr. Reddy is former president of the American Association for Artificial Intelligence.
Allen Gersho, research professor at the University of California, Santa Barbara, has been one of the leading contributors to the areas of speech coding and signal compression. His work has been instrumental to internet voice, cellular telephony, audio for video conferencing, and secure voice terminals for government and military applications. His work in Vector Quantization (VQ), a fundamental and powerful tool for data compression, has improved the efficiency of speech coding by achieving high-quality speech at very low bit rates. Along with Robert M. Gray, he co-authored Vector Quantization and Signal Compression, a book that has become a standard reference on the subject. He also developed with his former student Raymond Chen an adaptive post-filtering technique that has become the standard in code excited linear predictive (CELP) speech coders and is used by many leading telecommunication equipment companies.
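The core of vector quantization is easy to sketch: a codebook of representative vectors is trained (here with Lloyd's algorithm; the LBG method is a split-and-refine variant), and each input vector is then transmitted as just the index of its nearest codeword. The two-cluster data and tiny codebook below are illustrative assumptions, not a practical speech-coding configuration.

```python
import numpy as np

def train_codebook(data, num_codewords, iters=20):
    """Lloyd's algorithm: alternate nearest-codeword assignment and
    centroid update, minimizing squared-Euclidean distortion."""
    idx = np.linspace(0, len(data) - 1, num_codewords, dtype=int)
    codebook = data[idx].copy()                  # deterministic spread-out init
    for _ in range(iters):
        d = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)                     # nearest codeword per vector
        for j in range(num_codewords):
            members = data[labels == j]
            if len(members):
                codebook[j] = members.mean(0)    # centroid update
    return codebook

rng = np.random.default_rng(1)
# two well-separated clusters of 2-D "feature vectors"
data = np.concatenate([rng.normal(0, 0.1, (100, 2)),
                       rng.normal(5, 0.1, (100, 2))])
cb = train_codebook(data, 2)
# coding: only the index of the nearest codeword is transmitted per vector
codes = ((data[:, None, :] - cb[None, :, :]) ** 2).sum(-1).argmin(1)
print(sorted(cb.mean(1)))   # centroids land near the cluster centers, ~0 and ~5
```

With a codebook of size 2^b, each vector costs only b bits, which is how VQ achieves high quality at very low bit rates when the codebook is trained on representative speech parameters.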
An IEEE Fellow, he has previously received the Ericsson-Nokia Best Paper Award and the IEEE Third Millennium Medal. Dr. Gersho has a B.A. from Massachusetts Institute of Technology, Cambridge, MA, and a doctorate from Cornell University, Ithaca, NY, all in Electrical Engineering.
Mr. James D. Johnston has been called the father of perceptual audio coding for his pioneering contributions that revolutionized digital audio. His accomplishments during a 26-year career at AT&T Bell Labs have, among other achievements, allowed for the distribution of digital music and digital radio over the internet. As a technology leader at AT&T Research Labs, Florham Park, NJ, he invented a number of basic techniques used in perceptual audio coding, especially in MP3 and MPEG 2 Advanced Audio Coder (AAC). He also created the standards used for coding audio-visual information in movies, video, and music formats in a digital compressed format. One of Mr. Johnston’s particular breakthroughs is using perceptual coding to shrink files: by removing parts of a signal that cannot be detected by human beings, it reduces the bit rate needed for audio transmission or storage by a factor of ten or more, and cuts signal transmission time, without substantially affecting quality. His achievements have enabled high-quality audio to be delivered at high bandwidth efficiency in telephone communications and web-based networking. They also influenced international standards for audio transmission, such as the MP3 standard, widely used in computer networks. Many internet companies use the technology Johnston developed as the foundation of the electronic music-distribution business, including AAC players and jukebox systems.
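A deliberately crude sketch can illustrate the principle (this is not the MP3/AAC algorithm, which uses filterbanks and psychoacoustic masking models): transform a frame to the frequency domain, treat everything far below the strongest component as inaudible, and spend no bits on those bins. The -30 dB threshold and the test signal are illustrative assumptions.

```python
import numpy as np

def toy_perceptual_code(frame, mask_db=-30.0):
    """Zero out spectral bins more than |mask_db| below the peak, as a stand-in
    for a real psychoacoustic masking threshold."""
    spec = np.fft.rfft(frame)
    mag = np.abs(spec)
    threshold = mag.max() * 10 ** (mask_db / 20)
    keep = mag >= threshold            # bins loud enough to be audible
    coded = np.where(keep, spec, 0)    # "spend no bits" on the rest
    return coded, keep

# one strong, bin-centered tone plus low-level noise
t = np.arange(1024)
frame = (np.sin(2 * np.pi * 50 * t / 1024)
         + 0.001 * np.random.default_rng(0).standard_normal(1024))
coded, keep = toy_perceptual_code(frame)
reconstructed = np.fft.irfft(coded, 1024)
kept_fraction = keep.mean()
print(kept_fraction)   # only a tiny fraction of bins need any bits at all
```

Almost every bin is dropped, yet the reconstruction is nearly indistinguishable from the input, which is the bit-rate-versus-quality trade perceptual coding exploits; real coders refine the threshold per frequency band using models of human hearing.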
Mr. Johnston retired from AT&T Labs in 2002 and is now an audio architect at Microsoft Corporation in Redmond, WA. An IEEE member, he is a co-recipient of the IEEE Donald G. Fink Prize Paper Award. He also is a Fellow of the Audio Engineering Society, has published more than 50 technical papers, and has been awarded more than 20 US patents. He received his bachelor's and master's degrees from Carnegie Mellon University, Pittsburgh, PA, both in Electrical Engineering.
Dr. Frederick Jelinek, director of the Center for Language & Speech Processing and Julian S. Smith Professor of Electrical Engineering at Johns Hopkins University in Baltimore, MD, is hailed by his contemporaries as one of the fathers of modern speech and language processing. His pioneering statistical methods form the foundation for nearly all of today's state-of-the-art speech recognition technology. Dr. Jelinek shaped several major concepts in lexical language modeling and was a leading force behind "tree banking," a method for automating the ability to measure parsing quality. His research workshops have made significant contributions to the pool of trained specialists in the fields of speech and natural language processing.
An IEEE Fellow, he is author of Statistical Methods for Speech Recognition and Probabilistic Information Theory. Dr. Jelinek's many honors include the IEEE Signal Processing Society's Society Award and the IEEE Information Theory Society's Golden Jubilee Paper Award.
The field of speech research owes much to Dr. Gunnar Fant and Dr. Kenneth N. Stevens, who first collaborated at the Massachusetts Institute of Technology (MIT) in Cambridge, MA, in 1949. Dr. Fant has been instrumental in creating the multidisciplinary area of speech communications and technology, while Dr. Stevens has pioneered the science and engineering of speech acoustics, production, perception, and processing. Their textbooks, Dr. Fant's Preliminaries to Speech Analysis (co-authored by Roman Jakobson and Morris Halle) and Acoustic Theory of Speech Production and Dr. Stevens' Acoustic Phonetics, remain definitive speech processing texts.
Dr. Fant is professor emeritus in what is now the department of speech, music, and hearing at Stockholm's Royal Institute of Technology (KTH). He is a member of the US National Academy of Engineering, the American Academy of Arts and Sciences, the Royal Swedish Academy of Engineering Science, and the Royal Swedish Academy of Science, and he is a Fellow of the Acoustical Society of America. His honors include the Swedish Academy of Engineering Sciences' gold medal and the Ericsson prize in telecommunication (shared with James Flanagan).
Dr. Stevens is the Clarence J. LeBel Professor of Electrical Engineering and Computer Science and head of the research laboratory of electronics' speech communication group at MIT. He also is consultant to Sensimetrics Corporation in Somerville, MA. An IEEE Life Fellow, he is a Fellow of the American Academy of Arts and Sciences and a member of the US National Academy of Engineering and the US National Academy of Sciences. A Fellow and past president of the Acoustical Society of America, he has received that society's gold medal and the US National Medal of Science.