From Audrey, the very first system capable of recognizing the digits 0 to 9, to ultra-sophisticated voice assistants such as Amazon's Alexa and Google Assistant, voice recognition has made a giant leap forward. This artificial intelligence technology is now everywhere, in our homes, phones and cars, to the point where it would be hard to do without it.
Want to know more about automatic speech recognition (ASR)? History, how it works, models, applications: here is our article.
What is voice recognition?
Speech recognition is based on over 70 years of scientific research, and is still going strong! What is it and what is it used for? We explain.
Automatic Speech Recognition (ASR): definition
Automatic speech recognition (ASR) is an artificial intelligence technology that converts spoken language into text. It captures the human voice through a microphone, analyzes it (the words pronounced, intonation, accent...) and transcribes it into text or a file that a computer can process, for example to carry out a request. It is also known as speech recognition or speech-to-text.
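In probabilistic terms, this transcription step is classically framed as a search for the most likely word sequence $W$ given the acoustic observations $X$ extracted from the recording. Applying Bayes' rule, and noting that $P(X)$ is the same for every candidate $W$:

```latex
\begin{aligned}
\hat{W} &= \arg\max_{W} P(W \mid X) \\
        &= \arg\max_{W} \frac{P(X \mid W)\,P(W)}{P(X)} \\
        &= \arg\max_{W} P(X \mid W)\,P(W)
\end{aligned}
```

$P(X \mid W)$ is what the acoustic and pronunciation models estimate, and $P(W)$ is the job of the linguistic (language) model, both described later in this article. This is the traditional formulation. End-to-end neural systems often model $P(W \mid X)$ directly, so treat it as a conceptual sketch rather than a description of any specific product.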
From voice dictation to voice command
In 1952, Bell Laboratories introduced Audrey, the very first voice recognition system. It could identify the digits 0 to 9, spoken one at a time, with a success rate of 99%. By turning the human voice into digits, this machine marked the beginning of voice dictation.
This breakthrough paved the way for Shoebox, IBM's first voice assistant, in 1962: a kind of calculator that understood simple arithmetic problems dictated aloud and solved them immediately. Because the machine carried out a spoken request, it was the first step toward voice control.
While early automatic speech recognition systems were slow, clumsy and expensive, today's software is close to a technological masterpiece. Driven by machine learning, it understands different voices, accents and even emotions with increasing ease. Voice dictation and voice command are the two most popular uses of ASR.
ASR should not be confused with speech synthesis, or text-to-speech (TTS), a technology that generates artificial speech from written text. Many AI systems combine speech recognition and text-to-speech to answer requests out loud. This is the case, for example, with the callbot used in customer service: a telephone-based conversational agent.
What are the applications of speech recognition?
One thing is certain: voice recognition has become an integral part of our daily lives. In both our private and professional lives, we use it without even realizing it. Why is it so successful? The answer comes down to one major advantage: all it needs is our voice. With voice recognition, we are free to move. There is no need to type on a keypad (as with an IVR, an interactive voice response system) or to stare at a screen for it to work. You don't even need to know how to write or to speak in a formal register, because machine learning software understands accents and language mistakes and adapts accordingly. What's more, speech conveys information much faster than the written word. In short, voice recognition saves us time.
Today, it can be found in a wide range of sectors, including the following applications:
- Book an appointment by phone, 24/7;
- Check your account balance;
- Dictate medical consultation reports;
- Obtain a replacement vehicle after an insurance claim.
How does voice recognition work?
Automatic Speech Recognition (ASR) is a complex technology designed to make life easier. We explain how it works in a few sentences.
The 5 models of automatic speech recognition
To transcribe natural speech, the software generally combines 5 ASR-specific models:
- Acoustic pre-processing: detects the parts of the recording that contain speech;
- The pronunciation model: links each word in the system's vocabulary to its phonetic transcription;
- The acoustic model: predicts the most likely phonemes from the audio signal;
- The linguistic (language) model: predicts the most likely sequence of words;
- The decoder: combines these predictions to produce the most likely text transcription.
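To make these five building blocks more concrete, here is a deliberately tiny Python sketch that chains them together. It is not Zaion's implementation, and every value in it is a hypothetical assumption for illustration: the five-word lexicon, the hand-written phoneme probabilities standing in for a real acoustic model's output, the bigram probabilities, and the energy threshold. It also assumes that each detected speech segment contains exactly one word, as in early isolated-word systems such as Audrey. Real systems compute acoustic features from the audio, use neural acoustic and language models, and decode with beam search over far larger vocabularies.

```python
import math

# Pronunciation model (hypothetical toy lexicon): word -> phoneme sequence.
LEXICON = {
    "book": ("B", "UH", "K"),
    "look": ("L", "UH", "K"),
    "a": ("AH",),
    "call": ("K", "AO", "L"),
    "car": ("K", "AA", "R"),
}

# Linguistic (language) model: hypothetical bigram probabilities P(word | previous word).
BIGRAM = {
    ("<s>", "book"): 0.4, ("<s>", "look"): 0.1,
    ("book", "a"): 0.6, ("look", "a"): 0.1,
    ("a", "call"): 0.5, ("a", "car"): 0.3,
}
UNSEEN_BIGRAM = 1e-4  # floor for word pairs absent from BIGRAM
PHONE_FLOOR = 1e-6    # floor for phonemes given no probability by the acoustic model

# Stand-in for acoustic-model output (hand-written, hypothetical): for each speech
# segment, one probability distribution over phonemes per phoneme position.
SEGMENT_POSTERIORS = [
    [{"B": 0.45, "L": 0.55}, {"UH": 0.9, "AH": 0.1}, {"K": 1.0}],
    [{"AH": 0.9, "UH": 0.1}],
    [{"K": 0.95, "L": 0.05}, {"AO": 0.6, "AA": 0.4}, {"L": 0.7, "R": 0.3}],
]


def detect_speech(samples, frame_len=4, threshold=0.01):
    """Acoustic pre-processing: energy-based voice activity detection.
    Returns (start, end) sample indices of runs of frames whose mean
    energy is at or above `threshold`."""
    segments, start = [], None
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        if energy >= threshold and start is None:
            start = i                     # speech begins
        elif energy < threshold and start is not None:
            segments.append((start, i))   # speech ends
            start = None
    if start is not None:                 # speech runs to the end of the signal
        segments.append((start, len(samples)))
    return segments


def acoustic_score(word, posteriors):
    """Acoustic score: log-probability of the word's phonemes, position by position."""
    phones = LEXICON[word]
    if len(phones) != len(posteriors):
        return -math.inf                  # pronunciation length does not fit the segment
    return sum(math.log(max(dist.get(p, 0.0), PHONE_FLOOR))
               for p, dist in zip(phones, posteriors))


def lm_score(prev, word):
    """Language-model score: log P(word | prev) from the bigram table."""
    return math.log(BIGRAM.get((prev, word), UNSEEN_BIGRAM))


def decode(segment_posteriors, lm_weight=1.0):
    """Decoder: Viterbi search for the word sequence maximising
    acoustic score + lm_weight * language-model score (one word per segment)."""
    best = {"<s>": (0.0, [])}             # last word -> (best score, word sequence)
    for posteriors in segment_posteriors:
        new_best = {}
        for word in LEXICON:
            ac = acoustic_score(word, posteriors)
            if ac == -math.inf:
                continue
            for prev, (score, words) in best.items():
                total = score + ac + lm_weight * lm_score(prev, word)
                if word not in new_best or total > new_best[word][0]:
                    new_best[word] = (total, words + [word])
        best = new_best
    return max(best.values(), key=lambda entry: entry[0])[1]


if __name__ == "__main__":
    silence, speech = [0.0] * 8, [0.5, -0.5] * 4
    signal = silence + speech + silence + speech + silence + speech + silence
    print(detect_speech(signal))                      # [(8, 16), (24, 32), (40, 48)]
    print(decode(SEGMENT_POSTERIORS, lm_weight=0.0))  # ['look', 'a', 'call']
    print(decode(SEGMENT_POSTERIORS))                 # ['book', 'a', 'call']
```

With the language model switched off (`lm_weight=0.0`), the decoder follows the acoustics alone and picks "look a call". With it switched on, the bigram probabilities tip the balance toward "book a call", which illustrates why the decoder combines both predictions rather than trusting the acoustic model alone.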
Want to find out more about our technologies? Discover Zaion Lab.