Speech recognition technology engine gathers more pace
Speechmatics, a Cambridge-based speech recognition technology scale-up, has unveiled a globally transformative advance to the engine driving its solutions.
It reveals a major step in delivering “truly comprehensive” speech recognition by adding Entity Formatting functionality to its Autonomous Speech Recognition (ASR) software.
The feature uses Inverse Text Normalization (ITN) to tackle one of machine learning’s biggest challenges: interpreting consistently and accurately how entities such as numbers, currencies, percentages, addresses, dates, and times should appear in written form. The company says this makes transcripts more readable and reduces post-processing work.
The company also says the update makes speech recognition significantly more valuable to enterprise-level customers in sectors such as media, financial services, and healthcare, which depend heavily on numbers being formatted consistently and appropriately in text.
Entity Formatting is notoriously challenging in speech recognition because the way entities are spoken in conversation varies, even between countries that speak the same language. This adds layers of complexity, says Speechmatics.
Telephone numbers are a prime example where people might use ‘oh’ instead of ‘zero’ or use double/triple digits such as ‘triple three’.
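To make the telephone-number case concrete, the following is a minimal, rule-based sketch in Python of what turning spoken digits into written digits involves. It is an illustration only, not Speechmatics' method: the function name `normalize_spoken_digits` and the lookup tables are hypothetical, and a production system would need to handle far more variation than this toy does.

```python
# Toy illustration of inverse text normalization for spoken digit strings.
# Assumption: input is a space-separated sequence of English digit words,
# optionally preceded by "double" or "triple". Not Speechmatics' implementation.

DIGITS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8",
    "nine": "9",
}
REPEATS = {"double": 2, "triple": 3}


def normalize_spoken_digits(text: str) -> str:
    """Convert a spoken digit sequence such as 'oh one triple three' to '01333'.

    Raises ValueError on any word that is neither a digit word nor a repeat
    marker, or if a repeat marker is not followed by a digit word.
    """
    out = []
    repeat = 1  # how many times to emit the next digit
    for word in text.lower().split():
        if word in REPEATS:
            if repeat != 1:
                raise ValueError("two repeat markers in a row")
            repeat = REPEATS[word]
        elif word in DIGITS:
            out.append(DIGITS[word] * repeat)
            repeat = 1
        else:
            raise ValueError(f"unrecognised token: {word!r}")
    if repeat != 1:
        raise ValueError("repeat marker not followed by a digit")
    return "".join(out)


if __name__ == "__main__":
    print(normalize_spoken_digits("oh one two two three triple three"))
```

Even this narrow case needs explicit rules for 'oh' and for repeat markers, which hints at why general entity formatting across currencies, dates, and regional conventions is so much harder.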
Speechmatics CEO Katy Wigdahl said: “Creating a more professional transcript will speed up our customers’ workflows by making large numbers easier to read, requiring less human editing.
“Context is also critical – there are so many nuances and ambiguities that need to be accounted for in language, such as whether ‘pounds’ is a reference to weight or currency? And whether ‘venti’ is being used as the Italian word for 20 or winds?”
This challenge has overwhelmingly been met, she says. Numerals are represented accurately and consistently, dramatically reducing the level of human intervention in the post-editing process.
Depending on the standardisations pre-selected by the customer, numbers can appear in a transcript in either their written form or their spoken form.
Wigdahl added: “This new functionality in our breakthrough Autonomous Speech Recognition will have a decisive impact on our customers working in numerically intensive industries.
“Entity Formatting has always been a notoriously challenging task for speech recognition but with this latest update we are delivering best-in-market functionality and bringing significant value to our customers operating in industries where getting numbers right for speech-to-text tasks is mission-critical.”
These additions follow Speechmatics’ major advances in Autonomous Speech Recognition across the board. Its technology is trained on huge amounts of unlabelled data without the need for human intervention, delivering a far more comprehensive understanding of all voices and dramatically reducing AI bias and errors in speech recognition.
The technology is used by enterprises in scenarios such as media & entertainment, contact centres, CRM, financial services, security, and software. Speechmatics processes millions of hours of transcription worldwide every month in 33 languages.
In 2019 Speechmatics received the Queen’s Award for Enterprise Innovation. Its offices are in Cambridge and London in the UK, Denver in the US, Chennai in India and Brno in the Czech Republic.