One challenge that AI has to grapple with when producing automatic transcripts is differing accents. Naturally, speech recognition technology only recognises accents it is trained to detect. But of course human speech varies enormously, and the need for wider speech understanding will only grow. AI has reignited a cultural divide between accents.
What is Received Pronunciation?
Received Pronunciation, normally abbreviated as RP, is the accent traditionally regarded as standard in British English. It's notable for being geographically neutral, which is why it is typically identified with a social class rather than a region, and taught at public schools.
Attitudes to RP have changed over the years. At one time it was the norm on the BBC, where it was dubbed "BBC English", but today the broadcaster encourages diverse speakers and voices. RP is particularly associated with the Queen and the royal family, although the Queen's accent shifted over her reign, moving away from the markedly conservative RP of her early broadcasts (have a listen to her first televised address in 1956). Other speakers of RP include Jeremy Paxman, Emma Watson and several recent UK prime ministers, including Sir Tony Blair, David Cameron, and yes, Boris Johnson.
So why is RP great for speech recognition software?
The neutrality and clarity of RP make it attractive for speech recognition systems, which tend to struggle with stronger or regional accents. To demonstrate why this is, let's look at an example. One of the most difficult accents for AI is, of course, the Scottish accent.
Here’s a highlights video of the ITV EU referendum debate back in 2016. On either side of the debate are Nicola Sturgeon and Boris Johnson. As political figures they couldn’t be further apart, not just in their ideology but also in their vastly different accents. Sturgeon, on the one hand, is known for her Scottish, specifically Ayrshire, accent with strong vowels (to the extent that you sometimes can’t tell an E from an I). Johnson, on the other hand, boasts Eton-taught RP. Note how he often says “pounds” to sound like “ponds”. The important thing to remember is that neither accent is right or wrong, and crucially both are perfectly understood by humans.
But it’s of course Sturgeon who presents the bigger problem for speech recognition software. Why? Simply because AI cannot rely on the context that we use to make sense of her accent where we might not pick up every word. Rather than making sense of the sentence as a whole to fill in gaps as we would, AI is tasked with transcribing on the spot, choosing the best-fitting word there and then based entirely on what it heard. In contrast, Johnson’s RP features few peculiarities; his accent is not weighted one way or the other, and therefore the vast majority of his words are picked up because they “sound as they should”.
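The difference can be sketched in a few lines of toy Python. This is not a real ASR system, and the words and scores below are invented for illustration, but it shows the gap between committing to the acoustically closest word on the spot and rescoring candidates with sentence context, as a human (or a language model) would:

```python
# Hypothetical acoustic scores: how closely each candidate word matched
# the audio for one ambiguous vowel, e.g. a Scottish E that sounds like an I.
acoustic_scores = {"pin": 0.55, "pen": 0.45}

# A greedy decoder commits to the acoustically closest word there and then.
greedy_choice = max(acoustic_scores, key=acoustic_scores.get)

# A context-aware decoder also weighs how likely each word is to follow
# the words already heard (a crude stand-in for a language model).
context = "she signed the letter with a"
context_scores = {"pin": 0.05, "pen": 0.95}

def rescore(word):
    # Combine acoustic evidence with contextual likelihood.
    return acoustic_scores[word] * context_scores[word]

contextual_choice = max(acoustic_scores, key=rescore)

print(greedy_choice)      # "pin": sounds closest, but wrong in context
print(contextual_choice)  # "pen": context fills the gap, as humans do
```

With only the sound to go on, the decoder hears "pin"; once the surrounding sentence is weighed in, "pen" wins comfortably. Real systems do use language models to rescore in roughly this way, but a strong accent can push the acoustic scores so far off that no amount of context recovers the intended word.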
A final word
RP is a funny one – it’s the accent of no accents, so to speak. You could call it the vanilla ice cream of speech, which is why it’s easy to programme AI to recognise it. Add in any other flavour, or accent, and things get complicated. This isn’t to suggest that there is a right or wrong way to speak – language describes society and culture rather than prescribing a set of rules. But it’s the unpredictable, sometimes irrational nature of human beings that AI, and speech recognition in particular, has the most difficulty understanding.