Personal assistants like Siri have gotten steadily better at recognizing what we're saying, at least in general. Names have been a tougher challenge, particularly the names of businesses, and of small regional businesses most of all.

Apple's Machine Learning Journal describes how the Siri team has been tackling it:

Generally, virtual assistants correctly recognize and understand the names of high-profile businesses and chain stores like Starbucks, but have a harder time recognizing the names of the millions of smaller, local POIs that users ask about. In ASR, there's a known performance bottleneck when it comes to accurately recognizing named entities, like small local businesses, in the long tail of a frequency distribution.

We decided to improve Siri's ability to recognize names of local POIs by incorporating knowledge of the user's location into our speech recognition system.

ASR systems generally comprise two major components:

  • An acoustic model, which captures the relationship between acoustic properties of speech and sequences of linguistic units, like speech sounds or words
  • A language model (LM), which determines the prior probability that a certain sequence of words occurs in a particular language

We can identify two factors that account for this difficulty:

  • Systems that don't typically have a representation of how a user is likely to pronounce obscure named entities.
  • Entity names that occur only once, or never, in the training data for LMs. To understand this challenge, think of the variety of business names in your neighborhood alone.

The second factor causes the word sequences that make up local business names to be assigned very low prior probabilities by a general LM. This, in turn, makes the name of a business less likely to be correctly selected by the speech recognizer.
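To make that concrete, here is a minimal sketch of how a recognizer might weigh an acoustic score against a language model prior. Everything in it is an assumption for illustration: the two candidate transcriptions, the business name "c food markt", and all of the numbers are made up, and it is not Apple's implementation.

```python
import math

# Hypothetical candidates for the same stretch of audio. All values are
# invented for illustration only.
#   acoustic: log P(audio | words), how well the words match the sound
#   prior:    P(words) under a general language model
CANDIDATES = {
    "sea food market": {"acoustic": -12.5, "prior": 1e-6},
    # An imaginary local business whose name almost never appears in
    # the LM training data, so its prior is tiny.
    "c food markt": {"acoustic": -12.0, "prior": 1e-11},
}


def total_score(acoustic_logp, lm_prior):
    """Combine the two models in log space: log P(audio|W) + log P(W)."""
    return acoustic_logp + math.log(lm_prior)


def recognize(candidates):
    """Return the candidate with the highest combined score."""
    return max(
        candidates,
        key=lambda words: total_score(
            candidates[words]["acoustic"], candidates[words]["prior"]
        ),
    )


best = recognize(CANDIDATES)
print(best)
```

In this toy setup the local business name matches the audio slightly better, yet the common phrase still wins because the general LM's prior for the business name is so much lower.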

The method we present in this article assumes that users are more likely to search for nearby local POIs with mobile devices than with Macs, for instance, and therefore uses geolocation information from mobile devices to improve POI recognition. This helps us better estimate the user's intended sequence of words. We've been able to significantly improve the accuracy of local POI recognition and understanding by incorporating users' geolocation information into Siri's ASR system.
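One simple way to picture the idea is to blend the general prior with a prior tied to the user's region. The sketch below is only that, a picture: the excerpt does not say how Siri combines the two, so the linear interpolation, the region table, the bounding box, the function names, and every number here are assumptions.

```python
import math

# Hypothetical priors, invented for illustration only.
GENERAL_LM = {"sea food market": 1e-6, "c food markt": 1e-11}

# Hypothetical region-specific priors: the imaginary business
# "c food markt" is well known inside "region_a".
REGION_LMS = {
    "region_a": {"sea food market": 1e-6, "c food markt": 1e-3},
}

# Hypothetical bounding boxes: (min_lat, max_lat, min_lon, max_lon).
REGION_BOXES = {"region_a": (37.0, 38.0, -123.0, -122.0)}

# Hypothetical acoustic log-likelihoods for one utterance.
ACOUSTIC = {"sea food market": -12.5, "c food markt": -12.0}


def region_for(lat, lon):
    """Return the name of the region containing the point, or None."""
    for name, (lat0, lat1, lon0, lon1) in REGION_BOXES.items():
        if lat0 <= lat <= lat1 and lon0 <= lon <= lon1:
            return name
    return None


def prior(words, region, weight=0.5):
    """Linearly interpolate the general and regional priors."""
    general = GENERAL_LM.get(words, 1e-12)
    if region is None:
        return general  # no location match: fall back to the general LM
    local = REGION_LMS.get(region, {}).get(words, 0.0)
    return (1.0 - weight) * general + weight * local


def recognize(lat, lon):
    """Pick the candidate with the best acoustic + location-aware LM score."""
    region = region_for(lat, lon)
    return max(
        ACOUSTIC,
        key=lambda w: ACOUSTIC[w] + math.log(prior(w, region)),
    )


print(recognize(37.5, -122.5))  # user inside the hypothetical region
print(recognize(0.0, 0.0))      # user outside any known region
```

With these made-up numbers, a user inside the region gets the local business name, while a user elsewhere still gets the common phrase. The interpolation weight is a free parameter in this sketch; a real system would have to tune how strongly location is trusted.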

It's way over my head, but it's still a fascinating read on not only which of the tougher problems in voice assistant technology the Siri team is trying to crack, but how.