The system, which runs locally on a smartphone or other portable device, comprises two kinds of neural networks: a recurrent neural network (RNN), which uses its internal state, or memory, to process sequences of inputs, and a convolutional neural network, whose connectivity pattern is loosely inspired by the organization of the visual cortex. On average, it recognizes words and phrases with 95 percent accuracy, Lott said.
“It learns from patterns [and] from your use of the device,” he said. “It can personalize its behavior to you.”
Most voice recognition systems today do most of their processing in the cloud, Lott explained. The microphones and chips in phones, smart home speakers like Google Home and Amazon’s Echo, and Windows computers with Microsoft’s Cortana assistant enabled listen for “hot words” like “OK Google” and “Hey Cortana,” which prime the system for the string of voice commands to come. But they don’t analyze those commands themselves; they delegate the grunt work to powerful remote servers running complex machine learning algorithms.
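The split described here, with hot-word spotting on the device and command analysis on a remote server, can be illustrated with a rough sketch. It is not any vendor's actual implementation. Plain text strings stand in for audio, and the names `HOT_WORDS`, `route_utterances`, and `send_to_cloud` are hypothetical, chosen only for illustration.

```python
from typing import Callable, Iterable, List

# Example hot words taken from the article; real devices match audio, not text.
HOT_WORDS = ("ok google", "hey cortana")


def route_utterances(
    utterances: Iterable[str],
    send_to_cloud: Callable[[str], str],
) -> List[str]:
    """Forward only the utterance that follows a hot word to the remote service."""
    armed = False
    responses: List[str] = []
    for text in utterances:
        normalized = text.strip().lower()
        if not armed:
            # On-device stage: only check for a hot word; nothing is sent anywhere.
            armed = normalized in HOT_WORDS
            continue
        # Armed: hand the command to the (hypothetical) remote service,
        # then return to passive listening.
        responses.append(send_to_cloud(text))
        armed = False
    return responses
```

In this sketch, anything said before a hot word never reaches `send_to_cloud`, which is the privacy-relevant boundary: only the command that follows the hot word leaves the device.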
For some users, surrendering their voice data to the cloud raises privacy concerns. Both Amazon’s Alexa assistant and Google Assistant record snippets before sending them off for analysis, and they retain those voice snippets until users choose to delete them. Both companies say they use audio recordings to improve their services and provide more…