If you’re a Google Cloud customer who’s tapping into the company’s artificial intelligence (AI) suite for text-to-speech or speech-to-text services, good news: New features are headed your way. The Mountain View company today announced significant updates on both fronts, including the general availability of Cloud Text-to-Speech, new audio profiles that optimize sound for playback on different devices, enhancements to multichannel recognition, and more.
First on the list: improved speech synthesis in Google’s Cloud Text-to-Speech. Starting this week, it’ll offer multilingual access to voices generated using WaveNet, a machine learning technique developed by Alphabet subsidiary DeepMind. Without getting too far into the weeds, WaveNet mimics features of speech such as stress and intonation (what linguists call prosody) by identifying tonal patterns. In addition to producing much more convincing voice snippets than previous models, it’s also more efficient: running on Google’s Cloud TPU hardware, WaveNet can generate a one-second sample in just 50 milliseconds.
Cloud Text-to-Speech now offers 17 new WaveNet voices and supports 14 languages and variants. In total, it offers 56 voices: 30 standard voices and 26 WaveNet voices. (Google’s Cloud Text-to-Speech documentation has the full list.)
Expanded WaveNet support isn’t the only new feature on tap for Cloud Text-to-Speech customers. Audio profiles, which were previously available in beta, are launching broadly today.
In a nutshell, audio profiles let you optimize the speech produced by Cloud Text-to-Speech’s APIs for…