While leading digital assistants such as Alexa, Cortana, Google Assistant and Siri have so far been collecting data from what you say to them, Mozilla is planning to boost such artificial intelligence (AI) developments by open sourcing human voices on a mass scale. The web giant has already launched a project called Common Voice to build a large-scale repository of voice recordings for future use.
Mozilla has been capturing human voices since June to build its open source database. The database is due to go live later this year to “let anyone quickly and easily train voice-enabled apps” that go beyond Alexa, Google Assistant and Siri.
“Experts think voice recognition applications represent the next big thing. The problem is the current ecosystem favours Big Tech and leaves out the next wave of innovators,” said Daniel Kessler, senior brand manager, Mozilla, in a recent blog post.
End of proprietary voice data
Tech companies presently use a range of voices to teach computers to understand the variety of languages their solutions support. But the data sets containing these voice collections are mostly proprietary for now. As a result, a large number of developers have no access to voice recording samples with which to test their own voice recognition projects, which ultimately limits the number of apps that can understand our speech.
Common Voice looks set to change that. “The time has come for an open source data set that can change the game. The time is right for Project Common Voice,” Kessler stated.
Donate recordings
Mozilla is asking individuals to donate their voice recordings either on the Common Voice webpage or through a dedicated iOS app. Once you are ready to record, you read a set of sentences aloud, and the recordings are saved into the system.
The recorded voices, which will span a variety of languages, accents and demographics, will be provided to third-party developers.
A plan to validate 10,000 hours of audio
In addition to receiving voice donations, Mozilla has built a model in which users validate the recordings stored in the system. This process will help train speech-to-text capabilities.
All this is intended to yield not just one or two but 10,000 hours of validated audio that can power many AI models in the future.
Notably, recordings received through the Common Voice initiative would also be integrated into the Firefox browser. The main aim, however, is to provide a public resource.