How Mozilla Is Crowdsourcing Speech To Diversify Voice Recognition

To solve this, Mozilla, a free software community, created “Common Voice” in 2017, a tool that crowdsources voice recordings into a dataset meant to diversify AI and represent the global population, not just the West. Common Voice releases its growing dataset publicly so any company can use it for research and to build and train its own voice-enabled applications, with the ultimate aim of improving voice recognition for everyone, regardless of language, gender, age, or accent. Currently, there are more than 2,400 hours of voice data and 29 languages represented, including English, French, German, Traditional Mandarin Chinese, Welsh, and Kabyle.

“Existing speech recognition services are only available in languages that are financially profitable,” Kelly Davis, the Head of Machine Learning at Mozilla, told TNW. “They also tend to work better for men than women and struggle to understand people with different accents, all of which are a result of biases within the data on which they are trained.”

At the start of 2018, Google announced Hindi support for its voice assistant, but the feature was limited to just a few queries. A few months after the initial release, the tech giant updated the feature so that Google Assistant can now hold a conversation in Hindi, the third most spoken language in the world.

“Largely, the efforts to address the race gap in AI have fallen on non-corporate hands,” Davis said. For example, Black In AI, a project creating ways to increase the representation of people of color in AI, was launched by ex-Googlers in 2017. However, it didn’t launch as an official extension of their company’s work; it was launched to address what they saw as a need in the community.

In April, a study by New York University’s AI Now Institute found that a lack of diverse representation at major technology companies such as Microsoft, Google, and Facebook causes AI to cater more readily to white men.
The report highlighted that only 15 percent of Facebook’s AI staff are women, and the problem is even more substantial at Google, where just 10 percent are female.

Davis argues only a fraction of people benefit from voice recognition technology. “Think about how speech recognition could be used by minority language speakers to enable more people to have access to technology and the services the internet can provide, even if they never learned how to read?” Davis said. “The same is true for visually impaired or physically handicapped people, but regular market forces will not help them.”

Common Voice hopes to speed up the process of collecting data in all languages around the globe, regardless of accent, gender, or age. “By making this data available – and developing a speech recognition engine in the open, project Deep Speech – we can empower entrepreneurs and communities to address existing gaps on their own,” Davis added.

Anyone can help diversify the voice data behind Mozilla’s project. Just head over to Common Voice and record yourself reading sentences aloud, or listen to others’ recordings and verify whether they’re accurate. It’s projects like this that’ll eventually close the racial gap in AI, and large corporations should take note.