Howard And Google Release New Dataset To Help AI Understand African American English

Howard University and Google Research have released a dataset comprising over 600 hours of speech in African American English (AAE) dialects from 32 states to enhance AI’s recognition of diverse Black dialects.
As part of Project Elevate Black Voices, researchers traveled across the US to capture commonly used speech patterns in Black communities that AI systems often overlook. The project aims to enhance the way Black people interact with technology.
Black dialects being ignored by artificial intelligence
African American English (AAE)—also known as African American Vernacular, Black English, or Black talk—is widely spoken in Black communities and rooted in rich cultural history. However, bias against these dialects in AI development often leads to inaccurate responses when Black users issue voice commands.
Research, including studies by Google, shows that Black Americans consistently experience poorer results with automatic speech recognition (ASR) than white users. As a result, many Black people alter the way they speak so AI systems can understand them.
“African American English has been at the forefront of United States culture since almost the beginning of the country,” Gloria Washington, Ph.D., a Howard University researcher and co-principal investigator of Project Elevate Black Voices, said in a press release. “Voice assistant technology should understand different dialects of all African American English to truly serve not just African Americans, but other persons who speak these unique dialects.”
How did the project collect data?
Researchers collected more than 600 hours of speech across AAE dialects in 32 states. They found that existing speech data contains little natural AAE speech, because Black users have been taught to alter their voices when using ASR-based technology. The data that is currently available is also difficult to use because of code-switching.
Howard University will retain ownership of the dataset and its licensing, a step intended to ensure that the data benefits Black communities. Google will also be able to use the data to improve its products so that more people can use its tools.
Image: Nubelson Fernandes