Africa’s AI ambitions stunted by data scarcity

When ChatGPT is asked about African topics, its responses may lack depth and accuracy because AI models have limited training data on African subjects.

To overcome this data deficiency, African nations need to compile their datasets and share them online, experts said at a recent event.

Many Africans rely on large language models (LLMs) developed by global organizations, but these models are trained on inadequate African data. Only a small percentage of that data, in areas such as healthcare, represents African cultures.

Sharing African datasets online faces challenges, including scarce documentation for certain languages and the financial burden on people in poverty-stricken regions. Low digital literacy rates across Africa are another significant obstacle. One suggested remedy is to use voice-to-text technology to document data.

Some African startups have already begun implementing such solutions. Intron Health, a Nigerian AI company, enables medical professionals to enter records through speech-to-text conversion.

Startups are employing agents to collect voice data across Africa, but capturing diverse African speech patterns is uniquely challenging because speakers often mix in pidgin or local languages like Yoruba. One suggestion is to build support for code-switching into AI models to account for these variations.

Experts emphasize the importance of global partnerships to enhance the documentation of African languages and cultures. Collaborations in infrastructure and skills development are seen as crucial for improving data collection and fostering AI adoption on the continent.
