New projects to promote locally developed language datasets across Africa

Funding recipients will produce training datasets in Eastern, Western, and Southern Africa to meet a range of needs for low-resource languages, including machine translation, speech recognition, named-entity recognition, part-of-speech tagging, sentiment analysis, and multi-modal datasets. All the datasets will be locally developed and owned, and will be openly accessible to the international data community.
“In South Africa, the government uses chatbots to provide daily updates on COVID,” explained Vukosi Marivate, Absa Chair of Data Science at the University of Pretoria. “Right now, translating those updates to Latin languages is really easy, but the datasets necessary to translate those updates to a range of African languages don’t exist, which means that the government isn’t currently able to communicate with many of its people in their native languages. That is one of the many examples of why we need this work now,” he said.
For more information on the selected projects, please visit http://www.lacunafund.org.
About the Lacuna Fund:
The Lacuna Fund is the world’s first collaborative effort to provide data scientists, researchers, and social entrepreneurs in low- and middle-income contexts with the resources they need to produce training datasets that address urgent problems in their communities. It began as a collaboration among The Rockefeller Foundation, Google.org, and IDRC, with support from the German development agency GIZ on behalf of the Federal Ministry for Economic Cooperation and Development (BMZ). It has since evolved into a multi-stakeholder engagement composed of technical experts, thought leaders, local beneficiaries, and end users. The Lacuna Fund is committed to creating and mobilizing training datasets that solve urgent local problems and lead to a step change in machine learning’s potential worldwide.