Rethinking AI from the Ground Up: Lessons from EACL 2026

Komlan

19 May 2026

What if the biggest limitation in today’s AI systems isn’t the model, but the quality of the data we feed into it?

That question sat at the center of many conversations at EACL 2026, and it reflects a broader shift happening across the NLP community. For years, progress has been driven by scaling models – more parameters, more compute, more data. But at EACL this year, the focus was noticeably different. The conversation is changing. It’s becoming more grounded, more practical, and more concerned with how these systems actually perform in the real world.

Nowhere is this shift more important than in African contexts, where gaps in data, evaluation, and representation continue to shape how, and how well, AI systems work.

Moving Beyond Western-Centric Evaluation

As part of this conversation, we presented AfriStereo, a benchmark dataset designed to evaluate bias in large language models within African contexts.

Most existing benchmarks for bias are built around Western social and cultural assumptions. That creates a blind spot. Models may appear “fair” by standard metrics, meaning they avoid well-documented biases in Western contexts, while still reproducing harmful or inaccurate patterns when applied in other regions.

AfriStereo is an attempt to close that gap. It introduces evaluation scenarios that are grounded in African realities – capturing how bias and stereotypes actually manifest across different cultures and languages. The goal is not just to measure bias, but to rethink what meaningful evaluation looks like in a truly global setting.

A Strong Signal from the AfricaNLP Community

These ideas were echoed strongly at the AfricaNLP Workshop, which brought together researchers focused on African languages and low-resource NLP.

One thing became clear very quickly: the challenge is not just about building better models; it’s about building the right foundations.

There is still a significant gap in:

* High-quality datasets for many African languages
* Benchmarks that reflect real-world usage
* Evaluation frameworks that account for cultural and linguistic nuance

Without these, it becomes difficult to even measure progress accurately. A model that performs well on standard benchmarks might still fail in practice, simply because those benchmarks don’t reflect the environments where the model is deployed.

What’s emerging is a coordinated push toward African-centric benchmarks and more localized approaches to annotation and evaluation. Not as an edge case, but as a necessary step toward building systems that actually work.

The Bigger Shift: From Models to Data

Stepping back, one of the clearest trends at EACL 2026 is the move from model-centric to data-centric AI.

The key question is no longer just:

> “How do we build better models?”

It’s increasingly:

> “How do we build the right data – and evaluate it properly?”

This shift shows up in different ways. There’s growing interest in cost-aware annotation, smarter data selection, and evaluation under real-world constraints. There’s also increasing attention on issues like tokenization bias, where some languages require more tokens to express the same meaning, leading to higher costs and often lower performance. For example, an English sentence like “I am going home” might be represented in fewer tokens than its equivalent in Yorùbá or Amharic, where morphology and word structure often cause the same meaning to be split into more subword tokens, increasing computational cost for the model. In effect, parts of the system structurally disadvantage certain languages.
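To make the tokenization gap concrete, here is a minimal sketch in Python. It is not how any particular model tokenizes text: it uses one token per UTF-8 byte as a crude stand-in, since byte-level tokenizers start from bytes before merging them, so real subword counts will be lower and will vary by tokenizer. The function names (`byte_tokenize`, `token_premium`) are hypothetical, and the Yorùbá and Amharic strings are approximate renderings of the English sentence, included for illustration only.

```python
import unicodedata
from typing import Callable, Dict, List


def byte_tokenize(text: str) -> List[int]:
    """Stand-in tokenizer (an assumption): one token per UTF-8 byte.

    Real subword tokenizers merge bytes into larger units, so this only
    shows the raw encoding cost a byte-level tokenizer starts from.
    """
    normalized = unicodedata.normalize("NFC", text)
    return list(normalized.encode("utf-8"))


def token_premium(
    parallel: Dict[str, str],
    tokenize: Callable[[str], List[int]],
    baseline: str = "english",
) -> Dict[str, float]:
    """Token count of each sentence relative to the baseline sentence."""
    base_count = len(tokenize(parallel[baseline]))
    return {
        lang: len(tokenize(sentence)) / base_count
        for lang, sentence in parallel.items()
    }


# Illustrative parallel sentences; the translations are approximate.
SAMPLES = {
    "english": "I am going home",
    "yoruba": "Mo ń lọ sílé",
    "amharic": "ወደ ቤት እየሄድኩ ነው",
}

if __name__ == "__main__":
    for lang, ratio in token_premium(SAMPLES, byte_tokenize).items():
        print(f"{lang}: {ratio:.2f}x the baseline token count")
```

Because `token_premium` accepts any callable, the same comparison can be run with a real tokenizer by passing its encode function in place of `byte_tokenize`. A ratio above 1.0 means the language pays more tokens than the baseline for roughly the same meaning.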

All of this points to the same conclusion: improving AI systems, especially in low-resource settings, requires rethinking the entire pipeline – not just the model at the end of it.

From Research to Real Systems

A lot of these ideas are coming out of academia, which continues to play a central role in pushing the field forward. But there’s also a growing need to translate these insights into tools and systems that can be used in practice.

That’s where infrastructure becomes critical.

At kitala.ai, we’re building toward exactly this need: a platform that supports culturally grounded annotation and evaluation, particularly for multilingual and low-resource environments.

Rather than treating annotation, data quality, and benchmarking as separate steps, the goal is to bring them together into a single system, one that reflects how models are actually developed and deployed. This includes enabling more localized annotation workflows, improving data quality, and supporting the creation of benchmarks that are relevant to specific contexts.

It’s a small but important part of a larger shift: moving from abstract evaluation toward systems that are grounded in real-world use.

Where This Leaves Us

If there’s one takeaway from EACL 2026, it’s this: the future of AI will be shaped less by how large models become, and more by how well we understand and structure the data behind them.

For Africa, this creates both a challenge and an opportunity. The challenge is clear: there are still significant gaps in data, evaluation, and infrastructure. But the opportunity is just as significant: to build systems that are not only technically strong, but also contextually relevant from the ground up.

The direction is becoming clearer. The question now is how quickly we can build toward it.
