Global AI may speak English, but the future of intelligence will be multilingual. While ChatGPT, Gemini, and Claude dominate global conversations, a quieter revolution is taking place across Southeast Asia. From Kuala Lumpur to Hanoi and Jakarta, researchers, startups, and governments are training their own large language models – systems that understand local dialects, cultural nuances, and national histories.
While Russia and the UAE invest billions into centralized, state-backed AI giants, Southeast Asia is building something different: a distributed, collaborative model of innovation where universities, telecoms, and fintech companies work together to create local intelligence.
Here’s how – and why it matters for the region’s digital future.
Global models, local blind spots
The best-known LLM-based services today – ChatGPT, Gemini, Claude, and Perplexity – have become indispensable tools for millions of people and businesses. However, these global systems often fail to account for regional realities. They may overlook local dialects, produce culturally inappropriate responses (for example, suggesting alcohol at a celebration in a Muslim-majority country), ignore national laws that apply to AI agents, or simply lack awareness of local history and education systems.
As a result, many countries are now developing their own LLMs, both through startup initiatives and government programs. Governments, in particular, want to store model weights and computing resources within national borders so that public institutions can use AI safely without depending on foreign infrastructure.
Two ways to build a national LLM
There are currently two main paths for developing national large language models.
In countries such as Russia, China, and the UAE, national LLMs are typically created by state-owned corporations or major banks. These projects often become centralized structures with a single chain of command, unified budgets, and clear PR goals around technological sovereignty and import substitution.
Examples include GigaChat (Sber, Russia), Ernie (Baidu, China), and Falcon (Technology Innovation Institute, Abu Dhabi).
The Southeast Asian model: open, distributed, collaborative
Southeast Asia, however, has its own unique conditions that shape a different approach to LLM development:
- smaller national budgets and limited resources
- high electricity and cooling costs
- a long tradition of flexible collaboration between universities and industry
- a shortage of large data centers with thousands of GPUs
These factors encourage the creation of consortia where LLM development is shared between universities, IT and fintech companies, and national innovation agencies.
Roles are usually divided like this:
- universities contribute researchers, labeled datasets, and GPU time from academic grants
- businesses (telecom, fintech, e-commerce) provide real data and use cases
- governments build the policy and infrastructure layer but don’t own the model directly
For example, in Malaysia, the MADANI AI strategy allows startups to access GPU resources hosted at universities through national innovation programs.
Open and semi-open LLMs
In many consortium-based projects, models are released openly, but the groups later monetize access through paid APIs or cloud inference services. Even open models need substantial compute power to run, which most independent developers and startups can’t afford.
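The compute barrier is easy to quantify with a back-of-envelope estimate. The sketch below is illustrative only: it assumes fp16 weights (2 bytes per parameter) and a rough 20% overhead for activations and KV cache, both figures chosen for illustration rather than taken from any specific deployment.

```python
# Back-of-envelope GPU memory estimate for serving an open LLM.
# Assumptions (illustrative): fp16 weights at 2 bytes/parameter,
# plus ~20% overhead for activations and KV cache at modest batch sizes.

def serving_memory_gb(num_params: float, bytes_per_param: int = 2,
                      overhead: float = 0.2) -> float:
    """Approximate GPU memory (GB) needed to host a model for inference."""
    weights_gb = num_params * bytes_per_param / 1e9
    return weights_gb * (1 + overhead)

for name, params in [("7B model", 7e9), ("13B model", 13e9), ("70B model", 70e9)]:
    print(f"{name}: ~{serving_memory_gb(params):.0f} GB")
```

Even a mid-sized 13B model lands well beyond a single consumer GPU, which is why "open weights" in practice still means paying someone – often the consortium itself – for hosted inference.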
Examples of open or semi-open regional LLMs include:
- SEA-LION (Singapore) – a family of multilingual open models for Southeast Asian languages, developed through collaboration between AI Singapore, national universities, and GovTech.
- Sahabat-AI (Indonesia) – a partnership between the telecom giant Indosat Ooredoo Hutchison, the GoTo technology group, startup incubators, and Indonesian universities.
Other notable projects are MaLLaM (Malaysia), PhoGPT and Viet-Mistral/Vi-VLM (Vietnam), Typhoon and OpenThaiGPT (Thailand), and Khmer LLM (Cambodia).
In the Philippines, Myanmar, and Laos, local teams are also building benchmarks and corpora for regional dialects.
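Benchmark-building of this kind usually starts very simply: a set of multiple-choice items in the target language and an accuracy scorer. The sketch below shows the shape of such a harness; the two Malay items and the stub "model" are hypothetical examples, not drawn from any real regional benchmark.

```python
# Minimal sketch of an MMLU-style multiple-choice scorer for a
# regional-language benchmark. Items and the model stub are hypothetical;
# a real harness would query an actual LLM endpoint for each item.

from dataclasses import dataclass

@dataclass
class Item:
    question: str
    choices: list   # answer options, e.g. ["Kuala Lumpur", "Jakarta", ...]
    answer: int     # index of the correct choice

def score(items, model_answer) -> float:
    """Fraction of items where the model picks the correct choice index."""
    correct = sum(1 for it in items if model_answer(it) == it.answer)
    return correct / len(items)

# Two toy Malay-language items (illustrative only).
items = [
    Item("Apakah ibu negara Malaysia?",
         ["Kuala Lumpur", "Jakarta", "Bangkok", "Hanoi"], 0),
    Item("Bahasa rasmi Malaysia ialah?",
         ["Thai", "Bahasa Melayu", "Khmer", "Lao"], 1),
]

# Stub "model" that always picks the first choice.
always_first = lambda item: 0

print(f"accuracy: {score(items, always_first):.2f}")  # prints "accuracy: 0.50"
```

The hard part for low-resource languages is not this scoring loop but assembling enough native-quality items – which is exactly the corpus work these regional teams are doing.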
Hybrid corporate–academic models
Another path – the hybrid model – emerges when collaboration between a university and a major corporation evolves into a national project. These models are usually not open-source (similar to proprietary systems like ChatGPT, Gemini, or Claude), but corporate infrastructure provides computing power far beyond what startups can afford. This results in higher quality and more reliable systems.
A strong example is Malaysia’s ILMU, a multimodal LLM developed by YTL AI Labs in collaboration with the University of Malaya. The system supports Malay dialects, is hosted and managed locally, and already offers a public API for businesses. Because YTL Power owns extensive energy and data-center infrastructure – including a new AI supercomputer built with NVIDIA – this approach avoids many of the common issues that small startups face, such as undertraining and limited scalability.
According to the developers, ILMU outperforms other models on Malay MMLU and similar benchmarks, while being trained entirely on national datasets.
Projects like ILMU show how hybrid development can combine academic expertise with corporate resources. This model allows countries with limited compute capacity to reach performance levels close to global LLMs while keeping control over their own data and infrastructure.
Why it matters
Beyond the technical success, regional LLMs represent something much bigger – a new form of digital sovereignty. In a world where AI increasingly shapes how people learn, communicate, and make decisions, having a locally trained model means more than owning hardware or datasets. It means protecting cultural nuance, linguistic diversity, and local identity.
The Southeast Asian experience shows that innovation doesn’t depend only on massive budgets or Silicon Valley infrastructure. It thrives on collaboration – when universities, startups, and governments work together to build intelligence that speaks the language of their people.
Regional LLMs mark a quiet but powerful shift: from consuming global AI to creating local intelligence.