Published on 31 July 2025, this Tech.eu article highlights how many large language models (LLMs) — such as those underpinning major AI tools — are heavily biased towards major Western languages like English, German and French, while Baltic and Eastern European languages (Latvian, Lithuanian, Polish, Czech, etc.) are largely underserved. The article notes that although the EU has 24 official languages and Europe over 80 spoken languages overall, the dominant models focus almost entirely on the major ones.
It introduces the Latvian initiative TildeLM, an open-source, 30-billion-parameter large language model developed by and for speakers of Baltic and Eastern European languages. The project draws on EU supercomputing resources and aims to deliver secure, customisable, multilingual AI that works for these under-represented languages, strengthens digital sovereignty and supports translation and localisation needs.
Key quotes include: “The models often make basic mistakes … in languages with gendered cases or flexible word order like Latvian, Polish or Russian.” The article argues that language technology equality is a digital justice issue, and that if European languages outside the major ones are ignored, multilingualism suffers not only socially but also technically.
This article is richly relevant to our themes of multilingualism, language policy, identity and professional practice in translation/localisation. Here are some reflections:
- It brings the language-technology angle strongly into multilingualism: it’s not just about teaching languages, official recognition or sociolinguistic rights — it’s also about whether our digital tools work for every language. If your language is excluded from AI, data and services, then even though you speak it you may be excluded from the digital economy.
- The TildeLM initiative is a great example of language empowerment via technology. A small-language community (Baltic/Eastern Europe) is building infrastructure to redress the imbalance. That suggests a model for other minority/regional languages in Europe: if institutional recognition is slow, technological investment may still provide a path forward.
- For translation, localisation and professionalising modules: the article flags likely growth areas — e.g., terminological corpora, localised AI assistants, on-premises deployment of language models for smaller languages. If you are involved in training translators, localisers, or AI language services, this is a very hot field.
- From a policy perspective: this highlights the gap between official status (for example in the EU) and digital status. A language may be officially recognised, but if it lacks large digital corpora or AI support, it still remains at a disadvantage. The interplay of multilingual policy and digital infrastructure becomes vital.
- The sovereign-data angle is also noteworthy: the article mentions that many models are based in the US or China; deploying local models supports national or regional digital sovereignty. That means language policy merges with technology policy.
- Finally, for Europe’s multilingual identity: this shows that the future of multilingualism isn’t only in schools or constitutions, but also in code, data, algorithms and infrastructure. To really respect linguistic diversity, you must respect technological inclusion too.
- In your language or region, do you think AI tools (translation, voice assistants, chatbots) currently support your language well? If not, what are the most obvious gaps?
- Would you prioritise building language-technology infrastructure (LLMs, corpora, localisation) for smaller/regional languages — or would you first focus on teaching and institutional recognition? How do you rank those priorities?
- What role should the EU or national governments play in ensuring digital equality for minority/less-used languages? Should they fund open-source LLMs? Mandate language-model coverage? Support local training-data creation?
- For professionals in translation/localisation: do you see this article as signalling new job markets (for smaller languages, for AI integration) or simply highlighting continuing disadvantage? What actions would you suggest to capitalise on this field?
Happy to hear how you see the intersection of language, technology and policy — your thoughts matter!
