Terjemahan: Why Machine Translation Is Reshaping Global Information Access

Every second, someone searches for "terjemahan," the Indonesian word for "translation." Across hundreds of languages, translation-related queries add up to more than 500 million searches a month. Yet this staggering demand masks a deeper reality: the quality of machine translation varies sharply from one language to the next, and that gap shows how technology can amplify inequality between languages.

The question isn't whether machine translation works. For many language pairs, it does. Google reported in 2018 that Google Translate was processing about 143 billion words a day. DeepL, Claude, and ChatGPT have put fluent, often near-professional translation within reach of anyone with an internet connection, at least for well-resourced languages. The question is whose languages get translated well, and what happens to everyone else.

The Translation Demand Explosion

Terjemahan searches concentrate in Indonesia, but the pattern repeats globally. In India, "translation" ranks among the top 50 most-searched terms. In Nigeria, translation queries spike around education and job applications. In Vietnam, Mexico, Brazil—across the Global South—translation search volume reflects a fundamental asymmetry: English dominates digital spaces, but most people don't speak it natively.

This creates a paradox:

English speakers need translation less: Only 15% of native English speakers speak a second language fluently, yet 70% of the internet's original content is in English.
Non-English speakers need it constantly: 1.5 billion people use English as a second language; billions more don't speak it at all.
Translation demand runs opposite to representation: Languages with some of the highest translation demand (Indonesian, Hindi, Vietnamese, Portuguese, Arabic) are far less represented in AI training data than English.

Why Machine Translation Quality Fractures Along Economic Lines

Google Translate's neural models were trained largely on bilingual text scraped from the internet. This creates a systematic bias: language pairs that appear often in academic publishing and tech documentation (English-German, English-French, English-Japanese) are typically rated at 85-95% accuracy. English-Tagalog reaches only 60-70%, and English-Amharic often falls under 50%.

The pattern shows up across language pairs:

Language Pair	Typical Accuracy	Economic Context of Primary Speakers
English-German	90-95%	High income, Europe
English-Spanish	85-92%	Upper-middle income, large economies
English-Hindi	70-78%	Lower-middle income, South Asia
English-Nepali	55-70%	Lower-middle income, South Asia
English-Yoruba	45-60%	Lower-middle income, West Africa

Why? Because training data reflects power: German manufacturing and engineering documentation is digitized, standardized, and publicly available. Yoruba exists primarily in speech, informal markets, and oral tradition. Machine learning can only learn from what's already digitized.

The Economics of Underserved Languages

Terjemahan searches in Indonesia don't just reflect need—they reflect market failure. Indonesia has 270 million people, 700+ languages, and a growing digital economy. But commercial translation services charge $0.10-0.30 per word. A 5,000-word business document costs $500-1,500—prohibitive for small businesses in Southeast Asia.
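The arithmetic behind that estimate can be sketched in a few lines of Python. This is a minimal illustration using the per-word rates quoted above; the function name translation_cost_usd and the choice to express rates in whole cents (so the multiplication stays exact) are assumptions made for this example, not part of any pricing tool.

```python
def translation_cost_usd(word_count, low_cents_per_word, high_cents_per_word):
    """Return the (low, high) cost in US dollars for a per-word rate range.

    Rates are given in whole cents so the multiplication stays exact.
    """
    if word_count < 0:
        raise ValueError("word_count must be non-negative")
    low = word_count * low_cents_per_word / 100
    high = word_count * high_cents_per_word / 100
    return low, high


# The 5,000-word business document at $0.10-0.30 per word.
low, high = translation_cost_usd(5_000, 10, 30)
print(f"${low:,.0f} to ${high:,.0f}")  # prints "$500 to $1,500"
```

Changing the word count or the rate range shows how quickly the bill grows for a business that needs several documents translated.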

This creates a two-tier system:

High-resource languages (English, Mandarin, Spanish): Professional translation, automated systems, and AI all work well. Cost per word: $0.05-0.20
Low-resource languages: Only machine translation available, quality inconsistent, no professional ecosystem. Cost: free but unreliable

A Malaysian startup needs to translate product documentation into Indonesian. Google Translate: free but potentially embarrassing. Professional translator: $2,000. Reality: most choose Google Translate, errors propagate, and the user experience suffers.

How Translation Shapes Information Access

This isn't abstract. Translation quality determines who accesses information and knowledge.

Education: A student in Vietnam wants to learn advanced machine learning. Most cutting-edge research is published in English. DeepL can translate it, but technical terminology often gets lost. The student ends up switching to English halfway through, a cognitive tax that an English-speaking peer doesn't pay.

Health: During COVID-19, vaccine misinformation spread faster in low-resource languages because accurate medical information wasn't available in them. People relied on WhatsApp rumors because professional health content hadn't been translated. Machine translation of technical medical terms is often poor: a term like "thrombosis" has no precise, widely understood equivalent in every language.

Law and Justice: Contract translation errors cost businesses millions. Suppose a Tanzanian firm signs a deal with a Chinese partner based on a machine translation of the Mandarin contract. Clauses buried in the text are misinterpreted, and the firm has no recourse because a professional translator would have cost more than its profit margin.

Economic participation: E-commerce platforms must support local languages to reach emerging markets. But poor translation of product descriptions, reviews, and terms of service creates friction. Alibaba invests heavily in Chinese-English translation; smaller platforms skip Indonesian, Swahili, or Bengali support because the ROI is unclear.

The Training Data Problem and Its Solutions

Why can't AI just learn to translate better? Because the problem is structural, not technical.

Supervised translation models learn from parallel examples: the same sentence written in both languages. For English-German translation, millions of parallel texts exist: European Union documents translated into both languages, published books, academic papers, technical manuals. For English-Somali? Maybe 50,000 parallel sentences in total are available online.

Some solutions are emerging:

Many-to-many multilingual models: Models like Meta's M2M-100 translate directly between 100 languages without English as an intermediary, reducing error propagation.
Open data and open models: Projects like OPUS collect publicly available parallel text, and OPUS-MT releases open translation models trained on it, including models for low-resource pairs.
Synthetic data generation: AI generates training examples when human translations don't exist (risky, but better than nothing).
Transfer learning: Skills from high-resource pairs (English-Spanish) transfer partially to low-resource pairs (Spanish-Catalan).
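One way to see why removing the English pivot can reduce error propagation is a back-of-the-envelope model. The sketch below assumes that each translation hop preserves the meaning of a sentence with some probability and that errors at different hops are independent, which real systems do not guarantee. The function name chained_quality and the 0.90 and 0.85 figures are hypothetical, chosen for illustration; they are not measurements of M2M-100 or any other system.

```python
from math import prod


def chained_quality(hop_qualities):
    """Estimate the share of sentences whose meaning survives every hop.

    Each value is the assumed probability that one translation hop preserves
    the meaning; hops are treated as independent, so the shares multiply.
    """
    qualities = list(hop_qualities)
    if not qualities:
        raise ValueError("at least one hop is required")
    if any(not 0.0 <= q <= 1.0 for q in qualities):
        raise ValueError("each quality must be between 0 and 1")
    return prod(qualities)


# Hypothetical numbers: pivoting through English takes two hops,
# while a direct model takes one.
via_english = chained_quality([0.90, 0.90])
direct = chained_quality([0.85])
print(f"via English: {via_english:.2f}, direct: {direct:.2f}")
```

Under these assumptions, a direct model that is somewhat weaker per hop can still come out ahead of a two-hop pivot, because the pivot's losses multiply.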

But these are patches. The core problem persists: languages with less digital infrastructure get worse translation, which means less digital content creation in those languages, which means even less training data.
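The feedback loop described here can be illustrated with a toy simulation. Everything in it is an assumption made for the sketch: translation quality is modeled as a saturating function of corpus size, and the yearly growth of digital text in a language is scaled by that quality. The function simulate_corpus, the 10% base growth rate, and the one-million-sentence midpoint are hypothetical parameters, not empirical estimates.

```python
def simulate_corpus(initial_sentences, years, base_growth=0.10,
                    half_quality_at=1_000_000):
    """Return corpus size per year under a simple quality-driven growth rule.

    Quality rises toward 1 as the corpus grows and equals 0.5 when the corpus
    holds `half_quality_at` sentences. Each year the corpus grows by
    base_growth * quality, so small corpora grow more slowly.
    """
    size = float(initial_sentences)
    history = [size]
    for _ in range(years):
        quality = size / (size + half_quality_at)
        size += size * base_growth * quality
        history.append(size)
    return history


# A high-resource pair versus a pair with about 50,000 parallel sentences.
high = simulate_corpus(10_000_000, years=10)
low = simulate_corpus(50_000, years=10)
print(f"gap at start: {high[0] / low[0]:.0f}x")
print(f"gap after 10 years: {high[-1] / low[-1]:.0f}x")
```

In this toy model both corpora grow, but the ratio between them widens every year, which is the self-reinforcing pattern the paragraph describes.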

So What: Implications Across Sectors

For businesses in emerging markets: Poor translation quality is now a competitive disadvantage. Companies must either invest in professional translation (costly) or accept the friction of machine translation errors, which reduces customer trust and market reach.

For governments and development organizations: Language access is now a digital equity issue. If vaccination campaigns, legal information, or financial literacy exist only in English or Mandarin, billions of people are excluded. Investing in translation infrastructure becomes development policy.

For AI companies: The economic incentive to improve low-resource language translation is weak, because these markets are smaller and less profitable. But the social cost is high: translation inequality becomes another mechanism of global inequality.

For speakers of low-resource languages: Terjemahan will remain imperfect. The burden falls on them to learn English, to accept poor translations, or to be excluded from digital knowledge entirely.

The 500 million translation searches monthly aren't just people looking for tools. They're requests for access—to information, opportunity, and participation in a digital world that was built in English and Mandarin, and is slowly being translated, unevenly, into everything else.
