The Alphabet Is Not a Font Setting: Script Conversion in Central Asian Localization

When a Turkic MT engine's headline feature is dual-script Uzbek support, it's telling you something: in this region, choosing the right alphabet is half the localization job. Here's why script conversion is a separate skill, and why Turkmen makes it harder than anyone budgets for.

There's a detail in this summer's Custom.MT and Tilmoch.ai integration that most people skimmed past. When the Uzbek engine's coverage was announced, the headline number wasn't just 35 million speakers — it was that the model handles Uzbek in both Cyrillic and Latin script. That parenthetical is the whole story. In Central Asia, the alphabet is not a rendering detail you flip at the end. It's a decision that sits upstream of everything, and it's the part of a localization project that clients most reliably fail to scope.

The regional context makes this urgent rather than academic. Central Asia's economy is projected to grow 5.7 percent in 2025, outpacing China, and brands are noticing. Localization programs are following the money. But that money is arriving in a linguistic landscape where several major languages are mid-transition between two writing systems, and where the two systems are not interchangeable by find-and-replace.

Transliteration is a translation problem in disguise

Project managers tend to file script conversion under "engineering" — a Unicode mapping, a table, something a script can run overnight. For a language with a stable, one-to-one orthography, fine. For the Turkic languages of this region, it isn't one-to-one, and pretending it is produces text that a native reader clocks as broken within a sentence.
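
To make the failure concrete, here is a minimal Python sketch for Uzbek. It assumes a hypothetical, lowercase-only subset of the Cyrillic-to-Latin correspondences of the current Uzbek Latin orthography (NAIVE, naive_translit, and contextual_translit are illustrative names, not any real tool's API). A pure lookup table gets word-initial е wrong. One positional rule fixes that, but no rule turns the Russian-spelled сентябрь into the conventional Latin sentabr.

```python
# Illustrative lowercase subset of an Uzbek Cyrillic -> Latin table.
# Assumption: current Uzbek Latin orthography; ц is deliberately omitted
# because its Latin value (s or ts) depends on position.
NAIVE = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "j", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "x", "ч": "ch", "ш": "sh", "ъ": "\u02bc", "ь": "",
    "э": "e", "ю": "yu", "я": "ya", "ў": "o\u02bb", "қ": "q",
    "ғ": "g\u02bb", "ҳ": "h",
}
VOWELS = set("аеёиоуэюяў")


def naive_translit(word: str) -> str:
    """Character-by-character lookup: the 'run it overnight' approach."""
    return "".join(NAIVE.get(ch, ch) for ch in word)


def contextual_translit(word: str) -> str:
    """Lookup plus one simplified positional rule for е."""
    out = []
    for i, ch in enumerate(word):
        # Simplified rule: е is written "ye" at word start or after a vowel.
        if ch == "е" and (i == 0 or word[i - 1] in VOWELS):
            out.append("ye")
        else:
            out.append(NAIVE.get(ch, ch))
    return "".join(out)


print(naive_translit("ер"))             # er  (wrong: the word is "yer")
print(contextual_translit("ер"))        # yer
print(contextual_translit("сентябрь"))  # sentyabr (convention is "sentabr")
```

The soft sign simply vanishes and the hard sign becomes the apostrophe letter ʼ (U+02BC). Both are choices baked into the table that a reviewer should be able to see, not defaults nobody decided.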

The problems are the unglamorous kind that defeat automation. Loanwords that kept their Russian spelling in Cyrillic have conventional Latin spellings that don't derive letter by letter from the Cyrillic. Soft signs and hard signs in Cyrillic carry information that Latin encodes differently, drops, or marks with an apostrophe. Digraphs collide with letter sequences that mean something else. And because these languages absorbed Russian vocabulary through the Soviet period, you constantly hit the question of whether a given word should be re-Turkified in its Latin form or left in its Russified shape. That is not a spelling decision at all; it's an editorial and political one.
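
The reverse direction shows the digraph collision. Below is a lowercase-only Python sketch of Uzbek Latin to Cyrillic. It assumes a hypothetical subset of the correspondences (DIGRAPHS, SINGLE, and latin_to_cyrillic are illustrative names). Matching two-letter sequences before single letters handles sh, ch, oʻ, and gʻ. But s followed by h can also be two separate sounds, which Uzbek Latin marks with the apostrophe sign ʼ, as in the name Isʼhoq (Исҳоқ). Loanwords like sentabr (сентябрь) need a curated exception list, because no rule recovers the Russian spelling.

```python
# Illustrative lowercase subset of Uzbek Latin -> Cyrillic (assumed, not exhaustive).
# U+02BB (ʻ) appears inside oʻ and gʻ; U+02BC (ʼ) is the separate apostrophe sign.
DIGRAPHS = {
    "sh": "ш", "ch": "ч", "o\u02bb": "ў", "g\u02bb": "ғ",
    "ye": "е", "yo": "ё", "yu": "ю", "ya": "я",
}
SINGLE = {
    "a": "а", "b": "б", "d": "д", "e": "е", "f": "ф", "g": "г", "h": "ҳ",
    "i": "и", "j": "ж", "k": "к", "l": "л", "m": "м", "n": "н", "o": "о",
    "p": "п", "q": "қ", "r": "р", "s": "с", "t": "т", "u": "у", "v": "в",
    "x": "х", "y": "й", "z": "з", "\u02bc": "ъ",
}


def latin_to_cyrillic(word: str, exceptions: dict[str, str]) -> str:
    """Greedy two-letter matching, one separator rule, and an exception list."""
    if word in exceptions:  # curated by a linguist, not derived by code
        return exceptions[word]
    out, i = [], 0
    while i < len(word):
        if word.startswith("s\u02bch", i):
            # The apostrophe only exists to stop s + h from reading as "sh".
            out.append("сҳ")
            i += 3
        elif i == 0 and word[0] == "e":
            # Word-initial e corresponds to Cyrillic э.
            out.append("э")
            i += 1
        elif word[i:i + 2] in DIGRAPHS:
            out.append(DIGRAPHS[word[i:i + 2]])
            i += 2
        else:
            out.append(SINGLE.get(word[i], word[i]))
            i += 1
    return "".join(out)


print(latin_to_cyrillic("shahar", {}))                        # шаҳар
print(latin_to_cyrillic("is\u02bchoq", {}))                   # исҳоқ
print(latin_to_cyrillic("sentabr", {}))                       # сентабр (wrong)
print(latin_to_cyrillic("sentabr", {"sentabr": "сентябрь"}))  # сентябрь
```

Greedy matching is the obvious first attempt because it is simple; the sʼh branch and the exception list are where it stops being mechanical. Lose the apostrophe somewhere in the tool chain and ishoq comes out as ишоқ, with nothing in the output to say that anything went wrong.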

That is why Tilmoch's choice to name dual-script Uzbek as a feature reflects the right instinct. They understood that "translate into Uzbek" is an incomplete instruction. The correct question is: which Uzbek, in which script, for which audience? The younger urban readers educated in Latin, or the older and institutional readership still comfortable in Cyrillic?

Turkmen makes this worse, and it's still not in the engine

Here's where I have to be the bearer of the usual news. Tilmoch's roadmap lists Turkmen alongside Kyrgyz and Tajik as coming — not live. Uzbek, Kazakh, and Karakalpak are in production inside Trados, memoQ, XTM, Smartcat, and the rest. Turkmen is not. So the tooling that could at least propose a script-aware draft for its neighbors does nothing for Turkmen today. The work is still fully human, and the script question lands entirely on the linguist.

And Turkmen's script history is genuinely messy. The Latin alphabet adopted in 1993 was revised within a few years. The first version used some eccentric letter choices that were quietly abandoned, and the alphabet settled into its current inventory, with ä, ö, ü, ň, ş, ç, ž, and ý, plus w for the sound Cyrillic writes as в. Cyrillic never disappeared. Older documents, a good deal of official and archival material, and many older readers still live in it. So a Turkmen project can require not just English-to-Turkmen translation but a judgment call on target script, and sometimes a conversion of legacy Cyrillic source material into modern Latin as a preprocessing step. Done carelessly, that step silently corrupts the very source you're translating from.

None of this shows up in a word count. That's the trap. A client sends a Turkmen job scoped as N words, and the actual deliverable involves a script decision, possibly a source-side conversion, and a consistency pass to make sure the diacritics survived every tool in the chain. Generic QA checkers, for their part, understand none of it and will happily wave through mojibake or stripped diacritics as long as the tags and numbers match.
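
Part of that consistency pass can be automated. Here is a minimal Python sketch assuming the current 30-letter Turkmen Latin inventory, and assuming that the most common corruption is UTF-8 text misread as Windows-1252 (cp1252). The function names are hypothetical. It flags letters outside the inventory, tries to undo that specific mojibake, and catches letters replaced by ? or the U+FFFD replacement character. Segments are NFC-normalized first, because a decomposed ü (u plus a combining diaeresis) looks identical to a reader but not to a naive comparison.

```python
import re
import unicodedata
from typing import Optional

# Assumed inventory: the 30 letters of the current Turkmen Latin alphabet.
TURKMEN_LATIN = set("abçdeäfghijžklmnňoöprsştuüwyýz")


def foreign_letters(segment: str) -> list[str]:
    """Letters outside the Turkmen Latin inventory, after NFC normalization."""
    text = unicodedata.normalize("NFC", segment).lower()
    return sorted({ch for ch in text if ch.isalpha() and ch not in TURKMEN_LATIN})


def mojibake_repair(segment: str) -> Optional[str]:
    """If the segment looks like UTF-8 decoded as cp1252, return the repaired text."""
    try:
        repaired = segment.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return None  # not reversible this way, so probably not this kind of mojibake
    return repaired if repaired != segment else None


def replaced_letters(segment: str) -> list[str]:
    """Words in which a letter was replaced by '?' or U+FFFD."""
    return re.findall(r"\w*[?\ufffd]\w+", segment)


broken = "Türkmenistan".encode("utf-8").decode("cp1252")
print(broken)                               # TÃ¼rkmenistan
print(foreign_letters(broken))              # ['ã']
print(mojibake_repair(broken))              # Türkmenistan
print(replaced_letters("T?rkmen dili?"))    # ['T?rkmen']
```

What this cannot catch is a diacritic stripped to its base letter (ä to a), because the result is still valid Latin text; that needs a lexicon or a human. It will also flag legitimate brand names containing c, q, v, or x, so the output is a triage list, not a verdict.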

What to actually do about it

If you're a PM commissioning Turkmen or wider Central Asian work, three concrete things. First, make script an explicit field in the brief, not an assumption — name the target script and the target audience, and don't let "Latin" and "Cyrillic" be treated as the same content in two skins. Second, treat any Cyrillic-to-Latin source conversion as its own line item with its own review, because it's editorial work, not a batch job. Third, when you do get access to a regional MT engine for a neighboring language, remember that script-correct output is a real advantage worth paying for — and that its absence for Turkmen means budgeting for the human who does that reasoning by hand.
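
One way to make the first two recommendations hard to skip is to encode them in whatever structure generates the quote. The sketch below is hypothetical: Brief, Script, and line_items are illustrative names, not any TMS's API. It borrows BCP 47-style tags such as tk-Latn and uz-Cyrl, which combine a language subtag with an ISO 15924 script code. A brief with no target script cannot produce a locale, and a legacy source in a different script adds its own line item.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Script(Enum):
    LATIN = "Latn"      # ISO 15924 script codes
    CYRILLIC = "Cyrl"


@dataclass
class Brief:
    """Hypothetical brief in which the script is a required decision, not a default."""
    language: str                      # language subtag, e.g. "tk" or "uz"
    target_script: Optional[Script]    # None means nobody has decided yet
    audience: str
    word_count: int
    legacy_source_script: Optional[Script] = None

    def locale_tag(self) -> str:
        if self.target_script is None:
            raise ValueError("brief must name a target script")
        return f"{self.language}-{self.target_script.value}"

    def line_items(self) -> list[str]:
        items = [f"translation into {self.locale_tag()}, {self.word_count} words"]
        # Script conversion of legacy material is billed and reviewed separately.
        if self.legacy_source_script not in (None, self.target_script):
            items.append("source-side script conversion with its own review")
        items.append("diacritic and encoding consistency pass")
        return items


brief = Brief("tk", Script.LATIN, "general public", 1200,
              legacy_source_script=Script.CYRILLIC)
for item in brief.line_items():
    print(item)
```

Raising an error on a missing script is the point of the design: the quote cannot be generated until someone has answered the question the word count hides.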

The growth in this region is real and the localization demand is following it. But the market rewards vendors who understand that in Central Asia the first question isn't what does it say — it's what letters do we say it in.