The Terminology Vacuum: Localizing Software Into a Language Without Settled Words

Turkmen has no consolidated IT lexicon, so nearly every technical term in a UI forces a choice among Russian habit, Turkish borrowing, English carry-over, and a native neologism. Here's how I make those calls, and why a project glossary is the real deliverable.

Most discussions about Turkmen localization stop at "there's a talent shortage." That's true, but it misses the harder, more interesting problem underneath: even when you have a competent linguist, the language itself often hasn't decided what to call the thing on the screen. There is no Académie-style authority, no widely adopted national IT dictionary, and no equivalent of the term banks that Turkish or Russian translators lean on. When I localize a settings menu or an API error string into Turkmen, I am frequently not finding the term; I'm deciding what it will be.

That's a different kind of work, and project managers should understand it, because it changes what you're actually paying for and what you should expect back.

Four directions every term can pull

For any given technical concept, Turkmen offers competing instincts, and they rarely agree.

The Russian habit. Two generations of Turkmen speakers learned computing in Russian, and "файл" (file), "папка" (folder), and "сервер" (server) still live in people's mouths. A literal carry-over feels natural to older office users but instantly dated to younger, post-independence ones, who associate it with the Soviet past. It's the path of least resistance and often the wrong one for a brand that wants to read as modern.

The Turkish borrowing. Turkmen and Turkish are close cousins, and Turkish has done enormous lexical engineering: yazılım for software, uygulama for application, çevrimiçi for online. The temptation is to import those terms wholesale. Sometimes that works because the roots are shared; often it produces something that reads as foreign, because Turkish neologisms were shaped by Turkish phonology and morphology, not Turkmen's.

The English carry-over. "Login," "online," and "download" travel through global products and increasingly appear untranslated. For a developer audience this can be the safest, least ambiguous choice. For a consumer app aimed at a broad audience, it can alienate the very users the product is trying to reach.

The native neologism. Turkmen has perfectly good native roots to build from — and sometimes the cleanest answer is to coin a transparent compound. But coin too freely and you produce a translation no real user recognizes, which is its own failure.

The point is that none of these is automatically correct. "Settings" can defensibly become a Russian-flavored term, a Turkish-flavored one, or a native compound, and the right answer depends entirely on the product, the audience, and — critically — internal consistency. Which brings me to the part that actually matters.

The glossary is the deliverable, not a byproduct

When there is no external authority to defer to, the worst outcome is inconsistency: "save" rendered three different ways across three screens because three decisions were made independently, or worse, by three freelancers. In a language with settled terminology this is sloppiness. In Turkmen it's almost guaranteed unless someone fights it deliberately.
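Catching this failure after the fact is mechanical, even if preventing it is not. The sketch below assumes the project's strings can be exported as (string ID, English source, Turkmen target) triples; the IDs and placeholder targets such as SAVE_VARIANT_A are hypothetical stand-ins for three competing renderings of "Save." It groups segments by source text and reports every source that received more than one distinct target.

```python
from collections import defaultdict

# Hypothetical export: (string_id, English source, Turkmen target).
# The target values are placeholders standing in for competing renderings.
SEGMENTS = [
    ("settings.save_button", "Save", "SAVE_VARIANT_A"),
    ("editor.save_button", "Save", "SAVE_VARIANT_B"),
    ("profile.save_button", "Save", "SAVE_VARIANT_C"),
    ("settings.title", "Settings", "SETTINGS_VARIANT_A"),
    ("menu.settings", "Settings", "SETTINGS_VARIANT_A"),
]


def find_inconsistencies(segments):
    """Return {source: sorted distinct targets} for every source string
    that was translated more than one way."""
    renderings = defaultdict(set)
    for _string_id, source, target in segments:
        renderings[source.strip()].add(target.strip())
    return {src: sorted(tgts) for src, tgts in renderings.items() if len(tgts) > 1}


if __name__ == "__main__":
    for source, targets in find_inconsistencies(SEGMENTS).items():
        print(f"{source!r}: {len(targets)} renderings {targets}")
```

Grouping on exact source text only catches identical strings such as button labels. A term buried inside longer sentences needs a check driven by an agreed term list instead.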

So on any Turkmen software project of real size, I treat the bilingual glossary as the first and most valuable thing I produce — before the bulk of strings are touched. I lock down the high-frequency UI vocabulary, document why each choice was made (audience, register, the alternatives rejected), and run the rest of the translation against it. That documentation is what lets a second linguist, a reviewer, or next year's update maintain coherence instead of relitigating every term.
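One way to make each decision both documented and enforceable is to store it as a structured record and check translated segments against it. This is a minimal Python illustration, not any CAT tool's actual format: the GlossaryEntry fields, the check_segment function, and the sample decision (keep the English carry-over "online" for a developer-facing product, reject the Turkish borrowing "çevrimiçi") are all hypothetical. Targets are written as "... term ..." because only the glossary term inside them matters here.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class GlossaryEntry:
    source_term: str                # English UI term as it appears in source strings
    approved: str                   # the single rendering this project has chosen
    rationale: str                  # audience, register, why this direction won
    rejected: tuple[str, ...] = ()  # alternatives considered and ruled out


# Hypothetical decision for a developer-facing product.
GLOSSARY = [
    GlossaryEntry(
        source_term="online",
        approved="online",
        rationale="Developer audience: English carry-over is least ambiguous.",
        rejected=("çevrimiçi",),  # Turkish borrowing; reads as foreign here
    ),
]


def check_segment(source, target, glossary):
    """Return a list of terminology problems for one translated segment."""
    problems = []
    target_folded = target.casefold()
    for entry in glossary:
        # Whole-word, case-insensitive match on the English source side.
        pattern = r"\b" + re.escape(entry.source_term) + r"\b"
        if not re.search(pattern, source, flags=re.IGNORECASE):
            continue  # term absent from this source string; nothing to enforce
        # Substring match on the target side (see note below).
        if entry.approved.casefold() not in target_folded:
            problems.append(
                f"missing approved term {entry.approved!r} for {entry.source_term!r}"
            )
        for alt in entry.rejected:
            if alt.casefold() in target_folded:
                problems.append(f"rejected alternative {alt!r} used for {entry.source_term!r}")
    return problems


if __name__ == "__main__":
    print(check_segment("Server is online", "... online ...", GLOSSARY))
    print(check_segment("Server is online", "... çevrimiçi ...", GLOSSARY))
```

Substring rather than whole-word matching on the target side is a deliberate choice in this sketch: Turkmen is agglutinative, so an approved stem will usually appear with suffixes attached. The trade-off is occasional false matches inside unrelated words, which a reviewer still has to dismiss by hand.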

A glossary-first process is also where I'd push back on a common agency assumption: that Turkmen can be handled like any other target language in a high-volume, distribute-and-merge workflow. It can't, not safely. A 40,000-word job split across four random Turkmen freelancers without a shared, enforced glossary will come back as four dialects of one product. The scarcity of linguists that everyone complains about is real, but the coordination problem is the one that actually wrecks deliverables.

What this means for machine translation

There's understandable pressure to throw MT at low-resource languages now that model coverage has widened. For Turkmen, MT post-editing is genuinely useful for gist and for some structured, repetitive content. But it is actively dangerous for terminology, precisely because the engine has scraped the same inconsistent web that created the vacuum. It will happily give you the Russian carry-over in one segment and an English term in the next, because both appear in its training data and neither has been crowned. The model has no opinion, and the absence of an opinion is the whole problem.

So the human value isn't "fixing fluency" — the output is often passably fluent. It's enforcing a decision the machine is structurally incapable of making: this product calls it this, everywhere, for this reason. If you're scoping a Turkmen MTPE project, budget for terminology governance as a distinct line item, not as something folded into a discounted per-word rate. Otherwise you're paying for fast inconsistency.
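Part of that governance budget can go to a cheap diagnostic run on raw MT output before post-editing starts. The sketch below assumes a hand-built inventory of known variants for one concept, each tagged with the direction it comes from, and counts which directions the engine actually used. The variant inventory and the segments are hypothetical; the Russian form is written in Cyrillic to match the Russian examples above, and a real inventory would also need transliterated and suffixed spellings.

```python
from collections import Counter

# Hypothetical variant inventory for the concept "online",
# tagged by the direction each rendering comes from.
VARIANTS = {
    "online": "English carry-over",
    "çevrimiçi": "Turkish borrowing",
    "онлайн": "Russian carry-over",
}

# Hypothetical raw MT output for segments whose source contains "online".
MT_SEGMENTS = [
    "... online ...",
    "... çevrimiçi ...",
    "... онлайн ...",
    "... online ...",
]


def tally_variants(segments, variants):
    """Count, per direction, how many segments use a known variant."""
    counts = Counter()
    for segment in segments:
        folded = segment.casefold()
        for variant, origin in variants.items():
            if variant.casefold() in folded:
                counts[origin] += 1
    return counts


if __name__ == "__main__":
    for origin, n in tally_variants(MT_SEGMENTS, VARIANTS).most_common():
        print(f"{origin}: {n}")
```

A tally spread across several directions is the engine's lack of an opinion made measurable, and it gives a rough sense of how much post-editing effort will go into terminology rather than fluency.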

The broader lesson for anyone localizing into thin-resource Turkic languages: the work shifts from selection to standardization. You're not buying a translation of existing words; you're commissioning a small piece of the language's technical vocabulary, and then asking that it be applied with discipline. Treat that as the actual scope, and Turkmen localization stops being a gamble and starts being a controllable, repeatable process.