Research Models & Releases · arXiv cs.CL · 1d ago

TUDUM: A Turkish-Thinking Reasoning Pipeline for Qwen3.5-27B

Researchers have developed TUDUM, a fine-tuning pipeline that forces reasoning models to think in Turkish rather than defaulting to English-language internal scratchpads. Starting from Qwen3.5-27B, the approach applies supervised fine-tuning on 16K Turkish reasoning examples via LoRA, then reinforcement learning to lock in Turkish-language chain-of-thought behavior. This addresses a real gap in multilingual LLM reasoning: most thinking models translate non-English prompts into English internally, undermining transparency and potentially degrading reasoning quality for non-English speakers. The work signals growing attention to making reasoning traces themselves culturally and linguistically native, not just final outputs.
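The reinforcement learning stage can be illustrated with a minimal sketch. The summary does not say how the reward is defined, so everything here is an assumption for illustration: the `<think>` tag format, the tiny word lists, and the helper names `turkish_score` and `group_relative_advantages` are hypothetical. The mentions list names GRPO, so the second function shows the usual group-relative normalization of rewards, not the authors' actual implementation.

```python
import re
import statistics

# Hypothetical function-word lists; a crude proxy, not a real language detector.
TR_WORDS = {"ve", "bir", "bu", "için", "ile", "çünkü", "ama", "değil", "sonra"}
EN_WORDS = {"the", "and", "is", "of", "to", "so", "we", "then", "because", "this"}

# Assumption: the reasoning trace is wrapped in <think>...</think>.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def turkish_score(completion: str) -> float:
    """Share of recognized function words in the trace that are Turkish (0.0 to 1.0)."""
    match = THINK_RE.search(completion)
    if match is None:
        return 0.0  # no trace found, so no reward
    words = re.findall(r"[^\W\d_]+", match.group(1).lower())
    tr_hits = sum(w in TR_WORDS for w in words)
    en_hits = sum(w in EN_WORDS for w in words)
    total = tr_hits + en_hits
    return tr_hits / total if total else 0.0


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize rewards within one sampled group: (r - mean) / std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std, chosen here for simplicity
    if std == 0:
        return [0.0] * len(rewards)  # identical rewards carry no learning signal
    return [(r - mean) / std for r in rewards]


if __name__ == "__main__":
    group = [
        "<think>Bu bir toplama sorusu, çünkü iki sayı var ve sonra toplam 5 olur.</think> 5",
        "<think>This is an addition, so we add the two numbers and then get 5.</think> 5",
    ]
    rewards = [turkish_score(c) for c in group]
    print(rewards, group_relative_advantages(rewards))
```

In this toy setup, two completions with the same final answer receive different advantages purely because of the language of the trace, which is the behavior such a stage would try to reinforce. A real pipeline would likely combine a language signal with a correctness reward and use a proper language identifier.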

Modelwire context

Explainer

The key insight is not only that a model can be fine-tuned to reason in Turkish, but that most reasoning models default to English for their internal reasoning regardless of input language. The bias is easy to miss because it sits in the trace rather than in the answer: the scratchpad stays in English even when the user and task are entirely non-English, which undermines transparency and may degrade reasoning quality.

This work sits at the intersection of two recent threads in our coverage. The Graph-PRefLexOR paper from last week emphasized that reasoning chains must be traceable and inspectable to matter in high-stakes domains. TUDUM extends that logic to multilingual contexts: if the reasoning trace is opaque because it's in a language the user can't verify, interpretability collapses. Separately, YOMI-Bench exposed how character-level and morphological gaps persist in non-Latin scripts, suggesting that language-specific tuning hasn't solved structural multilingual challenges. TUDUM suggests the problem runs deeper than tokenization or vocabulary; it's about where the model's internal cognition actually happens.

If Qwen or another vendor ships a multilingual reasoning model that lets users select the internal reasoning language at inference time (rather than only through fine-tuning), and performance holds across Turkish, Arabic, and Mandarin without separate checkpoints, that would confirm this is a replicable pattern rather than a one-off fix. If no such model ships within 12 months, the finding likely stays academic.

Coverage we drew on

Graph-Native Reinforcement Learning Enables Traceable Scientific Hypothesis Generation through Conceptual Recombination · arXiv cs.CL

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: Qwen3.5-27B · TUDUM · Unsloth · LoRA · GRPO · Turkish

