Research Models & Releases · arXiv cs.CL · Apr 22

Parallel-SFT: Improving Zero-Shot Cross-Programming-Language Transfer for Code RL

Researchers propose Parallel-SFT, a supervised fine-tuning strategy that lets Llama-3.1 transfer coding skills across programming languages without degradation. Standard RL training on code tasks in one language hurts performance in others; the new method relies on a more general SFT initialization to unlock zero-shot cross-language transfer.

Modelwire context

Explainer

The more important finding here isn't the solution but the diagnosis: standard reinforcement learning on code tasks creates a kind of language-specific overfitting that erodes whatever multilingual coding competence the base model already had. Parallel-SFT is an initialization strategy designed to prevent that regression, not a new training objective.
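
One way to make that diagnosis concrete is to compare per-language pass rates before and after RL on a single training language. The sketch below is illustrative only and is not the paper's evaluation code: the function names, the language set, and every pass rate are hypothetical placeholders chosen to show the bookkeeping.

```python
from typing import Dict, List


def transfer_deltas(
    before: Dict[str, float], after: Dict[str, float], train_lang: str
) -> Dict[str, float]:
    """Change in pass rate for each language other than the RL training language."""
    return {
        lang: round(after[lang] - before[lang], 4)
        for lang in before
        if lang != train_lang
    }


def regressed_languages(deltas: Dict[str, float], tolerance: float = 0.0) -> List[str]:
    """Languages whose pass rate dropped by more than `tolerance`."""
    return sorted(lang for lang, delta in deltas.items() if delta < -tolerance)


# Placeholder pass rates, invented for illustration (NOT results from the paper).
before_rl = {"python": 0.40, "java": 0.35, "rust": 0.20}
after_rl = {"python": 0.55, "java": 0.30, "rust": 0.21}

deltas = transfer_deltas(before_rl, after_rl, train_lang="python")
regressed = regressed_languages(deltas)
print(deltas)
print(regressed)
```

With these made-up numbers, the training language improves while one held-out language falls below its starting point, which is the regression pattern the explainer describes. A real comparison would use held-out benchmark scores for each target language.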

This connects directly to the generalization work we covered in 'Generalization in LLM Problem Solving: The Case of the Shortest Path' (April 16), where models showed strong transfer within a domain but broke down when problem structure changed slightly. The pattern is consistent: RL-style optimization tends to exploit the specific distribution it trains on rather than reinforcing general reasoning. What differs here is the domain, programming languages rather than spatial planning, and that the researchers offer a concrete mitigation rather than only a diagnosis. The competitive context from 'OpenAI's big Codex update' (April 16) is also relevant: if coding agents are going to operate across polyglot codebases, cross-language transfer is more than an academic concern.

Watch whether the Parallel-SFT initialization approach holds when applied to larger RL runs or to models already fine-tuned on multilingual code corpora. If the gains collapse at scale, the method is solving a problem specific to Llama-3.1's base training distribution rather than a general RL failure mode.

Coverage we drew on

Generalization in LLM Problem Solving: The Case of the Shortest Path · arXiv cs.LG

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: Llama-3.1 · Parallel-SFT

Read the full story at arXiv cs.CL (arxiv.org).

