Models & Releases · Tools & Code · Simon Willison · 8h ago

Quoting Georgi Gerganov

Georgi Gerganov, creator of llama.cpp and a core maintainer at ggml-org, reports sustained success running Qwen3.6-27B as a coding assistant on consumer hardware (an M2 Ultra Mac and an RTX 5090 GPU) for more than a month. His experience suggests that mid-size open models can now handle routine development tasks without cloud infrastructure, which shifts the economics of AI-assisted coding toward local inference. It fits a broader trend of developers choosing on-device models over API-dependent alternatives, and it changes how professionals weigh model size, latency, and privacy in their workflows.

Modelwire context

Analyst take

The signal here isn't that local inference works in theory; it's that a core infrastructure maintainer (someone who builds the tooling others depend on) has adopted it as his primary workflow for over a month. That is a practitioner endorsement with skin in the game, and it carries different weight than a benchmark post.

We have no prior coverage in our archive to anchor this to. It belongs to a broader conversation across developer communities about the crossover point where open, locally run models become good enough to replace API calls for routine tasks. The relevant context is rapid capability improvement in the 20B to 30B parameter range, where models are large enough to handle complex code but small enough to fit on prosumer hardware. Gerganov's position at ggml-org makes this particularly notable: the people maintaining the inference stack are now also the people validating it in their daily work.
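To make the "small enough to fit on prosumer hardware" point concrete, here is a rough, weights-only memory estimate for a 27B-parameter model. It is a back-of-the-envelope sketch, not a measurement: it assumes a dense model whose parameter count matches its name, the bit-widths are illustrative quantization levels (4.5 bits per weight is an assumed average for a typical 4-bit scheme including scale factors), the 32 GB figure is the RTX 5090's advertised VRAM, and KV cache and runtime overhead are ignored, so real usage will be higher.

```python
# Back-of-the-envelope, weights-only memory estimate for a dense model.
# Assumptions (not from the source): the model is dense with ~27B parameters
# (read off its name), the bit-widths below are illustrative quantization
# levels, and KV cache plus runtime overhead are ignored entirely.

def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


PARAMS_B = 27     # from the "27B" in the model name
GPU_VRAM_GB = 32  # assumed: RTX 5090 advertised VRAM

# 4.5 bits/weight is an assumed average for a typical 4-bit scheme with scales.
for label, bits in [("16-bit", 16.0), ("8-bit", 8.0), ("~4-bit", 4.5)]:
    gb = weight_memory_gb(PARAMS_B, bits)
    headroom = GPU_VRAM_GB - gb  # negative means the weights alone overflow VRAM
    print(f"{label:>7}: {gb:5.1f} GB of weights, {headroom:+6.1f} GB headroom")
```

Under these assumptions, the 16-bit weights alone (about 54 GB) overflow a 32 GB card, while a roughly 4-bit quantization brings them to about 15 GB and leaves room for context. That gap is the arithmetic behind the claim that this size class now fits on a single high-end consumer GPU.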

Watch whether other maintainers or prominent open-source contributors publicly shift their primary coding workflows to local models in the next 60 days. If they do, that would signal a tipping point in developer trust, not just in hardware capability.

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: Georgi Gerganov · Qwen3.6-27B · ggml-org · Simon Willison · M2 Ultra · RTX 5090

Read the full story at Simon Willison's blog (simonwillison.net).

Modelwire Editorial

This synthesis and analysis was prepared by the Modelwire editorial team. We use advanced language models to read, ground, and connect the day’s most significant AI developments, providing original strategic context that helps practitioners and leaders stay ahead of the frontier.


Modelwire summarizes; we don’t republish. The full content lives on simonwillison.net. If you’re a publisher and want a different summarization policy for your work, see our takedown page.