Stories from Simon Willison

Executives building billion-dollar AI strategies without using AI tools

Nik Suresh documents a pattern at large enterprises: AI strategy decisions made without actual product knowledge or technical grounding. The piece surfaces a structural risk in corporate strategy: executives are architecting billion-dollar AI initiatives without hands-on familiarity with the tools themselves, suggesting organizational momentum is outpacing competence. AI adoption has become a business imperative independent of strategic clarity, creating misalignment between technical execution and leadership vision across Fortune 500 firms.

Simon Willison·2h ago

77

Tools & Code Products & Apps

Anthropic switches Claude Code to Rust-based Bun runtime

Anthropic's Claude Code now runs on a Rust-based port of Bun, the JavaScript runtime, marking a quiet infrastructure shift in the AI tooling ecosystem. The rewrite delivered a 10% startup speedup on Linux, though most users saw no visible change. This reflects a broader trend of AI companies optimizing developer-facing tools through systems-level rewrites, trading complexity for performance gains that compound across millions of invocations. For developers building on Claude, the change is largely transparent, but it signals how AI infrastructure is maturing beyond raw model capability into production-grade tooling.

Simon Willison·4h ago

64

Products & Apps Business & Funding

Anthropic makes Claude Fable 5 permanent across paid plans

Anthropic is moving Claude Fable 5 from limited beta to permanent availability across its subscription tiers, signaling confidence in the model's maturity while responding to competitive pressure from OpenAI's GPT-5.6 Sol. Max and Team Premium subscribers gain full access at half standard limits, while Pro users retain credit-based access plus a $100 one-time credit. The shift reflects a consolidation strategy where frontier models transition from experimental offerings to core product infrastructure, reshaping how capability tiers differentiate across pricing.

Simon Willison·1d ago

77

Models & Releases Opinion & Analysis

Kimi K3 deflects system prompt extraction with evasive response

Kimi K3 answered a system prompt extraction attempt with an evasive quip, illustrating how frontier models now handle adversarial probing. The incident surfaces a recurring tension in LLM deployment: balancing transparency with security. As models become more capable at reasoning and self-preservation, researchers and safety teams face harder questions about what constitutes appropriate guardrail behavior versus concerning autonomy. This moment matters less for the quip itself than for what it signals about model training priorities and the evolving cat-and-mouse game between jailbreak attempts and defensive measures.

Simon Willison·1d ago

52

Tools & Code Opinion & Analysis

Willison releases tool to flag common LLM writing patterns

Simon Willison released a browser tool that identifies ten recurring patterns in LLM-generated text, addressing a growing frustration with formulaic AI writing. The highlighter targets recognizable tics like "no fluff, no filler" phrasing that signal machine authorship. This reflects a maturing awareness within the AI community about detectability of synthetic content and the aesthetic fatigue around predictable model outputs. The tool itself is modest, but it signals a shift toward meta-commentary on LLM behavior and the emergence of practical utilities for spotting AI-generated material in the wild.
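The idea of flagging recurring phrases can be illustrated with a few regular expressions. This is a minimal sketch, not Willison's implementation: the function name and the second pattern are hypothetical, and only the "no fluff, no filler" phrase comes from the summary above.

```python
import re

# Hypothetical pattern list: only "no fluff, no filler" is taken from the summary;
# the second entry is an invented example of a similar tic.
PATTERNS = {
    "no-fluff": re.compile(r"\bno fluff, no filler\b", re.IGNORECASE),
    "not-just-x-but-y": re.compile(r"\bnot just\b[^.]*\bbut\b", re.IGNORECASE),
}


def flag_patterns(text):
    """Return (label, matched_text) pairs for every pattern hit, in text order."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), label, match.group(0)))
    # Sort by position so the output follows the order of the input text.
    return [(label, matched) for _, label, matched in sorted(hits)]
```

A browser highlighter would wrap each match in a styled element rather than return tuples; the matching step is the same.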

Simon Willison·1d ago

64

Hardware & Infra Opinion & Analysis

AI water footprint collides with regional scarcity in desert regions

Simon Willison highlights a structural tension in AI infrastructure scaling: hyperscalers face mounting pressure over data center water consumption at a moment when alternative land uses compete for the same scarce resource. His analysis quantifies Google's 2025 water footprint against Coachella Valley golf course usage, proposing a provocative but mathematically grounded thought experiment. The piece surfaces a real landscape-level constraint on AI compute expansion that policy makers and infrastructure planners are beginning to confront, particularly in water-stressed regions where both cloud providers and legacy industries draw from finite aquifers.

Simon Willison·2d ago

72

Models & Releases

Moonshot AI releases Kimi K3, largest open-weight model at 2.8 trillion parameters

Moonshot AI's Kimi K3 marks a significant scaling milestone: at 2.8 trillion parameters, it becomes the largest open-weight model announced to date, surpassing DeepSeek's 1.6T offering. The model's self-reported benchmarks show competitive performance against frontier closed models like Claude Opus and GPT-5.5, though it trails the latest Claude Fable 5 and GPT-5.6. The July 27 open-weight release signals intensifying competition in the 3T-class tier, where Chinese labs are rapidly closing the capability gap with US incumbents. For practitioners, this represents both expanded inference options and a test case for whether scale alone sustains competitive advantage.

Simon Willison·2d ago

89

Models & Releases Research

GPT-5.6 file deletion bug traced to unsafe sandbox bypass

OpenAI's Thibault Sottiaux disclosed a critical bug in GPT-5.6 in which the model deletes user files under a specific combination of conditions: full access mode is running without sandboxing, the model attempts to redefine the HOME environment variable, and it subsequently overwrites HOME itself instead of a temporary directory. The incident exposes a gap between capability and safety guardrails in code-execution contexts. For practitioners deploying models with file system access, this underscores the necessity of layered protections beyond model-level safeguards, particularly when sandboxing is disabled or auto-review mechanisms are bypassed.
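One layered protection of the kind described here is to give child processes a scratch HOME through their own environment and to refuse any recursive delete outside that scratch directory. The sketch below is an illustration under those assumptions, not OpenAI's fix; `make_scratch_env` and `safe_cleanup` are hypothetical helper names.

```python
import os
import shutil
import tempfile
from pathlib import Path


def make_scratch_env():
    """Create a scratch directory and a child environment whose HOME points at it.

    The parent process's own HOME is never modified.
    """
    scratch = Path(tempfile.mkdtemp(prefix="agent-scratch-")).resolve()
    env = dict(os.environ, HOME=str(scratch))
    return scratch, env


def safe_cleanup(target, scratch_root):
    """Delete target only if it is the scratch root or lies inside it."""
    target = Path(target).resolve()
    scratch_root = Path(scratch_root).resolve()
    if target != scratch_root and scratch_root not in target.parents:
        raise ValueError(f"refusing to delete {target}: outside {scratch_root}")
    shutil.rmtree(target)
```

The returned `env` would be passed to `subprocess.run(..., env=env)`, so a command that cleans up "HOME" can only ever reach the scratch directory.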

Simon Willison·2d ago

77

Models & Releases Tools & Code

Thinking Machines Lab releases 975B open-weights model Inkling

Thinking Machines Lab, led by Mira Murati, has released Inkling, a 975B-parameter mixture-of-experts model under Apache 2.0 licensing. The multimodal system trained on 45 trillion tokens represents a significant open-weights entry from a new lab, challenging the concentration of frontier model releases among established players. A smaller 276B variant is forthcoming, signaling a tiered release strategy. The notably sparse model card raises questions about documentation standards in the open-weights ecosystem, even as the release itself expands accessible frontier-scale infrastructure.

Simon Willison·2d ago

89

Tools & Code Products & Apps

Willison uses Claude to port Mermaid ASCII converter to WebAssembly

Simon Willison has expanded his Mermaid diagram conversion toolkit by compiling an older Go library into WebAssembly using Claude Fable 5, enabling browser-based rendering with color support. This represents a practical workflow for converting LLM-friendly diagram syntax into terminal-compatible ASCII art, addressing a gap between modern diagramming tools and legacy systems. The move signals how AI-assisted compilation is enabling developers to resurrect and adapt older codebases for contemporary web environments, particularly useful for teams bridging visual design and infrastructure automation.

Simon Willison·2d ago

64

Opinion & Analysis Policy & Regulation

Linus Torvalds commits Linux to AI tooling integration

Linus Torvalds has publicly committed Linux to AI tooling adoption, rejecting calls from within the open-source community to maintain an anti-AI stance. His position signals that major infrastructure projects will integrate machine learning rather than resist it, forcing a reckoning for purist factions in open source. This matters because Linux's stance shapes downstream decisions across the entire ecosystem: if the kernel's lead maintainer embraces AI-assisted development, corporate and individual contributors face pressure to follow. The broader implication is that the AI-skeptic position, once viable in technical communities, is becoming untenable at scale.

Simon Willison·2d ago

77

Tools & Code

Willison extracts Grok's Mermaid renderer, ports to browser WebAssembly

Simon Willison extracted a Rust-based Mermaid diagram renderer from Grok's open-sourced codebase and ported it to WebAssembly for browser use. The tool converts Mermaid syntax into Unicode box art, enabling terminal-friendly diagram rendering without external dependencies. This exemplifies how AI agent infrastructure components are being extracted, repurposed, and made accessible to developers as reusable building blocks. The move signals growing modularity in AI tooling ecosystems and highlights how open-sourced agent codebases become raw material for downstream developer tools.

Simon Willison·3d ago

64

Tools & Code Policy & Regulation

xAI's grok CLI exposed user data before going open source

xAI's grok CLI tool triggered a security crisis after users discovered it was uploading entire directories to Google Cloud without explicit consent or warning. The incident exposed a fundamental trust violation in developer tooling: a utility designed to streamline workflows became a vector for exfiltrating sensitive data including SSH keys, credentials, and personal files. The subsequent open-sourcing appears to be damage control, but the episode underscores how rapidly AI infrastructure tools can erode user confidence when privacy safeguards are absent or opaque. For teams evaluating xAI integrations, this signals the need for rigorous vetting of CLI tools and data handling practices.

Simon Willison·3d ago

84

Research Policy & Regulation

Researcher finds exfiltration hole in Claude's web fetch safeguards

Security researcher Ayush Paul identified a vulnerability in Claude's web_fetch tool that enables data exfiltration attacks, circumventing safeguards designed to prevent hostile instructions from triggering unauthorized data access. The flaw exposes a critical gap in how AI systems isolate private user memories from tool-mediated internet access, creating a pathway for attackers to weaponize Claude's own interaction history against users. This finding underscores the difficulty of securing multi-tool LLM architectures where data compartmentalization assumptions break down under coordinated attack scenarios.

Simon Willison·3d ago

84

Products & Apps Tools & Code

OpenAI's Codex Desktop adds customizable AI pet companions

OpenAI's Codex Desktop now supports customizable animated desktop pets, a feature that quietly launched in May but gained visibility through Simon Willison's experimentation. Users can create personalized AI companions that provide task updates and notifications, blending productivity tooling with playful interface design. This represents a shift toward more human-centric AI interaction patterns, moving beyond purely functional interfaces toward ambient, personality-driven assistants that inhabit the workspace. The feature signals OpenAI's interest in making AI feel less utilitarian and more at home in daily developer workflows.

Simon Willison·4d ago

64

Tools & Code Products & Apps

GPT-5.6 Sol powers SQL-based game engine prototype

Peter Gostev used GPT-5.6 Sol to build DOOMQL, a proof-of-concept that replaces traditional game engines with SQLite as the core runtime. The project demonstrates how modern LLMs can tackle unconventional architectural problems: SQL queries now handle collision detection, enemy AI, rendering, and player movement in a playable Doom-like terminal game. This signals a shift in how developers think about LLM-assisted systems design, moving beyond chatbots toward using AI to explore radical reimaginings of foundational software patterns.
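To make "SQL handles collision detection" concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `walls` and `player` tables and the `move_player` helper are hypothetical and are not taken from DOOMQL; the point is only that a single UPDATE can both move the player and reject moves into a wall.

```python
import sqlite3

# Hypothetical grid world: a column of wall cells at x = 1 and a player at (0, 1).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE walls (x INTEGER, y INTEGER, PRIMARY KEY (x, y));
    CREATE TABLE player (id INTEGER PRIMARY KEY, x INTEGER, y INTEGER);
    INSERT INTO walls VALUES (1, 0), (1, 1), (1, 2);
    INSERT INTO player VALUES (1, 0, 1);
""")


def move_player(conn, dx, dy):
    """Move the player by (dx, dy) unless the destination cell is a wall."""
    conn.execute(
        """
        UPDATE player
        SET x = x + :dx, y = y + :dy
        WHERE id = 1
          AND NOT EXISTS (
              SELECT 1 FROM walls
              WHERE walls.x = player.x + :dx AND walls.y = player.y + :dy
          )
        """,
        {"dx": dx, "dy": dy},
    )
    # Return the position after the (possibly rejected) move.
    return conn.execute("SELECT x, y FROM player WHERE id = 1").fetchone()
```

Calling `move_player(conn, 1, 0)` from the starting position leaves the player at `(0, 1)`, because the cell `(1, 1)` is a wall and the `NOT EXISTS` guard filters the row out of the update.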

Simon Willison·5d ago

72

Opinion & Analysis Tools & Code

Willison measures AI coding agent impact via Datasette commit history

Simon Willison used GitHub's code-frequency metrics to measure the tangible impact of advanced AI coding agents and Opus 4.5-class models on his own development velocity. By analyzing commit patterns in Datasette across eight years, Willison created a real-world case study showing how frontier LLMs are reshaping individual developer productivity. The observation is anecdotal but credible: one concrete data point from a respected open-source maintainer on how AI tooling is accelerating code output, and a practical lens on AI's near-term economic impact beyond benchmark claims.
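A comparable analysis can be sketched from GitHub's REST endpoint `GET /repos/{owner}/{repo}/stats/code_frequency`, which returns weekly rows of `[unix_timestamp, additions, deletions]` with deletions as negative numbers. The example below assumes that shape and aggregates by year; the sample rows are invented for illustration and are not Datasette's figures, and this is not necessarily how Willison produced his chart.

```python
from collections import defaultdict
from datetime import datetime, timezone


def lines_per_year(weekly_rows):
    """Sum weekly additions and deletions into per-year totals."""
    totals = defaultdict(lambda: {"additions": 0, "deletions": 0})
    for timestamp, additions, deletions in weekly_rows:
        year = datetime.fromtimestamp(timestamp, tz=timezone.utc).year
        totals[year]["additions"] += additions
        # The endpoint reports deletions as negative counts; store magnitudes.
        totals[year]["deletions"] += abs(deletions)
    return dict(totals)


# Invented sample rows in the endpoint's [timestamp, additions, deletions] shape.
SAMPLE = [
    [1704067200, 120, -40],   # week of 2024-01-01 (UTC)
    [1704672000, 80, -10],    # week of 2024-01-08 (UTC)
    [1735689600, 900, -300],  # week of 2025-01-01 (UTC)
]
```

Comparing per-year totals before and after a tooling change gives a rough view of output volume, though line counts say nothing about quality.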

Simon Willison·5d ago

72

Opinion & Analysis

Why AI agents should never own project accountability

Simon Willison examines how the concept of Directly Responsible Individuals, borrowed from Apple's organizational playbook, applies to LLM-powered agents in human teams. The core tension: as autonomous agents become more capable, organizations must clarify accountability structures. Willison argues that ultimate responsibility for project outcomes should remain with humans, not systems, raising fundamental questions about how AI agents integrate into corporate hierarchies and governance. This matters for anyone building or deploying agent-based workflows, as it challenges the assumption that automation can fully replace human accountability.

Simon Willison·6d ago

72

Products & Apps Business & Funding

Anthropic delays Fable 5 retirement again as GPT-5.6 Sol reshapes tier strategy

Anthropic has extended Claude Fable 5's availability on paid tiers through July 19, marking the second postponement of the model's planned deprecation. The extension reflects competitive pressure from GPT-5.6 Sol, which Anthropic classifies as a comparable Fable/Mythos-tier model. This signals that Anthropic views compute allocation as flexible when market positioning demands it, and that older-generation models retain strategic value in a crowded landscape where users expect model choice within subscription tiers. The pattern suggests Anthropic is managing customer expectations around model lifecycle more cautiously than initially planned.

Simon Willison·6d ago

64

Opinion & Analysis Hardware & Infra

AR glasses need cloud processing or bulk batteries, Patel argues

Nilay Patel articulates a hard constraint on near-term AR glasses design: real-time visual processing at the edge remains computationally infeasible within power budgets. Current architectures force a choice between cloud offloading (with latency and privacy tradeoffs) or bulky local compute (the Vision Pro approach). This framing matters because it exposes why consumer AR remains stuck despite years of hype, and signals that breakthroughs in on-device inference efficiency or neural compression are prerequisites, not nice-to-haves, for the category to scale.

Simon Willison·Jul 10

72

Models & Releases

OpenAI launches GPT-5.6 trio with aggressive pricing on agentic tasks

OpenAI's GPT-5.6 family arrives with three tiers, Luna through Sol, establishing a new pricing floor that undercuts Claude Opus on input costs while maintaining premium output pricing. The strategic move targets agentic workloads, where reasoning token variance now matters more than raw per-token rates. Benchmarks claim across-the-board wins over Claude Fable 5, signaling OpenAI's confidence in long-horizon task execution. For practitioners, this reshapes cost-benefit calculus for production deployments, particularly for autonomous systems where token efficiency and reasoning depth diverge.

Simon Willison·Jul 9

97

Models & Releases Products & Apps

Meta releases Muse Spark 1.1 API with agentic tool calling focus

Meta's Muse Spark 1.1 marks the first API release in the Spark model family, positioning the company as a serious contender in the agentic AI space. The update emphasizes tool calling and computer use capabilities, areas where frontier models are racing to improve. Simon Willison's hands-on preview and immediate plugin development signal developer interest, while the detailed evaluation report suggests Meta is backing claims with rigorous benchmarking. This move matters because it opens Meta's models to production workloads where agentic reasoning and API access are table stakes.

Simon Willison·Jul 9

77

Tools & Code Products & Apps

Willison's LLM tool gains Meta muse-spark-1.1 support

Simon Willison has released llm-meta-ai 0.1, a plugin that extends his popular LLM command-line tool to support Meta's newly announced muse-spark-1.1 model. This move signals growing developer adoption of Meta's inference API and reflects the fragmentation of the LLM tooling ecosystem as multiple model providers compete for integration into existing workflows. For developers already invested in the LLM CLI, this removes friction in experimenting with Meta's latest offering alongside other model providers.

Simon Willison·Jul 9

64

Tools & Code

llm CLI fixes tool-call JSON bug across OpenAI providers

Simon Willison's llm CLI tool reached version 0.31.1 with a targeted fix addressing a JSON serialization failure in OpenAI's Chat Completions API when tool calls contain empty arguments. The bug surfaced during testing of the llm-meta-ai plugin, highlighting a real friction point in multi-provider LLM tooling where edge cases in function-calling protocols diverge across implementations. For developers building LLM applications that rely on structured tool use, this fix removes a silent failure mode that could corrupt inference pipelines.
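The class of bug described here, an empty arguments string where a JSON object is expected, can be guarded against by normalizing at both ends. This is a minimal sketch of that idea and not the actual change in llm 0.31.1; both helper names are hypothetical.

```python
import json


def parse_tool_arguments(raw):
    """Parse a tool call's arguments string, treating empty input as no arguments."""
    if raw is None or not raw.strip():
        # An empty string is not valid JSON, so json.loads("") would raise.
        return {}
    return json.loads(raw)


def serialize_tool_arguments(arguments):
    """Serialize arguments for a request, always emitting a valid JSON object."""
    return json.dumps(arguments or {})
```

With these helpers a tool that takes no parameters round-trips as `"{}"` rather than as an empty string, whichever provider produced the call.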

Simon Willison·Jul 9

64

Tools & Code Opinion & Analysis

Bun's Rust rewrite reveals agentic engineering at infrastructure scale

Jarred Sumner's rewrite of Bun from Zig to Rust showcases sophisticated AI-assisted engineering at scale. The blog post details how agentic workflows, iterative testing, and adversarial review shaped a complex systems rewrite, offering a rare window into how LLMs are now embedded in infrastructure development. This signals a maturation of AI-driven code generation beyond toy projects, with implications for how runtime and tooling projects approach large refactors and language selection decisions.

Simon Willison·Jul 8

77

Products & Apps Models & Releases

OpenAI deploys GPT-Live with intelligent task delegation to GPT-5.5

OpenAI has deployed GPT-Live, a new voice-mode model that marks a significant shift in real-time conversational AI. The system intelligently routes complex queries (web search, reasoning-heavy tasks) to GPT-5.5 while maintaining conversational flow, allowing the lighter model to keep users engaged during backend processing. This hybrid architecture signals a practical approach to balancing latency and capability, letting frontier models handle depth without sacrificing the responsiveness users expect from voice interaction. The rollout reflects growing sophistication in model orchestration for consumer applications.

Simon Willison·Jul 8

89

Opinion & Analysis Tools & Code

Kenton Varda bans AI-written commit messages from his team

Kenton Varda, a prominent infrastructure engineer, has banned AI-generated change descriptions from his team's workflow, citing a fundamental mismatch between what LLMs produce and what developers need during code review. The core problem: AI summaries enumerate surface-level code details already visible in diffs while omitting the strategic intent and architectural reasoning that make reviews meaningful. This signals a growing friction point in AI-assisted development, where automation excels at transcription but fails at abstraction, forcing teams to choose between tool overhead and review quality.

Simon Willison·Jul 8

72

Tools & Code Products & Apps

GPT-5.5 generates working web component from conversational prompt

Simon Willison demonstrated GPT-5.5's capability to generate functional web components from minimal prompts, building a tool that embeds GitHub code snippets via a single HTML tag. The experiment surfaces a practical shift in how developers may prototype web tooling: LLMs can now produce working components from conversational specifications, reducing boilerplate friction. This reflects broader maturation in code generation where model outputs require less post-hoc refinement, signaling that AI-assisted development workflows are moving beyond proof-of-concept into everyday utility.

Simon Willison·Jul 7

72

Tools & Code Opinion & Analysis

Claude Fable 5 shapes sqlite-utils 4.0 release cycle

Simon Willison's sqlite-utils 4.0rc4 release incorporates detailed feedback from Claude Fable 5, marking a shift in how open-source tooling incorporates AI-driven code review. The final release candidate demonstrates a practical workflow where LLMs contribute substantive architectural guidance to widely-used data infrastructure. This signals growing confidence in AI systems for deep technical review of production tools, with implications for how maintainers validate quality at scale.

Simon Willison·Jul 7

64

Models & Releases

Tencent releases Hy3, a 295B MoE model rivaling larger open-source competitors

Tencent's Hy3 represents a significant efficiency play in the open-weights model space. The 295B-parameter MoE architecture achieves competitive performance with only 21B active parameters, positioning it as a cost-effective alternative to larger flagship models while matching their capabilities on productivity tasks. The Apache 2.0 license and scale-up from a 50-product feedback loop signal Tencent's commitment to competing in the open-source ecosystem, particularly relevant as Chinese AI labs increasingly release production-grade models that challenge Western dominance in accessible, efficient inference.
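The efficiency claim rests on simple arithmetic: in a mixture-of-experts model only the active parameters take part in each token's forward pass. The sketch below computes that share from the two figures in the summary; treating per-token compute as roughly proportional to active parameters is a common approximation, not a claim from Tencent.

```python
def active_fraction(total_params_b, active_params_b):
    """Return the share of parameters used per token in an MoE model."""
    return active_params_b / total_params_b


# Figures from the summary: 295B total parameters, 21B active per token.
hy3_share = active_fraction(295, 21)
print(f"Hy3 active share: {hy3_share:.1%}")  # prints "Hy3 active share: 7.1%"
```

The full 295B parameters still have to be held in memory for inference, so the saving applies to compute per token rather than to hardware footprint.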

Simon Willison·Jul 6

84