Molecular Representations for Large Language Models

Researchers have addressed a critical gap in LLM chemistry workflows by introducing MolJSON, a purpose-built molecular representation format, and benchmarking it against five incumbent standards across multiple frontier models. The work matters because chemistry-focused LLM systems depend on reliable molecular encoding, yet the field has defaulted to SMILES and IUPAC names without rigorous comparative validation. The evaluation spans GPT-5 variants and Claude Haiku 4.5 over more than 78K test cases and establishes which representations yield the highest accuracy on translation and structure-reasoning tasks. That result directly informs how labs architect chemistry agents and whether domain-specific tokenization strategies outperform generic text formats.
Modelwire context
The deeper issue MolJSON surfaces is tokenization mismatch: SMILES strings were designed for database lookup, not for transformer attention patterns, so models trained on general text corpora have been asked to reason about a notation that fragments molecules in ways that obscure chemical meaning. The benchmark's 78K test cases make this a statistical argument, not an anecdote.
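To make the fragmentation point concrete, here is a minimal Python sketch that converts a restricted SMILES string into an explicit atom/bond graph. It assumes a tiny SMILES subset (uppercase organic atoms, Cl and Br, branches, single-digit ring closures, "=" and "#" bonds). The function name `parse_smiles_subset` and the `atoms`/`bonds`/`order` JSON schema are hypothetical and are not MolJSON's actual format, which this summary does not describe. In cyclohexanone, O=C1CCCCC1, the bond that closes the ring is encoded only by two matching "1" digits six characters apart. A graph representation lists that same bond as one explicit entry.

```python
import json


def parse_smiles_subset(smiles: str) -> dict:
    """Convert a restricted SMILES string into an explicit atom/bond graph.

    Hypothetical illustration only: supports uppercase organic-subset atoms
    (B, C, N, O, P, S, F, I), the halogens Cl and Br, branches "()",
    single-digit ring closures, and "=" / "#" bonds. No aromatic atoms,
    brackets, charges, isotopes, or stereochemistry.
    """
    atoms = []          # list of {"id", "element"}
    bonds = []          # list of {"a", "b", "order"}
    branch_stack = []   # atom to return to when ")" closes a branch
    open_rings = {}     # ring-closure digit -> (atom index, bond order)
    prev = None         # atom the next atom or ring digit attaches to
    order = 1           # bond order pending for the next bond
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "(":
            branch_stack.append(prev)
        elif ch == ")":
            if not branch_stack:
                raise ValueError("unmatched ')'")
            prev = branch_stack.pop()
        elif ch == "=":
            order = 2
        elif ch == "#":
            order = 3
        elif ch.isdigit():
            if prev is None:
                raise ValueError("ring-closure digit before any atom")
            digit = int(ch)
            if digit in open_rings:
                # Closing digit: this bond joins atoms that can be far apart in the string.
                start, start_order = open_rings.pop(digit)
                bonds.append({"a": start, "b": prev, "order": max(start_order, order)})
            else:
                open_rings[digit] = (prev, order)
            order = 1
        else:
            if smiles[i:i + 2] in ("Cl", "Br"):
                symbol = smiles[i:i + 2]
                i += 1  # consume the second letter of the halogen
            elif ch in "BCNOPSFI":
                symbol = ch
            else:
                raise ValueError(f"unsupported character {ch!r} at position {i}")
            idx = len(atoms)
            atoms.append({"id": idx, "element": symbol})
            if prev is not None:
                bonds.append({"a": prev, "b": idx, "order": order})
            prev = idx
            order = 1
        i += 1
    if open_rings or branch_stack:
        raise ValueError("unclosed ring or branch")
    return {"atoms": atoms, "bonds": bonds}


if __name__ == "__main__":
    # Cyclohexanone: the ring-closing bond appears in SMILES only as two "1" digits.
    graph = parse_smiles_subset("O=C1CCCCC1")
    print(json.dumps(graph["bonds"]))
```

For cyclohexanone, the ring-closing bond comes out as {"a": 1, "b": 6, "order": 1}, one explicit record. A model reading raw SMILES tokens has to reconstruct that link by matching the two digits.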
This connects most directly to the AutoMat paper from May 1st ('Can Coding Agents Reproduce Findings in Computational Materials Science?'), which exposed how LLM agents fail when confronted with unfamiliar scientific toolchains and underspecified domain procedures. Both papers are probing the same underlying problem from different angles: general-purpose models applied to specialized scientific domains carry hidden failure modes that generic benchmarks don't surface. The procedural execution study from the same week adds another layer, showing that accuracy degrades sharply as task complexity grows, which is precisely the regime chemistry reasoning occupies.
Watch whether chemistry agent frameworks like RXN for Chemistry or similar lab automation pipelines adopt MolJSON as a default input format within the next two quarters. Adoption there would confirm the format solves a real integration problem rather than an academic one.
This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.
Mentions: GPT-5-nano · GPT-5-mini · GPT-5 · Claude Haiku 4.5 · MolJSON · SMILES
Modelwire Editorial
This synthesis and analysis was prepared by the Modelwire editorial team. We use advanced language models to read, ground, and connect the day’s most significant AI developments, providing original strategic context that helps practitioners and leaders stay ahead of the frontier.
Modelwire summarizes, we don’t republish. The full content lives on arxiv.org. If you’re a publisher and want a different summarization policy for your work, see our takedown page.