Research Tools & Code · arXiv cs.CL · Apr 20

Automatic Slide Updating with User-Defined Dynamic Templates and Natural Language Instructions

Researchers introduced DynaSlide, a 20K-example benchmark for automatically updating presentation slides via natural language commands on custom templates. They also introduced SlideAgent, an agent framework that combines multimodal parsing and language models to handle real-world business reporting decks.

Modelwire context

Explainer

The harder problem here is not generating slides from scratch but editing existing ones while respecting user-defined template constraints. That focus on constrained editing is what separates DynaSlide from prior document-generation benchmarks. The 20K-example scale also suggests the researchers are targeting fine-tuning and evaluation use cases, not just prompting experiments.

SlideAgent sits in a cluster of agent frameworks that pair multimodal parsing with structured output tasks. The MM-WebAgent paper from April 16 tackled a closely related problem: maintaining visual and stylistic coherence across a generated document while an agent coordinates layout and content decisions. Both papers treat the document as a constrained artifact rather than a blank canvas, and that shared architectural choice is the one that matters. OpenAI's updated Agents SDK from April 15 provides the kind of native sandbox execution that frameworks like SlideAgent would need to run reliably in production, so the infrastructure layer is maturing alongside the task-specific research.

Watch whether the DynaSlide benchmark, or the SlideAgent approach built around it, gets adopted by any of the major office-suite integrations (Google Workspace or Microsoft 365 Copilot) within the next six months. Adoption there would confirm that the task formulation is practically useful; citation that stays confined to academic work would suggest the template-constraint problem is harder to productize than the paper implies.

Coverage we drew on

MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation · arXiv cs.CL

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: DynaSlide · SlideAgent
