From Prompt Injection to Persistent Control: Defending Agentic Harness Against Trojan Backdoors

Agentic LLM systems operating in persistent workspaces face a novel multi-stage attack vector where prompt injections embedded in files or tool outputs can be stored and executed later, creating trojan-like persistent control without triggering defenses designed to catch individual malicious steps. This research exposes a critical gap in agent security: existing safeguards inspect actions in isolation and miss the cumulative threat of seemingly benign write operations that enable later exploitation. As LLMs transition from chat interfaces to autonomous operational tools with file system and tool access, this attack class represents a material risk to enterprise deployments and underscores why agent sandboxing and cross-session state inspection require fundamental rethinking.
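To make the gap concrete, here is a minimal sketch of what cross-session state inspection could look like. It is not the paper's method. It assumes a hypothetical harness that can tell when untrusted input (a tool output, a fetched file) was in the agent's context at the time of a write, and every name in it (`ProvenanceLedger`, `record_write`, `check_read`, the `tainted` flag) is invented for illustration. The idea is that each write is logged to a ledger that outlives the session, so a later session can check where a file came from before acting on it.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class WriteRecord:
    path: str
    session_id: str  # kept for audit: which session produced the file
    sha256: str
    tainted: bool    # assumption: the harness knows untrusted input was in context


class ProvenanceLedger:
    """Hypothetical ledger of agent file writes, persisted across sessions."""

    def __init__(self, ledger_file: Path):
        self.ledger_file = ledger_file
        self.records = {}
        if ledger_file.exists():
            raw = json.loads(ledger_file.read_text())
            self.records = {p: WriteRecord(**r) for p, r in raw.items()}

    @staticmethod
    def _digest(content: str) -> str:
        return hashlib.sha256(content.encode("utf-8")).hexdigest()

    def record_write(self, path: str, content: str, session_id: str, tainted: bool) -> None:
        # Called by the harness on every write, even ones that look benign.
        self.records[path] = WriteRecord(path, session_id, self._digest(content), tainted)
        self.ledger_file.write_text(
            json.dumps({p: asdict(r) for p, r in self.records.items()})
        )

    def check_read(self, path: str, content: str) -> str:
        # Called before file content is placed into the agent's context.
        rec = self.records.get(path)
        if rec is None:
            return "unknown"      # no provenance: not written through the harness
        if self._digest(content) != rec.sha256:
            return "modified"     # changed since the recorded write
        if rec.tainted:
            return "quarantine"   # written while untrusted input was in context
        return "ok"
```

In this sketch a step-level check would pass the original write, since storing a note is benign in isolation. The ledger only pays off in the later session, when `check_read` returns `"quarantine"` and the harness can treat the file as data and not as instructions. How taint should be tracked and propagated in a real agent is left open here.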
Modelwire context
Explainer
The key distinction this research draws is temporal: the attack is dangerous precisely because the injection and the exploitation happen in separate sessions, meaning any defense that evaluates actions at the moment they occur will miss the threat entirely. This isn't a prompt injection problem in the traditional sense; it's a state persistence problem.
This connects directly to the ConsisGuard coverage from the same period, which identified a gap between what safety systems reason about and what they actually enforce. That deliberation-to-enforcement gap in guardrails is structurally similar to what this paper describes: defenses that correctly evaluate individual steps can still fail at the system level when the threat is distributed across time. Both papers are pointing at the same underlying problem from different angles, which is that LLM safety tooling was designed for stateless, single-turn interactions and is being stress-tested by architectures that maintain state across sessions.
Watch whether major agent framework providers (LangChain, AutoGen, or comparable tooling) publish explicit cross-session state inspection policies within the next two quarters. If they don't, this attack class will remain unaddressed in the majority of enterprise deployments regardless of how well the research is cited.
This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.
Mentions: LLM agents · prompt injection · trojan backdoors · agentic harness
Modelwire Editorial
Modelwire summarizes, we don’t republish. The full content lives on arxiv.org.