Modelwire

Latent Preference Modeling for Cross-Session Personalized Tool Calling


Researchers introduced MPT, a 265-dialogue benchmark for personalized tool calling in LLM agents, and PRefine, a memory-augmented method that cuts token usage to 1.24% of full-history prompting while handling incomplete user requests across sessions.

Modelwire context

Explainer

The more consequential detail buried in the summary is the 1.24% token figure. PRefine doesn't just compress history; it selectively reconstructs latent user preferences from incomplete signals across sessions, which is a different problem from general sequence compression.

Token efficiency has been a recurring thread in recent coverage. The K-Token Merging paper from April 16 attacked the same cost problem from the inference side, merging embeddings to shrink sequence length. PRefine attacks it from the memory-retrieval side, deciding what history is worth including at all. These are complementary pressure points on the same bottleneck. The ReCoQA benchmark from April 20 is also relevant context: it shows the field is actively building domain-specific tool-calling benchmarks. MPT's 265 dialogues look modest next to ReCoQA's 29,270 instances, roughly two orders of magnitude fewer items, which matters when judging how broadly MPT's findings will generalize.

If a follow-up study applies PRefine to a larger, multi-domain benchmark such as ReCoQA or a comparable open dataset, and the token savings hold above 95% with no measurable drop in accuracy, that would be strong evidence the method is robust. If the gains shrink under denser tool vocabularies or longer gaps between sessions, the 1.24% figure is likely specific to MPT's narrow construction.
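To make the arithmetic behind that threshold concrete, here is a minimal Python sketch. It assumes only the 1.24% figure from the summary; the function names and the 0.10 comparison value are hypothetical illustrations, not anything taken from the paper.

```python
# Illustrative arithmetic only; names below are hypothetical, not from the paper.

PREFINE_RATIO = 0.0124  # PRefine tokens as a fraction of full-history prompting


def token_savings(ratio: float) -> float:
    """Fraction of tokens saved relative to full-history prompting."""
    return 1.0 - ratio


def reduction_factor(ratio: float) -> float:
    """How many times smaller the prompt is than the full history."""
    return 1.0 / ratio


def clears_threshold(ratio: float, min_savings: float = 0.95) -> bool:
    """True if savings stay above the 95% bar discussed above."""
    return token_savings(ratio) > min_savings


savings = token_savings(PREFINE_RATIO)    # about 0.9876, i.e. 98.76% saved
factor = reduction_factor(PREFINE_RATIO)  # roughly an 80x smaller prompt
```

Put differently, a follow-up result would clear the 95% bar as long as the method uses less than 5% of the full-history tokens, so the reported 1.24% leaves some headroom before the claim weakens.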

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: MPT · PRefine


Modelwire Editorial

This synthesis and analysis was prepared by the Modelwire editorial team. We use advanced language models to read, ground, and connect the day’s most significant AI developments, providing original strategic context that helps practitioners and leaders stay ahead of the frontier.

Modelwire summarizes; we don’t republish. The full content lives on arxiv.org. If you’re a publisher and want a different summarization policy for your work, see our takedown page.
