LEXIS: LatEnt ProXimal Interaction Signatures for 3D HOI from an Image

Researchers introduce LEXIS, a method for reconstructing 3D human-object interactions (HOI) from a single image. It models continuous proximity fields across body and object surfaces rather than relying on sparse contact cues, and it uses action and object geometry to capture interaction structure, which helps resolve the inherent ambiguity of inferring 3D relationships from 2D input.

Modelwire context

Explainer

The key technical bet LEXIS makes is that interaction geometry is better captured by a dense, continuous field across surfaces than by discrete contact points, which tend to be sparse and brittle when inferred from a single 2D image. This framing shifts the problem from 'where do surfaces touch' to 'how close are all surface regions to each other,' which is a more tractable signal when depth information is absent.
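To make the distinction between a dense proximity signal and discrete contact concrete, here is a minimal sketch in plain Python. It is not the LEXIS implementation: the point sets, the function names `proximity_field` and `contact_labels`, and the 0.5 threshold are all hypothetical, and real surfaces would be meshes with thousands of vertices rather than three points. The sketch only shows how a per-point nearest-distance value keeps information that a binary contact label discards.

```python
import math

def proximity_field(source_pts, target_pts):
    """For each source point, return the distance to its nearest target point."""
    return [min(math.dist(p, q) for q in target_pts) for p in source_pts]

def contact_labels(field, threshold):
    """Binarize a proximity field: True where the nearest distance is within threshold."""
    return [d <= threshold for d in field]

# Toy stand-ins for sampled surface points (hypothetical values, arbitrary units).
body = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 2.0)]
obj = [(0.0, 0.0, 0.0), (3.0, 4.0, 1.0)]

# Dense signal: every body point gets a distance, and so does every object point.
body_to_obj = proximity_field(body, obj)
obj_to_body = proximity_field(obj, body)

# Sparse signal: only points within the (assumed) threshold count as contact.
body_contact = contact_labels(body_to_obj, threshold=0.5)

print(body_to_obj)   # [0.0, 1.0, 2.0]
print(obj_to_body)   # [0.0, 1.0]
print(body_contact)  # [True, False, False]
```

In the binary labels, the second and third body points look identical (both "not in contact"), while the proximity values still say one is twice as far from the object as the other. That graded information is the kind of signal the explainer describes as more tractable when depth is missing.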

Recent coverage on Modelwire has concentrated heavily on LLM capability races and agentic coding tools, so LEXIS sits largely disconnected from that activity. It belongs instead to a quieter but consequential thread in computer vision: giving machines a structured understanding of how bodies relate to objects in physical space. That capability feeds directly into robotics, AR/VR scene reconstruction, and any system that needs to reason about physical manipulation. The closest thematic neighbor in the archive is the shortest-path generalization paper from arXiv (April 16), which also probed how well learned representations hold up when the problem geometry changes, though the domains are quite different.

Watch whether LEXIS benchmarks extend to video input or multi-view settings within the next year. If the proximity field representation generalizes beyond single-image input, that would signal the approach is robust rather than a solution fitted tightly to a constrained problem setup.

Coverage we drew on

Generalization in LLM Problem Solving: The Case of the Shortest Path · arXiv cs.LG

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: LEXIS · InterFields

Read full story at arXiv cs.LG (arxiv.org)
