Research Tools & Code · arXiv cs.CL · 3d ago

RCT: A Robot-Collected Touch-Vision-Language Dataset for Tactile Generalization

Researchers have released RCT, a robot-collected dataset pairing tactile sensor data with vision and language annotations across 122 industrial materials. The work addresses a critical gap in embodied AI: most tactile models fail on unseen materials because training data conflates sensor noise with genuine material properties. By carefully structuring evaluation to prevent contact-sequence leakage between train and test splits, the authors expose how naive benchmarking inflates generalization claims by up to 17.7 percent. This matters for any robotics system deployed in uncontrolled environments, and signals growing rigor in multimodal embodied learning.

Modelwire context

Explainer

The real contribution isn't the dataset itself but the exposure of how standard benchmarking practices systematically overstate tactile generalization. The 17.7 percent inflation gap reveals that prior work may have been measuring sensor-specific memorization rather than genuine material understanding.

This connects directly to the STEB benchmark paper from earlier this week, which made the same structural argument about fragmented evaluation: without shared, rigorous testing protocols, the field allows incomparable claims to coexist. RCT applies that lesson to embodied AI, where the stakes are higher (deployed robots failing on unseen materials) and the evaluation trap is subtler (contact sequences leaking across splits rather than semantic contamination). Both papers argue that benchmark infrastructure forces honest measurement and prevents the community from mistaking noise for signal.
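To make the contact-sequence trap concrete, here is a minimal sketch of the difference between a naive frame-level split and a split that keeps each contact sequence wholly in train or wholly in test. It is not the RCT authors' protocol: the `sequence_id` field, the toy data, and the function names are assumptions introduced for illustration only.

```python
import random
from collections import defaultdict


def naive_split(samples, test_fraction=0.2, seed=0):
    """Shuffle individual frames, so one contact sequence can end up in both splits."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def sequence_grouped_split(samples, test_fraction=0.2, seed=0):
    """Assign whole contact sequences to one split so no sequence straddles train and test."""
    by_sequence = defaultdict(list)
    for sample in samples:
        # "sequence_id" is a hypothetical field name, not taken from the dataset.
        by_sequence[sample["sequence_id"]].append(sample)

    sequence_ids = sorted(by_sequence)
    random.Random(seed).shuffle(sequence_ids)
    n_test = max(1, round(len(sequence_ids) * test_fraction))

    test = [s for sid in sequence_ids[:n_test] for s in by_sequence[sid]]
    train = [s for sid in sequence_ids[n_test:] for s in by_sequence[sid]]
    return train, test


def leaked_sequences(train, test):
    """Return the sequence ids that appear on both sides of a split."""
    return {s["sequence_id"] for s in train} & {s["sequence_id"] for s in test}


# Toy data: 10 contact sequences with 8 tactile frames each.
samples = [{"sequence_id": seq, "frame": f} for seq in range(10) for f in range(8)]

if __name__ == "__main__":
    for name, split in [("naive", naive_split), ("grouped", sequence_grouped_split)]:
        train, test = split(samples)
        print(name, "leaked sequences:", len(leaked_sequences(train, test)))
```

A model scored on the naive split sees near-duplicate frames of its test contacts during training, which is one way a benchmark can reward memorization. The same grouping idea extends to holding out whole materials or material families by grouping on a material identifier instead.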

If downstream tactile models trained on RCT show a performance drop of less than 5 percent on held-out material families relative to in-distribution test sets, the train-test split design has done its job. If the gap stays above 10 percent, the leakage problem likely runs deeper than contact-sequence ordering, and the field needs even stricter evaluation constraints.
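The two thresholds above can be written as a small check. This sketch assumes the drop is measured relative to the in-distribution score; the text does not say whether a relative drop or an absolute percentage-point difference is meant, and the function names and the "inconclusive" middle band are illustrative choices, not part of the paper.

```python
def generalization_gap(in_dist_score, held_out_score):
    """Relative drop, in percent, from in-distribution to held-out performance."""
    if in_dist_score <= 0:
        raise ValueError("in_dist_score must be positive")
    return 100.0 * (in_dist_score - held_out_score) / in_dist_score


def interpret_gap(gap_percent):
    """Map a gap onto the 5 percent and 10 percent thresholds discussed above."""
    if gap_percent < 5.0:
        return "split design appears to hold"
    if gap_percent > 10.0:
        return "leakage likely runs deeper"
    # The text leaves the 5 to 10 percent band unaddressed; treated here as inconclusive.
    return "inconclusive"


if __name__ == "__main__":
    # Hypothetical accuracies, not results from the paper.
    gap = generalization_gap(in_dist_score=0.80, held_out_score=0.78)
    print(f"gap: {gap:.1f} percent, verdict: {interpret_gap(gap)}")
```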

Coverage we drew on

STEB: Style Text Embedding Benchmark · arXiv cs.CL

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: RCT · DIGIT sensors

Read full story at arXiv cs.CL (arxiv.org)
