Light or Full Verb? A Minimal-Pair Dataset for Probing Phraseological Competence in Language Models

Researchers have created a controlled dataset to test whether language models actually grasp the semantic distinction between light verbs ("make a decision") and full predicates ("make a cake"). The work reveals that LLMs do encode this phraseological nuance even in minimal contexts, with separable activation patterns tied to object type. This matters for interpretability: it suggests models capture linguistic structure beyond surface statistics, and the released dataset and framework enable systematic probing of how well models handle compositional meaning across languages and verb classes.
Modelwire context
The paper's core claim rests on a narrow finding: that models show separable activation patterns tied to object type in light-verb contexts. But the summary glosses over a crucial limitation: minimal pairs are artificial constructs that may not reflect how models actually process phraseological meaning in natural text, where context is messier and ambiguity is the norm.
This connects directly to the activation-based probing work from June 3rd, which found that MLP activations and statistical moments fail to predict downstream model behavior. That study challenged whether internal signals actually correlate with performance. This paper takes the opposite approach: it assumes activation patterns are meaningful and uses them to validate semantic understanding. The tension matters. If activation patterns don't reliably predict what models do (as the earlier work suggests), then finding separable activations in a controlled dataset may be interpretability theater rather than evidence of genuine compositional reasoning.
Test whether the same activation signatures generalize to naturally occurring light-verb uses in corpus data (not minimal pairs). If the model's internal representations collapse or become noisy outside the controlled setting, that signals the finding is an artifact of the dataset design rather than proof of robust phraseological competence.
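One way to picture that generalization test is a probe-transfer check: fit a simple linear probe on activations from the minimal pairs, then score it unchanged on activations from naturally occurring sentences. The sketch below is illustrative only and is not the paper's method. The activations are synthetic Gaussian stand-ins, and every name and number in it (`synthetic_activations`, the dimensionality, the separation and noise levels) is an assumption made for the example.

```python
import numpy as np


def fit_mean_difference_probe(X, y):
    """Fit a linear probe whose direction is the difference of class means."""
    mu_light = X[y == 1].mean(axis=0)
    mu_full = X[y == 0].mean(axis=0)
    w = mu_light - mu_full
    # Place the decision boundary at the midpoint between the two class means.
    b = -0.5 * (mu_light + mu_full) @ w
    return w, b


def probe_accuracy(w, b, X, y):
    """Fraction of examples the probe labels correctly (1 = light verb)."""
    preds = (X @ w + b > 0).astype(int)
    return float((preds == y).mean())


def synthetic_activations(rng, n_per_class, dim, separation, noise):
    """Hypothetical stand-in for hidden states; real ones would come from a model."""
    direction = np.zeros(dim)
    direction[0] = 1.0  # assume the classes differ along a single axis
    light = rng.normal(0.0, noise, (n_per_class, dim)) + 0.5 * separation * direction
    full = rng.normal(0.0, noise, (n_per_class, dim)) - 0.5 * separation * direction
    X = np.vstack([light, full])
    y = np.concatenate([np.ones(n_per_class, dtype=int),
                        np.zeros(n_per_class, dtype=int)])
    return X, y


rng = np.random.default_rng(0)
# Controlled setting: low noise, standing in for minimal-pair activations.
X_pairs, y_pairs = synthetic_activations(rng, 200, 16, separation=4.0, noise=1.0)
# Messier setting: higher noise, standing in for corpus sentences.
X_corpus, y_corpus = synthetic_activations(rng, 200, 16, separation=4.0, noise=3.0)

w, b = fit_mean_difference_probe(X_pairs, y_pairs)
acc_pairs = probe_accuracy(w, b, X_pairs, y_pairs)
acc_corpus = probe_accuracy(w, b, X_corpus, y_corpus)
print(f"minimal pairs: {acc_pairs:.3f}  corpus stand-in: {acc_corpus:.3f}")
```

A large gap between the two accuracies would be the kind of signal the paragraph above describes. With real hidden states, a held-out split of the minimal pairs would also be needed so that the in-distribution score is not inflated by fitting and scoring on the same examples.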
This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.
Mentions: Language models · English verbs · Light-verb constructions · Minimal-pair dataset
Modelwire Editorial
This synthesis and analysis was prepared by the Modelwire editorial team. We use advanced language models to read, ground, and connect the day’s most significant AI developments, providing original strategic context that helps practitioners and leaders stay ahead of the frontier.
Modelwire summarizes, we don’t republish. The full content lives on arxiv.org. If you’re a publisher and want a different summarization policy for your work, see our takedown page.