UKP_Psycontrol at SemEval-2026 Task 2: Modeling Valence and Arousal Dynamics from Text

UKP_Psycontrol's SemEval-2026 Task 2 submission ranks first in modeling shifts in emotional valence and arousal across user texts. Its main finding is that LLMs capture static affect well, but recent numeric trajectories predict short-term emotional change better than semantic content alone.

Modelwire context

Explainer

The buried finding here is not the first-place ranking but the architectural implication: LLMs, despite their fluency with emotional language, are outperformed on short-term affect prediction by simple numeric trend features derived from recent emotional scores. That suggests the semantic richness LLMs offer may be less useful than a lightweight time-series signal when the task is forecasting the next emotional state rather than describing the current one.
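To make "numeric trend features" concrete, here is a minimal sketch of what such a signal could look like. It is an illustration under our own assumptions, not the UKP_Psycontrol system: the function names `trend_features` and `predict_next_change`, the window size, and the choice of features (last value, mean, least-squares slope) are all hypothetical.

```python
from statistics import mean


def trend_features(scores, window=3):
    """Summarize the most recent `window` scores (hypothetical feature set).

    Returns the last value, the mean, and the least-squares slope over the
    positions 0..n-1 of the recent scores.
    """
    recent = list(scores[-window:])
    n = len(recent)
    if n == 0:
        raise ValueError("need at least one score")
    avg = mean(recent)
    if n == 1:
        slope = 0.0  # a single point carries no direction
    else:
        x_mean = (n - 1) / 2
        num = sum((i - x_mean) * (y - avg) for i, y in enumerate(recent))
        den = sum((i - x_mean) ** 2 for i in range(n))
        slope = num / den
    return {"last": recent[-1], "mean": avg, "slope": slope}


def predict_next_change(scores, window=3):
    """Naive forecast: assume the recent slope continues for one more step."""
    return trend_features(scores, window)["slope"]


if __name__ == "__main__":
    valence_history = [0.0, 1.0, 2.0, 3.0]  # toy values, not task data
    print(trend_features(valence_history))
    print(predict_next_change(valence_history))
```

A baseline like this never reads the text at all. The source's claim is that this kind of signal is hard to beat for short-term change, not that it replaces LLMs for describing the current state.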

This connects to a pattern visible across recent Modelwire coverage of LLM evaluation reliability. The 'Diagnosing LLM Judge Reliability' paper from April 16 showed that LLMs appear consistent in aggregate but break down at the instance level, and the 'Context Over Content' piece published the same day found judges responding to framing rather than substance. The UKP_Psycontrol result adds another data point: LLMs capture a static snapshot of affect well but miss the directional dynamics that matter most for real prediction tasks. These failure modes are related rather than identical, but together they sketch a picture of models that are better at recognizing patterns than at tracking change over time.

Watch whether the numeric trajectory advantage holds when the same approach is tested on longer conversational windows or lower-resource languages in future SemEval tasks. If it degrades quickly with window length, the finding is narrow; if it holds, it argues for hybrid architectures as a default in affective computing pipelines.
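One way to run that kind of check is to sweep the history window and compare one-step forecast error at each length. The sketch below is a hypothetical harness under our own assumptions: `window_sweep`, the slope-continuation forecast, and the reading of "window length" as the number of past scores are illustrative choices, not something the paper specifies.

```python
def slope(values):
    """Least-squares slope of values against their positions 0..n-1."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def window_sweep(series, windows):
    """Mean absolute error of a slope-continuation forecast per window size.

    For each window w, the change at step t is forecast as the slope of the
    w scores before t, then compared with the actual change
    series[t] - series[t - 1]. Windows too long for the series map to None.
    """
    results = {}
    for w in windows:
        errors = []
        for t in range(w, len(series)):
            predicted = slope(series[t - w:t])
            actual = series[t] - series[t - 1]
            errors.append(abs(predicted - actual))
        results[w] = sum(errors) / len(errors) if errors else None
    return results


if __name__ == "__main__":
    toy_series = [0.0, 1.0, 0.0, 1.0, 0.0]  # synthetic, for illustration only
    print(window_sweep(toy_series, [2, 3, 10]))
```

An error that climbs steeply as the window grows would point to the narrow reading of the finding. A flat curve would support the hybrid-architecture argument.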

Coverage we drew on

Diagnosing LLM Judge Reliability: Conformal Prediction Sets and Transitivity Violations · arXiv cs.LG

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: UKP_Psycontrol · SemEval-2026 Task 2 · LLM · Maximum Entropy

Read the full story at arXiv cs.CL (arxiv.org).
