Research·arXiv cs.CL·May 5

SERE: Structural Example Retrieval for Enhancing LLMs in Event Causality Identification

Researchers propose SERE, a retrieval-augmented framework that targets a specific failure mode in LLM reasoning: causal hallucination, in which models overpredict causal relationships between events. The work combines few-shot learning with structural metrics drawn from ConceptNet and syntactic analysis, so that event causality identification is grounded in concrete retrieved examples rather than learned biases. This addresses a basic weakness in how LLMs reason about temporal and causal dependencies, with implications for information extraction, question answering, and knowledge graph construction pipelines that depend on accurate causal signals.

Modelwire context

Explainer

SERE's core insight is that causal hallucination stems less from factual gaps than from learned statistical biases that override the actual relationships between events. Rather than feeding additional facts to the model, the framework uses structural metrics to decide which in-context examples the model sees, which makes the basis for each prediction auditable rather than opaque.
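To make the retrieval idea concrete, here is a minimal sketch of ranking candidate few-shot examples by structural similarity to a query event pair. It is an illustration under stated assumptions, not the paper's implementation: `TOY_GRAPH` is a hand-made stand-in for ConceptNet, the part-of-speech tag sequences stand in for syntactic analysis, and the way the two signals are summed in `structural_distance` is a hypothetical choice.

```python
from collections import deque

# Hypothetical toy concept graph standing in for ConceptNet (not real ConceptNet data).
TOY_GRAPH = {
    "rain": ["flood", "weather"],
    "flood": ["rain", "damage"],
    "damage": ["flood", "earthquake"],
    "earthquake": ["damage"],
    "weather": ["rain"],
}

def path_length(graph, start, goal):
    """Shortest hop count between two concepts via BFS; None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def edit_distance(a, b):
    """Levenshtein distance between two sequences (here, lists of tags)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

def structural_distance(query, example, graph, unreachable=10):
    """Lower means more similar. The unweighted sum is an assumption."""
    qp = path_length(graph, *query["events"])
    ep = path_length(graph, *example["events"])
    qp = unreachable if qp is None else qp
    ep = unreachable if ep is None else ep
    path_gap = abs(qp - ep)  # how differently connected the two event pairs are
    syntax_gap = edit_distance(query["tags"], example["tags"])
    return path_gap + syntax_gap

def retrieve(query, pool, graph, k=2):
    """Return the k pool examples structurally closest to the query."""
    return sorted(pool, key=lambda ex: structural_distance(query, ex, graph))[:k]

# Made-up example pool and query, for illustration only.
POOL = [
    {"id": "a", "events": ("rain", "flood"),
     "tags": ["NOUN", "VERB", "NOUN"], "label": "causal"},
    {"id": "b", "events": ("weather", "earthquake"),
     "tags": ["NOUN", "CCONJ", "NOUN"], "label": "non-causal"},
    {"id": "c", "events": ("flood", "damage"),
     "tags": ["NOUN", "VERB", "ADJ", "NOUN"], "label": "causal"},
]
QUERY = {"events": ("earthquake", "damage"), "tags": ["NOUN", "VERB", "NOUN"]}

if __name__ == "__main__":
    for ex in retrieve(QUERY, POOL, TOY_GRAPH, k=2):
        print(ex["id"], ex["label"], structural_distance(QUERY, ex, TOY_GRAPH))
```

Because every selected example comes with an explicit distance score, a reviewer can inspect why a given demonstration was shown to the model, which is the auditability point made above.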

This connects directly to the hallucination detection work (LaaB, May 5), which treated neural uncertainty and self-reasoning as coupled signals. SERE takes a different angle: instead of detecting hallucination after the fact, it tries to prevent causal overconfidence by grounding predictions in concrete structural examples before inference. It also echoes the procedural faithfulness finding from May 1, which showed that LLMs lose track of constraints in multi-step tasks. Here, the constraint is causal plausibility, enforced via retrieval rather than training.
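As a rough illustration of what "grounding before inference" can look like, the sketch below assembles a few-shot prompt from already-retrieved examples and leaves the final answer slot open for the model. The prompt wording, the field names, and the `build_prompt` helper are hypothetical and are not taken from the paper; the example sentences are invented.

```python
def build_prompt(examples, query_sentence, query_events):
    """Assemble a few-shot prompt from retrieved examples (hypothetical format)."""
    lines = [
        "Decide whether the first event causes the second.",
        "Answer 'causal' or 'non-causal'.",
        "",
    ]
    for ex in examples:
        first, second = ex["events"]
        lines += [
            f"Sentence: {ex['sentence']}",
            f"Events: {first} -> {second}",
            f"Answer: {ex['label']}",
            "",
        ]
    first, second = query_events
    lines += [
        f"Sentence: {query_sentence}",
        f"Events: {first} -> {second}",
        "Answer:",  # left open for the model to complete
    ]
    return "\n".join(lines)

# Invented demonstrations; one positive and one negative case.
RETRIEVED = [
    {"sentence": "Heavy rain led to a flood downtown.",
     "events": ("rain", "flood"), "label": "causal"},
    {"sentence": "The concert ended and the flood warnings continued.",
     "events": ("concert", "flood"), "label": "non-causal"},
]

PROMPT = build_prompt(
    RETRIEVED,
    "The earthquake left widespread damage.",
    ("earthquake", "damage"),
)

if __name__ == "__main__":
    print(PROMPT)
```

Including a negative demonstration alongside a positive one is a design choice in this sketch: it gives the model a concrete non-causal pattern to compare against, rather than only examples that invite a "causal" answer.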

If SERE's performance gains hold on out-of-domain event datasets (domains that ConceptNet covers poorly), that would suggest the approach generalizes. If gains collapse on those splits, the method is more likely inheriting ConceptNet's existing coverage biases than fixing the underlying reasoning problem.
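One way to run that check is to compare the F1 gain over a baseline on an in-domain split with the gain on an out-of-domain split. The sketch below does this on tiny made-up label lists; the numbers are placeholders for illustration only and are not results from the paper, and `gain_retention` is a hypothetical summary statistic.

```python
def prf(gold, pred):
    """Precision, recall, and F1 for the positive (causal) class; labels are 0/1."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def f1_gain(gold, baseline_pred, method_pred):
    """F1 improvement of the method over the baseline on one split."""
    return prf(gold, method_pred)[2] - prf(gold, baseline_pred)[2]

def gain_retention(in_gain, out_gain):
    """Fraction of the in-domain gain that survives out of domain."""
    return out_gain / in_gain if in_gain > 0 else float("nan")

# Placeholder labels (1 = causal, 0 = non-causal). Not real experimental data.
SPLITS = {
    "in_domain": {
        "gold":     [1, 0, 0, 1, 0, 0],
        "baseline": [1, 1, 1, 1, 1, 0],  # overpredicts "causal"
        "method":   [1, 0, 0, 1, 1, 0],
    },
    "out_of_domain": {
        "gold":     [1, 0, 0, 1, 0, 0],
        "baseline": [1, 1, 1, 1, 1, 0],
        "method":   [1, 1, 1, 1, 0, 0],
    },
}

GAINS = {
    name: f1_gain(s["gold"], s["baseline"], s["method"])
    for name, s in SPLITS.items()
}
RETENTION = gain_retention(GAINS["in_domain"], GAINS["out_of_domain"])

if __name__ == "__main__":
    for name, gain in GAINS.items():
        print(f"{name}: F1 gain = {gain:.3f}")
    print(f"retention = {RETENTION:.2f}")
```

Note that an overpredicting baseline shows up as perfect recall with low precision, so reporting precision alongside F1 is what exposes causal hallucination in a check like this.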

Coverage we drew on

Logical Consistency as a Bridge: Improving LLM Hallucination Detection via Label Constraint Modeling between Responses and Self-Judgments · arXiv cs.CL

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: SERE · ConceptNet · Large Language Models
