Research · arXiv cs.CL · 1d ago

Beyond Supervised Clarification: Input Rewriting with LLMs for Dialogue Discourse Parsing

Researchers challenge a common NLP optimization pattern: using LLMs to rewrite dialogue inputs before parsing. Prior work showed that supervised clarification models could resolve ellipsis and references to boost downstream accuracy. This study tests whether the same strategy works under realistic constraints, where no labeled clarification data exists and systems must rely on zero-shot prompting or parser feedback alone. Across three discourse datasets, unsupervised rewriting proved unreliable, often degrading performance rather than improving it. The finding exposes a gap between controlled research settings and production deployment, and suggests practitioners should reconsider input-rewriting pipelines that lack explicit supervision.
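The comparison at the heart of this summary (does rewriting the input help or hurt the parser?) can be sketched as a small evaluation harness. This is a minimal illustration, not the paper's setup: the names `link_f1`, `rewrite_delta`, `toy_parse`, and `strip_question_marks` are hypothetical, scoring by F1 over discourse links is an assumption about the metric, and the toy parser and rewriter are stand-ins for a real discourse parser and an LLM rewriting step.

```python
from typing import Callable

Dialogue = list[str]           # one utterance per item
Links = set[tuple[int, int]]   # discourse links as (head, dependent) utterance indices


def link_f1(pred: Links, gold: Links) -> float:
    """F1 over discourse links for one dialogue (assumed metric)."""
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def rewrite_delta(
    parse: Callable[[Dialogue], Links],
    rewrite: Callable[[Dialogue], Dialogue],
    dialogues: list[Dialogue],
    gold: list[Links],
) -> float:
    """Mean F1 with rewriting minus mean F1 without it.

    A negative value means the rewriting step degraded parsing.
    The rewriter must keep the number of utterances unchanged so that
    link indices still line up with the gold annotation.
    """
    base = [link_f1(parse(d), g) for d, g in zip(dialogues, gold)]
    rewritten = [link_f1(parse(rewrite(d)), g) for d, g in zip(dialogues, gold)]
    return sum(rewritten) / len(rewritten) - sum(base) / len(base)


# Toy stand-ins (hypothetical): a parser that attaches questions to the
# first utterance and everything else to the previous utterance, and a
# "rewriter" that destroys the cue the parser relies on.
def toy_parse(dialogue: Dialogue) -> Links:
    links: Links = set()
    for i in range(1, len(dialogue)):
        head = 0 if dialogue[i].endswith("?") else i - 1
        links.add((head, i))
    return links


def strip_question_marks(dialogue: Dialogue) -> Dialogue:
    return [utterance.rstrip("?") for utterance in dialogue]


if __name__ == "__main__":
    dialogue = ["anyone have wood", "I do", "how much?"]
    gold_links = {(0, 1), (0, 2)}
    # Rewriting removes the "?" cue, so the parser mis-attaches utterance 2.
    print(rewrite_delta(toy_parse, strip_question_marks, [dialogue], [gold_links]))  # -0.5
```

In this toy case the rewriter removes a surface cue the parser depends on, so the delta is negative. It illustrates one way an unsupervised rewriting step could hurt a downstream parser; the study itself may attribute the degradation to other causes.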

Modelwire context

Skeptical read

The critical omission: prior work on supervised clarification never tested whether the approach generalizes without labeled data. This paper's results suggest the gap is more than a minor engineering problem: the input-rewriting pipeline may have been validated only under conditions that don't exist in production.

This connects directly to the broader pattern across recent work on model behavior under realistic constraints. The Taboo paper (early July) studied how models handle competing constraints at inference time, and this dialogue parsing work exposes a similar tension: techniques that work in controlled settings (supervised clarification) collapse when constraints change (no labeled data). Both reveal that controlled research and production deployment operate under different rule sets. The groupthink startup piece also touches this theme, showing that model behavior isn't what we assume when we only test in narrow conditions.

If the same three discourse datasets show performance recovery when researchers add even weak supervision (e.g., 50 labeled examples per dataset) in a follow-up, that confirms the failure is about data scarcity rather than fundamental incompatibility between rewriting and parsing. If unsupervised rewriting continues to degrade performance across new datasets, practitioners should treat input rewriting as a supervised-only technique and stop experimenting with zero-shot variants.

Coverage we drew on

"Don't Say It!": Constraints, Compliance, and Communication when Language Models Play Taboo · arXiv cs.CL

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: SDRT · LLM · Segmented Discourse Representation Theory

