LePREC: Reasoning as Classification over Structured Factors for Assessing Relevance of Legal Issues

Researchers built a dataset from 769 Malaysian court cases to test LLMs on legal issue identification, finding GPT-4o achieves only 62% precision despite generating diverse candidates. They propose LePREC, a neuro-symbolic approach combining reasoning with structured classification to improve legal AI accuracy in underserved justice systems.
Modelwire context
Explainer
The more telling detail is the failure mode: GPT-4o generates diverse candidate issues but can't reliably rank or filter them, meaning the problem isn't recall but precision under legal specificity constraints. LePREC's bet is that structured classification over predefined legal factors can supply the discriminative signal that prompt-based generation alone cannot.
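The generate-then-classify shape described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's method: the factor names, the `Candidate` type, and both functions are invented here to show how a structured-factor filter supplies precision on top of an LLM's high-recall candidate set.

```python
from dataclasses import dataclass, field

# Hypothetical factor schema -- these names are illustrative, not LePREC's.
FACTORS = {"offer", "acceptance", "consideration", "capacity"}

@dataclass
class Candidate:
    issue: str
    factors: set = field(default_factory=set)  # factors attributed by a classifier

def llm_generate(case_text: str) -> list[Candidate]:
    # Stand-in for the LLM generation step: broad recall, noisy precision.
    return [
        Candidate("Was a valid offer made?", {"offer"}),
        Candidate("Is the clause unconscionable?"),  # off-schema noise
    ]

def classify_and_filter(cands: list[Candidate]) -> list[Candidate]:
    # Discriminative step: keep only candidates grounded in known factors.
    return [c for c in cands if c.factors & FACTORS]

issues = classify_and_filter(llm_generate("..."))
print([c.issue for c in issues])  # the off-schema candidate is dropped
```

The design point is that the filter is symbolic and auditable: a rejected candidate can be traced to the absence of a supporting factor, which prompt-based generation alone does not provide.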
This connects directly to the reliability problems our coverage has been tracking. The 'Diagnosing LLM Judge Reliability' piece from April 16 found that aggregate consistency scores mask per-instance logical failures in one-third to two-thirds of documents. Legal issue identification has the same structure: headline accuracy looks tolerable until you examine individual case-level errors, where the cost of a miss is not a degraded benchmark score but a misrepresented legal argument. The Malaysian Contract Act framing also matters because it signals the paper is testing a jurisdiction where training data is sparse, which is a harder and more honest stress test than English common law benchmarks.
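The masking effect described above is easy to see with toy numbers (hypothetical data, not figures from either article): an aggregate score near 90% can coexist with a majority of documents containing at least one failure.

```python
# Hypothetical scores: 10 documents, 5 checks each (1 = check passed).
docs = [[1, 1, 1, 1, 0]] * 6 + [[1, 1, 1, 1, 1]] * 4

aggregate = sum(map(sum, docs)) / (len(docs) * 5)
flawed = sum(1 for d in docs if 0 in d) / len(docs)

print(f"aggregate accuracy: {aggregate:.0%}")   # 88%
print(f"docs with >=1 failure: {flawed:.0%}")   # 60%
```

In the legal setting, each "document with at least one failure" is a case whose argument is misrepresented, which is why the per-instance view, not the aggregate, is the relevant one.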
Watch whether LePREC's structured factor schema generalizes to a second civil-law jurisdiction within the next year. If it does, the architecture has legs; if the factors require full re-annotation per jurisdiction, the method's practical reach is narrower than the paper implies.
Modelwire summarizes — we don’t republish. The full article lives on arxiv.org.