Research Tools & Code·arXiv cs.CL·2d ago

MADE: A Living Benchmark for Multi-Label Text Classification with Uncertainty Quantification of Medical Device Adverse Events

Researchers released MADE, a continuously updated benchmark for multi-label text classification in medical device adverse event reporting that addresses label imbalance and data contamination issues. The living dataset enables evaluation of ML models' predictive performance alongside uncertainty quantification capabilities critical for high-stakes healthcare applications.

Mentions: MADE · multi-label text classification · medical device adverse events

Read full story at arXiv cs.CL → (arxiv.org)


Research

QuantCode-Bench: A Benchmark for Evaluating the Ability of Large Language Models to Generate Executable Algorithmic Trading Strategies

arXiv cs.CL·3d ago

Research

Context Over Content: Exposing Evaluation Faking in Automated Judges

arXiv cs.CL·2d ago

Research

Benchmarking Optimizers for MLPs in Tabular Deep Learning

arXiv cs.LG·2d ago
