Modelwire

MADE: A Living Benchmark for Multi-Label Text Classification with Uncertainty Quantification of Medical Device Adverse Events

Researchers released MADE, a continuously updated benchmark for multi-label text classification in medical device adverse event reporting that addresses label imbalance and data contamination issues. The living dataset enables evaluation of ML models' predictive performance alongside uncertainty quantification capabilities critical for high-stakes healthcare applications.

Mentions: MADE · multi-label text classification · medical device adverse events

Modelwire summarizes; we don’t republish. The full article lives on arxiv.org.
