Modelwire
QuantCode-Bench: A Benchmark for Evaluating the Ability of Large Language Models to Generate Executable Algorithmic Trading Strategies

Researchers introduced QuantCode-Bench, a 400-task benchmark for evaluating LLMs on generating executable algorithmic trading strategies for the Backtrader framework. The benchmark tests whether models can combine financial domain knowledge, API mastery, and correct syntax to produce strategies that execute on historical data.
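To make concrete the kind of strategy logic the benchmark asks models to generate, here is a minimal sketch of a simple moving-average crossover rule, the sort of decision logic an executable Backtrader strategy typically encodes. This is plain Python with no Backtrader dependency; the function names and the toy price series are illustrative assumptions, not taken from the paper.

```python
# Hypothetical illustration: moving-average crossover logic of the kind
# a generated trading strategy would implement. Plain Python, no
# Backtrader dependency; names and data are illustrative only.

def sma(prices, window):
    """Simple moving average over the trailing `window` prices,
    or None if there is not enough history yet."""
    if len(prices) < window:
        return None
    return sum(prices[-window:]) / window

def crossover_signal(prices, fast=3, slow=5):
    """Return 'buy' when the fast SMA crosses above the slow SMA,
    'sell' on the opposite cross, else None."""
    prev_fast, prev_slow = sma(prices[:-1], fast), sma(prices[:-1], slow)
    cur_fast, cur_slow = sma(prices, fast), sma(prices, slow)
    if None in (prev_fast, prev_slow, cur_fast, cur_slow):
        return None  # not enough history to detect a cross
    if prev_fast <= prev_slow and cur_fast > cur_slow:
        return "buy"
    if prev_fast >= prev_slow and cur_fast < cur_slow:
        return "sell"
    return None

# Walk a toy price series bar by bar, as a backtest engine would.
prices = [10, 9, 8, 7, 8, 9, 11, 12, 11, 9, 8, 7]
signals = [crossover_signal(prices[:i + 1]) for i in range(len(prices))]
print([s for s in signals if s])  # → ['buy', 'sell']
```

In a real Backtrader submission this logic would live inside a `bt.Strategy` subclass and be driven by the framework's data feeds and indicator objects; the benchmark's executability criterion is precisely whether a model wires that API up correctly, not just whether the trading rule is plausible.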

Mentions: QuantCode-Bench · Backtrader · Large Language Models

Modelwire summarizes — we don’t republish. The full article lives on arxiv.org. If you’re a publisher and want a different summarization policy for your work, see our takedown page.

