Modelwire
Measuring research data reuse in scholarly publications using generative artificial intelligence: Open Science Indicator development and preliminary results

PLOS and DataSeer have deployed an LLM-based measurement system to quantify research data reuse across scholarly publications, revealing a 43% reuse rate, higher than traditional bibliometric methods detect. This work demonstrates that generative AI can operate at scale to track downstream impacts of open science practices, shifting focus from monitoring compliance to measuring actual scientific value creation. The finding that existing tools may significantly underestimate data reuse has implications for how funding bodies and institutions evaluate research impact and incentivize data sharing.

Modelwire context

Explainer

The more consequential finding isn't the 43% reuse rate itself but the gap between that number and what traditional bibliometric tools detect, which suggests that current incentive structures for data sharing are built on systematically incomplete evidence. Funders and institutions may be rewarding or penalizing researchers based on measurements that miss a substantial share of actual downstream use.

This story sits largely disconnected from the governance and product coverage dominating Modelwire this week, including the OpenAI litigation and regional ChatGPT adoption stories. It belongs instead to a quieter but consequential thread: how LLMs get embedded into scientific infrastructure rather than consumer products. The Platformer piece from May 1st frames the current AI cycle as one of structural, long-term value creation beneath the hype, and this PLOS-DataSeer work is a concrete example of that thesis playing out in a domain where the payoff is institutional rather than commercial.

Watch whether major funding bodies such as the NIH or Wellcome Trust formally cite this methodology in updated data-sharing evaluation frameworks within the next 12 months. Adoption at that level would confirm the measurement gap is being treated as a policy problem, not just an academic one.

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: PLOS · DataSeer · LLM

Modelwire Editorial

This synthesis and analysis was prepared by the Modelwire editorial team. We use advanced language models to read, ground, and connect the day’s most significant AI developments, providing original strategic context that helps practitioners and leaders stay ahead of the frontier.

Modelwire summarizes, we don’t republish. The full content lives on arxiv.org. If you’re a publisher and want a different summarization policy for your work, see our takedown page.