Finding Duplicates in 1.1M BDD Steps: cukereuse, a Paraphrase-Robust Static Detector for Cucumber and Gherkin

Researchers have released cukereuse, a static analysis tool that detects duplicate test steps in Gherkin/Cucumber BDD suites using embeddings and fuzzy matching. An analysis of 1.1M steps across 347 GitHub repositories found an exact-duplication rate of 80%, pointing to significant maintenance overhead in test automation codebases.
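To make the idea concrete, here is a minimal sketch of exact and near-duplicate step detection. It is not cukereuse's implementation: the function names, the keyword list, and the 0.8 threshold are all assumptions for illustration, and the embedding stage is swapped for the standard library's character-level similarity (difflib), which catches small wording variants but not true paraphrases.

```python
import re
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical keyword list; real Gherkin also supports "*" and localized keywords.
STEP_KEYWORDS = ("Given", "When", "Then", "And", "But")


def extract_steps(feature_text):
    """Return step texts with the leading Gherkin keyword stripped."""
    steps = []
    for line in feature_text.splitlines():
        stripped = line.strip()
        for keyword in STEP_KEYWORDS:
            if stripped.startswith(keyword + " "):
                steps.append(stripped[len(keyword) + 1:].strip())
                break
    return steps


def normalize(step):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return re.sub(r"\s+", " ", step.lower()).strip()


def exact_duplicate_rate(steps):
    """Fraction of step occurrences whose normalized text occurs more than once."""
    if not steps:
        return 0.0
    counts = Counter(normalize(s) for s in steps)
    duplicated = sum(c for c in counts.values() if c > 1)
    return duplicated / len(steps)


def near_duplicates(steps, threshold=0.8):
    """Pairs of distinct normalized steps with similarity ratio >= threshold.

    Assumption: a lexical ratio stands in for the embedding similarity
    the article mentions. The pairwise loop is quadratic, so this is
    only suitable for small inputs.
    """
    unique = sorted({normalize(s) for s in steps})
    pairs = []
    for i, a in enumerate(unique):
        for b in unique[i + 1:]:
            ratio = SequenceMatcher(None, a, b).ratio()
            if ratio >= threshold:
                pairs.append((a, b, ratio))
    return pairs


# Made-up sample feature file, for illustration only.
FEATURE = """
Feature: Login
  Scenario: Valid login
    Given the user is on the login page
    When the user enters valid credentials
    Then the dashboard is shown

  Scenario: Invalid login
    Given the user is on the login page
    When the user enters invalid credentials
    Then an error message is shown
"""

steps = extract_steps(FEATURE)
rate = exact_duplicate_rate(steps)
pairs = near_duplicates(steps)
```

On this sample, the repeated "the user is on the login page" step counts toward the exact-duplicate rate, while the valid/invalid credentials steps surface as a near-duplicate pair. A corpus-scale tool would need something faster than the all-pairs comparison used here.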
Modelwire context
Explainer
The more striking finding is not the tool itself but what the corpus analysis revealed: duplication in BDD test suites is a near-universal problem rather than a marginal hygiene issue. That suggests the way teams share and reuse Gherkin steps is broken at the workflow level, not just the tooling level.
The related coverage on this site skews heavily toward LLM inference, benchmarks, and agentic coding assistants. The closest thread is the April 16 piece on OpenAI's Codex update competing with Claude Code, which frames AI coding tools as increasingly responsible for writing and maintaining test suites. If AI agents are generating Gherkin steps at scale, the duplication problem cukereuse documents could compound rapidly, since no current agentic coding tool appears to check for semantic overlap across an existing test corpus before writing new steps. This story is otherwise largely disconnected from recent Modelwire coverage and belongs more squarely in the software quality and DevOps tooling space.
Watch whether any of the major BDD framework maintainers (Cucumber Ltd or the Behave project) formally integrate or endorse cukereuse within the next two release cycles. Adoption at that level would signal the duplication problem is being treated as infrastructure, not an academic curiosity.
This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.
Mentions: cukereuse · Gherkin · Cucumber · sentence-transformer
Modelwire Editorial
Modelwire summarizes, we don’t republish. The full content lives on arxiv.org. If you’re a publisher and want a different summarization policy for your work, see our takedown page.