A Comparative Study of Controlled Text Generation Systems Using Level-Playing-Field Evaluation Principles
Fragmented evaluation standards have long obscured which controlled text generation (CTG) methods actually work best, with reported results often resting on cherry-picked datasets and metrics. This paper establishes a unified benchmarking framework that applies identical evaluation protocols and datasets across competing CTG systems, creating the first genuinely comparable performance landscape. The work addresses a structural problem in AI research, where methodological inconsistency masks real capability differences, and it enables practitioners to make informed system choices rather than relying on isolated claims.























