
Implicit Representations of Grammaticality in Language Models
Researchers probed whether language models develop an internal notion of grammaticality that is separate from raw token probability. Using linear probes trained on synthetic ungrammatical perturbations, they found that LMs do encode grammaticality as a distinct representational feature, even though surface probabilities conflate grammaticality with corpus likelihood. The finding matters for interpretability: it suggests that neural language models acquire linguistic abstractions beyond next-token prediction, which changes how we distinguish what these systems actually learn from what they merely memorize.
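
To make the probing idea concrete, here is a minimal sketch of a linear probe. It is not the researchers' setup: the "hidden states" are synthetic vectors generated under the assumption that grammatical and perturbed items differ along a single direction, and the names `make_synthetic_states`, `train_linear_probe`, and `probe_accuracy` are hypothetical. In a real experiment the vectors would be activations extracted from a language model for grammatical sentences and their perturbed counterparts.

```python
import numpy as np


def make_synthetic_states(n=400, dim=32, seed=0):
    """Hypothetical stand-in for LM hidden states.

    Assumption: the two classes are shifted apart along one unit direction,
    with Gaussian noise in every dimension.
    """
    rng = np.random.default_rng(seed)
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    labels = rng.integers(0, 2, size=n)  # 1 = grammatical, 0 = perturbed
    noise = rng.normal(size=(n, dim))
    shift = np.outer(2.0 * labels - 1.0, direction) * 2.0
    return noise + shift, labels


def train_linear_probe(x, y, lr=0.1, steps=500):
    """Fit a logistic-regression probe with plain gradient descent."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        grad = probs - y  # gradient of the log loss w.r.t. the logits
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def probe_accuracy(x, y, w, b):
    """Fraction of items whose predicted class matches the label."""
    preds = ((x @ w + b) > 0).astype(int)
    return float((preds == y).mean())


states, labels = make_synthetic_states()
train_x, test_x = states[:300], states[300:]
train_y, test_y = labels[:300], labels[300:]

w, b = train_linear_probe(train_x, train_y)
acc = probe_accuracy(test_x, test_y, w, b)
print(f"held-out probe accuracy: {acc:.2f}")
```

High held-out accuracy from a probe this simple indicates that the label is linearly decodable from the vectors. Whether that decodability is distinct from token probability is a separate question that this sketch does not address.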












