
ProteinJEPA: Latent prediction complements protein language models
Researchers propose masked-position MLM+JEPA, a hybrid training recipe that combines token-level and latent-space prediction objectives for protein language models. Under equivalent compute budgets, this approach outperforms standard masked language modeling on 10 of 16 downstream tasks, suggesting that joint optimization across representation levels yields stronger generalization for biological sequence understanding. The finding challenges the dominance of token-centric pretraining and opens a new design axis for foundation models in computational biology.
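The summary does not give the exact loss formulation, so the following is only a minimal sketch of what "masked-position MLM+JEPA" could look like. It assumes the hybrid objective is a sum of (a) the usual MLM cross-entropy and (b) a latent regression term, both computed only at masked positions, combined with a hypothetical weight `jepa_weight`. The use of mean squared error for the latent term and all function names are assumptions for illustration. In many JEPA-style setups the target embeddings come from a separate target encoder with gradients stopped; this plain-Python sketch takes those embeddings as given and does not model training.

```python
import math
from typing import Sequence


def masked_cross_entropy(logits: Sequence[Sequence[float]],
                         targets: Sequence[int],
                         mask: Sequence[bool]) -> float:
    """Token-level (MLM) term: mean cross-entropy over masked positions only."""
    total, count = 0.0, 0
    for row, tgt, m in zip(logits, targets, mask):
        if not m:
            continue  # unmasked positions contribute nothing to the MLM loss
        # Numerically stable log-sum-exp: subtract the row maximum first.
        mx = max(row)
        log_z = mx + math.log(sum(math.exp(x - mx) for x in row))
        total += log_z - row[tgt]  # equals -log softmax(row)[tgt]
        count += 1
    return total / count if count else 0.0


def masked_latent_mse(pred: Sequence[Sequence[float]],
                      target: Sequence[Sequence[float]],
                      mask: Sequence[bool]) -> float:
    """Latent (JEPA-style) term: mean squared error between predicted and
    target embeddings, averaged over dimensions and then over masked positions.
    MSE is an assumed choice of distance, not confirmed by the source."""
    total, count = 0.0, 0
    for p, t, m in zip(pred, target, mask):
        if not m:
            continue
        total += sum((a - b) ** 2 for a, b in zip(p, t)) / len(p)
        count += 1
    return total / count if count else 0.0


def hybrid_loss(logits, targets, pred_latents, target_latents, mask,
                jepa_weight: float = 1.0) -> float:
    """Hypothetical combined objective: MLM loss + jepa_weight * latent loss."""
    mlm = masked_cross_entropy(logits, targets, mask)
    jepa = masked_latent_mse(pred_latents, target_latents, mask)
    return mlm + jepa_weight * jepa


if __name__ == "__main__":
    # Toy batch: 3 residue positions, vocabulary of 2 tokens, 2-d embeddings.
    logits = [[0.0, 0.0], [5.0, -5.0], [0.0, 0.0]]
    targets = [0, 1, 1]
    mask = [True, False, True]  # only positions 0 and 2 are masked
    pred = [[1.0, 0.0], [9.0, 9.0], [0.0, 1.0]]
    target = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
    print(hybrid_loss(logits, targets, pred, target, mask, jepa_weight=2.0))
```

Restricting both terms to the same masked positions means the model must reconstruct each hidden residue both as a discrete token and as a point in embedding space, which is one plausible reading of "masked-position" in the recipe's name. How the two terms are weighted is a free design choice in this sketch.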
