PHALAR: Phasors for Learned Musical Audio Representations
PHALAR advances audio representation learning by encoding phase and pitch invariances directly into contrastive embeddings, achieving 70% relative accuracy gains on stem retrieval while cutting model size and training time by half. The work signals a shift toward domain-specific inductive biases in self-supervised audio, moving beyond generic spectral approaches. Downstream validation through zero-shot beat tracking and chord probing suggests the learned representations capture genuine musical structure, positioning phase-aware pooling as a reusable primitive for music AI systems.
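
To make the idea of phase invariance concrete, the sketch below pools a sequence of frame features over time with a complex phasor and checks what happens when the sequence is shifted. It is a generic illustration of the underlying signal-processing fact, not PHALAR's actual pooling layer, whose details are not given in this summary. The names phasor_pool and circular_shift, the harmonic index k, and the toy 16-by-4 feature matrix are all hypothetical.

```python
import cmath
import math
import random


def phasor_pool(frames, k=1):
    """Pool a list of T frames (each a list of D floats) into D complex numbers.

    Each frame is weighted by a unit-magnitude phasor that rotates k times
    over the sequence, which is the k-th DFT coefficient along time.
    Assumption: this is an illustrative operator, not the paper's layer.
    """
    T = len(frames)
    D = len(frames[0])
    pooled = [0j] * D
    for t, frame in enumerate(frames):
        w = cmath.exp(-2j * math.pi * k * t / T)  # phasor for time step t
        for d, x in enumerate(frame):
            pooled[d] += w * x
    return [p / T for p in pooled]


def circular_shift(frames, s):
    """Delay the sequence by s frames, wrapping around at the end."""
    T = len(frames)
    return [frames[(t - s) % T] for t in range(T)]


random.seed(0)
# Hypothetical toy input: 16 time frames, 4 feature channels.
frames = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(16)]

z = phasor_pool(frames)
z_shifted = phasor_pool(circular_shift(frames, 5))

# The magnitude ignores the shift; the phase rotates by a predictable angle.
rotation = cmath.exp(-2j * math.pi * 5 / 16)
magnitudes_match = all(math.isclose(abs(a), abs(b), abs_tol=1e-9)
                       for a, b in zip(z, z_shifted))
phases_rotate = all(abs(b - a * rotation) < 1e-9
                    for a, b in zip(z, z_shifted))
print(magnitudes_match, phases_rotate)
```

The magnitude of the pooled vector is unchanged by the shift, while its phase records where in the cycle the content sits. That split is one plausible reason a phase-aware pooling step could support both shift-robust retrieval and timing-sensitive tasks such as beat tracking.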













