Symbolic music generation with Transformer-GANs
2020
Transformers have emerged as the dominant approach in the music generation literature for producing minute-long compositions with compelling musical structure. These models are trained autoregressively by minimizing the negative log-likelihood (NLL) of the observed sequences. Unfortunately, the quality of samples from these models tends to degrade significantly for long sequences, a phenomenon attributed to exposure bias. Fortunately, these failures can be detected by classifiers trained to distinguish real sequences from sampled ones. This observation motivates our Transformer-GAN framework, which trains an additional discriminator to complement the NLL objective. We use a pre-trained SpanBERT model for the discriminator, which in our experiments improved training stability. Using both human evaluations and objective metrics, we demonstrate that music generated by our approach outperforms a baseline trained purely with likelihood maximization as well as the state-of-the-art Music Transformer.
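To make the training recipe concrete, below is a minimal PyTorch-style sketch of one combined update: a discriminator step that separates real sequences from sampled ones, followed by a generator step whose loss adds an adversarial term to the standard NLL. The toy generator and discriminator modules, the weight `lambda_adv`, and the Gumbel-Softmax relaxation used to backpropagate through discrete sampling are illustrative assumptions, not the paper's exact architecture or mechanism (the paper's generator is a Transformer and its discriminator a pre-trained SpanBERT, both stubbed out here).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, EMBED, SEQ_LEN, BATCH = 128, 64, 32, 8  # toy sizes, assumed for illustration

class ToyGenerator(nn.Module):
    """Autoregressive token model standing in for the music Transformer."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMBED)
        self.rnn = nn.GRU(EMBED, EMBED, batch_first=True)  # stand-in for self-attention layers
        self.head = nn.Linear(EMBED, VOCAB)

    def forward(self, tokens):
        h, _ = self.rnn(self.embed(tokens))
        return self.head(h)  # (batch, seq, vocab) next-token logits

class ToyDiscriminator(nn.Module):
    """Real-vs-sampled classifier standing in for the pre-trained SpanBERT."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(VOCAB, EMBED)
        self.score = nn.Linear(EMBED, 1)

    def forward(self, token_probs):
        # Consumes soft token distributions so gradients can reach the generator.
        return self.score(torch.tanh(self.proj(token_probs))).mean(dim=1)  # (batch, 1)

gen, disc = ToyGenerator(), ToyDiscriminator()
g_opt = torch.optim.Adam(gen.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(disc.parameters(), lr=1e-3)
lambda_adv = 0.1  # assumed weight balancing the NLL and adversarial terms

real = torch.randint(VOCAB, (BATCH, SEQ_LEN))  # placeholder "real" music tokens

# --- Discriminator step: distinguish real sequences from sampled ones. ---
with torch.no_grad():  # generator is frozen during the discriminator update
    fake_probs = F.gumbel_softmax(gen(real), tau=1.0, hard=False)
real_probs = F.one_hot(real, VOCAB).float()
d_loss = (
    F.binary_cross_entropy_with_logits(disc(real_probs), torch.ones(BATCH, 1))
    + F.binary_cross_entropy_with_logits(disc(fake_probs), torch.zeros(BATCH, 1))
)
d_opt.zero_grad()
d_loss.backward()
d_opt.step()

# --- Generator step: NLL on observed data plus adversarial feedback. ---
logits = gen(real[:, :-1])
nll = F.cross_entropy(logits.reshape(-1, VOCAB), real[:, 1:].reshape(-1))
sampled_probs = F.gumbel_softmax(gen(real), tau=1.0, hard=False)  # differentiable sampling
adv = F.binary_cross_entropy_with_logits(disc(sampled_probs), torch.ones(BATCH, 1))
g_loss = nll + lambda_adv * adv  # discriminator complements the NLL objective
g_opt.zero_grad()
g_loss.backward()
g_opt.step()
```

The key design point the sketch illustrates is that the discriminator sees soft token distributions rather than hard samples, giving the generator a usable gradient signal on long sampled sequences, exactly the regime where NLL-only training suffers from exposure bias.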