MQTransformer: Multi-horizon forecasts with context-dependent attention and optimal Bregman volatility
In many forecasting applications (e.g., retail demand, electricity load, weather, finance), forecasts must satisfy certain properties, such as exhibiting context-dependent and time-varying seasonality patterns and avoiding excessive revision as new information becomes available. We propose a new neural forecasting architecture, MQTransformer, that addresses some of these issues by incorporating three architectural improvements over the current state of the art: 1) a novel decoder-encoder attention that aligns the historical and future time periods, 2) a novel positional encoding that learns seasonality from the historical time series, and 3) a novel decoder self-attention that allows the network to minimize forecast volatility. We then define a new measure of forecast volatility, Bregman Volatility, to understand one major source of the improvement from our model. Bregman Volatility allows us to compute the optimal volatility of a sequence of forecasts in terms of the improvement in forecast accuracy over that time period. We show both theoretically and empirically that the decoder self-attention module optimizes Bregman Volatility and thereby improves forecast accuracy.