Time series modeling techniques based on deep learning have seen many advancements in recent years, especially in data-abundant settings, with the central aim of learning global models that extract patterns across multiple time series. While the crucial importance of appropriate data pre-processing and scaling has often been noted in prior work, most published work focuses on improved model architectures. In this paper, we empirically investigate the effect of data input and output transformations on the predictive performance of several neural forecasting architectures. In particular, we study the impact of several forms of data binning, i.e., converting real-valued time series into categorical ones, on prediction performance when combined with feed-forward, recurrent, and convolution-based models. We find that, compared to scaling techniques, binning can significantly improve performance when combined with certain model architectures, but that the particular type of binning chosen is of lesser importance.
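To make the input transformation concrete, the following is a minimal sketch of how a real-valued series can be binned into a categorical one. The function name `bin_series` and the two schemes shown (equally spaced "linear" edges and quantile-based "quantile" edges) are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def bin_series(values, num_bins=10, scheme="quantile"):
    """Map a real-valued series to integer bin indices in [0, num_bins - 1].

    Hypothetical helper for illustration:
      scheme="quantile": edges at empirical quantiles (roughly equal-mass bins)
      scheme="linear":   equally spaced edges between min and max
    """
    values = np.asarray(values, dtype=float)
    if scheme == "quantile":
        edges = np.quantile(values, np.linspace(0.0, 1.0, num_bins + 1))
    else:
        edges = np.linspace(values.min(), values.max(), num_bins + 1)
    # Use only the interior edges so indices fall in [0, num_bins - 1].
    return np.digitize(values, edges[1:-1], right=False)
```

A model trained on such categorical inputs would typically consume the bin indices via an embedding layer and, on the output side, predict a distribution over bins that can be mapped back to real values (e.g., via bin centers).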