Improving programming Q&A with neural generative augmentation
Knowledge-intensive programming Q&A is an active research area in industry. Its application boosts developer productivity by aiding developers in quickly finding programming answers from the vast amount of information on the Internet. In this study, we propose ProQANS and its variants ReProQANS and ReAugProQANS to tackle programming Q&A. ProQANS is a neural search approach that leverages unlabeled data on the Internet (such as StackOverflow) to mitigate the cold-start problem. ReProQANS extends ProQANS by utilizing reformulated queries with a novel triplet loss. We further use an auxiliary generative model to augment the training queries, and design a novel dual triplet loss function to adapt these generated queries, to build another variant of ReProQANS termed as ReAugProQANS. In our empirical experiments, we show RePro-QANS has the best performance when evaluated on the in-domain test set, while ReAugProQANS has the superior performance on the out-of-domain real programming questions, by outperforming the state-of-the-art model by up to 477% lift on the MRR metric respectively. The results suggest their robustness to previously unseen questions and its wide application to real programming questions.