Extractive NarrativeQA with heuristic pre-training
Although advances in neural architectures for NLP and in unsupervised pre-training have led to substantial improvements on question answering and natural language inference, understanding and reasoning over long texts remains a substantial challenge. Here, we consider the task of question answering from full narratives (e.g., books or movie scripts) or their summaries, tackling the NarrativeQA dataset (NQA; Kocisky et al., 2018). We introduce a heuristic extractive version of the dataset, which (a) yields a large set of questions with answers extracted from the summaries and (b) allows us to tackle the more feasible problem of answer extraction rather than answer generation. Using this dataset, we train systems for passage retrieval and for answer span prediction on top of pre-trained BERT embeddings. We show that our setup leads to state-of-the-art performance on summary-level QA. On QA from raw narrative text, our model performs comparably to previous models. We analyze the relative contributions of pre-trained embeddings and the extractive training paradigm, and provide a detailed error analysis.
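As an illustration of how a heuristic extractive dataset of this kind could be built, the sketch below selects the summary span whose tokens overlap most with a free-form reference answer. The abstract does not specify the heuristic, so the token-level F1 criterion, the whitespace tokenization, the `max_len` cap, and the function names `token_f1` and `best_span` are all assumptions made for illustration, not the procedure used in the paper.

```python
from collections import Counter


def token_f1(pred, gold):
    """Bag-of-tokens F1 between a candidate span and the reference answer."""
    common = Counter(pred) & Counter(gold)  # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def best_span(summary, answer, max_len=10):
    """Return (score, start, end) for the best-matching span in the summary.

    `start` and `end` are token offsets (end exclusive). The span length is
    capped at `max_len` tokens (an assumed limit). Ties keep the earliest span.
    If nothing overlaps, the result is (0.0, 0, 0).
    """
    tokens = summary.lower().split()  # naive whitespace tokenization (assumption)
    gold = answer.lower().split()
    best = (0.0, 0, 0)
    for start in range(len(tokens)):
        last_end = min(start + max_len, len(tokens))
        for end in range(start + 1, last_end + 1):
            score = token_f1(tokens[start:end], gold)
            if score > best[0]:
                best = (score, start, end)
    return best


# Hypothetical example: map a free-form answer onto a summary span.
summary = "Mark travels to London to find his brother"
score, start, end = best_span(summary, "to london")
extracted = " ".join(summary.split()[start:end])
```

Pairs whose best score falls below some threshold could be discarded, leaving question and span pairs suitable for training a span-prediction model. The threshold would itself be a design choice that the abstract does not describe.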