Embedding-based Zero-shot Retrieval through Query Generation leverages query synthesis over large corpuses of unlabeled text (such as Wikipedia) to pre-train siamese neural retrieval models. The resulting models significantly improve over previous BM25 baselines as well as state-of-the-art neural methods.
This package provides support for leveraging BART-large for query synthesis as well as code for training and finetuning a transformer based neural retriever. We also provide pre-generated synthetic queries on Wikipedia, and relevant pre-trained models that are obtainable through our download scripts.