InDi: Informative and diverse sampling for dense retrieval
2024
Negative sample selection has been shown to have a crucial effect on the training of dense retrieval systems. Nevertheless, most existing negative selection methods ultimately choose at random from some pool of samples, which calls for a better sampling solution. We define desired requirements for negative sample selection: the chosen samples should be informative, to advance the learning process, and diverse, to help the model generalize. We propose a sampling method designed to meet these requirements, and show that using it to enhance the training procedure of a recent prominent dense retrieval solution (coCondenser) improves the resulting model's performance. Specifically, we see a ∼2% improvement in MRR@10 on the MS MARCO dataset (from 38.2 to 38.8) and a ∼1.5% improvement in Recall@5 on the Natural Questions dataset (from 71% to 72.1%), both statistically significant. Unlike other methods, our solution does not require training or running inference with a large model, and adds only a small overhead (∼1% additional training time). Finally, we report ablation studies showing that the objectives we define are indeed important when selecting negative samples for dense retrieval.
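To make the two objectives concrete, the following is a minimal sketch (not the paper's actual algorithm) of how negatives could be selected to be both informative and diverse: informativeness is approximated by a candidate's embedding similarity to the query (harder negatives score higher), and diversity by penalizing similarity to negatives already chosen, in a greedy, maximal-marginal-relevance style. The function name, weights, and scoring are illustrative assumptions.

```python
import numpy as np

def select_negatives(query_emb, cand_embs, k=8, diversity_weight=0.5):
    """Greedy selection of k negatives balancing informativeness and diversity.

    Hypothetical sketch, not the method from the paper:
    - informativeness: cosine similarity of a candidate to the query,
      so harder negatives score higher;
    - diversity: penalty for similarity to negatives already selected.
    """
    # Normalize so dot products are cosine similarities.
    q = query_emb / np.linalg.norm(query_emb)
    C = cand_embs / np.linalg.norm(cand_embs, axis=1, keepdims=True)

    informativeness = C @ q  # higher = harder negative
    selected = []
    for _ in range(min(k, len(C))):
        if selected:
            # Redundancy: max similarity of each candidate to the chosen set.
            redundancy = (C @ C[selected].T).max(axis=1)
        else:
            redundancy = np.zeros(len(C))
        score = informativeness - diversity_weight * redundancy
        score[selected] = -np.inf  # never pick the same candidate twice
        selected.append(int(np.argmax(score)))
    return selected
```

Setting `diversity_weight=0` recovers pure hard-negative mining; increasing it trades some hardness for coverage of the candidate space.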