We introduce the “Shopping Queries Data Set”, a large dataset of difficult search queries, released to foster research on semantic matching of queries and products. For each query, the dataset provides a list of up to 40 potentially relevant results, each with an ESCI relevance judgement (Exact, Substitute, Complement, Irrelevant) indicating how relevant the product is to the query. Each query-product pair is accompanied by additional information. The dataset is multilingual: it contains queries in English, Japanese, and Spanish. The primary objective of releasing this dataset is to create a benchmark for building new ranking strategies while also identifying interesting categories of results (i.e., substitutes) that can be used to improve the customer experience when searching for products.
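As an illustration of the structure described above, the following is a minimal Python sketch of how query-product judgements with ESCI labels might be represented, grouped per query, and filtered for substitutes. The class names, field names, locale codes, and sample rows are all hypothetical and chosen for illustration only; they are not the dataset's actual schema or contents.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class ESCI(Enum):
    """The four relevance judgements named in the abstract."""
    EXACT = "E"
    SUBSTITUTE = "S"
    COMPLEMENT = "C"
    IRRELEVANT = "I"


@dataclass(frozen=True)
class Judgement:
    """One query-product pair (field names are assumptions)."""
    query: str
    product_id: str
    label: ESCI
    locale: str  # assumed codes for English, Japanese, Spanish


def group_by_query(judgements):
    """Collect the judged products for each query, preserving input order."""
    grouped = defaultdict(list)
    for j in judgements:
        grouped[j.query].append(j)
    return dict(grouped)


def products_with_label(judgements, label):
    """Return the product ids that carry the given ESCI label."""
    return [j.product_id for j in judgements if j.label is label]


# Invented sample rows, not taken from the dataset.
sample = [
    Judgement("usb-c charger", "P1", ESCI.EXACT, "en"),
    Judgement("usb-c charger", "P2", ESCI.SUBSTITUTE, "en"),
    Judgement("usb-c charger", "P3", ESCI.COMPLEMENT, "en"),
    Judgement("zapatos de correr", "P4", ESCI.IRRELEVANT, "es"),
]

by_query = group_by_query(sample)
substitutes = products_with_label(by_query["usb-c charger"], ESCI.SUBSTITUTE)
```

Keeping the label as an enum rather than a free-form string makes it harder to mix up the four categories when selecting a subset such as substitutes.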
Shopping queries dataset: A large-scale ESCI benchmark for improving product search
2022
Last updated May 13, 2023