-
2024Large Language Models (LLMs) frequently hallucinate, impeding their reliability in mission-critical situations. One approach to address this issue is to provide citations to relevant sources alongside generated content, enhancing the verifiability of generations. However, citing passages accurately in answers remains a substantial challenge. This paper proposes a weakly-supervised fine-tuning method leveraging
-
SIGIR Forum2024Consumers on a shopping mission often leverage both product search and information seeking systems, such as web search engines and Question Answering (QA) systems, in an iterative process to improve their understanding of available products and reach a purchase decision. While product search is useful for shoppers to find the actual products meeting their requirements in the catalog, information seeking
-
Instruction fine-tuning (IFT) elicits instruction following capabilities and steers the behavior of large language models (LLMs) via supervised learning. However, existing models trained on open-source IFT datasets only have the ability to follow instructions from users, and often fail to follow complex role and rules specified by developers, a.k.a. system prompts. The ability to follow these roles and
-
2024In this paper, we investigate which questions are challenging for retrieval-based Question Answering (QA). We (i) propose retrieval complexity (RC), a novel metric conditioned on the completeness of retrieved documents, which measures the difficulty of answering questions, and (ii) propose an unsupervised pipeline to measure RC given an arbitrary retrieval system. Our proposed pipeline measures RC more
-
2024Multi-document summarization (MDS) is a challenging task, often decomposed to subtasks of salience and redundancy detection, followed by text generation. In this context, alignment of corresponding sentences between a reference summary and its source documents has been leveraged to generate training data for some of the component tasks. Yet, this enabling alignment step has usually been applied heuristically
Related content
-
July 22, 2019Using machine learning to train information retrieval models — such as Internet search engines — is difficult because it requires so much manually annotated data. Of course, training most machine learning systems requires manually annotated data, but because information retrieval models must handle such a wide variety of queries, they require a lot of data. Consequently, most information retrieval systems rely primarily on mechanisms other than machine learning.
-
June 27, 2019Earlier this month, Varun Sharma and Akshit Tyagi, two master’s students from the University of Massachusetts Amherst, began summer internships at Amazon, where, like many other scientists in training, they will be working on Alexa’s spoken-language-understanding systems.
-
June 13, 2019Alexa’s ability to respond to customer requests is largely the result of machine learning models trained on annotated data. The models are fed sample texts such as “Play the Prince song 1999” or “Play River by Joni Mitchell”. In each text, labels are attached to particular words — SongName for “1999” and “River”, for instance, and ArtistName for Prince and Joni Mitchell. By analyzing annotated data, the system learns to classify unannotated data on its own.
-
June 11, 2019As Alexa expands into new countries, she usually has to be trained on new languages. But sometimes, she has to be re-trained on languages she’s already learned. British English, American English, and Indian English, for instance, are different enough that for each of them, we trained a new machine learning model from scratch.
-
Animation by O’Reilly Science ArtJune 6, 2019New approach to reference resolution rewrites queries to clarify ambiguous references. -
June 5, 2019Today, customer exchanges with Alexa are generally either one-shot requests, like “Alexa, what’s the weather?”, or interactions that require multiple requests to complete more complex tasks.