-
2025: Large language models (LLMs) have demonstrated impressive capabilities across diverse tasks, but they remain susceptible to hallucinations: generating content that appears plausible but contains factual inaccuracies. We present FINCH-ZK, a black-box framework that leverages FINe-grained Cross-model consistency to detect and mitigate Hallucinations in LLM outputs without requiring external knowledge sources …
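The abstract above only names the idea of cross-model consistency checking; a minimal illustrative sketch of that general idea follows. This is not FINCH-ZK's actual algorithm: the `agreement` measure (token Jaccard overlap) and the flagging threshold are assumptions chosen for the example.

```python
# Illustrative sketch of cross-model consistency checking for hallucination
# detection. NOT the FINCH-ZK algorithm; the agreement metric and threshold
# are assumptions for demonstration only.

def agreement(answer_a: str, answer_b: str) -> float:
    """Token-overlap (Jaccard) agreement between two model answers."""
    a, b = set(answer_a.lower().split()), set(answer_b.lower().split())
    return len(a & b) / len(a | b) if a | b else 1.0

def flag_inconsistent(claims, cross_model_answers, threshold=0.5):
    """Flag claims whose answers disagree across models.

    `cross_model_answers[i]` holds several models' answers about claim i;
    a claim is flagged when mean pairwise agreement falls below threshold.
    """
    flagged = []
    for claim, answers in zip(claims, cross_model_answers):
        pairs = [(x, y) for i, x in enumerate(answers) for y in answers[i + 1:]]
        mean_agree = sum(agreement(x, y) for x, y in pairs) / len(pairs)
        if mean_agree < threshold:
            flagged.append(claim)
    return flagged
```

In practice a real system would compare fine-grained claims (not whole answers) and use a stronger semantic-equivalence judge than token overlap.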
-
Amazon Technical Reports, 2025: We present Amazon Nova Multimodal Embeddings (MME), a state-of-the-art multimodal embedding model for agentic RAG and semantic search applications. Nova MME is the first embeddings model that supports five modalities as input: text, documents, images, video, and audio, and transforms them into a single, unified embedding space. This powerful capability enables cross-modal retrieval, allowing users to search …
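Cross-modal retrieval in a unified embedding space works because items from every modality land in the same vector space, so a text query can be ranked against image or video embeddings by similarity alone. A minimal sketch, with toy stand-in vectors rather than Nova MME outputs:

```python
# Sketch of cross-modal retrieval over a unified embedding space.
# The vectors and index entries below are toy stand-ins, not Nova MME output.
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve(query_vec, index):
    """Rank indexed items of any modality by similarity to the query."""
    return sorted(index, key=lambda item: cosine(query_vec, item["vec"]),
                  reverse=True)
```

Because all modalities share one space, the same `retrieve` call serves text-to-image, audio-to-video, or any other cross-modal lookup.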
-
2025: We present MASSIVE-Agents, a new benchmark for assessing multilingual function calling across 52 languages. We created MASSIVE-Agents by cleaning the original MASSIVE dataset and then reformatting it for evaluation within the Berkeley Function-Calling Leaderboard (BFCL) framework. The full benchmark comprises 47,020 samples with an average of 904 samples per language, covering 55 different functions and …
-
2025: Structured information extraction from unstructured text is critical for emerging Software 3.0 systems, where LLM agents autonomously interact with APIs and tools. Recent approaches apply large language models directly to extraction tasks using existing JSON schemas, often with constrained decoding or reinforcement learning to ensure syntactic validity, but treat JSON schemas as static contracts …
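The step this abstract takes for granted, checking an LLM's extraction output against a JSON schema, can be sketched minimally as below. The schema, field names, and hand-rolled checks are assumptions for illustration; production systems use a full JSON Schema validator or constrained decoding rather than post-hoc checks like these.

```python
# Minimal sketch of validating an LLM's structured-extraction output against
# a schema. The schema and checks are illustrative assumptions; real systems
# typically use a complete JSON Schema validator or constrained decoding.
import json

SCHEMA = {
    "required": ["name", "price"],          # keys the extraction must emit
    "types": {"name": str, "price": (int, float)},
}

def validate(raw_output: str, schema=SCHEMA):
    """Parse model output and report any schema violations."""
    try:
        obj = json.loads(raw_output)
    except json.JSONDecodeError:
        return False, ["invalid JSON"]
    errors = [f"missing key: {k}" for k in schema["required"] if k not in obj]
    errors += [f"wrong type for {k}" for k, t in schema["types"].items()
               if k in obj and not isinstance(obj[k], t)]
    return not errors, errors
```

A failed validation can drive a retry loop: feed the error list back to the model and ask it to re-emit conforming JSON.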
-
2025: In the e-commerce domain, the address is the single most important piece of data for ensuring accurate and reliable deliveries. In this two-part study, we first outline the construction of a language model to assist customers with address standardization, and in the latter part we detail a novel Pareto-ensemble multi-task prediction algorithm that derives critical insights from addresses to minimize operational …
Related content
-
May 24, 2018: Amazon scientists are continuously expanding Alexa’s natural-language-understanding (NLU) capabilities to make Alexa smarter, more useful, and more engaging.
-
May 11, 2018: Smart speakers, such as the Amazon Echo family of products, are growing in popularity among consumer and business audiences. To improve the automatic speech recognition (ASR) and full-duplex voice communication (FDVC) performance of these smart speakers, acoustic echo cancellation (AEC) and noise reduction systems are required. These systems reduce the noises and echoes that can impair operation, such as an Echo device’s ability to accurately hear the wake word “Alexa.”
-
May 4, 2018: In recent years, the amount of textual information produced daily has increased exponentially. This information explosion has been accelerated by the ease with which data can be shared across the web. Most textual information is generated as free-form text, and only a small fraction is available in structured formats (Wikidata, Freebase, etc.) that can be processed and analyzed directly by machines.
-
April 25, 2018: This morning, I am delivering a keynote talk at the World Wide Web Conference in Lyon, France, titled “Conversational AI for Interacting with the Digital and Physical World.”
-
April 12, 2018: The Amazon Echo is a hands-free smart home speaker you control with your voice. The first important step in enabling a delightful customer experience with an Echo or other Alexa-enabled device is wake word detection, so accurate detection of “Alexa” or substitute wake words is critical. Building a wake word system with low error rates is challenging when computational resources on the device are limited and background noise, such as speech or music, is present.
-
April 10, 2018: Just as Alexa can wake up without the need to press a button, she also automatically detects when a user finishes her query and expects a response. This task is often called “end-of-utterance detection,” “end-of-query detection,” “end-of-turn detection,” or simply “end-pointing.”