- Speculative decoding is a method for accelerating inference in large language models (LLMs) by predicting multiple tokens with a smaller 'draft model' and validating them against the larger 'base model.' If a draft token is inconsistent with what the base model would have generated, speculative decoding 'backtracks' to the last consistent token before resuming generation. This is straightforward in autoregressive… (A minimal sketch of the draft-verify-backtrack loop appears after this list.)
- Amazon Technical Reports, 2024. We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents, and text…
- IEEE Big Data 2024. Getting large language models (LLMs) to perform well on downstream tasks requires pre-training over trillions of tokens. This typically demands a large number of powerful computational devices in addition to a stable distributed training framework to accelerate the training. The growing number of applications leveraging AI/ML has led to a scarcity of the expensive conventional accelerators (such as GPUs)…
- 2024. We describe a family of architectures to support transductive inference by allowing memory to grow to a finite but a priori unknown bound while making efficient use of finite resources for inference. Current architectures use such resources to represent data either eidetically over a finite span ("context" in Transformers) or fading over an infinite span (in State Space Models, or SSMs). Recent hybrid…
- 2024. Tool use by large language models (LLMs) is a promising avenue to extend their reach beyond language or conversational settings. The number of tools can scale to thousands, as they enable accessing sensory information, fetching updated factual knowledge, or taking actions in the real world. In such settings, in-context learning by providing a short list of relevant tools in the prompt is a viable approach (see the retrieval-and-prompting sketch after this list).
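The draft-verify-backtrack loop described in the first item above can be sketched in a few lines. This is a minimal greedy illustration, not the implementation from any of the papers listed here: it assumes two Hugging Face-style causal language models (`draft_model`, `base_model`) that return `.logits`, a batch size of 1, and greedy decoding on both sides (production speculative decoding instead uses a rejection-sampling rule that preserves the base model's sampling distribution).

```python
import torch

@torch.no_grad()
def speculative_decode(draft_model, base_model, prompt_ids, k=4, max_new_tokens=64):
    """Greedy speculative-decoding sketch: the draft model proposes k tokens,
    the base model checks them in one forward pass, and generation backtracks
    to the last token both models agree on before resuming."""
    ids = prompt_ids  # shape [1, seq_len]
    while ids.shape[-1] < prompt_ids.shape[-1] + max_new_tokens:
        ctx_len = ids.shape[-1]

        # 1. Draft: the small model autoregressively proposes k candidate tokens.
        draft_ids = ids
        for _ in range(k):
            next_tok = draft_model(draft_ids).logits[:, -1, :].argmax(-1, keepdim=True)
            draft_ids = torch.cat([draft_ids, next_tok], dim=-1)

        # 2. Verify: a single base-model pass scores every drafted position at once.
        base_logits = base_model(draft_ids).logits
        base_preds = base_logits[:, ctx_len - 1 : -1, :].argmax(-1)  # base's token at each drafted slot
        proposed = draft_ids[:, ctx_len:]

        # 3. Accept the longest matching prefix; backtrack to the first disagreement.
        agree = (base_preds == proposed).long().squeeze(0)
        n_accept = int(agree.cumprod(0).sum())
        ids = draft_ids[:, : ctx_len + n_accept]

        # 4. Append the base model's own token at the mismatch (or one step past the
        #    drafted block if everything matched), so each round makes progress.
        if n_accept < k:
            ids = torch.cat([ids, base_preds[:, n_accept : n_accept + 1]], dim=-1)
        else:
            ids = torch.cat([ids, base_logits[:, -1, :].argmax(-1, keepdim=True)], dim=-1)
    return ids[:, : prompt_ids.shape[-1] + max_new_tokens]
```

The key property is that the base model scores all k drafted tokens in a single forward pass, so every accepted draft token saves one full base-model decoding step while leaving the final output consistent with the base model.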
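For the tool-use item above, a common way to keep the prompt short when the catalog holds thousands of tools is to retrieve only the few most relevant ones per request and place just those in context. The sketch below is a hypothetical illustration, not the method from the cited work: the tool names and descriptions are made up, and a toy word-overlap scorer stands in for a real embedding model or learned retriever.

```python
# Toy catalog of tool signatures and descriptions (all names are hypothetical).
TOOLS = {
    "get_weather(city)": "Return the current weather forecast for a city.",
    "search_flights(origin, dest, date)": "Find available flights between two airports on a date.",
    "convert_currency(amount, src, dst)": "Convert a monetary amount from one currency to another.",
}

def relevance(query: str, description: str) -> float:
    """Toy lexical-overlap score; a real system would use an embedding model
    or a learned retriever over the full tool catalog."""
    q, d = set(query.lower().split()), set(description.lower().split())
    return len(q & d) / max(len(d), 1)

def build_prompt(query: str, n_tools: int = 2) -> str:
    """Put only the n most relevant tools in the prompt instead of the whole catalog."""
    ranked = sorted(TOOLS.items(), key=lambda kv: relevance(query, kv[1]), reverse=True)
    tool_lines = "\n".join(f"- {sig}: {desc}" for sig, desc in ranked[:n_tools])
    return (
        "You can call the following tools:\n"
        f"{tool_lines}\n\n"
        f"User request: {query}\n"
        "Assistant:"
    )

print(build_prompt("What will the weather be in Seattle tomorrow?"))
```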
Related content
- March 11, 2021. University teams will compete in building agents that can help customers complete complex tasks, like cooking and home improvement. The deadline for university team applications is April 16.
- March 02, 2021. The newest chapter addresses a problem that often bedevils nonparametric machine learning models.
- March 01, 2021. The Art Museum skill uses Alexa Conversations, an AI-driven dialogue management tool.
- February 08, 2021. A technique that relies on inverse reinforcement learning, or learning by example, improves task completion rate by 14% to 17% in simulations.
- February 08, 2021. Yanagisawa discusses the science behind Alexa's new bilingual Polyglot model, her career in speech research, and more.
- February 03, 2021. Neural text-to-speech enables a new multilingual model to use the same voice for Spanish and English responses.