- Amazon Technical Reports, 2025: Nova Premier is Amazon's most capable multimodal foundation model and a teacher model for distillation. It processes text, images, and video with a one-million-token context window, enabling analysis of large codebases, 400-page documents, and 90-minute videos in a single prompt [2]. We present the first comprehensive evaluation of Nova Premier's critical risk profile under the Frontier Model Safety Framework…
- 2025: Video summarization aims to generate a condensed textual version of an original video. Summaries may consist of either plain text or a shortlist of salient events, possibly including temporal or spatial references. Video Large Language Models (VLLMs) exhibit impressive zero-shot capabilities in video analysis. However, their performance varies significantly according to the LLM prompt, the characteristics…
- NeuS 2025: The "state" of State Space Models (SSMs) represents their memory, which fades exponentially over an unbounded span. By contrast, attention-based models have "eidetic" (i.e., verbatim, or photographic) memory over a finite span (the context size). Hybrid architectures combine state space layers with attention but still cannot recall the distant past and can access only the most recent tokens eidetically. Unlike…
- It is well known that large language models (LLMs) have good zero-shot and few-shot performance, which makes them promising candidates for inference when no or few training samples are available. However, when there is abundant task data, small custom-trained models match or exceed the performance of pre-trained LLMs, even after accounting for in-context examples. Further, smaller models…
- 2025: Fine-tuning large language models (LLMs) for specific tasks requires diverse, high-quality training data. However, obtaining sufficient relevant data remains a significant challenge. Existing data synthesis methods either depend on extensive seed datasets or struggle to balance task relevance and data diversity. To address these challenges, we propose Attribute-guided multI-hop Data Expansion (AIDE), a novel…
Related content
- September 24, 2020: A combination of audio and visual signals guides the device's movement, so the screen is always in view.
- September 24, 2020: Adjusting prosody and speaking style to conversational context is a first step toward "concept-to-speech".
- September 24, 2020: Natural turn-taking uses multiple cues (acoustic, linguistic, and visual) to help Alexa interact more naturally, without the need to repeat the wake word.
- September 24, 2020: Deep learning and reasoning enable customers to explicitly teach Alexa how to interpret their novel requests.
- September 18, 2020: Learn how Alexa Conversations helps developers author complex dialogue-management rules.
- September 16, 2020: How Amazon conducted customer-obsessed science research and engineering to release a vastly improved experience.