The 10 most viewed publications of 2025

From foundation model safety frameworks and formal verification at cloud scale to advanced robotics and multimodal AI reasoning, these are the most viewed publications from Amazon scientists and collaborators in 2025.

By Staff writer

December 29, 2025

6 min read

Amazon Nova Premier: Technical report and model card

Amazon Nova Premier is Amazon’s most capable multimodal foundation model and teacher for model distillation. Nova Premier processes text, images, and videos with a one-million-token context window, enabling analysis of large codebases, long documents, and long videos in a single prompt.
The model also enables customers to use Amazon Bedrock to create customized variants of Amazon Nova Pro, Nova Lite, and Nova Micro that maintain high accuracy while offering improved speed and cost efficiency. Like all Nova models, Nova Premier is built with integrated safety measures and responsible AI practices, maintaining our commitment to customer trust, security, and reliability.
Amazon Nova Sonic: Technical report and model card

Speech and text have traditionally been processed by separate AI systems, creating latency and limiting the naturalness of voice interaction. Amazon Nova Sonic changes this by unifying speech and text processing in a single architecture, delivering frontier voice intelligence and industry-leading price performance.
Nova Sonic builds on advances in large pretrained text and speech models, fusing the two modalities to power applications such as voice-enabled AI assistants and agents, speech recognition, and speech generation. The unified architecture enables the model to adapt generated speech to the acoustic context (e.g., tone, style) and spoken content of user input. Designed with streaming-first capability in mind, Nova Sonic enables low-latency applications supporting natural turn taking and user interruptions, breaking free from the rigid turn taking of traditional speech applications built on cascaded systems.
Amazon's frontier model safety framework

This framework establishes the processes Amazon will use to identify, assess, and manage potential risks that could arise with the development of more advanced and highly capable frontier AI models. First, it specifies critical-capability thresholds, a set of model capabilities that have the potential to cause significant harm to the public if misused. Second, it describes critical-capability evaluations, a variety of automated and human-in-the-loop strategies to determine whether Amazon models demonstrate capabilities that meet or exceed the critical-capability thresholds. Third, it details the development and implementation of risk mitigations when a model demonstrates capabilities that meet or exceed a critical capability threshold.
Formally verified cloud-scale authorization

This paper describes how Amazon Web Services used formal verification to rebuild its authorization engine, providing mathematical certainty that it works correctly. Rather than prove correct the existing Java-based engine, the team found it more effective to write a new engine in the verification-aware programming language Dafny and then compile the result to readable, idiomatic Java code. The team can now confidently deploy enhancements and optimizations while maintaining the highest assurance of both correctness and backward compatibility. The new engine was deployed in 2024 without incident, and customers immediately enjoyed a threefold performance improvement.
Amazon Nova 2: Multimodal reasoning and generation models

The next generation of the Nova model family introduces dynamic reasoning capabilities that let customers control how deeply models think through problems, balancing speed and accuracy based on their specific needs.
Amazon Nova 2 is a family of four foundation models designed to meet diverse enterprise needs in reasoning, multimodal processing, and real-time conversational AI. The family includes Nova 2 Lite and Nova 2 Pro, multimodal models with configurable "extended thinking" controls; Nova 2 Omni, a unified multimodal model that processes text, images, video, and audio inputs while generating both text and images; and Nova 2 Sonic, a speech-to-speech foundation model for natural conversational AI. Nova 2 models process contexts of up to a million tokens, enabling analysis of extensive codebases, long documents, and videos within a single prompt.
Vulcan Pick: A robotic system for picking targeted objects from fabric pods

This paper addresses common challenges in robotic picking, including diverse-object handling, densely packed storage, and dynamic inventories, while introducing advances in 3-D scene understanding and adaptive motion control with continuous visual feedback. The researchers introduce an end-to-end solution that combines proven classical methods with state-of-the-art approaches in computer vision, motion planning, and customized hardware. The resulting system has been operating in a live warehouse environment for over six months, processing more than 12,000 customer orders.
Statistical power calculations revisited: Incorporating beliefs about effect sizes

In A/B testing, statistical power depends on both the variance of estimated effects and the distribution of true effects. Traditional power calculations compute the probability of detecting either effects of a fixed size or the "minimum detectable effect" (MDE). While such calculations capture the role of variance, they don't account for uncertainty about the distribution of true effects. The researchers present two approaches — "prior-informed average power" for frequentists and "Bayesian decision power" for Bayesians — that connect power calculations to beliefs about effect distributions. When true effects are assumed to be normally distributed, both approaches yield simple closed-form expressions that can be computed using data readily available in most A/B testing tools.
UXAgent: An LLM agent-based usability testing framework for web design

Usability testing is a fundamental method for evaluating web design in user experience (UX) studies. UXAgent helps UX researchers evaluate and iterate their usability study designs before conducting real human-subject studies. The system features an LLM agent module and a universal browser connector module that automatically generate thousands of simulated users to test target websites. The system can generate UX study results in qualitative (e.g., interviewing an agent on how it thinks), quantitative (e.g., number of actions), and video recording formats.
Let a neural network be your invariant

Model checking is the mathematical proof that a model of a software or hardware system meets a particular specification. A complete specification of functional correctness must combine both safety properties, which ensure that a system avoids undesired behavior, and “liveness” properties, which ensure that the system also achieves its desired objectives. Proving safety requires an appropriate inductive invariant, whereas proving liveness requires showing a measure of progress via a ranking function. Neural model checking has recently introduced a data-driven approach to formal verification but has focused only on liveness properties. In this paper, the researchers extend neural model checking to inductive invariants and thus safety properties as well, introducing a neural-certificate architecture that jointly represents both types of proofs and is amenable to training using constraint solvers.
Stow: Robotic packing of items into fabric pods

This paper presents a manipulation system capable of placing items onto densely packed shelves. The wide diversity of items in the retail setting and the strict business requirements of high storage rates and few defects have historically prohibited warehouse robots from performing this task. The researchers' innovations in hardware, perception, decision making, motion planning, and control have enabled this system to perform more than 500,000 stows in a large e-commerce fulfillment center. The system, which gives robots the ability to compress and manipulate deformable storage spaces, achieves human levels of packing density and speed while prioritizing work on overhead shelves to enhance the safety of humans working alongside the robots.

About the Author

Staff writer

The 10 most viewed publications of 2025

From foundation model safety frameworks and formal verification at cloud scale to advanced robotics and multimodal AI reasoning, these are the most viewed publications from Amazon scientists and collaborators in 2025.

Related content

Work with us