On-device speech processing makes Alexa faster, lower-bandwidth

Innovative training methods and model compression techniques combine with clever engineering to keep speech processing local.

At Amazon, we always look to invent new technology for improving customer experience. One technology we have been working on at Alexa is on-device speech processing, which has multiple benefits: a reduction in latency, or the time it takes Alexa to respond to queries; lowered bandwidth consumption, which is important on portable devices; and increased availability in in-car units and other applications where Internet connectivity is intermittent. On-device processing also enables the fusion of the speech signal with other modalities, like vision, for features such as Alexa’s natural turn-taking.

In the last year, we’ve continued to build upon Alexa’s on-device speech-processing capabilities. As a result of these inventions, we are launching a new setting that gives customers the option of having the audio of their Alexa voice requests processed locally, without being sent to the cloud.

In the cloud, storage space and computational capacity are effectively unconstrained. To ensure accuracy, our cloud models can be large and computationally demanding. Executing the same functions on-device means compressing our models into less than 1% as much space — with minimal loss in accuracy.

Moreover, in the cloud, the separate components of Alexa’s speech-processing stack — automatic speech recognition (ASR), whisper detection, and speaker identification — run on separate server nodes with their own powerful processors. On-device, those functions have to share hardware not only with each other but with Alexa’s other core device functions, such as music playback.

Re-creating Alexa’s speech-processing stack on-device was a massive undertaking. New methods for training small-footprint ASR models were part of the solution, but so were innovations in system design and hardware-software codesign. It was a joint effort across science and engineering teams over a span of years. Here’s a quick overview of how it works.

System architecture

Our on-device ASR model takes in an acoustic speech signal and outputs a set of hypotheses about what the speaker said, ranked according to probability. We represent those hypotheses as a lattice — a graph whose edges represent recognized words and the probability that a given word follows from the previous one.

An example of a lattice representing ASR hypotheses.
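To make the lattice idea concrete, here is a minimal sketch in Python. The edge list, the node numbering, and the probabilities are all invented for illustration, and the function name `ranked_hypotheses` is hypothetical; it simply walks every path through a small acyclic graph and ranks the resulting word sequences by total log-probability.

```python
import math
from collections import defaultdict

# Hypothetical toy lattice: each edge is (from_node, to_node, word, probability).
# Node 0 is the start and node 3 is the end; all numbers are invented.
EDGES = [
    (0, 1, "play", 0.9),
    (0, 1, "pay", 0.1),
    (1, 2, "a", 0.6),
    (1, 2, "the", 0.4),
    (2, 3, "song", 0.7),
    (2, 3, "sung", 0.3),
]


def ranked_hypotheses(edges, start, end):
    """Enumerate every start-to-end path and rank paths by total log-probability."""
    outgoing = defaultdict(list)
    for src, dst, word, prob in edges:
        outgoing[src].append((dst, word, math.log(prob)))

    results = []

    def walk(node, words, score):
        if node == end:
            results.append((score, " ".join(words)))
            return
        for dst, word, logp in outgoing[node]:
            walk(dst, words + [word], score + logp)

    walk(start, [], 0.0)
    # Highest log-probability first.
    return sorted(results, reverse=True)


hypotheses = ranked_hypotheses(EDGES, start=0, end=3)
best_score, best_text = hypotheses[0]
```

Exhaustive enumeration only works for a toy graph like this one; a real lattice would be searched with dynamic programming rather than by listing every path.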

With cloud-based ASR, encrypted audio streams to the cloud in small snippets called “frames”. With on-device ASR, only the lattice is sent to the cloud, where a large and powerful neural language model reranks the hypotheses. The lattice can’t be sent until the customer has finished speaking, as words later in a sequence can dramatically change the overall probability of a hypothesis.

The model that determines when the customer has finished speaking is called an end-pointer. End-pointers offer a natural trade-off between accuracy and latency: an aggressive end-pointer will initiate speech processing earlier, but it might cut the speaker off prematurely, resulting in a poor customer experience.

On the device, we in fact run two end-pointers. The first is a speculative end-pointer that we have tuned to be about 200 milliseconds faster than the final end-pointer, so we can initiate downstream processing, such as natural-language understanding (NLU), ahead of the final end-pointed ASR result. In exchange for speed, however, we trade off a little accuracy.

The final end-pointer takes longer to make a decision but is more accurate. In cases in which the speculative end-pointer cuts speech off too early, the final end-pointer sends a revised lattice and the instruction to reset downstream processing. In the large majority of cases, however, the speculative end-pointer is correct, which reduces user-perceived latency, since downstream tasks are initiated earlier.
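The interplay between the two end-pointers can be sketched as a small event loop. This is only an illustration: the real end-pointers are models, whereas the sketch below stands in for them with simple silence counters, and the function name `run_endpointers` and both thresholds are assumptions made for the example.

```python
def run_endpointers(frames, speculative_silence=3, final_silence=6):
    """Return the events emitted by a speculative and a final end-pointer.

    frames: iterable of booleans, True when the frame contains speech.
    Both silence thresholds (counted in frames) are made-up illustration values.
    """
    events = []
    silent_run = 0
    speculated = False
    for index, is_speech in enumerate(frames):
        if is_speech:
            if speculated:
                # The speculative decision was premature: downstream work must restart.
                events.append(("reset", index))
                speculated = False
            silent_run = 0
            continue
        silent_run += 1
        if silent_run == speculative_silence and not speculated:
            # Downstream processing (such as NLU) can start here.
            events.append(("speculative_end", index))
            speculated = True
        if silent_run == final_silence:
            # The slower, more conservative decision confirms the end of speech.
            events.append(("final_end", index))
            break
    return events


# A short pause in mid-utterance triggers a speculative end, then a reset.
paused = [True, True, False, False, False, True] + [False] * 6
paused_events = run_endpointers(paused)

# An utterance without a pause needs no reset.
clean_events = run_endpointers([True] + [False] * 6)
```

In the second case the speculative decision stands, so downstream processing gets a head start equal to the gap between the two thresholds.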

Another aspect of ASR that had to move on-device is context awareness. When computing the probabilities in a lattice, the ASR model should, for instance, give added weight to otherwise uncommon names that happen to be in the customer’s address book or the names the customer has assigned to household devices.

A diagram of the on-device ASR network, with a closeup of the biasing mechanism that allows the network to ingest dynamic content. (Based on figures in "Context-aware Transformer transducer for speech recognition")
This attention map indicates that the trained network is attending to the correct entry in a list of Alexa-linked home appliances. (From "Context-aware Transformer transducer for speech recognition")

Context awareness can’t wait for the cloud because the lattice, though it encodes multiple hypotheses, doesn’t come close to encoding all possible hypotheses. When constructing the lattice, the ASR system has to prune a lot of low-probability hypotheses. If context awareness isn’t built into the on-device model, names of contacts or linked skills might end up getting pruned.

Initially, we use a so-called shallow-fusion model to add context and personalize content on-device. When the system is building the lattice, it boosts the probabilities of contextually relevant words such as contact or appliance names.
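A rough sketch of this kind of boosting, and of why it has to happen before pruning, might look as follows. The candidate words, their probabilities, the boost value, and the helper names `bias_scores` and `prune` are all hypothetical; the only idea taken from the text is that contextually relevant words get a heuristic score bonus while the lattice is being built.

```python
import math


def bias_scores(candidates, context_words, boost=2.0):
    """Add a fixed log-domain bonus to candidates that appear in the context list."""
    return {
        word: logp + (boost if word in context_words else 0.0)
        for word, logp in candidates.items()
    }


def prune(scores, beam_width):
    """Keep only the beam_width highest-scoring candidates."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:beam_width])


# Invented next-word candidates after "call", as log-probabilities.
candidates = {
    "joe": math.log(0.5),
    "jo": math.log(0.3),
    "joan": math.log(0.15),
    "zhou": math.log(0.05),
}
contacts = {"zhou"}  # hypothetical entry from the customer's address book

without_bias = prune(candidates, beam_width=2)
with_bias = prune(bias_scores(candidates, contacts, boost=3.0), beam_width=2)
```

Without the boost, the uncommon contact name falls outside the beam and never reaches the lattice; with it, the name survives pruning.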

The probability boosts are heuristic, however — they’re not learned jointly with the core ASR model. To achieve even better accuracy on personalized and long-tail content, we have developed a multihead attention-based context-biasing mechanism that is jointly trained with the rest of the ASR subnetworks.

Model training

On-device ASR required us to build a new model from the ground up, an end-to-end recurrent neural network-transducer (RNN-T) model that directly maps the input speech signal to an output sequence of words. Using a single neural network results in a significantly reduced memory footprint. But we had to develop new techniques, both for inference and for training, to achieve the degree of accuracy and compression that would let this technology handle utterances on-device.

Previously on Amazon Science, we’ve discussed some of the techniques we used to increase the accuracy of small-footprint end-to-end ASR models. With teacher-student training, for instance, we teach a small, lean model to match the outputs of a more-powerful but slower model. We developed a training methodology that made it possible to do teacher-student training efficiently with a million hours of unannotated speech.
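As a minimal sketch of what "matching the outputs" can mean, the snippet below computes a temperature-softened KL divergence between teacher and student output distributions. This particular loss is a common choice for teacher-student training and is an assumption here, not a detail confirmed by the text; the logits are invented. Note that no transcript is needed, which is what makes unannotated speech usable.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, optionally softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence from the teacher's distribution to the student's."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return sum(t * (math.log(t) - math.log(s)) for t, s in zip(teacher, student))


teacher_logits = [2.0, 1.0, 0.1]  # invented teacher outputs for one frame
matching_loss = distillation_loss([2.0, 1.0, 0.1], teacher_logits)
mismatched_loss = distillation_loss([0.1, 1.0, 2.0], teacher_logits)
```

The loss is zero when the student reproduces the teacher exactly and grows as the two distributions drift apart.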

During the training of a context-aware ASR model, a long short-term memory (LSTM) encoder encodes both unlabeled and labeled segments of the audio stream, so the model can use the entire input audio to improve ASR accuracy. (From "Improving RNN-T ASR accuracy using context audio")

To further boost the accuracy of on-device RNN-T ASR, we developed techniques that allow the neural network to learn and exploit audio context within a stream. For example, for a stream comprising two utterances, “Alexa” and “Play a song”, the audio context from the keyword segment (“Alexa”) helps the model focus on the foreground speech and speaker. Separately, we implemented a novel discriminative-loss and training algorithm that aims at directly minimizing the word error rate (WER) of RNN-T ASR.

On top of these innovations, however, we still had to develop some new compression techniques to get the RNN-T to run efficiently on-device. A neural network consists of simple processing nodes, each of which is connected to several others. The connections between nodes have associated weights, which determine how much one node’s output contributes to the computation performed by the next node.

One way to shrink a neural network’s memory footprint is to quantize its weights — to divide the total range of weights into a small set of intervals and use a single value to represent all the weights in each interval. So, for instance, the weights 0.70, 0.76, and 0.79 might all get quantized to the single value 0.75. Specifying an interval requires fewer bits than specifying several different floating-point values.
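The example in the paragraph above can be reproduced with a few lines of Python. This is a sketch of plain uniform quantization, assuming a weight range of -1 to 1 split into four equal intervals (so each weight needs only a 2-bit index); the range, the number of levels, and the function name `quantize` are illustrative choices.

```python
def quantize(weights, num_levels, low, high):
    """Map each weight to the midpoint of one of num_levels equal-width intervals."""
    width = (high - low) / num_levels
    indices = []
    for w in weights:
        index = int((w - low) / width)
        index = min(max(index, 0), num_levels - 1)  # clamp values at the edges
        indices.append(index)
    # Each interval is represented by the value at its center.
    centers = [low + (i + 0.5) * width for i in indices]
    return indices, centers


# Four levels over [-1, 1): every weight is stored as a 2-bit interval index.
indices, values = quantize([0.70, 0.76, 0.79, -0.2], num_levels=4, low=-1.0, high=1.0)
```

Here 0.70, 0.76, and 0.79 all land in the interval from 0.5 to 1.0 and are represented by 0.75, while -0.2 is represented by -0.25.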

If quantization is done after a network has been trained, performance can suffer. We developed a method of quantization-aware training that imposes a probability distribution on the network weights during training, so that they can be easily quantized with little effect on performance. Unlike previous quantization-aware training methods, which mostly take quantization into account in the forward pass, ours accounts for quantization in the backward direction, during weight updates, through network loss regularization. And it does that efficiently.
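One simple way to picture regularization that nudges weights toward quantizable values is a penalty on each weight's distance from its nearest quantization level. The sketch below is a deliberately simplified stand-in and not the method described above; the level values, the `strength` parameter, and both function names are assumptions.

```python
def quantization_penalty(weights, centers):
    """Sum of squared distances from each weight to its nearest quantization level."""
    return sum(min((w - c) ** 2 for c in centers) for w in weights)


def regularized_loss(task_loss, weights, centers, strength=0.1):
    """Task loss plus a term that grows as weights drift away from the levels."""
    return task_loss + strength * quantization_penalty(weights, centers)


centers = [-0.75, -0.25, 0.25, 0.75]  # hypothetical 2-bit quantization levels
on_grid = quantization_penalty([0.75, -0.25], centers)
off_grid = quantization_penalty([0.70], centers)
```

Because the penalty enters the loss, its gradient acts on the weights during every update, so by the end of training they already sit close to the values they will be quantized to.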

Another way to make neural networks run more efficiently, which is also a vital concern on resource-constrained devices, is to reduce low weights to zero. Computations involving zero weights can be discarded, reducing the computational burden.

Over successive training epochs, sparsification gradually drops low weights in a weight matrix.

But again, doing that reduction after the network is trained can compromise performance. We developed a sparsification method that enables the gradual reduction of low-value weights during training, so the network learns a model amenable to weight pruning.

Neural networks are typically trained on multiple passes through the same set of training data, or epochs. During each epoch, we force the network weights to diverge more and more, so that at the end of the final epoch, a fixed number of weights — say, half — are effectively zero. They can be safely discarded.
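The gradual schedule can be sketched with ordinary magnitude pruning, where the fraction of zeroed weights ramps up linearly until it reaches the target at the final epoch. This is a generic stand-in for the training-time method described above, not a reproduction of it; the weights, the linear ramp, and the function names are assumptions, and the actual weight updates are omitted.

```python
def sparsity_at(epoch, total_epochs, final_sparsity=0.5):
    """Linearly ramp the fraction of zeroed weights, reaching final_sparsity at the last epoch."""
    return final_sparsity * (epoch + 1) / total_epochs


def prune_smallest(weights, sparsity):
    """Zero out the given fraction of weights, choosing those with the smallest magnitude."""
    count = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:count]:
        pruned[i] = 0.0
    return pruned


weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2, -0.3, 0.08]  # invented values
total_epochs = 4
zero_counts = []
for epoch in range(total_epochs):
    # A real training loop would update the weights here before pruning.
    weights = prune_smallest(weights, sparsity_at(epoch, total_epochs))
    zero_counts.append(weights.count(0.0))
```

With eight weights and four epochs, one more weight is dropped each epoch until half of them are zero.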

A demonstration of the branching encoder network.

To improve on-device efficiency, we also developed a branching encoder network that uses two different neural networks to convert speech inputs into numeric representations suitable for speech classification. One network is complex, one simple, and the ASR model decides on the fly whether it can get away with passing an input frame to the simple model, saving computational cost and time. We described this work in more detail in an earlier Amazon Science blog post.
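The routing idea can be illustrated with a toy dispatcher. In the real system the decision is made by the model itself; the sketch below substitutes a hypothetical energy threshold for that learned decision, and the two encoders are placeholders that only report which branch handled the frame.

```python
def frame_energy(frame):
    """Mean squared amplitude of a frame of audio samples."""
    return sum(s * s for s in frame) / len(frame)


def encode_stream(frames, simple_encoder, complex_encoder, energy_threshold=0.1):
    """Route each frame to the cheap or the expensive encoder.

    The energy-threshold rule is a hypothetical stand-in for the learned
    decision described in the text.
    """
    outputs = []
    complex_calls = 0
    for frame in frames:
        if frame_energy(frame) >= energy_threshold:
            outputs.append(complex_encoder(frame))
            complex_calls += 1
        else:
            outputs.append(simple_encoder(frame))
    return outputs, complex_calls


def simple_encoder(frame):
    """Placeholder for the small network."""
    return ("simple", len(frame))


def complex_encoder(frame):
    """Placeholder for the large network."""
    return ("complex", len(frame))


frames = [[0.0, 0.01, -0.02], [0.5, -0.6, 0.4], [0.02, 0.0, 0.01]]  # invented samples
outputs, complex_calls = encode_stream(frames, simple_encoder, complex_encoder)
```

Only the second frame is sent to the expensive branch, so two-thirds of the encoder calls in this toy stream take the cheap path.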

Hardware-software codesign

Quantization and sparsification make no difference to performance if the underlying hardware can’t take advantage of them. Another key to getting ASR to run on-device was the design of Amazon’s AZ family of neural edge processors, which are optimized for our specific approach to compression.

For one thing, where a typical processor might represent data using 16 or 32 bits, for certain core operations, the AZ processors accelerate computation by using an 8-bit or even lower-bit representation, because that’s all we need to handle quantized values.

The weights of a neural network are typically represented using a matrix, a big grid of numbers. Stored in the usual dense format, a matrix half of whose values are zeroes takes up as much space as a matrix that’s all nonzero.

On computer chips, transferring data tends to be much more time-consuming than executing computations. So when we load our matrix into memory, we use a compression scheme that takes advantage of low-bit quantization and zero values. The circuitry for decoding the compressed representation is built into the chip.

In the neural processor’s memory, the matrix is reconstituted: the zeroes are filled back in. But the processor’s circuitry is designed to recognize zero values and discard computations involving them. So the time savings from sparsification are realized in the hardware itself.
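In software terms, the three steps described above (compress for transfer, reconstitute in memory, skip zeroes during computation) can be sketched as follows. The storage format here, a list of position-value pairs, is an assumption chosen for readability and says nothing about the actual on-chip scheme; the integer values stand in for quantized weights.

```python
def compress(row):
    """Keep only the nonzero values, each with its position, plus the row length."""
    return [(i, v) for i, v in enumerate(row) if v != 0], len(row)


def decompress(packed):
    """Rebuild the dense row, filling the zeroes back in."""
    entries, length = packed
    row = [0] * length
    for i, v in entries:
        row[i] = v
    return row


def dot_skipping_zeros(weights, activations):
    """Dot product that performs no multiplication for zero weights."""
    total = 0
    multiplies = 0
    for w, a in zip(weights, activations):
        if w == 0:
            continue
        total += w * a
        multiplies += 1
    return total, multiplies


row = [3, 0, 0, -2, 0, 5, 0, 0]  # invented quantized weights
packed = compress(row)
restored = decompress(packed)
result, multiplies = dot_skipping_zeros(restored, [1, 2, 3, 4, 5, 6, 7, 8])
```

Only three of the eight entries are transferred, and only three multiplications are carried out, which mirrors where the savings from sparsification come from.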

Moving speech recognition on-device entails a number of innovations in other areas, such as reduction in the bandwidth required for model updates and compression of NLU models, to ensure basic functionality on devices with intermittent Internet connectivity. And we’re also hard at work on multilingual on-device ASR models for dynamic language switching, or automatically recognizing which of two languages a customer is speaking and responding in kind.

The launch of on-device speech processing is a huge step in bringing the benefits of “processing on the edge” to our customers, and we will continue to invent on their behalf in this area.

Research areas

Related content

IN, HR, Gurugram
Our customers have immense faith in our ability to deliver packages timely and as expected. A well planned network seamlessly scales to handle millions of package movements a day. It has monitoring mechanisms that detect failures before they even happen (such as predicting network congestion, operations breakdown), and perform proactive corrective actions. When failures do happen, it has inbuilt redundancies to mitigate impact (such as determine other routes or service providers that can handle the extra load), and avoids relying on single points of failure (service provider, node, or arc). Finally, it is cost optimal, so that customers can be passed the benefit from an efficiently set up network. Amazon Shipping is hiring Applied Scientists to help improve our ability to plan and execute package movements. As an Applied Scientist in Amazon Shipping, you will work on multiple challenging machine learning problems spread across a wide spectrum of business problems. You will build ML models to help our transportation cost auditing platforms effectively audit off-manifest (discrepancies between planned and actual shipping cost). You will build models to improve the quality of financial and planning data by accurately predicting ship cost at a package level. Your models will help forecast the packages required to be pick from shipper warehouses to reduce First Mile shipping cost. Using signals from within the transportation network (such as network load, and velocity of movements derived from package scan events) and outside (such as weather signals), you will build models that predict delivery delay for every package. These models will help improve buyer experience by triggering early corrective actions, and generating proactive customer notifications. Your role will require you to demonstrate Think Big and Invent and Simplify, by refining and translating Transportation domain-related business problems into one or more Machine Learning problems. 
You will use techniques from a wide array of machine learning paradigms, such as supervised, unsupervised, semi-supervised and reinforcement learning. Your model choices will include, but not be limited to, linear/logistic models, tree based models, deep learning models, ensemble models, and Q-learning models. You will use techniques such as LIME and SHAP to make your models interpretable for your customers. You will employ a family of reusable modelling solutions to ensure that your ML solution scales across multiple regions (such as North America, Europe, Asia) and package movement types (such as small parcel movements and truck movements). You will partner with Applied Scientists and Research Scientists from other teams in US and India working on related business domains. Your models are expected to be of production quality, and will be directly used in production services. You will work as part of a diverse data science and engineering team comprising of other Applied Scientists, Software Development Engineers and Business Intelligence Engineers. You will participate in the Amazon ML community by authoring scientific papers and submitting them to Machine Learning conferences. You will mentor Applied Scientists and Software Development Engineers having a strong interest in ML. You will also be called upon to provide ML consultation outside your team for other problem statements. If you are excited by this charter, come join us!
US, NJ, Newark
At Audible, we believe stories have the power to transform lives. It’s why we work with some of the world’s leading creators to produce and share audio storytelling with our millions of global listeners. We are dreamers and inventors who come from a wide range of backgrounds and experiences to empower and inspire each other. Imagine your future with us. ABOUT THIS ROLE We are seeking a Data Scientist to own our causal inference infrastructure and drive sophisticated modeling that measures the incremental impact of business decisions. This role requires deep expertise in advanced causal inference methodologies—including synthetic control methods, Synthetic Difference-in-Differences (SDID), and Bayesian approaches—to design rigorous experiments, estimate long-term customer behavior effects, and translate complex analytical results into clear business recommendations. You will own the development and continuous improvement of these causal inference models while being responsible for machine learning operations at scale to ensure our organization makes data-driven decisions with confidence. At Audible, you will have an opportunity to make the best of your skillsets to both develop advanced scientific solutions and drive critical customer and business impact. You will play a key role to drive end-to-end solutions from understanding our business and business requirements, identifying opportunities from a large amount of historical data and engaging in research to solve the business problems. You'll seek to create value for both stakeholders and customers and inform findings in a clear, actionable way to managers and senior leaders. You will be at the heart of an agile and growing area at Audible. ABOUT THE TEAM Audible Data Scientists are members of a global interdisciplinary insights and research team with an integral role in the design and integration of models to automate decision making throughout the business in every country. 
We empower the machine learning and deep learning techniques in many areas of the business. We translate business goals into agile, insightful analytics and seek to create value for both stakeholders and customers and convey findings in a clear, actionable way to managers and senior leaders. As a Data Scientist, you will... - Design and execute geo-level randomized experiments to measure incremental impact - Apply statistical techniques to evaluate causal impact in quasi-experimental settings - Ensure experiments are statistically valid by evaluating sampling strategies, statistical power, and potential sources of bias - Develop models that estimate long-term effects from short-term experiments using machine learning - Estimate how changes in customer behavior persist and decay over time - Own and maintain the geo-testing codebase, including deployment and scalability - Implement machine learning models at scale with focus on performance optimization - Partner with stakeholders to ensure models align with real business dynamics - Engage deeply with business problems through curiosity-driven questioning and brainstorming - Translate experimental results into financial impact and investment recommendations - Analyze marginal and average revenue impacts relative to costs - Communicate complex quantitative ideas clearly to non-technical stakeholders - Demonstrate understanding of Audible's business model and customer experience ABOUT AUDIBLE Audible is the leading producer and provider of audio storytelling. We spark listeners’ imaginations, offering immersive, cinematic experiences full of inspiration and insight to enrich our customers daily lives. We are a global company with an entrepreneurial spirit. We are dreamers and inventors who are passionate about the positive impact Audible can make for our customers and our neighbors. 
This spirit courses throughout Audible, supporting a culture of creativity and inclusion built on our People Principles and our mission to build more equitable communities in the cities we call home.
US, WA, Bellevue
Do you enjoy solving challenging problems and driving innovations in research? Are you seeking for an environment with a group of motivated and talented scientists like yourself? Do you want to create scalable optimization models and apply machine learning techniques to guide real-world decisions? Do you want to play a key role in the future of Amazon transportation and operations? Come and join us at Amazon's Modeling and Optimization team (MOP). Key job responsibilities A Research Scientist in the Modeling and Optimization (MOP) team - provides analytical decision support to Amazon planning teams via applying advanced mathematical and statistical techniques. - collaborates effectively with Amazon internal business customers, and is their trusted partner - is proactive and autonomous in discovering and resolving business pain-points within a given scope - is able to identify a suitable level of sophistication in resolving the different business needs - is confident in leveraging existing solutions to new problems where appropriate and is independent in designing and implementing new solutions where needed - is aware of the limitations of their proposed solutions and is proactive in communicating them to the business, and advances the application of sciences towards Amazon business problems by bringing new methods, ideas, and practices to the team and scientific community. A day in the life - Your will be developing model-based optimization, simulation, and/or predictive tools to identify and evaluate opportunities to improve customer experience, network speed, cost, and efficiency of capital investment. - You will quantify the improvements resulting from the application of these tools and you will evaluate the trade-offs between potentially competing objectives. 
- You will develop good communication skills and ability to speak at a level appropriate for the audience, will collaborate effectively with fellow scientists, software development engineers, and product managers, and will deliver business value in a close partnership with many stakeholders from operations, finance, IT, and business leadership. About the team - At the Modeling and Optimization (MOP) team, we use mathematical optimization, algorithm design, statistics, and machine learning to improve decision-making capabilities across WW Operations and Amazon Logistics. - We focus on transportation topology, labor and resource planning for fulfillment facilities, routing science, visualization research, data science and development, and process optimization. - We create models to simulate, optimize, and control the fulfillment network with the objective of reducing cost while improving speed and reliability. - We support multiple business lanes, therefore maintain a comprehensive and objective view, coordinating solutions across organizational lines where possible.
US, WA, Bellevue
What does it take to build a foundation model that can forecast demand for hundreds of millions of products — including ones that have never been sold before? At Amazon, our Demand Forecasting team is tackling one of the most ambitious challenges in applied time series research: designing and building large-scale foundation models that generalize across an enormous and diverse catalog of products, geographies, and business contexts. This is not incremental modeling work. We are redefining what's possible in demand forecasting through novel architectures, training strategies, and data generation techniques. Our team operates at a scale that is unmatched in industry or academia. You'll design experiments across millions of products simultaneously, developing new model architectures and training methodologies that push the boundaries of what foundation models can learn from vast, heterogeneous time series data. You'll explore techniques in transfer learning, zero-shot forecasting, and synthetic data generation. The models you design here will ship to production and directly influence hundreds of millions of dollars in automated inventory decisions every week. Beyond operational impact, you'll publish your work at top-tier conferences and contribute to advancing the state of the art in time series foundation models for the broader scientific community. If you are a scientist who wants to work at the frontier of time series research, design novel solutions to problems no one else has solved at this scale, and see your research deployed to real-world impact — this is the team for you. Key job responsibilities 1. Design and implement novel deep learning architectures (e.g., Transformers, SSMs, or Graph Neural Networks) for time-series foundation models that generalize across hundreds of millions of products and diverse global contexts. 2. Drive the full development cycle - from whiteboarding new algorithmic approaches to overseeing production-scale deployments. 3. 
Collaborate with SDEs to build high-performance, distributed training and inference pipelines; translate complex scientific concepts into scalable, production-grade code in Python and Scala. 4. Leverage and develop agentic GenAI workflows to automate the end-to-end research cycle from synthesizing state-of-the-art literature and auto-generating experimental code to rapidly iterating on model architectures across millions of products. 5. Maintain a high bar for scientific excellence by publishing novel research in top-tier venues (e.g., NeurIPS, ICLR, KDD) and contributing to Amazon’s internal patent and science community. A day in the life No two days look the same, but most will involve a high-velocity blend of deep architectural work, distributed system design, and frontier scientific thinking at a scale you won’t find anywhere else. You might start the morning by designing a synthetic data pipeline to stress-test your foundation model. You’ll use generative techniques to simulate rare "black swan" supply chain events, ensuring your model remains robust where historical data is thin. You'll then lead a Scientific Design Review, walking senior leaders through your model’s architecture, defending your choice of loss functions with data-driven rigor. You’ll write high-performance code often paired with AI-coding assistants to handle the heavy lifting of boilerplate and unit testing. You’ll collaborate across a "Two-Pizza Team" of scientists and engineers, pushing the boundaries of research with a clear goal: contributing to work that will be published at top-tier venues (ICLR, NeurIPS) while simultaneously driving multi-million dollar automated decisions. The work is hard, the math is complex, and the tools are state-of-the-art. If you want to build the models that actually ship—this is where you do it. 
About the team The Demand Forecasting team sits at the heart of Amazon's supply chain, building the science that determines what products are available, when, and at what cost — for hundreds of millions of customers around the world. Our mission is to push the frontier of what's possible in large-scale time series forecasting, and to deploy that science where it creates real, measurable impact. We are a team of scientists who care deeply about both research rigor and real-world outcomes. We don't just publish — we ship. And we don't just ship — we measure, iterate, and raise the bar. Our work spans the full lifecycle: from foundational research and large-scale experimentation to production deployment and downstream impact measurement across supply chain, inventory, and financial planning.
US, CA, San Francisco
Amazon has launched a new research lab in San Francisco to develop foundational capabilities for useful AI agents. We’re enabling practical AI to make our customers more productive, empowered, and fulfilled. Our work leverages large vision language models (VLMs) with reinforcement learning (RL) and world modeling to solve perception, reasoning, and planning to build useful enterprise agents. Our lab is a small, talent-dense team with the resources and scale of Amazon. Each team in the lab has the autonomy to move fast and the long-term commitment to pursue high-risk, high-payoff research. We’re entering an exciting new era where agents can redefine what AI makes possible. Key job responsibilities You will contribute directly to AI agent development in an applied research role to improve the multi-model perception and visual-reasoning abilities of our agent. Daily responsibilities including model training, dataset design, and pre- and post-training optimization. You will be hired as a Member of Technical Staff.
US, WA, Seattle
WW Amazon Stores Finance Science (ASFS) works to leverage science and economics to drive improved financial results, foster data backed decisions, and embed science within Finance. ASFS is focused on developing products that empower controllership, improve business decisions and financial planning by understanding financial drivers, and innovate science capabilities for efficiency and scale. We are looking for a data scientist to lead high visibility initiatives for forecasting Amazon Stores' financials. You will develop new science-based forecasting methodologies and build scalable models to improve financial decision making and planning for senior leadership up to VP and SVP level. You will build new ML and statistical models from the ground up that aim to transform financial planning for Amazon Stores. We prize creative problem solvers with the ability to draw on an expansive methodological toolkit to transform financial decision-making with science. The ideal candidate combines data-science acumen with strong business judgment. You have versatile modeling skills and are comfortable owning and extracting insights from data. You are excited to learn from and alongside seasoned scientists, engineers, and business leaders. You are an excellent communicator and effectively translate technical findings into business action. Key job responsibilities Demonstrating thorough technical knowledge, effective exploratory data analysis, and model building using industry standard ML models Working with technical and non-technical stakeholders across every step of science project life cycle Collaborating with finance, product, data engineering, and software engineering teams to create production implementations for large-scale ML models Innovating by adapting new modeling techniques and procedures Presenting research results to our internal research community