Alexa’s speech recognition research at ICASSP 2022

Multimodal training, signal-to-interpretation, and BERT rescoring are just a few topics covered by Amazon’s 21 speech-related papers.

This week, the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) got under way in virtual form, to be followed by an in-person meeting two weeks later (May 22-27) in Singapore. ICASSP is the flagship conference of the IEEE Signal Processing Society and, as such, one of the premier venues for publishing the latest advances in automatic speech recognition (ASR) and other speech-processing and speech-related fields, with strong participation from both industry and academia.

More ICASSP coverage on Amazon Science

This year, the Alexa AI ASR organization is represented by 21 papers, more than in any prior year, reflecting the growth of speech-related science in Alexa AI. Here we highlight a few of these papers, to give an idea of their breadth.

Multimodal pretraining for end-to-end ASR

Deep-learning methods have taken over as the method of choice in speech-based recognition and classification tasks, and increasingly, self-supervised representation learning is used to pretrain models on large unlabeled datasets, followed by “fine-tuning” on task-labeled data.

In their paper “Multi-modal Pretraining for Automated Speech Recognition”, David Chan and colleagues give a new twist to this approach by pretraining speech representations on audiovisual data. As the self-supervision task for both modalities, they adapt the masked language model, in which words of training sentences are randomly masked out, and the model learns to predict them. In their case, however, the masks are applied to features extracted from the video and audio stream.

Multimodal MLM.png
In "Multi-modal pre-training for automated speech recognition", Amazon researchers adapt the masked language model, which learns to predict masked-out words of training sentences, to features extracted from video and audio streams.

Once pretrained, the audio-only portion of the learned representation is fused with a more standard front-end representation to feed into an end-to-end speech recognition system. The researchers show that this approach yields more accurate ASR results than pretraining with only audio-based self-supervision, suggesting that the correlations between acoustic and visual signals are helpful in extracting higher-level structures relevant to the encoding of speech.

Signal-to-interpretation with multimodal embeddings

The advantages of multimodality are not limited to unsupervised-learning settings. In “Tie your embeddings down: Cross-modal latent spaces for end-to-end spoken language understanding”, Bhuvan Agrawal and coauthors study signal-to-interpretation (S2I) recognizers that map a sequential acoustic input to an embedding, from which the intent of an utterance is directly inferred.

Cross-modal SLU.png
In "Tie your embeddings down: Cross-modal latent spaces for end-to-end spoken language understanding", Amazon researchers train encoders to generate acoustic and text embeddings in the same representational space, so that the origin of the embeddings becomes indistinguishable.

This bypasses the need for explicit speech transcription but still uses supervision for utterance intents. Due to their compactness, S2I models are attractive for on-device deployment, which has multiple benefits. For example, Alexa AI has used on-device speech processing to make Alexa faster and lower-bandwidth.

Agrawal and colleagues show that S2I recognizers give better results when their acoustic embeddings are constrained to be close to embeddings of the corresponding textual input produced by a pretrained language model (BERT). As in the earlier paper, this cross-modal signal is used during learning only and not required for inference (i.e., at runtime). It is a clever way to sneak linguistic structure back into the S2I system while also infusing it with knowledge gleaned from the vastly larger language model training data.

TinyS2I.png
The TinyS2I architecture. From "TINYS2I: A small-footprint utterance classification model with contextual support for on-device SLU".

The idea of matching embeddings derived from audio to those for corresponding text strings (i.e., transcripts) also has other applications. In their paper “TinyS2I: A small-footprint utterance classification model with contextual support for on-device SLU”, Anastasios Alexandridis et al. show that extremely compact, low-latency speech-understanding models can be obtained for the utterances most frequently used to control certain applications, such as media playback.

The most frequent control commands (“pause”, “volume up”, and the like) can be classified directly from an acoustic embedding. For commands involving an item from a contextual menu (“play [title]”), the acoustic embedding is matched to the media title’s textual embedding. In this paper, unlike the previous one, the textual embeddings are trained jointly with the acoustic ones. But the same triplet loss function can be used to align the cross-modal embeddings in a shared space.

ASR rescoring with BERT

Deep encoders of text trained using the masked-language-model (MLM) paradigm, such as BERT, have been widely used as the basis for all sorts of natural-language tasks. As mentioned earlier, they can incorporate vast amounts of language data through self-supervised pretraining, followed by task-specific supervised fine-tuning.

Related content
Second-pass language models that rescore automatic-speech-recognition hypotheses benefit from multitask training on natural-language-understanding objectives.

So far, however, the practical impact of MLMs on ASR proper has been limited, in part because of unsatisfactory tradeoffs between computational overhead (latency) and achievable accuracy gains. This is now changing with the work of Liyan Xu et al., as described in “RescoreBERT: Discriminative speech recognition rescoring with BERT”.

The researchers show how BERT-generated sentence encodings can be incorporated into a model that rescores the text strings output by an ASR model. Because BERT is trained on large corpora of (text-only) public data, it understands the relative probabilities of different ASR hypotheses better than the ASR model can.

The researchers achieved their best results with a combined loss function that is based on both sentence pseudo-likelihood — a more computationally tractable estimate of sentence likelihood — and word error prediction. The resulting rescoring model is so effective compared to standard LSTM (long short-term memory) language models, while also exhibiting lower latency, that the RescoreBERT method has gone from internship project to Alexa production in less than a year.

Ontological biasing for acoustic-event detection

We round out this short selection of papers with one from an ASR-adjacent field. In “Improved representation learning for acoustic event classification using tree-structured ontology”, Arman Zharmagambetov and coauthors look at an alternative to self-supervised training for the task of acoustic-event detection (AED). (AED is the technology behind Alexa’s ability to detect breaking glass, smoke alarms, and other noteworthy events around the house.)

They show that AED classifier training can be enhanced by forcing the resulting representations to identify not only the target event label (such as “dog barking”) but also supercategories (such as “domestic animal” and “animal sound”) drawn from an ontology, a hierarchical representation of relationships between concepts. The method can be further enhanced by forcing the classification to stay the same under distortions of the inputs. The researchers found that their method is more effective than purely self-supervised pretraining and comes close to fully supervised training with only a fraction of the labeled data.

AED architecture.png
In "Improved representation learning for acoustic event classification using tree-structured ontology", Amazon researchers present a two-module joint model consisting of a representation neural network and a decision tree based on a predefined tree-structured ontology.

Conclusion and outlook

As we have seen, Alexa relies on a range of audio-based technologies that use deep-learning architectures. The need to train these models robustly, fairly, and with limited supervision, as well as computational constraints at runtime, continues to drive research in Alexa Science. We have highlighted some of the results from that work as they are about to be presented to the wider science community, and we are excited to see the field as a whole come up with creative solutions and push toward ever more capable applications of speech-based AI.

Research areas

Related content

IN, TS, Hyderabad
Are you fascinated by the power of Natural Language Processing (NLP) and Large Language Models (LLM) to transform the way we interact with technology? Are you passionate about applying advanced machine learning techniques to solve complex challenges in the e-commerce space? If so, Amazon's International Seller Services team has an exciting opportunity for you as an Applied Scientist. At Amazon, we strive to be Earth's most customer-centric company, where customers can find and discover anything they want to buy online. Our International Seller Services team plays a pivotal role in expanding the reach of our marketplace to sellers worldwide, ensuring customers have access to a vast selection of products. As an Applied Scientist, you will join a talented and collaborative team that is dedicated to driving innovation and delivering exceptional experiences for our customers and sellers. You will be part of a global team that is focused on acquiring new merchants from around the world to sell on Amazon’s global marketplaces around the world. Join us at the Central Science Team of Amazon's International Seller Services and become part of a global team that is redefining the future of e-commerce. With access to vast amounts of data, technology, and a diverse community of talented individuals, you will have the opportunity to make a meaningful impact on the way sellers engage with our platform and customers worldwide. Together, we will drive innovation, solve complex problems, and shape the future of e-commerce. Please visit https://www.amazon.science for more information Key job responsibilities - Apply your expertise in LLM models to design, develop, and implement scalable machine learning solutions that address complex language-related challenges in the international seller services domain. - Collaborate with cross-functional teams, including software engineers, data scientists, and product managers, to define project requirements, establish success metrics, and deliver high-quality solutions. - Conduct thorough data analysis to gain insights, identify patterns, and drive actionable recommendations that enhance seller performance and customer experiences across various international marketplaces. - Continuously explore and evaluate state-of-the-art NLP techniques and methodologies to improve the accuracy and efficiency of language-related systems. - Communicate complex technical concepts effectively to both technical and non-technical stakeholders, providing clear explanations and guidance on proposed solutions and their potential impact. - Mentor and guide team of Applied Scientists from technical and project advancement stand point - Contribute research to science community and conference quality level papers.
IN, TS, Hyderabad
Have you ever wondered how Amazon launches and maintains a consistent customer experience across hundreds of countries and languages it serves its customers? Are you passionate about data and mathematics, and hope to impact the experience of millions of customers? Are you obsessed with designing simple algorithmic solutions to very challenging problems? If so, we look forward to hearing from you! At Amazon, we strive to be Earth's most customer-centric company, where both internal and external customers can find and discover anything they want in their own language of preference. Our Translations Services (TS) team plays a pivotal role in expanding the reach of our marketplace worldwide and enables thousands of developers and other stakeholders (Product Managers, Program Managers, Linguists) in developing locale specific solutions. Amazon Translations Services (TS) is seeking an Applied Scientist to be based in our Hyderabad office. As a key member of the Science and Engineering team of TS, this person will be responsible for designing algorithmic solutions based on data and mathematics for translating billions of words annually across 130+ and expanding set of locales. The successful applicant will ensure that there is minimal human touch involved in any language translation and accurate translated text is available to our worldwide customers in a streamlined and optimized manner. With access to vast amounts of data, technology, and a diverse community of talented individuals, you will have the opportunity to make a meaningful impact on the way customers and stakeholders engage with Amazon and our platform worldwide. Together, we will drive innovation, solve complex problems, and shape the future of e-commerce. Key job responsibilities * Apply your expertise in LLM models to design, develop, and implement scalable machine learning solutions that address complex language translation-related challenges in the eCommerce space. * Collaborate with cross-functional teams, including software engineers, data scientists, and product managers, to define project requirements, establish success metrics, and deliver high-quality solutions. * Conduct thorough data analysis to gain insights, identify patterns, and drive actionable recommendations that enhance seller performance and customer experiences across various international marketplaces. * Continuously explore and evaluate state-of-the-art modeling techniques and methodologies to improve the accuracy and efficiency of language translation-related systems. * Communicate complex technical concepts effectively to both technical and non-technical stakeholders, providing clear explanations and guidance on proposed solutions and their potential impact. About the team We are a start-up mindset team. As the long-term technical strategy is still taking shape, there is a lot of opportunity for this fresh Science team to innovate by leveraging Gen AI technoligies to build scalable solutions from scratch. Our Vision: Language will not stand in the way of anyone on earth using Amazon products and services. Our Mission: We are the enablers and guardians of translation for Amazon's customers. We do this by offering hands-off-the-wheel service to all Amazon teams, optimizing translation quality and speed at the lowest cost possible.
US, VA, Arlington
Are you fascinated by the power of Large Language Models (LLM) and Artificial Intelligence (AI) to transform the way we learn and interact with technology? Are you passionate about applying advanced machine learning (ML) techniques to solve complex challenges in the cloud learning space? If so, AWS Training & Certification (T&C) team has an exciting opportunity for you as an Applied Scientist. At AWS T&C, we strive to be leaders in not only how we learn about the latest AI/ML development and AWS services, but also how the same technologies transform the way we learn about them. As an Applied Scientist, you will join a talented and collaborative team that is dedicated to driving innovation and delivering exceptional experiences in our Skill Builder platform for both new learners and seasoned developers. You will be a part of a global team that is focused on transforming how people learn. The position will interact with global leaders and teams across the globe as well as different business and technical organizations. Join us at the AWS T&C Science Team and become a part of a global team that is redefining the future of cloud learning. With access to vast amounts of data, exciting new technology, and a diverse community of talented individuals, you will have the opportunity to make a meaningful impact on the ways how worldwide learners engage with our learning system and builders develop on our platform. Together, we will drive innovation, solve complex problems, and shape the future of future-generation cloud builders. Please visit https://skillbuilder.awsto learn more. Key job responsibilities - Apply your expertise in LLM to design, develop, and implement scalable machine learning solutions that address challenges in discovery and engagement for our international audiences. - Collaborate with cross-functional teams, including software engineers, data engineers, scientists, and product managers, to define project requirements, establish success metrics, and deliver high-quality solutions. - Conduct thorough data analysis to gain insights, identify patterns, and drive actionable recommendations that enhance operational performance and customer experiences across Skill Builder. - Continuously explore and evaluate state-of-the-art techniques and methodologies to improve the accuracy and efficiency of AI/ML systems. - Communicate complex technical concepts effectively to both technical and non-technical stakeholders, providing clear explanations and guidance on proposed solutions and their potential impact. About the team Why AWS? Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating — that’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses. Inclusive Team Culture Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empower us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon conferences, inspire us to never stop embracing our uniqueness. Diverse Experiences AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying. Mentorship & Career Growth We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional. Work/Life Balance We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.
US, MA, N.reading
Amazon Industrial Robotics is seeking exceptional talent to help develop the next generation of advanced robotics systems that will transform automation at Amazon's scale. We're building revolutionary robotic systems that combine cutting-edge AI, sophisticated control systems, and advanced mechanical design to create adaptable automation solutions capable of working safely alongside humans in dynamic environments. This is a unique opportunity to shape the future of robotics and automation at an unprecedented scale, working with world-class teams pushing the boundaries of what's possible in robotic dexterous manipulation, locomotion, and human-robot interaction. This role presents an opportunity to shape the future of robotics through innovative applications of deep learning and large language models. At Amazon Industrial Robotics we leverage advanced robotics, machine learning, and artificial intelligence to solve complex operational challenges at an unprecedented scale. Our fleet of robots operates across hundreds of facilities worldwide, working in sophisticated coordination to fulfill our mission of customer excellence. We are pioneering the development of robotics dexterous hands that: - Enable unprecedented generalization across diverse tasks - Are compliant and durable - Can span tasks from power grasps to fine dexterity and nonprehensile manipulation - Can navigate the uncertainty of the environment - Leverage mechanical intelligence, multi-modal sensor feedback and advanced control techniques. The ideal candidate will contribute to research that bridges the gap between theoretical advancement and practical implementation in robotics. You will be part of a team that's revolutionizing how robots learn, adapt, and interact with their environment. Join us in building the next generation of intelligent robotics systems that will transform the future of automation and human-robot collaboration. Key job responsibilities - Design and implement robust sensing for dexterous manipulation, including but not limited to: Tactile sensing, Position sensing, Force sensing, Non-contact sensing - Prototype the various identified sensing strategies, considering the constraints of the rest of the hand design - Build and test full hand sensing prototypes to validate the performance of the solution - Develop testing and validation strategies, supporting fast integration into the rest of the robot - Partner with cross-functional teams to iterate on concepts and prototypes - Work with Amazon's robotics engineering and operations customers to deeply understand their requirements and develop tailored solutions - Document the designs, performance, and validation of the final system
US, CA, San Francisco
Join the next revolution in robotics at Amazon's Frontier AI & Robotics team. As a Senior Applied Scientist, you'll spearhead the development of breakthrough foundation models and full-stack robotics systems that enable robots to perceive, understand, and interact with the world in unprecedented ways. You'll drive technical excellence in areas such as locomotion, manipulation, sim2real transfer, multi-modal and multi-task robot learning, designing novel frameworks that bridge the gap between research and real-world deployment at Amazon scale. In this role, you'll combine hands-on technical work with scientific leadership, ensuring your team delivers robust solutions for dynamic real-world environments. You'll leverage Amazon's vast computational resources to tackle ambitious problems in areas like very large multi-modal robotic foundation models and efficient, promptable model architectures that can scale across diverse robotic applications. Key job responsibilities - Lead technical initiatives across the robotics stack, driving breakthrough approaches through hands-on research and development in areas including robot co-design, dexterous manipulation mechanisms, innovative actuation strategies, state estimation, low-level control, system identification, reinforcement learning, and sim-to-real transfer, as well as foundation models for perception and manipulation - Guide technical direction for full-stack robotics projects from conceptualization through deployment, taking a system-level approach that integrates hardware considerations with algorithmic development - Develop and optimize control algorithms and sensing pipelines that enable robust performance in production environments - Mentor fellow scientists while maintaining strong individual technical contributions - Collaborate with platform and hardware teams to ensure seamless integration across the entire robotics stack - Influence technical decisions and implementation strategies within your area of focus A day in the life - Design and implement innovative systems and algorithms, working hands-on with our extensive infrastructure to prototype and evaluate at scale - Guide fellow scientists in solving complex technical challenges across the full robotics stack - Lead focused technical initiatives from conception through deployment, ensuring successful integration with production systems - Drive technical discussions within your team and with key stakeholders - Conduct experiments and prototype new ideas using our massive compute cluster and extensive robotics infrastructure - Mentor team members while maintaining significant hands-on contribution to technical solutions About the team At Frontier AI & Robotics, we're not just advancing robotics – we're reimagining it from the ground up. Our team is building the future of intelligent robotics through innovative foundation models and end-to-end learned systems. We tackle some of the most challenging problems in AI and robotics, from developing sophisticated perception systems to creating adaptive manipulation strategies that work in complex, real-world scenarios. What sets us apart is our unique combination of ambitious research vision and practical impact. We leverage Amazon's massive computational infrastructure and rich real-world datasets to train and deploy state-of-the-art foundation models. Our work spans the full spectrum of robotics intelligence – from multimodal perception using images, videos, and sensor data, to sophisticated manipulation strategies that can handle diverse real-world scenarios. We're building systems that don't just work in the lab, but scale to meet the demands of Amazon's global operations. Join us if you're excited about pushing the boundaries of what's possible in robotics, working with world-class researchers, and seeing your innovations deployed at unprecedented scale.
US, CA, San Francisco
Join the next revolution in robotics at Amazon's Frontier AI & Robotics team, where you'll work alongside world-renowned AI pioneers to push the boundaries of what's possible in robotic intelligence. As an Applied Scientist, you'll be at the forefront of developing breakthrough foundation models that enable robots to perceive, understand, and interact with the world in unprecedented ways. You'll drive independent research initiatives in areas such as locomotion, manipulation, sim2real transfer, multi-modal and multi-task robot learning, designing novel frameworks that bridge the gap between state-of-the-art research and real-world deployment at Amazon scale. In this role, you'll balance innovative technical exploration with practical implementation, collaborating with platform teams to ensure your models and algorithms perform robustly in dynamic real-world environments. You'll have access to Amazon's vast computational resources, enabling you to tackle ambitious problems in areas like very large multi-modal robotic foundation models and efficient, promptable model architectures that can scale across diverse robotic applications. Key job responsibilities - Drive independent research initiatives across the robotics stack, including robot co-design, dexterous manipulation mechanisms, innovative actuation strategies, state estimation, low-level control, system identification, reinforcement learning, and sim-to-real transfer, as well as foundation models for perception and manipulation - Lead full-stack robotics projects from conceptualization through deployment, taking a system-level approach that integrates hardware considerations with algorithmic development - Develop and optimize control algorithms and sensing pipelines that enable robust performance in production environments - Collaborate with platform and hardware teams to ensure seamless integration across the entire robotics stack - Contribute to the team's technical strategy and help shape our approach to next-generation robotics challenges A day in the life - Design and implement innovative systems and algorithms, leveraging our extensive infrastructure to prototype and evaluate at scale - Collaborate with our world-class research team to solve complex technical challenges - Lead technical initiatives from conception to deployment, working closely with robotics engineers to integrate your solutions into production systems - Participate in technical discussions and brainstorming sessions with team leaders and fellow scientists - Leverage our massive compute cluster and extensive robotics infrastructure to rapidly prototype and validate new ideas - Transform theoretical insights into practical solutions that can handle the complexities of real-world robotics applications About the team At Frontier AI & Robotics, we're not just advancing robotics – we're reimagining it from the ground up. Our team is building the future of intelligent robotics through ground breaking foundation models and end-to-end learned systems. We tackle some of the most challenging problems in AI and robotics, from developing sophisticated perception systems to creating adaptive manipulation strategies that work in complex, real-world scenarios. What sets us apart is our unique combination of ambitious research vision and practical impact. We leverage Amazon's massive computational infrastructure and rich real-world datasets to train and deploy state-of-the-art foundation models. Our work spans the full spectrum of robotics intelligence – from multimodal perception using images, videos, and sensor data, to sophisticated manipulation strategies that can handle diverse real-world scenarios. We're building systems that don't just work in the lab, but scale to meet the demands of Amazon's global operations. Join us if you're excited about pushing the boundaries of what's possible in robotics, working with world-class researchers, and seeing your innovations deployed at unprecedented scale.
IN, KA, Bengaluru
Do you want to join an innovative team of scientists who use machine learning and statistical techniques to create state-of-the-art solutions for providing better value to Amazon’s customers? Do you want to build and deploy advanced algorithmic systems that help optimize millions of transactions every day? Are you excited by the prospect of analyzing and modeling terabytes of data to solve real world problems? Do you like to own end-to-end business problems/metrics and directly impact the profitability of the company? Do you like to innovate and simplify? If yes, then you may be a great fit to join the Machine Learning and Data Sciences team for India Consumer Businesses. If you have an entrepreneurial spirit, know how to deliver, love to work with data, are deeply technical, highly innovative and long for the opportunity to build solutions to challenging problems that directly impact the company's bottom-line, we want to talk to you. Major responsibilities - Use machine learning and analytical techniques to create scalable solutions for business problems - Analyze and extract relevant information from large amounts of Amazon’s historical business data to help automate and optimize key processes - Design, development, evaluate and deploy innovative and highly scalable models for predictive learning - Research and implement novel machine learning and statistical approaches - Work closely with software engineering teams to drive real-time model implementations and new feature creations - Work closely with business owners and operations staff to optimize various business operations - Establish scalable, efficient, automated processes for large scale data analyses, model development, model validation and model implementation - Mentor other scientists and engineers in the use of ML techniques A day in the life You will solve real-world problems by getting and analyzing large amounts of data, generate insights and opportunities, design simulations and experiments, and develop statistical and ML models. The team is driven by business needs, which requires collaboration with other Scientists, Engineers, and Product Managers across the International Emerging Stores organization. You will prepare written and verbal presentations to share insights to audiences of varying levels of technical sophistication. About the team Central Machine Learning team works closely with the IES business and engineering teams in building ML solutions that create an impact for Emerging Marketplaces. This is a great opportunity to leverage your machine learning and data mining skills to create a direct impact on millions of consumers and end users.
US, WA, Seattle
Amazon Advertising is one of Amazon's fastest growing and most profitable businesses. Amazon's advertising portfolio helps merchants, retail vendors, and brand owners succeed via native advertising, which grows incremental sales of their products sold through Amazon. The primary goals are to help shoppers discover new products they love, be the most efficient way for advertisers to meet their business objectives, and build a sustainable business that continuously innovates on behalf of customers. Our products and solutions are strategically important to enable our Retail and Marketplace businesses to drive long-term growth. We deliver billions of ad impressions and millions of clicks and break fresh ground in product and technical innovations every day! Amazon continues to develop its advertising program. Ads run in our Stores (including Consumer Stores, Books, Amazon Business, Whole Foods Market, and Fresh) and Media and Entertainment publishers (including Fire TV, Fire Tablets, Kindle, Alexa, Twitch, Prime Video, Freevee, Amazon Music, MiniTV, Audible, IMDb, and others). In addition to these first-party (1P) publishers, we also deliver ads on third-party (3P) publishers. We have a number of ad products, including Sponsored Products and Sponsored Brands, display and video products for smaller brands, including Sponsored Display and Sponsored TV. We also operate ad tech products, including Amazon Marketing Cloud (a clean-room for advertisers), Amazon Publisher Cloud (a clean-room for publishers), and Amazon DSP (an enterprise-level buying tool that brings together our ad tech for buying video, audio, and display ads). Key job responsibilities This role is focused on developing core models that will be the foundational of the core advertising-facing tools that we are launching. You will conduct literature reviews to stay on the current news in the field. You will regularly engage with product managers and technical program managers, who will partner with you to productize your work.
CA, QC, Montreal
Join the next revolution in robotics at Amazon's Frontier AI & Robotics team, where you'll work alongside world-renowned AI pioneers to push the boundaries of what's possible in robotic intelligence. As an Applied Scientist, you'll be at the forefront of developing breakthrough foundation models that enable robots to perceive, understand, and interact with the world in unprecedented ways. You'll drive independent research initiatives in areas such as perception, manipulation, scene understanding, sim2real transfer, multi-modal foundation models, and multi-task learning, designing novel algorithms that bridge the gap between state-of-the-art research and real-world deployment at Amazon scale. In this role, you'll balance innovative technical exploration with practical implementation, collaborating with platform teams to ensure your models and algorithms perform robustly in dynamic real-world environments. You'll have access to Amazon's vast computational resources, enabling you to tackle ambitious problems in areas like very large multi-modal robotic foundation models and efficient, promptable model architectures that can scale across diverse robotic applications. Key job responsibilities - Design and implement novel deep learning architectures that push the boundaries of what robots can understand and accomplish - Drive independent research initiatives in robotics foundation models, focusing on breakthrough approaches in perception, and manipulation, for example open-vocabulary panoptic scene understanding, scaling up multi-modal LLMs, sim2real/real2sim techniques, end-to-end vision-language-action models, efficient model inference, video tokenization - Lead technical projects from conceptualization through deployment, ensuring robust performance in production environments - Collaborate with platform teams to optimize and scale models for real-world applications - Contribute to the team's technical strategy and help shape our approach to next-generation robotics challenges A day in the life - Design and implement novel foundation model architectures, leveraging our extensive compute infrastructure to train and evaluate at scale - Collaborate with our world-class research team to solve complex technical challenges - Lead technical initiatives from conception to deployment, working closely with robotics engineers to integrate your solutions into production systems - Participate in technical discussions and brainstorming sessions with team leaders and fellow scientists - Leverage our massive compute cluster and extensive robotics infrastructure to rapidly prototype and validate new ideas - Transform theoretical insights into practical solutions that can handle the complexities of real-world robotics applications About the team At Frontier AI & Robotics, we're not just advancing robotics – we're reimagining it from the ground up. Our team is building the future of intelligent robotics through ground breaking foundation models and end-to-end learned systems. We tackle some of the most challenging problems in AI and robotics, from developing sophisticated perception systems to creating adaptive manipulation strategies that work in complex, real-world scenarios. What sets us apart is our unique combination of ambitious research vision and practical impact. We leverage Amazon's massive computational infrastructure and rich real-world datasets to train and deploy state-of-the-art foundation models. Our work spans the full spectrum of robotics intelligence – from multimodal perception using images, videos, and sensor data, to sophisticated manipulation strategies that can handle diverse real-world scenarios. We're building systems that don't just work in the lab, but scale to meet the demands of Amazon's global operations. Join us if you're excited about pushing the boundaries of what's possible in robotics, working with world-class researchers, and seeing your innovations deployed at unprecedented scale.
IL, Tel Aviv
Come build the future of entertainment with us. Are you interested in helping shape the future of movies and television? Do you want to help define the next generation of how and what Amazon customers are watching? Prime Video is a premium streaming service that offers customers a vast collection of TV shows and movies - all with the ease of finding what they love to watch in one place. We offer customers thousands of popular movies and TV shows from Originals and Exclusive content to exciting live sports events. We also offer our members the opportunity to subscribe to add-on channels which they can cancel at any time and to rent or buy new release movies and TV box sets on the Prime Video Store. Prime Video is a fast-paced, growth business - available in over 240 countries and territories worldwide. The team works in a dynamic environment where innovating on behalf of our customers is at the heart of everything we do. If this sounds exciting to you, please read on We are seeking an exceptional Applied Scientist to join our Prime Video Sports personalization team in Israel. Our team is dedicated to developing state-of-the-art science to personalize the customer experience and help customers seamlessly find any live event in our selection. You will have the opportunity to work on innovative, large-scale projects that push the boundaries of what's possible in sports content delivery and engagement. Your expertise will be crucial in tackling complex challenges such as information retrieval, sequential modeling, realtime model optimizations, utilizing Large Language Models (LLMs), and building state-of-the-art complex recommender systems. Key job responsibilities We are looking for an Applied Scientist with domain expertise in Personalization, Information Retrieval, and Recommender Systems, or general ML to develop new algorithms and end-to-end solutions. As part of our team of applied scientists and software development engineers, you will be responsible for researching, designing, developing, and deploying algorithms into production pipelines. Your role will involve working with cutting-edge technologies in recommender systems and search. You'll also tackle unique challenges like temporal information retrieval to improve real-time sports content recommendations. As a technologist, you will drive the publication of original work in top-tier conferences in Machine Learning and Recommender Systems. We expect you to thrive in ambiguous situations, demonstrating outstanding analytical abilities and comfort in collaborating with cross-functional teams and systems. The ideal candidate is a self-starter with the ability to learn and adapt quickly in our fast-paced environment. About the team We are the Prime Video Sports team. In September 2018 Prime Video launched its first full-scale live streaming experience to world-wide Prime customers with NFL Thursday Night Football. That was just the start. Now Amazon has exclusive broadcasting rights to major leagues like NFL Thursday Night Football, Tennis majors like Roland-Garros and English Premier League to list a few and are broadcasting live events across 30+ sports world-wide. Prime Video is expanding not just the breadth of live content that it offers, but the depth of the experience. This is a transformative opportunity, the chance to be at the vanguard of a program that will revolutionize Prime Video, and the live streaming experience of customers everywhere.