Alexa speech science developments at Interspeech 2022

Research from Alexa Speech covers a range of topics related to end-to-end neural speech recognition and fairness.

Interspeech, the world’s largest and most comprehensive conference on the science and technology of spoken-language processing, took place this week in Incheon, Korea, with Amazon as a platinum sponsor. Amazon Science asked three of Alexa AI’s leading scientists — in the fields of speech, spoken-language understanding, and text-to-speech — to highlight some of Amazon’s contributions to the conference.

In this installment, senior principal scientist Andreas Stolcke selects papers from Alexa AI’s speech science organization, focusing on two overarching themes in recent research on speech-enabled AI: end-to-end neural speech recognition and fairness.

End-to-end neural speech recognition

Traditionally, speech recognition systems have included components specialized for different aspects of linguistic knowledge: acoustic models to capture the correspondence between speech sounds and acoustic waveforms (phonetics), pronunciation models to map those sounds to words, and language models (LMs) to capture higher-order properties such as syntax, semantics, and dialogue context.

All these models are trained on separate data and combined using graph and search algorithms to infer the most probable sequence of words corresponding to the acoustic input. The latest versions of these systems employ neural networks for individual components, typically the acoustic and language models, while still relying on non-neural methods to integrate them; they are therefore known as “hybrid” automatic-speech-recognition (ASR) systems.

While the hybrid ASR approach is structured and modular, that very modularity makes it hard to model the ways in which acoustic, phonetic, and word-level representations interact and to optimize the recognition system end to end. For these reasons, much recent research in ASR has focused on so-called end-to-end or all-neural recognition systems, which infer a sequence of words directly from acoustic inputs.

End-to-end ASR systems use deep multilayered neural architectures that can be optimized end to end for recognition accuracy. While they do require large amounts of data and computation for training, once trained, they offer a simplified computational architecture for inference, as well as superior performance.

Alexa’s ASR employs end-to-end models as its core algorithm, both in the cloud and on-device. Across the industry and in academic research, end-to-end architectures are still being improved to achieve better accuracy, to reduce computation and latency, and to mitigate the lack of modularity that makes it challenging to inject external (e.g., domain-specific) knowledge at run time.

Alexa AI papers at Interspeech address several open problems in end-to-end ASR, and we summarize a few of those papers here.

In “ConvRNN-T: Convolutional augmented recurrent neural network transducers for streaming speech recognition”, Martin Radfar and coauthors propose a new variant of the popular recurrent-neural-network-transducer (RNN-T) end-to-end neural architecture. One of their goals is to preserve the property of causal processing, meaning that the model output depends only on past and current (but not future) inputs, which enables streaming ASR. At the same time, they want to improve the model’s ability to capture long-term contextual information.

[Figure: A high-level block diagram of ConvRNN-T.]

To achieve both goals, they augment the vanilla RNN-T with two distinct convolutional (CNN) front ends: a standard one for encoding correlations localized in time and a novel “global CNN” encoder that is designed to capture long-term correlations by summarizing activations over the entire utterance up to the current time step (while processing utterances incrementally through time).
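To make the two front ends concrete, here is a minimal PyTorch sketch of the idea, assuming a left-padded causal convolution for the local branch and a running mean as a stand-in for the global summary; the module names, dimensions, and the running-mean choice are our own illustrative assumptions, not the authors’ implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConvFrontEnd(nn.Module):
    """Illustrative sketch of a streaming-friendly front end: a local causal CNN
    plus a global, causally accumulated summary of all frames seen so far."""

    def __init__(self, feat_dim=80, hidden_dim=256, kernel_size=5):
        super().__init__()
        self.left_pad = kernel_size - 1            # pad only on the left => causal
        self.local_conv = nn.Conv1d(feat_dim, hidden_dim, kernel_size)
        self.global_proj = nn.Conv1d(feat_dim, hidden_dim, kernel_size=1)
        self.out = nn.Linear(2 * hidden_dim, hidden_dim)

    def forward(self, feats):                      # feats: (batch, time, feat_dim)
        x = feats.transpose(1, 2)                  # (batch, feat_dim, time)
        local = self.local_conv(F.pad(x, (self.left_pad, 0)))
        g = self.global_proj(x)
        # Running (causal) mean over frames 1..t, a cheap stand-in for the
        # paper's global CNN encoder: each step summarizes the utterance so far.
        counts = torch.arange(1, g.size(-1) + 1, device=g.device).view(1, 1, -1)
        global_summary = g.cumsum(dim=-1) / counts
        fused = torch.cat([local, global_summary], dim=1).transpose(1, 2)
        return self.out(fused)                     # (batch, time, hidden_dim)

frontend = CausalConvFrontEnd()
print(frontend(torch.randn(2, 100, 80)).shape)     # torch.Size([2, 100, 256])
```

Because both branches depend only on current and past frames, an encoder of this general shape remains usable for streaming recognition.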

The authors show that the resulting ConvRNN-T gives superior accuracy compared to other proposed neural streaming ASR architectures, such as the basic RNN-T, Conformer, and ContextNet.

Another concern with end-to-end ASR models is computational efficiency, especially since the unified neural architecture makes these models very attractive for on-device deployment, where compute cycles and (for mobile devices) power are at a premium.

In their paper “Compute cost amortized Transformer for streaming ASR”, Yi Xie and colleagues exploit the intuitive observation that the amount of computation a model performs should vary as a function of the difficulty of the task; for instance, input in which noise or an accent causes ambiguity may require more computation than a clean input with a mainstream accent. (We may think of this as the ASR model “thinking harder” in places where the words are more difficult to discern.)

The researchers achieve this with a very elegant method that leverages the integrated neural structure of the model. Their starting point is a Transformer-based ASR system, consisting of multiple stacked layers of multiheaded self-attention (MHA) and feed-forward neural blocks. In addition, they train “arbitrator” networks that look at the acoustic input (and, optionally, also at intermediate block outputs) to toggle individual components on or off.

Because these component blocks have “skip connections” that combine their outputs with the outputs of earlier layers, they are effectively optional for the overall computation to proceed. A block that is toggled off for a given input frame saves all the computation normally carried out by that block, producing a zero vector output. The following diagram shows the structure of both the elementary Transformer building block and the arbitrator that controls it:

[Figure: Illustration of the arbitrator and Transformer backbone of each block. The lightweight arbitrator toggles whether to evaluate subcomponents during the forward pass.]

The arbitrator networks themselves are small enough that they do not contribute significant additional computation. What makes this scheme workable and effective, however, is that both the Transformer assemblies and the arbitrators that control them can be trained jointly, with dual goals: to perform accurate ASR and to minimize the overall amount of computation. The latter is achieved by adding a term to the training objective function that rewards reducing computation. Dialing a hyperparameter up or down selects the desired balance between accuracy and computation.
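The sketch below (a simplification under our own assumptions, not the paper’s code) shows how such gating and a compute penalty can fit together in PyTorch: a small arbitrator emits a per-frame gate for one Transformer block, and the training loss adds a term proportional to the expected computation. The straight-through gating trick, the layer sizes, and the exact penalty form are assumptions.

```python
import torch
import torch.nn as nn

class GatedTransformerBlock(nn.Module):
    """Sketch: a lightweight arbitrator decides, per frame, whether the block's
    output is used; a zero gate leaves only the skip connection."""

    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                 nn.Linear(4 * d_model, d_model))
        self.arbitrator = nn.Sequential(nn.Linear(d_model, 32), nn.ReLU(),
                                        nn.Linear(32, 1))

    def forward(self, x):                                   # x: (batch, time, d_model)
        gate_prob = torch.sigmoid(self.arbitrator(x))       # (batch, time, 1)
        # Straight-through estimator: hard 0/1 gate forward, soft gradient backward.
        gate = (gate_prob > 0.5).float() + gate_prob - gate_prob.detach()
        t = x.size(1)
        causal = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device), 1)
        attn_out, _ = self.attn(x, x, x, attn_mask=causal, need_weights=False)
        # During training the block is still evaluated and scaled by the gate;
        # at inference, frames with a zero gate let us skip the block entirely.
        return x + gate * self.ffn(attn_out), gate_prob

def training_loss(asr_loss, gate_probs, compute_weight=0.01):
    """ASR objective plus a term that rewards reducing expected computation;
    compute_weight is the hyperparameter trading accuracy against compute."""
    expected_compute = torch.stack([g.mean() for g in gate_probs]).mean()
    return asr_loss + compute_weight * expected_compute
```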

The authors show that their method can achieve a 60% reduction in computation with only a minor (3%) increase in ASR error. Their cost-amortized Transformer proves much more effective than a benchmark method that constrains the model to attend only to sliding windows over the input, which yields only 13% compute savings while increasing the error almost three times as much.

Finally, in this short review of end-to-end neural ASR advances, we look at ways to recognize speech from more than one speaker, while keeping track of who said what (also known as speaker-attributed ASR).

This has traditionally been done with modular systems that perform ASR and, separately, perform speaker diarization, i.e., labeling stretches of audio according to who is speaking. However, here, too, neural models have recently brought advances and simplification, by integrating these two tasks in a single end-to-end neural model.

In their paper “Separator-transducer-segmenter: Streaming recognition and segmentation of multi-party speech”, Ilya Sklyar and colleagues not only integrate ASR and segmentation-by-speaker but do so while processing inputs incrementally. Streaming multispeaker ASR with low latency is a key technology to enable voice assistants to interact with customers in collaborative settings. Sklyar’s system does this with a generalization of the RNN-T architecture that keeps track of turn-taking between multiple speakers, up to two of whom can be active simultaneously. The researchers’ separator-transducer-segmenter model is depicted below:

[Figure: The separator-transducer-segmenter model. The tokens <sot> and <eot> represent the start of turn and end of turn. Model blocks with the same color have tied parameters, and transcripts in the color-matched boxes belong to the same speaker.]

A key element that yields improvements over an earlier approach is the use of dedicated tokens to recognize both starts and ends of speaker turns, for what the authors call “start-pointing” and “end-pointing”. (End-pointing is a standard feature of many interactive ASR systems, needed to predict when a talker has finished speaking.) Beyond representing the turn-taking structure in this symbolic way, the model is also penalized during training for taking too long to output these markers, in order to improve the latency and temporal accuracy of the outputs.
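Purely for intuition, here is a hypothetical, deliberately simplified sketch of how start-of-turn and end-of-turn tokens might be inserted into a training target. The real model attributes each transcript to a speaker through separate output channels and handles up to two overlapping speakers, which this single-channel toy function does not.

```python
def serialize_turns(turns):
    """turns: list of (speaker_id, word_list) in temporal order.
    Returns a flat token sequence with <sot>/<eot> markers around each turn.
    (Speaker attribution and overlap handling are omitted in this sketch.)"""
    tokens = []
    for _speaker, words in turns:
        tokens.append("<sot>")      # "start-pointing" token
        tokens.extend(words)
        tokens.append("<eot>")      # "end-pointing" token
    return tokens

# Example: two speakers taking turns within one audio stream.
turns = [("spk1", ["set", "a", "timer"]), ("spk2", ["for", "ten", "minutes"])]
print(serialize_turns(turns))
# ['<sot>', 'set', 'a', 'timer', '<eot>', '<sot>', 'for', 'ten', 'minutes', '<eot>']
```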

Fairness in the performance of speech-enabled AI

The second theme we’d like to highlight, and one that is receiving increasing attention in speech and other areas of AI, is performance fairness: the desire to avert large differences in accuracy across different cohorts of users or on content associated with protected groups. As an example, concerns about this type of fairness gained prominence with demonstrations that certain computer vision algorithms performed poorly for certain skin tones, in part due to underrepresentation in the training data.

There’s a similar concern about speech-based AI, with speech properties varying widely as a function of speaker background and environment. A balanced representation in training sets is hard to achieve, since the speakers using commercial products are largely self-selected, and speaker attributes are often unavailable for many reasons, privacy among them. This topic is also the subject of a special session at Interspeech, Inclusive and Fair Speech Technologies, which several Alexa AI scientists are involved in as co-organizers and presenters.

One of the special-session papers, “Reducing geographic disparities in automatic speech recognition via elastic weight consolidation”, by Viet Anh Trinh and colleagues, looks at how geographic location within the U.S. affects ASR accuracy and how models can be adapted to narrow the gap for the worst-performing regions. Here and elsewhere, a two-step approach is used: first, subsets of speakers with higher-than-average error rates are identified; then a mitigation step attempts to improve performance for those cohorts. Trinh et al.’s method identifies the cohorts by partitioning the speakers according to their geographic longitude and latitude, using a decision-tree-like algorithm that maximizes the word-error-rate (WER) differences between resulting regions:

[Figure: A map of 126 regions identified by the clustering tree. The color does not indicate a specific word error rate (WER), but regions with the same color do have the same WER.]
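The partitioning step can be pictured with a toy version of such a tree: a greedy, recursive split on latitude or longitude that maximizes the WER gap between the two resulting regions. The code below is our own simplified sketch, with invented minimum-region sizes and depth limits, not the authors’ algorithm.

```python
import numpy as np

def best_split(lat, lon, wer, min_size=50):
    """Find the coordinate axis and threshold whose two sides differ most in mean WER."""
    best = None
    for axis, coord in (("lat", lat), ("lon", lon)):
        for t in np.unique(coord)[1:]:
            left, right = wer[coord < t], wer[coord >= t]
            if len(left) < min_size or len(right) < min_size:
                continue
            gap = abs(left.mean() - right.mean())
            if best is None or gap > best[0]:
                best = (gap, axis, t)
    return best

def partition(lat, lon, wer, depth=0, max_depth=3):
    """Recursively split speakers into geographic regions with contrasting WER."""
    split = best_split(lat, lon, wer)
    if split is None or depth == max_depth:
        return [{"size": len(wer), "mean_wer": float(wer.mean())}]
    _, axis, t = split
    mask = (lat < t) if axis == "lat" else (lon < t)
    return (partition(lat[mask], lon[mask], wer[mask], depth + 1, max_depth)
            + partition(lat[~mask], lon[~mask], wer[~mask], depth + 1, max_depth))

# Toy usage with synthetic speakers (pretend the eastern half has higher WER).
rng = np.random.default_rng(0)
lat, lon = rng.uniform(25, 49, 1000), rng.uniform(-124, -67, 1000)
wer = rng.normal(10 + (lon > -95) * 2, 1)
print(partition(lat, lon, wer))
```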

Next, the regions are ranked by their average WERs; data from the highest-error regions is identified for performance improvement. To achieve that, the researchers use fine-tuning to optimize the model parameters for the targeted regions, while also employing a technique called elastic weight consolidation (EWC) to minimize performance degradation on the remaining regions.

This is important to prevent a phenomenon known as “catastrophic forgetting”, in which neural models degrade substantially on prior training data during fine-tuning. The idea is to quantify the influence that different dimensions of the parameter space have on the overall performance and then avoid large variations along those dimensions when adapting to a data subset. This approach decreases the WER mean, maximum, and variance across regions and even the overall WER (including the regions not fine-tuned on), beating out several baseline methods for model adaptation.
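The EWC penalty itself has a compact, generic form: a quadratic term, weighted by an estimate of each parameter’s importance on the original data (the diagonal of the Fisher information), that discourages important parameters from drifting during fine-tuning. The PyTorch sketch below shows that generic recipe under our own naming, not the paper’s implementation.

```python
import torch

def fisher_diagonal(model, data_loader, loss_fn):
    """Estimate each parameter's importance as the average squared gradient
    of the loss on the original (pre-fine-tuning) data."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for inputs, targets in data_loader:
        model.zero_grad()
        loss_fn(model(inputs), targets).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / len(data_loader) for n, f in fisher.items()}

def ewc_penalty(model, old_params, fisher, lam=1.0):
    """(lam / 2) * sum_i F_i * (theta_i - theta*_i)^2, added to the fine-tuning loss.
    old_params is a {name: tensor} snapshot of the model taken before fine-tuning."""
    penalty = 0.0
    for n, p in model.named_parameters():
        penalty = penalty + (fisher[n] * (p - old_params[n]) ** 2).sum()
    return 0.5 * lam * penalty

# During fine-tuning on the targeted regions:
#   loss = asr_loss + ewc_penalty(model, old_params, fisher)
```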

Pranav Dheram et al., in their paper “Toward fairness in speech recognition: Discovery and mitigation of performance disparities”, look at alternative methods for identifying underperforming speaker cohorts. One approach is to use human-defined geographic regions as given by postal (a.k.a. zip) codes, in combination with demographic information from U.S. census data, to partition U.S. geography.

Zip codes are sorted into binary partitions by majority demographic attributes, so as to maximize WER discrepancies. The partition with higher WER is then targeted for mitigations, an approach similar to that adopted in the Trinh et al. paper. However, this approach is imprecise (since it lumps together speakers by zip code) and limited to available demographic data, so it generalizes poorly to other geographies.

Alternatively, Dheram et al. use speech characteristics learned by a neural speaker identification model to group speakers. These “speaker embedding vectors” are clustered, reflecting the intuition that speakers who sound similar will tend to have similar ASR difficulty.
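A minimal sketch of that cohort-discovery step, using scikit-learn’s k-means on precomputed speaker embeddings; the number of clusters and the per-cohort bookkeeping are illustrative assumptions, not the paper’s settings.

```python
import numpy as np
from sklearn.cluster import KMeans

def discover_cohorts(speaker_embeddings, speaker_wers, n_cohorts=20):
    """Cluster speaker embeddings into 'virtual regions' and rank them by mean WER.

    speaker_embeddings: (num_speakers, embed_dim) array from a speaker-ID model.
    speaker_wers:       (num_speakers,) per-speaker word error rates.
    """
    labels = KMeans(n_clusters=n_cohorts, n_init=10, random_state=0) \
        .fit_predict(speaker_embeddings)
    cohorts = [{"cohort": c,
                "num_speakers": int((labels == c).sum()),
                "mean_wer": float(speaker_wers[labels == c].mean())}
               for c in range(n_cohorts)]
    # The highest-WER cohorts become the candidates for targeted mitigation.
    return sorted(cohorts, key=lambda d: d["mean_wer"], reverse=True)
```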

Subsequently, these virtual speaker regions (not individual identities) can be ranked by difficulty and targeted for mitigation, without relying on human labeling, grouping, or self-identification of speakers or attributes. As shown in the table below, the automatic approach identifies a larger gap in ASR accuracy than the “geo-demographic” approach, while at the same time targeting a larger share of speakers for performance mitigation:

Cohort discovery     WER gap (%)     Bottom-cohort share (%)
Geodemographic       41.7            0.8
Automatic            65.0            10.0

The final fairness-themed paper we highlight explores yet another approach to avoiding performance disparities, known as adversarial reweighting (ARW). Instead of relying on explicit partitioning of the input space, this approach assigns continuous weights to the training instances (as a function of input features), with the idea that harder examples get higher weights and thereby exert more influence on the performance optimization.

Second, ARW more tightly interleaves, and iterates, the (now weighted) cohort identification and mitigation steps. Mathematically, this is formalized as a min-max optimization that alternates between maximizing the weighted error by changing the sample weights (hence “adversarial”) and minimizing that weighted error by adjusting the model parameters.
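In generic form, one round of that alternation might look like the sketch below: an adversary network maps inputs to example weights that maximize the weighted loss, then the main model minimizes it. This is our reading of the ARW recipe under stated assumptions (a softmax-normalized adversary and per-example losses), not the paper’s code.

```python
import torch

def arw_step(model, adversary, batch_x, batch_y, loss_fn, model_opt, adv_opt):
    """One adversarial-reweighting round. loss_fn must return per-example losses
    (e.g., reduction='none'); adversary maps inputs to a scalar score per example."""
    # --- Adversary update: raise weights on hard examples (maximize weighted loss).
    per_example = loss_fn(model(batch_x), batch_y)                      # (batch,)
    weights = torch.softmax(adversary(batch_x).squeeze(-1), dim=0) * len(batch_x)
    adv_loss = -(weights * per_example.detach()).mean()                 # ascend => negate
    adv_opt.zero_grad(); adv_loss.backward(); adv_opt.step()

    # --- Model update: minimize the reweighted loss with weights held fixed.
    per_example = loss_fn(model(batch_x), batch_y)
    with torch.no_grad():
        weights = torch.softmax(adversary(batch_x).squeeze(-1), dim=0) * len(batch_x)
    model_loss = (weights * per_example).mean()
    model_opt.zero_grad(); model_loss.backward(); model_opt.step()
    return model_loss.item()
```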

ARW was designed for group fairness in classification and regression tasks that take individual data points as inputs. “Adversarial reweighting for speaker verification fairness”, by Minho Jin et al., looks at how the concept can be applied to a classification task that depends on pairs of input samples, i.e., checking whether two speech samples come from the same speaker. Solving this problem could help make a voice-based assistant more reliable at personalization and other functions that require knowing who is speaking.

The authors look at several ways to adapt ARW to learning similarity among speaker embeddings. The method that ultimately worked best assigns each pair of input samples an adversarial weight that is the sum of individual sample weights (thereby reducing the dimensionality of the weight prediction). The individual sample weights are also informed by which region of the speaker embedding space a sample falls into (as determined by unsupervised k-means clustering, the same technique used in Dheram et al.’s automatic cohort-identification method).

[Figure: Computing adversarial-reweighting (ARW) weights.]

We omit the details, but once the pairwise adversarial weights are formalized in this way, we can insert them into the metric-learning loss function that is the basis of training a speaker verification model. Min-max optimization can then take turns training the adversary network that predicts the weights and optimizing the speaker embedding extractor that learns speaker similarity.
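To make the pairwise weighting concrete, here is a hedged sketch in which each trial’s adversarial weight is the sum of the two samples’ individual weights and scales a simple binary cross-entropy over cosine similarity, which stands in for the paper’s metric-learning objective; the function names and the score scaling are our own assumptions.

```python
import torch
import torch.nn.functional as F

def weighted_pairwise_loss(emb_a, emb_b, same_speaker, sample_w_a, sample_w_b):
    """emb_a, emb_b: (batch, dim) speaker embeddings for the two sides of each trial.
    same_speaker:   (batch,) 1.0 if both sides come from the same speaker, else 0.0.
    sample_w_a/_b:  per-sample adversarial weights from the ARW adversary."""
    # Pairwise weight = sum of the two individual sample weights.
    pair_w = sample_w_a + sample_w_b
    # Cosine similarity as the verification score, scaled into logits (scale assumed).
    scores = F.cosine_similarity(emb_a, emb_b) * 10.0
    per_pair = F.binary_cross_entropy_with_logits(scores, same_speaker, reduction="none")
    return (pair_w * per_pair).mean()
```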

On a public speaker verification corpus, the resulting system reduced the overall equal-error rate by 7.6% while also reducing the gap between genders by 17%. It likewise reduced the error variability across different countries of origin by nearly 10%. Note that, as in the Trinh et al. ASR fairness paper, the fairness mitigation both narrows performance disparities and improves overall accuracy.

This concludes our thematic highlights of Alexa Speech Interspeech papers. Note that Interspeech covers much more than speech and speaker recognition. Please check out companion pieces that feature additional work, drawn from technical areas that are no less essential for a functioning speech-enabled AI assistant: natural-language understanding and speech synthesis.
