Responsible AI in the generative era

Generative AI raises new challenges in defining, measuring, and mitigating concerns about fairness, toxicity, and intellectual property, among other things. But work has started on the solutions.

In recent years, and even recent months, there have been rapid and dramatic advances in the technology known as generative AI. Generative AI models are trained on inconceivably massive collections of text, code, images, and other rich data. They are now able to produce, on demand, coherent and compelling stories, news summaries, poems, lyrics, paintings, and programs. The potential practical uses of generative AI are only just beginning to be understood but are likely to be manifold and revolutionary and to include writing aids, creative content production and refinement, personal assistants, copywriting, code generation, and much more.

Kearns with caption
Michael Kearns, a professor of computer and information science at the University of Pennsylvania and an Amazon Scholar.

There is thus considerable excitement about the transformations and new opportunities that generative AI may bring. There are also understandable concerns — some of them new twists on those of traditional responsible AI (such as fairness and privacy) and some of them genuinely new (such as the mimicry of artistic or literary styles). In this essay, I survey these concerns and how they might be addressed over time.

I will focus primarily on technical approaches to the risks, while acknowledging that social, legal, regulatory, and policy mechanisms will also have important roles to play. At Amazon, our hope is that such a balanced approach can significantly reduce the risks, while still preserving much of the excitement and usefulness of generative AI.

What is generative AI?

To understand what generative AI is and how it works, it is helpful to begin with the example of large language models (LLMs). Imagine the thought experiment in which we start with some sentence fragment like Once upon a time, there was a great ..., and we poll people on what word they would add next. Some might say wizard, others might say queen, monster, and so on. We would also expect that given the fairy tale nature of the fragment, words such as apricot or fork would be rather unlikely suggestions.

Related content
Model using ASR hypotheses as extra inputs reduces word error rate of human transcriptions by almost 11%.

If we poll a large enough population, a probability distribution over next words would begin to emerge. We could then randomly pick a word from that distribution (say wizard), and now our sequence would be one word longer — Once upon a time, there was a great wizard ... — and we could again poll for the next word. In this manner we could theoretically generate entire stories, and if we restarted the whole process, the crowd would produce an entirely different narrative due to the inherent randomness.

Dramatic advances in machine learning have effectively made this thought experiment a reality. But instead of polling crowds of people, we use a model to predict likely next words, one trained on a massive collection of documents — public collections of fiction and nonfiction, Wikipedia entries and news articles, transcripts of human dialogue, open-source code, and much more.

LLM objective.gif
An example of how a language model uses context to predict the next word in a sentence.

If the training data contains enough sentences beginning Once upon a time, there was a great …, it will be easy to sample plausible next words for our initial fragment. But LLMs can generalize and create as well, and not always in ways that humans might expect. The model might generate Once upon a time, there was a great storm based on occurrences of tremendous storm in the training data, combined with the learned synonymy of great and tremendous. This completion can happen despite great storm never appearing verbatim in the training data and despite the completions more expected by humans (like wizard and queen).

The resulting models are just as complex as their training data, often described by hundreds of billions of numbers (or parameters, in machine learning parlance), hence the “large” in LLM. LLMs have become so good that not only do they consistently generate grammatically correct text, but they create content that is coherent and often compelling, matching the tone and style of the fragments they were given (known as prompts). Start them with a fairy tale beginning, and they generate fairy tales; give them what seems to be the start of a news article, and they write a news-like article. The latest LLMs can even follow instructions rather than simply extend a prompt, as in Write lyrics about the Philadelphia Eagles to the tune of the Beatles song “Get Back”.

Related content
Models that map spoken language to objects in an image would make it easier for customers to communicate with multimodal devices.

Generative AI isn’t limited to text, and many models combine language and images, as in Create a painting of a skateboarding cat in the style of Andy Warhol. The techniques for building such systems are a bit more complex than for LLMs and involve learning a model of proximity between text and images, which can be done using data sources like captioned photos. If there are enough images containing cats that have the word cat in the caption, the model will capture the proximity between the word and pictures of cats.

The examples above suggest that generative AI is a form of entertainment, but many potential practical uses are also beginning to emerge, including generative AI as a writing tool (Shorten the following paragraphs and improve their grammar), for productivity (Extract the action items from this meeting transcript), for creative content (Propose logo designs for a startup building a dog-walking app), for simulating focus groups (Which of the following two product descriptions would Florida retirees find more appealing?), for programming (Give me a code snippet to sort a list of numbers), and many others.

So the excitement over the current and potential applications of generative AI is palpable and growing. But generative AI also gives rise to some new risks and challenges in the responsible use of AI and machine learning. And the likely eventual ubiquity of generative models in everyday life and work amplifies the stakes in addressing these concerns thoughtfully and effectively.

So what’s the problem?

The “generative” in generative AI refers to the fact that the technology can produce open-ended content that varies with repeated tries. This is in contrast to more traditional uses of machine learning, which typically solve very focused and narrow prediction problems.

For example, consider training a model for consumer lending that predicts whether an applicant would successfully repay a loan. Such a model might be trained using the lender’s data on past loans, each record containing applicant information (work history, financial information such as income, savings, and credit score, and educational background) along with whether the loan was repaid or defaulted.

Related content
NSF deputy assistant director Erwin Gianchandani on the challenges addressed by funded projects.

The typical goal would be to train a model that was as accurate as possible in predicting payment/default and then apply it to future applications to guide or make lending decisions. Such a model makes only lending outcome predictions and cannot generate fairy tales, improve grammar, produce whimsical images, write code, and so on. Compared to generative AI, it is indeed a very narrow and limited model.

But the very limitations also make the application of certain dimensions of responsible AI much more manageable. Consider the goal of making our lending model fair, which would typically be taken to mean the absence of demographic bias. For example, we might want to make sure that the error rate of the predictions of our model (and it generally will make errors, since even human loan officers are imperfect in predicting who will repay) is approximately equal on men and women. Or we might more specifically ask that the false-rejection rate — the frequency with which the model predicts default by an applicant who is in fact creditworthy — be the same across gender groups.

Once armed with this definition of fairness, we can seek to enforce it in the training process. In other words, instead of finding a model that minimizes the overall error rate, we find one that does so under the additional condition that the false-rejection rates on men and women are approximately equal (say, within 1% of each other). We might also want to apply the same notion of fairness to other demographic properties (such as young, middle aged, and elderly). But the point is that we can actually give reasonable and targeted definitions of fairness and develop training algorithms that enforce them.

It is also easy to audit a given model for its adherence to such notions of fairness (for instance, by estimating the error rates on both male and female applicants). Finally, when the predictive task is so targeted, we have much more control over the training data: we train on historical lending decisions only, and not on arbitrarily rich troves of general language, image, and code data.

Now consider the problem of making sure an LLM is fair. What might we even mean by this? Well, taking a cue from our lending model, we might ask that the LLM treat men and women equally. For instance, consider a prompt like Dr. Hanson studied the patient’s chart carefully, and then … . In service of fairness, we might ask that in the completions generated by an LLM, Dr. Hanson be assigned male and female pronouns with roughly equal frequency. We might argue that to do otherwise perpetuates the stereotype that doctors are typically male.

Related content
Method significantly reduces bias while maintaining comparable performance on machine learning tasks.

But then should we not also do this for mentions of nurses, firefighters, accountants, pilots, carpenters, attorneys, and professors? It’s clear that measuring just this one narrow notion of fairness will quickly become unwieldy. And it isn’t even obvious in what contexts it should be enforced. What if the prompt described Dr. Hanson as having a beard? What about the Women’s National Basketball Association (WNBA)? Should mention of a WNBA player in a prompt elicit male pronouns half the time?

Defining fairness for LLMs is even murkier than we suggest above, again because of the open-ended content they generate. Let’s turn from pronoun choices to tone. What if an LLM, when generating content about a woman, uses an ever-so-slightly more negative tone (in choice of words and level of enthusiasm) than when generating content about a man? Again, even detecting and quantifying such differences would be a very challenging technical problem. The field of sentiment analysis in natural-language processing might suggest some possibilities, but currently, it focuses on much coarser distinctions in narrower settings, such as distinguishing positive from negative sentiment in business news articles about particular corporations.

So one of the prices we pay for the rich, creative, open-ended content that generative AI can produce is that it becomes commensurately harder (compared to traditional predictive ML) to define, measure, and enforce fairness.

From fairness to privacy

In a similar vein, let’s consider privacy concerns. It is of course important that a consumer lending model not leak information about the financial or other data of the individual applicants in the training data. (One way this can happen is if model predictions are accompanied by confidence scores; if the model expresses 100% confidence that a loan application will default, it’s likely because that application, with a default outcome, was in the training data.) For this kind of traditional, more narrow ML, there are now techniques for mitigating such leaks by making sure model outputs are not overly dependent on any particular piece of training data.

Related content
Calibrating noise addition to word density in the embedding space improves utility of privacy-protected text.

But the open-ended nature of generative AI broadens the set of concerns from verbatim leaks of training data to more subtle copying phenomena. For example, if a programmer has written some code using certain variable names and then asks an LLM for help writing a subroutine, the LLM may generate code from its training data, but with the original variable names replaced with those chosen by the programmer. So the generated code is not literally in the training data but is different only in a cosmetic way.

There are defenses against these challenges, including curation of training data to exclude private information, and techniques to detect similarity of code passages. But more subtle forms of replication are also possible, and as I discuss below, this eventually bleeds into settings where generative AI reproduces the “style” of content in its training data.

And while traditional ML has begun developing techniques for explaining the decisions or predictions of trained models, they don’t always transfer to generative AI, in part because current generative models sometimes produce content that simply cannot be explained (such as scientific citations that don’t exist, something I’ll discuss shortly).

The special challenges of responsible generative AI

So the usual concerns of responsible AI become more difficult for generative AI. But generative AI also gives rise to challenges that simply don’t exist for predictive models that are more narrow. Let’s consider some of these.

Toxicity. A primary concern with generative AI is the possibility of generating content (whether it be text, images, or other modalities) that is offensive, disturbing, or otherwise inappropriate. Once again, it is hard to even define and scope the problem. The subjectivity involved in determining what constitutes toxic content is an additional challenge, and the boundary between restricting toxic content and censorship may be murky and context- and culture-dependent. Should quotations that would be considered offensive out of context be suppressed if they are clearly labeled as quotations? What about opinions that may be offensive to some users but are clearly labeled as opinions? Technical challenges include offensive content that may be worded in a very subtle or indirect fashion, without the use of obviously inflammatory language.

Related content
Prompt engineering enables researchers to generate customized training examples for lightweight “student” models.

Hallucinations. Considering the next-word distribution sampling employed by LLMs, it is perhaps not surprising that in more objective or factual use cases, LLMs are susceptible to what are sometimes called hallucinations — assertions or claims that sound plausible but are verifiably incorrect. For example, a common phenomenon with current LLMs is creating nonexistent scientific citations. If one of these LLMs is prompted with the request Tell me about some papers by Michael Kearns, it is not actually searching for legitimate citations but generating ones from the distribution of words associated with that author. The result will be realistic titles and topics in the area of machine learning, but not real articles, and they may include plausible coauthors but not actual ones.

In a similar vein, prompts for financial news stories result not in a search of (say) Wall Street Journal articles but news articles fabricated by the LLM using the lexicon of finance. Note that in our fairy tale generation scenario, this kind of creativity was harmless and even desirable. But current LLMs have no levers that let users differentiate between “creativity on” and “creativity off” use cases.

Related content
Combining contrastive training and selection of hard negative examples establishes new benchmarks.

Intellectual property. A problem with early LLMs was their tendency to occasionally produce text or code passages that were verbatim regurgitations of parts of their training data, resulting in privacy and other concerns. But even improvements in this regard have not prevented reproductions of training content that are more ambiguous and nuanced. Consider the aforementioned prompt for a multimodal generative model Create a painting of a skateboarding cat in the style of Andy Warhol. If the model is able to do so in a convincing yet still original manner because it was trained on actual Warhol images, objections to such mimicry may arise.

Plagiarism and cheating. The creative capabilities of generative AI give rise to worries that it will be used to write college essays, writing samples for job applications, and other forms of cheating or illicit copying. Debates on this topic are happening at universities and many other institutions, and attitudes vary widely. Some are in favor of explicitly forbidding any use of generative AI in settings where content is being graded or evaluated, while others argue that educational practices must adapt to, and even embrace, the new technology. But the underlying challenge of verifying that a given piece of content was authored by a person is likely to present concerns in many contexts.

Disruption of the nature of work. The proficiency with which generative AI is able to create compelling text and images, perform well on standardized tests, write entire articles on given topics, and successfully summarize or improve the grammar of provided articles has created some anxiety that some professions may be replaced or seriously disrupted by the technology. While this may be premature, it does seem that generative AI will have a transformative effect on many aspects of work, allowing many tasks previously beyond automation to be delegated to machines.

What can we do?

The challenges listed above may seem daunting, in part because of how unfamiliar they are compared to those of previous generations of AI. But as technologists and society learn more about generative AI and its uses and limitations, new science and new policies are already being created to address those challenges.

For toxicity and fairness, careful curation of training data can provide some improvements. After all, if the data doesn’t contain any offensive or biased words or phrases, an LLM simply won’t be able to generate them. But this approach requires that we identify those offensive phrases in advance and are certain that there are absolutely no contexts in which we would want them in the output. Use-case-specific testing can also help address fairness concerns — for instance, before generative AI is used in high-risk domains such as consumer lending, the model could be tested for fairness for that particular application, much as we might do for more narrow predictive models.

Related content
Amazon Visiting Academic Barbara Poblete helps to build safer, more-diverse online communities — and to aid disaster response.

For less targeted notions of toxicity, a natural approach is to train what we might call guardrail models that detect and filter out unwanted content in the training data, in input prompts, and in generated outputs. Such models require human-annotated training data in which varying types and degrees of toxicity or bias are identified, which the model can generalize from. In general, it is easier to control the output of a generative model than it is to curate the training data and prompts, given the extreme generality of the tasks we intend to address.

For the challenge of producing high-fidelity content free of hallucinations, an important first step is to educate users about how generative AI actually works, so there is no expectation that the citations or news-like stories produced are always genuine or factually correct. Indeed, some current LLMs, when pressed on their inability to quote actual citations, will tell the user that they are just language models that don’t verify their content with external sources. Such disclaimers should be more frequent and clear. And the specific case of hallucinated citations could be mitigated by augmenting LLMs with independent, verified citation databases and similar sources, using approaches such as retrieval-augmented generation. Another nascent but intriguing approach is to develop methods for attributing generated outputs to particular pieces of training data, allowing users to assess the validity of those sources. This could help with explainability as well.

Concerns around intellectual property are likely to be addressed over time by a mixture of technology, policy, and legal mechanisms. In the near term, science is beginning to emerge around various notions of model disgorgement, in which protected content or its effects on generative outputs are reduced or removed. One technology that might eventually prove relevant is differential privacy, in which a model is trained in a way that ensures that any particular piece of training data has negligible effects on the outputs the model subsequently produces.

Related content
By exploiting consistencies across components of ensemble classifiers, a new approach reduces data requirements by up to 89%.

Another approach is so-called sharding approaches, which divide the training data into smaller portions on which separate submodels are trained; the submodels are then combined to form the overall model. In order to undo the effects of any particular item of data on the overall model, we need only remove it from its shard and retrain that submodel, rather than retraining the entire model (which for generative AI would be sufficiently expensive as to be prohibitive).

Finally, we can consider filtering or blocking approaches, where before presentation to the user, generated content is explicitly compared to protected content in the training data or elsewhere and suppressed (or replaced) if it is too similar. Limiting the number of times any specific piece of content appears in the training data also proves helpful in reducing verbatim outputs.

Some interesting approaches to discouraging cheating using generative AI are already under development. One is to simply train a model to detect whether a given (say) text was produced by a human or by a generative model. A potential drawback is that this creates an arms race between detection models and generative AI, and since the purpose of generative AI is to produce high-quality content plausibly generated by a human, it’s not clear that detection methods will succeed in the long run.

An intriguing alternative is watermarking or fingerprinting approaches that would be implemented by the developers of generative models themselves. For example, since at each step LLMs are drawing from the distribution over the next word given the text so far, we can divide the candidate words into “red” and “green” lists that are roughly 50% of the probability each; then we can have the LLM draw only from the green list. Since the words on the green list are not known to users, the likelihood that a human would produce a 10-word sentence that also drew only from the green lists is ½ raised to the 10th power, which is only about 0.0009. In this way we can view all-green content as providing a virtual proof of LLM generation. Note that the LLM developers would need to provide such proofs or certificates as part of their service offering.

LLM watermarking.AI.gif
At each step, the model secretly divides the possible next words into green and red lists. The next word is then sampled only from the green list.
LLM watermarking.human.gif
A human generating a sentence is unaware of the division into green and red lists and is thus very likely to choose a sequence that mixes green and red words. Since, on long sentences, the likelihood of a human choosing an all-green sequence is vanishingly small, we can view all-green sentences as containing a proof they were generated by AI.

Disruption to work as we know it does not have any obvious technical defenses, and opinions vary widely on where things will settle. Clearly, generative AI could be an effective productivity tool in many professional settings, and this will at a minimum alter the current division of labor between humans and machines. It’s also possible that the technology will open up existing occupations to a wider community (a recent and culturally specific but not entirely ludicrous quip on social media was “English is the new programming language”, a nod to LLM code generation abilities) or even create new forms of employment, such as prompt engineer (a topic with its own Wikipedia entry, created in just February of this year).

But perhaps the greatest defense against concerns over generative AI may come from the eventual specialization of use cases. Right now, generative AI is being treated as a fascinating, open-ended playground in which our expectations and goals are unclear. As we have discussed, this open-endedness and the plethora of possible uses are major sources of the challenges to responsible AI I have outlined.

Related content
Technique that mixes public and private training data can meet differential-privacy criteria while cutting error increase by 60%-70%.

But soon more applied and focused uses will emerge, like some of those I suggested earlier. For instance, consider using an LLM as a virtual focus group — creating prompts that describe hypothetical individuals and their demographic properties (age, gender, occupation, location, etc.) and then asking the LLM which of two described products they might prefer.

In this application, we might worry much less about censoring content and much more about removing any even remotely toxic output. And we might choose not to eradicate the correlations between gender and the affinity for certain products in service of fairness, since such correlations are valuable to the marketer. The point is that the more specific our goals for generative AI are, the easier it is to make sensible context-dependent choices; our choices become more fraught and difficult when our expectations are vague.

Finally, we note that end user education and training will play a crucial role in the productive and safe use of generative AI. As the potential uses and harms of generative AI become better and more widely understood, users will augment some of the defenses I have outlined above with their own common sense.

Conclusion

Generative AI has stoked both legitimate enthusiasm and legitimate fears. I have attempted to partially survey the landscape of concerns and to propose forward-looking approaches for addressing them. It should be emphasized that addressing responsible-AI risks in the generative age will be an iterative process: there will be no “getting it right” once and for all. This landscape is sure to shift, with changes to both the technology and our attitudes toward it; the only constant will be the necessity of balancing the enthusiasm with practical and effective checks on the concerns.

Related content

US, CA, Santa Clara
We are seeking an Applied Scientist II to join Amazon Customer Service's Science team, where you will build AI-based automated customer service solutions using state-of-the-art techniques in retrieval-augmented generation (RAG), agentic AI, and post-training of large language models. You will work at the intersection of research and production, developing intelligent systems that directly impact millions of customers while collaborating with scientists, engineers, and product managers in a fast-paced, innovative environment. Key job responsibilities - Design, develop, and deploy information retrieval systems and RAG pipelines using embedding models, reranking algorithms, and generative models to improve customer service automation - Conduct post-training of large language models using techniques such as Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Group Relative Policy Optimization (GRPO) to optimize model performance for customer service tasks - Build and curate high-quality datasets for model training and evaluation, ensuring data quality and relevance for customer service applications - Design and implement comprehensive evaluation frameworks, including data curation, metrics development, and methods such as LLM-as-a-judge to assess model performance - Develop AI agents for automated customer service, understanding their advantages and common pitfalls, and implementing solutions that balance automation with customer satisfaction - Independently perform research and development with minimal guidance, staying current with the latest advances in machine learning and AI - Collaborate with cross-functional teams including engineering, product management, and operations to translate research into production systems - Publish findings and contribute to the broader scientific community through papers, patents, and open-source contributions - Monitor and improve deployed models based on real-world performance metrics and customer feedback A day in the life As an Applied Scientist II, you will start your day reviewing metrics from deployed models and identifying opportunities for improvement. You might spend your morning experimenting with new post-training techniques to improve model accuracy, then collaborate with engineers to integrate your latest model into production systems. You will participate in design reviews, share your findings with the team, and mentor junior scientists. You will balance research exploration with practical implementation, always keeping the customer experience at the forefront of your work. You will have the autonomy to drive your own research agenda while contributing to team goals and deliverables. About the team The Amazon Customer Service Science team is dedicated to revolutionizing customer support through advanced AI and machine learning. We are a diverse group of scientists and engineers working on some of the most challenging problems in natural language understanding and AI automation. Our team values innovation, collaboration, and a customer-obsessed mindset. We encourage experimentation, celebrate learning from failures, and are committed to maintaining Amazon's high bar for scientific rigor and operational excellence. You will have access to world-class computing resources, massive datasets, and the opportunity to work alongside some of the brightest minds in AI and machine learning.
US, CA, Sunnyvale
Amazon is seeking exceptional talent to help develop the next generation of advanced robotics systems that will transform automation at Amazon's scale. We're building revolutionary robotic systems that combine innovative AI, sophisticated control systems, and advanced mechanical design to create adaptable automation solutions capable of working safely alongside humans in dynamic environments. This is a unique opportunity to shape the future of robotics and automation at unprecedented scale, working with world-class teams pushing the boundaries of what's possible in robotic manipulation, locomotion, and human-robot interaction. This role presents an opportunity to shape the future of robotics through innovative applications of deep learning and large language models. We leverage advanced robotics, machine learning, and artificial intelligence to solve complex operational challenges at unprecedented scale. Our fleet of robots operates across hundreds of facilities worldwide, working in sophisticated coordination to fulfill our mission of customer excellence. We are pioneering the development of robotics foundation models that: - Enable unprecedented generalization across diverse tasks - Integrate multi-modal learning capabilities (visual, tactile, linguistic) - Accelerate skill acquisition through demonstration learning - Enhance robotic perception and environmental understanding - Streamline development processes through reusable capabilities The ideal candidate will contribute to research that bridges the gap between theoretical advancement and practical implementation in robotics. You will be part of a team that's revolutionizing how robots learn, adapt, and interact with their environment. Join us in building the next generation of intelligent robotics systems that will transform the future of automation and human-robot collaboration. As a Senior Applied Scientist, you will develop and improve machine learning systems that help robots perceive, reason, and act in real-world environments. You will leverage state-of-the-art models (open source and internal research), evaluate them on representative tasks, and adapt/optimize them to meet robustness, safety, and performance needs. You will invent new algorithms where gaps exist. You’ll collaborate closely with research, controls, hardware, and product-facing teams, and your outputs will be used by downstream teams to further customize and deploy on specific robot embodiments. Key job responsibilities As a Senior Applied Scientist in the Foundations Model team, you will: - Leverage state-of-the-art models for targeted tasks, environments, and robot embodiments through fine-tuning and optimization. - Execute rapid, rigorous experimentation with reproducible results and solid engineering practices, closing the gap between sim and real environments. - Build and run capability evaluations/benchmarks to clearly profile performance, generalization, and failure modes. - Contribute to the data and training workflow: collection/curation, dataset quality/provenance, and repeatable training recipes. - Write clean, maintainable, well commented and documented code, contribute to training infrastructure, create tools for model evaluation and testing, and implement necessary APIs - Stay current with latest developments in foundation models and robotics, assist in literature reviews and research documentation, prepare technical reports and presentations, and contribute to research discussions and brainstorming sessions. - Work closely with senior scientists, engineers, and leaders across multiple teams, participate in knowledge sharing, support integration efforts with robotics hardware teams, and help document best practices and methodologies.
US, CA, Sunnyvale
Amazon's AGI Information is seeking an exceptional Applied Scientist to drive science advancements in the Amazon Knowledge Graph team (AKG). AKG is re-inventing knowledge graphs for the LLM era, optimizing for LLM grounding. At the same time, AKG is innovating to utilize LLMs in the knowledge graph construction pipelines to overcome obstacles that traditional technologies could not overcome. As a member of the AKG IR team, you will have the opportunity to work on interesting problems with immediate customer impact. The team is addressing challenges in web-scale knowledge mining, fact verification, multilingual information retrieval, and agent memory operating over Graphs. You will also have the opportunity to work with scientists working on the other challenges, and with the engineering teams that deliver the science advancements to our customers. A successful candidate has a strong machine learning and agent background, is a master of state-of-the-art techniques, has a strong publication record, has a desire to push the envelope in one or more of the above areas, and has a track record of delivering to customers. The ideal candidate enjoys operating in dynamic environments, is self-motivated to take on new challenges, and enjoys working with customers, stakeholders, and engineering teams to deliver big customer impact, shipping solutions via rapid experimentation and then iterating on user feedback and interactions. Key job responsibilities As an Applied Scientist, you will leverage your technical expertise and experience to demonstrate leadership in tackling large complex problems. You will collaborate with applied scientists and engineers to develop novel algorithms and modeling techniques to build the knowledge graph that delivers fresh factual knowledge to our customers, and that automates the knowledge graph construction pipelines to scale to many billions of facts. Your first responsibility will be to solve entity resolution to enable conflating facts from multiple sources into a single graph entity for each real world entity. You will develop generic solutions that work fo all classes of data in AKG (e.g., people, places, movies, etc.), that cope with sparse, noisy data, that scale to hundreds of millions of entities, and that can handle streaming data. You will define a roadmap to make progress incrementally and you will insist on scientific rigor, leading by example.
US, WA, Redmond
Amazon Leo is an initiative to launch a constellation of Low Earth Orbit satellites that will provide low-latency, high-speed broadband connectivity to unserved and underserved communities around the world. As a Communications Engineer in Modeling and Simulation, this role is primarily responsible for the developing and analyzing high level system resource allocation techniques for links to ensure optimal system and network performance from the capacity, coverage, power consumption, and availability point of view. Be part of the team defining the overall communication system and architecture of Amazon Leo’s broadband wireless network. This is a unique opportunity to innovate and define novel wireless technology with few legacy constraints. The team develops and designs the communication system of Leo and analyzes its overall system level performance, such as overall throughput, latency, system availability, packet loss, etc., as well as compatibility for both connectivity and interference mitigation with other space and terrestrial systems. This role in particular will be responsible for 1) evaluating complex multi-disciplinary trades involving RF bandwidth and network resource allocation to customers, 2) understanding and designing around hardware/software capabilities and constraints to support a dynamic network topology, 3) developing heuristic or solver-based algorithms to continuously improve and efficiently use available resources, 4) demonstrating their viability through detailed modeling and simulation, 5) working with operational teams to ensure they are implemented. This role will be part of a team developing the necessary simulation tools, with particular emphasis on coverage, capacity, latency and availability, considering the yearly growth of the satellite constellation and terrestrial network. Export Control Requirement: Due to applicable export control laws and regulations, candidates must be a U.S. citizen or national, U.S. permanent resident (i.e., current Green Card holder), or lawfully admitted into the U.S. as a refugee or granted asylum. Key job responsibilities • Work within a project team and take the responsibility for the Leo's overall communication system design and architecture • Extend existing code/tools and create simulation models representative of the target system, primarily in MATLAB • Design interconnection strategies between fronthaul and backhaul nodes. Analyze link availability, investigate link outages, and optimize algorithms to study and maximize network performance • Use RF and optical link budgets with orbital constellation dynamics to model time-varying system capacity • Conduct trade-off analysis to benefit customer experience and optimization of resources (costs, power, spectrum), including optimization of satellite constellation design and link selection • Work closely with implementation teams to simulate expected system level performance and provide quick feedback on potential improvements • Analyze and minimize potential self-interference or interference with other communication systems • Provide visualizations, document results, and communicate them across multi-disciplinary project teams to make key architectural decisions
US, WA, Seattle
We are looking for detail-oriented, organized, and responsible individuals who are eager to learn how to apply their causal inference / structural econometrics skillsets to solve real world problems. The intern will work in the area of Store Economics and Science (SEAS) and develop models to SEAS. Our PhD Economist Internship Program offers hands-on experience in applied economics, supported by mentorship, structured feedback, and professional development. Interns work on real business and research problems, building skills that prepare them for full-time economist roles at Amazon and beyond. You will learn how to build data sets and perform applied econometric analysis collaborating with economists, scientists, and product managers. These skills will translate well into writing applied chapters in your dissertation and provide you with work experience that may help you with placement. These are full-time positions at 40 hours per week, with compensation being awarded on an hourly basis. About the team The Stores Economics and Science Team (SEAS) is a Stores-wide interdisciplinary team at Amazon with a "peak jumping" mission focused on disruptive innovation. The team applies science, economics, and engineering expertise to tackle the business's most critical problems, working to move from local to global optima across Amazon Stores operations. SEAS builds partnerships with organizations throughout Amazon Stores to pursue this mission, exploring frontier science while learning from the experience and perspective of others. Their approach involves testing solutions first at a small scale, then aligning more broadly to build scalable solutions that can be implemented across the organization. The team works backwards from customers using their unique scientific expertise to add value, takes on long-run and high-risk projects that business teams typically wouldn't pursue, helps teams with kickstart problems by building practical prototypes, raises the scientific bar at Amazon, and builds and shares software that makes Amazon more productive.
US, WA, Seattle
Amazon is seeking exceptional talent to help develop the next generation of advanced robotics systems that will transform automation at Amazon's scale. We're building revolutionary robotic systems that combine cutting-edge AI, sophisticated control systems, and advanced electromechanical design to create adaptable automation solutions capable of working safely alongside humans in dynamic environments. This is a unique opportunity to shape the future of robotics and automation at an unprecedented scale, working with world-class teams pushing the boundaries of what's possible in robotic manipulation, locomotion, and human-robot interaction. Amazon is seeking a talented and motivated Principal Applied Scientist to develop tactile sensors and guide the sensing strategy for our gripper design. The ideal candidate will have extensive experience in sensor development, analysis, testing and integration. This candidate must have the ability to work well both independently and in a multidisciplinary team setting. Key job responsibilities - Author functional requirements, design verification plans and test procedures - Develop design concepts which meet the requirements - Work with engineering team members to implement the concepts in a product design - Support product releases to manufacturing and customer deployments - Work efficiently to support aggressive schedules
US, CA, Cupertino
The AWS Neuron Science Team is looking for talented scientists to enhance our software stack, accelerating customer adoption of Trainium and Inferentia accelerators. In this role, you will work directly with external and internal customers to identify key adoption barriers and optimization opportunities. You'll collaborate closely with our engineering teams to implement innovative solutions and engage with academic and research communities to advance state-of-the-art ML systems. As part of a strategic growth area for AWS, you'll work alongside distinguished engineers and scientists in an exciting and impactful environment. We actively work on these areas: - AI for Systems: Developing and applying ML/RL approaches for kernel/code generation and optimization - Machine Learning Compiler: Creating advanced compiler techniques for ML workloads - System Robustness: Building tools for accuracy and reliability validation - Efficient Kernel Development: Designing high-performance kernels optimized for our ML accelerator architectures A day in the life AWS Utility Computing (UC) provides product innovations that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services. Additionally, this role may involve exposure to and experience with Amazon's growing suite of generative AI services and other cloud computing offerings across the AWS portfolio. About the team AWS Neuron is the software of Trainium and Inferentia, the AWS Machine Learning chips. Inferentia delivers best-in-class ML inference performance at the lowest cost in the cloud to our AWS customers. Trainium is designed to deliver the best-in-class ML training performance at the lowest training cost in the cloud, and it’s all being enabled by AWS Neuron. Neuron is a Software that include ML compiler and native integration into popular ML frameworks. Our products are being used at scale with external customers like Anthropic and Databricks as well as internal customers like Alexa, Amazon Bedrocks, Amazon Robotics, Amazon Ads, Amazon Rekognition and many more.
US, TX, Austin
Amazon Security is seeking an Applied Scientist to work on GenAI acceleration within the Secure Third Party Tools (S3T) organization. The S3T team has bold ambitions to re-imagine security products that serve Amazon's pace of innovation at our global scale. This role will focus on leveraging large language models and agentic AI to transform third-party security risk management, automate complex vendor assessments, streamline controllership processes, and dramatically reduce assessment cycle times. You will drive builder efficiency and deliver bar-raising security engagements across Amazon. Key job responsibilities Own and drive end-to-end technical delivery for scoped science initiatives focused on third-party security risk management, independently defining research agendas, success metrics, and multi-quarter roadmaps with minimal oversight. Understanding approaches to automate third-party security review processes using state-of-the-art large language models, development intelligent systems for vendor assessment document analysis, security questionnaire automation, risk signal extraction, and compliance decision support. Build advanced GenAI and agentic frameworks including multi-agent orchestration, RAG pipelines, and autonomous workflows purpose-built for third-party risk evaluation, security documentation processing, and scalable vendor assessment at enterprise scale. Build ML-powered risk intelligence capabilities that enhance third-party threat detection, vulnerability classification, and continuous monitoring throughout the vendor lifecycle. Coordinate with Software Engineering and Data Engineering to deploy production-grade ML solutions that integrate seamlessly with existing third-party risk management workflows and scale across the organization. About the team Security is central to maintaining customer trust and delivering delightful customer experiences. At Amazon, our Security organization is designed to drive bar-raising security engagements. Our vision is that Builders raise the Amazon security bar when they use our recommended tools and processes, with no overhead to their business. Diverse Experiences Amazon Security values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying. Why Amazon Security? At Amazon, security is central to maintaining customer trust and delivering delightful customer experiences. Our organization is responsible for creating and maintaining a high bar for security across all of Amazon’s products and services. We offer talented security professionals the chance to accelerate their careers with opportunities to build experience in a wide variety of areas including cloud, devices, retail, entertainment, healthcare, operations, and physical stores. Inclusive Team Culture In Amazon Security, it’s in our nature to learn and be curious. Ongoing DEI events and learning experiences inspire us to continue learning and to embrace our uniqueness. Addressing the toughest security challenges requires that we seek out and celebrate a diversity of ideas, perspectives, and voices. Training & Career Growth We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, training, and other career-advancing resources here to help you develop into a better-rounded professional. Work/Life Balance We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why flexible work hours and arrangements are part of our culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve.
US, CA, Mountain View
At AWS Healthcare AI, we're revolutionizing healthcare delivery through AI solutions that serve millions globally. As a pioneer in healthcare technology, we're building next-generation services that combine Amazon's world-class AI infrastructure with deep healthcare expertise. Our mission is to accelerate our healthcare businesses by delivering intuitive and differentiated technology solutions that solve enduring business challenges. The AWS Healthcare AI organization includes services such as HealthScribe, Comprehend Medical, HealthLake, and more. We're seeking a Senior Applied Scientist to join our team working on our AI driven clinical solutions that are transforming how clinicians interact with patients and document care. Key job responsibilities To be successful in this mission, we are seeking an Applied Scientist to contribute to the research and development of new, highly influencial AI applications that re-imagine experiences for end-customers (e.g., consumers, patients), frontline workers (e.g., customer service agents, clinicians), and back-office staff (e.g., claims processing, medical coding). As a leading subject matter expert in NLU, deep learning, knowledge representation, foundation models, and reinforcement learning, you will collaborate with a team of scientists to invent novel, generative AI-powered experiences. This role involves defining research directions, developing new ML techniques, conducting rigorous experiments, and ensuring research translates to impactful products. You will be a hands-on technical innovator who is passionate about building scalable scientific solutions. You will set the standard for excellence, invent scalable, scientifically sound solutions across teams, define evaluation methods, and lead complex reviews. This role wields significant influence across AWS, Amazon, and the global research community.
US, CA, San Francisco
The Amazon Center for Quantum Computing (CQC) is a multi-disciplinary team of scientists, engineers, and technicians, all working to innovate in quantum computing for the benefit of our customers. We are looking to hire an Applied Scientist to design and model novel superconducting quantum devices (including qubits), readout and control schemes, and advanced quantum processors. The ideal candidate will have a track record of original scientific contributions, strong engineering principles, and/or software development experience. Resourcefulness, as well as strong organizational and communication skills, is essential. About the team The Amazon Center for Quantum Computing (CQC) is a multi-disciplinary team of scientists, engineers, and technicians, on a mission to develop a fault-tolerant quantum computer. Inclusive Team Culture Here at Amazon, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empower us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon conferences, inspire us to never stop embracing our uniqueness. Diverse Experiences Amazon values diverse experiences. Even if you do not meet all of the preferred qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying. Mentorship & Career Growth We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional. Work/Life Balance We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud. Export Control Requirement Due to applicable export control laws and regulations, candidates must be either a U.S. citizen or national, U.S. permanent resident (i.e., current Green Card holder), or lawfully admitted into the U.S. as a refugee or granted asylum, or be able to obtain a US export license. If you are unsure if you meet these requirements, please apply and Amazon will review your application for eligibility. Export Control Requirement: Due to applicable export control laws and regulations, candidates must be either a U.S. citizen or national, U.S. permanent resident (i.e., current Green Card holder), or lawfully admitted into the U.S. as a refugee or granted asylum, or be able to obtain a U.S export license. If you are unsure if you meet these requirements, please apply and Amazon will review your application for eligibility.