Scientific frontiers of agentic AI

The language AI agents might speak, sharing context without compromising privacy, modeling agentic negotiations, and understanding users’ commonsense policies are some of the open scientific questions that researchers in agentic AI will need to grapple with.

It feels as though we’ve barely absorbed the rapid development and adoption of generative AI technologies such as large language models (LLMs) before the next phenomenon is already upon us, namely agentic AI. Standalone LLMs can be thought of as “chatbots in a sandbox”, the sandbox being a metaphor for a safe and contained play space with limited interaction with the world beyond. In contrast, the vision of agentic AI is a near (or already here?) future in which LLMs are the underlying engines for complex systems that have access to rich external resources such as consumer apps and services, social media, banking and payment systems — in principle, anything you can reach on the Internet. A dream of the AI industry for decades, the “agent” of agentic AI is an intelligent personal assistant that knows your goals and preferences and that you trust to act on your behalf in the real world, much as you might a human assistant.

Related content
Real-world deployment requires notions of fairness that are task relevant and responsive to the available data, recognition of unforeseen variation in the “last mile” of AI delivery, and collaboration with AI activists.

For example, in service of arranging travel plans, my personal agentic AI assistant would know my preferences (both professional and recreational) for flights and airlines, lodging, car rentals, dining, and activities. It would know my calendar and thus be able to schedule around other commitments. It would know my frequent-flier numbers and hospitality accounts and be able to book and pay for itineraries on my behalf. Most importantly, it would not simply automate these tasks but do so intelligently and intuitively, making “obvious” decisions unilaterally and quietly but being sure to check in with me whenever ambiguity or nuance arises (such as whether those theater tickets on a business trip to New York should be charged to my personal or work credit card).

To AI insiders, the progression from generative to agentic AI is exciting but also natural. In just a few years, we have gone from impressive but glorified chatbots with myriad identifiable shortcomings to feature-rich systems exhibiting human-like capabilities not only in language and image generation but in coding, mathematical reasoning, optimization, workflow planning, and many other areas. The increased skill set and reliability of core LLMs has naturally caused the industry to move “up the stack”, to a world in which the LLM itself fades into the background and becomes a new kind of intelligent operating system upon which all manner of powerful functionality can be built. In the same way that your PC or Mac seamlessly handles many details that the vast majority of users don’t (want to) know about — like exactly how and where on your hard drive to store and find files, the networking details of connecting to remote web servers, and other fine-grained operating-system details — agentic systems strive to abstract away the messy and tedious details of many higher-level tasks that, today, we all perform ourselves.

But while the overarching vision of agentic AI is already relatively clear, there are some fundamental scientific and technical questions about the technology whose answers — or even proper formulation — are uncertain (but interesting!). We’ll explore some of them here.

What language will agents speak?

The history of computing technology features a steady march toward systems and devices that are ever more friendly, accessible, and intuitive to human users. Examples include the gradual displacement of clunky teletype monitors and obscure command-line incantations by graphical user interfaces with desktop and folder metaphors, and the evolution from low-level networked file transfer protocols to the seamless ease of the web. And generative AI itself has also made previously specialized tasks like coding accessible to a much broader base of users. In other words, modern technology is human-centric, designed for use and consumption by ordinary people with little or no specialized training.

But now these same technologies and systems will also need to be navigated by agentic AI, and as adept as LLMs are with human language, it may not be their most natural mode of communication and understanding. Thus, a parallel migration to the native language of generative AI may be coming.

What is that native language? When generative AI consumes a piece of content — whether it be a user prompt, a document, or an image — it translates it into an internal representation that is more convenient for subsequent processing and manipulation. There are many examples in biology of such internal representations. For instance, in our own visual systems, it has been known for some time that certain types of inputs (such as facial images) cause specific cells in our brains to respond (a phenomenon known as neuronal selectivity). Thus, an entire category of important images elicits similar neural behaviors.

Related content
Generative AI raises new challenges in defining, measuring, and mitigating concerns about fairness, toxicity, and intellectual property, among other things. But work has started on the solutions.

In a similar vein, the neural networks underlying modern AI typically translate any input into what is known as an embedding space, which can be thought of as a physical map in which items with similar meanings are placed near each other, and those with unrelated meanings are placed far apart. For example, in an image-embedding space, two photos of different families would be nearer to each other than either would be to a landscape. In a language-embedding space, two romance novels would be nearer to each other than to a car owner’s manual. And hybrid or multimodal embedding spaces would place images of cars near their owner manuals.

Embeddings are an abstraction that provides great power and generality, in the form of the ability to represent not the literal original content (like a long sequence of words) but something closer to its underlying meaning. The price for this abstraction is loss of detail and information. For instance, the embedding of this entire article would place it in close proximity to similar content (for instance, general-audience science prose) but would not contain enough information to re-create the article verbatim. The lossy nature of embeddings has implications we shall return to shortly.

Embeddings are learned from the massive amount of information on the Internet and elsewhere about implicit correspondences. Even aliens landing on earth who could read English but knew nothing else about the world would quickly realize that “doctor” and “hospital” are closely related because of their frequent proximity in text, even if they had no idea what these words actually signified. Furthermore, not only do embeddings permit generative AI to understand existing content, but they allow it to generate new content. When we ask for a picture of a squirrel on a snowboard in the style of Andy Warhol, it is the embedding that lets the technology explore novel images that interpolate between those of actual Warhols, squirrels, and snowboards.

Thus, the inherent language of generative (and therefore agentic) AI is not the sentences and images we are so familiar with but their embeddings. Let us now reconsider a world in which agents interact with humans, content, and other agents. Obviously, we will continue to expect agentic AI to communicate with humans in ordinary language and images. But there is no reason for agent-to-agent communication to take place in human languages; per the discussion above, it would be more natural for it to occur in the native embedding language of the underlying neural networks.

My personal agent, working on a vacation itinerary, might ingest materials such as my previous flights, hotels, and vacation photos to understand my interests and preferences. But to communicate those preferences to another agent — say, an agent aggregating hotel details, prices, and availability — it will not provide the raw source materials; in addition to being massively inefficient and redundant, that could present privacy concerns (more on this below). Rather, my agent will summarize my preferences as a point, or perhaps many points, in an embedding space.

Restaurant embeddings.jpg
In this example, the red, green, and blue points are three-dimensional embeddings of restaurants at which three people (Alice, Bob, and Chris) have eaten. (A real-world embedding, by contrast, might have hundreds of dimensions.) Each glowing point represents the center of one of the clusters, and its values summarize the restaurant preferences of the corresponding person. AI agents could use such vector representations, rather than text, to share information with each other.

By similar reasoning, we might also expect the gradual development of an “agentic Web” meant for navigation by AI, in which the text and images on websites are pre-translated into embeddings that are illegible to humans but are massively more efficient than requiring agents to perform these translations themselves with every visit. In the same way that many websites today have options for English, Spanish, Chinese, and many other languages, there would be an option for Agentic.

All the above presupposes that embedding spaces are shared and standardized across generative and agentic AI systems. This is not true today: embeddings differ from model to model and are often considered proprietary. It’s as if all generative AI systems speak slightly different dialects of some underlying lingua franca. But these observations about agentic language and communication may foreshadow the need for AI scientists to work toward standardization, at least in some form. Each agent can have some special and proprietary details to its embeddings — for instance, a financial-services agent might want to use more of its embedding space for financial terminology than an agentic travel assistant would — but the benefits of a common base embedding are compelling.

Keeping things in context

Even casual users of LLMs may be aware of the notion of “context”, which is informally what and how much the LLM remembers and understands about its recent interactions and is typically measured (at least cosmetically) by the number of words or tokens (word parts) recalled. There is again an apt metaphor with human cognition, in the sense that context can be thought of as the “working memory” of the LLM. And like our own working memory, it can be selective and imperfect.

If we participate in an experiment to test how many random digits or words we can memorize at different time scales, we will of course eventually make mistakes if asked to remember too many things for too long. But we will not forget what the task itself is; our short-term memory may be fallible, but we generally grasp the bigger picture.

Related content
Large language models’ emergent abilities are improving with scale; as scale grows, where are LLMs heading? Insights from Ray Solomonoff’s theory of induction and stochastic realization theory may help us envision — and guide — the limits of scaling.

These same properties broadly hold for LLM context — which is sometimes surprising to users, since we expect computers to be perfect at memorization but highly fallible on more abstract tasks. But when we remember that LLMs do not operate directly on the sequence of words or tokens in the context but on the lossy embedding of that sequence, these properties become less mysterious (though perhaps not less frustrating when an LLM can’t remember something it did just a few steps ago).

Some of the principal advances in LLM technology have been around improvements in context: LLMs can now remember and understand more context and leverage that context to tailor their responses with greater accuracy and sophistication. This greater window of working memory is crucial for many tasks to which we would like to apply agentic AI, such as having an LLM read and understand the entire code base of a large software development project, or all the documents relevant to a complex legal case, and then be able to reason about the contents.

How will context and its limitations affect agentic AI? If embeddings are the language of LLMs, and context is the expression of an LLM’s working memory in that language, a crucial design decision in agent-agent interactions will be how much context to share. Sharing too little will handicap the functionality and efficiency of agentic dialogues; sharing too much will result in unnecessary complexity and potential privacy concerns (just as in human-to-human interactions).

Let us illustrate by returning to my personal agent, who having found and booked my hotel is working with an external airline flight aggregation agent. It would be natural for my agent to communicate lots of context about my travel preferences, perhaps including conditions under which I might be willing to pay or use miles for an upgrade to business class (such as an overnight international flight). But my agent should not communicate context about my broader financial status (savings, debt, investment portfolio), even though in theory these details might correlate with my willingness to pay for an upgrade. When we consider that context is not my verbatim history with my travel agent, but an abstract summary in embedding space, decisions about contextual boundaries and how to enforce them become difficult.

Indeed, this is a relatively untouched scientific topic, and researchers are only just beginning to consider questions such as what can be reverse-engineered about raw data given only its embedding. While human or system prompts to shape inter-agent dealings might be a stopgap (“be sure not to tell the flight agent any unnecessary financial information”), a principled understanding of embedding privacy vulnerabilities and how to mitigate them (perhaps via techniques such as differential privacy) is likely to be an important research area going forward.

Agentic bargains

So far, we’ve talked a fair amount about interagent dialogues but have treated these conversations rather generally, much as if we were speaking about two humans in a collaborative setting. But there will be important categories of interaction that will need to be more structured and formal, with identifiable outcomes that all parties commit to. Negotiation, bargaining, and other strategic interactions are a prime example.

I obviously want my personal agent, when booking hotels and flights for my trips, to get the best possible prices and other conditions (room type and view, flight seat location, and so on). The agents aggregating hotels and flights would similarly prefer that I pay more rather than less, on behalf of their own clients and users.

For my agent to act in my interests in these settings, I’ll need to specify at least some broad constraints on my preferences and willingness to pay for them, and not in fuzzy terms: I can’t expect my agent to simply “know a bargain when it sees one” the way I might if I were handling all the arrangements myself, especially because my notion of a bargain might be highly subjective and dependent on many factors. Again, a near-term makeshift approach might address this via prompt shaping — “be sure to get the best deal possible, as long as the flight is nonstop and leaves in the morning, and I have an aisle seat” — but longer-term solutions will have to be more sophisticated and granular.

Related content
Amazon Research Award recipient Éva Tardos studies complex theoretical questions that have far-ranging practical consequences.

Of course, the mathematical and scientific foundations of negotiating and bargaining have been well studied for decades by game theorists, microeconomists, and related research communities. Their analyses typically begin by presuming the articulation of utility functions for all the parties involved — an abstraction capturing (for example) my travel preferences and willingness to pay for them. The literature also considers settings in which I can’t quantitatively express my own utilities but “know bargains when I see them”, in the sense that given two options (a middle seat on a long flight for $200 vs. a first-class seat for $2,000), I will make the choice consistent with my unknown utilities. (This is the domain of the aptly named utility elicitation.)

Much of the science in such areas is devoted to the question of what “should” happen when fully rational parties with precisely specified utilities, perfect memory, and unlimited computational power come to the proverbial bargaining table; equilibrium analysis in game theory is just one example of this kind of research. But given our observations about the human-like cognitive abilities and shortcomings of LLMs, perhaps a more relevant starting point for agentic negotiation is the field of behavioral economics. Instead of asking what should happen when perfectly rational agents interact, behavioral economics asks what does happen when actual human agents interact strategically. And this is often quite different, in interesting ways, than what fully rational agents would do.

For instance, consider the canonical example of behavioral game theory known as the ultimatum game. In this game, there is $10 to potentially divide between two players, Alice and Bob. Alice first proposes any split she likes. Bob then either accepts Alice’s proposal, in which case both parties get their proposed shares, or rejects Alice’s proposal, in which case each party receives nothing. The equilibrium analysis is straightforward: Alice, being fully rational and knowing that Bob is also, proposes the smallest nonzero amount to Bob, which is a penny. Bob, being fully rational, would prefer to receive a penny than nothing, so he accepts.

Ultimatum game 1.jpg
Game theory (left) supposes that the recipient in the ultimatum game will accept a low offer, since something is better than nothing, but behavioral economics (right) reveals that, in fact, offers tend to concentrate in the range of $3 to $5, and lower offers are frequently rejected.

Nothing remotely like this happens when humans play. Across hundreds of experiments varying myriad conditions — social, cultural, gender, wealth, etc. — a remarkably consistent aggregate behavior emerges. Alice almost always proposes a share to Bob of between $3 and $5 (the fact that Alice gets to move first seems to prime both players for Bob to potentially get less than half the pie). And conditioned on Alice’s proposal being in this range, Bob almost always accepts her offer. But on those rare occasions in which Alice is more aggressive and offers Bob an amount much less than $3, Bob’s rejection rate skyrockets. It’s as if pairs of people — who have never heard of or played the ultimatum game before — have an evolutionarily hardwired sense of what’s “fair” in this setting.

Ultimatum game bar graph.jpg
The way in which the ultimatum game is played — the frequency of particular offers and the rate of rejection — varies across cultures, but this graph illustrates general trends in the data. Offers tend to concentrate between $3 and $5, with a steep falloff above $5, and the rejection rate is high for low offers.

Now back to LLMs and agentic AI. There is already a small but growing literature on what we might call LLM behavioral game theory and economics, in which experiments like the one above are replicated — except human participants are replaced by AI. One early work showed that LLMs almost exactly replicated human behavior in the ultimatum game, as well as other classical behavioral-economics findings.

Note that it is possible to simulate the demographic variability of human subjects in such experiments via LLM prompting, e.g., “You are Alice, a 37-year-old Hispanic medical technician living in Boston, Massachusetts”. Other studies have again shown human-like behavior of LLMs in trading games, price negotiations, and other settings. A very recent study claims that LLMs can even engage in collusive price-fixing behaviors and discusses potential regulatory implications for AI agents.

Once we have a grasp on the behaviors of agentic AI in strategic settings, we can turn to shaping that behavior in desired ways. The field of mechanism design in economics complements areas like game theory by asking questions like “given that this is how agents generally negotiate, how can we structure those negotiations to make them fair and beneficial?” A classic example is the so-called second-price auction, where the highest bidder wins the item — but only pays the second highest bid. This design is more truthful than a standard first-price auction, in the sense that everyone’s optimal strategy is to simply bid the price at which they are indifferent to winning or losing (their subjective valuation of the item); nobody needs to think about other agents’ behaviors or valuations.

We anticipate a proliferation of research on topics like these, as agentic bargaining becomes commonplace and an important component of what we delegate to our AI assistants.

The enduring challenge of common sense

I’ll close with some thoughts on a topic that has bedeviled AI from its earliest days and will continue to do so in the agentic era, albeit in new and more personalized ways. It’s a topic that is as fundamental as it is hard to define: common sense.

By common sense, we mean things that are “obvious”, that any human with enough experience in the world would know without explicitly being told. For example, imagine a glass full of water sitting on a table. We would all agree that if we move the glass to the left or right on the table, it’s still a glass of water. But if we turn it upside down, it’s still a glass on the table, but no longer a glass of water (and is also a mess to be cleaned up). It’s quite unlikely any of us were ever sat down and run through this narrative, and it’s also a good bet that you’ve never deliberately considered such facts before. But we all know and agree on them.

Related content
Using large language models to discern commonsense relationships can improve performance on downstream tasks by as much as 60%.

Figuring out how to imbue AI models and systems with common sense has been a priority of AI research for decades. Before the advent of modern large-scale machine learning, there were efforts like the Cyc project (for “encyclopedia”), part of which was devoted to manually constructing a database of commonsense facts like the ones above about glasses, tables, and water. Eventually the consumer Internet generated enough language and visual data that many such general commonsense facts could be learned or inferred: show a neural network millions of pictures of glasses, tables and water and it will figure things out. Very early research also demonstrated that it was possible to directly encode certain invariances (similar to shifting a glass of water on a table) into the network architecture, and LLM architectures are similarly carefully designed in the modern era.

But in agentic AI, we expect our proxies to understand not only generic commonsense facts of the type we’ve been discussing but also “common sense” particular to our own preferences — things that would make sense to most people if only they understood our contexts and perspectives. Here a pure machine learning approach will likely not suffice. There just won’t be enough data to learn from scratch my subjective version of common sense.

For example, consider your own behavior or “policy” around leaving doors open or closed, locked or unlocked. If you’re like me, these policies can be surprisingly nuanced, even though I follow them without thought all the time. Often, I will close and lock doors behind me — for instance, when I leave my car or my house (unless I’m just stepping right outside to water the plants). Other times I will leave a door unlocked and open, such as when I’m in my office and want to signal I am available to chat with colleagues or students. I might close but leave unlocked that same door when I need to focus on something or take a call. And sometimes I’ll leave my office door unlocked and open even when I’m not in it, despite there being valuables present, because I trust the people on my floor and I’m going to be nearby.

We might call behaviors like these subjective common sense, because to me they are natural and obvious and have good reasons behind them, even though I follow them almost instinctually, the same way I know not to turn a glass of water upside down on the table. But you of course might have very different behaviors or policies in the same or similar situations, with your own good reasons.

Related content
Dataset contains more than 11,000 newly collected dialogues to aid research in open-domain conversation.

The point is that even an apparently simple matter like my behavior regarding doors and locks can be difficult to articulate. But agentic AI will need specifications like this: simply replace doors with online accounts and services and locks with passwords and other authentication credentials. Sometimes we might share passwords with family or friends for less-critical privacy-sensitive resources like Netflix or Spotify, but we would not do the same for bank accounts and medical records. I might be less rigorous about restricting access to, or even encrypting, the files on my laptop than I would be about files I store in the cloud.

The circumstances under which I trust my own or other agents with resources that need to be private and secure will be at least as complex as those regarding door closing and locking. The primary difficulty is not in having the right language or formalisms to specify such policies: there are good proposals for such specification frameworks and even for proving the correctness of their behaviors. The problem is in helping people articulate and translate their subjective common sense into these frameworks in the first place.

Conclusion

The agentic-AI era is in its infancy, but we should not take that to mean we have a long and slow development and adoption period before us. We need only look at the trajectory of the underlying generative AI technology — from being almost entirely unknown outside of research circles as recently as early 2022 to now being arguably the single most important scientific innovation of the century so far. And indeed, there is already widespread use of what we might consider early agentic systems, such as the latest coding agents.

Far beyond the initial “autocomplete for Python” tools of a few years ago, such agents now do so much more — writing working code from natural-language prompts and descriptions, accessing external resources and datasets, proactively designing experiments and visualizing the results, and most importantly (especially for a novice programmer like me), seamlessly handling the endless complexity of environment settings, software package installs and dependencies, and the like. My Amazon Scholar and University of Pennsylvania colleague Aaron Roth and I recently wrote a machine learning paper of almost 50 pages — complete with detailed definitions, theorem statements and proofs, code, and experiments — using nothing except (sometimes detailed) English prompts to such a tool, along with expository text we wrote directly. This would have been unthinkable just a year ago.

Despite the speed with which generative AI has permeated industry and society at large, its scientific underpinnings go back many decades, arguably to the birth of AI but certainly no later than the development of neural-network theory and practice in the 1980s. Agentic AI — built on top of these generative foundations, but quite distinct in its ambitions and challenges — has no such deep scientific substrate on which to systematically build. It’s all quite fresh territory. I’ve tried to anticipate some of the more fundamental challenges here, and I’ve probably got half of them wrong. To paraphrase the Philadelphia department store magnate John Wanamaker, I just don’t know which half — yet.

Related content

IN, KA, Bengaluru
Amazon is looking for a passionate, talented, and inventive Applied Scientists with machine learning background to help build industry-leading Speech and Language technology. Our mission is to provide a delightful experience to Amazon’s customers by pushing the envelope in Automatic Speech Recognition (ASR), Natural Language Understanding (NLU), Machine Learning (ML) and Computer Vision (CV). Key job responsibilities Amazon is looking for a passionate, talented, and inventive Applied Scientists with machine learning background to help build industry-leading Speech and Language technology. Our mission is to provide a delightful experience to Amazon’s customers by pushing the envelope in Automatic Speech Recognition (ASR), Natural Language Understanding (NLU), Machine Learning (ML) and Computer Vision (CV). As part of our AI team in Amazon AWS, you will work alongside internationally recognized experts to develop novel algorithms and modeling techniques to advance the state-of-the-art in human language technology. Your work will directly impact millions of our customers in the form of products and services that make use of speech and language technology. You will gain hands on experience with Amazon’s heterogeneous speech, text, and structured data sources, and large-scale computing resources to accelerate advances in spoken language understanding. We are hiring in all areas of human language technology: ASR, MT, NLU, text-to-speech (TTS), and Dialog Management, in addition to Computer Vision. We are also looking for talents with experiences/expertise in building large-scale, high-performing systems. A day in the life 0
IN, KA, Bengaluru
Do you want to join an innovative team of scientists who use machine learning and statistical techniques to create state-of-the-art solutions for providing better value to Amazon’s customers? Do you want to build and deploy advanced algorithmic systems that help optimize millions of transactions every day? Are you excited by the prospect of analyzing and modeling terabytes of data to solve real world problems? Do you like to own end-to-end business problems/metrics and directly impact the profitability of the company? Do you like to innovate and simplify? If yes, then you may be a great fit to join the Machine Learning and Data Sciences team for India Consumer Businesses. If you have an entrepreneurial spirit, know how to deliver, love to work with data, are deeply technical, highly innovative and long for the opportunity to build solutions to challenging problems that directly impact the company's bottom-line, we want to talk to you. Major responsibilities - Use machine learning and analytical techniques to create scalable solutions for business problems - Analyze and extract relevant information from large amounts of Amazon’s historical business data to help automate and optimize key processes - Design, development, evaluate and deploy innovative and highly scalable models for predictive learning - Research and implement novel machine learning and statistical approaches - Work closely with software engineering teams to drive real-time model implementations and new feature creations - Work closely with business owners and operations staff to optimize various business operations - Establish scalable, efficient, automated processes for large scale data analyses, model development, model validation and model implementation - Mentor other scientists and engineers in the use of ML techniques A day in the life You will solve real-world problems by getting and analyzing large amounts of data, generate insights and opportunities, design simulations and experiments, and develop statistical and ML models. The team is driven by business needs, which requires collaboration with other Scientists, Engineers, and Product Managers across the International Emerging Stores organization. You will prepare written and verbal presentations to share insights to audiences of varying levels of technical sophistication. About the team Central Machine Learning team works closely with the IES business and engineering teams in building ML solutions that create an impact for Emerging Marketplaces. This is a great opportunity to leverage your machine learning and data mining skills to create a direct impact on millions of consumers and end users.
US, TX, Austin
What happens when you combine startup speed with Amazon-scale impact? You get this team. Amazon Enterprise Security Products is a newly launched group building intelligent, cloud-agnostic security tools using AI-first development practices. Here, you build AI and you build with AI — at the same time. This role is a chance to shape the future of security tooling with a small, fast team that ships like a startup but deploys at Amazon scale. We're looking for a Data Scientist who thrives at the intersection of applied ML, agentic AI, and security. You'll design and deploy models that detect threats, power intelligent agents, and make security decisions at cloud scale. You'll work shoulder-to-shoulder with SDEs, applied scientists, security researchers, and PMs on a team where the best idea wins, regardless of title or tenure. Key job responsibilities * Build the intelligence behind AI-first security products: Design, train, and ship ML models that power agentic systems, anomaly detection, threat classification, and automated response — all running across multi-cloud environments. * Own the full science lifecycle: From problem framing and data exploration through model development, evaluation, production deployment, and monitoring. You build it, you ship it, you run it. * Build with AI to build AI: Use agentic coding tools, LLM-powered workflows, and experimental AI tooling to accelerate every phase of your work; from EDA to feature engineering to model iteration. Multiply your velocity and raise the bar for what one scientist can deliver. * Power agentic architectures: Develop the models, embeddings, RAG pipelines, evaluation frameworks, and feedback loops that make multi-agent security systems smart, safe, and customer-ready. * Prototype rapidly and validate with customers: Turn hypotheses into prototypes in days, not quarters. Iterate based on real customer signal and ship what works. * Partner across disciplines: Work directly with SDEs, applied scientists, security researchers, PMs, and UX designers to turn ambiguous problems into shipped solutions. Small team means short lines between you and the decision. * Communicate with impact: Translate complex modeling results into clear recommendations for engineers, product leaders, and senior executives. Influence direction with data. * Raise the science bar: Contribute to technical and science reviews, mentor teammates, and champion AI-first development practices. Help shape the science culture of a fast-growing team from the ground floor. A day in the life No two days look the same on this fast-growing, AI-first team. You might start your morning reviewing evaluation results from overnight model training runs, then dive into building a RAG pipeline or tuning a multi-agent orchestration loop. Before lunch, you're pair-prompting with an agentic coding assistant to stand up a new feature pipeline. In the afternoon, you join a design session with senior and principal scientists and engineers where your ideas carry weight regardless of title. You own science problems end to end, ship using the latest AI-assisted workflows, and see your models reach production fast. This is where builders thrive. About the team Amazon Enterprise Security Products is built by builders who tackle challenges others might consider too ambitious. We're a small team where there are no layers between you and the decision, no waiting quarters to see your work reach customers. Every team member brings an owner's mentality. If there's a problem worth solving, we solve it. No mission is beyond reach, no detail beneath our attention. We move fast, we ship fast, and we learn from what we ship. This is where builders who want to make the impossible routine come to do their best work. Diverse Experiences Amazon Security values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying. Why Amazon Security? At Amazon, security is central to maintaining customer trust and delivering delightful customer experiences. Our organization is responsible for creating and maintaining a high bar for security across all of Amazon’s products and services. We offer talented security professionals the chance to accelerate their careers with opportunities to build experience in a wide variety of areas including cloud, devices, retail, entertainment, healthcare, operations, and physical stores. Inclusive Team Culture In Amazon Security, it’s in our nature to learn and be curious. Ongoing DEI events and learning experiences inspire us to continue learning and to embrace our uniqueness. Addressing the toughest security challenges requires that we seek out and celebrate a diversity of ideas, perspectives, and voices. Training & Career Growth We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, training, and other career-advancing resources here to help you develop into a better-rounded professional. Work/Life Balance We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why flexible work hours and arrangements are part of our culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve.
US, NY, New York
We are seeking a Robotics/AI Motor Control Scientist to develop cutting-edge machine learning algorithms for motor control systems in robots. In this role, you will focus on creating and optimizing intelligent motor control strategies to enable robots to perform complex, whole-body tasks. Your contributions will be essential in advancing robotics by enabling fluid, reliable, and safe interactions between robots and their environments. Key job responsibilities - Develop controllers that leverage reinforcement learning, imitation learning, or other advanced AI techniques to achieve natural, robust, and adaptive motor behaviors - Collaborate with multi-disciplinary teams to integrate motor control systems with robotic hardware, ensuring alignment with real-world constraints such as actuator dynamics and energy efficiency - Use simulation and real-world testing to refine and validate control algorithms - Stay updated on advancements in robotics, AI, and control systems to apply advanced techniques to robotic motion challenges - Lead technical projects from conception through production deployment - Mentor junior scientists and engineers - Bridge research initiatives with practical engineering implementation About the team Fauna Robotics, an Amazon company, is building capable, safe, and genuinely delightful robots for everyday life. Our goal is simple: make robots people actually want to live and interact with in everyday human spaces. We believe that future won’t arrive until building for robotics becomes far more accessible. Today, too much effort is spent reinventing the fundamentals. We’re changing that by developing tightly integrated hardware and software systems that make it faster, safer, and more intuitive to create real-world robotic products. Our work spans the full stack: mechanical design, control systems, dynamic modeling, and intelligent software. The focus is not just functionality, but experience. We’re building robots that feel responsive, expressive, and genuinely useful. At Fauna, you’ll work at the frontier of this space, helping define how robots move, manipulate, and interact with people in natural environments. It’s an opportunity to solve hard problems across hardware and software with a team focused on making robotics accessible and joyful to build. If you care about making robotics real for everyone and building systems that are as delightful as they are capable, we’re interested in hearing from you. an opportunity to solve hard problems across hardware and software with a team focused on making robotics accessible and joyful to build. If you care about making robotics real for everyone and building systems that are as delightful as they are capable, we’re interested in hearing from you.
IN, KA, Bengaluru
Passionate about books? The Amazon Books team is looking for a talented Applied Scientist II to help invent, design, and deliver science solutions to make it easier for millions of customers to find the next book they will love. In this role, you will - Be a part of a growing team of scientists, economists, engineers, analysts, and business partners. - Use Amazon’s large-scale computing and data resources to generate deep understandings of our customers and products. - Build highly accurate models (and/or agentic systems) to enhance the book reading & discovery experiences. - Design, implement, and deliver novel solutions to some of Amazon’s oldest problems. Key job responsibilities - Inspect science initiatives across Amazon to identify opportunities for application and scaling within book reading and discovery experiences. - Participate in team design, scoping, and prioritization discussions while mapping business goals to scientific problems and aligning business metrics with technical metrics. - Spearhead the design and implementation of new features through thorough research and collaboration with cross-functional teams. - Initiate the design, development, execution, and implementation of project components with input and guidance from team members. - Work with Software Development Engineers (SDEs) to deliver production-ready solutions that benefit customers and business operations. - Invent, refine, and develop solutions to ensure they meet customer needs and team objectives. - Demonstrate ability to use reasonable assumptions, data analysis, and customer requirements to solve complex problems. - Write secure, stable, testable, and maintainable code with minimal defects while taking full responsibility for your components. - Possess strong understanding of data structures, algorithms, model evaluation techniques, performance optimization, and trade-off analysis. - Follow engineering and scientific method best practices, including design reviews, model validation, and comprehensive testing. - Maintain current knowledge of research trends in your field and apply rigorous scrutiny to results and methodologies. A day in the life In this role, you will address complex Books customer challenges by developing innovative solutions that leverage the advancements in science. Working alongside a talented team of scientists, you will conduct research and execute experiments designed to enhance the Books reading and shopping experience. Your responsibilities will encompass close collaboration with cross-functional partner teams, including engineering, product management, and fellow scientists, to ensure optimal data quality, robust model development, and successful productionization of scientific solutions. Additionally, you will provide mentorship to other scientists, conduct reviews of their work, and contribute to the development of team roadmaps. About the team The team consists of a collaborative group of scientists, product leaders, and dedicated engineering teams. We work with multiple partner teams to leverage our systems to drive a diverse array of customer experiences, owned both by ourselves and others, that enable shoppers to easily find their perfect next read and enable delightful reading experiences that would make Kindle the best place to read.
US, WA, Bellevue
The Amazon Fulfillment Technologies (AFT) Science team is looking for an exceptional Applied Scientist, with strong optimization and analytical skills, to develop production solutions for one of the most complex systems in the world: Amazon’s Fulfillment Network. At AFT Science, we design, build and deploy optimization, simulation, and machine learning solutions to power the production systems running at world wide Amazon Fulfillment Centers. We solve a wide range of problems that are encountered in the network, including labor planning and staffing, demand prioritization, pick assignment and scheduling, and flow process optimization. We are tasked to develop innovative, scalable, and reliable science-driven solutions that are beyond the published state of art in order to run frequently (ranging from every few minutes to every few hours per use case) and continuously in our large scale network. Key job responsibilities As an Applied Scientist, you will work with other scientists, software engineers, product managers, and operations leaders to develop scientific solutions and analytics using a variety of tools and observe direct impact to process efficiency and associate experience in the fulfillment network. Key responsibilities include: * Develop an understanding and domain knowledge of operational processes, system architecture and functions, and business requirements * Deep dive into data and code to identify opportunities for continuous improvement and/or disruptive new approach * Develop scalable mathematical models for production systems to derive optimal or near-optimal solutions for existing and new challenges * Create prototypes and simulations for agile experimentation of devised solutions * Advocate technical solutions to business stakeholders, engineering teams, and senior leadership * Partner with engineers to integrate prototypes into production systems * Design experiment to test new or incremental solutions launched in production and build metrics to track performance About the team Amazon Fulfillment Technology (AFT) designs, develops and operates the end-to-end fulfillment technology solutions for all Amazon Fulfillment Centers (FC). We harmonize the physical and virtual world so Amazon customers can get what they want, when they want it. The AFT Science team has expertise in operations research, optimization, scheduling, planning, simulation, and machine learning. We also have domain expertise in the operational processes within the FCs and their defects. We prioritize advancements that support AFT tech teams and focus areas rather than specific fields of research or individual business partners. We influence each stage of innovation from inception to deployment which includes both developing novel solutions or improving existing approaches. Resulting production systems rely on a diverse set of technologies, our teams therefore invest in multiple specialties as the needs of each focus area evolves.
US, WA, Bellevue
Amazon’s Last Mile Team is looking for a passionate individual with strong optimization and analytical skills to join its Last Mile Science team in the endeavor of designing and improving the most complex planning of delivery network in the world. Last Mile builds global solutions that enable Amazon to attract an elastic supply of drivers, companies, and assets needed to deliver Amazon's and other shippers' volumes at the lowest cost and with the best customer delivery experience. Last Mile Science team owns the core decision models in the space of jurisdiction planning, delivery channel and modes network design, capacity planning for on the road and at delivery stations, routing inputs estimation and optimization. Our research has direct impact on customer experience, driver and station associate experience, Delivery Service Partner (DSP)’s success and the sustainable growth of Amazon. Optimizing the last mile delivery requires deep understanding of transportation, supply chain management, pricing strategies and forecasting. Only through innovative and strategic thinking, we will make the right capital investments in technology, assets and infrastructures that allows for long-term success. Our team members have an opportunity to be on the forefront of supply chain thought leadership by working on some of the most difficult problems in the industry with some of the best product managers, scientists, and software engineers in the industry. Key job responsibilities Candidates will be responsible for developing solutions to better manage and optimize delivery capacity in the last mile network. The successful candidate should have solid research experience in one or more technical areas of Operations Research or Machine Learning. These positions will focus on identifying and analyzing opportunities to improve existing algorithms and also on optimizing the system policies across the management of external delivery service providers and internal planning strategies. They require superior logical thinkers who are able to quickly approach large ambiguous problems, turn high-level business requirements into mathematical models, identify the right solution approach, and contribute to the software development for production systems. To support their proposals, candidates should be able to independently mine and analyze data, and be able to use any necessary programming and statistical analysis software to do so. Successful candidates must thrive in fast-paced environments, which encourage collaborative and creative problem solving, be able to measure and estimate risks, constructively critique peer research, and align research focuses with the Amazon's strategic needs.
US, WA, Bellevue
Alexa International is looking for a passionate, talented, and inventive Applied Scientist to help build industry-leading technology with Large Language Models (LLMs) and multimodal systems, requiring strong deep learning and generative models knowledge. You will contribute to developing novel solutions and deliver high-quality results that impact Alexa's international products and services. Key job responsibilities As an Applied Scientist with the Alexa International team, you will work with talented peers to develop novel algorithms and modeling techniques to advance the state of the art with LLMs. Your work will directly impact our international customers in the form of products and services that make use of digital assistant technology. You will leverage Amazon's heterogeneous data sources, unique and diverse international customer nuances and large-scale computing resources to accelerate advances in text, voice, and vision domains in a multimodal setup. The ideal candidate possesses a solid understanding of machine learning, natural language understanding, modern LLM architectures, LLM evaluation & tooling, and a passion for pushing boundaries in this vast and quickly evolving field. They thrive in fast-paced environments to tackle complex challenges, excel at swiftly delivering impactful solutions while iterating based on user feedback, and collaborate effectively with cross-functional teams. A day in the life * Analyze, understand, and model customer behavior and the customer experience based on large-scale data. * Build novel online & offline evaluation metrics and methodologies for multimodal personal digital assistants. * Fine-tune/post-train LLMs using techniques like SFT, DPO, RLHF, and RLAIF. * Set up experimentation frameworks for agile model analysis and A/B testing. * Collaborate with partner teams on LLM evaluation frameworks and post-training methodologies. * Contribute to end-to-end delivery of solutions from research to production, including reusable science components. * Communicate solutions clearly to partners and stakeholders. * Contribute to the scientific community through publications and community engagement.
US, WA, Bellevue
Alexa International is looking for a passionate, talented, and inventive Applied Scientist to help build industry-leading technology with Large Language Models (LLMs) and multimodal systems, requiring strong deep learning and generative models knowledge. You will contribute to developing novel solutions and deliver high-quality results that impact Alexa's international products and services. Key job responsibilities As an Applied Scientist with the Alexa International team, you will work with talented peers to develop novel algorithms and modeling techniques to advance the state of the art with LLMs. Your work will directly impact our international customers in the form of products and services that make use of digital assistant technology. You will leverage Amazon's heterogeneous data sources, unique and diverse international customer nuances and large-scale computing resources to accelerate advances in text, voice, and vision domains in a multimodal setup. The ideal candidate possesses a solid understanding of machine learning, natural language understanding, modern LLM architectures, LLM evaluation & tooling, and a passion for pushing boundaries in this vast and quickly evolving field. They thrive in fast-paced environments to tackle complex challenges, excel at swiftly delivering impactful solutions while iterating based on user feedback, and collaborate effectively with cross-functional teams. A day in the life * Analyze, understand, and model customer behavior and the customer experience based on large-scale data. * Build novel online & offline evaluation metrics and methodologies for multimodal personal digital assistants. * Fine-tune/post-train LLMs using techniques like SFT, DPO, RLHF, and RLAIF. * Set up experimentation frameworks for agile model analysis and A/B testing. * Collaborate with partner teams on LLM evaluation frameworks and post-training methodologies. * Contribute to end-to-end delivery of solutions from research to production, including reusable science components. * Communicate solutions clearly to partners and stakeholders. * Contribute to the scientific community through publications and community engagement.
US, CA, Pasadena
The Amazon Web Services (AWS) Center for Quantum Computing (CQC) is a multi-disciplinary team of theoretical and experimental physicists, materials scientists, and hardware and software engineers on a mission to develop a fault-tolerant quantum computer. Throughout your internship journey, you'll have access to unparalleled resources, including state-of-the-art computing infrastructure, cutting-edge research papers, and mentorship from industry luminaries. This immersive experience will not only sharpen your technical skills but also cultivate your ability to think critically, communicate effectively, and thrive in a fast-paced, innovative environment where bold ideas are celebrated. Join us at the forefront of applied science, where your contributions will shape the future of Quantum Computing and propel humanity forward. Seize this extraordinary opportunity to learn, grow, and leave an indelible mark on the world of technology. Amazon has positions available for Quantum Research Science and Applied Science Internships in Santa Clara, CA and Pasadena, CA. We are particularly interested in candidates with expertise in any of the following areas: superconducting qubits, cavity/circuit QED, quantum optics, open quantum systems, superconductivity, electromagnetic simulations of superconducting circuits, microwave engineering, benchmarking, quantum error correction, fabrication, etc. Key job responsibilities In this role, you will work alongside global experts to develop and implement novel, scalable solutions that advance the state-of-the-art in the areas of quantum computing. You will tackle challenging, groundbreaking research problems, work with leading edge technology, focus on highly targeted customer use-cases, and launch products that solve problems for Amazon customers. The ideal candidate should possess the ability to work collaboratively with diverse groups and cross-functional teams to solve complex business problems. A successful candidate will be a self-starter, comfortable with ambiguity, with strong attention to detail and the ability to thrive in a fast-paced, ever-changing environment. About the team Diverse Experiences AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying. Why AWS? Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating — that’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses. Inclusive Team Culture Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empower us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness. Mentorship & Career Growth We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional. Work/Life Balance We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud. Hybrid Work We value innovation and recognize this sometimes requires uninterrupted time to focus on a build. We also value in-person collaboration and time spent face-to-face. Our team affords employees options to work in the office every day or in a flexible, hybrid work model near one of our U.S. Amazon offices.