Scientific frontiers of agentic AI

What language AI agents might speak, how to share context without compromising privacy, how to model agentic negotiations, and how to understand users’ commonsense policies: these are some of the open scientific questions that researchers in agentic AI will need to grapple with.

It feels as though we’ve barely absorbed the rapid development and adoption of generative AI technologies such as large language models (LLMs) before the next phenomenon is already upon us, namely agentic AI. Standalone LLMs can be thought of as “chatbots in a sandbox”, the sandbox being a metaphor for a safe and contained play space with limited interaction with the world beyond. In contrast, the vision of agentic AI is a near (or already here?) future in which LLMs are the underlying engines for complex systems that have access to rich external resources such as consumer apps and services, social media, banking and payment systems — in principle, anything you can reach on the Internet. A dream of the AI industry for decades, the “agent” of agentic AI is an intelligent personal assistant that knows your goals and preferences and that you trust to act on your behalf in the real world, much as you might a human assistant.


For example, in service of arranging travel plans, my personal agentic AI assistant would know my preferences (both professional and recreational) for flights and airlines, lodging, car rentals, dining, and activities. It would know my calendar and thus be able to schedule around other commitments. It would know my frequent-flier numbers and hospitality accounts and be able to book and pay for itineraries on my behalf. Most importantly, it would not simply automate these tasks but do so intelligently and intuitively, making “obvious” decisions unilaterally and quietly but being sure to check in with me whenever ambiguity or nuance arises (such as whether those theater tickets on a business trip to New York should be charged to my personal or work credit card).

To AI insiders, the progression from generative to agentic AI is exciting but also natural. In just a few years, we have gone from impressive but glorified chatbots with myriad identifiable shortcomings to feature-rich systems exhibiting human-like capabilities not only in language and image generation but in coding, mathematical reasoning, optimization, workflow planning, and many other areas. The increased skill set and reliability of core LLMs have naturally caused the industry to move “up the stack”, to a world in which the LLM itself fades into the background and becomes a new kind of intelligent operating system upon which all manner of powerful functionality can be built. In the same way that your PC or Mac seamlessly handles many details that the vast majority of users don’t (want to) know about — like exactly how and where on your hard drive to store and find files, the networking details of connecting to remote web servers, and other fine-grained operating-system details — agentic systems strive to abstract away the messy and tedious details of many higher-level tasks that, today, we all perform ourselves.

But while the overarching vision of agentic AI is already relatively clear, there are some fundamental scientific and technical questions about the technology whose answers — or even proper formulation — are uncertain (but interesting!). We’ll explore some of them here.

What language will agents speak?

The history of computing technology features a steady march toward systems and devices that are ever more friendly, accessible, and intuitive to human users. Examples include the gradual displacement of clunky teletype terminals and obscure command-line incantations by graphical user interfaces with desktop and folder metaphors, and the evolution from low-level networked file-transfer protocols to the seamless ease of the web. Generative AI itself has made previously specialized tasks like coding accessible to a much broader base of users. In other words, modern technology is human-centric, designed for use and consumption by ordinary people with little or no specialized training.

But now these same technologies and systems will also need to be navigated by agentic AI, and as adept as LLMs are with human language, it may not be their most natural mode of communication and understanding. Thus, a parallel migration to the native language of generative AI may be coming.

What is that native language? When generative AI consumes a piece of content — whether it be a user prompt, a document, or an image — it translates it into an internal representation that is more convenient for subsequent processing and manipulation. There are many examples in biology of such internal representations. For instance, in our own visual systems, it has been known for some time that certain types of inputs (such as facial images) cause specific cells in our brains to respond (a phenomenon known as neuronal selectivity). Thus, an entire category of important images elicits similar neural behaviors.


In a similar vein, the neural networks underlying modern AI typically translate any input into what is known as an embedding space, which can be thought of as a physical map in which items with similar meanings are placed near each other, and those with unrelated meanings are placed far apart. For example, in an image-embedding space, two photos of different families would be nearer to each other than either would be to a landscape. In a language-embedding space, two romance novels would be nearer to each other than to a car owner’s manual. And hybrid or multimodal embedding spaces would place images of cars near their owner manuals.
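To make the map metaphor concrete, here is a minimal sketch of measuring “nearness” in a language-embedding space. The sentence-transformers package and the all-MiniLM-L6-v2 model are illustrative assumptions; any text-embedding model would serve.

```python
# A toy demonstration of "nearness" in an embedding space. The package and
# model below are illustrative choices; any text embedder would do.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose embedder

texts = [
    "Their eyes met across the crowded ballroom, and her heart raced.",    # romance
    "He had loved her from the moment they met, and now he told her so.",  # romance
    "Check the engine oil every 5,000 miles and replace the air filter.",  # owner's manual
]
vectors = model.encode(texts)  # one vector per text

def cosine(u, v):
    """Cosine similarity: near 1 for related texts, near 0 for unrelated ones."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# The two romance passages should sit near each other in the space...
print("romance vs. romance:", cosine(vectors[0], vectors[1]))
# ...and both should sit far from the car owner's manual.
print("romance vs. manual: ", cosine(vectors[0], vectors[2]))
```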

Embeddings are an abstraction that provides great power and generality: they represent not the literal original content (like a long sequence of words) but something closer to its underlying meaning. The price of this abstraction is a loss of detail and information. For instance, the embedding of this entire article would place it in close proximity to similar content (say, other general-audience science prose) but would not contain enough information to re-create the article verbatim. The lossy nature of embeddings has implications we shall return to shortly.

Embeddings are learned from the implicit correspondences in the massive amounts of text and image data on the Internet and elsewhere. Even aliens landing on Earth who could read English but knew nothing else about the world would quickly realize that “doctor” and “hospital” are closely related because of their frequent proximity in text, even if they had no idea what those words actually signified. Furthermore, embeddings not only permit generative AI to understand existing content; they also allow it to generate new content. When we ask for a picture of a squirrel on a snowboard in the style of Andy Warhol, it is the embedding that lets the technology explore novel images that interpolate between those of actual Warhols, squirrels, and snowboards.
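The aliens’ trick can be sketched in a few lines: build word vectors from nothing but co-occurrence counts in a toy corpus, with no knowledge of what any word means. This is a crude, miniature ancestor of how real embeddings are trained; the corpus and the three retained dimensions are arbitrary choices for illustration.

```python
# Word vectors from nothing but co-occurrence: the "aliens" experiment in
# miniature, using only numpy. Corpus and dimensionality are arbitrary.
import numpy as np
from itertools import combinations

corpus = [
    "the doctor examined the patient at the hospital",
    "the hospital hired a new doctor",
    "the nurse at the hospital helped the doctor",
    "the chef cooked a meal at the restaurant",
    "the restaurant hired a new chef",
]
vocab = sorted({w for s in corpus for w in s.split()})
index = {w: i for i, w in enumerate(vocab)}

# Count how often each pair of words appears in the same sentence.
M = np.zeros((len(vocab), len(vocab)))
for sentence in corpus:
    for a, b in combinations(set(sentence.split()), 2):
        M[index[a], index[b]] += 1
        M[index[b], index[a]] += 1

# Factor the count matrix; rows of the truncated factor are crude word vectors.
U, S, _ = np.linalg.svd(M)
emb = U[:, :3] * S[:3]

def sim(a, b):
    u, v = emb[index[a]], emb[index[b]]
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# "doctor" should land nearer "hospital" than "restaurant", with no semantics
# ever supplied, purely because of textual proximity.
print("doctor~hospital:  ", sim("doctor", "hospital"))
print("doctor~restaurant:", sim("doctor", "restaurant"))
```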

Thus, the inherent language of generative (and therefore agentic) AI is not the sentences and images we are so familiar with but their embeddings. Let us now reconsider a world in which agents interact with humans, content, and other agents. Obviously, we will continue to expect agentic AI to communicate with humans in ordinary language and images. But there is no reason for agent-to-agent communication to take place in human languages; per the discussion above, it would be more natural for it to occur in the native embedding language of the underlying neural networks.

My personal agent, working on a vacation itinerary, might ingest materials such as my previous flights, hotels, and vacation photos to understand my interests and preferences. But to communicate those preferences to another agent — say, an agent aggregating hotel details, prices, and availability — it will not provide the raw source materials; in addition to being massively inefficient and redundant, that could present privacy concerns (more on this below). Rather, my agent will summarize my preferences as a point, or perhaps many points, in an embedding space.

[Figure: restaurant embeddings]
In this example, the red, green, and blue points are three-dimensional embeddings of restaurants at which three people (Alice, Bob, and Chris) have eaten. (A real-world embedding, by contrast, might have hundreds of dimensions.) Each glowing point represents the center of one of the clusters, and its values summarize the restaurant preferences of the corresponding person. AI agents could use such vector representations, rather than text, to share information with each other.
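Here is a minimal sketch of the summarization step the figure depicts, assuming scikit-learn for the clustering; the three-dimensional vectors are invented stand-ins for real restaurant embeddings.

```python
# Sketch of the summarization in the figure: cluster a user's restaurant
# history in embedding space and share only the cluster centers. Assumes
# scikit-learn; the 3-D vectors are invented stand-ins for real embeddings.
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical embeddings of restaurants Alice has eaten at.
alice_history = np.array([
    [0.90, 0.10, 0.20], [0.80, 0.20, 0.10], [0.85, 0.15, 0.25],  # one taste cluster
    [0.10, 0.90, 0.70], [0.20, 0.80, 0.80], [0.15, 0.85, 0.75],  # another taste cluster
])

# Two centroids summarize her preferences; these points, not the raw
# history, are what her agent would transmit to another agent.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(alice_history)
print(kmeans.cluster_centers_)
```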

By similar reasoning, we might also expect the gradual development of an “agentic Web” meant for navigation by AI, in which the text and images on websites are pre-translated into embeddings that are illegible to humans but spare agents from performing those translations themselves on every visit, a massive gain in efficiency. In the same way that many websites today have options for English, Spanish, Chinese, and many other languages, there would be an option for Agentic.
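What might that option look like mechanically? Here is a purely hypothetical sketch in the spirit of HTTP content negotiation; everything in it, from the media type to the payload format, is invented for illustration rather than drawn from any existing standard.

```python
# A purely hypothetical "Agentic" option: the same page served as HTML for
# humans or as a precomputed embedding payload for agents. The media type,
# field names, and model label below are all invented for illustration.
import json
import numpy as np

PAGE_HTML = "<h1>Hotel Sol</h1><p>Beachfront rooms from $180 per night.</p>"
PAGE_EMBEDDING = np.random.rand(384).round(4).tolist()  # stand-in for a real embedding

def render(accept_header: str) -> str:
    """Return the representation a visitor asks for, human or agentic."""
    if "application/x-embedding" in accept_header:       # hypothetical media type
        return json.dumps({"embedder": "example-model-v1",  # which "dialect" of embedding
                           "vector": PAGE_EMBEDDING})
    return PAGE_HTML

print(render("text/html"))
print(render("application/x-embedding")[:80], "...")
```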

All the above presupposes that embedding spaces are shared and standardized across generative and agentic AI systems. This is not true today: embeddings differ from model to model and are often considered proprietary. It’s as if all generative AI systems speak slightly different dialects of some underlying lingua franca. But these observations about agentic language and communication may foreshadow the need for AI scientists to work toward standardization, at least in some form. Each agent could still keep some special and proprietary details in its embeddings — for instance, a financial-services agent might want to use more of its embedding space for financial terminology than an agentic travel assistant would — but the benefits of a common base embedding are compelling.
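One hint at what partial standardization could look like: researchers have long aligned different embedding spaces by learning a rotation between them from items embedded in both (orthogonal Procrustes alignment). The sketch below demonstrates the idea on synthetic data; the dimensions, the noise level, and the random rotation standing in for model differences are arbitrary.

```python
# Aligning two embedding "dialects" with an orthogonal Procrustes map learned
# from items both models have embedded. Synthetic data throughout.
import numpy as np
from scipy.linalg import orthogonal_procrustes

rng = np.random.default_rng(0)
anchors = rng.normal(size=(100, 8))   # 100 items both models have embedded

# Model B's space is (by construction) a rotated, noisy version of model A's.
rotation = np.linalg.qr(rng.normal(size=(8, 8)))[0]
emb_a = anchors
emb_b = anchors @ rotation + 0.01 * rng.normal(size=(100, 8))

# Learn the translation between dialects from the anchor items...
W, _ = orthogonal_procrustes(emb_a, emb_b)

# ...and check that it transfers to an item not used for the alignment.
new_item = rng.normal(size=(1, 8))
print("translation error:", np.linalg.norm(new_item @ rotation - new_item @ W))
```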

Keeping things in context

Even casual users of LLMs may be aware of the notion of “context”, which is informally what and how much the LLM remembers and understands about its recent interactions and is typically measured (at least cosmetically) by the number of words or tokens (word parts) recalled. There is again an apt metaphor with human cognition, in the sense that context can be thought of as the “working memory” of the LLM. And like our own working memory, it can be selective and imperfect.

If we participate in an experiment to test how many random digits or words we can memorize at different time scales, we will of course eventually make mistakes if asked to remember too many things for too long. But we will not forget what the task itself is; our short-term memory may be fallible, but we generally grasp the bigger picture.


These same properties broadly hold for LLM context — which is sometimes surprising to users, since we expect computers to be perfect at memorization but highly fallible on more abstract tasks. But when we remember that LLMs do not operate directly on the sequence of words or tokens in the context but on the lossy embedding of that sequence, these properties become less mysterious (though perhaps not less frustrating when an LLM can’t remember something it did just a few steps ago).

Some of the principal advances in LLM technology have been around improvements in context: LLMs can now remember and understand more context and leverage that context to tailor their responses with greater accuracy and sophistication. This greater window of working memory is crucial for many tasks to which we would like to apply agentic AI, such as having an LLM read and understand the entire code base of a large software development project, or all the documents relevant to a complex legal case, and then be able to reason about the contents.

How will context and its limitations affect agentic AI? If embeddings are the language of LLMs, and context is the expression of an LLM’s working memory in that language, a crucial design decision in agent-agent interactions will be how much context to share. Sharing too little will handicap the functionality and efficiency of agentic dialogues; sharing too much will result in unnecessary complexity and potential privacy concerns (just as in human-to-human interactions).

Let us illustrate by returning to my personal agent, who, having found and booked my hotel, is now working with an external flight-aggregation agent. It would be natural for my agent to communicate lots of context about my travel preferences, perhaps including the conditions under which I might be willing to pay or use miles for an upgrade to business class (such as an overnight international flight). But my agent should not communicate context about my broader financial status (savings, debt, investment portfolio), even though in theory those details might correlate with my willingness to pay for an upgrade. And when we consider that context is not my verbatim history with my travel agent but an abstract summary in embedding space, decisions about contextual boundaries, and how to enforce them, become difficult.

Indeed, this is a relatively untouched scientific topic, and researchers are only just beginning to consider questions such as what can be reverse-engineered about raw data given only its embedding. While human or system prompts to shape inter-agent dealings might be a stopgap (“be sure not to tell the flight agent any unnecessary financial information”), a principled understanding of embedding privacy vulnerabilities and how to mitigate them (perhaps via techniques such as differential privacy) is likely to be an important research area going forward.
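As a flavor of what such mitigations might look like, here is a minimal sketch of one standard ingredient, the Gaussian mechanism from differential privacy, applied to a preference embedding before it is shared. The clipping bound and the (epsilon, delta) parameters are illustrative assumptions, not a vetted privacy analysis.

```python
# One candidate mitigation, sketched: clip a preference embedding and add
# Gaussian noise before sharing it, in the spirit of differential privacy.
# The clipping bound and (epsilon, delta) values are illustrative only.
import numpy as np

def privatize(embedding, clip_norm=1.0, epsilon=1.0, delta=1e-5):
    """Bound the vector's norm, then add noise calibrated to that bound."""
    clipped = embedding * min(1.0, clip_norm / max(np.linalg.norm(embedding), 1e-12))
    # Standard Gaussian-mechanism noise scale for sensitivity clip_norm.
    sigma = clip_norm * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return clipped + np.random.normal(0.0, sigma, size=embedding.shape)

preference = np.random.rand(16)   # stand-in for a real preference embedding
shared = privatize(preference)    # what the agent would actually transmit
print("distortion:", np.linalg.norm(shared - preference))  # the price of privacy
```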

Agentic bargains

So far, we’ve talked a fair amount about interagent dialogues but have treated these conversations rather generally, much as if we were speaking about two humans in a collaborative setting. But there will be important categories of interaction that will need to be more structured and formal, with identifiable outcomes that all parties commit to. Negotiation, bargaining, and other strategic interactions are a prime example.

I obviously want my personal agent, when booking hotels and flights for my trips, to get the best possible prices and other conditions (room type and view, flight seat location, and so on). The agents aggregating hotels and flights would similarly prefer that I pay more rather than less, on behalf of their own clients and users.

For my agent to act in my interests in these settings, I’ll need to specify at least some broad constraints on my preferences and willingness to pay for them, and not in fuzzy terms: I can’t expect my agent to simply “know a bargain when it sees one” the way I might if I were handling all the arrangements myself, especially because my notion of a bargain might be highly subjective and dependent on many factors. Again, a near-term makeshift approach might address this via prompt shaping — “be sure to get the best deal possible, as long as the flight is nonstop and leaves in the morning, and I have an aisle seat” — but longer-term solutions will have to be more sophisticated and granular.
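One can imagine what “more granular” might mean: replacing the prompt above with a structured preference object that an agent can check mechanically. The sketch below is a toy of this idea; all the field names and thresholds are invented for illustration.

```python
# A toy of structured (rather than prompted) preference specification; every
# field name and threshold here is invented for illustration.
from dataclasses import dataclass

@dataclass
class FlightPreferences:
    nonstop_only: bool = True
    departure_window: tuple = ("06:00", "12:00")  # mornings only
    seat: str = "aisle"
    max_price_usd: float = 600.0

def acceptable(option: dict, prefs: FlightPreferences) -> bool:
    """Does a flight option satisfy the user's hard constraints?"""
    return ((option["stops"] == 0 or not prefs.nonstop_only)
            and prefs.departure_window[0] <= option["departs"] <= prefs.departure_window[1]
            and option["seat"] == prefs.seat
            and option["price"] <= prefs.max_price_usd)

offer = {"stops": 0, "departs": "08:30", "seat": "aisle", "price": 450.0}
print(acceptable(offer, FlightPreferences()))  # True: the agent may proceed
```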


Of course, the mathematical and scientific foundations of negotiating and bargaining have been well studied for decades by game theorists, microeconomists, and related research communities. Their analyses typically begin by presuming the articulation of utility functions for all the parties involved — an abstraction capturing (for example) my travel preferences and willingness to pay for them. The literature also considers settings in which I can’t quantitatively express my own utilities but “know bargains when I see them”, in the sense that given two options (a middle seat on a long flight for $200 vs. a first-class seat for $2,000), I will make the choice consistent with my unknown utilities. (This is the domain of the aptly named utility elicitation.)
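A toy version of utility elicitation shows the flavor: rather than asking me for a utility function, my agent infers my price-versus-comfort trade-off from a handful of concrete choices I have made. The linear utility form and all the numbers are illustrative assumptions.

```python
# Toy utility elicitation: infer a price-vs-comfort trade-off weight w (in a
# utility of the assumed form comfort - w * price) from observed choices.
import numpy as np

# Each entry: ((comfort, price) of option A, (comfort, price) of option B),
# and which option I actually chose (0 or 1). Numbers are illustrative.
choices = [
    (((1.0, 200.0), (9.0, 2000.0)), 0),  # declined first class at $2,000
    (((1.0, 200.0), (6.0, 450.0)), 1),   # paid a bit extra for comfort
    (((3.0, 300.0), (8.0, 1500.0)), 0),
]

def consistent(w):
    """Is utility = comfort - w * price consistent with every observed choice?"""
    for (a, b), picked in choices:
        ua, ub = a[0] - w * a[1], b[0] - w * b[1]
        if (ua >= ub) != (picked == 0):
            return False
    return True

fits = [w for w in np.linspace(0.0001, 0.05, 500) if consistent(w)]
print(f"trade-off weight lies in roughly [{min(fits):.4f}, {max(fits):.4f}]")
```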

Much of the science in such areas is devoted to the question of what “should” happen when fully rational parties with precisely specified utilities, perfect memory, and unlimited computational power come to the proverbial bargaining table; equilibrium analysis in game theory is just one example of this kind of research. But given our observations about the human-like cognitive abilities and shortcomings of LLMs, perhaps a more relevant starting point for agentic negotiation is the field of behavioral economics. Instead of asking what should happen when perfectly rational agents interact, behavioral economics asks what does happen when actual human agents interact strategically. And this is often quite different, in interesting ways, from what fully rational agents would do.

For instance, consider the canonical example of behavioral game theory known as the ultimatum game. In this game, there is $10 to potentially divide between two players, Alice and Bob. Alice first proposes any split she likes. Bob then either accepts Alice’s proposal, in which case both parties get their proposed shares, or rejects Alice’s proposal, in which case each party receives nothing. The equilibrium analysis is straightforward: Alice, being fully rational and knowing that Bob is also, proposes the smallest nonzero amount to Bob, which is a penny. Bob, being fully rational, would prefer to receive a penny than nothing, so he accepts.

[Figure: the ultimatum game]
Game theory (left) supposes that the recipient in the ultimatum game will accept a low offer, since something is better than nothing, but behavioral economics (right) reveals that, in fact, offers tend to concentrate in the range of $3 to $5, and lower offers are frequently rejected.

Nothing remotely like this happens when humans play. Across hundreds of experiments varying myriad conditions — social, cultural, gender, wealth, etc. — a remarkably consistent aggregate behavior emerges. Alice almost always proposes a share to Bob of between $3 and $5 (the fact that Alice gets to move first seems to prime both players for Bob to potentially get less than half the pie). And conditioned on Alice’s proposal being in this range, Bob almost always accepts her offer. But on those rare occasions in which Alice is more aggressive and offers Bob an amount much less than $3, Bob’s rejection rate skyrockets. It’s as if pairs of people — who have never heard of or played the ultimatum game before — have an evolutionarily hardwired sense of what’s “fair” in this setting.

[Figure: ultimatum-game offers and rejections]
The way in which the ultimatum game is played — the frequency of particular offers and the rate of rejection — varies across cultures, but this graph illustrates general trends in the data. Offers tend to concentrate between $3 and $5, with a steep falloff above $5, and the rejection rate is high for low offers.
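A small simulation contrasting the two pictures is easy to write. The behavioral rejection curve below is an invented logistic shaped to resemble the experimental trends, not fitted to real data.

```python
# Contrasting the two analyses: a fully rational responder accepts any
# positive offer, while a stylized behavioral responder rejects low offers
# with high probability. The rejection curve is illustrative, not fitted.
import numpy as np

rng = np.random.default_rng(1)

def rejection_probability(offer, pie=10.0):
    """Rises steeply as the offer falls below roughly 30% of the pie."""
    return 1.0 / (1.0 + np.exp(15.0 * (offer / pie - 0.3)))

for offer in [0.5, 2.0, 4.0, 5.0]:
    simulated = rng.random(10_000) < rejection_probability(offer)
    print(f"offer ${offer:4.2f}: rational responder accepts; "
          f"behavioral rejection rate ~ {simulated.mean():.0%}")
```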

Now back to LLMs and agentic AI. There is already a small but growing literature on what we might call LLM behavioral game theory and economics, in which experiments like the one above are replicated — except human participants are replaced by AI. One early work showed that LLMs almost exactly replicated human behavior in the ultimatum game, as well as other classical behavioral-economics findings.

Note that it is possible to simulate the demographic variability of human subjects in such experiments via LLM prompting, e.g., “You are Alice, a 37-year-old Hispanic medical technician living in Boston, Massachusetts”. Other studies have again shown human-like behavior of LLMs in trading games, price negotiations, and other settings. A very recent study claims that LLMs can even engage in collusive price-fixing behaviors and discusses potential regulatory implications for AI agents.

Once we have a grasp on the behaviors of agentic AI in strategic settings, we can turn to shaping that behavior in desired ways. The field of mechanism design in economics complements areas like game theory by asking questions like “given that this is how agents generally negotiate, how can we structure those negotiations to make them fair and beneficial?” A classic example is the so-called second-price auction, in which the highest bidder wins the item but pays only the second-highest bid. Unlike a standard first-price auction, this design is truthful, in the sense that every bidder’s optimal strategy is simply to bid the price at which it is indifferent between winning and losing (its subjective valuation of the item); nobody needs to reason about other agents’ behaviors or valuations.
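The auction rules are simple enough to fit in a few lines. A minimal sketch, with bidder names and valuations invented for illustration:

```python
# A minimal second-price (Vickrey) auction: highest bid wins, runner-up's
# bid is the price.
def second_price_auction(bids):
    """bids: dict of bidder -> bid. Returns (winner, price paid)."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner, _ = ranked[0]
    return winner, ranked[1][1]  # winner pays the second-highest bid

# Each agent can simply bid its user's true valuation of the hotel room;
# no modeling of the other bidders is required.
bids = {"alice_agent": 220.0, "bob_agent": 180.0, "chris_agent": 205.0}
print(second_price_auction(bids))  # ('alice_agent', 205.0)
```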

We anticipate a proliferation of research on topics like these, as agentic bargaining becomes commonplace and an important component of what we delegate to our AI assistants.

The enduring challenge of common sense

I’ll close with some thoughts on a topic that has bedeviled AI from its earliest days and will continue to do so in the agentic era, albeit in new and more personalized ways. It’s a topic that is as fundamental as it is hard to define: common sense.

By common sense, we mean things that are “obvious”, that any human with enough experience in the world would know without explicitly being told. For example, imagine a glass full of water sitting on a table. We would all agree that if we move the glass to the left or right on the table, it’s still a glass of water. But if we turn it upside down, it’s still a glass on the table, but no longer a glass of water (and is also a mess to be cleaned up). It’s quite unlikely any of us were ever sat down and run through this narrative, and it’s also a good bet that you’ve never deliberately considered such facts before. But we all know and agree on them.


Figuring out how to imbue AI models and systems with common sense has been a priority of AI research for decades. Before the advent of modern large-scale machine learning, there were efforts like the Cyc project (for “encyclopedia”), part of which was devoted to manually constructing a database of commonsense facts like the ones above about glasses, tables, and water. Eventually the consumer Internet generated enough language and visual data that many such general commonsense facts could be learned or inferred: show a neural network millions of pictures of glasses, tables, and water, and it will figure things out. Very early research also demonstrated that it was possible to encode certain invariances directly into the network architecture (convolutional networks, for instance, build in translation invariance, the analogue of sliding a glass of water across a table), and LLM architectures are similarly carefully designed in the modern era.

But in agentic AI, we expect our proxies to understand not only generic commonsense facts of the type we’ve been discussing but also “common sense” particular to our own preferences — things that would make sense to most people if only they understood our contexts and perspectives. Here a pure machine learning approach will likely not suffice. There just won’t be enough data to learn from scratch my subjective version of common sense.

For example, consider your own behavior or “policy” around leaving doors open or closed, locked or unlocked. If you’re like me, these policies can be surprisingly nuanced, even though I follow them without thought all the time. Often, I will close and lock doors behind me — for instance, when I leave my car or my house (unless I’m just stepping right outside to water the plants). Other times I will leave a door unlocked and open, such as when I’m in my office and want to signal I am available to chat with colleagues or students. I might close but leave unlocked that same door when I need to focus on something or take a call. And sometimes I’ll leave my office door unlocked and open even when I’m not in it, despite there being valuables present, because I trust the people on my floor and I’m going to be nearby.

We might call behaviors like these subjective common sense, because to me they are natural and obvious and have good reasons behind them, even though I follow them almost instinctually, the same way I know not to turn a glass of water upside down on the table. But you of course might have very different behaviors or policies in the same or similar situations, with your own good reasons.


The point is that even an apparently simple matter like my behavior regarding doors and locks can be difficult to articulate. But agentic AI will need specifications like this: simply replace doors with online accounts and services and locks with passwords and other authentication credentials. Sometimes we might share passwords with family or friends for less privacy-critical resources like Netflix or Spotify, but we would not do the same for bank accounts and medical records. I might be less rigorous about restricting access to, or even encrypting, the files on my laptop than I would be about files I store in the cloud.

The circumstances under which I trust my own or other agents with resources that need to be private and secure will be at least as complex as those regarding door closing and locking. The primary difficulty is not in having the right language or formalisms to specify such policies: there are good proposals for such specification frameworks and even for proving the correctness of their behaviors. The problem is in helping people articulate and translate their subjective common sense into these frameworks in the first place.
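To make that concrete, here is a sketch of what such a specification might look like once a person has articulated their policy: my door-and-lock instincts for credentials, written as rules an agent can check before sharing anything. The sensitivity tiers, requester categories, and rules are one person's illustrative policy, not a proposed framework.

```python
# One person's "subjective common sense" about credentials, written as
# explicit rules an agent can check. All tiers and rules are illustrative.
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    sensitivity: str  # "low" | "medium" | "high"

def may_share(resource: Resource, requester: str) -> bool:
    """Encodes my door-locking instincts for online accounts."""
    if resource.sensitivity == "high":        # bank accounts, medical records
        return False                          # never shared, with anyone
    if resource.sensitivity == "medium":      # e.g., cloud file storage
        return requester == "family"
    return requester in ("family", "friend")  # e.g., Netflix, Spotify

print(may_share(Resource("netflix", "low"), "friend"))      # True
print(may_share(Resource("bank_login", "high"), "family"))  # False
```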

Conclusion

The agentic-AI era is in its infancy, but we should not take that to mean we have a long and slow development and adoption period before us. We need only look at the trajectory of the underlying generative AI technology — from being almost entirely unknown outside of research circles as recently as early 2022 to now being arguably the single most important scientific innovation of the century so far. And indeed, there is already widespread use of what we might consider early agentic systems, such as the latest coding agents.

Far beyond the initial “autocomplete for Python” tools of a few years ago, such agents now do so much more — writing working code from natural-language prompts and descriptions, accessing external resources and datasets, proactively designing experiments and visualizing the results, and most importantly (especially for a novice programmer like me), seamlessly handling the endless complexity of environment settings, software package installs and dependencies, and the like. My Amazon Scholar and University of Pennsylvania colleague Aaron Roth and I recently wrote a machine learning paper of almost 50 pages — complete with detailed definitions, theorem statements and proofs, code, and experiments — using nothing except (sometimes detailed) English prompts to such a tool, along with expository text we wrote directly. This would have been unthinkable just a year ago.

Despite the speed with which generative AI has permeated industry and society at large, its scientific underpinnings go back many decades, arguably to the birth of AI but certainly no later than the development of neural-network theory and practice in the 1980s. Agentic AI — built on top of these generative foundations, but quite distinct in its ambitions and challenges — has no such deep scientific substrate on which to systematically build. It’s all quite fresh territory. I’ve tried to anticipate some of the more fundamental challenges here, and I’ve probably got half of them wrong. To paraphrase the Philadelphia department store magnate John Wanamaker, I just don’t know which half — yet.
