Solomonic learning: Large language models and the art of induction

Large language models’ emergent abilities are improving with scale; as scale grows, where are LLMs heading? Insights from Ray Solomonoff’s theory of induction and stochastic realization theory may help us envision — and guide — the limits of scaling.

“One year of research in neural networks is sufficient to believe in God.” The writing on the wall of John Hopfield’s lab at Caltech made no sense to me in 1992. Three decades later, and after years of building large language models, I see its sense if one replaces sufficiency with necessity: understanding neural networks as we teach them today requires believing in an immanent entity.

Stefano Soatto, a vice president and distinguished scientist with Amazon Web Services. Credit: UCLA Samueli

Let’s start from the basics: when we teach machine learning, we say that memorization is bad, because it leads to overfitting and prevents generalization. Generalization is good — so good that, to achieve it, we incentivize machines not to memorize, through “regularization”. We even prove theorems — so-called uniform generalization bounds — that guarantee generalization no matter what distribution the data are drawn from, provided we avoid memorization.

But my mother always told me not to generalize, and she had me commit to memory countless useless poems in elementary school. Why am I teaching that generalization is good and memorization is bad, when I was taught the opposite?

Biology vs. technology

Machine learning has historically drawn inspiration from biology. But biological systems have hard ontogenetic and phylogenetic memory bounds: our synapses cannot memorize everything we experience, and our DNA cannot transmit the knowledge we’ve accumulated to our descendants. (As an educator and father, I often wished I could upload what I have learned into my students and kids. I haven’t figured that one out, but can we at least do it for AI models?) Furthermore, biology imposes a strong evolutionary bias toward minimizing inference latency: when facing an animal in the wild and having to determine who’s whose meal, we can’t reason through all past memories lest the decision be made for us.

In other words, biological systems are forced to adopt inductive learning, using specific data from the past (or a “training set”) to devise a process for handling any future data. Success in inference from inductive learning (or more simply, induction) relies on the so-called inductive hypothesis, that past performance can guarantee future rewards (the primate species called “financial advisor” has evolved out of this belief).


Technology does not have the limitations of biological systems: there are no hard memory bounds (we can always add more storage) and no hard computational bounds (we can fire up more computers), at least until we hit cosmic limits. If we accept that machines do not have the same limitations as biology, what is the best inference paradigm for them? That is, given a training set and a test query, how can they devise the best answer?[1] If we want our model to operate in the constantly evolving real world, we shouldn’t assume the existence of a single distribution from which all data are drawn, in principio, nunc, et semper.

Inference that allows processing the training data at inference time is called transductive inference, or transduction. Transduction calls for us to memorize and reason, unlike induction, which wants us to generalize and forget. To perform optimal inference with respect to any hypothetical distribution in the future, one must memorize past data and, only when presented with a specific query, deploy “reasoning” skills and access memory to compute the best possible answer to that query.

Induction calls for forgetting what does not matter during training, under the assumption that the training set is representative of all future data. But in reality, one cannot know what data will be useful when, so memorization is wise if one can afford it, even when the data — like the writing on John Hopfield’s lab’s wall — does not make sense in that moment.
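
To make the contrast concrete, here is a minimal sketch on a toy binary-classification task (illustrative data and models of my choosing): the inductive learner compresses the training set into fixed weights and can then discard the data, while the transductive learner memorizes the data and defers all computation to query time.

```python
# Toy contrast between induction and transduction (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))              # "training set"
y = (X[:, 0] + X[:, 1] > 0).astype(float)  # labels from a simple latent rule

# Induction: fit fixed weights once; the training data can then be discarded.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

def inductive_predict(q):
    return float(q @ w > 0.5)

# Transduction: memorize (X, y); reason over them only when a query arrives.
def transductive_predict(q, k=5):
    nearest = np.argsort(np.linalg.norm(X - q, axis=1))[:k]
    return float(y[nearest].mean() > 0.5)

q = np.array([0.3, -0.1])
print(inductive_predict(q), transductive_predict(q))
```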

Transductive inference from inductive learning

Uniform generalization bounds may seem powerful because they are valid for any distribution; but for them to work, there can be only one distribution from which both past and future data are independently sampled. Paraphrasing the statistician Bruno de Finetti, this distribution does not exist in any objective or material sense. It is an abstract concept, the product of our imagination. Something we concoct to guide our intuition and analysis.


The inductive hypothesis is fundamentally not verifiable: any finite training set could have been drawn with identical likelihood from infinitely many distributions, so even if there were a single true one, how would we know which? Once the present is past, we cannot repeat the experiment. The inductive hypothesis is a statement of faith, and uniform generalization bounds are an expression of hope, neither quite within the scientific realm.

Don’t get me wrong: hope can pay off. The future often does resemble the past. But many of the mechanisms that generate the data we care about today, in business, finance, climate, and language, evolve over time. The same word can carry a different meaning today than it did a century, or even a decade, ago. The point is that whether the inductive hypothesis holds or not cannot be known ahead of time.

Solomonoff inference

What if we forgo generalization and embrace memorization and reasoning? Is that what LLMs are doing? If so, where are they heading? What does the limit of optimal transductive inference look like?

The answer was given in 1964 by the mathematician Ray Solomonoff and is now known, somewhat confusingly, as Solomonoff induction. I will refer to it as Solomonoff inference, which can be thought of as the limit of scaling laws when we allow memory, computational capacity, and time to grow to infinity.

Solomonoff inference is optimal with respect to all computable distributions, averaged with respect to the universal prior. The Church-Turing thesis posits that any physically realizable mechanism belongs to this class. While infeasible in practice, since it requires infinite resources, Solomonoff’s algorithm is quite simple: execute all programs in increasing order of length until one of them, if it terminates, reproduces all the data observed so far, bit by bit.
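
To get a feel for the procedure, consider a toy caricature (emphatically not the real thing, which runs on a universal Turing machine with unbounded resources; the three-instruction language, step budget, and prior weighting below are simplifications of my own). Programs are enumerated in order of increasing length, and the prediction mixes every program consistent with the data, weighted by the universal prior 2^(-length).

```python
# Toy caricature of Solomonoff inference. Programs are strings over
# {'0', '1', 'L'}: '0' and '1' emit a bit; 'L' jumps back to the start.
# A step budget sidesteps the halting problem.
from itertools import product

def run(program, max_steps=64):
    out, pc, steps = [], 0, 0
    while pc < len(program) and steps < max_steps:
        if program[pc] == 'L':
            pc = 0                      # loop back to the first instruction
        else:
            out.append(int(program[pc]))
            pc += 1
        steps += 1
    return out

def predict_next(observed, max_len=8):
    """Mix every program that reproduces the data, weighted by 2**-length."""
    weights = {0: 0.0, 1: 0.0}
    for length in range(1, max_len + 1):
        for prog in product('01L', repeat=length):
            bits = run(prog)
            if bits[:len(observed)] == observed and len(bits) > len(observed):
                weights[bits[len(observed)]] += 2.0 ** -length
    total = weights[0] + weights[1]
    return {bit: w / total for bit, w in weights.items()}

# Short programs such as '01L' explain the data and dominate the mixture.
print(predict_next([0, 1, 0, 1]))       # the next bit is almost surely 0
```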


The optimal algorithm is basically a lookup table with a switch. There is no insight, no knowledge, not even learning. If presented with the same query twice in a row, the optimal algorithm would repeat the same procedure all over, having learned nothing from past experience.

Solomonoff inference is quite unlike neural networks, which are trained by following gradient vectors in a high-dimensional space where the data are embedded. But could it be that, as we scale LLMs to larger and larger sizes, their behavior is beginning to resemble Solomonoff inference? After all, LLMs are known to memorize, albeit imperfectly, and they can perform universal computation, at least if augmented with a scratchpad. Indeed, LLMs are already able to perform rudimentary transductive inference, now known as “in-context learning” — somewhat confusingly, as it involves no learning: if presented with the same context twice, an LLM would repeat the same process, with no improvement from experience.

So, if LLMs were to begin to perform Solomonoff inference, would they become “superintelligent”? Given no accepted definition of intelligence, let alone its superlatives, many tacitly assume inference performance as its proxy: “smarter” models (or students) perform better on tests, whether the SAT, the GRE, the bar exam, or the famed IMO math competition. The higher the score, the more “intelligent” the model must be! But the absolute best would be Solomonoff’s algorithm, and no matter what one’s definition of intelligence is, Solomonoff’s algorithm cannot meet it: if by mistake the IMO printed each question twice, Solomonoff’s algorithm would redo the same work twice, not exactly what most would call “intelligent” behavior.

As an analogy, an “inductive student” is a diligent pupil who studies the textbook and completes all homework assignments and practice problems before showing up at the exam. So long as the questions are close enough to practice problems, the inductive student does well. On the occasional odd (or out-of-distribution, as a believer in induction would say) question, the inductive student may not do as well.

By contrast, the “transductive student” does not study at all and instead shows up at the exam with the textbook in hand. Only after reading the first question does the transductive student go through the book to find all the pieces needed to assemble an answer. The student could, in principle, repeat the exercise all the way to the last question, learning nothing in the process. As Solomonoff showed us, there is no need to be smart if one has unbounded time, memory, and computational power.

Do we want models that perform well on benchmark exams, or is the kind of “intelligence” we want something else? Fortunately, inductive and transductive inference are not mutually exclusive. In fact, their difference is quite subtle, as one could frame either as a special case of the other, and the two coincide when the data are independently and identically distributed.


What matters is that LLMs are inductively trained transductive-inference engines and can therefore support both forms of inference.[2] They are capable of performing inference by inductive learning, like any trained classifier, akin to Daniel Kahneman’s “system 1” behavior — the fast thinking of his book title Thinking, Fast and Slow. But LLMs are also capable of rudimentary forms of transduction, such as in-context learning and chain of thought, which we may call system 2 — slow-thinking — behavior. The more sophisticated among us have even taught LLMs to do deduction — the ultimate test for their emergent abilities.

AI models’ inferential abilities are improving organically with scale — although they’re still inferior to those of the best humans on most tasks. But they are also being actively fostered through the use of formal-verification tools such as Lean, as is happening at AWS. One could call this paradigm Solomonic learning: embrace memorization and foster reasoning, yet do not eschew induction. Simple tasks that might benefit from past experience can be solved inductively, saving time and energy, but doing so requires “understanding” and “insight”.

Given that paradigm, the question is what classes of models best support Solomonic learning.

Architectures for Solomonic learning

Solomonic learning requires models that can memorize and perform computation at inference time, in addition to performing ordinary induction. The model architectures therefore need eidetic (verbatim) working memory, which could fade over time, to support computation; but they also need long-term memory to easily retrieve facts from the distant past (the purpose for which humans invented the printing press).

To adapt to changing conditions, they need their long-term memory to decay in synchrony with changes to the mechanisms that generate the data they process. Evolution does that for biological agents, to the benefit of the species rather than any one individual. Transformers, the workhorses of current LLMs, have eidetic (verbatim) memory “in context”, but only until tokens slide out of context. They also have permanent memory “in weights”, but training data are not accessible eidetically from the weights, and there is no long-term adaptation. Eidetic long-term memory can be accessed through RAG (retrieval-augmented generation), but in current Transformers, RAG is not integrated into the primary (autoregressive) inference loop.

Stochastic realization theory and input-dependent state space models

Half a century ago, stochastic realization theory tackled the question of how to model sequential data for downstream decision or control tasks. The “state” of the model was defined as the function of past data that is sufficient for the future, meaning that, given the state, one can discard all past data and predict future data as well as if the data had been retained.

The trivial state is the data itself. An optimal state, by definition, supports an optimal predictor, which is one that makes the prediction error unpredictable. Then, by construction, the state contains all the “information” in past data. During training, the states of LLMs are their weights, so it should be no surprise that next-token prediction is the method of choice for training them. During inference, the state of a Transformer-based LLM is the sliding window of tokens, which is “deadbeat”, meaning that it decays to zero in finite steps without a driving input.
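
The deadbeat property is easy to see in a toy shift-register realization of the sliding window (a sketch with scalar tokens of my choosing):

```python
# The sliding window as a deadbeat state: a shift register holds the last
# n tokens verbatim; without new input, the state empties in n steps.
import numpy as np

n = 4
A = np.eye(n, k=-1)             # shift matrix; its n-th matrix power is zero
b = np.zeros(n); b[0] = 1.0     # each new token enters at the top

x = np.zeros(n)
for token in [3.0, 1.0, 4.0, 1.0, 5.0]:
    x = A @ x + b * token       # the window slides; the oldest token falls off
print(x)                        # [5. 1. 4. 1.] -- the last n tokens, eidetically
```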

In B’MOJO, a state-space model (SSM) computes a fading memory that represents long-range dependencies through a fixed-dimensional representation (pink). The eidetic memory, by contrast, selects tokens from the past (dark-blue x's) using an innovation test over the SSM output and appends them to the current sliding window. Adapted from "B'MOJO: Hybrid state space realizations of foundation models with eidetic and fading memory".

In general, as we observe more and more data during both training and inference, the state must grow apace. In the 1970s, an unbounded state was unthinkable, so the key question was how to find a fixed-dimensional state that is optimal even as the data volume grows to infinity. Therefore, stochastic realization theory focused on Markov processes that admit a finite-dimensional state.

Since any finite-memory sequence could be modeled as the output of a linear model driven by white zero-mean Gaussian noise, the attention was all on linear state-space models (SSMs). While simplistic, such SSMs were good enough to take us to the moon. Today, an unbounded state is not unthinkable. Nonetheless, LLM weights are fixed after training, and the context size is imposed by hardware limitations. So we need richer architecture families.
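
For reference, the classical object of study was the linear-Gaussian state-space model, which in one standard discrete-time form (the notation is mine) reads

$$x_{t+1} = A\,x_t + B\,w_t, \qquad y_t = C\,x_t + D\,w_t, \qquad w_t \sim \mathcal{N}(0, I),$$

with a state $x_t$ of fixed dimension no matter how long the data sequence grows.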

As an aside, I wish to stress the distinction between the model, which is any state-space realization that supports optimal prediction (there are generally infinitely many), and the system, which is the “real” mechanism that generates the data. The system is unknown and unknowable; the model is tangible and entirely under our control. Although as engineers we are trained to believe that models of the world converge to the “true” system as they improve, this position — known in epistemology as "naïve realism" — is scientifically indefensible.[3]


To stress the dichotomy between the system and the model, in 1979, Anders Lindquist and Giorgio Picci derived an equation that, four decades later, is at the heart of diffusion models. In a dissipative physical system, time cannot be reversed, but it can in a model of that system, for instance a Gaussian SSM. The structure of the reverse diffusion in the model is the same as the forward diffusion, a fact that is exploited in diffusion models for image generation.[4]
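
For reference, in the continuous-time notation now standard in the diffusion-model literature (the notation is mine, not the original papers’), the forward diffusion and its reversal read

$$dx_t = f(x_t, t)\,dt + g(t)\,dW_t, \qquad dx_t = \left[f(x_t, t) - g(t)^2\,\nabla_x \log p_t(x_t)\right]dt + g(t)\,d\bar{W}_t,$$

where $\bar{W}_t$ is a Brownian motion running backward in time: the reverse process is again a diffusion, with the same noise structure and a drift corrected by the score $\nabla_x \log p_t$.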

Unlike deadbeat Transformers, SSMs have unbounded memory, but it fades, making them incompatible with optimal transductive inference. Again in the 1970s, the late Roger Brockett triggered a burst of interest in input-dependent state-space models, where some of the parameters are affected by the input, the simplest case being when they interact (bi-)linearly with the state. Art Krener showed that such bilinear SSMs can approximate an arbitrarily complex nonlinear (smooth) model. Alberto Isidori and coworkers extended stochastic realization theory to bilinear models, but still with an eye to making the state as small as possible.
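
In the simplest discrete-time case (again in my notation), the input enters the dynamics multiplicatively,

$$x_{t+1} = A\,x_t + u_t\,N\,x_t + B\,u_t, \qquad y_t = C\,x_t,$$

so the effective transition matrix $A + u_t N$ changes with the input at every step, the germ of today’s input-dependent SSMs.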

Even 30 years later, prior to the deep-learning revolution, when we used input-dependent SSMs to generate videos of dynamic textures, we were still focused on keeping the state dimension as small as possible, encouraged by the fact that 20 states were sufficient to animate and control the rendering of waterfalls, flames, smoke, foliage, talking faces, and other stationary processes. Thanks to the reversibility of the model, we could even make smoke or steam move faster, slower, or backwards!

Deep learning twisted Occam’s razor by trying to make the embedding dimension of the training state (the weights) as large as possible, not as small as possible. Dimension is only an upper bound on “information,” and the key to induction is to limit the “information” in, not the dimension of, the trained weights.[5] Two decades later, we stacked SSMs into a neural architecture by feeding the (input-dependent) prediction residual of one layer to the next.
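
A minimal sketch of that stacking idea, with a scalar fading-memory SSM per layer and toy data of my choosing (the actual architectures are far richer):

```python
# Sketch: stack SSM layers by feeding each layer's prediction residual onward.
import numpy as np

def ssm_layer(u, a=0.9):
    """Scalar fading-memory SSM used as a crude one-step predictor of its input."""
    x, pred = 0.0, np.zeros_like(u)
    for t in range(len(u)):
        pred[t] = (1.0 - a) * x     # predict u[t] from the fading state
        x = a * x + u[t]            # update the state with the new input
    return pred

u = np.sin(np.linspace(0.0, 6.0, 50))  # toy input sequence
residual = u.copy()
for _ in range(3):                     # each layer sees the previous residual
    residual = residual - ssm_layer(residual)
```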

A breakthrough came with Mamba, which showed that efficient implementation at the hardware level is key. When Mamba is stripped down (as it is in appendix E of our recent paper on architectures to support transductive inference), it is a stack of bilinear SSMs (which Mamba’s developers call “selective state-space models”) restricted to non-interacting states (diagonal dynamics), so it can be implemented efficiently in hardware.

Diagonal SSMs are disjoint from and complementary to Transformers. Autoregressive (AR) Transformers have nilpotent dynamics, meaning that some power of the state transition matrix is zero, so the state decays to zero in a finite number of steps in the absence of external input. Mamba has diagonal dynamics, and nonzero nilpotent matrices cannot be diagonalized. Diagonal SSMs support infinite fading memory; AR Transformers support finite eidetic memory; and neither is general. Instead, any general (bi-)linear system can be converted to a so-called canonical form, also derived in the 1970s, which can support both eidetic and fading memory.
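
The dichotomy is visible in a small numerical experiment (matrices of my choosing): a nilpotent state dies exactly at step n, while a diagonal state with entries inside the unit circle fades forever without ever vanishing.

```python
# Nilpotent (AR-Transformer-like) vs. diagonal (Mamba-like) dynamics, undriven.
import numpy as np

n = 4
A_nil = np.eye(n, k=-1)                  # n-th matrix power is zero: eidetic, finite
A_diag = np.diag([0.9, 0.7, 0.5, 0.3])   # fading, infinite-horizon memory

x_nil = np.ones(n)
x_diag = np.ones(n)
for t in range(6):
    print(t, np.linalg.norm(x_nil), np.linalg.norm(x_diag))
    x_nil, x_diag = A_nil @ x_nil, A_diag @ x_diag
```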

Meet B’MOJO

B’MOJO is a family of architectures based on canonical realizations that include Transformers, Mamba-like SSMs, and any hybrid combination of the two. There are combinatorially many options, and the name of the game is to find those that are sufficiently general to support different memory regimes yet can be efficiently mapped to specific hardware in order to scale. We plan to release basic versions of B’MOJO both for GPU hardware and for Amazon’s Trainium hardware, so they can be easily compared with existing Transformers, SSMs, and hybrid architectures.

The writing on the wall

While a representation of the “true” system is fundamentally elusive, lending credence to the writing on the wall of John Hopfield’s lab back in 1992, building model realizations is a concrete exercise grounded in data. LLMs, where the “L” refers not to natural language but to the inner language that emerges in the trained model at scale, are stochastic realizations trained inductively as optimal predictors and coopted for (suboptimal) transductive inference and generation. If the training data subtend latent logical structures, as do sensory data such as visual or acoustic data, models trained as optimal predictors are forced to capture their statistical structure.


Thus, LLMs in our parlance include so-called world models trained with visual, acoustic, olfactory, tactile, and other sensory data. The model is indifferent to whether tokenized data express some abstract concept in natural language or a physical measurement process in finite precision. The resulting LLMs can represent concepts and meanings, including physical concepts such as the laws of physics, and can in principle reason, although at present they appear to be mostly building ever bigger lookup tables. Regardless, as stochastic dynamical models, LLMs can be controlled, probed with causal interventions, made observable, and studied with the tools of dynamical-systems theory.

A model is an abstraction of the underlying world — not a representation of it, because there is no objective “it” to re-present, but a realization of it, made real through the only objective entity, which is the data. Synthetic data are just as real to the model as data produced by a physical measurement process, and aligning the two is the essence of perception, for this reason often referred to as controlled hallucination.

While much of the popular discourse denigrates hallucinations[6] as something to be avoided, the ability to hallucinate is necessary for reasoning. The question is not how to avoid hallucinations but how to control them, which is the process of alignment. Architectures designed for decision and control can help, and decades of work in dynamical systems and controls may provide insights — hopefully without the need to resort to divinity, as the writing on the wall suggested.

Footnotes

[1] Note that "best" does not mean "correct." If the data is insufficient to identify the correct conclusion, even the best answer can be wrong.

[2] The simplest form of inductive learning for transductive inference is transductive fine-tuning, a form of meta-learning: past data is used to "meta-train" a model that, at inference time, is fine-tuned with a small number of examples ("few shots") to perform a new task. LLMs take this program several steps further, by using sequential data with a latent logical structure (not only natural language but also video, audio, and other signals) to produce an “inner language” (we call it "Neuralese") that can then be co-opted for transductive inference.

[3] Quoting Bertrand Russell: “We all start from 'naïve realism,' i.e., the doctrine that things are what they seem. ... The observer, when he seems to himself to be observing a stone, is really, if physics is to be believed, observing the effects of the stone upon himself. Thus science seems to be at war with itself: when it most means to be objective, it finds itself plunged into subjectivity against its will. Naïve realism leads to physics, and physics, if true, shows that naïve realism is false. Therefore naïve realism, if true, is false; therefore it is false.” Even the International Vocabulary of Metrology has dispensed with the notion of “true value” in its most recent revisions.

[4] In the paper that introduced diffusion models for image generation, the reverse-diffusion equation was attributed to a 1949 work of Feller. However, forward diffusion in the form in use today was not derived until 1960, so neither was reverse diffusion. Later references attribute the reverse-diffusion equation to a 1982 paper by B. D. O. Anderson, which, however, did not introduce it but instead described it, based on the 1979 paper of Lindquist and Picci, correctly referenced in Anderson’s work, and extended it to models more general than those used in diffusion models today. The correct reference for the reverse-diffusion equation used in diffusion models is therefore Lindquist-Picci 1979.

[5] I use quotes because defining information for the weights of a trained model entails some subtleties, but it can be done.

[6] "Hallucinations" are data generated by a model that are statistically compatible with the training set (in the sense of high likelihood under the trained model), yet "wrong", i.e., individually inconsistent with constraints that some external oracle has deemed "true" ("facts", or "axioms"). In other words, hallucinations are the product of any generative model. Outside formalized domains such as math or code, there is no objective "truth", so the oracle is replaced by an accepted knowledge base, which depends on the application. For "common sense" knowledge, the base is generally a large corpus of (more or less) verified facts, such as WikiData. Outside formalized domains, including the law, there is no guarantee that the facts or "axioms" are mutually compatible.
