Solomonic learning: Large language models and the art of induction

Large language models’ emergent abilities are improving with scale; as scale grows, where are LLMs heading? Insights from Ray Solomonoff’s theory of induction and stochastic realization theory may help us envision — and guide — the limits of scaling.

“One year of research in neural networks is sufficient to believe in God.” The writing on the wall of John Hopfield’s lab at Caltech made no sense to me in 1992. Three decades later, and after years of building large language models, I see its sense if one replaces sufficiency with necessity: understanding neural networks as we teach them today requires believing in an immanent entity.

Stefano Soatto.png
Stefano Soatto, a vice president and distinguished scientist with Amazon Web Services.
Credit: UCLA Samueli

Let’s start from the basics: when we teach machine learning, we say that memorization is bad, because it leads to overfitting and prevents generalization. Generalization is good — so good that, to achieve it, we incentivize machines not to memorize, through “regularization”. We even prove theorems — so-called uniform generalization bounds — that guarantee generalization no matter what distribution the data are drawn from, provided we avoid memorization.

But my mother always told me not to generalize, and she had me commit to memory countless useless poems in elementary school. Why am I teaching that generalization is good and memorization is bad, when I was taught the opposite?

Biology vs. technology

Machine learning has historically drawn inspiration from biology. But biological systems have hard ontogenic and phylogenic memory bounds: our synapses cannot memorize everything we experience, and our DNA cannot transmit the knowledge we’ve accumulated to our descendants. (As an educator and father, I often wished I could upload what I have learned into my students and kids. I haven’t figured that one out, but can we at least do it for AI models?) Furthermore, biology imposes a strong evolutionary bias toward minimizing inference latency: when facing an animal in the wild and having to determine who’s whose meal, we can’t reason through all past memories lest the decision be made for us.

In other words, biological systems are forced to adopt inductive learning, using specific data from the past (or a “training set”) to devise a process for handling any future data. Success in inference from inductive learning (or more simply, induction) relies on the so-called inductive hypothesis, that past performance can guarantee future rewards (the primate species called “financial advisor” has evolved out of this belief).

Related content
New method leverages vision-language models to formalize a comparison that had previously required human judgment.

Technology does not have the limitations of biological systems: there are no hard memory bounds (we can always add more storage) and no hard computational bounds (we can fire up more computers), at least until we hit cosmic limits. If we accept that machines do not have the same limitations as biology, what is the best inference paradigm for them? That is, given a training set and a test query, how can they devise the best answer?[1] If we want our model to operate in the constantly evolving real world, we shouldn’t assume the existence of a single distribution from which all data are drawn, in principio, nunc, et semper.

Inference that allows processing the training data at inference time is called transductive inference, or transduction. Transduction calls for us to memorize and reason, unlike induction, which wants us to generalize and forget. To perform optimal inference with respect to any hypothetical distribution in the future, one must memorize past data and, only when presented with a specific query, deploy “reasoning” skills and access memory to compute the best possible answer to that query.

Induction calls for forgetting what does not matter during training, under the assumption that the training set is representative of all future data. But in reality, one cannot know what data will be useful when, so memorization is wise if one can afford it, even when the data — like the writing on John Hopfield’s lab’s wall — does not make sense in that moment.

Transductive inference from inductive learning

Uniform generalization bounds may seem powerful because they are valid for any distribution; but for them to work, there can be only one distribution from which both past and future data are independently sampled. Paraphrasing the statistician Bruno de Finetti, this distribution does not exist in any objective or material sense. It is an abstract concept, the product of our imagination. Something we concoct to guide our intuition and analysis.

Related content
In addition to its practical implications, recent work on “meaning representations” could shed light on some old philosophical questions.

The inductive hypothesis is fundamentally not verifiable: any finite training data could have been drawn with identical likelihood from infinitely many distributions, so even if there was a single true one, how would we know which? Once the present is past, we cannot repeat the experiment. The inductive hypothesis is a statement of faith and uniform generalization bounds an expression of hope, not quite within the scientific realm.

Don’t get me wrong: hope can pay off. The future often does resemble the past. But many of the mechanisms that generate the data we care about today, in business, finance, climate, and language, evolve over time. The same word can carry a different meaning today than it did a century, or even a decade, ago. The point is that whether the inductive hypothesis holds or not cannot be known ahead of time.

Solomonoff inference

What if we forgo generalization and embrace memorization and reasoning? Is that what LLMs are doing? If so, where are they heading? What does the limit of optimal transductive inference look like?

The answer was given in 1964 by the mathematician Ray Solomonoff and is now known, somewhat confusingly, as Solomonoff induction. I will refer to it as Solomonoff inference, which can be thought of as the limit of scaling laws when we allow memory, computational capacity, and time to grow to infinity.

Solomonoff inference is optimal with respect to all computable distributions, averaged with respect to the universal prior. The Church-Turing thesis predicates that any physically realizable mechanism belongs to this class. While infeasible in practice, since it requires infinite resources, Solomonoff’s algorithm is quite simple: execute all programs in increasing order of length until one manages to spit out all the data observed up to now, bit by bit, if it terminates.

Related content
The surprising dynamics related to learning that are common to artificial and biological systems.

The optimal algorithm is basically a lookup table with a switch. There is no insight, no knowledge, not even learning. If presented with the same query twice in a row, the optimal algorithm would repeat the same procedure all over, having learned nothing from past experience.

Solomonoff inference is quite unlike neural networks, which are trained by comparing gradient vectors in a high-dimensional space, where the data are embedded. But could it be that, as we scale LLMs to larger and larger sizes, their behavior is beginning to resemble Solomonoff inference? After all, LLMs are known to memorize, albeit imperfectly, and they can perform universal computation, at least if augmented with a scratchpad. Indeed, LLMs are already able to perform rudimentary transductive inference, now known as “in-context learning” — somewhat confusingly, as it involves no learning: if presented with the same context twice, an LLM would repeat the same process, with no improvement from experience.

So, if LLMs were to begin to perform Solomonoff inference, would they become “superintelligent”? Given no accepted definition of intelligence, let alone its superlatives, many tacitly assume inference performance as its proxy: “smarter” models (or students) perform better on tests, whether the SAT, GRE, or BAR, or the famed IMO math competition. The higher the score, the more “intelligent” the model must be! But the absolute best would be Solomonoff’s algorithm, and no matter what one’s definition of intelligence is, Solomonoff’s algorithm cannot meet it: if by mistake the IMO printed each question twice, Solomonoff’s algorithm would redo the same work twice, not exactly what most would call “intelligent” behavior.

As an analogy, an “inductive student” is a diligent pupil who studies the textbook and completes all homework assignments and practice problems before showing up at the exam. So long as the questions are close enough to practice problems, the inductive student does well. On the occasional odd (or out-of-distribution, as a believer in induction would say) question, the inductive student may not do as well.

By contrast, the “transductive student” does not study at all and instead shows up at the exam with the textbook in hand. Only after reading the first question does the transductive student go through the book to find all the pieces needed to assemble an answer. The student could, in principle, repeat the exercise all the way to the last question, learning nothing in the process. As Solomonoff showed us, there is no need to be smart if one has unbounded time, memory, and computational power.

Do we want models that perform well on benchmark exams, or is the kind of “intelligence” we want something else? Fortunately, inductive and transductive inference are not mutually exclusive. In fact, their difference is quite subtle, as one could frame either as a special case of the other, and the two coincide when the data are independently and identically distributed.

Related content
Technique that mixes public and private training data can meet differential-privacy criteria while cutting error increase by 60%-70%.

What matters is that LLMs are inductively trained transductive-inference engines and can therefore support both forms of inference.[2] They are capable of performing inference by inductive learning, like any trained classifier, akin to Daniel Kahneman’s “system 1” behavior — the fast thinking of his book title Thinking Fast and Slow. But LLMs are also capable of rudimentary forms of transduction, such as in-context-learning and chain of thought, which we may call system 2 — slow-thinking — behavior. The more sophisticated among us have even taught LLMs to do deduction — the ultimate test for their emergent abilities.

AI models’ inferential abilities are improving organically with scale — although they’re still inferior to those of the best humans on most tasks. But they are also being actively fostered through the use of formal-verification tools such as LEAN, as is happening at AWS. One could call this paradigm Solomonic learning: embrace memorization and foster reasoning, yet do not eschew induction. Simple tasks that might benefit from past experience can be solved inductively, saving time and energy, but doing so requires “understanding” and “insight”.

Given that paradigm, the question is what classes of models best support Solomonic learning.

Architectures for Solomonic learning

Solomonic learning requires models that can memorize and perform computation at inference time, in addition to performing ordinary induction. The model architectures therefore need eidetic (verbatim) working memory, which could fade over time, to support computation; but they also need long-term memory to easily retrieve facts from the distant past (the purpose for which humans invented the printing press).

To adapt to changing conditions, they need their long-term memory to decay in synchrony with changes to the mechanisms that generate the data they process. Evolution does that for biological agents, to the benefit of the species rather than any one individual. Transformers, the workhorses of current LLMs, have eidetic (verbatim) memory “in context”, but only until tokens slide out of context. They also have permanent memory “in weights”, but training data are not accessible eidetically from the weights, and there is no long-term adaptation. Eidetic long-term memory can be accessed through RAG (retrieval-augmented generation), but in current Transformers, RAG is not integrated into the primary (autoregressive) inference loop.

Stochastic realization theory and input-dependent state space models

Half a century ago, stochastic realization theory tackled the question of how to model sequential data for downstream decision or control tasks. The “state” of the model was defined as the function of past data that is sufficient for the future, meaning that, given the state, one can discard all past data and predict future data as well as if the data had been retained.

The trivial state is the data itself. An optimal state, by definition, supports an optimal predictor, which is one that makes the prediction error unpredictable. Then, by construction, the state contains all the “information” in past data. During training, the states of LLMs are their weights, so it should be no surprise that next-token prediction is the method of choice for training them. During inference, the state of a Transformer-based LLM is the sliding window of tokens, which is “deadbeat”, meaning that it decays to zero in finite steps without a driving input.

B'MOJO.jpg
In B’MOJO, a state-space model (SSM) computes a fading memory that represents long-range dependencies through a fixed-dimensional representation (pink). The eidetic memory, by contrast, selects tokens from the past (dark-blue x's) using an innovation test over the SSM output and appends them to the current sliding window. Adapted from "B'MOJO: Hybrid state space realizations of foundation models with eidetic and fading memory".

In general, as we observe more and more data during both training and inference, the state must grow apace. In the 1970s, an unbounded state was unthinkable, so the key question was how to find a fixed-dimensional state that is optimal even as the data volume grows to infinity. Therefore, stochastic realization theory focused on Markov processes that admit a finite-dimensional state.

Since any finite-memory sequence could be modeled as the output of a linear model driven by white zero-mean Gaussian noise, the attention was all on linear state-space models (SSMs). While simplistic, such SSMs were good enough to take us to the moon. Today, an unbounded state is not unthinkable. Nonetheless, LLM weights are fixed after training, and the context size is imposed by hardware limitations. So we need richer architecture families.

As an aside, I wish to stress the distinction between the model, which is any state-space realization that supports optimal prediction (there are generally infinitely many), and the system, which is the “real” mechanism that generates the data. The system is unknown and unknowable; the model is tangible and entirely under our control. Although as engineers we are trained to believe that models of the world converge to the “true” system as they improve, this position — known in epistemology as "naïve realism" — is scientifically indefensible.[3]

Amazon’s Stefano Soatto on how learning representations came to dominate machine learning.

To stress the dichotomy between the system and the model, in 1979, Anders Lindqvist and Giorgio Picci derived an equation that, four decades later, is at the heart of diffusion models. In a dissipative physical system, time cannot be reversed, bu it can in a model of that system, for instance a Gaussian SSM. The structure of the reverse diffusion in the model is the same as the forward diffusion, a fact that is exploited in diffusion models for image generation.[4]

Unlike deadbeat Transformers, SSMs have unbounded memory, but it fades, making them incompatible with optimal transductive inference. Again in the 1970s, the late Roger Brockett triggered a burst of interest in input-dependent state-space models, where some of the parameters are affected by the input, the simplest case being when they interact (bi-)linearly with the state. Art Krener showed that such bilinear SSMs can approximate an arbitrarily complex nonlinear (smooth) model. Alberto Isidori and coworkers extended stochastic realization theory to bilinear models, but still with an eye to making the state as small as possible.

Even 30 years later, prior to the deep-learning revolution, when we used input-dependent SSMs to generate videos of dynamic textures, we were still focused on keeping the state dimension as small as possible, encouraged by the fact that 20 states were sufficient to animate and control the rendering of waterfalls, flames, smoke, foliage, talking faces, and other stationary processes. Thanks to the reversibility of the model, we could even make smoke or steam move faster, slower, or backwards!

Deep learning twisted Occam’s razor by trying to make the embedding dimension of the training state (the weights) as large as possible, not as small as possible. Dimension is only an upper bound on “information,” and the key to induction is to limit the “information” in, not the dimension of, the trained weights.[5] Two decades later, we stacked SSMs into a neural architecture by feeding the (input-dependent) prediction residual of one layer to the next.

A breakthrough came with Mamba, which showed that efficient implementation at the hardware level is key. When Mamba is stripped down (as it is in appendix E of our recent paper on architectures to support transductive inference), it is a stack of bilinear SSMs (which Mamba’s developers call “selective state-space models”) restricted to non-interacting states (diagonal dynamics), so it can be implemented efficiently in hardware.

Diagonal SSMs are disjoint from and complementary to Transformers. Autoregressive (AR) Transformers have nilpotent dynamics, meaning that the state transition matrix becomes zero in a finite number of steps in the absence of external input. Mamba has diagonal dynamics, and nilpotent matrices cannot be diagonalized. Diagonal SSMs support infinite fading memory; AR Transformers support finite eidetic memory, and neither is general. Instead, any general (bi-)linear system can be converted to a so-called canonical form, also derived in the 1970s, which can support both eidetic and fading memory.

Meet B’MOJO

B’MOJO is a family of architectures based on canonical realizations that include Transformers, Mamba-like SSMs, and any hybrid combination of the two. There are combinatorially many options, and the name of the game is to find those that are sufficiently general to support different memory regimes yet can be efficiently mapped to specific hardware in order to scale. We plan to release basic versions of B’MOJO both for GPU hardware and for Amazon’s Trainium hardware, so they can be easily compared with existing Transformers, SSMs, and hybrid architectures.

The writing on the wall

While a representation of the “true” system is fundamentally elusive, lending credence to the writing on the wall of John Hopfield’s lab back in 1992, building model realizations is a concrete exercise grounded in data. LLMs, where the “L” refers not to natural language but to the inner language that emerges in the trained model at scale, are stochastic realizations trained inductively as optimal predictors and coopted for (suboptimal) transductive inference and generation. If the training data subtend latent logical structures, as do sensory data such as visual or acoustic data, models trained as optimal predictors are forced to capture their statistical structure.

Related content
From the urgent challenge of "machine unlearning" to overcoming the problem of critical learning periods in deep neural networks, Alessandro Achille is tackling fundamental issues on behalf of Amazon customers.

Thus, LLMs in our parlance include so-called world models trained with visual, acoustic, olfactory, tactile, and other sensory data. The model is indifferent to whether tokenized data express some abstract concept in natural language or a physical measurement process in finite precision. The resulting LLMs can represent concepts and meanings, including physical concepts such as the laws of physics, and can in principle reason, although at present they appear to be mostly building ever bigger lookup tables. Regardless, as stochastic dynamical models, LLMs can be controlled, probed with causal interventions, made observable, and studied with the tools of dynamical-systems theory.

A model is an abstraction of the underlying world — not a representation of it, because there is no objective “it” to re-present, but a realization of it, made real through the only objective entity, which is the data. Synthetic data are just as real to the model as data produced by a physical measurement process, and aligning the two is the essence of perception, for this reason often referred to as controlled hallucination.

While much of the popular discourse denigrates hallucinations[6] as something to be avoided, the ability to hallucinate is necessary for reasoning. The question is not how to avoid hallucinations but how to control them, which is the process of alignment. Architectures designed for decision and control can help, and decades of work in dynamical systems and controls may provide insights — hopefully without the need to resort to divinity, as the writing on the wall suggested.

Footnotes

[1] Note that "best" does not mean "correct." If the data is insufficient to identify the correct conclusion, even the best answer can be wrong.

[2] The simplest form of inductive learning for transductive inference is transductive fine-tuning, a form of meta-learning: past data is used to "meta-train" a model that, at inference time, is fine-tuned with a small number of examples ("few shots") to perform a new task. LLMs take this program steps further, by using sequential data with a latent logical structure (not only natural language but also video, audio, and other signals) to produce an “inner language” (we call it "Neuralese") that can then be co-opted for transductive inference.

[3] Quoting Bertrand Russell: “We all start from 'naïve realism,' i.e., the doctrine that things are what they seem. ... The observer, when he seems to himself to be observing a stone, is really, if physics is to be believed, observing the effects of the stone upon himself. Thus science seems to be at war with itself: when it most means to be objective, it finds itself plunged into subjectivity against its will. Naïve realism leads to physics, and physics, if true, shows that naïve realism is false. Therefore naïve realism, if true, is false; therefore it is false.” Even the International Vocabulary of Metrology has dispensed with the notion of “true value” in its most recent revisions.

[4] In the paper that introduced diffusion models for image generation, the reverse-diffusion equation was attributed to a 1949 work of Feller. However, forward diffusion in the form in use today was not derived until 1960, so neither was reverse diffusion. Later references attribute the reverse-diffusion equation to a 1982 paper by B. D. O. Anderson, which, however, did not introduce it but instead described it, based on the 1979 paper of Lindqvist and Picci, correctly referenced in Anderson’s work, and extended it to more general models different from those in use in diffusion models today. The correct reference for the reverse-diffusion equation used in diffusion models is therefore Lindqvist-Picci 1979.

[5] I use quotes because defining information for the weights of a trained model entails some subtleties, but it can be done.

[6] "Hallucinations" are data generated by a model that are statistically compatible with the training set (in the sense of high likelihood under the trained model), yet "wrong", i.e., individually inconsistent with constraints that some external oracle has deemed "true" ("facts", or "axioms"). In other words, hallucinations are the product of any generative model. Outside formalized domains such as math or code, there is no objective "truth", so the oracle is replaced by an accepted knowledge base, which depends on the application. For "common sense" knowledge, the base is generally a large corpus of (more or less) verified facts, such as WikiData. Outside formalized domains, including the law, there is no guarantee that the facts or "axioms" are mutually compatible.

Research areas

Related content

IN, KA, Bengaluru
RBS (Retail Business Services) Tech team works towards enhancing the customer experience (CX) and their trust in product data by providing technologies to find and fix Amazon CX defects at scale. Our platforms help in improving the CX in all phases of customer journey, including selection, discoverability & fulfilment, buying experience and post-buying experience (product quality and customer returns). The team also develops GenAI platforms for automation of Amazon Stores Operations. As a Sciences team in RBS Tech, we focus on foundational ML research and develop scalable state-of-the-art ML solutions to solve the problems covering customer experience (CX) and Selling partner experience (SPX). We work to solve problems related to multi-modal understanding (text and images), task automation through multi-modal LLM Agents, supervised and unsupervised techniques, multi-task learning, multi-label classification, aspect and topic extraction for Customer Anecdote Mining, image and text similarity and retrieval using NLP and Computer Vision for product groupings and identifying duplicate listings in product search results. Key job responsibilities As an Applied Scientist, you will be responsible to design and deploy scalable GenAI, NLP and Computer Vision solutions that will impact the content visible to millions of customer and solve key customer experience issues. You will develop novel LLM, deep learning and statistical techniques for task automation, text processing, image processing, pattern recognition, and anomaly detection problems. You will define the research and experiments strategy with an iterative execution approach to develop AI/ML models and progressively improve the results over time. You will partner with business and engineering teams to identify and solve large and significantly complex problems that require scientific innovation. You will independently file for patents and/or publish research work where opportunities arise. The RBS org deals with problems that are directly related to the selling partners and end customers and the ML team drives resolution to organization level problems. Therefore, the Applied Scientist role will impact the large product strategy, identifies new business opportunities and provides strategic direction which is very exciting.
IN, KA, Bengaluru
Selection Monitoring team is responsible for making the biggest catalog on the planet even bigger. In order to drive expansion of the Amazon catalog, we develop advanced ML/AI technologies to process billions of products and algorithmically find products not already sold on Amazon. We work with structured, semi-structured and Visually Rich Documents using deep learning, NLP and image processing. The role demands a high-performing and flexible candidate who can take responsibility for success of the system and drive solutions from research, prototype, design, coding and deployment. We are looking for Applied Scientists to tackle challenging problems in the areas of Information Extraction, Efficient crawling at internet scale, developing ML models for website comprehension and agents to take multi-step decisions. You should have depth and breadth of knowledge in text mining, information extraction from Visually Rich Documents, semi structured data (HTML) and advanced machine learning. You should also have programming and design skills to manipulate Semi-Structured and unstructured data and systems that work at internet scale. You will encounter many challenges, including: - Scale (build models to handle billions of pages), - Accuracy (requirements for precision and recall) - Speed (generate predictions for millions of new or changed pages with low latency) - Diversity (models need to work across different languages, market places and data sources) You will help us to - Build a scalable system which can algorithmically extract information from world wide web. - Intelligently cluster web pages, segment and classify regions, extract relevant information and structure the data available on semi-structured web. - Build systems that will use existing Knowledge Base to perform open information extraction at scale from visually rich documents. Key job responsibilities - Use AI, NLP and advances in LLMs/SLMs and agentic systems to create scalable solutions for business problems. - Efficiently Crawl web, Automate extraction of relevant information from large amounts of Visually Rich Documents and optimize key processes. - Design, develop, evaluate and deploy, innovative and highly scalable ML models, esp. leveraging latest advances in RL-based fine tuning methods like DPO, GRPO etc. - Work closely with software engineering teams to drive real-time model implementations. - Establish scalable, efficient, automated processes for large scale model development, model validation and model maintenance. - Lead projects and mentor other scientists, engineers in the use of ML techniques. - Publish innovation in research forums.
US, CA, Santa Clara
We are seeking an Applied Scientist II to join Amazon Customer Service's Science team, where you will build AI-based automated customer service solutions using state-of-the-art techniques in retrieval-augmented generation (RAG), agentic AI, and post-training of large language models. You will work at the intersection of research and production, developing intelligent systems that directly impact millions of customers while collaborating with scientists, engineers, and product managers in a fast-paced, innovative environment. Key job responsibilities - Design, develop, and deploy information retrieval systems and RAG pipelines using embedding models, reranking algorithms, and generative models to improve customer service automation - Conduct post-training of large language models using techniques such as Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Group Relative Policy Optimization (GRPO) to optimize model performance for customer service tasks - Build and curate high-quality datasets for model training and evaluation, ensuring data quality and relevance for customer service applications - Design and implement comprehensive evaluation frameworks, including data curation, metrics development, and methods such as LLM-as-a-judge to assess model performance - Develop AI agents for automated customer service, understanding their advantages and common pitfalls, and implementing solutions that balance automation with customer satisfaction - Independently perform research and development with minimal guidance, staying current with the latest advances in machine learning and AI - Collaborate with cross-functional teams including engineering, product management, and operations to translate research into production systems - Publish findings and contribute to the broader scientific community through papers, patents, and open-source contributions - Monitor and improve deployed models based on real-world performance metrics and customer feedback A day in the life As an Applied Scientist II, you will start your day reviewing metrics from deployed models and identifying opportunities for improvement. You might spend your morning experimenting with new post-training techniques to improve model accuracy, then collaborate with engineers to integrate your latest model into production systems. You will participate in design reviews, share your findings with the team, and mentor junior scientists. You will balance research exploration with practical implementation, always keeping the customer experience at the forefront of your work. You will have the autonomy to drive your own research agenda while contributing to team goals and deliverables. About the team The Amazon Customer Service Science team is dedicated to revolutionizing customer support through advanced AI and machine learning. We are a diverse group of scientists and engineers working on some of the most challenging problems in natural language understanding and AI automation. Our team values innovation, collaboration, and a customer-obsessed mindset. We encourage experimentation, celebrate learning from failures, and are committed to maintaining Amazon's high bar for scientific rigor and operational excellence. You will have access to world-class computing resources, massive datasets, and the opportunity to work alongside some of the brightest minds in AI and machine learning.
US, MA, N.reading
Amazon is seeking exceptional talent to help develop the next generation of advanced robotics systems that will transform automation at Amazon's scale. We're building revolutionary robotic systems that combine cutting-edge AI, sophisticated control systems, and advanced mechanical design to create adaptable automation solutions capable of working safely alongside humans in dynamic environments. This is a unique opportunity to shape the future of robotics and automation at an unprecedented scale, working with world-class teams pushing the boundaries of what's possible in robotic dexterous manipulation, locomotion, and human-robot interaction. This role presents an opportunity to shape the future of robotics through innovative applications of deep learning and large language models. At Amazon we leverage advanced robotics, machine learning, and artificial intelligence to solve complex operational challenges at an unprecedented scale. Our fleet of robots operates across hundreds of facilities worldwide, working in sophisticated coordination to fulfill our mission of customer excellence. The ideal candidate will contribute to research that bridges the gap between theoretical advancement and practical implementation in robotics. You will be part of a team that's revolutionizing how robots learn, adapt, and interact with their environment. Join us in building the next generation of intelligent robotics systems that will transform the future of automation and human-robot collaboration. Key job responsibilities - Design and implement whole body control methods for balance, locomotion, and dexterous manipulation - Utilize state-of-the-art in methods in learned and model-based control - Create robust and safe behaviors for different terrains and tasks - Implement real-time controllers with stability guarantees - Collaborate effectively with multi-disciplinary teams to co-design hardware and algorithms for loco-manipulation - Mentor junior engineer and scientists
US, CA, Sunnyvale
Amazon's AGI Information is seeking an exceptional Applied Scientist to drive science advancements in the Amazon Knowledge Graph team (AKG). AKG is re-inventing knowledge graphs for the LLM era, optimizing for LLM grounding. At the same time, AKG is innovating to utilize LLMs in the knowledge graph construction pipelines to overcome obstacles that traditional technologies could not overcome. As a member of the AKG IR team, you will have the opportunity to work on interesting problems with immediate customer impact. The team is addressing challenges in web-scale knowledge mining, fact verification, multilingual information retrieval, and agent memory operating over Graphs. You will also have the opportunity to work with scientists working on the other challenges, and with the engineering teams that deliver the science advancements to our customers. A successful candidate has a strong machine learning and agent background, is a master of state-of-the-art techniques, has a strong publication record, has a desire to push the envelope in one or more of the above areas, and has a track record of delivering to customers. The ideal candidate enjoys operating in dynamic environments, is self-motivated to take on new challenges, and enjoys working with customers, stakeholders, and engineering teams to deliver big customer impact, shipping solutions via rapid experimentation and then iterating on user feedback and interactions. Key job responsibilities As an Applied Scientist, you will leverage your technical expertise and experience to demonstrate leadership in tackling large complex problems. You will collaborate with applied scientists and engineers to develop novel algorithms and modeling techniques to build the knowledge graph that delivers fresh factual knowledge to our customers, and that automates the knowledge graph construction pipelines to scale to many billions of facts. Your first responsibility will be to solve entity resolution to enable conflating facts from multiple sources into a single graph entity for each real world entity. You will develop generic solutions that work fo all classes of data in AKG (e.g., people, places, movies, etc.), that cope with sparse, noisy data, that scale to hundreds of millions of entities, and that can handle streaming data. You will define a roadmap to make progress incrementally and you will insist on scientific rigor, leading by example.
US, CA, Sunnyvale
Amazon is seeking exceptional talent to help develop the next generation of advanced robotics systems that will transform automation at Amazon's scale. We're building revolutionary robotic systems that combine innovative AI, sophisticated control systems, and advanced mechanical design to create adaptable automation solutions capable of working safely alongside humans in dynamic environments. This is a unique opportunity to shape the future of robotics and automation at unprecedented scale, working with world-class teams pushing the boundaries of what's possible in robotic manipulation, locomotion, and human-robot interaction. This role presents an opportunity to shape the future of robotics through innovative applications of deep learning and large language models. We leverage advanced robotics, machine learning, and artificial intelligence to solve complex operational challenges at unprecedented scale. Our fleet of robots operates across hundreds of facilities worldwide, working in sophisticated coordination to fulfill our mission of customer excellence. We are pioneering the development of robotics foundation models that: - Enable unprecedented generalization across diverse tasks - Integrate multi-modal learning capabilities (visual, tactile, linguistic) - Accelerate skill acquisition through demonstration learning - Enhance robotic perception and environmental understanding - Streamline development processes through reusable capabilities The ideal candidate will contribute to research that bridges the gap between theoretical advancement and practical implementation in robotics. You will be part of a team that's revolutionizing how robots learn, adapt, and interact with their environment. Join us in building the next generation of intelligent robotics systems that will transform the future of automation and human-robot collaboration. As a Senior Applied Scientist, you will develop and improve machine learning systems that help robots perceive, reason, and act in real-world environments. You will leverage state-of-the-art models (open source and internal research), evaluate them on representative tasks, and adapt/optimize them to meet robustness, safety, and performance needs. You will invent new algorithms where gaps exist. You’ll collaborate closely with research, controls, hardware, and product-facing teams, and your outputs will be used by downstream teams to further customize and deploy on specific robot embodiments. Key job responsibilities As a Senior Applied Scientist in the Foundations Model team, you will: - Leverage state-of-the-art models for targeted tasks, environments, and robot embodiments through fine-tuning and optimization. - Execute rapid, rigorous experimentation with reproducible results and solid engineering practices, closing the gap between sim and real environments. - Build and run capability evaluations/benchmarks to clearly profile performance, generalization, and failure modes. - Contribute to the data and training workflow: collection/curation, dataset quality/provenance, and repeatable training recipes. - Write clean, maintainable, well commented and documented code, contribute to training infrastructure, create tools for model evaluation and testing, and implement necessary APIs - Stay current with latest developments in foundation models and robotics, assist in literature reviews and research documentation, prepare technical reports and presentations, and contribute to research discussions and brainstorming sessions. - Work closely with senior scientists, engineers, and leaders across multiple teams, participate in knowledge sharing, support integration efforts with robotics hardware teams, and help document best practices and methodologies.
JP, 13, Tokyo
Amazon.com strives to be Earth's most customer-centric company where people can find and discover anything they want to buy. We hire the world's brightest minds and offer them a fast-paced, technologically sophisticated, and collaborative work environment. We are seeking a talented, customer-focused Economist to join our JCI Measurement and Optimization Science Team (JCI MOST). In this role, you will design experiments and build econometric models to measure intervention impacts and deliver data-driven insights that inform leadership decisions. Amazon Economists leverage our world-class data systems to build sophisticated econometric models, drawing from diverse methodological approaches including econometric theory, empirical IO, empirical health, labor, and public economics—all highly valued skillsets at Amazon. You will work in a fast-moving environment solving critical business problems as part of cross-functional teams embedded within business units or our central science and economics organization. This role requires exceptional Causal Inference expertise, strong cross-functional collaboration skills, business acumen, and an entrepreneurial spirit to drive measurable improvements in our pricing quality and business outcomes.
CN, 31, Shanghai
As a Sr. Applied Scientist, you will be responsible for bringing new product designs through to manufacturing. You will work closely with multi-disciplinary groups including Product Design, Industrial Design, Hardware Engineering, and Operations, to drive key aspects of engineering of consumer electronics products. In this role, you will use expertise in physical sciences, theoretical, numerical or empirical techniques to create scalable models representing response of physical systems or devices, including: * Applying domain scientific expertise towards developing innovative analysis and tests to study viability of new materials, designs or processes * Working closely with engineering teams to drive validation, optimization and implementation of hardware design or software algorithmic solutions to improve product and customer risks * Establishing scalable, efficient, automated processes to handle large scale design and data analysis * Conducting research into use conditions, materials and analysis techniques * Tracking general business activity including device health in field and providing clear, compelling reports to management on a regular basis * Developing, implementing guidelines to continually optimize design processes * Using simulation tools like LS-DYNA, and Abaqus for analysis and optimization of product design * Using of programming languages like Python and Matlab for analytical/statistical analyses and automation * Demonstrating strong understanding across multiple physical science domains, e.g. structural, thermal, fluid dynamics, and materials * Developing, analyzing and testing structural solutions from concept design, feature development, product architecture, through system validation * Supporting product development and optimization through application of analysis and testing of complex electronic assemblies using advanced simulation and experimentation tools and techniques
US, WA, Redmond
Amazon Leo is an initiative to launch a constellation of Low Earth Orbit satellites that will provide low-latency, high-speed broadband connectivity to unserved and underserved communities around the world. As a Communications Engineer in Modeling and Simulation, this role is primarily responsible for the developing and analyzing high level system resource allocation techniques for links to ensure optimal system and network performance from the capacity, coverage, power consumption, and availability point of view. Be part of the team defining the overall communication system and architecture of Amazon Leo’s broadband wireless network. This is a unique opportunity to innovate and define novel wireless technology with few legacy constraints. The team develops and designs the communication system of Leo and analyzes its overall system level performance, such as overall throughput, latency, system availability, packet loss, etc., as well as compatibility for both connectivity and interference mitigation with other space and terrestrial systems. This role in particular will be responsible for 1) evaluating complex multi-disciplinary trades involving RF bandwidth and network resource allocation to customers, 2) understanding and designing around hardware/software capabilities and constraints to support a dynamic network topology, 3) developing heuristic or solver-based algorithms to continuously improve and efficiently use available resources, 4) demonstrating their viability through detailed modeling and simulation, 5) working with operational teams to ensure they are implemented. This role will be part of a team developing the necessary simulation tools, with particular emphasis on coverage, capacity, latency and availability, considering the yearly growth of the satellite constellation and terrestrial network. Export Control Requirement: Due to applicable export control laws and regulations, candidates must be a U.S. citizen or national, U.S. permanent resident (i.e., current Green Card holder), or lawfully admitted into the U.S. as a refugee or granted asylum. Key job responsibilities • Work within a project team and take the responsibility for the Leo's overall communication system design and architecture • Extend existing code/tools and create simulation models representative of the target system, primarily in MATLAB • Design interconnection strategies between fronthaul and backhaul nodes. Analyze link availability, investigate link outages, and optimize algorithms to study and maximize network performance • Use RF and optical link budgets with orbital constellation dynamics to model time-varying system capacity • Conduct trade-off analysis to benefit customer experience and optimization of resources (costs, power, spectrum), including optimization of satellite constellation design and link selection • Work closely with implementation teams to simulate expected system level performance and provide quick feedback on potential improvements • Analyze and minimize potential self-interference or interference with other communication systems • Provide visualizations, document results, and communicate them across multi-disciplinary project teams to make key architectural decisions
US, WA, Seattle
Amazon is seeking exceptional talent to help develop the next generation of advanced robotics systems that will transform automation at Amazon's scale. We're building revolutionary robotic systems that combine cutting-edge AI, sophisticated control systems, and advanced electromechanical design to create adaptable automation solutions capable of working safely alongside humans in dynamic environments. This is a unique opportunity to shape the future of robotics and automation at an unprecedented scale, working with world-class teams pushing the boundaries of what's possible in robotic manipulation, locomotion, and human-robot interaction. Amazon is seeking a talented and motivated Principal Applied Scientist to develop tactile sensors and guide the sensing strategy for our gripper design. The ideal candidate will have extensive experience in sensor development, analysis, testing and integration. This candidate must have the ability to work well both independently and in a multidisciplinary team setting. Key job responsibilities - Author functional requirements, design verification plans and test procedures - Develop design concepts which meet the requirements - Work with engineering team members to implement the concepts in a product design - Support product releases to manufacturing and customer deployments - Work efficiently to support aggressive schedules