Solomonic learning: Large language models and the art of induction

Large language models’ emergent abilities are improving with scale; as scale grows, where are LLMs heading? Insights from Ray Solomonoff’s theory of induction and stochastic realization theory may help us envision — and guide — the limits of scaling.

“One year of research in neural networks is sufficient to believe in God.” The writing on the wall of John Hopfield’s lab at Caltech made no sense to me in 1992. Three decades later, and after years of building large language models, I see its sense if one replaces sufficiency with necessity: understanding neural networks as we teach them today requires believing in an immanent entity.

Stefano Soatto.png
Stefano Soatto, a vice president and distinguished scientist with Amazon Web Services.
Credit: UCLA Samueli

Let’s start from the basics: when we teach machine learning, we say that memorization is bad, because it leads to overfitting and prevents generalization. Generalization is good — so good that, to achieve it, we incentivize machines not to memorize, through “regularization”. We even prove theorems — so-called uniform generalization bounds — that guarantee generalization no matter what distribution the data are drawn from, provided we avoid memorization.

But my mother always told me not to generalize, and she had me commit to memory countless useless poems in elementary school. Why am I teaching that generalization is good and memorization is bad, when I was taught the opposite?

Biology vs. technology

Machine learning has historically drawn inspiration from biology. But biological systems have hard ontogenic and phylogenic memory bounds: our synapses cannot memorize everything we experience, and our DNA cannot transmit the knowledge we’ve accumulated to our descendants. (As an educator and father, I often wished I could upload what I have learned into my students and kids. I haven’t figured that one out, but can we at least do it for AI models?) Furthermore, biology imposes a strong evolutionary bias toward minimizing inference latency: when facing an animal in the wild and having to determine who’s whose meal, we can’t reason through all past memories lest the decision be made for us.

In other words, biological systems are forced to adopt inductive learning, using specific data from the past (or a “training set”) to devise a process for handling any future data. Success in inference from inductive learning (or more simply, induction) relies on the so-called inductive hypothesis, that past performance can guarantee future rewards (the primate species called “financial advisor” has evolved out of this belief).

Related content
New method leverages vision-language models to formalize a comparison that had previously required human judgment.

Technology does not have the limitations of biological systems: there are no hard memory bounds (we can always add more storage) and no hard computational bounds (we can fire up more computers), at least until we hit cosmic limits. If we accept that machines do not have the same limitations as biology, what is the best inference paradigm for them? That is, given a training set and a test query, how can they devise the best answer?[1] If we want our model to operate in the constantly evolving real world, we shouldn’t assume the existence of a single distribution from which all data are drawn, in principio, nunc, et semper.

Inference that allows processing the training data at inference time is called transductive inference, or transduction. Transduction calls for us to memorize and reason, unlike induction, which wants us to generalize and forget. To perform optimal inference with respect to any hypothetical distribution in the future, one must memorize past data and, only when presented with a specific query, deploy “reasoning” skills and access memory to compute the best possible answer to that query.

Induction calls for forgetting what does not matter during training, under the assumption that the training set is representative of all future data. But in reality, one cannot know what data will be useful when, so memorization is wise if one can afford it, even when the data — like the writing on John Hopfield’s lab’s wall — does not make sense in that moment.

Transductive inference from inductive learning

Uniform generalization bounds may seem powerful because they are valid for any distribution; but for them to work, there can be only one distribution from which both past and future data are independently sampled. Paraphrasing the statistician Bruno de Finetti, this distribution does not exist in any objective or material sense. It is an abstract concept, the product of our imagination. Something we concoct to guide our intuition and analysis.

Related content
In addition to its practical implications, recent work on “meaning representations” could shed light on some old philosophical questions.

The inductive hypothesis is fundamentally not verifiable: any finite training data could have been drawn with identical likelihood from infinitely many distributions, so even if there was a single true one, how would we know which? Once the present is past, we cannot repeat the experiment. The inductive hypothesis is a statement of faith and uniform generalization bounds an expression of hope, not quite within the scientific realm.

Don’t get me wrong: hope can pay off. The future often does resemble the past. But many of the mechanisms that generate the data we care about today, in business, finance, climate, and language, evolve over time. The same word can carry a different meaning today than it did a century, or even a decade, ago. The point is that whether the inductive hypothesis holds or not cannot be known ahead of time.

Solomonoff inference

What if we forgo generalization and embrace memorization and reasoning? Is that what LLMs are doing? If so, where are they heading? What does the limit of optimal transductive inference look like?

The answer was given in 1964 by the mathematician Ray Solomonoff and is now known, somewhat confusingly, as Solomonoff induction. I will refer to it as Solomonoff inference, which can be thought of as the limit of scaling laws when we allow memory, computational capacity, and time to grow to infinity.

Solomonoff inference is optimal with respect to all computable distributions, averaged with respect to the universal prior. The Church-Turing thesis predicates that any physically realizable mechanism belongs to this class. While infeasible in practice, since it requires infinite resources, Solomonoff’s algorithm is quite simple: execute all programs in increasing order of length until one manages to spit out all the data observed up to now, bit by bit, if it terminates.

Related content
The surprising dynamics related to learning that are common to artificial and biological systems.

The optimal algorithm is basically a lookup table with a switch. There is no insight, no knowledge, not even learning. If presented with the same query twice in a row, the optimal algorithm would repeat the same procedure all over, having learned nothing from past experience.

Solomonoff inference is quite unlike neural networks, which are trained by comparing gradient vectors in a high-dimensional space, where the data are embedded. But could it be that, as we scale LLMs to larger and larger sizes, their behavior is beginning to resemble Solomonoff inference? After all, LLMs are known to memorize, albeit imperfectly, and they can perform universal computation, at least if augmented with a scratchpad. Indeed, LLMs are already able to perform rudimentary transductive inference, now known as “in-context learning” — somewhat confusingly, as it involves no learning: if presented with the same context twice, an LLM would repeat the same process, with no improvement from experience.

So, if LLMs were to begin to perform Solomonoff inference, would they become “superintelligent”? Given no accepted definition of intelligence, let alone its superlatives, many tacitly assume inference performance as its proxy: “smarter” models (or students) perform better on tests, whether the SAT, GRE, or BAR, or the famed IMO math competition. The higher the score, the more “intelligent” the model must be! But the absolute best would be Solomonoff’s algorithm, and no matter what one’s definition of intelligence is, Solomonoff’s algorithm cannot meet it: if by mistake the IMO printed each question twice, Solomonoff’s algorithm would redo the same work twice, not exactly what most would call “intelligent” behavior.

As an analogy, an “inductive student” is a diligent pupil who studies the textbook and completes all homework assignments and practice problems before showing up at the exam. So long as the questions are close enough to practice problems, the inductive student does well. On the occasional odd (or out-of-distribution, as a believer in induction would say) question, the inductive student may not do as well.

By contrast, the “transductive student” does not study at all and instead shows up at the exam with the textbook in hand. Only after reading the first question does the transductive student go through the book to find all the pieces needed to assemble an answer. The student could, in principle, repeat the exercise all the way to the last question, learning nothing in the process. As Solomonoff showed us, there is no need to be smart if one has unbounded time, memory, and computational power.

Do we want models that perform well on benchmark exams, or is the kind of “intelligence” we want something else? Fortunately, inductive and transductive inference are not mutually exclusive. In fact, their difference is quite subtle, as one could frame either as a special case of the other, and the two coincide when the data are independently and identically distributed.

Related content
Technique that mixes public and private training data can meet differential-privacy criteria while cutting error increase by 60%-70%.

What matters is that LLMs are inductively trained transductive-inference engines and can therefore support both forms of inference.[2] They are capable of performing inference by inductive learning, like any trained classifier, akin to Daniel Kahneman’s “system 1” behavior — the fast thinking of his book title Thinking Fast and Slow. But LLMs are also capable of rudimentary forms of transduction, such as in-context-learning and chain of thought, which we may call system 2 — slow-thinking — behavior. The more sophisticated among us have even taught LLMs to do deduction — the ultimate test for their emergent abilities.

AI models’ inferential abilities are improving organically with scale — although they’re still inferior to those of the best humans on most tasks. But they are also being actively fostered through the use of formal-verification tools such as LEAN, as is happening at AWS. One could call this paradigm Solomonic learning: embrace memorization and foster reasoning, yet do not eschew induction. Simple tasks that might benefit from past experience can be solved inductively, saving time and energy, but doing so requires “understanding” and “insight”.

Given that paradigm, the question is what classes of models best support Solomonic learning.

Architectures for Solomonic learning

Solomonic learning requires models that can memorize and perform computation at inference time, in addition to performing ordinary induction. The model architectures therefore need eidetic (verbatim) working memory, which could fade over time, to support computation; but they also need long-term memory to easily retrieve facts from the distant past (the purpose for which humans invented the printing press).

To adapt to changing conditions, they need their long-term memory to decay in synchrony with changes to the mechanisms that generate the data they process. Evolution does that for biological agents, to the benefit of the species rather than any one individual. Transformers, the workhorses of current LLMs, have eidetic (verbatim) memory “in context”, but only until tokens slide out of context. They also have permanent memory “in weights”, but training data are not accessible eidetically from the weights, and there is no long-term adaptation. Eidetic long-term memory can be accessed through RAG (retrieval-augmented generation), but in current Transformers, RAG is not integrated into the primary (autoregressive) inference loop.

Stochastic realization theory and input-dependent state space models

Half a century ago, stochastic realization theory tackled the question of how to model sequential data for downstream decision or control tasks. The “state” of the model was defined as the function of past data that is sufficient for the future, meaning that, given the state, one can discard all past data and predict future data as well as if the data had been retained.

The trivial state is the data itself. An optimal state, by definition, supports an optimal predictor, which is one that makes the prediction error unpredictable. Then, by construction, the state contains all the “information” in past data. During training, the states of LLMs are their weights, so it should be no surprise that next-token prediction is the method of choice for training them. During inference, the state of a Transformer-based LLM is the sliding window of tokens, which is “deadbeat”, meaning that it decays to zero in finite steps without a driving input.

B'MOJO.jpg
In B’MOJO, a state-space model (SSM) computes a fading memory that represents long-range dependencies through a fixed-dimensional representation (pink). The eidetic memory, by contrast, selects tokens from the past (dark-blue x's) using an innovation test over the SSM output and appends them to the current sliding window. Adapted from "B'MOJO: Hybrid state space realizations of foundation models with eidetic and fading memory".

In general, as we observe more and more data during both training and inference, the state must grow apace. In the 1970s, an unbounded state was unthinkable, so the key question was how to find a fixed-dimensional state that is optimal even as the data volume grows to infinity. Therefore, stochastic realization theory focused on Markov processes that admit a finite-dimensional state.

Since any finite-memory sequence could be modeled as the output of a linear model driven by white zero-mean Gaussian noise, the attention was all on linear state-space models (SSMs). While simplistic, such SSMs were good enough to take us to the moon. Today, an unbounded state is not unthinkable. Nonetheless, LLM weights are fixed after training, and the context size is imposed by hardware limitations. So we need richer architecture families.

As an aside, I wish to stress the distinction between the model, which is any state-space realization that supports optimal prediction (there are generally infinitely many), and the system, which is the “real” mechanism that generates the data. The system is unknown and unknowable; the model is tangible and entirely under our control. Although as engineers we are trained to believe that models of the world converge to the “true” system as they improve, this position — known in epistemology as "naïve realism" — is scientifically indefensible.[3]

Amazon’s Stefano Soatto on how learning representations came to dominate machine learning.

To stress the dichotomy between the system and the model, in 1979, Anders Lindqvist and Giorgio Picci derived an equation that, four decades later, is at the heart of diffusion models. In a dissipative physical system, time cannot be reversed, bu it can in a model of that system, for instance a Gaussian SSM. The structure of the reverse diffusion in the model is the same as the forward diffusion, a fact that is exploited in diffusion models for image generation.[4]

Unlike deadbeat Transformers, SSMs have unbounded memory, but it fades, making them incompatible with optimal transductive inference. Again in the 1970s, the late Roger Brockett triggered a burst of interest in input-dependent state-space models, where some of the parameters are affected by the input, the simplest case being when they interact (bi-)linearly with the state. Art Krener showed that such bilinear SSMs can approximate an arbitrarily complex nonlinear (smooth) model. Alberto Isidori and coworkers extended stochastic realization theory to bilinear models, but still with an eye to making the state as small as possible.

Even 30 years later, prior to the deep-learning revolution, when we used input-dependent SSMs to generate videos of dynamic textures, we were still focused on keeping the state dimension as small as possible, encouraged by the fact that 20 states were sufficient to animate and control the rendering of waterfalls, flames, smoke, foliage, talking faces, and other stationary processes. Thanks to the reversibility of the model, we could even make smoke or steam move faster, slower, or backwards!

Deep learning twisted Occam’s razor by trying to make the embedding dimension of the training state (the weights) as large as possible, not as small as possible. Dimension is only an upper bound on “information,” and the key to induction is to limit the “information” in, not the dimension of, the trained weights.[5] Two decades later, we stacked SSMs into a neural architecture by feeding the (input-dependent) prediction residual of one layer to the next.

A breakthrough came with Mamba, which showed that efficient implementation at the hardware level is key. When Mamba is stripped down (as it is in appendix E of our recent paper on architectures to support transductive inference), it is a stack of bilinear SSMs (which Mamba’s developers call “selective state-space models”) restricted to non-interacting states (diagonal dynamics), so it can be implemented efficiently in hardware.

Diagonal SSMs are disjoint from and complementary to Transformers. Autoregressive (AR) Transformers have nilpotent dynamics, meaning that the state transition matrix becomes zero in a finite number of steps in the absence of external input. Mamba has diagonal dynamics, and nilpotent matrices cannot be diagonalized. Diagonal SSMs support infinite fading memory; AR Transformers support finite eidetic memory, and neither is general. Instead, any general (bi-)linear system can be converted to a so-called canonical form, also derived in the 1970s, which can support both eidetic and fading memory.

Meet B’MOJO

B’MOJO is a family of architectures based on canonical realizations that include Transformers, Mamba-like SSMs, and any hybrid combination of the two. There are combinatorially many options, and the name of the game is to find those that are sufficiently general to support different memory regimes yet can be efficiently mapped to specific hardware in order to scale. We plan to release basic versions of B’MOJO both for GPU hardware and for Amazon’s Trainium hardware, so they can be easily compared with existing Transformers, SSMs, and hybrid architectures.

The writing on the wall

While a representation of the “true” system is fundamentally elusive, lending credence to the writing on the wall of John Hopfield’s lab back in 1992, building model realizations is a concrete exercise grounded in data. LLMs, where the “L” refers not to natural language but to the inner language that emerges in the trained model at scale, are stochastic realizations trained inductively as optimal predictors and coopted for (suboptimal) transductive inference and generation. If the training data subtend latent logical structures, as do sensory data such as visual or acoustic data, models trained as optimal predictors are forced to capture their statistical structure.

Related content
From the urgent challenge of "machine unlearning" to overcoming the problem of critical learning periods in deep neural networks, Alessandro Achille is tackling fundamental issues on behalf of Amazon customers.

Thus, LLMs in our parlance include so-called world models trained with visual, acoustic, olfactory, tactile, and other sensory data. The model is indifferent to whether tokenized data express some abstract concept in natural language or a physical measurement process in finite precision. The resulting LLMs can represent concepts and meanings, including physical concepts such as the laws of physics, and can in principle reason, although at present they appear to be mostly building ever bigger lookup tables. Regardless, as stochastic dynamical models, LLMs can be controlled, probed with causal interventions, made observable, and studied with the tools of dynamical-systems theory.

A model is an abstraction of the underlying world — not a representation of it, because there is no objective “it” to re-present, but a realization of it, made real through the only objective entity, which is the data. Synthetic data are just as real to the model as data produced by a physical measurement process, and aligning the two is the essence of perception, for this reason often referred to as controlled hallucination.

While much of the popular discourse denigrates hallucinations[6] as something to be avoided, the ability to hallucinate is necessary for reasoning. The question is not how to avoid hallucinations but how to control them, which is the process of alignment. Architectures designed for decision and control can help, and decades of work in dynamical systems and controls may provide insights — hopefully without the need to resort to divinity, as the writing on the wall suggested.

Footnotes

[1] Note that "best" does not mean "correct." If the data is insufficient to identify the correct conclusion, even the best answer can be wrong.

[2] The simplest form of inductive learning for transductive inference is transductive fine-tuning, a form of meta-learning: past data is used to "meta-train" a model that, at inference time, is fine-tuned with a small number of examples ("few shots") to perform a new task. LLMs take this program steps further, by using sequential data with a latent logical structure (not only natural language but also video, audio, and other signals) to produce an “inner language” (we call it "Neuralese") that can then be co-opted for transductive inference.

[3] Quoting Bertrand Russell: “We all start from 'naïve realism,' i.e., the doctrine that things are what they seem. ... The observer, when he seems to himself to be observing a stone, is really, if physics is to be believed, observing the effects of the stone upon himself. Thus science seems to be at war with itself: when it most means to be objective, it finds itself plunged into subjectivity against its will. Naïve realism leads to physics, and physics, if true, shows that naïve realism is false. Therefore naïve realism, if true, is false; therefore it is false.” Even the International Vocabulary of Metrology has dispensed with the notion of “true value” in its most recent revisions.

[4] In the paper that introduced diffusion models for image generation, the reverse-diffusion equation was attributed to a 1949 work of Feller. However, forward diffusion in the form in use today was not derived until 1960, so neither was reverse diffusion. Later references attribute the reverse-diffusion equation to a 1982 paper by B. D. O. Anderson, which, however, did not introduce it but instead described it, based on the 1979 paper of Lindqvist and Picci, correctly referenced in Anderson’s work, and extended it to more general models different from those in use in diffusion models today. The correct reference for the reverse-diffusion equation used in diffusion models is therefore Lindqvist-Picci 1979.

[5] I use quotes because defining information for the weights of a trained model entails some subtleties, but it can be done.

[6] "Hallucinations" are data generated by a model that are statistically compatible with the training set (in the sense of high likelihood under the trained model), yet "wrong", i.e., individually inconsistent with constraints that some external oracle has deemed "true" ("facts", or "axioms"). In other words, hallucinations are the product of any generative model. Outside formalized domains such as math or code, there is no objective "truth", so the oracle is replaced by an accepted knowledge base, which depends on the application. For "common sense" knowledge, the base is generally a large corpus of (more or less) verified facts, such as WikiData. Outside formalized domains, including the law, there is no guarantee that the facts or "axioms" are mutually compatible.

Research areas

Related content

IN, KA, Bengaluru
Do you want to join an innovative team of scientists who use machine learning and statistical techniques to create state-of-the-art solutions for providing better value to Amazon’s customers? Do you want to build and deploy advanced ML systems that help optimize millions of transactions every day? Are you excited by the prospect of analyzing and modeling terabytes of data to solve real-world problems? Do you like to own end-to-end business problems/metrics and directly impact the profitability of the company? Do you like to innovate and simplify? If yes, then you may be a great fit to join the Machine Learning team for India Consumer Businesses. Machine Learning, Big Data and related quantitative sciences have been strategic to Amazon from the early years. Amazon has been a pioneer in areas such as recommendation engines, ecommerce fraud detection and large-scale optimization of fulfillment center operations. As Amazon has rapidly grown and diversified, the opportunity for applying machine learning has exploded. We have a very broad collection of practical problems where machine learning systems can dramatically improve the customer experience, reduce cost, and drive speed and automation. These include product bundle recommendations for millions of products, safeguarding financial transactions across by building the risk models, improving catalog quality via extracting product attribute values from structured/unstructured data for millions of products, enhancing address quality by powering customer suggestions We are developing state-of-the-art machine learning solutions to accelerate the Amazon India growth story. Amazon India is an exciting place to be at for a machine learning practitioner. We have the eagerness of a fresh startup to absorb machine learning solutions, and the scale of a mature firm to help support their development at the same time. As part of the India Machine Learning team, you will get to work alongside brilliant minds motivated to solve real-world machine learning problems that make a difference to millions of our customers. We encourage thought leadership and blue ocean thinking in ML. Key job responsibilities Use machine learning and analytical techniques to create scalable solutions for business problems Analyze and extract relevant information from large amounts of Amazon’s historical business data to help automate and optimize key processes Design, develop, evaluate and deploy, innovative and highly scalable ML models Work closely with software engineering teams to drive real-time model implementations Work closely with business partners to identify problems and propose machine learning solutions Establish scalable, efficient, automated processes for large scale data analyses, model development, model validation and model maintenance Work proactively with engineering teams and product managers to evangelize new algorithms and drive the implementation of large-scale complex ML models in production Leading projects and mentoring other scientists, engineers in the use of ML techniques About the team International Machine Learning Team is responsible for building novel ML solutions that attack India first (and other Emerging Markets across MENA and LatAm) problems and impact the bottom-line and top-line of India business. Learn more about our team from https://www.amazon.science/working-at-amazon/how-rajeev-rastogis-machine-learning-team-in-india-develops-innovations-for-customers-worldwide
US, WA, Seattle
Prime Video is a first-stop entertainment destination offering customers a vast collection of premium programming in one app available across thousands of devices. Prime members can customize their viewing experience and find their favorite movies, series, documentaries, and live sports – including Amazon MGM Studios-produced series and movies; licensed fan favorites; and programming from Prime Video add-on subscriptions such as Apple TV+, Max, Crunchyroll and MGM+. All customers, regardless of whether they have a Prime membership or not, can rent or buy titles via the Prime Video Store, and can enjoy even more content for free with ads. Are you interested in shaping the future of entertainment? Prime Video's technology teams are creating best-in-class digital video experience. As a Prime Video technologist, you’ll have end-to-end ownership of the product, user experience, design, and technology required to deliver state-of-the-art experiences for our customers. You’ll get to work on projects that are fast-paced, challenging, and varied. You’ll also be able to experiment with new possibilities, take risks, and collaborate with remarkable people. We’ll look for you to bring your diverse perspectives, ideas, and skill-sets to make Prime Video even better for our customers. With global opportunities for talented technologists, you can decide where a career Prime Video Tech takes you! We are forming a new organization within Prime Video to redefine our operational landscape through the power of artificial intelligence. As a Applied Scientist within this initiative, you will be a technical leader helping to design and build the intelligent systems that power our vision. You will tackle complex and ambiguous problems, designing and delivering scalable and resilient agentic AI and ML solutions from the ground up. You will not only write high-quality, maintainable software and models, but also mentor other scientists, influence our technical strategy, and drive engineering best practices across the team. Your work will directly contribute to making Prime Video's operations more efficient and will set the technical foundation for years to come. We're seeking candidates with strong experience in computer vision and generative AI technologies. In this role, you'll apply cutting-edge techniques in image and video understanding, visual content generation, and multimodal AI systems to transform how Prime Video operates at scale. Key job responsibilities • Lead the design and architecture of highly scalable, available, and resilient services for our AI automation platform. • Write high-quality, maintainable, and robust code to solve complex business problems, building flexible systems without over-engineering. • Act as a technical leader and mentor for other engineers on the team, assisting with career growth and encouraging excellence. • Work through ambiguous requirements, cut through complexity, and translate business needs into scalable technical solutions. • Take ownership of the full software development lifecycle, including design, testing, deployment, and operations. • Work closely with product managers, scientists, and other engineers to build and launch new features and systems. About the team This role offers a unique opportunity to shape the future of one of Amazon's most exciting businesses through the application of AI technologies. If you're passionate about leveraging AI to drive real-world impact at massive scale, we want to hear from you.
ES, B, Barcelona
Are you a scientist passionate about advancing the frontiers of computer vision, machine learning, or large language models? Do you want to work on innovative research projects that lead to innovative products and scientific publications? Would you value access to extensive datasets? If you answer yes to any of these questions, you'll find a great fit at Amazon. We're seeking a hands-on researcher eager to derive, implement, and test the next generation of Generative AI, computer vision, ML, and NLP algorithms. Our research is innovative, multidisciplinary, and far-reaching. We aim to define, deploy, and publish pioneering research that pushes the boundaries of what's possible. To achieve our vision, we think big and tackle complex technological challenges at the forefront of our field. Where technology doesn't exist, we create it. Where it does, we adapt it to function at Amazon's scale. We need team members who are passionate, curious, and willing to learn continuously. Key job responsibilities * Derive novel computer vision and ML models or LLMs/VLMs. * Design and develop scalable ML models. * Create and work with large datasets * Work with large GPU clusters. * Work closely with software engineering teams to deploy your innovations. * Publish your work at major conferences/journals. * Mentor team members in the use of your AI models. A day in the life As a Senior Applied Scientist at Amazon, your typical day might look like this: * Dive into coding, deriving new ML models for computer vision or NLP * Experiment with massive datasets on our GPU clusters * Brainstorm with your team to solve complex AI challenges * Collaborate with engineers to turn your research into real products * Write up your findings for publication in top journals or conferences * Mentor junior team members on AI concepts and implementation About the team DiscoVision, a science unit within Amazon's UPMT, focuses on advancing visual content capabilities through state-of-the-art AI technology. Our team specializes in developing state-of-the-art technologies in text-to-image/video Generative AI, 3D modeling, and multimodal Large Language Models (LLMs).
US, WA, Seattle
Are you excited to help customers discover the hottest and best reviewed products? The Discovery Tech team helps customers discover and engage with new, popular and relevant products across Amazon worldwide. We do this by combining technology, science, and innovation to build new customer-facing features and experiences alongside advanced tools for marketers. You will be responsible for creating and building critical services that automatically generate, target, and optimize Amazon’s cross-category marketing and merchandising. Through the enablement of intelligent marketing campaigns that leverage machine-learning models, you will help to deliver the best possible shopping experience for Amazon’s customers all over the globe. We are looking for analytical problem solvers who enjoy diving into data, excited about data science and statistics, can multi-task, and can credibly interface between engineering teams and business stakeholders. Your analytical abilities, business understanding, and technical savvy will be used to identify specific and actionable opportunities to solve existing business problems and look around corners for future opportunities. Your domain spans the design, development, testing, and deployment of data-driven and highly scalable machine learning solutions in product recommendation. As an Applied Scientist, you bring business and industry context to science and technology decisions. You set the standard for scientific excellence and make decisions that affect the way we build and integrate algorithms. Your solutions are exemplary in terms of algorithm design, clarity, model structure, efficiency, and extensibility. You tackle intrinsically hard problems, acquiring expertise as needed. You decompose complex problems into straightforward solutions. To know more about Amazon science, please visit https://www.amazon.science
IN, TS, Hyderabad
Do you want to join an innovative team of scientists who leverage machine learning and statistical techniques to revolutionize how businesses discover and purchase products on Amazon? Are you passionate about building intelligent systems that understand and predict complex B2B customer needs? The Amazon Business team is looking for exceptional Applied Science to help shape the future of B2B commerce. Amazon Business is one of Amazon's fastest-growing initiatives focused on serving business customers, from individual professionals to large institutions, with unique and complex purchasing needs. Our customers require sophisticated solutions that go beyond traditional B2C experiences, including bulk purchasing, approval workflows, and business-grade service support. The AB-MSET Applied Science team focuses on building intelligent systems for delivering personalized, contextual service experiences throughout the customer lifecycle. We apply advanced machine learning techniques to develop sophisticated intent detection models for business customer service needs, create intelligent matching algorithms for optimal service routing based on multiple variables including customer value, maturity, effort, and issue complexity, build predictive models to enable proactive service interventions, design recommendation systems for self-service solutions, and develop ML models for automated service resolution. As an Applied Scientist on the team, you will design and develop state-of-the-art ML models for service intent classification, routing optimization, and customer experience personalization. You will analyze large-scale business customer interaction data to identify patterns and opportunities for automation, create scalable solutions for complex B2B service scenarios using advanced ML techniques, and work closely with engineering teams to implement and deploy models in production. You will collaborate with business stakeholders to identify opportunities for ML applications, establish automated processes for model development, validation, and maintenance, lead research initiatives to advance the state-of-the-art in B2B service science, and mentor other scientists and engineers in applying ML techniques to business problems.
US, WA, Seattle
We are seeking a Principal Applied Scientist to lead research and development in automated reasoning, formal verification, and program analysis. You will drive innovation in making formal methods practical and accessible for real-world systems at cloud scale. Key job responsibilities - Lead research initiatives in automated reasoning, formal verification, SMT solving, model checking, or program analysis - Design and implement novel algorithms and techniques that advance the state of the art - Mentor and guide applied scientists, research scientists, and engineers - Collaborate with product teams to transition research into production systems - Define technical vision and strategy for automated reasoning initiatives - Represent AWS in the academic and research community - Drive cross-organizational impact through technical leadership About the team The Automated Reasoning Group at AWS develops and applies cutting-edge formal methods and automated reasoning techniques to ensure the security, reliability, and correctness of AWS services and customer applications. Our work innovates tools and services to perform verification at scale and apply them to build safe and secure systems at AWS. We are also pioneering the use of formal verification and automated reasoning to develop agentic systems, ensuring AI agents operate within defined safety boundaries.
US, WA, Bellevue
As an Applied Scientist on our Central Learning Solutions Team, you will play a critical role in driving the design, development, and delivery of learning programs and initiatives aimed at enhancing leadership and associate development within the organization. You will leverage your expertise in learning science, data analysis, and statistical model design to create impactful learning journey roadmap that align with organizational goals and priorities. Key job responsibilities Research and Analysis: - Conduct research on learning and development trends, theories, and best practices related to leadership and associate development - Analyze data to identify learning needs, performance gaps, and opportunities for improvement within the organization. - Use data-driven insights to inform the design and implementation of learning interventions. Program Design and Development: - Collaborate with cross-functional teams to develop comprehensive learning programs focused on leadership development and associate growth - Design learning experiences using evidence-based instructional strategies, adult learning principles, and innovative technologies. - Create engaging and interactive learning materials, including e-learning modules, instructor-led workshops, and multimedia resources. Evaluation and Continuous Improvement: - Develop evaluation frameworks to assess the effectiveness and impact of learning programs on leadership development and associate performance - Collect and analyze feedback from participants and stakeholders to identify strengths, areas for improvement, and future learning needs. - Iterate on learning interventions based on evaluation results and feedback to continuously improve program outcomes Thought Leadership and Collaboration: - Serve as a subject matter expert on learning science, instructional design, and leadership development within the organization - Collaborate with stakeholders across the company to align learning initiatives with strategic priorities and business objectives - Share knowledge and best practices with colleagues to foster a culture of continuous learning and development.
US, WA, Bellevue
Amazon Leo is an initiative to increase global broadband access through a constellation of 3,236 satellites in low Earth orbit (LEO). Its mission is to bring fast, affordable broadband to unserved and underserved communities around the world. Amazon Leo will help close the digital divide by delivering fast, affordable broadband to a wide range of customers, including consumers, businesses, government agencies, and other organizations operating in places without reliable connectivity. Do you get excited by aerospace, space exploration, and/or satellites? Do you want to help build solutions at Amazon Leo to transform the space industry? If so, then we would love to talk! Key job responsibilities Work cross-functionally with product, business development, and various technical teams (engineering, science, simulations, etc.) to execute on the long-term vision, strategy, and architecture for the science-based global demand forecast. Design and deliver modern, flexible, scalable solutions to integrate data from a variety of sources and systems (both internal and external) and develop Bandwidth Usage models at granular temporal and geographic grains, deployable to Leo traffic management systems. Work closely with the capacity planning science team to ensure that demand forecasts feed seamlessly into their systems to deliver continuous optimization of resources. Lead short and long terms technical roadmap definition efforts to deliver solutions that meet business needs in pre-launch, early-launch, and mature business phases. Synthesize and communicate insights and recommendations to audiences of varying levels of technical sophistication to drive change across Amazon Leo. Export Control Requirement: Due to applicable export control laws and regulations, candidates must be a U.S. citizen or national, U.S. permanent resident (i.e., current Green Card holder), or lawfully admitted into the U.S. as a refugee or granted asylum. About the team The Amazon Leo Global Demand Planning team's mission is to map customer demand across space and time. We enable Amazon Leo's long-term success by delivering actionable insights and scientific forecasts across geographies and customer segments to empower long range planning, capacity simulations, business strategy, and hardware manufacturing recommendations through scalable tools and durable mechanisms.
US, CA, Pasadena
Do you enjoy solving challenging problems and driving innovations in research? As a Research Science intern with the Quantum Algorithms Team at CQC, you will work alongside global experts to develop novel quantum algorithms, evaluate prospective applications of fault-tolerant quantum computers, and strengthen the long-term value proposition of quantum computing. A strong candidate will have experience applying methods of mathematical and numerical analysis to assess the performance of quantum algorithms and establish their advantage over classical algorithms. Key job responsibilities We are particularly interested in candidates with expertise in any of the following subareas related to quantum algorithms: quantum chemistry, many-body physics, quantum machine learning, cryptography, optimization theory, quantum complexity theory, quantum error correction & fault tolerance, quantum sensing, and scientific computing, among others. A day in the life Throughout your journey, you'll have access to unparalleled resources, including state-of-the-art computing infrastructure, cutting-edge research papers, and mentorship from industry luminaries. This immersive experience will not only sharpen your technical skills but also cultivate your ability to think critically, communicate effectively, and thrive in a fast-paced, innovative environment where bold ideas are celebrated. Diverse Experiences AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying. Why AWS? Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating — that’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses. Inclusive Team Culture Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empower us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness. Mentorship & Career Growth We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional. Work/Life Balance We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud. Hybrid Work We value innovation and recognize this sometimes requires uninterrupted time to focus on a build. We also value in-person collaboration and time spent face-to-face. Our team affords employees options to work in the office every day or in a flexible, hybrid work model near one of our U.S. Amazon offices. This is not a remote internship opportunity. About the team Amazon Web Services (AWS) Center for Quantum Computing (CQC) is a multi-disciplinary team of theoretical and experimental physicists, materials scientists, and hardware and software engineers on a mission to develop a fault-tolerant quantum computer.
US, CA, Pasadena
We’re on the lookout for the curious, those who think big and want to define the world of tomorrow. At Amazon, you will grow into the high impact, visionary person you know you’re ready to be. Every day will be filled with exciting new challenges, developing new skills, and achieving personal growth. How often can you say that your work changes the world? At Amazon, you’ll say it often. Join us and define tomorrow. The Amazon Web Services (AWS) Center for Quantum Computing (CQC) in Pasadena, CA, is looking for a Quantum Research Scientist Intern in the Device and Architecture Theory group. You will be joining a multi-disciplinary team of scientists, engineers, and technicians, all working at the forefront of quantum computing to innovate for the benefit of our customers. Key job responsibilities As an intern with the Device and Architecture Theory team, you will conduct pathfinding theoretical research to inform the development of next-generation quantum processors. Potential focus areas include device physics of superconducting circuits, novel qubits and gate schemes, and physical implementations of error-correcting codes. You will work closely with both theorists and experimentalists to explore these directions. We are looking for candidates with excellent problem-solving and communication skills who are eager to work collaboratively in a team environment. Amazon Science gives you insight into the company’s approach to customer-obsessed scientific innovation. Amazon fundamentally believes that scientific innovation is essential to being the most customer-centric company in the world. It’s the company’s ability to have an impact at scale that allows us to attract some of the brightest minds in quantum computing and related fields. Our scientists continue to publish, teach, and engage with the academic community, in addition to utilizing our working backwards method to enrich the way we live and work. A day in the life Why AWS? Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating — that’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses. AWS Utility Computing (UC) provides product innovations — from foundational services such as Amazon’s Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2), to consistently released new product innovations that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Internet of Things (Iot), Platform, and Productivity Apps services in AWS. Within AWS UC, Amazon Dedicated Cloud (ADC) roles engage with AWS customers who require specialized security solutions for their cloud services. Inclusive Team Culture Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empower us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness. Diverse Experiences AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying. Mentorship & Career Growth We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional. Work/Life Balance We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud. Export Control Requirement: Due to applicable export control laws and regulations, candidates must be either a U.S. citizen or national, U.S. permanent resident (i.e., current Green Card holder), or lawfully admitted into the U.S. as a refugee or granted asylum, or be able to obtain a US export license. If you are unsure if you meet these requirements, please apply and Amazon will review your application for eligibility.