Computing on private data

Both secure multiparty computation and differential privacy protect the privacy of data used in computation, but each has advantages in different contexts.

Many of today’s most innovative computation-based products and solutions are fueled by data. Where those data are private, it is essential to protect them and to prevent the release of information about data subjects, owners, or users to the wrong parties. How can we perform useful computations on sensitive data while preserving privacy?

Related content
Technique that mixes public and private training data can meet differential-privacy criteria while cutting error increase by 60%-70%.

We will revisit two well-studied approaches to this challenge: secure multiparty computation (MPC) and differential privacy (DP). MPC and DP were invented to address different real-world problems and to achieve different technical goals. However, because they are both aimed at using private information without fully revealing it, they are often confused. To help draw a distinction between the two approaches, we will discuss the power and limitations of both and give typical scenarios in which each can be highly effective.

We are interested in scenarios in which multiple individuals (sometimes, society as a whole) can derive substantial utility from a computation on private data but, in order to preserve privacy, cannot simply share all of their data with each other or with an external party.

Secure multiparty computation

MPC methods allow a group of parties to collectively perform a computation that involves all of their private data while revealing only the result of the computation. More formally, an MPC protocol enables n parties, each of whom possesses a private dataset, to compute a function of the union of their datasets in such a way that the only information revealed by the computation is the output of the function. Common situations in which MPC can be used to protect private interests include

  • auctions: the winning bid amount should be made public, but no information about the losing bids should be revealed;
  • voting: the number of votes cast for each option should be made public but not the vote cast by any one individual;
  • machine learning inference: secure two-party computation enables a client to submit a query to a server that holds a proprietary model and receive a response, keeping the query private from the server and the model private from the client.
Related content
New approach to homomorphic encryption speeds up the training of encrypted machine learning models sixfold.

Note that the number n of participants can be quite small (e.g., two in the case of machine learning inference), moderate in size, or very large; the latter two size ranges both occur naturally in auctions and votes. Similarly, the participants may be known to each other (as they would be, for example, in a departmental faculty vote) or not (as, for example, in an online auction). MPC protocols mathematically guarantee the secrecy of input values but do not attempt to hide the identities of the participants; if anonymous participation is desired, it can be achieved by combining MPC with an anonymous-communication protocol.

Although MPC may seem like magic, it is implementable and even practical using cryptographic and distributed-computing techniques. For example, suppose that Alice, Bob, Carlos, and David are four engineers who want to compare their annual raises. Alice selects four random numbers that sum to her raise. She keeps one number to herself and gives each of the other three to one of the other engineers. Bob, Carlos, and David do the same with their own raises.

Secure multiparty computation
Four engineers wish to compute their average raise, without revealing any one engineer's raise to the others. Each selects four numbers that sum to his or her raise and sends three of them to the other engineers. Each engineer then sums his or her four numbers — one private number and three received from the others. The sum of all four engineers' sums equals the sum of all four raises.

After everyone has distributed the random numbers, each engineer adds up the numbers he or she is holding and sends the sum to the others. Each engineer adds up these four sums privately (i.e., on his or her local machine) and divides by four to get the average raise. Now they can all compare their raises to the team average.


Amount

Alice’s share

Bob’s share

Carlos’s share

David’s share

Sum of sums

Alice’s raise

3800

-1000

2500

900

1400


Bob’s raise

2514

700

400

650

764


Carlos’s raise

2982

750

-100

832

1500


David’s raise

3390

1500

900

-3000

3990


Sum

12686

1950

3700

-618

7654

12686

Average

3171.5





3171.5

Note that, because Alice (like Bob, Carlos, and David) kept part of her raise private (the bold numbers), no one else learned her actual raise. When she summed the numbers she was holding, the sum didn’t correspond to anyone’s raise. In fact, Bob’s sum was negative, because all that matters is that the four chosen numbers add up to the raise; the sign and magnitude of these four numbers are irrelevant.

Summing all of the engineers’ sums results in the same value as summing the raises directly, namely $12,686. If all of the engineers follow this protocol faithfully, dividing this value by four yields the team average raise of $3,171.50, which allows each person to compare his or her raise against the team average (locally and hence privately) without revealing any salary information.

A highly readable introduction to MPC that emphasizes practical protocols, some of which have been deployed in real-world scenarios, can be found in a monograph by Evans, Kolesnikov, and Rosulek. Examples of real-world applications that have been deployed include analysis of gender-based wage gaps in Boston-area companies, aggregate adoption of cybersecurity measures, and Covid exposure notification. Readers may also wish to read our previous blog post on this and related topics.

Differential privacy

Differential privacy (DP) is a body of statistical and algorithmic techniques for releasing an aggregate function of a dataset without revealing the mapping between data contributors and data items. As in MPC, we have n parties, each of whom possesses a data item. Either the parties themselves or, more often, an external agent wishes to compute an aggregate function of the parties’ input data.

Related content
Calibrating noise addition to word density in the embedding space improves utility of privacy-protected text.

If this computation is performed in a differentially private manner, then no information that could be inferred from the output about the ith input, xi, can be associated with the individual party Pi. Typically, the number n of participants is very large, the participants are not known to each other, and the goal is to compute a statistical property of the set {x1, …, xn} while protecting the privacy of individual data contributors {P1, …, Pn}.

In slightly more detail, we say that a randomized algorithm M preserves differential privacy with respect to an aggregation function f if it satisfies two properties. First, for every set of input values, the output of M closely approximates the value of f. Second, for every distinct pair (xi, xi') of possible values for the ith individual input, the distribution of M(x1, …, xi,…, xn) is approximately equivalent to the distribution of M(x1, …, xi′, …, xn). The maximum “distance” between the two distributions is characterized by a parameter, ϵ, called the privacy parameter, and M is called an ϵ-differentially private algorithm.

Note that the output of a differentially private algorithm is a random variable drawn from a distribution on the range of the function f. That is because DP computation requires randomization; in particular, it works by “adding noise.” All known DP techniques introduce a salient trade-off between the privacy parameter and the utility of the output of the computation. Smaller values of ϵ produce better privacy guarantees, but they require more noise and hence produce less-accurate outputs; larger values of ϵ yield worse privacy bounds, but they require less noise and hence deliver better accuracy.

For example, consider a poll, the goal of which is to predict who is going to win an election. The pollster and respondents are willing to sacrifice some accuracy in order to improve privacy. Suppose respondents P1, …, Pn have predictions x1, …, xn, respectively, where each xi is either 0 or 1. The poll is supposed to output a good estimate of p, which we use to denote the fraction of the parties who predict 1. The DP framework allows us to compute an accurate estimate and simultaneously to preserve each respondent’s “plausible deniability” about his or her true prediction by requiring each respondent to add noise before sending a response to the pollster.

Related content
Private aggregation of teacher ensembles (PATE) leads to word error rate reductions of more than 26% relative to standard differential-privacy techniques.

We now provide a few more details of the polling example. Consider the algorithm m that takes as input a bit xi and flips a fair coin. If the coin comes up tails, then m outputs xi; otherwise m flips another fair coin and outputs 1 if heads and 0 if tails. This m is known as the randomized response mechanism; when the pollster asks Pi for a prediction, Pi responds with m(xi). Simple statistical calculation shows that, in the set of answers that the pollster receives from the respondents, the expected fraction that are 1’s is

Pr[First coin is tails] ⋅ p + Pr[First coin is heads] ⋅ Pr[Second coin is heads] = p/2 + 1/4.

Thus, the expected number of 1’s received is n(p/2 + 1/4). Let N = m(x1) + ⋅⋅⋅ + m(xn) denote the actual number of 1’s received; we approximate p by M(x1, …, xn) = 2N/n − 1/2. In fact, this approximation algorithm, M, is differentially private. Accuracy follows from the statistical calculation, and privacy follows from the “plausible deniability” provided by the fact that M outputs 1 with probability at least 1/4 regardless of the value of xi.

Differential privacy has dominated the study of privacy-preserving statistical computation since it was introduced in 2006 and is widely regarded as a fundamental breakthrough in both theory and practice. An excellent overview of algorithmic techniques in DP can be found in a monograph by Dwork and Roth. DP has been applied in many real-world applications, most notably the 2020 US Census.

The power and limitations of MPC and DP

We now review some of the strengths and weaknesses of these two approaches and highlight some key differences between them.

Secure multiparty computation

MPC has been extensively studied for more than 40 years, and there are powerful, general results showing that it can be done for all functions f using a variety of cryptographic and coding-theoretic techniques, system models, and adversary models.

Despite the existence of fully general, secure protocols, MPC has seen limited real-world deployment. One obstacle is protocol complexity — particularly the communication complexity of the most powerful, general solutions. Much current work on MPC addresses this issue.

Related content
A privacy-preserving version of the popular XGBoost machine learning algorithm would let customers feel even more secure about uploading sensitive data to the cloud.

More-fundamental questions that must be answered before MPC can be applied in a given scenario include the nature of the function f being computed and the information environment in which the computation is taking place. In order to explain this point, we first note that the set of participants in the MPC computation is not necessarily the same as the set of parties that receive the result of the computation. The two sets may be identical, one may be a proper subset of the other, they may have some (but not all) elements in common, or they may be entirely disjoint.

Although a secure MPC protocol (provably!) reveals nothing to the recipients about the private inputs except what can be inferred from the result, even that may be too much. For example, if the result is the number of votes for and votes against a proposition in a referendum, and the referendum passes unanimously, then the recipients learn exactly how each participant voted. The referendum authority can avoid revealing private information by using a different f, e.g., one that is “YES” if the number of votes for the proposition is at least half the number of participants and “NO” if it is less than half.

This simple example demonstrates a pervasive trade-off in privacy-preserving computation: participants can compute a function that is more informative if they are willing to reveal private information to the recipients in edge cases; they can achieve more privacy in edge cases if they are willing to compute a less informative function.

In addition to specifying the function f carefully, users of MPC must evaluate the information environment in which MPC is to be deployed and, in particular, must avoid the catastrophic loss of privacy that can occur when the recipients combine the result of the computation with auxiliary information. For example, consider the scenario in which the participants are all of the companies in a given commercial sector and metropolitan area, and they wish to use MPC to compute the total dollar loss that they (collectively) experienced in a given year that was attributable to data breaches; in this example, the recipients of the result are the companies themselves.

Related content
Scientists describe the use of privacy-preserving machine learning to address privacy challenges in XGBoost training and prediction.

Suppose further that, during that year, one of the companies suffered a severe breach that was covered in the local media, which identified the company by name and reported an approximate dollar figure for the loss that the company suffered as a result of the breach. If that approximate figure is very close to the total loss imposed by data breaches on all the companies that year, then the participants can conclude that all but one of them were barely affected by data breaches that year.

Note that this potentially sensitive information is not leaked by the MPC protocol, which reveals nothing but the aggregate amount lost (i.e., the value of the function f). Rather, it is inferred by combining the result of the computation with information that was already available to the participants before the computation was done. The same risk that input privacy will be destroyed when results are combined with auxiliary information is posed by any computational method that reveals the exact value of the function f.

Differential privacy

The DP framework provides some elegant, simple mechanisms that can be applied to any function f whose output is a vector of real numbers. Essentially, one can independently perturb or “noise up” each component of f(x) by an appropriately defined random value. The amount of noise that must be added in order to hide the contribution (or, indeed, the participation) of any single data subject is determined by the privacy parameter and the maximum amount by which a single input can change the output of f. We explain one such mechanism in slightly more mathematical detail in the following paragraph.

One can apply the Laplace mechanism with privacy parameter ϵ to a function f, whose outputs are k-tuples of real numbers, by returning the value f(x1, …, xn) + (Y1, …, Yk) on input (x1, …, xn), where the Yi are independent random variables drawn from the Laplace distribution with parameter Δ(f)/ϵ. Here Δ(f) denotes the 1sensitivity of the function f, which captures the magnitude by which a single individual’s data can change the output of f in the worst case. The technical definition of the Laplace distribution is beyond the scope of this article, but for our purposes, its important property is that the Yi can be sampled efficiently.

Related content
The team’s latest research on privacy-preserving machine learning, federated learning, and bias mitigation.

Crucially, DP protects data contributors against privacy loss caused by post-processing computational results or by combining results with auxiliary information. The scenario in which privacy loss occurred when the output of an MPC protocol was combined with information from an existing news story could not occur in a DP application; moreover, no harm could be done by combining the result of a DP computation with auxiliary information in a future news story.

DP techniques also benefit from powerful composition theorems that allow separate differentially private algorithms to be combined in one application. In particular, the independent use of an ϵ1-differentially private algorithm and an ϵ2-differentially private algorithm, when taken together, is (ϵ1 + ϵ2)-differentially private.

One limitation on the applicability of DP is the need to add noise — something that may not be tolerable in some application scenarios. More fundamentally, the ℓ1 sensitivity of a function f, which yields an upper bound on the amount of noise that must be added to the output in order to achieve a given privacy parameter ϵ, also yields a lower bound. If the output of f is strongly influenced by the presence of a single outlier in the input, then it is impossible to achieve strong privacy and high accuracy simultaneously.

For example, consider the simple case in which f is the sum of all of the private inputs, and each input is an arbitrary positive integer. It is easy to see that the ℓ1 sensitivity is unbounded in this case; to hide the contribution or the participation of an individual whose data item strongly dominates those of all other individuals would require enough noise to render the output meaningless. If one can restrict all of the private inputs to a small interval [a,b], however, then the Laplace mechanism can provide meaningful privacy and accuracy.

DP was originally designed to compute statistical aggregates while preserving the privacy of individual data subjects; in particular, it was designed with real-valued functions in mind. Since then, researchers have developed DP techniques for non-numerical computations. For example, the exponential mechanism can be used to solve selection problems, in which both input and output are of arbitrary type.

Related content
Amazon is helping develop standards for post-quantum cryptography and deploying promising technologies for customers to experiment with.

In specifying a selection problem, one must define a scoring function that maps input-output pairs to real numbers. For each input x, a solution y is better than a solution y′ if the score of (x,y) is greater than that of (x,y′). The exponential mechanism generally works well (i.e., achieves good privacy and good accuracy simultaneously) for selection problems (e.g., approval voting) that can be defined by scoring functions of low sensitivity but not for those (e.g., set intersection) in which the scoring function must have high sensitivity. In fact, there is no differentially private algorithm that works well for set intersection; by contrast, MPC for set intersection is a mature and practical technology that has seen real-world deployment.

Conclusion

In conclusion, both secure multiparty computation and differential privacy can be used to perform computations on sensitive data while preserving the privacy of those data. Important differences between the bodies of technique include

  • The nature of the privacy guarantee: Use of MPC to compute a function y = f(x1, x2, ..., xn) guarantees that the recipients of the result learn the output y and nothing more. For example, if there are exactly two input vectors that are mapped to y by f, the recipients of the output y gain no information about which of two was the actual input to the MPC computation, regardless of the number of components in which these two input vectors differ or the magnitude of the differences. On the other hand, for any third input vector that does not map to y, the recipient learns with certainty that the real input to the MPC computation was not this third vector, even if it differs from one of the first two in only one component and only by a very small amount. By contrast, computing f with a DP algorithm guarantees that, for any two input vectors that differ in only one component, the (randomized!) results of the computation are approximately indistinguishable, regardless of whether the exact values of f on these two input vectors are equal, nearly equal, or extremely different. Straightforward use of composition yields a privacy guarantee for inputs that differ in c components at the expense of increasing the privacy parameter by a factor of c.
  • Typical use cases: DP techniques are most often used to compute aggregate properties of very large datasets, and typically, the identities of data contributors are not known. None of these conditions is typical of MPC use cases.
  • Exact vs. noisy answers: MPC can be used to compute exact answers for all functions f. DP requires the addition of noise. This is not a problem in many statistical computations, but even small amounts of noise may not be acceptable in some application scenarios. Moreover, if f is extremely sensitive to outliers in the input data, the amount of noise needed to achieve meaningful privacy may preclude meaningful accuracy.
  • Auxiliary information: Combining the result of a DP computation with auxiliary information cannot result in privacy loss. By contrast, any computational method (including MPC) that returns the exact value y of a function f runs the risk that a recipient of y might be able to infer something about the input data that is not implied by y alone, if y is combined with auxiliary information.

Finally, we would like to point out that, in some applications, it is possible to get the benefits of both MPC and DP. If the goal is to compute f, and g is a differentially private approximation of f that achieves good privacy and accuracy simultaneously, then one natural way to proceed is to use MPC to compute g. We expect to see both MPC and DP used to enhance data privacy in Amazon’s products and services.

Related content

US, CA, San Francisco
Amazon Industrial Robotics is on a mission to redefine the future of automation — and we're looking for exceptional talent to help lead the way. We are building the next generation of advanced robotic systems that seamlessly blend cutting-edge AI, sophisticated control systems, and novel mechanical design to create adaptable, intelligent automation solutions capable of operating safely alongside humans in dynamic, real-world environments. At Amazon Industrial Robotics, we leverage the power of machine learning, artificial intelligence, and advanced robotics to solve some of the most complex operational challenges at a scale unlike anywhere else in the world. Our fleet of robots spans hundreds of facilities globally, working in sophisticated coordination to deliver on our promise of customer excellence — and we're just getting started. As a Sr. Applied Scientist in Robot Perception, you will be at the forefront of this transformation. You will develop and deploy state-of-the-art perception algorithms that enable robots to truly understand and interact with the physical world — bridging the gap between theoretical research and realworld impact. Bringing deep expertise in Computer Vision and a nuanced understanding of the capabilities and limitations of modern Vision-Language Models (VLMs), you will innovate boldly and push the boundaries of what's possible. Our vision for the Perception layer is ambitious: to enable seamless, intelligent interaction between the user, the robot, and its environment. This is a rare opportunity to work at the intersection of deep learning, large language models, and robotics — contributing to research that doesn't just advance the field, but reshapes it. You will collaborate with world-class teams pioneering breakthroughs in dexterous manipulation, locomotion, and humanrobot interaction, all at an unprecedented scale. Key job responsibilities Design, develop, and deploy perception algorithms for robotics systems, including object detection, segmentation, tracking, depth estimation, and scene understanding • Lead research initiatives in computer vision, sensor fusion and 3D perception • Collaborate with cross-functional teams including robotics engineers, software engineers, and product managers to define and deliver perception capabilities • Drive end-to-end ownership of ML models — from data collection and labeling strategy to training, evaluation, and deployment • Mentor junior scientists and engineers; contribute to a culture of technical excellence • Define and track key metrics to measure perception system performance in real-world environments • Publish research findings in top-tier venues (CVPR, ICCV, ECCV, ICRA, NeurIPS, etc.) and contribute to patents A day in the life Train ML models for deployment in simulation and real-world robots, identify and document their limitations post-deployment • Drive technical discussions within your team and with key stakeholders to develop innovative solutions to address identified limitations • Actively contribute to brainstorming sessions on adjacent topics, bringing fresh perspectives that help peers grow and succeed — and in doing so, build lasting trust across the team • Mentor team members while maintaining significant hands-on contribution to technical solutions About the team Our Industrial Robotics Group is a diverse group of scientists and engineers passionate about building intelligent machines. We value curiosity, rigor, and a bias for action. We believe in learning from failure and iterating quickly toward solutions that matter.
US, CA, Pasadena
The Amazon Center for Quantum Computing in Pasadena, CA, is looking to hire a Fabrication R&D Scientist with experience in semiconductor process development who will aid in Amazon’s effort to bring cloud quantum computing services to its worldwide customer base. You will join a multi-disciplinary team of scientists, and hardware and software engineers working at the forefront of quantum computing. Through your work inside and outside of the cleanroom environment in the fabrication research and development group, you will solve problems related to developing next-generation quantum processors. Candidates must have a demonstrated background in sound scientific and engineering principles, and must have excellent data analysis, bias for action, problem solving, and communication skills, and be highly motivated and curious to research and learn new technical topics as needed. As a Fab R&D scientist you will be expected to work on new ideas and stay abreast of novel approaches in fabricating and packaging superconducting quantum processors. Working effectively within a team environment is critical. Diverse Experiences Amazon values diverse experiences. Even if you do not meet all the preferred qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying. Work/Life Balance Our team puts a high value on work-life balance. It isn’t about how many hours you spend at home or at work; it’s about the flow you establish that brings energy to both parts of your life. We believe striking the right balance between your personal and professional life is critical to life-long happiness and fulfillment. We offer flexibility in working hours and encourage you to find your own balance between your work and personal lives. Mentorship & Career Growth We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional. Export Control Requirement Due to applicable export control laws and regulations, candidates must be either a U.S. citizen or national, U.S. permanent resident (i.e., current Green Card holder), or lawfully admitted into the U.S. as a refugee or granted asylum, or be able to obtain a US export license. If you are unsure if you meet these requirements, please apply and Amazon will review your application for eligibility. Key job responsibilities Responsibilities include developing and optimizing processes to fabricate high-coherence superconducting qubits; developing advanced 3DI interconnect and routing technologies for integrating superconducting quantum technologies; analyzing inline metrology and electrical test data; developing and maintaining integration documentation, design rules, and standard operating procedures; interacting with project leads to provide feedback that continuously improves different processes; staying updated with the latest advancements and industry trends in process integration and apply knowledge to improve processes and drive innovation providing technical guidance and support to junior colleagues, fostering a collaborative and knowledge-sharing work environment. A day in the life The candidate will develop novel technologies using micro-/nano-fabrication techniques inside the cleanroom (independently or in collaboration with other scientists, engineers, and technicians) for next-generation quantum computing. Outside the cleanroom, the candidate will plan experiments, analyze data, and conceive future innovations.
US, WA, Bellevue
The Supply Chain Optimization Technologies (SCOT) team builds technology to automate and optimize Amazon’s supply chain of physical goods. We seek a Data Scientist with strong analytical and communication skills to join our team. SCOT manages Amazon's inventory under uncertainty of demand, pricing, promotions, supply, vendor lead times, and product life cycle. We optimize complex trade-offs between customer experience, inventory costs, fulfillment costs, fulfillment center capacity, etc. We develop sophisticated algorithms that involve learning from large amounts of data such as prices, promotions, similar products, and other data from our product catalog in order to automatically act on millions of dollars’ worth of inventory weekly and establish plans for tens of thousands of employees. As a Data Scientist, you will contribute to the research community, by working with other scientists across Amazon and our Supply Chain, as well as collaborating with academic researchers and publishing papers both internally and externally. Key job responsibilities Major responsibilities include: - Analysis of large amounts of data from different parts of the supply chain and their associated business functions - Improving upon existing machine learning methodologies by developing new data sources, developing and testing model enhancements, running computational experiments, and fine-tuning model parameters for new models - Formalizing assumptions about how models are expected to behave, creating definitions of outliers, developing methods to systematically identify these outliers, and explaining why they are reasonable or identifying fixes for them - Communicating verbally and in writing to business customers with various levels of technical knowledge, educating them about our research, as well as sharing insights and recommendations - Utilizing code (Python, R, Scala, etc.) for analyzing data and building statistical and machine learning models and algorithms A day in the life As a Data Scientist in SCOT, you will be tasked to understand and work with innovative research tools to enable the implementation of sophisticated models on big data. As a successful data scientist in the SCOT team, you are an analytical problem solver who enjoys diving into data from various businesses, is excited about investigations and algorithms, can multi-task, and can credibly interface between scientists, engineers and business stakeholders. Your expertise in synthesizing and communicating insights and recommendations to audiences of varying levels of technical sophistication will enable you to answer specific business questions and innovate for the future. Amazon offers a full range of benefits that support you and eligible family members, including domestic partners and their children. Benefits can vary by location, the number of regularly scheduled hours you work, length of employment, and job status such as seasonal or temporary employment. The benefits that generally apply to regular, full-time employees include: - Medical, Dental, and Vision Coverage - Maternity and Parental Leave Options - Paid Time Off (PTO) - 401(k) Plan If you are not sure that every qualification on the list above describes you exactly, we'd still love to hear from you! At Amazon, we value people with unique backgrounds, experiences, and skillsets. If you’re passionate about this role and want to make an impact on a global scale, please apply!
US, CA, San Jose
Are you excited about using econometrics to make multi-million dollar decisions more Science and Data Driven? Are you interested in supporting Consumer Hardware device concepts from innovative idea inception to launch? Do you want to work on a Economics and Data Science team focused on tackling some of the hardest business questions within the Devices business at Amazon and then scaling those Statistics and Econometrics solutions via internal to Amazon tools? Then this could be the role for you! The Decision Science team owns demand estimates and pricing recommendations of concept devices before customers know they exist. We support analyses on hardware and services ranging from Echo Frames to Kindle Paperwhite to Blink Video Camera subscriptions to the Amazon Smart Plug - all prior to launch. In this role, you will develop science for high visible senior leadership decisions on new devices and services and work with a cross-functional team to apply and scale innovative science broadly. Key job responsibilities - Design, estimate, and scale Berry-Levinsohn-Pakes (BLP) random coefficients demand models to quantify consumer heterogeneity, own- and cross-price elasticities, and substitution patterns across large product markets. - Implement and optimize numerical routines—including GMM estimation, contraction mappings, and simulation-based inversion—to solve structural demand systems at scale in Python. - Develop and validate instrumental variables strategies to address price endogeneity in differentiated product markets, ensuring unbiased and robust demand parameter estimates. - Build production-grade pipelines that ingest large-scale observational datasets, estimate consumer preferences, and generate product-level demand forecasts on recurring schedules. - Collaborate with cross-functional teams including product management, marketing, and operations to translate structural model outputs—such as willingness-to-pay and competitive diversion ratios—into actionable pricing and portfolio strategies. - Advance the team's structural modeling capabilities by researching and deploying extensions to classical BLP frameworks (e.g., supply-side estimation, dynamic demand, micro-moments) and documenting approaches in clear technical reports.
US, WA, Seattle
Innovators wanted! Are you an entrepreneur? A builder? A dreamer? This role is part of an Amazon Special Projects team that takes the company’s Think Big leadership principle to the next-level. We focus on creating entirely new products and services with a goal of positively impacting the lives of our customers. No industries or subject areas are out of bounds. If you’re interested in innovating at scale to address big challenges in the world, this is the team for you. Here at Amazon, we embrace our differences. We are committed to furthering our culture of inclusion. We have thirteen employee-led affinity groups, reaching 40,000 employees in over 190 chapters globally. We are constantly learning through programs that are local, regional, and global. Amazon’s culture of inclusion is reinforced within our 16 Leadership Principles, which remind team members to seek diverse perspectives, learn and be curious, and earn trust. Key job responsibilities * Develop, deploy, and operate scalable bioinformatics analysis workflows on AWS * Evaluate and incorporate novel bioinformatic approaches to solve critical business problems * Originate and lead the development of new data collection workflows with cross-functional partners * Partner with laboratory science teams on design and analysis of experiments About the team Our team highly values work-life balance, mentorship and career growth. We believe striking the right balance between your personal and professional life is critical to life-long happiness and fulfillment. We care about your career growth and strive to assign projects and offer training that will challenge you to become your best.
US, NY, New York
Amazon Advertising is one of Amazon's fastest growing and most profitable businesses. Our products are used daily to surface new selection and provide customers a wider set of product choices along their shopping journeys. The business is focused on generating value for shoppers as well as advertisers. Our team uses a combination of econometrics, machine learning, and data science to build disruptive products for all our Advertising products. We also generate insights to guide Amazon Advertising strategy, providing direct support to senior leadership. We are looking for an experienced Economist with a deep passion for building econometric solutions and the ability to communicate data insights and scientific vision to execute on strategic projects. Key job responsibilities - Leverage econometrics and ML models to optimize advertising strategies on behalf of our customers. - Influence key business and product decisions based on insights from models you develop. - Perform hands-on analysis and modeling with enormous data sets to develop insights that increase traffic monetization and merchandise sales without compromising shopper experience. - Work closely with software engineers on detailed requirements to productionize the models you build. - Run A/B experiments that affect hundreds of millions of customers, evaluate the impact of your optimizations and communicate your results to various business stakeholders. - Work with other scientists, software developers, and product partners to implement your solutions.
US, WA, Seattle
RISC's vision is to make Amazon Earth’s most trusted shopping destination for safe and compliant products. We do this by protecting customers from products that are unsafe, illegal, illegally marketed, controversial or otherwise in violation of Amazon's policies while enabling our Selling Partners (SPs) to offer their broadest selection of safe and compliant products. We are seeking an exceptional Applied Scientist to join a team of experts in the field of agentic AI, GenAI, Machine Learning, Software Engineers, and work together to tackle challenging problems across diverse compliance domains. We leverage and train state-of-the-art large-language-models (LLMs), multi-modal model, mixed with elegant harness engineering and SKILL building to 1) detect illegal and unsafe products across the Amazon catalog; 2) automation safety and compliance content authoring; 3) reasoning over enforcement action to provide actionable insights to Amazon sellers. We work on machine learning problems for content generation, multi-modal classification, global product taxonomy, intent detection, information retrieval, anomaly and fraud detection, agentic AI, generative AI and multi-agent system. This is an exciting and challenging position to deliver scientific innovations into production systems at Amazon-scale to make immediate, meaningful customer impacts while also pursuing ambitious, long-term research. You will work in a highly collaborative environment where you can analyze and process large amounts of image, text, unstructured and tabular data. You will work on challenging science problems that have not been solved before, conduct rapid prototyping to validate your hypothesis, and deploy your algorithmic ideas at scale. There will be something new to learn every day as we work in an environment with rapidly evolving regulations and adversarial actors looking to outwit your best ideas. Key job responsibilities • Design and evaluate state-of-the-art algorithms and approaches in content generation, multi-modal classification, global product taxonomy, intent detection, information retrieval, anomaly and fraud detection, agentic AI, generative AI and multi-agent system. • Translate product and CX requirements into measurable science problems and metrics. • Collaborate with product and tech partners and customers to validate hypothesis, drive adoption, and increase business impact • Key author in writing high quality scientific papers in internal and external peer-reviewed conferences. A day in the life • Understanding customer problems, project timelines, and team/project mechanisms • Proposing science formulations and brainstorming ideas with team to solve business problems • Writing code, and running experiments with re-usable science libraries • Reviewing labels and audit results with investigators and operations associates • Sharing science results with science, product and tech partners and customers • Writing science papers for submission to peer-review venues, and reviewing science papers from other scientists in the team. • Contributing to team retrospectives for continuous improvements • Driving science research collaborations and attending study groups with scientists across Amazon
US, NY, New York
About Sponsored Products and Brands: The Sponsored Products and Brands (SPB) organization at Amazon Ads is re-imagining the advertising landscape through industry leading generative AI technologies, revolutionizing how millions of customers discover products and engage with brands across Amazon.com and beyond. We are at the forefront of re-inventing advertising experiences, bridging human creativity with artificial intelligence to transform every aspect of the advertising lifecycle from ad creation and optimization to performance analysis and customer insights. We are a passionate group of innovators dedicated to developing responsible and intelligent AI technologies that balance the needs of advertisers, enhance the shopping experience, and strengthen the marketplace. If you're energized by solving complex challenges and pushing the boundaries of what's possible with AI, join us in shaping the future of advertising. About Our Team: The Brand Beacon team is responsible for inventing impressions offerings for brands to increase share of voice via premium experiences, helping brands get discovered, acquire new customers and sustainably grow customer lifetime value. We build end-to-end solutions that enable brands to drive discovery, visibility and share of voice. This includes building advertiser controls, shopper experiences, monetization strategies and optimization features. We succeed when (1) shoppers discover, engage and build affinity with brands and (2) brands can grow their business at scale with our advertising products. About This Role: As a Senior Scientist for the team, you will have the opportunity to apply your deep subject matter expertise in the area of ML, LLM and GenAI models. You will invent new product experiences that enable novel advertiser and shopper experiences. This role will liaise with internal Amazon partners and work on bringing state-of-the-art GenAI models to production, and stay abreast of the latest developments in the space of GenAI and identify opportunities to improve the efficiency and productivity of the team. Additionally, you will define a long-term science vision for our advertising business, driven by our customer’s needs, and translate it into actionable plans for our team of applied scientists and engineers. This role will play a critical role in elevating the team’s scientific and technical rigor, identifying and implementing best-in-class algorithms, methodologies, and infrastructure that enable rapid experimentation and scaling. You will communicate learnings to leadership and mentor and grow Applied AI talent across org. * Develop AI solutions for advertiser and shopper experiences. Build monetization and optimization systems that leverage generative models to value and improve campaign performance. * Define a long-term science vision and roadmap for our advertising business, driven from our customers' needs, translating that direction into specific plans for applied scientists and engineering teams. This role combines science leadership, organizational ability, technical strength, product focus, and business understanding. * Design and conduct A/B experiments to evaluate proposed solutions based on in-depth data analyses. * Effectively communicate technical and non-technical ideas with teammates and stakeholders. * Stay up-to-date with advancements and the latest modeling techniques in the field. * Think big about the arc of development of Gen AI over a multi-year horizon and identify new opportunities to apply these technologies to solve real-world problems. #GenAI
US, WA, Seattle
Amazon's Stores-Ads Science team operates at the intersection of Amazon's Stores and advertising businesses. We develop causal measurement systems, optimization algorithms, and machine learning models that inform how advertising affects shopper engagement, driving selling partner growth and marketplace economics. Our science shapes decisions both at the strategic level and in production systems. We are a team of interdisciplinary scientists who combine causal inference, economic modeling, and machine learning to drive measurable business impact. We are looking for an Applied Science Manager to lead our Ads Impact initiative. This team owns the science of understanding and optimizing how advertising creates value for shoppers and selling partners. What makes this role distinctive is its position at the frontier of AI and Economics: as Amazon's shopping experience evolves from traditional search toward LLM-powered, agentic commerce, the fundamental mechanisms through which advertising creates value are changing. This role will partner with leading scientists and academic researchers to measure these effects through large-scale causal experimentation, and develop novel methods to encode causal and economic reasoning into AI systems that optimize the shopping experience. Key job responsibilities In this role, you will lead a team of scientists, setting the technical vision and science roadmap for ads impact measurement and optimization. You will design experiments that identify the causal mechanisms through which advertising drives shopper engagement, advertiser value, and marketplace outcomes. You will develop optimization algorithms that integrate these causal signals into production and business decision-making, in close partnership with engineering and product teams across the organization. You will lead the research and communicate findings and recommendations to senior leadership through written narratives that connect technical science to business strategy. This role requires deep expertise in causal inference and experimental design, combined with strong applied ML skills and the engineering judgment to translate research into production systems. You will hire and develop future science leaders, think strategically, set ambitious roadmaps in highly ambiguous problem spaces, and foster a culture that values both intellectual depth and production impact. You will work cross-functionally, influencing across organizational boundaries to drive alignment on complex, multi-sided tradeoffs.
US, NY, New York
The Ads Measurement Science team in the Measurement, Ad Tech, and Data Science (MADS) team of Amazon Ads serves a centralized role developing solutions for a multitude of performance measurement products. We create solutions which measure the comprehensive impact of advertiser's ad spend, including sales impacts both online and offline and across timescales, and provide actionable insights that enable our advertisers to optimize their media portfolios. We also own the science solutions for AI tools that unlock new insights and automate high-effort customer workflows, such as custom query and report generation based on natural language user requests. We leverage a host of scientific technologies to accomplish this mission, including Generative AI, classical ML, Causal Inference, Natural Language Processing, and Computer Vision. As a Senior Research Scientist on the team, you will be at the forefront of innovation, developing measurement solutions end-to-end from inception to production. You will set the technical vision and innovate on behalf of our customers. You will propose, design, analyze, and productionize models to provide novel measurement insights to our customers. You will partner with engineering to deploy these solutions into production. You will work with key stakeholders from various business teams to enable advertisers to act upon those metrics. Key job responsibilities * Lead the development of ad measurement models and solutions that address the full spectrum of an advertiser's investment, focusing on scalable and efficient methodologies. * Collaborate closely with cross-functional teams including engineering, product management, and business teams to define and implement measurement solutions. * Use state-of-the-art scientific technologies including Generative AI, Classical Machine Learning, Causal Inference, Natural Language Processing, and Computer Vision to develop state of the art models that measure the impact of ad spend across multiple platforms and timescales. * Drive experimentation and the continuous improvement of ML models through iterative development, testing, and optimization. * Translate complex scientific challenges into clear and impactful solutions for business stakeholders. * Mentor and guide junior scientists, fostering a collaborative and high-performing team culture. * Foster collaborations between scientists to move faster, with broader impact. * Regularly engage with the broader scientific community with presentations, publications, and patents. A day in the life You will solve real-world problems by getting and analyzing large amounts of data, generate business insights and opportunities, design simulations and experiments, and develop statistical and ML models. The team is driven by business needs, which requires collaboration with other Scientists, Engineers, and Product Managers across the advertising organization. You will prepare written and verbal presentations to share insights to audiences of varying levels of technical sophistication. Team video https://advertising.amazon.com/help/G4LNN5YWHP6SM9TJ About the team We are a team of scientists across Applied, Research, Data Science and Economist disciplines. You will work with colleagues with deep expertise in ML, NLP, CV, Gen AI, and Causal Inference with a diverse range of backgrounds. We partner closely with top-notch engineers, product managers, sales leaders, and other scientists with expertise in the ads industry and on building scalable modeling and software solutions.