Computing on private data

Both secure multiparty computation and differential privacy protect the privacy of data used in computation, but each has advantages in different contexts.

Many of today’s most innovative computation-based products and solutions are fueled by data. Where those data are private, it is essential to protect them and to prevent the release of information about data subjects, owners, or users to the wrong parties. How can we perform useful computations on sensitive data while preserving privacy?

Related content
Technique that mixes public and private training data can meet differential-privacy criteria while cutting error increase by 60%-70%.

We will revisit two well-studied approaches to this challenge: secure multiparty computation (MPC) and differential privacy (DP). MPC and DP were invented to address different real-world problems and to achieve different technical goals. However, because they are both aimed at using private information without fully revealing it, they are often confused. To help draw a distinction between the two approaches, we will discuss the power and limitations of both and give typical scenarios in which each can be highly effective.

We are interested in scenarios in which multiple individuals (sometimes, society as a whole) can derive substantial utility from a computation on private data but, in order to preserve privacy, cannot simply share all of their data with each other or with an external party.

Secure multiparty computation

MPC methods allow a group of parties to collectively perform a computation that involves all of their private data while revealing only the result of the computation. More formally, an MPC protocol enables n parties, each of whom possesses a private dataset, to compute a function of the union of their datasets in such a way that the only information revealed by the computation is the output of the function. Common situations in which MPC can be used to protect private interests include

  • auctions: the winning bid amount should be made public, but no information about the losing bids should be revealed;
  • voting: the number of votes cast for each option should be made public but not the vote cast by any one individual;
  • machine learning inference: secure two-party computation enables a client to submit a query to a server that holds a proprietary model and receive a response, keeping the query private from the server and the model private from the client.
Related content
New approach to homomorphic encryption speeds up the training of encrypted machine learning models sixfold.

Note that the number n of participants can be quite small (e.g., two in the case of machine learning inference), moderate in size, or very large; the latter two size ranges both occur naturally in auctions and votes. Similarly, the participants may be known to each other (as they would be, for example, in a departmental faculty vote) or not (as, for example, in an online auction). MPC protocols mathematically guarantee the secrecy of input values but do not attempt to hide the identities of the participants; if anonymous participation is desired, it can be achieved by combining MPC with an anonymous-communication protocol.

Although MPC may seem like magic, it is implementable and even practical using cryptographic and distributed-computing techniques. For example, suppose that Alice, Bob, Carlos, and David are four engineers who want to compare their annual raises. Alice selects four random numbers that sum to her raise. She keeps one number to herself and gives each of the other three to one of the other engineers. Bob, Carlos, and David do the same with their own raises.

Secure multiparty computation
Four engineers wish to compute their average raise, without revealing any one engineer's raise to the others. Each selects four numbers that sum to his or her raise and sends three of them to the other engineers. Each engineer then sums his or her four numbers — one private number and three received from the others. The sum of all four engineers' sums equals the sum of all four raises.

After everyone has distributed the random numbers, each engineer adds up the numbers he or she is holding and sends the sum to the others. Each engineer adds up these four sums privately (i.e., on his or her local machine) and divides by four to get the average raise. Now they can all compare their raises to the team average.


Amount

Alice’s share

Bob’s share

Carlos’s share

David’s share

Sum of sums

Alice’s raise

3800

-1000

2500

900

1400


Bob’s raise

2514

700

400

650

764


Carlos’s raise

2982

750

-100

832

1500


David’s raise

3390

1500

900

-3000

3990


Sum

12686

1950

3700

-618

7654

12686

Average

3171.5





3171.5

Note that, because Alice (like Bob, Carlos, and David) kept part of her raise private (the bold numbers), no one else learned her actual raise. When she summed the numbers she was holding, the sum didn’t correspond to anyone’s raise. In fact, Bob’s sum was negative, because all that matters is that the four chosen numbers add up to the raise; the sign and magnitude of these four numbers are irrelevant.

Summing all of the engineers’ sums results in the same value as summing the raises directly, namely $12,686. If all of the engineers follow this protocol faithfully, dividing this value by four yields the team average raise of $3,171.50, which allows each person to compare his or her raise against the team average (locally and hence privately) without revealing any salary information.

A highly readable introduction to MPC that emphasizes practical protocols, some of which have been deployed in real-world scenarios, can be found in a monograph by Evans, Kolesnikov, and Rosulek. Examples of real-world applications that have been deployed include analysis of gender-based wage gaps in Boston-area companies, aggregate adoption of cybersecurity measures, and Covid exposure notification. Readers may also wish to read our previous blog post on this and related topics.

Differential privacy

Differential privacy (DP) is a body of statistical and algorithmic techniques for releasing an aggregate function of a dataset without revealing the mapping between data contributors and data items. As in MPC, we have n parties, each of whom possesses a data item. Either the parties themselves or, more often, an external agent wishes to compute an aggregate function of the parties’ input data.

Related content
Calibrating noise addition to word density in the embedding space improves utility of privacy-protected text.

If this computation is performed in a differentially private manner, then no information that could be inferred from the output about the ith input, xi, can be associated with the individual party Pi. Typically, the number n of participants is very large, the participants are not known to each other, and the goal is to compute a statistical property of the set {x1, …, xn} while protecting the privacy of individual data contributors {P1, …, Pn}.

In slightly more detail, we say that a randomized algorithm M preserves differential privacy with respect to an aggregation function f if it satisfies two properties. First, for every set of input values, the output of M closely approximates the value of f. Second, for every distinct pair (xi, xi') of possible values for the ith individual input, the distribution of M(x1, …, xi,…, xn) is approximately equivalent to the distribution of M(x1, …, xi′, …, xn). The maximum “distance” between the two distributions is characterized by a parameter, ϵ, called the privacy parameter, and M is called an ϵ-differentially private algorithm.

Note that the output of a differentially private algorithm is a random variable drawn from a distribution on the range of the function f. That is because DP computation requires randomization; in particular, it works by “adding noise.” All known DP techniques introduce a salient trade-off between the privacy parameter and the utility of the output of the computation. Smaller values of ϵ produce better privacy guarantees, but they require more noise and hence produce less-accurate outputs; larger values of ϵ yield worse privacy bounds, but they require less noise and hence deliver better accuracy.

For example, consider a poll, the goal of which is to predict who is going to win an election. The pollster and respondents are willing to sacrifice some accuracy in order to improve privacy. Suppose respondents P1, …, Pn have predictions x1, …, xn, respectively, where each xi is either 0 or 1. The poll is supposed to output a good estimate of p, which we use to denote the fraction of the parties who predict 1. The DP framework allows us to compute an accurate estimate and simultaneously to preserve each respondent’s “plausible deniability” about his or her true prediction by requiring each respondent to add noise before sending a response to the pollster.

Related content
Private aggregation of teacher ensembles (PATE) leads to word error rate reductions of more than 26% relative to standard differential-privacy techniques.

We now provide a few more details of the polling example. Consider the algorithm m that takes as input a bit xi and flips a fair coin. If the coin comes up tails, then m outputs xi; otherwise m flips another fair coin and outputs 1 if heads and 0 if tails. This m is known as the randomized response mechanism; when the pollster asks Pi for a prediction, Pi responds with m(xi). Simple statistical calculation shows that, in the set of answers that the pollster receives from the respondents, the expected fraction that are 1’s is

Pr[First coin is tails] ⋅ p + Pr[First coin is heads] ⋅ Pr[Second coin is heads] = p/2 + 1/4.

Thus, the expected number of 1’s received is n(p/2 + 1/4). Let N = m(x1) + ⋅⋅⋅ + m(xn) denote the actual number of 1’s received; we approximate p by M(x1, …, xn) = 2N/n − 1/2. In fact, this approximation algorithm, M, is differentially private. Accuracy follows from the statistical calculation, and privacy follows from the “plausible deniability” provided by the fact that M outputs 1 with probability at least 1/4 regardless of the value of xi.

Differential privacy has dominated the study of privacy-preserving statistical computation since it was introduced in 2006 and is widely regarded as a fundamental breakthrough in both theory and practice. An excellent overview of algorithmic techniques in DP can be found in a monograph by Dwork and Roth. DP has been applied in many real-world applications, most notably the 2020 US Census.

The power and limitations of MPC and DP

We now review some of the strengths and weaknesses of these two approaches and highlight some key differences between them.

Secure multiparty computation

MPC has been extensively studied for more than 40 years, and there are powerful, general results showing that it can be done for all functions f using a variety of cryptographic and coding-theoretic techniques, system models, and adversary models.

Despite the existence of fully general, secure protocols, MPC has seen limited real-world deployment. One obstacle is protocol complexity — particularly the communication complexity of the most powerful, general solutions. Much current work on MPC addresses this issue.

Related content
A privacy-preserving version of the popular XGBoost machine learning algorithm would let customers feel even more secure about uploading sensitive data to the cloud.

More-fundamental questions that must be answered before MPC can be applied in a given scenario include the nature of the function f being computed and the information environment in which the computation is taking place. In order to explain this point, we first note that the set of participants in the MPC computation is not necessarily the same as the set of parties that receive the result of the computation. The two sets may be identical, one may be a proper subset of the other, they may have some (but not all) elements in common, or they may be entirely disjoint.

Although a secure MPC protocol (provably!) reveals nothing to the recipients about the private inputs except what can be inferred from the result, even that may be too much. For example, if the result is the number of votes for and votes against a proposition in a referendum, and the referendum passes unanimously, then the recipients learn exactly how each participant voted. The referendum authority can avoid revealing private information by using a different f, e.g., one that is “YES” if the number of votes for the proposition is at least half the number of participants and “NO” if it is less than half.

This simple example demonstrates a pervasive trade-off in privacy-preserving computation: participants can compute a function that is more informative if they are willing to reveal private information to the recipients in edge cases; they can achieve more privacy in edge cases if they are willing to compute a less informative function.

In addition to specifying the function f carefully, users of MPC must evaluate the information environment in which MPC is to be deployed and, in particular, must avoid the catastrophic loss of privacy that can occur when the recipients combine the result of the computation with auxiliary information. For example, consider the scenario in which the participants are all of the companies in a given commercial sector and metropolitan area, and they wish to use MPC to compute the total dollar loss that they (collectively) experienced in a given year that was attributable to data breaches; in this example, the recipients of the result are the companies themselves.

Related content
Scientists describe the use of privacy-preserving machine learning to address privacy challenges in XGBoost training and prediction.

Suppose further that, during that year, one of the companies suffered a severe breach that was covered in the local media, which identified the company by name and reported an approximate dollar figure for the loss that the company suffered as a result of the breach. If that approximate figure is very close to the total loss imposed by data breaches on all the companies that year, then the participants can conclude that all but one of them were barely affected by data breaches that year.

Note that this potentially sensitive information is not leaked by the MPC protocol, which reveals nothing but the aggregate amount lost (i.e., the value of the function f). Rather, it is inferred by combining the result of the computation with information that was already available to the participants before the computation was done. The same risk that input privacy will be destroyed when results are combined with auxiliary information is posed by any computational method that reveals the exact value of the function f.

Differential privacy

The DP framework provides some elegant, simple mechanisms that can be applied to any function f whose output is a vector of real numbers. Essentially, one can independently perturb or “noise up” each component of f(x) by an appropriately defined random value. The amount of noise that must be added in order to hide the contribution (or, indeed, the participation) of any single data subject is determined by the privacy parameter and the maximum amount by which a single input can change the output of f. We explain one such mechanism in slightly more mathematical detail in the following paragraph.

One can apply the Laplace mechanism with privacy parameter ϵ to a function f, whose outputs are k-tuples of real numbers, by returning the value f(x1, …, xn) + (Y1, …, Yk) on input (x1, …, xn), where the Yi are independent random variables drawn from the Laplace distribution with parameter Δ(f)/ϵ. Here Δ(f) denotes the 1sensitivity of the function f, which captures the magnitude by which a single individual’s data can change the output of f in the worst case. The technical definition of the Laplace distribution is beyond the scope of this article, but for our purposes, its important property is that the Yi can be sampled efficiently.

Related content
The team’s latest research on privacy-preserving machine learning, federated learning, and bias mitigation.

Crucially, DP protects data contributors against privacy loss caused by post-processing computational results or by combining results with auxiliary information. The scenario in which privacy loss occurred when the output of an MPC protocol was combined with information from an existing news story could not occur in a DP application; moreover, no harm could be done by combining the result of a DP computation with auxiliary information in a future news story.

DP techniques also benefit from powerful composition theorems that allow separate differentially private algorithms to be combined in one application. In particular, the independent use of an ϵ1-differentially private algorithm and an ϵ2-differentially private algorithm, when taken together, is (ϵ1 + ϵ2)-differentially private.

One limitation on the applicability of DP is the need to add noise — something that may not be tolerable in some application scenarios. More fundamentally, the ℓ1 sensitivity of a function f, which yields an upper bound on the amount of noise that must be added to the output in order to achieve a given privacy parameter ϵ, also yields a lower bound. If the output of f is strongly influenced by the presence of a single outlier in the input, then it is impossible to achieve strong privacy and high accuracy simultaneously.

For example, consider the simple case in which f is the sum of all of the private inputs, and each input is an arbitrary positive integer. It is easy to see that the ℓ1 sensitivity is unbounded in this case; to hide the contribution or the participation of an individual whose data item strongly dominates those of all other individuals would require enough noise to render the output meaningless. If one can restrict all of the private inputs to a small interval [a,b], however, then the Laplace mechanism can provide meaningful privacy and accuracy.

DP was originally designed to compute statistical aggregates while preserving the privacy of individual data subjects; in particular, it was designed with real-valued functions in mind. Since then, researchers have developed DP techniques for non-numerical computations. For example, the exponential mechanism can be used to solve selection problems, in which both input and output are of arbitrary type.

Related content
Amazon is helping develop standards for post-quantum cryptography and deploying promising technologies for customers to experiment with.

In specifying a selection problem, one must define a scoring function that maps input-output pairs to real numbers. For each input x, a solution y is better than a solution y′ if the score of (x,y) is greater than that of (x,y′). The exponential mechanism generally works well (i.e., achieves good privacy and good accuracy simultaneously) for selection problems (e.g., approval voting) that can be defined by scoring functions of low sensitivity but not for those (e.g., set intersection) in which the scoring function must have high sensitivity. In fact, there is no differentially private algorithm that works well for set intersection; by contrast, MPC for set intersection is a mature and practical technology that has seen real-world deployment.

Conclusion

In conclusion, both secure multiparty computation and differential privacy can be used to perform computations on sensitive data while preserving the privacy of those data. Important differences between the bodies of technique include

  • The nature of the privacy guarantee: Use of MPC to compute a function y = f(x1, x2, ..., xn) guarantees that the recipients of the result learn the output y and nothing more. For example, if there are exactly two input vectors that are mapped to y by f, the recipients of the output y gain no information about which of two was the actual input to the MPC computation, regardless of the number of components in which these two input vectors differ or the magnitude of the differences. On the other hand, for any third input vector that does not map to y, the recipient learns with certainty that the real input to the MPC computation was not this third vector, even if it differs from one of the first two in only one component and only by a very small amount. By contrast, computing f with a DP algorithm guarantees that, for any two input vectors that differ in only one component, the (randomized!) results of the computation are approximately indistinguishable, regardless of whether the exact values of f on these two input vectors are equal, nearly equal, or extremely different. Straightforward use of composition yields a privacy guarantee for inputs that differ in c components at the expense of increasing the privacy parameter by a factor of c.
  • Typical use cases: DP techniques are most often used to compute aggregate properties of very large datasets, and typically, the identities of data contributors are not known. None of these conditions is typical of MPC use cases.
  • Exact vs. noisy answers: MPC can be used to compute exact answers for all functions f. DP requires the addition of noise. This is not a problem in many statistical computations, but even small amounts of noise may not be acceptable in some application scenarios. Moreover, if f is extremely sensitive to outliers in the input data, the amount of noise needed to achieve meaningful privacy may preclude meaningful accuracy.
  • Auxiliary information: Combining the result of a DP computation with auxiliary information cannot result in privacy loss. By contrast, any computational method (including MPC) that returns the exact value y of a function f runs the risk that a recipient of y might be able to infer something about the input data that is not implied by y alone, if y is combined with auxiliary information.

Finally, we would like to point out that, in some applications, it is possible to get the benefits of both MPC and DP. If the goal is to compute f, and g is a differentially private approximation of f that achieves good privacy and accuracy simultaneously, then one natural way to proceed is to use MPC to compute g. We expect to see both MPC and DP used to enhance data privacy in Amazon’s products and services.

Related content

US, VA, Arlington
Do you want a role with deep meaning and the ability to have a global impact? Hiring top talent is not only critical to Amazon’s success – it can literally change the world. It took a lot of great hires to deliver innovations like AWS, Prime, and Alexa, which make life better for millions of customers around the world. As part of the Intelligent Talent Acquisition (ITA) team, you'll have the opportunity to reinvent Amazon’s hiring process with unprecedented scale, sophistication, and accuracy. ITA is an industry-leading people science and technology organization made up of scientists, engineers, analysts, product professionals, and more. Our shared goal is to fairly and precisely connect the right people to the right jobs. Last year, we delivered over 6 million online candidate assessments, driving a merit-based hiring approach that gives candidates the opportunity to showcase their true skills. Each year we also help Amazon deliver billions of packages around the world by making it possible to hire hundreds of thousands of associates in the right quantity, at the right location, at exactly the right time. You’ll work on state-of-the-art research with advanced software tools, new AI systems, and machine learning algorithms to solve complex hiring challenges. Join ITA in using cutting-edge technologies to transform the hiring landscape and make a meaningful difference in people's lives. Together, we can solve the world's toughest hiring problems. Within ITA, the Global Hiring Science (GHS) team designs and implements innovative hiring solutions at scale. We work in a fast-paced, global environment where we use research to solve complex problems and build scalable hiring products that deliver measurable impact to our customers. We are seeking selection researchers with a strong foundation in hiring assessment development, legally-defensible validation approaches, research and experimental design, and data analysis. Preferred candidates will have experience across the full hiring assessment lifecycle, from solution design to content development and validation to impact analysis. We are looking for equal parts researcher and consultant, who is able to influence customers with insights derived from science and data. You will work closely with cross-functional teams to design new hiring solutions and experiment with measurement methods intended to precisely define exactly what job success looks like and how best to predict it. Key job responsibilities What you’ll do as a GHS Research Scientist: • Design large-scale personnel selection research that shapes Amazon’s global talent assessment practices across a variety of topics (e.g., assessment validation, measuring post-hire impact) • Partner with key stakeholders to create innovative solutions that blend scientific rigor with real-world business impact while navigating complex legal and professional standards • Apply advanced statistical techniques to analyze massive, diverse datasets to uncover insights that optimize our candidate evaluation processes and drive hiring excellence • Explore emerging technologies and innovative methodologies to enhance talent measurement while maintaining Amazon's commitment to scientific integrity • Translate complex research findings into compelling, actionable strategies that influence senior leader/business decisions and shape Amazon's talent acquisition roadmap • Write impactful documents that distill intricate scientific concepts into clear, persuasive communications for diverse audiences, from data scientists to business leaders • Ensure effective teamwork, communication, collaboration, and commitment across multiple teams with competing priorities A day in the life Imagine diving into challenges that impact millions of employees across Amazon's global operations. As a GHS Research Scientist, you'll tackle questions about hiring and organizational effectiveness on a global scale. Your day might begin with analyzing datasets to inform how we attract and select world-class talent. Throughout the day, you'll collaborate with peers in our research community, discussing different research methodologies and sharing innovative approaches to solving unique personnel challenges. This role offers a blend of focused analytical time and interacting with stakeholders across the globe.
US, WA, Seattle
We are looking for a researcher in state-of-the-art LLM technologies for applications across Alexa, AWS, and other Amazon businesses. In this role, you will innovate in the fastest-moving fields of current AI research, in particular in how to integrate a broad range of structured and unstructured information into AI systems (e.g. with RAG techniques), and get to immediately apply your results in highly visible Amazon products. If you are deeply familiar with LLMs, natural language processing, computer vision, and machine learning and thrive in a fast-paced environment, this may be the right opportunity for you. Our fast-paced environment requires a high degree of autonomy to deliver ambitious science innovations all the way to production. You will work with other science and engineering teams as well as business stakeholders to maximize velocity and impact of your deliverables. It's an exciting time to be a leader in AI research. In Amazon's AGI Information team, you can make your mark by improving information-driven experience of Amazon customers worldwide!
US, MA, N.reading
Amazon Industrial Robotics is seeking exceptional talent to help develop the next generation of advanced robotics systems that will transform automation at Amazon's scale. We're building revolutionary robotic systems that combine cutting-edge AI, sophisticated control systems, and advanced mechanical design to create adaptable automation solutions capable of working safely alongside humans in dynamic environments. This is a unique opportunity to shape the future of robotics and automation at unprecedented scale, working with world-class teams pushing the boundaries of what's possible in robotic manipulation, locomotion, and human-robot interaction. This role presents an opportunity to shape the future of robotics through innovative applications of deep learning and large language models. At Amazon Industrial Robotics we leverage advanced robotics, machine learning, and artificial intelligence to solve complex operational challenges at unprecedented scale. Our fleet of robots operates across hundreds of facilities worldwide, working in sophisticated coordination to fulfill our mission of customer excellence. We are pioneering the development of robotics foundation models that: - Enable unprecedented generalization across diverse tasks - Enable unprecedented robustness and reliability, industry-ready - Integrate multi-modal learning capabilities (visual, tactile, linguistic) - Accelerate skill acquisition through demonstration learning - Enhance robotic perception and environmental understanding - Streamline development processes through reusable capabilities The ideal candidate will contribute to research that bridges the gap between theoretical advancement and practical implementation in robotics. You will be part of a team that's revolutionizing how robots learn, adapt, and interact with their environment. Join us in building the next generation of intelligent robotics systems that will transform the future of automation and human-robot collaboration. Key job responsibilities As an Applied Science Manager in the Foundations Model team, you will: - Build and lead a team of scientists and developers responsible for foundation model development - Define the right ‘FM recipe’ to reach industry ready solutions - Define the right strategy to ensure fast and efficient development, combining state of the art methods, research and engineering. - Lead Model Development and Training: Designing and implementing the model architectures, training and fine tuning the foundation models using various datasets, and optimize the model performance through iterative experiments - Lead Data Management: Process and prepare training data, including data governance, provenance tracking, data quality checks and creating reusable data pipelines. - Lead Experimentation and Validation: Design and execute experiments to test model capabilities on the simulator and on the embodiment, validate performance across different scenarios, create a baseline and iteratively improve model performance. - Lead Code Development: Write clean, maintainable, well commented and documented code, contribute to training infrastructure, create tools for model evaluation and testing, and implement necessary APIs - Research: Stay current with latest developments in foundation models and robotics, assist in literature reviews and research documentation, prepare technical reports and presentations, and contribute to research discussions and brainstorming sessions. - Collaboration: Work closely with senior scientists, engineers, and leaders across multiple teams, participate in knowledge sharing, support integration efforts with robotics hardware teams, and help document best practices and methodologies.
CA, QC, Montreal
Join the next revolution in robotics at Amazon's Frontier AI & Robotics team, where you'll work alongside world-renowned AI pioneers to push the boundaries of what's possible in robotic intelligence. As an Applied Scientist, you'll be at the forefront of developing breakthrough foundation models that enable robots to perceive, understand, and interact with the world in unprecedented ways. You'll drive independent research initiatives in areas such as perception, manipulation, scene understanding, sim2real transfer, multi-modal foundation models, and multi-task learning, designing novel algorithms that bridge the gap between state-of-the-art research and real-world deployment at Amazon scale. In this role, you'll balance innovative technical exploration with practical implementation, collaborating with platform teams to ensure your models and algorithms perform robustly in dynamic real-world environments. You'll have access to Amazon's vast computational resources, enabling you to tackle ambitious problems in areas like very large multi-modal robotic foundation models and efficient, promptable model architectures that can scale across diverse robotic applications. Key job responsibilities - Design and implement novel deep learning architectures that push the boundaries of what robots can understand and accomplish - Drive independent research initiatives in robotics foundation models, focusing on breakthrough approaches in perception, and manipulation, for example open-vocabulary panoptic scene understanding, scaling up multi-modal LLMs, sim2real/real2sim techniques, end-to-end vision-language-action models, efficient model inference, video tokenization - Lead technical projects from conceptualization through deployment, ensuring robust performance in production environments - Collaborate with platform teams to optimize and scale models for real-world applications - Contribute to the team's technical strategy and help shape our approach to next-generation robotics challenges A day in the life - Design and implement novel foundation model architectures, leveraging our extensive compute infrastructure to train and evaluate at scale - Collaborate with our world-class research team to solve complex technical challenges - Lead technical initiatives from conception to deployment, working closely with robotics engineers to integrate your solutions into production systems - Participate in technical discussions and brainstorming sessions with team leaders and fellow scientists - Leverage our massive compute cluster and extensive robotics infrastructure to rapidly prototype and validate new ideas - Transform theoretical insights into practical solutions that can handle the complexities of real-world robotics applications About the team At Frontier AI & Robotics, we're not just advancing robotics – we're reimagining it from the ground up. Our team is building the future of intelligent robotics through ground breaking foundation models and end-to-end learned systems. We tackle some of the most challenging problems in AI and robotics, from developing sophisticated perception systems to creating adaptive manipulation strategies that work in complex, real-world scenarios. What sets us apart is our unique combination of ambitious research vision and practical impact. We leverage Amazon's massive computational infrastructure and rich real-world datasets to train and deploy state-of-the-art foundation models. Our work spans the full spectrum of robotics intelligence – from multimodal perception using images, videos, and sensor data, to sophisticated manipulation strategies that can handle diverse real-world scenarios. We're building systems that don't just work in the lab, but scale to meet the demands of Amazon's global operations. Join us if you're excited about pushing the boundaries of what's possible in robotics, working with world-class researchers, and seeing your innovations deployed at unprecedented scale.
US, NY, New York
The Sponsored Products and Brands team at Amazon Ads is re-imagining the advertising landscape through cutting-edge generative AI technologies, revolutionizing how millions of customers discover products and engage with brands across Amazon.com and beyond. We are at the forefront of re-inventing advertising experiences, bridging human creativity with artificial intelligence to transform every aspect of the advertising lifecycle from ad creation and optimization to performance analysis and customer insights. We are a passionate group of innovators dedicated to developing responsible and intelligent AI technologies that balance the needs of advertisers, enhance the shopping experience, and strengthen the marketplace. If you're energized by solving complex challenges and pushing the boundaries of what's possible with AI, join us in shaping the future of advertising. Key job responsibilities Participate in the Science hiring process as well as mentor other scientists - improving their skills, their knowledge of your solutions, and their ability to get things done. Identify and devise new video related solutions following a customer-obsessed scientific approach to address customer or business problems when the problem is ill-defined, needs to be framed, and new methodologies or paradigms need to be invented at the product level. Articulate potential scientific challenges of ongoing or future customers’ needs or business problems, and present interventions to address them. Independently assess alternative video related technologies, driving evaluation and adoption of those that fit best A day in the life As an Applied Scientist on the Sponsored Products Video team, you will work with a team of talented and experienced engineers, scientists, and designers to help bring new products to market and ensure that our customers are delighted by what we create. The Sponsored Products Video team is responsible for the design, development, and implementation of Sponsored Products Video experiences worldwide. About the team The Sponsored Products Video team within Sponsored Products and Brands creates relevant and engaging video experiences, connecting advertisers and shoppers. We are on a mission to make Amazon the best in class destination for shoppers to discover, engage and build affinity with brands, making shopping delightful, & personal.
IN, TS, Hyderabad
We're seeking an Applied Scientist to lead and innovate in applying advanced AI technologies that will reshape how businesses sell on Amazon. Our team is passionate about leveraging Machine Learning, GenAI, and Agentic AI to help B2B sellers optimize their operations and drive growth. Join Amazon Business 3P (Third Party - Sellers) - a rapidly growing global organization where we innovate at the intersection of AI technology and B2B commerce. We're reimagining how sellers reach and serve business customers, creating intelligent solutions that help them grow their B2B business on Amazon. From AI-powered Seller Central tools to smart business certifications, dynamic pricing capabilities, and advanced analytics, we're transforming how B2B selling happens. As an Applied Scientist II on our AB 3P Tech team, you'll drive the development and implementation of state-of-the-art algorithms and models for supervised fine-tuning and reinforcement learning. You'll work with highly technical, entrepreneurial teams to: - Design and implement AI models that power the B2B selling experience - Lead the development of GenAI products that can handle Amazon-scale use cases - Drive research and implementation of advanced algorithms for human feedback and complex reasoning - Make strategic AI technology decisions and mentor technical talent - Own critical AI systems spanning from Seller Central to Amazon Business detail pages Join us in shaping the future of B2B selling - we're building applied AI solutions that businesses love and trust for their day-to-day success. If you are scrappy and bias for action is your favorite Leadership Principle, you'll fit right in as we innovate across the seller experience to create significant impact in this fast-growing business. Key job responsibilities Key job responsibilities: - Collaborate with cross-functional teams of engineers, product managers, and scientists to identify and solve complex problems in Gen AI - Design and execute experiments to evaluate the performance of different algorithms and models, and iterate quickly to improve results - Think big about the arc of development of Gen AI over a multi-year horizon, and identify new opportunities to apply these technologies to solve real-world problems - Communicate results and insights to both technical and non-technical audiences About the team At Amazon Business Third Party (AB3P) Tech, we're revolutionizing B2B e-commerce by empowering sellers in the business marketplace. Our scope spans the complete B2B selling journey, from Seller Central to Amazon Business detail pages, cart, and checkout for merchant-fulfilled offers. Our entrepreneurial culture and global reach define us. We develop features across seller experience, delivery, certifications, fees, registration, and analytics, collaborating with worldwide teams and leveraging advanced AI technologies to continuously innovate. Working in true Day 1 spirit, we build next-generation solutions that shape the future of B2B commerce. Join us in building next-generation solutions that shape the future of B2B commerce.
GB, London
Come build the future of entertainment with us. Are you interested in shaping the future of movies and television? Prime Video is a premium streaming service that offers customers a vast collection of TV shows and movies - all with the ease of finding what they love to watch in one place. We offer customers thousands of popular movies and TV shows including Amazon Originals and exclusive licensed content to exciting live sports events. Prime Video is a fast-paced, growth business - available in over 200 countries and territories worldwide. The Video Content Research team works in a dynamic environment where innovating on behalf of our customers is at the heart of everything we do. We are seeking a Data Scientist to develop scalable models that uncover key insights into how, why and when customers engage with Prime Video marketing. Key job responsibilities In this role you will work closely with business stakeholders and technical peers (data scientists, economists and engineers) to develop causal marketing measurement models, analyze experiments and investigate customer, marketing and content related factors that drive engagement with Prime Video. You will create mechanisms and infrastructure to deploy complex models and generate insights at scale. You will have the opportunity to work with large datasets, work with AWS to build and deploy machine learning models that impact Prime Video's marketing decisions. About the team The Video Content Research team uses machine learning, econometrics, and data science to optimize Amazon's marketing and content investments. We generate insights for Amazon's digital video strategy, partnering with finance, marketing, and content teams. We analyze customer behavior on Prime Video (marketing impressions, clicks on owned channels) to identify optimization opportunities.
US, MA, Boston
AI is the most transformational technology of our time, capable of tackling some of humanity’s most challenging problems. That is why Amazon is investing in generative AI (GenAI) and the responsible development and deployment of large language models (LLMs) across all of our businesses. Come build the future of human-technology interaction with us. We are looking for a Research Scientist with strong technical skills which includes coding and natural language processing experience in dataset construction, training and evaluating models, and automatic processing of large datasets. You will play a critical role in driving innovation and advancing the state-of-the-art in natural language processing and machine learning. You will work closely with cross-functional teams, including product managers, language engineers, and other scientists. Key job responsibilities Specifically, the Research Scientist will: • Ensure quality of speech/language/other data throughout all stages of acquisition and processing, including data sourcing/collection, ground truth generation, normalization, transformation, cross-lingual alignment/mapping, etc. • Clean, analyze and select speech/language/other data to achieve goals • Build and test models that elevate the customer experience • Collaborate with colleagues from science, engineering and business backgrounds • Present proposals and results in a clear manner backed by data and coupled with actionable conclusions • Work with engineers to develop efficient data querying infrastructure for both offline and online use cases
US, VA, Arlington
The People eXperience and Technology Central Science (PXTCS) team uses economics, behavioral science, statistics, and machine learning to proactively identify mechanisms and process improvements which simultaneously improve Amazon and the lives, wellbeing, and the value of work to Amazonians. PXTCS is an interdisciplinary team that combines the talents of science and engineering to develop and deliver solutions that measurably achieve this goal. PXTCS is looking for an economist who can apply economic methods to address business problems. The ideal candidate will work with engineers and computer scientists to estimate models and algorithms on large scale data, design pilots and measure impact, and transform successful prototypes into improved policies and programs at scale. PXTCS is looking for creative thinkers who can combine a strong technical economic toolbox with a desire to learn from other disciplines, and who know how to execute and deliver on big ideas as part of an interdisciplinary technical team. Ideal candidates will work in a team setting with individuals from diverse disciplines and backgrounds. They will work with teammates to develop scientific models and conduct the data analysis, modeling, and experimentation that is necessary for estimating and validating models. They will work closely with engineering teams to develop scalable data resources to support rapid insights, and take successful models and findings into production as new products and services. They will be customer-centric and will communicate scientific approaches and findings to business leaders, listening to and incorporate their feedback, and delivering successful scientific solutions. A day in the life The Economist will work with teammates to apply economic methods to business problems. This might include identifying the appropriate research questions, writing code to implement a DID analysis or estimate a structural model, or writing and presenting a document with findings to business leaders. Our economists also collaborate with partner teams throughout the process, from understanding their challenges, to developing a research agenda that will address those challenges, to help them implement solutions. About the team PXTCS is a multidisciplinary science team that develops innovative solutions to make Amazon Earth's Best Employer
IN, KA, Bengaluru
Alexa+ is Amazon’s next-generation, AI-powered virtual assistant. Building on the original Alexa, it uses generative AI to deliver a more conversational, personalised, and effective experience. Alexa Sensitive Content Intelligence (ASCI) team is developing responsible AI (RAI) solutions for Alexa+, empowering it to provide useful information responsibly. The team is currently looking for Senior Applied Scientists with a strong background in NLP and/or CV to design and develop ML solutions in the RAI space using generative AI across all languages and countries. A Senior Applied Scientist will be a tech lead for a team of exceptional scientists to develop novel algorithms and modeling techniques to advance the state of the art in NLP or CV related tasks. You will work in a hybrid, fast-paced organization where scientists, engineers, and product managers work together to build customer facing experiences. You will collaborate with and mentor other scientists to raise the bar of scientific research in Amazon. Your work will directly impact our customers in the form of products and services that make use of speech, language, and computer vision technologies. We are looking for a leader with strong technical experiences a passion for building scientific driven solutions in a fast-paced environment. You should have good understanding of Artificial Intelligence (AI), Natural Language Understanding (NLU), Machine Learning (ML), Dialog Management, Automatic Speech Recognition (ASR), and Audio Signal Processing where to apply them in different business cases. You leverage your exceptional technical expertise, a sound understanding of the fundamentals of Computer Science, and practical experience of building large-scale distributed systems to creating reliable, scalable, and high-performance products. In addition to technical depth, you must possess exceptional communication skills and understand how to influence key stakeholders. You will be joining a select group of people making history producing one of the most highly rated products in Amazon's history, so if you are looking for a challenging and innovative role where you can solve important problems while growing as a leader, this may be the place for you. Key job responsibilities You'll lead the science solution design, run experiments, research new algorithms, and find new ways of optimizing customer experience. You set examples for the team on good science practice and standards. Besides theoretical analysis and innovation, you will work closely with talented engineers and ML scientists to put your algorithms and models into practice. Your work will directly impact the trust customers place in Alexa, globally. You contribute directly to our growth by hiring smart and motivated Scientists to establish teams that can deliver swiftly and predictably, adjusting in an agile fashion to deliver what our customers need. A day in the life You will be working with a group of talented scientists on researching algorithm and running experiments to test scientific proposal/solutions to improve our sensitive contents detection and mitigation. This will involve collaboration with partner teams including engineering, PMs, data annotators, and other scientists to discuss data quality, policy, and model development. You will mentor other scientists, review and guide their work, help develop roadmaps for the team. You work closely with partner teams across Alexa to deliver platform features that require cross-team leadership. About the hiring group About the team The mission of the Alexa Sensitive Content Intelligence (ASCI) team is to (1) minimize negative surprises to customers caused by sensitive content, (2) detect and prevent potential brand-damaging interactions, and (3) build customer trust through appropriate interactions on sensitive topics. The term “sensitive content” includes within its scope a wide range of categories of content such as offensive content (e.g., hate speech, racist speech), profanity, content that is suitable only for certain age groups, politically polarizing content, and religiously polarizing content. The term “content” refers to any material that is exposed to customers by Alexa (including both 1P and 3P experiences) and includes text, speech, audio, and video.