Amazon wins best-paper award for protecting privacy of training data

Calibrating noise addition to word density in the embedding space improves utility of privacy-protected text.

Differential privacy is a popular technique that provides a way to quantify the privacy risk of releasing aggregate statistics based on individual data. In the context of machine learning, differential privacy provides a way to protect privacy by adding noise to the data used to train a machine learning model. But the addition of noise can also reduce model performance.

In a pair of papers at the annual meeting of the Florida Artificial Intelligence Research Society (FLAIRS), the Privacy Engineering for Alexa team is presenting a new way to calibrate the noise added to the textual data used to train natural-language-processing (NLP) models. The idea is to distinguish cases where a little noise is enough to protect privacy from cases where more noise is necessary. This helps minimize the impact on model accuracy while maintaining privacy guarantees, which aligns with the team’s mission to measurably preserve customer privacy across Alexa.

One of the papers, “Density-aware differentially private textual perturbations using truncated Gumbel noise”, has won the conference’s best-paper award.

Calibrated noise addition.gif
A simplified example of the method proposed in the researchers' award-winning paper. Noise is added to the three nearest neighbors of a source word, A, and to A itself. After noise addition, the word closest to A's original position — B — is chosen as a substitute for A.
Credit: Glynis Condon

Differential privacy says that, given an aggregate statistic, the probability that the underlying dataset does or does not contain a particular item should be virtually the same. The addition of noise to the data helps enforce that standard, but it can also obscure relationships in the data that the model is trying to learn.

In NLP applications, a standard way to add noise involves embedding the words of the training texts. An embedding represents words as vectors, such that vectors that are close in the space have related meanings. 

Adding noise to an embedding vector produces a new vector, which would correspond to a similar but different word. Ideally, substituting the new words for the old should disguise the original data while preserving the attributes that the NLP model is trying to learn. 

However, words in an embedding space tend to form clusters, united by semantic similarity, with sparsely populated regions between clusters. Intuitively, within a cluster, much less noise should be required to ensure enough semantic distance to preserve privacy. However, if the noise added to each word is based on the average distance between embeddings — factoring in the sparser regions — it may be more than is necessary for words in dense regions.

Noise calibration.png
A simplified representation of words (red dots) in an embedding space. Adding noise to a source vector (A) produces a new vector, and the nearest (green circle) embedded word (B) is chosen as a substitute. In the graph at left, adding a lot of noise to the source word produces an output word that is far away and hence semantically dissimilar. In the middle graph, however, a lot of noise is needed to produce a semantically different output. In the graph at right, the amount of noise is calibrated to the density of the vectors around the source word.

This leads us to pose the following question in our FLAIRS papers: Can we recalibrate the noise added such that it varies for every word depending on the density of the surrounding space, rather than resorting to a single global sensitivity?

Calibration techniques

We study this question from two different perspectives. In the paper titled “Research challenges in designing differentially private text generation mechanisms”, my Alexa colleagues Oluwaseyi Feyisetan, Zekun Xu, Nathanael Teissier, and I discuss general techniques to enhance the privacy of text mechanisms by exploiting features such as local density in the embedding space.  

For example, one technique deduces a probability distribution (a prior) that assigns high probability to dense areas of the embedding and low probability to sparse areas. This prior can be produced using kernel density estimation, which is a popular technique for estimating distributions from limited data samples. 

However, these distributions are often highly nonlinear, which makes them difficult to sample from. In this case, we can either opt for an approximation to the distribution or adopt indirect sampling strategies such as the Metropolis–Hastings algorithm (which is based on well-known Monte Carlo Markov chain techniques). 

Another technique we discuss is to impose a limit on how far away a noisy embedding may be from its source. We explore two ways to do this: distance-based truncation and k-nearest-neighbor-based truncation. 

Distance-based truncation simply caps the distance between the noisy embedding and its source, according to some measure of distance in the space. This prevents the addition of a large amount of noise, which is useful in the dense regions of the embedding. But in the sparse regions, this can effectively mean zero perturbation, since there may not be another word within the distance limit. 

To avoid this drawback, we consider the alternate approach of k-nearest-neighbor-based truncation. In this approach, the  words closest to the source delineate the acceptable search area. We then execute a selection procedure to choose the new word from these candidates (plus the source word itself). This is the approach we adopt in our second paper.

Nearest-neighbor search.png
A schematic of distance-based (left and middle graphs) and nearest-neighbor-based (right graph) truncation techniques. In the first graph, the blue circle represents a limit on the distance from the source word, A. Randomly adding noise produces a vector within this limit, and the output word B is selected. In the middle graph, a large amount of noise has been randomly added, but it’s truncated at the boundary of the blue circle. The right graph shows k-nearest-neighbor truncation, where a random number of neighbors (in this case, three) are selected around the source word, A. Noise is added to each of these neighbors independently, and the nearest word after noise addition — B — is chosen (see animation, above).

In “Density-aware differentially private textual perturbations using truncated Gumbel noise”, Nan Xu, a summer intern with our group in 2020 and currently a PhD student in computer science at the University of Southern California, joins us to discuss a particular algorithm in detail. 

This algorithm calibrates noise by selecting a few neighbors of the source word and perturbing the distance to these neighbors using samples from the Gumbel distribution (the rightmost graph, above). We chose the Gumbel distribution because it is more computationally efficient than existing mechanisms for differentially private selection (e.g., the exponential mechanism). The number of neighbors is chosen randomly using Poisson samples.

Together, these two techniques, when calibrated appropriately, provide the required amount of differential privacy while enhancing utility. We call the resulting algorithm the truncated Gumbel mechanism, and it better preserves semantic meanings than multivariate Laplace mechanisms, a widely used method for adding noise to textual data. (The left and middle graphs of the top figure above depict the use of Laplace mechanisms). 

In tests, we found that this new algorithm provided improvements in accuracy of up to 9.9% for text classification tasks on two different datasets. Our paper also includes a formal proof of the privacy guarantees offered by this mechanism and analyzes relevant privacy statistics. 

Our ongoing research efforts continue to improve upon the techniques described above and enable Alexa to continue introducing new features and inventions that make customers’ lives easier while keeping their data private.

Related content

LU, Luxembourg
The Decision, Science and Technology (DST) team part of the global Reliability Maintenance Engineering (RME) is looking for a Senior Operations Research Scientist interested in solving challenging optimization problems in the maintenance space. Our mission is to leverage the use of data, science, and technology to improve the efficiency of RME maintenance activities, reduce costs, increase safety and promote sustainability while creating frictionless customer experiences. As a Senior OR Scientist in DST you will be focused on leading the design and development of innovative approaches and solutions by leading technical work supporting RME’s Predictive Maintenance (PdM) and Spare Parts (SP) programs. You will connect with world leaders in your field and you will be tackling customer's natural language challenges by carrying out a systematic review of existing solutions. The appropriate choice of methods and their deployment into effective tools will be the key for the success in this role. The successful candidate will be a self-starter comfortable with ambiguity, with strong attention to detail and outstanding ability in balancing technical leadership with strong business judgment to make the right decisions about model and method choices. Key job responsibilities • Provide technical expertise to support team strategies that will take EU RME towards World Class predictive maintenance practices and processes, driving better equipment up-time and lower repair costs with optimized spare parts inventory and placement • Implement an advanced maintenance framework utilizing Machine Learning technologies to drive equipment performance leading to reduced unplanned downtime • Provide technical expertise to support the development of long-term spares management strategies that will ensure spares availability at an optimal level for local sites and reduce the cost of spares A day in the life As a Senior OR Scientist in DST you will be focused on leading the design and development of innovative approaches and solutions by leading technical work supporting RME’s Predictive Maintenance (PdM) and Spare Parts (SP) programs. You will connect with world leaders in your field and you will be tackling customer's natural language challenges by carrying out a systematic review of existing solutions. The appropriate choice of methods and their deployment into effective tools will be the key for the success in this role. About the team Our mission is to leverage the use of data, science, and technology to improve the efficiency of RME maintenance activities, reduce costs, increase safety and promote sustainability while creating frictionless customer experiences. We are open to hiring candidates to work out of one of the following locations: Luxembourg, LUX
US, WA, Seattle
Amazon.com strives to be Earth's most customer-centric company where customers can shop in our stores to find and discover anything they want to buy. We hire the world's brightest minds, offering them a fast paced, technologically sophisticated and friendly work environment. Economists at Amazon partner closely with senior management, business stakeholders, scientist and engineers, and economist leadership to solve key business problems ranging from Amazon Web Services, Kindle, Prime, inventory planning, international retail, third party merchants, search, pricing, labor and employment planning, effective benefits (health, retirement, etc.) and beyond. Amazon Economists build econometric models using our world class data systems and apply approaches from a variety of skillsets – applied macro/time series, applied micro, econometric theory, empirical IO, empirical health, labor, public economics and related fields are all highly valued skillsets at Amazon. You will work in a fast moving environment to solve business problems as a member of either a cross-functional team embedded within a business unit or a central science and economics organization. You will be expected to develop techniques that apply econometrics to large data sets, address quantitative problems, and contribute to the design of automated systems around the company. We are open to hiring candidates to work out of one of the following locations: Arlington, VA, USA | Bellevue, WA, USA | Boston, MA, USA | Los Angeles, CA, USA | New York, NY, USA | San Francisco, CA, USA | Seattle, WA, USA | Sunnyvale, CA, USA
US, WA, Seattle
Amazon.com strives to be Earth's most customer-centric company where customers can shop in our stores to find and discover anything they want to buy. We hire the world's brightest minds, offering them a fast paced, technologically sophisticated and friendly work environment. Economists at Amazon partner closely with senior management, business stakeholders, scientist and engineers, and economist leadership to solve key business problems ranging from Amazon Web Services, Kindle, Prime, inventory planning, international retail, third party merchants, search, pricing, labor and employment planning, effective benefits (health, retirement, etc.) and beyond. Amazon Economists build econometric models using our world class data systems and apply approaches from a variety of skillsets – applied macro/time series, applied micro, econometric theory, empirical IO, empirical health, labor, public economics and related fields are all highly valued skillsets at Amazon. You will work in a fast moving environment to solve business problems as a member of either a cross-functional team embedded within a business unit or a central science and economics organization. You will be expected to develop techniques that apply econometrics to large data sets, address quantitative problems, and contribute to the design of automated systems around the company. We are open to hiring candidates to work out of one of the following locations: Arlington, VA, USA | Bellevue, WA, USA | Boston, MA, USA | Los Angeles, CA, USA | New York, NY, USA | San Francisco, CA, USA | Seattle, WA, USA | Sunnyvale, CA, USA
US, CA, Santa Clara
AWS AI Research and Engineering (AIRE) is looking for world class scientists and engineers to work on the development of autonomous AI agents. At AWS AI/ML you will invent, implement, and deploy state of the art machine learning algorithms and systems. You will build prototypes and innovate on new learning techniques. You will interact closely with our customers and with the academic and research communities. You will be at the heart of a growing and exciting focus area for AWS and work with other acclaimed engineers and world famous scientists. Large-scale foundation models have been the powerhouse in many of the recent advancements in computer vision, natural language processing, automatic speech recognition, recommendation systems, and time series modeling. Developing such models requires not only skillful modeling in individual modalities, but also understanding of how to seamlessly combine them, and how to scale the modeling methods to learn with huge models and on large datasets. We seek a strong technical leader with domain expertise in machine learning, large language models and multimodal models, reinforcement learning and setting up simulation environments to benchmark and evaluate. AWS Utility Computing (UC) provides product innovations — from foundational services such as Amazon’s Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2), to consistently released new product innovations that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Internet of Things (Iot), Platform, and Productivity Apps services in AWS. Within AWS UC, Amazon Dedicated Cloud (ADC) roles engage with AWS customers who require specialized security solutions for their cloud services. About the team Diverse Experiences AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying. Why AWS? Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating — that’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses. Inclusive Team Culture Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empower us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness. Mentorship & Career Growth We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional. Work/Life Balance We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud. Hybrid Work We value innovation and recognize this sometimes requires uninterrupted time to focus on a build. We also value in-person collaboration and time spent face-to-face. Our team affords employees options to work in the office every day or in a flexible, hybrid work model near one of our U.S. Amazon offices. We are open to hiring candidates to work out of one of the following locations: Santa Clara, CA, USA
US, WA, Seattle
Amazon.com strives to be Earth's most customer-centric company where customers can shop in our stores to find and discover anything they want to buy. We hire the world's brightest minds, offering them a fast paced, technologically sophisticated and friendly work environment. Economists at Amazon partner closely with senior management, business stakeholders, scientist and engineers, and economist leadership to solve key business problems ranging from Amazon Web Services, Kindle, Prime, inventory planning, international retail, third party merchants, search, pricing, labor and employment planning, effective benefits (health, retirement, etc.) and beyond. Amazon Economists build econometric models using our world class data systems and apply approaches from a variety of skillsets – applied macro/time series, applied micro, econometric theory, empirical IO, empirical health, labor, public economics and related fields are all highly valued skillsets at Amazon. You will work in a fast moving environment to solve business problems as a member of either a cross-functional team embedded within a business unit or a central science and economics organization. You will be expected to develop techniques that apply econometrics to large data sets, address quantitative problems, and contribute to the design of automated systems around the company. We are open to hiring candidates to work out of one of the following locations: Arlington, VA, USA | Bellevue, WA, USA | Boston, MA, USA | Los Angeles, CA, USA | New York, NY, USA | San Francisco, CA, USA | Seattle, WA, USA | Sunnyvale, CA, USA
US, WA, Seattle
Amazon.com strives to be Earth's most customer-centric company where customers can shop in our stores to find and discover anything they want to buy. We hire the world's brightest minds, offering them a fast paced, technologically sophisticated and friendly work environment. Economists at Amazon partner closely with senior management, business stakeholders, scientist and engineers, and economist leadership to solve key business problems ranging from Amazon Web Services, Kindle, Prime, inventory planning, international retail, third party merchants, search, pricing, labor and employment planning, effective benefits (health, retirement, etc.) and beyond. Amazon Economists build econometric models using our world class data systems and apply approaches from a variety of skillsets – applied macro/time series, applied micro, econometric theory, empirical IO, empirical health, labor, public economics and related fields are all highly valued skillsets at Amazon. You will work in a fast moving environment to solve business problems as a member of either a cross-functional team embedded within a business unit or a central science and economics organization. You will be expected to develop techniques that apply econometrics to large data sets, address quantitative problems, and contribute to the design of automated systems around the company. We are open to hiring candidates to work out of one of the following locations: Arlington, VA, USA | Bellevue, WA, USA | Boston, MA, USA | Los Angeles, CA, USA | New York, NY, USA | San Francisco, CA, USA | Seattle, WA, USA | Sunnyvale, CA, USA
US, WA, Seattle
Amazon.com strives to be Earth's most customer-centric company where customers can shop in our stores to find and discover anything they want to buy. We hire the world's brightest minds, offering them a fast paced, technologically sophisticated and friendly work environment. Economists in the Forecasting, Macroeconomics & Finance field document, interpret and forecast Amazon business dynamics. This track is well suited for economists adept at combining cutting edge times-series statistical methods with strong economic analysis and intuition. This track could be a good fit for candidates with research experience in: macroeconometrics and/or empirical macroeconomics; international macroeconomics; time-series econometrics; forecasting; financial econometrics and/or empirical finance; and the use of micro and panel data to improve and validate traditional aggregate models. Economists at Amazon are expected to work directly with our senior management and scientists from other fields on key business problems faced across Amazon, including retail, cloud computing, third party merchants, search, Kindle, streaming video, and operations. The Forecasting, Macroeconomics & Finance field utilizes methods at the frontier of economics to develop formal models to understand the past and the present, predict the future, and identify relevant risks and opportunities. For example, we analyze the internal and external drivers of growth and profitability and how these drivers interact with the customer experience in the short, medium and long-term. We build econometric models of dynamic systems, using our world class data tools, formalizing problems using rigorous science to solve business issues and further delight customers. We are open to hiring candidates to work out of one of the following locations: Arlington, VA, USA | Bellevue, WA, USA | Boston, MA, USA | Los Angeles, CA, USA | New York, NY, USA | San Francisco, CA, USA | Seattle, WA, USA | Sunnyvale, CA, USA
US, WA, Seattle
Amazon.com strives to be Earth's most customer-centric company where customers can shop in our stores to find and discover anything they want to buy. We hire the world's brightest minds, offering them a fast paced, technologically sophisticated and friendly work environment. Economists in the Forecasting, Macroeconomics & Finance field document, interpret and forecast Amazon business dynamics. This track is well suited for economists adept at combining cutting edge times-series statistical methods with strong economic analysis and intuition. This track could be a good fit for candidates with research experience in: macroeconometrics and/or empirical macroeconomics; international macroeconomics; time-series econometrics; forecasting; financial econometrics and/or empirical finance; and the use of micro and panel data to improve and validate traditional aggregate models. Economists at Amazon are expected to work directly with our senior management and scientists from other fields on key business problems faced across Amazon, including retail, cloud computing, third party merchants, search, Kindle, streaming video, and operations. The Forecasting, Macroeconomics & Finance field utilizes methods at the frontier of economics to develop formal models to understand the past and the present, predict the future, and identify relevant risks and opportunities. For example, we analyze the internal and external drivers of growth and profitability and how these drivers interact with the customer experience in the short, medium and long-term. We build econometric models of dynamic systems, using our world class data tools, formalizing problems using rigorous science to solve business issues and further delight customers. We are open to hiring candidates to work out of one of the following locations: Arlington, VA, USA | Bellevue, WA, USA | Boston, MA, USA | Los Angeles, CA, USA | New York, NY, USA | San Francisco, CA, USA | Seattle, WA, USA | Sunnyvale, CA, USA
US, WA, Seattle
Economists in the Forecasting, Macroeconomics & Finance field document, interpret and forecast Amazon business dynamics. This track is well suited for economists adept at combining cutting edge times-series statistical methods with strong economic analysis and intuition. This track could be a good fit for candidates with research experience in: macroeconometrics and/or empirical macroeconomics; international macroeconomics; time-series econometrics; forecasting; financial econometrics and/or empirical finance; and the use of micro and panel data to improve and validate traditional aggregate models. Economists at Amazon are expected to work directly with our senior management and scientists from other fields on key business problems faced across Amazon, including retail, cloud computing, third party merchants, search, Kindle, streaming video, and operations. The Forecasting, Macroeconomics & Finance field utilizes methods at the frontier of economics to develop formal models to understand the past and the present, predict the future, and identify relevant risks and opportunities. For example, we analyze the internal and external drivers of growth and profitability and how these drivers interact with the customer experience in the short, medium and long-term. We build econometric models of dynamic systems, using our world class data tools, formalizing problems using rigorous science to solve business issues and further delight customers. We are open to hiring candidates to work out of one of the following locations: Arlington, VA, USA | Bellevue, WA, USA | Boston, MA, USA | Los Angeles, CA, USA | New York, NY, USA | San Francisco, CA, USA | Seattle, WA, USA | Sunnyvale, CA, USA
US, WA, Seattle
Amazon.com strives to be Earth's most customer-centric company where customers can shop in our stores to find and discover anything they want to buy. We hire the world's brightest minds, offering them a fast paced, technologically sophisticated and friendly work environment. Economists at Amazon partner closely with senior management, business stakeholders, scientist and engineers, and economist leadership to solve key business problems ranging from Amazon Web Services, Kindle, Prime, inventory planning, international retail, third party merchants, search, pricing, labor and employment planning, effective benefits (health, retirement, etc.) and beyond. Amazon Economists build econometric models using our world class data systems and apply approaches from a variety of skillsets – applied macro/time series, applied micro, econometric theory, empirical IO, empirical health, labor, public economics and related fields are all highly valued skillsets at Amazon. You will work in a fast moving environment to solve business problems as a member of either a cross-functional team embedded within a business unit or a central science and economics organization. You will be expected to develop techniques that apply econometrics to large data sets, address quantitative problems, and contribute to the design of automated systems around the company. We are open to hiring candidates to work out of one of the following locations: Arlington, VA, USA | Bellevue, WA, USA | Boston, MA, USA | Los Angeles, CA, USA | New York, NY, USA | San Francisco, CA, USA | Seattle, WA, USA | Sunnyvale, CA, USA