A little public data makes privacy-preserving AI models more accurate

Technique that mixes public and private training data can meet differential-privacy criteria while cutting error increase by 60%-70%.

Many useful computer vision models are trained on large corpora of public data, such as ImageNet. But some applications — models that analyze medical images for indications of disease, for instance — need to be trained on data whose owners might like to keep it private. In such cases, we want to be sure that no one can infer anything about specific training examples from the output of the trained model.

Differential privacy offers a way to quantify both the amount of private information that a machine learning model might leak and the effectiveness of countermeasures. The standard way to prevent data leakage is to add noise during the model training process. This can obscure the inferential pathway leading from model output to specific training examples, but it also tends to compromise model accuracy.

DP.CV.jpeg
A differential-privacy guarantee means that it is statistically impossible to tell whether a given sample was or was not part of the dataset used to train a machine learning model.

Natural-language-processing researchers have had success training models on a mixture of private and public training data, enforcing differential-privacy (DP) guarantees on the private data while compromising model accuracy very little. But attempts to generalize these methods to computer vision have fared badly. In fact, they fare so badly that training a model on public data and then doing zero-shot learning on the private-data task tends to work better than training mixed-data models.

In a paper we presented at this year’s Conference on Computer Vision and Pattern Recognition (CVPR), we address this problem, with an algorithm called AdaMix. We consider the case in which we have at least a little public data whose label set is the same as — or at least close to — that of the private data. In the medical-imaging example, we might have a small public dataset of images labeled to show evidence of the disease of interest, or something similar.

Related content
The surprising dynamics related to learning that are common to artificial and biological systems.

AdaMix works in two phases. It first trains on the public data to identify the “ball park” of the desired model weights. Then it trains jointly on the public and private data to refine the solution, while being incentivized to stay near the ball park. The public data also helps to make various adaptive decisions in every training iteration so that we can meet our DP criteria with minimal overall perturbation of the model.

AdaMix models outperform zero-shot models on private-data tasks, and relative to conventional mixed-data models, they reduce the error increase by 60% to 70%. That’s still a significant increase, but it’s mild enough that, in cases in which privacy protection is paramount, the resulting models may still be useful — which conventional mixed-data models often aren’t.

In addition, we obtained strong theoretical guarantees on the performance of AdaMix. Notably, we show that even a tiny public dataset will bring about substantial improvement in accuracy, with provable guarantees. This is in addition to the formal differential-privacy guarantee that the algorithm enjoys.

Information transfer and memorization

Computer vision models learn to identify image features relevant to particular tasks. A cat recognizer, for instance, might learn to identify image features that denote pointy ears when viewed from various perspectives. Since most of the images in the training data feature cats with pointy ears, the recognizer will probably model pointy ears in a very general way, which is not traceable to any particular training example.

Related content
Calibrating noise addition to word density in the embedding space improves utility of privacy-protected text.

If, however, the training data contains only a few images of Scottish Fold cats, with their distinctive floppy ears, the model might learn features particular to just those images, a process we call memorization. And memorization does open the possibility that a canny adversary could identify individual images used in the training data.

Information theory provides a way to quantify the amount of information that the model-training process transfers from any given training example to the model parameters, and the obvious way to prevent memorization would be to cap that information transfer.

But as one of us (Alessandro) explained in an essay for Amazon Science, “The importance of forgetting in artificial and animal intelligence”, during training, neural networks begin by memorizing a good deal of information about individual training examples before, over time, forgetting most of the memorized details. That is, they develop abstract models by gradually subtracting extraneous details from more particularized models. (This finding was unsurprising to biologists, as the development of the animal brain involves a constant shedding of useless information and a consolidation of useful information.)

DP provably prevents unintended memorization of individual training examples. But this also imposes a universal cap on the information transfer between training examples and model parameters, which could inhibit the learning process. The characteristics of specific training examples are often needed to map out the space of possibilities that the learning algorithm should explore as examples accumulate.

Related content
ADePT model transforms the texts used to train natural-language-understanding models while preserving semantic coherence.

This is the insight that our CVPR paper exploits. Essentially, we allow the model to memorize features of the small public dataset, mapping out the space of exploration. Then, when the model has been pretrained on public data, we cap the information transfer between the private data and the model parameters.

We tailor that cap, however, to the current values of the model parameters, and, more particularly, we update the cap after every iteration of the training procedure. This ensures that, for each sample in the private data set, we don’t add more noise than is necessary to protect privacy.

The particular improvement that our approach affords on test data suggests that it could enable more practical computer vision models that also meet privacy guarantees. But more important, we hope that the theoretical insight it incorporates — that DP schemes for computer vision have to be mindful of the importance of forgetting — will lead to still more effective methods of privacy protection.

Acknowledgments: Aaron Roth, Michael Kearns, Stefano Soatto

Related content

US, CA, Santa Clara
Job summaryAmazon is looking for a passionate, talented, and inventive Applied Scientist with a strong machine learning background to help build industry-leading language technology.Our mission is to provide a delightful experience to Amazon’s customers by pushing the envelope in Natural Language Processing (NLP), Natural Language Understanding (NLU), Dialog management, conversational AI and Machine Learning (ML).As part of our AI team in Amazon AWS, you will work alongside internationally recognized experts to develop novel algorithms and modeling techniques to advance the state-of-the-art in human language technology. Your work will directly impact millions of our customers in the form of products and services, as well as contributing to the wider research community. You will gain hands on experience with Amazon’s heterogeneous text and structured data sources, and large-scale computing resources to accelerate advances in language understanding.We are hiring primarily in Conversational AI / Dialog System Development areas: NLP, NLU, Dialog Management, NLG.This role can be based in NYC, Seattle or Palo Alto.Inclusive Team CultureHere at AWS, we embrace our differences. We are committed to furthering our culture of inclusion. We have ten employee-led affinity groups, reaching 40,000 employees in over 190 chapters globally. We have innovative benefit offerings, and host annual and ongoing learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences.Work/Life BalanceOur team puts a high value on work-life balance. It isn’t about how many hours you spend at home or at work; it’s about the flow you establish that brings energy to both parts of your life. We believe striking the right balance between your personal and professional life is critical to life-long happiness and fulfillment. We offer flexibility in working hours and encourage you to find your own balance between your work and personal lives.Mentorship & Career GrowthOur team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews. We care about your career growth and strive to assign projects based on what will help each team member develop into a better-rounded engineer and enable them to take on more complex tasks in the future.
US, WA, Seattle
Job summaryWorkforce Staffing (WFS) brings together the workforce powering Amazon’s ability to delight customers: the Amazon Associate. With over 1M hires, WFS supports sourcing, hiring, and developing the best talent to work in our fulfillment centers, sortation centers, delivery stations, shopping sites, Prime Air locations, and more.WFS' Funnel Science and Analytics team is looking for a Research Scientist. This individual will be responsible for conducting experiments and evaluating the impact of interventions when conducting experiments is not feasible. The perfect candidate will have the applied experience and the theoretical knowledge of policy evaluation and conducting field studies.Key job responsibilitiesAs a Research Scientist (RS), you will do causal inference, design studies and experiments, leverage data science workflows, build predictive models, conduct simulations, create visualizations, and influence science and analytics practice across the organization.Provide insights by analyzing historical data from databases (Redshift, SQL Server, Oracle DW, and Salesforce).Identify useful research avenues for increasing candidate conversion, test, and create well written documents to communicate to technical and non-technical audiences.About the teamFunnel Science and Analytics team finds ways to maximize the conversion and early retention of every candidate who wants to be an Amazon Associate. By focusing on our candidates, we improve candidate and business outcomes, and Amazon takes a step closer to being Earth’s Best Employer.
US, NY, New York
Job summaryAmazon is looking for a passionate, talented, and inventive Applied Scientist with a strong machine learning background to help build industry-leading language technology.Our mission is to provide a delightful experience to Amazon’s customers by pushing the envelope in Natural Language Processing (NLP), Natural Language Understanding (NLU), Dialog management, conversational AI and Machine Learning (ML).As part of our AI team in Amazon AWS, you will work alongside internationally recognized experts to develop novel algorithms and modeling techniques to advance the state-of-the-art in human language technology. Your work will directly impact millions of our customers in the form of products and services, as well as contributing to the wider research community. You will gain hands on experience with Amazon’s heterogeneous text and structured data sources, and large-scale computing resources to accelerate advances in language understanding.We are hiring primarily in Conversational AI / Dialog System Development areas: NLP, NLU, Dialog Management, NLG.This role can be based in NYC, Seattle or Palo Alto.Inclusive Team CultureHere at AWS, we embrace our differences. We are committed to furthering our culture of inclusion. We have ten employee-led affinity groups, reaching 40,000 employees in over 190 chapters globally. We have innovative benefit offerings, and host annual and ongoing learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences.Work/Life BalanceOur team puts a high value on work-life balance. It isn’t about how many hours you spend at home or at work; it’s about the flow you establish that brings energy to both parts of your life. We believe striking the right balance between your personal and professional life is critical to life-long happiness and fulfillment. We offer flexibility in working hours and encourage you to find your own balance between your work and personal lives.Mentorship & Career GrowthOur team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews. We care about your career growth and strive to assign projects based on what will help each team member develop into a better-rounded engineer and enable them to take on more complex tasks in the future.
US, CA, Santa Clara
Job summaryAWS AI/ML is looking for world class scientists and engineers to join its AI Research and Education group working on building automated ML solutions for planetary-scale sustainability and geospatial applications. Our team's mission is to develop ready-to-use and automated solutions that solve important sustainability and geospatial problems. We live in a time wherein geospatial data, such as climate, agricultural crop yield, weather, landcover, etc., has become ubiquitous. Cloud computing has made it easy to gather and process the data that describes the earth system and are generated by satellites, mobile devices, and IoT devices. Our vision is to bring the best ML/AI algorithms to solve practical environmental and sustainability-related R&D problems at scale. Building these solutions require a solid foundation in machine learning infrastructure and deep learning technologies. The team specializes in developing popular open source software libraries like AutoGluon, GluonCV, GluonNLP, DGL, Apache/MXNet (incubating). Our strategy is to bring the best of ML based automation to the geospatial and sustainability area.We are seeking an experienced Applied Scientist for the team. This is a role that combines science knowledge (around machine learning, computer vision, earth science), technical strength, and product focus. It will be your job to develop ML system and solutions and work closely with the engineering team to ship them to our customers. You will interact closely with our customers and with the academic and research communities. You will be at the heart of a growing and exciting focus area for AWS and work with other acclaimed engineers and world famous scientists. You are also expected to work closely with other applied scientists and demonstrate Amazon Leadership Principles (https://www.amazon.jobs/en/principles). Strong technical skills and experience with machine learning and computer vision are required. Experience working with earth science, mapping, and geospatial data is a plus. Our customers are extremely technical and the solutions we build for them are strongly coupled to technical feasibility.About the teamInclusive Team CultureAt AWS, we embrace our differences. We are committed to furthering our culture of inclusion. We have ten employee-led affinity groups, reaching 40,000 employees in over 190 chapters globally. We have innovative benefit offerings, and host annual and ongoing learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences. Amazon’s culture of inclusion is reinforced within our 14 Leadership Principles, which remind team members to seek diverse perspectives, learn and be curious, and earn trust. Work/Life BalanceOur team puts a high value on work-life balance. It isn’t about how many hours you spend at home or at work; it’s about the flow you establish that brings energy to both parts of your life. We believe striking the right balance between your personal and professional life is critical to life-long happiness and fulfillment. We offer flexibility in working hours and encourage you to find your own balance between your work and personal lives.Mentorship & Career GrowthOur team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge sharing and mentorship. Our senior members enjoy one-on-one mentoring. We care about your career growth and strive to assign projects based on what will help each team member develop into a better-rounded scientist and enable them to take on more complex tasks in the future.Interested in this role? Reach out to the recruiting team with questions or apply directly via amazon.jobs.
US, CA, Santa Clara
Job summaryAWS AI/ML is looking for world class scientists and engineers to join its AI Research and Education group working on building automated ML solutions for planetary-scale sustainability and geospatial applications. Our team's mission is to develop ready-to-use and automated solutions that solve important sustainability and geospatial problems. We live in a time wherein geospatial data, such as climate, agricultural crop yield, weather, landcover, etc., has become ubiquitous. Cloud computing has made it easy to gather and process the data that describes the earth system and are generated by satellites, mobile devices, and IoT devices. Our vision is to bring the best ML/AI algorithms to solve practical environmental and sustainability-related R&D problems at scale. Building these solutions require a solid foundation in machine learning infrastructure and deep learning technologies. The team specializes in developing popular open source software libraries like AutoGluon, GluonCV, GluonNLP, DGL, Apache/MXNet (incubating). Our strategy is to bring the best of ML based automation to the geospatial and sustainability area.We are seeking an experienced Applied Scientist for the team. This is a role that combines science knowledge (around machine learning, computer vision, earth science), technical strength, and product focus. It will be your job to develop ML system and solutions and work closely with the engineering team to ship them to our customers. You will interact closely with our customers and with the academic and research communities. You will be at the heart of a growing and exciting focus area for AWS and work with other acclaimed engineers and world famous scientists. You are also expected to work closely with other applied scientists and demonstrate Amazon Leadership Principles (https://www.amazon.jobs/en/principles). Strong technical skills and experience with machine learning and computer vision are required. Experience working with earth science, mapping, and geospatial data is a plus. Our customers are extremely technical and the solutions we build for them are strongly coupled to technical feasibility.About the teamInclusive Team CultureAt AWS, we embrace our differences. We are committed to furthering our culture of inclusion. We have ten employee-led affinity groups, reaching 40,000 employees in over 190 chapters globally. We have innovative benefit offerings, and host annual and ongoing learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences. Amazon’s culture of inclusion is reinforced within our 14 Leadership Principles, which remind team members to seek diverse perspectives, learn and be curious, and earn trust. Work/Life BalanceOur team puts a high value on work-life balance. It isn’t about how many hours you spend at home or at work; it’s about the flow you establish that brings energy to both parts of your life. We believe striking the right balance between your personal and professional life is critical to life-long happiness and fulfillment. We offer flexibility in working hours and encourage you to find your own balance between your work and personal lives.Mentorship & Career GrowthOur team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge sharing and mentorship. Our senior members enjoy one-on-one mentoring. We care about your career growth and strive to assign projects based on what will help each team member develop into a better-rounded scientist and enable them to take on more complex tasks in the future.Interested in this role? Reach out to the recruiting team with questions or apply directly via amazon.jobs.
US, WA, Seattle
Job summaryHow can we create a rich, data-driven shopping experience on Amazon? How do we build data models that helps us innovate different ways to enhance customer experience? How do we combine the world's greatest online shopping dataset with Amazon's computing power to create models that deeply understand our customers? Recommendations at Amazon is a way to help customers discover products. Our team's stated mission is to "grow each customer’s relationship with Amazon by leveraging our deep understanding of them to provide relevant and timely product, program, and content recommendations". We strive to better understand how customers shop on Amazon (and elsewhere) and build recommendations models to streamline customers' shopping experience by showing the right products at the right time. Understanding the complexities of customers' shopping needs and helping them explore the depth and breadth of Amazon's catalog is a challenge we take on every day. Using Amazon’s large-scale computing resources you will ask research questions about customer behavior, build models to generate recommendations, and run these models directly on the retail website. You will participate in the Amazon ML community and mentor Applied Scientists and software development engineers with a strong interest in and knowledge of ML. Your work will directly benefit customers and the retail business and you will measure the impact using scientific tools. We are looking for passionate, hard-working, and talented Applied scientist who have experience building mission critical, high volume applications that customers love. You will have an enormous opportunity to make a large impact on the design, architecture, and implementation of cutting edge products used every day, by people you know.Key job responsibilitiesScaling state of the art techniques to Amazon-scaleWorking independently and collaborating with SDEs to deploy models to productionDeveloping long-term roadmaps for the team's scientific agendaDesigning experiments to measure business impact of the team's effortsMentoring scientists in the departmentContributing back to the machine learning science community
US, NY, New York City
Job summaryAmazon Web Services is looking for world class scientists to join the Security Analytics and AI Research team within AWS Security Services. This group is entrusted with researching and developing core data mining and machine learning algorithms for various AWS security services like GuardDuty (https://aws.amazon.com/guardduty/) and Macie (https://aws.amazon.com/macie/). In this group, you will invent and implement innovative solutions for never-before-solved problems. If you have passion for security and experience with large scale machine learning problems, this will be an exciting opportunity.The AWS Security Services team builds technologies that help customers strengthen their security posture and better meet security requirements in the AWS Cloud. The team interacts with security researchers to codify our own learnings and best practices and make them available for customers. We are building massively scalable and globally distributed security systems to power next generation services.Inclusive Team Culture Here at AWS, we embrace our differences. We are committed to furthering our culture of inclusion. We have ten employee-led affinity groups, reaching 40,000 employees in over 190 chapters globally. We have innovative benefit offerings, and host annual and ongoing learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences. Amazon’s culture of inclusion is reinforced within our 14 Leadership Principles, which remind team members to seek diverse perspectives, learn and be curious, and earn trust. Work/Life Balance Our team puts a high value on work-life balance. It isn’t about how many hours you spend at home or at work; it’s about the flow you establish that brings energy to both parts of your life. We believe striking the right balance between your personal and professional life is critical to life-long happiness and fulfillment. We offer flexibility in working hours and encourage you to find your own balance between your work and personal lives. Mentorship & Career Growth Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge sharing and mentorship. We care about your career growth and strive to assign projects based on what will help each team member develop and enable them to take on more complex tasks in the future.A day in the lifeAbout the hiring groupJob responsibilities* Rapidly design, prototype and test many possible hypotheses in a high-ambiguity environment, making use of both quantitative and business judgment.* Collaborate with software engineering teams to integrate successful experiments into large scale, highly complex production services.* Report results in a scientifically rigorous way.* Interact with security engineers, product managers and related domain experts to dive deep into the types of challenges that we need innovative solutions for.
US, WA, Seattle
Job summaryAmazon Prime Video is changing the way millions of customers enjoy digital content. Prime Video delivers premium content to customers through purchase and rental of movies and TV shows, unlimited on-demand streaming through Amazon Prime subscriptions, add-on channels like Showtime and HBO, and live concerts and sporting events like NFL Thursday Night Football. In total, Prime Video offers nearly 200,000 titles and is available across a wide variety of platforms, including PCs and Macs, Android and iOS mobile devices, Fire Tablets and Fire TV, Smart TVs, game consoles, Blu-ray players, set-top-boxes, and video-enabled Alexa devices. Amazon believes so strongly in the future of video that we've launched our own Amazon Studios to produce original movies and TV shows, many of which have already earned critical acclaim and top awards, including Oscars, Emmys and Golden Globes.The Global Consumer Engagement team within Amazon Prime Video builds product and technology solutions that drive customer activation and engagement across all our supported devices and global footprint. We obsess over finding effective, programmatic and scalable ways to reach customers via a broad portfolio of both in-app and out-of-app experiences. We would love to have you join us to build models that can classify and detect content available on Prime Video. We need you to analyze the video, audio and textual signal streams and improve state-of-art solutions while being scalable to Amazon size data. We need to solve problems across many cultures and languages, working alongside an operations team generating labels across many languages to help us achieve these goals. Our team consistently strives to innovate, and holds several novel patents and inventions in the motion picture and television industry. We are highly motivated to extend the state of the art.As a member of our team, you will apply your deep knowledge of Computer Vision and Machine Learning to concrete problems that have broad cross-organizational, global, and technology impact. Your work will focus on addressing fundamental computer vision models like video understanding and video summarization in addition to building appropriate large scale datasets. You will work on large engineering efforts that solve significantly complex problems facing global customers. You will be trusted to operate with independence and are often assigned to focus on areas with significant impact on audience satisfaction. You must be equally comfortable with digging in to customer requirements as you are drilling into design with development teams and developing production ready learning models. You consistently bring strong, data-driven business and technical judgment to decisions.You will work with internal and external stakeholders, cross-functional partners, and end-users around the world at all levels. Our team makes a big impact because nothing is more important to us than pleasing our customers, continually earning their trust, and thinking long term. You are empowered to bring new technologies and deep learning approaches to your solutions.We embrace the challenges of a fast paced market and evolving technologies, paving the way to universal availability of content. You will be encouraged to see the big picture, be innovative, and positively impact millions of customers. This is a young and evolving business where creativity and drive will have a lasting impact on the way video is enjoyed worldwide.
US, CA, Palo Alto
Job summaryAmazon Web Services provides highly reliable, scalable and low-cost services in the cloud that powers over a million customers in 190 countries around the world. It is critical to AWS that we continue to build the most productive Field in the world that enables our customers to innovate and achieve their business goals at every stage of their cloud journey. At AWS World Wide Field Enablement (WWFE), we are seeking a UX researcher with experience in sales workflow, process and tool design & development. This position will work closely with our global field leaders, enablement stakeholders, product managers as well as WWFE leadership to ensure that we set the right vision, supported by a strategy for our teams to help our customers solve key business problems and issues. Key job responsibilitiesThis role will ensure our field roles have the best-in-class experience when it comes to executing the sales process, using our productivity/enablement tools and collaborating together as one team. About the teamAbout Us Inclusive Team CultureHere at AWS, we embrace our differences. We are committed to furthering our culture of inclusion. We have twelve employee-led affinity groups, reaching 40,000 employees in over 190 chapters globally. We have innovative benefit offerings, and host annual and ongoing learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences. Amazon’s culture of inclusion is reinforced within our 16 Leadership Principles, which remind team members to seek diverse perspectives, learn and be curious, and earn trust. Work/Life BalanceOur team puts a high value on work-life balance. It isn’t about how many hours you spend at home or at work; it’s about the flow you establish that brings energy to both parts of your life. We believe striking the right balance between your personal and professional life is critical to life-long happiness and fulfillment. We offer flexibility in working hours and encourage you to find your own balance between your work and personal lives.Mentorship & Career GrowthOur team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge sharing and mentorship. We care about your career growth and strive to assign projects based on what will help each team member develop into a better-rounded professional and enable them to take on more complex tasks in the future.
US, WA, Virtual Location - Washington
Job summaryThe Worldwide Workplace Health and Safety (WHS) team is hiring a Data Scientist to help us analyze, process, and model data to create actionable plans to measure and improve Amazon’s organizational culture as it relates to health and safety of our employees. The role will analyze the impact of behavioral and organizational changes, support data-driven decision making by business leaders, and facilitate the development of innovative products that improve the safety outcomes and overall employee experience across our global operations.The role will collaborate with internal technical, business WHS and operations teams, to support the development, implementation and ongoing measurement of safety programs and initiatives to meet our vision of being Earth's safest place to work.Key job responsibilities* Support the development of start-to-finish data product solutions from requirements gathering and ideation, through interface design and implementation.* Contribute to the design and implementation of data infrastructure and pipelines for machine learning and analytics products.* Obtain, merge, analyze, and report data using SQL, statistics software, and data visualization tools.* Apply various statistical and machine learning techniques to analyze large and complex data sets related to safety engagement, safety leadership and other behavioral factors. * Communicate applied machine learning and statistic concepts to project sponsors, business leaders, and development teams across Amazon.* Understand business customer needs, iterate on feedback, and drive adoption