Building machine learning models with encrypted data

New approach to homomorphic encryption speeds up the training of encrypted machine learning models sixfold.

The prevalence and success of machine learning have given rise to services that enable customers to train machine learning models in the cloud. In one scenario, a customer would upload training data to a cloud-based service and receive a trained model in return.

Homomorphic encryption (HE), a technology that allows computation on encrypted data, would give this procedure an extra layer of security. With HE, a customer would upload encrypted training data, and the service would use the encrypted data to directly produce an encrypted machine learning model, which only the customer could then decrypt.

At the 2020 Workshop on Encrypted Computing and Applied Homomorphic Cryptography, we presented a paper exploring the application of homomorphic encryption to logistic regression, a statistical model used for myriad machine learning applications, from genomics to tax compliance. Our paper shows how to train logistic-regression models on encrypted data six times as fast as prior work.

Homomorphic encryption

Homomorphic addition and multiplication

Homomorphic encryption provides an application programming interface (API) for evaluating functions on encrypted data. We refer to a message as m and its encryption as m with a box around it. Two of the operations in this API are the HE versions of addition and multiplication, which we present at right. The inputs are encrypted values, and the output is the encryption of the sum or product of the plaintext values. 

Homomorphic function evaluation

Diagram of a circuit with a multiplicative depth of three
A circuit with a multiplicative depth of three.

The eval operation takes a description of an arbitrary function ƒ as a circuit ƒ-hat (ƒ with a circumflex accent above it) expressed using only the HE versions of addition and multiplication, as in the example at left. Given ƒ-hat and an encrypted input, eval produces an encryption of the output of evaluating ƒ on the input m.

For example, to evaluate ƒ(x) = x4 + 2 on encrypted data, we could use the circuit ƒ1-hat at right. This would be to use ƒ1-hat and the encrypted version of x as the inputs to the eval operation and x4 + 2 as ƒ(m).

Multiplicative depth

Diagram of a circuit with a multiplicative depth of two
A circuit with a multiplicative depth of two.

The efficiency of the eval operation depends on a property called multiplicative depth, the maximum number of multiplications along any path through a circuit. In the example at right, ƒ1-hat has a multiplicative depth of three, since there is a path that contains three multiplications but no path that has more than three multiplications. However, this is not the most efficient circuit for computing ƒ(x) = x4 + 2 .

Consider, instead, the circuit at left. This circuit also computes x4 + 2 but has a multiplicative depth of only two. It is therefore more efficient to evaluate ƒ2-hat than to evaluate ƒ1-hat.

Model training with homomorphic encryption

We can now see how homomorphic encryption could be used to securely outsource the training of a logistic-regression model. Customers would encrypt training data with keys they generate and control and send the encrypted training data to a cloud service. The service would compute an encrypted model based on the encrypted data and send it back to the customer; the model could then be decrypted with the customer’s key.

The most challenging part of deploying this solution is expressing the logistic-regression-model training function as a low-depth circuit. Prior research on encrypted logistic-regression-model training has explored several variations on the logistic-regression training function. For example:

  • Training on all samples at once versus using minibatches;
  • Alternatives to classic gradient descent, such as Nesterov’s accelerated gradient;
  • Training with variations of the fixed-Hessian method.

Previously, the lowest-depth (and therefore most efficient) circuits for logistic-regression training had multiplicative depth 5k, where k is the number of minibatches of data that the model is trained on. 

We revisited one of these existing solutions and created a circuit with multiplicative depth 2.5k for k minibatches — half the multiplicative depth. This effectively doubles the number of minibatches that can be incorporated into the model in the same amount of time.

Techniques

The logistic-regression-training algorithm can be expressed as a sequence of linear-algebra computations. Prior work showed how to evaluate a limited number of linear-algebra expressions on encrypted data when certain conditions apply. Our paper generalizes those results, providing a complete “toolkit” of homomorphic linear-algebra operations, enabling addition and multiplication of scalars, encrypted vectors, and encrypted matrices. The toolkit is generic and can be used with a variety of linear-algebra applications.

We combine the algorithms in the toolkit with well-established compiler techniques to reduce the circuit depth for logistic-regression model training. First, we use loop unrolling, which replaces the body of a loop with two or more copies of itself and adjusts the loop indices accordingly. Loop unrolling enables further optimizations that may not be possible with just a single copy of the loop body.

Loop unrolling algorithm

We also employ pipelining, which allows us to start one iteration of a loop while still working on the previous iteration. Finally, we remove data dependencies by duplicating some computations. This has the effect of increasing the circuit width (the number of operations that can be performed in parallel), while reducing the circuit depth. 

We note that despite the increased circuit width, computing this lower-depth circuit is faster than computing previous circuits even on a single core. If the server has many cores, we can further improve training time, since our wide circuit provides ample opportunity for parallelism.

Results

We compared our circuit for logistic-regression training to an earlier baseline circuit, using the MNIST data set, an image-processing data set consisting of handwritten digits. Both circuits were configured to incorporate six minibatches into the resulting model. In practice, both circuits would have to be applied multiple times to accommodate a realistic number of minibatches. 

Our circuit requires more encrypted inputs than the baseline; with the circuit parameters we chose, that corresponded to about an 80% increase in bandwidth requirements. Even though our circuit involves four times as many multiplications as the baseline, we can evaluate it more than six times as rapidly (13 seconds, compared to 80 seconds for the baseline) using a parallel implementation. Our homomorphically trained model had the same accuracy as a model trained on the plaintext data for the MNIST data set.

Training other model types

Creating efficient homomorphic circuits is a manual, time-consuming process. To make it easier for Amazon Web Services (AWS) and others to create circuits for other functions — such as training functions for other machine learning models — we created the Homomorphic Implementor’s Toolkit (HIT), a C++ library that provides high-level APIs and evaluators for homomorphic circuits. HIT is available today on GitHub

Related content

LU, Luxembourg
Are you a MS student interested in a 2026 internship in the field of machine learning, deep learning, generative AI, large language models and speech technology, robotics, computer vision, optimization, operations research, quantum computing, automated reasoning, or formal methods? If so, we want to hear from you! We are looking for a customer obsessed Data Scientist Intern who can innovate in a business environment, building and deploying machine learning models to drive step-change innovation and scale it to the EU/worldwide. If this describes you, come and join our Data Science teams at Amazon for an exciting internship opportunity. If you are insatiably curious and always want to learn more, then you’ve come to the right place. You can find more information about the Amazon Science community as well as our interview process via the links below; https://www.amazon.science/ https://amazon.jobs/content/en/career-programs/university/science Key job responsibilities As a Data Science Intern, you will have following key job responsibilities: • Work closely with scientists and engineers to architect and develop new algorithms to implement scientific solutions for Amazon problems. • Work on an interdisciplinary team on customer-obsessed research • Experience Amazon's customer-focused culture • Create and Deliver Machine Learning projects that can be quickly applied starting locally and scaled to EU/worldwide • Build and deploy Machine Learning models using large data-sets and cloud technology. • Create and share with audiences of varying levels technical papers and presentations • Define metrics and design algorithms to estimate customer satisfaction and engagement A day in the life At Amazon, you will grow into the high impact person you know you’re ready to be. Every day will be filled with developing new skills and achieving personal growth. How often can you say that your work changes the world? At Amazon, you’ll say it often. Join us and define tomorrow. Some more benefits of an Amazon Science internship include; • All of our internships offer a competitive stipend/salary • Interns are paired with an experienced manager and mentor(s) • Interns receive invitations to different events such as intern program initiatives or site events • Interns can build their professional and personal network with other Amazon Scientists • Interns can potentially publish work at top tier conferences each year About the team Applicants will be reviewed on a rolling basis and are assigned to teams aligned with their research interests and experience prior to interviews. Start dates are available throughout the year and durations can vary in length from 3-6 months for full time internships. This role may available across multiple locations in the EMEA region (Austria, France, Germany, Ireland, Israel, Italy, Luxembourg, Netherlands, Poland, Romania, Spain and the UK). Please note these are not remote internships.
US, CA, Sunnyvale
The Artificial General Intelligence (AGI) team is looking for a passionate, talented, and inventive Applied Scientist; to support the development and implementation of Generative AI (GenAI) algorithms and models for supervised fine-tuning, and advance the state of the art with Large Language Models (LLMs), As an Applied Scientist, you will play a critical role in supporting the development of GenAI technologies that can handle Amazon-scale use cases and have a significant impact on our customers' experiences. Key job responsibilities - Collaborate with cross-functional teams of engineers and scientists to identify and solve complex problems in GenAI - Design and execute experiments to evaluate the performance of different algorithms and models, and iterate quickly to improve results - Think big about the arc of development of GenAI over a multi-year horizon, and identify new opportunities to apply these technologies to solve real-world problems - Communicate results and insights to both technical and non-technical audiences, including through presentations and written reports
US, CA, San Francisco
The Artificial General Intelligence (AGI) team is looking for a passionate, talented, and inventive Member of Technical Staff with a strong deep learning background, to build industry-leading Generative Artificial Intelligence (GenAI) technology with Large Language Models (LLMs) and multimodal systems. Key job responsibilities As a Member of Technical Staff with the AGI team, you will lead the development of algorithms and modeling techniques, to advance the state of the art with LLMs. You will lead the foundational model development in an applied research role, including model training, dataset design, and pre- and post-training optimization. Your work will directly impact our customers in the form of products and services that make use of GenAI technology. You will leverage Amazon’s heterogeneous data sources and large-scale computing resources to accelerate advances in LLMs. About the team The AGI team has a mission to push the envelope in GenAI with LLMs and multimodal systems, in order to provide the best-possible experience for our customers.
US, CA, San Francisco
The Artificial General Intelligence (AGI) team is looking for a passionate, talented, and inventive Member of Technical Staff with a strong deep learning background, to build industry-leading Generative Artificial Intelligence (GenAI) technology with Large Language Models (LLMs) and multimodal systems. Key job responsibilities As a Member of Technical Staff with the AGI team, you will lead the development of algorithms and modeling techniques, to advance the state of the art with LLMs. You will lead the foundational model development in an applied research role, including model training, dataset design, and pre- and post-training optimization. Your work will directly impact our customers in the form of products and services that make use of GenAI technology. You will leverage Amazon’s heterogeneous data sources and large-scale computing resources to accelerate advances in LLMs. About the team The AGI team has a mission to push the envelope in GenAI with LLMs and multimodal systems, in order to provide the best-possible experience for our customers.
US, CA, San Francisco
The Artificial General Intelligence (AGI) team is looking for a passionate, talented, and inventive Member of Technical Staff with a strong deep learning background, to build industry-leading Generative Artificial Intelligence (GenAI) technology with Large Language Models (LLMs) and multimodal systems. Key job responsibilities As a Member of Technical Staff with the AGI team, you will lead the development of algorithms and modeling techniques, to advance the state of the art with LLMs. You will lead the foundational model development in an applied research role, including model training, dataset design, and pre- and post-training optimization. Your work will directly impact our customers in the form of products and services that make use of GenAI technology. You will leverage Amazon’s heterogeneous data sources and large-scale computing resources to accelerate advances in LLMs. About the team The AGI team has a mission to push the envelope in GenAI with LLMs and multimodal systems, in order to provide the best-possible experience for our customers.
US, CA, San Francisco
The Artificial General Intelligence (AGI) team is looking for a passionate, talented, and inventive Member of Technical Staff with a strong deep learning background, to build industry-leading Generative Artificial Intelligence (GenAI) technology with Large Language Models (LLMs) and multimodal systems. Key job responsibilities As a Member of Technical Staff with the AGI team, you will lead the development of algorithms and modeling techniques, to advance the state of the art with LLMs. You will lead the foundational model development in an applied research role, including model training, dataset design, and pre- and post-training optimization. Your work will directly impact our customers in the form of products and services that make use of GenAI technology. You will leverage Amazon’s heterogeneous data sources and large-scale computing resources to accelerate advances in LLMs. About the team The AGI team has a mission to push the envelope in GenAI with LLMs and multimodal systems, in order to provide the best-possible experience for our customers.
US, CA, San Francisco
The Artificial General Intelligence (AGI) team is looking for a passionate, talented, and inventive Member of Technical Staff with a strong deep learning background, to build industry-leading Generative Artificial Intelligence (GenAI) technology with Large Language Models (LLMs) and multimodal systems. Key job responsibilities As a Member of Technical Staff with the AGI team, you will lead the development of algorithms and modeling techniques, to advance the state of the art with LLMs. You will lead the foundational model development in an applied research role, including model training, dataset design, and pre- and post-training optimization. Your work will directly impact our customers in the form of products and services that make use of GenAI technology. You will leverage Amazon’s heterogeneous data sources and large-scale computing resources to accelerate advances in LLMs. About the team The AGI team has a mission to push the envelope in GenAI with LLMs and multimodal systems, in order to provide the best-possible experience for our customers.
US, CA, Sunnyvale
Prime Video is a first-stop entertainment destination offering customers a vast collection of premium programming in one app available across thousands of devices. Prime members can customize their viewing experience and find their favorite movies, series, documentaries, and live sports – including Amazon MGM Studios-produced series and movies; licensed fan favorites; and programming from Prime Video add-on subscriptions such as Apple TV+, Max, Crunchyroll and MGM+. All customers, regardless of whether they have a Prime membership or not, can rent or buy titles via the Prime Video Store, and can enjoy even more content for free with ads. Are you interested in shaping the future of entertainment? Prime Video's technology teams are creating best-in-class digital video experience. As a Prime Video technologist, you’ll have end-to-end ownership of the product, user experience, design, and technology required to deliver state-of-the-art experiences for our customers. You’ll get to work on projects that are fast-paced, challenging, and varied. You’ll also be able to experiment with new possibilities, take risks, and collaborate with remarkable people. We’ll look for you to bring your diverse perspectives, ideas, and skill-sets to make Prime Video even better for our customers. With global opportunities for talented technologists, you can decide where a career Prime Video Tech takes you! We are looking for a self-motivated, passionate and resourceful Sr. Applied Scientists with Recommender System or Search Ranking or Ads Ranking experience to bring diverse perspectives, ideas, and skill-sets to make Prime Video even better for our customers. You will spend your time as a hands-on machine learning practitioner and a research leader. You will play a key role on the team, building and guiding machine learning models from the ground up. At the end of the day, you will have the reward of seeing your contributions benefit millions of Amazon.com customers worldwide. Key job responsibilities - Develop AI solutions for various Prime Video Recommendation/Search systems using Deep learning, GenAI, Reinforcement Learning, and optimization methods; - Work closely with engineers and product managers to design, implement and launch AI solutions end-to-end; - Design and conduct offline and online (A/B) experiments to evaluate proposed solutions based on in-depth data analyses; - Effectively communicate technical and non-technical ideas with teammates and stakeholders; - Stay up-to-date with advancements and the latest modeling techniques in the field; - Publish your research findings in top conferences and journals. About the team Prime Video Recommendation/Search Science team owns science solution to power search experience on various devices, from sourcing, relevance, ranking, to name a few. We work closely with the engineering teams to launch our solutions in production.
US, WA, Seattle
We are open to hiring candidates to work out of one of the following locations: San Francisco, CA, USA | Santa Clara, CA, USA | Seattle, WA, USA | Sunnyvale, CA, USA Amazon is seeking an innovative and high-judgement Senior Applied Scientist to join the Privacy Engineering team in the Amazon Privacy Services org. We own products and programs that deliver technical innovation for ensuring compliance with high-impact, urgent regulation across Amazon services worldwide. The Senior Applied Scientist will contribute to the strategic direction for Amazon’s privacy practices while building/owning the compliance approach for individual regulations such as General Data Protection Regulation (GDPR), DMA, Quebec 25 etc. This will require helping to frame, and participating in, high judgment debates and decision making across senior business, technology, legal, and public policy leaders. A great candidate will have a unique combination of experience with innovative data governance technology, high judgement in system architecture decisions and ability to set detailed technical design from ambiguous compliance requirements. You will drive foundational, cross-service decisions, set technical requirements, oversee technical design, and have end to end accountability for delivering technical changes across dozens of different systems. You will have high engagement with WW senior leadership via quarterly reviews, annual organizational planning, and s-team goal updates. Key job responsibilities * Develop information retrieval benchmarks related to code analysis and invent algorithms to optimize identification of privacy requirements and controls. * Develop semantic and syntactic code analysis tools to assess privacy implementations within application code, and automatic code replacement tools to enhance privacy implementations. * Leverage Amazon’s heterogeneous data sources and large-scale computing resources to accelerate advances in generative artificial intelligence for privacy compliance. * Collaborate with other science and engineering teams as well as business stakeholders to maximize the velocity and impact of your contributions. A day in the life Amazon Privacy Services own products and programs that deliver technical innovation for ensuring Privacy Amazon services worldwide. We are hiring an innovative and high-judgement Senior Applied Scientist to develop AI solutions for builders across Amazon’s consumer and digital businesses including but not limited to Amazon.com, Amazon Ads, Amazon Go, Prime Video, Devices and more. Our ideal candidate is creative, has excellent problem-solving skills, a solid understanding of computer science fundamentals, deep learning and a customer-focused mindset. The Senior Scientist will serve as the resident expert on the development of AI agents for privacy. They build on their experiences to develop LLMs to develop AI implementations across privacy workflows. They will have responsibilities to mentor junior scientists and engineers develop AI skills. About the team Diverse Experiences Amazon Security values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying. Why Amazon Security? At Amazon, security is central to maintaining customer trust and delivering delightful customer experiences. Our organization is responsible for creating and maintaining a high bar for security across all of Amazon’s products and services. We offer talented security professionals the chance to accelerate their careers with opportunities to build experience in a wide variety of areas including cloud, devices, retail, entertainment, healthcare, operations, and physical stores Inclusive Team Culture In Amazon Security, it’s in our nature to learn and be curious. Ongoing DEI events and learning experiences inspire us to continue learning and to embrace our uniqueness. Addressing the toughest security challenges requires that we seek out and celebrate a diversity of ideas, perspectives, and voices. Training & Career Growth We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, training, and other career-advancing resources here to help you develop into a better-rounded professional. Work/Life Balance We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why flexible work hours and arrangements are part of our culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve.
US, WA, Seattle
Here at Amazon, we embrace our differences. We are committed to furthering our culture of diversity and inclusion of our teams within the organization. How do you get items to customers quickly, cost-effectively, and—most importantly—safely, in less than an hour? And how do you do it in a way that can scale? Our teams of hundreds of scientists, engineers, aerospace professionals, and futurists have been working hard to do just that! We are delivering to customers, and are excited for what’s to come. Check out more information about Prime Air on the About Amazon blog (https://www.aboutamazon.com/news/transportation/amazon-prime-air-delivery-drone-reveal-photos). If you are seeking an iterative environment where you can drive innovation, apply state-of-the-art technologies to solve real world delivery challenges, and provide benefits to customers, Prime Air is the place for you. Come work on the Amazon Prime Air Team! We are seeking a highly skilled Navigation Scientist to help develop advanced algorithms and software for our Prime Air delivery drone program. In this role, you will conduct comprehensive navigation analysis to support cross-functional decision-making, define system architecture and requirements, contribute to the development of flight algorithms, and actively identify innovative technological opportunities that will drive significant enhancements to meet our customers' evolving demands. Export Control License: This position may require a deemed export control license for compliance with applicable laws and regulations. Placement is contingent on Amazon’s ability to apply for and obtain an export control license on your behalf.