Privacy challenges in extreme gradient boosting

Scientists describe the use of privacy-preserving machine learning to address privacy challenges in XGBoost training and prediction.

(Editor’s note: This is the fourth in a series of articles Amazon Science is publishing related to the science behind products and services from companies in which the Amazon Alexa Fund has invested. The Alexa Fund completed a strategic investment in Inpher, Inc., earlier this year; the New York- and Switzerland-based company develops privacy-preserving machine learning and analytics solutions that help organizations unlock the value of sensitive, siloed data to enable secure collaboration across organizations. This article is co-authored by Dimitar Jetchev, the cofounder and chief technology officer of Inpher, and Joan Feigenbaum, an Amazon Scholar and the Grace Murray Hopper professor of computer science at Yale University.)

Joan Feigenbaum and Dimitar Jetchev
Dimitar Jetchev (left), the cofounder and chief technology officer of Inpher, and Joan Feigenbaum, the Grace Murray Hopper professor of computer science at Yale University, and an Amazon Scholar, describe the use of privacy-preserving machine learning to address privacy challenges in XGBoost training and prediction.
Credit: Glynis Condon

Machine learning (ML) is increasingly important in a wide range of applications, including market forecasting, service personalization, voice and facial recognition, autonomous driving, health diagnostics, education, and security analytics. Because ML touches so many aspects of our lives, it’s of vital concern that ML systems protect the privacy of the data used to train them, the confidential queries submitted to them, and the confidential predictions they return.

Privacy protection — and the protection of organizations’ intellectual property — motivates the study of privacy-preserving machine learning (PPML). In essence, the goal of PPML is to perform machine learning in a manner that does not reveal any unnecessary information about training-data sets, queries, and predictions.

Suppose, for example, that schools supplied encrypted student records to educational researchers who used them to train ML models. Suppose further that students, parents, teachers, and other researchers could feed encrypted queries to the models and receive encrypted predictions in return. By taking advantage of PPML techniques in this manner, all of the participants could mine the knowledge contained in educational-record databases without compromising the privacy of the data subjects or the data users.

PPML is a very active area, with an eponymous annual workshop and many strong papers in general-ML and security venues. Techniques have been developed for privacy-preserving training and prediction on a wide range of ML model types, e.g., neural nets, decision trees, and logistic-regression formulae.

In the sections below, we describe PPML methods for training and prediction in extreme gradient boosting.

Training

Gradient boosting is an ML method for regression and classification problems that yields a set of prediction trees, typically classification and regression trees (CARTs), which together constitute a model. A CART is a generalization of a binary decision tree; while a binary tree produces a binary output, classifying each input query as a “yes” or “no,” a CART assigns each input query a (real) numerical score.

Interpretation of scores is application dependent. If v is a query, then each CART in the model assigns a score to v, and the final prediction of the model on input v is the sum of these scores. In some applications, the softmax function may be used instead of a plain sum to produce a probability distribution over the predicted output classes.
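To make the scoring rule concrete, here is a minimal Python sketch. The tree shapes, feature names ("age", "visits"), thresholds, and leaf scores are invented for illustration and do not come from any trained model; the sketch only shows that each CART maps a query to a leaf score and that the model's prediction is the sum of those scores.

```python
import math

# A CART node is either a leaf {"score": float} or an internal node
# {"feature": str, "threshold": float, "left": node, "right": node}.
# All trees, features, and scores below are hypothetical.
TREE_1 = {
    "feature": "age", "threshold": 40.0,
    "left": {"score": -0.4},
    "right": {
        "feature": "visits", "threshold": 3.0,
        "left": {"score": 0.1},
        "right": {"score": 0.7},
    },
}
TREE_2 = {
    "feature": "visits", "threshold": 5.0,
    "left": {"score": -0.2},
    "right": {"score": 0.5},
}


def cart_score(node, v):
    """Walk from the root to a leaf and return that leaf's score for query v."""
    while "score" not in node:
        branch = "left" if v[node["feature"]] < node["threshold"] else "right"
        node = node[branch]
    return node["score"]


def predict(trees, v):
    """Final prediction: the sum of the scores assigned by each CART."""
    return sum(cart_score(tree, v) for tree in trees)


def softmax(scores):
    """Turn a list of per-class scores into a probability distribution."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


query = {"age": 52, "visits": 4}
total_score = predict([TREE_1, TREE_2], query)  # 0.7 from TREE_1 plus -0.2 from TREE_2
```

In a multi-class setting, one would typically keep one summed score per class and pass that list through softmax to obtain class probabilities.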

Extreme gradient boosting (XGBoost) is an optimized, distributed, gradient-boosting framework that is efficient, portable, and flexible. In this section, we consider confidentiality of training data in the creation of XGBoost models for disease prediction — specifically, for prediction of multiple sclerosis (MS).

Early diagnosis and treatment of MS is crucial to prevent degenerative progression of the disease and patient disabilities. A recent paper proposes an early-diagnosis method that applies XGBoost to electronic health records and uses three types of features: diagnostic, epidemiologic, and laboratory.

The presence of another neurological disease (e.g., acute disseminated encephalomyelitis (ADEM)) is an example of a diagnostic feature. Epidemiologic features include age, gender, and total number of visits to a hospital. Two more features that are discovered by lab tests are used in the model and referred to as laboratory features: hyperlipidemia (abnormally elevated levels of any or all lipids) and hyperglycemia (elevated blood sugar). The proposed XGBoost model significantly outperforms other ML techniques (including naïve Bayes methods, k-nearest neighbor, and support vector machines) that have been proposed for early diagnosis of MS.

Collecting a sufficient number of high-quality data samples and features to train such a diagnostic model is quite challenging, because the data reside in different private locations. The training data can be split in different ways among these locations: horizontally split, vertically split, or both.

If the private data sources contain samples with the same feature set (as would be the case if, say, the same features are extracted from health records residing in different hospitals), the dataset is said to be horizontally split. The other extreme, vertically split data, occurs when a private data source contributes a new feature for all of the training samples. For example, a health-insurance company could supply reimbursement receipts for past medication (the new feature) to complement the features in clinical health records. In these scenarios, aggregating the training data on a central server may violate privacy regulations such as the GDPR.
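The difference between the two kinds of split can be illustrated with a tiny, made-up table. The records, field names, and helper functions below are hypothetical; the sketch only shows which party ends up holding which rows and columns.

```python
# Hypothetical patient records: one row per patient, one column per feature.
RECORDS = [
    {"patient": 1, "age": 34, "visits": 2, "reimbursements": 0},
    {"patient": 2, "age": 51, "visits": 7, "reimbursements": 3},
    {"patient": 3, "age": 46, "visits": 4, "reimbursements": 1},
    {"patient": 4, "age": 29, "visits": 1, "reimbursements": 0},
]


def horizontal_split(rows, cut):
    """Each party holds different samples described by the same features."""
    return rows[:cut], rows[cut:]


def vertical_split(rows, key, features_b):
    """Each party holds different features of the same samples, linked by key."""
    part_a = [{k: v for k, v in row.items() if k not in features_b} for row in rows]
    part_b = [{key: row[key], **{f: row[f] for f in features_b}} for row in rows]
    return part_a, part_b


# Two hospitals, same feature set, disjoint patients.
hospital_1, hospital_2 = horizontal_split(RECORDS, 2)

# A clinic and an insurer, same patients, disjoint features.
clinic, insurer = vertical_split(RECORDS, "patient", ["reimbursements"])
```

Recombining either split requires bringing the parts together in one place, which is exactly the step that privacy-preserving training tries to avoid.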

The figure below illustrates one possible CART in the trained model. The weights at the leaves might indicate probabilities of MS resulting from the various paths from root to leaf.

Classification and regression trees (CART)

Research on privacy-preserving training of XGBoost models for prediction of MS uses two distinct techniques: secure multiparty computation (SMPC) and privacy-preserving federated learning (PPFL). We briefly describe both of them here.

An SMPC protocol enables several parties, each of whom holds a private input, to jointly evaluate a publicly known function on these inputs without revealing anything about the inputs except what is implied by the output of the function. Private inputs are secret shared among the parties, e.g., via additive secret sharing, in which each owner of a private input v generates random “shares” that add up to v.

For instance, suppose that Alice’s private input is v = 5. She can secret share it among herself, Bob, and Charlie by generating two random integers S_Bob = 125621 and S_Charlie = 56872, sending each of them his share, and keeping S_Alice = v - S_Bob - S_Charlie = -182488. Unless an adversary controls all three parties, he cannot learn anything about Alice’s private input v.
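The example can be sketched in a few lines of Python. The article's example uses ordinary integers; the sketch below instead works modulo a fixed integer (here 2^64, an arbitrary choice made for illustration), as practical additive schemes typically do so that any incomplete set of shares looks uniformly random. The function names are hypothetical.

```python
import secrets

MODULUS = 2**64  # assumed ring size, chosen only for this sketch


def share(v, n_parties):
    """Split integer v into n_parties additive shares that sum to v mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((v - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares):
    """Add all shares and map the result back to a signed integer."""
    total = sum(shares) % MODULUS
    return total - MODULUS if total >= MODULUS // 2 else total


# The shares from the text: Alice keeps -182488, Bob gets 125621, Charlie gets 56872.
article_shares = [-182488, 125621, 56872]
recovered = reconstruct(article_shares)  # 5

# Fresh random shares of the same secret.
fresh_shares = share(5, 3)
```

Each call to `share` produces different shares, but any complete set of them reconstructs the same value.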
  
In an execution of an SMPC protocol, the inputs to each elementary operation (addition or multiplication) are secret shared, and the output of the operation is a set of secret shares of the result. We say that a secret-shared value y (which may be the final output of the computation) is revealed to party P if all the parties send their shares to P, thus enabling P to reconstruct y. Further discussion of SMPC and its relevance to cloud computing can be found here and in Inpher’s Secret Computing Explainer Series.
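As a rough illustration of elementary operations on secret-shared values, the Python sketch below adds and multiplies additively shared integers. Addition needs no communication: each party adds its own shares. For multiplication, the sketch uses Beaver triples handed out by a trusted dealer, which is one standard technique; the source does not say which multiplication method XORBoost or Inpher's products use, so treat this choice, the modulus, and all function names as assumptions.

```python
import secrets

MODULUS = 2**64  # assumed ring size for this sketch


def share(v, n_parties):
    """Additively share v among n_parties, modulo MODULUS."""
    parts = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    parts.append((v - sum(parts)) % MODULUS)
    return parts


def reveal(shares):
    """All parties send their shares to one party, who adds them up."""
    return sum(shares) % MODULUS


def add(x_shares, y_shares):
    """Each party locally adds its share of x to its share of y."""
    return [(x + y) % MODULUS for x, y in zip(x_shares, y_shares)]


def multiply(x_shares, y_shares):
    """Multiply shared x and y using a Beaver triple (a, b, c = a * b)."""
    n = len(x_shares)
    # Assumption: a trusted dealer generates and shares the triple.
    a = secrets.randbelow(MODULUS)
    b = secrets.randbelow(MODULUS)
    a_sh, b_sh, c_sh = share(a, n), share(b, n), share(a * b % MODULUS, n)
    # The masked differences d = x - a and e = y - b are revealed;
    # they hide x and y because a and b are uniformly random.
    d = reveal([(x - ai) % MODULUS for x, ai in zip(x_shares, a_sh)])
    e = reveal([(y - bi) % MODULUS for y, bi in zip(y_shares, b_sh)])
    # x * y = c + d * b + e * a + d * e, computed share by share.
    z_sh = [(ci + d * bi + e * ai) % MODULUS for ai, bi, ci in zip(a_sh, b_sh, c_sh)]
    z_sh[0] = (z_sh[0] + d * e) % MODULUS  # the public term is added by one party only
    return z_sh


x_sh, y_sh = share(6, 3), share(7, 3)
sum_value = reveal(add(x_sh, y_sh))
product_value = reveal(multiply(x_sh, y_sh))
```

Both outputs are again sets of shares; nothing about the result is learned until the parties choose to reveal it.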

A recent paper by researchers at Inpher proposes an SMPC protocol, called XORBoost, for privacy-preserving training of XGBoost models. It improves on the state of the art by several orders of magnitude and has the following properties:

  • The CARTs computed by the protocol are secret shared among the training-data owners and revealed only to a designated party, namely the data analyst.
  • The training algorithm not only protects the input data but also reveals no information about the paths in the CARTs taken by any of the training samples.
  • The protocol supports both numerical and categorical features, thus providing enough flexibility and generality to support the above model.

XORBoost works well for training datasets of reasonable size — hundreds of thousands of samples and hundreds of features. However, many real-world applications require training on more than a million samples. To achieve that type of scale, one can use federated learning (FL), which is an ML technique used to train a model on data samples held locally by multiple, decentralized edge devices without requiring the devices to exchange the samples.

FL differs from XORBoost mainly in that FL does not perform the entire training exercise on secret-shared values. Rather, each device trains a local model on its local data samples and sends its local model to one or more servers for aggregation. The aggregation protocol typically uses simple operations such as sum, average, and oblivious comparisons but no complex optimization.

If the server receives the plaintext local-model updates from all of the devices, it could, in principle, recover the local training-data samples using model-inversion attacks. SMPC and other privacy-preserving computational techniques can be applied to aggregate local models without revealing them to the server. See the diagram below for the overall architecture. 
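One simple way to aggregate local updates without showing any single update to the server is pairwise masking, sketched below in Python. This is a generic illustration, not the architecture described in the article: it assumes that updates are already quantized to integers, that every pair of devices can agree on a shared random mask, and that no device drops out. All names are hypothetical.

```python
import secrets

MODULUS = 2**32  # assumed modulus for quantized model updates


def masked_updates(updates):
    """Mask each device's integer update vector with pairwise random values.

    For every pair (i, j) with i < j, device i adds a fresh mask and device j
    subtracts the same mask, so the masks cancel in the overall sum.
    """
    n, dim = len(updates), len(updates[0])
    masked = [list(u) for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(dim):
                mask = secrets.randbelow(MODULUS)
                masked[i][k] = (masked[i][k] + mask) % MODULUS
                masked[j][k] = (masked[j][k] - mask) % MODULUS
    return masked


def aggregate(masked):
    """Server side: sum the masked vectors coordinate by coordinate."""
    return [sum(column) % MODULUS for column in zip(*masked)]


local_updates = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
what_server_sees = masked_updates(local_updates)
aggregated = aggregate(what_server_sees)  # [111, 222, 333]
```

The server recovers only the sum; each individual masked vector is offset by random values it does not know.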

XORBoost architecture

Prediction

PPXGBoost is a privacy-preserving version of XGBoost prediction. More precisely, it is a system that supports encrypted queries to encrypted XGBoost models. PPXGBoost is designed for applications that start by training a plaintext model Ω on a suitable training-data set and then create, for each user U, a personalized, encrypted version ΩU of the model to which U will submit encrypted queries and from which she will receive encrypted results. 

PPXGBoost system architecture

The PPXGBoost system architecture is shown in the figure above. On the client side, there is an app with which a user encrypts queries and decrypts results. On the server side, there is a module called Proxy that runs in a trusted environment and is responsible for setup (i.e., creating, for each authorized user, a personalized, encrypted model and a set of cryptographic keys) and an ML module that executes the encrypted queries. PPXGBoost uses two specialized types of encryption schemes (symmetric-key, order-preserving encryption and public-key, additive, homomorphic encryption) to encrypt models and evaluate encrypted queries. Each user is issued keys for both schemes during the setup phase.
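The roles of the two encryption schemes can be illustrated with a toy Python sketch. It is not the PPXGBoost construction: order-preserving encryption is imitated by a secret increasing lookup table, additive homomorphic encryption by a textbook Paillier scheme with tiny primes, trees are reduced to single splits, leaf scores are assumed to be small non-negative integers, and feature names are left in the clear. Every name and parameter below is an assumption made for illustration, and none of it is secure.

```python
import math
import random
import secrets


# Toy order-preserving encryption: a secret, strictly increasing table.
def ope_keygen(key, domain_size):
    rng = random.Random(key)  # not cryptographically secure; illustration only
    table, current = [], 0
    for _ in range(domain_size):
        current += rng.randint(1, 1000)
        table.append(current)
    return table


def ope_encrypt(table, x):
    return table[x]


# Toy Paillier encryption (additively homomorphic) with tiny primes.
P, Q = 1009, 1013
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # modular inverse; requires Python 3.8+


def he_encrypt(m):
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (1 + m * N) % N2 * pow(r, N, N2) % N2


def he_add(c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % N2


def he_decrypt(c):
    return (pow(c, LAM, N2) - 1) // N * MU % N


def encrypt_stump(table, feature, threshold, left_score, right_score):
    """Setup step: thresholds use the order-preserving scheme, leaf scores use Paillier."""
    return {
        "feature": feature,
        "threshold": ope_encrypt(table, threshold),
        "left": he_encrypt(left_score),
        "right": he_encrypt(right_score),
    }


def server_predict(enc_model, enc_query):
    """ML module: compare ciphertexts, then add encrypted leaf scores."""
    total = he_encrypt(0)
    for stump in enc_model:
        goes_left = enc_query[stump["feature"]] < stump["threshold"]
        total = he_add(total, stump["left"] if goes_left else stump["right"])
    return total


table = ope_keygen(key=2021, domain_size=128)
enc_model = [
    encrypt_stump(table, "age", 40, 1, 7),
    encrypt_stump(table, "visits", 5, 2, 5),
]
enc_query = {"age": ope_encrypt(table, 52), "visits": ope_encrypt(table, 4)}
result = he_decrypt(server_predict(enc_model, enc_query))  # 7 + 2 = 9
```

The server compares and adds ciphertexts without holding either decryption key; only the client, which holds both keys, can encrypt a query and read the summed score.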

Note that PPXGBoost is a natural choice for researchers, clinicians, and patients who wish to make disease predictions repeatedly as the patients’ circumstances change. Potentially relevant changes include exposure to new environmental factors, experimental treatment for another condition, or simply aging. An individual patient can create a personalized, encrypted version of a disease-prediction model and store it on a server owned by the medical center at which he is receiving treatment. Patient and physician can then use it to monitor, in a privacy-preserving manner, changes in the patient’s likelihood of contracting the disease.

Conclusion

We have described the use of PPML to address privacy challenges in XGBoost training and prediction. In a future post, we will elaborate on how privacy-preserving federated learning enables researchers to train more-complex ML models on millions of samples stored on hundreds of thousands of devices.


Work with us

See more jobs
US, CA, Santa Clara
Job summaryMachine learning (ML) has been strategic to Amazon from the early years. We are pioneers in areas such as recommendation engines, product search, eCommerce fraud detection, and large-scale optimization of fulfillment center operations.The Amazon ML Solutions Lab team helps AWS customers accelerate the use of machine learning to solve business and operational challenges and promote innovation in their organization. We are looking for a passionate, talented, and inventive Applied Scientist with a strong machine learning background to help develop solutions by pushing the envelope in Time Series, Automatic Speech Recognition (ASR), Natural Language Understanding (NLU), Machine Learning (ML) and Computer Vision (CV).Inclusive Team CultureHere at AWS, we embrace our differences. We are committed to furthering our culture of inclusion. We have ten employee-led affinity groups, reaching 40,000 employees in over 190 chapters globally. We have innovative benefit offerings, and host annual and ongoing learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences.Work/Life BalanceOur team puts a high value on work-life balance. It isn’t about how many hours you spend at home or at work; it’s about the flow you establish that brings energy to both parts of your life. We believe striking the right balance between your personal and professional life is critical to life-long happiness and fulfillment. We offer flexibility in working hours and encourage you to find your own balance between your work and personal lives.Mentorship & Career GrowthOur team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews. 
We care about your career growth and strive to assign projects based on what will help each team member develop into a better-rounded engineer and enable them to take on more complex tasks in the future.As a ML Solutions Lab Applied Scientist, you are proficient in designing and developing advanced ML models to solve diverse challenges and opportunities. You will be working with terabytes of text, images, and other types of data and develop novel models to solve real-world problems. You'll design and run experiments, research new algorithms, and find new ways of optimizing risk, profitability, and customer experience. You will apply classical ML algorithms and cutting-edge deep learning (DL) and reinforcement learning approaches to areas such as drug discovery, customer segmentation, fraud prevention, capacity planning, predictive maintenance, pricing optimization, call center analytics, player pose estimation, event detection, and virtual assistant among others.The primary responsibilities of this role are to:· Design, develop, and evaluate innovative ML/DL models to solve diverse challenges and opportunities across industries· Interact with customer directly to understand their business problems, and help them with defining and implementing scalable ML/DL solutions to solve them· Work closely with account teams, research scientist teams, and product engineering teams to drive model implementations and new algorithmsThis position requires travel of up to 25%.
DE, BE, Berlin
Job summaryWould you like to join the team that protects the global AWS platform from fraud? Do you enjoy thinking like a fraudster and using your technical skills to help detect & mitigate AWS accounts from being compromised? If so, AWS Fraud Prevention has an exciting opportunity for you.AWS has the most services and more features within those services, than any other cloud provider–from infrastructure technologies like compute, storage, and databases–to emerging technologies, such as machine learning and artificial intelligence, data lakes and analytics, and Internet of Things. AWS Platform is the glue that holds the AWS ecosystem together. Whether its Identity features such as access management and sign on, cryptography, console, builder & developer tools, and even projects like automating all of our contractual billing systems, AWS Platform is always innovating with the customer in mind. The AWS Platform team sustains over 750 million transactions per second.The AWS Fraud Prevention Compromise vertical is responsible for detecting & mitigating AWS account compromise. You’ll be part of a team of Data Scientists, Investigations Analysts, and Technical & non-Technical Program Managers. The team’s goal is to identify and neutralize fraudsters from compromising AWS customers’ accounts.As a Data Scientist, you will work directly with Business Analysts and Software Development Engineers to monitor the flavor/ trend of compromise on AWS worldwide and design appropriate solutions to respond in a collaborative environment. 
There are no walls, and success is determined by your ability to dive deep, and understand the subtle demands new and complex services will place upon systems and teams.As a Data Scientist your responsibilities will include: - Apply state-of-the-art Machine Learning methods to large amounts of data from different sources to build and productionalize fraud prevention, detection and mitigation solutions· Deep dive on the problems using SQL and scripting languages like Python/R to drive short term and long term solutions leveraging Statistical Analysis· Analyze data (past customer behavior, sales inputs, and other sources) to figure out trends, create compromise prevention and mitigation solutions and output reports with clear recommendations· Collaborate closely with the development team to recommend and build innovations based on Data Science· Manage your own process: identify and execute on high impact projects, triage external requests, and make sure you bring projects to conclusion in time for the results to be useful· Providing on-call product support approximately once every 3 monthsOn Call Responsibility. This position involves on-call responsibilities to support emergent customer impacting events. Our on-call fraud escalation support occurs approximately once every 12 weeks.Learn and Be Curious. We have a formal mentor search application that lets you find a mentor that works best for you based on location, job family, job level etc. Your manager can also help you find a mentor or two, because two is better than one. In addition to formal mentors, we work and train together so that we are always learning from one another, and we celebrate and support the career progression of our team members.Inclusion and Diversity. Our team is diverse! We drive towards an inclusive culture and work environment. We are intentional about attracting, developing, and retaining amazing talent from diverse backgrounds. 
Team members are active in Amazon’s 10+ affinity groups, sometimes known as employee resource groups, which bring employees together across businesses and locations around the world. These range from groups such as the Black Employee Network, Latinos at Amazon, Indigenous at Amazon, Families at Amazon, Amazon Women and Engineering, LGBTQ+, Warriors at Amazon (Military), Amazon People With Disabilities, and more.Learn more about Amazon on our Day 1 Blog: https://blog.aboutamazon.com
US, WA, Seattle
Job summaryWe are a passionate team working to build a best-in-class healthcare product designed to make high-quality healthcare easy to access.We are looking for a truly innovative and technically strong research scientist with a background in operations research, machine learning, and statistical modeling/analysis.About You:• Problem Solver: Ability to utilize exceptional modeling and problem-solving skills to work through different challenges in ambiguous situations.• Doer: You’ve successfully delivered end-to-end operations research projects, working through conflicting viewpoints, model intractability, and data limitations.• Detail Oriented: You have an enviable level of attention to details, and catch things that others miss.• Communicator: Ability to communicate analytical results to senior leaders, and peers.• Influencer: Innovative scientist with the ability to identify opportunities and develop novel modeling approaches in a fast-paced and ever-changing environment, and gain support with data and storytelling.Key job responsibilitiesAs a Senior Research Scientist, you will:• apply your operations research skills to solve complex scheduling, resource allocation, task assignment, and call routing problems.• develop code in support of our in-house operations management product designed to optimally match capacity with various forms of customer demand.• work closely with stakeholders and translate data-driven findings into actionable insights.
DE, BW, Tuebingen
Job summaryIf you get excited by the prospect of solving hard problems using Computer Vision and Machine Learning, enjoy working in a fast-paced environment with thought leaders in the CV space, and are passionate about launching algorithms for maximum customer impact, then we have the perfect role for you!We are looking for an expert in computer vision with a focus on data-driven image synthesis using deep learning techniques such as GANs and VAEs to help Amazon transform pixels into personalized fashion images. Our team is developing cutting-edge technology to personalize and transform photos, partnering with many different teams across Amazon to apply a mix of workflows, image generation, computer vision, and machine learning in the fashion space.As an Applied Scientist, you will work in a team with other scientists and engineers working on products and prototypes in the field of image synthesis and photo-realistic appearance of people and clothing to create scalable solutions to customer problems. You will play a critical role in ideation for the team and run live experiments, with opportunities to publish your work. We are building the next generation of fashion imagery, and we hope you'll join us!This role is located in Tuebingen. Germany, but we are open to hiring also in Berlin.
CA, BC, Vancouver
Job summaryHow can we create a rich, data-driven shopping experience on Amazon? How do we build data models that helps us innovate different ways to enhance customer experience? How do we combine the world's greatest online shopping dataset with Amazon's computing power to create models that deeply understand our customers?Recommendations at Amazon is a way to help customers discover products. Our team's stated mission is to "grow each customer’s relationship with Amazon by leveraging our deep understanding of them to provide relevant and timely product, program, and content recommendations". We strive to better understand how customers shop on Amazon (and elsewhere) and build recommendations models to streamline customers' shopping experience by showing the right products at the right time. Understanding the complexities of customers' shopping needs and helping them explore the depth and breadth of Amazon's catalog is a challenge we take on every day.Using Amazon’s large-scale computing resources you will ask research questions about customer behavior, build models to generate recommendations, and run these models directly on the retail website. You will participate in the Amazon ML community and mentor Applied Scientists and software development engineers with a strong interest in and knowledge of ML. Your work will directly benefit customers and the retail business and you will measure the impact using scientific tools. We are looking for passionate, hard-working, and talented Applied scientist who have experience building mission critical, high volume applications that customers love. You will have an enormous opportunity to make a large impact on the design, architecture, and implementation of cutting edge products used every day, by people you know.
US, CA, Sunnyvale
Job summaryAmazon is looking for a creative Applied Scientist to tackle some of the most interesting problems on the leading edge of Machine Learning (ML), Natural Language Processing (NLP), and Information Retrieval (IR) with our Alexa Artificial Intelligence (AI) team. Alexa AI is part of our ongoing efforts focused on reinventing information extraction and retrieval for a voice-forward, multi-modal future.A successful candidate will develop novel ML/NLP/IR/Deep Learning technologies to make Alexa smarter. They will have a true passion for working in a collaborative, cross-functional environment that encourages thinking about optimized solutions to unique problems that do not have yet a known science solution.If you are looking for an opportunity to solve deep technical problems and build innovative solutions in a fast-paced environment working within a smart and passionate team, this might be the role for you. You will develop and implement novel algorithms and modeling techniques to leverage and advance the state-of-the-art in technology areas that are found at the intersection of ML, NLP, IR, and Deep Learning. Your work will directly impact Amazon products and services that make use of speech and language technology. You will gain hands on experience with Alexa and large-scale computing resources.In this role you will:· Work collaboratively with scientists and developers to design and implement automated, scalable NLP/ML/IR models for accessing and presenting information· Drive scalable solutions from the business, to prototyping, production testing and through engineering directly to production· Drive best practices on the team, deal with ambiguity and competing objectives, and mentor and guide junior members to achieve their career growth potential.
US, WA, Seattle
Job summaryAre you passionate about conducting research to drive real behavioral change and inform front line managers and leaders in making more effective decisions? Would you love to see your research in practice, impacting Amazonians globally and improving the employee experience? If so, you should consider joining our Research team on Amazon Connections!Amazon Connections is an innovative program that gives Amazonians a confidential and effective way to give feedback on the workplace to help shape the future of the company and improve the employee experience. By asking employees quick questions every day, Connections leverages real time information to learn more about their experiences and introduce positive changes with internal business partners around the world. We maximize the value of the employee voice.In this role, you will own the research development strategy to evaluate, diagnose, understand, and surface drivers and moderators for key research streams. These include (but are not limited to) attrition, engagement, productivity, diversity, and Amazon culture. You will deep dive and analyze what research should be conducted and to what end, develop hypotheses that can be tested, and support a larger research program to deliver deeper insights that we can surface to leaders on our platform (short term and long).You will use both quantitative and qualitative data as well as conduct research studies to test your hypotheses. You will use a variety of statistical approaches to model and understand behavior. You will develop algorithms and thresholds to surface personalized results to managers/leaders, and partner with machine learning scientists to build these statistical models into production that scales. 
You will work with an interdisciplinary team of psychologists, economists, ML scientists, UX researchers, engineers, and product managers to inform and build product features to surface deeper people and business insights for our leaders.Key job responsibilitiesWhat you'll do:• Execute a scalable global content development and research strategy to drive more effective decisions and improve the employee experience across all of Amazon• Conduct psychometrics analyses to evaluate integrity and practical application of content• Identify research streams to evaluate how to mitigate or remove sources of measurement error• Partner closely and drive effective collaborations across multi-disciplinary research and product teams• Manage full life cycle of large-scale research programs (Develop strategy, gather requirements, manage and execute)
US, MA, North Reading
Job summaryAre you inspired by invention? Is problem solving through teamwork in your DNA? Do you like the idea of seeing how your work impacts the bigger picture? Answer yes to these questions and you'll fit right in here at Amazon Robotics. We are a smart team of doers who work passionately to apply cutting edge advances in robotics and software to solve real-world challenges that will transform our customers’ experiences in ways we can't even image yet. We invent new improvements every day.Amazon Robotics, a wholly owned subsidiary of Amazon.com, empowers a smarter, faster, more consistent customer experience through automation. Amazon Robotics automates fulfilment center operations using various methods of robotic technology including autonomous mobile robots, sophisticated control software, language perception, power management, computer vision, depth sensing, machine learning, object recognition, and semantic understanding of commands. Amazon Robotics has a dedicated focus on research and development to continuously explore new opportunities to extend its product lines into new areas.This role is a 3-month internship to join AR full-time (40 hours/week) from May 2021 to August 2021. Amazon Robotics internship opportunities will be based in the Greater Boston Area, in our two state-of-the-art facilities in Westborough and North Reading, MA. Both campuses provide a unique opportunity for interns to have direct access to robotics testing labs and manufacturing facilities.Job OverviewAmazon Robotics is seeking a talented and motivated Engineering student to join the Advanced Robotics team for a summer internship. The candidate will have the opportunity work with senior engineering staff to conduct research, develop, and test software and hardware for next-generation robotic manipulation solutions used in Amazon.com fulfillment operations. 
Ideal candidates are enrolled in an undergraduate or graduate program related to software engineering or robotics, and have strong mechanical and or electrical aptitude, embedded programming, enjoys problem solving and can potentially handle multiple parallel tasks.The Advanced Robotics Intern will be responsible for:· Work as part of an interdisciplinary team to design and analyze mechanisms, modules or systems· Identifying creative solutions for challenging problems in robotics and computer vision· Developing software solutions to test hypotheses and demonstrate new functionality· Building models, prototyping concepts, conducting tests, collecting data to quantify performance· Creating milestones and deliverables and tracking status with team· Developing design documentation and leading reviews with other engineers or interns· Writing code and unit tests and integrating code with other software and hardware components· Utilizing Amazon Robotics and Amazon engineering tools, processes and technologies