Privacy challenges in extreme gradient boosting

Scientists describe the use of privacy-preserving machine learning to address privacy challenges in XGBoost training and prediction.

(Editor’s note: This is the fourth in a series of articles Amazon Science is publishing related to the science behind products and services from companies in which the Amazon Alexa Fund has invested. The Alexa Fund completed a strategic investment in Inpher, Inc., earlier this year; the New York and Swiss-based company develops privacy-preserving machine learning and analytics solutions that help organizations unlock the value of sensitive, siloed data to enable secure collaboration across organizations. This article is co-authored by Dimitar Jetchev, the cofounder and chief technology officer of Inpher, and Joan Feigenbaum, an Amazon Scholar and the Grace Murray Hopper professor of computer science at Yale University.)

Joan Feigenbaum and Dimitar Jetchev
Dimitar Jetchev (left), the cofounder and chief technology officer of Inpher, and Joan Feigenbaum, the Grace Murray Hopper professor of computer science at Yale University, and an Amazon Scholar, describe the use of privacy-preserving machine learning to address privacy challenges in XGBoost training and prediction.
Credit: Glynis Condon

Machine learning (ML) is increasingly important in a wide range of applications, including market forecasting, service personalization, voice and facial recognition, autonomous driving, health diagnostics, education, and security analytics. Because ML touches so many aspects of our lives, it’s of vital concern that ML systems protect the privacy of the data used to train them, the confidential queries submitted to them, and the confidential predictions they return.

Privacy protection — and the protection of organizations’ intellectual property — motivates the study of privacy-preserving machine learning (PPML). In essence, the goal of PPML is to perform machine learning in a manner that does not reveal any unnecessary information about training-data sets, queries, and predictions.

Suppose, for example, that schools supplied encrypted student records to educational researchers who used them to train ML models. Suppose further that students, parents, teachers, and other researchers could feed encrypted queries to the models and receive encrypted predictions in return. By taking advantage of PPML techniques in this manner, all of the participants could mine the knowledge contained in educational-record databases without compromising the privacy of the data subjects or the data users.

PPML is a very active area, with an eponymous annual workshop and many strong papers in general-ML and security venues. Techniques have been developed for privacy-preserving training and prediction on a wide range of ML model types, e.g., neural nets, decision trees, and logistic-regression formulae.

In the sections below, we describe PPML methods for training and prediction in extreme gradient boosting.

Training

Gradient boosting is an ML method for regression and classification problems that yields a set of prediction trees, typically classification and regression trees (CARTs), which together constitute a model. A CART is a generalization of a binary decision tree; while a binary tree produces a binary output, classifying each input query as a “yes” or “no,” a CART assigns each input query a (real) numerical score.

Interpretation of scores is application dependent. If v is a query, then each CART in the model assigns a score to v, and the final prediction of the model on input v is the sum of these scores. In some applications, the softmax function may be used instead of sum to produce a probability distribution over the predicted output classes.

Extreme gradient boosting (XGBoost) is an optimized, distributed, gradient-boosting framework that is efficient, portable, and flexible. In this section, we consider confidentiality of training data in the creation of XGBoost models for disease prediction — specifically, for prediction of multiple sclerosis (MS).

Early diagnosis and treatment of MS is crucial to prevent degenerative progression of the disease and patient disabilities. A recent paper proposes an early-diagnosis method that applies XGBoost to electronic health records and uses three types of features: diagnostic, epidemiologic, and laboratory.

How cryptographic computing can accelerate the adoption of cloud computing

In a previous Amazon Science article, Joan Feigenbaum reviewed secure multiparty computation and privacy-preserving machine learning – two cryptographic techniques employed to address cloud-computing privacy concerns and accelerate enterprise cloud adoption.

The presence of another neurological disease (e.g., acute disseminated encephalomyelitis (ADEM)) is an example of a diagnostic feature. Epidemiologic features include age, gender, and total number of visits to a hospital. Two more features that are discovered by lab tests are used in the model and referred to as laboratory features: hyperlipidemia (abnormally elevated levels of any or all lipids) and hyperglycemia (elevated blood sugar). The proposed XGBoost model significantly outperforms other ML techniques (including naïve Bayes methods, k-nearest neighbor, and support vector machines) that have been proposed for early diagnosis of MS.

Collecting a sufficient number of high-quality data samples and features to train such a diagnostic model is quite challenging, because the data reside in different private locations. The training data can be split in different ways among these locations: horizontally split, vertically split, or both.

If the private data sources contain samples with the same feature set (as would be the case if, say, the same features are extracted from health records residing in different hospitals), the dataset is said to be horizontally split. The other extreme — vertically split data — occurs when a private data source contributes a new feature for all of the training samples. For example, a health-insurance company could supply reimbursement receipts for past medication (the new feature) to complement the features in clinical health records. In these scenarios, aggregating the training data on a central server violates GDPR regulations.

The figure below illustrates one possible CART in the trained model. The weights at the leaves might indicate probabilities of MS resulting from the various paths from root to leaf.

Classification and regression trees (CART)

Research on privacy-preserving training of XGBoost models for prediction of MS uses two distinct techniques: secure multiparty computation (SMPC) and privacy-preserving federated learning (PPFL). We briefly describe both of them here.

An SMPC protocol enables several parties, each of whom holds a private input, to jointly evaluate a publicly known function on these inputs without revealing anything about the inputs except what is implied by the output of the function. Private inputs are secret shared among the parties, e.g., via additive secret sharing, in which each owner of a private input v generates random “shares” that add up to v.

For instance, suppose that Alice’s private input is v = 5. She can secret share it among herself, Bob, and Charlie by generating two random integers SBob =125621 and SCharlie = 56872, sending Bob’s share to him and Charlie’s to him, and keeping SAlice = v - SBob - SCharlie = -182488. Unless an adversary controls all three parties, he cannot learn anything about Alice’s private input v.  
  
In an execution of an SMPC protocol, the inputs to each elementary operation (addition or multiplication) are secret shared, and the output of the operation is a set of secret shares of the result. We say that a secret-shared value y (which may be the final output of the computation) is revealed to party P if all the parties send their shares to P, thus enabling P to reconstruct y. Further discussion of SMPC and its relevance to cloud computing can be found here and in Inpher’s Secret Computing Explainer Series.

A recent paper by researchers at Inpher proposes an SMPC protocol, called XORBoost, for privacy-preserving training of XGBoost models. It improves the state of the art by several orders of magnitude and ensures that

  • The CARTs computed by the protocol are secret shared among the training-data owners and revealed only to a designated party, namely the data analyst.
  • The training algorithm not only protects the input data but also reveals no information about the paths in the CARTs taken by any of the training samples. 
  • XORBoost supports both numerical and categorical features, thus providing enough flexibility and generality to support the above model.    

XORBoost works well for training datasets of reasonable size — hundreds of thousands of samples and hundreds of features. However, many real-world applications require training on more than a million samples. To achieve that type of scale, one can use federated learning (FL), which is an ML technique used to train a model on data samples held locally by multiple, decentralized edge devices without requiring the devices to exchange the samples.

FL differs from XORBoost mainly in that FL does not perform the entire training exercise on secret-shared values. Rather, each device trains a local model on its local data samples and sends its local model to one or more servers for aggregation. The aggregation protocol typically uses simple operations such as sum, average, and oblivious comparisons but no complex optimization.

If the server receives the plaintext local-model updates from all of the devices, it could, in principle, recover the local training-data samples using model-inversion attacks. SMPC and other privacy-preserving computational techniques can be applied to aggregate local models without revealing them to the server. See the diagram below for the overall architecture. 

XORBoost architecture

Prediction

PPXGBoost is a privacy-preserving version of XGBoost prediction. More precisely, it is a system that supports encrypted queries to encrypted XGBoost models. PPXGBoost is designed for applications that start by training a plaintext model Ω on a suitable training-data set and then create, for each user U, a personalized, encrypted version ΩU of the model to which U will submit encrypted queries and from which she will receive encrypted results. 

PPXGBoost system architecture

The PPXGBoost system architecture is shown in the figure above. On the client side, there is an app with which a user encrypts queries and decrypts results. On the server side, there is a module called Proxy that runs in a trusted environment and is responsible for setup (i.e., creating, for each authorized user, a personalized, encrypted model and a set of cryptographic keys) and an ML module that executes the encrypted queries. PPXGBoost uses two specialized types of encryption schemes (symmetric-key, order-preserving encryption and public-key, additive, homomorphic encryption) to encrypt models and evaluate encrypted queries. Each user is issued keys for both schemes during the setup phase.

Note that PPXGBoost is a natural choice for researchers, clinicians, and patients who wish to make disease predictions repeatedly as the patients’ circumstances change. Potentially relevant changes include exposure to new environmental factors, experimental treatment for another condition, or simply aging. An individual patient can create a personalized, encrypted version of a disease-prediction model and store it on a server owned by the medical center at which he is receiving treatment. Patient and physician can then use it to monitor, in a privacy-preserving manner, changes in the patient’s likelihood of contracting the disease.

Conclusion

We have described the use of PPML to address privacy challenges in XGBoost training and prediction. In a future post, we will elaborate on how privacy-preserving federated learning enables researchers to train more-complex ML models on millions of samples stored on hundreds of thousands of devices.

Related content

GB, London
Amazon Advertising is looking for a Data Scientist to join its brand new initiative that powers Amazon’s contextual advertising products. Advertising at Amazon is a fast-growing multi-billion dollar business that spans across desktop, mobile and connected devices; encompasses ads on Amazon and a vast network of hundreds of thousands of third party publishers; and extends across US, EU and an increasing number of international geographies. The Supply Quality organization has the charter to solve optimization problems for ad-programs in Amazon and ensure high-quality ad-impressions. We develop advanced algorithms and infrastructure systems to optimize performance for our advertisers and publishers. We are focused on solving a wide variety of problems in computational advertising like traffic quality prediction (robot and fraud detection), Security forensics and research, Viewability prediction, Brand Safety, Contextual data processing and classification. Our team includes experts in the areas of distributed computing, machine learning, statistics, optimization, text mining, information theory and big data systems. We are looking for a dynamic, innovative and accomplished Data Scientist to work on data science initiatives for contextual data processing and classification that power our contextual advertising solutions. Are you an experienced user of sophisticated analytical techniques that can be applied to answer business questions and chart a sustainable vision? Are you exited by the prospect of communicating insights and recommendations to audiences of varying levels of technical sophistication? Above all, are you an innovator at heart and have a track record of resolving ambiguity to deliver result? As a data scientist, you help our data science team build cutting edge models and measurement solutions to power our contextual classification technology. As this is a new initiative, you will get an opportunity to act as a thought leader, work backwards from the customer needs, dive deep into data to understand the issues, define metrics, conceptualize and build algorithms and collaborate with multiple cross-functional teams. Key job responsibilities * Define a long-term science vision for contextual-classification tech, driven fundamentally from the needs of our advertisers and publishers, translating that direction into specific plans for the science team. Interpret complex and interrelated data points and anecdotes to build and communicate this vision. * Collaborate with software engineering teams to Identify and implement elegant statistical and machine learning solutions * Oversee the design, development, and implementation of production level code that handles billions of ad requests. Own the full development cycle: idea, design, prototype, impact assessment, A/B testing (including interpretation of results) and production deployment. * Promote the culture of experimentation and applied science at Amazon. * Demonstrated ability to meet deadlines while managing multiple projects. * Excellent communication and presentation skills working with multiple peer groups and different levels of management * Influence and continuously improve a sustainable team culture that exemplifies Amazon’s leadership principles. We are open to hiring candidates to work out of one of the following locations: London, GBR
JP, 13, Tokyo
We are seeking a Principal Economist to be the science leader in Amazon's customer growth and engagement. The wide remit covers Prime, delivery experiences, loyalty program (Amazon Points), and marketing. We look forward to partnering with you to advance our innovation on customers’ behalf. Amazon has a trailblazing track record of working with Ph.D. economists in the tech industry and offers a unique environment for economists to thrive. As an economist at Amazon, you will apply the frontier of econometric and economic methods to Amazon’s terabytes of data and intriguing customer problems. Your expertise in building reduced-form or structural causal inference models is exemplary in Amazon. Your strategic thinking in designing mechanisms and products influences how Amazon evolves. In this role, you will build ground-breaking, state-of-the-art econometric models to guide multi-billion-dollar investment decisions around the global Amazon marketplaces. You will own, execute, and expand a research roadmap that connects science, business, and engineering and contributes to Amazon's long term success. As one of the first economists outside North America/EU, you will make an outsized impact to our international marketplaces and pioneer in expanding Amazon’s economist community in Asia. The ideal candidate will be an experienced economist in empirical industrial organization, labour economics, or related structural/reduced-form causal inference fields. You are a self-starter who enjoys ambiguity in a fast-paced and ever-changing environment. You think big on the next game-changing opportunity but also dive deep into every detail that matters. You insist on the highest standards and are consistent in delivering results. Key job responsibilities - Work with Product, Finance, Data Science, and Data Engineering teams across the globe to deliver data-driven insights and products for regional and world-wide launches. - Innovate on how Amazon can leverage data analytics to better serve our customers through selection and pricing. - Contribute to building a strong data science community in Amazon Asia. We are open to hiring candidates to work out of one of the following locations: Tokyo, 13, JPN
DE, BE, Berlin
Ops Integration: Concessions team is looking for a motivated, creative and customer obsessed Snr. Applied Scientist with a strong machine learning background, to develop advanced analytics models (Computer Vision, LLMs, etc.) that improve customer experiences We are the voice of the customer in Amazon’s operations, and we take that role very seriously. If you join this team, you will be a key contributor to delivering the Factory of the Future: leveraging Internet of Things (IoT) and advanced analytics to drive tangible, operational change on the ground. You will collaborate with a wide range of stakeholders (You will partner with Research and Applied Scientists, SDEs, Technical Program Managers, Product Managers and Business Leaders) across the business to develop and refine new ways of assessing challenges within Amazon operations. This role will combine Amazon’s oldest Leadership Principle, with the latest analytical innovations, to deliver business change at scale and efficiently The ideal candidate will have deep and broad experience with theoretical approaches and practical implementations of vision techniques for task automation. They will be a motivated self-starter who can thrive in a fast-paced environment. They will be passionate about staying current with sensing technologies and algorithms in the broader machine vision industry. They will enjoy working in a multi-disciplinary team of engineers, scientists and business leaders. They will seek to understand processes behind data so their recommendations are grounded. Key job responsibilities Your solutions will drive new system capabilities with global impact. You will design highly scalable, large enterprise software solutions involving computer vision. You will develop complex perception algorithms integrating across multiple sensing devices. You will develop metrics to quantify the benefits of a solution and influence project resources. You will validate system performance and use insights from your live models to drive the next generation of model development. Common tasks include: • Research, design, implement and evaluate complex perception and decision making algorithms integrating across multiple disciplines • Work closely with software engineering teams to drive scalable, real-time implementations • Collaborate closely with team members on developing systems from prototyping to production level • Collaborate with teams spread all over the world • Track general business activity and provide clear, compelling management reports on a regular basis We are open to hiring candidates to work out of one of the following locations: Berlin, BE, DEU | Berlin, DEU
DE, BY, Munich
Ops Integration: Concessions team is looking for a motivated, creative and customer obsessed Snr. Applied Scientist with a strong machine learning background, to develop advanced analytics models (Computer Vision, LLMs, etc.) that improve customer experiences We are the voice of the customer in Amazon’s operations, and we take that role very seriously. If you join this team, you will be a key contributor to delivering the Factory of the Future: leveraging Internet of Things (IoT) and advanced analytics to drive tangible, operational change on the ground. You will collaborate with a wide range of stakeholders (You will partner with Research and Applied Scientists, SDEs, Technical Program Managers, Product Managers and Business Leaders) across the business to develop and refine new ways of assessing challenges within Amazon operations. This role will combine Amazon’s oldest Leadership Principle, with the latest analytical innovations, to deliver business change at scale and efficiently The ideal candidate will have deep and broad experience with theoretical approaches and practical implementations of vision techniques for task automation. They will be a motivated self-starter who can thrive in a fast-paced environment. They will be passionate about staying current with sensing technologies and algorithms in the broader machine vision industry. They will enjoy working in a multi-disciplinary team of engineers, scientists and business leaders. They will seek to understand processes behind data so their recommendations are grounded. Key job responsibilities Your solutions will drive new system capabilities with global impact. You will design highly scalable, large enterprise software solutions involving computer vision. You will develop complex perception algorithms integrating across multiple sensing devices. You will develop metrics to quantify the benefits of a solution and influence project resources. You will validate system performance and use insights from your live models to drive the next generation of model development. Common tasks include: • Research, design, implement and evaluate complex perception and decision making algorithms integrating across multiple disciplines • Work closely with software engineering teams to drive scalable, real-time implementations • Collaborate closely with team members on developing systems from prototyping to production level • Collaborate with teams spread all over the world • Track general business activity and provide clear, compelling management reports on a regular basis We are open to hiring candidates to work out of one of the following locations: Munich, BE, DEU | Munich, BY, DEU | Munich, DEU
IT, MI, Milan
Ops Integration: Concessions team is looking for a motivated, creative and customer obsessed Snr. Applied Scientist with a strong machine learning background, to develop advanced analytics models (Computer Vision, LLMs, etc.) that improve customer experiences We are the voice of the customer in Amazon’s operations, and we take that role very seriously. If you join this team, you will be a key contributor to delivering the Factory of the Future: leveraging Internet of Things (IoT) and advanced analytics to drive tangible, operational change on the ground. You will collaborate with a wide range of stakeholders (You will partner with Research and Applied Scientists, SDEs, Technical Program Managers, Product Managers and Business Leaders) across the business to develop and refine new ways of assessing challenges within Amazon operations. This role will combine Amazon’s oldest Leadership Principle, with the latest analytical innovations, to deliver business change at scale and efficiently The ideal candidate will have deep and broad experience with theoretical approaches and practical implementations of vision techniques for task automation. They will be a motivated self-starter who can thrive in a fast-paced environment. They will be passionate about staying current with sensing technologies and algorithms in the broader machine vision industry. They will enjoy working in a multi-disciplinary team of engineers, scientists and business leaders. They will seek to understand processes behind data so their recommendations are grounded. Key job responsibilities Your solutions will drive new system capabilities with global impact. You will design highly scalable, large enterprise software solutions involving computer vision. You will develop complex perception algorithms integrating across multiple sensing devices. You will develop metrics to quantify the benefits of a solution and influence project resources. You will validate system performance and use insights from your live models to drive the next generation of model development. Common tasks include: • Research, design, implement and evaluate complex perception and decision making algorithms integrating across multiple disciplines • Work closely with software engineering teams to drive scalable, real-time implementations • Collaborate closely with team members on developing systems from prototyping to production level • Collaborate with teams spread all over the world • Track general business activity and provide clear, compelling management reports on a regular basis We are open to hiring candidates to work out of one of the following locations: Milan, MI, ITA
ES, M, Madrid
Ops Integration: Concessions team is looking for a motivated, creative and customer obsessed Snr. Applied Scientist with a strong machine learning background, to develop advanced analytics models (Computer Vision, LLMs, etc.) that improve customer experiences We are the voice of the customer in Amazon’s operations, and we take that role very seriously. If you join this team, you will be a key contributor to delivering the Factory of the Future: leveraging Internet of Things (IoT) and advanced analytics to drive tangible, operational change on the ground. You will collaborate with a wide range of stakeholders (You will partner with Research and Applied Scientists, SDEs, Technical Program Managers, Product Managers and Business Leaders) across the business to develop and refine new ways of assessing challenges within Amazon operations. This role will combine Amazon’s oldest Leadership Principle, with the latest analytical innovations, to deliver business change at scale and efficiently The ideal candidate will have deep and broad experience with theoretical approaches and practical implementations of vision techniques for task automation. They will be a motivated self-starter who can thrive in a fast-paced environment. They will be passionate about staying current with sensing technologies and algorithms in the broader machine vision industry. They will enjoy working in a multi-disciplinary team of engineers, scientists and business leaders. They will seek to understand processes behind data so their recommendations are grounded. Key job responsibilities Your solutions will drive new system capabilities with global impact. You will design highly scalable, large enterprise software solutions involving computer vision. You will develop complex perception algorithms integrating across multiple sensing devices. You will develop metrics to quantify the benefits of a solution and influence project resources. You will validate system performance and use insights from your live models to drive the next generation of model development. Common tasks include: • Research, design, implement and evaluate complex perception and decision making algorithms integrating across multiple disciplines • Work closely with software engineering teams to drive scalable, real-time implementations • Collaborate closely with team members on developing systems from prototyping to production level • Collaborate with teams spread all over the world • Track general business activity and provide clear, compelling management reports on a regular basis We are open to hiring candidates to work out of one of the following locations: Madrid, ESP | Madrid, M, ESP
US, TX, Austin
The role is available Arlington, Virginia (may consider New York, NY, Los Angeles, CA, or Toronto, Canada). Calling all inventors to work on exciting new opportunities in Sponsored Products. Amazon is building a world class advertising business and defining and delivering a collection of self-service performance advertising products that drive discovery and sales of merchandise. Our products are strategically important to our Retail and Marketplace businesses, driving long-term growth. Sponsored Products (SP) helps merchants, retail vendors, and brand owners grows incremental sales of their products sold on Amazon through native advertising. SP achieves this by using a combination of machine learning, big data analytics, ultra-low latency high-volume engineering systems, and quantitative product focus. We are a highly motivated, collaborative and fun-loving group with an entrepreneurial spirit and bias for action. You will join a newly-founded team with a broad mandate to experiment and innovate, which gives us the flexibility to explore and apply scientific techniques to novel product problems. You will have the satisfaction of seeing your work improve the experience of millions of Amazon shoppers while driving quantifiable revenue impact. More importantly, you will have the opportunity to broaden your technical skills, work with Generative AI, and be a science leader in an environment that thrives on creativity, experimentation, and product innovation. We are open to hiring candidates to work out of one of the following locations: Austin, TX, USA
US, CA, San Diego
The Private Brands team is looking for an Applied Scientist to join the team in building science solutions at scale. Our team applies Optimization, Machine Learning, Statistics, Causal Inference, and Econometrics/Economics to derive actionable insights. We are an interdisciplinary team of Scientists, Engineers, and Economists and primary focus on building optimization and machine learning solutions in supply chain domain with specific focus on Amazon private brand products. Key job responsibilities You will work with business leaders, scientists, and economists to translate business and functional requirements into concrete deliverables, including the design, development, testing, and deployment of highly scalable optimization solutions and ML models. This is a unique, high visibility opportunity for someone who wants to have business impact, dive deep into large-scale problems, enable measurable actions on the consumer economy, and work closely with scientists and economists. As a scientist, you bring business and industry context to science and technology decisions. You set the standard for scientific excellence and make decisions that affect the way we build and integrate algorithms. Your solutions are exemplary in terms of algorithm design, clarity, model structure, efficiency, and extensibility. You tackle intrinsically hard problems, acquiring expertise as needed. You decompose complex problems into straightforward solutions. We are particularly interested in candidates with experience in predictive and machine learning models and working with distributed systems. Academic and/or practical background in Machine Learning are particularly relevant for this position. Familiarity and experience in applying Operations Research techniques to supply chain problems is a plus. To know more about Amazon science, Please visit https://www.amazon.science We are open to hiring candidates to work out of one of the following locations: San Diego, CA, USA | Seattle, WA, USA
LU, Luxembourg
Ops Integration: Concessions team is looking for a motivated, creative and customer obsessed Snr. Applied Scientist with a strong machine learning background, to develop advanced analytics models (Computer Vision, LLMs, etc.) that improve customer experiences We are the voice of the customer in Amazon’s operations, and we take that role very seriously. If you join this team, you will be a key contributor to delivering the Factory of the Future: leveraging Internet of Things (IoT) and advanced analytics to drive tangible, operational change on the ground. You will collaborate with a wide range of stakeholders (You will partner with Research and Applied Scientists, SDEs, Technical Program Managers, Product Managers and Business Leaders) across the business to develop and refine new ways of assessing challenges within Amazon operations. This role will combine Amazon’s oldest Leadership Principle, with the latest analytical innovations, to deliver business change at scale and efficiently The ideal candidate will have deep and broad experience with theoretical approaches and practical implementations of vision techniques for task automation. They will be a motivated self-starter who can thrive in a fast-paced environment. They will be passionate about staying current with sensing technologies and algorithms in the broader machine vision industry. They will enjoy working in a multi-disciplinary team of engineers, scientists and business leaders. They will seek to understand processes behind data so their recommendations are grounded. Key job responsibilities Your solutions will drive new system capabilities with global impact. You will design highly scalable, large enterprise software solutions involving computer vision. You will develop complex perception algorithms integrating across multiple sensing devices. You will develop metrics to quantify the benefits of a solution and influence project resources. You will validate system performance and use insights from your live models to drive the next generation of model development. Common tasks include: • Research, design, implement and evaluate complex perception and decision making algorithms integrating across multiple disciplines • Work closely with software engineering teams to drive scalable, real-time implementations • Collaborate closely with team members on developing systems from prototyping to production level • Collaborate with teams spread all over the world • Track general business activity and provide clear, compelling management reports on a regular basis We are open to hiring candidates to work out of one of the following locations: Luxembourg, LUX
GB, London
Ops Integration: Concessions team is looking for a motivated, creative and customer obsessed Snr. Applied Scientist with a strong machine learning background, to develop advanced analytics models (Computer Vision, LLMs, etc.) that improve customer experiences We are the voice of the customer in Amazon’s operations, and we take that role very seriously. If you join this team, you will be a key contributor to delivering the Factory of the Future: leveraging Internet of Things (IoT) and advanced analytics to drive tangible, operational change on the ground. You will collaborate with a wide range of stakeholders (You will partner with Research and Applied Scientists, SDEs, Technical Program Managers, Product Managers and Business Leaders) across the business to develop and refine new ways of assessing challenges within Amazon operations. This role will combine Amazon’s oldest Leadership Principle, with the latest analytical innovations, to deliver business change at scale and efficiently The ideal candidate will have deep and broad experience with theoretical approaches and practical implementations of vision techniques for task automation. They will be a motivated self-starter who can thrive in a fast-paced environment. They will be passionate about staying current with sensing technologies and algorithms in the broader machine vision industry. They will enjoy working in a multi-disciplinary team of engineers, scientists and business leaders. They will seek to understand processes behind data so their recommendations are grounded. Key job responsibilities Your solutions will drive new system capabilities with global impact. You will design highly scalable, large enterprise software solutions involving computer vision. You will develop complex perception algorithms integrating across multiple sensing devices. You will develop metrics to quantify the benefits of a solution and influence project resources. You will validate system performance and use insights from your live models to drive the next generation of model development. Common tasks include: • Research, design, implement and evaluate complex perception and decision making algorithms integrating across multiple disciplines • Work closely with software engineering teams to drive scalable, real-time implementations • Collaborate closely with team members on developing systems from prototyping to production level • Collaborate with teams spread all over the world • Track general business activity and provide clear, compelling management reports on a regular basis Basic Qualifications -Masters in Computer Science, Machine Learning, Robotics or equivalent with a focus on Computer Vision. -2+ years of experience of building machine learning models for business application -Broad knowledge of fundamentals and state of the art in computer vision and machine learning -Strong coding skills in two or more programming languages such as Python or C/C++ -Knowledge of fundamentals in optimization, supervised and reinforcement learning -Excellent problem-solving ability Preferred Qualifications -PhD and 4+ years of industry or academic applied research experience applying Computer Vision techniques and developing Computer vision algorithms -Depth and breadth in state-of-the-art computer vision and machine learning technologies and experience designing and building computer vision solutions -Industry experience in sensor systems and the development of production computer vision and machine learning applications built to use them -Experience developing software interfacing to AWS services -Excellent written and verbal communication skills with the ability to present complex technical information in a clear and concise manner to a variety of audiences -Ability to work on a diverse team or with a diverse range of coworkers -Experience in publishing at major Computer Vision, ML or Robotics conferences or Journals (CVPR, ICCV, ECCV, NeurIPS, ICML, IJCV, ICRA, IROS, RSS,...) We are open to hiring candidates to work out of one of the following locations: London, GBR