How to reduce annotation when evaluating AI systems

By exploiting consistencies across components of ensemble classifiers, a new approach reduces data requirements by up to 89%.

Commercial machine learning systems are trained on examples meant to represent the real world. But the world is constantly changing, and deployed machine learning systems need to be regularly reevaluated, to ensure that their performance hasn’t declined.

Evaluating a deployed AI system means manually annotating data the system has classified, to determine whether those classifications are accurate. But annotation is labor intensive, so it is desirable to minimize the number of samples required to assess the system’s performance.

Many commercial machine learning systems are in fact ensembles of binary classifiers; each classifier “votes” on whether an input belongs to a particular class, and the votes are pooled to produce a final decision.

In a paper we’re presenting at the European Conference on Machine Learning, we show how to reduce the number of random samples required to evaluate ensembles of binary classifiers by exploiting overlaps between the sample sets used to evaluate the individual components.

For example, imagine an ensemble of three classifiers in which we need 10 samples to evaluate each classifier. Evaluating the whole system then requires 40 samples: 10 for each of the three individual classifiers and 10 for the full ensemble. If 10 of the 40 samples were duplicates, we could make do with 30 annotations. Our paper builds on this intuition.
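The arithmetic in this example can be sketched in a few lines of Python. The input IDs and the way the duplicates are distributed are invented purely for illustration; only the totals (40 samples drawn, 10 of them duplicates) come from the example above.

```python
def annotations_needed(sample_sets):
    """Count the distinct inputs across all sample sets; each needs one annotation."""
    unique_inputs = set()
    for samples in sample_sets:
        unique_inputs.update(samples)
    return len(unique_inputs)


# Hypothetical input IDs: four sample sets of 10 (the ensemble plus three classifiers).
ensemble = list(range(0, 10))
classifier_a = list(range(0, 5)) + list(range(10, 15))   # 5 IDs shared with the ensemble
classifier_b = list(range(5, 10)) + list(range(15, 20))  # 5 IDs shared with the ensemble
classifier_c = list(range(20, 30))                       # no shared IDs

sample_sets = [ensemble, classifier_a, classifier_b, classifier_c]
drawn = sum(len(samples) for samples in sample_sets)
needed = annotations_needed(sample_sets)
print(drawn, needed)  # 40 30
```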

In an experiment using real data, our approach reduced the number of samples required to evaluate an ensemble by more than 89%, while preserving the accuracy of the evaluation.

We also ran experiments using simulated data that varied the degree of overlap between the sample sets for the individual classifiers. In those experiments, the savings averaged 33%.

Finally, in the paper, we show that our sampling procedure doesn’t introduce any biases into the resulting sample sets, relative to random sampling.

Common ground

Intuitively, randomly chosen samples for the separate components of an ensemble would inevitably include some duplicates. Most of the samples useful for evaluating one model should thus be useful for evaluating the others. The goal is to add in just enough additional samples to be able to evaluate all the models.

We begin by choosing a sample set for the entire ensemble, which we dub the “parent”; the individual models of the ensemble are, by analogy, “children”. After finding a set of samples sufficient for evaluating the parent, we expand it to cover the first child, then repeat the procedure until the set of samples covers all the children.

Our general approach works with any criterion for evaluating an ensemble’s performance, but in the paper, we use precision, the fraction of the inputs a classifier labels positive that truly are positive, as a running example.
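Precision is conventionally the number of true positives divided by the number of inputs classified as positive. As a minimal sketch, assuming each sampled input has been annotated with a simple yes/no verdict, an estimate from a sample could be computed as follows; the function name and the example verdicts are hypothetical.

```python
def estimate_precision(annotations):
    """Estimate precision from annotated samples of positively classified inputs.

    `annotations` holds one boolean per sampled input: True if the annotator
    confirmed that the input really belongs to the positive class.
    """
    if not annotations:
        raise ValueError("at least one annotated sample is required")
    return sum(annotations) / len(annotations)


# Hypothetical verdicts for four inputs the classifier labeled positive.
verdicts = [True, True, False, True]
estimate = estimate_precision(verdicts)
print(estimate)  # 0.75
```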

In this figure, the set of inputs classified as positive by the parent (right circle, AP) intersects the set of inputs classified as positive by the child (left circle, AC). The intersection (orange-shaded region) between a random sample of AP (orange curve, SP) and AC represents S+, the samples from the parent’s positive set that were also classified as positive by the child. The green-shaded region represents S-, samples from the set of inputs that were classified as positive by the child but not the parent. The sprinkled x’s represent Sremain, additional samples of the inputs classified as positive by the child, required to provide enough samples to get a highly accurate estimate of precision.

We begin with two sets: all the inputs that the parent has judged to belong to the target class and all the inputs that the child has judged to belong to it. There’s usually considerable overlap between the two sets; for example, in a majority-vote ensemble composed of three classifiers, the ensemble (parent) classifies an input as positive whenever at least two of the components (children) do.
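To make the overlap concrete, here is a small sketch of a three-classifier majority-vote ensemble. The input names and votes are made up for illustration, and the strict-majority rule is just one way to pool votes.

```python
def majority_vote(votes):
    """Return True when more than half of the child classifiers vote positive."""
    return 2 * sum(votes) > len(votes)


# Hypothetical votes from three children (in order) on five inputs.
child_votes = {
    "a": (True, True, False),
    "b": (True, False, False),
    "c": (False, True, True),
    "d": (True, True, True),
    "e": (False, False, False),
}

# Inputs the parent (ensemble) and the first child each classify as positive.
parent_positive = {name for name, votes in child_votes.items() if majority_vote(votes)}
first_child_positive = {name for name, votes in child_votes.items() if votes[0]}
overlap = parent_positive & first_child_positive
print(sorted(parent_positive), sorted(first_child_positive), sorted(overlap))
```

In this toy data, two of the first child's three positives are also positives for the parent, which is the kind of overlap the sampling procedure exploits.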

From the parent set, we select enough random samples to evaluate the parent. Then we find the intersection between that sample set and the child’s total set of positive classifications (S+ in the figure above). This becomes our baseline sample set for the child.

Next, we draw a random sampling of inputs that the child classified as positive but the parent did not (S-, above). The ratio between the size of this sample and the size of the baseline sample set should be the same as the ratio between the number of inputs that the child — but not the parent — labeled positive and the number of inputs that both labeled positive.

When we add these samples to the baseline sample set, we get a combined sample set that may not be large enough to accurately estimate precision. If needed, we select more samples from the inputs classified as positive by the child. These samples may also have been classified as positive by the parent (Sremain in the figure above).
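The three steps above (building S+, S- and Sremain) can be sketched as follows. This is an illustration under stated assumptions, not the implementation from the paper: the function name, the rounding of the S- size, the handling of an empty overlap, and the choice to draw the top-up samples only from inputs not yet selected are all assumptions, and inputs are assumed to be sortable IDs.

```python
import random


def extend_sample_to_child(parent_sample, parent_positive, child_positive,
                           required_size, rng):
    """Extend a parent sample so that it can also be used to evaluate one child."""
    parent_positive = set(parent_positive)
    child_positive = set(child_positive)

    # S+: parent samples that the child also classified as positive.
    s_plus = [x for x in parent_sample if x in child_positive]

    # S-: child-only positives, sized so that |S-| / |S+| matches the ratio of
    # child-only positives to shared positives in the full sets (rounded).
    both = child_positive & parent_positive
    child_only = child_positive - parent_positive
    n_minus = round(len(s_plus) * len(child_only) / len(both)) if both else 0
    n_minus = min(n_minus, len(child_only))
    s_minus = rng.sample(sorted(child_only), n_minus)

    # S_remain: top up from the child's positives if the set is still too small.
    combined = s_plus + s_minus
    shortfall = max(0, required_size - len(combined))
    unused = sorted(child_positive - set(combined))
    s_remain = rng.sample(unused, min(shortfall, len(unused)))
    return s_plus, s_minus, s_remain


# Hypothetical data: integer IDs, with 40 inputs that both models label positive.
rng = random.Random(0)
parent_positive = set(range(0, 80))
child_positive = set(range(40, 100))
parent_sample = rng.sample(sorted(parent_positive), 20)
s_plus, s_minus, s_remain = extend_sample_to_child(
    parent_sample, parent_positive, child_positive, required_size=20, rng=rng)
child_sample = s_plus + s_minus + s_remain
```

Only the inputs in `s_minus` and `s_remain` need new annotations, because everything in `s_plus` was already annotated when the parent was evaluated.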

Recall that we first selected samples from the set where the child and parent agreed, then from the set where the child and parent disagreed. That means that the sample set we have constructed is not truly random, so the next step is to mix together the samples in the combined set.

Reshuffle or resample?

We experimented with two different ways of performing this mixing. In one, we simply reshuffle all the samples in the combined set. In the other, we randomly draw samples from the combined set and add them to a new mixed set, until the mixed set is the same size as the combined set. In both approaches, the end result is that when we pick any element from the sample, we won’t know whether it came from the set where the parent and child agreed or the one where they disagreed.
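The two mixing strategies might look like this in Python. Reading "resampling" as drawing with replacement is an assumption, based on the observation that it introduces redundancies into the mixed set; the function names are hypothetical.

```python
import random


def reshuffle(combined, rng):
    """Return the combined samples in random order; every sample appears exactly once."""
    mixed = list(combined)
    rng.shuffle(mixed)
    return mixed


def resample(combined, rng):
    """Draw with replacement until the mixed set is as large as the combined set.

    Assumption: draws are made with replacement, so some samples can repeat
    while others are left out.
    """
    return rng.choices(list(combined), k=len(combined))


rng = random.Random(0)
combined = list(range(10))  # hypothetical sample IDs
shuffled = reshuffle(combined, rng)
resampled = resample(combined, rng)
distinct_after_resampling = len(set(resampled))
```

Under this reading, `distinct_after_resampling` can be smaller than the size of the combined set, which means fewer distinct inputs to annotate.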

A visualization of the average savings in samples provided by our approach as we varied the amount of overlap between the parent’s and child’s judgments.

In our experiments, we identified a slight trade-off between the results of our algorithm when we used reshuffling to produce the mixed sample set and when we used resampling. Because resampling introduces some redundancies into the mixed set, it requires fewer samples than reshuffling, which increases the savings in sample size versus random sampling.

At the same time, however, it slightly lowers the accuracy of the precision estimate. With reshuffling, our algorithm, on average, slightly outperformed random sampling on our three test data sets, while with resampling, it was slightly less accurate than random sampling.

Overall, the sampling procedure we have developed reduces the number of samples that must be annotated to evaluate an ensemble. The amount of savings depends on the overlap between the parent’s and child’s judgments: the greater the overlap, the greater the savings in samples.

About the Author
Srinivasan Jagannathan is a senior manager of software development at Amazon.

Related content

Amazon Science Newsletter Project Kuiper.jpg
Get more from Amazon Science
Sign up for our monthly newsletter

Work with us

See More Jobs
MX, DIF, Mexico City
At Amazon Web Services (AWS), we’re hiring highly technical Data and Machine Learning engineers to collaborate with our customers and partners on key engagements. Our consultants will develop and deliver proof-of-concept projects, technical workshops, and support implementation projects. These professional services engagements will focus on customer solutions such as Machine Learning, Data and Analytics, HPC and more.In this role, you will work with our partners, customers and focus on our AWS offerings such Amazon Kinesis, AWS Glue, Amazon Redshift, Amazon EMR, Amazon Athena, Amazon SageMaker and more. You will help our customers and partners to remove the constraints that prevent them from leveraging their data to develop business insights.AWS Professional Services engage in a wide variety of projects for customers and partners, providing collective experience from across the AWS customer base and are obsessed about customer success. Our team collaborates across the entire AWS organization to bring access to product and service teams, to get the right solution delivered and drive feature innovation based upon customer needs.You will also have the opportunity to create white papers, writing blogs, build demos and other reusable collateral that can be used by our customers. Most importantly, you will work closely with our Solution Architects, Data Scientists and Service Engineering teams.The ideal candidate will have extensive experience with design, development and operations that leverages deep knowledge in the use of services like Amazon Kinesis, Apache Kafka, Apache Spark, Amazon SageMaker, Amazon EMR, NoSQL technologies and other 3rd parties.This is a customer facing role. You will be required to travel to client locations and deliver professional services when needed.
MX, DIF, Mexico City
At Amazon Web Services (AWS), we’re hiring highly technical Data and Machine Learning engineers to collaborate with our customers and partners on key engagements. Our consultants will develop and deliver proof-of-concept projects, technical workshops, and support implementation projects. These professional services engagements will focus on customer solutions such as Machine Learning, Data and Analytics, HPC and more.In this role, you will work with our partners, customers and focus on our AWS offerings such Amazon Kinesis, AWS Glue, Amazon Redshift, Amazon EMR, Amazon Athena, Amazon SageMaker and more. You will help our customers and partners to remove the constraints that prevent them from leveraging their data to develop business insights.AWS Professional Services engage in a wide variety of projects for customers and partners, providing collective experience from across the AWS customer base and are obsessed about customer success. Our team collaborates across the entire AWS organization to bring access to product and service teams, to get the right solution delivered and drive feature innovation based upon customer needs.You will also have the opportunity to create white papers, writing blogs, build demos and other reusable collateral that can be used by our customers. Most importantly, you will work closely with our Solution Architects, Data Scientists and Service Engineering teams.The ideal candidate will have extensive experience with design, development and operations that leverages deep knowledge in the use of services like Amazon Kinesis, Apache Kafka, Apache Spark, Amazon SageMaker, Amazon EMR, NoSQL technologies and other 3rd parties.This is a customer facing role. You will be required to travel to client locations and deliver professional services when needed.
US, WA, Seattle
The Amazon Air Science and Technology team is seeking a Data Scientist to be part of a team solving complex aviation operations problems to reduce cost and improve performance. This is a blue-sky role that gives you a chance to bring optimization modeling, statistical modeling, machine learning advancements to data analytics for customer-facing solutions in complex industrial settings.You will work closely with product, research science and technical leaders throughout Amazon Air, Amazon Delivery Technology and Supply Chain Optimization and will be responsible for influencing funding decisions in areas of investment that you identify as critical future product offerings. You will partner with software developers and data scientists to build end-to-end data pipelines and production code, and you will have exposure to senior leadership as we communicate results and provide scientific guidance to the business. You will analyze large amounts of business data, build the machine learning or optimization models that will enable us to continually delight our customers worldwide.The ideal candidate will have extensive experience in Science work, business analytics and have the aptitude to incorporate new approaches and methodologies while dealing with ambiguities. Excellent business and communication skills are a must to develop and define key business questions and to build data sets that answer those questions. You should have a demonstrated ability to think strategically and analytically about business, product, and technical challenges. Further, you must have the ability to build and communicate compelling value propositions, and work across the organization to achieve consensus. This role requires a strong passion for customers, a high level of comfort navigating ambiguity, and a keen sense of ownership and drive to deliver results.RESPONSIBILITIES:· Utilize code (Python, R, Scala, etc.) 
for analyzing data and building statistical models to solve specific business problems.· Improve upon existing methodologies by developing new data sources, testing model enhancements, and fine-tuning model parameters.· Collaborate with researchers, software developers, and business leaders to define product requirements and provide analytical support· Directly contribute to the design and development of automated selection systems· Build customer-facing reporting tools to provide insights and metrics which track system performance· Communicate verbally and in writing to business customers and leadership team with various levels of technical knowledge, educating them about our systems, as well as sharing insights and recommendations
US, WA, Seattle
The Amazon Air Science and Technology team is seeking an Applied Scientist to be part of a team solving complex aviation operations problems to reduce cost and improve performance. This is a blue-sky role that gives you a chance to bring optimization modeling, statistical modeling, machine learning advancements to data analytics for customer-facing solutions in complex industrial settings.You will work closely with product, research science and technical leaders throughout Amazon Air, Amazon Delivery Technology and Supply Chain Optimization and will be responsible for influencing funding decisions in areas of investment that you identify as critical future product offerings. You will partner with software developers and data scientists to build end-to-end data pipelines and production code, and you will have exposure to senior leadership as we communicate results and provide scientific guidance to the business. You will analyze large amounts of business data, build the machine learning or optimization models that will enable us to continually delight our customers worldwide.The ideal candidate will have extensive experience in Science work, business analytics and have the aptitude to incorporate new approaches and methodologies while dealing with ambiguities. Excellent business and communication skills are a must to develop and define key business questions and build models that answer those questions. You should have a demonstrated ability to think strategically and analytically about business, product, and technical challenges. Further, you must have the ability to build and communicate compelling value propositions, and work across the organization to achieve consensus. 
This role requires a strong passion for customers, a high level of comfort navigating ambiguity, and a keen sense of ownership and drive to deliver results.Tasks/ Responsibilities:· Partnership with the engineering and operations to drive modeling and design for complex business problems.· Design and prototype decision support tools (product) to automate standardized processes and optimize trade-offs across the full decision space.· Contribute to the mid- and long-term strategic planning studies and analysis.· Lead complex transportation modeling analyses to aid management in making key business decisions and set new policies.
US, WA, Seattle
Want to watch a movie at the end of a long week, but not sure what to choose? Looking for a new show while you wait for the next season of Game of Thrones to start? So are millions of our Prime Video customers. The Prime Video Relevance team helps customers find relevant videos, channels and topics so they can find content they didn’t even known they were looking for, continuing to surprise them with the depth of our catalog.We tailor our recommendations through a variety of machine learning algorithms including deep learning neural networks, that you will help define and extend. We are looking for creative, customer and details obsessed machine learning scientists who can apply the latest research, state of the art algorithms and machine learning to build highly scalable recommendation and personalization systems. You'll have a chance to collaborate with talented teams of engineers and scientists to run these predictions on distributed systems at incredible scale and speed.As a member of the Prime Video Personalization organization, you will spend your time as a hands-on machine learning practitioner and a research leader. You will play a key role on the team, building and guiding machine learning models from the ground up. 
At then of the day, you will have the reward of seeing your contributions benefit millions of Amazon.com customers worldwide.Some examples of the things we work on:· Using Neural Networks and Deep Learning techniques to find titles that customers will enjoy· Build and operate services that deliver millions of recommendations per second· Extend models and algorithms to support our ever growing ways of consuming content (subscriptions, live, rentals etc), dealing with unique challenges such as observational bias and rapidly scaling dimensions· Constantly experimenting with changes to the underlying algorithms and models to deliver relevant content to a wide variety of customer experiencesIf you are ready to truly make an impact on a product that is used by millions of people around the world, including your own friends and family, then we would love to talk to you.Amazon.com is an Equal Opportunity-Affirmative Action Employer – Minority / Female / Disability / Veteran / Gender Identity / Sexual Orientation
ES, B, Barcelona
If you get excited by the prospect of solving hard problems using Computer Vision and Machine Learning, enjoy working in a fast-paced environment with thought leaders in the CV space, and are passionate about launching algorithms for maximum customer impact, then we have the perfect role for you!We are looking for Principal Scientist with a deep expertise in computer vision to focus on data-driven image synthesis using deep learning techniques such as GANs and VAEs to help Amazon transform pixels into personalized fashion images. Our team is developing cutting-edge technology to personalize and transform photos, partnering with many different teams across Amazon to apply a mix of workflows, image generation, computer vision, and machine learning to solve problems like virtual try-on at Amazon scale. A strong candidate will have experience leading a team of scientists, working independently on problems, and setting their own research direction.As a Principal Applied Scientist, you will work in a team with other scientists and engineers working on products and prototypes in the field of image synthesis and photo-realistic appearance of people and clothing to create scalable solutions for business problems. You will play a critical role in ideation for the team and run live experiments, with opportunities to publish your work. We are building the next generation of fashion imagery, and we hope you'll join us!This role is located in Barcelona, Spain, but we are open to hiring remote positions across the US and EU.Amazon is committed to a diverse and inclusive workplace. Amazon is an equal opportunity employer and does not discriminate on the basis of race, national origin, gender, gender identity, sexual orientation, protected veteran status, disability, age, or other legally protected status. For individuals with disabilities who would like to request an accommodation, please visit https://www.amazon.jobs/en/disability/us.
US, WA, Seattle
The Human Resources Central Science Team (HRCS) uses economics, behavioral science, statistics, and machine learning to proactively identify mechanisms and process improvements which simultaneously improve Amazon and the lives, wellbeing, and the value of work to Amazonians. We are an interdisciplinary team that combines the talents of science and engineering to develop and deliver solutions that measurably achieve this goal.We are looking for economists who are able to provide structure around complex business problems, hone those complex problems into specific, scientific questions, and test those questions to generate insights. The ideal candidate will work with engineers and computer scientists to estimate models and algorithms on large scale data, design pilots and measure their impact, and transform successful prototypes into improved policies and programs at scale. We are looking for creative thinkers who can combine a strong technical economic toolbox with a desire to learn from other disciplines, and who know how to execute and deliver on big ideas as part of an interdisciplinary technical team.Ideal candidates will work closely with business partners to develop science that solves the most important business challenges. They will work in a team setting with individuals from diverse disciplines and backgrounds. They will serve as an ambassador for science and a scientific resource for business teams, so that scientific processes permeate throughout the HR organization to the benefit of Amazonians and Amazon. Ideal candidates will own the development of scientific models and manage the data analysis, modeling, and experimentation that is necessary for estimating and validating models. They will work closely with engineering teams to develop scalable data resources to support rapid insights, and take successful models and findings into production as new products and services. 
They will be customer-centric and will communicate scientific approaches and findings to business leaders, listening to and incorporate their feedback, and delivering successful scientific solutions.
US, WA, Seattle
The Human Resources Central Science Team (HRCS) uses economics, behavioral science, statistics, and machine learning to proactively identify mechanisms and process improvements which simultaneously improve Amazon and the lives, wellbeing, and the value of work to Amazonians. We are an interdisciplinary team that combines the talents of science and engineering to develop and deliver solutions that measurably achieve this goal.We are looking for economists who are able to provide structure around complex business problems, hone those complex problems into specific, scientific questions, and test those questions to generate insights. The ideal candidate will work with engineers and computer scientists to estimate models and algorithms on large scale data, design pilots and measure their impact, and transform successful prototypes into improved policies and programs at scale. We are looking for creative thinkers who can combine a strong technical economic toolbox with a desire to learn from other disciplines, and who know how to execute and deliver on big ideas as part of an interdisciplinary technical team.Ideal candidates will work closely with business partners to develop science that solves the most important business challenges. They will work in a team setting with individuals from diverse disciplines and backgrounds. They will serve as an ambassador for science and a scientific resource for business teams, so that scientific processes permeate throughout the HR organization to the benefit of Amazonians and Amazon. Ideal candidates will own the development of scientific models and manage the data analysis, modeling, and experimentation that is necessary for estimating and validating models. They will work closely with engineering teams to develop scalable data resources to support rapid insights, and take successful models and findings into production as new products and services. 
They will be customer-centric and will communicate scientific approaches and findings to business leaders, listening to and incorporate their feedback, and delivering successful scientific solutions.
US, WA, Seattle
The Amazon Product Catalog group is hiring a Senior Applied Scientist to help us build the most authoritative product knowledge in existence. Our charter is to capture every relevant fact and relationship for any product on the planet. You will have an enormous opportunity to make a large impact on the design, architecture, and implementation of cutting edge products used every day, by people you know.As a Senior Applied Scientist, you will be responsible for designing, developing, and deploying large-scale data mining solutions and distributed machine learning systems that ultimately make shopping on Amazon delightful and functional. Because the product catalog is the heart of our retail business, your work will directly change how Amazon customers search, find, compare, and buy everything from televisions to groceries. Our catalog contains hundreds of billions of facts, a scale which demands automated, state-of-the-art techniques to identify defects, abuse, incorrect data, and find new relationships between products, facts, and entities.Our domain blurs the boundaries between knowledge graphs, ontologies, entity recognition, image classification, machine translation, entity recognition, and semantic fact extraction. This requires a fast, collaborative environment We not only leverage best-in-class models, but also improve them. You will have the support of a well-established team with a long-term charter, science-smart engineers, language experts, and the latest AWS products and compute resources. 
You will collaborate closely with teams of software engineers, applied machine learning scientists, product managers, user interface designers, and others in order to influence our business and technical strategy, and play a key role in defining the team’s roadmap.A successful candidate will have an established background in machine learning science, large scale software systems, a strong technical ability, great communication skills, and a motivation to solve new, ambiguous problems.Unique Opportunities· Global impact on hundreds of millions of Amazon customers· Opportunities to publish both internally and externally· Close-knit partnership with science-smart developers – your improvements can roll out in days, not months, because engineers understand your work· Access to Amazon-scale datasets, spanning hundreds of billions of facts· Support from over 100 dedicated language experts and SMEs to label, validate, and test your hypothesis· Challenging, cutting-edge science problems that cross multiple domains (such as textual, semantic and image-aware Transformer models )Key Responsibilities· Develop production-ready machine learning solutions to improve Amazon Product Catalog· Collaborate with and influence scientists and engineers in multiple teams· Refine existing techniques, optimizing accuracy, throughput, and global impact· Publish results internally and externally
US, WA, Seattle
Interested in modeling and understanding customer behavior through machine learning, artificial intelligence, and data mining over TB scale data with huge business impact on millions of customers? Join our team of Scientists and Engineers developing models to predict customer behavior and optimize the customer experience with Amazon Prime. This includes identifying who our customers are and providing them with personalized relevant content. As an ML expert, you will partner directly with product owners to intake, build, and directly apply your modeling solutions.There are numerous scientific and technical challenges you will get to tackle in this role, such as global scalability of models, combinatorial optimization, cold start problem, accelerated experimentation, short/long term goals modeling, cohort identification in a semi-supervised setting, and multi-step optimization leading to reinforcement learning of the customer journey. We employ techniques from supervised learning, bandits, optimization, and RL.As the central science team within Prime, our expertise gets routinely called upon to weigh in on a variety of topics. 
We also emphasize the need and value of scientific research and have developed a strong publication and patent record (internally/externally) which you will be a part of.You will also utilize and be exposed to the latest in ML technologies and infrastructure: AWS technologies (EMR/Spark, Redshift, Sagemaker, DynamoDB, S3, ...), various ML algorithms and techniques (XGBoost, Random Forests, Neural Networks, supervised/unsupervised/semi-supervised/reinforcement learning), and statistical modeling techniques.Major responsibilities· · Build and develop machine learning models and supporting infrastructure at TB scale, in coordination with software engineering teams.· · Leverage Bandits and Reinforcement Learning for Recommendation Systems.· · Develop offline policy estimation tools and integrate with reporting systems.· · Establish scalable, efficient, automated processes for large scale data analyses, model development, model validation and model implementation.· · Analyze and extract relevant information from large amounts of Amazon’s historical business data to help automate and optimize key processes.· · Work closely with the business to understand their problem space, identify the opportunities and formulate the problems.· · Use machine learning, data mining, statistical techniques and others to create actionable, meaningful, and scalable solutions for the business problems.· · Design, develop and evaluate highly innovative models and statistical approaches to understand and predict customer behavior and to solve business problems.
US, WA, Seattle
Interested in modeling and understanding customer behavior through machine learning, artificial intelligence, and data mining over TB scale data with huge business impact on millions of customers? Join our team of Scientists and Engineers developing models to predict customer behavior and optimize the customer experience with Amazon Prime. This includes identifying who our customers are and providing them with personalized relevant content. As an ML expert, you will partner directly with product owners to intake, build, and directly apply your modeling solutions.There are numerous scientific and technical challenges you will get to tackle in this role, such as global scalability of models, combinatorial optimization, cold start problem, accelerated experimentation, short/long term goals modeling, cohort identification in a semi-supervised setting, and multi-step optimization leading to reinforcement learning of the customer journey. We employ techniques from supervised learning, bandits, optimization, and RL.As the central science team within Prime, our expertise gets routinely called upon to weigh in on a variety of topics. 
We also emphasize the need and value of scientific research and have developed a strong publication and patent record (internally/externally) which you will be a part of.You will also utilize and be exposed to the latest in ML technologies and infrastructure: AWS technologies (EMR/Spark, Redshift, Sagemaker, DynamoDB, S3, ...), various ML algorithms and techniques (XGBoost, Random Forests, Neural Networks, supervised/unsupervised/semi-supervised/reinforcement learning), and statistical modeling techniques.Major responsibilities· · Build and develop machine learning models and supporting infrastructure at TB scale, in coordination with software engineering teams.· · Leverage Bandits and Reinforcement Learning for Recommendation Systems.· · Develop offline policy estimation tools and integrate with reporting systems.· · Establish scalable, efficient, automated processes for large scale data analyses, model development, model validation and model implementation.· · Analyze and extract relevant information from large amounts of Amazon’s historical business data to help automate and optimize key processes.· · Work closely with the business to understand their problem space, identify the opportunities and formulate the problems.· · Use machine learning, data mining, statistical techniques and others to create actionable, meaningful, and scalable solutions for the business problems.· · Design, develop and evaluate highly innovative models and statistical approaches to understand and predict customer behavior and to solve business problems.