How to reduce annotation when evaluating AI systems

By exploiting consistencies across components of ensemble classifiers, a new approach reduces data requirements by up to 89%.

Commercial machine learning systems are trained on examples meant to represent the real world. But the world is constantly changing, and deployed machine learning systems need to be regularly reevaluated, to ensure that their performance hasn’t declined.

Evaluating a deployed AI system means manually annotating data the system has classified, to determine whether those classifications are accurate. But annotation is labor intensive, so it is desirable to minimize the number of samples required to assess the system’s performance.

Many commercial machine learning systems are in fact ensembles of binary classifiers; each classifier “votes” on whether an input belongs to a particular class, and the votes are pooled to produce a final decision.

In a paper we’re presenting at the European Conference on Machine Learning, we show how to reduce the number of random samples required to evaluate ensembles of binary classifiers by exploiting overlaps between the sample sets used to evaluate the individual components.

For example, imagine that an ensemble that has three classifiers, and we need 10 samples each to evaluate the performance of the three classifiers. Evaluating the ensemble requires 40 samples — 10 each for the individual classifiers and 10 for the full ensemble. If 10 of the 40 samples were duplicates, we could make do with 30 annotations. Our paper builds on this intuition.

In an experiment using real data, our approach reduced the number of samples required to evaluate an ensemble by more than 89%, while preserving the accuracy of the evaluation.

We also ran experiments using simulated data that varied the degree of overlap between the sample sets for the individual classifiers. In those experiments, the savings averaged 33%.

Finally, in the paper, we show that our sampling procedure doesn’t introduce any biases into the resulting sample sets, relative to random sampling.

Common ground

Intuitively, randomly chosen samples for the separate components of an ensemble would inevitably include some duplicates. Most of the samples useful for evaluating one model should thus be useful for evaluating the others. The goal is to add in just enough additional samples to be able to evaluate all the models.

We begin by choosing a sample set for the entire ensemble, which we dub the “parent”; the individual models of the ensemble are, by reference, “children”. After finding a set of samples sufficient for evaluating the parent, we expand it to include the first child, then repeat the procedure until the set of samples covers all the children.

Our general approach works with any criterion for evaluating an ensemble’s performance, but in the paper, we use precision — or the percentage of true positives that the classifier correctly identifies — as a running example.

Illustration of set intersections.
In this figure, the set of inputs classified as positive by the parent (right circle, AP) intersects the set of inputs classified as positive by the child (left circle, AC). The intersection (orange-shaded region) between a random sample of AP (orange curve, SP) and AC represents S+, the samples from the parent’s positive set that were also classified as positive by the child. The green-shaded region represents S-, samples from the set of inputs that were classified as positive by the child but not the parent. The sprinkled x’s represent Sremain, additional samples of the inputs classified as positive by the child, required to provide enough samples to get a highly accurate estimate of precision.

We begin with the total set of inputs that the parent has judged to belong to the target class and the total set of inputs that the child has. There’s usually considerable overlap between the two sets; for example, in a majority-vote ensemble composed of three classifiers, the ensemble (parent) classifies an input as positive as long as two of the components (children) do.

From the parent set, we select enough random samples to evaluate the parent. Then we find the intersection between that sample set and the child’s total set of positive classifications (S+ in the figure above). This becomes our baseline sample set for the child.

Next, we draw a random sampling of inputs that the child classified as positive but the parent did not (S-, above). The ratio between the size of this sample and the size of the baseline sample set should be the same as the ratio between the number of inputs that the child — but not the parent — labeled positive and the number of inputs that both labeled positive.

When we add these samples to the baseline sample set, we get a combined sample set that may not be large enough to accurately estimate precision. If needed, we select more samples from the inputs classified as positive by the child. These samples may also have been classified as positive by the parent (Sremain in the figure above).

Recall that we first selected samples from the set where the child and parent agreed, then from the set where the child and parent disagreed. That means that the sample set we have constructed is not truly random, so the next step is to mix together the samples in the combined set.

Reshuffle or resample?

We experimented with two different ways of performing this mixing. In one, we simply reshuffle all the samples in the combined set. In the other, we randomly draw samples from the combined set and add them to a new mixed set, until the mixed set is the same size as the combined set. In both approaches, the end result is that when we pick any element from the sample, we won’t know whether it came from the set where the parent and child agreed or the one where they disagreed.

Visualization of the average savings in samples provided by our approach.
A visualization of the average savings in samples provided by our approach as we varied the amount of overlap between the parent’s and child’s judgments.

In our experiments, we identified a slight trade-off between the results of our algorithm when we used reshuffling to produce the mixed sample set and when we used resampling. Because resampling introduces some redundancies into the mixed set, it requires fewer samples than reshuffling, which increases the savings in sample size versus random sampling.

At the same time, however, it slightly lowers the accuracy of the precision estimate. With reshuffling, our algorithm, on average, slightly outperformed random sampling on our three test data sets, while with resampling, it was slightly less accurate than random sampling.

Overall, the sampling procedure we have developed reduces the sample size. Of course, the amount of savings depends on the overlap between the parent’s and child’s judgments. The greater the overlap, the greater the savings in samples.

Research areas

Related content

US, WA, Seattle
Do you want to join an innovative team of scientists who use machine learning to help Amazon provide the best experience to our Selling Partners by automatically understanding and addressing their challenges, needs and opportunities? Do you want to build advanced algorithmic systems that are powered by state-of-art ML, such as Natural Language Processing, Large Language Models, Deep Learning, Computer Vision and Causal Modeling, to seamlessly engage with Sellers? Are you excited by the prospect of analyzing and modeling terabytes of data and creating cutting edge algorithms to solve real world problems? Do you like to build end-to-end business solutions and directly impact the profitability of the company and experience of our customers? Do you like to innovate and simplify? If yes, then you may be a great fit to join the Selling Partner Experience Science team. Key job responsibilities - Use statistical and machine learning techniques to create the next generation of the tools that empower Amazon's Selling Partners to succeed. - Design, develop and deploy highly innovative models to interact with Sellers and delight them with solutions. - Work closely with teams of scientists and software engineers to drive real-time model implementations and deliver novel and highly impactful features. - Establish scalable, efficient, automated processes for large scale data analyses, model development, model validation and model implementation. - Research and implement novel machine learning and statistical approaches. - Participate in strategic initiatives to employ the most recent advances in ML in a fast-paced, experimental environment. About the team Selling Partner Experience Science is a growing team of scientists, engineers and product leaders engaged in the research and development of the next generation of ML-driven technology to empower Amazon's Selling Partners to succeed. We draw from many science domains, from Natural Language Processing to Computer Vision to Optimization to Economics, to create solutions that seamlessly and automatically engage with Sellers, solve their problems, and help them grow. Focused on collaboration, innovation and strategic impact, we work closely with other science and technology teams, product and operations organizations, and with senior leadership, to transform the Selling Partner experience. We are open to hiring candidates to work out of one of the following locations: Denver, CO, USA | Seattle, WA, USA
US, WA, Seattle
Amazon is investing heavily in building a world class advertising business and developing a collection of self-service performance advertising products that drive discovery and sales. Our products are strategically important to our Retail and Marketplace businesses for driving long-term growth. We deliver billions of ad impressions and millions of clicks daily and are breaking fresh ground to create world-class products. We are highly motivated, collaborative and fun-loving with an entrepreneurial spirit and bias for action. With a broad mandate to experiment and innovate, we are growing at an unprecedented rate with a seemingly endless range of new opportunities. Key job responsibilities Search Supply and Experiences, within Sponsored Products, is seeking a Senior Data Scientist to join a fast growing team with the mandate of creating new ads experience that elevates the shopping experience for our hundreds of millions customers worldwide. We are looking for a top analytical mind capable of understanding our complex ecosystem of advertisers participating in a pay-per-click model– and leveraging this knowledge to help turn the flywheel of the business. As a Senior Data Scientist on this team you will: - Lead Data Science solutions from beginning to end. - Deliver with independence on challenging large-scale problems with ambiguity. - Manage and drive the technical and analytical aspects of Advertiser segmentation; continually advance approach and methods. - Write code (Python, R, Scala, etc.) to analyze data and build statistical models to solve specific business problems - Retrieve, synthesize, and present critical data in a format that is immediately useful to answering specific questions or improving system performance. - Analyze historical data to identify trends and support decision making. - Improve upon existing methodologies by developing new data sources, testing model enhancements, and fine-tuning model parameters. - Provide requirements to develop analytic capabilities, platforms, and pipelines. - Apply statistical and machine learning knowledge to specific business problems and data. - Formalize assumptions about how our systems should work, create statistical definitions of outliers, and develop methods to systematically identify outliers. Work out why such examples are outliers and define if any actions needed. - Given anecdotes about anomalies or generate automatic scripts to define anomalies, deep dive to explain why they happen, and identify fixes. - Build decision-making models and propose solution for the business problem you defined - Conduct written and verbal presentation to share insights and recommendations to audiences of varying levels of technical sophistication. - Write code (python or another object-oriented language) for data analyzing and modeling algorithms. A day in the life The Senior Data Scientist will have the opportunity to use one of the world's largest eCommerce and advertising data sets to influence the evolution of our products. This role requires an individual with excellent business, communication, and technical skills, enabling collaboration with various functions, including product managers, software engineers, economists and data scientists, as well as senior leadership. This role will create and enhance performance monitoring reports to find insights that product and business team should focus on. The successful candidate will be a self-starter comfortable with ambiguity, with strong attention to detail, and with an ability to work in a fast-paced, high-energy and ever-changing environment. The drive and capability to shape the direction is a must. This role will influence the direction of the business by leveraging our data to deliver insights that drive decisions and actions. The role will involve translating broad business problems into specific analytics projects, conducting deep quantitative analyses, and communicating results effectively. The role will help the organization identify, evaluate, and evangelize new techniques and tools to continue to improve our ability to deliver value to Amazon’s customers. About the team We are a customer-obsessed team of engineers, technologists, product leaders, and scientists. We are focused on continuous exploration of contexts and creatives where advertising delivers value to customers and advertisers. We specifically work on new ads experiences globally with the goal of helping shoppers make the most informed purchase decision. We obsess about our customers and we are continuously innovating on their behalf to enrich their shopping experience on Amazon We are open to hiring candidates to work out of one of the following locations: Seattle, WA, USA
US, CA, Pasadena
The Amazon Web Services (AWS) Center for Quantum Computing (CQC) is a multi-disciplinary team of scientists, engineers, and technicians, on a mission to develop a fault-tolerant quantum computer. We are looking to hire an Applied Scientist to work on the embedded software for our control system. The position is on-site at our lab, located on the Caltech campus in Pasadena, CA. The ideal candidate will be able to translate high-level requirements (e.g. latency, bandwidth, architecture) into software/firmware implementations (e.g. low-level device drivers, kernel modules, Python APIs) compatible with our FPGA-based control systems. This requires someone who (1) has a strong desire to work within a team of scientists and engineers, and (2) demonstrates ownership in initiating and driving projects to completion. Key job responsibilities - Develop embedded software in C, C++ or Rust for high-performance real-time tasks. - Develop Linux and/or real-time operating system (RTOS) features required to operate control system. - Develop FPGA gateware that drives domain-specific functions of our control hardware. - Develop user-space API that exposes low-level features, preferably in Python. - Develop, test, and optimize control system features on bench-top and in real-world conditions. - Own the stability of control system software and firmware. We are looking for candidates with strong engineering principles, resourcefulness and a bias for action, superior problem-solving and excellent communication skills. Working effectively within a team environment is essential. You will have the opportunity to work on new ideas and stay abreast of the field of experimental quantum computation. A day in the life The lifetime of your projects will likely begin with a lot of discussion and negotiation with our scientists and engineers to translate their software and hardware feature requests into design proposals that demonstrate sensible trade-offs between complexity and delivery. Once a design proposal has been accepted, you will implement it in a logical and maintainable manner. You will also be encouraged to take ownership over the stability and quality of the software and hardware stack by identifying, proposing, and implementing features that will accelerate our realization of quantum computing technologies. You will be joining the Control & Calibration Software team within the AWS Center of Quantum Computing. Our team is comprised of scientists and engineers who are building scalable software that enables quantum computing technologies. About the team AWS Utility Computing (UC) provides product innovations — from foundational services such as Amazon’s Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2), to consistently released new product innovations that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Internet of Things (Iot), Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services. Diverse Experiences AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying. Why AWS? Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating — that’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses. Inclusive Team Culture Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empower us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness. Mentorship & Career Growth We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional. Work/Life Balance We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud. Hybrid Work We value innovation and recognize this sometimes requires uninterrupted time to focus on a build. We also value in-person collaboration and time spent face-to-face. Our team affords employees options to work in the office every day or in a flexible, hybrid work model near one of our U.S. Amazon offices. We are open to hiring candidates to work out of one of the following locations: Pasadena, CA, USA
ES, M, Madrid
Amazon's International Technology org in EU (EU INTech) is creating new ways for Amazon customers discovering Amazon catalog through new and innovative Customer experiences. Our vision is to provide the most relevant content and CX for their shopping mission. We are responsible for building the software and machine learning models to surface high quality and relevant content to the Amazon customers worldwide across the site. The team, mainly located in Madrid Technical Hub, London and Luxembourg, comprises Software Developer and ML Engineers, Applied Scientists, Product Managers, Technical Product Managers and UX Designers who are experts on several areas of ranking, computer vision, recommendations systems, Search as well as CX. Are you interested on how the experiences that fuel Catalog and Search are built to scale to customers WW? Are interesting on how we use state of the art AI to generate and provide the most relevant content? Key job responsibilities We are looking for Applied Scientists who are passionate to solve highly ambiguous and challenging problems at global scale. You will be responsible for major science challenges for our team, including working with text to image and image to text state of the art models to scale to enable new Customer Experiences WW. You will design, develop, deliver and support a variety of models in collaboration with a variety of roles and partner teams around the world. You will influence scientific direction and best practices and maintain quality on team deliverables. We are open to hiring candidates to work out of one of the following locations: Madrid, M, ESP
US, WA, Seattle
Alexa is the Amazon cloud service that powers Echo, the groundbreaking Amazon device designed around your voice. We believe voice is the most natural user interface for interacting with technology across many domains; we are inventing the future. Alexa Audio is responsible for fulfilling customers requests for all types of audio content (Music, Radio, Podcasts, Books, custom sounds) across all Alexa enabled devices. This covers a broad set of experiences including search, browse, recommendations, playback, and devices grouping and controls. We are seeking a talented, self-directed Applied Scientists who would come up with state of the art semantic search and recommendation techniques that work with both voice and visual interfaces. This is a unique opportunity where you will be working on latest technologies including LLMs, and also see it impact customer's lives in meaningful ways. Responsibilities - Apply advance state-of-the-art artificial intelligence techniques and develop algorithms in areas of personalization, voice based dialogue systems and natural language information retrieval. - Design scientifically sound online experiments and offline simulations to study and improve products. - Work closely with talented engineers to create scalable models and put them to production. - Perform statistical analyses on large data sets, identify problems, and propose solutions. - Work with partner science teams to identify collaboration opportunities. Work hard. Have fun. Make history. We are open to hiring candidates to work out of one of the following locations: Bellevue, WA, USA | Seattle, WA, USA | Sunnyvale, CA, USA
GB, London
Amazon Advertising is looking for an Applied Scientist to join its initiative that powers Amazon’s contextual advertising products. Advertising at Amazon is a fast-growing multi-billion dollar business that spans across desktop, mobile and connected devices; encompasses ads on Amazon and a vast network of hundreds of thousands of third party publishers; and extends across US, EU and an increasing number of international geographies.The Supply Quality organization has the charter to solve optimization problems for ad-programs in Amazon and ensure high-quality ad-impressions. We develop advanced algorithms and infrastructure systems to optimize performance for our advertisers and publishers. We are focused on solving a wide variety of problems in computational advertising like Contextual data processing and classification, traffic quality prediction (robot and fraud detection), Security forensics and research, Viewability prediction, Brand Safety and experimentation. Our team includes experts in the areas of distributed computing, machine learning, statistics, optimization, text mining, information theory and big data systems. We are looking for a dynamic, innovative and accomplished Applied Scientist to work on machine learning and data science initiatives for contextual data processing and classification that power our contextual advertising solutions. Are you excited by the prospect of analyzing terabytes of data and leveraging state-of-the-art data science and machine learning techniques to solve real world problems? Do you like to own business problems/metrics of high ambiguity where yo get to define the path forward for success of a new initiative? As an applied scientist, you will invent ML based solutions to power our contextual classification technology. As this is a new initiative, you will get an opportunity to act as a thought leader, work backwards from the customer needs, dive deep into data to understand the issues, conceptualize and build algorithms and collaborate with multiple cross-functional teams. Key job responsibilities * Design, prototype and test many possible hypotheses in a high-ambiguity environment, making use of both analysis and business judgment. * Collaborate with software engineering teams to integrate successful experiments into large-scale, highly complex Amazon production systems. * Promote the culture of experimentation and applied science at Amazon. * Demonstrated ability to meet deadlines while managing multiple projects. * Excellent communication and presentation skills working with multiple peer groups and different levels of management * Influence and continuously improve a sustainable team culture that exemplifies Amazon’s leadership principles. We are open to hiring candidates to work out of one of the following locations: London, GBR
US, NY, New York
AWS Utility Computing (UC) provides product innovations — from foundational services such as Amazon’s Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2), to consistently released new product innovations that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Internet of Things (Iot), Platform, and Productivity Apps services in AWS. Within AWS UC, Amazon Dedicated Cloud (ADC) roles engage with AWS customers who require specialized security solutions for their cloud services. Amazon AI is looking for world class scientists and engineers to join its AWS AI Labs to develop groundbreaking generative AI technologies in Amazon Q. Q is an interactive, AI-powered assistant that touches all aspects of builder and developer experience. You will be part of the Q Code Analysis team that works at the intersection of code analysis, logical reasoning and machine learning to build and enhance capabilities, safety and security of AI-powered developer tools in Amazon Q. You will invent, implement, and deploy state-of-the-art algorithms and systems, and be at the heart of a growing and exciting focus area for AWS. Your work will directly impact millions of our customers in the form of products and services that are based on large language models, retrieval-augmented generation, code analysis, responsible AI, and a lot more. You will make breakthroughs that challenge the limits of code analysis, machine learning and AI while collaborating with academics and interacting directly with customers to bring new research rapidly to production. A day in the life Diverse Experiences AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying. Why AWS? Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating — that’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses. Inclusive Team Culture Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empower us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness. Mentorship & Career Growth We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional. Work/Life Balance We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud. EEO/Accommodations AWS is committed to a diverse and inclusive workplace to deliver the best results for our customers. Amazon is an equal opportunity employer and does not discriminate on the basis of race, national origin, gender, gender identity, sexual orientation, protected veteran status, disability, age, or other legally protected status; we celebrate the diverse ways we work. For individuals with disabilities who would like to request an accommodation, please let us know and we will connect you to our accommodation team. You may also reach them directly by visiting please https://www.amazon.jobs/en/disability/us. Hybrid Work We value innovation and recognize this sometimes requires uninterrupted time to focus on a build. We also value in-person collaboration and time spent face-to-face. Our team affords employees options to work in the office every day or in a flexible, hybrid work model near one of our [insert req country location here] Amazon offices. About the team The Amazon Web Services (AWS) Next Gen DevX (NGDE) team uses generative AI and foundation models to reimagine the experience of all builders on AWS. From the IDE to web-based tools and services, AI will help engineers work on large and small applications. We explore new technologies and find creative solutions. Curiosity and an explorative mindset can find a place here to impact the life of engineers around the world. If you are excited about this space and want to enlighten your peers with new capabilities, this is the team for you. We are open to hiring candidates to work out of one of the following locations: New York, NY, USA
US, WA, Bellevue
The Planning and Execution team (PLEX) is seeking a Research Scientist to build & improve mathematical optimization techniques and algorithms to support planning and execution activities throughout North America. PLEX is comprised of high-powered dynamic teams, which are shaping network execution through the development and application of innovative labor & flow planning mechanisms. Our goal is to improve and enhance the Amazon Fulfillment network to ultimately drive the best customer experience in a reliable and cost-efficient manner that is truly world-class. As part of the PLEX organization, you’ll partner closely with other scientists, engineers, and product teams in a collegial environment to build optimization strategies that will influence the performance of all North America Amazon Fulfillment networks. You will develop scientific models and perform complex mathematical research to accurately solve labor and flow planning problems, enhance automation, and provide value-added research to the business. You will continually iterate and identify new modeling and research opportunities to implement science into customer fulfillment planning processes. We are looking for a passionate scientist with a commitment to innovation & teamwork. Successful candidates will have a deep knowledge of optimization techniques and ML methods to tackle complex science problems. You will have the communication skills necessary to impact and influence leadership & partner teams through technical writings, presentations and discussions. You will learn a lot, grow, and have fun in the process! Innovation Opportunities & Career Growth Our business grows fast and we want our employees growing with it too. We provide constant opportunities for growth in our team through regular training, talent development, mentoring, and mechanisms conducive to incubating ideas from the bottom up to showcase your innovations. Inclusive Team Culture Here at Amazon, we promote an inclusive and engaging environment. We understand the strength that unique experiences bring to the team and value it. In our team, we uphold that all individuals should feel included, respected, and developed. Flexibility It's not the hours that you put into work matters, rather it's the quality of work that you put in. We provide flexibility and support to help you find a balance between your work and personal lives. This position will be based in Austin, TX We are open to hiring candidates to work out of one of the following locations: - Austin, TX - Bellevue, WA - Nashville, TN Key job responsibilities - Create & improve mathematical optimization techniques & ML models for labor & flow planning - Lead & partner with research, applied, and data science teams to improve accuracy of existing technology solutions and provide data driven recommendations for strategic model implementations - Identify and thoroughly research external and previously non-considered factors to implement with advanced mathematics - Simplify the scientific decisions by navigating through the technology complexities, explaining them in plain customer and business context to our partners & customers. We are open to hiring candidates to work out of one of the following locations: Austin, TX, USA | Bellevue, WA, USA | Nashville, TN, USA
US, WA, Seattle
We are building GenAI based shopping assistant for Amazon. We reimage Amazon Search with an interactive conversational experience that helps you find answers to product questions, perform product comparisons, receive personalized product suggestions, and so much more, to easily find the perfect product for your needs. We’re looking for the best and brightest across Amazon to help us realize and deliver this vision to our customers right away. This will be a once in a generation transformation for Search, just like the Mosaic browser made the Internet easier to engage with three decades ago. If you missed the 90s—WWW, Mosaic, and the founding of Amazon and Google—you don’t want to miss this opportunity. We are open to hiring candidates to work out of one of the following locations: Seattle, WA, USA
DE, Berlin
The Amazon Artificial General Intelligence (AGI) team is looking for a passionate, highly skilled and inventive Senior Applied Scientist with strong machine learning background to lead the development and implementation of state-of-the-art ML systems for building large-scale, high-quality conversational assistant systems. Key job responsibilities - Use deep learning, ML and NLP techniques to create scalable solutions for creation and development of language model centric solutions for building personalized assistant systems based on a rich set of structured and unstructured contextual signals - Innovate new methods for contextual knowledge extraction and information representation, using language models in combination with other learning techniques, that allows effective grounding in context providers when considering memory, cpu, latency and quality - Collaborate with cross-functional teams of engineers, product managers, and scientists to identify and solve complex problems in personal knowledge aggregation, processing and verification - Design and execute experiments to evaluate the performance of different algorithms and models, and iterate quickly to improve results - Think Big about the arc of development of conversational assistant system personalization over a multi-year horizon, and identify new opportunities to apply these technologies to solve real-world problems - Communicate results and insights to both technical and non-technical audiences, including through presentations and written reports - Mentor and guide junior scientists and engineers, and contribute to the overall growth and development of the team A day in the life As a Senior Applied Scientist, you will play a critical role in driving the development of personalization techniques enabling conversational systems, in particular those based on large language models, to be tailored to customer needs. You will handle Amazon-scale use cases with significant impact on our customers' experiences. We are open to hiring candidates to work out of one of the following locations: Berlin, DEU