Automated evaluation of RAG pipelines with exam generation

The fight against hallucination in retrieval-augmented-generation models starts with a method for accurately assessing it.

In the swiftly evolving domain of large language models (LLMs), accurate evaluation of retrieval-augmented-generation (RAG) models is paramount. In this blog post, we introduce a methodology that uses an automated exam generation process, enhanced by item response theory (IRT), to evaluate the factual accuracy of RAG models on specific tasks. Our approach is robust, interpretable, and cost efficient: it identifies model strengths and refines exams to optimize their evaluative utility. We describe our methodology in a paper we will present in July at the 2024 International Conference on Machine Learning (ICML).

Exam generation process

RAG is a method for handling natural-language queries by retrieving relevant documents and using text from them to seed the response generated by an LLM. The expectation is that factual assertions from reliable documents will curb the LLM’s tendency to “hallucinate”, or generate reasonable-sounding but false sentences.

To evaluate a RAG model on a particular task, we use an LLM to generate multiple-choice questions from a task-specific knowledge corpus. Our method is agnostic to the retriever and generative model used in both the RAG system and the exam generation task.

Summary of the proposed exam generation, evaluation, and iterative-improvement processes.

Our approach has two steps. For each document in the knowledge corpus, we use an LLM and several prompt-engineering strategies to create candidate questions. Then we use several natural-language-processing filters to remove low-quality questions along various axes, such as length, incorrectness, and self-containment.

We note an interesting asymmetry: given a document corpus, it is relatively easy for an LLM to generate a question and the correct answer, as the content of both is contained in the prompt. However, it is considerably more difficult to create high-quality incorrect answers, commonly referred to as discriminators.

To filter out degenerate questions, we use the Jaccard similarity coefficient and embedding-based similarity metrics.
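As a rough illustration of the Jaccard part of this filter, the sketch below flags a question whose candidate answers overlap too heavily with one another. The word-level tokenization, the function names, and the 0.8 threshold are assumptions made for the example, not values from our pipeline.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard coefficient between the lowercase word sets of two strings."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # two empty strings are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def has_near_duplicate_candidates(candidates: list[str], threshold: float = 0.8) -> bool:
    """Return True if any two candidate answers are nearly identical.

    The threshold is a hypothetical value chosen for illustration.
    """
    for i in range(len(candidates)):
        for j in range(i + 1, len(candidates)):
            if jaccard_similarity(candidates[i], candidates[j]) >= threshold:
                return True
    return False
```

A question flagged this way would be a candidate for removal, since duplicated options make the incorrect answers trivially easy to rule out.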

Here is the prompt that we used for exam generation:

Human: Here is some documentation from {task_domain}: {documentation}.\n
From this generate a difficult multi-form question for an exam.
It should have 4 candidates, 1 correct answer, and explanations.

Syntax should be Question: {question}\n
A){candidate A}\n
B){candidate B}\n
C){candidate C}\n
D){candidate D}

Correct Answer: {correct answer}\n
### Assistant:
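The output syntax requested in the prompt lends itself to simple parsing. The sketch below shows one way the generated text could be turned into a structured question and malformed generations rejected; the function name and the returned dictionary layout are hypothetical, and the regular expressions assume the model follows the requested layout closely.

```python
import re


def parse_exam_item(text: str):
    """Parse one generated item; return None if it does not match the layout."""
    # The question runs from "Question:" up to the line that starts with "A)".
    question = re.search(r"Question:\s*(.+?)\s*(?=\n\s*A\))", text, re.DOTALL)
    # Each candidate sits on its own line, prefixed with a letter and ")".
    candidates = re.findall(r"^\s*([A-D])\)\s*(.+?)\s*$", text, re.MULTILINE)
    answer = re.search(r"Correct Answer:\s*([A-D])", text)
    if not question or not answer:
        return None
    options = dict(candidates)
    if sorted(options) != ["A", "B", "C", "D"]:
        return None  # missing or extra candidates
    return {
        "question": question.group(1),
        "candidates": options,
        "answer": answer.group(1),
    }
```

Returning None for anything that deviates from the layout keeps badly formed generations out of the exam before the quality filters run.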

In our research, we analyzed several RAG pipeline variants, including closed-book (no knowledge from the document corpus is provided to the LLM), oracle (the exam taker has access to the specific document used to generate the question-and-answer pair, in addition to the question itself and all possible candidate answers), and classical retrieval models such as MultiQA embeddings, Siamese network embeddings, and BM25. Our evaluations also extended to different scales of language models, from 7 billion parameters to 70 billion, to understand the impact of model scale on performance.
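The pipeline variants differ only in what context accompanies the question. The sketch below makes that distinction concrete. The variant names, the `build_context` function, and the toy word-overlap retriever are all illustrative assumptions; the retriever is a stand-in for BM25 or an embedding model, not an implementation of either.

```python
from typing import Callable, Optional

Retriever = Callable[[str, int], list[str]]


def build_context(variant: str, question: str, source_doc: str,
                  retriever: Optional[Retriever] = None, k: int = 3) -> list[str]:
    """Return the documents shown to the exam taker for a given variant."""
    if variant == "closed_book":
        return []  # no knowledge from the corpus
    if variant == "oracle":
        return [source_doc]  # the document the question was generated from
    if variant == "retrieval":
        if retriever is None:
            raise ValueError("the retrieval variant needs a retriever")
        return retriever(question, k)
    raise ValueError(f"unknown variant: {variant}")


def keyword_retriever(corpus: list[str]) -> Retriever:
    """Toy retriever that ranks documents by the number of shared words."""
    def retrieve(query: str, k: int) -> list[str]:
        query_words = set(query.lower().split())
        ranked = sorted(
            corpus,
            key=lambda doc: len(query_words & set(doc.lower().split())),
            reverse=True,
        )
        return ranked[:k]
    return retrieve
```

Because the exam itself is fixed, swapping the retriever passed to `build_context` is the only change needed to compare retrieval methods on the same questions.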

To demonstrate the practical utility of this methodology, we deployed it across a wide range of domains. These include Amazon Web Services (AWS) DevOps, where troubleshooting guides for cloud-based services test the models' operational effectiveness; arXiv abstracts, which challenge the models' ability to parse and generate insights from dense scientific texts; StackExchange questions, which probe the models' responsiveness and accuracy; and SEC filings, where the complexity of financial reporting tests the models’ capacity to extract nuanced information from structured corporate documents. This multi-domain approach not only enhances the robustness of our evaluations but also ensures that our models are versatile and reliable across various real-world applications.

Evaluating the exam generation model

The following figure shows granular results of our evaluation method for the task of AWS DevOps troubleshooting. We report accuracy, on a percentage scale, for different retrieval approaches and base LLM sizes. Labels around the perimeter show the AWS resources we’re using. Colors correspond to different retrieval approaches (Oracle, DPRV2, MultiQA, ClosedBook), and solid and broken lines correspond to different base LLM sizes (7B, 13B, and 70B). For instance, we observe that a small model such as Mistral-7B with MultiQA embeddings has an accuracy of around 80% for the AWS resource Relational Database Service (RDS).

A comparison of several different models, at a range of sizes, on the task of DevOps troubleshooting for eight different AWS resources.

Our experiments yielded four key findings. First, there’s no one-size-fits-all solution; the optimal choice of retrieval method, and to a lesser extent LLM, is typically task dependent. For example, in tasks such as SEC filings and arXiv abstracts, BM25 outperforms MultiQA and Siamese network embeddings, indicating that sparse retrieval is more effective than dense retrieval for these tasks. This could be because such tasks often contain easily identifiable terms (e.g., AWS service names in AWS DevOps) that can be retrieved with keyword search, while other tasks, such as StackExchange, mostly contain common words.

Second, the right choice of retrieval method can lead to greater performance improvements than simply using larger LLMs. For instance, in SEC filings, we observed a greater performance gain from switching from Siamese network embeddings to DPRV2 than from switching to larger LLMs.

Third, for tasks involving closed-source knowledge, the accuracy bottleneck is typically the LLM rather than the retrieval method. Finally, a poorly aligned retriever component can result in worse accuracy than having no retrieval at all.

Exam enhancements through item response theory

Integrating item response theory (IRT) into our process has significantly improved the quality of the exams. IRT models the likelihood of a correct response based on characteristics of a question and the capabilities of a model. It uses three factors — difficulty, discrimination, and guessing chance — to create exams that more accurately reflect and predict model performance.
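For readers who want the functional form, the standard three-parameter logistic (3PL) model from the IRT literature combines these three factors as shown below. We give it as a reference point and assume this textbook form; the symbols $a$ (discrimination), $b$ (difficulty), $c$ (guessing chance), and $\theta$ (ability of the model taking the exam) are notation introduced here for illustration.

```latex
P(\text{correct} \mid \theta) = c + \frac{1 - c}{1 + e^{-a(\theta - b)}}
```

When $\theta = b$, the probability sits halfway between the guessing floor $c$ and 1. A larger $a$ makes the curve steeper around $b$, so the question separates models just below and just above that ability level more sharply.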

IRT posits that a model’s probability of correctly answering a question is correlated with a latent variable known as ability, and it provides a method for estimating the value of that variable. As such, it offers a way to quantify a model’s ability level.
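One simple way to estimate ability, once question parameters are known, is maximum likelihood over a grid of candidate values. The sketch below assumes the standard three-parameter logistic response function and hypothetical item parameters given as (discrimination, difficulty, guessing) tuples; it illustrates the idea and is not the estimation procedure from our paper.

```python
import math


def p_correct(theta: float, a: float, b: float, c: float) -> float:
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def log_likelihood(theta: float, items, responses) -> float:
    """Log-likelihood of a response pattern at ability theta."""
    total = 0.0
    for (a, b, c), correct in zip(items, responses):
        p = p_correct(theta, a, b, c)
        total += math.log(p) if correct else math.log(1.0 - p)
    return total


def estimate_ability(items, responses, lo: float = -4.0, hi: float = 4.0,
                     steps: int = 801) -> float:
    """Grid-search maximum-likelihood estimate of ability on [lo, hi]."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda theta: log_likelihood(theta, items, responses))
```

A pipeline that answers only the easier questions correctly receives a lower estimate than one that answers all of them, which is what lets ability serve as a single comparable score across pipelines.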

Our process begins with an initial exam assessment, identifying and removing questions that contribute minimally to discriminative insights. The exam is then refined iteratively, based on updated IRT parameters, which helps it accurately gauge nuanced model behaviors.
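A single refinement step can be sketched as follows, under the assumption that "contributing minimally" is measured by the fitted discrimination parameter. The function name, the dictionary layout, and the choice to drop a fixed number of questions per round are illustrative, not the exact rule we used; refitting the IRT parameters between rounds is left out.

```python
def prune_exam(items: dict[str, tuple[float, float, float]],
               n_drop: int) -> dict[str, tuple[float, float, float]]:
    """Drop the n_drop questions with the lowest discrimination.

    `items` maps a question id to (discrimination, difficulty, guessing).
    """
    ranked = sorted(items, key=lambda qid: items[qid][0])  # ascending by discrimination
    to_drop = set(ranked[:n_drop])
    return {qid: params for qid, params in items.items() if qid not in to_drop}
```

In an iterative setting, the parameters would be re-estimated on the surviving questions before the next call.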

By continuously analyzing and adjusting exams based on IRT parameters, we have seen substantial improvements in the exams’ ability to discriminate among models. For instance, we use Fisher information to quantify the informativeness of exam questions. Fisher information measures the amount of information that an observable random variable provides about an unknown parameter, offering a way to gauge the precision of statistical estimators in parameter estimation theory.
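For concreteness, the sketch below computes Fisher information for a single question and for a whole exam, assuming the standard three-parameter logistic response model. The item-information expression is the textbook one for that model; the function names and the (discrimination, difficulty, guessing) tuple layout are assumptions made for the example.

```python
import math


def p_correct(theta: float, a: float, b: float, c: float) -> float:
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta: float, a: float, b: float, c: float) -> float:
    """Fisher information of one question at ability theta (3PL model)."""
    p = p_correct(theta, a, b, c)
    return a ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


def exam_information(theta: float, items) -> float:
    """Information of an exam: the sum of its questions' information."""
    return sum(item_information(theta, a, b, c) for a, b, c in items)
```

Information grows with the square of discrimination and falls as the guessing chance rises, which is why low-discrimination questions add little to an exam's ability to separate models.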

During iterative improvements for the arXiv task, the Fisher information function consistently showed progress, marking a considerable enhancement of the exams' capacity to differentiate model capabilities. This iterative process ensures that each new version of the exam is more informative than the last and effectively evaluates the RAG model’s abilities.

Evaluating the generated exams

To further enhance the assessment of RAG models, we categorize exam questions using both semantic analysis and the revised version of Bloom’s taxonomy, which was originally devised by the University of Chicago psychologist Benjamin Bloom. Bloom’s taxonomy classifies questions by cognitive complexity, from basic recall to analytical tasks, enabling structured evaluation of model capabilities.

Different levels in Bloom's taxonomy differentiate between the knowledge dimension (factual, conceptual, procedural, and meta-cognitive) and the cognitive-process dimension (remember, understand, apply, analyze, evaluate, and create). Additionally, we classify questions semantically by identifying keywords like “what” and “which.” These additional classifications allow us to assess how well models perform at different ability levels.
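The keyword-based semantic classification can be as simple as the sketch below, which labels a question by the first interrogative word it contains. The list of keywords beyond “what” and “which”, the "other" fallback, and the function name are assumptions made for the example.

```python
import re

# Hypothetical keyword list; only "what" and "which" are named in the text.
QUESTION_WORDS = ("what", "which", "when", "how", "why", "where", "who")


def semantic_category(question: str) -> str:
    """Return the first question word found in the text, or "other"."""
    for token in re.findall(r"[a-z]+", question.lower()):
        if token in QUESTION_WORDS:
            return token
    return "other"
```

Grouping questions by this label makes it possible to average a per-question statistic, such as Fisher information, within each semantic category.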

Average Fisher information for each category in Bloom’s taxonomy (left) and each semantic category (right) for the StackExchange task.

The figure above presents the average Fisher information value for each Bloom category (left) and semantic category (right) for the StackExchange task. For this specific task, we observe that “evaluating” and “understanding” are the most discriminating dimensions in Bloom’s taxonomy across different ability levels, while “remembering” is the least discriminating.

Among the semantic categories, we observe that “what” and “which” were the most discriminating terms at lower ability levels, and “when” discriminated more at higher ability levels. One interpretation is that “what” and “how” questions tend to be more factual and syntax-based in the StackExchange domain, so at lower ability levels, RAG struggles more with these types of questions.

The following figure illustrates the maximization of Fisher information for the arXiv task as the exam and IRT estimation evolve. We show the results for three incremental steps. We observe a 0.05 increase in Fisher information even with a single iteration. This progress reaches a 0.1 increase in the subsequent steps.

The maximization process, as the exam and IRT estimation evolve, for the task of generating abstracts for arXiv papers.

To expand our approach beyond Q&A applications, our future research will focus on domains such as summarization, translation, and sentiment analysis. We are also addressing the complex task of meta-evaluation, comparing and refining our evaluation methods to account for the multidimensional nature of LLM performance. Additionally, we will continuously update our methodologies to accommodate the rapid evolution of LLM technology, ensuring robust and comprehensive assessment of emerging models.

Acknowledgments: Laurent Callot

Research areas

Related content

CA, BC, Vancouver
We are looking for a senior audio applied scientist with experience and expertise in speech and audio signal processing, machine learning, automatic speech recognition, and/or natural language processing to work on state-of-the-art solutions for applications including speech enhancement, voice analytics, and real-time transcription of conversational audio. Amazon Connect is a highly disruptive cloud-based contact center that enables businesses to deliver engaging, dynamic, and personal customer service experiences. Amazon Connect is the result of the ten years of development that went into building the tools Amazon uses to provide its award winning customer service at massive and launching it as a publicly available service. With Amazon Connect, you can create your own cloud-based contact center and be taking calls in minutes. Our team’s charter as part of the Amazon Connect organization is to think big, re-imagine, innovate, and deliver novel, state-of-the-art solutions to audio and video problems. We are interested in all aspects of audio, video, and media technology, and we leverage the latest machine learning and signal processing techniques to surprise and delight our customers. Our applications include real-time audio/video communications, audio/video scene analysis, anomaly detection, audio/speech/music/image/video processing, enhancement, analysis, synthesis and coding. We have the nimbleness of a small startup but, at the same time, the immense resources of AWS - the world leader in cloud computing - behind us as well. If you want to innovate on the cutting edge while having a profound and direct impact on the end customer experience, this is the team to be on! About the team AWS Applications and Higher Level Abstractions (Apps) provides horizontal and industry vertical applications for business users with the same on-demand scalability, reliability, pay-as-you-go pricing, and machine learning expertise that drive AWS services. 
The AWS Applications group includes services such as Amazon Connect (a cost-effective cloud contact center), our End User Computing (including Amazon Workspaces, AppStream, etc.), Marketing Tech (Amazon Pinpoint), and Autonomous Checkout and Biometric Identity Services (Just Walk Out, Amazon One) for retail, sports, travel, and other verticals. Why AWS Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating — that’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses. Utility Computing (UC) AWS Utility Computing (UC) provides product innovations — from foundational services such as Amazon’s Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2), to consistently released new product innovations that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Internet of Things (IoT), Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services. Inclusive Team Culture Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empower us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness. Work/Life Balance We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud. 
Mentorship and Career Growth We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional. Diverse Experiences Amazon values diverse experiences. Even if you do not meet all of the preferred qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.
US, WA, Seattle
We are seeking an entrepreneurial, innovative and self-driven Senior Data Scientist to join our team. Your mission will be to leverage science, technology, and data analysis to help advertisers and hundreds of thousands of independent sellers grow their business on WW Amazon marketplaces by understanding how brand ads are working for them and coming up with scaled recommendations. You can change the life of local business owners while taking ownership to solve scientific challenges from analyzing millions of global advertising campaigns and generating brand insights and recommendations for all our advertisers. The Sponsored Brands Advertiser Control team is a versatile environment, with a wide variety of challenges. We guide advertisers to make informed decisions by recommendations, sharing insights, and forecasts. We help advertisers deliver effective campaigns automatically by optimizing campaign settings on behalf of them. We enable advertisers to achieve brand advertising goals with maximum efficiency. We have the opportunity to deliver social impact, own technical problems, thought diversity, and business impact. Why you will love this opportunity: Amazon is investing heavily in building a world-class advertising business. This team defines and delivers a collection of advertising products that drive discovery and sales. Our solutions generate billions in revenue and drive long-term growth for Amazon’s Retail and Marketplace businesses. We deliver billions of ad impressions, millions of clicks daily, and break fresh ground to create world-class products. We are a highly motivated, collaborative, and fun-loving team with an entrepreneurial spirit - with a broad mandate to experiment and innovate. 
Impact and Career Growth: You will invent new experiences and influence customer-facing shopping experiences to help suppliers grow their retail business and the auction dynamics that leverage native advertising; this is your opportunity to work within the fastest-growing businesses across all of Amazon! Define a long-term science vision for our advertising business, driven from our customers' needs, translating that direction into specific plans for scientists, as well as engineering and product teams. This role combines science leadership, organizational ability, technical strength, product focus, and business understanding. Amazon Advertising is one of Amazon's fastest growing and most profitable businesses, responsible for defining and delivering a collection of advertising products that drive discovery and sales. Our products and solutions are strategically important to enable our Retail and Marketplace businesses to drive long-term growth. We deliver billions of ad impressions and millions of clicks and break fresh ground in product and technical innovations every day! As a Senior Data Scientist on this team you will: - Lead Data Science solutions from beginning to end. - Deliver with independence on challenging large-scale problems with ambiguity and complexity. - Influence multiple teams and able to work closely with business teams, build consensus, and advise business leaders. - Write code (Python, R, Scala, SQL, etc.) to obtain, manipulate, analyze data, and build dashboards. - Build Statistical and Machine Learning models to solve specific business problems. - Retrieve, synthesize, and present critical data in a format that is immediately useful to answering specific questions or improving system performance. - Analyze historical data to identify trends and support optimal decision making. - Apply statistical and machine learning knowledge to specific business problems and data. 
- Formalize assumptions about how our systems should work, create statistical definitions of outliers, and develop methods to systematically identify outliers. Work out why such examples are outliers and define if any actions needed. - Given anecdotes about anomalies or generate automatic scripts to define anomalies, deep dive to explain why they happen, and identify fixes. - Build decision-making models and propose effective solutions for the business problems you define. - Conduct written and verbal presentations to share insights to audiences of varying levels of technical sophistication. Team video https://youtu.be/zD_6Lzw8raE
US, CA, Palo Alto
Amazon Advertising is one of Amazon's fastest growing and most profitable businesses. As a core product offering within our advertising portfolio, Sponsored Products (SP) helps merchants, retail vendors, and brand owners succeed via native advertising, which grows incremental sales of their products sold through Amazon. The SP team's primary goals are to help shoppers discover new products they love, be the most efficient way for advertisers to meet their business objectives, and build a sustainable business that continuously innovates on behalf of customers. Our products and solutions are strategically important to enable our Retail and Marketplace businesses to drive long-term growth. We deliver billions of ad impressions and millions of clicks and break fresh ground in product and technical innovations every day! Why you love this opportunity Amazon is investing heavily in building a world-class advertising business. This team is responsible for defining and delivering a collection of advertising products that drive discovery and sales. Our solutions generate billions in revenue and drive long-term growth for Amazon’s Retail and Marketplace businesses. We deliver billions of ad impressions, millions of clicks daily, and break fresh ground to create world-class products. We are highly motivated, collaborative, and fun-loving team with an entrepreneurial spirit - with a broad mandate to experiment and innovate. Impact and Career Growth You will invent new experiences and influence customer-facing shopping experiences to help suppliers grow their retail business and the auction dynamics that leverage native advertising; this is your opportunity to work within the fastest-growing businesses across all of Amazon! Define a long-term science vision for our advertising business, driven fundamentally from our customers' needs, translating that direction into specific plans for research and applied scientists, as well as engineering and product teams. 
This role combines science leadership, organizational ability, technical strength, product focus, and business understanding. Key job responsibilities Key job responsibilities As an Applied Scientist II on this team you will: * Lead complex and ambiguous projects to deliver bidding recommendation products to advertisers. * Build machine learning models and utilize data analysis to deliver scalable solutions to business problems. * Perform hands-on analysis and modeling with very large data sets to develop insights that increase traffic monetization and merchandise sales without compromising shopper experience. * Work closely with software engineers on detailed requirements, technical designs and implementation of end-to-end solutions in production. * Design and run A/B experiments that affect hundreds of millions of customers, evaluate the impact of your optimizations and communicate your results to various business stakeholders. * Work with scientists and economists to model the interaction between organic sales and sponsored content and to further evolve Amazon's marketplace. * Establish scalable, efficient, automated processes for large-scale data analysis, machine-learning model development, model validation and serving. * Research new predictive learning approaches for the sponsored products business. * Write production code to bring models into production.
US, WA, Bellevue
Conversational AI ModEling and Learning (CAMEL) team's mission is to create a best-in-class Conversational AI that is intuitive, intelligent, and responsive, by developing superior Large Language Models (LLM) solutions and services which increase the capabilities built into the model and which enable utilizing thousands of APIs and external knowledge sources to provide the best experience for each request across millions of customers and endpoints. We are looking for a passionate, talented, and resourceful Applied Scientist in the field of LLM, Artificial Intelligence (AI), Natural Language Processing (NLP) and/or Information Retrieval, to invent and build scalable solutions for a state-of-the-art context-aware conversational AI. A successful candidate will have strong machine learning background and a desire to push the envelope in one or more of the above areas. The ideal candidate would also have hands-on experiences in developing LLM solution, enjoy operating in dynamic environments, be self-motivated to take on challenging problems to deliver big customer impact, moving fast to ship solutions and then iterating on user feedback and interactions. Key job responsibilities As an Applied Scientist, you will leverage your technical expertise and experience to collaborate with other talented applied scientists and engineers to research and develop novel algorithms and modeling techniques to reduce friction and enable natural and contextual conversations. You will analyze, understand and improve user experiences by leveraging Amazon’s heterogeneous data sources and large-scale computing resources to accelerate advances in artificial intelligence. You will work on core LLM technologies, including developing best-in-class modeling, prompt optimization algorithms to enable Conversation AI use cases. Your work will directly impact our customers in the form of novel products and services .
US, GA, Atlanta
Are you looking to work at the forefront of Machine Learning and AI? Would you be excited to apply cutting edge Generative AI algorithms to solve real world problems with significant impact? The Generative AI Innovation Center at AWS is a new strategic team that helps AWS customers implement Generative AI solutions and realize transformational business opportunities. This is a team of strategists, data scientists, engineers, and solution architects working step-by-step with customers to build bespoke solutions that harness the power of generative AI. The team helps customers imagine and scope the use cases that will create the greatest value for their businesses, select and train and fine tune the right models, define paths to navigate technical or business challenges, develop proof-of-concepts, and make plans for launching solutions at scale. The GenAI Innovation Center team provides guidance on best practices for applying generative AI responsibly and cost efficiently. You will work directly with customers and innovate in a fast-paced organization that contributes to game-changing projects and technologies. You will design and run experiments, research new algorithms, and find new ways of optimizing risk, profitability, and customer experience. We’re looking for Data Scientists capable of using GenAI and other techniques to design, evangelize, and implement state-of-the-art solutions for never-before-solved problems. 
Key job responsibilities As an Data Scientist, you will - Collaborate with AI/ML scientists and architects to Research, design, develop, and evaluate cutting-edge generative AI algorithms to address real-world challenges - Interact with customers directly to understand the business problem, help and aid them in implementation of generative AI solutions, deliver briefing and deep dive sessions to customers and guide customer on adoption patterns and paths to production - Create and deliver best practice recommendations, tutorials, blog posts, sample code, and presentations adapted to technical, business, and executive stakeholder - Provide customer and market feedback to Product and Engineering teams to help define product direction About the team ABOUT AWS: Diverse Experiences Amazon values diverse experiences. Even if you do not meet all of the preferred qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying. Why AWS Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating — that’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses. Work/Life Balance We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud. Inclusive Team Culture Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empower us to be proud of our differences. 
Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness. Mentorship and Career Growth We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.
GB, London
Are you a MS or PhD student interested in a 2025 Internship in the field of machine learning, deep learning, speech, robotics, computer vision, optimization, quantum computing, automated reasoning, or formal methods? If so, we want to hear from you! We are looking for students interested in using a variety of domain expertise to invent, design and implement state-of-the-art solutions for never-before-solved problems. You can find more information about the Amazon Science community as well as our interview process via the links below; https://www.amazon.science/ https://amazon.jobs/content/en/career-programs/university/science https://amazon.jobs/content/en/how-we-hire/university-roles/applied-science Key job responsibilities As an Applied Science Intern, you will own the design and development of end-to-end systems. You’ll have the opportunity to write technical white papers, create roadmaps and drive production level projects that will support Amazon Science. You will work closely with Amazon scientists, and other science interns to develop solutions and deploy them into production. You will have the opportunity to design new algorithms, models, or other technical solutions whilst experiencing Amazon’s customer focused culture. The ideal intern must have the ability to work with diverse groups of people and cross-functional teams to solve complex business problems. A day in the life At Amazon, you will grow into the high impact, visionary person you know you’re ready to be. Every day will be filled with developing new skills and achieving personal growth. How often can you say that your work changes the world? At Amazon, you’ll say it often. Join us and define tomorrow. 
Some more benefits of an Amazon Science internship include; • All of our internships offer a competitive stipend/salary • Interns are paired with an experienced manager and mentor(s) • Interns receive invitations to different events such as intern program initiatives or site events • Interns can build their professional and personal network with other Amazon Scientists • Interns can potentially publish work at top tier conferences each year About the team Applicants will be reviewed on a rolling basis and are assigned to teams aligned with their research interests and experience prior to interviews. Start dates are available throughout the year and durations can vary in length from 3-6 months for full time internships. This role may available across multiple locations in the EMEA region (Austria, Estonia, France, Germany, Ireland, Israel, Italy, Luxembourg, Netherlands, Poland, Romania, Spain, UAE, and UK). Please note these are not remote internships.