Teaching language models to reason consistently

At this year’s ACL, Amazon researchers won an outstanding-paper award for showing that knowledge distillation using contrastive decoding in the teacher model and counterfactual reasoning in the student model improves the consistency of “chain of thought” reasoning.

Teaching large language models (LLMs) to reason is an active topic of research in natural-language processing, and a popular approach to that problem is the so-called chain-of-thought paradigm, in which a model is prompted not just to answer questions but to provide rationales for its answers.

The structure of the type of prompt used to induce chain-of-thought reasoning in a large language model.

However, given LLMs’ tendency to hallucinate (that is, make spurious factual assertions), the generated rationales may be inconsistent with the predicted answers, making them untrustworthy.

In a paper we presented at this year’s meeting of the Association for Computational Linguistics (ACL), we show how to improve the consistency of chain-of-thought reasoning through knowledge distillation: given pairs of questions and answers from a training set, an LLM — the “teacher” — generates rationales for a smaller “student” model, which learns to both answer questions and provide rationales for its answers. Our paper received one of the conference’s outstanding-paper awards, reserved for 39 of the 1,074 papers accepted to the main conference.

Example of a student model outputting a rationale together with the answer to a question.

With knowledge distillation (KD), we still have to contend with the possibility that the rationales generated by the teacher are spurious or vacuous. On the student side, the risk is that while the model may learn to produce rationales, and it may learn to deliver answers, it won’t learn the crucial logical relationships between the two; it might, for instance, learn inferential short cuts between questions and answers that bypass the whole reasoning process.

In a study involving a leading LLM, we found that 42% of generated rationales were vacuous (top), and 37% were irrelevant (bottom).

To curb hallucination, on the teacher side, we use contrastive decoding, which ensures that the rationales generated for true assertions differ as much as possible from the rationales generated for false assertions.

To train the student model, we use counterfactual reasoning, in which the model is trained on both true and false rationales and must learn to provide the answer that corresponds to the rationale, even if it’s wrong. To ensure that this doesn’t compromise model performance, during training, we label true rationales “factual” and false rationales “counterfactual”.

Counterfactual training eliminates reasoning short cuts, in which the student model uses incidental features of the input question to leap to an answer, without performing the intervening inferential steps.

To evaluate our model, we compared it to a chain-of-thought model built using ordinary knowledge distillation, on datasets for four different reasoning tasks. We asked human reviewers to evaluate the rationales generated by the teacher models. To evaluate the student models, we used the leakage-adjusted simulatability (LAS) metric, which measures the ability of a simulator (an external model) to predict the student’s outputs from the generated rationales. Across the board, our models outperformed the baselines, while preserving accuracy on the reasoning tasks.
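
To make the LAS metric concrete, here is a minimal sketch of how such a score can be computed once a simulator's predictions are available. It follows the commonly cited formulation: examples are split by whether the rationale alone "leaks" the student's answer, the simulator's gain from seeing the rationale is averaged within each group, and the two group averages are then averaged. The record field names and the handling of empty groups are assumptions made for illustration, not details from the paper.

```python
def las(records):
    """Leakage-adjusted simulatability over a list of per-example records.

    Each record is a dict of booleans (field names are hypothetical):
      with_rationale : simulator matches the student given question + rationale
      question_only  : simulator matches the student given the question alone
      rationale_only : simulator matches the student given the rationale alone
                       (treated as the leakage indicator)
    """
    group_means = []
    for leaked in (False, True):
        group = [r for r in records if r["rationale_only"] == leaked]
        if group:  # assumption: skip a group that has no examples
            gain = sum(int(r["with_rationale"]) - int(r["question_only"]) for r in group)
            group_means.append(gain / len(group))
    return sum(group_means) / len(group_means) if group_means else 0.0


records = [
    {"with_rationale": True, "question_only": False, "rationale_only": True},
    {"with_rationale": True, "question_only": True, "rationale_only": True},
    {"with_rationale": True, "question_only": False, "rationale_only": False},
    {"with_rationale": False, "question_only": False, "rationale_only": False},
]
score = las(records)
```

Averaging the two groups separately keeps rationales that merely restate the answer from dominating the score.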

Contrastive decoding

As our teacher model, we use a trained LLM whose parameters are frozen. To generate training examples for the student model, we use in-context learning, in which we provide the teacher with a handful of examples of questions, answers, and human-annotated rationales, then supply a final question-answer pair. The model generates the rationale for the final pair.
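
The prompt structure described above can be sketched as follows. This is an illustration only: the `Example` class, the `build_prompt` helper, and the "Question / Answer / Rationale" field labels are hypothetical, and the actual prompt template may differ.

```python
from dataclasses import dataclass


@dataclass
class Example:
    question: str
    answer: str
    rationale: str


def build_prompt(demos: list[Example], question: str, answer: str) -> str:
    """Format annotated demonstrations, then a final pair with an open rationale slot."""
    blocks = [
        f"Question: {d.question}\nAnswer: {d.answer}\nRationale: {d.rationale}"
        for d in demos
    ]
    # The final pair stops at "Rationale:" so the frozen teacher continues from there.
    blocks.append(f"Question: {question}\nAnswer: {answer}\nRationale:")
    return "\n\n".join(blocks)


demos = [
    Example(
        "Where would you keep milk cold?",
        "refrigerator",
        "A refrigerator is an appliance that keeps food and drinks cold.",
    ),
]
prompt = build_prompt(demos, "What do people use to cut paper?", "scissors")
```

Because the teacher's parameters are frozen, the prompt is the only place where the task is specified.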

During training, LLMs learn the probabilities of sequences of words. At generation time, they either select the single most probable word to continue a sequence or sample from the top-ranked words. This is the standard decoding step, which doesn’t guarantee that the generated rationales justify the model’s answers.
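
As a toy illustration of the two standard decoding choices, the sketch below picks the next word from a made-up probability table, first greedily and then by sampling from the top-ranked words. The vocabulary, the probabilities, and the function names are invented for the example.

```python
import random


def greedy_pick(probs: dict[str, float]) -> str:
    """Return the single most probable next word."""
    return max(probs, key=probs.get)


def top_k_sample(probs: dict[str, float], k: int, rng: random.Random) -> str:
    """Sample the next word from the k most probable candidates."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    words = [w for w, _ in top]
    weights = [p for _, p in top]
    return rng.choices(words, weights=weights, k=1)[0]


next_word_probs = {"cold": 0.5, "fresh": 0.3, "white": 0.15, "loud": 0.05}
greedy_word = greedy_pick(next_word_probs)
sampled_word = top_k_sample(next_word_probs, k=2, rng=random.Random(0))
```

Neither rule looks at the answer the rationale is supposed to support.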

We can control the decoding process without making any adjustments to the LLM parameters. With contrastive decoding, we perform the same in-context rationale generation twice, once with the true answer in the final question-answer pair and once with a perturbed answer.

Then, when we’re decoding the true question-answer pair, we select words that are not only probable given the true pair but relatively improbable given the false pair. In other words, we force the rationale for the true pair to diverge from the rationale for the false pair. In this way, we ensure that the output skews toward rationales particularized to the answers in the question-answer pairs.
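
One way to picture this selection rule is the sketch below, which scores each candidate word by its log-probability under the true pair minus a weighted log-probability under the perturbed pair. The scoring formula, the `alpha` weight, and the plausibility cutoff are assumptions chosen for illustration and are not taken from the paper. The toy probabilities are invented.

```python
import math


def contrastive_pick(p_true, p_false, alpha=0.5, plausibility=0.1):
    """Pick a word that is probable given the true pair but improbable given the false pair."""
    # Assumed safeguard: only consider words that are reasonably likely under the true pair.
    cutoff = plausibility * max(p_true.values())
    candidates = [w for w, p in p_true.items() if p >= cutoff]

    def score(word):
        # Reward probability under the true pair, penalize probability under the false pair.
        return math.log(p_true[word]) - alpha * math.log(p_false.get(word, 1e-9))

    return max(candidates, key=score)


# Next-word probabilities for the same question with the true vs. a perturbed answer.
p_true = {"because": 0.40, "sharp": 0.35, "the": 0.24, "banana": 0.01}
p_false = {"because": 0.45, "sharp": 0.05, "the": 0.49, "banana": 0.01}

greedy_word = max(p_true, key=p_true.get)
contrastive_word = contrastive_pick(p_true, p_false)
```

In this toy case, greedy decoding picks "because", which is about equally likely under either answer, while the contrastive score prefers "sharp", which is likely only when the true answer is in the prompt.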

In our experiments, we considered two types of perturbation to the true answers: null answers, where no answer at all was supplied, and false answers. We found that contrastive decoding using false answers consistently yielded better rationales than contrastive decoding using null answers.

Counterfactual reasoning

Past research has shown that question-answering models will often exploit short cuts in their training data to improve performance. For instance, answering “who?” questions with the first proper name encountered in a source document will yield the right answer with surprising frequency.

Similarly, a chain-of-thought model might learn to use short cuts in answering questions and generate rationales as a parallel task, without learning the crucial connection between the two. The goal of training our model on a counterfactual-reasoning objective is to break those short cuts.

To generate counterfactual training data, we randomly vary the answers in question-answer pairs and generate the corresponding rationales, just as we did for contrastive decoding. Then we train the student model using the questions and rationales as input, and it must generate the corresponding answers.

This means that the student model may very well see the same question multiple times during training, but with different answers (and rationales). The “factual” and “counterfactual” tags prevent it from getting confused about its task.
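
The construction of tagged training examples can be sketched as follows. The bracketed tag format, the `TrainingExample` class, and the `stub_teacher` function (a stand-in for rationale generation by the teacher) are hypothetical; the sketch only shows how one question yields a factual and a counterfactual example with different target answers.

```python
import random
from dataclasses import dataclass


@dataclass
class TrainingExample:
    source: str  # text fed to the student: tag, question, and rationale
    target: str  # answer the student must generate


def make_examples(question, choices, gold, rationale_for, rng):
    """Build one factual and one counterfactual example for a question."""
    wrong = rng.choice([c for c in choices if c != gold])  # randomly varied answer
    examples = []
    for tag, answer in (("factual", gold), ("counterfactual", wrong)):
        rationale = rationale_for(question, answer)
        source = f"[{tag}] question: {question} rationale: {rationale}"
        # The target follows the rationale, even when the answer is wrong.
        examples.append(TrainingExample(source=source, target=answer))
    return examples


def stub_teacher(question, answer):
    """Placeholder for the teacher; a real system would generate the rationale."""
    return f"(teacher rationale supporting '{answer}')"


examples = make_examples(
    question="What do people use to cut paper?",
    choices=["scissors", "spoon", "pillow"],
    gold="scissors",
    rationale_for=stub_teacher,
    rng=random.Random(0),
)
```

Since the same question appears with two different targets, the only way for the student to get both right is to read the rationale.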

In our experiments, we compared our approach to one that also uses in-context learning but uses greedy decoding to produce rationales — that is, a decoding method that always selects the highest-probability word. We also used two other baselines: an LLM that directly generates rationales from in-context learning and a model trained on human-annotated rationales.

Our study with human evaluators showed that in-context learning with contrastive decoding generated more persuasive rationales than in-context learning with greedy decoding:

Teacher Model     Grammaticality   New Info   Supports Answer
Greedy            0.99             0.65       0.48
Contrast.-Empty   0.97             0.77       0.58
Contrast.-Wrong   0.97             0.82       0.63

Table: Human evaluation of data generated with greedy decoding, contrastive decoding using empty answers, and contrastive decoding using incorrect answers.

In the experiments using the LAS metric, knowledge distillation using contrastive decoding alone consistently outperformed all three baselines. Adding counterfactual reasoning to contrastive decoding consistently improved on contrastive decoding alone. The model trained on the human-annotated dataset yielded the most accurate results on downstream tasks, but its rationales fared badly. On average, our model was slightly more accurate than the one trained using greedy decoding.

Experimental results, measured according to leakage-adjusted simulatability (LAS) and question-answering accuracy.
