Training code generation models to debug their own outputs

Using large language models to generate training data and updating models through both fine tuning and reinforcement learning improves the success rate of code generation by 39%.

Code generation — automatically translating natural-language specifications into computer code — is one of the most promising applications of large language models (LLMs). But the more complex the programming task, the more likely the LLM is to make errors.

Of course, the more complex the task, the more likely human coders are to make errors, too. That’s why debugging is an essential component of the software development pipeline. In a paper we presented at the 2024 Conference on Neural Information Processing Systems (NeurIPS), we describe a new way to train LLMs to be better debuggers while simultaneously improving code generation ability.

Previous attempts to debug code with LLMs have primarily used few-shot learning, where a few examples of successful debugs are provided, and the LLM infers the rest. In our work, by contrast, we use both supervised fine tuning (SFT) and reinforcement learning (RL) to specialize an LLM for debugging. Since debugging training data is scarce, we leveraged LLMs to create high-quality synthetic training data.

LeDex overview.png
In our work, we use both supervised fine tuning (SFT) and reinforcement learning (RL) to specialize an LLM for debugging.

We conducted a series of experiments in which LLMs were given one attempt to generate code in response to a natural-language prompt and one further attempt to debug that code. Because our models had been fine-tuned on debugging data, their initial generations were more successful than those of an LLM relying solely on prompt engineering. But with both our models and the prompt-engineering baseline, debugging always resulted in better code performance.

Self-debuggin pipelin.png
In our experiments, we gave an LLM one attempt to generate code in response to a natural-language prompt and one further attempt to debug that code.

To evaluate model performance, we used the pass@k metric, in which a model generates k implementations of a natural-language specification, and it’s accounted successful if at least one of those implementations passes a set of prespecified tests. In experiments with different code LLMs — including StarCoder-15B, CodeLlama-7B, and CodeLlama-13B — our approach improved pass@k scores by up to 39% on standard benchmark datasets such as MBPP.

Data synthesis

There are several widely used public datasets for training code generation models, which include natural-language prompts; canonical implementations of the prompts in code; and unit tests, specific sequences of inputs that can be used to test the full functional range of the generated code. But training data for debugging models is comparatively sparse.

To create our debugging dataset, we begin with several of the existing code generation datasets. We repeatedly feed each natural-language prompt in those datasets to a code generation model, resulting in a number of different generations — say, 20 — for the same prompt. Then we run the relevant unit tests on those generations, keeping only the ones that fail the tests — that is, the buggy code.

Next, we feed the buggy code to an LLM, together with the error messages it generated on the unit tests, and we prompt the LLM to explain where and why errors occurred. Finally, we take the LLM’s diagnosis and feed it, the buggy code, and the error messages back to the LLM, together with instructions to repair the bug. This is a version of chain-of-thought reasoning: prior work have shown that asking an LLM to explain the action it intends to take before it takes that action often improves performance.

LLM debugger - Prompt used to generate training data for code explanation and refinement.
Prompt used to generate training data for code explanation and refinement.

We next execute the unit tests on the revised code, this time keeping only those revisions that pass all the tests. We now have a new dataset consisting of natural-language prompts; buggy implementations of those prompts; diagnoses of the bugs; debugged code; and unit tests.

Model updates

Armed with this dataset, we’re ready to update our debugging model, using both SFT and RL. With both update methods, we experimented with training regimens in which we asked for chain-of-thought explanations before asking for code revisions and those in which we simply asked for revisions.

With SFT, we prompted the model with the natural-language instructions, the buggy code, and the error messages from the unit tests. Model outputs were evaluated according to their performance on the unit tests.

With RL, the model interacts iteratively with the training data, attempting to learn a policy that will maximize a reward function. The classic RL learning algorithms require a continuous reward function, to enable exploration of the optimization landscape.

The unit test feedback is binary, hence discrete. To overcome this limitation, in addition to success rate on the unit tests, our RL reward function also includes the revised code’s CodeBLEU score, which measures its distance from the code of the canonical examples, providing a continuous reward signal.

Unit tests are time and resource intensive to apply, so training on CodeBLEU scores also opens the possibility of training directly on the canonical examples, a much more computationally efficient process. Our experiments indicate that this approach does improve debugging performance — though not as much as training on unit test results as well.

Evaluation

In our experiments, we used three types of models: one was a vanilla LLM that relied entirely on prompt engineering; one was an LLM updated on our dataset using SFT only; and one was an LLM updated on our dataset using both SFT and RL.

We implemented each type of model using three different LLM architectures, and for each class of model, we measured three sets of outputs: an initial generation; a direct revision of the initial generation; and a revision involving chain-of-thought reasoning. Finally, we also investigated two different generation paradigms: in one, a model was given one chance to generate correct code; in the other, it was given 10 chances. This gave us a total of 24 different comparisons.

Across the board, our updated models outperformed the prompt-engineering baselines. In all but one case, the version of our model updated through both SFT and RL outperformed the version updated through SFT only. Overall, we demonstrate a scalable way to use execution feedback and canonical examples to better debug code models and improve their generation performance.

Research areas

Related content

IN, KA, Bengaluru
Amazon Devices is an inventive research and development company that designs and engineer high-profile devices like the Kindle family of products, Fire Tablets, Fire TV, Health Wellness, Amazon Echo & Astro products. This is an exciting opportunity to join Amazon in developing state-of-the-art techniques that bring Gen AI on edge for our consumer products. We are looking for exceptional scientists to join our Applied Science team and help develop the next generation of edge models, and optimize them while doing co-designed with custom ML HW based on a revolutionary architecture. Work hard. Have Fun. Make History. Key job responsibilities What will you do? - Quantize, prune, distill, finetune Gen AI models to optimize for edge platforms - Fundamentally understand Amazon’s underlying Neural Edge Engine to invent optimization techniques - Analyze deep learning workloads and provide guidance to map them to Amazon’s Neural Edge Engine - Use first principles of Information Theory, Scientific Computing, Deep Learning Theory, Non Equilibrium Thermodynamics - Train custom Gen AI models that beat SOTA and paves path for developing production models - Collaborate closely with compiler engineers, fellow Applied Scientists, Hardware Architects and product teams to build the best ML-centric solutions for our devices - Publish in open source and present on Amazon's behalf at key ML conferences - NeurIPS, ICLR, MLSys.
US, WA, Seattle
About Sponsored Products and Brands The Sponsored Products and Brands (SPB) team at Amazon Ads is re-imagining the advertising landscape through state-of-art generative AI technologies, revolutionizing how millions of customers discover products and engage with brands across Amazon.com and beyond. We are at the forefront of re-inventing advertising experiences, bridging human creativity with artificial intelligence to transform every aspect of the advertising lifecycle from ad creation and optimization to performance analysis and customer insights. We are a passionate group of innovators dedicated to developing responsible and intelligent AI technologies that balance the needs of advertisers, enhance the shopping experience, and strengthen the marketplace. If you're energized by solving complex challenges and pushing the boundaries of what's possible with AI, join us in shaping the future of advertising. Key job responsibilities This role will be pivotal in redesigning how ads contribute to a personalized, relevant, and inspirational shopping experience, with the customer value proposition at the forefront. Key responsibilities include, but are not limited to: * Contribute to the design and development of GenAI, deep learning, multi-objective optimization and/or reinforcement learning empowered solutions to transform ad retrieval, auctions, whole-page relevance, and/or bespoke shopping experiences. * Collaborate cross-functionally with other scientists, engineers, and product managers to bring scalable, production-ready science solutions to life. * Stay abreast of industry trends in GenAI, LLMs, and related disciplines, bringing fresh and innovative concepts, ideas, and prototypes to the organization. * Contribute to the enhancement of team’s scientific and technical rigor by identifying and implementing best-in-class algorithms, methodologies, and infrastructure that enable rapid experimentation and scaling. * Mentor and grow junior scientists and engineers, cultivating a high-performing, collaborative, and intellectually curious team. A day in the life As an Applied Scientist on the Sponsored Products and Brands Off-Search team, you will contribute to the development in Generative AI (GenAI) and Large Language Models (LLMs) to revolutionize our advertising flow, backend optimization, and frontend shopping experiences. This is a rare opportunity to redefine how ads are retrieved, allocated, and/or experienced—elevating them into personalized, contextually aware, and inspiring components of the customer journey. You will have the opportunity to fundamentally transform areas such as ad retrieval, ad allocation, whole-page relevance, and differentiated recommendations through the lens of GenAI. By building novel generative models grounded in both Amazon’s rich data and the world’s collective knowledge, your work will shape how customers engage with ads, discover products, and make purchasing decisions. If you are passionate about applying frontier AI to real-world problems with massive scale and impact, this is your opportunity to define the next chapter of advertising science. About the team The Off-Search team within Sponsored Products and Brands (SPB) is focused on building delightful ad experiences across various surfaces beyond Search on Amazon—such as product detail pages, the homepage, and store-in-store pages—to drive monetization. Our vision is to deliver highly personalized, context-aware advertising that adapts to individual shopper preferences, scales across diverse page types, remains relevant to seasonal and event-driven moments, and integrates seamlessly with organic recommendations such as new arrivals, basket-building content, and fast-delivery options. To execute this vision, we work in close partnership with Amazon Stores stakeholders to lead the expansion and growth of advertising across Amazon-owned and -operated pages beyond Search. We operate full stack—from backend ads-retail edge services, ads retrieval, and ad auctions to shopper-facing experiences—all designed to deliver meaningful value.
GB, London
Are you a MS or PhD student interested in a 2026 internship in the field of machine learning, deep learning, generative AI, large language models and speech technology, robotics, computer vision, optimization, operations research, quantum computing, automated reasoning, or formal methods? If so, we want to hear from you! We are looking for students interested in using a variety of domain expertise to invent, design and implement state-of-the-art solutions for never-before-solved problems. You can find more information about the Amazon Science community as well as our interview process via the links below; https://www.amazon.science/ https://amazon.jobs/content/en/career-programs/university/science https://amazon.jobs/content/en/how-we-hire/university-roles/applied-science Key job responsibilities As an Applied Science Intern, you will own the design and development of end-to-end systems. You’ll have the opportunity to write technical white papers, create roadmaps and drive production level projects that will support Amazon Science. You will work closely with Amazon scientists and other science interns to develop solutions and deploy them into production. You will have the opportunity to design new algorithms, models, or other technical solutions whilst experiencing Amazon’s customer focused culture. The ideal intern must have the ability to work with diverse groups of people and cross-functional teams to solve complex business problems. A day in the life At Amazon, you will grow into the high impact person you know you’re ready to be. Every day will be filled with developing new skills and achieving personal growth. How often can you say that your work changes the world? At Amazon, you’ll say it often. Join us and define tomorrow. Some more benefits of an Amazon Science internship include; • All of our internships offer a competitive stipend/salary • Interns are paired with an experienced manager and mentor(s) • Interns receive invitations to different events such as intern program initiatives or site events • Interns can build their professional and personal network with other Amazon Scientists • Interns can potentially publish work at top tier conferences each year About the team Applicants will be reviewed on a rolling basis and are assigned to teams aligned with their research interests and experience prior to interviews. Start dates are available throughout the year and durations can vary in length from 3-6 months for full time internships. This role may available across multiple locations in the EMEA region (Austria, Estonia, France, Germany, Ireland, Israel, Italy, Jordan, Luxembourg, Netherlands, Poland, Romania, Spain, South Africa, UAE, and UK). Please note these are not remote internships.
US, WA, Seattle
Passionate about books? The Amazon Books personalization team is looking for a talented Applied Scientist II to help develop and implement innovative science solutions to make it easier for millions of customers to find the next book they will love. In this role you will: - Collaborate within a dynamic team of scientists, economists, engineers, analysts, and business partners. - Utilize Amazon's large-scale computing and data resources to analyze customer behavior and product relationships. - Contribute to building and maintaining recommendation models, and assist in running A/B tests on the retail website. - Help develop and implement solutions to improve Amazon's recommendation systems. Key job responsibilities The role involves working with recommender systems that combine Natural Language Processing (NLP), Reinforcement Learning (RL), graph networks, and deep learning to help customers discover their next great read. You will assist in developing recommendation model pipelines, analyze deep learning-based recommendation models, and collaborate with engineering and product teams to improve customer-facing recommendations. As part of the team, you will learn and contribute across these technical areas while developing your skills in the recommendation systems space. A day in the life In your day-to-day role, you will contribute to the development and maintenance of recommendation models, support the implementation of A/B test experiments, and work alongside engineers, product teams, and other scientists to help deploy machine learning solutions to production. You will gain hands-on experience with our recommendation systems while working under the guidance of senior scientists. About the team We are Books Personalization a collaborative group of 5-7 scientists, 2 product leaders, and 2 engineering teams that aims to help find the right next read for customers through high quality personalized book recommendation experiences. Books Personalization is a part of the Books Content Demand organization, which focuses on surfacing the best books for customers wherever they are in their current book journey.
EE, Tallinn
Are you a MS or PhD student interested in a 2026 internship in the field of machine learning, deep learning, generative AI, large language models, speech technology, robotics, computer vision, optimization, operations research, quantum computing, automated reasoning, or formal methods? If so, we want to hear from you! We are looking for students interested in using a variety of domain expertise to invent, design and implement state-of-the-art solutions for never-before-solved problems. You can find more information about the Amazon Science community as well as our interview process via the links below; https://www.amazon.science/ https://amazon.jobs/content/en/career-programs/university/science https://amazon.jobs/content/en/how-we-hire/university-roles/applied-science Key job responsibilities As an Applied Science Intern, you will own the design and development of end-to-end systems. You’ll have the opportunity to write technical white papers, create roadmaps and drive production level projects that will support Amazon Science. You will work closely with Amazon scientists and other science interns to develop solutions and deploy them into production. You will have the opportunity to design new algorithms, models, or other technical solutions whilst experiencing Amazon’s customer focused culture. The ideal intern must have the ability to work with diverse groups of people and cross-functional teams to solve complex business problems. A day in the life At Amazon, you will grow into the high impact person you know you’re ready to be. Every day will be filled with developing new skills and achieving personal growth. How often can you say that your work changes the world? At Amazon, you’ll say it often. Join us and define tomorrow. Some more benefits of an Amazon Science internship include; • All of our internships offer a competitive stipend/salary • Interns are paired with an experienced manager and mentor(s) • Interns receive invitations to different events such as intern program initiatives or site events • Interns can build their professional and personal network with other Amazon Scientists • Interns can potentially publish work at top tier conferences each year About the team Applicants will be reviewed on a rolling basis and are assigned to teams aligned with their research interests and experience prior to interviews. Start dates are available throughout the year and durations can vary in length from 3-6 months for full time internships. This role may available across multiple locations in the EMEA region (Austria, Estonia, France, Germany, Ireland, Israel, Italy, Jordan, Luxembourg, Netherlands, Poland, Romania, Spain, South Africa, UAE, and UK). Please note these are not remote internships.
GB, London
Are you a MS student interested in a 2026 internship in the field of machine learning, deep learning, generative AI, large language models and speech technology, robotics, computer vision, optimization, operations research, quantum computing, automated reasoning, or formal methods? If so, we want to hear from you! We are looking for a customer obsessed Data Scientist Intern who can innovate in a business environment, building and deploying machine learning models to drive step-change innovation and scale it to the EU/worldwide. If this describes you, come and join our Data Science teams at Amazon for an exciting internship opportunity. If you are insatiably curious and always want to learn more, then you’ve come to the right place. You can find more information about the Amazon Science community as well as our interview process via the links below; https://www.amazon.science/ https://amazon.jobs/content/en/career-programs/university/science Key job responsibilities As a Data Science Intern, you will have following key job responsibilities: • Work closely with scientists and engineers to architect and develop new algorithms to implement scientific solutions for Amazon problems. • Work on an interdisciplinary team on customer-obsessed research • Experience Amazon's customer-focused culture • Create and Deliver Machine Learning projects that can be quickly applied starting locally and scaled to EU/worldwide • Build and deploy Machine Learning models using large data-sets and cloud technology. • Create and share with audiences of varying levels technical papers and presentations • Define metrics and design algorithms to estimate customer satisfaction and engagement A day in the life At Amazon, you will grow into the high impact person you know you’re ready to be. Every day will be filled with developing new skills and achieving personal growth. How often can you say that your work changes the world? At Amazon, you’ll say it often. Join us and define tomorrow. Some more benefits of an Amazon Science internship include; • All of our internships offer a competitive stipend/salary • Interns are paired with an experienced manager and mentor(s) • Interns receive invitations to different events such as intern program initiatives or site events • Interns can build their professional and personal network with other Amazon Scientists • Interns can potentially publish work at top tier conferences each year About the team Applicants will be reviewed on a rolling basis and are assigned to teams aligned with their research interests and experience prior to interviews. Start dates are available throughout the year and durations can vary in length from 3-6 months for full time internships. This role may available across multiple locations in the EMEA region (Austria, France, Germany, Ireland, Israel, Italy, Luxembourg, Netherlands, Poland, Romania, Spain and the UK). Please note these are not remote internships.
IL, Tel Aviv
Are you a MS or PhD student interested in a 2026 internship in the field of machine learning, deep learning, generative AI, large language models, speech technology, robotics, computer vision, optimization, operations research, quantum computing, automated reasoning, or formal methods? If so, we want to hear from you! We are looking for students interested in using a variety of domain expertise to invent, design and implement state-of-the-art solutions for never-before-solved problems. You can find more information about the Amazon Science community as well as our interview process via the links below; https://www.amazon.science/ https://amazon.jobs/content/en/career-programs/university/science https://amazon.jobs/content/en/how-we-hire/university-roles/applied-science Key job responsibilities As an Applied Science Intern, you will own the design and development of end-to-end systems. You’ll have the opportunity to write technical white papers, create roadmaps and drive production level projects that will support Amazon Science. You will work closely with Amazon scientists and other science interns to develop solutions and deploy them into production. You will have the opportunity to design new algorithms, models, or other technical solutions whilst experiencing Amazon’s customer focused culture. The ideal intern must have the ability to work with diverse groups of people and cross-functional teams to solve complex business problems. A day in the life At Amazon, you will grow into the high impact person you know you’re ready to be. Every day will be filled with developing new skills and achieving personal growth. How often can you say that your work changes the world? At Amazon, you’ll say it often. Join us and define tomorrow. Some more benefits of an Amazon Science internship include; • All of our internships offer a competitive stipend/salary • Interns are paired with an experienced manager and mentor(s) • Interns receive invitations to different events such as intern program initiatives or site events • Interns can build their professional and personal network with other Amazon Scientists • Interns can potentially publish work at top tier conferences each year About the team Applicants will be reviewed on a rolling basis and are assigned to teams aligned with their research interests and experience prior to interviews. Start dates are available throughout the year and durations can vary in length from 3-6 months for full time internships. This role may available across multiple locations in the EMEA region (Austria, Estonia, France, Germany, Ireland, Israel, Italy, Jordan, Luxembourg, Netherlands, Poland, Romania, South Africa, Spain, Sweden, UAE, and UK). Please note these are not remote internships.
RO, Iasi
Are you a MS or PhD student interested in a 2026 internship in the field of machine learning, deep learning, generative AI, large language models and speech technology, robotics, computer vision, optimization, operations research, quantum computing, automated reasoning, or formal methods? If so, we want to hear from you! We are looking for students interested in using a variety of domain expertise to invent, design and implement state-of-the-art solutions for never-before-solved problems. You can find more information about the Amazon Science community as well as our interview process via the links below; https://www.amazon.science/ https://amazon.jobs/content/en/career-programs/university/science https://amazon.jobs/content/en/how-we-hire/university-roles/applied-science Key job responsibilities As an Applied Science Intern, you will own the design and development of end-to-end systems. You’ll have the opportunity to write technical white papers, create roadmaps and drive production level projects that will support Amazon Science. You will work closely with Amazon scientists and other science interns to develop solutions and deploy them into production. You will have the opportunity to design new algorithms, models, or other technical solutions whilst experiencing Amazon’s customer focused culture. The ideal intern must have the ability to work with diverse groups of people and cross-functional teams to solve complex business problems. A day in the life At Amazon, you will grow into the high impact person you know you’re ready to be. Every day will be filled with developing new skills and achieving personal growth. How often can you say that your work changes the world? At Amazon, you’ll say it often. Join us and define tomorrow. Some more benefits of an Amazon Science internship include; • All of our internships offer a competitive stipend/salary • Interns are paired with an experienced manager and mentor(s) • Interns receive invitations to different events such as intern program initiatives or site events • Interns can build their professional and personal network with other Amazon Scientists • Interns can potentially publish work at top tier conferences each year About the team Applicants will be reviewed on a rolling basis and are assigned to teams aligned with their research interests and experience prior to interviews. Start dates are available throughout the year and durations can vary in length from 3-6 months for full time internships. This role may available across multiple locations in the EMEA region (Austria, Estonia, France, Germany, Ireland, Israel, Italy, Jordan, Luxembourg, Netherlands, Poland, Romania, Spain, South Africa, UAE, and UK). Please note these are not remote internships.
DE, BE, Berlin
Are you interested in enhancing Alexa user experiences through Large Language Models? The Alexa AI Berlin team is looking for an Applied Scientist to join our innovative team working on Large Language Models (LLMs), Natural Language Processing, and Machine/Deep Learning. You will be at the center of Alexa's LLM transformation, collaborating with a diverse team of applied and research scientists to enhance existing features and explore new possibilities with LLMs. In this role, you'll work cross-functionally with science, product, and engineering leaders to shape the future of Alexa. Key job responsibilities As an Applied Scientist in Alexa Science team: - You will develop core LLM technologies including supervised fine tuning and prompt optimization to enable innovative Alexa use cases - You will research and design novel metrics and evaluation methods to measure and improve AI performance - You will create automated, multi-step processes using AI agents and LLMs to solve complex problems - You will communicate effectively with leadership and collaborate with colleagues from science, engineering, and business backgrounds - You will participate in on-call rotations to support our systems and ensure continuous service availability A day in the life As an Applied Scientist, you will own the design and development of end-to-end systems. You’ll have the opportunity to write technical white papers, create technical roadmaps and drive production level projects that will support Amazon Science. You will have the opportunity to design new algorithms, models, or other technical solutions whilst experiencing Amazon’s customer focused culture. The ideal scientist must have the ability to work with diverse groups of people and cross-functional teams to solve complex business problems. About the team You would be part of the Alexa Science Team where you would be collaborating with Fellow Applied and research scientists!
CA, ON, Toronto
Are you a passionate scientist in the computer vision area who is aspired to apply your skills to bring value to millions of customers? Here at Ring, we have a unique opportunity to innovate and see how the results of our work improve the lives of millions of people and make neighborhoods safer. As a Principal Applied Scientist, you will work with talented peers pushing the frontier of computer vision and machine learning technology to deliver the best experience for our neighbors. This is a great opportunity for you to innovate in this space by developing highly optimized algorithms that will work at scale. This position requires experience with developing Computer Vision, Multi-modal LLMs and/or Vision Language Models. You will collaborate with different Amazon teams to make informed decisions on the best practices in machine learning to build highly-optimized integrated hardware and software platforms. Key job responsibilities - You will be responsible for defining key research directions in Multimodal LLMs and Computer Vision, adopting or inventing new techniques, conducting rigorous experiments, publishing results, and ensuring that research is translated into practice. - You will develop long-term strategies, persuade teams to adopt those strategies, propose goals and deliver on them. - You will also participate in organizational planning, hiring, mentorship and leadership development. - You will serve as a key scientific resource in full-cycle development (conception, design, implementation, testing to documentation, delivery, and maintenance).