How to produce factually accurate automatic text summaries

New metric can be calculated 55 times as quickly as its state-of-the-art predecessor, making it practical for model training.

Abstractive summarization is the automatic extraction and recombination of phrases from a text in order to summarize that text. Deep-learning-based abstractive-summarization systems are usually trained to maximize the overlap between the summaries they generate and sample summaries in their training data.

The trouble with this approach is that a summary that overlaps significantly with a target summary may recombine phrases in a factually inaccurate manner. In the example below, which concerns an upcoming boxing match, the summarization model correctly concludes that “has a chink in his armor” summarizes an important aspect of the input text, but it applies it to the wrong boxer:

Klitschko example.png
Conventional metrics for training abstractive-summarization models don’t account for factual accuracy.
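
To make the problem concrete, here is a minimal Python sketch. It uses plain unigram-overlap F1 as a simplified stand-in for phrasal-overlap metrics; it is not an implementation of any particular published metric. The function name `unigram_f1` and both sentences are invented for illustration and are not taken from the figure.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between two whitespace-tokenized strings."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Multiset intersection counts each shared word at most min(count) times.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentences: the second attaches the claim to the wrong boxer.
reference = "boxer a has a chink in his armor ahead of the fight with boxer b"
swapped = "boxer b has a chink in his armor ahead of the fight with boxer a"

print(unigram_f1(swapped, reference))
```

Because the two sentences contain exactly the same words, the swapped summary receives a perfect overlap score even though it states the opposite fact.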

Although abstractive-summarization models have become very good at generating fluent, syntactically correct text, their frequent factual inaccuracy has severely hampered their adoption.

In a paper we presented at this year’s meeting of the Association for Computational Linguistics (ACL), we describe a new metric for measuring the performance of abstractive-summarization models, which accounts for factual accuracy. We also describe a methodology for using our metric to train abstractive-summarization models.

Our metric adopts the same general strategy as the earlier QAGS metric, but it’s 55 times as fast to apply, which makes it more practical for model training.

QAGS-QUALS-Image.png
Our new summary-scoring metric, QUALS (bottom), uses the same strategy as the earlier QAGS (top) but has a simpler architecture, enabling it to generate a score 55 times as quickly.
Credit: Glynis Condon

Using QAGS as an evaluation metric, we compared models trained using our approach to models trained using traditional metrics and methodologies, and we found that our approach improved on the best-performing previous models by 15% on one dataset and by 2% on another.

Scoring through question answering

QAGS (which stands for question answering and generation for summarization) uses a four-step procedure to score a text summary. First, it extracts names and noun phrases from the summary; these are potential answers to potential questions about the summary. 

Second, it feeds each extracted name or noun phrase, together with the text of the summary, to a trained question generation model, which produces a question whose answer is that phrase. Third, it feeds each of the generated questions to a trained question-answering model, once accompanied by the summary and once accompanied by the source text.

QAGS-Image.cropped.png
QAGS requires the sequential application of three neural models: an answer extraction model, a question generation model, and a question-answering model.
Credit: Glynis Condon

Fourth, it computes a final score that assesses the similarity between the answers based on the source text and the answers based on the summary. The intuition is that if both the summary and the source text cause the question-answering model to answer the questions in the same way, the summary is factually accurate. If they cause different answers, then the summary has probably garbled some facts.
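
The answer-comparison step can be sketched in a few lines of Python. This is a sketch under stated assumptions, not the QAGS implementation: it assumes token-level F1 as the answer-similarity measure, and the `answer(question, context)` callable is a hypothetical stand-in for the trained question-answering model.

```python
from collections import Counter
from typing import Callable, List


def token_f1(pred: str, gold: str) -> float:
    """Token-level F1 between two answer strings (assumed similarity measure)."""
    p, g = pred.lower().split(), gold.lower().split()
    common = sum((Counter(p) & Counter(g)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)


def qags_style_score(
    questions: List[str],
    summary: str,
    source: str,
    answer: Callable[[str, str], str],  # hypothetical QA model: (question, context) -> answer
) -> float:
    """Mean similarity between answers obtained from the summary and from the source."""
    if not questions:
        return 0.0
    sims = [token_f1(answer(q, summary), answer(q, source)) for q in questions]
    return sum(sims) / len(sims)


# Toy lookup table standing in for a real QA model.
toy_answers = {
    ("q1", "SUMMARY"): "john smith", ("q1", "SOURCE"): "john smith",
    ("q2", "SUMMARY"): "paris", ("q2", "SOURCE"): "london",
}
score = qags_style_score(["q1", "q2"], "SUMMARY", "SOURCE",
                         lambda q, ctx: toy_answers[(q, ctx)])
print(score)
```

In the toy run, one question is answered identically from both texts and the other is not, so the score lands midway between the two extremes.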

By accounting for factual accuracy, QAGS offers a better assessment of summary quality than metrics based on phrasal overlap. But it requires the sequential application of three different deep-learning networks, which is inefficient.

QUALS

Our approach, which we call QUALS (for question answering with language model score for summarization), reduces the number of models to one, which makes it 55 times as fast as QAGS.

That one model is the joint question-and-answer generation (QAGen) model that members of our group presented at last year’s ACL. It takes a text as input and generates question-and-answer pairs pertaining to it.

QUALS-Image.cropped.png
QUALS requires a single neural model, a question-and-answer generation model.
Credit: Glynis Condon

The output of the QAGen model for a given input can be thought of as a huge tree, in which the nodes are words and each edge encodes the likelihood that a particular word will be followed by another word.

For a given summary, we search the resulting tree to produce 60 high-probability question-and-answer pairs. Our search algorithm ensures that we explore diverse paths through the tree, in order to generate a variety of candidate questions and answers. Then we throw out all the question-answer pairs whose answers are not sequences of words found in the summary.
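
The filtering step at the end of this procedure can be sketched as follows. The sketch assumes simple lowercased whitespace tokenization, which is cruder than a real tokenizer, and the names `contains_span` and `filter_pairs` are hypothetical.

```python
from typing import List, Tuple


def contains_span(answer: str, summary: str) -> bool:
    """True if the answer's tokens appear contiguously, in order, in the summary."""
    a, s = answer.lower().split(), summary.lower().split()
    if not a:
        return False
    return any(s[i:i + len(a)] == a for i in range(len(s) - len(a) + 1))


def filter_pairs(pairs: List[Tuple[str, str]], summary: str) -> List[Tuple[str, str]]:
    """Keep only (question, answer) pairs whose answer is a word sequence in the summary."""
    return [(q, a) for q, a in pairs if contains_span(a, summary)]


# Invented example data.
summary = "the match takes place in hamburg on saturday"
candidates = [
    ("Where does the match take place?", "Hamburg"),
    ("Who is the favorite?", "the champion"),
]
kept = filter_pairs(candidates, summary)
print(kept)
```

Only the first pair survives, because "the champion" never appears in the summary.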

Next, we feed the source text on which the summary is based to the QAGen model. We use the resulting tree to calculate the probabilities of the same question-answer pairs we extracted for the summary. When, for the source text, the probability of generating a particular question-answer pair is small compared to the probability for the summary, the QUALS score will be low. Intuitively, the discrepancy suggests that the question-answer pair was plausible for the summary but not for the source text, indicating factual inconsistency.

QUALS scoring.png
Probabilities per token (words and other standalone symbols) of two different question-answer pairs, based on a summary (blue) and an input document (orange). The large probability differences for the answer in the right-hand example give it a much lower QUALS score (-2.615) than the left-hand example (-0.054).
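
One way to turn the per-token probabilities into a score is sketched below. It assumes that the score is the difference between the length-normalized log-probability of each question-answer pair given the source text and given the summary, averaged over pairs; the exact normalization used in the paper may differ. The function names and the numbers are invented for illustration.

```python
from typing import List, Sequence, Tuple


def mean_log_prob(token_log_probs: Sequence[float]) -> float:
    """Length-normalized log-likelihood of one question-answer pair."""
    return sum(token_log_probs) / len(token_log_probs)


def quals_style_score(
    pairs: List[Tuple[Sequence[float], Sequence[float]]]
) -> float:
    """Average of (log-likelihood given source) - (log-likelihood given summary).

    Each item holds the per-token log-probs of the same question-answer pair,
    first conditioned on the source text, then conditioned on the summary.
    """
    diffs = [mean_log_prob(src) - mean_log_prob(summ) for src, summ in pairs]
    return sum(diffs) / len(diffs)


# Invented numbers: the second pair has an answer token that is very
# unlikely given the source text.
consistent = ([-0.1, -0.2], [-0.1, -0.1])
inconsistent = ([-0.1, -5.0], [-0.1, -0.2])

print(quals_style_score([consistent]))
print(quals_style_score([inconsistent]))
```

The pair whose answer is improbable under the source text receives the lower (more negative) score, mirroring the pattern in the figure.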

Training methodology

The QUALS score gives us an efficiently computable measure of a summary’s factual accuracy, but using it to train a machine learning model is not straightforward. Differences in QUALS score can’t simply be back-propagated through the QAGen model to update the summarization model.

So in our paper, we propose contrastive learning as a method for using QUALS to train a summarization model. First, we train a summarization model using the standard approach, which uses maximum-likelihood estimation (MLE) to approximate a phrasal-overlap score.

Next, we use the trained model to generate new summaries for all the source texts in the training data and create two different groups of summaries. One group, S+, contains ground-truth summaries that have high QUALS scores (indicating factually accurate summaries); the other, S-, contains generated summaries that have low QUALS scores (indicating factually inaccurate summaries).
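
The construction of the two groups can be sketched as below. The text above does not say how "high" and "low" are decided, so the two threshold parameters are assumptions made for this sketch, as are the function name and the toy scoring function.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (source text, summary)


def build_contrastive_sets(
    sources: List[str],
    references: List[str],   # ground-truth summaries
    generated: List[str],    # summaries produced by the MLE-trained model
    score: Callable[[str, str], float],  # hypothetical scorer: (source, summary) -> score
    pos_threshold: float,    # assumed cutoff for "high" scores
    neg_threshold: float,    # assumed cutoff for "low" scores
) -> Tuple[List[Example], List[Example]]:
    """Return (S+, S-): high-scoring references and low-scoring generations."""
    s_plus = [(x, ref) for x, ref in zip(sources, references)
              if score(x, ref) >= pos_threshold]
    s_minus = [(x, gen) for x, gen in zip(sources, generated)
               if score(x, gen) <= neg_threshold]
    return s_plus, s_minus


# Toy scorer for demonstration only: rewards summaries copied from the source.
def toy_score(source: str, summary: str) -> float:
    return 0.0 if summary in source else -3.0


s_plus, s_minus = build_contrastive_sets(
    sources=["a b c", "d e f"],
    references=["a b", "x y"],
    generated=["b c", "z"],
    score=toy_score,
    pos_threshold=-0.5,
    neg_threshold=-2.0,
)
print(s_plus, s_minus)
```

Note that a training example can contribute to S+, to S-, to both, or to neither, depending on how its reference and generated summaries score.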

Finally, we retrain the summarization model, using a loss function that encourages it to generate summaries like those in S+ and discourages it from generating summaries like those in S-.
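
The text above describes the loss only qualitatively. The sketch below assumes one common form, an unlikelihood-style objective that rewards probability mass on the tokens of S+ summaries and penalizes it on the tokens of S- summaries; it is not claimed to be the exact loss from the paper. It works on plain lists of per-token probabilities so that it runs without a deep-learning framework, and all names are hypothetical.

```python
import math
from typing import List, Sequence

EPS = 1e-12  # guards against log(0)


def log_likelihood(token_probs: Sequence[float]) -> float:
    """Sum of log p(token) over a summary's tokens."""
    return sum(math.log(max(p, EPS)) for p in token_probs)


def log_unlikelihood(token_probs: Sequence[float]) -> float:
    """Sum of log(1 - p(token)) over a summary's tokens."""
    return sum(math.log(max(1.0 - p, EPS)) for p in token_probs)


def contrastive_loss(
    pos: List[Sequence[float]],  # model's token probabilities for S+ summaries
    neg: List[Sequence[float]],  # model's token probabilities for S- summaries
) -> float:
    """Low when S+ tokens are likely and S- tokens are unlikely under the model."""
    pos_term = -sum(log_likelihood(s) for s in pos) / max(len(pos), 1)
    neg_term = -sum(log_unlikelihood(s) for s in neg) / max(len(neg), 1)
    return pos_term + neg_term


before = contrastive_loss(pos=[[0.5, 0.5]], neg=[[0.5, 0.5]])
after = contrastive_loss(pos=[[0.9, 0.9]], neg=[[0.1, 0.1]])
print(before, after)
```

The loss falls as the model shifts probability toward the S+ summary and away from the S- summary, which is the behavior the retraining step is meant to encourage.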

Evaluation

Sample summaries.png
Examples from the human-evaluation study, featuring input texts and summaries produced using both MLE and the ConSeq model, which is trained using QUALS.

As baselines for the evaluation of our approach, we used two models. One was trained using MLE in the standard way, to fine-tune a BART language model. For the other, we used our contrastive-learning methodology, but instead of using QUALS to evaluate summaries, we used an ensemble of three ROUGE metrics (ROUGE 1, ROUGE 2, and ROUGE L), all of which are based on phrasal overlap.

In addition to evaluating the models’ performance using QAGS, we evaluated them according to the three ROUGE metrics and FactCC, another model-based metric that simply predicts the factual consistency of two texts. On all five metrics, models trained using QUALS outperformed the two baselines.

For validation, we also conducted a human-evaluation study, which involved 100 summaries generated using QUALS and 100 summaries generated using MLE for each of two datasets (XSUM and CNNDM). Human subjects were asked to compare the summaries on three attributes: factual consistency, informativeness and grammatical correctness.

On average, annotators found the QUALS-based summaries more factually accurate and more informative than the MLE-based summaries, for both datasets. On grammatical correctness, the two models’ performance was virtually indistinguishable.

Human-study stats.png
The results of the human-evaluation study. Subjects were asked whether summaries produced using QUALS were better than, worse than, or equal to those produced using MLE, on three axes.
