Amazon releases data set of annotated conversations to aid development of socialbots

Today I am happy to announce the public release of the Topical Chat Dataset, a text-based collection of more than 235,000 utterances (over 4,700,000 words) that will help support high-quality, repeatable research in the field of dialogue systems.

The goal of Topical Chat is to enable innovative research in knowledge-grounded neural response-generation systems by tackling hard challenges that are not addressed by other publicly available datasets. Those challenges, which we have seen universities begin to tackle in the Alexa Prize Socialbot Grand Challenge, include transitioning between topics in a natural manner, knowledge selection and enrichment, and integration of fact and opinion into dialogue.

Each conversation in the data set refers to a group of three related entities, and every turn of conversation is supported by an extract from a collection of unstructured or loosely structured text resources. To our knowledge, Topical Chat is the largest social-conversation and knowledge dataset available publicly to the research community.

Both the conversations themselves and the annotations linking them to particular knowledge sources were provided by workers recruited through Mechanical Turk. The data set does not include any conversations between Alexa and Alexa customers.

Amazon Topical Chat Dataset
To build the Topical Chat Dataset, workers recruited through Mechanical Turk engaged in instant-message conversations (right) in which they substantiated their assertions with information extracted from a collection of unstructured or loosely structured text resources (left).

To build the data set, we first identified 300 named entities in eight different topic categories that came up frequently in conversations with Alexa Prize socialbots. Then we clustered the named entities into groups of three, based on their co-occurrence in information sources. One information source, for instance, mentioned three entities on our list — Star Wars, planet, and earth — so they became a cluster. For each entity in a cluster, we collected several additional sources of information, and we divided the information corresponding to each cluster between pairs of Mechanical Turk workers, or “Turkers”.
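
The clustering step described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline used to build Topical Chat: the function name `cooccurring_triples`, the plain substring matching, and the sample sentences are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def cooccurring_triples(sources, entities):
    """Count how many sources mention each group of three entities.

    Hypothetical helper: entity mentions are detected with a
    case-insensitive substring test, which is far cruder than real
    entity linking.
    """
    counts = Counter()
    for text in sources:
        lowered = text.lower()
        # Entities from our list that appear in this source.
        present = sorted(e for e in entities if e.lower() in lowered)
        # Every group of three co-occurring entities is a candidate cluster.
        for triple in combinations(present, 3):
            counts[triple] += 1
    return counts


# Made-up sources, for illustration only.
sources = [
    "In Star Wars, the planet Tatooine looks nothing like Earth.",
    "Football is popular on every continent.",
]
entities = ["Star Wars", "planet", "earth", "football"]

triples = cooccurring_triples(sources, entities)
for triple, n in triples.most_common():
    print(triple, n)
```

Ranking candidate triples by how many sources mention them is one plausible way to pick clusters; the post does not say which criterion was actually used.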

Sometimes, Turkers would receive the same information. Sometimes one would receive only a subset of the information received by the other. And sometimes the information would be divided between the Turkers, so that each had knowledge that complemented the other’s.
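
As a rough sketch of these three configurations, the snippet below splits a list of facts between two workers. The mode names (`same`, `subset`, `complementary`), the function `split_knowledge`, and the random choice of how to divide the facts are assumptions for illustration; the post does not describe how the real assignment was made.

```python
import random


def split_knowledge(facts, mode, rng):
    """Return (worker_a_facts, worker_b_facts) for one hypothetical mode."""
    facts = list(facts)
    if len(facts) < 2:
        raise ValueError("need at least two facts to split")
    if mode == "same":
        # Both workers see identical information.
        return facts, list(facts)
    if mode == "subset":
        # Worker B sees a non-empty proper subset of what worker A sees.
        k = rng.randint(1, len(facts) - 1)
        return facts, rng.sample(facts, k)
    if mode == "complementary":
        # The facts are divided so that each worker holds a different part.
        shuffled = rng.sample(facts, len(facts))
        mid = len(shuffled) // 2
        return shuffled[:mid], shuffled[mid:]
    raise ValueError(f"unknown mode: {mode}")


rng = random.Random(0)  # seeded so the example is repeatable
facts = ["fact 1", "fact 2", "fact 3", "fact 4"]

same_a, same_b = split_knowledge(facts, "same", rng)
sub_a, sub_b = split_knowledge(facts, "subset", rng)
comp_a, comp_b = split_knowledge(facts, "complementary", rng)
```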

The Turkers were then asked to carry on instant-messaging conversations about the knowledge sets they’d received. For each of their own messages, they were asked to document where they found the information they used and to gauge the message’s sentiment — happy, sad, curious, fearful, and so on. For each of their interlocutors’ messages, they were asked to assess its quality — whether it was conversationally appropriate. We then winnowed the conversations using a combination of manual and automatic review.
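
One way to picture the per-message annotations is as a small record, sketched below. The class `Turn`, its field names, and the rating labels are hypothetical and are not the released file format. The `keep_conversation` filter is only a stand-in for the review step, whose actual criteria the post does not describe.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Turn:
    speaker: str
    message: str
    knowledge_source: str  # annotated by the speaker: where the information came from
    sentiment: str         # annotated by the speaker: e.g. "happy", "curious"
    partner_rating: Optional[str] = None  # annotated by the other worker


def keep_conversation(turns, acceptable=("appropriate",)):
    """Hypothetical filter: keep a conversation only if every turn was rated acceptable."""
    return all(t.partner_rating in acceptable for t in turns)


good = [
    Turn("A", "Did you know Star Wars was filmed partly in Tunisia?",
         "entity article", "curious", "appropriate"),
    Turn("B", "I did not, that is interesting!",
         "personal knowledge", "happy", "appropriate"),
]
bad = good + [Turn("A", "asdf", "none", "neutral", "inappropriate")]

print(keep_conversation(good), keep_conversation(bad))
```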

Once we’d arrived at our final data set, we used it to train three different machine learning models to produce conversational responses to input utterances. In a paper about the data set that we’re presenting this week at Interspeech, we report automated and human evaluation of all three models’ performance, which we hope will serve as a baseline against which other research groups may measure the success of their own socialbot systems.

Acknowledgments: This project came to be through the efforts and support of several people on the Alexa AI team. Thanks to Arindam Mandal, Raefer Gabriel, Mohammad Shami, Anu Venkatesh, Anjali Chadha, Anju Khatri, Anna Gottardi, Sanjeev Kwatra, Behnam Hedayatnia, Ben Murdoch, Karthik Gopalakrishnan, Mihail Eric, Seokhwan Kim, and Yang Liu for your work on the release.
