"This technology will be transformative in ways we can barely comprehend"

A judge and some of the finalists from the Alexa Prize Grand Challenge 3 talk about the competition, the role of COVID-19, and the future of socialbots.

Human beings are social creatures, and conversations are what connect us—they enable us to share everything from the prosaic to the profound with the people that matter to us. Living through an era marked by pandemic-induced isolation means many of those conversations have shifted online, but the connection they provide remains essential.

So what happens when you replace one of the human participants in a conversation with a socialbot? What does it mean to have an engaging conversation with an AI assistant? How can that kind of conversation prove to be valuable, and can it provide its own kind of connection?

The participants in this year’s Alexa Prize contest are driven by those questions. Amazon recently announced that a team from Emory University has won the 2020 Alexa Prize. We talked to that team, along with a judge from this year’s competition, as well as representatives from the other finalist teams at Czech Technical University, Stanford University, University of California, Davis, and University of California, Santa Cruz. We wanted to learn what drives them to participate, how COVID-19 has influenced their work and what they see as the possibilities and challenges for socialbots moving forward.

Winners of the Alexa Prize SocialBot Grand Challenge 3 discuss their research

Q: What inspired you to participate in this year’s competition?

Sarah Fillwock, team leader, Emora, Emory University: We had a group of students who were interested in dialogue system research, some of whom had actually participated in the Alexa Prize in its previous years, and we all knew that the Alexa Prize offers a really unique opportunity for anyone interested in this type of work. It is really exciting to use the Alexa device platform to launch a socialbot, because we are able to get hundreds of conversations a day between our socialbot and human users, which really allows for quick turnaround time when assessing whether or not our hypotheses and strategies are improving the performance of our dialogue system.

Marilyn Walker, faculty advisor, Athena, University of California, Santa Cruz: In our Natural Language and Dialogue Systems lab, our main research focus is dialogue management and language generation. Conversational AI is a very challenging problem, and we felt like we could have a research impact in this area. The field has been developing extremely quickly recently, and the Alexa Prize offers an opportunity to try out cutting-edge technologies in dialogue management and language generation on a large Alexa user population.

Amazon Alexa Prize Finalists 2020
The five Alexa Prize finalist teams: Czech Technical University in Prague; Emory University; Stanford University; the University of California, Davis; and the University of California, Santa Cruz.

Vrindavan (Davan) Harrison, team leader, Athena, UCSC: As academics, our primary focus is on research. This year’s competition aimed at being more research-oriented, allowing the teams to spend more time on developing new ideas.

Kai-Hui Liang, team lead, Gunrock, University of California, Davis: Our experience in last year’s competition motivated us to join again as we realized there is still a lot of room for improvement. I’m especially interested in how to find topics that engage users the most, including trying different ways to elicit and reason about users’ interests. How can we retrieve content that is relevant and interesting, and make the dialog flow more naturally?

Jan Pichl, team leader, Alquist, Czech Technical University: Since the first year of the Alexa Prize competition, we have been developing Alquist to deliver a wide range of topics with a closer focus on the most popular ones. The first Alquist guided a user through the conversation quite strictly. We learned quickly that we needed to introduce more flexibility and let the user be "in charge". With that in mind, we have been pushing Alquist in that direction. Moreover, we want Alquist to manage dialogue utilizing the knowledge graph, and suggest relevant information based on the previously discussed topics and entities.

Christopher D. Manning, faculty advisor, Chirpy Cardinal, Stanford University: It was our first time doing the Alexa Prize, and the team really hadn’t done advance preparation, so it’s all been a wild ride—by which I mean a lot of work and stress for everyone on the team. But it was super exciting that we were largely able to catch up with other leading teams who have been doing the competition for several years.

Hugh Howey, judge and science fiction author: Artificial intelligence is a passionate interest of mine. As a science fiction author, I have the freedom to write about most anything, but the one topic I keep coming back to is the impact that thinking machines already have on our lives and how that impact will only expand in the future. So any chance to be involved with those doing work and research in the field is a no-brainer for me. I leapt at the chance like a Boston Dynamics dog.

Q: What excites you about the potential of socialbots?

Hugh Howey (Judge): This technology will be transformative in ways we can barely comprehend. Right now, the human/computer interface is a bottleneck. It takes a long time for us to tell our computers what we want them to do, and they'll generally only do that thing the one time and then forget what they learned. In the future, more and more of the trivial will be automated. This will free up human capital to tackle larger problems. It will also bring us together by removing language barriers, by helping those with disabilities, and eventually this technology will be available to anyone who needs it.

Jinho D. Choi, faculty advisor, Emory: It has been reported that more than 44 million adults in the US have mental health issues such as anxiety or depression. We believe that developing an innovative socialbot that comforts people can really help those with mental health conditions, who are generally afraid of talking to other human beings. You may wonder how artificial intelligence can convey a human emotion such as caring. However, humans have used their own creations, such as art and music, to comfort themselves. It is our vision to advance AI, the greatest invention of humankind, to help individuals learn more about their inner selves so they can feel more positive about themselves, and have a bigger impact in the world.

Ashwin Paranjape, co-team leader, Stanford: As socialbots become more sophisticated and prevalent, increasing numbers of people are chatting with them regularly. As the name suggests, socialbots have the potential to fulfill social needs, such as chit-chatting about everyday life, or providing support to a person struggling with mental health difficulties. Furthermore, socialbots could become a primary user interface through which we engage with the world—for example, chatting about the news, or discussing a book.

Sarah Fillwock, Emory: Our experience in this competition has really solidified this idea of the potential of socialbots being of value to people who need support and are in troubling situations. I think that the most compelling role for socialbots in global challenges is to provide a supportive environment to allow people to express themselves, and explore their feelings with regard to whatever dramatic event is going on. This is especially important for vulnerable populations, such as those who do not have a strong social circle or have reduced social contact with others, prohibiting them from being able to achieve the feeling of being valued and understood.

Q: What are the main challenges to realizing that potential?

Abigail See, co-team leader, Stanford: Currently, socialbots struggle to make sense of long, involved conversations, and this limits their ability to talk about any topic in depth. To do this better, socialbots will need to understand what a particular user wants—not only in terms of discussion topics, but also what kind of conversation they want to have. Another important challenge is to allow users to take more initiative, and drive the conversation themselves. Currently, socialbots tend to take more initiative, to ensure the conversation stays within their capabilities. If we can make our socialbots more flexible, they will be much more useful and engaging to people.

Sarah Fillwock, Emory: One major challenge facing the field of dialogue system research is establishing a best practice for evaluation of the performance of dialogue approaches. There is currently a diverse set of evaluation strategies that the research community uses to determine how well their new dialogue approach performs. Another challenge is that dialogues are more than just a pattern-matching problem. Having a back-and-forth dialogue on any topic with another agent tends to involve planning towards achieving specific goals during the conversation as new information about your speaking partner is revealed. Dialogues also rely a lot on having a foundation of general world knowledge that you use to fully understand the implications of what the other person is saying.

Marilyn Walker, UCSC: There’s a shortage of large annotated conversational corpora for the task of open-domain conversation. For example, progress in NLU has been supported by large annotated corpora, such as the Penn Treebank; however, there are currently no such publicly available corpora for open-domain conversation. Also, a rich model of individual users would enable much more natural conversations, but privacy issues currently make it difficult to build such models.

Hugh Howey (Judge): The challenge will be for our ethics and morality to keep up with our gizmos. We will be far more powerful in the future. I only hope we'll be more responsible as well.

Q: What role has the COVID-19 pandemic played in your work?

Jurik Juraska, team member, UCSC: The most immediate effect the onset of the pandemic had on our socialbot was, of course, that it could not just ignore this new dynamic situation. Our socialbot had to acknowledge this new development, as that was what most people were talking about at that point. We would thus have Athena bring up the topic at the beginning of the conversation, sympathizing with the users' current situation, but avoiding wallowing in the negative aspects of it. In the feedback that some users left, there were a number of expressions of gratitude for the ability to have a fun interaction with a socialbot at a time when direct social interaction with friends and family was greatly restricted.

Kai-Hui Liang, UC Davis: We noticed an evident difference in the way Alexa users reacted to popular topics. For example, before COVID-19, many users gave engaging responses when discussing their favorite sports to watch, their travel experiences, or events they plan to do over the weekend. After the outbreak of COVID-19, more users replied saying there are no sports games to watch or they are not able to travel. Therefore, we adapted our topics to better fit the situation. We added discussion about their life experience during the quarantine (e.g., how their diet has changed or if they walk outside daily to stay healthy). We also observed more users having negative feelings potentially due to the quarantine. For instance, some users said they feel lonely and they miss their friends or family. Therefore, we enhanced our comforting module that expresses empathy through active listening.

Abigail See, Stanford: As the pandemic unfolded, we saw in real time how users changed their expectations of our socialbot. Not only did they want our bot to deliver up-to-date information, they also wanted it to show emotional understanding for the situation they were in.

Sarah Fillwock, Emory: When COVID became a significant societal issue, we tried two things: we had an experience-oriented COVID topic where our bot discussed with people how they felt about COVID in a sympathetic and reassuring atmosphere, and we had a fact-oriented COVID topic that gave objective information. What we observed was that people had a much stronger positive reaction to the experience-oriented COVID-19 approach than the fact-oriented COVID-19 approach, and seemed to prefer it when talking. This really gave us some empirical evidence that social agents have a strong potential to be helpful in times of turmoil by giving people a safe and caring space to talk about these major events in their lives, since people responded positively to our approach to doing this.

Q: Lastly, are there any particular advancements in the fields of NLU, dialogue management, conversational AI, etc., that you find promising?

Jan Pichl, Czech Technical University: It is exciting to see the capabilities of the Transformer-based models these days. They are able to generate large articles or even whole stories that are coherent. However, they demand a lot of computation power during the training phase and even during the runtime. Additionally, it is still challenging to use them in a socialbot when you need to work with constantly changing information about the world.

Abigail See, Stanford: As NLP researchers, we are amazed by the incredible pace of progress in the field. Since the last Alexa Prize in 2018, there have been game-changing advancements, particularly in the use of large pretrained language models to understand and generate language. The Alexa Prize offers a unique opportunity for us to apply these techniques, which so far have mostly been tested only on neat, well-defined tasks, and put them in front of real people, with all the messiness that entails! In particular, we were excited to explore the possibility of using neural generative models to chat with people. As recently as the 2018 Alexa Prize, these models generally performed poorly, and so were not used by any of the finalist teams. However, this year, these systems became an important backbone of our system.

Sarah Fillwock, Emory: The work people have been putting into incorporating common sense knowledge and common sense reasoning into dialogue systems is one of the most interesting directions of the current conversational AI field. A lot of the common sense knowledge we use is not explicitly detailed in any type of data set, as people have learned it through physical experience or inference over time, so there isn’t necessarily any convenient way to currently accomplish this goal. There have been a lot of attempts to see how far a language modeling approach to dialogue agents can go, but even using huge dialogue data sets and highly complex models still results in hit-and-miss success at common sense information. I am really looking forward to the dialogue approaches and dialogue resources that more explicitly try to model this type of common sense knowledge.

Research areas

Latest news

The latest updates, stories, and more about Alexa Prize.
US, WA, Bellevue
The Artificial General Intelligence (AGI) team is seeking a dedicated, skilled, and innovative Applied Scientist with a robust background in machine learning, statistics, quality assurance, auditing methodologies, and automated evaluation systems to ensure the highest standards of data quality, to build industry-leading technology with Large Language Models (LLMs) and multimodal systems. Key job responsibilities As part of the AGI team, an Applied Scientist will collaborate closely with core scientist team developing Amazon Nova models. They will lead the development of comprehensive quality strategies and auditing frameworks that safeguard the integrity of data collection workflows. This includes designing auditing strategies with detailed SOPs, quality metrics, and sampling methodologies that help Nova improve performances on benchmarks. The Applied Scientist will perform expert-level manual audits, conduct meta-audits to evaluate auditor performance, and provide targeted coaching to uplift overall quality capabilities. A critical aspect of this role involves developing and maintaining LLM-as-a-Judge systems, including designing judge architectures, creating evaluation rubrics, and building machine learning models for automated quality assessment. The Applied Scientist will also set up the configuration of data collection workflows and communicate quality feedback to stakeholders. An Applied Scientist will also have a direct impact on enhancing customer experiences through high-quality training and evaluation data that powers state-of-the-art LLM products and services. A day in the life An Applied Scientist with the AGI team will support quality solution design, conduct root cause analysis on data quality issues, research new auditing methodologies, and find innovative ways of optimizing data quality while setting examples for the team on quality assurance best practices and standards. 
Besides theoretical analysis and quality framework development, an Applied Scientist will also work closely with talented engineers, domain experts, and vendor teams to put quality strategies and automated judging systems into practice.
US, MA, Boston
The Artificial General Intelligence (AGI) team is seeking a dedicated, skilled, and innovative Applied Scientist with a robust background in machine learning, statistics, quality assurance, auditing methodologies, and automated evaluation systems to ensure the highest standards of data quality, to build industry-leading technology with Large Language Models (LLMs) and multimodal systems. Key job responsibilities As part of the AGI team, an Applied Scientist will collaborate closely with core scientist team developing Amazon Nova models. They will lead the development of comprehensive quality strategies and auditing frameworks that safeguard the integrity of data collection workflows. This includes designing auditing strategies with detailed SOPs, quality metrics, and sampling methodologies that help Nova improve performances on benchmarks. The Applied Scientist will perform expert-level manual audits, conduct meta-audits to evaluate auditor performance, and provide targeted coaching to uplift overall quality capabilities. A critical aspect of this role involves developing and maintaining LLM-as-a-Judge systems, including designing judge architectures, creating evaluation rubrics, and building machine learning models for automated quality assessment. The Applied Scientist will also set up the configuration of data collection workflows and communicate quality feedback to stakeholders. An Applied Scientist will also have a direct impact on enhancing customer experiences through high-quality training and evaluation data that powers state-of-the-art LLM products and services. A day in the life An Applied Scientist with the AGI team will support quality solution design, conduct root cause analysis on data quality issues, research new auditing methodologies, and find innovative ways of optimizing data quality while setting examples for the team on quality assurance best practices and standards. 
Besides theoretical analysis and quality framework development, an Applied Scientist will also work closely with talented engineers, domain experts, and vendor teams to put quality strategies and automated judging systems into practice.
US, MA, Boston
The Artificial General Intelligence (AGI) team is seeking a dedicated, skilled, and innovative Applied Scientist with a robust background in machine learning, statistics, quality assurance, auditing methodologies, and automated evaluation systems to ensure the highest standards of data quality, to build industry-leading technology with Large Language Models (LLMs) and multimodal systems. Key job responsibilities As part of the AGI team, an Applied Scientist will collaborate closely with core scientist team developing Amazon Nova models. They will lead the development of comprehensive quality strategies and auditing frameworks that safeguard the integrity of data collection workflows. This includes designing auditing strategies with detailed SOPs, quality metrics, and sampling methodologies that help Nova improve performances on benchmarks. The Applied Scientist will perform expert-level manual audits, conduct meta-audits to evaluate auditor performance, and provide targeted coaching to uplift overall quality capabilities. A critical aspect of this role involves developing and maintaining LLM-as-a-Judge systems, including designing judge architectures, creating evaluation rubrics, and building machine learning models for automated quality assessment. The Applied Scientist will also set up the configuration of data collection workflows and communicate quality feedback to stakeholders. An Applied Scientist will also have a direct impact on enhancing customer experiences through high-quality training and evaluation data that powers state-of-the-art LLM products and services. A day in the life An Applied Scientist with the AGI team will support quality solution design, conduct root cause analysis on data quality issues, research new auditing methodologies, and find innovative ways of optimizing data quality while setting examples for the team on quality assurance best practices and standards. 
Besides theoretical analysis and quality framework development, an Applied Scientist will also work closely with talented engineers, domain experts, and vendor teams to put quality strategies and automated judging systems into practice.
US, MA, Boston
The Artificial General Intelligence (AGI) team is seeking a dedicated, skilled, and innovative Applied Scientist with a robust background in machine learning, statistics, quality assurance, auditing methodologies, and automated evaluation systems to ensure the highest standards of data quality, to build industry-leading technology with Large Language Models (LLMs) and multimodal systems. Key job responsibilities As part of the AGI team, an Applied Scientist will collaborate closely with core scientist team developing Amazon Nova models. They will lead the development of comprehensive quality strategies and auditing frameworks that safeguard the integrity of data collection workflows. This includes designing auditing strategies with detailed SOPs, quality metrics, and sampling methodologies that help Nova improve performances on benchmarks. The Applied Scientist will perform expert-level manual audits, conduct meta-audits to evaluate auditor performance, and provide targeted coaching to uplift overall quality capabilities. A critical aspect of this role involves developing and maintaining LLM-as-a-Judge systems, including designing judge architectures, creating evaluation rubrics, and building machine learning models for automated quality assessment. The Applied Scientist will also set up the configuration of data collection workflows and communicate quality feedback to stakeholders. An Applied Scientist will also have a direct impact on enhancing customer experiences through high-quality training and evaluation data that powers state-of-the-art LLM products and services. A day in the life An Applied Scientist with the AGI team will support quality solution design, conduct root cause analysis on data quality issues, research new auditing methodologies, and find innovative ways of optimizing data quality while setting examples for the team on quality assurance best practices and standards. 
Besides theoretical analysis and quality framework development, an Applied Scientist will also work closely with talented engineers, domain experts, and vendor teams to put quality strategies and automated judging systems into practice.
US, MA, Boston
The Artificial General Intelligence (AGI) team is seeking a dedicated, skilled, and innovative Applied Scientist with a robust background in machine learning, statistics, quality assurance, auditing methodologies, and automated evaluation systems to ensure the highest standards of data quality, to build industry-leading technology with Large Language Models (LLMs) and multimodal systems. Key job responsibilities As part of the AGI team, an Applied Scientist will collaborate closely with core scientist team developing Amazon Nova models. They will lead the development of comprehensive quality strategies and auditing frameworks that safeguard the integrity of data collection workflows. This includes designing auditing strategies with detailed SOPs, quality metrics, and sampling methodologies that help Nova improve performances on benchmarks. The Applied Scientist will perform expert-level manual audits, conduct meta-audits to evaluate auditor performance, and provide targeted coaching to uplift overall quality capabilities. A critical aspect of this role involves developing and maintaining LLM-as-a-Judge systems, including designing judge architectures, creating evaluation rubrics, and building machine learning models for automated quality assessment. The Applied Scientist will also set up the configuration of data collection workflows and communicate quality feedback to stakeholders. An Applied Scientist will also have a direct impact on enhancing customer experiences through high-quality training and evaluation data that powers state-of-the-art LLM products and services. A day in the life An Applied Scientist with the AGI team will support quality solution design, conduct root cause analysis on data quality issues, research new auditing methodologies, and find innovative ways of optimizing data quality while setting examples for the team on quality assurance best practices and standards. 
Besides theoretical analysis and quality framework development, an Applied Scientist will also work closely with talented engineers, domain experts, and vendor teams to put quality strategies and automated judging systems into practice.
US, MA, Boston
The Artificial General Intelligence (AGI) team is seeking a dedicated, skilled, and innovative Applied Scientist with a robust background in machine learning, statistics, quality assurance, auditing methodologies, and automated evaluation systems to ensure the highest standards of data quality, to build industry-leading technology with Large Language Models (LLMs) and multimodal systems. Key job responsibilities As part of the AGI team, an Applied Scientist will collaborate closely with core scientist team developing Amazon Nova models. They will lead the development of comprehensive quality strategies and auditing frameworks that safeguard the integrity of data collection workflows. This includes designing auditing strategies with detailed SOPs, quality metrics, and sampling methodologies that help Nova improve performances on benchmarks. The Applied Scientist will perform expert-level manual audits, conduct meta-audits to evaluate auditor performance, and provide targeted coaching to uplift overall quality capabilities. A critical aspect of this role involves developing and maintaining LLM-as-a-Judge systems, including designing judge architectures, creating evaluation rubrics, and building machine learning models for automated quality assessment. The Applied Scientist will also set up the configuration of data collection workflows and communicate quality feedback to stakeholders. An Applied Scientist will also have a direct impact on enhancing customer experiences through high-quality training and evaluation data that powers state-of-the-art LLM products and services. A day in the life An Applied Scientist with the AGI team will support quality solution design, conduct root cause analysis on data quality issues, research new auditing methodologies, and find innovative ways of optimizing data quality while setting examples for the team on quality assurance best practices and standards. 
Besides theoretical analysis and quality framework development, an Applied Scientist will also work closely with talented engineers, domain experts, and vendor teams to put quality strategies and automated judging systems into practice.
US, MA, Boston
The Artificial General Intelligence (AGI) team is seeking a dedicated, skilled, and innovative Applied Scientist with a robust background in machine learning, statistics, quality assurance, auditing methodologies, and automated evaluation systems to ensure the highest standards of data quality, to build industry-leading technology with Large Language Models (LLMs) and multimodal systems. Key job responsibilities As part of the AGI team, an Applied Scientist will collaborate closely with core scientist team developing Amazon Nova models. They will lead the development of comprehensive quality strategies and auditing frameworks that safeguard the integrity of data collection workflows. This includes designing auditing strategies with detailed SOPs, quality metrics, and sampling methodologies that help Nova improve performances on benchmarks. The Applied Scientist will perform expert-level manual audits, conduct meta-audits to evaluate auditor performance, and provide targeted coaching to uplift overall quality capabilities. A critical aspect of this role involves developing and maintaining LLM-as-a-Judge systems, including designing judge architectures, creating evaluation rubrics, and building machine learning models for automated quality assessment. The Applied Scientist will also set up the configuration of data collection workflows and communicate quality feedback to stakeholders. An Applied Scientist will also have a direct impact on enhancing customer experiences through high-quality training and evaluation data that powers state-of-the-art LLM products and services. A day in the life An Applied Scientist with the AGI team will support quality solution design, conduct root cause analysis on data quality issues, research new auditing methodologies, and find innovative ways of optimizing data quality while setting examples for the team on quality assurance best practices and standards. 
Besides theoretical analysis and quality framework development, an Applied Scientist will also work closely with talented engineers, domain experts, and vendor teams to put quality strategies and automated judging systems into practice.