Amazon's annual machine learning conference featured presentations from thought leaders within academia
Amazon Machine Learning Conference highlights use of machine learning across the company’s businesses and its increasing value to customers.
Amazon’s annual internal science conference took place virtually earlier this month. The event is designed to showcase advances in the application of machine learning across the breadth of the company’s businesses and to foster greater collaboration within the company’s science community.
The 9th annual event featured five keynote presentations from leading academics (see below), oral and poster presentations, tutorials, and workshops.
Muthu Muthukrishnan, the event’s executive sponsor and vice president of sponsored products, Performance Advertising Technology, kicked off the event. He was followed by an opening keynote from Prem Natarajan, Alexa AI vice president of Natural Understanding.
“This conference plays a crucial role in expanding the future of machine learning at Amazon,” Muthukrishnan said in his opening remarks, while Natarajan added that the growth of Amazon’s science community is testimony to “the use of machine learning across Amazon to deliver increasing value to our customers.”
Presentations from distinguished members of the academic science community were provided by:
- Yoshua Bengio, a Turing Award winner recognized as one of the world’s leading experts in artificial intelligence. He is a professor in the Department of Computer Science and Operations Research at the Université de Montréal and the founder and scientific director of the Montreal Institute for Learning Algorithms (MILA);
- Rama Chellappa, a Bloomberg Distinguished Professor in the Departments of Electrical and Computer Engineering and Biomedical Engineering and chief scientist at the Johns Hopkins Institute for Assured Autonomy;
- Thomas Dietterich, professor emeritus in the School of Electrical Engineering and Computer Science at Oregon State University and associate director of Policy for Collaborative Robotics and Intelligent Systems (CoRIS), who is considered one of the pioneers of the machine learning field;
- Mirella Lapata, a professor within the School of Informatics at the University of Edinburgh and elected Fellow of the Royal Society of Edinburgh, whose research focuses on probabilistic learning techniques for natural language understanding and generation; and
- Christopher Manning, the inaugural Thomas M. Siebel Professor in Machine Learning in the Departments of Linguistics and Computer Science at Stanford University, director of Stanford’s Artificial Intelligence Laboratory (SAIL) and associate director of the Stanford Human-centered Artificial Intelligence Institute (HAI).
Each of the presenters graciously agreed to share their presentations publicly, and each is provided in its entirety below.
- Yoshua Bengio: GFlowNets for Generative Active Learning
Abstract: We consider the following setup: an ML system can interact with an expensive oracle (the “real world”) by iteratively proposing batches of candidate experiments and then obtaining a score for each experiment (“how well did it work?”). The data from all the rounds of queries and results can be used to train a proxy for the oracle, a form of world model. The world model can then be queried (much more cheaply than the oracle) in order to train (in silico) a generative model which proposes experiments, to form the next round of queries. Systems which can do that well can be applied in interactive recommendations, to discover new drugs, new materials, control plants or learn how to reason and build a causal model. They involve many interesting ML research threads, including active learning, reinforcement learning, representation learning, exploration, meta-learning, Bayesian optimization, and black-box optimization. What should be the training criterion for this generative model? Why not simply use Markov chain Monte Carlo (MCMC) methods to generate these samples? Is it possible to bypass the mode-mixing limitation of MCMC? How can the generative model guess where good experiments might be before having tried them? How should the world model construct a representation of its epistemic uncertainty, i.e., where it expects to predict well or not? On the path to answering these questions, we will introduce a new and exciting deep learning framework called GFlowNets, which can amortize the very expensive work normally done by MCMC to convert an energy function into samples and opens the door to fascinating possibilities for probabilistic modeling, including the ability to quickly estimate marginalized probabilities and efficiently represent distributions over sets and graphs.
Yoshua Bengio AMLC presentation
- Rama Chellappa: Open Problems in Machine Learning
Abstract: In this talk, I will briefly survey my group’s recent works on building operational systems for face recognition and action recognition using deep learning. While reasonable success can be claimed, many open problems still remain to be addressed. These include bias detection and mitigation, domain adaptation and generalization, learning from unlabeled data, handling adversarial attacks, and selecting the best subsets of training data in mini-batch learning. Some of our recent works addressing these challenges will be summarized.
Rama Chellappa AMLC presentation
- Thomas Dietterich: Anomaly Detection for OOD and Novel Category Detection
Abstract: Every deployed learning system should be accompanied by a competence model that can detect when new queries fall outside its region of competence. This presentation will discuss the application of anomaly detection to provide a competence model for object classification in deep learning. We consider two threats to competence: queries that are out-of-distribution and queries that correspond to novel classes. The talk will review the four main strategies for anomaly detection and then survey some of the many recently published methods for anomaly detection in deep learning. The central challenge is to learn a representation that assigns distinct representations to the anomalies. The talk will conclude with a discussion of how to set the anomaly detection threshold to achieve a desired missed-alarm rate without relying on labeled anomaly data.
Thomas Dietterich AMLC presentation
- Mirella Lapata: Automatic Movie Analysis and Summarization via Turning Point
Abstract: Movie analysis is an umbrella term for many tasks aiming to automatically interpret, extract, and summarize the content of a movie. Potential applications include generating shorter versions of scripts to help with the decision-making process in a production company, enhancing movie recommendation engines, and notably generating movie previews.
In this talk I will introduce the task of turning point identification as a means of analyzing movie content. According to screenwriting theory, turning points (e.g., change of plans, major setback, climax) are crucial narrative moments within a movie: they define its plot structure, determine its progression and segment it into thematic units. I will argue that turning points and the segmentation they provide can facilitate the analysis of long, complex narratives, such as screenplays. I will further formalize the generation of a shorter version of a movie as the problem of identifying scenes with turning points and present a graph neural network model for this task based on linguistic and audiovisual information. Finally, I will discuss why the representation of screenplays as (sparse) graphs offers interpretability and exposes the morphology of different movie genres.
Mirella Lapata AMLC presentation
- Christopher Manning: From Large Pre-Trained Language Models Discovering Linguistic Structure towards Foundation Models
Abstract: I will first briefly outline the recent sea change in NLP with the rise of large pre-trained transformer language models, such as BERT, and the effectiveness of these models on NLP tasks. I will then focus on two particular aspects on which I have worked. First, I will show how, despite only using a simple self-supervision task, BERT-like models not only learn word associations but act as linguistic structure discovery devices, capturing such things as human language syntax and pronominal coreference. Second, I will emphasize how recent progress has been bought at enormous computational cost and explore the ELECTRA model, in which an alternative discriminative learning method allows building highly effective neural word representations with considerably less computation. Finally, I will introduce how large pre-trained models are being extended into a larger class of Foundation Models, a direction with much promise but also concomitant risks, and how we are hoping to contribute to their exploration at Stanford.
Christopher Manning AMLC presentation