Kathleen McKeown is the Henry and Gertrude Rothschild professor of computer science at Columbia University, the founding director of the school's Data Science Institute, and an Amazon Scholar.
Credit: Columbia University

Amazon Scholar Kathleen McKeown takes stock of natural language processing: where we are, and where we’re going

After nearly 40 years of research, this year’s ACL 2020 keynote speaker sees big improvements coming in three key areas.

Kathleen McKeown is the Henry and Gertrude Rothschild professor of computer science at Columbia University, and the founding director of the school’s Data Science Institute. McKeown received a PhD in computer science from the University of Pennsylvania in 1982, and has been at Columbia since then. Her research interests include text summarization, natural language generation, multi-media explanation, question-answering and multilingual applications.

McKeown has received many honors and distinctions throughout her career, including being selected a AAAI Fellow, an ACM Fellow, and one of the Founding Fellows of the Association for Computational Linguistics (ACL). Early in her career she received a National Science Foundation Presidential Young Investigator Award; in 2010, she received the Anita Borg Women of Vision Award in Innovation for her work on text summarization; and in 2019 she was elected to the American Academy of Arts and Sciences.

McKeown is also an Amazon Scholar, one of an expanding group of academics who work on large-scale technical challenges for Amazon while continuing to teach and conduct research at their universities. In early July, she will be the keynote speaker at ACL 2020, the annual conference of the Association for Computational Linguistics.

We recently spoke with McKeown about the field of natural language processing, her career, and her keynote topic for ACL 2020: Rewriting the Past: Assessing the Field through the Lens of Language Generation.

What drew you to the field of natural language processing?

My undergraduate major was in comparative literature. I also majored in math, so I had both of those interests. But it wasn't until my senior year as an undergraduate that I learned about computer science and the field of computational linguistics. What got me interested in computational linguistics was that I could bring my two interests together, so I applied to graduate school in computer science.

How did you come to join the Amazon Scholars program?

I was on a sabbatical and someone I knew at Amazon asked me if I’d be interested in working there. And I thought, ‘Well, that would be fun to do on my sabbatical.’ It took a while to happen; I was well into the second half of the sabbatical when it did.

But I’ve continued doing it, one or two days a week. I like the work – the industry perspective helps with my academic research. And working at Amazon is a lot like working at Columbia. There are a lot of young people, and they’re very bright. Plus, the tools we use at Amazon – for setting up a problem and debugging it – give me some insight into what I should have my students looking at.

How has the field evolved during your time working in it?

The ACL 2020 conference has a theme of “Taking stock of where we are and where we are going with natural language processing.” Before neural nets (computer systems modeled after the human brain), people were using statistical methods, machine learning, discrete methods. Then in 2014 there were some startling advances in natural language processing using neural networks – mostly in machine translation. In the two or three years after that the whole field shifted.

My ACL talk will focus on language generation and summarization. Neural networks have had a huge impact in those areas. Language generation has really been transformed. Today we can really do language generation from a lot of unstructured data. We’re really seeing some very creative work; at Columbia, we've been working on the generation of arguments in the context of generating counterarguments. How do you do that? How do you generate text that is persuasive?

One of the powerful tools being used right now is BERT (Bidirectional Encoder Representations from Transformers), which came out of Google in 2018. BERT has a pretty good idea of how a sentence fits together grammatically, and through fine-tuning enables learning from smaller data sets than was possible before.

What’s the current state of natural language processing?

One of the problems with current approaches is that people grab onto a data set that is available, then work on that data set to get a result – whether or not that solves a problem that needs to be solved. For some time now people have focused on using natural language processing to summarize news stories. That’s not something we really need – the story’s lead is often a very good summary.

I believe we should be moving on to harder, novel problems. One is the summarization of chapters in novels, where we’ve used as a data set chapters of books taken from Project Gutenberg.

We’ve been doing this in our work at Amazon, where we are developing a system to generate summaries of chapters, using as training data the chapters from Project Gutenberg and summaries from online study guides.

That is a hard problem, and a very interesting problem. That’s because the study guides use a huge amount of paraphrasing of novel chapters, and trying to teach a computer to understand what is a paraphrase and what is not is really hard.

How will the field change over the next five or 10 years?

That’s a hard question. Just five years ago we couldn’t generate language from unstructured data like images or video, so the field is moving quickly.

One of the things I’m collaborating on is how we can take meeting recordings and generate summaries – action items and things like that. And I’d like to do more with the summarization of novels. I love that work. Summaries are often written in everyday language, while the books themselves have a completely different style from a very different time. So matching everyday language and the language in a book is difficult.

Overall, I think we’ll see big improvements in three areas: One is machine translation. We live in a global world, and there is a huge need to be able to understand documents in other languages. The second is in conversational systems. I would love it if we could develop systems that could be true companions – think of how beneficial that could be to the elderly who are isolated because of COVID-19. And I think we’ll see better ways to get good answers when asking questions on the web.

And third is how we interact with online information. There is just so much on the web, so the ability to summarize content and then drill down into it will be extremely important.

We as computer scientists and people who work with language need to think about how we can help. Look at the COVID-19 epidemic – natural language processing might help us better track the evolution of a disaster.

One last thing. ACL 2020 will be virtual this year. Is that difficult to work with?

(Laughs) In some ways, being remote makes things easier. I won’t see a big audience in front of me – so I’ll be less nervous!

