ACL: What comes next for natural-language processing?
Amazon Scholar and Columbia professor Kathleen McKeown on model compression, data distribution shifts, language revitalization, and more.
Kathleen McKeown, the Henry and Gertrude Rothschild Professor of Computer Science at Columbia University and an Amazon Scholar, has a storied history with the Association for Computational Linguistics, whose annual meeting is taking place this week. From 1990 to 1992, she served one-year terms as a member of the association’s executive committee, its vice president, and finally its president, and from 1995 to 1997, she was secretary-treasurer.
More recently, in 2020, she was one of the conference’s two keynote speakers, and this year, she’s a senior area chair on the program committee. So she has a particularly good perspective on where the conference has been and where it’s headed.
“At the beginning, there was a lot more interdisciplinary work between linguistics, computer science, and psychology,” McKeown says. “And over time we saw ACL become really more focused on machine learning and computer science.
“Of course, the biggest change in the field has been the introduction of large language models and few-shot learning with the use of prompts, which we've seen with GPT-3 and some other large language models. But before few-shot learning with prompts we had fine-tuning. The large language models were pretrained with very large amounts of data, and then you could fine-tune them to specific tasks with a smaller amount of labeled data. That means we don't have to have huge amounts of labeled data for every task we want to do, which is a big advance.
“And then the more recent advance is where you could just have a few examples that you feed as input to a model like GPT-3, and you could get very impressive output.”
“When I teach, students ask me, ‘Is there anything more to do in NLP [natural-language processing]?’” McKeown says. “Which at face value is a good question, because these things look so impressive.
“The thing is, we're not done. These large language models are very, very big. They require a lot of computational resources, even if you're using a pretrained language model. So one question is, How can we do things with smaller models that have fewer parameters? Also, how do we deal with tasks where we have very small amounts of data? Those are two issues that I think are ongoing.
“What we see now in ACL with the GPT-3-like models is that people are experimenting with the prompts. The prompt could be a single word, it could be a tag of some sort, or it could be a natural-language instruction, like ‘A summary for this paragraph is this sentence.’ So one question is, How do we design those prompts and those instructions?”
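To make the idea of a prompt concrete, here is a minimal Python sketch of how a natural-language instruction and a few worked examples might be combined into a single few-shot prompt. It is an illustration only, not a method described by McKeown: the function name `build_few_shot_prompt`, the "Paragraph:" and "Summary:" labels, and the toy examples are all assumptions, and the sketch only builds the text that would be fed to a model; it does not call one.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a few-shot prompt: an instruction, worked examples, then the new input.

    All names and the "Paragraph:"/"Summary:" layout are hypothetical choices
    made for illustration; real prompt formats vary from model to model.
    """
    parts = [instruction.strip(), ""]
    for paragraph, summary in examples:
        parts.append(f"Paragraph: {paragraph}")
        parts.append(f"Summary: {summary}")
        parts.append("")  # blank line between examples
    # The final entry is left open so the model completes the summary.
    parts.append(f"Paragraph: {query}")
    parts.append("Summary:")
    return "\n".join(parts)


# Toy examples, invented for this sketch.
examples = [
    ("The city council met on Tuesday and voted to repair the old bridge.",
     "The council approved bridge repairs."),
    ("Heavy rain fell overnight, and several roads were closed by morning.",
     "Rain closed several roads."),
]

prompt = build_few_shot_prompt(
    "Write a one-sentence summary of each paragraph.",
    examples,
    "The library extended its opening hours after a survey of local residents.",
)
print(prompt)
```

Changing the instruction wording, the labels, or the number and order of examples produces a different prompt, and choosing among those variations is the kind of design question the quote above raises.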
Another question for NLP, McKeown says, is one that also confronts many other applications of machine learning: how to adapt to changing data distributions in the real world.
“Most models are static,” McKeown explains. “They are built at a point in time, and what they represent is basically what's true for that point in time. But the world changes every minute, every second. Things that were true are no longer true. The president changes; we probably should be able to see that in some kind of distribution shift. Dealing with a dynamic world is a new area that's up and coming.
“Another topic is ethics. At ACL, there's a whole track on ethics in NLP, looking at what biases these large language models encode. And if we can discover them, how can we mitigate them?
“The theme for this ACL Conference — and a theme in general for the field — is low-resource or endangered languages. These are languages for which we have very little data. We may have very few speakers. How do we develop machine translation for them, or how do we develop different multilingual applications?
“At ACL, there is a best-special-theme paper on ‘Requirements and motivations of low-resource speech synthesis for language revitalization.’ This is a really interesting paper on three indigenous languages in Canada. And there are very few people in the community left who can speak these languages. But speech synthesis can help in teaching the younger generation about the language and letting them hear and learn about the language in order that the language is not lost.
“Another big area that is also important for the field is interpretability and analysis of models for NLP. If we have NLP models that make predictions, but we can't explain why they made the prediction that they did, they’re less useful. Interpretability can also include things like inspection of models. You know, we used to always have a separate syntax component. Now we don't, but if we inspect the models, we can see what kind of syntactic information is used.”
Finally, McKeown says, there’s a line of work at ACL that hearkens back to the conference’s interdisciplinary past.
“I think people are still interested in psycholinguistics and cognitive modeling,” she says. “There are a good number of people who have shown or show that this kind of information can still impact how well our models and applications do. There has been work, for example, that shows that these models do encode syntactic information and what kind of information they encode. So that tradition is still alive at ACL.”