“Who we are shapes what we say and how we say it”
Amazon Research Award recipient Shrikanth Narayanan is on a mission to make inclusive human-AI conversational experiences.
To hear Shrikanth Narayanan describe it, every single human conversation is a feat of engineering — a complex system for creating and interpreting a dizzying array of signals.
“When I'm speaking, I'm producing this audio signal, which you're able to make sense out of by processing it in your auditory system and neural systems,” Narayanan says. “Meanwhile, you’re decoding my intent and emotions. I've always been fascinated by that.”
Narayanan uses signal processing and machine learning to better understand this sort of real-world information transfer as university professor and Niki & C. L. Max Nikias Chair in Engineering at the University of Southern California (USC).
In 2020, his lab earned an Amazon Research Award for work on creating “inclusive human-AI conversational experiences for children.” Today, he continues to collaborate with Amazon researchers through The Center for Secure and Trusted Machine Learning at the USC Viterbi School of Engineering. He’s also gained a reputation for training future Amazon scientists, with dozens of his former students now working full time for the company.
They’re finding new approaches to machine learning privacy, security, and trustworthiness that are helping to shape a future that Narayanan hopes will be more equitable, more secure, and more empathetic.
A signal with ‘complex underpinnings’
Narayanan recalls being fascinated by the scientific side of the human experience as early as high school. At the time, he says, he was mainly interested in our physiology. But in retrospect, he says, his curiosity had the tenor of a tinkering engineer.
“I was always interested in how it all worked,” he says. “I wanted to know how the heart worked, what happened in the brain, how it worked together. I was looking at humans through this lens of systems — the information flow that happens within individuals and between individuals.”
It was in the early ’90s, while he was pursuing a PhD in electrical engineering at the University of California, Los Angeles, that he managed to combine his diverse interests.
“I was training in electrical engineering, but I really wanted the chance to look at something more directly connected to those human systems,” he says. He got the chance to intern at AT&T Bell Laboratories and realized human language held all the sorts of mysteries he’d been hoping to help solve.
“Human speech is a signal that has these complex underpinnings,” he says. “There’s a cognitive aspect, the mind, and motoric aspects. We use the vocal instrument to create the signal, which in turn gets processed by people.”
Narayanan was fascinated by all the data involved in helping a conversation go right — and how easily conversations can go wrong.
He also became interested in the ways developmental disorders and health conditions could change the process of creating and interpreting speech, as well as how the rich diversity of human cultural contexts could impact the efficacy of voice recognition and synthesis.
In 2000, Narayanan founded USC’s Signal Analysis and Interpretation Laboratory (SAIL) to focus “on human-centered signal and information processing that address key societal needs.”
Over the last two decades, SAIL has enabled advances in audio, speech, language, image, video and bio signal processing, human and environment sensing and imaging, and human-centered machine learning. The lab also applies its findings to create “technologies that are inclusive, and technologies that support inclusion,” Narayanan says.
By that, he means that in addition to making sure technologies like voice recognition actually work for everyone — some of his earliest work involved helping AI pick up on a speaker’s emotional state regardless of their spoken language — he uses signal analysis and interpretation to help uncover and spotlight inequality.
In 2017, SAIL created algorithms for analyzing movie script dialogue in order to measure representation of BIPOC characters. Another SAIL tool analyzed footage directly to track and tally female screen time and speaking time.
In 2019, the lab reported that an algorithm trained on human speech patterns could predict whether or not couples facing hard times would actually stay together. It did so even better than a trained therapist presented with video recordings of the couples in question. Instead of interpreting the content of the discussions (or any visual cues), the algorithm focused on factors like cadence and pitch. A similar tool predicted changes in mental well-being in psychiatric patients as well as human physicians could.
Building trust in AI
“Even if we speak the same language,” Narayanan says, “who we are shapes what we say and how we say it. And this is particularly fascinating for children, because their speech represents a moving target with ongoing developmental changes.”
It’s not just that a child’s vocal instrument is constantly changing as they grow. They’re also developing cognitively and socially. That can mean rapid shifts in the words they use and how they use them. When you add in other factors that might make those speech shifts different from the already diverse average (cultural contexts, speaking or hearing impairments, cognitive differences, or developmental delays), training a voice assistant to effectively communicate with kids poses a real challenge.
The analysis gets even more complicated when an AI is interacting with two humans at once, especially if one is an adult and one is a child. Using Amazon Elastic Compute Cloud (Amazon EC2) to process their data, SAIL made advances in core competences like automatic speech recognition to improve speaker diarization, the process of partitioning audio of human speech to determine which person is speaking when.
In 2021, SAIL also published a detailed empirical study of children’s speech recognition. They found that the state-of-the-art end-to-end systems setting high benchmarks on adult speech had serious shortcomings when it came to understanding children. The following year, the lab proposed a novel technique for estimating a child’s age based on temporal variability in their speech.
By measuring the same aspects of speech that make children difficult for AI to interact with — like variations in pause length and the time it takes to pronounce certain sounds — his team was able to reliably measure a child’s developmental stage. That could help AI adapt to the needs of users with less sophisticated language skills. Because the analysis relies on signals that can be stripped of other identifying information, the method also has the potential to help protect a child’s privacy.
Narayanan refers to this and similar projects as “trustworthy speech processing,” and says he and collaborators he’s found through Amazon are working to spread interest in the idea across their booming field. In March, the International Speech Communication Association (ISCA) awarded him their ISCA Medal for Scientific Achievement — the group’s most prestigious award — for his sustained and diverse contributions to speech communication science and technology and its application to human-centered engineering systems. He will receive the medal and deliver the opening keynote lecture in August at Interspeech 2023, held in Dublin, Ireland.
Narayanan notes that the last five years have seen radical changes in our ability to gather and analyze information about human behavior.
“The technology systems have made this sort of engineering leap and allowed applications we hadn’t even imagined yet,” he says. “All these people are interacting with these devices in open, real-world environments, and we have the machine learning and deep learning advances to actually use that audio data.”
The next big challenge, he says, is figuring out how to process that data in a way that not only serves the user, but ensures their trust. In addition to continuing to study how various developmental differences might impact voice recognition, and how AI can learn to adapt to them, Narayanan hopes to find new ways to mask as much user data as possible for privacy while pulling out the signals that voice assistants need.
Ushering in the next generation of researchers
Working with Amazon enables Narayanan’s lab to explore key research themes through a practical lens. He notes that collaborations of this nature provide academics like himself with the time and support to tackle complex, delicate research questions — such as those involving children and other vulnerable populations.
In addition, Narayanan’s graduate students get to work directly with Amazon scientists to understand the potential practical applications of their research.
“This kind of partnership really takes research to the next level,” he says.
Narayanan has also encouraged dozens of his students to pursue internships at Amazon to explore what industry has to offer. Just as his time at Bell Laboratories helped to crystallize his own interests, he says, he’s watched countless young engineers find exciting new applications for their skills at Amazon.
What started as a gentle nudge to consider Amazon internships and job postings has grown into a steady pipeline of Amazon hires — one that Narayanan says owes entirely to the merits of his lab’s alums.
Angeliki Metallinou, a senior applied science manager for Alexa AI, joined Amazon full time in 2014 with Narayanan’s encouragement. Alexa was a top-secret project at the time, so she didn’t know exactly what she’d be working on until she got there. She credits Narayanan with encouraging her to dive in.
“As a student, I hadn’t realized the extent that Amazon scientists collaborate with academia and are able to publish their work at top tier venues and conferences,” she recalls. “I wasn’t even aware that there was such a strong science community here. But Shri already had a few former PhD students working at Amazon, and he recommended it as a great place for an industry career.”
Rahul Gupta, a senior applied scientist for Amazon Alexa, first connected with Amazon for an internship near the end of his SAIL PhD in 2015. These days, he says, he has one or two SAIL students doing summer internships in his group alone.
“There's really good cultural alignment between SAIL and Amazon,” Gupta says.
Narayanan, who proudly displays photos of all of his lab graduates on the wall of his office, admits he’s lost count of how many have worked at Amazon over the years.
“It's exciting,” he says. “The AI revolution that's happening has a very nice connection to what's happening at Amazon, so naturally it was a place where my students found the most exciting challenges and opportunities. But I’ve also seen many of them progress into leadership positions, which I did my best to set them up for — I always encourage creativity and collaboration, and I don’t micromanage them in my lab.”
Now that his graduates are thriving at Amazon, he says, the internship opportunities for his current students are all the more robust.
“It sustains itself,” he says. “They shine in what they do at Amazon and in the community, and that connects back to the lab. It’s incredibly exciting.”