The science behind the new “Alexa, what should I watch?” Fire TV experience
The phrase launches a feature built to help customers navigate an increasingly complex and diverse world of content.
"What should I watch?"
In an entertainment universe filled with a rapidly expanding catalog of shows across myriad channels and apps, this might be one of the most common questions to pop up in many households. And if you are among those who have trouble keeping up with all the latest shows and pinpointing which ones are worth your time, you are not alone.
In fact, more than half of respondents in a recent survey from the consulting firm Deloitte found it difficult to access content across multiple services, and 49% were frustrated if a service failed to provide them with good recommendations. Viewers find themselves surfing … and surfing. It takes the average smart TV owner 12 minutes to land on a show, according to a 2020 survey by TiVo, and for some viewers that can take up to half an hour.
"It's kind of shocking how much time customers have to spend on finding content instead of just sitting down on the couch and jumping into a TV show or a movie that they really enjoy," said Cosmin Laslau, a technical program manager who works on spoken language understanding as part of the Amazon Alexa Entertainment team. "We wanted to leverage new technology to help solve that problem for customers."
The team did that by launching What Should I Watch (WSIW). The new experience, released in mid-September, combines Alexa AI and Fire TV recommendations to turn Alexa into an entertainment expert who provides relevant suggestions with a conversational customer experience. The experience also works with the new Fire TV Cube, the Fire TV Omni QLED Series, and the Alexa Voice Remote Pro announced at the 2022 Devices and Services event.
“We built WSIW to rapidly experiment with new Alexa technologies and push the envelope on discovery experiences to address the core customer need to find something interesting to watch,” explained Parthasarathi Dutta Sharma, a product manager who helped bring WSIW to customers.
WSIW displays personalized recommendations when customers ask, “Alexa, what should I watch?” or a variant of that phrase. Customers can then customize the recommendations using voice prompts (for example, “just the ones that are free to me”) or by using their remote to select filters on the screen, watch trailers, view additional information (e.g., genre, ratings), and initiate playback.
The experience combines innovations from both Fire TV, with its extensive catalog, search, and recommendation features, and the conversational AI that drives Alexa.
"We wanted to layer on these new innovations that have been developed around Alexa Conversations specifically," Laslau said. "We've given customers a broad range of natural ways to interact with Alexa, without being limited to a single utterance."
Since previewing WSIW last fall and beginning beta testing with customers, teams have worked to refine the customer experience.
“We used beta testing to closely observe how customers interacted with WSIW and to validate our core hypotheses on what works for customers,” explained Dutta Sharma. “A prime hypothesis we validated was viewers naturally gravitate to using natural language, with variability in inputs, while interacting with Alexa.”
For example, to customize recommendations, the team found that initially customers might say, “I am in the mood for something funny.” They would then follow that by asking, “Which of these are on Prime Video?” or simply stating, “free to me.” So, the team worked to ensure WSIW could support those types of interactions with Alexa. It proved to be a feature customers responded to enthusiastically.
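To make the multi-turn pattern concrete, here is a minimal sketch of how follow-up utterances could narrow a list of recommendations. It is an illustration only, not how WSIW is implemented: the catalog entries, the keyword matching, and the names `apply_turn` and `matching_titles` are all assumptions made for this example, and the production system relies on learned language-understanding models rather than keyword rules.

```python
# Hypothetical mini-catalog; titles and fields are invented for illustration.
CATALOG = [
    {"title": "Show A", "genre": "comedy", "provider": "Prime Video", "free": True},
    {"title": "Show B", "genre": "comedy", "provider": "Prime Video", "free": False},
    {"title": "Show C", "genre": "drama", "provider": "Prime Video", "free": True},
    {"title": "Show D", "genre": "comedy", "provider": "Other Service", "free": True},
]


def apply_turn(filters, utterance):
    """Return a new filter dict with any constraints from this turn added.

    Keyword matching is a stand-in for real language understanding.
    """
    text = utterance.lower()
    updated = dict(filters)  # keep constraints from earlier turns
    if "funny" in text:
        updated["genre"] = "comedy"
    if "prime video" in text:
        updated["provider"] = "Prime Video"
    if "free to me" in text:
        updated["free"] = True
    return updated


def matching_titles(catalog, filters):
    """List the titles that satisfy every accumulated constraint."""
    return [
        item["title"]
        for item in catalog
        if all(item.get(key) == value for key, value in filters.items())
    ]


filters = {}
for turn in [
    "I am in the mood for something funny",
    "Which of these are on Prime Video?",
    "free to me",
]:
    filters = apply_turn(filters, turn)
    print(turn, "->", matching_titles(CATALOG, filters))
```

Each turn adds to the constraints gathered so far instead of starting a new search, which is the behavior the beta testers appeared to expect.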
The team also responded to early feedback by introducing autoplay trailers more gradually and by replacing an intro video on how to use the WSIW feature with on-screen contextual hints.
“Another insight was that customers wanted to be able to view only the titles they were already entitled to — versus those for rent or purchase — so we added a permanent free-to-me filter. Customers routinely call that out as a highlight,” Dutta Sharma said.
Building AI for the entertainment space
"But bringing natural conversation to the entertainment domain has its own set of unique challenges," Laslau explained. Maybe a show, like The Boys or The Expanse, is ambiguously named, or a movie starts to trend that wasn't in the catalog a week or two ago. Optimizing the feature required combining core advances in AI around natural, multi-turn conversations with a fast-changing catalog.
"We are making sure those natural conversations are intelligent enough to reflect the very latest of what's happening in entertainment," he said.
The team also worked to ensure a mix of personalization based on your preferences (those British detective series you always gravitate toward) and something new that you might not have seen otherwise.
They did this by customizing Fire TV's existing recommender technology, mixing personalization with popular titles and randomizing subsets of these lists so that viewers encounter fresh ideas each time they turn on the TV.
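The general idea of mixing personalized and popular titles and randomizing subsets can be sketched in a few lines. This is a simplified illustration, not Fire TV's recommender: the function name `blend_recommendations`, the 50/50 split, and the sample titles are assumptions chosen for the example.

```python
import random


def blend_recommendations(personalized, popular, k=6, personal_share=0.5, seed=None):
    """Pick k titles: a random subset of personalized picks plus popular ones.

    personal_share is a hypothetical knob for how much of the row is
    personalized; the rest is filled from the popular list.
    """
    rng = random.Random(seed)

    # Randomly sample from the personalized list so the row varies per session.
    n_personal = min(len(personalized), round(k * personal_share))
    picks = rng.sample(personalized, n_personal)

    # Fill the remaining slots from popular titles, skipping duplicates.
    remaining = [title for title in popular if title not in picks]
    n_popular = min(len(remaining), k - n_personal)
    picks += rng.sample(remaining, n_popular)

    # Shuffle so personalized and popular titles are interleaved.
    rng.shuffle(picks)
    return picks


personalized = [f"Detective Series {i}" for i in range(1, 9)]
popular = [f"Trending Title {i}" for i in range(1, 9)]
print(blend_recommendations(personalized, popular))
```

Because a fresh random subset is drawn on every call (unless a seed is fixed), two sessions with the same underlying lists will usually show different rows.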
A flywheel effect on innovation
The deep-learning-based Alexa Conversations makes it far simpler to develop the thousands of potential dialogue turns that a “What Should I Watch?” utterance might generate.
Alexa Conversations comprises three models: entity recognition (identifying Tom Cruise as an actor, for example), action prediction (utilizing the “movie searching” API to find movies), and argument filling (indicating the movies to be those with Tom Cruise).
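The division of labor among the three models can be illustrated with a toy pipeline. The sketch below replaces each learned model with a simple rule so the flow is visible; the lookup table, the API names `search_movies` and `show_recommendations`, and every function name are hypothetical and do not reflect the actual Alexa Conversations interfaces.

```python
from dataclasses import dataclass

# Hypothetical lookup table standing in for a learned entity recognizer.
KNOWN_ENTITIES = {"tom cruise": "actor", "comedy": "genre"}


@dataclass
class ApiCall:
    name: str
    arguments: dict


def recognize_entities(utterance):
    """Step 1, entity recognition: find known spans and their types."""
    text = utterance.lower()
    return [(span, etype) for span, etype in KNOWN_ENTITIES.items() if span in text]


def predict_action(entities):
    """Step 2, action prediction: choose which (hypothetical) API to call."""
    return "search_movies" if entities else "show_recommendations"


def fill_arguments(action, entities):
    """Step 3, argument filling: bind recognized entities to API parameters."""
    if action != "search_movies":
        return {}
    return {etype: span for span, etype in entities}


def interpret(utterance):
    """Run the three steps in order and return the resulting API call."""
    entities = recognize_entities(utterance)
    action = predict_action(entities)
    return ApiCall(action, fill_arguments(action, entities))


print(interpret("Show me movies with Tom Cruise"))
print(interpret("What should I watch?"))
```

In the real system each step is a deep-learning model trained on dialogue data, which is what lets it handle phrasing that no hand-written rule would anticipate.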
“Alexa Conversations is designed to reduce the burden on developers, generating variations of dialogue automatically. The team has added several new features recently,” said Jiun-Yu Kao, an applied scientist within the Alexa AI Natural Understanding organization.
The new features include conversational Q&A, which allows customers to ask broad questions about the recommended titles, such as which movies won an Oscar; a context reset function that allows a user to “start over” with a blank slate; and visual context, which enhances Alexa’s ability to respond correctly when a viewer says something like, “play the one on the left,” referencing what’s on the screen instead of naming the movie title.
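Two of these ideas, visual context and context reset, lend themselves to a small sketch. The code below assumes a hypothetical `Session` object that tracks the titles currently on screen (left to right) and the active filters; the keyword matching is a placeholder for the learned models, and none of these names come from the actual Alexa or Fire TV software.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical per-conversation state for this illustration."""
    on_screen: list = field(default_factory=list)  # titles, left to right
    filters: dict = field(default_factory=dict)    # constraints from earlier turns

    def reset(self):
        """Context reset: drop accumulated constraints and start over."""
        self.filters.clear()

    def resolve(self, phrase):
        """Visual context: map a positional phrase to a title on screen.

        Returns None when nothing is displayed or no position is mentioned.
        """
        if not self.on_screen:
            return None
        text = phrase.lower()
        if "left" in text:
            return self.on_screen[0]
        if "right" in text:
            return self.on_screen[-1]
        if "middle" in text:
            return self.on_screen[len(self.on_screen) // 2]
        return None


session = Session(on_screen=["Title A", "Title B", "Title C"], filters={"free": True})
print(session.resolve("play the one on the left"))
session.reset()
print(session.filters)
```

The point of the sketch is that the spoken phrase alone is ambiguous; it only resolves to a title once it is combined with what the screen is showing at that moment.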
“The WSIW experience is the first to launch with enhanced understanding of screen context,” Kao said. “It is also the first to combine all above-listed features for improved customer experience.”
Alexa and Fire TV science, engineering, and product teams collaborated to build the different components of the new feature.
“What’s super cool is that we are tapping into so many different services in parts of Alexa and Fire TV,” said Carlos Mattoso, a Fire TV software development engineer. “We are using a lot of the domain knowledge and capabilities that Fire TV has built around the recommendation space, for instance. But where we do that, we’re also trying to raise the bar: How can we use the information we’re gleaning from usage of What Should I Watch back into the system so that we have this flywheel that continuously improves?”
Mattoso noted that work with the Alexa team enabled not just suggestions but new in-context commands for Fire TV playback and volume changes, for example, that weren’t previously available.
“For instance, when we were building the first beta, we did not really have a way of initiating playback of a title from within an Alexa skill for Fire TV,” he explained. “So, we worked together with the Alexa Video team to extend the existing capability and then add support for that feature so that we could use it on WSIW.”
Teams continue to work on making What Should I Watch faster and smarter.
One possibility is for users to explicitly guide Alexa by saying something like, "I'm a big sci-fi fan," or "I don't like horror movies." This type of interaction represents an opportunity for Alexa to adapt to customer engagement preferences, with some preferring to guide the service directly, and others wanting to lean back and take in recommendations.
As collaboration on the experience continues, both Alexa and Fire TV are becoming more capable. That could have a broader effect, particularly for the Alexa skill development community.
“We’re really trying to raise the bar,” Mattoso said, “and the capabilities we develop may eventually benefit third-party skill developers. Those might include improved long-term memory, better context resetting, and better visual context understanding.”