RecSys 2022: “Recommenders are ubiquitous”

Adapting natural-language-processing techniques to recommendation systems and algorithmic fairness are two central topics at this year’s conference.

The ACM Conference on Recommender Systems (RecSys), the leading conference in the field of recommendation systems, takes place this week, and two Amazon scientists — Max Harper, a senior applied scientist, and Vanessa Murdock, a senior applied-science manager, both in the Alexa Shopping organization — are among the conference’s three general chairs, along with Jennifer Golbeck of the University of Maryland. Harper and Murdock spoke to Amazon Science about the conference program and what it indicates about the state of research on recommender systems.

Amazon Science: Can you tell us a little bit about RecSys?

Max Harper: RecSys has been around for a long time — since the ’90s — and it's a community that's interested in both algorithms and applications of machine learning techniques that model the behavior of users. In particular, RecSys focuses on domains where the definition of the best thing for the model to return depends on which person you ask. So it's personalized.

Senior applied scientist Max Harper (left) and senior applied-science manager Vanessa Murdock, both of the Alexa Shopping organization, are two of the three general chairs at this year's RecSys.

The classical applications include movies, music, and books, which are obviously taste-driven domains. But these days, it's expanded into tons of areas, including travel, fashion, and job finding.

In addition to algorithms and applications, I'd say about 20% of the field is interested in people, how people perceive recommendations, how to design user interfaces that work well and how to shape the user experience in a variety of ways.

There's also a whole host of machine learning issues that come along with it, including how to measure performance, how to scale the algorithms, and how to preserve users’ privacy. And finally, an increasingly important issue is the societal impact of these algorithms.

Vanessa Murdock: I sit between the fields of search and recommendation, and they're somewhat different in that recommendations can be made even if the user isn't asking for them, whereas search is usually in response to a request.

Recommenders are ubiquitous — they’re in many of the apps and tools we use every day. For example, if you're looking for a coffee in Seattle, and you look at a map, the resolution of the map that you see on the first view will show you some points of interest, and then, if you zoom in, you'll see more. You can view those first points of interest as recommendations, but it's not what you usually think of as a recommender.

All of this research on deciding what people would like to engage with has had significant influence on online commerce, ads, and sponsored placements. Your Instagram feed and your TikTok feed are both recommendations. Your Twitter feed is a set of recommended tweets. It's central to our experience with the digital world in everything that we do.

AS: In 2017, when IEEE Internet Computing celebrated its 20th anniversary, it gave its test-of-time award to Amazon’s 2003 paper on item-to-item collaborative filtering. How has the field evolved since that paper?

MH: The concept of collaborative filtering is still very, very relevant. These days, matrix factorization techniques are much more common; you use them to complete an item-customer matrix. But it's essentially the same class of techniques.
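
To make the idea concrete, here is a small Python sketch of item-to-item collaborative filtering in the spirit of the 2003 paper: each item is represented by the set of customers who engaged with it, similar items are precomputed with cosine similarity, and a customer's unseen items are scored by their similarity to items the customer already has. The toy data and all function names are hypothetical, and this is a simplified illustration, not Amazon's production algorithm.

```python
import math
from collections import defaultdict

# Hypothetical toy data: customer -> set of items they bought or viewed.
interactions = {
    "alice": {"book_a", "book_b"},
    "bob": {"book_a", "book_b", "book_c"},
    "carol": {"book_b", "book_c"},
    "dave": {"book_d"},
}


def item_vectors(data):
    """Invert customer -> items into item -> customers (a binary item vector)."""
    vecs = defaultdict(set)
    for customer, items in data.items():
        for item in items:
            vecs[item].add(customer)
    return vecs


def cosine(a, b):
    """Cosine similarity of two binary vectors stored as sets of customers."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def similar_items_table(data, k=2):
    """Offline step: for each item, keep its k most similar other items."""
    vecs = item_vectors(data)
    table = {}
    for i in vecs:
        scored = [(cosine(vecs[i], vecs[j]), j) for j in vecs if j != i]
        scored = [s for s in scored if s[0] > 0]
        scored.sort(key=lambda s: (-s[0], s[1]))
        table[i] = scored[:k]
    return table


def recommend(data, customer, k=2):
    """Online step: score unseen items by summed similarity to the customer's items."""
    table = similar_items_table(data, k)
    seen = data[customer]
    scores = defaultdict(float)
    for item in seen:
        for sim, other in table[item]:
            if other not in seen:
                scores[other] += sim
    return sorted(scores, key=lambda i: (-scores[i], i))


print(recommend(interactions, "alice"))  # ['book_c']
```

Splitting the work into an offline similarity table and a cheap online lookup is what makes this family of methods scale; the expensive pairwise comparison never happens at request time.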

There's a paper at this year's RecSys, “Revisiting the performance of iALS on item recommendation benchmarks”, that is part of the RecSys reproducibility track, which is kind of a unique thing at RecSys. This paper has to do with matrix factorization, which the field thinks of as an old-fashioned technique. And the point the authors make in this paper is that a well-tuned matrix factorization algorithm can hold its own against a whole range of more modern deep-learning algorithms.
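
For readers unfamiliar with iALS (implicit alternating least squares), the following is a minimal dense sketch of the classic implicit-feedback formulation: interaction counts become binary preferences with confidence weights, and user and item factors are solved alternately by weighted ridge regression. It assumes NumPy, and the hyperparameters (factors, alpha, reg, iters) are illustrative guesses, not values from the paper, whose central point is precisely that such settings need careful tuning. A real implementation would exploit sparsity rather than loop over dense rows.

```python
import numpy as np


def ials(R, factors=2, alpha=10.0, reg=0.1, iters=20, seed=0):
    """Minimal dense implicit-ALS sketch (illustrative, not tuned).

    R: (n_users, n_items) array of non-negative interaction counts.
    Returns user factors X and item factors Y; predicted scores are X @ Y.T.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    X = rng.normal(scale=0.1, size=(n_users, factors))
    Y = rng.normal(scale=0.1, size=(n_items, factors))
    P = (R > 0).astype(float)  # binary preference: did the user interact at all?
    C = 1.0 + alpha * R        # confidence grows with the number of interactions
    ridge = reg * np.eye(factors)

    def solve_rows(fixed, conf, pref):
        # Weighted ridge regression per row: (F^T C_r F + reg*I) w = F^T C_r p_r
        out = np.empty((conf.shape[0], factors))
        for r in range(conf.shape[0]):
            weighted = conf[r][:, None] * fixed
            A = fixed.T @ weighted + ridge
            b = fixed.T @ (conf[r] * pref[r])
            out[r] = np.linalg.solve(A, b)
        return out

    for _ in range(iters):
        X = solve_rows(Y, C, P)      # fix item factors, solve for users
        Y = solve_rows(X, C.T, P.T)  # fix user factors, solve for items
    return X, Y


# Hypothetical toy data with two clear taste groups.
R = np.array([
    [3, 1, 0, 0],
    [2, 2, 0, 0],
    [0, 0, 1, 4],
    [0, 0, 2, 1],
], dtype=float)
X, Y = ials(R)
scores = X @ Y.T  # completed user-item matrix; higher means more likely to engage
```

Every cell of the matrix, including the zeros, enters the objective with at least weight 1, which is what distinguishes the implicit-feedback setting from rating prediction on observed entries only.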

VM: The reproducibility track at RecSys is especially good because a lot of reported research consists of incremental gains over many years. In every paper, the numbers always go up, and the results are always significant, but the improvements don’t always add up over time. Having a reproducibility track really sets RecSys apart. It means that as we make gains in some area, we can look back and ask, “Is this really true?”

In my own work, I've found that when I've tried to reproduce other people's work, the results depend on the collection, the queries, or the system parameters. That's not what a scientific advance should be. So I think that's a very important track, and more conferences should add it.

Sequential recommendation

AS: What are some of the newer ideas in the field that you find most intriguing?

MH: If I were to pick the number one thing that seems to have taken over the conference, it would be the application of techniques from natural-language processing to the field of recommender systems. In particular, Transformers and large language models like BERT have been adapted to the context of recommendations in an interesting way.

Essentially, these language models learn the semantics of sentences by modeling which words go with which other words, and you can take an analogous approach in recommendations by looking not at sentences of words but at sequences of items (for example, products at Amazon or movies at Netflix that users engage with). By using training techniques similar to those used in NLP, researchers can solve problems like next-item prediction: given the three products a user has looked at most recently, which product are they most likely to look at next?
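
The data framing behind next-item prediction can be sketched in a few lines of Python. The sessions below are hypothetical, and the predictor is deliberately a simple first-order transition-count baseline, not a Transformer; a Transformer-based model would consume the same (context, next item) examples but learn from the whole context with self-attention.

```python
from collections import Counter, defaultdict

# Hypothetical browsing sessions, each an ordered list of item IDs.
sessions = [
    ["tent", "sleeping_bag", "stove"],
    ["tent", "sleeping_bag", "lantern"],
    ["tent", "sleeping_bag", "stove"],
    ["stove", "fuel"],
]


def next_item_examples(seqs, context=3):
    """Turn sequences into (recent context, next item) training pairs,
    the analogue of predicting the next word from the words before it."""
    examples = []
    for seq in seqs:
        for t in range(1, len(seq)):
            examples.append((tuple(seq[max(0, t - context):t]), seq[t]))
    return examples


def fit_transitions(seqs):
    """First-order baseline: count how often item b directly follows item a."""
    counts = defaultdict(Counter)
    for seq in seqs:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return counts


def predict_next(counts, history, k=1):
    """Rank candidates by how often they followed the most recent item."""
    return [item for item, _ in counts[history[-1]].most_common(k)]


counts = fit_transitions(sessions)
print(predict_next(counts, ["tent", "sleeping_bag"]))  # ['stove']
```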

That concept is called sequential recommendation, and it is everywhere at RecSys this year.

AS: Does sequential recommendation use the same kind of masked training that language models do?

MH: Yeah, it does. You take a sequence of user behavior, hide one of the items the user actually interacted with, and try to predict that item from the rest of the sequence.
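
A minimal sketch of how such masked training examples might be built, assuming a hypothetical mask token and masking rate (neither comes from the interview):

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token


def mask_items(seq, mask_prob=0.2, rng=None):
    """Cloze-style masking for one interaction sequence.

    Returns (masked_sequence, targets), where targets maps each hidden
    position to the item the model must recover. At least one item is
    always hidden so every sequence yields a training signal.
    """
    if not seq:
        raise ValueError("sequence must be non-empty")
    rng = rng or random.Random()
    positions = [i for i in range(len(seq)) if rng.random() < mask_prob]
    if not positions:
        positions = [rng.randrange(len(seq))]
    masked = list(seq)
    targets = {}
    for i in positions:
        targets[i] = seq[i]
        masked[i] = MASK
    return masked, targets


masked, targets = mask_items(["tent", "sleeping_bag", "stove", "lantern"],
                             rng=random.Random(7))
```

At prediction time, BERT-style sequential recommenders such as BERT4Rec typically append a mask token to the end of the user's history, which turns the same model into a next-item predictor.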

AS: How is that approach adapted to the new setting?

MH: Two examples I can think of: One is that there aren’t necessarily natural boundaries in a sequence of user interactions, the way there are between sentences, so you might be tempted to look at the entire sequence of interactions in order to predict the next one. Researchers are looking at the degree to which recency matters in next-item recommendation.
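
Two simple ways to encode recency in a sequence model's input are a hard window over the most recent interactions and a soft exponential decay. The sketch below shows both; max_len and half_life are hypothetical knobs of the kind researchers tune, not settings from any particular paper.

```python
def recent_window(history, max_len):
    """Hard recency cutoff: keep only the max_len most recent interactions."""
    if max_len <= 0:
        return []  # guard: history[-0:] would return the whole list
    return history[-max_len:]


def recency_weights(n, half_life):
    """Soft alternative: exponential-decay weights for n items, oldest first.

    The newest item gets weight 1.0; an item half_life steps older gets 0.5.
    """
    return [0.5 ** ((n - 1 - i) / half_life) for i in range(n)]


print(recent_window(list(range(10)), 3))  # [7, 8, 9]
print(recency_weights(3, half_life=1))    # [0.25, 0.5, 1.0]
```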

Another one is that sentences are more predictable: if you're missing a word in a sentence, it's more likely that a human could guess what that word is. With a sequence of item clicks or ratings or purchases, there might be a lot of noise with certain items.

Yusan Lin, who joined Amazon Fashion this year as an applied scientist, is a coauthor on a RecSys paper called “Denoising self-attentive sequential recommendation”, and it's about that concept: how do you find those items that are potentially harmful to the performance of the system and essentially hide them from the training so that the system learns more of a clean language, if you will, of what people are interested in?
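
The paper's own technique is not described here, so the sketch below is not it. It only illustrates the general idea of hiding poorly fitting items before training, using a hypothetical heuristic: drop any item whose average similarity to the rest of its sequence falls below a threshold. The similarity function, category data, and threshold are all assumptions.

```python
def drop_noisy_items(seq, similarity, threshold=0.1):
    """Heuristic denoising: drop items that fit poorly with the rest of the sequence.

    similarity(a, b) -> float is assumed to be supplied by the caller (for
    example, from co-occurrence statistics); threshold is a hypothetical knob.
    """
    if len(seq) < 2:
        return list(seq)
    kept = []
    for i, item in enumerate(seq):
        sims = [similarity(item, other) for j, other in enumerate(seq) if j != i]
        if sum(sims) / len(sims) >= threshold:
            kept.append(item)
    return kept


# Hypothetical similarity: 1.0 if two items share a category, else 0.0.
category = {"tent": "camping", "stove": "camping",
            "lantern": "camping", "lipstick": "beauty"}


def same_category(a, b):
    return 1.0 if category[a] == category[b] else 0.0


print(drop_noisy_items(["tent", "lipstick", "stove", "lantern"], same_category))
# ['tent', 'stove', 'lantern']
```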

VM: Sometimes the sequence of interactions is way too predictable. In e-commerce, if you think about reordering, where, say, you order the same brand of coffee absolutely every week, there's not really a benefit to recommending that coffee to you, even though it's very accurate. So there's some subtlety in there when we're talking about predicting the next recommendation — the next good recommendation — from a sequence of user interactions.
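
One simple way to express that subtlety in code is to filter habitual reorders out of the candidate list before ranking. The threshold and data are hypothetical; a production system might instead surface such items separately.

```python
from collections import Counter


def filter_routine_reorders(candidates, order_history, min_repeats=3):
    """Drop candidates the user already reorders habitually.

    An item bought at least min_repeats times (a hypothetical threshold) is
    treated as a routine purchase: predicting it is accurate but adds little.
    """
    counts = Counter(order_history)
    return [item for item in candidates if counts[item] < min_repeats]


history = ["coffee"] * 5 + ["filters"]
print(filter_routine_reorders(["coffee", "grinder", "filters"], history))
# ['grinder', 'filters']
```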

Fairness

AS: Vanessa, are there any other recent research trends that you find particularly interesting?

VM: In the last, say, 10 years, the attention that researchers have been paying to bias and fairness is tremendously important. As we get better at predicting what people need, and as we become more embedded in everyday life, the effort to make sure that we're not introducing unintended biases is very, very important. It's a hard problem, and I'm very happy to see attention to that.

AS: What kind of approaches do people take to that problem?

VM: The first thing is that the researcher actually has to be aware of the problem. A lot of times the data is very large, and the items you are trying to predict are a very small subset. Suppose that you have a group of people who have blue hair, and they're very interested in products for blue hair. You can imagine they are a tiny, tiny proportion of your data. If your recommender is based on what most people like, you're never going to offer them anything for their blue hair.
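
A toy Python illustration of that failure mode, with made-up data: a non-personalized most-popular recommender ranks a niche blue-hair product below mainstream items, so even the two users who engaged with it are shown only the mainstream list.

```python
from collections import Counter


def most_popular(interactions, k):
    """Non-personalized baseline: the k items with the most engagement overall."""
    counts = Counter(item for items in interactions.values() for item in items)
    return [item for item, _ in counts.most_common(k)]


# Hypothetical data: 98 mainstream shoppers and 2 blue-hair shoppers.
interactions = {f"user{i}": ["shampoo", "conditioner"] for i in range(98)}
interactions["user98"] = ["blue_hair_dye", "shampoo"]
interactions["user99"] = ["blue_hair_dye"]

print(most_popular(interactions, 2))  # ['shampoo', 'conditioner']
```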

It's a class of problems called unknown unknowns, where there’s a small positive class, but you don't know how big it is, and you don't have a way to find that in your data. You know there are some people with blue hair because they've interacted with blue-hair things, but you don't know how many of your customers actually have blue hair.

Some approaches for that are to sample in a clever way or to create synthetic data or to do domain adaptation, where you have a large amount of known data from some other domain that you can adapt to this new area. For instance, you have a lot of data about people who have green hair, and you can adapt that to people with blue hair.
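
Sampling cleverly is often paired with, or replaced by, reweighting. The sketch below (a related technique, not one Murdock names) gives each training example a weight inversely proportional to its group's size, so a small group contributes as much total weight as a large one. Note that it assumes group labels are known, which is exactly what the unknown-unknowns setting often lacks.

```python
from collections import Counter


def balanced_weights(groups):
    """Per-example weights so every group contributes the same total weight.

    groups holds one (hypothetical) segment label per training example.
    Uses total / (n_groups * group_count), the same formula scikit-learn
    uses for class_weight="balanced".
    """
    counts = Counter(groups)
    total, n_groups = len(groups), len(counts)
    return [total / (n_groups * counts[g]) for g in groups]


groups = ["other"] * 8 + ["blue"] * 2
weights = balanced_weights(groups)
print(weights[0], weights[-1])  # 0.625 2.5
```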

Another is to look at whether the data itself has a skew in the features. Maybe the features are accidentally correlated, or maybe something is not represented well because the feature space for the blue-hair items is too small. Those are all things to look at.
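
A minimal check for the kind of skew described above is to compare how often each feature is actually populated in each group of items. The catalog records, field names, and grouping key are hypothetical.

```python
def feature_coverage(items, features, group_key):
    """Share of items in each group that have a non-missing value per feature.

    items are hypothetical catalog records (dicts); group_key names the field
    used to split them, for example a product category.
    """
    groups = {}
    for item in items:
        groups.setdefault(item[group_key], []).append(item)
    return {
        group: {f: sum(m.get(f) is not None for m in members) / len(members)
                for f in features}
        for group, members in groups.items()
    }


catalog = [
    {"category": "shampoo", "color": "clear", "brand": "A"},
    {"category": "shampoo", "color": "white", "brand": "B"},
    {"category": "blue_hair", "color": None, "brand": "C"},
    {"category": "blue_hair", "brand": "D"},
]
print(feature_coverage(catalog, ["color", "brand"], "category"))
# {'shampoo': {'color': 1.0, 'brand': 1.0}, 'blue_hair': {'color': 0.0, 'brand': 1.0}}
```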

MH: I totally agree. Fairness, privacy, and explainability are all big topics at this year's RecSys. There definitely is research into news recommendation, which is a big, important topic for the world. There's this idea of filter bubbles, a long-hypothesized problem but one that we're seeing in practice, in which personalization technology makes the range of opinions that we see online narrower and narrower. So, for instance, we'll see news that confirms our own beliefs rather than a diversity of viewpoints.

There's some work on those topics at this year's RecSys. One paper in particular I thought was quite interesting because they took a principled approach to looking at what it means for a news article to be diverse. There's a shallow, algorithmic definition of diversity that most prior research has used that may or may not line up with what humans perceive as diversity in news articles.
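
The interview does not spell out the shallow definition, but one widely used example is intra-list diversity: the average pairwise distance between the items in a single recommendation list. The sketch below computes it with cosine distance over hypothetical article vectors; it is not the metric proposed in the paper Harper describes.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; defined as 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 0.0


def intra_list_diversity(vectors):
    """Mean pairwise cosine distance among the items in one recommendation list."""
    n = len(vectors)
    if n < 2:
        return 0.0
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(cosine_distance(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)


# Hypothetical 2-d topic vectors for three articles.
articles = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(round(intra_list_diversity(articles), 3))  # 0.667
```

A metric like this only sees geometric spread in some vector space, which is why it may or may not match what readers perceive as a diverse set of viewpoints.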

So they took this more principled approach to measuring diversity using natural-language techniques. They provided a mathematical foundation for measuring the diversity of a set of articles and looked at how different algorithms actually behave on a news dataset. I think that work on fairness is really important and will be very influential in years to come.
