Model assesses the validity of tips offered in product reviews
Method would enable customers to evaluate supporting evidence for tip reliability.
Product reviews are a popular and important feature of e-commerce websites, which many customers rely on in their shopping journeys. The reviews often contain personal experiences and opinions that can help other customers make more informed purchasing decisions. Reviews also often contain practical and non-obvious advice for making better, easier, and safer use of products. For example, “Charge for 8 hours before using this camera for the first time.” Such recommendations are referred to as “product tips”.
To save customers from having to read through tens or even hundreds of reviews to locate helpful tips, researchers have introduced automatic methods for extracting tips from reviews. These tips can be presented, for example, in dedicated widgets on the sites. However, as tips are typically non-obvious recommendations, customers may rightfully question their validity and look for support or opposition from fellow customers.
In a paper that we presented at this year’s meeting of the ACM Special Interest Group on Information Retrieval (SIGIR), cowritten with Miriam Farber (who was at Amazon when the work was done) and David Carmel, we present a method for determining the degree to which a tip is supported or opposed by all of a product’s reviews.
At the heart of our method is a model that determines the level of support, contradiction, or neutrality between a tip and a sentence from another review. This is a challenging task, as support and contradiction between two natural-language sentences come in many forms. For example, the recommendation “Charge for 8 hours before using this camera for the first time” is supported by the sentence “it’s recommended to charge before usage” but contradicted by the statement “The battery comes pre-charged”.
In an experiment using product tips from multiple product categories, we retrieved for each tip up to five review sentences that our model identified as supporting the tip and up to five sentences identified as contradicting it. At coverage of 50% — that is, when we restrict ourselves to the 50% of tip-sentence pairs for which our model makes its most confident predictions — our method achieves precisions of 72% and 58% in detecting support and contradiction relations, respectively.
As our task is precision oriented, we also consider coverage of 25% and find that precision improves to 79% and 67% for support and contradiction relations, respectively. These results reflect 8% and 29% relative improvements over off-the-shelf models, attesting to the challenging nature of this task. We further found that at least half of the extracted tips have supporting reviews, and at least a third have contradicting reviews.
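The coverage-based evaluation above can be sketched as follows. This is an illustrative reconstruction, not the paper’s evaluation code: predictions are ranked by model confidence, only the most confident fraction is kept, and precision is measured on that subset.

```python
# Sketch of precision at a given coverage level (illustrative, not the
# paper's implementation).
def precision_at_coverage(predictions, coverage=0.5):
    """predictions: list of (confidence, is_correct) pairs."""
    # Rank all tip-sentence predictions by model confidence.
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    # Keep only the most confident fraction of predictions.
    kept = ranked[: max(1, int(len(ranked) * coverage))]
    # Precision on the retained subset.
    return sum(1 for _, correct in kept if correct) / len(kept)
```

Restricting to lower coverage trades recall for precision, which suits a precision-oriented setting like this one.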
Our new method can potentially be integrated into widgets that offer tips and also provide their support levels and links to related reviews, so customers can assess their validity.
Tips’ support-level estimation
Our method operates in three steps, as shown in the following example:
Step 1: Given a product tip that was extracted from a customer review, our goal is to measure the amount of support and contradiction the tip receives from all reviews of that product. However, some products have thousands of reviews, so our algorithm retrieves the few hundred sentences with the greatest similarity to the tip. We estimate similarity using nearest-neighbor search over sentence embeddings. This is done in order to expedite the next steps, which rely on more computation-intensive models.
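The retrieval step can be sketched as below. The `embed` function here is a toy stand-in for a real sentence encoder (the actual system would use learned sentence embeddings and an approximate-nearest-neighbor index); the sketch only illustrates ranking review sentences by cosine similarity to the tip.

```python
# Illustrative sketch of similarity-based retrieval; embed() is a toy
# stand-in for a trained sentence encoder.
import numpy as np

def embed(sentence: str, dim: int = 64) -> np.ndarray:
    # Toy deterministic "embedding": hash each token into a bucket.
    vec = np.zeros(dim)
    for token in sentence.lower().split():
        vec[sum(ord(c) for c in token) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def retrieve_similar(tip: str, review_sentences: list, k: int = 200) -> list:
    # Rank all review sentences by cosine similarity to the tip and keep
    # the top k (a few hundred in the setting described above).
    tip_vec = embed(tip)
    scored = [(float(embed(s) @ tip_vec), s) for s in review_sentences]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:k]]
```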
Step 2: Using a sentence-to-sentence support-level classifier, we compute a support score and a contradiction score for the tip and each of the related sentences. The support-level classifier is a neural model that was trained on pairs of sentences that were manually annotated as supportive, contradictory, or neutral relative to each other. The classifier outputs three scores — for support, contradiction, and neutrality — that sum to 1.
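The classifier’s output contract can be illustrated as follows. The real model is a trained neural network; `score_logits` here is a hypothetical stand-in for its raw outputs, and the sketch only shows how a softmax turns them into three scores that sum to 1.

```python
# Sketch of the support-level classifier's output contract; score_logits
# is a hypothetical placeholder for a trained model's raw scores.
import math

LABELS = ("support", "contradiction", "neutral")

def softmax(logits):
    # Numerically stable softmax over the three label logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(tip, sentence, score_logits):
    # Returns {"support": p1, "contradiction": p2, "neutral": p3},
    # with p1 + p2 + p3 = 1.
    probs = softmax(score_logits(tip, sentence))
    return dict(zip(LABELS, probs))
```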
Step 3: Finally, all the support scores and contradiction scores are aggregated over all related sentences, providing a global support score and a global contradiction score, which reflect the support level of all reviews relative to the given tip.
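One plausible aggregation scheme is sketched below (the paper’s actual aggregation may differ): count the related sentences whose per-label score crosses a threshold, and normalize by the number of related sentences to obtain global support and contradiction scores.

```python
# Sketch of score aggregation over related sentences (illustrative; the
# threshold and normalization are assumptions, not the paper's method).
def aggregate(pair_scores, threshold=0.5):
    """pair_scores: list of (support_score, contradiction_score) per sentence."""
    n = len(pair_scores)
    if n == 0:
        return 0.0, 0.0
    # Fraction of related sentences that confidently support the tip...
    support = sum(1 for s, _ in pair_scores if s >= threshold) / n
    # ...and the fraction that confidently contradict it.
    contradiction = sum(1 for _, c in pair_scores if c >= threshold) / n
    return support, contradiction
```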
With the ability to estimate a tip’s support and contradiction scores, we define the following taxonomy to characterize a tip:
- Highly supported: Tip with many supporting and almost no contradicting sentences.
- Highly contradicted: Tip with many contradicting and almost no supporting sentences.
- Controversial: Tip with many supporting and many contradicting sentences.
- Anecdotal: Tip with almost no supporting and no contradicting sentences.
To examine the distribution of tips according to this taxonomy, we split the support and contradiction scores into three ranges: low, medium, and high. The tips are then assigned to the cells they belong to, creating three-by-three heat maps.
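The binning described above can be sketched as follows; the bin edges are illustrative assumptions, not the thresholds used in the paper.

```python
# Sketch: bin global support/contradiction scores into low/medium/high
# and tally tips into a 3x3 grid (bin edges here are assumed values).
def bin_score(score, edges=(0.33, 0.66)):
    # 0 = low, 1 = medium, 2 = high.
    return 0 if score < edges[0] else (1 if score < edges[1] else 2)

def heatmap(tip_scores):
    """tip_scores: list of (global_support, global_contradiction) per tip."""
    grid = [[0] * 3 for _ in range(3)]
    for support, contradiction in tip_scores:
        grid[bin_score(support)][bin_score(contradiction)] += 1
    return grid
```

In this grid, the high-support/low-contradiction cell holds the highly supported tips, the opposite corner the highly contradicted ones, and the high/high region the controversial ones.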
As examples, the figure below presents the heat maps (a) across all categories and (b) for the apparel category. We found that controversial tips are very common in the apparel category (43% of tips). These tips are often size related, e.g., "Order a size bigger than what you would normally wear", while other reviews suggest, "This is true to size and fits perfectly."
Product reviews, and product tips in particular, are important and helpful to customers. We believe that by presenting the support level per tip and providing links to supporting or opposing reviews, we can help customers estimate tips’ validity and decide how much credence to give each tip.