Don’t Lie To Me: Integrating Client-Side Web Scraping And Review Behavior Analysis To Detect Fake Reviews

Levinson, Benjamin Joseph

Abstract

User reviews are widespread across the Internet as an indicator of product quality. However, review systems can be vulnerable to attack: malicious parties can manipulate the ratings of items by soliciting fake reviews in exchange for small payments. Sellers can use these fake reviews to hurt competitors or to promote their own products, artificially decreasing or increasing product ratings. Previous work that used crowdsourcing website postings to find fake reviews produced a trained model that detects fraudulent reviews using the time and rating features of a product's reviews (Kaghazgaran et al., "TOMCAT", ICWSM’19); that work also provides a web-based demo for validating the reliability of a product's reviews. We encapsulate this model in a browser application that, when activated on an Amazon product page, crawls the reviews associated with that product and issues a review manipulation score to the user. We also store the crawled reviews with the intention of building, over time, a dataset that can support further study of review manipulation and of ways to improve review systems. Finally, we analyzed the behavioral features of reviews using the dataset provided by the same previous work (Kaghazgaran et al., "TOMCAT", ICWSM’19), which contains reviews from a set of random products and from products known to be targets of manipulation. This analysis uncovered additional possible signals of review manipulation on a product: the average number of helpfulness votes its reviews receive, the average title length of its reviews, and the average length of its reviews. Of these, the average review length was the feature most correlated with review manipulation in our dataset. We expect that future work adding these features will increase the overall effectiveness of the detection model.
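The per-product behavioral features described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's analysis code. It assumes each crawled review is reduced to a hypothetical `Review` record holding a title, body text, and helpfulness-vote count, and it measures lengths in whitespace-separated words (the abstract does not say whether words or characters were counted). To rank features it uses the point-biserial correlation, which is Pearson's r computed against a 0/1 "manipulated" label. This is one plausible choice, not necessarily the measure used in the study.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Review:
    # Hypothetical record; field names are illustrative assumptions.
    title: str
    body: str
    helpful_votes: int


def product_features(reviews):
    """Average the three behavioral features over one product's reviews.

    Lengths are counted in whitespace-separated words (an assumption).
    """
    return {
        "avg_helpful_votes": mean(r.helpful_votes for r in reviews),
        "avg_title_length": mean(len(r.title.split()) for r in reviews),
        "avg_review_length": mean(len(r.body.split()) for r in reviews),
    }


def point_biserial(values, labels):
    """Pearson correlation between a numeric feature and 0/1 labels.

    With binary labels this equals the point-biserial correlation.
    """
    mx, my = mean(values), mean(labels)
    cov = sum((x - mx) * (y - my) for x, y in zip(values, labels)) / len(values)
    sx, sy = pstdev(values), pstdev(labels)
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (sx * sy)
```

In use, one would call `product_features` once per product, collect each feature into a list aligned with that product's manipulation label, and compare the resulting `point_biserial` values across features; the feature with the largest absolute value is the one most correlated with manipulation in that dataset.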

URI

https://hdl.handle.net/1969.1/175409

Subject

textual analysis
human computer interaction
machine learning
software design

Collections

Undergraduate Research Scholars Capstone (2006–present)

Citation

Levinson, Benjamin Joseph (2019). Don’t Lie To Me: Integrating Client-Side Web Scraping And Review Behavior Analysis To Detect Fake Reviews. Undergraduate Research Scholars Program. Available electronically from https://hdl.handle.net/1969.1/175409.