120 research outputs found

Resampling imbalanced data to detect fake reviews using machine learning classifiers and textual-based features

Authors: Budhi Gregorius Satia; Chiong Raymond; Wang Zuli
Publication venue: Springer Science and Business Media LLC
Publication date: 13/01/2021

Fraudulent online sellers often collude with reviewers to garner fake reviews for their products. This act undermines the trust of buyers in product reviews and potentially reduces the effectiveness of online markets. Being able to accurately detect fake reviews is, therefore, critical. In this study, we investigate several preprocessing and textual-based featuring methods along with machine learning classifiers, including single and ensemble models, to build a fake review detection system. Given the nature of product review data, where the number of fake reviews is far smaller than that of genuine reviews, we look into the results of each class in detail in addition to the overall results. We recognise from our preliminary analysis that, owing to imbalanced data, there is a high imbalance between the accuracies for different classes (e.g., 1.3% for the fake review class and 99.7% for the genuine review class), despite the overall accuracy looking promising (around 89.7%). To solve this class imbalance problem, we propose two dynamic random sampling techniques that are applicable to textual-based featuring methods. Our results indicate that both sampling techniques can improve the accuracy of the fake review class: for balanced datasets, the accuracies can be improved to a maximum of 84.5% and 75.6% for random under-sampling and over-sampling, respectively. However, the accuracies for genuine reviews decrease to 75% and 58.8% for random under-sampling and over-sampling, respectively. We also discover that, for smaller datasets, the Adaptive Boosting ensemble model outperforms the single classifiers, whereas, for larger datasets, the performance improvement from ensemble models is insignificant compared to the best results obtained by single classifiers.
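
The two ideas in this abstract, reporting accuracy per class rather than overall and balancing the classes by random resampling, can be illustrated with a minimal sketch. It assumes plain random under-sampling and over-sampling; the abstract does not describe what makes the authors' techniques "dynamic", so this is not their method. The names `per_class_accuracy` and `random_resample` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def per_class_accuracy(y_true, y_pred):
    """Return {label: fraction of that label's samples predicted correctly}."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / n for label, n in total.items()}


def random_resample(samples, labels, mode="under", seed=0):
    """Balance classes by plain random resampling (an illustrative assumption).

    mode="under": shrink every class to the size of the smallest class.
    mode="over":  grow every class to the size of the largest class by
                  drawing extra samples with replacement.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    sizes = [len(xs) for xs in by_class.values()]
    if mode == "under":
        target = min(sizes)
    elif mode == "over":
        target = max(sizes)
    else:
        raise ValueError("mode must be 'under' or 'over'")

    balanced = []
    for y, xs in by_class.items():
        if len(xs) >= target:
            chosen = rng.sample(xs, target)  # without replacement
        else:
            chosen = xs + rng.choices(xs, k=target - len(xs))  # with replacement
        balanced.extend((x, y) for x in chosen)

    rng.shuffle(balanced)
    return [x for x, _ in balanced], [y for _, y in balanced]
```

On a toy set of 90 genuine and 10 fake labels, a classifier that always predicts "genuine" scores 90% overall, yet `per_class_accuracy` reports 100% for the genuine class and 0% for the fake class, which is the kind of gap the abstract describes. Resampling should be applied to the training split only, so that the test split keeps the original class distribution.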

Scientific Repository

Using a hybrid content-based and behaviour-based featuring approach in a parallel environment to detect fake reviews

Authors: Budhi Gregorius Satia; Chiong Raymond; Dhakal Sandeep; Wang Zuli
Publication date: 31/03/2021

The financial impact of positive reviews has prompted some fraudulent sellers to generate fake product reviews, either to promote their own products or to discredit competing products. Many e-commerce portals have implemented measures to detect such fake reviews, and these measures require excellent detectors to be effective. In this work, we propose 133 unique features, combining content-based and behaviour-based features, to detect fake reviews using machine learning classifiers. Preliminary results show that these features can provide good results for all datasets tested. Detailed analysis of the results, however, reveals class imbalance issues for two of the larger datasets: there is a high imbalance between the accuracies of different classes (e.g., 7.73% for the fake class and 99.3% for the genuine class using a Multilayer Perceptron classifier). We therefore introduce two sampling methods that can improve the accuracy of the fake review class on balanced datasets. The accuracies can be improved to a maximum of 89% for both random under-sampling and over-sampling with Convolutional Neural Networks. Additionally, we propose a parallel cross-validation method that can speed up the validation process in a parallel environment.
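
The abstract does not say how the parallel cross-validation method works. One common way to parallelise k-fold cross-validation is to score each fold in its own worker, since folds are independent of one another. The sketch below shows only that generic idea, under stated assumptions: `k_fold_indices`, `majority_baseline` and `parallel_cross_validate` are hypothetical names, the folds are contiguous and unshuffled, the dataset has at least `k` samples, and the baseline scorer is a stand-in for a real classifier.

```python
import concurrent.futures as cf


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most 1."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def majority_baseline(train_x, train_y, test_x, test_y):
    """Hypothetical stand-in for a classifier: predict the most common
    training label and return the accuracy on the held-out fold."""
    majority = max(set(train_y), key=train_y.count)
    return sum(1 for y in test_y if y == majority) / len(test_y)


def _run_fold(job):
    """Build the train/test split for one fold and score it."""
    fit_and_score, samples, labels, test_idx = job
    held_out = set(test_idx)
    train_x = [x for i, x in enumerate(samples) if i not in held_out]
    train_y = [y for i, y in enumerate(labels) if i not in held_out]
    test_x = [samples[i] for i in test_idx]
    test_y = [labels[i] for i in test_idx]
    return fit_and_score(train_x, train_y, test_x, test_y)


def parallel_cross_validate(samples, labels, fit_and_score, k=5,
                            executor_cls=cf.ProcessPoolExecutor):
    """Score the k folds concurrently; scores are returned in fold order."""
    jobs = [(fit_and_score, samples, labels, idx)
            for idx in k_fold_indices(len(samples), k)]
    with executor_cls(max_workers=k) as pool:
        return list(pool.map(_run_fold, jobs))
```

With the default `ProcessPoolExecutor`, `fit_and_score` must be a picklable top-level function, and on platforms that spawn worker processes the call should sit under an `if __name__ == "__main__":` guard. Passing `executor_cls=cf.ThreadPoolExecutor` avoids both constraints but gives little speed-up for CPU-bound training in pure Python.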

Scientific Repository