Abstract
The financial impact of online reviews has prompted some fraudulent sellers to generate fake consumer reviews, either to promote their own products or to discredit competing ones. In this study, we propose a novel ensemble model, the Multi-type Classifier Ensemble (MtCE), combined with a textual-based featuring method that is relatively independent of the system, to detect fake online consumer reviews. Unlike other ensemble models, which build on a single type of base classifier, our proposed ensemble utilises several customised machine learning classifiers (including deep learning models) as its base classifiers. The results of our experiments show that the MtCE can adequately detect fake reviews, and that it outperforms other single and ensemble methods in terms of accuracy and other measurements on all the relevant public datasets used in this study. Moreover, if set correctly, the parameters of the MtCE, such as the base-classifier types, the total number of base classifiers, bootstrap, and the method used to vote on the output (e.g., majority or priority), can further improve the performance of the proposed ensemble.
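The idea of an ensemble built from several types of base classifier, each trained on a bootstrap sample and combined by majority vote, can be illustrated with a minimal sketch. This is not the authors' implementation: the three toy classifiers, the function names (`train_ensemble`, `majority_vote`) and the four example reviews are all hypothetical stand-ins, the customised machine learning and deep learning base classifiers are replaced by simple word-based rules, and the textual-based featuring method and the priority voting scheme are not reproduced.

```python
import random
from collections import Counter


class WordOverlapClassifier:
    """Toy base learner: picks the label whose training vocabulary overlaps most."""

    def fit(self, texts, labels):
        self.counts = {}
        for text, label in zip(texts, labels):
            self.counts.setdefault(label, Counter()).update(text.lower().split())
        return self

    def predict(self, text):
        words = text.lower().split()
        return max(self.counts, key=lambda lab: sum(self.counts[lab][w] for w in words))


class NearestNeighbourClassifier:
    """Toy base learner: copies the label of the most similar training review."""

    def fit(self, texts, labels):
        self.examples = [(set(t.lower().split()), y) for t, y in zip(texts, labels)]
        return self

    def predict(self, text):
        words = set(text.lower().split())

        def jaccard(other):
            union = words | other
            return len(words & other) / len(union) if union else 0.0

        return max(self.examples, key=lambda ex: jaccard(ex[0]))[1]


class LengthClassifier:
    """Toy base learner: picks the label whose mean review length is closest."""

    def fit(self, texts, labels):
        lengths = {}
        for text, label in zip(texts, labels):
            lengths.setdefault(label, []).append(len(text.split()))
        self.means = {lab: sum(v) / len(v) for lab, v in lengths.items()}
        return self

    def predict(self, text):
        n = len(text.split())
        return min(self.means, key=lambda lab: abs(self.means[lab] - n))


def train_ensemble(classifier_types, texts, labels, n_estimators, bootstrap=True, seed=0):
    """Train n_estimators base classifiers, cycling through the given types."""
    rng = random.Random(seed)
    data = list(zip(texts, labels))
    models = []
    for i in range(n_estimators):
        # With bootstrap on, each model sees a resample drawn with replacement.
        sample = [rng.choice(data) for _ in data] if bootstrap else data
        xs, ys = zip(*sample)
        models.append(classifier_types[i % len(classifier_types)]().fit(xs, ys))
    return models


def majority_vote(models, text):
    """Return the label predicted by the largest number of base classifiers."""
    votes = Counter(model.predict(text) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy data, invented for illustration only.
TEXTS = [
    "best product ever amazing amazing buy now",
    "amazing deal best ever buy buy now",
    "battery lasts two days but the case scratches easily",
    "works fine although shipping took a week longer than promised",
]
LABELS = ["fake", "fake", "genuine", "genuine"]
TYPES = [WordOverlapClassifier, NearestNeighbourClassifier, LengthClassifier]

if __name__ == "__main__":
    ensemble = train_ensemble(TYPES, TEXTS, LABELS, n_estimators=9)
    print(majority_vote(ensemble, "amazing best buy now"))
```

The arguments of `train_ensemble` mirror the parameters named in the abstract: the base-classifier types, the total number of base classifiers, and whether bootstrap resampling is used. An odd number of base classifiers avoids tied votes in the two-class case.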
Index Terms
A Multi-type Classifier Ensemble for Detecting Fake Reviews Through Textual-based Feature Extraction