Research on Extraction of Useful Tourism Online Reviews Based on Multimodal Feature Fusion

Abstract
To identify the factors that influence the perceived usefulness of multimodal data in online reviews of tourism products, this article explores an optimization method for online tourism products based on user-generated content and fuses the multimodal features of online review data from the perspective of data fusion analysis. Building on a word vector model, the article proposes a method for selecting the seed word set of a sentiment dictionary: sentiment words are represented as vectors, and the distances between word vectors serve as both the selection criterion for the seed word set and the basis for classification; the sentiment dictionary for online reviews is then formed through category judgment. Taking real online review data of tourism products as the research object, the article performs descriptive statistical analysis, applies machine learning and deep learning methods to text vector embedding and image content recognition, fuses the image and text feature vectors, constructs a multimodal classification model of online review usefulness, and tests the model. The experimental results show that, compared with single-modal reviews containing only text or only images, multimodal reviews that combine text and images predict the usefulness of online reviews more accurately. This improves the quality of online reviews, realizes the potential value of user-generated content, offers optimization ideas to product providers, and provides decision support to product consumers.
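The seed-word selection described above, in which sentiment words are represented as vectors and a candidate word is assigned to the polarity class whose seed set lies nearest in the embedding space, can be illustrated with a short sketch. This is a minimal illustration, not the authors' implementation: the placeholder embedding table, the example seed words, and the centroid-plus-cosine-similarity criterion are assumptions made for demonstration.

```python
import numpy as np

# Placeholder embedding table; in practice these vectors would come from a
# word vector model (e.g., Word2Vec) trained on the review corpus.
rng = np.random.default_rng(0)
embeddings = {w: rng.standard_normal(50) for w in
              ["excellent", "comfortable", "terrible", "dirty", "spacious"]}

def cosine_similarity(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def classify_candidate(word, pos_seeds, neg_seeds):
    """Assign a candidate sentiment word to the polarity class whose
    seed-set centroid is nearest (by cosine similarity) in vector space."""
    vec = embeddings[word]
    pos_center = np.mean([embeddings[w] for w in pos_seeds], axis=0)
    neg_center = np.mean([embeddings[w] for w in neg_seeds], axis=0)
    sim_pos = cosine_similarity(vec, pos_center)
    sim_neg = cosine_similarity(vec, neg_center)
    return "positive" if sim_pos >= sim_neg else "negative"

print(classify_candidate("spacious",
                         pos_seeds=["excellent", "comfortable"],
                         neg_seeds=["terrible", "dirty"]))
```

Words classified this way accumulate into the sentiment dictionary, which can then be used to score new reviews.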
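Likewise, the fusion step, embedding the review text, extracting image features with a convolutional network, and combining the two feature vectors in a usefulness classifier, might look roughly like the following PyTorch sketch. The dimensions (768 for a BERT-style text encoder, 2048 for ResNet-50 pooled image features), the concatenation-based fusion, and the two-layer classification head are illustrative assumptions; the abstract does not specify the exact architecture.

```python
import torch
import torch.nn as nn

class ReviewUsefulnessClassifier(nn.Module):
    """Feature-level fusion baseline: concatenate text and image feature
    vectors, then classify the review as useful / not useful."""
    def __init__(self, text_dim=768, image_dim=2048, hidden_dim=256):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(text_dim + image_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden_dim, 2),   # logits: [not useful, useful]
        )

    def forward(self, text_feat, image_feat):
        fused = torch.cat([text_feat, image_feat], dim=-1)  # concatenate modalities
        return self.head(fused)

# Dummy batch standing in for encoder outputs (e.g., a BERT [CLS] vector
# and ResNet-50 global-average-pooled features).
model = ReviewUsefulnessClassifier()
text_feat = torch.randn(4, 768)
image_feat = torch.randn(4, 2048)
logits = model(text_feat, image_feat)
print(logits.shape)  # torch.Size([4, 2])
```

Concatenation is the simplest fusion strategy; attention-based or graph-based fusion would slot into the same interface by replacing the `torch.cat` step.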