Unsupervised Extraction of Popular Product Attributes from E-Commerce Web Sites by Considering Customer Reviews

Published: 15 April 2016

Abstract

We develop an unsupervised learning framework for extracting popular product attributes from product description pages originating from different E-commerce Web sites. Unlike existing information extraction methods, which do not consider the popularity of product attributes, our framework not only detects popular product features from a collection of customer reviews but also maps these popular features to the related product attributes. One novelty of our framework is that it bridges the vocabulary gap between the text in product description pages and the text in customer reviews. Technically, we develop a discriminative graphical model based on hidden Conditional Random Fields. Because the model is unsupervised, the framework can be applied to a variety of new domains and Web sites without labeling training samples. Extensive experiments demonstrate the effectiveness and robustness of our framework.
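To make the phrase "hidden Conditional Random Fields" concrete, the sketch below shows the general idea behind that model family: the probability of a label given an input is obtained by summing over all assignments of unobserved (hidden) states. It is a toy illustration only and not the model developed in the article. The label names, hidden-state names, the scoring function `toy_score`, and its weights are all hypothetical, and the brute-force enumeration is practical only for very short inputs.

```python
import itertools
import math


def hcrf_posterior(x, labels, hidden_states, score):
    """Return P(y | x) for each label y.

    For every label, exp(score) is summed over all hidden-state
    sequences of the same length as x, then the sums are normalized.
    """
    unnormalized = {}
    for y in labels:
        total = 0.0
        for h in itertools.product(hidden_states, repeat=len(x)):
            total += math.exp(score(y, h, x))
        unnormalized[y] = total
    z = sum(unnormalized.values())
    return {y: value / z for y, value in unnormalized.items()}


# Hypothetical vocabulary of words that often appear as review features.
REVIEW_FEATURE_WORDS = {"battery", "screen"}


def toy_score(y, h, x):
    """Hypothetical potential: hand-picked weights, not learned ones."""
    s = 0.0
    for token, state in zip(x, h):
        # Reward the hidden state "feature" on known review feature words.
        if state == "feature" and token in REVIEW_FEATURE_WORDS:
            s += 1.0
        # Reward compatibility between the label and the hidden state.
        if y == "attribute" and state == "feature":
            s += 0.5
    return s


probs = hcrf_posterior(
    x=["battery", "life"],
    labels=["attribute", "other"],
    hidden_states=["feature", "background"],
    score=toy_score,
)
print(probs)
```

In this toy setting the text fragment "battery life" receives a higher probability for the hypothetical label `attribute` than for `other`, because more hidden-state assignments score well under that label. A real model would learn the weights from data and use dynamic programming instead of enumeration.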
