research-article

Adversarial Cross-domain Community Question Retrieval

Published: 10 January 2022

Abstract

A community Q&A forum is a special type of social media that provides a platform for raising questions and answering them (both by forum participants), facilitating online information sharing. Currently, community Q&A forums in professional domains attract a large number of users by offering professional knowledge. To support information access and save users the effort of raising new questions, these forums usually provide a question retrieval function, which retrieves existing questions (and their answers) that are similar to a user's query. However, it can be difficult for community Q&A forums to cover all domains, especially recently emerging domains that have little labeled data and differ greatly from existing ones. We refer to this scenario as cross-domain question retrieval. To handle the unique challenges of cross-domain question retrieval, we design a model based on adversarial training, namely X-QR, which consists of two modules: a domain discriminator and a sentence matcher. The domain discriminator aims to align the source and target data distributions and unify the feature space through domain-adversarial training. With the assistance of the domain discriminator, the sentence matcher learns domain-consistent knowledge for the final matching prediction. To the best of our knowledge, this work is among the first to investigate the domain adaptation problem of sentence matching for question retrieval in community Q&A forums. The experimental results suggest that the proposed X-QR model outperforms conventional sentence matching methods on cross-domain community Q&A tasks.
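The abstract does not give X-QR's training procedure. As a minimal sketch, assuming the standard gradient reversal formulation of domain-adversarial training (Ganin et al., 2016, "Domain-Adversarial Training of Neural Networks"), the interplay between the two modules can be written as a single update step. Everything concrete here is a hypothetical simplification, not the paper's architecture: each example is a precomputed feature vector for a (query, candidate) question pair, the shared encoder is one linear map `W`, and both the sentence matcher (`u`) and the domain discriminator (`v`) are logistic heads. The matcher trains only on labeled source pairs, the discriminator learns to tell source from target, and the encoder receives the discriminator's gradient with its sign flipped and scaled by `lam`. In an autograd framework, that sign flip is what a gradient reversal layer (identity forward, multiply by -lam backward) produces. Here, the gradients are derived by hand.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce(p, y, eps=1e-12):
    """Mean binary cross-entropy; probabilities are clipped for numerical safety."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def head_loss_and_grads(X, W, w_head, y):
    """Logistic head on a shared linear encoder H = X @ W.T (hypothetical setup).

    Returns the mean BCE loss and its gradients w.r.t. the head weights
    and the encoder weights W.
    """
    H = X @ W.T                         # (n, k) shared features
    p = sigmoid(H @ w_head)             # (n,) predicted probabilities
    g = (p - y) / len(y)                # dLoss/dlogit for mean BCE
    grad_head = H.T @ g                 # (k,)
    grad_W = np.outer(w_head, g @ X)    # (k, d)
    return bce(p, y), grad_head, grad_W


def dann_step(Xs, ys, Xt, W, u, v, lr=0.1, lam=0.1):
    """One hypothetical domain-adversarial update.

    Xs, ys: labeled source-domain pair features and match labels (0/1).
    Xt:     unlabeled target-domain pair features.
    W:      shared encoder; u: matcher head; v: domain-discriminator head.
    """
    # Matching loss uses labeled source data only.
    l_match, g_u, gW_match = head_loss_and_grads(Xs, W, u, ys)

    # Domain loss over both domains: source labeled 0, target labeled 1.
    Xd = np.vstack([Xs, Xt])
    yd = np.concatenate([np.zeros(len(Xs)), np.ones(len(Xt))])
    l_dom, g_v, gW_dom = head_loss_and_grads(Xd, W, v, yd)

    # Both heads descend their own losses.
    u_new = u - lr * g_u
    v_new = v - lr * g_v
    # Gradient reversal: the encoder descends the matching loss while
    # ascending the domain loss (domain gradient negated and scaled by lam).
    W_new = W - lr * (gW_match - lam * gW_dom)
    return W_new, u_new, v_new, l_match, l_dom


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, k = 8, 4
    Xs = rng.normal(size=(64, d))             # toy source pair features
    ys = (Xs[:, 0] > 0).astype(float)         # toy match labels
    Xt = rng.normal(loc=0.5, size=(64, d))    # mean-shifted toy target features
    W = rng.normal(scale=0.1, size=(k, d))
    u, v = np.zeros(k), np.zeros(k)
    for _ in range(200):
        W, u, v, l_match, l_dom = dann_step(Xs, ys, Xt, W, u, v)
    print(f"match loss {l_match:.3f}, domain loss {l_dom:.3f}")
```

Setting `lam=0` reduces the step to ordinary source-only training of the matcher. Larger values push the encoder harder toward features the discriminator cannot separate, which may cost some source-domain matching accuracy. The values of `lr` and `lam` and the toy data in the `__main__` block are arbitrary illustrative choices.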


    • Published in

      ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 21, Issue 3
      May 2022
      413 pages
      ISSN: 2375-4699
      EISSN: 2375-4702
      DOI: 10.1145/3505182

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 10 January 2022
      • Accepted: 1 September 2021
      • Revised: 1 August 2021
      • Received: 1 July 2021
      Published in TALLIP Volume 21, Issue 3

      Qualifiers

      • research-article
      • Refereed
