Abstract
In this article, we study the task of user profiling in question answering communities (QACs). Previous user profiling algorithms suffer from several defects: they treat users and words as atomic units, which leads to a mismatch between them; they are designed for other applications rather than for QACs; and some semantic profiling algorithms do not co-embed users and words, which makes it difficult to measure the affinity between them. To improve profiling performance, we propose a neural Flow-based Constrained Co-embedding Model, abbreviated as FCCM. FCCM jointly co-embeds users and words in QACs so that the affinities between them can be measured semantically. Specifically, FCCM extends the standard variational auto-encoder so that the inferred embeddings of users and words are subject to a voting constraint: given a question and the users who answer it in the community, the representations of users whose answers receive more votes are closer to the representations of the words associated with those answers than are the representations of users whose answers receive fewer votes. In addition, FCCM integrates normalizing flows into the variational auto-encoder framework to avoid assuming that the embeddings follow Gaussian distributions, so that the inferred embeddings better fit the real distributions of the data. Experimental results on a Chinese Zhihu question answering dataset demonstrate the effectiveness of FCCM for user profiling in QACs.
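To make the voting constraint concrete, the following is a minimal sketch of one way it could be written as a pairwise penalty. It is not the FCCM objective: the abstract does not specify the distance measure or the loss form, so the squared Euclidean distance, the hinge form, the `margin` parameter, the use of a word centroid, and all function names here are illustrative assumptions. For each pair of answers to the same question, the penalty is zero when the user with more votes is closer to the words of the higher-voted answer than the user with fewer votes is, by at least the margin.

```python
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vec(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def voting_constraint_loss(answers, margin=1.0):
    """Hypothetical hinge penalty for one question.

    answers: list of (user_embedding, word_embeddings, votes) tuples,
             one tuple per answer to the same question.
    """
    loss = 0.0
    for a, b in combinations(answers, 2):
        if a[2] == b[2]:
            continue  # equal votes impose no ordering
        high, low = (a, b) if a[2] > b[2] else (b, a)
        # Summarise the higher-voted answer's words by their centroid (assumption).
        centroid = mean_vec(high[1])
        d_high = sq_dist(high[0], centroid)
        d_low = sq_dist(low[0], centroid)
        # Penalise pairs where the higher-voted user is not closer by the margin.
        loss += max(0.0, margin + d_high - d_low)
    return loss


# Toy data: the higher-voted user sits far from the words of their own answer,
# while the lower-voted user sits on top of them, so the constraint is violated.
violated = [
    ([3.0, 0.0], [[0.0, 0.0]], 10),
    ([0.0, 0.0], [[5.0, 0.0]], 2),
]
print(voting_constraint_loss(violated))
```

In a trained model, a term of this kind would be added to the variational objective so that gradient updates pull higher-voted users toward the words of their answers; how FCCM weights and combines it is described in the article body, not in the abstract.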