
Data-Driven Proactive Policy Assurance of Post Quality in Community Q&A Sites

Published: 01 November 2018

Abstract

To ensure post quality, Q&A sites usually develop a list of quality-assurance guidelines ("dos and don'ts") and adopt a collaborative editing mechanism to fix quality violations. However, quality guidelines are mostly high-level principles, and many tacit and context-sensitive aspects of the expected quality cannot easily be enforced by a set of explicit rules. Collaborative editing, in turn, is reactive: it fixes low-quality posts only after they have been posted. Our study of collaborative editing data on Stack Overflow suggests that tacit and context-sensitive quality-assurance knowledge is manifested in the editing patterns of large numbers of collaborative edits. Inspired by this observation, we develop and evaluate a Convolutional Neural Network (CNN) based approach that learns editing patterns from historical post edits to predict whether a post needs editing. Our approach provides a proactive policy assurance mechanism that warns users of potential quality issues in a post before it is posted.
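The prediction step described in the abstract can be illustrated with a small text classifier. The following is a minimal, dependency-free sketch of the forward pass of a one-layer convolutional text classifier (embedding lookup, 1-D convolution with ReLU, max-over-time pooling, sigmoid output) that scores a draft post with the probability that it will need editing. It is not the authors' architecture: the whitespace tokenizer, the vocabulary, the layer sizes, the `TextCNN` class, and the 0.5 warning threshold are all hypothetical, and the weights are random rather than learned from historical edits.

```python
import math
import random
from typing import Dict, List

# Hypothetical hyperparameters for illustration only (not taken from the paper).
EMBED_DIM = 8
KERNEL_SIZES = (2, 3)   # n-gram widths the convolution filters slide over
NUM_FILTERS = 4         # filters per kernel size
PAD, UNK = "<pad>", "<unk>"


class TextCNN:
    """Hypothetical untrained text CNN: embedding -> conv + ReLU -> max-pool -> sigmoid."""

    def __init__(self, vocab: List[str], seed: int = 0) -> None:
        rng = random.Random(seed)
        words = list(dict.fromkeys([PAD, UNK] + list(vocab)))  # dedupe, keep order
        self.index: Dict[str, int] = {w: i for i, w in enumerate(words)}
        self.emb = [[rng.gauss(0.0, 0.1) for _ in range(EMBED_DIM)] for _ in words]
        # filters[k][f][j] is the weight vector for position j of filter f with width k.
        self.filters = {
            k: [[[rng.gauss(0.0, 0.1) for _ in range(EMBED_DIM)] for _ in range(k)]
                for _ in range(NUM_FILTERS)]
            for k in KERNEL_SIZES
        }
        self.bias = {k: [0.0] * NUM_FILTERS for k in KERNEL_SIZES}
        self.out_w = [rng.gauss(0.0, 0.1) for _ in range(NUM_FILTERS * len(KERNEL_SIZES))]
        self.out_b = 0.0

    def embed(self, tokens: List[str]) -> List[List[float]]:
        # Pad short posts so the widest kernel fits at least once.
        tokens = tokens + [PAD] * max(0, max(KERNEL_SIZES) - len(tokens))
        return [self.emb[self.index.get(t, self.index[UNK])] for t in tokens]

    def features(self, tokens: List[str]) -> List[float]:
        x = self.embed(tokens)
        pooled: List[float] = []
        for k in KERNEL_SIZES:
            for f in range(NUM_FILTERS):
                best = 0.0  # starting at 0.0 applies ReLU before max-over-time pooling
                for start in range(len(x) - k + 1):
                    s = self.bias[k][f]
                    for j in range(k):
                        s += sum(w * v for w, v in zip(self.filters[k][f][j], x[start + j]))
                    best = max(best, s)
                pooled.append(best)
        return pooled

    def predict_proba(self, text: str) -> float:
        """Probability (under this untrained sketch) that the post needs editing."""
        h = self.features(text.lower().split())
        z = self.out_b + sum(w * v for w, v in zip(self.out_w, h))
        return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    model = TextCNN(vocab=["how", "do", "i", "sort", "a", "list", "in", "python"])
    p = model.predict_proba("plz help how do i sort a list")
    if p > 0.5:  # hypothetical warning threshold
        print(f"Warning: this post may need editing (p = {p:.2f})")
    else:
        print(f"No edit predicted (p = {p:.2f})")
```

Because the weights are random, the printed probability is meaningless; the sketch only shows how a trained model could turn a draft into a pre-posting warning. In a real system the weights would presumably be fit on posts labeled by whether they were later edited, which is the kind of historical editing signal the abstract describes.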
