Abstract
To ensure post quality, Q&A sites usually develop a list of quality-assurance guidelines ("dos and don'ts") and adopt collaborative editing mechanisms to fix quality violations. However, quality guidelines are mostly high-level principles, and many tacit and context-sensitive aspects of the expected quality cannot easily be enforced by a set of explicit rules. Collaborative editing, in turn, is a reactive mechanism that takes effect only after low-quality posts have been published. Our study of collaborative editing data on Stack Overflow suggests that tacit and context-sensitive quality-assurance knowledge is manifested in the editing patterns of large numbers of collaborative edits. Inspired by this observation, we develop and evaluate a Convolutional Neural Network (CNN) based approach that learns editing patterns from historical post edits to predict whether a post needs editing. Our approach provides a proactive policy assurance mechanism that warns users of potential quality issues in a post before it is posted.
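The abstract does not detail the network, so the following is only a minimal sketch, in plain Python, of what a CNN-based "does this post need editing?" classifier could look like: token embeddings, one convolutional layer, ReLU, max-over-time pooling, and a sigmoid output. Everything here is an assumption for illustration. The names `EditNeedCNN`, `tokenize`, and `warn_before_posting`, the hyperparameters, and the 0.5 warning threshold are hypothetical, and the weights are random rather than learned. In the described approach, the weights would presumably be fitted on historical posts labelled by whether the community later edited them.

```python
import math
import random
import re

# Hypothetical hyperparameters: the abstract does not specify the real architecture.
EMBED_DIM = 8
FILTER_WIDTH = 3
NUM_FILTERS = 4


def tokenize(text):
    """Lowercase word/punctuation tokenizer (an assumed preprocessing step)."""
    return re.findall(r"[a-z0-9]+|[^\sa-z0-9]", text.lower())


class EditNeedCNN:
    """Hypothetical, untrained 1-D CNN: embed -> convolve -> ReLU -> max-pool -> sigmoid."""

    def __init__(self, seed=0):
        self.seed = seed
        rng = random.Random(seed)
        self.embeddings = {}  # token -> vector, filled on first use
        # Each filter spans FILTER_WIDTH consecutive token embeddings.
        self.filters = [
            [[rng.uniform(-0.5, 0.5) for _ in range(EMBED_DIM)] for _ in range(FILTER_WIDTH)]
            for _ in range(NUM_FILTERS)
        ]
        self.filter_bias = [0.0] * NUM_FILTERS
        self.out_weights = [rng.uniform(-0.5, 0.5) for _ in range(NUM_FILTERS)]
        self.out_bias = 0.0

    def embed(self, token):
        # Per-token seeding keeps each embedding fixed regardless of call order.
        if token not in self.embeddings:
            token_rng = random.Random(f"{self.seed}:{token}")
            self.embeddings[token] = [token_rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
        return self.embeddings[token]

    def pooled_features(self, tokens):
        vectors = [self.embed(t) for t in tokens]
        # Zero-pad short posts so at least one convolution window exists.
        while len(vectors) < FILTER_WIDTH:
            vectors.append([0.0] * EMBED_DIM)
        pooled = []
        for weights, bias in zip(self.filters, self.filter_bias):
            best = 0.0  # ReLU outputs are >= 0, so 0.0 is a safe starting maximum
            for start in range(len(vectors) - FILTER_WIDTH + 1):
                window = vectors[start:start + FILTER_WIDTH]
                activation = bias + sum(
                    w * x
                    for w_row, x_row in zip(weights, window)
                    for w, x in zip(w_row, x_row)
                )
                # Max-over-time pooling of ReLU(activation).
                best = max(best, max(0.0, activation))
            pooled.append(best)
        return pooled

    def predict_proba(self, text):
        """Probability (under the current, untrained weights) that the post needs editing."""
        pooled = self.pooled_features(tokenize(text))
        logit = self.out_bias + sum(w * p for w, p in zip(self.out_weights, pooled))
        return 1.0 / (1.0 + math.exp(-logit))


def warn_before_posting(model, draft, threshold=0.5):
    """Return a warning if the draft likely needs editing; the threshold is an assumption."""
    p = model.predict_proba(draft)
    if p >= threshold:
        return f"This post may need editing (p={p:.2f})."
    return None


if __name__ == "__main__":
    model = EditNeedCNN(seed=42)
    print(warn_before_posting(model, "how do i fix this error in python??"))
```

Training is deliberately omitted. A real system would learn the embeddings, filters, and output layer from the edit history and would run a check like `warn_before_posting` on a draft before it is submitted, which is what makes the mechanism proactive rather than reactive.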
Data-Driven Proactive Policy Assurance of Post Quality in Community Q&A Sites