Abstract
Pattern mining, that is, the automated discovery of patterns from data, is a mathematically complex and computationally demanding problem that is generally not manageable by humans. In this article, we focus on small datasets and study whether it is possible to mine patterns with the help of the crowd, by means of a set of controlled experiments on a common crowdsourcing platform. We specifically concentrate on mining model patterns from a dataset of real mashup models taken from Yahoo! Pipes and cover the entire pattern mining process, including pattern identification and quality assessment. The results of our experiments show that a sensible design of crowdsourcing tasks may indeed enable the crowd to identify patterns from small datasets (40 models). The results also show, however, that designing tasks for assessing pattern quality, that is, for deciding which patterns to retain for further processing and use, is much harder: our experiments fail to elicit assessments from the crowd that are similar to those of an expert. The problem is relevant to model-driven development in general (e.g., UML, business processes, scientific workflows), in that reusable model patterns encode valuable modeling and domain knowledge, such as best practices, organizational conventions, or technical choices, from which modelers can benefit when designing their own models.
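To make the comparison between crowd and expert assessments concrete, the following is a minimal sketch of the kind of aggregation such a study involves: crowd answers per candidate pattern are combined by majority vote and the aggregate is compared against an expert's label. The function names, labels, and data are hypothetical illustrations, not taken from the paper's actual task design.

```python
from collections import Counter

def majority_vote(labels):
    """Return the most frequent label among the crowd answers for one item."""
    return Counter(labels).most_common(1)[0][0]

def agreement_with_expert(crowd_answers, expert_labels):
    """Fraction of items whose aggregated crowd label matches the expert's label."""
    matches = sum(
        majority_vote(answers) == expert_labels[item]
        for item, answers in crowd_answers.items()
    )
    return matches / len(expert_labels)

# Hypothetical data: three workers judge whether each candidate pattern is reusable.
crowd = {
    "pattern_1": ["reusable", "reusable", "not_reusable"],
    "pattern_2": ["not_reusable", "not_reusable", "reusable"],
}
expert = {"pattern_1": "reusable", "pattern_2": "reusable"}

print(agreement_with_expert(crowd, expert))  # 0.5
```

Majority voting is only one of several aggregation schemes used in crowdsourcing studies; weighted voting or Z-score-based consensus can perform better when worker reliability varies.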
Supplemental Material
Available for Download
Supplemental movie, appendix, image, and software files for "Mining and Quality Assessment of Mashup Model Patterns with the Crowd: A Feasibility Study"