ABSTRACT
In this short paper, I describe six data management research challenges relevant for Big Data and the Cloud. Although some of these problems are not new, their importance is amplified by Big Data and Cloud Computing.
Index Terms
What next?: a half-dozen data management research goals for big data and the cloud