
ABSTRACT

We present TwitterMonitor, a system that performs trend detection over the Twitter stream. The system identifies emerging topics (i.e. 'trends') on Twitter in real time and provides meaningful analytics that synthesize an accurate description of each topic. Users interact with the system by ordering the identified trends using different criteria and submitting their own description for each trend.

We discuss the motivation for trend detection over social media streams and the challenges that lie therein. We then describe our approach to trend detection, as well as the architecture of TwitterMonitor. Finally, we lay out our demonstration scenario.

AUTHORS



Michael Mathioudakis
michael.mathioudakisataalto.fi
Bibliometrics: publication history
Publication years: 2006-2016
Publication count: 10
Citation Count: 243
Available for download: 9
Downloads (6 Weeks): 214
Downloads (12 Months): 1,562
Downloads (cumulative): 8,740
Average downloads per article: 971.11
Average citations per article: 24.30


Nick Koudas
koudasatcs.toronto.edu
Bibliometrics: publication history
Publication years: 1994-2015
Publication count: 128
Citation Count: 3,393
Available for download: 84
Downloads (6 Weeks): 312
Downloads (12 Months): 2,028
Downloads (cumulative): 49,461
Average downloads per article: 588.82
Average citations per article: 26.51


CITED BY

162 Citations

PUBLICATION

Title: SIGMOD '10 Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
General Chairs: Ahmed Elmagarmid, Purdue University, USA
Program Chairs: Divyakant Agrawal, University of California at Santa Barbara, USA
Pages: 1155-1158
Publication Date: 2010-06-06 (yyyy-mm-dd)
Sponsor: SIGMOD, ACM Special Interest Group on Management of Data
Publisher: ACM, New York, NY, USA ©2010
ISBN: 978-1-4503-0032-2
Order Number: 405102
doi>10.1145/1807167.1807306
Conference: MOD, International Conference on Management of Data
Paper Acceptance Rate: 80 of 384 submissions, 21%
Overall Acceptance Rate: 1,104 of 5,662 submissions, 19%
Year         Submitted  Accepted  Rate
SIGMOD '96         290        47   16%
SIGMOD '97         202        42   21%
SIGMOD '00         248        42   17%
SIGMOD '01         293        44   15%
SIGMOD '02         240        42   18%
SIGMOD '03         342        53   15%
SIGMOD '06         446        58   13%
SIGMOD '07         480        70   15%
SIGMOD '08         435        78   18%
SIGMOD '09         430       118   27%
SIGMOD '10         384        80   21%
SIGMOD '11         375        93   25%
SIGMOD '12         289        48   17%
SIGMOD '13         372        76   20%
SIGMOD '14         421       107   25%
SIGMOD '15         415       106   26%
Overall          5,662     1,104   19%


Table of Contents

Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
SECTION: SIGMOD Awards
2010 SIGMOD Awards Presentations
Ahmed Elmagarmid
Article No: 1
doi>10.1145/1807167.1837141
SESSION: Keynotes
Divy Agrawal, Anastasia Ailamaki
The flow of on-line information in global networks
Jon Kleinberg
Pages: 1-2
doi>10.1145/1807167.1807169
Warehouse-scale Computing
Luiz Andre Barroso
Article No: 2
doi>10.1145/1807167.1837133
SESSION: Advanced query processing
Walid Aref
Efficiently evaluating complex boolean expressions
Marcus Fontoura, Suhas Sadanandan, Jayavel Shanmugasundaram, Sergei Vassilvitski, Erik Vee, Srihari Venkatesan, Jason Zien
Pages: 3-14
doi>10.1145/1807167.1807171

The problem of efficiently evaluating a large collection of complex Boolean expressions - beyond simple conjunctions and Disjunctive/Conjunctive Normal Forms (DNF/CNF) - occurs in many emerging online advertising applications such as advertising exchanges ...
How to ConQueR why-not questions
Quoc Trung Tran, Chee-Yong Chan
Pages: 15-26
doi>10.1145/1807167.1807172

One useful feature that is missing from today's database systems is an explain capability that enables users to seek clarifications on unexpected query results. There are two types of unexpected query results that are of interest: the presence of unexpected ...
Call to order: a hierarchical browsing approach to eliciting users' preference
Feng Zhao, Gautam Das, Kian-Lee Tan, Anthony K.H. Tung
Pages: 27-38
doi>10.1145/1807167.1807173

Computing preference queries has received a lot of attention in the database community. It is common that the user is unsure of his/her preference, so care must be taken to elicit the preference of the user correctly. In this paper, we propose to elicit ...
Boosting spatial pruning: on optimal pruning of MBRs
Tobias Emrich, Hans-Peter Kriegel, Peer Kröger, Matthias Renz, Andreas Züfle
Pages: 39-50
doi>10.1145/1807167.1807174

Fast query processing of complex objects, e.g. spatial or uncertain objects, depends on efficient spatial pruning of objects' approximations, which are typically minimum bounding rectangles (MBRs). In this paper, we propose a novel effective and efficient ...
SESSION: Data cleaning & data mining
Timos Sellis
Leveraging spatio-temporal redundancy for RFID data cleansing
Haiquan Chen, Wei-Shinn Ku, Haixun Wang, Min-Te Sun
Pages: 51-62
doi>10.1145/1807167.1807176

Radio Frequency Identification (RFID) technologies are used in many applications for data collection. However, raw RFID readings are usually of low quality and may contain many anomalies. An ideal solution for RFID data cleansing should address the following ...
Sampling dirty data for matching attributes
Henning Köhler, Xiaofang Zhou, Shazia Sadiq, Yanfeng Shu, Kerry Taylor
Pages: 63-74
doi>10.1145/1807167.1807177

We investigate the problem of creating and analyzing samples of relational databases to find relationships between string-valued attributes. Our focus is on identifying attribute pairs whose value sets overlap, a pre-condition for typical joins over ...
ERACER: a database approach for statistical inference and data cleaning
Chris Mayfield, Jennifer Neville, Sunil Prabhakar
Pages: 75-86
doi>10.1145/1807167.1807178

Real-world databases often contain syntactic and semantic errors, in spite of integrity constraints and other safety measures incorporated into modern DBMSs. We present ERACER, an iterative statistical framework for inferring missing information and ...
Recsplorer: recommendation algorithms based on precedence mining
Aditya G. Parameswaran, Georgia Koutrika, Benjamin Bercovitz, Hector Garcia-Molina
Pages: 87-98
doi>10.1145/1807167.1807179

We study recommendations in applications where there are temporal patterns in the way items are consumed or watched. For example, a student who has taken the Advanced Algorithms course is more likely to be interested in Convex Optimization, but a student ...
SESSION: Graph data & querying
Graham Cormode
TEDI: efficient shortest path query answering on graphs
Fang Wei
Pages: 99-110
doi>10.1145/1807167.1807181

Efficient shortest path query answering in large graphs is enjoying a growing number of applications, such as ranked keyword search in databases, social networks, ontology reasoning and bioinformatics. A shortest path query on a graph finds the shortest ...
GBLENDER: towards blending visual query formulation and query processing in graph databases
Changjiu Jin, Sourav S. Bhowmick, Xiaokui Xiao, James Cheng, Byron Choi
Pages: 111-122
doi>10.1145/1807167.1807182

Given a graph database D and a query graph g, an exact subgraph matching query asks for the set S of graphs in D that contain g as a subgraph. This type of queries find important applications in several domains such ...
Computing label-constraint reachability in graph databases
Ruoming Jin, Hui Hong, Haixun Wang, Ning Ruan, Yang Xiang
Pages: 123-134
doi>10.1145/1807167.1807183

Our world today is generating huge amounts of graph data such as social networks, biological networks, and the semantic web. Many of these real-world graphs are edge-labeled graphs, i.e., each edge has a label that denotes the relationship between the ...
Pregel: a system for large-scale graph processing
Grzegorz Malewicz, Matthew H. Austern, Aart J.C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, Grzegorz Czajkowski
Pages: 135-146
doi>10.1145/1807167.1807184

Many practical computing problems concern large graphs. Standard examples include the Web graph and various social networks. The scale of these graphs - in some cases billions of vertices, trillions of edges - poses challenges to their efficient processing. ...
SESSION: Data streams & time-series data
Alex Labrinidis
PR-join: a non-blocking join achieving higher early result rate with statistical guarantees
Shimin Chen, Phillip B. Gibbons, Suman Nath
Pages: 147-158
doi>10.1145/1807167.1807186

Online aggregation is a promising solution to achieving fast early responses for interactive ad-hoc queries that compute aggregates on a large amount of data. Essential to the success of online aggregation is a good non-blocking join algorithm that enables ...
PODS: a new model and processing algorithms for uncertain data streams
Thanh T.L. Tran, Liping Peng, Boduo Li, Yanlei Diao, Anna Liu
Pages: 159-170
doi>10.1145/1807167.1807187

Uncertain data streams, where data is incomplete, imprecise, and even misleading, have been observed in many environments. Feeding such data streams to existing stream systems produces results of unknown quality, which is of paramount concern ...
Fast approximate correlation for massive time-series data
Abdullah Mueen, Suman Nath, Jie Liu
Pages: 171-182
doi>10.1145/1807167.1807188

We consider the problem of computing all-pair correlations in a warehouse containing a large number (e.g., tens of thousands) of time-series (or, signals). The problem arises in automatic discovery of patterns and anomalies in data intensive applications ...
An algorithmic approach to event summarization
Peng Wang, Haixun Wang, Majin Liu, Wei Wang
Pages: 183-194
doi>10.1145/1807167.1807189

Recently, much study has been directed toward summarizing event data, in the hope that the summary will lead us to a better understanding of the system that generates the events. However, instead of offering a global picture of the system, the summary ...
SESSION: Innovative data management
Mirek Riedewald
Spreadsheet as a relational database engine
Jerzy Tyszkiewicz
Pages: 195-206
doi>10.1145/1807167.1807191

Spreadsheets are among the most commonly used applications for data management and analysis. Perhaps they are even among the most widely used computer applications of all kinds. However, the spreadsheet paradigm of computation still lacks sufficient ...
Scalable architecture and query optimization for transaction-time DBs with evolving schemas
Hyun Jin Moon, Carlo A. Curino, Carlo Zaniolo
Pages: 207-218
doi>10.1145/1807167.1807192

The problem of archiving and querying the history of a database is made more complex by the fact that, along with the database content, the database schema also evolves with time. Indeed, archival quality can only be guaranteed by storing past database ...
Data conflict resolution using trust mappings
Wolfgang Gatterbauer, Dan Suciu
Pages: 219-230
doi>10.1145/1807167.1807193

In massively collaborative projects such as scientific or community databases, users often need to agree or disagree on the content of individual data items. On the other hand, trust relationships often exist between users, allowing them ...
Analyzing the energy efficiency of a database server
Dimitris Tsirogiannis, Stavros Harizopoulos, Mehul A. Shah
Pages: 231-242
doi>10.1145/1807167.1807194

Rising energy costs in large data centers are driving an agenda for energy-efficient computing. In this paper, we focus on the role of database software in affecting, and, ultimately, improving the energy efficiency of a server. We first characterize ...
SESSION: Location & sensor based data
Gottfried Vossen
Processing proximity relations in road networks
Zhengdao Xu, Hans-Arno Jacobsen
Pages: 243-254
doi>10.1145/1807167.1807196

Applications ranging from location-based services to multi-player online gaming require continuous query support to monitor, track, and detect events of interest among sets of moving objects. Examples are alerting capabilities for detecting whether the ...
Searching trajectories by locations: an efficiency study
Zaiben Chen, Heng Tao Shen, Xiaofang Zhou, Yu Zheng, Xing Xie
Pages: 255-266
doi>10.1145/1807167.1807197

Trajectory search has long been an attractive and challenging topic which blooms various interesting applications in spatial-temporal databases. In this work, we study a new problem of searching trajectories by locations, in which context the query is ...
Processing continuous join queries in sensor networks: a filtering approach
Mirco Stern, Klemens Böhm, Erik Buchmann
Pages: 267-278
doi>10.1145/1807167.1807198

While join processing in wireless sensor networks has received a lot of attention recently, current solutions do not work well for continuous queries. In those networks however, continuous queries are the rule. To minimize the communication costs of ...
TACO: tunable approximate computation of outliers in wireless sensor networks
Nikos Giatrakos, Yannis Kotidis, Antonios Deligiannakis, Vasilis Vassalos, Yannis Theodoridis
Pages: 279-290
doi>10.1145/1807167.1807199

Wireless sensor networks are becoming increasingly popular for a variety of applications. Users are frequently faced with the surprising discovery that readings produced by the sensing elements of their motes are often contaminated with outliers. Outlier ...
SESSION: Probabilistic & uncertain data
Yannis Papakonstantinou
GRN model of probabilistic databases: construction, transition and querying
Ruiwen Chen, Yongyi Mao, Iluju Kiringa
Pages: 291-302
doi>10.1145/1807167.1807201

Under the tuple-level uncertainty paradigm, we formalize the use of a novel graphical model, Generator-Recognizer Network (GRN), as a model of probabilistic databases. The GRN modeling framework is capable of representing a much wider range of tuple ...
Consistent query answers in inconsistent probabilistic databases
Xiang Lian, Lei Chen, Shaoxu Song
Pages: 303-314
doi>10.1145/1807167.1807202

Efficient and effective manipulation of probabilistic data has become increasingly important recently due to many real applications that involve the data uncertainty. This is especially crucial when probabilistic data collected from different sources ...
Threshold query optimization for uncertain data
Yinian Qi, Rohit Jain, Sarvjeet Singh, Sunil Prabhakar
Pages: 315-326
doi>10.1145/1807167.1807203

The probabilistic threshold query (PTQ) is one of the most common queries in uncertain databases, where all results satisfying the query with probabilities that meet the threshold requirement are returned. PTQ is used widely in nearest-neighbor queries, ...
Probabilistic string similarity joins
Jeffrey Jestes, Feifei Li, Zhepeng Yan, Ke Yi
Pages: 327-338
doi>10.1145/1807167.1807204

Edit distance based string similarity join is a fundamental operator in string databases. Increasingly, many applications in data cleaning, data integration, and scientific computing have to deal with fuzzy information in string attributes. Despite the ...
SESSION: Leveraging hardware for data management
Anastasia Ailamaki
FAST: fast architecture sensitive tree search on modern CPUs and GPUs
Changkyu Kim, Jatin Chhugani, Nadathur Satish, Eric Sedlar, Anthony D. Nguyen, Tim Kaldewey, Victor W. Lee, Scott A. Brandt, Pradeep Dubey
Pages: 339-350
doi>10.1145/1807167.1807206

In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous computing power by integrating multiple cores, each with wide vector units. There has been much work to exploit modern processor architectures ...
Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort
Nadathur Satish, Changkyu Kim, Jatin Chhugani, Anthony D. Nguyen, Victor W. Lee, Daehyun Kim, Pradeep Dubey
Pages: 351-362
doi>10.1145/1807167.1807207

Sort is a fundamental kernel used in many database operations. In-memory sorts are now feasible; sort performance is limited by compute flops and main memory bandwidth rather than I/O. In this paper, we present a competitive analysis of comparison and ...
Page-differential logging: an efficient and DBMS-independent approach for storing data into flash memory
Yi-Reun Kim, Kyu-Young Whang, Il-Yeol Song
Pages: 363-374
doi>10.1145/1807167.1807208

Flash memory is widely used as the secondary storage in lightweight computing devices due to its outstanding advantages over magnetic disks. Flash memory has many access characteristics different from those of magnetic disks, and how to take advantage ...
Similarity search and locality sensitive hashing using ternary content addressable memories
Rajendra Shinde, Ashish Goel, Pankaj Gupta, Debojyoti Dutta
Pages: 375-386
doi>10.1145/1807167.1807209

Similarity search methods are widely used as kernels in various data mining and machine learning applications including those in computational biology, web search/clustering. Nearest neighbor search (NNS) algorithms are often used to retrieve similar ...
SESSION: University of Washington
Magdalena Balazinska
Automatically incorporating new sources in keyword search-based data integration
Partha Pratim Talukdar, Zachary G. Ives, Fernando Pereira
Pages: 387-398
doi>10.1145/1807167.1807211

Scientific data offers some of the most interesting challenges in data integration today. Scientific fields evolve rapidly and accumulate masses of observational and experimental data that needs to be annotated, revised, interlinked, and made available ...
Active knowledge: dynamically enriching RDF knowledge bases by web services
Nicoleta Preda, Gjergji Kasneci, Fabian M. Suchanek, Thomas Neumann, Wenjun Yuan, Gerhard Weikum
Pages: 399-410
doi>10.1145/1807167.1807212

The proliferation of knowledge-sharing communities and the advances in information extraction have enabled the construction of large knowledge bases using the RDF data model to represent entities and relationships. However, as the Web and its latently ...
Schema clustering and retrieval for multi-domain pay-as-you-go data integration systems
Hatem A. Mahmoud, Ashraf Aboulnaga
Pages: 411-422
doi>10.1145/1807167.1807213

A data integration system offers a single interface to multiple structured data sources. Many application contexts (e.g., searching structured data on the web) involve the integration of large numbers of structured data sources. At web scale, it is impractical ...
Expressive and flexible access to web-extracted data: a keyword-based structured query language
Jeffrey Pound, Ihab F. Ilyas, Grant Weddell
Pages: 423-434
doi>10.1145/1807167.1807214

Automated extraction of structured data from Web sources often leads to large heterogeneous knowledge bases (KB), with data and schema items numbering in the hundreds of thousands or millions. Formulating information needs with conventional structured ...
SESSION: Social networks & community data
Susan Davidson
Multiple feature fusion for social media applications
Bin Cui, Anthony K.H. Tung, Ce Zhang, Zhe Zhao
Pages: 435-446
doi>10.1145/1807167.1807216

The emergence of social media as a crucial paradigm has posed new challenges to the research and industry communities, where media are designed to be disseminated through social interaction. Recent literature has noted the generality of multiple features ...
Finding maximal cliques in massive networks by H*-graph
James Cheng, Yiping Ke, Ada Wai-Chee Fu, Jeffrey Xu Yu, Linhong Zhu
Pages: 447-458
doi>10.1145/1807167.1807217

Maximal clique enumeration (MCE) is a fundamental problem in graph theory and has important applications in many areas such as social network analysis and bioinformatics. The problem is extensively studied; however, the best existing algorithms ...
K-isomorphism: privacy preserving network publication against structural attacks
James Cheng, Ada Wai-chee Fu, Jia Liu
Pages: 459-470
doi>10.1145/1807167.1807218

Serious concerns on privacy protection in social networks have been raised in recent years; however, research in this area is still in its infancy. The problem is challenging due to the diversity and complexity of graph data, on which an adversary can ...
Load-balanced query dissemination in privacy-aware online communities
Emiran Curtmola, Alin Deutsch, K. K. Ramakrishnan, Divesh Srivastava
Pages: 471-482
doi>10.1145/1807167.1807219

We propose a novel privacy-preserving distributed infrastructure in which data resides only with the publishers owning it. The infrastructure disseminates user queries to publishers, who answer them at their own discretion. The infrastructure enforces ...
SESSION: Scalable data analytics
Chris Olston
Automatic contention detection and amelioration for data-intensive operations
John Cieslewicz, Kenneth A. Ross, Kyoho Satsumi, Yang Ye
Pages: 483-494
doi>10.1145/1807167.1807221

To take full advantage of the parallelism offered by a multi-core machine, one must write parallel code. Writing parallel code is difficult. Even when one writes correct code, there are numerous performance pitfalls. For example, an unrecognized data ...
Efficient parallel set-similarity joins using MapReduce
Rares Vernica, Michael J. Carey, Chen Li
Pages: 495-506
doi>10.1145/1807167.1807222

In this paper we study how to efficiently perform set-similarity joins in parallel using the popular MapReduce framework. We propose a 3-stage approach for end-to-end set-similarity joins. We take as input a set of records and output a set of joined ...
ParaTimer: a progress indicator for MapReduce DAGs
Kristi Morton, Magdalena Balazinska, Dan Grossman
Pages: 507-518
doi>10.1145/1807167.1807223

Time-oriented progress estimation for parallel queries is a challenging problem that has received only limited attention. In this paper, we present ParaTimer, a new type of time-remaining indicator for parallel queries. Several parallel data processing ...
The DataPath system: a data-centric analytic processing engine for large data warehouses
Subi Arumugam, Alin Dobra, Christopher M. Jermaine, Niketan Pansare, Luis Perez
Pages: 519-530
doi>10.1145/1807167.1807224

Since the 1970's, database systems have been "compute-centric". When a computation needs the data, it requests the data, and the data are pulled through the system. We believe that this is problematic for two reasons. First, requests for data naturally ...
SESSION: Advanced query processing
Jiaheng Lu
Variance aware optimization of parameterized queries
Surajit Chaudhuri, Hongrae Lee, Vivek R. Narasayya
Pages: 531-542
doi>10.1145/1807167.1807226

Parameterized queries are commonly used in database applications. In a parameterized query, the same SQL statement is potentially executed multiple times with different parameter values. In today's DBMSs the query optimizer typically chooses a single ...
Positional update handling in column stores
Sándor Héman, Marcin Zukowski, Niels J. Nes, Lefteris Sidirourgos, Peter Boncz
Pages: 543-554
doi>10.1145/1807167.1807227

In this paper we investigate techniques that allow for on-line updates to columnar databases, leaving intact their high read-only performance. Rather than keeping differential structures organized by the table key values, the core proposition of this ...
Durable top-k search in document archives
Leong Hou U, Nikos Mamoulis, Klaus Berberich, Srikanta Bedathur
Pages: 555-566
doi>10.1145/1807167.1807228

We propose and study a new ranking problem in versioned databases. Consider a database of versioned objects which have different valid instances along a history (e.g., documents in a web archive). Durable top-k search finds the set of objects ...
Ajax-based report pages as incrementally rendered views
Yupeng Fu, Keith Kowalczykowski, Kian Win Ong, Yannis Papakonstantinou, Kevin Keliang Zhao
Pages: 567-578
doi>10.1145/1807167.1807229

While Ajax-based programming enables faster performance and higher interface quality over pure server-side programming, it is demanding and error prone as each action that partially updates the page requires custom, ad-hoc code. The problem is exacerbated ...
SESSION: Cloud computing & internet scale computing
Mehul Shah
An evaluation of alternative architectures for transaction processing in the cloud
Donald Kossmann, Tim Kraska, Simon Loesing
Pages: 579-590
doi>10.1145/1807167.1807231

Cloud computing promises a number of advantages for the deployment of data-intensive applications. One important promise is reduced cost with a pay-as-you-go business model. Another promise is (virtually) unlimited throughput by adding servers if the ...
Indexing multi-dimensional data in a cloud system
Jinbao Wang, Sai Wu, Hong Gao, Jianzhong Li, Beng Chin Ooi
Pages: 591-602
doi>10.1145/1807167.1807232

Providing scalable database services is an essential requirement for extending many existing applications of the Cloud platform. Due to the diversity of applications, database services on the Cloud must support large-scale data analytical jobs and high ...
Low overhead concurrency control for partitioned main memory databases
Evan P.C. Jones, Daniel J. Abadi, Samuel Madden
Pages: 603-614
doi>10.1145/1807167.1807233

Database partitioning is a technique for improving the performance of distributed OLTP databases, since "single partition" transactions that access data on one partition do not need coordination with other partitions. For workloads that are amenable ...
Efficient querying and maintenance of network provenance at internet-scale
Wenchao Zhou, Micah Sherr, Tao Tao, Xiaozhou Li, Boon Thau Loo, Yun Mao
Pages: 615-626
doi>10.1145/1807167.1807234

Network accountability, forensic analysis, and failure diagnosis are becoming increasingly important for network management and security. Such capabilities often utilize network provenance - the ability to issue queries over network meta-data. ...
SESSION: Data summarization
Lei Chen
Hierarchically organized skew-tolerant histograms for geographic data objects
Yohan J. Roh, Jae Ho Kim, Yon Dohn Chung, Jin Hyun Son, Myoung Ho Kim
Pages: 627-638
doi>10.1145/1807167.1807236

Histograms have been widely used for fast estimation of query result sizes in query optimization. In this paper, we propose a new histogram method, called the Skew-Tolerant Histogram (STHistogram) for two or three dimensional geographic data objects ...
Logging every footstep: quantile summaries for the entire history
Yufei Tao, Ke Yi, Cheng Sheng, Jian Pei, Feifei Li
Pages: 639-650
doi>10.1145/1807167.1807237

Quantiles are a crucial type of order statistics in databases. Extensive research has been focused on maintaining a space-efficient structure for approximate quantile computation as the underlying dataset is updated. The existing solutions, however, ...
Continuous sampling for online aggregation over multiple queries
Sai Wu, Beng Chin Ooi, Kian-Lee Tan
Pages: 651-662
doi>10.1145/1807167.1807238

In this paper, we propose an online aggregation system called COSMOS (Continuous Sampling for Multiple queries in an Online aggregation System), to process multiple aggregate queries efficiently. In COSMOS, a dataset is first scrambled so that sequentially ...
Histograms reloaded: the merits of bucket diversity
Carl-Christian Kanne, Guido Moerkotte
Pages: 663-674
doi>10.1145/1807167.1807239

Virtually all histograms store for each bucket the number of distinct values it contains and their average frequency. In this paper, we question this paradigm. We start out by investigating the estimation precision of three commercial database systems ...
SESSION: Probabilistic data, fuzzy data, & data provenance
Martin Theobald
Lineage processing over correlated probabilistic databases
Bhargav Kanagal, Amol Deshpande
Pages: 675-686
doi>10.1145/1807167.1807241

In this paper, we address the problem of scalably evaluating conjunctive queries over correlated probabilistic databases containing tuple or attribute uncertainties. Like previous work, we adopt a two-phase approach where we first compute lineages ...
Evaluation of probabilistic threshold queries in MCDB
Luis L. Perez, Subi Arumugam, Christopher M. Jermaine
Pages: 687-698
doi>10.1145/1807167.1807242

MCDB is a prototype database system for managing stochastic models for uncertain data. In this paper, we study the problem of how to use MCDB to answer statistical queries that search for database objects which satisfy some filter condition with greater ...
K-nearest neighbor search for fuzzy objects
Kai Zheng, Pui Cheong Fung, Xiaofang Zhou
Pages: 699-710
doi>10.1145/1807167.1807243

The K-Nearest Neighbor search (kNN) problem has been investigated extensively in the past due to its broad range of applications. In this paper we study this problem in the context of fuzzy objects that have indeterministic boundaries. Fuzzy objects ...
An optimal labeling scheme for workflow provenance using skeleton labels
Zhuowei Bao, Susan B. Davidson, Sanjeev Khanna, Sudeepa Roy
Pages: 711-722
doi>10.1145/1807167.1807244

We develop a compact and efficient reachability labeling scheme for answering provenance queries on workflow runs that conform to a given specification. Even though a workflow run can be structurally more complex and can be arbitrarily larger than the ...
SESSION: Data security & privacy
Chris Clifton
SecureBlox: customizable secure distributed data processing
William R. Marczak, Shan Shan Huang, Martin Bravenboer, Micah Sherr, Boon Thau Loo, Molham Aref
Pages: 723-734
doi>10.1145/1807167.1807246
Full text: PDF

We present SecureBlox, a declarative system that unifies a distributed query processor with a security policy framework. SecureBlox decouples security concerns from system specification, allowing easy reconfiguration of a system's security properties ...
Differentially private aggregation of distributed time-series with transformation and encryption
Vibhor Rastogi, Suman Nath
Pages: 735-746
doi>10.1145/1807167.1807247
Full text: PDF

We propose the first differentially private aggregation algorithm for distributed time-series data that offers good practical utility without any trusted server. This addresses two important challenges in participatory data-mining applications where ...
Non-homogeneous generalization in privacy preserving data publishing
Wai Kit Wong, Nikos Mamoulis, David Wai Lok Cheung
Pages: 747-758
doi>10.1145/1807167.1807248
Full text: PDF

Most previous research on privacy-preserving data publishing, based on the k-anonymity model, has followed the simplistic approach of homogeneously giving the same generalized value in all quasi-identifiers within a partition. We observe that ...
Preserving privacy and fairness in peer-to-peer data integration
Hazem Elmeleegy, Mourad Ouzzani, Ahmed Elmagarmid, Ahmad Abusalah
Pages: 759-770
doi>10.1145/1807167.1807249
Full text: PDF

Peer-to-peer data integration - a.k.a. Peer Data Management Systems (PDMSs) - promises to extend the classical data integration approach to the Internet scale. Unfortunately, some challenges remain before realizing this promise. One of the biggest challenges ...
SESSION: Web data integration
Fatma Ozcan
Structured annotations of web queries
Nikos Sarkas, Stelios Paparizos, Panayiotis Tsaparas
Pages: 771-782
doi>10.1145/1807167.1807251
Full text: PDF

Queries asked on web search engines often target structured data, such as commercial products, movie showtimes, or airline schedules. However, surfacing relevant results from such data is a highly challenging problem, due to the unstructured language ...
On active learning of record matching packages
Arvind Arasu, Michaela Götz, Raghav Kaushik
Pages: 783-794
doi>10.1145/1807167.1807252
Full text: PDF

We consider the problem of learning a record matching package (classifier) in an active learning setting. In active learning, the learning algorithm picks the set of examples to be labeled, unlike the more traditional passive learning setting where a user ...
I4E: interactive investigation of iterative information extraction
Anish Das Sarma, Alpa Jain, Divesh Srivastava
Pages: 795-806
doi>10.1145/1807167.1807253
Full text: PDF

Information extraction systems are increasingly being used to mine structured information from unstructured text documents. A commonly used unsupervised technique is to build iterative information extraction (IIE) systems that learn task-specific ...
ONDUX: on-demand unsupervised learning for information extraction
Eli Cortez, Altigran S. da Silva, Marcos André Gonçalves, Edleno S. de Moura
Pages: 807-818
doi>10.1145/1807167.1807254
Full text: PDF

Information extraction by text segmentation (IETS) applies to cases in which data values of interest are organized in implicit semi-structured records available in textual sources (e.g. postal addresses, bibliographic information, ads). It is an important ...
SESSION: Web data management
Jun Tatemura
Optimizing content freshness of relations extracted from the web using keyword search
Mohan Yang, Haixun Wang, Lipyeow Lim, Min Wang
Pages: 819-830
doi>10.1145/1807167.1807256
Full text: PDF

An increasing number of applications operate on data obtained from the Web. These applications typically maintain local copies of the web data to avoid network latency in data accesses. As the data on the Web evolves, it is critical that the local copy ...
Feeding frenzy: selectively materializing users' event feeds
Adam Silberstein, Jeff Terrace, Brian F. Cooper, Raghu Ramakrishnan
Pages: 831-842
doi>10.1145/1807167.1807257
Full text: PDF

Near real-time event streams are becoming a key feature of many popular web applications. Many web sites allow users to create a personalized feed by selecting one or more event streams they wish to follow. Examples include Twitter and ...
Constructing and exploring composite items
Senjuti Basu Roy, Sihem Amer-Yahia, Ashish Chawla, Gautam Das, Cong Yu
Pages: 843-854
doi>10.1145/1807167.1807258
Full text: PDF

Nowadays, online shopping has become a daily activity. Web users purchase a variety of items ranging from books to electronics. The large supply of online products calls for sophisticated techniques to help users explore available items. We propose to ...
Unbiased estimation of size and other aggregates over hidden web databases
Arjun Dasgupta, Xin Jin, Bradley Jewell, Nan Zhang, Gautam Das
Pages: 855-866
doi>10.1145/1807167.1807259
Full text: PDF

Many websites provide restrictive form-like interfaces which allow users to execute search queries on the underlying hidden databases. In this paper, we consider the problem of estimating the size of a hidden database through its web interface. We propose ...
SESSION: Graph mining
Chen Li
Towards proximity pattern mining in large graphs
Arijit Khan, Xifeng Yan, Kun-Lung Wu
Pages: 867-878
doi>10.1145/1807167.1807261
Full text: PDF

Mining graph patterns in large networks is critical to a variety of applications such as malware detection and biological module discovery. However, frequent subgraphs are often ineffective to capture association existing in these applications, due to ...
GAIA: graph classification using evolutionary computation
Ning Jin, Calvin Young, Wei Wang
Pages: 879-890
doi>10.1145/1807167.1807262
Full text: PDF

Discriminative subgraphs are widely used to define the feature space for graph classification in large graph databases. Several scalable approaches have been proposed to mine discriminative subgraphs. However, their intensive computation needs prevent ...
Finding maximum degrees in hidden bipartite graphs
Yufei Tao, Cheng Sheng, Jianzhong Li
Pages: 891-902
doi>10.1145/1807167.1807263
Full text: PDF

An (edge) hidden graph is a graph whose edges are not explicitly given. Detecting the presence of an edge requires expensive edge-probing queries. We consider the k most connected vertex problem on hidden bipartite graphs. Specifically, ...
Connected substructure similarity search
Haichuan Shang, Xuemin Lin, Ying Zhang, Jeffrey Xu Yu, Wei Wang
Pages: 903-914
doi>10.1145/1807167.1807264
Full text: PDF

Substructure similarity search is to retrieve graphs that approximately contain a given query graph. It has many applications, e.g., detecting similar functions among chemical compounds. The problem is challenging as even testing subgraph containment ...
SESSION: Indexing & storage management
Daniel Abadi
Bed-tree: an all-purpose index structure for string similarity search based on edit distance
Zhenjie Zhang, Marios Hadjieleftheriou, Beng Chin Ooi, Divesh Srivastava
Pages: 915-926
doi>10.1145/1807167.1807266
Full text: PDF

Strings are ubiquitous in computer systems and hence string processing has attracted extensive research effort from computer scientists in diverse areas. One of the most important problems in string processing is to efficiently evaluate the similarity ...
On indexing error-tolerant set containment
Parag Agrawal, Arvind Arasu, Raghav Kaushik
Pages: 927-938
doi>10.1145/1807167.1807267
Full text: PDF

Prior work has identified set based comparisons as a useful primitive for supporting a wide variety of similarity functions in record matching. Accordingly, various techniques have been proposed to improve the performance of set similarity lookups. However, ...
Workload-aware storage layout for database systems
Oguzhan Ozmen, Kenneth Salem, Jiri Schindler, Steve Daniel
Pages: 939-950
doi>10.1145/1807167.1807268
Full text: PDF

The performance of a database system depends strongly on the layout of database objects, such as indexes or tables, onto the underlying storage devices. A good layout will both balance the I/O workload generated by the database system and avoid the performance-degrading ...
Querying data provenance
Grigoris Karvounarakis, Zachary G. Ives, Val Tannen
Pages: 951-962
doi>10.1145/1807167.1807269
Full text: PDF

Many advanced data management operations (e.g., incremental maintenance, trust assessment, debugging schema mappings, keyword search over databases, or query answering in probabilistic databases), involve computations that look at how a tuple was produced, ...
SESSION: Industrial session 1: new platforms
Divy Agrawal
Overview of SciDB: large scale array storage, processing and analysis
Paul G. Brown
Pages: 963-968
doi>10.1145/1807167.1807271
Full text: PDF

SciDB [4, 3] is a new open-source data management system intended primarily for use in application domains that involve very large (petabyte) scale array data; for example, scientific applications such as astronomy, remote sensing and climate modeling, ...
Integrating Hadoop and parallel DBMS
Yu Xu, Pekka Kostamaa, Like Gao
Pages: 969-974
doi>10.1145/1807167.1807272
Full text: PDF

Teradata's parallel DBMS has been successfully deployed in large data warehouses over the last two decades for large scale business analysis in various industries over data sets ranging from a few terabytes to multiple petabytes. However, due to the ...
A comparison of join algorithms for log processing in MapReduce
Spyros Blanas, Jignesh M. Patel, Vuk Ercegovac, Jun Rao, Eugene J. Shekita, Yuanyuan Tian
Pages: 975-986
doi>10.1145/1807167.1807273
Full text: PDF

The MapReduce framework is increasingly being used to analyze large volumes of data. One important type of data analysis done with MapReduce is log processing, in which a click-stream or an event log is filtered, aggregated, or mined for patterns. As ...
SESSION: Industrial session 2: advanced analytics
Berthold Reinwald
Ricardo: integrating R and Hadoop
Sudipto Das, Yannis Sismanis, Kevin S. Beyer, Rainer Gemulla, Peter J. Haas, John McPherson
Pages: 987-998
doi>10.1145/1807167.1807275
Full text: PDF

Many modern enterprises are collecting data at the most detailed level possible, creating data repositories ranging from terabytes to petabytes in size. The ability to apply sophisticated statistical analysis methods to this data is becoming essential ...
PYMK: friend recommendation at MySpace
Michael Moricz, Yerbolat Dosbayev, Mikhail Berlyant
Pages: 999-1002
doi>10.1145/1807167.1807276
Full text: PDF

In recent years Social Networking has enjoyed a significant increase in popularity. The main reason behind this surge in popularity is the social experience associated with connecting content to people and also connecting people with other people. Knowing, ...
Forecasting high-dimensional data
Deepak Agarwal, Datong Chen, Long-ji Lin, Jayavel Shanmugasundaram, Erik Vee
Pages: 1003-1012
doi>10.1145/1807167.1807277
Full text: PDF

We propose a method for forecasting high-dimensional data (hundreds of attributes, trillions of attribute combinations) for a duration of several months. Our motivating application is guaranteed display advertising, a multi-billion dollar industry, whereby ...
Data warehousing and analytics infrastructure at Facebook
Ashish Thusoo, Zheng Shao, Suresh Anthony, Dhruba Borthakur, Namit Jain, Joydeep Sen Sarma, Raghotham Murthy, Hao Liu
Pages: 1013-1020
doi>10.1145/1807167.1807278
Full text: PDF

Scalable analysis on large data sets has been core to the functions of a number of teams at Facebook - both engineering and non-engineering. Apart from ad hoc analysis of data and creation of business intelligence dashboards by analysts across the company, ...
SESSION: Industrial session 3: advances in DBMSs
Sunil Prabhakar
Extreme scale with full SQL language support in Microsoft SQL Azure
David G. Campbell, Gopal Kakivaya, Nigel Ellis
Pages: 1021-1024
doi>10.1145/1807167.1807280
Full text: PDF

Cloud SQL Server is an Internet scale relational database service which is currently used by Microsoft delivered services and also offered directly as a fully relational database service known as "SQL Azure". One of the principal design objectives in ...
Pay-as-you-go: an adaptive approach to provide full context-aware text search over document content
Zhen Hua Liu, Thomas Baby, Sukhendu Chakraborty, Junyan Ding, Anguel Novoselsky, Vikas Arora
Pages: 1025-1036
doi>10.1145/1807167.1807281
Full text: PDF

RDBMS provides best performance for querying structured data that starts out with a well-defined schema. However, such a 'schema first, data later' approach does not work for unstructured data or data without much structure. Therefore, RDBMS typically ...
Sedna: native XML database management system (internals overview)
Ilya Taranov, Ivan Shcheklein, Alexander Kalinin, Leonid Novak, Sergei Kuznetsov, Roman Pastukhov, Alexander Boldakov, Denis Turdakov, Konstantin Antipin, Andrey Fomichev, Peter Pleshachkov, Pavel Velikhov, Nikolai Zavaritski, Maxim Grinev, Maria Grineva, Dmitry Lizorkin
Pages: 1037-1046
doi>10.1145/1807167.1807282
Full text: PDF

We present a native XML database management system, Sedna, which is implemented from scratch as a full-featured database management system for storing large amounts of XML data. We believe that the key contribution of this system is an improved schema-based ...
Optimizing schema-last tuple-store queries in graphd
Scott M. Meyer, Jutta Degener, John Giannandrea, Barak Michener
Pages: 1047-1056
doi>10.1145/1807167.1807283
Full text: PDF

Current relational databases require that a database schema exist prior to data entry and require manual optimization for best performance. We describe the query optimization techniques used by graphd, the schema-last, automatically indexed tuple-store ...
SESSION: Industrial session 4: information integration, collaboration & visualization
Chen Li
OpenII: an open source information integration toolkit
Len Seligman, Peter Mork, Alon Halevy, Ken Smith, Michael J. Carey, Kuang Chen, Chris Wolf, Jayant Madhavan, Akshay Kannan, Doug Burdick
Pages: 1057-1060
doi>10.1145/1807167.1807285
Full text: PDF

OpenII (openintegration.org) is a collaborative effort to create a suite of open-source tools for information integration (II). The project is leveraging the latest developments in II research to create a platform on which integration tools can be built ...
Google Fusion Tables: web-centered data management and collaboration
Hector Gonzalez, Alon Y. Halevy, Christian S. Jensen, Anno Langen, Jayant Madhavan, Rebecca Shapley, Warren Shen, Jonathan Goldberg-Kidon
Pages: 1061-1066
doi>10.1145/1807167.1807286
Full text: PDF

It has long been observed that database management systems focus on traditional business applications, and that few people use a database management system outside their workplace. Many have wondered what it will take to enable the use of data management ...
Visual interfaces to data
Christopher Stolte
Pages: 1067-1068
doi>10.1145/1807167.1807287
Full text: PDF

Easy-to-use visual interfaces to data can broadly expand the audience for databases. Domain experts rather than database experts can engage in rapid-fire Q&A sessions with the data. Visual interfaces can provide a medium for story-telling, debate, ...
Graphical XQuery in the AquaLogic Data Services Platform
Vinayak Borkar, Michael Carey, Sebu Koleth, Alex Kotopoulis, Kautul Mehta, Joshua Spiegel, Sachin Thatte, Till Westmann
Pages: 1069-1080
doi>10.1145/1807167.1807288
Full text: PDF

The AquaLogic Data Services Platform (ALDSP) is a middleware platform developed at BEA Systems for building services, referred to as data services, that integrate, access, and manipulate information coming from multiple heterogeneous sources of data ...
SESSION: Industrial session 5: stream processing
Graham Cormode
Continuous analytics over discontinuous streams
Sailesh Krishnamurthy, Michael J. Franklin, Jeffrey Davis, Daniel Farina, Pasha Golovko, Alan Li, Neil Thombre
Pages: 1081-1092
doi>10.1145/1807167.1807290
Full text: PDF

Continuous analytics systems that enable query processing over streams of data have emerged as key solutions for dealing with massive data volumes and demands for low latency. These systems have been heavily influenced by an assumption that data streams ...
IBM InfoSphere Streams for scalable, real-time, intelligent transportation services
Alain Biem, Eric Bouillet, Hanhua Feng, Anand Ranganathan, Anton Riabov, Olivier Verscheure, Haris Koutsopoulos, Carlos Moran
Pages: 1093-1104
doi>10.1145/1807167.1807291
Full text: PDF

With the widespread adoption of location tracking technologies like GPS, the domain of intelligent transportation services has seen growing interest in the last few years. Services in this domain make use of real-time location-based data from a variety ...
SIE-OBI: a streaming information extraction platform for operational business intelligence
Malu Castellanos, Song Wang, Umeshwar Dayal, Chetan Gupta
Pages: 1105-1110
doi>10.1145/1807167.1807292
Full text: PDF

Emerging business intelligence (BI) applications aim to provide situational awareness, i.e., information about real-world events that might affect the business operations of an enterprise. For instance, an enterprise might want to know whether customers ...
DEMONSTRATION SESSION: Session A: cloud, OLAP, and XML
HadoopDB in action: building real world applications
Azza Abouzied, Kamil Bajda-Pawlikowski, Jiewen Huang, Daniel J. Abadi, Avi Silberschatz
Pages: 1111-1114
doi>10.1145/1807167.1807294
Full text: PDF

HadoopDB is a hybrid of MapReduce and DBMS technologies, designed to meet the growing demand of analyzing massive datasets on very large clusters of machines. Our previous work has shown that HadoopDB approaches parallel databases in performance and ...
Online aggregation and continuous query support in MapReduce
Tyson Condie, Neil Conway, Peter Alvaro, Joseph M. Hellerstein, John Gerth, Justin Talbot, Khaled Elmeleegy, Russell Sears
Pages: 1115-1118
doi>10.1145/1807167.1807295
Full text: PDF

MapReduce is a popular framework for data-intensive distributed computing of batch jobs. To simplify fault tolerance, the output of each MapReduce task and job is materialized to disk before it is consumed. In this demonstration, we describe a ...
MapDupReducer: detecting near duplicates over massive datasets
Chaokun Wang, Jianmin Wang, Xuemin Lin, Wei Wang, Haixun Wang, Hongsong Li, Wanpeng Tian, Jun Xu, Rui Li
Pages: 1119-1122
doi>10.1145/1807167.1807296
Full text: PDF

Near duplicate detection benefits many applications, e.g., on-line news selection over the Web by keyword search. The purpose of this demo is to show the design and implementation of MapDupReducer, a MapReduce based system capable of detecting near duplicates ...
Large graph processing in the cloud
Rishan Chen, Xuetian Weng, Bingsheng He, Mao Yang
Pages: 1123-1126
doi>10.1145/1807167.1807297
Full text: PDF

As the study of graphs, such as web and social graphs, becomes increasingly popular, the requirements of efficiency and programming flexibility of large graph processing tasks challenge existing tools. We propose to demonstrate Surfer, a large ...
DCUBE: discrimination discovery in databases
Salvatore Ruggieri, Dino Pedreschi, Franco Turini
Pages: 1127-1130
doi>10.1145/1807167.1807298
Full text: PDF

Discrimination discovery in databases consists in finding unfair practices against minorities which are hidden in a dataset of historical decisions. The DCUBE system implements the approach of [5], which is based on classification rule extraction and ...
S-OLAP: an OLAP system for analyzing sequence data
Chun Kit Chui, Ben Kao, Eric Lo, David Cheung
Pages: 1131-1134
doi>10.1145/1807167.1807299
Full text: PDF

The Sequence OLAP (S-OLAP) system is a novel online analytical processing system for analyzing sequence data. S-OLAP supports "pattern-based" grouping and aggregation on sequence data - a very powerful concept and capability that is not supported by ...
ProgXe: progressive result generation framework for multi-criteria decision support queries
Venkatesh Raghavan, Elke A. Rundensteiner
Pages: 1135-1138
doi>10.1145/1807167.1807300
Full text: PDF

We demonstrate ProgXe, a practical approach to support Multi-Criteria Decision Support (MCDS) applications that need to report results as they are being generated to enable the user to make competitive decisions. ProgXe transforms the execution ...
XTaGe: a flexible XML collection generator
María Pérez, Ismael Sanz, Rafael Berlanga
Pages: 1139-1142
doi>10.1145/1807167.1807301
Full text: PDF

In this demonstration we present XTaGe (XML Tester and Generator), a flexible tool for the creation of complex XML collections. XTaGe focuses on XML collections with complex structural constraints and domain-specific characteristics, which would be very ...
K*SQL: a unifying engine for sequence patterns and XML
Barzan Mozafari, Kai Zeng, Carlo Zaniolo
Pages: 1143-1146
doi>10.1145/1807167.1807302
Full text: PDF

A strong interest is emerging in SQL extensions for sequence patterns using Kleene-closure expressions. This burst of interest from both the research community and the commercial world is due to the many database and data stream applications made possible ...
DEMONSTRATION SESSION: Session B: stream, keyword search, and web
Symbiote: a Reconfigurable Logic Assisted Data Stream Management System (RLADSMS)
Pranav S. Vaidya, Jaehwan John Lee, Francis Bowen, Yingzi Du, Chandima H. Nadungodage, Yuni Xia
Pages: 1147-1150
doi>10.1145/1807167.1807304
Full text: PDF

Numerous monitoring applications such as traffic control systems, border patrol monitoring, and person locater services generate a large number of multimedia data streams that need to be analyzed and processed using image processing and data stream management ...
Interactive visual exploration of neighbor-based patterns in data streams
Di Yang, Zhenyu Guo, Zaixian Xie, Elke A. Rundensteiner, Matthew O. Ward
Pages: 1151-1154
doi>10.1145/1807167.1807305
Full text: PDF

We will demonstrate our system, called ViStream, supporting interactive visual exploration of neighbor-based patterns [7] in data streams. ViStream does not only apply innovative multi-query strategies to compute a broad range of popular ...
TwitterMonitor: trend detection over the Twitter stream
Michael Mathioudakis, Nick Koudas
Pages: 1155-1158
doi>10.1145/1807167.1807306
Full text: PDF

We present TwitterMonitor, a system that performs trend detection over the Twitter stream. The system identifies emerging topics (i.e. 'trends') on Twitter in real time and provides meaningful analytics that synthesize an accurate description of each ...
Glacier: a query-to-hardware compiler
Rene Mueller, Jens Teubner, Gustavo Alonso
Pages: 1159-1162
doi>10.1145/1807167.1807307
Full text: PDF

Field-programmable gate arrays (FPGAs) are a promising technology that can be used in database systems. In this demonstration we show Glacier, a library and a compiler that can be employed to implement streaming queries as hardware circuits on ...
Exploratory keyword search on data graphs
Hilit Achiezra, Konstantin Golenberg, Benny Kimelfeld, Yehoshua Sagiv
Pages: 1163-1166
doi>10.1145/1807167.1807308
Full text: PDF

A system for keyword search on data graphs is demonstrated on two challenging datasets: the large DBLP and Mondial (which is highly cyclic and has a complex schema). The system supports search, exploration and question answering. The demonstration shows ...
Integrating keyword search with multiple dimension tree views over a summary corpus data cube
Mark Sifer, Jian Lin, Yutaka Watanobe, Subhash Bhalla
Pages: 1167-1170
doi>10.1145/1807167.1807309
Full text: PDF

We demonstrate a system that integrates a novel OLAP component with a keyword search engine, to support querying over sparse and ragged corpus data. The key contribution of our system is the integration of dynamically selected point sets such as search ...
Query portals: dynamically generating portals for entity-oriented web queries
Sanjay Agrawal, Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti, Arnd Christian König, Dong Xin
Pages: 1171-1174
doi>10.1145/1807167.1807310
Full text: PDF

Many web queries seek information about named entities (such as products or people). Web search engines federate such entity-oriented queries to relevant structured databases; the results of those searches are then returned to the user along with web ...
Creating and exploring web form repositories
Luciano Barbosa, Hoa Nguyen, Thanh Nguyen, Ramesh Pinnamaneni, Juliana Freire
Pages: 1175-1178
doi>10.1145/1807167.1807311
Full text: PDF

We present DeepPeep (http://www.deeppeep.org), a new system for discovering, organizing and analyzing Web forms. DeepPeep allows users to explore the entry points to hidden-Web sites whose contents are out of reach for traditional search engines. Besides ...
DEMONSTRATION SESSION: Session C: schema, language, and spatial
Exploring schema similarity at multiple resolutions
Ken Smith, Craig Bonaceto, Chris Wolf, Beth Yost, Michael Morse, Peter Mork, Doug Burdick
Pages: 1179-1182
doi>10.1145/1807167.1807313
Full text: PDF

Large, dynamic, and ad-hoc organizations must frequently initiate data integration and sharing efforts with insufficient awareness of how organizational data sources are related. Decision makers need to reason about data model interactions much as they ...
An automated, yet interactive and portable DB designer
Ioannis Alagiannis, Debabrata Dash, Karl Schnaitter, Anastasia Ailamaki, Neoklis Polyzotis
Pages: 1183-1186
doi>10.1145/1807167.1807314
Full text: PDF

Tuning tools attempt to configure a database to achieve optimal performance for a given workload. Selecting an optimal set of physical structures is computationally hard since it involves searching a vast space of possible configurations. Commercial ...
Midas: integrating public financial data
Sreeram Balakrishnan, Vivian Chu, Mauricio A. Hernández, Howard Ho, Rajasekar Krishnamurthy, Shi Xia Liu, Jan H. Pieper, Jeffrey S. Pierce, Lucian Popa, Christine M. Robson, Lei Shi, Ioana R. Stanoi, Edison L. Ting, Shivakumar Vaithyanathan, Huahai Yang
Pages: 1187-1190
doi>10.1145/1807167.1807315
Full text: PDF

The primary goal of the Midas project is to build a system that enables easy and scalable integration of unstructured and semi-structured information present across multiple data sources. As a first step in this direction, we have built a system that ...
Worry-free database upgrades: automated model-driven evolution of schemas and complex mappings
James F. Terwilliger, Philip A. Bernstein, Adi Unnithan
Pages: 1191-1194
doi>10.1145/1807167.1807316
Full text: PDF

Schema evolution is an unavoidable consequence of the application development lifecycle. The two primary schemas in an application, the client conceptual object model and the persistent database model, must co-evolve or risk quality, stability, and maintainability ...
US-SQL: managing uncertain schemata
Matteo Magnani, Danilo Montesi
Pages: 1195-1198
doi>10.1145/1807167.1807317
Full text: PDF

In this paper we describe a demo concerning the management of uncertain schemata. Many works have studied the problem of representing uncertainty on attribute values or tuples, like the fact that a value is 10 with probability .3 or 20 with probability ...
PAROS: pareto optimal route selection
Franz Graf, Hans-Peter Kriegel, Matthias Renz, Matthias Schubert
Pages: 1199-1202
doi>10.1145/1807167.1807318
Full text: PDF

Modern maps provide a variety of information about roads and their surrounding landscape allowing navigation systems to go beyond simple shortest path computation. In this demo, we show how the concept of skyline queries can be successfully adapted to ...
MoveMine: mining moving object databases
Zhenhui Li, Ming Ji, Jae-Gil Lee, Lu-An Tang, Yintao Yu, Jiawei Han, Roland Kays
Pages: 1203-1206
doi>10.1145/1807167.1807319
Full text: PDF

With the maturity of GPS, wireless, and Web technologies, increasing amounts of movement data collected from various moving objects, such as animals, vehicles, mobile devices, and climate radars, have become widely available. Analyzing such data has ...
PIQL: a performance insightful query language
Michael Armbrust, Stephen Tu, Armando Fox, Michael J. Franklin, David A. Patterson, Nick Lanham, Beth Trushkowsky, Jesse Trutna
Pages: 1207-1210
doi>10.1145/1807167.1807320
Full text: PDF

Large-scale websites are increasingly moving from relational databases to distributed key-value stores for high request rate, low latency workloads. Often this move is motivated not only by key-value stores' ability to scale simply by adding more hardware, ...
DoCQS: a prototype system for supporting data-oriented content query
Mianwei Zhou, Tao Cheng, Kevin Chen-Chuan Chang
Pages: 1211-1214
doi>10.1145/1807167.1807321
Full text: PDF

Witnessing the richness of data in document content and many ad-hoc efforts for finding such data, we propose a Data-oriented Content Query System (DoCQS), which is oriented towards fine granularity data of all types by searching directly into ...
DEMONSTRATION SESSION: Session D: new technology, and potpourri
QRelX: generating meaningful queries that provide cardinality assurance
Manasi Vartak, Venkatesh Raghavan, Elke A. Rundensteiner
Pages: 1215-1218
doi>10.1145/1807167.1807323
Full text: PDF

In many business and consumer applications, queries have cardinality constraints. However, current database systems provide minimal support for cardinality assurance. Consequently, users must adopt a cumbersome trial-and-error approach to find queries ...
Performing sound flash device measurements: some lessons from uFLIP
Matias Bjørling, Lionel Le Folgoc, Ahmed Mseddi, Philippe Bonnet, Luc Bouganim, Björn Jónsson
Pages: 1219-1222
doi>10.1145/1807167.1807324
Full text: PDF

It is amazingly easy to get meaningless results when measuring flash devices, partly because of the peculiarity of flash memory, but primarily because their behavior is determined by layers of complex, proprietary, and undocumented software and hardware. ...
GDR: a system for guided data repair
Mohamed Yakout, Ahmed K. Elmagarmid, Jennifer Neville, Mourad Ouzzani
Pages: 1223-1226
doi>10.1145/1807167.1807325
Full text: PDF

Improving data quality is a time-consuming, labor-intensive and often domain specific operation. Existing data repair approaches are either fully automated or not efficient in interactively involving the users. We present a demo of GDR, a Guided Data ...
Crescando
Georgios Giannikis, Philipp Unterbrunner, Jeremy Meyer, Gustavo Alonso, Dietmar Fauser, Donald Kossmann
Pages: 1227-1230
doi>10.1145/1807167.1807326
Full text: PDF

This demonstration presents Crescando, an implementation of a distributed relational table that guarantees predictable response time on unpredictable workloads. In Crescando, data is stored in main memory and accessed via full-table scans. By using scans ...
iTuned: a tool for configuring and visualizing database parameters
Vamsidhar Thummala, Shivnath Babu
Pages: 1231-1234
doi>10.1145/1807167.1807327

iTuned is a tool that takes a SQL workload as input and recommends good settings for database configuration parameters such as buffer pool sizes, multi-programming level, and number of I/O daemons. iTuned also provides response-surface and sensitivity-analysis ...
Pluggable personal data servers
Nicolas Anciaux, Luc Bouganim, Yanli Guo, Philippe Pucheral, Jean-Jacques Vandewalle, Shaoyi Yin
Pages: 1235-1238
doi>10.1145/1807167.1807328

An increasing amount of personal data is automatically gathered on servers by administrations, hospitals and private companies while several security surveys highlight the failure of database servers to keep confidential data really private. The advent ...
Mask: a system for privacy-preserving policy-based access to published content
Mohamed Nabeel, Ning Shang, John Zage, Elisa Bertino
Pages: 1239-1242
doi>10.1145/1807167.1807329

We propose to demonstrate Mask, the first system addressing the seemingly-unsolvable problem of how to selectively share contents among a group of users based on access control policies expressed as conditions against the identity attributes of ...
SimDB: a similarity-aware database system
Yasin N. Silva, Ahmed M. Aly, Walid G. Aref, Per-Ake Larson
Pages: 1243-1246
doi>10.1145/1807167.1807330

The identification and processing of similarities in the data play a key role in multiple application scenarios. Several types of similarity-aware operations have been studied in the literature. However, in most of the previous work, similarity-aware ...
A demonstration of FlexPref: extensible preference evaluation inside the DBMS engine
Justin J. Levandoski, Mohamed F. Mokbel, Mohamed E. Khalefa, Venkateshwar R. Korukanti
Pages: 1247-1250
doi>10.1145/1807167.1807331

This demonstration presents FlexPref, a framework implemented inside the DBMS query processor that enables efficient and extensible preference query processing. FlexPref provides query processing support inside the database engine for a wide array ...
TUTORIAL SESSION: Tutorial 1
Mining knowledge from databases: an information network analysis approach
Jiawei Han, Yizhou Sun, Xifeng Yan, Philip S. Yu
Pages: 1251-1252
doi>10.1145/1807167.1807333

Most people consider a database to be merely a data repository that supports data storage and retrieval. Actually, a database contains rich, inter-related, multi-typed data and information, forming one or a set of gigantic, interconnected, heterogeneous ...
TUTORIAL SESSION: Tutorial 2
Database systems research on data mining
Carlos Ordonez, Javier García-García
Pages: 1253-1254
doi>10.1145/1807167.1807335

Data mining remains an important research area in database systems. We present a review of processing alternatives, storage mechanisms, algorithms, data structures and optimizations that enable data mining on large data sets. We focus on the computation ...
TUTORIAL SESSION: Tutorial 3
Information theory for data management
Divesh Srivastava, Suresh Venkatasubramanian
Pages: 1255-1256
doi>10.1145/1807167.1807337

We explore the use of information theory as a tool to express and quantify notions of information content and information transfer for representing and analyzing data, using examples from database design, data integration and data anonymization. We also ...
TUTORIAL SESSION: Tutorial 4
Enterprise information extraction: recent developments and open challenges
Laura Chiticariu, Yunyao Li, Sriram Raghavan, Frederick R. Reiss
Pages: 1257-1258
doi>10.1145/1807167.1807339

Information extraction (IE) - the problem of extracting structured information from unstructured text - has become an increasingly important topic in recent years. A SIGMOD 2006 tutorial [3] outlined challenges and opportunities for the database community ...
PANEL SESSION: Panel
Crowds, clouds, and algorithms: exploring the human side of "big data" applications
Sihem Amer-Yahia, AnHai Doan, Jon Kleinberg, Nick Koudas, Michael Franklin
Pages: 1259-1260
doi>10.1145/1807167.1807341

The ACM Digital Library is published by the Association for Computing Machinery. Copyright © 2017 ACM, Inc.