
ABSTRACT

We describe a scalable database cluster for the spatial analysis and annotation of high-throughput brain imaging data, initially for 3-d electron microscopy image stacks, but for time-series and multi-channel data as well. The system was designed primarily for workloads that build connectomes (neural connectivity maps of the brain) using the parallel execution of computer vision algorithms on high-performance compute clusters. These services and open-science data sets are publicly available at openconnecto.me.

The system design inherits much from NoSQL scale-out and data-intensive computing architectures. We distribute data to cluster nodes by partitioning a spatial index. We direct I/O to different systems (reads to parallel disk arrays and writes to solid-state storage) to avoid I/O interference and maximize throughput. All programming interfaces are RESTful Web services, which are simple and stateless, improving scalability and usability. We include a performance evaluation of the production system, highlighting the effectiveness of spatial data organization.
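The idea of distributing data by partitioning a spatial index can be illustrated with a minimal sketch. The abstract does not say which index is used, so the Morton (Z-order) curve below is an assumption chosen for illustration; the function names, the 10-bit coordinate width, and the equal-slice assignment of keys to nodes are likewise hypothetical and are not taken from the system itself.

```python
def morton3d(x: int, y: int, z: int, bits: int = 10) -> int:
    """Interleave the low `bits` bits of x, y and z into one Z-order key.

    Bit i of x lands at position 3*i, bit i of y at 3*i + 1,
    and bit i of z at 3*i + 2, so nearby cells tend to get nearby keys.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (3 * i)
        key |= ((y >> i) & 1) << (3 * i + 1)
        key |= ((z >> i) & 1) << (3 * i + 2)
    return key


def node_for_key(key: int, num_nodes: int, bits: int = 10) -> int:
    """Assign a key to a node by cutting the key range into equal contiguous slices.

    This is an illustrative partitioning rule, not the production one.
    """
    total_keys = 1 << (3 * bits)
    return key * num_nodes // total_keys


if __name__ == "__main__":
    # Hypothetical cell coordinates in a 1024^3 grid, spread over 4 nodes.
    for cell in [(0, 0, 0), (1, 1, 1), (512, 0, 0), (1023, 1023, 1023)]:
        k = morton3d(*cell)
        print(cell, k, node_for_key(k, num_nodes=4))
```

Because each node owns a contiguous range of the curve, a query over a small spatial region usually touches few nodes, which is one plausible reading of why spatial data organization matters for throughput.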

AUTHORS



Randal Burns

Bibliometrics: publication history
Publication years: 2012-2013
Publication count: 3
Citation Count: 8
Available for download: 2
Downloads (6 Weeks): 9
Downloads (12 Months): 83
Downloads (cumulative): 791
Average downloads per article: 395.50
Average citations per article: 2.67


Kunal Lillaney

Bibliometrics: publication history
Publication years: 2013-2013
Publication count: 1
Citation Count: 1
Available for download: 1
Downloads (6 Weeks): 4
Downloads (12 Months): 41
Downloads (cumulative): 250
Average downloads per article: 250.00
Average citations per article: 1.00


Daniel R. Berger

Bibliometrics: publication history
Publication years: 2013-2013
Publication count: 1
Citation Count: 1
Available for download: 1
Downloads (6 Weeks): 4
Downloads (12 Months): 41
Downloads (cumulative): 250
Average downloads per article: 250.00
Average citations per article: 1.00


Logan Grosenick

Bibliometrics: publication history
Publication years: 2006-2013
Publication count: 4
Citation Count: 2
Available for download: 1
Downloads (6 Weeks): 4
Downloads (12 Months): 41
Downloads (cumulative): 250
Average downloads per article: 250.00
Average citations per article: 0.50


Karl Deisseroth

Bibliometrics: publication history
Publication years: 2013-2013
Publication count: 1
Citation Count: 1
Available for download: 1
Downloads (6 Weeks): 4
Downloads (12 Months): 41
Downloads (cumulative): 250
Average downloads per article: 250.00
Average citations per article: 1.00


R. Clay Reid

Bibliometrics: publication history
Publication years: 2007-2013
Publication count: 6
Citation Count: 14
Available for download: 1
Downloads (6 Weeks): 4
Downloads (12 Months): 41
Downloads (cumulative): 250
Average downloads per article: 250.00
Average citations per article: 2.33


William Gray Roncal

Bibliometrics: publication history
Publication years: 2013-2013
Publication count: 1
Citation Count: 1
Available for download: 1
Downloads (6 Weeks): 4
Downloads (12 Months): 41
Downloads (cumulative): 250
Average downloads per article: 250.00
Average citations per article: 1.00


Priya Manavalan

Bibliometrics: publication history
Publication years: 2008-2013
Publication count: 2
Citation Count: 1
Available for download: 1
Downloads (6 Weeks): 4
Downloads (12 Months): 41
Downloads (cumulative): 250
Average downloads per article: 250.00
Average citations per article: 0.50


Davi D. Bock

Bibliometrics: publication history
Publication years: 2007-2013
Publication count: 4
Citation Count: 6
Available for download: 1
Downloads (6 Weeks): 4
Downloads (12 Months): 41
Downloads (cumulative): 250
Average downloads per article: 250.00
Average citations per article: 1.50


Narayanan Kasthuri

Bibliometrics: publication history
Publication years: 2013-2013
Publication count: 3
Citation Count: 8
Available for download: 1
Downloads (6 Weeks): 4
Downloads (12 Months): 41
Downloads (cumulative): 250
Average downloads per article: 250.00
Average citations per article: 2.67


Michael Kazhdan

Bibliometrics: publication history
Publication years: 2000-2015
Publication count: 45
Citation Count: 1,600
Available for download: 19
Downloads (6 Weeks): 170
Downloads (12 Months): 1,224
Downloads (cumulative): 27,107
Average downloads per article: 1,426.68
Average citations per article: 35.56


Stephen J. Smith

Bibliometrics: publication history
Publication years: 2009-2013
Publication count: 2
Citation Count: 1
Available for download: 1
Downloads (6 Weeks): 4
Downloads (12 Months): 41
Downloads (cumulative): 250
Average downloads per article: 250.00
Average citations per article: 0.50


Dean Kleissas

Bibliometrics: publication history
Publication years: 2013-2013
Publication count: 1
Citation Count: 1
Available for download: 1
Downloads (6 Weeks): 4
Downloads (12 Months): 41
Downloads (cumulative): 250
Average downloads per article: 250.00
Average citations per article: 1.00


Eric Perlman

Contact: eric at cs.jhu.edu

Bibliometrics: publication history
Publication years: 2006-2013
Publication count: 8
Citation Count: 23
Available for download: 6
Downloads (6 Weeks): 8
Downloads (12 Months): 70
Downloads (cumulative): 899
Average downloads per article: 149.83
Average citations per article: 2.88


Kwanghun Chung

Bibliometrics: publication history
Publication years: 2013-2013
Publication count: 1
Citation Count: 1
Available for download: 1
Downloads (6 Weeks): 4
Downloads (12 Months): 41
Downloads (cumulative): 250
Average downloads per article: 250.00
Average citations per article: 1.00


Nicholas C. Weiler

Bibliometrics: publication history
Publication years: 2013-2013
Publication count: 1
Citation Count: 1
Available for download: 1
Downloads (6 Weeks): 4
Downloads (12 Months): 41
Downloads (cumulative): 250
Average downloads per article: 250.00
Average citations per article: 1.00


Jeff Lichtman

Bibliometrics: publication history
Publication years: 1998-2013
Publication count: 14
Citation Count: 29
Available for download: 3
Downloads (6 Weeks): 6
Downloads (12 Months): 72
Downloads (cumulative): 1,143
Average downloads per article: 381.00
Average citations per article: 2.07


Alexander S. Szalay

Bibliometrics: publication history
Publication years: 1999-2015
Publication count: 74
Citation Count: 539
Available for download: 36
Downloads (6 Weeks): 102
Downloads (12 Months): 745
Downloads (cumulative): 19,006
Average downloads per article: 527.94
Average citations per article: 7.28


Joshua T. Vogelstein

Bibliometrics: publication history
Publication years: 2007-2016
Publication count: 19
Citation Count: 63
Available for download: 3
Downloads (6 Weeks): 51
Downloads (12 Months): 241
Downloads (cumulative): 538
Average downloads per article: 179.33
Average citations per article: 3.32


R. Jacob Vogelstein

Bibliometrics: publication history
Publication years: 2002-2013
Publication count: 9
Citation Count: 39
Available for download: 1
Downloads (6 Weeks): 4
Downloads (12 Months): 41
Downloads (cumulative): 250
Average downloads per article: 250.00
Average citations per article: 4.33



PUBLICATION

Title: SSDBM Proceedings of the 25th International Conference on Scientific and Statistical Database Management
Editors: Alex Szalay, Tamas Budavari, Magdalena Balazinska, Alexandra Meliou, Ahmet Sacan
Article No.: 27
Publication Date: 2013-07-29
Publisher: ACM, New York, NY, USA ©2013
ISBN: 978-1-4503-1921-8
doi>10.1145/2484838.2484870
Overall Acceptance Rate: 26 of 71 submissions, 37%

Year        Submitted  Accepted  Rate
SSDBM '14   71         26        37%
Overall     71         26        37%

APPEARS IN
ICPS: ACM International Conference Proceeding Series


Table of Contents

Proceedings of the 25th International Conference on Scientific and Statistical Database Management
SESSION: Keynote sessions
Making sense of big data with the Berkeley data analytics stack
Michael J. Franklin
Article No.: 1
doi>10.1145/2484838.2484884

The Berkeley AMPLab was founded on the idea that the challenges of emerging Big Data applications require a new approach to analytics systems. Launching in early 2011, the project set out to rethink the traditional analytics stack, breaking down technical ...
Computational challenges in next-generation genomics
Steven L. Salzberg
Article No.: 2
doi>10.1145/2484838.2484885

Next-generation sequencing (NGS) technology allows us to peer inside the cell in exquisite detail, revealing new insights into biology, evolution, and disease that would have been impossible to find just a few years ago. The enormous volumes of data ...
SESSION: Panel
Education and career paths for data scientists
Magdalena Balazinska, Susan B. Davidson, Bill Howe, Alexandros Labrinidis
Article No.: 3
doi>10.1145/2484838.2484886

MOTIVATION: As industry and science are increasingly data-driven, the need for skilled data scientists is exceeding what our universities are producing. According to a McKinsey report: "By 2018, the United States alone could face a shortage of 140,000 ...
SESSION: Research sessions: multidimensional data
On the combination of relative clustering validity criteria
Lucas Vendramin, Pablo A. Jaskowiak, Ricardo J. G. B. Campello
Article No.: 4
doi>10.1145/2484838.2484844

Many different relative clustering validity criteria exist that are very useful as quantitative measures for assessing the quality of data partitions. These criteria are endowed with particular features that may make each of them more suitable for specific ...
Parameter-free and domain-independent similarity search with diversity
Lucio F. D. Santos, Willian D. Oliveira, Monica R. P. Ferreira, Agma J. M. Traina, Caetano Traina, Jr.
Article No.: 5
doi>10.1145/2484838.2484854

New operators to execute similarity-based queries over multimedia data stored in Database Management Systems are increasingly demanded. However, searching in very large datasets, the basic operators often return elements too much similar both to the ...
A multidimensional data model with subcategories for flexibly capturing summarizability
Sina Ariyan, Leopoldo Bertossi
Article No.: 6
doi>10.1145/2484838.2484857

In multidimensional (MD) databases and data warehouses we commonly prefer instances that have summarizable dimensions. This is because they have good properties for query answering. Most typically, with summarizable dimensions, precomputed and materialized ...
Nearest group queries
Dongxiang Zhang, Chee-Yong Chan, Kian-Lee Tan
Article No.: 7
doi>10.1145/2484838.2484866

k nearest neighbor (kNN) search is an important problem in a vast number of applications, including clustering, pattern recognition, image retrieval and recommendation systems. It finds k elements from a data source D that are closest to a given query ...
SESSION: Research sessions: spatio-temporal data
Providing multi-scale consistency for multi-scale geospatial data
João S. C. Longo, Claudia Bauzer Medeiros
Article No.: 8
doi>10.1145/2484838.2484867

We are immersed in a world in which we constantly deal (and cope) with objects and phenomena in a variety of scales in space and time. With the increase in collaborative and inter-disciplinary research, there appeared a growing need for handling data ...
Reasoning about RFID-tracked moving objects in symbolic indoor spaces
Sari Haj Hussein, Hua Lu, Torben Bach Pedersen
Article No.: 9
doi>10.1145/2484838.2484877

In recent years, indoor spatial data management has started to attract attention, partly due to the increasing use of receptor devices (e.g., RFID readers, and wireless sensor networks) in indoor, as well as outdoor spaces. There is thus a great need ...
Towards a universal tracking database
Gereon Schueller, Andreas Behrend
Article No.: 10
doi>10.1145/2484838.2484845

In moving object databases, authors usually assume that number and position of objects to be processed are always known in advance. Detecting an unknown moving object and pursuing its movement, however, is usually left to tracking algorithms resting ...
GIPSY: joining spatial datasets with contrasting density
Mirjana Pavlovic, Farhan Tauheed, Thomas Heinis, Anastasia Ailamaki
Article No.: 11
doi>10.1145/2484838.2484855

Many scientific and geographical applications rely on the efficient execution of spatial joins. Past research has produced several efficient spatial join approaches and while each of them can join two datasets, the problem of efficiently joining two ...
Publishing trajectories with differential privacy guarantees
Kaifeng Jiang, Dongxu Shao, Stéphane Bressan, Thomas Kister, Kian-Lee Tan
Article No.: 12
doi>10.1145/2484838.2484846

The pervasiveness of location-acquisition technologies has made it possible to collect the movement data of individuals or vehicles. However, it has to be carefully managed to ensure that there is no privacy breach. In this paper, we investigate the ...
SESSION: Research sessions: streaming and time-series data
Fast computation of approximate biased histograms on sliding windows over data streams
Hamid Mousavi, Carlo Zaniolo
Article No.: 13
doi>10.1145/2484838.2484851

Histograms provide effective synopses of large data sets, and are thus used in a wide variety of applications, including query optimization, approximate query answering, distribution fitting, parallel database partitioning, and data mining. Moreover, ...
Multi-scale dissemination of time series data
Qingsong Guo, Yongluan Zhou, Li Su
Article No.: 14
doi>10.1145/2484838.2484878

In this paper, we consider the problem of continuous dissemination of time series data, such as sensor measurements, to a large number of subscribers. These subscribers fall into multiple subscription levels, where each subscription level is specified ...
Multi-query scheduling for time-critical data stream applications
Yongluan Zhou, Ji Wu, Ahmed Khan Leghari
Article No.: 15
doi>10.1145/2484838.2484864

Many data stream applications, such as network intrusion detection, on-line financial tickers and environmental monitoring, typically exhibit certain "real-time" traits. In such applications, people are interested in strategies that ensure on-time delivery ...
Learning uncertainty models from weather forecast performance databases using quantile regression
Ashkan Zarnani, Petr Musilek
Article No.: 16
doi>10.1145/2484838.2484840

Forecast uncertainty information is not available in the immediate output of Numerical weather prediction (NWP) models. Such important information is required for optimal decision making processes in many domains. Prediction intervals are a prominent ...
SESSION: Research sessions: miscellaneous
Search and result presentation in scientific workflow repositories
Susan B. Davidson, Xiaocheng Huang, Julia Stoyanovich, Xiaojie Yuan
Article No.: 17
doi>10.1145/2484838.2484847

We study the problem of searching a repository of complex hierarchical workflows whose component modules, both composite and atomic, have been annotated with keywords. Since keyword search does not use the graph structure of a workflow, we develop a ...
Tuning large scale deduplication with reduced effort
Guilherme Dal Bianco, Renata Galante, Carlos A. Heuser, Marcos André Gonçalves
Article No.: 18
doi>10.1145/2484838.2484873

Deduplication is the task of identifying which objects are potentially the same in a data repository. It usually demands user intervention in several steps of the process, mainly to identify some pairs representing matchings and non-matchings. This information ...
HmSearch: an efficient hamming distance query processing algorithm
Xiaoyang Zhang, Jianbin Qin, Wei Wang, Yifang Sun, Jiaheng Lu
Article No.: 19
doi>10.1145/2484838.2484842

Hamming distance measures the number of dimensions where two vectors have different values. In applications such as pattern recognition, information retrieval, and databases, we often need to efficiently process Hamming distance query, which retrieves ...
pcApriori: scalable apriori for multiprocessor systems
Benjamin Schlegel, Tim Kiefer, Thomas Kissinger, Wolfgang Lehner
Article No.: 20
doi>10.1145/2484838.2484879

Frequent-itemset mining is an important part of data mining. It is a computational and memory intensive task and has a large number of scientific and statistical application areas. In many of them, the datasets can easily grow up to tens or even several ...
Shortlisting top-K assignments
Yimin Lin, Kyriakos Mouratidis
Article No.: 21
doi>10.1145/2484838.2484859

In this paper we identify a novel query type, the top-K assignment query (αTop-K). Consider a set of objects and a set of suppliers, where each object must be assigned to one supplier. Assume that there is a cost associated with every ...
SESSION: Research sessions: graphs and indexes
GPS: a graph processing system
Semih Salihoglu, Jennifer Widom
Article No.: 22
doi>10.1145/2484838.2484843

GPS (for Graph Processing System) is a complete open-source system we developed for scalable, fault-tolerant, and easy-to-program execution of algorithms on extremely large graphs. This paper serves the dual role of describing the GPS system, ...
RMiCS: a robust approach for mining coherent subgraphs in edge-labeled multi-layer graphs
Brigitte Boden, Stephan Günnemann, Holger Hoffmann, Thomas Seidl
Article No.: 23
doi>10.1145/2484838.2484860

Detecting dense subgraphs in a large graph is an important graph mining problem and various approaches have been proposed for its solution. While most existing methods only consider unlabeled and one-dimensional graph data, many real-world applications ...
SMIX: self-managing indexes for dynamic workloads
Hannes Voigt, Thomas Kissinger, Wolfgang Lehner
Article No.: 24
doi>10.1145/2484838.2484862

As databases accumulate growing amounts of data at an increasing rate, adaptive indexing becomes more and more important. At the same time, applications and their use get more agile and flexible, resulting in less steady and less predictable workload ...
Inverted indices for particle tracking in petascale cosmological simulations
Daniel Crankshaw, Randal Burns, Bridget Falck, Tamás Budavári, Alexander S. Szalay, Jie Wang
Article No.: 25
doi>10.1145/2484838.2484882

We describe the challenges arising from tracking dark matter particles in state of the art cosmological simulations. We are in the process of running the Indra suite of simulations, with an aggregate count of more than 35 trillion particles and 1.1PB ...
SESSION: Research sessions: case-studies
Accelerating gene context analysis using bitmaps
Alex Romosan, Arie Shoshani, Kesheng Wu, Victor Markowitz, Kostas Mavrommatis
Article No.: 26
doi>10.1145/2484838.2484856

Gene context analysis determines the function of genes by examining the conservation of chromosomal gene clusters and co-occurrence functional profiles across genomes. This is based on the observation that functionally related genes are often collocated ...
The open connectome project data cluster: scalable analysis and vision for high-throughput neuroscience
Randal Burns, Kunal Lillaney, Daniel R. Berger, Logan Grosenick, Karl Deisseroth, R. Clay Reid, William Gray Roncal, Priya Manavalan, Davi D. Bock, Narayanan Kasthuri, Michael Kazhdan, Stephen J. Smith, Dean Kleissas, Eric Perlman, Kwanghun Chung, Nicholas C. Weiler, Jeff Lichtman, Alexander S. Szalay, Joshua T. Vogelstein, R. Jacob Vogelstein
Article No.: 27
doi>10.1145/2484838.2484870

We describe a scalable database cluster for the spatial analysis and annotation of high-throughput brain imaging data, initially for 3-d electron microscopy image stacks, but for time-series and multi-channel data as well. The system was designed primarily ...
Real-time collaborative analysis with (almost) pure SQL: a case study in biogeochemical oceanography
Daniel Halperin, Francois Ribalet, Konstantin Weitz, Mak A. Saito, Bill Howe, E. Virginia Armbrust
Article No.: 28
doi>10.1145/2484838.2484880

We consider a case study using SQL-as-a-Service to support "instant analysis" of weakly structured relational data at a multi-investigator science retreat. Here, "weakly structured" means tabular, rows-and-columns datasets that share some common context, ...
Optimizing fastquery performance on lustre file system
Kuan-Wu Lin, Surendra Byna, Jerry Chou, Kesheng Wu
Article No.: 29
doi>10.1145/2484838.2484853

FastQuery is a parallel indexing and querying system we developed for accelerating analysis and visualization of scientific data. We have applied it to a wide variety of HPC applications and demonstrated its capability and scalability using a petascale ...
Graywulf: a platform for federated scientific databases and services
László Dobos, István Csabai, Alexander S. Szalay, Tamás Budavári, Nolan Li
Article No.: 30
doi>10.1145/2484838.2484863

Many fields of science rely on relational database management systems to analyze, publish and share data. Since RDBMS are originally designed for, and their development directions are primarily driven by, business use cases they often lack features very ...
SESSION: Short papers
Learning to explore scientific workflow repositories
Julia Stoyanovich, Paramveer Dhillon, Susan B. Davidson, Brian Lyons
Article No.: 31
doi>10.1145/2484838.2484848

Scientific workflows are gaining popularity, and repositories of workflows are starting to emerge. In this paper we describe TopicsExplorer, a data exploration approach for myExperiment.org, a collaborative platform for the exchange of ...
Best of both worlds: relational databases and statistics
Hannes Mühleisen, Thomas Lumley
Article No.: 32
doi>10.1145/2484838.2484869

Statistics software packages and relational database systems possess considerable overlap in the area of data loading, handling, and transformation. However, only databases are mainly optimized towards high performance in this area. In this paper, we ...
Data management systems on GPUs: promises and challenges
Yi-Cheng Tu, Anand Kumar, Di Yu, Ran Rui, Ryan Wheeler
Article No.: 33
doi>10.1145/2484838.2484871

The past decade has witnessed the popularity of push-based data management systems, in which the query executor passively receives data from either remote data sources (e.g., sensors) or I/O processes that scan database tables/files from local storage. ...
Nesting the earth mover's distance for effective cluster tracing
Hardy Kremer, Stephan Günnemann, Simon Wollwage, Thomas Seidl
Article No.: 34
doi>10.1145/2484838.2484881

Cluster tracing algorithms are used to mine temporal evolutions of clusters. Generally, clusters represent groups of objects with similar values. In a temporal context like tracing, similar values correspond to similar behavior in one snapshot in time. ...
Semantic query reformulation: the NIF experience
Amarnath Gupta, Anita Bandrowski, Christopher Condit, Xufei Qian, Jeffrey S. Grethe, Maryann E. Martone
Article No.: 35
doi>10.1145/2484838.2484839

The NIF system is a semantic search engine that uses an ontology to improve search quality. In this experience paper we present SKEYQL, our semantic keyword query language and describe a number of ontology-based query reformulation strategies that go ...
Autonomous clustering for wireless sensor networks
Fabian D. Winter, Peer Kröger, Johannes Niedermayer, Matthias Renz
Article No.: 36
doi>10.1145/2484838.2484841

Most algorithms treat Wireless Sensor Networks (WSNs) only as a generator of data without any autonomy. In contrast to this approach, we propose the ACIDE framework: A completely decentralized, bottom-up clustering process and information exchange that ...
Forecasting in hierarchical environments
Robert Lorenz, Lars Dannecker, Philipp Rösch, Wolfgang Lehner, Gregor Hackenbroich, Benjamin Schlegel
Article No.: 37
doi>10.1145/2484838.2484849

Forecasting is an important data analysis technique and serves as the basis for business planning in many application areas such as energy, sales and traffic management. The currently employed statistical models already provide very accurate predictions, ...
Towards efficient discovery of coverage patterns in transactional databases
R. Uday Kiran, Masashi Toyoda, Masaru Kitsuregawa
Article No.: 38
doi>10.1145/2484838.2484850

Coverage pattern mining is an important model in data mining. It provides useful information pertaining to the sets of items that have coverage interesting to the users in a transactional database. The coverage patterns do not satisfy the anti-monotonic ...
Bulk sorted access for efficient top-k retrieval
Dustin Lange, Felix Naumann
Article No.: 39
doi>10.1145/2484838.2484852

Efficient top-k retrieval of records from a database has been an active research field for many years. We approach the problem from a real-world application point of view, in which the order of records according to some similarity function on ...
DoS: an efficient scheme for the diversification of multiple search results
Hina A. Khan, Marina Drosou, Mohamed A. Sharaf
Article No.: 40
doi>10.1145/2484838.2484858

Data diversification provides users with a concise and meaningful view of the results returned by search queries. In addition to taming the information overload, data diversification also provides the benefits of reducing data communication costs as ...
Research lattices: towards a scientific hypothesis data model
Bernardo Gonçalves, Fabio Porto
Article No.: 41
doi>10.1145/2484838.2484861

As the problems of scientific interest raise in scale and complexity, scientists have to tacitly manage too many analytic elements. Hypotheses are worked out to drive research towards successful explanation and prediction, which characterizes science ...
Sharing confidential data for algorithm development by multiple imputation
Sicco Verwer, Susan van den Braak, Sunil Choenni
Article No.: 42
doi>10.1145/2484838.2484865

The availability of real-life data sets is of crucial importance for algorithm and application development, as these often require insight into the specific properties of the data. Often, however, such data are not released because of their proprietary ...
Mining multidimensional contextual outliers from categorical relational data
Guanting Tang, James Bailey, Jian Pei, Guozhu Dong
Article No.: 43
doi>10.1145/2484838.2484883

A wide range of methods have been proposed for detecting different types of outliers in full space and subspaces. However, the interpretability of outliers, that is, explaining in what ways and to what extent an object is an outlier, remains a critical ...
DEMONSTRATION SESSION: Demonstrations
A fast handshake join implementation on FPGA with adaptive merging network
Yasin Oge, Takefumi Miyoshi, Hideyuki Kawashima, Tsutomu Yoshinaga
Article No.: 44
doi>10.1145/2484838.2484868

One of a critical design issues for implementing handshake-join hardware is result collection performed by a merging network. To address the issue, we introduce an adaptive merging network. Our implementation achieves over 3 million tuples per ...
Adaptive exploration for large-scale protein analysis in the molecular dynamics database
Sarana Nutanong, Nick Carey, Yanif Ahmad, Alex S. Szalay, Thomas B. Woolf
Article No.: 45
doi>10.1145/2484838.2484872

Molecular dynamics (MD) simulations generate detailed time-series data of all-atom motions. These simulations are leading users of the world's most powerful supercomputers, and are standard-bearers for a wide range of high-performance computing (HPC) ...
Parallel online aggregation in action
Chengjie Qin, Florin Rusu
Article No.: 46
doi>10.1145/2484838.2484874

Online aggregation provides continuous estimates to the final result of a computation during the actual processing. The user can stop the computation as soon as the estimate is accurate enough, typically early in the execution, or can let the processing ...
Astronomical data processing in EXTASCID
Yu Cheng, Florin Rusu
Article No.: 47
doi>10.1145/2484838.2484875

Scientific data have dual structure. Raw data are preponderantly ordered multi-dimensional arrays or sequences while metadata and derived data are best represented as unordered relations. Scientific data processing requires complex operations over arrays ...
Data vaults: a database welcome to scientific file repositories
Milena Ivanova, Yağiz Kargin, Martin Kersten, Stefan Manegold, Ying Zhang, Mihai Datcu, Daniela Espinoza Molina
Article No.: 48
doi>10.1145/2484838.2484876

Efficient management and exploration of high-volume scientific file repositories have become pivotal for advancement in science. We propose to demonstrate the Data Vault, an extension of the database system architecture that transparently opens scientific ...
