ABSTRACT
We describe a scalable database cluster for the spatial analysis and annotation of high-throughput brain imaging data, initially for 3-d electron microscopy image stacks, but for time-series and multi-channel data as well. The system was designed primarily for workloads that build connectomes (neural connectivity maps of the brain) using the parallel execution of computer vision algorithms on high-performance compute clusters. These services and open-science data sets are publicly available at openconnecto.me.
The system design inherits much from NoSQL scale-out and data-intensive computing architectures. We distribute data to cluster nodes by partitioning a spatial index. We direct I/O to different systems (reads to parallel disk arrays and writes to solid-state storage) to avoid I/O interference and maximize throughput. All programming interfaces are RESTful Web services, which are simple and stateless, improving scalability and usability. We include a performance evaluation of the production system, highlighting the effectiveness of spatial data organization.
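The idea of distributing data by partitioning a spatial index can be illustrated with a small sketch. The abstract does not say which index is used, so the example below assumes a Morton (Z-order) space-filling curve over 3-d block coordinates and contiguous key ranges per node. The function names, the 4 x 4 x 4 grid, and the node count are hypothetical and chosen only for illustration.

```python
import bisect
from typing import List


def morton_encode_3d(x: int, y: int, z: int, bits: int = 21) -> int:
    """Interleave the low `bits` bits of x, y, z into one Z-order key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (3 * i)
        key |= ((y >> i) & 1) << (3 * i + 1)
        key |= ((z >> i) & 1) << (3 * i + 2)
    return key


def even_boundaries(total_keys: int, nodes: int) -> List[int]:
    """Split the key range [0, total_keys) into `nodes` contiguous ranges.

    Returns the exclusive upper bound of each node's range, ascending.
    """
    return [(total_keys * (i + 1)) // nodes for i in range(nodes)]


def node_for_key(key: int, boundaries: List[int]) -> int:
    """Return the index of the node whose key range contains `key`."""
    if key < 0 or key >= boundaries[-1]:
        raise ValueError("key outside the partitioned range")
    return bisect.bisect_right(boundaries, key)


# Hypothetical layout: a 4 x 4 x 4 grid of blocks (64 keys) on 4 nodes.
boundaries = even_boundaries(64, 4)
for block in [(1, 1, 1), (0, 0, 2), (3, 3, 3)]:
    key = morton_encode_3d(*block)
    print(block, "key", key, "node", node_for_key(key, boundaries))
```

Because a Z-order curve keeps nearby blocks close together in key order, a contiguous key range tends to hold a compact region of the volume, which is one reason this kind of index is a common choice for spatial partitioning.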
AUTHORS
Randal Burns
Kunal Lillaney
Daniel R. Berger
Logan Grosenick
Karl Deisseroth
R. Clay Reid
William Gray Roncal
Priya Manavalan
Davi D. Bock
Narayanan Kasthuri
Michael Kazhdan
Stephen J. Smith
Dean Kleissas
Eric Perlman
Kwanghun Chung
Nicholas C. Weiler
Jeff Lichtman
Alexander S. Szalay
Joshua T. Vogelstein
R. Jacob Vogelstein
PUBLICATION
Title: SSDBM Proceedings of the 25th International Conference on Scientific and Statistical Database Management
Editors: Alex Szalay, Tamas Budavari, Magdalena Balazinska, Alexandra Meliou, Ahmet Sacan
Article No.: 27
Publication Date: 2013-07-29
Publisher: ACM New York, NY, USA ©2013
ISBN: 978-1-4503-1921-8
doi>10.1145/2484838.2484870
Overall Acceptance Rate: 26 of 71 submissions, 37%

Table of Contents

SESSION: Keynote sessions

Making sense of big data with the Berkeley data analytics stack
Michael J. Franklin
Article No.: 1
doi>10.1145/2484838.2484884
The Berkeley AMPLab was founded on the idea that the challenges of emerging Big Data applications require a new approach to analytics systems. Launching in early 2011, the project set out to rethink the traditional analytics stack, breaking down technical ...

Computational challenges in next-generation genomics
Steven L. Salzberg
Article No.: 2
doi>10.1145/2484838.2484885
Next-generation sequencing (NGS) technology allows us to peer inside the cell in exquisite detail, revealing new insights into biology, evolution, and disease that would have been impossible to find just a few years ago. The enormous volumes of data ...

SESSION: Panel

Education and career paths for data scientists
Magdalena Balazinska, Susan B. Davidson, Bill Howe, Alexandros Labrinidis
Article No.: 3
doi>10.1145/2484838.2484886
MOTIVATION: As industry and science are increasingly data-driven, the need for skilled data scientists is exceeding what our universities are producing. According to a McKinsey report: "By 2018, the United States alone could face a shortage of 140,000 ...

SESSION: Research sessions: multidimensional data

On the combination of relative clustering validity criteria
Lucas Vendramin, Pablo A. Jaskowiak, Ricardo J. G. B. Campello
Article No.: 4
doi>10.1145/2484838.2484844
Many different relative clustering validity criteria exist that are very useful as quantitative measures for assessing the quality of data partitions. These criteria are endowed with particular features that may make each of them more suitable for specific ...

Parameter-free and domain-independent similarity search with diversity
Lucio F. D. Santos, Willian D. Oliveira, Monica R. P. Ferreira, Agma J. M. Traina, Caetano Traina, Jr.
Article No.: 5
doi>10.1145/2484838.2484854
New operators to execute similarity-based queries over multimedia data stored in Database Management Systems are increasingly demanded. However, searching in very large datasets, the basic operators often return elements too much similar both to the ...

A multidimensional data model with subcategories for flexibly capturing summarizability
Sina Ariyan, Leopoldo Bertossi
Article No.: 6
doi>10.1145/2484838.2484857
In multidimensional (MD) databases and data warehouses we commonly prefer instances that have summarizable dimensions. This is because they have good properties for query answering. Most typically, with summarizable dimensions, precomputed and materialized ...

Nearest group queries
Dongxiang Zhang, Chee-Yong Chan, Kian-Lee Tan
Article No.: 7
doi>10.1145/2484838.2484866
k nearest neighbor (kNN) search is an important problem in a vast number of applications, including clustering, pattern recognition, image retrieval and recommendation systems. It finds k elements from a data source D that are closest to a given query ...

SESSION: Research sessions: spatio-temporal data

Providing multi-scale consistency for multi-scale geospatial data
João S. C. Longo, Claudia Bauzer Medeiros
Article No.: 8
doi>10.1145/2484838.2484867
We are immersed in a world in which we constantly deal (and cope) with objects and phenomena in a variety of scales in space and time. With the increase in collaborative and inter-disciplinary research, there appeared a growing need for handling data ...

Reasoning about RFID-tracked moving objects in symbolic indoor spaces
Sari Haj Hussein, Hua Lu, Torben Bach Pedersen
Article No.: 9
doi>10.1145/2484838.2484877
In recent years, indoor spatial data management has started to attract attention, partly due to the increasing use of receptor devices (e.g., RFID readers, and wireless sensor networks) in indoor, as well as outdoor spaces. There is thus a great need ...

Towards a universal tracking database
Gereon Schueller, Andreas Behrend
Article No.: 10
doi>10.1145/2484838.2484845
In moving object databases, authors usually assume that number and position of objects to be processed are always known in advance. Detecting an unknown moving object and pursuing its movement, however, is usually left to tracking algorithms resting ...

GIPSY: joining spatial datasets with contrasting density
Mirjana Pavlovic, Farhan Tauheed, Thomas Heinis, Anastasia Ailamaki
Article No.: 11
doi>10.1145/2484838.2484855
Many scientific and geographical applications rely on the efficient execution of spatial joins. Past research has produced several efficient spatial join approaches and while each of them can join two datasets, the problem of efficiently joining two ...

Publishing trajectories with differential privacy guarantees
Kaifeng Jiang, Dongxu Shao, Stéphane Bressan, Thomas Kister, Kian-Lee Tan
Article No.: 12
doi>10.1145/2484838.2484846
The pervasiveness of location-acquisition technologies has made it possible to collect the movement data of individuals or vehicles. However, it has to be carefully managed to ensure that there is no privacy breach. In this paper, we investigate the ...

SESSION: Research sessions: streaming and time-series data

Fast computation of approximate biased histograms on sliding windows over data streams
Hamid Mousavi, Carlo Zaniolo
Article No.: 13
doi>10.1145/2484838.2484851
Histograms provide effective synopses of large data sets, and are thus used in a wide variety of applications, including query optimization, approximate query answering, distribution fitting, parallel database partitioning, and data mining. Moreover, ...

Multi-scale dissemination of time series data
Qingsong Guo, Yongluan Zhou, Li Su
Article No.: 14
doi>10.1145/2484838.2484878
In this paper, we consider the problem of continuous dissemination of time series data, such as sensor measurements, to a large number of subscribers. These subscribers fall into multiple subscription levels, where each subscription level is specified ...

Multi-query scheduling for time-critical data stream applications
Yongluan Zhou, Ji Wu, Ahmed Khan Leghari
Article No.: 15
doi>10.1145/2484838.2484864
Many data stream applications, such as network intrusion detection, on-line financial tickers and environmental monitoring, typically exhibit certain "real-time" traits. In such applications, people are interested in strategies that ensure on-time delivery ...

Learning uncertainty models from weather forecast performance databases using quantile regression
Ashkan Zarnani, Petr Musilek
Article No.: 16
doi>10.1145/2484838.2484840
Forecast uncertainty information is not available in the immediate output of Numerical weather prediction (NWP) models. Such important information is required for optimal decision making processes in many domains. Prediction intervals are a prominent ...

SESSION: Research sessions: miscellaneous

Search and result presentation in scientific workflow repositories
Susan B. Davidson, Xiaocheng Huang, Julia Stoyanovich, Xiaojie Yuan
Article No.: 17
doi>10.1145/2484838.2484847
We study the problem of searching a repository of complex hierarchical workflows whose component modules, both composite and atomic, have been annotated with keywords. Since keyword search does not use the graph structure of a workflow, we develop a ...

Tuning large scale deduplication with reduced effort
Guilherme Dal Bianco, Renata Galante, Carlos A. Heuser, Marcos André Gonçalves
Article No.: 18
doi>10.1145/2484838.2484873
Deduplication is the task of identifying which objects are potentially the same in a data repository. It usually demands user intervention in several steps of the process, mainly to identify some pairs representing matchings and non-matchings. This information ...

HmSearch: an efficient hamming distance query processing algorithm
Xiaoyang Zhang, Jianbin Qin, Wei Wang, Yifang Sun, Jiaheng Lu
Article No.: 19
doi>10.1145/2484838.2484842
Hamming distance measures the number of dimensions where two vectors have different values. In applications such as pattern recognition, information retrieval, and databases, we often need to efficiently process Hamming distance query, which retrieves ...

pcApriori: scalable apriori for multiprocessor systems
Benjamin Schlegel, Tim Kiefer, Thomas Kissinger, Wolfgang Lehner
Article No.: 20
doi>10.1145/2484838.2484879
Frequent-itemset mining is an important part of data mining. It is a computational and memory intensive task and has a large number of scientific and statistical application areas. In many of them, the datasets can easily grow up to tens or even several ...

Shortlisting top-K assignments
Yimin Lin, Kyriakos Mouratidis
Article No.: 21
doi>10.1145/2484838.2484859
In this paper we identify a novel query type, the top-K assignment query (αTop-K). Consider a set of objects and a set of suppliers, where each object must be assigned to one supplier. Assume that there is a cost associated with every ...

SESSION: Research sessions: graphs and indexes

GPS: a graph processing system
Semih Salihoglu, Jennifer Widom
Article No.: 22
doi>10.1145/2484838.2484843
GPS (for Graph Processing System) is a complete open-source system we developed for scalable, fault-tolerant, and easy-to-program execution of algorithms on extremely large graphs. This paper serves the dual role of describing the GPS system, ...

RMiCS: a robust approach for mining coherent subgraphs in edge-labeled multi-layer graphs
Brigitte Boden, Stephan Günnemann, Holger Hoffmann, Thomas Seidl
Article No.: 23
doi>10.1145/2484838.2484860
Detecting dense subgraphs in a large graph is an important graph mining problem and various approaches have been proposed for its solution. While most existing methods only consider unlabeled and one-dimensional graph data, many real-world applications ...

SMIX: self-managing indexes for dynamic workloads
Hannes Voigt, Thomas Kissinger, Wolfgang Lehner
Article No.: 24
doi>10.1145/2484838.2484862
As databases accumulate growing amounts of data at an increasing rate, adaptive indexing becomes more and more important. At the same time, applications and their use get more agile and flexible, resulting in less steady and less predictable workload ...

Inverted indices for particle tracking in petascale cosmological simulations
Daniel Crankshaw, Randal Burns, Bridget Falck, Tamás Budavári, Alexander S. Szalay, Jie Wang
Article No.: 25
doi>10.1145/2484838.2484882
We describe the challenges arising from tracking dark matter particles in state of the art cosmological simulations. We are in the process of running the Indra suite of simulations, with an aggregate count of more than 35 trillion particles and 1.1PB ...

SESSION: Research sessions: case-studies

Accelerating gene context analysis using bitmaps
Alex Romosan, Arie Shoshani, Kesheng Wu, Victor Markowitz, Kostas Mavrommatis
Article No.: 26
doi>10.1145/2484838.2484856
Gene context analysis determines the function of genes by examining the conservation of chromosomal gene clusters and co-occurrence functional profiles across genomes. This is based on the observation that functionally related genes are often collocated ...

The open connectome project data cluster: scalable analysis and vision for high-throughput neuroscience
Randal Burns, Kunal Lillaney, Daniel R. Berger, Logan Grosenick, Karl Deisseroth, R. Clay Reid, William Gray Roncal, Priya Manavalan, Davi D. Bock, Narayanan Kasthuri, Michael Kazhdan, Stephen J. Smith, Dean Kleissas, Eric Perlman, Kwanghun Chung, Nicholas C. Weiler, Jeff Lichtman, Alexander S. Szalay, Joshua T. Vogelstein, R. Jacob Vogelstein
Article No.: 27
doi>10.1145/2484838.2484870
We describe a scalable database cluster for the spatial analysis and annotation of high-throughput brain imaging data, initially for 3-d electron microscopy image stacks, but for time-series and multi-channel data as well. The system was designed primarily ...

Real-time collaborative analysis with (almost) pure SQL: a case study in biogeochemical oceanography
Daniel Halperin, Francois Ribalet, Konstantin Weitz, Mak A. Saito, Bill Howe, E. Virginia Armbrust
Article No.: 28
doi>10.1145/2484838.2484880
We consider a case study using SQL-as-a-Service to support "instant analysis" of weakly structured relational data at a multi-investigator science retreat. Here, "weakly structured" means tabular, rows-and-columns datasets that share some common context, ...

Optimizing fastquery performance on lustre file system
Kuan-Wu Lin, Surendra Byna, Jerry Chou, Kesheng Wu
Article No.: 29
doi>10.1145/2484838.2484853
FastQuery is a parallel indexing and querying system we developed for accelerating analysis and visualization of scientific data. We have applied it to a wide variety of HPC applications and demonstrated its capability and scalability using a petascale ...

Graywulf: a platform for federated scientific databases and services
László Dobos, István Csabai, Alexander S. Szalay, Tamás Budavári, Nolan Li
Article No.: 30
doi>10.1145/2484838.2484863
Many fields of science rely on relational database management systems to analyze, publish and share data. Since RDBMS are originally designed for, and their development directions are primarily driven by, business use cases they often lack features very ...

SESSION: Short papers

Learning to explore scientific workflow repositories
Julia Stoyanovich, Paramveer Dhillon, Susan B. Davidson, Brian Lyons
Article No.: 31
doi>10.1145/2484838.2484848
Scientific workflows are gaining popularity, and repositories of workflows are starting to emerge. In this paper we describe TopicsExplorer, a data exploration approach for myExperiment.org, a collaborative platform for the exchange of ...

Best of both worlds: relational databases and statistics
Hannes Mühleisen, Thomas Lumley
Article No.: 32
doi>10.1145/2484838.2484869
Statistics software packages and relational database systems possess considerable overlap in the area of data loading, handling, and transformation. However, only databases are mainly optimized towards high performance in this area. In this paper, we ...

Data management systems on GPUs: promises and challenges
Yi-Cheng Tu, Anand Kumar, Di Yu, Ran Rui, Ryan Wheeler
Article No.: 33
doi>10.1145/2484838.2484871
The past decade has witnessed the popularity of push-based data management systems, in which the query executor passively receives data from either remote data sources (e.g., sensors) or I/O processes that scan database tables/files from local storage. ...

Nesting the earth mover's distance for effective cluster tracing
Hardy Kremer, Stephan Günnemann, Simon Wollwage, Thomas Seidl
Article No.: 34
doi>10.1145/2484838.2484881
Cluster tracing algorithms are used to mine temporal evolutions of clusters. Generally, clusters represent groups of objects with similar values. In a temporal context like tracing, similar values correspond to similar behavior in one snapshot in time. ...

Semantic query reformulation: the NIF experience
Amarnath Gupta, Anita Bandrowski, Christopher Condit, Xufei Qian, Jeffrey S. Grethe, Maryann E. Martone
Article No.: 35
doi>10.1145/2484838.2484839
The NIF system is a semantic search engine that uses an ontology to improve search quality. In this experience paper we present SKEYQL, our semantic keyword query language and describe a number of ontology-based query reformulation strategies that go ...

Autonomous clustering for wireless sensor networks
Fabian D. Winter, Peer Kröger, Johannes Niedermayer, Matthias Renz
Article No.: 36
doi>10.1145/2484838.2484841
Most algorithms treat Wireless Sensor Networks (WSNs) only as a generator of data without any autonomy. In contrast to this approach, we propose the ACIDE framework: A completely decentralized, bottom-up clustering process and information exchange that ...

Forecasting in hierarchical environments
Robert Lorenz, Lars Dannecker, Philipp Rösch, Wolfgang Lehner, Gregor Hackenbroich, Benjamin Schlegel
Article No.: 37
doi>10.1145/2484838.2484849
Forecasting is an important data analysis technique and serves as the basis for business planning in many application areas such as energy, sales and traffic management. The currently employed statistical models already provide very accurate predictions, ...

Towards efficient discovery of coverage patterns in transactional databases
R. Uday Kiran, Masashi Toyoda, Masaru Kitsuregawa
Article No.: 38
doi>10.1145/2484838.2484850
Coverage pattern mining is an important model in data mining. It provides useful information pertaining to the sets of items that have coverage interesting to the users in a transactional database. The coverage patterns do not satisfy the anti-monotonic ...

Bulk sorted access for efficient top-k retrieval
Dustin Lange, Felix Naumann
Article No.: 39
doi>10.1145/2484838.2484852
Efficient top-k retrieval of records from a database has been an active research field for many years. We approach the problem from a real-world application point of view, in which the order of records according to some similarity function on ...

DoS: an efficient scheme for the diversification of multiple search results
Hina A. Khan, Marina Drosou, Mohamed A. Sharaf
Article No.: 40
doi>10.1145/2484838.2484858
Data diversification provides users with a concise and meaningful view of the results returned by search queries. In addition to taming the information overload, data diversification also provides the benefits of reducing data communication costs as ...

Research lattices: towards a scientific hypothesis data model
Bernardo Gonçalves, Fabio Porto
Article No.: 41
doi>10.1145/2484838.2484861
As the problems of scientific interest raise in scale and complexity, scientists have to tacitly manage too many analytic elements. Hypotheses are worked out to drive research towards successful explanation and prediction, which characterizes science ...

Sharing confidential data for algorithm development by multiple imputation
Sicco Verwer, Susan van den Braak, Sunil Choenni
Article No.: 42
doi>10.1145/2484838.2484865
The availability of real-life data sets is of crucial importance for algorithm and application development, as these often require insight into the specific properties of the data. Often, however, such data are not released because of their proprietary ...

Mining multidimensional contextual outliers from categorical relational data
Guanting Tang, James Bailey, Jian Pei, Guozhu Dong
Article No.: 43
doi>10.1145/2484838.2484883
A wide range of methods have been proposed for detecting different types of outliers in full space and subspaces. However, the interpretability of outliers, that is, explaining in what ways and to what extent an object is an outlier, remains a critical ...

DEMONSTRATION SESSION: Demonstrations

A fast handshake join implementation on FPGA with adaptive merging network
Yasin Oge, Takefumi Miyoshi, Hideyuki Kawashima, Tsutomu Yoshinaga
Article No.: 44
doi>10.1145/2484838.2484868
One of a critical design issues for implementing handshake-join hardware is result collection performed by a merging network. To address the issue, we introduce an adaptive merging network. Our implementation achieves over 3 million tuples per ...

Adaptive exploration for large-scale protein analysis in the molecular dynamics database
Sarana Nutanong, Nick Carey, Yanif Ahmad, Alex S. Szalay, Thomas B. Woolf
Article No.: 45
doi>10.1145/2484838.2484872
Molecular dynamics (MD) simulations generate detailed time-series data of all-atom motions. These simulations are leading users of the world's most powerful supercomputers, and are standard-bearers for a wide range of high-performance computing (HPC) ...

Parallel online aggregation in action
Chengjie Qin, Florin Rusu
Article No.: 46
doi>10.1145/2484838.2484874
Online aggregation provides continuous estimates to the final result of a computation during the actual processing. The user can stop the computation as soon as the estimate is accurate enough, typically early in the execution, or can let the processing ...

Astronomical data processing in EXTASCID
Yu Cheng, Florin Rusu
Article No.: 47
doi>10.1145/2484838.2484875
Scientific data have dual structure. Raw data are preponderantly ordered multi-dimensional arrays or sequences while metadata and derived data are best represented as unordered relations. Scientific data processing requires complex operations over arrays ...

Data vaults: a database welcome to scientific file repositories
Milena Ivanova, Yağiz Kargin, Martin Kersten, Stefan Manegold, Ying Zhang, Mihai Datcu, Daniela Espinoza Molina
Article No.: 48
doi>10.1145/2484838.2484876
Efficient management and exploration of high-volume scientific file repositories have become pivotal for advancement in science. We propose to demonstrate the Data Vault, an extension of the database system architecture that transparently opens scientific ...