1
Randal Burns,
Kunal Lillaney,
Daniel R. Berger,
Logan Grosenick,
Karl Deisseroth,
R. Clay Reid,
William Gray Roncal,
Priya Manavalan,
Davi D. Bock,
Narayanan Kasthuri,
Michael Kazhdan,
Stephen J. Smith,
Dean Kleissas,
Eric Perlman,
Kwanghun Chung,
Nicholas C. Weiler,
Jeff Lichtman,
Alexander S. Szalay,
Joshua T. Vogelstein,
R. Jacob Vogelstein
July 2013
SSDBM: Proceedings of the 25th International Conference on Scientific and Statistical Database Management
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 4, Downloads (12 Months): 41, Downloads (Overall): 250
Full text available:
PDF
We describe a scalable database cluster for the spatial analysis and annotation of high-throughput brain imaging data, initially for 3-d electron microscopy image stacks, but for time-series and multi-channel data as well. The system was designed primarily for workloads that build connectomes (neural connectivity maps of the brain) using the parallel ...
Keywords:
connectomics, data-intensive computing
Abstract:
... openconnecto.me. The system design inherits much from NoSQL scale-out and data-intensive computing architectures. We distribute data to cluster nodes by partitioning a ...
2
November 2011
NDM '11: Proceedings of the first international workshop on Network-aware data management
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 6, Downloads (12 Months): 27, Downloads (Overall): 159
Full text available:
PDF
Scientific instruments, as well as simulations, generate increasingly large datasets, changing the way we do science. We propose a system that we call the data-intensive computer for computing with Petascale-sized datasets. The data-intensive computer consists of an HPC cluster, a massively parallel database and a set of computing servers running ...
Keywords:
data-intensive computing
Title:
An architecture for a data-intensive computer
Abstract:
... do science. We propose a system that we call the data-intensive computer for computing with Petascale-sized datasets. The data-intensive computer consists of an HPC cluster, a massively parallel database and ... database into a layer in the memory hierarchy of the data-intensive computer. The data-intensive operating system is data-object-oriented: the abstract programming ...
Full Text:
... call the data-intensive computer for computing with Petascale-sized datasets. The data-intensive computer consists of an HPC cluster, a massively parallel database and a ... recently awarded our group to build a 5PB cluster for extreme data-intensive computations. The Data-Scope will be co-located and integrated with a ...
... as follows. In section 2 we survey several research projects involving data-intensive computing. An examination of computing requirements for these projects leads ... for these projects leads us to propose the concept of the data-intensive computer in section 3, where we discuss design requirements for the computer ... DATA-INTENSIVE SCIENTIFIC RESEARCH In this section we examine several examples involving data-intensive computations from turbulence, neuroscience and hearing research. The examples, taken from research ... guide us in the design of the operating system for the data-intensive computer. 2.1 Turbulence research A new database approach to scientific computing in ...
... library is a prototype of an operating system for a data-intensive computer. It provides database services to scientific computing processes and currently ...
... and have proposed a novel operating system architecture to support data-intensive computations with Petascale-sized data sets. Section 3 outlines a research and ... only a few topics related to the design of the data-intensive computer, ... , and necessarily omits many important aspects. The construction of the data-intensive computer is an ongoing research project. A prototype of the data-intensive operating ...
Neuroscience; The data-intensive computer; Direct I/O between memory and database; Moving the program to ...
3
May 2012
CCGRID '12: Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 6
Downloads (6 Weeks): 0, Downloads (12 Months): 9, Downloads (Overall): 57
Full text available:
PDF
While dissemination of scientific data is becoming crucial for facilitating scientific discoveries, a key challenge being faced by these efforts is that the dataset sizes continue to grow rapidly. Coupled with the fact that wide area data transfer bandwidths and disk retrieval speeds are growing at a much slower pace, ...
Keywords:
Scientific databases, Data-intensive computing
4
June 2012
HPDC '12: Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 3, Downloads (12 Months): 12, Downloads (Overall): 217
Full text available:
PDF
Fault-tolerance is rapidly becoming a crucial issue in high-end and distributed computing, as an increasing number of cores decreases the mean time to failure of the systems. In this work, we present an algorithm-based fault tolerance solution that handles fail-stop failures for a class of iterative data intensive algorithms. We intelligently ...
Keywords:
data-intensive computing, fault tolerance
References:
T. Bicer, Wei Jiang, and G. Agrawal. Supporting fault tolerance in a data-intensive computing middleware. In Parallel Distributed Processing (IPDPS), 2010 IEEE International Symposium on, pages 1--12, April 2010.
5
June 2011
ACM SIGMETRICS Performance Evaluation Review - Performance evaluation review: Volume 39 Issue 1, June 2011
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 0, Downloads (12 Months): 4, Downloads (Overall): 61
Full text available:
PDF
Keywords:
cloud computing, performance, data-intensive computing, efficiency
Title:
Applying idealized lower-bound runtime models to understand inefficiencies in data-intensive computing
References:
E. Krevat, et al. Applying Simple Performance Models to Understand Inefficiencies in Data-Intensive Computing. Technical report. 2011. CMU-PDL-11-103.
Full Text:
... Krevat, et al. Applying Simple Performance Models to Understand Inefficiencies in Data-Intensive Computing. Technical report. 2011. CMU-PDL-11-103. [6] E. Krevat, et al. Disks Are ...
6
June 2011
SIGMETRICS '11: Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 1, Downloads (12 Months): 4, Downloads (Overall): 123
Full text available:
PDF
Keywords:
performance, data-intensive computing, efficiency, cloud computing
Title:
Applying idealized lower-bound runtime models to understand inefficiencies in data-intensive computing
References:
E. Krevat, et al. Applying Simple Performance Models to Understand Inefficiencies in Data-Intensive Computing. Technical report. 2011. CMU-PDL-11-103.
Full Text:
... Krevat, et al. Applying Simple Performance Models to Understand Inefficiencies in Data-Intensive Computing. Technical report. 2011. CMU-PDL-11-103. [6] E. Krevat, et al. Disks Are ...
7
March 2016
ASPLOS '16: Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems
Publisher: ACM
Bibliometrics:
Citation Count: 2
Downloads (6 Weeks): 16, Downloads (12 Months): 181, Downloads (Overall): 277
Full text available:
PDF
To harness a heterogeneous memory hierarchy, it is advantageous to integrate application knowledge in guiding frequent memory move, i.e., replicating or migrating virtual memory regions. To this end, we present memif, a protected OS service for asynchronous, hardware-accelerated memory move. Compared to the state of the art -- page migration ...
Keywords:
heterogeneous memory, data-intensive computing, operating systems
Also published in:
June 2016
ACM SIGPLAN Notices - ASPLOS '16: Volume 51 Issue 4, April 2016
June 2016
ACM SIGOPS Operating Systems Review - SIGOPS Member Plus: Volume 50 Issue 2, June 2016
July 2016
ACM SIGARCH Computer Architecture News - ASPLOS'16: Volume 44 Issue 2, May 2016
8
June 2003
ICS '03: Proceedings of the 17th annual international conference on Supercomputing
Publisher: ACM
Bibliometrics:
Citation Count: 3
Downloads (6 Weeks): 1, Downloads (12 Months): 1, Downloads (Overall): 1,375
Full text available:
PDF
Declarative, high-level, and/or application-class specific languages are often successful in easing application development. In this paper, we report our experiences in compiling a recently developed XML Query Language, XQuery, for applications that process scientific datasets. Though scientific data processing applications can be conveniently represented in XQuery, compiling them to achieve efficient ...
Keywords:
data intensive computing, XML, restructing compilers, XQuery
References:
Renato Ferreira, Gagan Agrawal, and Joel Saltz. Compiling object-oriented data intensive computations. In Proceedings of the 2000 International Conference on Supercomputing, May 2000.
Full Text:
... Francisco, California, USA. Copyright 2003 ACM 1-58113-733-8/03/0006 ...$5.00. General Terms: Languages, Performance. Keywords: XQuery, XML, Data Intensive Computing, Restructing Compilers. 1. INTRODUCTION Declarative, high-level, and/or application-class specific languages are often ...
9
October 2012
SoCC '12: Proceedings of the Third ACM Symposium on Cloud Computing
Publisher: ACM
Bibliometrics:
Citation Count: 17
Downloads (6 Weeks): 7, Downloads (12 Months): 41, Downloads (Overall): 391
Full text available:
PDF
Data-intensive computing (DISC) frameworks scale by partitioning a job across a set of fault-tolerant tasks, then diffusing those tasks across large clusters. Multi-tenanted clusters must accommodate service-level objectives (SLO) in their resource model, often expressed as a maximum latency for allocating the desired set of resources to every job. ...
Keywords:
checkpoint/restart, data-intensive computing, elasticity, multi-tenancy
Title:
True elasticity in multi-tenant data-intensive compute clusters
Full Text:
... 2012, San Jose, CA USA. Copyright 2012 ACM 978-1-4503-1761-0/12/10 ...$15.00. Keywords: elasticity, multi-tenancy, data-intensive computing, checkpoint/restart. 1. INTRODUCTION Data-intensive computation (DISC) frameworks [3, 13, 22] partition jobs ...
10
June 2009
DADC '09: Proceedings of the second international workshop on Data-aware distributed computing
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 2, Downloads (12 Months): 6, Downloads (Overall): 247
Full text available:
PDF
High-end computing is increasingly I/O bound as computations become more data-intensive, and data transport technologies struggle to keep pace with the demands of large-scale, distributed computations. One approach to avoiding unnecessary I/O is to move the processing to the data, as seen in Google's successful, but relatively specialized, MapReduce system. ...
Keywords:
active storage, data-intensive computing, structured storage
Full Text:
... execution environment for data-intensive computations. Our approach to scaling HEC environments for data-intensive computations is to reduce, and where possible, eliminate data movement between computations ...
... the usage of network resources, interconnects will remain a bottleneck for data-intensive computations as transport technologies fail to keep pace with the demands of next-generation ... must be made in order to support the demands of next-generation, data-intensive computations. The remainder of this paper focuses on our proposal for ...
... aspects of MapReduce while preclude it from being a general model for data-intensive computing. Namely, the intentionally limited programming model, which insulates programmers from needing ...
11
July 2014
ACM Transactions on Architecture and Code Optimization (TACO): Volume 11 Issue 2, June 2014
Publisher: ACM
Bibliometrics:
Citation Count: 2
Downloads (6 Weeks): 11, Downloads (12 Months): 75, Downloads (Overall): 437
Full text available:
PDF
In this article, we propose new extensions to Hadoop to enable clusters of reconfigurable active solid-state drives (RASSDs) to process streaming data from SSDs using FPGAs. We also develop an analytical model to estimate the performance of RASSD clusters running under Hadoop. Using the Hadoop RASSD platform and network simulators, ...
Keywords:
Data-intensive computing, active storage, middleware
Full Text:
... Systems. General Terms: Reconfigurable Computing, RASSD, Hadoop. Additional Key Words and Phrases: Data-intensive computing, middleware, active storage. ACM Reference Format: Abdulrahman Kaitoua, Hazem Hajj, Mazen ...
12
May 2012
CCGRID '12: Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 2
Downloads (6 Weeks): 1, Downloads (12 Months): 5, Downloads (Overall): 146
Full text available:
PDF
The promise of "infinite" resources given by the cloud computing paradigm has led to recent interest in exploiting clouds for large-scale data-intensive computing. Given this supposedly infinite resource set, we need a management function that regulates application workload on these resources. This doctoral research focuses on two aspects of workload ...
Keywords:
data-intensive computing, workload management, cloud computing
Full Text:
... has led to recent interest in exploiting clouds for large-scale data-intensive computing. Given this supposedly infinite resource set, we need a ... and resource provisioning, and associated models, algorithms, and protocols. Keywords: data-intensive computing; workload management; cloud computing. I. INTRODUCTION In the current ... is, in turn, helping to realize the potential of large-scale data-intensive computing by providing effective scaling of resources. A growing number of ... [4] that have very large datasets to store and process. Data-intensive computing presents new challenges for systems management in the cloud. One ... WORK We have examined the state-of-the-art of workload management for data-intensive computing in clouds [7, 8]. In these works, a taxonomy is ... thesis aims to address. We presently see a gap between data-intensive computing systems and provisioning systems. Most systems surveyed use shared-nothing clusters ... to be selected before the execution starts. We believe that data-intensive computing systems must exploit a cloud's elasticity in order to cope ... during workload execution. Due to their disjoint nature, we discuss data-intensive computing systems and provisioning systems in separate sub-sections below. A. Data-intensive computing systems We see many data-intensive computing systems perform task scheduling and data replication independently, and place ...
13
February 2013
ACM Transactions on Computer Systems (TOCS): Volume 31 Issue 1, February 2013
Publisher: ACM
Bibliometrics:
Citation Count: 5
Downloads (6 Weeks): 8, Downloads (12 Months): 29, Downloads (Overall): 509
Full text available:
PDF
We present TritonSort, a highly efficient, scalable sorting system. It is designed to process large datasets, and has been evaluated against as much as 100TB of input data spread across 832 disks in 52 nodes at a rate of 0.938TB/min. When evaluated against the annual Indy GraySort sorting benchmark, TritonSort ...
Keywords:
Data-intensive computing, system optimization, balanced systems, sorting
Full Text:
... Applications. General Terms: Design, Experimentation, Measurement, Performance. Additional Key Words and Phrases: Data-intensive computing, balanced systems, sorting, system optimization. ACM Reference Format: Rasmussen, A., Porter, G., ...
14
April 2012
ICPE '12: Proceedings of the 3rd ACM/SPEC International Conference on Performance Engineering
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 0, Downloads (12 Months): 3, Downloads (Overall): 135
Full text available:
PDF
Stream programs have to be crafted carefully to maximize the performance gain that can be obtained from stream processing environments. Manual fine tuning of a stream program is a very difficult process which requires a considerable amount of programmer time and expertise. In this paper we present Hirundo, which is a ...
Keywords:
scalability, performance optimization, data-intensive computing, fault tolerance, stream processing
Full Text:
... ...$10.00. General Terms: Performance, Design, Measurement, Algorithms. Keywords: Stream processing, performance optimization, fault tolerance, data-intensive computing, scalability. 1. INTRODUCTION Importance of high performance data stream processing has been ...
15
July 2012
XSEDE '12: Proceedings of the 1st Conference of the Extreme Science and Engineering Discovery Environment: Bridging from the eXtreme to the campus and beyond
Publisher: ACM
Bibliometrics:
Citation Count: 3
Downloads (6 Weeks): 1, Downloads (12 Months): 6, Downloads (Overall): 66
Full text available:
PDF
As science becomes more computation and data intensive, computing needs often exceed campus capacity. Thus we see a desire to scale from the local environment to other campuses, to national cyberinfrastructure providers such as XSEDE, and/or to cloud providers; in other words, to "bridge" to the wider world. But given the ...
Keywords:
campus bridging, computer networks, Globus, XSEDE, data-intensive computing
Full Text:
... +1 303-735-3886 Jazcek.Braden@colorado.edu ABSTRACT As science becomes more computation and data intensive, computing needs often exceed campus capacity. Thus we see a desire ... of Colorado Boulder. Keywords: Globus, campus bridging, XSEDE, computer networks, data-intensive computing. 1. INTRODUCTION Increasingly computational and data intensive science means that ...
16
April 2012
The VLDB Journal — The International Journal on Very Large Data Bases: Volume 21 Issue 2, April 2012
Publisher: Springer-Verlag New York, Inc.
Bibliometrics:
Citation Count: 17
Downloads (6 Weeks): 3, Downloads (12 Months): 36, Downloads (Overall): 808
Full text available:
PDF
The growing demand for large-scale data mining and data analysis applications has led both industry and academia to design new types of highly scalable data-intensive computing platforms. MapReduce has enjoyed particular success. However, MapReduce lacks built-in support for iterative programs, which arise naturally in many applications including data mining, web ...
Keywords:
Large-scale analytics, Cloud data management, Data-intensive computing, Mapreduce
References:
Borkar, V., Carey, M.J., Grover, R., Onose, N., Vernica, R.: Hyracks: a flexible and extensible foundation for data-intensive computing. In: ICDE Conference (2011).
17
June 2011
ICAC '11: Proceedings of the 8th ACM international conference on Autonomic computing
Publisher: ACM
Bibliometrics:
Citation Count: 5
Downloads (6 Weeks): 5, Downloads (12 Months): 14, Downloads (Overall): 125
Full text available:
PDF
Cloud collaborators wish to combine large amounts of data, in the order of TBs, from multiple distributed locations to a single datacenter. Such groups are faced with the challenge of reducing the latency of the transfer, without incurring excessive dollar costs. Our Pandora system is an autonomic system that creates ...
Keywords:
data-intensive computing, wide-area data transfer, cloud computing
18
November 2009
UltraVis '09: Proceedings of the 2009 Workshop on Ultrascale Visualization
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 2, Downloads (12 Months): 2, Downloads (Overall): 55
Full text available:
PDF
Increasingly massive datasets produced by simulations beg the question: How will we connect this data to the computational and display resources that support visualization and analysis? This question is driving research into new approaches to allocating computational, storage, and network resources. In this paper we explore potential solutions that couple ...
Keywords:
coupled computations, data intensive computing, high-performance computing, simulation
Full Text:
... prediction. General Terms: Measurement, Performance, Design, Experimentation. Keywords: High-performance computing, data intensive computing, coupled computations, simulation. 1. INTRODUCTION Large-scale computational science requires ...
19
June 2014
DIDC '14: Proceedings of the sixth international workshop on Data intensive distributed computing
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 1, Downloads (12 Months): 9, Downloads (Overall): 89
Full text available:
PDF
This paper presents a new data synchronizing transfer tool called FAST (Flexible Automated Synchronization Transfer) which allows facilities for reliably transferring multi-channel data and metadata periodically from rock physics experiments to a repository and a database located in a remote machine. FAST is compatible with all operating systems, and allows ...
Keywords:
data transfer, geoscience, data intensive computing, data streaming, distributed system
20
May 2012
IIWeb '12: Proceedings of the Ninth International Workshop on Information Integration on the Web
Publisher: ACM
Bibliometrics:
Citation Count: 5
Downloads (6 Weeks): 2, Downloads (12 Months): 15, Downloads (Overall): 190
Full text available:
PDF
A growing wealth of digital information is being generated on a daily basis in social networks, blogs, online communities, etc. Organizations and researchers in a wide variety of domains recognize that there is tremendous value and insight to be gained by warehousing this emerging data and making it available for ...
Keywords:
ASTERIX, data-intensive computing, cloud computing, hyracks, semistructured data
References:
V. R. Borkar et al. Hyracks: A flexible and extensible foundation for data-intensive computing. In ICDE, 2011.
Full Text:
... 2012 ACM 978-1-4503-1239-4/12/05 ...$10.00. semistructured data management, parallel database systems, and first-generation data-intensive computing platforms (MapReduce and Hadoop), ASTERIX was envisioned to be a parallel, ...
... R. Borkar et al. Hyracks: A flexible and extensible foundation for data-intensive computing. In ICDE, 2011. [9] L. Gravano et al. Approximate string ...