It is our great pleasure to welcome you to the Fifth International Workshop on Data-Intensive Distributed Computing (DIDC 2012), held in conjunction with the ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC 2012).
The data needs of scientific and commercial applications from a diverse range of fields have been growing exponentially in recent years. This increasing demand for large-scale data processing has necessitated collaboration and data sharing among the world's leading education, research, and industrial institutions, as well as the use of distributed resources owned by collaborating parties. In a widely distributed environment, data is often not locally accessible and must therefore be retrieved and stored remotely. While traditional distributed systems work well for computation that requires limited data handling, they may fail in unexpected ways when the computation accesses, creates, and moves large amounts of data, especially over wide-area networks. Furthermore, the data accessed and created is often poorly described, lacking both metadata and provenance. Scientists, researchers, and application developers are thus frequently forced to solve basic data-handling problems themselves: physically locating data, accessing it, and moving it to visualization or compute resources for further analysis.
DIDC focuses on the challenges that data-intensive applications impose on distributed systems, and on the state-of-the-art solutions proposed to overcome them. It brings together the collaborative and distributed computing community and the data management community in an effort to generate productive conversations on the planning, management, and scheduling of data-handling tasks and data storage resources.
Contents
Data-intensive discoveries in science: the fourth paradigm
Scientific computing increasingly revolves around massive amounts of data. From the physical sciences to numerical simulations to high-throughput genomics and homeland security, we will soon be dealing with petabytes, if not exabytes, of data. This new, data-...
Job and data clustering for aggregate use of multiple production cyberinfrastructures
- Ketan Maheshwari,
- Allan Espinosa,
- Daniel S. Katz,
- Michael Wilde,
- Zhao Zhang,
- Ian Foster,
- Scott Callaghan,
- Phillip Maechling
In this paper, we address the challenges of reducing the time-to-solution of the data-intensive earthquake simulation workflow "CyberShake" by supplementing the high-performance parallel computing (HPC) resources on which it typically runs with ...
Semantic based data collection for large scale cloud systems
Current tools for monitoring cloud systems are designed for physical servers and are not intended to handle rapid elasticity or dynamic behaviour while operating at scale. Though current monitoring tools can be applied to small cloud systems, the volume ...
Consistency and fault tolerance for erasure-coded distributed storage systems
One challenge in applying erasure codes (or error-correcting codes) to distributed storage systems is to maintain consistency between data and redundancy blocks in the face of crashing servers. We present two access protocols that provide sequential ...
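To make the terminology concrete, here is a minimal sketch of the simplest erasure code, single XOR parity (RAID-4/5 style), in Python. It illustrates only what "data and redundancy blocks" mean; it is not the paper's protocol. The consistency problem the paper tackles is that a write to a data block and the matching parity update must survive a server crashing between the two steps.

```python
# Minimal XOR-parity sketch (illustration only, not the paper's protocol).
# n data blocks plus one parity block tolerate the loss of any one block.

def xor_parity(blocks: list[bytes]) -> bytes:
    """Redundancy block: byte-wise XOR of all blocks (equal length)."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

# Store three data blocks and one parity block on four servers.
data = [b"aaaa", b"bbbb", b"cccc"]
parity = xor_parity(data)

# If one server crashes, XOR-ing the survivors rebuilds the lost block.
rebuilt = xor_parity([data[0], data[2], parity])  # server holding data[1] crashed
assert rebuilt == data[1]
```

Note the hazard the sketch ignores: updating `data[1]` without also updating `parity` would make the rebuilt block silently wrong, which is precisely the data/redundancy consistency that access protocols like the paper's must protect.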
Experiences with 100Gbps network applications
- Mehmet Balman,
- Eric Pouyoul,
- Yushu Yao,
- E. Wes Bethel,
- Burlen Loring,
- Prabhat,
- John Shalf,
- Alex Sim,
- Brian L. Tierney
100Gbps networking has finally arrived, and many research and educational institutions have begun to deploy 100Gbps routers and services. ESnet and Internet2 worked together to make 100Gbps networks available to researchers at the Supercomputing 2011 ...
A study of Lustre networking over a 100 gigabit wide area network with 50 milliseconds of latency
As part of the SCinet Research Sandbox at the 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC11), Indiana University utilized a dedicated 100 Gbps wide area network (WAN) link spanning more than 3,500 ...
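For a sense of scale (a back-of-the-envelope estimate, assuming the quoted 50 ms is the round-trip time), the bandwidth-delay product of such a link is

$$\text{BDP} = 100\ \text{Gb/s} \times 50\ \text{ms} = 5 \times 10^{9}\ \text{bits} \approx 625\ \text{MB},$$

i.e. roughly 625 MB must be in flight at any moment to keep the pipe full, far beyond the buffering that LAN-tuned file-system clients typically assume.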
Virtual network on demand: dedicating network resources to distributed scientific workflows
The VNOD project aims to build an on-demand network virtualization infrastructure that can deliver the unprecedented networking performance and quality of service required by modern, distributed, data-intensive applications utilized by user communities. ...
Acceptance Rates
| Year | Submitted | Accepted | Rate |
|---|---|---|---|
| DIDC '14 | 12 | 7 | 58% |
| Overall | 12 | 7 | 58% |