Abstract
Effective data placement strategies can enhance the performance of data-intensive applications implemented on high-end computing clusters. Such strategies can have a significant impact in localizing computation, minimizing synchronization (communication) costs, enhancing reliability (via strategic replication policies), and ensuring a balanced workload or increasing the available bandwidth from massive storage devices (e.g., disk arrays).
Existing work has largely targeted the placement of relatively simple data types or entities (e.g., elements, vectors, sets, and arrays). Here we investigate several hash-based distributed data placement methods targeting tree- and graph-structured data, and develop a locality-enhancing placement service for large cluster systems. Target applications include the placement of a single large graph (e.g., a Web graph), a single large tree (e.g., a large XML file), a forest of graphs or trees (e.g., an XML database), and other specialized graph data types such as bipartite graphs (e.g., query-click graphs) and directed acyclic graphs. We empirically evaluate our service by demonstrating its use in improving mining executions for pattern discovery, nearest neighbor searching, graph computations, and applications that combine link and content analysis.
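To make the idea of hash-based, locality-enhancing placement concrete, the following is a minimal sketch and not the service described here. It assumes that each vertex is summarized by its neighbor set and that a min-wise hash of that set selects the cluster node, so that vertices with heavily overlapping neighborhoods tend to be placed together. The function names (`minhash_signature`, `place`), the choice of BLAKE2b as the hash, and the `num_hashes` parameter are all illustrative assumptions.

```python
import hashlib


def _hash(value: str, seed: int) -> int:
    # Deterministic 64-bit hash; the seed selects one hash function from a family.
    digest = hashlib.blake2b(f"{seed}:{value}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(items, num_hashes=1):
    # Min-wise hashing: for each hash function, keep the smallest hash value
    # seen over the set. Two sets agree on one coordinate with probability
    # equal to their Jaccard similarity.
    items = {str(x) for x in items}
    if not items:
        raise ValueError("cannot hash an empty set")
    return tuple(min(_hash(x, seed) for x in items) for seed in range(num_hashes))


def place(items, num_nodes, num_hashes=1):
    # Hypothetical placement rule: map the signature to one of num_nodes
    # cluster nodes. A larger num_hashes makes co-location more selective.
    signature = minhash_signature(items, num_hashes)
    key = ",".join(str(v) for v in signature)
    return _hash(key, seed=-1) % num_nodes


if __name__ == "__main__":
    # Neighbor sets of two vertices that share most of their neighbors.
    u = {"a", "b", "c", "d"}
    v = {"a", "b", "c", "e"}
    print(place(u, num_nodes=8), place(v, num_nodes=8))
```

With `num_hashes=1`, two neighbor sets land on the same node at least as often as their Jaccard similarity, which is the locality property this sketch is meant to illustrate. Load balance and replication, which the abstract also names as goals, are not addressed here.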
A distributed placement service for graph-structured and tree-structured data
PPoPP '10: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming