skip to main content
poster

A distributed placement service for graph-structured and tree-structured data

Published:09 January 2010Publication History
Skip Abstract Section

Abstract

Effective data placement strategies can enhance the performance of data-intensive applications implemented on high end computing clusters. Such strategies can have a significant impact in localizing the computation, in minimizing synchronization (communication) costs, in enhancing reliability (via strategic replication policies), and in ensuring a balanced workload or enhancing the available bandwidth from massive storage devices (e.g. disk arrays).

Existing work has largely targeted the placement of relatively simple data types or entities (e.g. elements, vectors, sets, and arrays). Here we investigate several hash-based distributed data placement methods targeting tree- and graph- structured data, and develop a locality enhancing placement service for large cluster systems. Target applications include the placement of a single large graph (e.g. Web graph), a single large tree (e.g. large XML file), a forest of graphs or trees (e.g. XML database) and other specialized graph data types - bi-partite (query-click graphs), directed acyclic graphs etc. We empirically evaluate our service by demonstrating its use in improving mining executions for pattern discovery, nearest neighbor searching, graph computations, and applications that combine link and content analysis.

References

  1. A. Broder et al. Min-wise independent permutations (extended abstract). In phSTOC, pages 327--336, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. G. Buehrer and K. Chellapilla. A scalable pattern mining approach to web graph compression with communities. In phWSDM, pages 95--106, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. G. Buehrer et al. Toward terabyte pattern mining: an architecture-conscious solution. In phPPOPP, pages 2--12, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. P. Indyk and R. Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In phSTOC, pages 604--613, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. S. Parthasarathy et al. Parallel Data Mining for Association Rules on Shared-Memory Systems. In phKAIS, 3 (1): 1--29, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. S. Tatikonda and S. Parthasarathy. Hashing Tree-Structured Data: Methods and Applications. phin ICDE (to appear), 2009.Google ScholarGoogle Scholar

Index Terms

  1. A distributed placement service for graph-structured and tree-structured data

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        • Published in

          cover image ACM SIGPLAN Notices
          ACM SIGPLAN Notices  Volume 45, Issue 5
          PPoPP '10
          May 2010
          346 pages
          ISSN:0362-1340
          EISSN:1558-1160
          DOI:10.1145/1837853
          Issue’s Table of Contents
          • cover image ACM Conferences
            PPoPP '10: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
            January 2010
            372 pages
            ISBN:9781605588773
            DOI:10.1145/1693453

          Copyright © 2010 Copyright held by author(s).

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 9 January 2010

          Check for updates

          Qualifiers

          • poster

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader
        About Cookies On This Site

        We use cookies to ensure that we give you the best experience on our website.

        Learn more

        Got it!