DOI: 10.1145/1559795.1559836

Exceeding expectations and clustering uncertain data

Published: 29 June 2009

ABSTRACT

Database technology is playing an increasingly important role in understanding and solving large-scale and complex scientific and societal problems and phenomena, for instance, understanding biological networks, climate modeling, electronic markets, etc. In these settings, uncertainty or imprecise information is a pervasive issue that becomes a serious impediment to understanding and effectively utilizing such systems. Clustering is one of the key problems in this context.

In this paper we focus on the problem of clustering, specifically the k-center problem. Since the problem is NP-hard even in the deterministic setting, a natural avenue is to consider approximation algorithms with a bounded performance ratio. In an earlier paper, Cormode and McGregor considered certain variants of this problem, but did not provide approximations that preserve the number of centers. In this paper we remedy the situation and provide true approximation algorithms for a wider class of these problems.
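For orientation, the deterministic k-center problem mentioned above admits a classical greedy 2-approximation (farthest-first traversal). The sketch below illustrates that textbook baseline only; it is not the algorithm of this paper and does not handle uncertain points. The function names `farthest_first_centers` and `k_center_radius`, and the choice of Euclidean distance, are assumptions made for illustration.

```python
import math


def farthest_first_centers(points, k):
    """Pick up to k centers greedily: start anywhere, then repeatedly
    add the point farthest from the centers chosen so far.

    Illustrative baseline for the *deterministic* k-center problem
    (hypothetical helper, not taken from the paper).
    """
    if not points or k <= 0:
        return []
    centers = [points[0]]
    # nearest[i] = distance from points[i] to its closest chosen center
    nearest = [math.dist(p, centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        idx = max(range(len(points)), key=nearest.__getitem__)
        new_center = points[idx]
        centers.append(new_center)
        nearest = [min(d, math.dist(p, new_center))
                   for d, p in zip(nearest, points)]
    return centers


def k_center_radius(points, centers):
    """k-center objective: the largest distance from any point to its
    nearest center."""
    return max(min(math.dist(p, c) for c in centers) for p in points)
```

For example, on the four collinear points `(0, 0), (1, 0), (10, 0), (11, 0)` with `k = 2`, the sketch picks the two end points as centers, so every point is within distance 1 of a center.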

However, the key aspect of this paper is to devise general techniques for optimization under uncertainty. We show that a particular formulation which uses the contribution of a random variable above its expectation is useful in this context. We believe these techniques will find wider applications in optimization under uncertainty.
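The abstract does not spell out the formulation, so the following is only one plausible reading of "the contribution of a random variable above its expectation": the quantity E[max(X - E[X], 0)] for a discrete random variable X. The helper names `expectation` and `excess_above_expectation` are hypothetical, and the paper's actual definition may differ.

```python
def expectation(values, probs):
    """Mean of a discrete random variable given as parallel lists of
    values and probabilities (probabilities assumed to sum to 1)."""
    return sum(v * p for v, p in zip(values, probs))


def excess_above_expectation(values, probs):
    """E[max(X - E[X], 0)]: the expected amount by which X exceeds its
    own mean. An assumed reading of the abstract, not the paper's
    stated definition."""
    mu = expectation(values, probs)
    return sum(max(v - mu, 0.0) * p for v, p in zip(values, probs))
```

As a small check, a variable that is 0 or 4 with equal probability has mean 2 and exceeds it by 2 half of the time, so its excess is 1; a constant variable has excess 0.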

References

  1. B.M. Anthony, V. Goyal, A. Gupta, and V. Nagarajan. A plant location guide for the unsure. Proceedings of SODA, pages 1164--1173, 2008.
  2. V. Arya, N. Garg, R. Khandekar, A. Meyerson, K. Munagala, and V. Pandit. Local search heuristics for k-median and facility location problems. SIAM J. Comput., 33(3):544--562, 2004.
  3. M. Charikar and S. Guha. Improved combinatorial algorithms for the facility location and k-median problems. In FOCS '99: Proceedings of the 40th Annual Symposium on Foundations of Computer Science, 1999.
  4. M. Charikar, S. Guha, É. Tardos, and D. B. Shmoys. A constant-factor approximation algorithm for the k-median problem (extended abstract). In STOC '99: Proceedings of the thirty-first annual ACM symposium on Theory of computing, pages 1--10, 1999.
  5. M. Charikar, S. Khuller, D.M. Mount, and G. Narasimhan. Algorithms for facility location problems with outliers. In SODA '01: Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms, pages 642--651, 2001.
  6. G. Cormode and A. McGregor. Approximation algorithms for clustering uncertain data. Proceedings of PODS, pages 191--200, 2008.
  7. B. Dean, M. Goemans, and J. Vondrák. Approximating the stochastic knapsack problem: The benefit of adaptivity. In Proc. of the Annual Symp. on Foundations of Computer Science, 2004.
  8. B.C. Dean, M.X. Goemans, and J. Vondrák. Adaptivity and approximation for stochastic packing problems. In SODA '05: Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms, pages 395--404, 2005.
  9. Tomás Feder and Daniel Greene. Optimal algorithms for approximate clustering. In Annual ACM Symp. on Theory of Computing, pages 434--444, 1988.
  10. A. Goel, S. Guha, and K. Munagala. How to probe for an extreme value. ACM Trans. Algorithms (to appear), 2008. Preliminary version appeared in Proc. of the ACM Symp. on Principles of Database Systems (PODS), 2006.
  11. A. Goel and P. Indyk. Stochastic load balancing and related problems. In Proc. of the Annual Symp. on Foundations of Computer Science, 1999.
  12. S. Guha and K. Munagala. Model-driven optimization using adaptive probes. CoRR, arXiv:0812.1012, 2008. Preliminary version appeared in SODA 2007.
  13. A. Gupta, M. Pál, R. Ravi, and A. Sinha. Boosted sampling: Approximation algorithms for stochastic optimization. In Proc. of the Annual ACM Symp. on Theory of Computing, 2004.
  14. A. Gupta and M. Pál. Stochastic steiner trees without a root. Proc. of ICALP, pages 1051--1063, 2005.
  15. A. Gupta, M. Pál, R. Ravi, and A. Sinha. What about wednesday? Approximation algorithms for multistage stochastic optimization. Proc. of APPROX-RANDOM, pages 86--98, 2005.
  16. A. Gupta, R. Ravi, and A. Sinha. An edge in time saves nine: LP rounding approximation algorithms for stochastic network design. In Proc. of the Annual Symp. on Foundations of Computer Science, pages 218--227, 2004.
  17. D. Hochbaum and D. Shmoys. A best possible heuristic for the k-center problem. Mathematics of Operations Research, 10(2):180--184, May 1985.
  18. N. Immorlica, D. Karger, M. Minkoff, and V. Mirrokni. On the costs and benefits of procrastination: Approximation algorithms for stochastic combinatorial optimization problems. In Proc. of the Annual ACM-SIAM Symp. on Discrete Algorithms, 2004.
  19. K. Jain and V. V. Vazirani. Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and lagrangian relaxation. J. ACM, 48(2):274--296, 2001.
  20. J. Kleinberg, Y. Rabani, and É. Tardos. Allocating bandwidth for bursty connections. SIAM J. Comput, 30(1), 2000.
  21. A. Krause and C. Guestrin. Near-optimal nonmyopic value of information in graphical models. Twenty-first Conference on Uncertainty in Artificial Intelligence (UAI 2005), 2005.
  22. R.H. Mohring, A.S. Schulz, and M. Uetz. Approximation in stochastic scheduling: the power of LP-based priority policies. J. ACM, 46(6):924--942, 1999.
  23. D. Shmoys and C. Swamy. Stochastic optimization is (almost) as easy as discrete optimization. In FOCS '04: Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science, pages 228--237, 2004.
  24. M. Skutella and M. Uetz. Scheduling precedence-constrained jobs with stochastic processing times on parallel machines. In SODA '01: Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms, pages 589--590, 2001.
  25. C. Swamy and D.B. Shmoys. Sampling-based approximation algorithms for multi-stage stochastic. In FOCS, pages 357--366, 2005.

Published in

PODS '09: Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2009
298 pages
ISBN: 9781605585536
DOI: 10.1145/1559795
General Chair: Jan Paredaens
Program Chair: Jianwen Su

Copyright © 2009 ACM

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

Overall acceptance rate: 476 of 1,835 submissions (26%)
