ABSTRACT
Database technology is playing an increasingly important role in understanding and solving large-scale, complex scientific and societal problems, such as understanding biological networks, climate modeling, and electronic markets. In these settings, uncertain or imprecise information is pervasive and becomes a serious impediment to understanding and effectively utilizing such systems. Clustering is one of the key problems in this context.
In this paper we focus on the problem of clustering, specifically the k-center problem. Since the problem is NP-hard even in the deterministic setting, a natural avenue is to consider approximation algorithms with a bounded performance ratio. In an earlier paper, Cormode and McGregor considered certain variants of this problem, but did not provide approximations that preserve the number of centers. In this paper we remedy the situation and provide true approximation algorithms for a wider class of these problems.
However, the key aspect of this paper is the development of general techniques for optimization under uncertainty. We show that a particular formulation, which uses the contribution of a random variable above its expectation, is useful in this context. We believe these techniques will find wider application in optimization under uncertainty.
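For orientation, the deterministic k-center problem referred to above can be illustrated with the classical greedy farthest-first traversal, which is known to give a 2-approximation in metric spaces. The following is a minimal sketch of that textbook heuristic only, not the algorithm of this paper for uncertain data. The function name `farthest_first_k_center`, the use of Euclidean distance, and the choice of the first input point as the initial center are assumptions made for illustration.

```python
import math


def farthest_first_k_center(points, k):
    """Greedy farthest-first traversal for deterministic k-center.

    Illustrative sketch: points are coordinate tuples and distances are
    Euclidean (an assumption; any metric would do). Returns the chosen
    centers and the resulting radius, i.e. the largest distance from
    any point to its nearest center.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # Start from an arbitrary point; here, the first one.
    centers = [points[0]]
    # nearest[i] is the distance from points[i] to its closest center so far.
    nearest = [math.dist(p, centers[0]) for p in points]

    while len(centers) < k:
        # The next center is the point farthest from all current centers.
        idx = max(range(len(points)), key=nearest.__getitem__)
        centers.append(points[idx])
        # Update each point's distance to its nearest center.
        nearest = [min(d, math.dist(p, points[idx]))
                   for d, p in zip(nearest, points)]

    return centers, max(nearest)


if __name__ == "__main__":
    demo_points = [(0, 0), (1, 0), (10, 0), (11, 0)]
    print(farthest_first_k_center(demo_points, 2))
    # → ([(0, 0), (11, 0)], 1.0)
```

Note that this heuristic returns exactly k centers, which is the property the abstract emphasizes for the uncertain setting as well: the approximation is in the objective value, not in the number of centers.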
References
- B.M. Anthony, V. Goyal, A. Gupta, and V. Nagarajan. A plant location guide for the unsure. Proceedings of SODA, pages 1164--1173, 2008.
- V. Arya, N. Garg, R. Khandekar, A. Meyerson, K. Munagala, and V. Pandit. Local search heuristics for k-median and facility location problems. SIAM J. Comput., 33(3):544--562, 2004.
- M. Charikar and S. Guha. Improved combinatorial algorithms for the facility location and k-median problems. In FOCS '99: Proceedings of the 40th Annual Symposium on Foundations of Computer Science, 1999.
- M. Charikar, S. Guha, É. Tardos, and D. B. Shmoys. A constant-factor approximation algorithm for the k-median problem (extended abstract). In STOC '99: Proceedings of the thirty-first annual ACM symposium on Theory of computing, pages 1--10, 1999.
- M. Charikar, S. Khuller, D.M. Mount, and G. Narasimhan. Algorithms for facility location problems with outliers. In SODA '01: Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms, pages 642--651, 2001.
- G. Cormode and A. McGregor. Approximation algorithms for clustering uncertain data. Proceedings of PODS, pages 191--200, 2008.
- B. Dean, M. Goemans, and J. Vondrák. Approximating the stochastic knapsack problem: The benefit of adaptivity. In Proc. of the Annual Symp. on Foundations of Computer Science, 2004.
- B.C. Dean, M.X. Goemans, and J. Vondrák. Adaptivity and approximation for stochastic packing problems. In SODA '05: Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms, pages 395--404, 2005.
- T. Feder and D. Greene. Optimal algorithms for approximate clustering. In Annual ACM Symp. on Theory of Computing, pages 434--444, 1988.
- A. Goel, S. Guha, and K. Munagala. How to probe for an extreme value. ACM Trans. Algorithms (to appear), 2008. Preliminary version appeared in Proc. of the ACM Symp. on Principles of Database Systems (PODS), 2006.
- A. Goel and P. Indyk. Stochastic load balancing and related problems. In Proc. of the Annual Symp. on Foundations of Computer Science, 1999.
- S. Guha and K. Munagala. Model-driven optimization using adaptive probes. CoRR, arXiv:0812.1012, 2008. Preliminary version appeared in SODA 2007.
- A. Gupta, M. Pál, R. Ravi, and A. Sinha. Boosted sampling: Approximation algorithms for stochastic optimization. In Proc. of the Annual ACM Symp. on Theory of Computing, 2004.
- A. Gupta and M. Pál. Stochastic Steiner trees without a root. Proc. of ICALP, pages 1051--1063, 2005.
- A. Gupta, M. Pál, R. Ravi, and A. Sinha. What about Wednesday? Approximation algorithms for multistage stochastic optimization. Proc. of APPROX-RANDOM, pages 86--98, 2005.
- A. Gupta, R. Ravi, and A. Sinha. An edge in time saves nine: LP rounding approximation algorithms for stochastic network design. In Proc. of the Annual Symp. on Foundations of Computer Science, pages 218--227, 2004.
- D. Hochbaum and D. Shmoys. A best possible heuristic for the k-center problem. Mathematics of Operations Research, 10(2):180--184, May 1985.
- N. Immorlica, D. Karger, M. Minkoff, and V. Mirrokni. On the costs and benefits of procrastination: Approximation algorithms for stochastic combinatorial optimization problems. In Proc. of the Annual ACM-SIAM Symp. on Discrete Algorithms, 2004.
- K. Jain and V. V. Vazirani. Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and Lagrangian relaxation. J. ACM, 48(2):274--296, 2001.
- J. Kleinberg, Y. Rabani, and É. Tardos. Allocating bandwidth for bursty connections. SIAM J. Comput., 30(1), 2000.
- A. Krause and C. Guestrin. Near-optimal nonmyopic value of information in graphical models. Twenty-first Conference on Uncertainty in Artificial Intelligence (UAI 2005), 2005.
- R.H. Möhring, A.S. Schulz, and M. Uetz. Approximation in stochastic scheduling: the power of LP-based priority policies. J. ACM, 46(6):924--942, 1999.
- D. Shmoys and C. Swamy. Stochastic optimization is (almost) as easy as discrete optimization. In FOCS '04: Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science, pages 228--237, 2004.
- M. Skutella and M. Uetz. Scheduling precedence-constrained jobs with stochastic processing times on parallel machines. In SODA '01: Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms, pages 589--590, 2001.
- C. Swamy and D.B. Shmoys. Sampling-based approximation algorithms for multi-stage stochastic. In FOCS, pages 357--366, 2005.
Index Terms
Exceeding expectations and clustering uncertain data