Abstract
Sampling is an important, low-cost approach to uncertain data processing, in which the output variations caused by input errors are sampled. Traditional methods tend to treat a program as a black box. In this paper, we show that program analysis can expose the internals of sample executions, making the sampling process more selective and focused. In particular, we develop a sampling runtime that samples selectively within input error bounds to expose discontinuities in output functions. It identifies all the program factors that can potentially lead to discontinuity and hashes the values of those factors during execution in a cost-effective way. The hash values are then used to guide the sampling process. Our results show that the technique is very effective for real-world programs: it achieves the precision of a high sampling rate at the cost of a lower sampling rate.
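The idea of hash-guided sampling can be illustrated with a minimal sketch. This is not the paper's runtime: the toy `program`, the choice of a single branch outcome as the only discontinuity factor, the use of SHA-256, and the bisection strategy in `guided_samples` are all assumptions made for illustration. The sketch records the branch outcome during each run, hashes the recorded trace, and spends additional samples only on input intervals whose endpoints produce different hashes.

```python
import hashlib

def program(x, trace):
    """Toy program with one discontinuity at x = 2.0 (hypothetical example)."""
    taken = x < 2.0
    trace.append(taken)  # record a factor that can cause a jump in the output
    return x * 1.5 if taken else x * 1.5 + 10.0

def run(x):
    """Execute the program and return (output, hash of the recorded factors)."""
    trace = []
    out = program(x, trace)
    digest = hashlib.sha256(repr(trace).encode()).hexdigest()
    return out, digest

def guided_samples(lo, hi, depth=6):
    """Sample [lo, hi], bisecting only where the endpoint hashes differ."""
    samples = {lo: run(lo), hi: run(hi)}

    def refine(a, b, d):
        # Equal hashes: treat the interval as free of jumps (a heuristic,
        # since two jumps inside the interval could cancel out).
        if d == 0 or samples[a][1] == samples[b][1]:
            return
        mid = (a + b) / 2.0
        samples[mid] = run(mid)
        refine(a, mid, d - 1)
        refine(mid, b, d - 1)

    refine(lo, hi, depth)
    return dict(sorted(samples.items()))

if __name__ == "__main__":
    for x, (out, digest) in guided_samples(1.0, 3.0).items():
        print(f"x={x:<8} out={out:<10} hash={digest[:8]}")
```

With the error bound [1.0, 3.0], the extra samples cluster just below x = 2.0, where the branch outcome flips, while the interval [2.0, 3.0] receives no further samples because both of its endpoints hash to the same value.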
White box sampling in uncertain data processing enabled by program analysis
OOPSLA '12: Proceedings of the ACM international conference on Object oriented programming systems languages and applications