research-article

White box sampling in uncertain data processing enabled by program analysis

Published: 19 October 2012

Abstract

Sampling is an important, low-cost approach to uncertain data processing: output variations caused by input errors are estimated by sampling. Traditional methods tend to treat the program as a black box. In this paper, we show that program analysis can expose the internals of sample executions, making the sampling process more selective and focused. In particular, we develop a sampling runtime that selectively samples within input error bounds to expose discontinuities in output functions. It identifies all program factors that can potentially lead to discontinuity and hashes the values of those factors during execution in a cost-effective way. The hash values are then used to guide the sampling process. Our results show that the technique is very effective for real-world programs: it achieves the precision of a high sampling rate at the cost of a lower sampling rate.
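The idea of hash-guided sampling can be illustrated with a minimal sketch. It is not the paper's runtime: the toy `program`, the manual `trace` list standing in for instrumentation, the SHA-1 digest, and the bisection strategy are all assumptions made for illustration. The sketch samples an input error bound coarsely, hashes the values of the recorded branch factors in each run, and adds samples only between neighbors whose hashes differ.

```python
import hashlib


def program(x, trace):
    """Toy program with one discontinuity at x == 2.0 (hypothetical example)."""
    above = x > 2.0
    trace.append(("x>2", above))  # record a factor that can cause discontinuity
    return x + 10.0 if above else x


def run(x):
    """Run one sample; return (output, hash of the recorded factor values)."""
    trace = []
    out = program(x, trace)
    digest = hashlib.sha1(repr(trace).encode()).hexdigest()
    return out, digest


def whitebox_sample(lo, hi, coarse=4, depth=10):
    """Sample [lo, hi] coarsely, then bisect only where adjacent hashes differ."""
    xs = [lo + (hi - lo) * i / coarse for i in range(coarse + 1)]
    samples = {x: run(x) for x in xs}
    pending = list(zip(xs, xs[1:]))
    for _ in range(depth):
        next_pending = []
        for a, b in pending:
            if samples[a][1] == samples[b][1]:
                # Same factor values: this sketch assumes no discontinuity here.
                continue
            mid = (a + b) / 2.0
            samples[mid] = run(mid)
            next_pending += [(a, mid), (mid, b)]
        pending = next_pending
    return samples


if __name__ == "__main__":
    result = whitebox_sample(0.0, 4.0)
    for x in sorted(result):
        print(x, result[x][0])
```

In this toy setting, the extra samples cluster around the jump at 2.0 while the smooth regions keep only their coarse samples, which is the intuition behind getting high-rate precision at a lower sampling cost. Treating equal hashes as "no discontinuity in between" is a simplification of this sketch, not a guarantee.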

Published in

ACM SIGPLAN Notices, Volume 47, Issue 10
OOPSLA '12
October 2012
1011 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2398857

OOPSLA '12: Proceedings of the ACM international conference on Object oriented programming systems languages and applications
October 2012
1052 pages
ISBN: 9781450315616
DOI: 10.1145/2384616

Copyright © 2012 ACM

Publisher

Association for Computing Machinery
New York, NY, United States
