research-article

kb-anonymity: a model for anonymized behaviour-preserving test and debugging data

Published: 04 June 2011

Abstract

Generating test cases that exercise all possible states of a program is often very expensive and, in practice, infeasible, especially for medium-sized or large industrial systems. In practice, industrial clients of a system often have a set of input data collected either before the system was built or after a previous version of the system was deployed. Such data are highly valuable: they represent the operations that matter in a client's daily business and can be used to test the system extensively. However, such data often carry sensitive information and cannot be released to third-party development houses. For example, a healthcare provider may hold patient records that are strictly confidential and cannot be used by any third party. Masking sensitive values alone may not be sufficient, because correlations among fields in the data can reveal the masked information. Moreover, masked data may exhibit different behavior in the system and thus be less useful than the original data for testing and debugging.
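
To make the point about correlated fields concrete, here is a minimal Python sketch. The records, field names and the choice of quasi-identifiers (zip code, birth year, sex) are made up purely for illustration. It masks the name column and then counts how many records share each combination of the remaining quasi-identifiers. Under the k-anonymity criterion every such combination must occur at least k times; any combination that occurs fewer times singles out a record even though its name is hidden.

```python
from collections import Counter

# Hypothetical patient records (illustrative only). "name" is the directly
# identifying field; zip, birth_year and sex are assumed quasi-identifiers.
records = [
    {"name": "Alice", "zip": "12345", "birth_year": 1970, "sex": "F", "diagnosis": "flu"},
    {"name": "Bob",   "zip": "12345", "birth_year": 1970, "sex": "M", "diagnosis": "asthma"},
    {"name": "Carol", "zip": "12345", "birth_year": 1970, "sex": "F", "diagnosis": "diabetes"},
    {"name": "Dave",  "zip": "54321", "birth_year": 1985, "sex": "M", "diagnosis": "flu"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")

def mask(record, fields=("name",)):
    """Replace the listed fields with a placeholder and keep every other field."""
    return {f: ("***" if f in fields else v) for f, v in record.items()}

def equivalence_class_sizes(rows, qi=QUASI_IDENTIFIERS):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(r[f] for f in qi) for r in rows)

def is_k_anonymous(rows, k, qi=QUASI_IDENTIFIERS):
    """True if every quasi-identifier combination occurs at least k times."""
    return all(n >= k for n in equivalence_class_sizes(rows, qi).values())

masked = [mask(r) for r in records]
print(is_k_anonymous(masked, k=2))  # False: Bob's and Dave's combinations are unique
```

Masking the name leaves Bob's and Dave's rows distinguishable, together with their diagnoses, by anyone who knows those three demographic attributes. This is the kind of leak that k-anonymity-style generalization or replacement is meant to close.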

To allow private data to be released for testing and debugging, this paper proposes the kb-anonymity model, which combines the k-anonymity model commonly used in the data mining and database communities with the concept of program behavior preservation. Like k-anonymity, kb-anonymity replaces some information in the original data to preserve privacy, so that the resulting data can be released to third-party developers. Unlike k-anonymity, kb-anonymity also ensures that the replaced data exhibit the same kind of program behavior as the original data, so that they remain useful for testing and debugging. We also provide a concrete instantiation of the model under three particular configurations and apply our prototype implementation to three open-source programs, demonstrating the prototype's utility and scalability.
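
The following Python sketch illustrates one simplified reading of the idea described above; it is not the authors' algorithm. It makes three assumptions. First, a record's "behavior" is taken to be the sequence of branch outcomes it triggers in a hypothetical toy program (`program_path`). Second, a synthetic record may be released for a behavior only if at least k original records share that path. Third, a replacement is found by brute-force search over a small value domain, where a real tool would more likely derive path conditions with symbolic execution and solve them with a constraint solver. The three configurations mentioned in the abstract are not modelled, and `kb_anonymize` and its parameters are illustrative names only.

```python
from collections import defaultdict
from itertools import product

def program_path(age, income):
    """Hypothetical program under test. Returns the branch outcomes an input
    takes; this sketch treats that path as the input's behavior signature."""
    path = []
    if age >= 65:
        path.append("senior")
        if income < 20000:
            path.append("subsidy")
        else:
            path.append("no-subsidy")
    else:
        path.append("adult")
    return tuple(path)

def kb_anonymize(records, k, domain):
    """Group records by path. For each group with at least k members, release
    one synthetic record that follows the same path but equals no original
    record. Groups smaller than k are withheld entirely."""
    groups = defaultdict(list)
    for rec in records:
        groups[program_path(*rec)].append(rec)

    originals = set(records)
    released = []
    for path, members in groups.items():
        if len(members) < k:
            continue  # too few originals share this behavior to hide among
        # Brute-force stand-in for constraint solving over the path condition.
        for candidate in product(*domain):
            if candidate not in originals and program_path(*candidate) == path:
                released.append(candidate)
                break
    return released

records = [(70, 15000), (80, 12000), (67, 30000),
           (30, 50000), (45, 42000), (25, 61000)]
domain = (range(18, 100), range(0, 100001, 5000))  # assumed value ranges
print(kb_anonymize(records, k=2, domain=domain))  # [(65, 0), (18, 0)]
```

The released records drive the toy program down the same branches as the groups they stand for, so they remain useful as test inputs without copying any original record. The single record on the `("senior", "no-subsidy")` path is withheld because fewer than k = 2 originals share that behavior.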



Published in

  • ACM SIGPLAN Notices, Volume 46, Issue 6 (PLDI '11), June 2011
    652 pages
    ISSN: 0362-1340
    EISSN: 1558-1160
    DOI: 10.1145/1993316

  • PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation
    June 2011
    668 pages
    ISBN: 9781450306638
    DOI: 10.1145/1993498
    General Chair: Mary Hall
    Program Chair: David Padua

Copyright © 2011 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

