DOI: 10.1145/3025453.3025912
research-article
Honorable Mention

Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing

Published: 02 May 2017

ABSTRACT

Datasets that are identical over a number of statistical properties, yet produce dissimilar graphs, are frequently used to illustrate the importance of graphical representations when exploring data. This paper presents a novel method for generating such datasets, along with several examples. Our technique differs from previous approaches in that new datasets are iteratively generated from a seed dataset through random perturbations of individual data points, and can be directed towards a desired outcome through a simulated annealing optimization strategy. Our method has the benefit of being agnostic to the particular statistical properties that are to remain constant between the datasets, and allows for control over the graphical appearance of the resulting output.
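
The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation; it only shows the general idea of perturbing one point at a time, rejecting any move that changes the chosen summary statistics, and using a temperature to occasionally accept moves that do not improve the appearance. Everything specific here is an assumption made for illustration: the statistics being preserved (means, standard deviations, and Pearson correlation, compared after rounding to two decimals), the target shape (a circle scored by the hypothetical `circle_distance` function), the linear cooling schedule, and all parameter names and default values.

```python
import math
import random
import statistics


def summary(points, decimals=2):
    """Return (mean x, mean y, sd x, sd y, Pearson r), each rounded.

    Which statistics to hold constant, and to what precision, is an
    assumption of this sketch.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((x - mx) * (y - my) for x, y in points) / (len(points) - 1)
    r = cov / (sx * sy)
    return tuple(round(v, decimals) for v in (mx, my, sx, sy, r))


def circle_distance(point, center=(50.0, 50.0), radius=30.0):
    """Distance from a point to a hypothetical target circle."""
    return abs(math.hypot(point[0] - center[0], point[1] - center[1]) - radius)


def anneal(seed_points, iterations=20000, step=0.1,
           t_start=0.4, t_end=0.01, rng=None):
    """Perturb points toward the target shape while keeping summary() fixed."""
    rng = rng or random.Random(0)
    points = list(seed_points)
    target = summary(points)
    for i in range(iterations):
        # Linear cooling from t_start to t_end (an assumed schedule).
        temperature = t_start + (t_end - t_start) * i / iterations
        k = rng.randrange(len(points))
        old = points[k]
        new = (old[0] + rng.gauss(0, step), old[1] + rng.gauss(0, step))
        improves = circle_distance(new) < circle_distance(old)
        # Keep improving moves; keep other moves with probability = temperature.
        if not (improves or rng.random() < temperature):
            continue
        points[k] = new
        # Revert any move that changes the rounded summary statistics.
        if summary(points) != target:
            points[k] = old
    return points
```

Calling `anneal` on a list of `(x, y)` tuples returns a new list of the same length whose rounded summary statistics match those of the seed, because every state the loop keeps has passed the `summary(points) != target` check. How closely the result resembles the target shape depends on the seed data, the step size, and the number of iterations.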

Supplemental Material

suppl.mov

Supplemental video

      Published in

      CHI '17: Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems
      May 2017
      7138 pages
      ISBN:9781450346559
      DOI:10.1145/3025453

      Copyright © 2017 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      CHI '17 Paper Acceptance Rate: 600 of 2,400 submissions, 25%
      Overall Acceptance Rate: 5,789 of 24,782 submissions, 23%