Abstract
Research has shown that correctly conducting and analysing computer performance experiments is difficult. This paper investigates what is necessary to conduct successful computer performance evaluation by attempting to repeat a prior experiment: the comparison between two Linux schedulers.
In our efforts, we found that exploring an experimental space through a series of incremental experiments can be inconclusive, and there may be no indication of how much experimentation is enough. Analysis of variance (ANOVA), a traditional analysis method, partly solves the problems of the incremental approach, but we demonstrate that ANOVA can be insufficient for proper analysis because of the requirements it imposes on the data.
Finally, we demonstrate the successful application of quantile regression, a recent development in statistics, to computer performance experiments. Quantile regression can provide more insight into the experiment than ANOVA, with the additional benefit of being applicable to data from any distribution. This property makes it especially useful in our field, since non-normally distributed data is common in computer experiments.
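As an illustration of why a quantile-based analysis can reveal more than a comparison of centres, the following is a minimal sketch and not the paper's analysis code. It assumes a single binary factor (which of two schedulers was used), in which case the quantile regression fit separates into one quantile estimate per group. The latency values and the names `pinball_loss`, `fit_quantile`, and `scheduler_effect` are hypothetical and chosen only for this example.

```python
def pinball_loss(residual, tau):
    """Check (pinball) loss that quantile regression minimises at quantile tau."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def fit_quantile(values, tau):
    """Return the observation that minimises the total pinball loss.

    The loss is piecewise linear and convex with kinks at the data points,
    so a minimiser always exists among the observations themselves.
    """
    return min(values, key=lambda q: sum(pinball_loss(v - q, tau) for v in values))


def scheduler_effect(baseline, alternative, tau):
    """Quantile regression with one binary factor.

    With a 0/1 predictor the loss splits into two independent group sums,
    so the intercept is the baseline quantile and the effect is the
    difference between the two group quantiles.
    """
    intercept = fit_quantile(baseline, tau)
    effect = fit_quantile(alternative, tau) - intercept
    return intercept, effect


# Made-up latencies (ms) for two hypothetical schedulers: same median,
# but the second one has a single very slow run in its tail.
sched_a = [10, 11, 12, 13, 14]
sched_b = [10, 11, 12, 13, 50]

print(scheduler_effect(sched_a, sched_b, 0.5))  # (12, 0)
print(scheduler_effect(sched_a, sched_b, 0.9))  # (14, 36)
```

In this toy data the estimated effect at the median is zero, while the effect at the 0.9 quantile is large. No assumption about the shape of the distribution is needed to obtain either estimate. For real experiments with several factors, an established quantile regression implementation would be more appropriate than this brute-force scan.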
Why you should care about quantile regression
ASPLOS '13: Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems