Abstract
Many internal software metrics and external quality attributes of Java programs correlate strongly with program size. This knowledge has been used pervasively in quantitative studies of software through practices such as normalization on size metrics. This paper reports size-related super- and sublinear effects that were not previously known. Findings obtained on a very large collection of Java programs (30,911 projects hosted at Google Code as of Summer 2011) unveil how certain characteristics of programs vary disproportionately with program size, sometimes even non-monotonically. Many of the specific parameters of these nonlinear relations are reported. These results give further insight into the differences between "programming in the small" and "programming in the large." The reported findings carry important consequences for OO software metrics, and for software research in general: metrics known to correlate with size can now be properly normalized so that all the information left in them is size-independent.
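The idea of normalizing a metric against a nonlinear size relation can be illustrated with a minimal sketch. It assumes, purely for illustration, that a metric follows a power law `metric = a * size^b`; this functional form, the function names, and the synthetic data are assumptions and are not taken from the paper, whose reported relations include non-monotonic ones that a single power law cannot capture. The exponent is estimated by ordinary least squares on log-transformed values, and the metric is then divided by `size^b` rather than by `size`.

```python
import math

def fit_power_law(sizes, values):
    """Least-squares fit of log(value) = log(a) + b * log(size); returns (a, b)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                        # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)    # intercept in log-log space = log(a)
    return a, b

def normalize(sizes, values, b):
    """Divide each value by size**b so the fitted size trend is removed."""
    return [v / s ** b for s, v in zip(sizes, values)]

# Hypothetical, synthetic data: a metric that grows superlinearly with size.
sizes = [10, 100, 1000, 10000]
values = [2.0 * s ** 1.2 for s in sizes]

a, b = fit_power_law(sizes, values)
normalized = normalize(sizes, values, b)
```

On this synthetic data, dividing by plain `size` would leave a residual that still grows with size, whereas dividing by `size**b` leaves a value that is roughly constant across projects. Real data would also carry noise, so the normalized values would scatter around the fitted constant rather than equal it.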
Supplemental Material
Virtual Machine containing data and code supporting the paper.
How scale affects structure in Java programs
OOPSLA 2015: Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications