Abstract
A significant portion of a program's execution cycles is typically dedicated to performing conditional transfers of control. Much of the research on reducing the cost of these operations has focused on the branch, while the comparison has been largely ignored. In this paper we investigate reducing the cost of comparisons in conditional transfers of control. We decouple the specification of the values to be compared from the comparison itself, which now occurs as part of the branch instruction. The register or immediate values involved in the comparison are specified by a new instruction, called a comparison specification, which is loop invariant. Because the comparison specification is loop invariant, it can be placed outside the loop, which reduces the number of instructions in the loop and provides performance benefits not possible with conventional comparison instructions. Results from applying this technique to the ARM processor show that both the number of instructions executed and the number of execution cycles are reduced.
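The effect on instruction counts can be illustrated with a minimal sketch. It is not the paper's implementation or its ARM encoding: the instruction names in the comments (including `cmpspec`), the one-count-per-instruction cost model, and the function names `run_conventional` and `run_with_cmpspec` are assumptions made for illustration only. Both functions sum the integers below `n` in a bottom-tested loop and count the instructions a simple machine would execute.

```python
def run_conventional(n):
    """Loop with an explicit compare before every branch (assumed cost model)."""
    regs = {"i": 0, "sum": 0, "n": n}
    executed = 0
    while True:
        regs["sum"] += regs["i"]          # add sum, sum, i
        executed += 1
        regs["i"] += 1                    # add i, i, #1
        executed += 1
        less_than = regs["i"] < regs["n"]  # cmp i, n (inside the loop)
        executed += 1
        executed += 1                     # blt loop
        if not less_than:
            break
    return regs["sum"], executed


def run_with_cmpspec(n):
    """Same loop, with a hypothetical comparison specification hoisted out."""
    regs = {"i": 0, "sum": 0, "n": n}
    spec = ("i", "n")                     # cmpspec i, n (hypothetical, executed once)
    executed = 1
    while True:
        regs["sum"] += regs["i"]          # add sum, sum, i
        executed += 1
        regs["i"] += 1                    # add i, i, #1
        executed += 1
        executed += 1                     # blt loop: compares the operands named by spec
        if not (regs[spec[0]] < regs[spec[1]]):
            break
    return regs["sum"], executed


if __name__ == "__main__":
    print(run_conventional(10))
    print(run_with_cmpspec(10))
```

Under these assumptions, each iteration of the second loop executes one instruction fewer, at the cost of a single extra instruction before the loop. The sketch says nothing about cycle counts or about how the branch hardware reads the specified operands.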
Index Terms
Reducing the cost of conditional transfers of control by using comparison specifications