PICA: Processor Idle Cycle Aggregation for Energy-Efficient Embedded Systems

Published: 01 July 2012

Abstract

Processor Idle Cycle Aggregation (PICA) is a promising approach for low-power execution of processors, in which small memory stalls are aggregated to create large ones, enabling a profitable switch of the processor into a low-power mode. We extend the previous approach in three dimensions. First, we develop a static analysis for the PICA technique and present optimal parameters for five common types of loops based on steady-state analysis. Second, to remedy the weakness of software-only control in varying environments, we enhance PICA with a minimal hardware extension that ensures correct execution for any loop and any parameters, thus greatly facilitating exploration-based parameter tuning. Third, we demonstrate that our PICA technique can be applied to certain types of nested loops with variable bounds, thus enhancing the applicability of PICA. We validate our analytical model against simulation-based optimization and also show, through experiments on embedded application benchmarks, that our technique can be applied to a wide range of loops with an average energy reduction of 20% compared to execution without PICA.
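
The intuition behind aggregation can be illustrated with a toy energy model. The sketch below is not the analytical model of the paper. It only assumes that entering and leaving a low-power mode costs a fixed number of cycles and a fixed amount of energy, so a stall pays off only if it is long enough to amortize that overhead. The function name `idle_energy` and all parameter values are hypothetical and chosen for illustration.

```python
def idle_energy(stall_lengths, p_idle, p_sleep, switch_cycles, switch_energy):
    """Total energy spent during stalls in a toy model (all values hypothetical).

    For each stall the processor either stays awake (p_idle per cycle) or,
    if the stall is longer than the mode-switch time, switches to low-power
    mode (switch_energy once, then p_sleep per remaining cycle), whichever
    is cheaper.
    """
    total = 0
    for n in stall_lengths:
        energy = n * p_idle  # cost of simply waiting out the stall
        if n > switch_cycles:
            sleep = switch_energy + (n - switch_cycles) * p_sleep
            energy = min(energy, sleep)
        total += energy
    return total


# Hypothetical parameters, in arbitrary integer energy units.
params = dict(p_idle=10, p_sleep=1, switch_cycles=50, switch_energy=600)

# 100 short stalls of 20 cycles each: none is long enough to switch modes.
scattered = idle_energy([20] * 100, **params)

# The same 2000 stall cycles aggregated into a single long stall.
aggregated = idle_energy([2000], **params)

print(scattered, aggregated)
```

Under these assumed numbers the total idle time is identical in both cases, but only the aggregated stall exceeds the switching overhead, which is why it consumes less energy in this model.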

