Abstract
Energy efficiency is one of the main performance goals when designing processors for embedded systems. Typically, the simpler the processor, the less energy it consumes. Thus, an ultra-low-power multicore processor will likely have very small distributed memory with a simple interconnect. To compile for such an architecture, a partitioning strategy that can tune between space and communication minimization is crucial to fit a program within its limited resources and achieve good performance. A careful program layout design is also critical. Aside from fulfilling the space constraint, a compiler needs to be able to optimize for program latency to satisfy a given timing requirement as well. To satisfy all of the aforementioned constraints, we present a flexible code partitioning strategy and lightweight mechanisms to express parallelism and program layout. First, we compare two strategies for partitioning program structures and introduce a language construct that lets programmers choose which strategy to use and when. The compiler then partitions program structures with a mix of both strategies. Second, we add support for programmer-specified parallelism and program layout by imposing additional spatial constraints on the compiler. We evaluate our compiler by implementing an accelerometer-based gesture recognition application on GA144, a recent low-power minimalistic multicore architecture. When compared to MSP430, GA144 is overall 19x more energy-efficient and 23x faster when running this application. Without these techniques, this application would not be able to fit on GA144.
Compiling a gesture recognition application for a low-power spatial architecture
LCTES 2016: Proceedings of the 17th ACM SIGPLAN/SIGBED Conference on Languages, Compilers, Tools, and Theory for Embedded Systems