LOCUS: Low-Power Customizable Many-Core Architecture for Wearables

Published: 14 November 2017

Abstract

Application requirements, such as real-time response, are pushing wearable devices to use more powerful processors inside the SoC (system on chip). However, existing wearable devices perform too poorly for such demanding applications, and conventional high-performance many-core architectures exceed the stringent power budget of this domain. We propose LOCUS, a low-power, customizable, many-core processor for next-generation wearable devices. LOCUS combines customizable processor cores with a customizable network on a message-passing architecture to deliver competitive performance/watt: an average of 3.1× relative to the quad-core ARM processors used in state-of-the-art wearable devices. A combination of full-system simulation with representative applications from the wearable domain and RTL synthesis of the architecture shows that a 16-core LOCUS achieves an average 1.52× performance/watt improvement over a conventional 16-core shared-memory many-core architecture. We also propose a dynamic power management mechanism that further reduces power consumption in both computation and communication, improving the performance/watt of LOCUS by 1.17×.

