
Maximizing the Number of Good Dies for Streaming Applications in NoC-Based MPSoCs Under Process Variation

Published: 24 September 2015

Abstract

Scaling CMOS technology into nanometer feature-size nodes has made it practically impossible to control the manufacturing process precisely. The resulting process variation causes the speed and power consumption of circuits to vary. Conventionally, circuits are implemented with conservative design margins that guarantee the target frequency of each hardware component in manufactured multiprocessor chips despite process-induced variations. This approach, referred to as worst-case design, results in considerable circuit upsizing, which in turn reduces the number of dies on a wafer.
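To make the link between upsizing and dies per wafer concrete, a common first-order estimate of gross dies per wafer is the wafer area divided by the die area, minus a correction for partial dies lost at the wafer edge: DPW ≈ π(d/2)²/A − πd/√(2A). The sketch below applies this textbook approximation (it is not a model taken from this article) to hypothetical inputs: a 300 mm wafer and a 100 mm² die that is assumed to grow by 10% when its circuits are upsized for worst-case margins. The function name `dies_per_wafer` is illustrative.

```python
import math


def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """First-order gross-die estimate (textbook approximation, not from the article).

    Wafer area divided by die area, minus the partial dies lost along the edge.
    """
    radius = wafer_diameter_mm / 2.0
    full = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return max(0, int(full - edge_loss))


# Hypothetical inputs: 300 mm wafer, 100 mm^2 die, and an assumed 10% area
# increase caused by upsizing circuits for conservative (worst-case) margins.
nominal = dies_per_wafer(300.0, 100.0)
upsized = dies_per_wafer(300.0, 110.0)
print(f"nominal die: {nominal} dies/wafer, upsized die: {upsized} dies/wafer")
```

With these assumed inputs, the upsized die gives 579 instead of 640 dies per wafer, about 9.5% fewer.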

This work deals with designing real-time systems for streaming applications (e.g., video decoders), constrained by a throughput requirement (e.g., frames per second), with reduced design margins, an approach referred to as better-than-worst-case design. To this end, the first contribution of this work is a complete modeling framework that captures a streaming application mapped to a network-on-chip (NoC)-based multiprocessor system with voltage-frequency islands under process-induced die-to-die and within-die frequency variations. The framework is used to analyze, at the system level, how variations in the frequency of hardware components affect application throughput. The second contribution is a methodology that uses the proposed framework to estimate how reducing circuit design margins affects the number of good dies, i.e., dies that satisfy the throughput requirement of a real-time streaming application. On both synthetic and real applications, we show that the proposed better-than-worst-case design approach can increase the number of good dies by up to 9.6% and 18.8% for designs with and without fixed SRAM and IO blocks, respectively.
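The notion of a good die, and why reducing margins can increase their number, can be illustrated with a small Monte Carlo sketch. It is not the framework proposed in this work, which models the mapped application and the NoC-based platform in detail. Instead, it assumes, hypothetically, Gaussian die-to-die and within-die frequency offsets, a throughput set by the slowest component, and a die area that grows linearly with the design margin. It estimates good dies per wafer as gross dies per wafer multiplied by the fraction of sampled dies that meet the throughput requirement. The names `throughput_yield`, `good_dies`, `cycles_per_iter`, `sigma_d2d` and `sigma_wid`, and all numeric values, are illustrative assumptions.

```python
import math
import random


def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """Textbook gross-die estimate: wafer area / die area minus edge loss."""
    radius = wafer_diameter_mm / 2.0
    full = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return max(0, int(full - edge_loss))


def throughput_yield(margin, cycles_per_iter, required_tput, f_target,
                     sigma_d2d, sigma_wid, n_dies=10_000, seed=1):
    """Fraction of sampled dies whose throughput meets the requirement.

    Assumed (hypothetical) model: each component's nominal frequency is
    f_target * (1 + margin); one Gaussian die-to-die offset is shared by all
    components of a die, and each component adds its own Gaussian within-die
    offset. Throughput (iterations/s) is set by the slowest component,
    min(f_i / cycles_i), rather than by a full system-level analysis.
    """
    rng = random.Random(seed)  # fixed seed: identical samples for every margin
    f_nom = f_target * (1.0 + margin)
    good = 0
    for _ in range(n_dies):
        d2d = rng.gauss(0.0, sigma_d2d)
        tput = min(f_nom * (1.0 + d2d + rng.gauss(0.0, sigma_wid)) / cycles
                   for cycles in cycles_per_iter)
        if tput >= required_tput:
            good += 1
    return good / n_dies


def good_dies(margin, base_area_mm2, area_per_margin, wafer_diameter_mm=300.0, **model):
    """Expected good dies per wafer = gross dies x throughput yield.

    Hypothetical area model: die area grows linearly with the design margin.
    """
    area = base_area_mm2 * (1.0 + area_per_margin * margin)
    return dies_per_wafer(wafer_diameter_mm, area) * throughput_yield(margin, **model)


# Illustrative parameters only; none of these values come from the article.
model = dict(
    cycles_per_iter=[4000, 3000, 5000, 2500],  # cycles per iteration, per component
    required_tput=100_000.0,                   # required iterations per second
    f_target=500e6,                            # Hz; slowest component just meets the requirement
    sigma_d2d=0.05,                            # relative std. dev., die-to-die
    sigma_wid=0.03,                            # relative std. dev., within-die
)
for margin in (0.0, 0.05, 0.10, 0.20):
    print(f"margin {margin:.2f}: {good_dies(margin, 100.0, 1.0, **model):.0f} good dies/wafer")
```

Sweeping the margin exposes the trade-off described above: a larger margin raises the fraction of dies that meet the throughput requirement but reduces the number of dies per wafer, so the margin that maximizes good dies depends on the variation magnitudes and on the area cost of upsizing. Reusing a fixed seed keeps the sampled variations identical across margins, so in this sketch the yield can only grow as the margin grows.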
