Abstract
Scaling CMOS technology into nanometer feature-size nodes has made it practically impossible to precisely control the manufacturing process. The resulting process variation causes the speed and power consumption of a circuit to vary from die to die. To cope with process-induced variations, circuits are conventionally implemented with conservative design margins that guarantee the target frequency of each hardware component in manufactured multiprocessor chips. This approach, referred to as worst-case design, results in considerable circuit upsizing, which in turn reduces the number of dies on a wafer.
This work addresses the design of real-time systems with reduced design margins, referred to as better-than-worst-case design, for streaming applications (e.g., video decoders) constrained by a throughput requirement (e.g., frames per second). To this end, the first contribution of this work is a complete modeling framework that captures a streaming application mapped to an NoC-based multiprocessor system with voltage-frequency islands under process-induced die-to-die and within-die frequency variations. The framework is used to analyze, at the system level, the impact of variations in the frequency of hardware components on application throughput. The second contribution is a methodology that uses the proposed framework to estimate the impact of reducing circuit design margins on the number of good dies, that is, dies that satisfy the throughput requirement of a real-time streaming application. We show on both synthetic and real applications that the proposed better-than-worst-case design approach can increase the number of good dies by up to 9.6% and 18.8% for designs with and without fixed SRAM and IO blocks, respectively.
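The good-die trade-off described above can be illustrated with a small Monte Carlo sketch. This is not the modeling framework of this work: the function name `count_good_dies`, the Gaussian variation model, the "slowest island bounds throughput" rule, and every number below are assumptions chosen only to make the idea concrete. Each die receives one shared die-to-die frequency offset plus an independent within-die offset per voltage-frequency island, and a die counts as good if its throughput still meets the requirement.

```python
import random


def count_good_dies(dies_per_wafer, nominal_mhz, sigma_d2d, sigma_wid,
                    cycles_per_frame, required_fps, n_islands=4, seed=0):
    """Count dies whose slowest island still meets the throughput target.

    Toy model (an assumption, not the paper's framework): relative frequency
    offsets are Gaussian, and the slowest island bounds the throughput.
    """
    rng = random.Random(seed)
    good = 0
    for _ in range(dies_per_wafer):
        # Die-to-die variation: one offset shared by all islands on the die.
        d2d = rng.gauss(0.0, sigma_d2d)
        # Within-die variation: an independent offset for each island.
        freqs_mhz = [nominal_mhz * (1.0 + d2d + rng.gauss(0.0, sigma_wid))
                     for _ in range(n_islands)]
        # Frames per second achieved by the bottleneck island.
        fps = min(freqs_mhz) * 1e6 / cycles_per_frame
        if fps >= required_fps:
            good += 1
    return good


if __name__ == "__main__":
    # Hypothetical numbers: the conservative design has a higher nominal
    # frequency but larger circuits, so fewer dies fit on a wafer.
    common = dict(sigma_d2d=0.03, sigma_wid=0.02,
                  cycles_per_frame=10_000_000, required_fps=50)
    worst_case = count_good_dies(dies_per_wafer=500, nominal_mhz=560, **common)
    reduced = count_good_dies(dies_per_wafer=560, nominal_mhz=530, **common)
    print("worst-case design, good dies:", worst_case)
    print("reduced-margin design, good dies:", reduced)
```

Whether the reduced-margin design yields more good dies in this sketch depends entirely on the assumed variation magnitudes and on how much die area the smaller margin saves, which is the quantity the proposed methodology is meant to estimate.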
Maximizing the Number of Good Dies for Streaming Applications in NoC-Based MPSoCs Under Process Variation