Abstract
While the number of transistors on a chip increases exponentially over time, the productivity that can be realized from these systems has not kept pace. To deal with the complexity of modern systems, software developers are increasingly dependent on specialized development tools such as security profilers, memory leak identifiers, data flight recorders, and dynamic type analysis. Many of these tools require full-system data which covers multiple interacting threads, processes, and processors. Reducing the performance penalty and complexity of these software tools is critical to those developing next-generation applications, and many researchers have proposed adding specialized hardware to assist in profiling and introspection. Unfortunately, while this additional hardware would be incredibly beneficial to developers, the cost of this hardware must be paid on every single die that is manufactured.

In this paper, we argue that a new way to attack this problem is with the addition of specialized analysis hardware built on separate active layers stacked vertically on the processor die using 3D IC technology. This provides a modular "snap-on" functionality that could be included with developer systems, and omitted from consumer systems to keep the cost impact to a minimum. We describe the advantage of using inter-die vias for introspection, and we quantify the impact they can have in terms of the area, power, temperature, and routability of the resulting systems. We show that hardware stubs could be inserted into commodity processors at design time that would allow analysis layers to be bonded to development chips, and that these stubs would increase area and power by no more than 0.021 mm² and 0.9% respectively.
Introspective 3D chips. In ASPLOS XII: Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, 2006.