FPGA-Based Hardware Acceleration of Lithographic Aerial Image Simulation

Published: 01 September 2009

Abstract

Lithography simulation, an essential step in design for manufacturability (DFM), is still far from computationally efficient. Most leading companies use large clusters of server computers to achieve acceptable turnaround time. Coprocessor acceleration is therefore very attractive for obtaining increased computational performance with reduced power consumption. This article describes the implementation of a customized accelerator on an FPGA using a polygon-based simulation model. An application-specific memory partitioning scheme is designed to meet the bandwidth requirements of a large number of processing elements. Deep loop pipelining and ping-pong-buffer-based function block pipelining are also implemented in our design. Initial results show a 15X speedup versus the software implementation running on a microprocessor, and more speedup is expected via further performance tuning. The implementation also leverages state-of-the-art C-to-RTL synthesis tools. At the same time, we identify the need for manual architecture-level exploration for parallel implementations. Moreover, we implement the algorithm on NVIDIA GPUs using the CUDA programming environment, and provide some useful comparisons for different kinds of accelerators.

      Reviews

      Javier Castillo

      In high-performance reconfigurable computing (HPRC), a reconfigurable device is used to accelerate parts of a compute-intensive application. HPRC is an emerging field, and its importance can be seen in the number of companies that have launched systems with a field-programmable gate array (FPGA) connected to the computational nodes, such as SGI's reconfigurable application-specific computing (RASC), Nallatech's front-side bus (FSB) modules, and XtremeData's XD1000 system. They all share a similar architecture, in which the system's microprocessor connects to the FPGA over a high-speed channel. The authors use an Opteron processor connected to an Altera Stratix II FPGA via HyperTransport links.

      The paper presents an optical lithography simulation algorithm that is accelerated using reconfigurable hardware. "Optical lithography is the technology used for printing circuit patterns onto wafers. As the technology scales down and the feature size is even smaller than the wavelength of the light employed, significant light interference and diffraction may occur during the imaging process." Therefore, it is necessary to simulate the imaging process prior to manufacturing, in order to ensure its correctness. The method used to solve the problem is based on decomposing the "system into many coherent systems with decreasing importance." As the authors explain, "the image corresponding to each coherent system can be obtained via numerical image convolution, and the final image is the weighted sum of the image of each coherent system." In the frequency domain, the convolution is done by applying fast Fourier transforms to the data. Since the layout of a very large-scale integration (VLSI) circuit is composed only of rectangles, the convolution values are precomputed and stored. Although this method is accurate enough to solve the problem, it is computationally demanding.
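
      The weighted-sum formulation quoted above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy mask, and the kernels are hypothetical, and it uses a direct pixel-by-pixel convolution rather than the precomputed, rectangle-based lookup the paper relies on. It also assumes the usual convention that each coherent system contributes the squared magnitude of its convolved field.

```python
# Sketch only: convolve2d_same, aerial_image, and all data passed to them
# are illustrative assumptions, not names or values from the paper.

def convolve2d_same(mask, kernel):
    """Direct 2D convolution with zero padding; output has the mask's shape."""
    rows, cols = len(mask), len(mask[0])
    krows, kcols = len(kernel), len(kernel[0])
    cy, cx = krows // 2, kcols // 2  # kernel origin at its center
    out = [[0j] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            acc = 0j
            for i in range(krows):
                for j in range(kcols):
                    yy, xx = y + cy - i, x + cx - j
                    if 0 <= yy < rows and 0 <= xx < cols:
                        acc += mask[yy][xx] * kernel[i][j]
            out[y][x] = acc
    return out


def aerial_image(mask, kernels, weights):
    """Weighted sum over coherent systems of |mask convolved with kernel|^2."""
    rows, cols = len(mask), len(mask[0])
    image = [[0.0] * cols for _ in range(rows)]
    for kernel, weight in zip(kernels, weights):
        field = convolve2d_same(mask, kernel)
        for y in range(rows):
            for x in range(cols):
                # Assumed convention: intensity is the squared field magnitude.
                image[y][x] += weight * abs(field[y][x]) ** 2
    return image
```

      The four nested loops in `convolve2d_same` show why the computation is demanding: the cost grows with the image area times the kernel area, repeated once per coherent system.
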
      The authors present a new hardware architecture to solve the problem, and then compare it with other existing architectures. Using C, they explore the problem and propose an optimized architecture. Next, a synthesis tool, AutoPilot, generates the final hardware implementation. The algorithm kernel is a loop that can be rearranged to exploit its intrinsic parallelism. The authors analyze the results from this rearranged loop to decide on a hardware/software partition and a communication pattern for the system. The paper mainly discusses how to parallelize the hardware implementation and partition the memory, based on the data extracted from the high-level C implementation of the system. In Section 4.2, the authors describe how they rewrote the C code to implement specific architectural decisions. The section concludes that there is still a gap between software C code and C code suitable for hardware generation.

      The paper ends with results from different experiments, and a critique of the Compute Unified Device Architecture (CUDA) version of the algorithm running on a graphics processing unit (GPU). Unfortunately, the authors fail to explain the scalability advantages of FPGAs over GPUs. The authors conclude that, while using a C-based tool is useful and reduces design time, it is difficult to extract the algorithm's parallelism and manage the system's memory mapping. In summary, readers may find ideas in this paper for future research on HPRC machines.

      Online Computing Reviews Service

      • Published in

        ACM Transactions on Reconfigurable Technology and Systems  Volume 2, Issue 3
        September 2009
        121 pages
        ISSN:1936-7406
        EISSN:1936-7414
        DOI:10.1145/1575774
        Copyright © 2009 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 1 September 2009
        • Accepted: 1 December 2008
        • Revised: 1 November 2008
        • Received: 1 June 2008

        Qualifiers

        • research-article
        • Research
        • Refereed