skip to main content
research-article

Heuristic search for adaptive, defect-tolerant multiprocessor arrays

Published:21 March 2013Publication History
Skip Abstract Section

Abstract

In this article, new heuristic-search methods and algorithms are presented for enabling highly efficient and adaptive, defect-tolerant multiprocessor arrays. We consider systems where a homogeneous multiprocessor array lies on top of reconfigurable interconnects which allow the pipeline stages of the processors to be connected in all possible configurations. Considering the multiprocessor array partitioned in substitutable units at the granularity of pipeline stages, we employ a variety of heuristic-search methods and algorithms to isolate and replace defective units. The proposed heuristics are designed for off-line execution and aim at minimizing the performance overhead necessarily introduced to the array by the interconnects' latency. An empirical evaluation of the designed algorithms is then carried out, in order to assess the targeted problem and the efficacy of our approach. Our findings indicate this to be a NP-complete computational problem, however, our heuristic-search methods can achieve, for the problem sizes we exhaustively searched, 100% accuracy in finding the optimal solution among 1019 possible candidates within 2.5 seconds. Alternatively, they can provide near-optimal solutions at an accuracy which consistently exceeds 70% (compared to the optimal solution) in only 10-4 seconds.

References

  1. Aggarwal, N., Ranganathan, P., Jouppi, N. P., and Smith, J. E. 2007. Configurable isolation: building high availability systems with commodity multi-core processors. In Proceedings of ISCA'07, 470--481. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Austin, T., Bertacco, V., Mahlke, S., and Cao, Y. 2008. Reliable systems on unreliable fabrics. IEEE Des. Test 25, 4, 322--332. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Borkar, S. 2005. Designing reliable systems from unreliable components: the challenges of transistor variability and degradation. IEEE, Micro 25, 6, 10--16. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Cormen, T., Leiserson, C., Rivest, R., and Stein, C. 2003. Introduction to Algorithms 2nd Ed. McGraw-Hill Science/Engineering/Math. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. D. Bhaduri, S. K. Shukla, P. G. M. G. 2005. Reliability analysis of fault-tolerant reconfigurable architectures. In Proceedings of the IEEE International Workshop on Design & Test of Defect-Tolerant Nanoscale Archit (NANOARCH).Google ScholarGoogle Scholar
  6. Garey, M. R. and Johnson, D. S. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman and Co., San Francisco, CA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Gold, B. T., Falsafi, B., and Hoe, J. C. 2009. Chip-level redundancy in distributed shared-memory multiprocessors. In Proceedings of the 15th IEEE Pacific Rim International Symposium on Dependable Computing (PRDC'09). 195--201. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Goldberg, D. E. 1989. Genetic Algorithms in Search, Optimization and Machine Learning 1st Ed. Addison-Wesley Longman Publishing Co., Inc., Boston, MA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Gupta, S., Ansari, A., Feng, S., and Mahlke, S. 2010. StageWeb: Interweaving pipeline stages into a wearout and variation tolerant CMP fabric. Proceedings of the International Conference on Dependable Systems and Networks, 101--110. Google ScholarGoogle ScholarCross RefCross Ref
  10. Gupta, S., Feng, S., Ansari, A., Blome, J., and Mahlke, S. 2008. The stagenet fabric for constructing resilient multicore systems. In Proceedings of the IEEE/ACM International Symposium on Microarchitecture (MICRO-41). 141--151. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Gupta, S., Feng, S., Ansari, A., and Mahlke, S. 2011. Stagenet: A reconfigurable fabric for constructing dependable cmps. IEEE Trans. Comput. 60, 1, 5--19. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Gupta, S., Feng, S., Blome, J., and Mahlke, S. 2007. Stagenet: A reconfigurable cmp fabric for resilient systems. In Proceedings of the Reconfigurable and Adaptive Architecture Workshop (RAAW).Google ScholarGoogle Scholar
  13. Hauck, S. and DeHon, A. 2007. Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation. Morgan Kaufmann Publishers Inc., San Francisco, CA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Hennessy, J. L. and Patterson, D. A. 2007. Computer Architecture—A Quantitative Approach 4th Ed. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Hoos, H. H. and Stuetzle, T. 2004. Stochastic Local Search: Foundations and Applications. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. J. Emmert, C. Stroud, B. S., and Abramovici, M. 2000. Dynamic fault tolerance in fpgas via partial reconfiguration. In Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. J. M. Emmert, C. E. S. and Abramovici, M. 2007. Online fault tolerance for fpga logic blocks. IEEE Trans. VLSI Syst. 15, 216--226. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Kirkpatrick, S., Gelatt, C. D., and Vecchi, M. P. 1983. Optimization by simulated annealing. Science 220, 4598, 671--680. Google ScholarGoogle ScholarCross RefCross Ref
  19. Knuth, D. E. 2011. The Art of Computer Programming, Volume 4A: Combinatorial Algorithms, Part 1. Addison-Wesley Professional. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Powell, M. D., Biswas, A., Gupta, S., and Mukherjee, S. S. 2009. Architectural core salvaging in a multi-core processor for hard-error tolerance. In Proceedings of the International Symposium on Computer Architecture (ISCA'09). 93--104. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Ramirez, A., Cabarcas, F., Juurlink, B., Mesa, M. A., Sanchez, F., Azevedo, A., Meenderinck, C., Ciobanu, C., Isaza, S., and Gaydadjiev, G. 2010. The sarc architecture. IEEE Micro 30, 16--29. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Ramos, J., Samson, J., Lupia, D., Troxel, I., Subramaniyan, R., A. Jacobs, J. Greco, G. C., Curreri, J., Fischer, M., Grobelny, E., Georgr, A., and Some, R. 2006. High-performance, dependable multiprocessor. In Proceedings of the IEEE Aerospace Conference. Google ScholarGoogle ScholarCross RefCross Ref
  23. Riordan, T., Grewal, G., Hsu, S., Kinsel, J., Libby, J., March, R., Mills, M., Ries, P., and Scofield, R. 1988. The mips m2000 system. In Proceedings of the ICCD. 366--369. Google ScholarGoogle ScholarCross RefCross Ref
  24. Romanescu, B. F. and Sorin, D. J. 2008. Core cannibalization architecture: improving lifetime chip performance for multicore processors in the presence of hard faults. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT'08). 43--51. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Rosen, K. H. 2002. Discrete Mathematics and Its Applications 5th Ed. McGraw-Hill Higher Education. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Shuler, Robert, J. 2010. Fpgas with reconfigurable fault-tolerant redundancy. NASA Tech briefs MSC-24464-1, NASA Center, Johnson Space Center.Google ScholarGoogle Scholar
  27. Smaragdos, G. 2012. An adaptive defect-tolerant multiprocessor array architecture. M.S. thesis, Delft University of Technology.Google ScholarGoogle Scholar
  28. Smolens, J. C., Gold, B. T., Falsafi, B., and Hoe, J. C. 2006. Reunion: Complexity-effective multicore redundancy. In Proceedings of the IEEE/ACM International Symposium on Microarchitecture (MICRO'39). 223--234. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Stuart, R. and Peter, N. 2003. Artificial Intelligence: A Modern Approach 2nd Ed. Prentice Hall. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Sylvester, D., Blaauw, D., and Karl, E. 2006. Elastic: An adaptive self-healing architecture for unpredictable silicon. Proceedings of the Desi. Test Comput. 23, 6, 484--490. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Tzilis, S., Sourdis, I., and Gaydadjiev, G. 2010. Fine-grain fault diagnosis for fpga logic blocks. In Proceedings of the International Conference on Field-Programmable Technology (FPT 2010). Beijing, China. Google ScholarGoogle ScholarCross RefCross Ref
  32. Vasilikos, V. 2011. Heuristic search for defect tolerant adaptive multiprocessor arrays. M.S. thesis.Google ScholarGoogle Scholar
  33. Černý, V. 1985. Thermodynamical approach to the traveling salesman problem: An efficient simulation algorithm. J. Optimi. Theory Appli. 45, 1, 41--51. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Heuristic search for adaptive, defect-tolerant multiprocessor arrays

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader
        About Cookies On This Site

        We use cookies to ensure that we give you the best experience on our website.

        Learn more

        Got it!