Abstract
This article introduces the Resilient Adaptive Algebraic Architecture (RA3), which adapts the degree of parallelism exploited by a matrix multiplication algorithm in a time-deterministic fashion, reducing power consumption while still meeting the real-time deadlines found in most DSP-like applications. The architecture provides low-overhead error correction through a hardware implementation of the algorithm-based fault-tolerance (ABFT) method, which executes concurrently with the matrix multiplication and makes efficient use of memory and power resources. RA3 is evaluated on three real-time industrial case studies from the telecom and multimedia application domains to illustrate the design space exploration and the adaptation possibilities the architecture offers to hardware designers. Its performance and energy efficiency are compared with those of standard high-performance architectures, namely a GPU and an out-of-order general-purpose processor. Finally, we present the results of fault injection campaigns that measure the architecture's resilience to soft errors.
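To make the algorithm-based fault-tolerance idea concrete, the following is a minimal software sketch of the classic checksum scheme for matrix multiplication (Huang and Abraham, 1984). It is not the hardware implementation described in the article: the function names are hypothetical, the matrices are small integer lists so that exact equality checks are valid, and only a single corrupted element of the result is assumed.

```python
def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def with_column_checksum(a):
    """Append a row holding the sum of each column of a."""
    return a + [[sum(col) for col in zip(*a)]]


def with_row_checksum(b):
    """Append to every row of b the sum of that row."""
    return [row + [sum(row)] for row in b]


def abft_multiply(a, b):
    """Return the full-checksum product: C plus a checksum row and column."""
    return matmul(with_column_checksum(a), with_row_checksum(b))


def locate_and_correct(c_full):
    """Find and repair one corrupted data element in place.

    Returns the (row, column) that was repaired, or None if all
    checksums are consistent. Assumption: at most one data element
    is wrong and the checksum entries themselves are intact.
    """
    n, p = len(c_full) - 1, len(c_full[0]) - 1
    bad_rows = [i for i in range(n) if sum(c_full[i][:p]) != c_full[i][p]]
    bad_cols = [j for j in range(p)
                if sum(c_full[i][j] for i in range(n)) != c_full[n][j]]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        others = sum(c_full[i][:p]) - c_full[i][j]
        c_full[i][j] = c_full[i][p] - others  # rebuild from the row checksum
        return (i, j)
    raise ValueError("error pattern not correctable by this sketch")


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    c_full = abft_multiply(a, b)
    c_full[1][0] = 99                  # emulate a soft error in one element
    print(locate_and_correct(c_full))  # position of the repaired element
    print([row[:2] for row in c_full[:2]])
```

The intersection of the inconsistent row and the inconsistent column locates the faulty element, and the row checksum restores its value. A floating-point version would need a tolerance in place of the exact comparisons used here.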
Index Terms
Adaptive Parallelism Exploitation under Physical and Real-Time Constraints for Resilient Systems