Abstract
The introduction of cache-coherent processor-logic interconnects in CPU-FPGA platforms promises low-latency communication between CPU and FPGA fabrics. This reduced latency improves the performance of heterogeneous systems implemented on such devices and gives rise to new software architectures that can better use the available hardware.
Through an extended study on accelerating the software task scheduler of a microkernel operating system, this article reports on the potential for accelerating applications that exhibit fine-grained interactions. In doing so, we evaluate the performance of direct and cache-coherent communication methods for applications that involve frequent, low-bandwidth transactions between CPU and programmable logic.
In the specific case we studied, we found that replacing a highly optimised software implementation of the task scheduler with an FPGA-based scheduler reduces the cost of communication between two software threads by 5.5%. We also found that, although hardware acceleration reduces the cache footprint, execution-time variability persists because of other non-deterministic features of the CPU.
Efficient Fine-grained Processor-logic Interactions on the Cache-coherent Zynq Platform