Abstract
Wireless communication standards such as Long-Term Evolution (LTE) are changing rapidly to support the high data rates of wireless devices. Physical-layer baseband processing has strict real-time deadlines, especially in the next-generation applications enabled by the 5G standard. Existing base-station transceivers use customized DSP cores or fixed-function hardware accelerators for physical-layer baseband processing. However, these approaches incur significant non-recurring engineering costs and adapt poorly to newer standards or updates. Software-programmable processors offer more adaptability, but it is challenging to sustain guaranteed worst-case latency and throughput at reasonably low power on shared-memory many-core architectures whose design choices, such as caches and a network-on-chip (NoC), are inherently unpredictable.
We propose SPECTRUM, a predictable, software-defined many-core architecture that exploits the massive parallelism of the LTE/5G baseband processing workload. The focus is on designing scalable, lightweight hardware that can be programmed and defined by sophisticated software mechanisms. SPECTRUM employs hundreds of lightweight in-order cores augmented with custom instructions that provide predictable timing, a purely software-scheduled NoC that orchestrates communication to avoid any contention, and per-core software-controlled scratchpad memory with deterministic access latency. Compared with a many-core architecture such as Skylake-SP (average power 215 W), which drops 14% of packets under high traffic load, a 256-core SPECTRUM by definition has a zero packet drop rate at a significantly lower average power of 24 W. SPECTRUM also consumes 2.11× lower power than a platform of C66x DSP cores plus accelerators in baseband processing. In addition, we enable SPECTRUM to handle dynamic workloads with the multiple service categories present in 5G mobile networks, namely Enhanced Mobile Broadband (eMBB), Ultra-reliable and Low-latency Communications (URLLC), and Massive Machine Type Communications (mMTC), using a run-time scheduling and mapping algorithm. Experimental evaluations show that our algorithm performs task and NoC mapping at run-time on fewer cores than a static mapping that reserves cores exclusively for each service category, while still meeting the differentiated latency and reliability requirements.
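To make the idea of a software-scheduled, contention-free NoC more concrete, the following is a minimal sketch and not the scheduler used in SPECTRUM. It assumes a 2D mesh, dimension-ordered (XY) routing, one hop per time slot, and a simple greedy choice of injection slot; the names `xy_route` and `schedule` are hypothetical. The point it illustrates is that when software reserves every (link, slot) pair ahead of time, no two transfers ever compete for a link, so each transfer's latency is known in advance.

```python
from typing import Dict, List, Tuple

Node = Tuple[int, int]   # (x, y) coordinate of a core in an assumed 2D mesh
Link = Tuple[Node, Node] # directed link between neighbouring cores


def xy_route(src: Node, dst: Node) -> List[Link]:
    """Return the links on a dimension-ordered path (X first, then Y)."""
    links: List[Link] = []
    x, y = src
    while x != dst[0]:
        nx = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:
        ny = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def schedule(flows: List[Tuple[Node, Node]]) -> List[int]:
    """Greedily pick an injection slot per flow so no link is used twice in a slot."""
    reserved: Dict[Tuple[Link, int], int] = {}  # (link, slot) -> flow index
    starts: List[int] = []
    for idx, (src, dst) in enumerate(flows):
        path = xy_route(src, dst)
        start = 0
        # A flit injected at `start` occupies hop k of its path during slot start + k.
        while any((link, start + k) in reserved for k, link in enumerate(path)):
            start += 1
        for k, link in enumerate(path):
            reserved[(link, start + k)] = idx
        starts.append(start)
    return starts


# Illustrative flows only; they are not taken from the paper.
flows = [((0, 0), (2, 0)), ((1, 0), (2, 1)), ((0, 0), (2, 2))]
starts = schedule(flows)
for (src, dst), start in zip(flows, starts):
    hops = len(xy_route(src, dst))
    print(f"{src} -> {dst}: inject at slot {start}, {hops} hops")
```

Because the reservation table is built entirely offline, the routers in such a design need no arbitration or buffering for contention, which is what makes the worst-case communication latency straightforward to bound.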
SPECTRUM: A Software-defined Predictable Many-core Architecture for LTE/5G Baseband Processing