Abstract
This paper presents PARD, a programmable architecture for resourcing-on-demand that provides a new programming interface to convey an application's high-level information like quality-of-service requirements to the hardware. PARD enables new functionalities like fully hardware-supported virtualization and differentiated services in computers. PARD is inspired by the observation that a computer is inherently a network in which hardware components communicate via packets (e.g., over the NoC or PCIe). We apply principles of software-defined networking to this intra-computer network and address three major challenges. First, to deal with the semantic gap between high-level applications and underlying hardware packets, PARD attaches a high-level semantic tag (e.g., a virtual machine or thread ID) to each memory-access, I/O, or interrupt packet. Second, to make hardware components more manageable, PARD implements programmable control planes that can be integrated into various shared resources (e.g., cache, DRAM, and I/O devices) and can differentially process packets according to tag-based rules. Third, to facilitate programming, PARD abstracts all control planes as a device file tree to provide a uniform programming interface via which users create and apply tag-based rules.
Full-system simulation results show that by co-locating latency-critical memcached applications with other workloads, PARD can improve a four-core computer's CPU utilization by up to a factor of four without significantly increasing tail latency. FPGA emulation based on a preliminary RTL implementation demonstrates that the cache control plane introduces no extra latency and that the memory control plane can reduce queueing delay for high-priority memory-access requests by up to a factor of 5.6.
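The three mechanisms named in the abstract (tagged packets, per-resource control planes with tag-based rules, and a file-tree programming interface) can be illustrated with a small software model. This is only a sketch under stated assumptions, not PARD's hardware design or its actual interface: the names `Packet`, `ControlPlane`, `write_rule`, the `priority` parameter, and the `/<plane>/<tag>/<param>` path layout are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Packet:
    """A hardware request carrying a high-level semantic tag (hypothetical model)."""
    tag: int        # e.g. a virtual machine or thread ID
    kind: str       # "mem", "io", or "irq"
    addr: int = 0


@dataclass
class ControlPlane:
    """Hypothetical per-resource rule table keyed by tag."""
    name: str
    default: dict = field(default_factory=lambda: {"priority": 0})
    rules: dict = field(default_factory=dict)

    def set_rule(self, tag, **params):
        # Create or update the parameters stored for this tag.
        self.rules.setdefault(tag, {}).update(params)

    def lookup(self, pkt):
        # Tag-specific parameters override the defaults.
        return {**self.default, **self.rules.get(pkt.tag, {})}


def write_rule(planes, path, value):
    """Apply a rule through a file-like path of the assumed form /<plane>/<tag>/<param>."""
    plane, tag, param = path.strip("/").split("/")
    planes[plane].set_rule(int(tag), **{param: value})


planes = {"memory": ControlPlane("memory"), "cache": ControlPlane("cache")}

# Mark tag 1 (say, a latency-critical VM) as high priority at the memory controller.
write_rule(planes, "/memory/1/priority", 1)

requests = [
    Packet(tag=2, kind="mem", addr=0x1000),
    Packet(tag=1, kind="mem", addr=0x2000),
]

# The memory control plane serves higher-priority tags first.
ordered = sorted(requests, key=lambda p: -planes["memory"].lookup(p)["priority"])
print([p.tag for p in ordered])
```

In this model, a write to one path changes how a single resource treats a single tag, while untagged or unconfigured traffic falls back to the defaults. That mirrors the differentiated-service idea at a very coarse level; real control planes would act on hardware packets rather than Python objects.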
Supporting Differentiated Services in Computers via Programmable Architecture for Resourcing-on-Demand (PARD)
ASPLOS '15: Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems