research-article

AI Tax: The Hidden Cost of AI Data Center Applications

Published: 26 March 2021

Abstract

Artificial intelligence and machine learning are experiencing widespread adoption in industry and academia. This has been driven by rapid advances in the applications and accuracy of AI through increasingly complex algorithms and models; this, in turn, has spurred research into specialized hardware AI accelerators. Given the rapid pace of advances, it is easy to forget that they are often developed and evaluated in a vacuum without considering the full application environment. This article emphasizes the need for a holistic, end-to-end analysis of artificial intelligence (AI) workloads and reveals the “AI tax.” We deploy and characterize Face Recognition in an edge data center. The application is an AI-centric edge video analytics application built using popular open source infrastructure and machine learning (ML) tools. Despite using state-of-the-art AI and ML algorithms, the application relies heavily on pre- and post-processing code. As AI-centric applications benefit from hardware acceleration, we find that they impose new stresses on the hardware and software infrastructure: storage and network bandwidth become major bottlenecks with increasing AI acceleration. By specializing for AI applications, we show that a purpose-built edge data center can be designed for the stresses of accelerated AI at 15% lower TCO than one derived from homogeneous servers and infrastructure.
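The end-to-end framing above can be illustrated with a toy sketch (the stage names and logic below are hypothetical stand-ins, not the paper's implementation): timing pre-processing, model inference, and post-processing separately exposes how much wall-clock time falls outside the model itself, which is one way to quantify the "AI tax."

```python
import time

def preprocess(frame):
    # Stand-in for decode/resize/normalize: scale pixel values to [0, 1].
    return [p / 255.0 for p in frame]

def infer(tensor):
    # Stub for the accelerated model (e.g., a face-embedding network).
    return sum(tensor) / len(tensor)

def postprocess(score):
    # Stand-in for embedding matching and result serialization.
    return {"match": score > 0.5}

def run_pipeline(frame):
    """Run one frame end to end, timing each stage separately."""
    timings = {}

    t0 = time.perf_counter()
    tensor = preprocess(frame)
    timings["pre"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    score = infer(tensor)
    timings["infer"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    result = postprocess(score)
    timings["post"] = time.perf_counter() - t0

    total = sum(timings.values())
    # The "AI tax": share of wall-clock time spent outside the model itself.
    timings["ai_tax"] = (timings["pre"] + timings["post"]) / total
    return result, timings

result, timings = run_pipeline([128] * 10_000)
```

Note that as `infer` gets faster (e.g., via an accelerator), the `ai_tax` fraction grows, which mirrors the paper's observation that acceleration shifts bottlenecks into the surrounding infrastructure.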



Published in ACM Transactions on Computer Systems, Volume 37, Issue 1-4 (November 2019), 177 pages.

ISSN: 0734-2071
EISSN: 1557-7333
DOI: 10.1145/3446674

Copyright © 2021 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Publication History
• Received: 1 July 2020
• Accepted: 1 November 2020
• Published: 26 March 2021

Qualifiers: refereed research-article