Abstract
Artificial intelligence (AI) and machine learning (ML) are seeing widespread adoption in industry and academia, driven by rapid advances in the accuracy and applicability of increasingly complex algorithms and models; this, in turn, has spurred research into specialized hardware AI accelerators. Amid this rapid progress, it is easy to forget that these accelerators are often developed and evaluated in a vacuum, without considering the full application environment. This article argues for a holistic, end-to-end analysis of AI workloads and reveals what we call the "AI tax." We deploy and characterize Face Recognition, an AI-centric edge video analytics application built on popular open source infrastructure and ML tools, in an edge data center. Despite using state-of-the-art AI and ML algorithms, the application relies heavily on pre- and post-processing code. As AI-centric applications reap the speedups promised by accelerators, they impose new stresses on the hardware and software infrastructure: storage and network bandwidth become major bottlenecks as AI acceleration increases. By specializing for AI applications, we show that a purpose-built edge data center can be designed for the stresses of accelerated AI at 15% lower total cost of ownership (TCO) than one derived from homogeneous servers and infrastructure.
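The abstract's bottleneck argument can be illustrated with a minimal Amdahl's-law-style sketch. The numbers below are illustrative assumptions, not measurements from the paper: if only the inference stage is accelerated, fixed pre/post-processing and I/O costs quickly dominate end-to-end latency.

```python
def end_to_end_ms(inference_ms: float, overhead_ms: float, speedup: float) -> float:
    """End-to-end latency when only the inference stage is accelerated.

    overhead_ms models the unaccelerated "AI tax": pre/post-processing,
    storage, and network time that does not shrink with AI acceleration.
    """
    return inference_ms / speedup + overhead_ms

# Hypothetical stage costs: 80 ms of inference, 20 ms of fixed overhead.
baseline = end_to_end_ms(inference_ms=80.0, overhead_ms=20.0, speedup=1)
accelerated = end_to_end_ms(inference_ms=80.0, overhead_ms=20.0, speedup=16)

# A 16x inference accelerator yields only a 4x end-to-end improvement.
print(baseline / accelerated)  # 100 ms / 25 ms = 4.0
```

Under these assumed costs, pushing the accelerator speedup toward infinity can never beat the 20 ms overhead floor, which is why the paper's end-to-end view surfaces storage and network as the next bottlenecks.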
AI Tax: The Hidden Cost of AI Data Center Applications