ABSTRACT
Improving the performance of deep neural networks (DNNs) is important to both the compiler and neural architecture search (NAS) communities. Compilers apply program transformations in order to exploit hardware parallelism and memory hierarchy. However, legality concerns mean they fail to exploit the natural robustness of neural networks. In contrast, NAS techniques mutate networks by operations such as the grouping or bottlenecking of convolutions, exploiting the resilience of DNNs. In this work, we express such neural architecture operations as program transformations whose legality depends on a notion of representational capacity. This allows them to be combined with existing transformations into a unified optimization framework. This unification allows us to express existing NAS operations as combinations of simpler transformations. Crucially, it allows us to generate and explore new tensor convolutions. We prototyped the combined framework in TVM and were able to find optimizations across different DNNs that significantly reduce inference time, by over 3× in the majority of cases. Furthermore, our scheme dramatically reduces NAS search time.
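To make the central claim concrete, the sketch below (a minimal NumPy illustration under our own assumptions, not the paper's TVM prototype) shows how the NAS operation of grouping a convolution can be read as a program transformation: the grouped variant is the same loop nest as the standard convolution, with the channel-reduction domain split and restricted.

```python
# Minimal NumPy sketch (an illustrative assumption, not the paper's TVM code):
# the NAS "grouping" operation viewed as a restriction of the channel-reduction
# loop of a standard convolution, i.e. a rewrite of the loop nest itself.
import numpy as np

def conv2d(x, w):
    """Standard convolution: every output channel reduces over all input channels.
    x: (C_in, H, W), w: (C_out, C_in, K, K); no padding, stride 1."""
    c_out, c_in, k, _ = w.shape
    h, wd = x.shape[1] - k + 1, x.shape[2] - k + 1
    y = np.zeros((c_out, h, wd))
    for o in range(c_out):
        for c in range(c_in):              # full reduction over input channels
            for i in range(h):
                for j in range(wd):
                    y[o, i, j] += np.sum(x[c, i:i+k, j:j+k] * w[o, c])
    return y

def grouped_conv2d(x, w, groups):
    """Grouped convolution: the same loop nest, but the channel-reduction domain
    is split so each output channel reduces only over its own group's channels.
    w: (C_out, C_in // groups, K, K) -- the weight tensor shrinks by `groups`."""
    c_out, c_in_per_g, k, _ = w.shape
    h, wd = x.shape[1] - k + 1, x.shape[2] - k + 1
    y = np.zeros((c_out, h, wd))
    out_per_g = c_out // groups
    for o in range(c_out):
        g = o // out_per_g                 # group owning this output channel
        for c in range(c_in_per_g):        # restricted reduction domain
            for i in range(h):
                for j in range(wd):
                    y[o, i, j] += np.sum(x[g * c_in_per_g + c, i:i+k, j:j+k] * w[o, c])
    return y
```

Unlike a conventional loop transformation, this rewrite is not bitwise-equivalent to the original; its legality rests instead on whether the restricted reduction preserves sufficient representational capacity, which is the criterion the framework uses to admit such operations alongside standard compiler transformations.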