
Learning Reliable Neural Networks with Distributed Architecture Representations


Abstract

Neural architecture search (NAS) has shown strong performance in learning neural models automatically in recent years. However, most NAS systems are unreliable due to the architecture gap introduced by discrete representations of atomic architectures. In this article, we improve the performance and robustness of NAS by narrowing the gap between architecture representations. More specifically, we apply a general contraction mapping to model neural networks with distributed representations (Neural Architecture Search with Distributed Architecture Representations, or ArchDAR). Moreover, to obtain better search results, we present a joint learning approach that integrates distributed representations with advanced architecture search methods. We implement ArchDAR in a differentiable architecture search model and test the learned architectures on the language modeling task. On the Penn Treebank data, it significantly outperforms a strong baseline by 1.8 perplexity points. The search process with distributed representations is also more stable, yielding faster structural convergence when combined with the differentiable architecture search model.
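The abstract describes the idea only at a high level, and the exact form of the contraction mapping is not given here. The snippet below is a minimal, hypothetical sketch (not the authors' implementation) of how a DARTS-style mixed edge might replace its discrete per-operation logits with a distributed architecture embedding passed through a contraction mapping; the MixedEdge class, the embedding size, and the 0.9 Lipschitz scaling are all illustrative assumptions.

```python
# Hypothetical sketch, not the authors' code: a DARTS-style mixed edge whose
# discrete per-operation logits are replaced by a distributed architecture
# embedding mapped to operation weights through a contraction mapping.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedEdge(nn.Module):
    """One edge of a DARTS cell: a weighted sum over candidate operations."""

    def __init__(self, ops, embed_dim=16):
        super().__init__()
        self.ops = nn.ModuleList(ops)
        # Distributed representation of the architecture choice on this edge
        # (assumption: stands in for the paper's distributed representations).
        self.arch_embed = nn.Parameter(0.1 * torch.randn(embed_dim))
        # Spectral normalization bounds the linear map's Lipschitz constant by 1;
        # scaling by 0.9 in forward() makes the overall map a contraction
        # (assumption: a stand-in for the paper's "general contraction mapping").
        self.to_scores = nn.utils.parametrizations.spectral_norm(
            nn.Linear(embed_dim, len(ops), bias=False)
        )

    def forward(self, x):
        scores = 0.9 * self.to_scores(self.arch_embed)  # Lipschitz constant 0.9 < 1
        weights = F.softmax(scores, dim=-1)             # continuous relaxation, as in DARTS
        return sum(w * op(x) for w, op in zip(weights, self.ops))

# Toy usage: one edge choosing among three candidate activations. Under joint
# learning, arch_embed and to_scores would be updated together with the
# network weights, as in differentiable architecture search.
ops = [nn.Identity(), nn.Tanh(), nn.ReLU()]
edge = MixedEdge(ops)
y = edge(torch.randn(4, 8))
```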



Published in

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, Issue 4
April 2023, 682 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3588902


Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 7 June 2022
• Revised: 25 October 2022
• Accepted: 25 December 2022
• Online AM: 4 January 2023
• Published: 25 March 2023
