Learning Reliable Neural Networks with Distributed Architecture Representations

Abstract
Neural architecture search (NAS) has shown strong performance in learning neural models automatically in recent years. However, most NAS systems are unreliable due to the architecture gap introduced by discrete representations of atomic architectures. In this article, we improve the performance and robustness of NAS by narrowing the gap between architecture representations. More specifically, we apply a general contraction mapping to model neural networks with distributed representations, an approach we call Neural Architecture Search with Distributed Architecture Representations (ArchDAR). Moreover, to obtain better search results, we present a joint learning approach that integrates distributed representations with advanced architecture search methods. We implement ArchDAR in a differentiable architecture search model and evaluate the learned architectures on the language modeling task. On the Penn Treebank data, it outperforms a strong baseline by a significant margin of 1.8 perplexity points. The search process with distributed representations is also more stable, yielding faster structural convergence when combined with the differentiable architecture search model.