DOI: 10.5555/3524938.3525996
Research article

Learning to learn kernels with variational random features

Published: 13 July 2020

ABSTRACT

We introduce kernels with random Fourier features in the meta-learning framework for few-shot learning. We propose meta variational random features (MetaVRF) to learn adaptive kernels for the base-learner, which is developed in a latent variable model by treating the random feature basis as the latent variable. We formulate the optimization of MetaVRF as a variational inference problem by deriving an evidence lower bound under the meta-learning framework. To incorporate shared knowledge from related tasks, we propose a context inference of the posterior, which is established by an LSTM architecture. The LSTM-based inference network effectively integrates the context information of previous tasks with task-specific information, generating informative and adaptive features. The learned MetaVRF is able to produce kernels of high representational power with a relatively low spectral sampling rate and also enables fast adaptation to new tasks. Experimental results on a variety of few-shot regression and classification tasks demonstrate that MetaVRF can deliver much better, or at least competitive, performance compared to existing meta-learning alternatives.
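The random Fourier feature construction that the abstract builds on (Rahimi & Recht, 2007) can be sketched in a few lines. Note this is only the classical building block, not the paper's method: MetaVRF treats the spectral samples as latent variables inferred per task by an LSTM-based network, whereas this minimal NumPy illustration draws them once from a fixed Gaussian prior; all names below are illustrative.

```python
import numpy as np

def rff_features(X, W, b):
    # Map inputs to random Fourier features; the inner product of two
    # feature vectors approximates a Gaussian (RBF) kernel evaluation.
    D = W.shape[0]
    return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

rng = np.random.default_rng(0)
d, D, sigma = 5, 2000, 1.0                      # input dim, number of spectral samples, bandwidth
W = rng.normal(scale=1.0 / sigma, size=(D, d))  # spectral (frequency) samples from the Gaussian spectral density
b = rng.uniform(0.0, 2.0 * np.pi, size=D)       # random phase offsets

x = rng.normal(size=(1, d))
y = rng.normal(size=(1, d))
approx = (rff_features(x, W, b) @ rff_features(y, W, b).T).item()
exact = float(np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2)))
```

With enough spectral samples `approx` converges to `exact`; the paper's point is that adapting the spectral distribution per task lets a comparatively small number of samples (a low spectral sampling rate) still yield expressive kernels.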



Published in

ICML'20: Proceedings of the 37th International Conference on Machine Learning
July 2020, 11702 pages
Copyright © 2020
Publisher: JMLR.org