DOI: 10.1145/1553374.1553516

Feature hashing for large scale multitask learning

ABSTRACT

Empirical evidence suggests that hashing is an effective strategy for dimensionality reduction and practical nonparametric estimation. In this paper we provide exponential tail bounds for feature hashing and show that the interaction between random subspaces is negligible with high probability. We demonstrate the feasibility of this approach with experimental results for a new use case: multitask learning with hundreds of thousands of tasks.

