DOI: 10.1145/3097983.3098075
research-article
Open access

Developing a Comprehensive Framework for Multimodal Feature Extraction

Published: 13 August 2017

Abstract

Feature extraction is a critical component of many applied data science workflows. In recent years, rapid advances in artificial intelligence and machine learning have led to an explosion of feature extraction tools and services that allow data scientists to cheaply and effectively annotate their data along a vast array of dimensions, ranging from detecting faces in images to analyzing the sentiment expressed in coherent text. Unfortunately, the proliferation of powerful feature extraction services has been mirrored by a corresponding expansion in the number of distinct interfaces to those services. In a world where nearly every new service has its own API, documentation, and/or client library, data scientists who need to combine diverse features obtained from multiple sources are often forced to write and maintain ever more elaborate feature extraction pipelines. To address this challenge, we introduce pliers, an open-source Python package for comprehensive multimodal feature extraction. Pliers supports standardized annotation of diverse data types (videos, images, audio, and text) and is expressly implemented with both ease of use and extensibility in mind. Users can apply a wide range of pre-existing feature extraction tools to their data in just a few lines of Python code, and can also easily add their own custom extractors by writing modular classes. A graph-based API enables rapid development of feature extraction pipelines that output results in a single, standardized format. We describe the package's architecture, detail its advantages over previous feature extraction toolboxes, and use a sample application to a large functional MRI dataset to illustrate how pliers can significantly reduce the time and effort required to construct sophisticated feature extraction workflows while increasing code clarity and maintainability.
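
The design idea in the abstract (modular extractor classes behind one interface, with results merged into a single standardized format) can be illustrated with a minimal sketch. This is not the pliers API: every name below (`Result`, `Extractor`, `LengthExtractor`, `VowelRatioExtractor`, `run_all`) is hypothetical, the toy extractors handle text only, and the graph-based API is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Result:
    """One row of the standardized output (hypothetical schema)."""
    stimulus: str
    extractor: str
    feature: str
    value: float


class Extractor:
    """Hypothetical base class: subclasses only implement _extract."""

    def _extract(self, text: str) -> Dict[str, float]:
        raise NotImplementedError

    def transform(self, name: str, text: str) -> List[Result]:
        # Wrap whatever the subclass returns in the shared row format.
        features = self._extract(text)
        return [
            Result(name, type(self).__name__, feature, float(value))
            for feature, value in features.items()
        ]


class LengthExtractor(Extractor):
    """Toy extractor: character and word counts."""

    def _extract(self, text: str) -> Dict[str, float]:
        return {"n_chars": len(text), "n_words": len(text.split())}


class VowelRatioExtractor(Extractor):
    """Toy extractor: share of letters that are vowels."""

    def _extract(self, text: str) -> Dict[str, float]:
        letters = [c for c in text.lower() if c.isalpha()]
        vowels = sum(c in "aeiou" for c in letters)
        return {"vowel_ratio": vowels / len(letters) if letters else 0.0}


def run_all(extractors: List[Extractor], stimuli: Dict[str, str]) -> List[Result]:
    """Apply every extractor to every stimulus and merge the rows."""
    rows: List[Result] = []
    for name, text in stimuli.items():
        for extractor in extractors:
            rows.extend(extractor.transform(name, text))
    return rows


# Usage: two unrelated extractors, one uniform list of rows.
stimuli = {"greeting": "Hello world"}
for row in run_all([LengthExtractor(), VowelRatioExtractor()], stimuli):
    print(row)
```

Because each extractor returns rows with the same fields, adding a new one in this sketch means writing one small class. Nothing downstream has to change.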

Supplementary Material

MP4 File (mcnamara_comprehensive_framework.mp4)

Published In

KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2017
2240 pages
ISBN: 9781450348874
DOI: 10.1145/3097983
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. feature extraction
  2. multimodal retrieval
  3. python
  4. standardization
  5. wrappers

Qualifiers

  • Research-article

Funding Sources

  • National Institute of Mental Health

Conference

KDD '17

Acceptance Rates

KDD '17 paper acceptance rate: 64 of 748 submissions (9%)
Overall acceptance rate: 1,133 of 8,635 submissions (13%)

Article Metrics

  • Downloads (last 12 months): 462
  • Downloads (last 6 weeks): 65
Reflects downloads up to 09 Jan 2025.

Cited By

  • (2024) Communicating Europe: a computational analysis of the evolution of the European Commission's communication on Twitter. Journal of Computational Social Science. DOI: 10.1007/s42001-024-00271-w. Online publication date: 17-Apr-2024
  • (2024) Semantic Properties of Word Prompts Shape Design Outcomes: Understanding the Influence of Semantic Richness and Similarity. Design Computing and Cognition'24, pp. 241-258. DOI: 10.1007/978-3-031-71922-6_16. Online publication date: 28-Sep-2024
  • (2023) Dynamic interactions between anterior insula and anterior cingulate cortex link perceptual features and heart rate variability during movie viewing. Network Neuroscience 7:2, pp. 557-577. DOI: 10.1162/netn_a_00295. Online publication date: 30-Jun-2023
  • (2023) LSAS: Lightweight Sub-attention Strategy for Alleviating Attention Bias Problem. 2023 IEEE International Conference on Multimedia and Expo (ICME), pp. 2051-2056. DOI: 10.1109/ICME55011.2023.00351. Online publication date: Jul-2023
  • (2023) A Lightweight Multimodal Learning Model to Recognize User Sentiment in Mobile Devices. 2023 IEEE International Conference on Consumer Electronics (ICCE), pp. 1-6. DOI: 10.1109/ICCE56470.2023.10043524. Online publication date: 6-Jan-2023
  • (2023) Neural unscrambling of temporal information during a nonlinear narrative. Cerebral Cortex 33:11, pp. 7001-7014. DOI: 10.1093/cercor/bhad015. Online publication date: 8-Feb-2023
  • (2023) Moment-by-moment tracking of audience brain responses to an engaging public speech: Replicating the reverse-message engineering approach. Communication Monographs 91:1, pp. 31-55. DOI: 10.1080/03637751.2023.2240398. Online publication date: 10-Aug-2023
  • (2023) Large-scale encoding of emotion concepts becomes increasingly similar between individuals from childhood to adolescence. Nature Neuroscience 26:7, pp. 1256-1266. DOI: 10.1038/s41593-023-01358-9. Online publication date: 8-Jun-2023
  • (2022) Neuroscout, a unified platform for generalizable and reproducible fMRI research. eLife 11. DOI: 10.7554/eLife.79277. Online publication date: 30-Aug-2022
  • (2022) Seeing Social: A Neural Signature for Conscious Perception of Social Interactions. The Journal of Neuroscience 42:49, pp. 9211-9226. DOI: 10.1523/JNEUROSCI.0859-22.2022. Online publication date: 24-Oct-2022
