DOI: 10.1145/3316482.3326344

Automating the generation of hardware component knowledge bases

Published: 23 June 2019

ABSTRACT

Hardware component databases are critical resources in designing embedded systems. Since generating these databases requires hundreds of thousands of hours of manual data entry, they are proprietary, limited in the data they provide, and have many random data entry errors.

We present a machine-learning-based approach for automating the generation of component databases directly from datasheets. Extracting data directly from datasheets is challenging because (1) the data is relational in nature and relies on non-local context, (2) the documents are filled with technical jargon, and (3) the datasheets are PDFs, a format that decouples visual locality from locality in the document. The proposed approach uses a rich data model and weak supervision to address these challenges.
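As a rough illustration of what weak supervision can look like in this setting, the sketch below labels candidate (part, value) pairs with a few heuristic labeling functions and combines their votes. Everything in it is an assumption made for illustration: the candidate tuple layout, the three heuristics, and the simple majority vote are hypothetical and are not taken from the paper, whose actual method may combine noisy labels differently.

```python
from typing import Callable, List, Tuple

# Label values returned by the hypothetical labeling functions below.
POSITIVE, NEGATIVE, ABSTAIN = 1, -1, 0

# Hypothetical candidate layout: (part number, extracted value, text of the table row).
Candidate = Tuple[str, str, str]
LabelingFunction = Callable[[Candidate], int]


def lf_row_mentions_collector_current(c: Candidate) -> int:
    """Vote POSITIVE when the row text names the target property."""
    _, _, row_text = c
    return POSITIVE if "collector current" in row_text.lower() else ABSTAIN


def lf_row_mentions_temperature(c: Candidate) -> int:
    """Vote NEGATIVE when the row describes a temperature rating instead."""
    _, _, row_text = c
    return NEGATIVE if "temperature" in row_text.lower() else ABSTAIN


def lf_value_is_not_numeric(c: Candidate) -> int:
    """Vote NEGATIVE when the extracted value cannot be parsed as a number."""
    _, value, _ = c
    try:
        float(value)
    except ValueError:
        return NEGATIVE
    return ABSTAIN


def majority_vote(c: Candidate, lfs: List[LabelingFunction]) -> int:
    """Combine noisy votes; a simplification standing in for a learned label model."""
    total = sum(lf(c) for lf in lfs)
    if total > 0:
        return POSITIVE
    if total < 0:
        return NEGATIVE
    return ABSTAIN


LFS = [
    lf_row_mentions_collector_current,
    lf_row_mentions_temperature,
    lf_value_is_not_numeric,
]

if __name__ == "__main__":
    example = ("BC547", "100", "Collector current IC 100 mA")
    print(majority_vote(example, LFS))
```

The appeal of this style is that each heuristic is cheap to write and may be wrong on its own; the combined labels are then used as training data in place of hand labeling.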

We evaluate the approach on datasheets of three classes of hardware components and achieve an average quality of 75 F1 points, which is comparable to existing human-curated knowledge bases. We perform two application studies that demonstrate the extraction of multiple data modalities such as numerical properties and images. We show how different sources of supervision, such as heuristics and human labels, have distinct advantages that can be utilized together within a single methodology to automatically generate hardware component knowledge bases.
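For readers unfamiliar with the metric, the sketch below shows how an F1 score in points could be computed by comparing extracted tuples against a gold set. The function name and the example tuples are hypothetical and do not come from the paper's evaluation; only the standard definition of F1 as the harmonic mean of precision and recall is assumed.

```python
from typing import Set, Tuple

# Hypothetical entry layout: (part number, property value).
Entry = Tuple[str, str]


def f1_points(extracted: Set[Entry], gold: Set[Entry]) -> float:
    """Return F1 on a 0-100 scale for extracted entries versus gold entries."""
    true_positives = len(extracted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(extracted)  # fraction of extractions that are correct
    recall = true_positives / len(gold)          # fraction of gold entries that were found
    return 100 * 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    extracted = {("BC547", "100"), ("BC548", "100"), ("BC549", "100"), ("BC550", "200")}
    gold = {("BC547", "100"), ("BC548", "100"), ("BC549", "100"), ("BC550", "100")}
    # 3 of 4 extractions are correct and 3 of 4 gold entries are found.
    print(f1_points(extracted, gold))
```

In this made-up example precision and recall are both 0.75, so the score is 75 F1 points.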

Published in

LCTES 2019: Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems
June 2019
218 pages
ISBN: 9781450367240
DOI: 10.1145/3316482

        Copyright © 2019 ACM


        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Qualifiers

        • research-article

        Acceptance Rates

        Overall Acceptance Rate: 116 of 438 submissions, 26%


      This article is provided by ACM and the author Luke Hsiao through the ACM Author-Izer service.