research-article

SEVA: Sensor-enhanced video annotation

Published: 14 August 2009

Abstract

In this article, we study how a sensor-rich world can be exploited by digital recording devices such as cameras and camcorders to improve a user's ability to search through a large repository of image and video files. We design and implement a digital recording system that records identities and locations of objects (as advertised by their sensors) along with visual images (as recorded by a camera). The process, which we refer to as Sensor-Enhanced Video Annotation (SEVA), combines a series of correlation, interpolation, and extrapolation techniques. It produces a tagged stream that later can be used to efficiently search for videos or frames containing particular objects or people. We present detailed experiments with a prototype of our system using both stationary and mobile objects as well as GPS and ultrasound. Our experiments show that: (i) SEVA has zero error rates for static objects, except very close to the boundary of the viewable area; (ii) for moving objects or a moving camera, SEVA only misses objects leaving or entering the viewable area by 1–2 frames; (iii) SEVA can scale to 10 fast-moving objects using current sensor technology; and (iv) SEVA runs online using relatively inexpensive hardware.
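
The abstract does not spell out how sensor readings are matched to frames, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a flat 2D world, a fixed camera, linear interpolation between timestamped location readings, and a viewable area modeled as a wedge with a maximum range. All names (`interpolate_position`, `in_view`, `tag_frames`) and parameters are hypothetical.

```python
import math
from bisect import bisect_right


def interpolate_position(readings, t):
    """Linearly interpolate an (x, y) position at time t.

    `readings` is a list of (time, x, y) tuples sorted by time.
    Outside the sampled range the nearest reading is held; this sketch
    does not extrapolate.
    """
    times = [r[0] for r in readings]
    i = bisect_right(times, t)
    if i == 0:
        return readings[0][1:]
    if i == len(readings):
        return readings[-1][1:]
    t0, x0, y0 = readings[i - 1]
    t1, x1, y1 = readings[i]
    w = (t - t0) / (t1 - t0)
    return (x0 + w * (x1 - x0), y0 + w * (y1 - y0))


def in_view(camera_xy, heading_deg, fov_deg, max_range, obj_xy):
    """Return True if obj_xy lies inside the camera's assumed viewing wedge."""
    dx = obj_xy[0] - camera_xy[0]
    dy = obj_xy[1] - camera_xy[1]
    dist = math.hypot(dx, dy)
    if dist > max_range:
        return False
    if dist == 0:
        return True
    bearing = math.degrees(math.atan2(dy, dx))
    # Signed angular difference wrapped into [-180, 180), then made absolute.
    off_axis = abs((bearing - heading_deg + 180) % 360 - 180)
    return off_axis <= fov_deg / 2


def tag_frames(frame_times, camera, readings_by_object):
    """Map each frame timestamp to the sorted names of objects judged visible."""
    tags = {}
    for t in frame_times:
        tags[t] = sorted(
            name
            for name, readings in readings_by_object.items()
            if in_view(
                camera["xy"],
                camera["heading_deg"],
                camera["fov_deg"],
                camera["max_range"],
                interpolate_position(readings, t),
            )
        )
    return tags


# Hypothetical example: an object crossing in front of a camera that looks along +x.
camera = {"xy": (0.0, 0.0), "heading_deg": 0.0, "fov_deg": 60.0, "max_range": 20.0}
mug_readings = [(0.0, 10.0, -10.0), (2.0, 10.0, 10.0)]
frame_tags = tag_frames([0.0, 1.0, 2.0], camera, {"mug": mug_readings})
```

The resulting per-frame tag lists are the kind of tagged stream that could later be searched by object name. A real system would also have to handle sensor error, a moving camera, and extrapolation past the last reading, which this sketch leaves out.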

    • Published in

      ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 5, Issue 3
      August 2009
      204 pages
      ISSN: 1551-6857
      EISSN: 1551-6865
      DOI: 10.1145/1556134

      Copyright © 2009 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 14 August 2009
      • Accepted: 1 May 2008
      • Revised: 1 December 2007
      • Received: 1 September 2006

      Qualifiers

      • research-article
      • Research
      • Refereed
