ABSTRACT
On a tablet computer, sketching is a natural way for users to annotate video scenes. However, when these annotations are made in real time and overlaid on the video, their context can be lost as the annotated scene changes. We propose an approach to maintaining the annotations' context that uses object tracking to create anchors to which further annotations can be attached. To this end, the annotator can use different tracking methods, including a Kinect sensor and/or the TLD object tracking algorithm.
We also discuss the challenges involved in designing an interface that supports associating video annotations with tracked objects in real time. In particular, we discuss two alternative approaches to selecting moving objects in live video, which we call "Hold and Overlay" and "Hold and Speed Up". In addition, we report the results of a set of preliminary tests.
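To illustrate the idea of anchoring, the following minimal sketch shows one way a sketched stroke could be attached to a tracked object's bounding box so that it follows the object between frames. It is an assumption-laden illustration, not the authors' implementation: the names `Box`, `AnchoredStroke`, `attach`, and `project` are hypothetical, and the tracker (Kinect, TLD, or any other) is assumed to supply one axis-aligned bounding box per frame.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Box:
    """Axis-aligned bounding box reported by a tracker (hypothetical type)."""
    x: float  # left edge, in frame pixels
    y: float  # top edge, in frame pixels
    w: float  # width, must be > 0
    h: float  # height, must be > 0


@dataclass
class AnchoredStroke:
    """A stroke stored in coordinates normalized to its anchor box."""
    rel_points: List[Point]

    @classmethod
    def attach(cls, stroke: List[Point], anchor: Box) -> "AnchoredStroke":
        # Convert frame coordinates to box-relative coordinates, where
        # (0, 0) is the box's top-left corner and (1, 1) its bottom-right.
        return cls([((px - anchor.x) / anchor.w, (py - anchor.y) / anchor.h)
                    for px, py in stroke])

    def project(self, anchor: Box) -> List[Point]:
        # Map the stored relative points back into frame coordinates
        # using the anchor box from the current frame.
        return [(anchor.x + rx * anchor.w, anchor.y + ry * anchor.h)
                for rx, ry in self.rel_points]


# Illustrative values: the object moves and doubles in size between frames.
box_at_sketch_time = Box(100, 50, 40, 80)
box_in_later_frame = Box(200, 60, 80, 160)
stroke = [(100, 50), (120, 90), (140, 130)]

anchored = AnchoredStroke.attach(stroke, box_at_sketch_time)
print(anchored.project(box_in_later_frame))
# → [(200.0, 60.0), (240.0, 140.0), (280.0, 220.0)]
```

Normalizing to the box size is a design choice made here so that the stroke scales with the object; a simpler variant would store only the offset from the box origin and ignore changes in size.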
Real-time annotation of video objects on tablet computers


