
ABSTRACT

Annotated collections of images and videos are a necessary basis for the successful development of multimedia retrieval systems. The underlying models of such systems rely heavily on the quality and availability of large training collections. The annotation of large collections, however, is a time-consuming and error-prone task, as it has to be performed by human annotators. In this paper we present the IBM Efficient Video Annotation (EVA) system, a server-based tool for semantic concept annotation of large video and image collections. It is optimised for collaborative annotation and includes features such as workload sharing and support for conducting inter-annotator analysis. We discuss initial results of an ongoing user evaluation of this system. The results are based on data collected during the 2005 TRECVID Annotation Forum, where more than 100 annotators used the system.

AUTHORS



Timo Volkmer

Bibliometrics: publication history
Publication years: 2004-2007
Publication count: 5
Citation Count: 55
Available for download: 3
Downloads (6 Weeks): 1
Downloads (12 Months): 30
Downloads (cumulative): 1,921
Average downloads per article: 640.33
Average citations per article: 11.00


John R. Smith

Bibliometrics: publication history
Publication years: 1994-2015
Publication count: 98
Citation Count: 1,889
Available for download: 37
Downloads (6 Weeks): 68
Downloads (12 Months): 995
Downloads (cumulative): 25,315
Average downloads per article: 684.19
Average citations per article: 19.28


Apostol (Paul) Natsev

Bibliometrics: publication history
Publication years: 1999-2013
Publication count: 33
Citation Count: 517
Available for download: 20
Downloads (6 Weeks): 24
Downloads (12 Months): 271
Downloads (cumulative): 12,468
Average downloads per article: 623.40
Average citations per article: 15.67

CITED BY

38 Citations

PUBLICATION

Title: MULTIMEDIA '05 Proceedings of the 13th annual ACM international conference on Multimedia
General Chairs: Hongjiang Zhang, Microsoft Research Asia, China
Tat-Seng Chua, National University of Singapore, Singapore
Program Chairs: Ralf Steinmetz, Technische Universität Darmstadt, Germany
Mohan Kankanhalli, National University of Singapore, Singapore
Lynn Wilcox, FXPAL
Pages: 892-901
Publication Date: 2005-11-06
Sponsors: SIGGRAPH, ACM Special Interest Group on Computer Graphics and Interactive Techniques
SIGMULTIMEDIA, ACM Special Interest Group on Multimedia
ACM, Association for Computing Machinery
Publisher: ACM, New York, NY, USA ©2005
ISBN: 1-59593-044-2 Order Number: 433051 doi>10.1145/1101149.1101341
Conference: MM, International Multimedia Conference
Paper Acceptance Rate: 49 of 312 submissions, 16%
Overall Acceptance Rate: 1,375 of 5,525 submissions, 25%
Year            Submitted  Accepted  Rate
MULTIMEDIA '97  142        40        28%
MULTIMEDIA '02  330        46        14%
MULTIMEDIA '03  255        43        17%
MULTIMEDIA '04  331        55        17%
MULTIMEDIA '05  312        49        16%
MULTIMEDIA '06  292        48        16%
MULTIMEDIA '07  298        57        19%
MM '08          516        136       26%
MM '09          305        50        16%
MM '10          974        396       41%
MM '11          666        230       35%
MM '12          331        67        20%
MM '13          235        47        20%
MM '14          286        55        19%
MM '15          252        56        22%
Overall         5,525      1,375     25%

APPEARS IN
Digital Content
Applications
Networking
Interaction
Software

Table of Contents

Proceedings of the 13th annual ACM international conference on Multimedia
Future of home media
Kazumasa Enami
Pages: 1-1
doi>10.1145/1101149.1101150

In the future, the television set at home will be a key device for providing integrated information services through broadcasting, communication and storage media. In this environment, users will be able to receive any type of information, e.g., HDTV ...
SESSION: Content 1: news video processing
Tracking news stories across different sources
Yun Zhai, Mubarak Shah
Pages: 2-10
doi>10.1145/1101149.1101152

Information linkage is becoming more and more important in this digital age. In this paper, we propose a concept tracking method, which links news stories on the same topic across multiple sources. The semantic linkage between the news stories is reflected ...
Topic transition detection using hierarchical hidden Markov and semi-Markov models
Dinh Q. Phung, T. V. Duong, S. Venkatesh, Hung H. Bui
Pages: 11-20
doi>10.1145/1101149.1101153

In this paper we introduce a probabilistic framework to exploit hierarchy, structure sharing and duration information for topic transition detection in videos. Our probabilistic detection framework is a combination of a shot classification step and a ...
Joint visual-text modeling for automatic retrieval of multimedia documents
G. Iyengar, P. Duygulu, S. Feng, P. Ircing, S. P. Khudanpur, D. Klakow, M. R. Krause, R. Manmatha, H. J. Nock, D. Petkova, B. Pytlik, P. Virga
Pages: 21-30
doi>10.1145/1101149.1101154

In this paper we describe a novel approach for jointly modeling the text and the visual components of multimedia documents for the purpose of information retrieval (IR). We propose a novel framework where individual components are developed to model different ...
Multiple instance learning for labeling faces in broadcasting news video
Jun Yang, Rong Yan, Alexander G. Hauptmann
Pages: 31-40
doi>10.1145/1101149.1101155

Labeling faces in news video with their names is an interesting research problem which was previously solved using supervised methods that demand significant user efforts on labeling training data. In this paper, we investigate a more challenging setting ...
SESSION: Applications 1: media fusion for communication and presentation
Complementing your TV-viewing by web content automatically-transformed into TV-program-type content
Akiyo Nadamoto, Katsumi Tanaka
Pages: 41-50
doi>10.1145/1101149.1101157

Despite much talk about the fusion of broadcasting and the Internet, no technology has been established for fusing web and TV program content. In this paper, we propose ways to transform web content into TV-program-type content as a first step towards ...
Augmented segmentation and visualization for presentation videos
Alexander Haubold, John R. Kender
Pages: 51-60
doi>10.1145/1101149.1101158

We investigate methods of segmenting, visualizing, and indexing presentation videos by both audio and visual data. The audio track is segmented by speaker, and augmented with key phrases which are extracted using an Automatic Speech Recognizer (ASR). ...
Exploring media correlation and synchronization for navigated hypermedia documents
Kuo-Yu Liu, Herng-Yow Chen
Pages: 61-70
doi>10.1145/1101149.1101159

This paper explores media correlation and media synchronization in a composite multimedia document, the so-called navigated hypermedia document in our language learning system, to facilitate multimedia authoring, presentation, and access. ...
Designing a large-scale video chat application
Jeremiah Scholl, Peter Parnes, John D. McCarthy, Angela Sasse
Pages: 71-80
doi>10.1145/1101149.1101160

Studies of video conferencing systems generally focus on scenarios where users communicate using an audio channel. However, text chat serves users in a wide variety of contexts, and is commonly included in multimedia conferencing systems as a complement ...
SESSION: Brave new topics 1: multimedia challenges for planetary scale applications
IrisNet: an internet-scale architecture for multimedia sensors
Jason Campbell, Phillip B. Gibbons, Suman Nath, Padmanabhan Pillai, Srinivasan Seshan, Rahul Sukthankar
Pages: 81-88
doi>10.1145/1101149.1101162

Most current sensor network research explores the use of extremely simple sensors on small devices called motes and focuses on overcoming the resource constraints of these devices. In contrast, our research explores the challenges of multimedia sensors ...
The multimedia challenges raised by pervasive games
Mauricio Capra, Milena Radenkovic, Steve Benford, Leif Oppermann, Adam Drozd, Martin Flintham
Pages: 89-95
doi>10.1145/1101149.1101163

Pervasive gaming is a new form of multimedia entertainment that extends the traditional computer gaming experience out into the real world. Through a combination of personal devices, positioning systems and other multimedia sensors, combined with wireless ...
PLASMA: a PLAnetary scale monitoring architecture
Demet Aksoy
Pages: 96-102
doi>10.1145/1101149.1101164

While sensor networks continue to attract significant interest in various research communities, high impact applications still have a long list of challenges to be addressed. An individual sensor system can provide important observations within a local ...
Gates of global perception: forensic graphics for evidence presentation
A. M. Burton, D. Schofield, L. M. Goodwin
Pages: 103-111
doi>10.1145/1101149.1101165

The admissibility of the inevitably increasing amount of digital evidence to the world's courtrooms may be one of the keys to the preservation of global justice. Digital evidence can take many forms; this paper will concentrate on both graphical evidence ...
SESSION: Content 2: image clustering
Web image clustering by consistent utilization of visual features and surrounding texts
Bin Gao, Tie-Yan Liu, Tao Qin, Xin Zheng, Qian-Sheng Cheng, Wei-Ying Ma
Pages: 112-121
doi>10.1145/1101149.1101167

Image clustering, an important technology for image processing, has been actively researched for a long period of time. Especially in recent years, with the explosive growth of the Web, image clustering has even been a critical technology to help users ...
Iteratively clustering web images based on link and attribute reinforcements
Xin-Jing Wang, Wei-Ying Ma, Lei Zhang, Xing Li
Pages: 122-131
doi>10.1145/1101149.1101168

Image clustering is an important research topic which contributes to a wide range of applications. Traditional image clustering approaches are based on image content features only, while content features alone can hardly describe the semantics of the ...
Image clustering with tensor representation
Xiaofei He, Deng Cai, Haifeng Liu, Jiawei Han
Pages: 132-140
doi>10.1145/1101149.1101169

We consider the problem of image representation and clustering. Traditionally, an n1 × n2 image is represented by a vector in the Euclidean space ℝ^(n1 × n2). Some learning algorithms are ...
Building and tracking hierarchical geographical & temporal partitions for image collection management on mobile devices
A. Pigeau, M. Gelgon
Pages: 141-150
doi>10.1145/1101149.1101170

Usage of mobile devices (phones, digital cameras) raises the need for organizing large personal image collections. In accordance with studies on user needs, we propose a statistical criterion and an associated optimization technique, relying on geo-temporal ...
SESSION: Systems 1: multi-camera systems
Critical video quality for distributed automated video surveillance
Pavel Korshunov, Wei Tsang Ooi
Pages: 151-160
doi>10.1145/1101149.1101172

Large-scale distributed video surveillance systems pose new scalability challenges. Due to the large number of video sources in such systems, the amount of bandwidth required to transmit video streams for monitoring often strains the capability of the ...
A real-time interactive multi-view video system
Jian-Guang Lou, Hua Cai, Jiang Li
Pages: 161-170
doi>10.1145/1101149.1101173

With the rapid development of electronic and computing technology, multi-view video is attracting extensive interest recently due to its greatly enhanced viewing experience. In this paper, we present the system architecture for real-time capturing, processing, ...
MedSMan: a streaming data management system over live multimedia
Bin Liu, Amarnath Gupta, Ramesh Jain
Pages: 171-180
doi>10.1145/1101149.1101174

Querying live media streams is a challenging problem that is becoming an essential requirement in a growing number of applications. Research in multimedia information systems has addressed and made good progress in dealing with archived data. Meanwhile, ...
SESSION: Interactive arts 1: interfaces for audio and music creation
"fl Huge UId streams": fountains that are keyboards with nozzle spray as keys that give rich tactile feedback and are more expressive and more fun than plastic keys
Steve Mann
Pages: 181-190
doi>10.1145/1101149.1101176

"flUId" is a system for fluid-based tactile user interfaces with an array of fluid streams that work like the keys on a keyboard, but that can also provide a much richer and more expressive form of input by virtue of the infinitely ...
Facilitating collective musical creativity
Atau Tanaka, Nao Tokui, Ali Momeni
Pages: 191-198
doi>10.1145/1101149.1101177

We present two projects that facilitate collective music creativity over networks. One system is a participative social music system on mobile devices. The other is a collaborative music mixing environment that adheres to the Creative Commons license ...
MobiLenin combining a multi-track music video, personal mobile phones and a public display into multi-user interactive entertainment
Jürgen Scheible, Timo Ojala
Pages: 199-208
doi>10.1145/1101149.1101178

This paper introduces a novel and creative approach for coupling multimedia art with a non-conventional distributed human-computer interface for multi-user interactive entertainment. The proposed MobiLenin system allows a group of people to interact ...
DEMONSTRATION SESSION: Technical demonstration 1: media understanding and browsing
PhotoRouter: destination-centric mobile media messaging
Shane Ahern, Simon King, Hong Qu, Marc Davis
Pages: 209-210
doi>10.1145/1101149.1101180

The number of people using cameraphones is growing by tens of millions every month. Yet the majority of cameraphone users have difficulty transferring photos off their phone and sharing them with others. PhotoRouter is a software application for ...
Content-based music audio recommendation
Pedro Cano, Markus Koppenberger, Nicolas Wack
Pages: 211-212
doi>10.1145/1101149.1101181

We present the MusicSurfer, a metadata free system for the interaction with massive collections of music. MusicSurfer automatically extracts descriptions related to instrumentation, rhythm and harmony from music audio signals. Together with efficient ...
MediaMetro: browsing multimedia document collections with a 3D city metaphor
Patrick Chiu, Andreas Girgensohn, Surapong Lertsithichai, Wolf Polak, Frank Shipman
Pages: 213-214
doi>10.1145/1101149.1101182

The MediaMetro application provides an interactive 3D visualization of multimedia document collections using a city metaphor. The directories are mapped to city layouts using algorithms similar to treemaps. Each multimedia document is represented by ...
mCLOVER: mobile content-based leaf image retrieval system
Suckchul Kim, Yoonsik Tak, Yunyoung Nam, Eenjun Hwang
Pages: 215-216
doi>10.1145/1101149.1101183

This demonstration presents a content-based leaf image retrieval system that supports wired/wireless access. For example, if we want to know about a plant that we encounter in a mountain or field, we might look it up in an illustrated book. But, it will ...
Video2Cartoon: generating 3D cartoon from broadcast soccer video
Dawei Liang, Yang Liu, Qingming Huang, Guangyu Zhu, Shuqiang Jiang, Zhebin Zhang, Wen Gao
Pages: 217-218
doi>10.1145/1101149.1101184

In this demonstration, a prototype system for generating 3D cartoon from broadcast soccer video is proposed. This system takes advantage of computer vision (CV) and computer graphics (CG) techniques to provide users with a new experience that cannot be obtained ...
Online face detection and user authentication
Caroline Mallauran, Jean-Luc Dugelay, Florent Perronnin, Christophe Garcia
Pages: 219-220
doi>10.1145/1101149.1101185

The ability to verify automatically and with great accuracy the identity of a person has become crucial in everyday life. Biometrics is an emerging topic in the field of signal processing. Our research on biometrics aims at developing a complete framework ...
Intention-based home video browsing
Tao Mei, Xian-Sheng Hua
Pages: 221-222
doi>10.1145/1101149.1101186

This demonstration presents an efficient home video browsing system from a novel viewpoint -- capture intention. We extend our previous work to build up a comprehensive scheme to mine the capture intention, and based on this scheme, propose a ...
Photo LOI: browsing multi-user photo collections
Rahul Nair, Nick Reid, Marc Davis
Pages: 223-224
doi>10.1145/1101149.1101187

The number of digital photographs is growing beyond the abilities of individuals to easily manage and understand their own photo collections. Photo LOI (Level of Interest) is a technique that filters, aggregates, and visualizes photographs taken by multiple ...
MediaMill: exploring news video archives based on learned semantics
Cees G. M. Snoek, Marcel Worring, Jan van Gemert, Jan-Mark Geusebroek, Dennis Koelma, Giang P. Nguyen, Ork de Rooij, Frank Seinstra
Pages: 225-226
doi>10.1145/1101149.1101188

In this technical demonstration we showcase the MediaMill system, a search engine that facilitates access to news video archives at a semantic level. The core of the system is an unprecedented lexicon of 100 automatically detected semantic concepts. ...
A repeated video clip identification system
Xianfeng Yang, Ping Xue, Qi Tian
Pages: 227-228
doi>10.1145/1101149.1101189

Identifying short repeated video clips, such as news program logo, station logo, TV commercials, etc., from broadcasting video databases or streams is important for video content indexing, personalization as well as monitoring. In this demo system we ...
SESSION: Best student papers
SensEye: a multi-tier camera sensor network
Purushottam Kulkarni, Deepak Ganesan, Prashant Shenoy, Qifeng Lu
Pages: 229-238
doi>10.1145/1101149.1101191

This paper argues that a camera sensor network containing heterogeneous elements provides numerous benefits over traditional homogeneous sensor networks. We present the design and implementation of SensEye, a multi-tier network of heterogeneous ...
Physics-motivated features for distinguishing photographic images and computer graphics
Tian-Tsong Ng, Shih-Fu Chang, Jessie Hsu, Lexing Xie, Mao-Pei Tsui
Pages: 239-248
doi>10.1145/1101149.1101192

The increasing photorealism for computer graphics has made computer graphics a convincing form of image forgery. Therefore, classifying photographic images and photorealistic computer graphics has become an important problem for image forgery detection. ...
Semantic manifold learning for image retrieval
Yen-Yu Lin, Tyng-Luh Liu, Hwann-Tzong Chen
Pages: 249-258
doi>10.1145/1101149.1101193

Learning the user's semantics for CBIR involves two different sources of information: the similarity relations entailed by the content-based features, and the relevance relations specified in the feedback. Given that, we propose an augmented relation ...
DEMONSTRATION SESSION: Video demonstrations and visions
How speech/text alignment benefits web-based learning
Sheng-Wei Li, Hao-Tung Lin, Herng-Yow Chen
Pages: 259-260
doi>10.1145/1101149.1101195

This demonstration presents an integrated web-based synchronized scenario for many-to-one cross-media correlations between speech (an EFL, English as Foreign Language, lecture with free-style lecturing behaviors) and the corresponding textual content. ...
My digital photos: where and when?
Neil O'Hare, Cathal Gurrin, Hyowon Lee, Noel Murphy, Alan F. Smeaton, Gareth J.F. Jones
Pages: 261-262
doi>10.1145/1101149.1101196

In recent years digital cameras have seen an enormous rise in popularity, leading to a huge increase in the quantity of digital photos being taken. This brings with it the challenge of organising these large collections. We present work which organises ...
Post-bit: embodied video contents on tiny stickies
Takashi Matsumoto, Tony Dunnigan, Maribeth Back
Pages: 263-264
doi>10.1145/1101149.1101197

Post-Bit is a small e-paper device modeled after paper Post-Its®. We explored and designed interfaces to handle multi-media contents with paper-like manipulations using this e-paper device. The functions of each Post-Bit combined the ...
Natural video browsing
Cai-Zhi Zhu, Tao Mei, Xian-Sheng Hua
Pages: 265-266
doi>10.1145/1101149.1101198

In this demonstration, we show a novel system, Video Booklet, which enables natural personal video browsing and searching. First, representative thumbnails of video segments are selected and reshaped by a set of pre-trained personalized shape templates, ...
MMM2: mobile media metadata for media sharing
Marc Davis, John Canny, Nancy Van House, Nathan Good, Simon King, Rahul Nair, Carrie Burgener, Bruce Rinehart, Rachel Strickland, Guy Campbell, Scott Fisher, Nick Reid
Pages: 267-268
doi>10.1145/1101149.1101199

As cameraphones become the dominant platform for consumer multimedia capture worldwide, multimedia researchers are faced both with the challenge of how to help users manage the billions of photographs they are collectively producing and the opportunity ...
Media gallery TV: view and shop your photos on interactive digital television
Sabine Thieme, Ansgar Scherp, Melanie Albrecht, Susanne Boll
Pages: 269-270
doi>10.1145/1101149.1101200

In this paper, we present the Media Gallery, an MHP-based interactive multimedia application on digital TV. This application allows customers to view and order their digital photos and to order physical prints and fun products from these digital ...
POSTER SESSION: Poster 1: systems track
Content-adaptive transmission of reconstructed soccer goal events over low bandwidth networks
Qing Tang, Irena Koprinska, Jesse S. Jin
Pages: 271-274
doi>10.1145/1101149.1101202

This paper presents a content-adaptive system for streaming reconstructed soccer goal events over networks with bandwidth limited to 1.5 Mbps or below. The reconstruction module analyzes a soccer video to produce a corresponding panoramic field model with ...
A new selection method for H.264 based fine granular scalable video coding
Won-Hyuck Yoo, Jihun Cha, Won-Sik Jeong, Kyuheon Kim, Gwang Hoon Park
Pages: 275-278
doi>10.1145/1101149.1101203

In this paper, we introduce a new selection method for H.264 based Fine Granular Scalable video coding. It selectively uses the temporal-prediction data inside the enhancement-layer only when those data can significantly reduce the temporal-redundancies, ...
JADE: jabber-based authoring in distributed environments
Andrew Roczniak, Abdulmotaleb El Saddik
Pages: 279-282
doi>10.1145/1101149.1101204

We present our initial results in developing a framework for collaborative multimedia authoring tools. This research is motivated by the lack of tools that take into account consumers' quality of experience. By mapping factors that have an impact ...
Streaming with causality: a practical approach
Cezar Pleşca, Romulus Grigoraş, Philippe Quéinnec, Gérard Padiou
Pages: 283-286
doi>10.1145/1101149.1101205

Highly interactive collaborative streaming applications express the need for causality. Solutions exist but we argue that more work needs to be done especially from a perceptual point of view. The key question is: given the current state of the Internet ...
A peer-to-peer network for live media streaming using a push-pull approach
Meng Zhang, Jian-Guang Luo, Li Zhao, Shi-Qiang Yang
Pages: 287-290
doi>10.1145/1101149.1101206

In this paper, we present an unstructured peer-to-peer network called GridMedia for live media streaming employing a push-pull approach. Each node in GridMedia randomly selects its neighbors in the overlay and uses push-pull method to fetch data from ...
Power-aware bandwidth and stereo-image scalable audio decoding
Wendong Huang, Ye Wang, Samarjit Chakraborty
Pages: 291-294
doi>10.1145/1101149.1101207

We propose a new workload-scalable audio decoding scheme that would enable users to control the tradeoff between playback quality and power consumption in battery-powered portable audio players. Our objective is to give users a control at the decoder ...
TrustStream: a novel secure and scalable media streaming architecture
Hao Yin, Chuang Lin, Feng Qiu, Xuening Liu, Dapeng Wu
Pages: 295-298
doi>10.1145/1101149.1101208

Streaming media over networks has gained renewed interest recently due to the emerging IP-TV and mobile TV. The success of commercial media streaming systems critically depends on two important capabilities, namely, 1) scalability in distributing media ...
Using offline bitstream analysis for power-aware video decoding in portable devices
Yicheng Huang, Samarjit Chakraborty, Ye Wang
Pages: 299-302
doi>10.1145/1101149.1101209

Dynamic voltage/frequency scheduling algorithms for multimedia applications have recently been a subject of intensive research. Many of these algorithms use control-theoretic feedback techniques to predict the future execution demand of an application ...
Supporting multi-party voice-over-IP services with peer-to-peer stream processing
Xiaohui Gu, Zhen Wen, Philip S. Yu, Zon-Yin Shae
Pages: 303-306
doi>10.1145/1101149.1101210

Multi-party voice-over-IP (MVoIP) services provide economical and natural group communication mechanisms for many emerging applications such as on-line gaming, distance collaboration, and tele-immersion. In this paper, we present a novel peer-to-peer ...
rStream: resilient peer-to-peer streaming with rateless codes
Chuan Wu, Baochun Li
Pages: 307-310
doi>10.1145/1101149.1101211

The inherent instability and unreliability of peer-to-peer networks introduce several fundamental engineering challenges to multimedia streaming over peer-to-peer networks. First, multimedia streaming sessions need to be resilient to the volatile network ...
Impact of incentive mechanisms on quality of experience
Andrew Roczniak, Abdulmotaleb El Saddik
Pages: 311-314
doi>10.1145/1101149.1101212

Since entities participating in P2P networks are usually autonomous and therefore free to decide on their level of participation, mechanisms to resolve conflicts between individual and collective rationality are needed. How can implementations of such ...
POSTER SESSION: Poster 2: applications track
ClickRemoval: interactive pinpoint image object removal
Frank Nielsen, Richard Nock
Pages: 315-318
doi>10.1145/1101149.1101214

In this paper, we explore the problem of deleting objects in still pictures. We present an interactive system based on an intuitive user-friendly interface for removing undesirable objects in digital pictures. To erase an object in an image, a user indicates ...
Motion picture inpainting on aged films
Timothy K. Shih, Rong-Chi Chang, Yu-Ping Chen
Pages: 319-322
doi>10.1145/1101149.1101215

Video inpainting uses spatial-temporal information to repair defects such as spikes and lines on aged films. We propose a series of new algorithms based on adjustable thresholds to repair different varieties of aged films. The main contribution is an ...
Implementation of a mobile MPEG-21 peer
Shane Lauf, Ian Burnett
Pages: 323-326
doi>10.1145/1101149.1101216

The MPEG-21 Multimedia Framework aims to realize interoperable access to content across heterogeneous networks and devices. Within the Framework, the concept of Digital Items is introduced as a structured digital representation for multimedia. To demonstrate ...
Exploiting self-adaptive posture-based focus estimation for lecture video editing
Feng Wang, Chong-Wah Ngo, Ting-Chuen Pong
Pages: 327-330
doi>10.1145/1101149.1101217

Head pose plays a special role in estimating a presenter's focuses and actions for lecture video editing. This paper presents an efficient and robust head pose estimation algorithm to cope with the new challenges arising in the content management of ...
IMAGINATION: a robust image-based CAPTCHA generation system
Ritendra Datta, Jia Li, James Z. Wang
Pages: 331-334
doi>10.1145/1101149.1101218

We propose IMAGINATION (IMAge Generation for INternet AuthenticaTION), a system for the generation of attack-resistant, user-friendly, image-based CAPTCHAs. In our system, we produce controlled distortions on randomly chosen images and present them to ...
[hid] toolkit: a unified framework for instrument design
Hans-Christoph Steiner
Pages: 335-338
doi>10.1145/1101149.1101219

The [hid] toolkit is a set of software objects for designing gestural instruments. All too frequently, computer performers are tied to the keyboard/mouse/monitor model, narrowly constraining the range of possible gestures. A multitude of off-the-shelf ...
Hierarchical voting classification scheme for improving visual sign language recognition
Liang-Guo Zhang, Xilin Chen, Chunli Wang, Wen Gao
Pages: 339-342
doi>10.1145/1101149.1101220

As one of the important research areas of multimodal interaction, sign language recognition (SLR) has attracted increasing interest. In SLR, especially on medium or large vocabulary, it is usually difficult or impractical to collect enough training ...
Real time advertisement insertion in baseball video based on advertisement effect
Yiqun Li, Kong Wah Wan, Xin Yan, Changsheng Xu
Pages: 343-346
doi>10.1145/1101149.1101221

In this paper, we propose a novel method to detect baseball video scenes for commercial advertisement insertion. The method uses criteria based on better advertisement effect to generate a set of rules. From these rules, proper timing (starting time ...
Providing on-demand sports video to mobile devices
Qingshan Liu, Zhigang Hua, Cunxun Zang, Xiaofeng Tong, Hanqing Lu
Pages: 347-350
doi>10.1145/1101149.1101222

This paper introduces a system for providing on-demand sports video to mobile devices and makes two main contributions. First, we construct an infrastructure for extracting and delivering highlights, rather than whole sports videos, to mobile clients, ...
An adaptive edge detection based colorization algorithm and its applications
Yi-Chin Huang, Yi-Shin Tung, Jun-Cheng Chen, Sung-Wen Wang, Ja-Ling Wu
Pages: 351-354
doi>10.1145/1101149.1101223

Colorization is a computer-assisted process for adding colors to grayscale images or movies. It can be viewed as a process for assigning a three-dimensional color vector (YUV or RGB) to each pixel of a grayscale image. In previous works, with some color ...
Recognition of hands-free speech and hand pointing action for conversational TV
Yasuo Ariki, Tetsuya Takiguchi, Atsushi Sako
Pages: 355-358
doi>10.1145/1101149.1101224

In this paper, we propose the structure and components of a conversational television set (TV), which we can ask anything about the broadcast content and from which we receive interesting information. The conversational TV is composed of two types of ...
A corpus-based singing voice synthesis system for mandarin Chinese
Cheng-Yuan Lin, Tzu-Ying Lin, J.-S. Roger Jang
Pages: 359-362
doi>10.1145/1101149.1101225

In this paper, the design and implementation of a corpus-based singing voice synthesis (SVS) system for Mandarin Chinese was introduced. The design rules of three corpora for singing voice synthesis were proposed. After that, two distance functions were ...
Dynamic shot suggestion filtering for home video based on user performance
Brett Adams, Svetha Venkatesh
Pages: 363-366
doi>10.1145/1101149.1101226

This paper presents novel additions to our existing amateur media creation framework. The framework provides at-capture guidance to enable the home movie maker to realize their aesthetic and narrative goals and automation of post-production editing. ...
Creating MAGIC: system for generating learning object metadata for instructional content
Ying Li, Chitra Dorai, Robert Farrell
Pages: 367-370
doi>10.1145/1101149.1101227

This paper presents our latest work on building a system called MAGIC (Metadata Automated Generation for Instructional Content) that will automatically identify segments and generate critical metadata conforming with the SCORM (Sharable Content Object ...
Cooking navi: assistant for daily cooking in kitchen
Reiko Hamada, Jun Okabe, Ichiro Ide, Shin'ichi Satoh, Shuichi Sakai, Hidehiko Tanaka
Pages: 371-374
doi>10.1145/1101149.1101228

We are developing a cooking navigation system, which helps even a novice user to cook several recipes in parallel without failure, while improving an advanced user's skill further. To realize this, the system optimizes the cooking procedure considering ...
Personal media sharing and authoring on the web
Xian-Sheng Hua, Shipeng Li
Pages: 375-378
doi>10.1145/1101149.1101229

In this paper, we propose a novel system working on the Web for personal media sharing and authoring. Three primary technologies enable this end-to-end system, including scalable video coding, intelligent multimedia content analysis, and template-based ...
Automatic generating detail-on-demand hypervideo using MPEG-7 and SMIL
Tina T. Zhou, Tom Gedeon, Jess S. Jin
Pages: 379-382
doi>10.1145/1101149.1101230

Detail-on-demand hypervideo will provide a powerful mechanism to allow viewers to see additional information of video segments through hyperlinks. A large number of tools are devoted to the identification of selectable video objects and the synchronization ...
Office blogger
Berna Erol, Jonathan J. Hull
Pages: 383-386
doi>10.1145/1101149.1101231

The Office Blogger (OBlog) is an experimental prototype of a multimedia appliance that allows an office worker to easily record events, conversations, meetings, pictures and documents, and helps create blog entries from that data. OBlog employs a novel ...
Haptic: the new biometrics-embedded media to recognizing and quantifying human patterns
Mauricio Orozco Trujillo, Ismail Shakra, Abdulmotaleb El Saddik
Pages: 387-390
doi>10.1145/1101149.1101232

Authentication for the purposes of security has taken giant strides since the introduction of Biometrics to help identify people by their behavioral and physiological features. From organizations and corporations to educational institutes, electronic ...
Attention region selection with information from professional digital camera
Song Liu, Liang-Tien Chia, Deepu Rajan
Pages: 391-394
doi>10.1145/1101149.1101233

Attentive region extraction is a challenging issue for semantic interpretation of image and video content. Successful attentive region extraction greatly facilitates image classification, adaptation, compression and retrieval. Different from ...
POSTER SESSION: Poster 3: content track
Automatic video annotation using ontologies extended with visual information
Marco Bertini, Alberto Del Bimbo, Carlo Torniai
Pages: 395-398
doi>10.1145/1101149.1101235

Classifying video elements according to some pre-defined ontology of the video content domain is a typical way to perform video annotation. Ontologies are defined by establishing relationships between linguistic terms that specify domain concepts at ...
Early versus late fusion in semantic video analysis
Cees G. M. Snoek, Marcel Worring, Arnold W. M. Smeulders
Pages: 399-402
doi>10.1145/1101149.1101236

Semantic analysis of multimodal video aims to index segments of interest at a conceptual level. In reaching this goal, it requires an analysis of several information streams. At some point in the analysis these streams need to be fused. In this paper, ...
Detecting group activities using rigidity of formation
Saad M. Khan, Mubarak Shah
Pages: 403-406
doi>10.1145/1101149.1101237

Most work in human activity recognition is limited to relatively simple behaviors like sitting down, standing up or other dramatic posture changes. Very little has been achieved in detecting more complicated behaviors especially those characterized by ...
A probabilistic template-based approach to discovering repetitive patterns in broadcast videos
Peng Wang, Zhi-Qiang Liu, Shi-Qiang Yang
Pages: 407-410
doi>10.1145/1101149.1101238

There are usually repetitive sub-segments in broadcast videos, which may be associated with high-level concepts or events, e.g., news footage, repeated scores in basketball. Unsupervised mining techniques provide generic solutions to discovering such ...
Learning with non-metric proximity matrices
Gang Wu, Edward Y. Chang, Zhihua Zhang
Pages: 411-414
doi>10.1145/1101149.1101239

Many emerging applications formulate non-metric proximity matrices (non-positive semidefinite), and hence cannot fit into the framework of kernel machines. A popular approach to this problem is to transform the spectrum of the similarity matrix so as ...
Semantic feedback for interactive image retrieval
Changbo Yang, Ming Dong, Farshad Fotouhi
Pages: 415-418
doi>10.1145/1101149.1101240

In this paper we present a semantic image retrieval system with integrated feedback mechanism. In our system, we propose a novel feedback solution for semantic retrieval: semantic feedback, which allows our system to interact with users directly ...
Image region entropy: a measure of "visualness" of web images associated with one concept
Keiji Yanai, Kobus Barnard
Pages: 419-422
doi>10.1145/1101149.1101241

We propose a new method to measure the "visualness" of concepts, that is, to what extent concepts have visual characteristics. Knowing which concepts have visually discriminative power is important for image annotation, especially automatic image annotation by ...
To learn representativeness of video frames
Hong-Wen Kang, Xian-Sheng Hua
Pages: 423-426
doi>10.1145/1101149.1101242

With the rapid explosion of video data, compact representation of videos is becoming more and more desirable for efficient browsing and communication, which leads to a number of research works on video summarization in recent years. Among these works, ...
Affect-based indexing and retrieval of films
Ching Hau Chan, Gareth J. F. Jones
Pages: 427-430
doi>10.1145/1101149.1101243

Digital multimedia systems are creating many new opportunities for rapid access to content archives. In order to explore these collections using search applications, the content must be annotated with significant features. An important and often overlooked ...
Toward emergent representations for video
Ryan Shaw, Marc Davis
Pages: 431-434
doi>10.1145/1101149.1101244

Advanced systems for finding, using, sharing, and remixing video require high-level representations of video content. A number of researchers have taken top-down, analytic approaches to the specification of representation structures for video. The resulting ...
Region based image annotation through multiple-instance learning
Changbo Yang, Ming Dong, Farshad Fotouhi
Pages: 435-438
doi>10.1145/1101149.1101245

In an annotated image database, keywords are usually associated with images instead of individual regions, which poses a major challenge for any region based image annotation algorithm. In this paper, we propose to learn the correspondence between image ...
Spatio-temporal quality assessment for home videos
Tao Mei, Cai-Zhi Zhu, He-Qin Zhou, Xian-Sheng Hua
Pages: 439-442
doi>10.1145/1101149.1101246

Compared with video programs shot by professionals, home videos always have low-quality content resulting from a lack of professional capture skills. In this paper, we present a novel spatio-temporal quality assessment scheme in terms of low-level ...
Light weight MP3 watermarking method for mobile terminals
Koichi Takagi, Shigeyuki Sakazawa
Pages: 443-446
doi>10.1145/1101149.1101247

This paper proposes an MP3 watermarking method that is applicable to a mobile terminal with limited computational resources. Considering that the embedded information is copyright information and metadata, which should be extracted before playing back, ...
GPU-assisted decoding of video samples represented in the YCoCg-R color space
Wesley De Neve, Dieter Van Rijsselbergen, Charles Hollemeersch, Jan De Cock, Stijn Notebaert, Rik Van de Walle
Pages: 447-450
doi>10.1145/1101149.1101248

Although pixel shaders were designed for the creation of programmable rendering effects, they can also be used as generic processing units for vector data. In this paper, attention is paid to an implementation of the YCoCg-R to RGB color space transform, ...
Learning an image-word embedding for image auto-annotation on the nonlinear latent space
Wei Liu, Xiaoou Tang
Pages: 451-454
doi>10.1145/1101149.1101249

Latent Semantic Analysis (LSA) has shown encouraging performance for the problem of unsupervised image automatic annotation. LSA conducts annotation by keywords propagation on a linear Latent Space, which accounts for the underlying semantic structure ...
Exciting event detection in broadcast soccer video with mid-level description and incremental learning
Qixiang Ye, Qingming Huang, Wen Gao, Shuqiang Jiang
Pages: 455-458
doi>10.1145/1101149.1101250

In this paper, we propose a method for exciting event detection in broadcast soccer video with mid-level description and SVM-based incremental learning. In the method, video frames are firstly classified and grouped into views in terms of low-level playfield ...
A method for retrieving music data with different bit rates using MPEG-4 TwinVQ audio compression
Michihiro Kobayakawa, Mamoru Hoshi, Kensuke Onishi
Pages: 459-462
doi>10.1145/1101149.1101251

The present paper describes a method for indexing a piece of music using the TwinVQ (Transform-domain Weighted Interleave Vector Quantization) audio compression (MPEG-4 audio standard). First, we present a framework for indexing a piece of music based ...
Sound source location cue coding system for compact representation of multi-channel audio
Inseon Jang, Jeongil Seo, Seungkwon Beack, Kyeongok Kang, Han-gil Moon
Pages: 463-466
doi>10.1145/1101149.1101252

Binaural cue coding (BCC) has been introduced for compact representation of multi-channel audio. It exploits binaural cue parameters for capturing the spatial image of multi-channel audio. Recently, it has been standardized within MPEG under the name of ...
Semantic knowledge extraction and annotation for web images
Zhigang Hua, Xiang-Jun Wang, Qingshan Liu, Hanqing Lu
Pages: 467-470
doi>10.1145/1101149.1101253

Nowadays, images have become widely available on the World Wide Web (WWW). It's essential to develop effective ways for managing and retrieving such abundant images. Advantageously, compared to the traditional images where very little information is ...
On indexing of 3D scenes using MPEG-7
Ioan Marius Bilasco, Jérôme Gensel, Marlène Villanova-Oliver, Hervé Martin
Pages: 471-474
doi>10.1145/1101149.1101254

Evolving desktop computer capacities and the emergence of the X3D standard offer a new boost to the 3D domain. Giving sense to 3D content becomes a major issue, especially for reusing such content extracted from existing 3D scenes. In this paper, we ...
Natural language processing of lyrics
Jose P. G. Mahedero, Álvaro Martínez, Pedro Cano, Markus Koppenberger, Fabien Gouyon
Pages: 475-478
doi>10.1145/1101149.1101255

We report experiments on the use of standard natural language processing (NLP) tools for the analysis of music lyrics. A significant amount of music audio has lyrics. Lyrics encode an important part of the semantics of a song, therefore their analysis ...
Building a visual ontology for video retrieval
L. Hollink, M. Worring, A. Th. Schreiber
Pages: 479-482
doi>10.1145/1101149.1101256

To ensure access to growing video collections, annotation is becoming more and more important. Using background knowledge in the form of ontologies or thesauri is a way to facilitate annotation in a broad domain. Current ontologies are not suitable for ...
Towards context-aware face recognition
Marc Davis, Michael Smith, John Canny, Nathan Good, Simon King, Rajkumar Janakiraman
Pages: 483-486
doi>10.1145/1101149.1101257

In this paper, we focus on the use of context-aware, collaborative filtering, machine-learning techniques that leverage automatically sensed and inferred contextual metadata together with computer vision analysis of image content to make accurate predictions ...
A novel framework for SVM-based image retrieval on large databases
Lei Wang, Xuchun Li, Ping Xue, Kap Luk Chan
Pages: 487-490
doi>10.1145/1101149.1101258

In this paper, a novel framework is proposed to deliver a fast, robust, and generally applicable SVM-based image retrieval for large databases. A quick test scheme is developed, and on-line kernel learning is employed to realize it after analyzing ...
Automatic image orientation determination with natural image statistics
Siwei Lyu
Pages: 491-494
doi>10.1145/1101149.1101259

In this paper, we propose a new method for automatically determining image orientations. This method is based on a set of natural image statistics collected from a multi-scale multi-orientation image decomposition (e.g., wavelets). From these statistics, ...
Determining structure in continuously recorded videos
Yun Zhai, Mubarak Shah
Pages: 495-498
doi>10.1145/1101149.1101260

In this paper, we present a scene detection framework on continuously recorded videos. Conventional temporal scene segmentation methods work for the videos composed of discrete shots, where shot boundaries are clearly defined. The proposed method detects ...
Two-scale image retrieval with significant meta-information feedback
Jia Li
Pages: 499-502
doi>10.1145/1101149.1101261

A two-scale image retrieval system is developed to provide efficient search in large-scale databases as well as flexibility for users to incorporate subjective preferences during retrieval. A new clustering method is developed for images each characterized ...
A multiview video transcoder
Baochun Bai, Janelle Harms
Pages: 503-506
doi>10.1145/1101149.1101262

Video transcoding can convert a compressed video from one format to another format. In this paper, we propose a novel multiview video transcoder, which is used for bit-rate scaling of multiple compressed synchronized video streams. Different from the ...
Emotion-based music recommendation by association discovery from film music
Fang-Fei Kuo, Meng-Fen Chiang, Man-Kwan Shan, Suh-Yin Lee
Pages: 507-510
doi>10.1145/1101149.1101263

With the growth of digital music, the development of music recommendation is helpful for users. The existing recommendation approaches are based on the users' preference on music. However, sometimes, recommending music according to the emotion is needed. ...
An improved QTCQ wavelet image coding method using DCT and coefficient reorganization
Li Chen, Jia Wang
Pages: 511-514
doi>10.1145/1101149.1101264

An improved quadtree classification and TCQ (QTCQ) wavelet image compression method is proposed in this paper. The method applies small block DCT to coefficients in the high frequency subbands and reorders them before quantizing and coding. Experiments ...
Automatic identification of digital video based on shot-level sequence matching
Jian Zhou, Xiao-Ping Zhang
Pages: 515-518
doi>10.1145/1101149.1101265

To locate a video clip in large collections is very important for retrieval applications, especially for digital rights management. In this paper, we present a novel technique for automatic identification of digital video. This new algorithm is based ...
Highlight ranking for sports video browsing
Xiaofeng Tong, Qingshan Liu, Yifan Zhang, Hanqing Lu
Pages: 519-522
doi>10.1145/1101149.1101266

Sports video has been extensively studied for its wide viewership and tremendous commercial potential. Many studies have focused on highlight extraction for summarizing a lengthy video. In this paper, we present an advanced highlight analysis system for ...
SSF fingerprint for image authentication: an incidental distortion resistant scheme
Sheng Tang, Jin-Tao Li, Yong-Dong Zhang
Pages: 523-526
doi>10.1145/1101149.1101267

We propose a novel method for image authentication which can distinguish incidental manipulations from malicious ones. The authentication fingerprint is based on the Hotelling's T-square statistic (HTS) via Principal Component Analysis (PCA) of block ...
Validating cardiac echo diagnosis through video similarity
Tanveer Syeda-Mahmood, Dulce Ponceleon, Jing Yang
Pages: 527-530
doi>10.1145/1101149.1101268

Video data is increasingly being used in medical diagnosis. Due to the quality of the video and the complexities of underlying motion captured, it is difficult for an inexperienced physician/radiologist to describe motion abnormalities in a crisp way, ...
Tracking users' capture intention: a novel complementary view for home video content analysis
Tao Mei, Xian-Sheng Hua, He-Qin Zhou
Pages: 531-534
doi>10.1145/1101149.1101269

In this paper, we present a novel view to home video content analysis, which aims at tracking the capture intention of camcorder users. Based on the study of intention mechanism in psychology, a set of domain-specific capture intention concepts ...
Evaluation of subjective video quality of mobile devices
Satu Jumisko-Pyykkö, Jukka Häkkinen
Pages: 535-538
doi>10.1145/1101149.1101270

Subjectively perceived video quality is a critical factor when adopting new mobile video applications. When video is used in mobile networks, the most important requirements are related to low bitrates, framerates and the screen size of the mobile device. ...
A unified shot boundary detection framework based on graph partition model
Jinhui Yuan, Jianmin Li, Fuzong Lin, Bo Zhang
Pages: 539-542
doi>10.1145/1101149.1101271

In this paper, we propose a unified shot boundary detection framework by extending the previous work of graph partition model with temporal constraints. To detect both the abrupt transitions (CUTs) and gradual transitions (GTs, excluding fade out/in) ...
Part-based shape retrieval
Mirela Tanase, Remco C. Veltkamp
Pages: 543-546
doi>10.1145/1101149.1101272

This paper introduces a measure for computing the dissimilarity between multiple polylines and a polygon based on the turning function, and describes a part-based retrieval system using that dissimilarity measure. This dissimilarity can be efficiently ...
Co-active intelligence for image retrieval
Mark Truran, James Goulding, Helen Ashman
Pages: 547-550
doi>10.1145/1101149.1101273

Lexical ambiguity in query-based image retrieval is an immemorial problem which has seemingly resisted all countermeasures. In this paper we introduce a methodology that expresses the users of a system and their navigational behaviour as the paramount ...
POSTER SESSION: Art poster session
Face to face: a media-art using a face detection system and its exhibition
Yasuto Nakanishi
Pages: 551-554
doi>10.1145/1101149.1101275

"Face to face" is a media-art piece that only takes pictures of a profile, a blurred face, and the like, which might generally be thought of as failed pictures. Its theme is sameness and difference between camera and mirror, and it aims to offer an experience that ...
The dancing genome project: generation of a human-computer choreography using a genetic algorithm
François-Joseph Lapointe, Martine Époque
Pages: 555-558
doi>10.1145/1101149.1101276

In this paper, we present an interactive genetic algorithm for the generation of human-computer choreography, using motion capture technology. First, we introduce the four steps of the algorithm to (1) define a movement vocabulary, (2) initialize movement ...
The "control of fear": an interactive art experiencing and presenting system with multimodal sensors and media
Chin Chih Yang, Lipin Liu, Jacy Chen
Pages: 559-562
doi>10.1145/1101149.1101277

The "Control of Fear" project is an interactive art exhibition project that gives the general public an opportunity to experience what might happen to them if their lives were suddenly altered by an unforeseen and unpredictable catastrophic event. The ...
MusicStory: a personalized music video creator
David A. Shamma, Bryan Pardo, Kristian J. Hammond
Pages: 563-566
doi>10.1145/1101149.1101278

In this paper, we describe MusicStory, a system that automatically creates videos to accompany music with lyrics. MusicStory uses common search engines, photo-sharing websites, and simple analysis of the dynamics and tempo of the music to create personalized ...
Impossible geographies of belonging
Petra Gemeinboeck
Pages: 567-570
doi>10.1145/1101149.1101279

The paper discusses the boundary between virtual and physical spaces as it is constituted and perforated in a series of installation works by the author. It focuses on the recently completed interactive installation, Impossible Geographies 01: Memory, ...
The SINE WAVE ORCHESTRA stay
Kazuhiro Jo, Ken Furudate, Daisuke Ishida, Mizuki Noguchi
Pages: 571-573
doi>10.1145/1101149.1101280

This is a report of creative and technical considerations in building a participatory sound performance The SINE WAVE ORCHESTRA stay. In this performance, the participants one by one leave their own sine wave in the performance space. These sine waves ...
Seven mile boots: implications of an everyday interface
Martin Pichlmair
Pages: 574-577
doi>10.1145/1101149.1101281

With seven-league boots through the Internet - when you take a stroll through the physical world in this wireless LAN footwear, you might meet people who happen to be spending some time in a chat-room. Their virtual conversations are made audible as ...
A new approach to interactive performance systems
Hüseyin Kuşcu, B. Tevfik Akgün
Pages: 578-581
doi>10.1145/1101149.1101282

In this paper, we describe the basic principles that have to be in an interactive performance system, and we present a new solution to realize the principles to be used in dance performance with a distributable and hybrid approach of time and states. ...
Mulholland drive: a movie with no image
D. Scott Hessels
Pages: 582-585
doi>10.1145/1101149.1101283

Three media artists, Martin Bonadeo, Michael Chu, and D. Scott Hessels, drove Los Angeles' famous Mulholland Drive with five types of sensors--measuring the car's tilt, direction, altitude, speed, and engine sound. The captured data of the mountain road ...
An adaptation framework for new media artworks
Anis Ouali, Brigitte Kerhervé, Paul Landon
Pages: 586-589
doi>10.1145/1101149.1101284

In this paper, we are interested in adaptation mechanisms for the design, creation and experimentation of adaptive and interactive new media artworks. Through a concrete case study, we propose an adaptation framework that combines semantic and physical ...
Man in |e|space.mov / motion analysis in 3D space
Wolf Ka
Pages: 590-593
doi>10.1145/1101149.1101285

The article documents the theoretical and aesthetical basis of the interactive dance performance "man in |e|space.mov". The text discusses the abstraction of the human body in this performance by an interactive costume of light whose motion is analyzed ...
Organum: individual presence through collaborative play
Greg Niemeyer, Dan Perkel, Ryan Shaw, Jane McGonigal
Pages: 594-597
doi>10.1145/1101149.1101286

Organum Playtest is an interactive installation in which three players collaboratively navigate through a model of the human voice box, using their voices as a joystick. By asking players to solve collaborative maze puzzles through cross-functional control, ...
SESSION: Plenary papers
Learning the semantics of multimedia queries and concepts from a small number of examples
Apostol (Paul) Natsev, Milind R. Naphade, Jelena Tešić
Pages: 598-607
doi>10.1145/1101149.1101288

In this paper we unify two supposedly distinct tasks in multimedia retrieval. One task involves answering queries with a few examples. The other involves learning models for semantic concepts, also with a few examples. In our view these two tasks are ...
An object-based video coding framework for video sequences obtained from static cameras
Asaad Hakeem, Khurram Shafique, Mubarak Shah
Pages: 608-617
doi>10.1145/1101149.1101289

This paper presents a novel object-based video coding framework for videos obtained from a static camera. As opposed to most existing methods, the proposed method does not require explicit 2D or 3D models of objects and hence is general enough to cater ...
SEVA: sensor-enhanced video annotation
Xiaotao Liu, Mark Corner, Prashant Shenoy
Pages: 618-627
doi>10.1145/1101149.1101290

In this paper, we study how a sensor-rich world can be exploited by digital recording devices such as cameras and camcorders to improve a user's ability to search through a large repository of image and video files. We design and implement a digital ...
SESSION: Content 3: audio and security
Unsupervised content discovery in composite audio
Rui Cai, Lie Lu, Alan Hanjalic
Pages: 628-637
doi>10.1145/1101149.1101292

Automatically extracting semantic content from audio streams can be helpful in many multimedia applications. Motivated by the known limitations of traditional supervised approaches to content extraction, which are hard to generalize and require suitable ...
Multimodal content-based structure analysis of karaoke music
Yongwei Zhu, Kai Chen, Qibin Sun
Pages: 638-647
doi>10.1145/1101149.1101293

This paper presents a novel approach for content-based analysis of karaoke music, which utilizes multimodal contents including synchronized lyrics text from the video channel and original singing audio as well as accompaniment audio in the two audio ...
A unified framework for resolving ambiguity in copy detection
Sujoy Roy, Ee-Chien Chang, K. Natarajan
Pages: 648-655
doi>10.1145/1101149.1101294

Copy detection is an important component of digital rights management and can be implemented using a retrieval-based approach. Under this approach, a query image, suspected to be a copy, is compared against all the images in the owner database. The comparison ...
Accurate repeat finding and object skipping using fingerprints
Cormac Herley
Pages: 656-665
doi>10.1145/1101149.1101295

This paper introduces a novel and very accurate segmentation algorithm. It is very efficient and consumes less than 10% of CPU on a simple desktop PC to segment a stream in real-time. It operates on an audio stream, or on the audio portion of an audio-visual ...
PANEL SESSION: Panel
What is the state of our community?
Yong Rui, Ramesh Jain, Nicolas D. Georganas, HongJiang Zhang, Klara Nahrstedt, John Smith, Mohan Kankanhalli
Pages: 666-668
doi>10.1145/1101149.1101297
SESSION: Brave new topics 2: affective multimodal human-computer interaction
Affective multimodal human-computer interaction
Maja Pantic, Nicu Sebe, Jeffrey F. Cohn, Thomas Huang
Pages: 669-676
doi>10.1145/1101149.1101299

Social and emotional intelligence are aspects of human intelligence that have been argued to be better predictors than IQ for measuring aspects of success in life, especially in social interactions, learning, and adapting to what is important. When it ...
Multimodal affect recognition in learning environments
Ashish Kapoor, Rosalind W. Picard
Pages: 677-682
doi>10.1145/1101149.1101300

We propose a multi-sensor affect recognition system and evaluate it on the challenging task of classifying interest (or disinterest) in children trying to solve an educational puzzle on the computer. The multimodal sensory information from facial expressions ...
Multimodal expressive embodied conversational agents
Catherine Pelachaud
Pages: 683-689
doi>10.1145/1101149.1101301

In this paper we present our work toward the creation of a multimodal expressive Embodied Conversational Agent (ECA). Our agent, called Greta, exhibits nonverbal behaviors synchronized with speech. We are using the taxonomy of communicative functions ...
Socially aware media
Alex (Sandy) Pentland
Pages: 690-695
doi>10.1145/1101149.1101302

Face-to-face communication conveys social context as well as words, and it is this social signaling that allows new information to be smoothly integrated into a shared, group-wide understanding. By building machines that understand social signaling and ...
SESSION: Content 4: image analysis and retrieval
Coevolutionary feature synthesized EM algorithm for image retrieval
Rui Li, Bir Bhanu, Anlei Dong
Pages: 696-705
doi>10.1145/1101149.1101304

As a commonly used unsupervised learning algorithm in Content-Based Image Retrieval (CBIR), the Expectation-Maximization (EM) algorithm has several limitations, especially in high dimensional feature spaces where the data are limited and the ...
Image annotations by combining multiple evidence & wordNet
Yohan Jin, Latifur Khan, Lei Wang, Mamoun Awad
Pages: 706-715
doi>10.1145/1101149.1101305

The development of technology generates huge amounts of non-textual information, such as images. An efficient image annotation and retrieval system is highly desired. Clustering algorithms make it possible to represent visual features of images with ...
Robust subspace analysis for detecting visual attention regions in images
Yiqun Hu, Deepu Rajan, Liang-Tien Chia
Pages: 716-724
doi>10.1145/1101149.1101306

Detecting visually attentive regions of an image is a challenging but useful issue in many multimedia applications. In this paper, we describe a method to extract visual attentive regions in images using subspace estimation and analysis techniques. The ...
Formulating context-dependent similarity functions
Gang Wu, Edward Y. Chang, Navneet Panda
Pages: 725-734
doi>10.1145/1101149.1101307

Tasks of information retrieval depend on a good distance function for measuring similarity between data instances. The most effective distance function must be formulated in a context-dependent (also application-, data-, and user-dependent) way. In this ...
SESSION: Applications 2: automated multimedia authoring
Automatic generation of personalized music sports video
Jinjun Wang, Changsheng Xu, Engsiong Chng, Lingyu Duan, Kongwah Wan, Qi Tian
Pages: 735-744
doi>10.1145/1101149.1101309

In this paper, we propose a novel automatic approach for personalized music sports video generation. Two research challenges, semantic sports video content selection and automatic video composition, are addressed. For the first challenge, we propose ...
Automated rich presentation of a semantic topic
Lie Lu, Zhiwei Li
Pages: 745-753
doi>10.1145/1101149.1101310

To have a rich presentation of a topic, it is not only expected that much relevant multimodal information, including images, text, audio and video, can be extracted; it is also important to organize and summarize the related information, and provide ...
SESSION: Interactive arts 2: performance, play, and appreciation
Situated event bootstrapping and capture guidance for automated home movie authoring
Brett Adams, Svetha Venkatesh
Pages: 754-763
doi>10.1145/1101149.1101312

This paper describes a novel interactive media authoring framework, MediaTE, that enables amateurs to create videos of higher narrative or aesthetic quality with a completely mobile lifecycle. A novel event bootstrapping dialog is used to derive shot ...
An ambient intelligence platform for physical play
Ron Wakkary, Marek Hatala, Robb Lovell, Milena Droumeva
Pages: 764-773
doi>10.1145/1101149.1101313

This paper describes an ambient intelligent prototype known as socio-ec(h)o. socio-ec(h)o explores the design and implementation of a system for sensing and display, user modeling, and interaction models based on a game structure. The game structure ...
Generating dance verbs and assisting computer choreography
Chi-Min Hsieh, Annie Luciani
Pages: 774-782
doi>10.1145/1101149.1101314

As quoted in the philosophy of contemporary dance: <<Understanding the directions for a Free Dance performer stems mainly from the qualities and energy of the movement rather from spatial criteria>>, a lot of emphasis is put currently on ...
POEtic-cubes: acquisition of new qualia through apperception using a bio-inspired electronic tissue
Raquel Paricio, J. Manuel Moreno
Pages: 783-789
doi>10.1145/1101149.1101315

In this paper we shall present the research process towards an artistic installation, called POEtic-Cubes, that is constituted by nine autonomous robots monitored by the POEtic electronic tissue (bio-inspired hardware with adaptive features). The main ...
DEMONSTRATION SESSION: Technical demonstration 2: media authoring and processing
MMM2: mobile media metadata for photo sharing
Shane Ahern, Simon King, Marc Davis
Pages: 790-791
doi>10.1145/1101149.1101317

Though cameraphones are rapidly becoming the dominant platform for consumer digital photography, users still face difficulties in transferring, managing, and sharing photos captured with cameraphones. The Mobile Media Metadata 2 (MMM2) system removes ...
LazyCut: content-aware template-based video authoring
Xian-Sheng Hua, Zengzhi Wang, Shipeng Li
Pages: 792-793
doi>10.1145/1101149.1101318

Though there are many commercial video authoring tools available today, video authoring remains a tedious and extremely time-consuming task that often requires trained professional skills. To tackle this issue, this demonstration presents a novel ...
Simulated virtual market place by using voiscape communication medium
Yasusi Kanada
Pages: 794-795
doi>10.1145/1101149.1101319

We are developing a new voice communication medium called voiscape. Voiscape enables natural and seamless bi-directional voice communication by using sound to create a virtual sound room. In a sound room, people can feel others' direction ...
Perceptual media compression for multiple viewers with feedback delay
Oleg Komogortsev, Javed Khan
Pages: 796-797
doi>10.1145/1101149.1101320

Human eyes have limited perception capabilities; for example, only 2 degrees of our 140 degree vision field provide the highest quality of perception. Due to this fact the idea of perceptual focus emerged to allow a visual content to be changed in a ...
MobiCon: integrated capture, annotation, and sharing of video clips with mobile phones
Janne Lahti, Utz Westermann, Marko Palola, Johannes Peltola, Elena Vildjiounaite
Pages: 798-799
doi>10.1145/1101149.1101321

This paper presents MobiCon, a video production tool for mobile camera phones. MobiCon integrates video clip capture with context-aware, personalized clip annotation, supporting automatic annotation suggestions based on context data and efficient manual ...
Media processing workflow design and execution with ARIA
Lina Peng, Gisik Kwon, K. Selçuk Candan, Kyung Ryu, Karam Chatha, Hari Sundaram, Yinpeng Chen
Pages: 800-801
doi>10.1145/1101149.1101322

Recently, we introduced a novel ARchitecture for Interactive Arts (ARIA) middleware that processes, filters, and fuses sensory inputs and actuates responses in real-time while providing various Quality of Service (QoS) guarantees. The objective of ARIA ...
Context-driven smart authoring of multimedia content with xSMART
Ansgar Scherp, Susanne Boll
Pages: 802-803
doi>10.1145/1101149.1101323

In recent years, many highly sophisticated multimedia authoring tools have been developed. To date, however, these systems' integration of the targeted user context is limited. With our Context-aware Smart Multimedia Authoring Tool (xSMART) ...
Video inpainting and restoration techniques
Rong-Chi Chang, Louis H. Lin, Chia-Ton Tian, Timothy K. Shih
Pages: 804-805
doi>10.1145/1101149.1101324

Aged films may contain defects such as spikes or dirt, as well as long vertical defect lines. These defects were produced in film development or due to improper maintenance of films. We present a series of algorithms that can detect and restore defects. ...
Reading SCORM compliant multimedia courses using heterogeneous pervasive devices
Te-Hua Wang, Hsuan-Pu Chang, Yun-Long Sie, Kun-Han Chan, Mon-Tin Tzou, Timothy K. Shih
Pages: 806-807
doi>10.1145/1101149.1101325

The Sharable Content Object Reference Model (SCORM) provides some important representation for distance learning content and the learning behavior. In general, SCORM-compliant learning content can be viewed via Web browsers. In this paper, we built ...
An automated end-to-end lecture capturing and broadcasting system
Cha Zhang, Jim Crawford, Yong Rui, Li-wei He
Pages: 808-809
doi>10.1145/1101149.1101326

We present a complete end-to-end system that is fully automated and supports capturing, broadcasting, viewing, archiving and search. Specifically, we describe a system architecture that minimizes the pre- and post-production time, and a fully automated ...
SESSION: Content 5: video abstraction
Scenario based dynamic video abstractions using graph matching
JeongKyu Lee, JungHwan Oh, Sae Hwang
Pages: 810-819
doi>10.1145/1101149.1101328

In this paper, we present scenario based dynamic video abstractions using graph matching. Our approach has two main components: multi-level scenario generations and dynamic video abstractions. Multi-level scenarios are generated by a graph-based video ...
Evaluation of video summarization for a large number of cameras in ubiquitous home
Gamhewage C. de Silva, Toshihiko Yamasaki, Kiyoharu Aizawa
Pages: 820-828
doi>10.1145/1101149.1101329

A system for video summarization in a ubiquitous environment is presented. Data from pressure-based floor sensors are clustered to segment footsteps of different persons. Video handover has been implemented to retrieve a continuous video showing a person ...
SESSION: Systems 2: mobility and video
Can small be beautiful?: assessing image resolution requirements for mobile TV
Hendrik Knoche, John D. McCarthy, M. Angela Sasse
Pages: 829-838
doi>10.1145/1101149.1101331

Mobile TV services are now being offered in several countries, but for cost reasons, most of these services offer material directly recoded for mobile consumption (i.e. without additional editing). The experiment reported in this paper aims to assess ...
Chameleon: application level power management with performance isolation
Xiaotao Liu, Prashant Shenoy, Mark Corner
Pages: 839-848
doi>10.1145/1101149.1101332

In this paper, we present Chameleon, an application-level power management approach for reducing energy consumption in mobile processors. Our approach exports the entire responsibility of power management decisions to the application level. We propose ...
SESSION: Open source software competition
OpenVIDIA: parallel GPU computer vision
James Fung, Steve Mann
Pages: 849-852
doi>10.1145/1101149.1101334

Graphics and vision are approximate inverses of each other: ordinarily Graphics Processing Units (GPUs) are used to convert "numbers into pictures" (i.e. computer graphics). In this paper, we propose using GPUs in approximately the reverse way: to assist ...
SESSION: Content 6: multimodal processing
Generation of views of TV content using TV viewers' perspectives expressed in live chats on the web
Hisashi Miyamori, Satoshi Nakamura, Katsumi Tanaka
Pages: 853-861
doi>10.1145/1101149.1101336

We propose a method of generating views of TV programs based on viewers' perspectives expressed in live chats on the Web. Important scenes in a program and responses by particular viewers can be extracted efficiently by statistically computing and/or ...
Graph based multi-modality learning
Hanghang Tong, Jingrui He, Mingjing Li, Changshui Zhang, Wei-Ying Ma
Pages: 862-871
doi>10.1145/1101149.1101337

To better understand the content of multimedia, considerable research effort has been devoted to learning from multi-modal features. In this paper, the problem is studied from a graph point of view: each kind of feature from one modality is represented as one independent ...
Multimodal metadata fusion using causal strength
Yi Wu, Edward Y. Chang, Belle L. Tseng
Pages: 872-881
doi>10.1145/1101149.1101338

We propose a probabilistic framework that uses influence diagrams to fuse metadata of multiple modalities for photo annotation. We fuse contextual information (location, time, and camera parameters), visual content (holistic and local perceptual features), ...
Automatic discovery of query-class-dependent models for multimodal search
Lyndon S. Kennedy, Apostol (Paul) Natsev, Shih-Fu Chang
Pages: 882-891
doi>10.1145/1101149.1101339

We develop a framework for the automatic discovery of query classes for query-class-dependent search models in multimodal retrieval. The framework automatically discovers useful query classes by clustering queries in a training set according to the performance ...
SESSION: Applications 3: tools for multimedia analysis and retrieval
A web-based system for collaborative annotation of large image and video collections: an evaluation and user study
Timo Volkmer, John R. Smith, Apostol (Paul) Natsev
Pages: 892-901
doi>10.1145/1101149.1101341

Annotated collections of images and videos are a necessary basis for the successful development of multimedia retrieval systems. The underlying models of such systems rely heavily on quality and availability of large training collections. The annotation ...
Putting active learning into multimedia applications: dynamic definition and refinement of concept classifiers
Ming-yu Chen, Michael Christel, Alexander Hauptmann, Howard Wactlar
Pages: 902-911
doi>10.1145/1101149.1101342

The authors developed an extensible system for video exploitation that puts the user in control to better accommodate novel situations and source material. Visually dense displays of thumbnail imagery in storyboard views are used for shot-based video ...
Automatic measurement of quality metrics for colonoscopy videos
Sae Hwang, JungHwan Oh, JeongKyu Lee, Yu Cao, Wallapak Tavanapong, Danyu Liu, Johnny Wong, Piet C. de Groen
Pages: 912-921
doi>10.1145/1101149.1101343

Colonoscopy is the accepted screening method for detection of colorectal cancer or its precursor lesions, colorectal polyps. Indeed, colonoscopy has contributed to a decline in the number of colorectal cancer related deaths. However, not all cancers ...
SESSION: Interactive arts 3: interaction in social and virtual environments
Censor chair: exploring censorship and social presence through psychophysiological sensing
Eric Aley, Trina Cooper, Ross Graeber, Andruid Kerne, Kyle Overby, Zachary O. Toups
Pages: 922-929
doi>10.1145/1101149.1101345

In this paper, we describe Censor Chair, an art installation that creates a shared experience addressing forms of censorship including self-censorship, censorship of a group upon an individual, visual and auditory censorship in digital media, ...
Tensegric mobile controlled by pseudo forces
Kazuya G. Kobayashi, Taro Ichizawa, Koichi Nakano, Katsutoshi Ootsubo
Pages: 930-936
doi>10.1145/1101149.1101346

A tensegric mobile in virtual 3D space is introduced. An input model is a triangular mesh B-rep designed by an artist, which is allowed to have an arbitrary topology. The tensegric structure is automatically generated from a mesh model as a deforming ...
Echology: an interactive spatial sound and video artwork
Meghan Deutscher, Reynald Hoskinson, Sachiyo Takashashi, Sidney Fels
Pages: 937-945
doi>10.1145/1101149.1101347

We present a novel way of manipulating a spatial soundscape, one that encourages collaboration and exploration. Through a table-top display surrounded by speakers and lights, participants are invited to engage in peaceful play with Beluga whales shown ...
SESSION: Systems 3: searching and streaming
PRISM: indexing multi-dimensional data in P2P networks using reference vectors
O. D. Sahin, A. Gulbeden, F. Emekci, D. Agrawal, A. El Abbadi
Pages: 946-955
doi>10.1145/1101149.1101349

Peer-to-peer (P2P) systems research has gained considerable attention recently with the increasing popularity of file sharing applications. Since these applications are used for sharing huge amounts of data, it is very important to efficiently locate ...
Supporting multimedia streaming between mobile peers with link availability prediction
Min Qin, Roger Zimmermann, Leslie S. Liu
Pages: 956-965
doi>10.1145/1101149.1101350

Numerous types of mobile devices are now popular with end users, who increasingly use them to carry multimedia content on the go. As wireless connectivity is integrated with more handhelds, streaming multimedia content among mobile peers is becoming ...
Scalable media streaming to interactive users
Marcus Rocha, Marcelo Maia, Ítalo Cunha, Jussara Almeida, Sérgio Campos
Pages: 966-975
doi>10.1145/1101149.1101351

Recently, a number of scalable stream sharing protocols have been proposed with the promise of great reductions in the server and network bandwidth required for delivering popular media content. Although the scalability of these protocols has been evaluated ...
SESSION: Applications 4: interactive multimedia systems
Digital violin tutor: an integrated system for beginning violin learners
Jun Yin, Ye Wang, David Hsu
Pages: 976-985
doi>10.1145/1101149.1101353

Prompt feedback is essential for beginning violin learners; however, most amateur learners can only meet with teachers and receive feedback once or twice a week. To help such learners, we have attempted an initial design of Digital Violin Tutor (DVT), ...
Pervasive views: area exploration and guidance using extended image media
Jiang Yu Zheng, Xiaolong Wang
Pages: 986-995
doi>10.1145/1101149.1101354

This work achieves full registration of scenes in a large area and creates visual indexes for visualization in a digital city. We explore effective mapping, indexing, and display of scenes so that an area becomes "visible". Users can virtually navigate ...
A flexible system for creating music while interacting with the computer
Zeljko Obrenovic
Pages: 996-1004
doi>10.1145/1101149.1101355

Music is a very important part of our lives. People enjoy listening to music, and many of us find a special pleasure in creating it. Computers have further extended many aspects of our musical experience. Listening to, recording, and creating music ...
SESSION: Brave new topics 3: advanced methods for medical image retrieval & applications
Data grid for large-scale medical image archive and analysis
H. K. Huang, Aifeng Zhang, Brent Liu, Zheng Zhou, Jorge Documet, Nelson King, L. W. C. Chan
Pages: 1005-1013
doi>10.1145/1101149.1101357

Storage and retrieval technology for large-scale medical image systems has matured significantly during the past ten years but many implementations still lack cost-effective backup and recovery solutions. As an example, a PACS (Picture Archiving and ...
Evaluation axes for medical image retrieval systems: the imageCLEF experience
Henning Müller, Paul Clough, William Hersh, Thomas Deselaers, Thomas Lehmann, Antoine Geissbuhler
Pages: 1014-1022
doi>10.1145/1101149.1101358

Content-based image retrieval in the medical domain is an extremely hot topic in medical imaging as it promises to help better manage the large amount of medical images being produced. Applications are mainly expected in the field of medical teaching ...
MultiPRE: a novel framework with multiple parallel retrieval engines for content-based image retrieval
Wei Xiong, Bo Qiu, Qi Tian, Changsheng Xu, S. H. Ong, Kelvin Foong, Jean-Pierre Chevallet
Pages: 1023-1032
doi>10.1145/1101149.1101359

We propose a novel framework for content-based image retrieval with multiple parallel retrieval engines (MultiPRE) to achieve higher retrieval performance. Visual features, including both low-level features, such as color, texture and region features, ...
SESSION: Doctoral symposium 1
Ontology-driven content search for personalized education
Apple W. P. Fok
Pages: 1033-1034
doi>10.1145/1101149.1101361

Striving towards our education vision, Personalized Education, a Personalized Education System (PES) framework has been proposed [3] to exploit the vast amount of multimedia learning content on the Web. PEOnto, a fundamental component of PE, composes ...
Content-based video indexing for sports applications using integrated multi-modal approach
Dian Tjondronegoro, Yi-Ping Phoebe Chen, Binh Pham
Pages: 1035-1036
doi>10.1145/1101149.1101362

To sustain an ongoing rapid growth of video information, there is an emerging demand for a sophisticated content-based video indexing system. However, current video indexing solutions are still immature and lack any standard. This doctoral consists ...
Designing time-based interactions with multimedia
Eric Lee
Pages: 1037-1038
doi>10.1145/1101149.1101363

The current model of time in multimedia frameworks poses particular problems when designing multimedia systems with time-based interaction. We propose to expand and extend an existing distinction between semantic time and real time from music and film ...
Estimating illumination parameters in real space with application to image relighting
Feng Xie, Linmi Tao
Pages: 1039-1040
doi>10.1145/1101149.1101364
Game state and event distribution using proxy technology and application layer multicast
Knut-Helge Vik
Pages: 1041-1042
doi>10.1145/1101149.1101365
SESSION: Doctoral symposium 2
Multimodal analysis of recorded video for e-learning
Thomas Martin, Alain Boucher, Jean-Marc Ogier
Pages: 1043-1044
doi>10.1145/1101149.1101367

In this paper, we present a model for multimodal content analysis. We distinguish between media and modality, which helps us to define and characterize three inter-modal relations. Then we apply this model for recorded course analysis for e-learning. ...
Enhancing quality of service by exploiting delay tolerance in multimedia applications
Saraswathi Krithivasan, Sridhar Iyer
Pages: 1045-1046
doi>10.1145/1101149.1101368
Threading stories and generating topic structures in news videos across different sources
Xiao Wu
Pages: 1047-1048
doi>10.1145/1101149.1101369

News videos delivered from different sources constitute a huge volume of daily information. These videos, overall, form a huge collection of news stories that are intertwined with various novel and old topic themes. To date, it remains a challenging ...
uPen: laser-based, personalized, multi-user interaction on large displays
Xiaojun Bi, Yuanchun Shi, Xiaojie Chen, PeiFeng Xiang
Pages: 1049-1050
doi>10.1145/1101149.1101370

We present the uPen, a laser pointer combined with a contact-pushed switch, three press buttons and a wireless communication module. This novel interaction device allows users to interact on large displays at a distance or directly on the surface with ...
SESSION: ACM multimedia art exhibition
ACM multimedia interactive art program: an introduction to the presence/absence exhibition
Alejandro Jaimes, Andrew Senior, Wolfgang Muench
Pages: 1051-1052
doi>10.1145/1101149.1101372

The second ACM Multimedia Art program followed the successful formula used in ACM MM 2005, consisting of a session of long papers, a selection of posters and an art exhibition of multimedia works displayed at a gallery for a period encompassing the conference ...
The bomar gene: fictiobiography, digiart, hypertext
Jason Nelson
Pages: 1053-1054
doi>10.1145/1101149.1101373

The Bomar Gene [6] is a new media, digital fiction hybrid that explores the speculative concept that within us, the codes governing our bodies, is a single unique gene. This speculative gene gives each person an individualized ability, a singular talent. ...
Non_sensor
Raquel Rennó, Rafael Marchetti, Gonzague Defos du Rau
Pages: 1055-1056
doi>10.1145/1101149.1101374

The paper presents Non_sensor, a digital art project that makes use of a Polhemus motion tracking system to create an electromagnetic field which is disturbed by metallic objects that are manipulated by the visitors in the installation. This disturbance ...
Playas: homeland mirage
Jack Stenner, Andruid Kerne, Yauger Williams
Pages: 1057-1058
doi>10.1145/1101149.1101375

This paper describes an interactive installation that addresses issues of presence and absence by creating a virtualized representation of the abandoned town, Playas, New Mexico. This town is slated for conversion into an anti-terrorism training facility ...
'Ere be dragons: an interactive artwork
Stephen Boyd Davis, Magnus Moar, John Cox, Chris Riddoch, Karl Cooke, Rachel Jacobs, Matt Watkins, Richard Hull, Tom Melamed
Pages: 1059-1060
doi>10.1145/1101149.1101376

The paper introduces a pervasive digital artwork which harnesses live heart-rate and GPS data to create a novel experience on a Pocket PC. The aims of the project, the technologies employed and the results of a preliminary trial are briefly described.
Tastes like...
Miha Ciglar
Pages: 1061-1062
doi>10.1145/1101149.1101377

"Tastes Like..." (a composition for two monitors, mixing board and human body) is an interactive audiovisual work implemented without computers and common sound/picture - synthesis/processing techniques but exclusively with low-tech analogue equipment. ...
Immersing ME: the disappearing digitized presence
Yu-Chuan Tseng, Chia-Hsiang Lee
Pages: 1063-1064
doi>10.1145/1101149.1101378

Artists try to represent existence and indicate the meaning of presence by a variety of practices. However, when a figure is created by bits and composes hyperreal information, the existence information has become fragments of fake existence. Presence ...
Art exhibition: impossible geographies 01
Petra Gemeinboeck, Mary Agnes Krell
Pages: 1065-1066
doi>10.1145/1101149.1101379

Impossible Geographies 01: Memory is an interactive installation in which memory becomes the metaphor for the fluid boundaries between the physical and the virtual. It dynamically traces visitors' actions and mixes them in unexpected ways with ...
Vanishing point
Mauricio Arango
Pages: 1067-1068
doi>10.1145/1101149.1101380

Vanishing Point is a presentation of the world as it responds to international newspaper coverage - not a measure of what the world is, but of what is most newsworthy. Consequently, countries that receive less media coverage gradually disappear ...
Interactions: an interactive multimedia installation
David Birchfield
Pages: 1069-1070
doi>10.1145/1101149.1101381

Interactions is an interactive multimedia installation designed and realized by the author. The installation utilizes two neural network artist agents that act as virtual artists to manipulate a body of images, texts, and sounds collected from ...
Body degree zero
Alan Dunning, Paul Woodrow, Morley Hollenberg
Pages: 1071-1072
doi>10.1145/1101149.1101382

The Einstein's Brain Project is a collaborative group of artists and scientists who have been working together for the past 9 years. A central aim of the group is the visualization of the biological state of the body through the fabrication of environments, ...
SmallConnection: designing of tangible communication media over networks
Hideaki Ogawa, Noriaki Ando, Satoshi Onodera
Pages: 1073-1074
doi>10.1145/1101149.1101383

The concept of "SmallConnection (abbr. SC)" is creating easy to operate tangible media for communication over networks. Focusing on the scenario where two intimate people live in distant places, we developed communication media that can be handled like ...
The king has...
Krister Olsson, Takashi Kawashima
Pages: 1075-1076
doi>10.1145/1101149.1101384

The installation The King Has... grew out of a desire to explore the ways in which people react to knowing the secrets of others, and if anonymity were guaranteed, the kinds of secrets people would choose to make public. Secrets were gathered from ...
diorama table
Keiko Takahashi, Shinji Sasada
Pages: 1077-1078
doi>10.1145/1101149.1101385

"diorama table" is an interactive table installation. People place physical objects on the table, and projected elements such as trains, cars, houses, and trees appear and interact with the physical objects.
"KODAMA": mischievous echoes
Hisako Kroiden Yamakawa
Pages: 1079-1080
doi>10.1145/1101149.1101386

I created "KODAMA" to demonstrate my sensation of solidified human voices in conversation. "KODAMA" is an interactive installation. The "KODAMA" are tree fairies that live in the forest who listen to human voices and mimic their sounds. They are visually ...
Seven mile boots
Martin Pichlmair
Pages: 1081-1081
doi>10.1145/1101149.1101387

With seven-league boots through the Internet - when you take a stroll through the physical world in this wireless LAN footwear, you just might meet people who happen to be spending some time in a chat-room. Their virtual conversations are made audible ...
Tangible weather channel
Yu-Cheng Hsu
Pages: 1082-1083
doi>10.1145/1101149.1101388

Tangible Weather Channel is an interactive sculptural apparatus that enables the participant to type in the remote location of a loved one and interprets its real-time weather information as a way of creating an emotional connection. Rather than ...



The ACM Digital Library is published by the Association for Computing Machinery. Copyright © 2016 ACM, Inc.