MPEG-7 video automatic labeling system
Ching-Yung Lin, Belle L. Tseng, Milind Naphade, Apostol Natsev, John R. Smith

ABSTRACT

In this demo, we show a novel end-to-end automatic video labeling system that accepts MPEG-1 sequences as input and generates MPEG-7 XML metadata files. Detections are based on previously established anchor models. The system has two parts, a model training process and a labeling process, which together comprise seven modules: Shot Segmentation, Region Segmentation, Annotation, Feature Extraction, Model Learning, Classification, and XML Rendering.
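To make the pipeline concrete, the sketch below chains the labeling-pass modules in the order listed above, assuming the training-side modules (Annotation and Model Learning) have already produced the anchor models. All names (segment_shots, classify_shot, render_mpeg7), the stub logic, and the simplified XML output are illustrative assumptions, not the actual VideoAL implementation.

```python
# Hypothetical sketch of the labeling pass described in the abstract.
# Module names, signatures, and the stub logic are illustrative only.
from dataclasses import dataclass
from typing import Dict, List
import xml.etree.ElementTree as ET


@dataclass
class Shot:
    start_frame: int
    end_frame: int
    labels: List[str]


def segment_shots(mpeg1_path: str) -> List[Shot]:
    """Shot Segmentation: split the MPEG-1 sequence into shots (stub)."""
    return [Shot(0, 299, []), Shot(300, 599, [])]


def classify_shot(shot: Shot, anchor_models: Dict[str, float]) -> List[str]:
    """Region Segmentation, Feature Extraction, and Classification against
    previously trained anchor (concept) models, collapsed into one stub that
    compares a dummy confidence of 0.5 against each model's threshold."""
    return [name for name, threshold in anchor_models.items() if 0.5 >= threshold]


def render_mpeg7(shots: List[Shot]) -> str:
    """XML Rendering: emit a minimal MPEG-7-style description (illustrative)."""
    root = ET.Element("Mpeg7")
    video = ET.SubElement(root, "VideoSegment")
    for shot in shots:
        seg = ET.SubElement(video, "Segment",
                            start=str(shot.start_frame), end=str(shot.end_frame))
        for label in shot.labels:
            ET.SubElement(seg, "Label").text = label
    return ET.tostring(root, encoding="unicode")


def label_video(mpeg1_path: str, anchor_models: Dict[str, float]) -> str:
    """Labeling pass: segment the video, classify each shot, render MPEG-7 XML."""
    shots = segment_shots(mpeg1_path)
    for shot in shots:
        shot.labels = classify_shot(shot, anchor_models)
    return render_mpeg7(shots)


if __name__ == "__main__":
    # Dummy anchor models: concept name -> decision threshold (assumed values).
    print(label_video("news.mpg", {"anchor_person": 0.3, "outdoors": 0.7}))
```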



AUTHORS



Ching-Yung Lin


Bibliometrics: publication history
Publication years: 2001-2016
Publication count: 47
Citation Count: 480
Available for download: 27
Downloads (6 Weeks): 94
Downloads (12 Months): 888
Downloads (cumulative): 15,999
Average downloads per article: 592.56
Average citations per article: 10.21


Belle L. Tseng


Bibliometrics: publication history
Publication years: 1978-2012
Publication count: 58
Citation Count: 1,210
Available for download: 39
Downloads (6 Weeks): 215
Downloads (12 Months): 3,145
Downloads (cumulative): 54,156
Average downloads per article: 1,388.62
Average citations per article: 20.86


Milind Naphade


Bibliometrics: publication history
Publication years: 2000-2014
Publication count: 33
Citation Count: 602
Available for download: 16
Downloads (6 Weeks): 19
Downloads (12 Months): 480
Downloads (cumulative): 8,247
Average downloads per article: 515.44
Average citations per article: 18.24


Apostol Natsev


Bibliometrics: publication history
Publication years: 1999-2013
Publication count: 33
Citation Count: 517
Available for download: 20
Downloads (6 Weeks): 24
Downloads (12 Months): 271
Downloads (cumulative): 12,468
Average downloads per article: 623.40
Average citations per article: 15.67


John R. Smith


Bibliometrics: publication history
Publication years: 1994-2015
Publication count: 98
Citation Count: 1,889
Available for download: 37
Downloads (6 Weeks): 68
Downloads (12 Months): 995
Downloads (cumulative): 25,315
Average downloads per article: 684.19
Average citations per article: 19.28

REFERENCES


 
1. A. Amir et al., "CueVideo Toolkit Version 2.1," http://www.almaden.ibm.com/cs/cuevideo/
2. C.-Y. Lin, B. Tseng, and J. Smith, "VideoAnnEx: IBM MPEG-7 Annotation Tool for Multimedia Indexing and Concept Learning," Proc. of ICME, Baltimore, Jul. 2003.
3. C.-Y. Lin, B. Tseng, M. Naphade, A. Natsev, and J. Smith, "VideoAL: A Novel End-to-End MPEG-7 Automatic Labeling System," Proc. of ICIP, Barcelona, Sept. 2003.
4. A. F. Smeaton and P. Over, "The TREC-2002 Video Track Report," Proc. of Text Retrieval Conference, Gaithersburg, Maryland, Nov. 2002.

CITED BY

INDEX TERMS

The ACM Computing Classification System (CCS rev.2012)


PUBLICATION

Title: MULTIMEDIA '03 - Proceedings of the eleventh ACM international conference on Multimedia
General Chairs: Lawrence Rowe (University of California, Berkeley); Harrick Vin (University of Texas, Austin)
Program Chairs: Thomas Plagemann (University of Oslo); Prashant Shenoy (University of Massachusetts, Amherst); John R. Smith (IBM T.J. Watson Research Center)
Pages: 98-99
Publication Date: 2003-11-02 (yyyy-mm-dd)
Sponsors: SIGGRAPH (ACM Special Interest Group on Computer Graphics and Interactive Techniques); SIGCOMM (ACM Special Interest Group on Data Communication); SIGMULTIMEDIA (ACM Special Interest Group on Multimedia); ACM (Association for Computing Machinery)
Publisher: ACM, New York, NY, USA ©2003
ISBN: 1-58113-722-2; Order Number: 433031; DOI: 10.1145/957013.957033
Conference: MM - International Multimedia Conference
Paper Acceptance Rate: 43 of 255 submissions, 17%
Overall Acceptance Rate: 1,375 of 5,525 submissions, 25%
Year Submitted Accepted Rate
MULTIMEDIA '97 142 40 28%
MULTIMEDIA '02 330 46 14%
MULTIMEDIA '03 255 43 17%
MULTIMEDIA '04 331 55 17%
MULTIMEDIA '05 312 49 16%
MULTIMEDIA '06 292 48 16%
MULTIMEDIA '07 298 57 19%
MM '08 516 136 26%
MM '09 305 50 16%
MM '10 974 396 41%
MM '11 666 230 35%
MM '12 331 67 20%
MM '13 235 47 20%
MM '14 286 55 19%
MM '15 252 56 22%
Overall 5,525 1,375 25%

APPEARS IN
Applications
Software
Interaction
Networking
Digital Content

