Building "bows for violinists": designing real digital tools for working artists
Michael B. Johnson
Pages: 1-1
doi: 10.1145/957013.957014

When asked what we do at Pixar Studio Tools R&D, we often use a story Marvin Minsky told about picking up a violin to make music versus a CD player. One is designed for ease of use, while the other demands and rewards a lifetime of study and practice. Both have their place when you want to hear music, but only one of them is helpful in making music. In Pixar R&D, we build bows for our violinists here at Pixar. Designing and building digital tools for artists working on large collaborative projects such as feature films poses many challenges, some unique to a collaborative enterprise and some not. I will discuss several of these challenges, using examples drawn from both successful and unsuccessful projects.

SESSION: Content analysis

Foreground object detection from videos containing complex background
Liyuan Li, Weimin Huang, Irene Y. H. Gu, Qi Tian
Pages: 2-10
doi: 10.1145/957013.957017

This paper proposes a novel method for detecting and segmenting foreground objects from a video that contains both stationary and moving background objects and undergoes both gradual and sudden "once-off" changes. A Bayes decision rule for classifying background and foreground from selected feature vectors is formulated. Under this rule, different types of background objects can be distinguished from foreground objects by choosing a proper feature vector: stationary background objects are described by color features, while moving background objects are represented by color co-occurrence features. Foreground objects are extracted by fusing the classification results from both stationary and moving pixels. Learning strategies for gradual and sudden "once-off" background changes are proposed to adapt to various changes in background throughout the video. The convergence of the learning process is proved, and a formula for selecting a proper learning rate is derived. Experiments have shown promising results in extracting foreground objects from many complex backgrounds, including wavering tree branches, flickering screens and water surfaces, moving escalators, opening and closing doors, switching lights, and shadows of moving objects.
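
A minimal sketch of the Bayes decision rule described above, assuming the class-conditional feature histograms have already been learned. The table layout, smoothing constant, and prior below are hypothetical placeholders, not the authors' learned models:

```python
def classify_pixel(v, hist_bg, hist_fg, p_bg=0.8):
    """Label a quantized feature vector as background or foreground using
    the Bayes rule: background iff P(v|b)P(b) > P(v|f)P(f).
    hist_bg / hist_fg: dicts mapping feature tuples to learned likelihoods."""
    p_v_bg = hist_bg.get(tuple(v), 1e-6)   # P(v | background), smoothed
    p_v_fg = hist_fg.get(tuple(v), 1e-6)   # P(v | foreground), smoothed
    return "background" if p_v_bg * p_bg > p_v_fg * (1 - p_bg) else "foreground"
```

In the paper's scheme the feature vector v would be a color feature at stationary pixels and a color co-occurrence feature at moving pixels; this sketch only shows the shared decision step.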

Trajectory-based ball detection and tracking with applications to semantic analysis of broadcast soccer video
Xinguo Yu, Changsheng Xu, Hon Wai Leong, Qi Tian, Qing Tang, Kong Wah Wan
Pages: 11-20
doi: 10.1145/957013.957018

This paper first presents an improved trajectory-based algorithm for automatically detecting and tracking the ball in broadcast soccer video. Unlike object-based algorithms, our algorithm does not evaluate whether a single object is a ball. Instead, it evaluates whether a candidate trajectory, generated from the candidate feature image by a Kalman-filter-based candidate verification procedure, is a ball trajectory. Secondly, a new approach for automatically analyzing broadcast soccer video is proposed, based on the ball trajectory. The algorithms in this approach not only improve play-break analysis and high-level semantic event detection, but also detect basic actions and analyze team ball possession, which cannot be determined from low-level features alone. Moreover, experimental results show that our ball detection and tracking algorithm achieves above 96% accuracy on video segments showing the soccer field. Compared with existing methods, higher accuracy is achieved on goal detection and play-break segmentation. To the best of our knowledge, we present the first solution for detecting basic actions such as touching and passing, and for analyzing team ball possession, in broadcast soccer video.
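
To make the trajectory-verification idea concrete, here is a sketch of growing a candidate trajectory with a constant-velocity Kalman filter and a validation gate. It illustrates the general technique named in the abstract, not the paper's exact procedure; all parameters are illustrative:

```python
import numpy as np

def verify_trajectory(candidates, gate=20.0):
    """Extend a candidate ball trajectory while each new candidate position
    falls inside the gate around the Kalman prediction.
    candidates: list of (x, y) detections in consecutive frames."""
    F = np.array([[1, 0, 1, 0], [0, 1, 0, 1],
                  [0, 0, 1, 0], [0, 0, 0, 1]], float)   # constant-velocity model
    H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], float)   # we observe position only
    x = np.array([*candidates[0], 0.0, 0.0])            # state [x, y, vx, vy]
    P = np.eye(4) * 100.0                               # initial uncertainty
    Q, R = np.eye(4) * 0.1, np.eye(2) * 4.0             # process / measurement noise
    track = [candidates[0]]
    for z in candidates[1:]:
        x, P = F @ x, F @ P @ F.T + Q                   # predict next position
        if np.linalg.norm(np.asarray(z) - H @ x) > gate:
            break                                        # motion is not ball-like
        S = H @ P @ H.T + R                              # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)                   # Kalman gain
        x = x + K @ (np.asarray(z) - H @ x)              # correct with measurement
        P = (np.eye(4) - K @ H) @ P
        track.append(z)
    return track
```

A trajectory that survives many such gated updates is far more likely to be the ball than any single round white blob, which is the core advantage over object-based evaluation.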

Supporting timeliness and accuracy in distributed real-time content-based video analysis
Viktor S. Wold Eide, Frank Eliassen, Ole-Christoffer Granmo, Olav Lysne
Pages: 21-32
doi: 10.1145/957013.957019

Real-time content-based access to live video data requires content analysis applications that are able to process the video data at least as fast as it is made available to the application, and with an acceptable error rate. Statements such as this express quality of service (QoS) requirements on the application. To provide some level of control over the QoS delivered, a video content analysis application must be scalable and resource-aware so that requirements of timeliness and accuracy can be met by allocating additional processing resources. In this paper we present a general architecture for video content analysis applications, including a model for specifying requirements of timeliness and accuracy. The salient features of the architecture are its combination of probabilistic knowledge-based media content analysis with QoS and distributed resource management to handle QoS requirements, and its independent scalability at multiple logical levels of distribution. We also present experimental results with an algorithm for QoS-aware selection of configurations of feature extractors and classification algorithms, which can be used to balance requirements of timeliness and accuracy against available processing resources. Experiments with an implementation of a real-time motion-vector-based object-tracking application demonstrate the scalability of the architecture.
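
The QoS-aware configuration selection can be illustrated with a tiny greedy sketch: among profiled (extractor, classifier) configurations, keep those that meet the timeliness requirement and pick the most accurate. The tuple layout and profile data are assumptions for illustration, not the paper's algorithm:

```python
def select_configuration(configs, deadline_ms):
    """Pick the most accurate configuration whose estimated processing time
    meets the timeliness requirement.
    configs: list of (name, est_time_ms, est_accuracy) profiling entries."""
    feasible = [c for c in configs if c[1] <= deadline_ms]
    if not feasible:
        raise ValueError("no configuration meets the deadline; allocate more resources")
    return max(feasible, key=lambda c: c[2])   # maximize accuracy among feasible
```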

A mid-level representation framework for semantic sports video analysis
Ling-Yu Duan, Min Xu, Tat-Seng Chua, Qi Tian, Chang-Sheng Xu
Pages: 33-44
doi: 10.1145/957013.957020

Sports video has been widely studied due to its tremendous commercial potential. Despite encouraging results for various specific sports, it is almost impossible to extend a system to a new sport, because existing systems usually employ sets of low-level features appropriate only for specific games, closely coupled with game-specific rules for detecting events or highlights. They lack an internal representation and structure generic enough to apply across many different sports. In this paper, we present a generic mid-level representation framework for semantic sports video analysis. The mid-level representation layer is introduced between low-level audio-visual processing and high-level semantic analysis. It allows us to separate sport-specific knowledge and rules from low-level and mid-level feature extraction, which makes sports video analysis more efficient, effective, and less ad hoc across various types of sports. To achieve robust low-level feature analysis, a non-parametric clustering technique, the mean shift procedure, has been successfully applied to both color and motion analysis. The proposed framework has been tested on five field-ball sports covering about 8 hours of video. Experiments have shown robust performance in semantic analysis and event detection. We believe the proposed mid-level representation framework can be used for event detection, highlight extraction, summarization, and personalization of many types of sports video.

SESSION: Multimedia streaming and services

PROMISE: peer-to-peer media streaming using CollectCast
Mohamed Hefeeda, Ahsan Habib, Boyan Botev, Dongyan Xu, Bharat Bhargava
Pages: 45-54
doi: 10.1145/957013.957022

We present the design, implementation, and evaluation of PROMISE, a novel peer-to-peer media streaming system encompassing the key functions of peer lookup, peer-based aggregated streaming, and dynamic adaptation to network and peer conditions. In particular, PROMISE is based on a new application-level P2P service called CollectCast. CollectCast performs three main functions: (1) inferring and leveraging the underlying network topology and performance information for the selection of senders; (2) monitoring the status of peers and connections and reacting to peer/connection failure or degradation with low overhead; (3) dynamically switching active senders and standby senders, so that the collective network performance of the active senders remains satisfactory. Based on both real-world measurement and simulation, we evaluate the performance of PROMISE and discuss lessons learned from our experience with respect to the practicality and further optimization of PROMISE.
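
A hedged sketch of the active/standby sender-selection idea: activate the highest-bandwidth peers until their aggregate rate covers the stream, keeping the rest on standby. CollectCast's real selection also weighs inferred topology and loss, which this toy version omits; the peer-measurement dictionary is an assumed input:

```python
def select_senders(peers, stream_rate_kbps):
    """Greedy active/standby split for aggregated streaming.
    peers: {peer_id: offered_kbps} from prior measurements (hypothetical)."""
    active, total = [], 0.0
    for pid, bw in sorted(peers.items(), key=lambda kv: -kv[1]):
        if total >= stream_rate_kbps:
            break                      # aggregate already covers the stream
        active.append(pid)
        total += bw
    standby = [p for p in peers if p not in active]
    return active, standby             # standby peers replace failing actives
```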

Optimal streaming of layered video: joint scheduling and error concealment
Philippe de Cuetos, Keith W. Ross
Pages: 55-64
doi: 10.1145/957013.957023

We consider streaming layered video (live and stored) over a lossy packet network in order to maximize the video quality rendered at the receiver. We propose a framework, called joint scheduling and error concealment (Joint S+EC), in which packet scheduling decisions at the sender explicitly account for the error concealment mechanism at the receiver. We show how the theory of infinite-horizon, average-reward Markov decision processes (MDPs) with average-cost constraints can be applied to the joint scheduling and error concealment problem for low-delay transmission channels. The formulation allows for a wide variety of performance metrics, including metrics that take quality variation into account. We demonstrate the framework and MDP solution procedure using MPEG-4 FGS video traces. The main conclusions are that (1) optimal Joint S+EC policies perform better than optimal Disjoint S+EC policies, and (2) the performance of the optimal Disjoint S+EC policy is not significantly better than that of very simple static policies.
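
For readers unfamiliar with the MDP machinery, the sketch below shows plain value iteration on a finite MDP. The paper's formulation is infinite-horizon average-reward with average-cost constraints; this discounted, unconstrained version only illustrates the underlying computation, with toy transition and reward arrays assumed given:

```python
import numpy as np

def value_iteration(P, r, gamma=0.95, tol=1e-6):
    """Generic value iteration for a finite MDP.
    P[a][s, s2]: probability of moving s -> s2 under action a.
    r[a][s]:     expected reward of taking action a in state s."""
    n_states = P[0].shape[0]
    V = np.zeros(n_states)
    while True:
        # One-step lookahead Q-values for every (action, state) pair.
        Q = np.array([r[a] + gamma * P[a] @ V for a in range(len(P))])
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)   # optimal value and greedy policy
        V = V_new
```

In the paper's setting, states encode channel and playback conditions, actions choose which layer packets to send, and the reward captures rendered quality after concealment.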

Adaptive disk scheduling in a multimedia DBMS
Ketil Lund, Vera Goebel
Pages: 65-74
doi: 10.1145/957013.957024

In this paper, we present APEX, a disk scheduling framework with QoS support, designed for environments with highly varying disk bandwidth usage. In particular, we focus on a Learning-on-Demand scenario supported by a multimedia database management system, where students can search for and play back multimedia-based learning material. APEX is based on a two-level scheduling architecture, where the upper level realizes different service classes using a set of queues, while the lower level distributes available disk bandwidth among these queues. In this paper, we focus on the low-level scheduling in APEX, which is based on an extended token bucket algorithm. The disk requests scheduled for service are assembled into batches, which enables good disk efficiency. Combined with a very efficient work-conservation scheme, this allows APEX to apply bandwidth where it is needed, without loss of efficiency. We demonstrate, through simulations, that APEX provides both higher throughput and lower response times than other mixed-media disk schedulers, while still avoiding deadline violations for real-time requests. We also show its robustness with respect to misaligned bandwidth allocation.
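
A minimal token bucket of the kind the low-level APEX scheduler extends; the rate, depth, and per-request cost model below are illustrative assumptions, not the APEX implementation:

```python
class TokenBucket:
    """A queue may dispatch disk requests into the current batch only while
    it holds tokens; tokens refill at the queue's allocated bandwidth rate."""
    def __init__(self, rate_tokens_per_s, depth):
        self.rate, self.depth = rate_tokens_per_s, depth
        self.tokens, self.last = depth, 0.0

    def allow(self, now, cost=1.0):
        # Refill proportionally to elapsed time, capped at the bucket depth.
        self.tokens = min(self.depth, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True        # request may join the current batch
        return False           # defer; bandwidth belongs to other queues
```

A work-conserving extension would hand unused tokens to backlogged queues instead of idling the disk, which is the behavior the abstract credits for APEX's throughput.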

Comprehensive statistical admission control for streaming media servers
Roger Zimmermann, Kun Fu
Pages: 75-85
doi: 10.1145/957013.957025

Streaming media servers and digital continuous media recorders require the scheduling of I/O requests to disk drives in real time. There are two accepted paradigms to achieve this: deterministic and statistical. The deterministic approach must assume conservative bounds on disk parameters such as seek time, rotational latency, and transfer rate to guarantee the timely service of I/O requests. The statistical approach generally allows higher utilization of resources, in exchange for a residual probability of missed I/O request deadlines. We propose a novel statistical admission control algorithm called TRAC, based on a comprehensive three-random-variable (3RV) model, to support both reading and writing of multiple variable bit rate media streams on current-generation disk drives. Its major distinctions from previous work include (1) a very realistic disk model that considers multi-zoning of disks, seek and rotational latency profiles, and unequal reading and writing data rate limits, (2) a dynamic bandwidth sharing mechanism between reading and writing, and (3) support for random placement of data blocks. We evaluate the TRAC algorithm through an extensive numerical analysis and real device measurements. The results show that it achieves much more realistic resource utilization (up to 38% higher) compared with the best previously proposed algorithm, which is based on a single-random-variable (1RV) model. Most impressively, in all experiments the results generated by TRAC closely match the actual disk device measurements.
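
The flavor of statistical admission control can be shown with a one-random-variable sketch of the kind TRAC refines: model the total service time per scheduling round as approximately normal and admit new streams while the overload probability stays below a target. All parameters are hypothetical:

```python
from math import sqrt
from statistics import NormalDist

def admit(n_streams, mean_ms, var_ms2, period_ms, eps=0.01):
    """1RV-style statistical admission test (TRAC replaces the single random
    variable with three). Approximates total per-round service time as
    Normal(n * mean, n * var) and admits while P(overrun) < eps."""
    mu, sigma = n_streams * mean_ms, sqrt(n_streams * var_ms2)
    p_miss = 1.0 - NormalDist(mu, sigma).cdf(period_ms)
    return p_miss < eps
```

TRAC's gain comes from modeling seek time, rotational latency, and zone-dependent transfer rate as separate random variables instead of folding them into one, which tightens the estimated tail and admits more streams safely.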

DEMONSTRATION SESSION: Demonstration session 1

ARMS: adaptive rich media secure streaming
Lisa Amini, Raymond Rose, Chitra Venkatramani, Olivier Verscheure, Peter Westerink, Pascal Frossard
Pages: 86-87
doi: 10.1145/957013.957027

In this demonstration we present the ARMS system, which enables secure and adaptive rich media streaming to a large-scale, heterogeneous client population. The ARMS system dynamically adapts streams to available bandwidth, client capabilities, packet loss, and administratively imposed policies, all while maintaining full content security. The ARMS system is completely standards compliant and to our knowledge is the first such end-to-end MPEG-4-based system.

Active capture: automatic direction for automatic movies
Marc Davis, Jeffrey Heer, Ana Ramirez
Pages: 88-89
doi: 10.1145/957013.957028

The Active Capture demonstration is part of a new computational media production paradigm that transforms media production from a manual mechanical process into an automated computational one that can produce mass-customized and personalized media integrating video of non-actors. Active Capture leverages media production knowledge, computer vision and audition, and user interaction design to automate direction and cinematography, and thus enables the automatic production of annotated, high-quality, reusable media assets. The implemented system automates the process of capturing a non-actor performing two simple reusable actions ("screaming" and "turning her head to look at the camera") and automatically integrates those shots into various commercials and movie trailers.

Panoptes: scalable low-power video sensor networking technologies
Wu-chi Feng, Brian Code, Ed Kaiser, Mike Shea, Wu-chang Feng
Pages: 90-91
doi: 10.1145/957013.957029

This demonstration will show the video sensor networking technologies developed at the OGI School of Science and Engineering. The general-purpose video sensors allow programmers to create application-specific filtering, power management, and event triggering mechanisms. The demo will show a handful of video sensors operating under a variety of conditions, including intermittent network connectivity, as one might see in an environmental observation application.

Hyper-Hitchcock: authoring interactive videos and generating interactive summaries
Andreas Girgensohn, Frank Shipman, Lynn Wilcox
Pages: 92-93
doi: 10.1145/957013.957030

To simplify the process of editing interactive video, we developed the concept of "detail-on-demand" video as a subset of general hypervideo. Detail-on-demand video keeps the authoring and viewing interfaces relatively simple while supporting a wide range of interactive video applications. Our editor, Hyper-Hitchcock, provides a direct manipulation environment in which authors can combine video clips and place hyperlinks between them. To summarize a video, Hyper-Hitchcock can also automatically generate a hypervideo composed of multiple video summary levels and navigational links between these summaries and the original video. Viewers may interactively select the amount of detail they see, access more detailed summaries, and navigate to the source video through the summary.

The video paper multimedia playback system
Jamey Graham, Berna Erol, Jonathan J. Hull, Dar-Shyang Lee
Pages: 94-95
doi: 10.1145/957013.957031

Video Paper is a prototype system for multimedia browsing, analysis, and replay. Key frames extracted from a video recording are printed on paper together with bar codes that allow for random access and replay. A transcript of the audio track can also be shown so that users can read what was said, thus making the document a stand-alone representation of the contents of the multimedia recording. The Video Paper system has been used for several applications, including the analysis of recorded meetings, broadcast news, oral histories, and personal recordings. This demonstration will show how the Video Paper system was applied to these domains and the various replay systems that were developed, including a self-contained portable implementation on a PDA and a fixed implementation on a desktop PC.

Mobile video stream monitoring system
Kam-Yiu Lam, Calvin K. H. Chiu
Pages: 96-97
doi: 10.1145/957013.957032

IMVS (Intelligent Mobile Video Stream Monitoring System) is a mobile video surveillance system. The objective of IMVS is to design a high-performance video stream monitoring system in a mobile computing environment. In particular, the technical questions to be addressed are: (1) how to minimize the amount of video signal to be transmitted between the front-end mobile device and the back-end server over the mobile network; and (2) how to divide the jobs to be performed between the front-end and back-end processes so that the workload at the front-end mobile device can be kept within its processing capacity.

MPEG-7 video automatic labeling system
Ching-Yung Lin, Belle L. Tseng, Milind Naphade, Apostol Natsev, John R. Smith
Pages: 98-99
doi: 10.1145/957013.957033

In this demo, we show a novel end-to-end video automatic labeling system, which accepts MPEG-1 sequence inputs and generates MPEG-7 XML metadata files. Detection is based on previously established anchor models. The system has two parts, a model training process and a labeling process, which together comprise seven modules: Shot Segmentation, Region Segmentation, Annotation, Feature Extraction, Model Learning, Classification, and XML Rendering.

DOVE: drawing over video environment
Jiazhi Ou, Xilin Chen, Susan R. Fussell, Jie Yang
Pages: 100-101
doi: 10.1145/957013.957034

We demonstrate a multimedia system that integrates pen-based gesture and live video to support collaboration on physical tasks. The system combines network IP cameras, desktop PCs, and tablet PCs (or PDAs) to allow a remote helper to draw on a video feed of a workspace as he/she provides task instructions. A gesture recognition component enables the system both to normalize freehand drawings to facilitate communication with remote partners and to use pen-based input as a camera control device. The system also embeds tools such as controlled video delay, gesture delay, and remote camera pan-tilt-zoom control. The system provides a software environment for studying multimodal/multimedia communication for remote collaborative physical tasks.

An automatic image inpaint tool
Timothy K. Shih, Liang-Chen Lu, Rong-Chi Chang
Pages: 102-103
doi: 10.1145/957013.957035

Automatic digital inpainting is a challenging but interesting research area. This demonstration presents a tool that uses a color interpolation mechanism to restore damaged images. The mechanism checks the variation of pixel blocks and restores pixels using different strategies. We have tested it on more than 1000 images, including photos, paintings, and cartoon drawings. The proposed system is also available at: http://www.mine.tku.edu.tw/demos/inpaint.

The co-opticon: shared access to a robotic streaming video camera
Dezhen Song, Ken Goldberg
Pages: 104-105
doi: 10.1145/957013.957036

The "co-opticon" is a robotic pan, tilt, and zoom streaming video camera controlled by simultaneous frame requests from remote users. Robotic webcameras are commercially available but currently restrict control to only one user at a time. The co-opticon introduces a new interface that allows simultaneous control by many users. We will demonstrate the implemented system using a Java-based interface at the conference, linked via the Internet to a camera on the UC Berkeley campus. We will also discuss the system architecture and several new algorithms we've developed to compute optimal camera parameters based on user frame requests. The co-opticon can be tested online at: www.tele-actor.net/co-opticon.

MobiPicture: browsing pictures on mobile devices
Ming-Yu Wang, Xing Xie, Wei-Ying Ma, Hong-Jiang Zhang
Pages: 106-107
doi: 10.1145/957013.957037

Pictures have become increasingly common and popular in mobile communication. However, due to the limitations of mobile devices, there is a need to develop new technologies to facilitate the browsing of pictures on the small screen. MobiPicture is a prototype system that includes a set of novel features to aid or automate common image browsing tasks such as the thumbnail view, set-as-background, zooming, and scrolling.

Route panoramas for city navigation
Jiang Yu Zheng, Min Shi, Makoto Kato
Pages: 108-109
doi: 10.1145/957013.957038

This paper presents a new medium called the route panorama (RP) for visualizing a large-scale environment such as a town or a city. An RP is captured by a slit camera mounted on a vehicle. It is a continuous, compact, and complete visual representation of the scenes along a route. It can be transmitted over the Internet in real time as streaming media and displayed in various styles for virtual city traversal. Applications of RPs include virtual tours, navigation, heritage archiving, urban planning, city indexing, and more.

SESSION: Music

Personalization of user profiles for content-based music retrieval based on relevance feedback
Keiichiro Hoashi, Kazunori Matsumoto, Naomi Inoue
Pages: 110-119
doi: 10.1145/957013.957040

Numerous efforts on content-based music information retrieval have been presented in recent years. However, the object of most existing research is to retrieve a specific song from a large music database. In this research, we propose a music retrieval method that retrieves songs based on the user's musical preferences. This enables users to discover new songs that they are likely to enjoy. Since music preferences are expected to be highly ambiguous, we propose the implementation of relevance feedback methods to improve the performance of our music information retrieval method. To reduce the burden on users of supplying learning data to the system, we also propose a method to generate user profiles based on genre preferences, and to refine such profiles based on relevance feedback. Evaluation experiments are conducted on a corpus of music data with user ratings. The results of these experiments demonstrate the effectiveness of our method.

Polyphonic music modeling with random fields
Victor Lavrenko, Jeremy Pickens
Pages: 120-129
doi: 10.1145/957013.957041

Recent interest in the area of music information retrieval and related technologies is exploding. However, very few of the existing techniques take advantage of recent developments in statistical modeling. In this paper we discuss an application of random fields to the problem of creating accurate yet flexible statistical models of polyphonic music. With such models in hand, the challenges of developing effective searching, browsing, and organization techniques for the growing bodies of music collections may be successfully met. We offer an evaluation of these models in terms of perplexity and prediction accuracy, and show that random fields not only outperform Markov chains, but are much more robust in terms of overfitting.

Approximate matching algorithms for music information retrieval using vocal input
Richard L. Kline, Ephraim P. Glinert
Pages: 130-139
doi: 10.1145/957013.957042

Effective use of multimedia collections requires efficient and intuitive methods of searching and browsing. This work considers databases that store music and explores how these may best be searched by providing input queries in some musical form. For the average person, humming several notes of the desired melody is the most straightforward method for providing this input, but such input is very likely to contain several errors. Previously proposed implementations of so-called query-by-humming systems are effective only when the number of input errors is small. We conducted experiments which revealed that the expected error rate for user queries is much higher than existing algorithms can tolerate. We then developed algorithms based on approximate matching techniques which deliver much improved results when comparing error-filled vocal user queries against a music collection.
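
The approximate-matching family the paper builds on can be illustrated with classic dynamic-programming edit distance over pitch-interval sequences; the paper's algorithms add richer error models beyond this sketch:

```python
def melody_distance(query, target):
    """Edit distance between two melodies encoded as pitch-interval
    sequences (semitone differences between consecutive notes)."""
    m, n = len(query), len(target)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if query[i - 1] == target[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion (dropped note)
                          d[i][j - 1] + 1,         # insertion (extra note)
                          d[i - 1][j - 1] + cost)  # substitution (wrong pitch)
    return d[m][n]
```

For example, melody_distance([2, 2, -4], [2, 1, -4]) is 1. Comparing intervals rather than absolute pitches makes the match invariant to the key the user hums in.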

Automated extraction of music snippets
Lie Lu, Hong-Jiang Zhang
Pages: 140-147
doi: 10.1145/957013.957043

Similar to image and video thumbnails, a music snippet is defined as the most representative or highlight excerpt of a music clip, and can be used for efficiently browsing large numbers of music files. A music snippet is usually part of the repeated melody, main theme, or chorus. In this paper, we present an approach to extracting music snippets automatically. In our approach, the most salient segment of the music is first detected based on its occurrence frequency and energy information. Meanwhile, the boundaries of musical phrases are also detected, based on the estimated phrase length and the phrase boundary confidence of each frame. These boundaries are used to ensure that an extracted snippet does not break musical phrases. Finally, the musical phrases containing the most salient segment are extracted as the music snippet. A user study indicates that the proposed algorithm works very well on our music database.

SESSION: Managing images

Automatic browsing of large pictures on mobile devices
Hao Liu, Xing Xie, Wei-Ying Ma, Hong-Jiang Zhang
Pages: 148-155
doi: 10.1145/957013.957045

Pictures have become increasingly common and popular in mobile communications. However, due to the limitations of mobile devices, there is a need to develop new technologies to facilitate the browsing of large pictures on the small screen. In this paper, we propose a novel approach which is able to automate the scrolling and navigation of a large picture with a minimal amount of user interaction on mobile devices. An image attention model is employed to describe the information structure within an image. An optimal image browsing path is then calculated based on the image attention model to simulate human browsing behaviors. Experimental evaluations of the proposed mechanism indicate that our approach is an effective way to view large images on small displays.

Geographic location tags on digital images
Kentaro Toyama, Ron Logan, Asta Roseway
Pages: 156-166
doi: 10.1145/957013.957046

We describe an end-to-end system that capitalizes on geographic location tags for digital photographs. The World Wide Media eXchange (WWMX) database indexes large collections of image media by several pieces of metadata including timestamp, owner, and, critically, location stamp. The location where a photo was shot is important because it says much about its semantic content, while being relatively easy to acquire, index, and search. The process of building, browsing, and writing applications for such a database raises issues that have heretofore been unaddressed in either the multimedia or the GIS community. This paper brings all of these issues together, explores different options, and offers novel solutions where necessary. Topics include acquisition of location tags for image media, data structures for location tags on photos, database optimization for location-tagged image media, and an intuitive UI for browsing a massive location-tagged image database. We end by describing an application built on top of the WWMX, a lightweight travelogue-authoring tool that automatically creates appropriate context maps for a slideshow of location-tagged photographs.

Generic image classification using visual knowledge on the web
Keiji Yanai
Pages: 167-176
doi: 10.1145/957013.957047

In this paper, we describe a generic image classification system with an automatic knowledge acquisition mechanism based on the World-Wide Web. Due to the recent spread of digital imaging devices, the demand for image recognition of many kinds of real-world scenes is growing, and meeting it requires visual knowledge about a wide variety of scenes. We therefore propose gathering visual knowledge about real-world scenes from the World-Wide Web for generic image classification. Our system gathers a large number of images from the Web automatically and uses them as training images for generic image classification. It consists of three modules: an image-gathering module, an image-learning module, and an image classification module. The image-gathering module gathers images related to given class keywords from the Web automatically. The learning module extracts image features from gathered images and associates them with each class. The image classification module classifies an unknown image into one of the classes corresponding to the class keywords by using the association between image features and classes. In our experiments, we achieved a classification rate of 44.6% for generic images, using images gathered automatically from the World-Wide Web as training images.

SESSION: Student best paper contest

Proscenium: a framework for spatio-temporal video editing
Eric P. Bennett, Leonard McMillan
Pages: 177-184
doi: 10.1145/957013.957049

We present an approach to video editing where movie sequences are treated as spatio-temporal volumes that can be sheared and warped under user control. This simple capability enables new video editing operations that support complex postproduction modifications, such as object removal and/or changes in camera motion. Our methods do not rely on complicated and error-prone image analysis or computer vision methods. Moreover, they facilitate an editing approach to video that is similar to standard image-editing tasks. Central to our system is a movie representation framework called Proscenium that supports efficient queries and operations on spatio-temporal volumes while maintaining the original source content. We have adopted a graph-based lazy-evaluation model in order to support interactive visualizations, complex data modifications, and efficient processing of large spatio-temporal volumes.

Real-time compression for dynamic 3D environments
Sang-Uok Kum, Ketan Mayer-Patel, Henry Fuchs
Pages: 185-194
doi: 10.1145/957013.957050

The goal of tele-immersion has long been to enable people at remote locations to share a sense of presence. A tele-immersion system acquires the 3D representation of a collaborator's environment remotely and sends it over the network, where it is rendered in the user's environment. Acquisition, reconstruction, transmission, and rendering all have to be done in real time to create a sense of presence. With added commodity hardware resources, parallelism can increase the acquisition volume and reconstruction data quality while maintaining real-time performance. However, this is not as easy for rendering, since all of the data needs to be combined into a single display. In this paper we present an algorithm to compress data from such 3D environments in real time to resolve this imbalance. We expect the compression algorithm to scale comparably to the acquisition and reconstruction stages, reduce network transmission bandwidth, and reduce the rendering requirement for real-time performance. We have tested the algorithm using a synthetic office data set and have achieved a 5-to-1 compression for 22 depth streams.

Confidence-based dynamic ensemble for image annotation and semantics discovery
Beitao Li, Kingshy Goh
Pages: 195-206
doi: 10.1145/957013.957051

Providing accurate and scalable solutions to map low-level perceptual features to high-level semantics is critical for multimedia information organization and retrieval. In this paper, we propose a confidence-based dynamic ensemble (CDE) to overcome the shortcomings of traditional static classifiers. In contrast to traditional models, CDE can make dynamic adjustments to accommodate new semantics, to assist the discovery of useful low-level features, and to improve class-prediction accuracy. We describe two key components of CDE: a multi-level function that asserts class-prediction confidence, and the dynamic ensemble method built upon that confidence function. Through theoretical analysis and empirical study, we demonstrate that CDE is effective in annotating large-scale, real-world image datasets.

SESSION: Reception and posters

Weaving stories in digital media: when Spielberg makes home movies
Brett Adams, Svetha Venkatesh
Pages: 207-210
doi: 10.1145/957013.957053

In this paper we describe research aimed at enabling amateur video makers to improve both the technical quality and communicative capacity of their work. Motivated by the recognition that untold hours of home video are simply abandoned after capture, we have formulated the problem as one of defining the what and how of footage capture. We have implemented a framework that answers the first problem, the what, by means of the age-old communicative power of story; the second problem, the how, is addressed by means of well-documented aesthetic principles from the film profession, which impact both technical and cinematic considerations for a given project. We provide a brief overview of the process, beginning with the narrative template embodying a chosen story, through the principal phases of generating a storyboard, directing, and editing, resulting in the finished product. We demonstrate, with examples, the interplay of narrative, purpose for the production, and aesthetic agents, and their influence on the automatically generated storyboard.

An audio stream classification and optimal segmentation for multimedia applications
Konstantin Biatov, Joachim Koehler
Pages: 211-214
doi: 10.1145/957013.957054

In this paper we investigate on-line, zero-crossing-based audio stream segmentation and classification into speech and other segments, where other segments include applause, auditorium noise, and silence. We demonstrate that features extracted from zero-crossings are stable and valid for discriminating and classifying speech from other signals, and do not require a large amount of training data. We describe the optimal segmentation of unbounded audio signals using the frame classification results, and demonstrate that this optimal segmentation outperforms the traditional sliding window technique.
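
A minimal sketch of the feature-extraction step the paper's classifier rests on: the per-frame zero-crossing rate. Frame length, hop size, and the downstream classifier are left out or assumed; this only computes the feature:

```python
import numpy as np

def zcr_features(signal, frame_len=512, hop=256):
    """Per-frame zero-crossing rate of a 1-D audio signal.
    Speech typically alternates low-ZCR voiced and high-ZCR unvoiced frames,
    while applause and broadband noise stay uniformly high."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    # Fraction of adjacent-sample sign changes within each frame.
    return np.array([np.mean(np.abs(np.diff(np.sign(f))) > 0) for f in frames])
```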

Interleaving media data for MPEG-4 presentations
Jeff Boston, Michelle Kim, William Luken, Edward So, Steve Wood
Pages: 215-218
doi: 10.1145/957013.957055

A composite multimedia presentation may be represented by a sequence of virtual media data packets. An algorithm is presented for ordering these virtual media data packets so as to minimize the initial delay required to transfer the composite stream from a media server to a client. This algorithm has been implemented as part of the IBM Toolkit for MPEG-4.
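
The quantity being minimized can be illustrated with a short sketch: send packets in decode-deadline order over a constant-rate channel and compute the smallest startup delay for which nothing arrives late. This frames the problem the IBM algorithm addresses; it is not that algorithm, and the packet tuple layout is an assumption:

```python
def min_startup_delay(packets, rate_bytes_per_s):
    """Smallest startup delay such that every packet, sent in deadline
    order at a constant channel rate, arrives by its decode deadline.
    packets: list of (size_bytes, deadline_s) virtual media packets."""
    sent, delay = 0.0, 0.0
    for size, deadline in sorted(packets, key=lambda p: p[1]):
        sent += size
        finish = sent / rate_bytes_per_s     # arrival time of this packet
        delay = max(delay, finish - deadline)
    return max(delay, 0.0)
```

Interleaving audio, video, and scene-description packets by deadline rather than by track is what lets a player start presentation before the whole composite stream has been transferred.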

Using structure patterns of temporal and spectral features in audio similarity measures
Rui Cai, Lie Lu, Hong-Jiang Zhang
Pages: 219-222
doi: 10.1145/957013.957056

Although statistical characteristics of audio features are widely used for similarity measures in most current audio analysis systems and have proved effective, they utilize only feature variations averaged over time, which leads to inaccuracy in some cases. In this paper, structure patterns, which describe representative structural characteristics of both temporal and spectral features, are proposed to improve similarity measures for audio effects. Three kinds of structure patterns are proposed and utilized in the current work: energy contour patterns, harmonicity patterns, and pitch contour patterns. Evaluations on a content-based audio retrieval system indicate that structure patterns improve performance considerably.

Music thumbnailing via structural analysis
Wei Chai, Barry Vercoe
Pages: 223-226
doi: 10.1145/957013.957057

Music thumbnailing (or music summarization) aims at finding the most representative part of a song, which can be used for web browsing, web searching, and music recommendation. Three strategies are proposed in this paper for automatically generating the thumbnails of music. All the strategies are based on the results of music structural analysis, which identifies the recurrent structure of musical signals. Instead of being evaluated subjectively, the generated thumbnails are evaluated by several criteria, based mainly on previous human experiments on music thumbnailing and the properties of thumbnails used on commercial web sites. Additionally, the performance of the structural analysis is demonstrated visually using figures for qualitative evaluation, and by three novel structural similarity metrics for quantitative evaluation. The preliminary results obtained using a corpus of Beatles songs demonstrate the promise of our method and suggest that different thumbnailing strategies might be appropriate for different applications.

A geographic redirection service for on-line games
Chris Chambers, Wu-chi Feng, Wu-chang Feng, Debanjan Saha
Pages: 227-230
doi: 10.1145/957013.957058

For many on-line games, user experience is impacted significantly by network latency. As on-line games and on-line game servers proliferate, the ability to discover and connect to nearby servers is essential for maintaining user satisfaction. In this paper, we present a redirection service for on-line games based on the geographic location of players relative to servers. As our results show, the service better meets client demand, saving each client, and the Internet as a whole, thousands of miles of networking inefficiency.
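
The core redirection decision can be sketched in a few lines: given a player's coordinates, pick the server minimizing great-circle distance. The haversine formula here is standard; the coordinate lookup and server table are assumed inputs, and the paper's service naturally involves more than this:

```python
from math import radians, sin, cos, asin, sqrt

def nearest_server(client, servers):
    """Redirect a player to the geographically closest game server.
    client: (lat, lon) in degrees; servers: {name: (lat, lon)}."""
    def haversine(a, b):
        lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
        h = sin((lat2 - lat1) / 2) ** 2 + \
            cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
        return 2 * 6371.0 * asin(sqrt(h))    # great-circle distance, km
    return min(servers, key=lambda s: haversine(client, servers[s]))
```

Geographic distance is only a proxy for latency, but it is cheap to compute and, as the abstract notes, already removes a large amount of needless long-haul routing.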

A new scanning method for H.264 based fine granular scalable video coding
Won-Sik Cheong, Kyuheon Kim, Gwang Hoon Park
Pages: 231-234
doi: 10.1145/957013.957059

In this paper, we introduce a new scanning method for H.264-based Fine Granular Scalable (FGS) video coding that can significantly improve the subjective picture quality of decoded scalable video. Since network conditions fluctuate, important parts of the streaming data, especially in video sequences, often cannot be transmitted, leaving the viewer with the less interesting parts of a sequence or with poor picture quality in important regions. This paper therefore presents a new scanning method, called the water ring scan, that improves the subjective picture quality of decoded scalable video by encoding and transmitting the visually important region first, expanding outward the way rings of water spread across a lake from the point where a pebble falls. The water ring scan encodes and decodes video sequences starting from a location designated by the user. Simulation results show that the proposed scan method achieves significantly improved picture quality, especially in the region of interest, compared with the traditional FGS scheme.
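
One plausible reading of the scan order, sketched below: enumerate macroblocks in expanding rings around the user-designated point so the region of interest is coded and transmitted first. The ring shape here is Chebyshev distance (square rings), an assumption; the paper's exact ring definition may differ:

```python
def water_ring_order(width, height, cx, cy):
    """Order macroblock coordinates by expanding 'rings' around (cx, cy).
    Blocks in ring k are those at Chebyshev distance k from the center,
    so truncating the bitstream drops the outermost rings first."""
    blocks = [(x, y) for y in range(height) for x in range(width)]
    return sorted(blocks, key=lambda b: max(abs(b[0] - cx), abs(b[1] - cy)))
```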

Capacity planning tool for streaming media services
Ludmila Cherkasova, Wenting Tang
Pages: 235-238
doi: 10.1145/957013.957060

The goal of the proposed capacity planning tool is to provide the best cost/performance configuration for supporting a known media service workload. There are two essential components in our capacity planning tool: i) capacity measurements of different hardware and software solutions using a specially designed set of media benchmarks, and ii) a media service workload profiler, called MediaProf, which extracts a set of quantitative and qualitative parameters characterizing the service demand. The capacity planning tool matches the requirements of the media service workload profile, SLAs, and configuration constraints to produce the best available cost/performance solution.

A semantic model for flash retrieval using co-occurrence analysis
Dawei Ding, Qing Li, Bo Feng, Liu Wenyin
Pages: 239-242
doi: 10.1145/957013.957061

Flash is experiencing breathtaking growth and has become one of the prevailing media formats on the Web. Our goal is to exploit the enormous Flash resources by developing a model of content-based Flash retrieval. Towards this end, we introduce a novel approach for discovering semantic relationships among the co-occurrence patterns of elements in Flash movies. The proposed approach includes a three-layered structure to index the Flash movie, a query expansion procedure to improve the recall performance, and a relevance ranking procedure utilizing link analysis to improve the precision performance of Flash retrieval. Experiments show the potential of leveraging co-occurrence analysis of elements in the context of scenes for improving the performance of Flash retrieval.

Nonparametric color characterization using mean shift
Ling-Yu Duan, Min Xu, Qi Tian, Chang-sheng Xu
Pages: 243-246
doi: 10.1145/957013.957062

Color is very useful in locating and recognizing objects that occur in artificial environments. The color histogram has shown its efficiency and advantages as a general tool for various applications, such as content-based image retrieval and video browsing, object indexing and location, and video segmentation. However, due to the lack of any spatial and context information, the histogram is not robust and effective for color characterization (e.g. dominant color) in large video databases. In this paper, we propose a nonparametric color characterization model using the mean shift procedure, with an emphasis on spatio-temporal consistency. Experimental results suggest that the color characterization model is much more effective for video indexing and browsing, particularly in the domain of structured video (e.g. sports video).
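
A textbook sketch of the mean shift procedure the abstract names, applied to color samples: every sample iteratively moves to the mean of its neighbors within a bandwidth, converging on density modes (the dominant colors). Mode merging and the paper's spatio-temporal consistency step are omitted:

```python
import numpy as np

def mean_shift(points, bandwidth, iters=50):
    """Flat-kernel mean shift over an (n, d) array of color samples.
    Each row climbs to a local density mode; clustering the converged
    rows by proximity yields the dominant colors."""
    modes = points.astype(float)
    for _ in range(iters):
        for i, m in enumerate(modes):
            # Mean of all original samples within the bandwidth of this mode.
            near = points[np.linalg.norm(points - m, axis=1) < bandwidth]
            modes[i] = near.mean(axis=0)
    return modes
```

Because mean shift makes no assumption about the number or shape of clusters, it avoids the quantization artifacts that fixed-bin histograms introduce, which is the nonparametric advantage the paper exploits.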

Looking into video frames on small displays
Xin Fan, Xing Xie, He-Qin Zhou, Wei-Ying Ma
Pages: 247-250
doi: 10.1145/957013.957063

With the growing popularity of personal digital assistants and smart phones, people have become enthusiastic about watching videos on these mobile devices. However, a crucial challenge is to provide a better user experience for browsing videos on limited and heterogeneous screen sizes. In this paper, we present a novel approach which allows users to overcome the display constraints by zooming into video frames while browsing. An automatic approach for detecting the focus regions is introduced to minimize the amount of user interaction. In order to improve the quality of the output stream, virtual camera control is employed in the system. Preliminary evaluation shows that this approach is an effective way to browse video on small displays.

Observation based vs. model based admission control for interactive multimedia sessions
Silvia Hollfelder, Peter Fankhauser, Erich J. Neuhold
Pages: 251-254
doi: 10.1145/957013.957064

Interactive multimedia sessions cause high variations in workload, because different media streams are presented at different points in time. As users interact, the workload variations cannot be predicted precisely, but admission control mechanisms need to at least estimate the workload in order to provide Quality of Service. In this paper, we investigate two approaches to estimating workloads and, on this basis, introduce admission control mechanisms that give stochastic QoS guarantees for such sessions. The observation-based approach uses bookkept workload data for the prediction. The model-based approach employs knowledge about usage patterns and media streams. For both approaches we introduce a uniform stochastic admission control criterion. Furthermore, we illustrate the relative benefits of these approaches for various session scenarios by means of simulations.
|
|
|
Discriminative model fusion for semantic concept detection and annotation in video |
| |
G. Iyengar,
H. J. Nock
|
|
Pages: 255-258 |
|
doi>10.1145/957013.957065 |
|
Full text: PDF
|
|
In this paper we describe a general information fusion algorithm that can be used to incorporate multimodal cues in building user-defined semantic concept models. We compare this technique with a Bayesian Network-based approach on a semantic concept detection task. Results indicate that this technique yields superior performance. We demonstrate this approach further by building classifiers of arbitrary concepts in a score space defined by a pre-deployed set of multimodal concepts. Results show that annotation for user-defined concepts both in and outside the pre-deployed set is competitive with our best video-only models on the TREC Video 2002 corpus.
|
|
|
Affective content detection using HMMs |
| |
Hang-Bong Kang
|
|
Pages: 259-262 |
|
doi>10.1145/957013.957066 |
|
Full text: PDF
|
|
This paper discusses a new technique for detecting affective events using Hidden Markov Models (HMMs). To map low-level features of video data to high-level emotional events, we perform an empirical study on the relationship between emotional events and low-level features. After that, we compute simple low-level features that represent emotional characteristics and construct a token or observation vector by combining the low-level features. The observation vector sequence is tested to detect emotional events through HMMs. We create two HMM topologies and test both. The affective events are detected by our proposed models with good accuracy.
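The abstract does not reproduce the HMM machinery; for orientation, this is the standard scaled forward algorithm one would use to score an observation sequence against an event model. The two-state toy parameters are invented for the example and have nothing to do with the paper's trained models.

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the scaled forward algorithm.
    pi: (S,) initial probs; A: (S,S) transitions; B: (S,V) emissions;
    obs: symbol indices built from quantized low-level features."""
    alpha = pi * B[:, obs[0]]
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]      # predict, then weight by emission
        s = alpha.sum()
        loglik += np.log(s)
        alpha /= s                          # rescale to avoid underflow
    return loglik

# Toy 2-state model: an "emotional event" HMM could be compared against a
# background HMM, flagging segments where the event model scores higher.
pi = np.array([0.6, 0.4])
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
print(forward_loglik([0, 1, 2, 2, 1], pi, A, B))
```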
|
|
|
Universal synchronization scheme for distributed audio-video capture on heterogeneous computing platforms |
| |
Rainer Lienhart,
Igor Kozintsev,
Stefan Wehr
|
|
Pages: 263-266 |
|
doi>10.1145/957013.957067 |
|
Full text: PDF
|
|
We propose a universal synchronization scheme for distributed audio-video capture on heterogeneous computing devices such as laptops, tablets, PDAs, cellular phones, audio recorders, and camcorders. These devices typically possess sensors such as microphones and possibly cameras. In order to combine them wirelessly into a distributed sensing and computing system, it is necessary to provide relative time synchronization among the distributed sensors. In this work we propose a setup and an algorithm that provide synchronization between sampling times for a network of distributed multi-channel audio sensors connected to general purpose computing (GPC) platforms. Extensive experimental results on distributed acoustic Blind Source Separation (BSS) algorithms validate the performance of our synchronization scheme.
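The paper's scheme is not reproduced in the abstract. A common building block for relative time synchronization of audio sensors is cross-correlation against a shared acoustic event, sketched below under our own simplifications (no sampling-frequency drift, hypothetical names).

```python
import numpy as np

def relative_offset(ref, sig, rate):
    """Estimate the sampling-time offset (seconds) between two devices that
    recorded the same acoustic event, via full cross-correlation.
    The actual paper also handles clock drift, which is omitted here."""
    corr = np.correlate(sig, ref, mode="full")
    lag = corr.argmax() - (len(ref) - 1)   # samples by which `sig` trails `ref`
    return lag / rate

rate = 16000
t = np.arange(rate) / rate
ref = np.sin(2 * np.pi * 440 * t)
sig = np.concatenate([np.zeros(800), ref])[:rate]   # same tone, 50 ms later
print(relative_offset(ref, sig, rate))              # ~0.05
```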
|
|
|
An automatic singing voice rectifier design |
| |
Cheng-Yuan Lin,
J.-S. Roger Jang,
Mao-Yuan Hsu
|
|
Pages: 267-270 |
|
doi>10.1145/957013.957068 |
|
Full text: PDF
|
|
This paper proposes a new approach to automatic singing voice rectification. There are two components in the rectifier: one is the recognizer, based on dynamic time warping, and the other is the synthesizer, based on PSOLA (Pitch Synchronous Overlap and Add) for pitch shifting. The purpose of the recognizer is to identify the locations of off-key parts of the user's acoustic input. Then, with the target music score, the synthesizer corrects the off-key parts by appropriate pitch shifting to match the given music score. We also conduct singing and listening experiments to evaluate the feasibility of the rectifier, and the results exhibit satisfactory performance.
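Dynamic time warping, which the recognizer is based on, has a standard form; here is a minimal version aligning a sung pitch contour to the score's note sequence. The semitone cost and the example values are illustrative, not from the paper.

```python
import numpy as np

def dtw(query, target):
    """Dynamic time warping between a sung pitch contour and the score's note
    sequence (both as arrays of semitone values). Returns the total alignment
    cost; backtracking the path (omitted) would localize off-key spans."""
    n, m = len(query), len(target)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(query[i - 1] - target[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

sung  = np.array([60.0, 60.4, 62.1, 63.8, 65.2])   # slightly off-key MIDI pitches
score = np.array([60.0, 62.0, 64.0, 65.0])
print(dtw(sung, score))
```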
|
|
|
Location-aware data broadcasting: an application for digital mobile broadcasting in Japan |
| |
Kinji Matsumura,
Kazuya Usui,
Kenjiro Kai,
Koichi Ishikawa
|
|
Pages: 271-274 |
|
doi>10.1145/957013.957069 |
|
Full text: PDF
|
|
Terrestrial digital broadcasting that uses the ISDB-T (Integrated Services Digital Broadcasting-Terrestrial) system is scheduled for launch in Japan in December 2003. This system also enables mobile broadcasting service, which will be offered a few years later. We are developing a Location-Aware Data Broadcasting Service as a remarkably new type of interactive mobile broadcasting service. In this paper, we describe the service application, information filtering method, and presentation techniques for this location-aware data service. We also discuss a further experiment that uses scalable vector graphics.
|
|
|
On image auto-annotation with latent space models |
| |
Florent Monay,
Daniel Gatica-Perez
|
|
Pages: 275-278 |
|
doi>10.1145/957013.957070 |
|
Full text: PDF
|
|
Image auto-annotation, i.e., the association of words to whole images, has attracted considerable attention. In particular, unsupervised, probabilistic latent variable models of text and image features have shown encouraging results, but their performance with respect to other approaches remains unknown. In this paper, we apply and compare two simple latent space models commonly used in text analysis, namely Latent Semantic Analysis (LSA) and Probabilistic LSA (PLSA). Annotation strategies for each model are discussed. Remarkably, we found that, on an 8000-image dataset, a classic LSA model defined on keywords and a very basic image representation performed as well as much more complex, state-of-the-art methods. Furthermore, non-probabilistic methods (LSA and direct image matching) outperformed PLSA on the same dataset.
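A classic LSA model of the kind compared here can be sketched in a few lines: truncated SVD of a term-by-image matrix, then nearest-neighbor annotation in the latent space. The matrix sizes, the rank k, and the cosine similarity choice are our assumptions.

```python
import numpy as np

# Factor a term-by-image co-occurrence matrix with a truncated SVD, then
# annotate a new image by its nearest training images in the latent space.
rng = np.random.default_rng(0)
X = rng.random((500, 200))               # 500 text/visual terms x 200 images
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 32
latent = (np.diag(s[:k]) @ Vt[:k]).T     # one k-dim vector per training image

def annotate(query_vec, latent, labels, topn=3):
    """Project a query image's term vector and return the captions of the
    most similar training images (cosine similarity)."""
    q = U[:, :k].T @ query_vec
    sims = latent @ q / (np.linalg.norm(latent, axis=1) * np.linalg.norm(q) + 1e-12)
    return [labels[i] for i in sims.argsort()[::-1][:topn]]

labels = [f"img_{i}" for i in range(200)]
print(annotate(X[:, 0], latent, labels))   # the query's own image ranks first
```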
|
|
|
Colour picking: the pecking order of form and function |
| |
Frank Nack,
Amit Manniesing,
Lynda Hardman
|
|
Pages: 279-282 |
|
doi>10.1145/957013.957071 |
|
Full text: PDF
|
|
Multimedia presentation generation has to be able to balance the functional aspects of a presentation that address the information needs of the user and its aesthetic form. We demonstrate our approach using automatic colour design for which we integrate relevant aspects of colour theory. We do not provide a definition of the relative importance of form versus function, but seek to explore the roles of subjective elements in the generation process.
|
|
|
A robust dissolve detector by support vector machine |
| |
Chong-Wah Ngo
|
|
Pages: 283-286 |
|
doi>10.1145/957013.957072 |
|
Full text: PDF
|
|
In this paper, we propose a novel approach for the robust detection and classification of dissolve sequences in videos. Our approach is based on the multi-resolution representation of temporal slices extracted from the 3D image volume. At the low-resolution (LR) scale, the problem of dissolve detection is reduced to cut transition detection. At the high-resolution (HR) scale, Gabor wavelet features are computed for regions that surround the cuts located at the LR scale. The computed features are then input to support vector machines for pattern classification. Encouraging results have been obtained through experiments.
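The final stage, SVM classification of features around candidate cuts, might look like the sketch below; the random stand-in features replace the paper's Gabor wavelet responses on temporal slices, and the hyperparameters are illustrative.

```python
import numpy as np
from sklearn.svm import SVC

# Stand-in training data: one feature vector per candidate region around a
# low-resolution cut, labeled as dissolve (1) or not (0). In the paper these
# would be Gabor wavelet features, not random numbers.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(100, 16))
y_train = rng.integers(0, 2, size=100)

clf = SVC(kernel="rbf", C=1.0, gamma="scale")   # RBF SVM for the 2-class task
clf.fit(X_train, y_train)

X_new = rng.normal(size=(5, 16))
print(clf.predict(X_new))    # 1 = region classified as a dissolve
```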
|
|
|
Hierarchical topical segmentation in instructional films based on cinematic expressive functions |
| |
Dinh Q. Phung,
Svetha Venkatesh,
Chitra Dorai
|
|
Pages: 287-290 |
|
doi>10.1145/957013.957073 |
|
Full text: PDF
|
|
In this paper, we propose a novel solution for segmenting an instructional video into hierarchical topical sections. Incorporating the knowledge of education-oriented film theory with our previous study of expressive functions, namely the content density and thematic functions, we develop an algorithm to effectively structure an instructional video into a two-tiered hierarchy of topical sections at the main- and sub-topic levels. Our experimental results on a set of ten industrial instructional videos demonstrate the validity of the detection scheme.
|
|
|
Programming portable optimized multimedia applications |
| |
Juan Carlos Rojas,
Miriam Leeser
|
|
Pages: 291-294 |
|
doi>10.1145/957013.957074 |
|
Full text: PDF
|
|
Multimedia computer architectures can speed up applications significantly when programmed manually. Optimized programs have been non-portable up to now, because of differences in instruction sets, register lengths, alignment requirements and programming styles. We solve all these problems by using a library of C pre-processor macros called MMM. We implemented three examples from video compression in MMM, and automatically translated them into optimized code for four distinct multimedia processors. Their performance is comparable to, and in several cases better than, equivalent examples optimized by the processor vendors.
|
|
|
ARTiFACIAL: automated reverse turing test using FACIAL features |
| |
Yong Rui,
Zicheng Liu
|
|
Pages: 295-298 |
|
doi>10.1145/957013.957075 |
|
Full text: PDF
|
|
Web services designed for human users are being abused by computer programs (bots). The bots steal thousands of free email accounts in a minute, participate in online polls to skew results, and irritate people by joining online chat rooms. These real-world issues have recently generated a new research area called Human Interactive Proofs (HIP), whose goal is to defend services from malicious attacks by differentiating bots from human users. In this paper, we propose a new HIP algorithm based on detecting human faces and facial features. Human faces are the objects most familiar to humans, making them possibly the best candidate for HIP. We conducted user studies and showed the ease of use of our system to human users. We designed attacks using the best existing face detectors and demonstrated the difficulty the system poses to bots.
|
|
|
Extracting information about emotions in films |
| |
Andrew Salway,
Mike Graham
|
|
Pages: 299-302 |
|
doi>10.1145/957013.957076 |
|
Full text: PDF
|
|
We present a method being developed to extract information about characters' emotions in films. It is suggested that this information can help describe higher levels of multimedia semantics relating to narrative structures. Our method extracts information from the audio description that is provided with an increasing number of films for visually impaired viewers. The method is based on a cognitive theory of emotions that links a character's emotional states to the events in their environment. In this paper the method is described, along with some preliminary evaluation and a discussion of the kinds of novel video retrieval and browsing applications it may support.
|
|
|
Video cut editing rule based on participants' gaze in multiparty conversation |
| |
Yoshinao Takemae,
Kazuhiro Otsuka,
Naoki Mukawa
|
|
Pages: 303-306 |
|
doi>10.1145/957013.957077 |
|
Full text: PDF
|
|
This paper proposes a video cut editing rule based on participants' gaze for extracting and conveying the flow of conversation in multiparty conversation. Systems that record meetings and those that support teleconferences are attracting considerable interest. Conventional systems use a fixed-viewpoint camera and simple camera selection based on participants' utterances. However, conventional systems fail to convey a sufficient amount of nonverbal information about the participants and the flow of conversation. We focus on participants' gaze since it is a good indicator of the participants' intent and emotion, conversational attention, etc. We propose a video cut editing rule based on the convergence of participants' gaze direction. We conduct an experiment to evaluate the effectiveness of the proposed method. The results indicate that the proposed method can successfully convey who is talking to whom, which is a key indicator of the flow of conversation.
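A convergence-based editing rule of this kind can be stated very compactly; the sketch below cuts to whichever participant most gazes converge on. The threshold of two gazes and the fallback behavior are our simplifications, not the paper's exact rule.

```python
from collections import Counter

def select_shot(gaze_targets):
    """Pick which participant's camera to cut to, given each participant's
    current gaze target (a participant id, or None if gazing nowhere).
    Rule used here: cut to whoever the most gazes converge on."""
    votes = Counter(t for t in gaze_targets.values() if t is not None)
    if not votes:
        return None                       # no convergence: keep current shot
    target, count = votes.most_common(1)[0]
    return target if count >= 2 else None

# Three participants; A and C both look at B, so cut to B's camera.
print(select_shot({"A": "B", "B": "C", "C": "B"}))
```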
|
|
|
Securing media for adaptive streaming |
| |
Chitra Venkatramani,
Peter Westerink,
Olivier Verscheure,
Pascal Frossard
|
|
Pages: 307-310 |
|
doi>10.1145/957013.957078 |
|
Full text: PDF
|
|
This paper describes the ARMS system which enables secure and adaptive rich media streaming to a large-scale, heterogeneous client population. The secure streaming algorithms ensure end-to-end security while the content is adapted and streamed via intermediate, potentially untrusted servers. ARMS streaming is completely standards compliant and to our knowledge is the first such end-to-end MPEG-4-based system.
|
|
|
Real-time goal-mouth detection in MPEG soccer video |
| |
Kongwah Wan,
Xin Yan,
Xinguo Yu,
Changsheng Xu
|
|
Pages: 311-314 |
|
doi>10.1145/957013.957079 |
|
Full text: PDF
|
|
We report our work on real-time detection of goal-mouth appearances in MPEG soccer video. Processing sub-optimal quality images after MPEG decoding, the system constrains the Hough Transform-based line-mark detection to only the dominant green regions typically seen in soccer video. The vertical goal-posts and horizontal goal-bar are then isolated by color-based region (pole) growing. We demonstrate its application for quick video browsing and virtual content insertion. Extensive tests over a large data set of about 15 hours of MPEG-1 soccer video at 1.15 Mbps, CIF resolution, show the robustness of our method.
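A minimal sketch of the pipeline's front end, assuming OpenCV: restrict edges to a rough green pitch mask, then keep near-vertical Hough line segments as goal-post candidates. All thresholds and the input file name are illustrative; the paper additionally grows pole regions by color and verifies the horizontal goal-bar.

```python
import cv2
import numpy as np

def goal_mouth_candidates(frame_bgr):
    """Find near-vertical line candidates (goal-posts) inside the dominant
    green region of a soccer frame."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    green = cv2.inRange(hsv, (35, 40, 40), (85, 255, 255))    # rough pitch mask
    edges = cv2.Canny(frame_bgr, 80, 160)
    edges = cv2.bitwise_and(edges, edges, mask=cv2.dilate(green, None))
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=40,
                            minLineLength=30, maxLineGap=5)
    posts = []
    if lines is not None:
        for x1, y1, x2, y2 in lines[:, 0]:
            if abs(x1 - x2) < 5 and abs(y1 - y2) > 25:        # near-vertical
                posts.append((x1, y1, x2, y2))
    return posts

frame = cv2.imread("soccer_frame.png")    # hypothetical decoded MPEG frame
if frame is not None:
    print(goal_mouth_candidates(frame))
```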
|
|
|
Synchronization of lecture videos and electronic slides by video text analysis |
| |
Feng Wang,
Chong-Wah Ngo,
Ting-Chuen Pong
|
|
Pages: 315-318 |
|
doi>10.1145/957013.957080 |
|
Full text: PDF
|
|
An essential goal of structuring lecture videos captured in live presentations is to provide a synchronized view of video clips and electronic slides. This paper presents an automatic approach to match video clips and slides based on the analysis of text embedded in lecture videos. We describe a method to reconstruct high-resolution video texts from multiple keyframes for robust OCR. A two-stage matching algorithm based on title and content similarity measures between video clips and slides is also proposed.
|
|
|
Experience based sampling technique for multimedia analysis |
| |
Jun Wang,
Mohan S. Kankanhalli
|
|
Pages: 319-322 |
|
doi>10.1145/957013.957081 |
|
Full text: PDF
|
|
We present a novel experience-based sampling, or experiential sampling, technique which has the ability to focus on the analysis task by making use of contextual information from the environment. In this technique, sensor samples are used to gather information about the current environment, and attention samples are used to represent the current state of attention. The task-attended samples are inferred from experience and maintained by a sampling-based dynamical system. The multimedia analysis task can then focus on the attention samples only. Moreover, past experiences and the current environment can be used to adaptively correct and tune the attention. Experimental results are presented to demonstrate the efficacy of our technique.
|
|
|
R-Histogram: quantitative representation of spatial relations for similarity-based image retrieval |
| |
Yuhang Wang,
Fillia Makedon
|
|
Pages: 323-326 |
|
doi>10.1145/957013.957082 |
|
Full text: PDF
|
|
Representation of relative spatial relations between objects is required in many multimedia database applications. Quantitative representation of spatial relations taking into account shape, size, orientation and distance is often required. This cannot be accomplished by assimilating an object to elementary entities such as the centroid or the minimum bounding rectangle. Thus many authors have proposed representations based on the notion of histograms of angles. However, these can only represent directional relations, not the topological spatial relations "inside" and "overlap." Moreover, distance information is not explicitly taken into account. To address these issues, we propose in this paper a new histogram representation called the R-Histogram that extends the histogram of angles by incorporating both angles and labeled distances. Dissimilarity between images is then defined by the distance between corresponding R-Histograms. A prototype Query By Example (QBE) system using the R-Histogram has been implemented. The effectiveness of our algorithm is demonstrated with experiments on two databases of 2000 synthetic images.
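In spirit, an R-Histogram is a joint histogram over the angles and distances of all pixel pairs drawn from two objects, which is what lets it capture both directional and topological relations. A minimal version follows; the binning and normalization are our assumptions, not the paper's exact labeled-distance scheme.

```python
import numpy as np

def r_histogram(pix_a, pix_b, n_angle=8, n_dist=4, max_dist=100.0):
    """Joint (angle, distance) histogram over all pixel pairs from objects
    A and B. Compare two images' histograms with, e.g., an L1 distance."""
    a = np.asarray(pix_a, float)[:, None, :]      # (Na, 1, 2)
    b = np.asarray(pix_b, float)[None, :, :]      # (1, Nb, 2)
    d = b - a
    angles = np.arctan2(d[..., 1], d[..., 0])     # in [-pi, pi)
    dists = np.hypot(d[..., 0], d[..., 1])
    H, _, _ = np.histogram2d(angles.ravel(), dists.ravel(),
                             bins=[n_angle, n_dist],
                             range=[[-np.pi, np.pi], [0, max_dist]])
    return H / H.sum()                            # normalized histogram

obj_a = [(x, y) for x in range(5) for y in range(5)]
obj_b = [(x + 20, y) for x in range(5) for y in range(5)]
print(r_histogram(obj_a, obj_b).round(3))
```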
|
|
|
Studying streaming video quality: from an application point of view |
| |
Zhiheng Wang,
Sujata Banerjee,
Sugih Jamin
|
|
Pages: 327-330 |
|
doi>10.1145/957013.957083 |
|
Full text: PDF
|
|
An important aspect of improving streaming application performance is the streaming quality evaluation process. In this paper we introduce a set of alternative objective streaming video quality metrics, which are suitable for large-scale deployment. Derived from an existing media application, our metrics are designed to capture the application behaviors that disrupt streaming video quality. We also present a set of experiments to demonstrate the effectiveness of these metrics.
|
|
|
Construction of interactive video information system by applying results of object recognition |
| |
Xiaomeng Wu,
Wenli Zhang,
Shunsuke Kamijo,
Masao Sakauchi
|
|
Pages: 331-334 |
|
doi>10.1145/957013.957084 |
|
Full text: PDF
|
|
Although numerous attempts have been made to develop algorithms and approaches for building a video information system, not many practical applications have been proposed. In this paper, a novel interactive video information system called the Drama Characters' Popularity Voting System (DCPVS) is constructed by applying the results of off-line object recognition. The system's purpose is to provide description annotation, retrieval, and statistics in the video associated with an object (such as a character) as a basic unit, over the Internet. By using the proposed system, multiple users in a network can enjoy the same video and can vote for the characters they like in it. The voting information is collected and stored in the server, which then provides statistics regarding the popularity of different characters or the voting rates within different periods of the video.
|
|
|
Application of a content-based percussive sound synthesizer to packet loss recovery in music streaming |
| |
Lonce Wyse,
Ye Wang,
Xinglei Zhu
|
|
Pages: 335-338 |
|
doi>10.1145/957013.957085 |
|
Full text: PDF
|
|
This paper presents a novel method to recover lost packets in music streaming using a synthesizer to generate percussive sounds. As an improvement over the state-of-the-art system that uses a content-based audio codebook, the new method can greatly reduce the redundant information needed to recover perceptually critical lost packets.
|
|
|
The combination limit in multimedia retrieval |
| |
Rong Yan,
Alexander G. Hauptmann
|
|
Pages: 339-342 |
|
doi>10.1145/957013.957086 |
|
Full text: PDF
|
|
Combining search results from multimedia sources is crucial for dealing with heterogeneous multimedia data, particularly in multimedia retrieval where a final ranked list of items of interest is returned sorted by confidence or relevance. However, relatively little attention has been given to combination functions, especially their upper-bound performance limits. This paper presents a theoretical framework for studying upper bounds for two types of combination functions. A general upper bound and two approximations are proposed for monotonic combination functions. We also study the upper bounds for linear combination functions using a global optimization technique. Our experimental results show that the choice of combination functions has a considerable influence on retrieval performance.
|
|
|
Negative pseudo-relevance feedback in content-based video retrieval |
| |
Rong Yan,
Alexander G. Hauptmann,
Rong Jin
|
|
Pages: 343-346 |
|
doi>10.1145/957013.957087 |
|
Full text: PDF
|
|
Video information retrieval requires a system to find information relevant to a query which may be represented simultaneously in different ways through a text description, audio, still images and/or video sequences. We present a novel approach that uses pseudo-relevance feedback from retrieved items that are NOT similar to the query items, without soliciting further user feedback. We provide insight into this approach using a statistical model and suggest a score combination scheme via posterior probability estimation. An evaluation on the 2002 TREC Video Track queries shows that this technique can improve video retrieval performance on a real collection. We believe that negative pseudo-relevance feedback shows great promise for very difficult multimedia retrieval tasks, especially when combined with other different retrieval algorithms.
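One way to read "negative pseudo-relevance feedback" as code: treat the lowest-ranked items as automatic negatives and penalize similarity to them, as in this sketch. The cosine penalty and the alpha weight are our assumptions; the paper derives its combination via posterior probability estimation.

```python
import numpy as np

def rerank_with_negatives(scores, features, n_neg=50, alpha=0.5):
    """Re-rank retrieval results by penalizing similarity to the lowest-ranked
    (clearly non-relevant) items, used as automatic negative feedback.
    scores: initial relevance per item; features: one vector per item."""
    order = np.argsort(scores)                 # ascending: worst first
    negatives = features[order[:n_neg]]        # pseudo-negative examples
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    n = negatives / (np.linalg.norm(negatives, axis=1, keepdims=True) + 1e-12)
    penalty = (f @ n.T).max(axis=1)            # max cosine sim to any negative
    return scores - alpha * penalty

rng = np.random.default_rng(2)
scores, feats = rng.random(200), rng.normal(size=(200, 32))
print(np.argsort(rerank_with_negatives(scores, feats))[::-1][:5])
```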
|
|
|
Avatar motion control by user body postures |
| |
Satoshi Yonemoto,
Hiroshi Nakano,
Rin-ichiro Taniguchi
|
|
Pages: 347-350 |
|
doi>10.1145/957013.957088 |
|
Full text: PDF
|
|
This paper describes avatar motion control by body postures. Our goal is seamless mapping of human motion in the real world into virtual environments. We hope that the idea of direct human motion sensing will be used in future interfaces. With the aim of making computing systems better suited to users, we have developed computer vision based avatar motion control. The human motion sensing is based on skin-color blob tracking. Our method can generate realistic avatar motion from the sensing data. Our framework uses the virtual scene context as a priori knowledge. We assume that virtual objects in virtual environments can afford the avatar's actions, that is, the virtual environments provide action information for the avatar. The avatar's motion is controlled by simulating the idea of affordance extended into the virtual environments.
|
|
|
Model-based talking face synthesis for anthropomorphic spoken dialog agent system |
| |
Tatsuo Yotsukura,
Shigeo Morishima,
Satoshi Nakamura
|
|
Pages: 351-354 |
|
doi>10.1145/957013.957089 |
|
Full text: PDF
|
|
Towards natural human-machine communication, interface technologies based on speech and image information have been intensively developed. An anthropomorphic dialog agent is an ideal system, which integrates spoken dialog and natural facial expressions. This paper reports on our project aiming to create a general-purpose toolkit for building an easily customizable anthropomorphic agent. So far there have been almost no tools that are intuitive, easy to understand, fully interactive, and open source; our anthropomorphic agent is designed to fulfill these requirements. The toolkit consists of four modules: multimodal dialog integration, speech recognition, speech synthesis, and face image synthesis. These modules are highly modularized and interlinked by simple communication protocols. In this paper, we focus on the construction of the agent's face image synthesis. For this part, lip movement control synchronized to the speech signal and facial emotion expression are the most important elements. We developed the face image synthesis module (FSM), which requires only one frontal face image and can be used by users of any skill level. A user's original agent can be generated by easy adjustment of the frontal face image and the generic wire-frame model. The paper describes the overall system diagram and specifically the agent's face image synthesis part.
|
|
|
Automated annotation of human faces in family albums |
| |
Lei Zhang,
Longbin Chen,
Mingjing Li,
Hongjiang Zhang
|
|
Pages: 355-358 |
|
doi>10.1145/957013.957090 |
|
Full text: PDF
|
|
Automatic annotation of photographs is one of the most desirable needs in family photograph management systems. In this paper, we present a learning framework to automate the face annotation in family photograph albums. Firstly, methodologies of content-based image retrieval and face recognition are seamlessly integrated to achieve automated annotation. Secondly, face annotation is formulated in a Bayesian framework, in which the face similarity measure is defined as maximum a posteriori (MAP) estimation. Thirdly, to deal with missing features, marginal probability is used so that samples which have missing features are compared with those having the full feature set to ensure a non-biased decision. The experimental evaluation has been conducted on a family album of a few thousand photographs, and the results show that the proposed approach is effective and efficient for automated face annotation in family albums.
|
|
|
Music scale modeling for melody matching |
| |
Yongwei Zhu,
Mohan Kankanhalli
|
|
Pages: 359-362 |
|
doi>10.1145/957013.957091 |
|
Full text: PDF
|
|
Several time series matching techniques have been proposed for content-based music retrieval. These techniques have been shown to be robust and effective for music retrieval by acoustic input, such as query-by-humming. However, due to the key transposition issue, all current methods need to search a large space for the proper key in melody matching. This computation can be prohibitive for a practical music retrieval system with a large database. In this paper, we present a music scale modeling technique for melody matching. The root note of the music scale (Major or Minor) of a melody is estimated by fitting the notes to a music scale model. The estimated root note can then be used as the key in melody matching. To the best of our knowledge, this is the first approach that utilizes music scale knowledge for retrieval. In our experiments, 96% of the songs in the database (3000 melodies) fit the music scale model. Promising results for query-by-humming retrieval have been obtained using this novel approach.
|
|
|
Inventing new media: what we can learn from new media art and media history |
| |
Lev Manovich
|
|
Pages: 363-363 |
|
doi>10.1145/957013.957015 |
|
Full text: PDF
|
|
Throughout human history, the design of different cultural techniques and media forms for representing human knowledge, collective and personal experience, and what we now call "data" has not been confined to single individuals or disciplines. To mention just a few examples, natural languages, printed books, linear perspective, landscape photography, and documentary cinema have all been developed and refined over time by whole societies and multiple individuals. Today, with the computer acting as the interface for all past, present, and emergent forms of media, the situation is quite different. Computer scientists working on media computing play the key role in how our societies will remember their histories, how we will represent ourselves and others, what we will imagine and what metaphors we use to understand reality. In short, to work on media computing today is to assume big cultural responsibility - and also to have tremendous power to define new forms of media. Unfortunately, more often than not, computer scientists do not take advantage of their powers. Too often, they simply translate existing media forms and cultural techniques into software interfaces. For instance, the controls of software media players are modeled after the VCR; Acrobat software tries to recreate the conventions of a printed page; etc. This made sense twenty years ago when people were coming to computers after having first experienced other media technologies --- but not today. While I do not want to position the whole field of digital art as more innovative in this respect, one can point to a number of artists who have dedicated their careers to using computers to invent substantially new forms of media, often with very exciting results. In my talk I will show and discuss a number of works by these artists. I will also talk about selected historical moments when a new media form emerged, to see what we can learn from these histories.
|
|
|
SESSION: Image annotation and video summarization |
|
|
|
|
Temporal event clustering for digital photo collections |
| |
Matthew Cooper,
Jonathan Foote,
Andreas Girgensohn,
Lynn Wilcox
|
|
Pages: 364-373 |
|
doi>10.1145/957013.957093 |
|
Full text: PDF
|
|
We present similarity-based methods to cluster digital photos by time and image content. The approach is general, unsupervised, and makes minimal assumptions regarding the structure or statistics of the photo collection. We present results for the algorithm based solely on temporal similarity, and jointly on temporal and content-based similarity. We also describe a supervised algorithm based on learning vector quantization. Finally, we include experimental results for the proposed algorithms and several competing approaches on two test collections.
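The simplest baseline that this family of methods generalizes is fixed-gap temporal clustering, sketched below; the paper's adaptive, similarity-based algorithms replace the fixed threshold.

```python
from datetime import datetime, timedelta

def cluster_by_time(timestamps, gap=timedelta(hours=6)):
    """Group photos into events: start a new cluster whenever the gap to the
    previous photo exceeds a threshold. The 6-hour gap is an arbitrary choice
    for this sketch, not a value from the paper."""
    ordered = sorted(timestamps)
    clusters, current = [], [ordered[0]]
    for t in ordered[1:]:
        if t - current[-1] > gap:
            clusters.append(current)
            current = []
        current.append(t)
    clusters.append(current)
    return clusters

shots = [datetime(2003, 7, 4, 10, 0), datetime(2003, 7, 4, 10, 20),
         datetime(2003, 7, 5, 19, 5), datetime(2003, 7, 5, 19, 30)]
print([len(c) for c in cluster_by_time(shots)])   # two events: [2, 2]
```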
|
|
|
Contrast-based image attention analysis by using fuzzy growing |
| |
Yu-Fei Ma,
Hong-Jiang Zhang
|
|
Pages: 374-381 |
|
doi>10.1145/957013.957094 |
|
Full text: PDF
|
|
Visual attention analysis provides an alternative methodology to semantic image understanding in many applications such as adaptive content delivery and region-based image retrieval. In this paper, we propose a feasible and fast approach to attention area detection in images based on contrast analysis. The main contributions are threefold: 1) a new saliency map generation method based on local contrast analysis is proposed; 2) by simulating human perception, a fuzzy growing method is used to extract attended areas or objects from the saliency map; and 3) a practicable framework for image attention analysis is presented, which provides three-level attention analysis, i.e., attended view, attended areas and attended points. This framework facilitates visual analysis tools or vision systems to automatically extract attentions from images in a manner like human perception. User study results indicate that the proposed approach is effective and practicable.
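A literal, unoptimized reading of "saliency by local contrast" is each pixel's accumulated difference from its neighborhood; the sketch below shows that baseline on a grayscale image. The paper's method works on perceptual color differences and adds fuzzy growing on top of the map.

```python
import numpy as np

def contrast_saliency(gray, radius=2):
    """Saliency as local contrast: each pixel's summed absolute difference
    from its (2r+1)x(2r+1) neighborhood, normalized to [0, 1]."""
    h, w = gray.shape
    pad = np.pad(gray.astype(float), radius, mode="edge")
    sal = np.zeros((h, w))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue
            shifted = pad[radius + dy:radius + dy + h,
                          radius + dx:radius + dx + w]
            sal += np.abs(gray - shifted)
    return sal / sal.max()

img = np.zeros((64, 64))
img[24:40, 24:40] = 255.0                        # bright square on black
print(contrast_saliency(img).round(2)[24, 20:30])  # peaks at the boundary
```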
|
|
|
Video summarization based on user log enhanced link analysis |
| |
Bin Yu,
Wei-Ying Ma,
Klara Nahrstedt,
Hong-Jiang Zhang
|
|
Pages: 382-391 |
|
doi>10.1145/957013.957095 |
|
Full text: PDF
|
|
Efficient video data management calls for intelligent video summarization tools that automatically generate concise video summaries for fast skimming and browsing. Traditional video summarization techniques are based on low-level feature analysis, which generally fails to capture the semantics of video content. Our vision is that users unintentionally embed their understanding of the video content in their interaction with computers. This valuable knowledge, which is difficult for computers to learn autonomously, can be utilized for video summarization process. In this paper, we present an intelligent video browsing and summarization system that utilizes previous viewers' browsing log to facilitate future viewers. Specifically, a novel ShotRank notion is proposed as a measure of the subjective interestingness and importance of each video shot. A ShotRank computation framework is constructed to seamlessly unify low-level video analysis and user browsing log mining. The resulting ShotRank is used to organize the presentation of video shots and generate video skims. Experimental results from user studies have strongly confirmed that ShotRank indeed represents the subjective notion of interestingness and importance of each video shot, and it significantly improves future viewers' browsing experience.
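Given the PageRank analogy in the name, a plausible core of a ShotRank-style measure is power iteration over a shot-to-shot transition matrix mined from browsing logs, as sketched here. The damping factor, the log weighting, and the whole PageRank reading are our assumptions; the paper's framework also folds in low-level video analysis.

```python
import numpy as np

def shot_rank(W, d=0.85, iters=50):
    """Power iteration over a shot-to-shot link matrix W, where W[i, j]
    counts how often viewers jumped from shot i to shot j in the logs."""
    n = len(W)
    P = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)  # row-stochastic
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - d) / n + d * (r @ P)
    return r

logs = np.array([[0, 5, 1],     # viewers mostly move from shot 0 to shot 1
                 [2, 0, 6],
                 [1, 1, 0]], float)
print(shot_rank(logs).round(3))   # higher rank = more "interesting" shot
```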
|
|
|
Generation of interactive multi-level video summaries |
| |
Frank Shipman,
Andreas Girgensohn,
Lynn Wilcox
|
|
Pages: 392-401 |
|
doi>10.1145/957013.957096 |
|
Full text: PDF
|
|
In this paper, we describe how a detail-on-demand representation for interactive video is used in video summarization. Our approach automatically generates a hypervideo composed of multiple video summary levels and navigational links between these summaries and the original video. Viewers may interactively select the amount of detail they see, access more detailed summaries, and navigate to the source video through the summary. We created a representation for interactive video that supports a wide range of interactive video applications and Hyper-Hitchcock, an editor and player for this type of interactive video. Hyper-Hitchcock employs methods to determine (1) the number and length of levels in the hypervideo summary, (2) the video clips for each level in the hypervideo, (3) the grouping of clips into composites, and (4) the links between elements in the summary. These decisions are based on an inferred quality of the video segments and the temporal relations among those segments.
|
|
|
SESSION: Multimedia coding and security |
|
|
|
|
Improved p-domain rate control and perceived quality optimizations for MPEG-4 real-time video applications |
| |
Michael Militzer,
Maciej Suchomski,
Klaus Meyer-Wegener
|
|
Pages: 402-411 |
|
doi>10.1145/957013.957098 |
|
Full text: PDF
|
|
The paper describes bit rate control for a one-pass MPEG-4 video encoding algorithm in order to make it suitable for real-time applications. The proposed control method is of low computational complexity and more accurate than previous approaches. As a result, the rate-control buffer size, which strongly influences the latency between a video sender and receiver, can be decreased significantly. Additionally, a solution is proposed for increasing the perceived quality by introducing an advanced bit allocation scheme and by exploiting activity masking. The proposed algorithm has been implemented in the XVID codec, a representative of the MPEG-4 standard. Experiments show that the proposed algorithm is highly accurate and provides improved perceived visual quality. Moreover, the implementation outperforms other existing bit rate control algorithms.
|
|
|
Content-based UEP: a new scheme for packet loss recovery in music streaming |
| |
Ye Wang,
Ali Ahmaniemi,
David Isherwood,
Wendong Huang
|
|
Pages: 412-421 |
|
doi>10.1145/957013.957099 |
|
Full text: PDF
|
|
Bandwidth efficiency and error robustness are two essential and conflicting requirements for streaming media content over error-prone channels, such as wireless channels. This paper describes a new scheme called content-based unequal error protection (C-UEP), which aims to improve the user-perceived QoS in the case of packet loss. We use music streaming as an example to show the effectiveness of the new concept. C-UEP requires only a small fraction of the redundancy used in existing forward error correction (FEC) methods. C-UEP classifies every audio segment (e.g. an encoding frame) into different classes to improve encoding efficiency. Salient transients such as drumbeats and note onsets are encoded with more redundancy in a secondary bitstream used to recover lost packets by the receiver. Formal perceptual evaluations show that our scheme improves audio quality significantly over simple muting and packet repetition baselines. This improvement is achieved with a negligible amount of redundancy, which is transmitted to the receiver ahead of playback.
|
|
|
Layered coding vs. multiple descriptions for video streaming over multiple paths |
| |
J. Chakareski,
S. Han,
B. Girod
|
|
Pages: 422-431 |
|
doi>10.1145/957013.957100 |
|
Full text: PDF
|
|
In this paper, we examine the performance of specific implementations of multiple description coding and of layered coding for video streaming over error-prone packet switched networks. We compare their performance using different transmission schemes with and without network path diversity. It is shown that given the specific implementations there is a large variation in relative performance between multiple description coding and layered coding depending on the employed transmission scheme. For scenarios where the packet transmission schedules can be optimized in a rate-distortion sense, layered coding provides a better performance. The converse is true for scenarios where the packet schedules are not rate-distortion optimized.
|
|
|
A flexible and scalable authentication scheme for JPEG2000 image codestreams |
| |
Cheng Peng,
Robert H. Deng,
Yongdong Wu,
Weizhong Shao
|
|
Pages: 433-441 |
|
doi>10.1145/957013.957101 |
|
Full text: PDF
|
|
JPEG2000 is an emerging standard for still image compression and is becoming the solution of choice for many digital imaging fields and applications. An important aspect of JPEG2000 is its "compress once, decompress many ways" property [1], i.e., it allows extraction of various sub-images (e.g., images with various resolutions, pixel fidelities, tiles and components) all from a single compressed image codestream. In this paper, we present a flexible and scalable authentication scheme for JPEG2000 images based on the Merkle hash tree and digital signature. Our scheme is fully compatible with JPEG2000 and possesses a "sign once, verify many ways" property. That is, it allows users to verify the authenticity and integrity of different sub-images extracted from a single compressed codestream protected with a single digital signature.
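The Merkle hash tree underlying "sign once, verify many ways" is a standard construction; this sketch builds a root over codestream fragments and verifies one fragment from its sibling-hash path. The fragment granularity and all names are illustrative, not the paper's exact packet layout.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Root hash over codestream fragments (e.g., packets of one tile or
    resolution level). Signing only the root lets a verifier check any
    extracted sub-image from its fragments plus sibling hashes."""
    level = [h(x) for x in leaves]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

fragments = [b"res0", b"res1", b"res2", b"res3"]
root = merkle_root(fragments)
# Verify fragment 0 using its authentication path (sibling hashes only):
path = [h(b"res1"), h(h(b"res2") + h(b"res3"))]
acc = h(b"res0")
for sib in path:
    acc = h(acc + sib)
print(acc == root)   # True: "sign once, verify many ways"
```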
|
|
|
DEMONSTRATION SESSION: Demonstration session 2 |
|
|
|
|
Music videos miner |
| |
Lalitha Agnihotri,
Nevenka Dimitrova,
John Kender,
John Zimmerman
|
|
Pages: 442-443 |
|
doi>10.1145/957013.957103 |
|
Full text: PDF
|
|
|
|
|
Identifying audio clips with RARE |
| |
Chris J. C. Burges,
John C. Platt,
Jonathan Goldstein
|
|
Pages: 444-445 |
|
doi>10.1145/957013.957104 |
|
Full text: PDF
|
|
In this paper, we describe RARE (Robust Audio Recognition Engine): a system for identifying audio streams and files. RARE can be used in a variety of applications: from enhancing the consumer listening experience to cleaning large audio databases. RARE was designed with two key qualities in mind: robustness to distortion of the audio, and lookup speed. RARE identifies audio clips in a stream against a database of 1/4 million songs in real time using approximately 10% CPU on an 850 MHz P3, and with a measured false positive rate of 1.5x10^-8 per clip, per database entry, at a false negative rate of 0.2% per clip. We demo RARE in real time on a stream and on distorted files.
|
|
|
An affinity-based image retrieval system for multimedia authoring and presentation |
| |
Shu-Ching Chen,
Mei-Ling Shyu,
Na Zhao,
Chengcui Zhang
|
|
Pages: 446-447 |
|
doi>10.1145/957013.957105 |
|
Full text: PDF
|
|
In this demonstration, we present an image retrieval system to support multimedia authoring and presentation. An affinity-based mechanism, Markov Model Mediator (MMM), is used as the search engine for the system, which utilizes both the low-level image features and the learned high-level concepts via user access patterns and access frequencies. This system is one of the major components of MediaManager, a distributed multimedia management system developed by us. Both retrieval and learning facilities are supported in this system. The retrieval system also provides input information to the Multimedia Augmented Transition Network (MATN) environment for multimedia authoring and presentation.
|
|
|
MuSA.RT: music on the spiral array. real-time |
| |
Elaine Chew,
Alexandre R.J. Francois
|
|
Pages: 448-449 |
|
doi>10.1145/957013.957106 |
|
Full text: PDF
|
|
We present MuSA.RT, Opus 1, a multimodal interactive system for music analysis and visualization using the Spiral Array model. Real-time MIDI input from a live performance is processed, analyzed and mapped to the 3D model, revealing tonal structures such as pitches, chords and keys. A user can concurrently navigate through the Spiral Array space using a gamepad or set the camera control to automatic pilot. The interaction among and concurrent processing of the different data streams is made possible through the Modular Flow Scheduling Middleware.
|
|
|
Indexing, searching, and skimming of multimedia documents containing recorded lectures and live presentations |
| |
Wolfgang Hürst
|
|
Pages: 450-451 |
|
doi>10.1145/957013.957107 |
|
Full text: PDF
|
|
This demonstration illustrates different ways to support users dealing with recorded live presentations in order to improve the usability of the corresponding documents. It highlights different problems in this context and presents solutions and alternative approaches both for multimedia indexing and query processing and for user interface issues, in order to support users who are skimming or browsing such documents in search of information.
|
|
|
Microcontroller implementation of melody recognition: a prototype |
| |
Jyh-Shing Roger Jang,
Yung-Sen Jang
|
|
Pages: 452-453 |
|
doi>10.1145/957013.957108 |
|
Full text: PDF
|
|
This demo presents a 16-bit microcontroller implementation of a content-based music retrieval system that can take a user's acoustic input (5-second clip of singing or humming) and then retrieve the intended song from 20 candidate songs. Performance evaluation based on 192 clips shows that the system has a satisfactory top-1 recognition rate of 92%. This system demonstrates the feasibility of microcontroller based melody recognition for music retrieval, which can be used in consumer electronics such as melody-activated interactive toys, query engines for MP3 players or karaoke machines, and so on.
|
|
|
Human + agent: creating recombinant information |
| |
Andruid Kerne,
Vikram Sundaram,
Jin Wang,
Madhur Khandelwal,
J. Michael Mistrot
|
|
Pages: 454-455 |
|
doi>10.1145/957013.957109 |
|
Full text: PDF
|
|
combinFormation is a tool that enables browsing and collecting information elements in a generative space. By generative, we mean that the tool is an agent that automatically retrieves information elements and visually composes them. A combinFormation session presents a dynamic, evolving recombination of information elements from different sources. The elements are manipulable in the information space. Recombination is the process of taking previously unconnected elements, and combining them to create new configurations. One purpose of this space is to support the formation of ideas, through more and less focused processes of foraging. While ideas are forming, the criteria that underlie information foraging activities may not be well defined. Collecting the specific subset of related information elements is challenging. Cognitive scientists have established that combinations of images and textual elements are examples of preinventive structures that can lead to the emergence of new ideas. These preinventive structures often combine existing representations. Our program generates recombinant visualizations that develop interrelationships between the information elements. The generative visualization is based on a procedural model of the information, and the user's interests. The user model reflects interactions in which s/he explicitly expresses interest. The agent retrieves information based on the evolving model. The visual composition is also developed to emphasize the user's evolving sense of what is important. This involves solving problems in the dynamic visualization of dynamic, heterogeneous collections. In our novel interaction model, the human being shares control of the evolving information space with the agent. The user can express interest in information elements as they stream in, and design the visual space, using interactive tools. Expressions feed back through the model to drive the program's retrieval and visual composition decisions.
|
|
|
Enhancing web accessibility |
| |
Alison Lee,
Vicki Hanson
|
|
Pages: 456-457 |
|
doi>10.1145/957013.957110 |
|
Full text: PDF
|
|
This demonstration will illustrate the key technical and user interface aspects of the Web Adaptation Technology. Various transformations underlying the system will be shown that illustrate how this approach enables a wide range of users with reduced visual, cognitive, and motor abilities to access a large proportion of Web pages using a standard browser.
|
|
|
eXtensible content protection |
| |
Florian Pestoni,
Clemens Drews
|
|
Pages: 458-459 |
|
doi>10.1145/957013.957111 |
|
Full text: PDF
|
|
This paper describes a proof of concept implementation of xCP, a content protection scheme for home networks.
|
|
|
Creating touch-screens anywhere with interactive projected displays |
| |
Claudio Pinhanez,
Rick Kjeldsen,
Lijun Tang,
Anthony Levas,
Mark Podlaseck,
Noi Sukaviriya,
Gopal Pingali
|
|
Pages: 460-461 |
|
doi>10.1145/957013.957112 |
|
Full text: PDF
|
|
We demonstrate a system that combines steerable projection and computer vision technologies to create "touch-screen" style interactive displays on any flat surface in a space. A high-end version of the system -- the Everywhere Display (ED) -- combines an LCD projector with motorized focus and zoom, a computer controlled pan-tilt mirror, and a pan-tilt zoom camera to enable steering of interactive projections around space. A low-end version (ED-lite) enables creation of interactive displays using a portable projector and camera attached to a laptop computer. Unlike traditional augmented reality systems, the ED systems enable delivery of interactive multimedia content on ordinary objects without requiring users to wear head mounted displays or carry special input devices.
|
|
|
Excuse me, but are you human? |
| |
Yong Rui,
Zicheng Liu
|
|
Pages: 462-463 |
|
doi>10.1145/957013.957113 |
|
Full text: PDF
|
|
Web services designed for human users are being abused by computer programs (bots). The bots steal thousands of free email accounts in a minute, participate in online polls to skew results, and irritate people by joining online chat rooms. These real-world issues have recently generated a new research area called Human Interactive Proofs (HIP), whose goal is to defend services from malicious attacks by differentiating bots from human users. We propose a new HIP algorithm based on detecting human faces and facial features. Human faces are the objects most familiar to humans, making them possibly the best candidate for HIP. We conducted user studies and showed the ease of use of our system to human users. We designed attacks using the best existing face detectors and demonstrated the difficulty the system poses to bots.
|
|
|
Interactive multimedia messaging service platform
Shen Jun,
Yan Rong,
Sun Pei,
Song Song
Pages: 464-465
doi>10.1145/957013.957114
Full text: PDF

An interactive multimedia messaging service platform and two demos are described in this paper.

Interactive storytelling system using behavior-based non-verbal information: ZENetic computer
Naoko Tosa,
Seigo Matsuoka,
Koji Miyazaki
Pages: 466-467
doi>10.1145/957013.957115
Full text: PDF

We have developed an interactive storytelling system that aims to help us "recreate" our conscious selves by calling on Buddhist principles, Asian philosophy, and traditional Japanese culture through the inspirational media of ink painting, kimono, and haiku. "Recreating ourselves" means the process of making the consciousness of our 'daily self' meet that of our 'hidden self' through stimulation of activity deep within us. Ultimately, this may meld our consciousness and unconsciousness in complete harmony. This is difficult to achieve through traditional logic-based interactions; our system is a new approach to reaching this goal, incorporating traditional media and methods in an interactive computer system.

Robust goal-mouth detection for virtual content insertion
Kongwah Wan,
Xin Yan,
Xinguo Yu,
Changsheng Xu
Pages: 468-469
doi>10.1145/957013.957116
Full text: PDF

In this paper, we describe a working system that detects and segments goal-mouth appearances in soccer video in real time. Operating on sub-optimal-quality images after MPEG decoding, the system constrains Hough transform-based line-mark detection to the dominant green regions. The vertical goal-posts and horizontal goal-bar are then isolated by color-based region (pole) growing. We demonstrate its application to quick video browsing and virtual content insertion.

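As a rough feel for the green-constrained detection step, the sketch below (Python with OpenCV) masks a frame to its dominant green pixels before running a probabilistic Hough transform, then splits the detected lines into near-vertical post and near-horizontal bar candidates. The HSV thresholds, Hough parameters, and angle tolerances are illustrative assumptions, not the paper's values.

    import cv2
    import numpy as np

    def goal_mouth_candidates(frame_bgr):
        """Sketch: restrict line-mark detection to the dominant green (field)
        region, as the paper describes; all thresholds here are guesses."""
        hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
        field = cv2.inRange(hsv, (35, 40, 40), (85, 255, 255))  # rough grass mask
        edges = cv2.Canny(frame_bgr, 50, 150)
        edges = cv2.bitwise_and(edges, edges, mask=field)  # keep field edges only
        lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=60,
                                minLineLength=40, maxLineGap=5)
        posts, bars = [], []
        if lines is None:
            return posts, bars
        for x1, y1, x2, y2 in lines[:, 0]:
            angle = abs(np.degrees(np.arctan2(y2 - y1, x2 - x1)))
            if 80.0 <= angle <= 100.0:
                posts.append((x1, y1, x2, y2))  # near-vertical: goal-post candidate
            elif angle <= 10.0 or angle >= 170.0:
                bars.append((x1, y1, x2, y2))   # near-horizontal: goal-bar candidate
        return posts, bars
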
SESSION: 3D multimedia environments

Computation and performance issues in Coliseum: an immersive videoconferencing system
H. Harlyn Baker,
Nina Bhatti,
Donald Tanguay,
Irwin Sobel,
Dan Gelb,
Michael E. Goss,
John MacCormick,
Kei Yuasa,
W. Bruce Culbertson,
Thomas Malzbender
Pages: 470-479
doi>10.1145/957013.957118
Full text: PDF

Coliseum is a multiuser immersive remote teleconferencing system designed to provide collaborative workers the experience of face-to-face meetings from their desktops. Five cameras are attached to each PC display and directed at the participant. From these video streams, view synthesis methods produce arbitrary-perspective renderings of the participant and transmit them to others at interactive rates, currently about 15 frames per second. Combining these renderings in a shared synthetic environment gives the appearance of all participants interacting in a common space. In this way, Coliseum enables users to share a virtual world, with acquired-image renderings of their appearance replacing the synthetic representations provided by more conventional avatar-populated virtual worlds. The system supports virtual mobility -- participants may move around the shared space -- and reciprocal gaze, and has been demonstrated in collaborative sessions of up to ten Coliseum workstations, including sessions spanning two continents. This paper summarizes the technology and reports on issues related to its performance.

Design of a multi-sender 3D videoconferencing application over an end system multicast protocol
Mojtaba Hosseini,
Nicolas D. Georganas
Pages: 480-489
doi>10.1145/957013.957119
Full text: PDF

Videoconferencing in the context of 3D virtual environments promises better spatial consistency and mutual awareness for its participants. However, given the absence of IP Multicast and the limited upload bandwidth of today's DSL connections, the feasibility of such systems in supporting even a small group of users is in question. This paper presents the design and implementation of an awareness-driven 3D videoconferencing application that runs on a peer-to-peer architecture and our own End System Multicast protocol. The paper highlights the unique requirements of multiparty videoconferencing applications and presents a solution that can support 4-10 bandwidth-limited users without the need for IP Multicast capability.

SESSION: Multimedia authoring

AVE: automated home video editing
Xian-Sheng Hua,
Lie Lu,
Hong-Jiang Zhang
Pages: 490-497
doi>10.1145/957013.957121
Full text: PDF

In this paper, we present a system that automates home video editing. The system automatically extracts a set of highlight segments from raw home videos and aligns them with user-supplied incidental music, based on the content of both the video and the music. We developed an approach for extracting temporal structure and determining the importance of a video segment in order to facilitate the selection of highlight segments. Additionally, we extract temporal structure, beats, and tempo from the incidental music. To create more professional-looking results, the selected highlight segments must satisfy a set of editing rules and are matched to the content of the incidental music. This task is formulated as a nonlinear 0-1 programming problem in which the rules are embedded as constraints. The output video is rendered by connecting the selected highlight segments with transition effects and the incidental music.

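The paper solves segment selection as nonlinear 0-1 programming with editing rules as constraints; to give a feel for the data flow, here is a much simpler greedy stand-in (not the authors' solver) that fills the music's duration with the most important segments and snaps cut points to beats. All names and the segment/beat representations are assumptions.

    def select_highlights(segments, beat_times, music_duration):
        """Greedy stand-in for the paper's 0-1 programming formulation.

        segments:   list of (start, end, importance) from video analysis
        beat_times: sorted beat positions of the incidental music, in seconds
        Picks high-importance segments until the music is filled, then snaps
        each cut point on the music timeline to the nearest beat."""
        chosen, total = [], 0.0
        for start, end, importance in sorted(segments, key=lambda s: -s[2]):
            if total + (end - start) <= music_duration:
                chosen.append((start, end))
                total += end - start
        timeline, cursor = [], 0.0
        for start, end in sorted(chosen):  # restore chronological order
            out_point = min(beat_times, key=lambda b: abs(b - (cursor + end - start)))
            timeline.append({"clip": (start, end), "music_out": out_point})
            cursor = out_point
        return timeline
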
Linking multimedia presentations with their symbolic source documents: algorithm and applications
Berna Erol,
Jonathan J. Hull,
Dar-Shyang Lee
Pages: 498-507
doi>10.1145/957013.957122
Full text: PDF

An algorithm is presented that automatically matches images of presentation slides to the symbolic source file (e.g., PowerPoint™ or Acrobat™) from which they were generated. The images are captured either by tapping the video output of a laptop connected to a projector or by taking a picture of what is displayed on the screen in a conference room. The matching algorithm extracts features from the image data, including OCR output, edges, projection profiles, and layout, and determines the symbolic file that contains the most similar collection of features. This algorithm enables several unique applications for enhancing a meeting in real time and for accessing the audio and video recorded while a presentation was being given. These applications include simultaneous translation of presentation slides during a meeting, linking video clips inside a PowerPoint file that show how each slide was described by the presenter, and retrieving presentation recordings using digital camera images as queries.

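Of the four feature channels above, the OCR channel is the easiest to illustrate. The sketch below is a hedged simplification (not the authors' matcher): it scores each candidate slide by Jaccard overlap between word-token sets and returns the best match.

    import re

    def tokens(text):
        """Normalize OCR output or extracted slide text to a set of tokens."""
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def best_matching_slide(captured_ocr, slide_texts):
        """Match a captured slide image (via its OCR text) to the symbolic
        slide whose text overlaps it most; the real algorithm also fuses
        edge, projection-profile, and layout features."""
        query = tokens(captured_ocr)
        def jaccard(a, b):
            union = a | b
            return len(a & b) / len(union) if union else 0.0
        scores = [jaccard(query, tokens(t)) for t in slide_texts]
        return max(range(len(scores)), key=scores.__getitem__), scores
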
SESSION: Surveillance

Video retrieval using spatio-temporal descriptors
Daniel DeMenthon,
David Doermann
Pages: 508-517
doi>10.1145/957013.957124
Full text: PDF

This paper describes a novel methodology for implementing video search functions such as retrieval of near-duplicate videos and recognition of actions in surveillance video. Videos are divided into half-second clips whose stacked frames produce 3D space-time volumes of pixels. Pixel regions with consistent color and motion properties are extracted from these 3D volumes by a threshold-free hierarchical space-time segmentation technique. Each region is then described by a high-dimensional point whose components represent the position, motion, and, when possible, color of the region. In the indexing phase for a video database, these points are assigned labels that specify their video clip of origin. All the labeled points for all the clips are stored in a single binary tree for efficient k-nearest-neighbor retrieval. The retrieval phase uses video segments as queries. Half-second clips of these queries are again segmented to produce sets of points, and for each point the labels of its nearest neighbors are retrieved. The labels that receive the largest numbers of votes correspond to the database clips most similar to the query video segment. We illustrate this approach for video indexing and retrieval and for action recognition. First, we describe retrieval experiments for dynamic logos and for video queries that differ from the indexed broadcasts by the addition of large overlays. Then we describe experiments in which office actions (such as pulling and closing drawers, taking and storing items, and picking up and putting down a phone) are recognized. Color information is ignored to ensure independence of people's appearance. One distinct advantage of this approach to action recognition is that there is no need for detection or recognition of body parts.

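The indexing and voting scheme can be sketched in a few lines. Here a k-d tree stands in for the paper's binary tree, and k=5 is an illustrative choice rather than the paper's setting.

    import numpy as np
    from collections import Counter
    from scipy.spatial import cKDTree

    def build_index(region_points, clip_labels):
        """region_points: (N, D) descriptors of all database regions;
        clip_labels: length-N array naming each point's clip of origin."""
        return cKDTree(region_points), np.asarray(clip_labels)

    def retrieve_clips(tree, labels, query_points, k=5):
        """Each query region votes for the clips owning its k nearest
        database points; the most-voted clips are returned first."""
        _, idx = tree.query(query_points, k=k)
        return Counter(labels[np.atleast_2d(idx)].ravel()).most_common()
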
Invariance in motion analysis of videos
Cen Rao,
Mubarak Shah,
Tanveer Syeda-Mahmood
Pages: 518-527
doi>10.1145/957013.957125
Full text: PDF

In this paper, we propose an approach that retrieves the motion of objects from videos based on dynamic time warping of view-invariant characteristics. Motion is represented as a sequence of dynamic instants and intervals, which are automatically computed using the spatiotemporal curvature of the trajectory of the moving object. The Dynamic Time Warping (DTW) method matches trajectories using a view-invariant similarity measure. Our system is able to incrementally learn different actions without any initialization, so it can work in an unsupervised manner. Retrieval of relevant videos can then be performed by computing a simple distance metric. This paper makes two fundamental contributions to view-invariant video retrieval: (1) dynamic instant detection in trajectories of moving objects acquired from video, and (2) view-invariant Dynamic Time Warping to measure similarity between two trajectories of actions performed by different persons and from different viewpoints. Although the learning algorithm in our approach is relatively simple, we achieve a high recognition rate because of the view-invariant representation and the DTW-based similarity measure.

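As a reference point for the matching step, here is plain dynamic time warping over two descriptor sequences. The view-invariant similarity the paper pairs with DTW is replaced by a placeholder Euclidean metric.

    import numpy as np

    def dtw_distance(a, b, dist=lambda x, y: np.linalg.norm(x - y)):
        """Plain dynamic time warping between two trajectories (sketch).

        a, b: sequences of feature vectors sampled along each trajectory.
        `dist` is a stand-in metric, not the paper's view-invariant measure."""
        n, m = len(a), len(b)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = dist(a[i - 1], b[j - 1])
                # Allow match, insertion, or deletion steps.
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[n, m]
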
Multi-camera spatio-temporal fusion and biased sequence-data learning for security surveillance
Gang Wu,
Yi Wu,
Long Jiao,
Yuan-Fang Wang,
Edward Y. Chang
Pages: 528-538
doi>10.1145/957013.957126
Full text: PDF

We present a framework for multi-camera video surveillance. The framework consists of three phases: detection, representation, and recognition. The detection phase handles multi-source spatio-temporal data fusion for efficiently and reliably extracting motion trajectories from video. The representation phase summarizes raw trajectory data to construct hierarchical, invariant, and content-rich descriptions of the motion events. Finally, the recognition phase deals with event classification and identification on the data descriptors. Because of space limits, we describe only briefly how we detect and represent events, but we provide in-depth treatment of the third phase: event recognition. For effective recognition, we devise a sequence-alignment kernel function to perform sequence-data learning for identifying suspicious events. We show that when the positive training instances (i.e., suspicious events) are significantly outnumbered by the negative training instances (benign events), SVMs (like other learning methods) can suffer a high incidence of errors. To remedy this problem, we propose the kernel boundary alignment (KBA) algorithm to work with the sequence-alignment kernel. Through an empirical study in a parking-lot surveillance setting, we show that our spatio-temporal fusion scheme and biased sequence-data learning method are highly effective in identifying suspicious events.

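The paper's KBA algorithm counters class imbalance by adjusting the kernel matrix itself. The sketch below substitutes a simpler bias, per-class weights over a precomputed sequence-alignment kernel, using scikit-learn. `align_score` is an assumed symmetric similarity (e.g., a normalized alignment score between event sequences); note that such a Gram matrix is not guaranteed to be positive semi-definite.

    import numpy as np
    from sklearn.svm import SVC

    def gram_matrix(seqs_a, seqs_b, align_score):
        """Pairwise sequence-alignment similarities as a kernel matrix."""
        return np.array([[align_score(a, b) for b in seqs_b] for a in seqs_a])

    def train_biased_classifier(train_seqs, y, align_score):
        """Class-weighted SVM over the alignment kernel -- a stand-in for
        KBA, which instead aligns the boundary by modifying the kernel."""
        K = gram_matrix(train_seqs, train_seqs, align_score)
        clf = SVC(kernel="precomputed", class_weight="balanced")
        clf.fit(K, y)
        # Predict later with clf.predict(gram_matrix(test_seqs, train_seqs, align_score)).
        return clf
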
SESSION: Interacting with media

Interacting with audio streams for entertainment and communication
Mat C. Hans,
Mark T. Smith
Pages: 539-545
doi>10.1145/957013.957128
Full text: PDF

We present a new model of interactive audio for entertainment and communication. A new device called the DJammer and its associated technologies are described. The DJammer introduces the idea of enabling mobile users to interact cooperatively with digital audio streams. Users can augment the audio in real time and communicate the result in several ways, yielding a new form of multimedia communication across diverse devices and multiple networks. This paper describes the technologies incorporated into the DJammer and discusses the actual implementation of the prototype. Future enhancements are also described.

Shared interactive video for teleconferencing
Chunyuan Liao,
Qiong Liu,
Don Kimber,
Patrick Chiu,
Jonathan Foote,
Lynn Wilcox
Pages: 546-554
doi>10.1145/957013.957129
Full text: PDF

We present a system that allows remote and local participants to control devices in a meeting environment using mouse- or pen-based gestures "through" video windows. Unlike state-of-the-art device control interfaces that require interaction with text commands, buttons, or other artificial symbols, our approach allows users to interact with devices through live video of the environment. This naturally extends our video-supported pan/tilt/zoom (PTZ) camera control system by allowing gestures in video windows to control not only PTZ cameras but also other devices visible in the video images. For example, an authorized meeting participant can show a presentation on a screen by dragging the file on a personal laptop and dropping it on the video image of the presentation screen. This paper presents the system architecture, implementation tradeoffs, and various meeting control scenarios.

Visualizing the pulse of a classroom
Milton Chen
Pages: 555-561
doi>10.1145/957013.957130
Full text: PDF

Effective classroom teaching often requires an instructor to be acutely aware of every student. The instructor must rapidly look from student to student to catch fleeting gestures or facial expressions. To facilitate the tracking of communicative actions in a remote classroom, we built a multiparty videoconferencing system that automatically determines whether students are speaking, making gestures, or moving in their seats. These activity indicators are displayed over the video so that the instructor can see into the recent past. The activity indicators are also grouped into a visualization of the classroom interaction dynamics, thereby providing a measure of the pulse of the classroom. We conducted a user study in which teachers used our system in a simulated class. The teachers found the activity indicators to be a useful teaching aid during class; however, the indicators were most useful as a record of the class. In a student survey, we found that if audio, video, or activity indicators must be recorded, students overwhelmingly prefer activity indicators, since the indicators mask the content of the communication and are thus less intrusive to the students' privacy.

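A minimal stand-in for one such indicator (the paper's detectors for speech and gesture are richer; names here are assumed) is per-region frame differencing:

    import numpy as np

    def activity_level(prev_gray, curr_gray, threshold=15):
        """Fraction of pixels in a student's region whose intensity changed
        more than `threshold` between consecutive frames -- a crude proxy
        for the movement indicator described above."""
        delta = np.abs(curr_gray.astype(int) - prev_gray.astype(int))
        return float((delta > threshold).mean())
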
SESSION: Multimedia for tiny devices

Panoptes: scalable low-power video sensor networking technologies
Wu-chi Feng,
Brian Code,
Ed Kaiser,
Mike Shea,
Wu-chang Feng,
Louis Bavoil
Pages: 562-571
doi>10.1145/957013.957132
Full text: PDF

Video-based sensor networks can provide important visual information in a number of applications, including environmental monitoring, health care, emergency response, and video security. This paper describes the Panoptes video-based sensor networking architecture, including its design, implementation, and performance. We describe a video sensor platform that can deliver high-quality video over 802.11 networks with a power requirement of approximately 5 watts. In addition, we describe the streaming and prioritization mechanisms we designed to allow it to survive long periods of disconnected operation. Finally, we describe a sample application and bitmapping algorithm that we implemented to show the usefulness of our platform. Our experiments include an in-depth analysis of the bottlenecks within the system as well as power measurements for the various components of the system.

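One way to picture prioritization under disconnected operation is a bounded buffer that evicts the least important frames first and drains the most important first on reconnection. This is an illustrative sketch, not Panoptes' actual priority-mapping or bitmapping scheme.

    import heapq

    class PriorityFrameBuffer:
        """Sketch of priority-and-discard buffering for a disconnected
        video sensor: when storage fills, the lowest-priority frames are
        dropped first so high-priority events survive long outages."""

        def __init__(self, capacity_bytes):
            self.capacity = capacity_bytes
            self.used = 0
            self.heap = []  # min-heap keyed on priority: lowest evicted first

        def add(self, priority, timestamp, frame_bytes):
            heapq.heappush(self.heap, (priority, timestamp, frame_bytes))
            self.used += len(frame_bytes)
            while self.used > self.capacity and self.heap:
                _, _, dropped = heapq.heappop(self.heap)
                self.used -= len(dropped)

        def drain_highest_first(self):
            """On reconnection, send the most important frames first."""
            for priority, ts, frame in sorted(self.heap, reverse=True):
                yield priority, ts, frame
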
Position calibration of audio sensors and actuators in a distributed computing platform
Vikas C. Raykar,
Igor Kozintsev,
Rainer Lienhart
Pages: 572-581
doi>10.1145/957013.957133
Full text: PDF

In this paper, we present a novel approach to automatically determine the positions of sensors and actuators in an ad-hoc distributed network of general-purpose computing platforms. The formulation and solution account for the limited precision of temporal synchronization among multiple platforms. The theoretical performance limit for the sensor positions is derived via the Cramér-Rao bound. We analyze the sensitivity of localization accuracy with respect to the number of sensors and actuators as well as their geometry. Extensive Monte Carlo simulation results are reported, together with a discussion of the real-time system. On a test platform consisting of 4 speakers and 4 microphones, the sensors' and actuators' three-dimensional locations could be estimated with an average bias of 0.08 cm and an average standard deviation of 3.8 cm.

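To make the geometry concrete, the sketch below solves the easiest sub-problem: recovering one microphone's 3D position from times of flight to speakers at known positions, via nonlinear least squares. The paper jointly estimates all positions and the synchronization error, which this simplification assumes away.

    import numpy as np
    from scipy.optimize import least_squares

    SPEED_OF_SOUND = 343.0  # m/s at roughly 20 C

    def locate_microphone(speaker_xyz, tof_seconds):
        """Multilaterate one microphone from speaker-to-mic times of flight.
        Assumes known speaker positions and perfect clock sync (the paper
        estimates both)."""
        speaker_xyz = np.asarray(speaker_xyz, dtype=float)
        measured = SPEED_OF_SOUND * np.asarray(tof_seconds, dtype=float)

        def residuals(p):
            return np.linalg.norm(speaker_xyz - p, axis=1) - measured

        guess = speaker_xyz.mean(axis=0)  # start at the speakers' centroid
        return least_squares(residuals, guess).x
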
Integrated power management for video streaming to mobile handheld devices
Shivajit Mohapatra,
Radu Cornea,
Nikil Dutt,
Alex Nicolau,
Nalini Venkatasubramanian
Pages: 582-591
doi>10.1145/957013.957134
Full text: PDF

Optimizing user experience for streaming video applications on handheld devices is a significant research challenge. In this paper, we propose an integrated power management approach that unifies low-level architectural optimizations (CPU, memory, registers), OS power-saving mechanisms (dynamic voltage scaling), and adaptive middleware techniques (admission control, optimal transcoding, network traffic regulation). Specifically, we identify interaction parameters between the different levels and optimize them to significantly reduce power consumption. With knowledge of device configurations, dynamic device parameters, and changing system conditions, the middleware layer selects an appropriate video quality and fine-tunes the architecture for optimized delivery of video. Our performance results indicate that architectural optimizations that are cognizant of user-level parameters (e.g., transcoded video quality) can provide energy gains as high as 57.5% for the CPU and memory. Middleware adaptations to changing network noise levels can save as much as 70% of the energy consumed by the wireless network interface. Furthermore, we demonstrate how such an integrated framework, supporting tight coupling of inter-level parameters, can substantially enhance the user experience on a handheld.

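The OS-level piece, dynamic voltage scaling, reduces to a simple decision per frame: run at the lowest frequency that still meets the decode deadline, since power grows superlinearly with frequency and voltage. A toy sketch with a hypothetical frequency ladder:

    def pick_cpu_frequency(cycles_per_frame, frame_deadline_s, freqs_hz):
        """Choose the lowest CPU frequency that decodes a frame on time.
        `freqs_hz` is the platform's discrete frequency ladder; the values
        in the example below are illustrative, not a real device's."""
        for f in sorted(freqs_hz):
            if cycles_per_frame / f <= frame_deadline_s:
                return f
        return max(freqs_hz)  # even top speed misses: degrade quality instead

    # e.g. a 33 ms deadline (30 fps) with 8e6 cycles per frame:
    # pick_cpu_frequency(8e6, 0.033, [150e6, 400e6, 600e6]) -> 400000000.0
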
DEMONSTRATION SESSION: Video demonstration session

Photo2Video
Xian-Sheng Hua,
Lie Lu,
Hong-Jiang Zhang
Pages: 592-593
doi>10.1145/957013.957136
Full text: PDF

To exploit the rich content embedded in a single photograph, a system named Photo2Video was developed to automatically convert a photographic series into a video by simulating camera motions, set to incidental music of the user's choice. For a chosen photographic series, an appropriate camera motion pattern is selected for each photograph to generate a corresponding motion photograph clip. The final output video is then rendered by connecting the series of motion photograph clips with specific transitions and aligning it with the selected incidental music. Photo2Video provides a novel way to browse a series of images and can be regarded as a system exploring a new medium between photograph and video.

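The core of a motion photograph clip is an interpolated crop window over the still image, as in the sketch below (Pillow; box names and sizes are illustrative assumptions). Encoding the frames and aligning clips to the music are separate steps.

    from PIL import Image

    def pan_zoom_clip(photo_path, start_box, end_box, n_frames, out_size=(640, 480)):
        """Simulate camera motion over a still photograph: linearly
        interpolate a crop window from start_box to end_box and resample
        every crop to the output size. Boxes are (left, top, right,
        bottom) in pixels."""
        img = Image.open(photo_path)
        frames = []
        for i in range(n_frames):
            t = i / max(n_frames - 1, 1)
            box = tuple(round((1 - t) * s + t * e)
                        for s, e in zip(start_box, end_box))
            frames.append(img.crop(box).resize(out_size, Image.LANCZOS))
        return frames  # encode with e.g. imageio or ffmpeg afterwards
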
Essistants
Christos Tryfonas,
James Schumacher
Pages: 594-595
doi>10.1145/957013.957137
Full text: PDF

One of the challenges service providers currently face is the ability to introduce a variety of services at minimal cost and impact to their customers. These services often become personalized as more and more content becomes available. The natural response to this service/content explosion is a seamless user interface that remains consistent across the various services and devices. The Essistant architecture attempts to provide personalized services to end users through a seamless multi-modal user interface and a systematization of the backend services. This paper describes the overall architecture of the Essistant project. The associated video demonstrates the functionality of Essistant for a set of services implemented in a lab environment: video-on-demand using an automated price broker, broadcast video over IP multicast, personalized news and horoscopes, and interaction with the physical space by acting as a proxy to a robot.

The automatic video editor
Sam Yip,
Eugenia Leu,
Hunter Howe
Pages: 596-597
doi>10.1145/957013.957138
Full text: PDF

More and more home videos are being produced with the increasing popularity of digital video camcorders, yet the resulting videos tend to be very long and boring to watch, and the precious memories within them are ultimately lost. The problem is that the average home videographer has neither the time nor the editing skills to edit their home videos, and it is a shame to let all those precious moments go to waste. Editing software on the market, such as Apple's iMovie [1], Microsoft's MovieMaker [2], and the Hitchcock editing system [3], allows users to edit their own home videos, but it still demands time, skill, and effort. With this issue in mind, we developed the Automatic Video Editor, an application that analyzes a home video and automatically edits it into a condensed and interesting mini-movie. In our video, we first show a clip of an unedited home video and then show the output created by our Automatic Video Editor.

Managing digital memories with the FXPAL photo application
John Adcock,
Matthew Cooper,
John Doherty,
Jonathan Foote,
Andreas Girgensohn,
Lynn Wilcox
Pages: 598-599
doi>10.1145/957013.957139
Full text: PDF

The FXPAL Photo Application is designed to facilitate the organization of digital images from digital cameras and other sources through automated organization and intuitive user interfaces.

Detail-on-demand hypervideo
John Doherty,
Andreas Girgensohn,
Jonathan Helfman,
Frank Shipman,
Lynn Wilcox
Pages: 600-601
doi>10.1145/957013.957140
Full text: PDF

We demonstrate the use of detail-on-demand hypervideo in interactive training and video summarization. Detail-on-demand video allows viewers to watch short video segments and to follow hyperlinks to see additional detail. The player for detail-on-demand video displays keyframes indicating what links are available at each point in the video. The Hyper-Hitchcock authoring tool helps users create hypervideo by automatically dividing video into clips that can be combined in a direct manipulation interface. Clips can be grouped into composites and hyperlinks can be placed between clips and composites. A summarization algorithm creates multi-level hypervideo summaries from linear video by automatically selecting clips and placing links between them.

Active capture: automatic direction for automatic movies
Marc Davis
Pages: 602-603
doi>10.1145/957013.957141
Full text: PDF

Current consumer media production is laborious, tedious, and produces unsatisfying results. To address this problem, Active Capture leverages media production knowledge, computer vision and audition algorithms, and user interaction techniques to automate direction and cinematography, enabling the automatic production of annotated, high-quality, reusable media assets. Active Capture is part of a new computational media production paradigm that transforms media production from a manual mechanical process into an automated computational one that can produce mass-customized and personalized media integrating video of non-actors. The implemented system automates the process of capturing a non-actor performing two simple reusable actions ("screaming" and "turning her head to look at the camera") and automatically integrates those shots into various commercials and movie trailers.

SESSION: Content-based retrieval

Multimedia content processing through cross-modal association
Dongge Li,
Nevenka Dimitrova,
Mingkun Li,
Ishwar K. Sethi
Pages: 604-611
doi>10.1145/957013.957143
Full text: PDF

Multimodal information processing has received considerable attention in recent years, with the focus of existing research predominantly on the use of fusion technology. In this paper, we suggest that cross-modal association can provide a new set of powerful solutions in this area. We investigate different cross-modal association methods using the linear correlation model, and we introduce a novel method for cross-modal association called Cross-modal Factor Analysis (CFA). Our earlier work on Latent Semantic Indexing (LSI) is extended for applications that use off-line supervised training. As a promising research direction and practical application of cross-modal association, cross-modal information retrieval, where queries from one modality are used to search for content in another modality using low-level features, is then discussed in detail. The different association methods are tested and compared using the proposed cross-modal retrieval system. All of these methods achieve significant dimensionality reduction; among them, CFA gives the best retrieval performance. Finally, the paper addresses the use of cross-modal association to detect talking heads. Here the CFA method achieves 91.1% detection accuracy, while LSI and Canonical Correlation Analysis (CCA) achieve 66.1% and 73.9%, respectively. As shown by our experiments, cross-modal association provides many useful benefits, such as robust noise resistance and effective feature selection. Compared to CCA and LSI, the proposed CFA shows several advantages in analysis performance and feature usage; its capability for feature selection and noise resistance makes CFA a promising tool for many multimedia analysis applications.

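CFA is usually described as finding orthonormal transformations A and B that minimize ||XA - YB||_F for paired multimodal observations, which is solvable through the SVD of X^T Y. A minimal numpy sketch under that reading (array names are assumptions):

    import numpy as np

    def cfa_transforms(X, Y, k):
        """Cross-modal Factor Analysis sketch: rows of X and Y are paired
        observations from two modalities (e.g., audio and visual features).
        The optimal orthonormal A, B come from the SVD of X^T Y; keeping
        the leading k factor pairs gives the dimensionality reduction."""
        U, _, Vt = np.linalg.svd(X.T @ Y, full_matrices=False)
        A, B = U[:, :k], Vt.T[:, :k]
        return A, B  # project with X @ A and Y @ B, then correlate or match
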
A practical SVM-based algorithm for ordinal regression in image retrieval
Hong Wu,
Hanqing Lu,
Songde Ma
Pages: 612-621
doi>10.1145/957013.957144
Full text: PDF

Most current learning algorithms for image retrieval are based on a dichotomous relevance judgment (relevant vs. non-relevant), though this measurement of relevance is too coarse. To better identify user needs and preferences, a good retrieval system should be able to handle multilevel relevance judgments. In this paper, we focus on relevance feedback with multilevel relevance judgments, where relevance feedback is treated as an ordinal regression problem. Herbrich proposed a support vector learning algorithm for ordinal regression based on the Linear Utility Model; his algorithm intrinsically trains an SVM on a new derived training set, whose size grows rapidly as the original training set gets bigger. This property limits its applicability to relevance feedback, given the real-time requirements of the interactive process. By thoroughly analyzing Herbrich's algorithm, we first propose a new model for ordinal regression, called the Cascade Linear Utility Model, and then a practical SVM-based image retrieval algorithm built upon it. Our new algorithm is tested on a real-world image database and compared with three other algorithms capable of handling multilevel relevance judgments. The experimental results show that the retrieval performance of our algorithm is comparable to that of Herbrich's algorithm at only a fraction of its computational cost, and that it clearly outperforms the other methods.

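Herbrich's derived training set is easy to picture, and its quadratic growth explains the cost that the Cascade Linear Utility Model avoids. A sketch of the derivation (array names assumed; a linear SVM trained on the output learns a utility direction over features):

    import numpy as np

    def pairwise_training_set(X, ranks):
        """Every pair of examples with different relevance levels yields a
        difference vector labeled by which example is preferred. The
        derived set grows roughly quadratically with the original one."""
        X, ranks = np.asarray(X, float), np.asarray(ranks)
        diffs, labels = [], []
        for i in range(len(X)):
            for j in range(len(X)):
                if ranks[i] > ranks[j]:
                    diffs.append(X[i] - X[j])
                    labels.append(1)
                    diffs.append(X[j] - X[i])
                    labels.append(-1)
        return np.array(diffs), np.array(labels)
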
Knowing a tree from the forest: art image retrieval using a society of profiles
Kai Yu,
Wei-Ying Ma,
Volker Tresp,
Zhao Xu,
Xiaofei He,
HongJiang Zhang,
Hans-Peter Kriegel
Pages: 622-631
doi>10.1145/957013.957145
Full text: PDF

This paper addresses the problem of art image retrieval (AIR), which aims to help users find their favorite painting images. AIR is of great interest to us because of its application potential and its interesting research challenges: retrieval is based not only on painting content or style, but also heavily on user preference profiles. This paper describes collaborative ensemble learning, a novel statistical learning approach to this task. It first applies probabilistic support vector machines (SVMs) to model each individual user's profile based on given examples, i.e., liked or disliked paintings. Due to the high complexity of profile modelling, the SVMs can be rather weak at predicting preferences for new paintings. To overcome this problem, we combine a society of users' profiles, represented by their respective SVM models, to predict a given user's preferences for painting images. We demonstrate that the combination scheme is embedded in a Bayesian framework and retains an intuitive interpretation: like-minded users are likely to share similar preferences. We report extensive empirical studies based on two experimental settings. The first includes controlled simulations performed on 4533 painting images. In the second, we report evaluations based on user preferences collected through an online web-based survey. Both experiments demonstrate that the proposed approach achieves excellent performance in capturing a user's diverse preferences.

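A minimal sketch of the combination idea, assuming scikit-learn-style per-user probabilistic models and precomputed agreement weights (the Bayesian estimation of those weights is the paper's contribution and is elided here):

    import numpy as np

    def ensemble_preference(user_models, agreement_weights, image_features):
        """Each model m_j gives P(like | image) for one user in the society;
        the active user's prediction is a weighted vote in which
        like-minded users (higher agreement weight) count more."""
        w = np.asarray(agreement_weights, float)
        w = w / w.sum()
        probs = np.array([m.predict_proba(image_features)[:, 1]
                          for m in user_models])
        return w @ probs  # blended probability the active user likes each image
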
VideoQA: question answering on news video
Hui Yang,
Lekha Chaisorn,
Yunlong Zhao,
Shi-Yong Neo,
Tat-Seng Chua
Pages: 632-641
doi>10.1145/957013.957146
Full text: PDF

When querying a news video archive, users are interested in retrieving precise answers in the form of a summary that best answers the query. However, current video retrieval systems, including search engines on the web, are designed to retrieve documents rather than precise answers. This research explores the use of question answering (QA) techniques to support personalized news video retrieval. Users interact with our system, VideoQA, using short natural language questions with implicit constraints on the content, context, duration, and genre of the expected videos. VideoQA returns short, precise news video summaries as answers. The main contributions of this research are: (a) the extension of QA technology to support QA on news video; and (b) the use of multi-modal features, including visual, audio, textual, and external resources, to help correct speech recognition errors and to perform precise question answering. The system has been tested on 7 days of news video and has been found to be effective.

SESSION: Doctoral symposium - session I

A framework for cost-effective peer-to-peer content distribution
Mohamed M. Hefeeda
Pages: 642-643
doi>10.1145/957013.957148
Full text: PDF

Transport-level protocol coordination in distributed multimedia applications
David E. Ott,
Ketan Mayer-Patel
Pages: 644-645
doi>10.1145/957013.957149
Full text: PDF

A scalable overlay video mixing service model
Bin Yu,
Klara Nahrstedt
Pages: 646-647
doi>10.1145/957013.957150
Full text: PDF

SESSION: Doctoral symposium - session II

The mindful camera: common sense for documentary videography
Barbara Barry
Pages: 648-649
doi>10.1145/957013.957152
Full text: PDF

Cameras with story understanding can help videographers reflect on their process of content capture during documentary construction. This paper describes a set of tools that use common sense knowledge to support documentary videography.

Making sense of video content
A. Viranga Ratnaike,
Bala Srinivasan,
Surya Nepal
Pages: 650-651
doi>10.1145/957013.957153
Full text: PDF

Our aim in this research is to make sense of scenes in video. We expect this will enable us to identify different scenes sharing the same semantics, even if they share no multimedia cues. Our approach is based on emergence and involves classification and reasoning. We use patterns of cues to synthesize semantic classifications. These classifications need to be consistent with the observed cues and suggest other elements that might be present. The suggestions might be based on sets of related patterns, ontologies, or other knowledge bases. With an idea of what we are looking for, we can re-examine the scenes for multimedia elements that support our hypothesis, or at least are not inconsistent with it. We expect that sufficiently detailed semantic descriptions can be generated by cycling through these steps.

Algorithms and systems for shared access to a robotic streaming video camera
Dezhen Song
Pages: 652-653
doi>10.1145/957013.957154
Full text: PDF
