
ABSTRACT

This paper describes progress towards a general framework for incorporating multimodal cues into a trainable system that automatically annotates user-defined semantic concepts in broadcast video. Models of arbitrary concepts are constructed by building classifiers in a score space defined by a pre-deployed set of multimodal models. Results on the TREC Video 2002 corpus show that annotation of user-defined concepts, both inside and outside the pre-deployed set, is competitive with our best video-only models. An interesting side result shows that speech-only models perform comparably to our best video-only models in detecting visual concepts such as "outdoors", "face" and "cityscape".
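The score-space idea can be illustrated with a minimal sketch. It assumes that each pre-deployed detector can be treated as a function returning a per-shot confidence score, so that every shot becomes a vector of detector scores, and that a new concept is then learned by training an ordinary classifier on those vectors. The abstract does not say which classifier is used; logistic regression is chosen here only for brevity. The detector names (`outdoors`, `face`, `water`), the helper functions, and all example values are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical stand-in for a pre-deployed multimodal detector: any callable
# that maps a video shot to a confidence score. Real detectors would draw on
# visual features, audio, or speech transcripts; here they are placeholders.
Detector = Callable[[dict], float]


def score_vector(shot: dict, detectors: Sequence[Detector]) -> List[float]:
    """Represent a shot by the scores of every pre-deployed detector."""
    return [detect(shot) for detect in detectors]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(w: List[float], x: List[float]) -> float:
    """Probability that a score vector belongs to the user-defined concept.

    zip() stops at len(x), so w[-1] is used only as the bias term.
    """
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def train_logistic(X: List[List[float]], y: List[int],
                   lr: float = 0.5, epochs: int = 500) -> List[float]:
    """Fit logistic-regression weights by batch gradient descent.

    The returned list holds one weight per score-space dimension followed
    by a bias term.
    """
    n = len(X[0])
    w = [0.0] * (n + 1)
    for _ in range(epochs):
        grad = [0.0] * (n + 1)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi  # derivative of the log-loss w.r.t. z
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[n] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


# Illustrative data only: three hypothetical detectors whose outputs are
# stored in each shot, and a handful of labelled shots for a new concept.
detectors = [lambda s: s["outdoors"], lambda s: s["face"], lambda s: s["water"]]
shots = [
    {"outdoors": 0.9, "face": 0.1, "water": 0.8},
    {"outdoors": 0.8, "face": 0.2, "water": 0.9},
    {"outdoors": 0.2, "face": 0.9, "water": 0.1},
    {"outdoors": 0.1, "face": 0.7, "water": 0.2},
]
labels = [1, 1, 0, 0]  # 1 = shot shows the new user-defined concept

weights = train_logistic([score_vector(s, detectors) for s in shots], labels)
new_shot = {"outdoors": 0.85, "face": 0.15, "water": 0.85}
print(round(predict(weights, score_vector(new_shot, detectors)), 3))
```

Because the new concept's classifier only ever sees detector scores, the same pre-deployed models can be reused for any number of user-defined concepts; only the small score-space classifier has to be retrained.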



AUTHORS



C-Y. Lin

Bibliometrics (publication history):
Publication years: 2001-2016
Publication count: 47
Citation count: 480
Available for download: 27
Downloads (6 weeks): 94
Downloads (12 months): 888
Downloads (cumulative): 15,999
Average downloads per article: 592.56
Average citations per article: 10.21


M. Naphade

Bibliometrics (publication history):
Publication years: 2000-2014
Publication count: 33
Citation count: 602
Available for download: 16
Downloads (6 weeks): 19
Downloads (12 months): 480
Downloads (cumulative): 8,247
Average downloads per article: 515.44
Average citations per article: 18.24


A. Natsev

Bibliometrics (publication history):
Publication years: 1999-2013
Publication count: 33
Citation count: 517
Available for download: 20
Downloads (6 weeks): 24
Downloads (12 months): 271
Downloads (cumulative): 12,468
Average downloads per article: 623.40
Average citations per article: 15.67


C. Neti

Bibliometrics (publication history):
Publication years: 2001-2006
Publication count: 15
Citation count: 87
Available for download: 5
Downloads (6 weeks): 0
Downloads (12 months): 22
Downloads (cumulative): 2,194
Average downloads per article: 438.80
Average citations per article: 5.80


J. R. Smith

Bibliometrics (publication history):
Publication years: 1994-2015
Publication count: 98
Citation count: 1,889
Available for download: 37
Downloads (6 weeks): 68
Downloads (12 months): 995
Downloads (cumulative): 25,315
Average downloads per article: 684.19
Average citations per article: 19.28


B. Tseng

Bibliometrics (publication history):
Publication years: 1978-2012
Publication count: 58
Citation count: 1,210
Available for download: 39
Downloads (6 weeks): 215
Downloads (12 months): 3,145
Downloads (cumulative): 54,156
Average downloads per article: 1,388.62
Average citations per article: 20.86


H. J. Nock

Bibliometrics (publication history):
Publication years: 1999-2005
Publication count: 10
Citation count: 93
Available for download: 5
Downloads (6 weeks): 9
Downloads (12 months): 66
Downloads (cumulative): 3,397
Average downloads per article: 679.40
Average citations per article: 9.30


W. Adams

Bibliometrics (publication history):
Publication years: 2003-2003
Publication count: 1
Citation count: 3
Available for download: 1
Downloads (6 weeks): 0
Downloads (12 months): 8
Downloads (cumulative): 519
Average downloads per article: 519.00
Average citations per article: 3.00

REFERENCES

[1] IBM Multimodal Annotation Tool. http://www.alphaworks.ibm.com/tech/multimodalannotation.
[2] W. Adams et al. IBM Research TREC-2002 Video Retrieval System. In Proc. TREC Workshop, 2002.
[3] G. Iyengar et al. Semantic Indexing of Multimedia using Audio, Text and Visual Cues. In Proc. ICME, 2002.
[4] C.-Y. Lin et al. VideoAnnEx: IBM MPEG-7 Annotation Tool for Multimedia Indexing and Concept Learning. In Proc. ICME, MD, USA, July 2003. http://www.alphaworks.ibm.com/tech/videoannex.
[5] S. Robertson et al. Okapi at TREC-3. In Proc. TREC Workshop, 1995.
[6] A. Senior. Face and Feature Finding for a Face Recognition System. In Proc. Second International Conference on Audio- and Video-based Biometric Person Authentication, March 1999.


PUBLICATION

Title: SIGIR '03: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
General Chairs: Charles Clarke (University of Waterloo, Canada); Gordon Cormack (University of Waterloo, Canada)
Program Chairs: Jamie Callan (Carnegie Mellon University, Pittsburgh, PA); David Hawking (Australian National University, Australia); Alan Smeaton (Dublin City University, Ireland)
Pages: 403-404
Publication Date: 2003-07-28
Sponsor: SIGIR (ACM Special Interest Group on Information Retrieval)
Publisher: ACM, New York, NY, USA, ©2003
ISBN: 1-58113-646-3; Order Number: 534032; DOI: 10.1145/860435.860522
Conference: IR (Research and Development in Information Retrieval)
Paper Acceptance Rate: 46 of 266 submissions, 17%
Overall Acceptance Rate: 1,201 of 6,327 submissions, 19%
Year Submitted Accepted Rate
SIGIR '99 135 33 24%
SIGIR '01 201 47 23%
SIGIR '02 219 44 20%
SIGIR '03 266 46 17%
SIGIR '04 267 58 22%
SIGIR '05 368 71 19%
SIGIR '06 399 74 19%
SIGIR '07 490 85 17%
SIGIR '08 497 85 17%
SIGIR '09 494 78 16%
SIGIR '10 520 87 17%
SIGIR '11 543 108 20%
SIGIR '12 483 98 20%
SIGIR '13 366 73 20%
SIGIR '14 387 82 21%
SIGIR '15 351 70 20%
SIGIR '16 341 62 18%
Overall 6,327 1,201 19%
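The Rate column is the accepted count divided by the submitted count, and the listed values are consistent with rounding to the nearest whole percent (for example, SIGIR '06: 74/399 is about 18.5%, shown as 19%). The small check below, assuming round-half-up, recomputes every rate and the Overall row using only the counts in the table; the names `TABLE`, `LISTED_OVERALL` and `rate_percent` are illustrative.

```python
# (year, submitted, accepted, listed rate in %) copied from the table above.
TABLE = [
    ("SIGIR '99", 135, 33, 24),
    ("SIGIR '01", 201, 47, 23),
    ("SIGIR '02", 219, 44, 20),
    ("SIGIR '03", 266, 46, 17),
    ("SIGIR '04", 267, 58, 22),
    ("SIGIR '05", 368, 71, 19),
    ("SIGIR '06", 399, 74, 19),
    ("SIGIR '07", 490, 85, 17),
    ("SIGIR '08", 497, 85, 17),
    ("SIGIR '09", 494, 78, 16),
    ("SIGIR '10", 520, 87, 17),
    ("SIGIR '11", 543, 108, 20),
    ("SIGIR '12", 483, 98, 20),
    ("SIGIR '13", 366, 73, 20),
    ("SIGIR '14", 387, 82, 21),
    ("SIGIR '15", 351, 70, 20),
    ("SIGIR '16", 341, 62, 18),
]
LISTED_OVERALL = (6327, 1201, 19)  # submitted, accepted, rate from the Overall row


def rate_percent(submitted: int, accepted: int) -> int:
    """Acceptance rate as a whole percentage, rounded half up.

    Integer arithmetic: floor(100 * a / s + 1/2) == (200 * a + s) // (2 * s),
    which avoids floating-point rounding surprises.
    """
    return (200 * accepted + submitted) // (2 * submitted)


# Years whose listed rate differs from the recomputed one (expected: none).
mismatches = [year for year, sub, acc, rate in TABLE if rate_percent(sub, acc) != rate]
total_sub = sum(sub for _, sub, _, _ in TABLE)
total_acc = sum(acc for _, _, acc, _ in TABLE)
print(mismatches, total_sub, total_acc, rate_percent(total_sub, total_acc))
```

With the figures above, every listed rate is reproduced and the column sums match the Overall row (6,327 submitted, 1,201 accepted, 19%).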

APPEARS IN
Artificial Intelligence
Digital Content



Table of Contents

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Keynote Address - exploring, modeling, and using the web graph
Andrei Broder
Pages: 1-1
DOI: 10.1145/860435.860436

The Web graph, meaning the graph induced by Web pages as nodes and their hyperlinks as directed edges, has become a fascinating object of study for many people: physicists, sociologists, mathematicians, computer scientists, and information retrieval ...
Salton Award Lecture - Information retrieval and computer science: an evolving relationship
W. Bruce Croft
Pages: 2-3
DOI: 10.1145/860435.860437

Following the tradition of these acceptance talks, I will be giving my thoughts on where our field is going. Any discussion of the future of information retrieval (IR) research, however, needs to be placed in the context of its history and relationship ...
SESSION: Retrieval models
Bayesian extension to the language model for ad hoc information retrieval
Hugo Zaragoza, Djoerd Hiemstra, Michael Tipping
Pages: 4-9
DOI: 10.1145/860435.860439

We propose a Bayesian extension to the ad-hoc Language Model. Many smoothed estimators used for the multinomial query model in ad-hoc Language Models (including Laplace and Bayes-smoothing) are approximations to the Bayesian predictive distribution. ...
Beyond independent relevance: methods and evaluation metrics for subtopic retrieval
Cheng Xiang Zhai, William W. Cohen, John Lafferty
Pages: 10-17
DOI: 10.1145/860435.860440

We present a non-traditional retrieval problem we call subtopic retrieval. The subtopic retrieval problem is concerned with finding documents that cover many different subtopics of a query topic. In such a problem, the utility of a document in ...
Empirical development of an exponential probabilistic model for text retrieval: using textual analysis to build a better model
Jaime Teevan, David R. Karger
Pages: 18-25
DOI: 10.1145/860435.860441

Much work in information retrieval focuses on using a model of documents and queries to derive retrieval algorithms. Model based development is a useful alternative to heuristic development because in a model the assumptions are explicit and can be examined ...
SESSION: Question answering
Question classification using support vector machines
Dell Zhang, Wee Sun Lee
Pages: 26-32
DOI: 10.1145/860435.860443

Question classification is very important for question answering. This paper presents our research work on automatic question classification through machine learning approaches. We have experimented with five machine learning algorithms: Nearest Neighbors ...
Structured use of external knowledge for event-based open domain question answering
Hui Yang, Tat-Seng Chua, Shuguang Wang, Chun-Keat Koh
Pages: 33-40
DOI: 10.1145/860435.860444

One of the major problems in question answering (QA) is that the queries are either too brief or often do not contain most relevant terms in the target corpus. In order to overcome this problem, our earlier work integrates external knowledge extracted ...
Quantitative evaluation of passage retrieval algorithms for question answering
Stefanie Tellex, Boris Katz, Jimmy Lin, Aaron Fernandes, Gregory Marton
Pages: 41-47
DOI: 10.1145/860435.860445

Passage retrieval is an important component common to many question answering systems. Because most evaluations of question answering systems focus on end-to-end performance, comparison of common components becomes difficult. To address this shortcoming, ...
SESSION: Web
Building a web thesaurus from web link structure
Zheng Chen, Shengping Liu, Liu Wenyin, Geguang Pu, Wei-Ying Ma
Pages: 48-55
DOI: 10.1145/860435.860447

Thesauri have been widely used in many applications, including information retrieval, natural language processing, and question answering. In this paper, we propose a novel approach to automatically constructing a domain-specific thesaurus from the Web ...
Implicit link analysis for small web search
Gui-Rong Xue, Hua-Jun Zeng, Zheng Chen, Wei-Ying Ma, Hong-Jiang Zhang, Chao-Jun Lu
Pages: 56-63
DOI: 10.1145/860435.860448

Current Web search engines generally impose link analysis-based re-ranking on web-page retrieval. However, the same techniques, when applied directly to small web search such as intranet and site search, cannot achieve the same performance because their ...
Query type classification for web document retrieval
In-Ho Kang, GilChang Kim
Pages: 64-71
DOI: 10.1145/860435.860449

The heterogeneous Web exacerbates IR problems and short user queries make them worse. The contents of web documents are not enough to find good answer documents. Link information and URL information compensate for the insufficiencies of content information. ...
SESSION: Human interaction
Stuff I've seen: a system for personal information retrieval and re-use
Susan Dumais, Edward Cutrell, JJ Cadiz, Gavin Jancke, Raman Sarin, Daniel C. Robbins
Pages: 72-79
DOI: 10.1145/860435.860451

Most information retrieval technologies are designed to facilitate information discovery. However, much knowledge work involves finding and re-using previously seen information. We describe the design and evaluation of a system, called Stuff I've ...
Search strategies in content-based image retrieval
Sharon McDonald, John Tait
Pages: 80-87
DOI: 10.1145/860435.860452

This paper describes two studies that looked at users' ability to formulate visual queries with a Content-Based Image Retrieval system that uses dominant image colour as the primary indexing key. The first experiment examined users' performance with ...
Using terminological feedback for web search refinement: a log-based study
Peter Anick
Pages: 88-95
DOI: 10.1145/860435.860453

Although interactive query reformulation has been actively studied in the laboratory, little is known about the actual behavior of web searchers who are offered terminological feedback along with their search results. We analyze log sessions for two ...
SESSION: Text categorization
A scalability analysis of classifiers in text categorization
Yiming Yang, Jian Zhang, Bryan Kisiel
Pages: 96-103
DOI: 10.1145/860435.860455

Real-world applications of text categorization often require a system to deal with tens of thousands of categories defined over a large taxonomy. This paper addresses the problem with respect to a set of popular algorithms in text categorization, including ...
A repetition based measure for verification of text collections and for text categorization
Dmitry V. Khmelev, William J. Teahan
Pages: 104-110
DOI: 10.1145/860435.860456

We suggest a way for locating duplicates and plagiarisms in a text collection using an R-measure, which is the normalized sum of the lengths of all suffixes of the text repeated in other documents of the collection. The R-measure can be effectively ...
Using asymmetric distributions to improve text classifier probability estimates
Paul N. Bennett
Pages: 111-118
DOI: 10.1145/860435.860457

Text classifiers that give probability estimates are more readily applicable in a variety of scenarios. For example, rather than choosing one set decision threshold, they can be used in a Bayesian risk model to issue a run-time decision which minimizes ...
SESSION: Multimedia information retrieval
Automatic image annotation and retrieval using cross-media relevance models
J. Jeon, V. Lavrenko, R. Manmatha
Pages: 119-126
DOI: 10.1145/860435.860459

Libraries have traditionally used manual image annotation for indexing and then later retrieving their image collections. However, manual image annotation is an expensive and labor intensive procedure and hence there has been great interest in coming ...
Modeling annotated data
David M. Blei, Michael I. Jordan
Pages: 127-134
DOI: 10.1145/860435.860460

We consider the problem of modeling annotated data---data with multiple types where the instance of one type (such as a caption) serves as a description of the other type (such as an image). We describe three hierarchical probabilistic mixture models ...
Experimental result analysis for a generative probabilistic image retrieval model
Thijs Westerveld, Arjen P. de Vries
Pages: 135-142
DOI: 10.1145/860435.860461

The main conclusion from the metrics-based evaluation of video retrieval systems at TREC's video track is that non-interactive image retrieval from general collections using visual information only is not yet feasible. We show how a detailed analysis ...
SESSION: Structured documents
Combining document representations for known-item search
Paul Ogilvie, Jamie Callan
Pages: 143-150
DOI: 10.1145/860435.860463

This paper investigates the pre-conditions for successful combination of document representations formed from structural markup for the task of known-item search. As this task is very similar to work in meta-search and data fusion, we adapt several hypotheses ...
Searching XML documents via XML fragments
David Carmel, Yoelle S. Maarek, Matan Mandelbrod, Yosi Mass, Aya Soffer
Pages: 151-158
DOI: 10.1145/860435.860464

Most of the work on XML query and search has stemmed from the publishing and database communities, mostly for the needs of business applications. Recently, the Information Retrieval community began investigating the XML search issue to answer information ...
SESSION: Text representation
Word sense disambiguation in information retrieval revisited
Christopher Stokoe, Michael P. Oakes, John Tait
Pages: 159-166
DOI: 10.1145/860435.860466

Word sense ambiguity is recognized as having a detrimental effect on the precision of information retrieval systems in general and web search systems in particular, due to the sparse nature of the queries involved. Despite continued research into the ...
Probabilistic term variant generator for biomedical terms
Yoshimasa Tsuruoka, Jun'ichi Tsujii
Pages: 167-173
DOI: 10.1145/860435.860467

This paper presents an algorithm to generate possible variants for biomedical terms. The algorithm gives each variant its generation probability representing its plausibility, which is potentially useful for query and dictionary expansions. The probabilistic ...
SESSION: Text categorization
A maximal figure-of-merit learning approach to text categorization
Sheng Gao, Wen Wu, Chin-Hui Lee, Tat-Seng Chua
Pages: 174-181
DOI: 10.1145/860435.860469

A novel maximal figure-of-merit (MFoM) learning approach to text categorization is proposed. Different from the conventional techniques, the proposed MFoM method attempts to integrate any performance metric of interest (e.g. accuracy, recall, precision, ...
Text categorization by boosting automatically extracted concepts
Lijuan Cai, Thomas Hofmann
Pages: 182-189
DOI: 10.1145/860435.860470

Term-based representations of documents have found wide-spread use in information retrieval. However, one of the main shortcomings of such methods is that they largely disregard lexical semantics and, as a consequence, are not sufficiently robust with ...
Robustness of regularized linear classification methods in text categorization
Jian Zhang, Yiming Yang
Pages: 190-197
DOI: 10.1145/860435.860471

Real-world applications often require the classification of documents under situations of small number of features, mis-labeled documents and rare positive examples. This paper investigates the robustness of three regularized linear classification methods ...
SESSION: Human interaction
Building and applying a concept hierarchy representation of a user profile
Nikolaos Nanas, Victoria Uren, Anne De Roeck
Pages: 198-204
DOI: 10.1145/860435.860473

Term dependence is a natural consequence of language use. Its successful representation has been a long standing goal for Information Retrieval research. We present a methodology for the construction of a concept hierarchy that takes into account the ...
Query length in interactive information retrieval
N. J. Belkin, D. Kelly, G. Kim, J.-Y. Kim, H.-J. Lee, G. Muresan, M.-C. Tang, X.-J. Yuan, C. Cool
Pages: 205-212
DOI: 10.1145/860435.860474

Query length in best-match information retrieval (IR) systems is well known to be positively related to effectiveness in the IR task, when measured in experimental, non-interactive environments. However, in operational, interactive IR systems, query ...
Re-examining the potential effectiveness of interactive query expansion
Ian Ruthven
Pages: 213-220
DOI: 10.1145/860435.860475

Much attention has been paid to the relative effectiveness of interactive query expansion versus automatic query expansion. Although interactive query expansion has the potential to be an effective means of improving a search, in this paper we show that, ...
SESSION: IR theory
Latent concepts and the number of orthogonal factors in latent semantic analysis
Georges Dupret
Pages: 221-226
DOI: 10.1145/860435.860477

We seek insight into Latent Semantic Indexing by establishing a method to identify the optimal number of factors in the reduced matrix for representing a keyword. This method is demonstrated empirically by duplicating all documents containing a term ...
A frequency-based and a poisson-based definition of the probability of being informative
Thomas Roelleke
Pages: 227-234
DOI: 10.1145/860435.860478

This paper reports on theoretical investigations about the assumptions underlying the inverse document frequency (idf). We show that an intuitive idf-based probability function for the probability of a term being informative assumes disjoint ...
Table extraction using conditional random fields
David Pinto, Andrew McCallum, Xing Wei, W. Bruce Croft
Pages: 235-242
DOI: 10.1145/860435.860479

The ability to find tables and extract information from them is a necessary component of data mining, question answering, and other information retrieval tasks. Documents often contain tables in order to communicate densely packed, multi-dimensional ...
SESSION: Filtering and retrieval models
Building a filtering test collection for TREC 2002
Ian Soboroff, Stephen Robertson
Pages: 243-250
DOI: 10.1145/860435.860481

Test collections for the filtering track in TREC have typically used either past sets of relevance judgments, or categorized collections such as Reuters Corpus Volume 1 or OHSUMED, because filtering systems need relevance judgments during the experiment ...
An empirical study on retrieval models for different document genres: patents and newspaper articles
Makoto Iwayama, Atsushi Fujii, Noriko Kando, Yuzo Marukawa
Pages: 251-258
DOI: 10.1145/860435.860482

Reflecting the rapid growth in the utilization of large test collections for information retrieval since the 1990s, extensive comparative experiments have been performed to explore the effectiveness of various retrieval models. However, most collections ...
Collaborative filtering via gaussian probabilistic latent semantic analysis
Thomas Hofmann
Pages: 259-266
DOI: 10.1145/860435.860483

Collaborative filtering aims at learning predictive models of user preferences, interests or behavior from community data, i.e. a database of available user preferences. In this paper, we describe a new model-based algorithm designed for this task, which ...
SESSION: Clustering
Document clustering based on non-negative matrix factorization
Wei Xu, Xin Liu, Yihong Gong
Pages: 267-273
DOI: 10.1145/860435.860485

In this paper, we propose a novel document clustering method based on the non-negative factorization of the term-document matrix of the given document corpus. In the latent semantic space derived by the non-negative matrix factorization (NMF), each axis ...
ReCoM: reinforcement clustering of multi-type interrelated data objects
Jidong Wang, Huajun Zeng, Zheng Chen, Hongjun Lu, Li Tao, Wei-Ying Ma
Pages: 274-281
DOI: 10.1145/860435.860486

Most existing clustering algorithms cluster highly related data objects such as Web pages and Web users separately. The interrelation among different types of data objects is either not considered, or represented by a static feature space and treated ...
A comparative study on content-based music genre classification
Tao Li, Mitsunori Ogihara, Qi Li
Pages: 282-289
DOI: 10.1145/860435.860487

Content-based music genre classification is a fundamental component of music information retrieval systems and has been gaining importance and enjoying a growing amount of attention with the emergence of digital music on the Internet. Currently little ...
SESSION: Distributed information retrieval
Evaluating different methods of estimating retrieval quality for resource selection
Henrik Nottelmann, Norbert Fuhr
Pages: 290-297
DOI: 10.1145/860435.860489

In a federated digital library system, it is too expensive to query every accessible library. Resource selection is the task to decide to which libraries a query should be routed. Most existing resource selection algorithms compute a library ranking ...
Relevant document distribution estimation method for resource selection
Luo Si, Jamie Callan
Pages: 298-305
DOI: 10.1145/860435.860490

Prior research under a variety of conditions has shown the CORI algorithm to be one of the most effective resource selection algorithms, but the range of database sizes studied was not large. This paper shows that the CORI algorithm does not do well ...
SETS: search enhanced by topic segmentation
Mayank Bawa, Gurmeet Singh Manku, Prabhakar Raghavan
Pages: 306-313
DOI: 10.1145/860435.860491

We present SETS, an architecture for efficient search in peer-to-peer networks, building upon ideas drawn from machine learning and social network theory. The key idea is to arrange participating sites in a topic-segmented overlay ...
SESSION: Novelty and topic change
Retrieval and novelty detection at the sentence level
James Allan, Courtney Wade, Alvaro Bolivar
Pages: 314-321
DOI: 10.1145/860435.860493

Previous research in novelty detection has focused on the task of finding novel material, given a set or stream of documents on a certain topic. This study investigates the more difficult two-part task defined by the TREC 2002 novelty track: given a ...
Domain-independent text segmentation using anisotropic diffusion and dynamic programming
Xiang Ji, Hongyuan Zha
Pages: 322-329
DOI: 10.1145/860435.860494

This paper presents a novel domain-independent text segmentation method, which identifies the boundaries of topic changes in long text documents and/or text streams. The method consists of three components: As a preprocessing step, we eliminate the document-dependent ...
A System for new event detection
Thorsten Brants, Francine Chen, Ayman Farahat
Pages: 330-337
DOI: 10.1145/860435.860495

We present a new method and system for performing the New Event Detection task, i.e., in one or multiple streams of news stories, all stories on a previously unseen (new) event are marked. The method is based on an incremental TF-IDF model. Our extensions ...
SESSION: Cross-lingual information retrieval
Probabilistic structured query methods
Kareem Darwish, Douglas W. Oard
Pages: 338-344
DOI: 10.1145/860435.860497

Structured methods for query term replacement rely on separate estimates of term frequency and document frequency to compute a weight for each query term. This paper reviews prior work on structured ...
Fuzzy translation of cross-lingual spelling variants
Ari Pirkola, Jarmo Toivonen, Heikki Keskustalo, Kari Visala, Kalervo Järvelin
Pages: 345-352
DOI: 10.1145/860435.860498

We will present a novel two-step fuzzy translation technique for cross-lingual spelling variants. In the first stage, transformation rules are applied to source words to render them more similar to their target language equivalents. The rules are generated ...
Automatic transliteration for Japanese-to-English text retrieval
Yan Qu, Gregory Grefenstette, David A. Evans
Pages: 353-360
DOI: 10.1145/860435.860499

For cross language information retrieval (CLIR) based on bilingual translation dictionaries, good performance depends upon lexical coverage in the dictionary. This is especially true for languages possessing few inter-language cognates, such as between ...
POSTER SESSION: Posters
On the effectiveness of evaluating retrieval systems in the absence of relevance judgments
Javed A. Aslam, Robert Savell
Pages: 361-362
DOI: 10.1145/860435.860501

Soboroff, Nicholas and Cahan recently proposed a method for evaluating the performance of retrieval systems without relevance judgments. They demonstrated that the system evaluations produced by their methodology are correlated with actual evaluations ...
Resource selection and data fusion in multimedia distributed digital libraries
Jamie Callan, Fabio Crestani, Henrik Nottelmann, Pietro Pala, Xiao Mang Shou
Pages: 363-364
DOI: 10.1145/860435.860502
Transliteration of proper names in cross-language applications
Paola Virga, Sanjeev Khudanpur
Pages: 365-366
DOI: 10.1145/860435.860503
Toward a unification of text and link analysis
Brian D. Davison
Pages: 367-368
DOI: 10.1145/860435.860504

This paper presents a simple yet profound idea. By thinking about the relationships between and within terms and documents, we can generate a richer representation that encompasses aspects of Web link analysis as well as text analysis techniques from ...
Investigating the relationship between language model perplexity and IR precision-recall measures
Leif Azzopardi, Mark Girolami, Keith van Rijsbergen
Pages: 369-370
DOI: 10.1145/860435.860505

An empirical study has been conducted investigating the relationship between the performance of an aspect based language model in terms of perplexity and the corresponding information retrieval performance obtained. It is observed, on the corpora considered, ...
Topic distillation using hierarchy concept tree
Ikkyu Choi, Minkoo Kim
Pages: 371-372
DOI: 10.1145/860435.860506

In this paper, we propose a new approach for topic distillation on World Wide Web. Topic distillation is to find quality documents related to the user query topic. Our approach is based on Bharat's topic distillation algorithm [1]. We present the analysis ...
Using manually-built web directories for automatic evaluation of known-item retrieval
Steven M. Beitzel, Eric C. Jensen, Abdur Chowdhury, David Grossman, Ophir Frieder
Pages: 373-374
DOI: 10.1145/860435.860507

Information retrieval system evaluation is complicated by the need for manually assessed relevance judgments. Large manually-built directories on the web open the door to new evaluation procedures. By assuming that web pages are the known relevant items ...
Popular music retrieval by detecting mood
Yazhong Feng, Yueting Zhuang, Yunhe Pan
Pages: 375-376
DOI: 10.1145/860435.860508
Exploiting query history for document ranking in interactive information retrieval
Xuehua Shen, Cheng Xiang Zhai
Pages: 377-378
DOI: 10.1145/860435.860509

In this poster, we incorporate user query history, as context information, to improve the retrieval performance in interactive retrieval. Experiments using the TREC data show that incorporating such context information indeed consistently improves the ...
Automatic ranking of retrieval systems in imperfect environments
Rabia Nuray, Fazli Can
Pages: 379-380
DOI: 10.1145/860435.860510

The empirical investigation of the effectiveness of information retrieval (IR) systems requires a test collection, a set of query topics, and a set of relevance judgments made by human assessors for each query. Previous experiments show that differences ...
An investigation of broad coverage automatic pronoun resolution for information retrieval
Richard J. Edens, Helen L. Gaylard, Gareth J. F. Jones, Adenike M. Lam-Adesina
Pages: 381-382
DOI: 10.1145/860435.860511

Term weighting methods have been shown to give significant increases in information retrieval performance. The presence of pronominal references in documents reduces the term frequencies of associated words with a consequent effect on term weights and ...
Syntactic features in question answering
Xiaoyan Li
Pages: 383-384
DOI: 10.1145/860435.860512

Syntactic information potentially plays a much more important role in question answering than it does in information retrieval. Although many people have used syntactic evidence in Question Answering, there haven't been many detailed experiments reported ...
Searchers' criteria for assessing web pages
Anastasios Tombros, Ian Ruthven, Joemon M. Jose
Pages: 385-386
doi:10.1145/860435.860513

We investigate the criteria used by online searchers when assessing the relevance of web pages to information-seeking tasks. Twenty-four searchers were given three tasks each, and indicated the features of web pages which they employed when deciding ...
When query expansion fails
Bodo Billerbeck, Justin Zobel
Pages: 387-388
doi:10.1145/860435.860514

The effectiveness of queries in information retrieval can be improved through query expansion. This technique automatically introduces additional query terms that are statistically likely to match documents on the intended topic. However, query expansion ...
Music modeling with random fields
Victor Lavrenko, Jeremy Pickens
Pages: 389-390
doi:10.1145/860435.860515
Fractal summarization: summarization based on fractal theory
Christopher C. Yang, Fu Lee Wang
Pages: 391-392
doi:10.1145/860435.860516

In this paper, we introduce the fractal summarization model based on the fractal theory. In fractal summarization, the important information is captured from the source text by exploring the hierarchical structure and salient features of the document. ...
A unified model for metasearch and the efficient evaluation of retrieval systems via the hedge algorithm
Javed A. Aslam, Virgiliu Pavlu, Robert Savell
Pages: 393-394
doi:10.1145/860435.860517

We present a unified framework for simultaneously solving both the pooling problem (the construction of efficient document pools for the evaluation of retrieval systems) and metasearch (the fusion of ranked lists returned by retrieval systems in order ...
Statistical visual feature indexes in video retrieval
Xiangming Mu, Gary Marchionini
Pages: 395-396
doi:10.1145/860435.860518

Four statistical visual feature indexes are proposed: SLM (Shot Length Mean), the average length of each shot in a video; SLD (Shot Length Deviation), the standard deviation of shot lengths for a video; ONM (Object Number Mean), the average number of ...
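The first two indexes in the abstract above, SLM and SLD, are simple per-video statistics over shot lengths. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the function name `shot_length_stats` is hypothetical, SLD is assumed to be the population standard deviation (the abstract does not say which estimator is used), and shot lengths are assumed to be given in a single consistent unit such as seconds. ONM and the fourth index are omitted because the abstract is truncated before defining them.

```python
import statistics


def shot_length_stats(shot_lengths):
    """Return (SLM, SLD) for one video.

    shot_lengths: lengths of the video's shots, all in the same unit
    (seconds is an assumption; the abstract does not specify one).
    SLM is the mean shot length; SLD is taken here to be the population
    standard deviation of shot lengths (an assumed estimator choice).
    """
    if not shot_lengths:
        raise ValueError("a video must contain at least one shot")
    slm = statistics.fmean(shot_lengths)  # Shot Length Mean
    sld = statistics.pstdev(shot_lengths)  # Shot Length Deviation
    return slm, sld
```

For example, a video whose shots last 2, 4, 4, 4, 5, 5, 7 and 9 seconds has SLM = 5.0 and SLD = 2.0 under these assumptions. Using the sample standard deviation (`statistics.stdev`) instead would give a slightly larger SLD for short videos.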
Enhancing cross-language information retrieval by an automatic acquisition of bilingual terminology from comparable corpora
Fatiha Sadat, Masatoshi Yoshikawa, Shunsuke Uemura
Pages: 397-398
doi:10.1145/860435.860519

This paper presents an approach to bilingual lexicon extraction from comparable corpora and evaluations on Cross-Language Information Retrieval. We explore a bi-directional extraction of bilingual terminology primarily from comparable corpora. A combined ...
Document-self expansion for text categorization
Yuen-Hsien Tseng, Da-Wei Juang
Pages: 399-400
doi:10.1145/860435.860520

This work proposes approaches for increasing the number of training examples in order to improve classification effectiveness. The approaches were verified on two Chinese collections, each classified by two top-performing classifiers.
An architecture for peer-to-peer information retrieval
Iraklis A. Klampanos, Joemon M. Jose
Pages: 401-402
doi:10.1145/860435.860521
User-trainable video annotation using multimodal cues
C-Y. Lin, M. Naphade, A. Natsev, C. Neti, J. R. Smith, B. Tseng, H. J. Nock, W. Adams
Pages: 403-404
doi:10.1145/860435.860522

This paper describes progress towards a general framework for incorporating multimodal cues into a trainable system for automatically annotating user-defined semantic concepts in broadcast video. Models of arbitrary concepts are constructed by building ...
Incorporating query term dependencies in language models for document retrieval
Munirathnam Srikanth, Rohini Srihari
Pages: 405-406
doi:10.1145/860435.860523
Error analysis of difficult TREC topics
Xiao Hu, Sindhura Bandhakavi, Chengxiang Zhai
Pages: 407-408
doi:10.1145/860435.860524

Given the experimental nature of information retrieval, progress critically depends on analyzing the errors made by existing retrieval approaches and understanding their limitations. Our research explores various hypothesized reasons for hard topics ...
XML retrieval: what to retrieve?
Jaap Kamps, Maarten Marx, Maarten de Rijke, Börkur Sigurbjörnsson
Pages: 409-410
doi:10.1145/860435.860525

The fundamental difference between standard information retrieval and XML retrieval is the unit of retrieval. In traditional IR, the unit of retrieval is fixed: it is the complete document. In XML retrieval, every XML element in a document is a retrievable ...
Discovering and structuring information flow among bioinformatics resources
Joan C. Bartlett, Elaine G. Toms
Pages: 411-412
doi:10.1145/860435.860526

In this poster, we present a model of the flow of information among bioinformatics resources in the context of a specific scientific problem. Combining task analysis with traditional, qualitative research, we determined the extent to which the bioinformatics ...
eBizSearch: a niche search engine for e-business
C. Lee Giles, Yves Petinot, Pradeep B. Teregowda, Hui Han, Steve Lawrence, Arvind Rangaswamy, Nirmal Pal
Pages: 413-414
doi:10.1145/860435.860527

Niche search engines offer an efficient alternative to traditional search engines when the results returned by general-purpose search engines do not provide a sufficient degree of relevance. By taking advantage of their domain of concentration they achieve ...
Single n-gram stemming
James Mayfield, Paul McNamee
Pages: 415-416
doi:10.1145/860435.860528

Stemming can improve retrieval accuracy, but stemmers are language-specific. Character n-gram tokenization achieves many of the benefits of stemming in a language-independent way, but its use incurs a performance penalty. We demonstrate that selection ...
Average gain ratio: a simple retrieval performance measure for evaluation with multiple relevance levels
Tetsuya Sakai
Pages: 417-418
doi:10.1145/860435.860529
A comparison of various approaches for using probabilistic dependencies in language modeling
Peter Bruza, Dawei Song
Pages: 419-420
doi:10.1145/860435.860530
Topic hierarchy generation via linear discriminant projection
Tao Li, Shenghuo Zhu, Mitsunori Ogihara
Pages: 421-422
doi:10.1145/860435.860531
A personalised information retrieval tool
Innes Martin, Joemon M. Jose
Pages: 423-424
doi:10.1145/860435.860532

Industry professionals and everyday users of the Internet have long accepted that due to both the size and growth of this ubiquitous repository, new tools are needed to assist with the finding and extraction of very specific resources relevant to a user's ...
Classification of source code archives
Robert Krovetz, Secil Ugurel, C. Lee Giles
Pages: 425-426
doi:10.1145/860435.860533

The World Wide Web contains a number of source code archives. Programs are usually classified into various categories within the archive by hand. We report on experiments for automatic classification of source code into these categories. We examined ...
Passage retrieval vs. document retrieval for factoid question answering
Charles L. A. Clarke, Egidio L. Terra
Pages: 427-428
doi:10.1145/860435.860534
Evaluating retrieval performance for Japanese question answering: what are best passages?
Tetsuya Sakai, Tomoharu Kokubu
Pages: 429-430
doi:10.1145/860435.860535
Image classification using hybrid neural networks
Chih-Fong Tsai, Ken McGarry, John Tait
Pages: 431-432
doi:10.1145/860435.860536

Use of semantic content is one of the major issues that must be addressed to improve image retrieval effectiveness. We present a new approach to classify images based on the combination of image processing techniques and hybrid neural networks. ...
On an equivalence between PLSI and LDA
Mark Girolami, Ata Kabán
Pages: 433-434
doi:10.1145/860435.860537

Latent Dirichlet Allocation (LDA) is a fully generative approach to language modelling which overcomes the inconsistent generative semantics of Probabilistic Latent Semantic Indexing (PLSI). This paper shows that PLSI is a maximum a posteriori ...
Query word deletion prediction
Rosie Jones, Daniel C. Fain
Pages: 435-436
doi:10.1145/860435.860538

Web search query logs contain traces of users' search modifications. One strategy users employ is deleting terms, presumably to obtain greater coverage. It is useful to model and automate term deletion when arbitrary searches are conjunctively matched ...
Assessing the effectiveness of pen-based input queries
Stephen Levin, Paul Clough, Mark Sanderson
Pages: 437-438
doi:10.1145/860435.860539

In this poster, we describe an experiment exploring the effectiveness of a pen-based text input device for use in query construction. Standard TREC queries were written, recognised, and subsequently retrieved upon. Comparisons between retrieval effectiveness ...
A light weight PDA-friendly collection fusion technique
Jeffery Antoniuk, Mario A. Nascimento
Pages: 439-440
doi:10.1145/860435.860540

This short paper presents a lightweight technique for merging result lists obtained from querying different databases. The motivation for such a technique is a general-purpose search engine for Palm-OS based PDAs.
Speech-based and video-supported indexing of multimedia broadcast news
Yoshihiko Hayashi, Katsutoshi Ohtsuki, Katsuji Bessho, Osamu Mizuno, Yoshihiro Matsuo, Shoichi Matsunaga, Minoru Hayashi, Takaaki Hasegawa, Naruhiro Ikeda
Pages: 441-442
doi:10.1145/860435.860541

This paper describes an automatic content indexing system for news programs, with a special emphasis on its segmentation process. The process can successfully segment an entire news program into topic-centered news stories; the primary tool is a linguistic ...
Summary evaluation and text categorization
Khurshid Ahmad, Bogdan Vrusias, Paulo C F de Oliveira
Pages: 443-444
doi:10.1145/860435.860542

In general terms, the evaluation of a summary depends on how close it is to the chief points in the source text. This raises the question of what the chief points in the source text are, and how this information is itself used in identifying the ...
Rule-based word clustering for text classification
Hui Han, Eren Manavoglu, C. Lee Giles, Hongyuan Zha
Pages: 445-446
doi:10.1145/860435.860543

This paper introduces a rule-based, context-dependent word clustering method, with the rules derived from various domain databases and from the orthographic properties of the word text. Besides significant dimensionality reduction, our experiments show that such ...
HAT: a hardware assisted TOP-DOC inverted index component
S. Kagan Agun, Ophir Frieder
Pages: 447-448
doi:10.1145/860435.860544

A novel Hardware Assisted Top-Doc (HAT) component is disclosed. HAT is an optimized content indexing device based on a modified inverted index structure. HAT accommodates patterns of different lengths and supports a varied posting list versus term count ...
An information-theoretic measure for document similarity
Javed A. Aslam, Meredith Frost
Pages: 449-450
doi:10.1145/860435.860545

Recent work has demonstrated that the assessment of pairwise object similarity can be approached in an axiomatic manner using information theory. We extend this concept specifically to document similarity and test the effectiveness of an information-theoretic ...
Optimizing term vectors for efficient and robust filtering
David A. Evans, Jeffrey Bennett, David A. Hull
Pages: 451-452
doi:10.1145/860435.860546

We describe an efficient, robust method for selecting and optimizing terms for a classification or filtering task. Terms are extracted from positive examples in training data based on several alternative term-selection algorithms, then combined additively ...
The TREC-like evaluation of music IR systems
J. Stephen Downie
Pages: 453-454
doi:10.1145/860435.860547

This poster reports on ongoing efforts to establish TREC-like and other comprehensive evaluation paradigms within the Music IR (MIR) and Music Digital Library (MDL) research communities. The proposed research tasks are based upon expert ...
Stemming in the language modeling framework
James Allan, Giridhar Kumaran
Pages: 455-456
doi:10.1145/860435.860548
Generating hierarchical summaries for web searches
Dawn J. Lawrie, W. Bruce Croft
Pages: 457-458
doi:10.1145/860435.860549

Hierarchies provide a means of organizing, summarizing and accessing information. We describe a method for automatically generating hierarchies from small collections of text, and then apply this technique to summarizing the documents retrieved by a ...
Analysis of anchor text for web search
Nadav Eiron, Kevin S. McCurley
Pages: 459-460
doi:10.1145/860435.860550
DEMONSTRATION SESSION: Demos
User-assisted query translation for interactive CLIR
Daqing He, Jianqiang Wang, Douglas W. Oard, Michael Nossal
Pages: 461-461
doi:10.1145/860435.860552
DefScriber: a hybrid system for definitional QA
Sasha Blair-Goldensohn, Kathleen R. McKeown, Andrew Hazen Schlaikjer
Pages: 462-462
doi:10.1145/860435.860553
Querying XML using structures and keywords in Timber
Cong Yu, H. V. Jagadish, Dragomir R. Radev
Pages: 463-463
doi:10.1145/860435.860554

This demonstration will describe how Timber, a native XML database system, has been extended with the capability to answer XML-style structured queries (e.g., XQuery) with embedded IR-style keyword-based non-boolean conditions. With the original structured ...
SE-LEGO: creating metasearch engines on demand
Zonghuan Wu, Vijay Raghavan, Chun Du, Komanduru Sai C, Weiyi Meng, Hai He, Clement Yu
Pages: 464-464
doi:10.1145/860435.860555
MIND: resource selection and data fusion in multimedia distributed digital libraries
Stefano Berretti, Jamie Callan, Henrik Nottelmann, Xiao Mang Shou, Shengli Wu
Pages: 465-465
doi:10.1145/860435.860556
Head/modifier pairs for everyone
Cornelis H. A. Koster
Pages: 466-466
doi:10.1145/860435.860557
Document retrieval from user-selected web sites
Ulrich Bohnacker, Ingrid Renz
Pages: 467-467
doi:10.1145/860435.860558

We present a new tool for gathering textual information according to a query (texts) on arbitrary web sites specified by an information-seeking user. This tool is helpful in any knowledge-intensive area. Its technology is based on the vector space model ...
eArchivarius: accessing collections of electronic mail
Anton Leuski, Douglas W. Oard, Rahul Bhagat
Pages: 468-468
doi:10.1145/860435.860559

We present eArchivarius, an interactive system for accessing collections of electronic mail. The system combines search, clustering visualization, and time-based visualization of email messages and the people who sent or received the messages.

The ACM Digital Library is published by the Association for Computing Machinery. Copyright © 2016 ACM, Inc.