
ABSTRACT

In the World Wide Web, myriads of hyperlinks connect documents and pages to create an unprecedented, highly complex graph structure: the Web graph. This paper presents a novel approach to learning probabilistic models of the Web, which can be used to make reliable predictions about the connectivity and information content of Web documents. The proposed method is a probabilistic dimension reduction technique which recasts and unites Latent Semantic Analysis and Kleinberg's Hubs-and-Authorities algorithm in a statistical setting.
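Kleinberg's Hubs-and-Authorities (HITS) algorithm, which the proposed method recasts, can be illustrated with a short power iteration. The following is a minimal pure-Python sketch of the standard HITS update, not code from the paper; the function name `hits`, the fixed iteration count and the toy link graph are illustrative assumptions. Under the usual conditions (for example, a unique largest singular value of the link matrix A), the authority vector converges to the principal eigenvector of A^T A and the hub vector to that of A A^T, i.e. to the leading right and left singular vectors of A. LSA rests on the same kind of singular value decomposition, applied to a term-document matrix, which is one way to see why the two can be treated together.

```python
import math


def hits(adj, iterations=50):
    """Hubs-and-authorities scores by power iteration (illustrative sketch).

    adj[i][j] == 1 means page i links to page j. Returns (hub, authority)
    lists, each rescaled to unit Euclidean length after every update.
    """
    n = len(adj)
    hub = [1.0] * n
    auth = [1.0] * n
    for _ in range(iterations):
        # Authority update a = A^T h: good authorities are pointed to by good hubs.
        auth = [sum(adj[i][j] * hub[i] for i in range(n)) for j in range(n)]
        norm = math.sqrt(sum(a * a for a in auth)) or 1.0
        auth = [a / norm for a in auth]
        # Hub update h = A a: good hubs point to good authorities.
        hub = [sum(adj[i][j] * auth[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(h * h for h in hub)) or 1.0
        hub = [h / norm for h in hub]
    return hub, auth


if __name__ == "__main__":
    # Hypothetical toy graph: pages 0, 1 and 2 all link to page 3.
    links = [
        [0, 0, 0, 1],
        [0, 0, 0, 1],
        [0, 0, 0, 1],
        [0, 0, 0, 0],
    ]
    hub, auth = hits(links)
    print("hub scores:", [round(h, 3) for h in hub])
    print("authority scores:", [round(a, 3) for a in auth])
```

On the toy graph, page 3 receives the entire authority score and pages 0 to 2 share the hub score equally (1/√3, about 0.577 each). A practical implementation would use sparse matrices and a convergence tolerance rather than a fixed number of iterations.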

This is meant to be a first step towards the development of a statistical foundation for Web-related information technologies. Although this paper does not focus on a particular application, a variety of algorithms operating in the Web/Internet environment can take advantage of the presented techniques, including search engines, Web crawlers, and information agent systems.
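The abstract does not spell out the model, so the following is only a hedged illustration of what a probabilistic counterpart to Hubs-and-Authorities can look like: a PLSA-style aspect model over links, P(s, t) = sum over z of P(z) P(s|z) P(t|z), where s is a linking page, t a linked page and z a latent aspect, fit with the EM algorithm. Whether this matches the paper's exact formulation is an assumption; the sketch also ignores page text, which the Latent Semantic Analysis side of the method would bring in. The function names `fit_link_aspect_model` and `log_likelihood`, the random initialization and the toy data are hypothetical.

```python
import math
import random


def _normalize(values):
    """Rescale non-negative numbers to sum to one (an all-zero input is returned unchanged)."""
    total = sum(values)
    return [v / total for v in values] if total > 0 else list(values)


def fit_link_aspect_model(counts, num_aspects=2, iterations=100, seed=0):
    """Fit P(s, t) = sum_z P(z) P(s|z) P(t|z) to link counts with EM.

    Assumption: a links-only aspect model, not necessarily the paper's model.
    counts[s][t] is the number of links from source page s to target page t.
    Returns (p_z, p_s_given_z, p_t_given_z); the last two are indexed [z][page].
    """
    rng = random.Random(seed)
    n_src, n_tgt = len(counts), len(counts[0])
    # Uniform P(z); random, strictly positive P(s|z) and P(t|z) break the symmetry between aspects.
    p_z = [1.0 / num_aspects] * num_aspects
    p_s = [_normalize([rng.random() + 0.1 for _ in range(n_src)]) for _ in range(num_aspects)]
    p_t = [_normalize([rng.random() + 0.1 for _ in range(n_tgt)]) for _ in range(num_aspects)]
    observed = [(s, t, counts[s][t]) for s in range(n_src) for t in range(n_tgt) if counts[s][t] > 0]

    for _ in range(iterations):
        exp_z = [0.0] * num_aspects
        exp_s = [[0.0] * n_src for _ in range(num_aspects)]
        exp_t = [[0.0] * n_tgt for _ in range(num_aspects)]
        for s, t, n in observed:
            # E-step: posterior P(z | s, t) for one observed (source, target) pair.
            joint = [p_z[z] * p_s[z][s] * p_t[z][t] for z in range(num_aspects)]
            total = sum(joint)
            if total == 0:
                continue
            for z in range(num_aspects):
                weight = n * joint[z] / total  # expected count of this link under aspect z
                exp_z[z] += weight
                exp_s[z][s] += weight
                exp_t[z][t] += weight
        # M-step: maximum-likelihood re-estimates from the expected counts.
        p_z = _normalize(exp_z)
        p_s = [_normalize(row) for row in exp_s]
        p_t = [_normalize(row) for row in exp_t]
    return p_z, p_s, p_t


def log_likelihood(counts, p_z, p_s, p_t):
    """Sum over observed links of n(s, t) * log P(s, t) under the fitted model."""
    ll = 0.0
    for s, row in enumerate(counts):
        for t, n in enumerate(row):
            if n > 0:
                prob = sum(p_z[z] * p_s[z][s] * p_t[z][t] for z in range(len(p_z)))
                ll += n * math.log(prob)
    return ll


if __name__ == "__main__":
    # Hypothetical toy Web: pages 0-1 and pages 2-3 form two separate link communities.
    toy_counts = [
        [0, 1, 0, 0],
        [1, 0, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0],
    ]
    p_z, p_s, p_t = fit_link_aspect_model(toy_counts)
    print("P(z):", [round(p, 3) for p in p_z])
    for z, row in enumerate(p_t):
        print(f"P(target | z={z}):", [round(p, 3) for p in row])
    print("log-likelihood:", round(log_likelihood(toy_counts, p_z, p_s, p_t), 3))
```

For each aspect z, P(s|z) plays the role of a hub ranking and P(t|z) that of an authority ranking, so the model yields several soft hub/authority pairs instead of a single one. EM only guarantees that the log-likelihood never decreases, so it can stop at a local optimum; running several seeds and keeping the best fit is a common precaution.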


AUTHORS



Thomas Hofmann
Bibliometrics: publication history
Publication years: 1995-2016
Publication count: 73
Citation count: 4,143
Available for download: 29
Downloads (6 weeks): 249
Downloads (12 months): 3,484
Downloads (cumulative): 39,460
Average downloads per article: 1,360.69
Average citations per article: 56.75



PUBLICATION

Title: SIGIR '00: Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Chairmen: Emmanuel Yannakoudakis (Athens Univ. of Economics and Business, Greece); Nicholas J. Belkin (Rutgers Univ.); Mun-Kew Leong (Kent Ridge Digital Labs); Peter Ingwersen (Royal School of Library and Information Science)
Pages: 369-371
Publication date: 2000-07-01
Sponsors: SIGIR (ACM Special Interest Group on Information Retrieval); Athens University of Economics and Business; Greek Computer Society
Publisher: ACM, New York, NY, USA, ©2000
ISBN: 1-58113-226-3
DOI: 10.1145/345508.345660
Conference: IR (Research and Development in Information Retrieval)
Overall acceptance rate: 1,201 of 6,327 submissions, 19%
Year Submitted Accepted Rate
SIGIR '99 135 33 24%
SIGIR '01 201 47 23%
SIGIR '02 219 44 20%
SIGIR '03 266 46 17%
SIGIR '04 267 58 22%
SIGIR '05 368 71 19%
SIGIR '06 399 74 19%
SIGIR '07 490 85 17%
SIGIR '08 497 85 17%
SIGIR '09 494 78 16%
SIGIR '10 520 87 17%
SIGIR '11 543 108 20%
SIGIR '12 483 98 20%
SIGIR '13 366 73 20%
SIGIR '14 387 82 21%
SIGIR '15 351 70 20%
SIGIR '16 341 62 18%
Overall 6,327 1,201 19%
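The Rate column is the accepted share of submissions, rounded to a whole percent, and the Overall row is the sum of the yearly counts (6,327 submissions, 1,201 acceptances). A small Python check of that arithmetic, using only the numbers in the table; the dictionary and function names are illustrative:

```python
# (submitted, accepted, listed rate in %) for each year, copied from the table above.
SIGIR_ACCEPTANCE = {
    "SIGIR '99": (135, 33, 24),
    "SIGIR '01": (201, 47, 23),
    "SIGIR '02": (219, 44, 20),
    "SIGIR '03": (266, 46, 17),
    "SIGIR '04": (267, 58, 22),
    "SIGIR '05": (368, 71, 19),
    "SIGIR '06": (399, 74, 19),
    "SIGIR '07": (490, 85, 17),
    "SIGIR '08": (497, 85, 17),
    "SIGIR '09": (494, 78, 16),
    "SIGIR '10": (520, 87, 17),
    "SIGIR '11": (543, 108, 20),
    "SIGIR '12": (483, 98, 20),
    "SIGIR '13": (366, 73, 20),
    "SIGIR '14": (387, 82, 21),
    "SIGIR '15": (351, 70, 20),
    "SIGIR '16": (341, 62, 18),
}


def acceptance_rate(submitted, accepted):
    """Accepted share of submissions as a whole-number percentage."""
    return round(100 * accepted / submitted)


# The Overall row pools the yearly counts.
total_submitted = sum(sub for sub, _, _ in SIGIR_ACCEPTANCE.values())
total_accepted = sum(acc for _, acc, _ in SIGIR_ACCEPTANCE.values())

if __name__ == "__main__":
    for year, (sub, acc, listed) in SIGIR_ACCEPTANCE.items():
        print(f"{year}: computed {acceptance_rate(sub, acc)}%, listed {listed}%")
    overall = acceptance_rate(total_submitted, total_accepted)
    print(f"Overall: {total_accepted:,} of {total_submitted:,} = {overall}%")
```

Every computed rate matches the listed one, and the totals reproduce the Overall row: 1,201 of 6,327, or 19%.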

APPEARS IN
Digital Content
Artificial Intelligence


TABLE OF CONTENTS

Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval

Salton Award lecture (summary only): on theoretical argument in information retrieval
Stephen Robertson
Page: 1
DOI: 10.1145/345508.344658
The last winner of the Salton Award, Tefko Saracevic, gave an acceptance address at SIGIR in Philadelphia in 1997. Previous winners were William Cooper (1994), Cyril Cleverdon (1991), Karen Sparck Jones (1988) and Gerard Salton himself (1985). In this ...

Relevance and contributing information types of searched documents in task performance
Pertti Vakkari
Pages: 2-9
DOI: 10.1145/345508.345512
End-users base the relevance judgements of the searched documents on the expected contribution to their task of the information contained in the documents. There is a shortage of studies analyzing the relationships between the experienced contribution, ...

Relevance feedback with a small number of relevance judgements: incremental relevance feedback vs. document clustering
Makoto Iwayama
Pages: 10-16
DOI: 10.1145/345508.345538
The use of incremental relevance feedback and document clustering were investigated in a relevance feedback environment in which the number of relevance judgements was quite small. Through experiments on the TREC collection, the incremental relevance ...

Do batch and user evaluations give the same results?
William Hersh, Andrew Turpin, Susan Price, Benjamin Chan, Dale Kramer, Lynetta Sacherek, Daniel Olson
Pages: 17-24
DOI: 10.1145/345508.345539
Do improvements in system performance demonstrated by batch evaluations confer the same benefit for real users? We carried out experiments designed to investigate this question. After identifying a weighting scheme that gave maximum improvement over ...

A novel method for the evaluation of Boolean query effectiveness across a wide operational range
Eero Sormunen
Pages: 25-32
DOI: 10.1145/345508.345541
Traditional methods for the system-oriented evaluation of Boolean IR systems suffer from validity and reliability problems. Laboratory-based research neglects the searcher and studies suboptimal queries. Research on operational systems fails to make a ...

Evaluating evaluation measure stability
Chris Buckley, Ellen M. Voorhees
Pages: 33-40
DOI: 10.1145/345508.345543
This paper presents a novel way of examining the accuracy of the evaluation measures commonly used in information retrieval experiments. It validates several of the rules-of-thumb experimenters use, such as the number of queries needed for a good experiment ...

IR evaluation methods for retrieving highly relevant documents
Kalervo Järvelin, Jaana Kekäläinen
Pages: 41-48
DOI: 10.1145/345508.345545
This paper proposes evaluation methods based on the use of non-dichotomous relevance judgements in IR experiments. It is argued that evaluation methods should credit IR methods for their ability to retrieve highly relevant documents. This is desirable ...

Automatic generation of overview timelines
Russell Swan, James Allan
Pages: 49-56
DOI: 10.1145/345508.345546
We present a statistical model of feature occurrence over time, and develop tests based on classical hypothesis testing for significance of term appearance on a given date. Using additional classical hypothesis testing we are able to combine these terms ...

Event tracking based on domain dependency
Fumiyo Fukumoto, Yoshimi Suzuki
Pages: 57-64
DOI: 10.1145/345508.345548
This paper proposes a method for event tracking on broadcast news stories based on distinction between a topic and an event. A topic and an event are identified using a simple criterion called domain dependency of words: how greatly a word features a ...

Improving text categorization methods for event tracking
Yiming Yang, Tom Ault, Thomas Pierce, Charles W. Lattimer
Pages: 65-72
DOI: 10.1145/345508.345550
Automated tracking of events from chronologically ordered document streams is a new challenge for statistical text classification. Existing learning techniques must be adapted or improved in order to effectively handle difficult situations where the ...

Evaluation of a simple and effective music information retrieval method
Stephen Downie, Michael Nelson
Pages: 73-80
DOI: 10.1145/345508.345551
We developed, and then evaluated, a music information retrieval (MIR) system based upon the intervals found within the melodies of a collection of 9354 folksongs. The songs were converted to an interval-only representation of monophonic melodies and ...

Phonetic confusion matrix based spoken document retrieval
Savitha Srinivasan, Dragutin Petkovic
Pages: 81-87
DOI: 10.1145/345508.345552
Combined word-based index and phonetic indexes have been used to improve the performance of spoken document retrieval systems primarily by addressing the out-of-vocabulary retrieval problem. However, a known problem with phonetic recognition is its limited ...

Multiple evidence combination in image retrieval: Diogenes searches for people on the Web
Y. Alp Aslandogan, Clement T. Yu
Pages: 88-95
DOI: 10.1145/345508.345553
In this work, we examine evidence combination mechanisms for classifying multimedia information. In particular, we examine linear and Dempster-Shafer methods of evidence combination in the context of identifying personal images on the World Wide Web. ...

Link-based and content-based evidential information in a belief network model
Ilmério Silva, Berthier Ribeiro-Neto, Pável Calado, Edleno Moura, Nívio Ziviani
Pages: 96-103
DOI: 10.1145/345508.345554
This work presents an information retrieval model developed to deal with hyperlinked environments. The model is based on belief networks and provides a framework for combining information extracted from the content of the documents with information derived ...

The feature quantity: an information theoretic perspective of Tfidf-like measures
Akiko Aizawa
Pages: 104-111
DOI: 10.1145/345508.345556
The feature quantity, a quantitative representation of specificity introduced in this paper, is based on an information theoretic perspective of co-occurrence events between terms and documents. Mathematically, the feature quantity is defined ...

INSYDER: an information assistant for business intelligence
Harald Reiterer, Gabriela Mußler, Thomas M. Mann, Siegfried Handschuh
Pages: 112-119
DOI: 10.1145/345508.345559
The WWW is the most important resource for external business information. This paper presents a tool called INSYDER, an information assistant for finding and analyzing business information from the WWW. INSYDER is a system using different agents for crawling ...

Structured translation for cross-language information retrieval
Ruth Sperer, Douglas W. Oard
Pages: 120-127
DOI: 10.1145/345508.345562
The paper introduces a query translation model that reflects the structure of the cross-language information retrieval task. The model is based on a structured bilingual dictionary in which the translations of each term are clustered into groups with ...

Automatic adaptation of proper noun dictionaries through cooperation of machine learning and probabilistic methods
Georgios Petasis, Alessandro Cucchiarelli, Paola Velardi, Georgios Paliouras, Vangelis Karkaletsis, Constantine D. Spyropoulos
Pages: 128-135
DOI: 10.1145/345508.345563
The recognition of Proper Nouns (PNs) is considered an important task in the area of Information Retrieval and Extraction. However, the high performance of most existing PN classifiers heavily depends upon the availability of large dictionaries of ...

Document centered approach to text normalization
Andrei Mikheev
Pages: 136-143
DOI: 10.1145/345508.345564
In this paper we present an approach to tackle three important problems of text normalization: sentence boundary disambiguation, disambiguation of capitalized words when they are used in positions where capitalization is expected, and identification ...

OCELOT: a system for summarizing Web pages
Adam L. Berger, Vibhu O. Mittal
Pages: 144-151
DOI: 10.1145/345508.345565
We introduce OCELOT, a prototype system for automatically generating the “gist” of a web page by summarizing it. Although most text summarization research to date has focused on the task of news articles, web pages are quite different in ...

Extracting sentence segments for text summarization: a machine learning approach
Wesley T. Chuang, Jihoon Yang
Pages: 152-159
DOI: 10.1145/345508.345566
With the proliferation of the Internet and the huge amount of data it transfers, text summarization is becoming more important. We present an approach to the design of an automatic text summarizer that generates a summary by extracting sentence segments. ...

An experimental comparison of naive Bayesian and keyword-based anti-spam filtering with personal e-mail messages
Ion Androutsopoulos, John Koutsias, Konstantinos V. Chandrinos, Constantine D. Spyropoulos
Pages: 160-167
DOI: 10.1145/345508.345569
The growing problem of unsolicited bulk e-mail, also known as “spam”, has generated a need for reliable anti-spam e-mail filters. Filters of this type have so far been based mostly on manually constructed keyword patterns. An alternative ...

Text filtering by boosting naive Bayes classifiers
Yu-Hwan Kim, Shang-Yoon Hahn, Byoung-Tak Zhang
Pages: 168-175
DOI: 10.1145/345508.345572
Several machine learning algorithms have recently been used for text categorization and filtering. In particular, boosting methods such as AdaBoost have shown good performance applied to real text data. However, most existing boosting algorithms are ...

Document filtering method using non-relevant information profile
Keiichiro Hoashi, Kazunori Matsumoto, Naomi Inoue, Kazuo Hashimoto
Pages: 176-183
DOI: 10.1145/345508.345573
Document filtering is a task to retrieve documents relevant to a user's profile from a flow of documents. Generally, filtering systems calculate the similarity between the profile and each incoming document, and retrieve documents with similarity higher ...

Question-answering by predictive annotation
John Prager, Eric Brown, Anni Coden, Dragomir Radev
Pages: 184-191
DOI: 10.1145/345508.345574
We present a new technique for question answering called Predictive Annotation. Predictive Annotation identifies potential answers to questions in text, annotates them accordingly and indexes them. This technique, along with a complementary analysis ...

Bridging the lexical chasm: statistical approaches to answer-finding
Adam Berger, Rich Caruana, David Cohn, Dayne Freitag, Vibhu Mittal
Pages: 192-199
DOI: 10.1145/345508.345576
This paper investigates whether a machine can automatically learn the task of finding, within a large collection of candidate responses, the answers to questions. The learning process consists of inspecting a collection of answered questions and characterizing ...

Building a question answering test collection
Ellen M. Voorhees, Dawn M. Tice
Pages: 200-207
DOI: 10.1145/345508.345577
The TREC-8 Question Answering (QA) Track was the first large-scale evaluation of domain-independent question answering systems. In addition to fostering research on the QA task, the track was used to investigate whether the evaluation methodology used ...

Document clustering using word clusters via the information bottleneck method
Noam Slonim, Naftali Tishby
Pages: 208-215
DOI: 10.1145/345508.345578
We present a novel implementation of the recently introduced information bottleneck method for unsupervised document clustering. Given a joint empirical distribution of words and documents, p(x, y), we first cluster the words, ...

Latent semantic space: iterative scaling improves precision of inter-document similarity measurement
Rie Kubota Ando
Pages: 216-223
DOI: 10.1145/345508.345579
We present a novel algorithm that creates document vectors with reduced dimensionality. This work was motivated by an application characterizing relationships among documents in a collection. Our algorithm yielded inter-document similarities with an ...

An investigation of linguistic features and clustering algorithms for topical document clustering
Vasileios Hatzivassiloglou, Luis Gravano, Ankineedu Maganti
Pages: 224-231
DOI: 10.1145/345508.345582
We investigate four hierarchical clustering methods (single-link, complete-link, groupwise-average, and single-pass) and two linguistically motivated text features (noun phrase heads and proper names) in the context of document clustering. A statistical ...

The impact of database selection on distributed searching
Allison L. Powell, James C. French, Jamie Callan, Margaret Connell, Charles L. Viles
Pages: 232-239
DOI: 10.1145/345508.345584
The proliferation of online information resources increases the importance of effective and efficient distributed searching. Distributed searching is cast in three parts: database selection, query processing, and results merging. In this paper ...

Hill climbing algorithms for content-based retrieval of similar configurations
Dimitris Papadias
Pages: 240-247
DOI: 10.1145/345508.345587
The retrieval of stored images matching an input configuration is an important form of content-based retrieval. Exhaustive processing (i.e., retrieval of the best solutions) of configuration similarity queries is, in general, exponential and fast search ...

Partial collection replication versus caching for information retrieval systems
Zhihong Lu, Kathryn S. McKinley
Pages: 248-255
DOI: 10.1145/345508.345591
The explosion of content in distributed information retrieval (IR) systems requires new mechanisms to attain timely and accurate retrieval of unstructured text. In this paper, we compare two mechanisms to improve IR system performance: partial collection ...

Hierarchical classification of Web content
Susan Dumais, Hao Chen
Pages: 256-263
DOI: 10.1145/345508.345593
This paper explores the use of hierarchical structure for classifying a large, heterogeneous collection of web content. The hierarchical structure is initially used to train different second-level classifiers. In the hierarchical case, a model is learned ...

A practical hypertext categorization method using links and incrementally available class information
Hyo-Jung Oh, Sung Hyon Myaeng, Mann-Ho Lee
Pages: 264-271
DOI: 10.1145/345508.345594
As the WWW grows at an increasing speed, a classifier targeted at hypertext has come into high demand. While document categorization is quite a mature field, the issue of utilizing hypertext structure and hyperlinks has been relatively unexplored. In this paper, ...

Topical locality in the Web
Brian D. Davison
Pages: 272-279
DOI: 10.1145/345508.345597
Most web pages are linked to others with related content. This idea, combined with another that says that text in, and possibly around, HTML anchors describe the pages to which they point, is the foundation for a usable World-Wide Web. ...

Interactive Internet search: keyword, directory and query reformulation mechanisms compared
Peter Bruza, Robert McArthur, Simon Dennis
Pages: 280-287
DOI: 10.1145/345508.345598
This article compares search effectiveness when using query-based Internet search (via the Google search engine), directory-based search (via Yahoo) and phrase-based query reformulation assisted search (via the Hyperindex browser) by means of a controlled, ...

Incorporating quality metrics in centralized/distributed information retrieval on the World Wide Web
Xiaolan Zhu, Susan Gauch
Pages: 288-295
DOI: 10.1145/345508.345602
Most information retrieval systems on the Internet rely primarily on similarity ranking algorithms based solely on term frequency statistics. Information quality is usually ignored. This leads to the problem that documents are retrieved without regard ...

Does “authority” mean quality? predicting expert quality ratings of Web documents
Brian Amento, Loren Terveen, Will Hill
Pages: 296-303
DOI: 10.1145/345508.345603
For many topics, the World Wide Web contains hundreds or thousands of relevant documents of widely varying quality. Users face a daunting challenge in identifying a small subset of documents worthy of their attention. Link analysis algorithms have received ...

Document classification on neural networks using only positive examples (poster session)
Larry M. Manevitz, Malik Yousef
Pages: 304-306
DOI: 10.1145/345508.345608
In this paper, we show how a simple feed-forward neural network can be trained to filter documents when only positive information is available, and that this method seems to be superior to more standard methods, such as tf-idf retrieval based on an “average ...

New paradigms in information visualization (poster session)
Peter Au, Matthew Carey, Shalini Sewraz, Yike Guo, Stefan M. Rüger
Pages: 307-309
DOI: 10.1145/345508.345610
We present three new visualization front-ends that aid navigation through the set of documents returned by a search engine (hit documents). We cluster the hit documents to visually group these documents and label the groups with related words. The ...

Latent semantic indexing model for Boolean query formulation (poster session)
Dae-Ho Baek, HeuiSeok Lim, Hae-Chang Rim
Pages: 310-312
DOI: 10.1145/345508.345612
A new model named Boolean Latent Semantic Indexing model based on the Singular Value Decomposition and Boolean query formulation is introduced. While the Singular Value Decomposition alleviates the problems of lexical matching in the traditional information ...

Generation of user profiles for information filtering: research agenda (poster session)
Tsvi Kuflik, Peretz Shoval
Pages: 313-315
DOI: 10.1145/345508.345615
In information filtering (IF) systems, users' long-term needs are expressed as user profiles. The quality of a user profile has a major impact on the performance of IF systems. The focus of the proposed research is on the study of user profile generation ...

Variance based classifier comparison in text categorization (poster session)
Atsuhiro Takasu, Kenro Aihara
Pages: 316-317
DOI: 10.1145/345508.345618
Text categorization is one of the key functions for utilizing vast amounts of documents. It can be seen as a classification problem, which has been studied in pattern recognition and machine learning fields for a long time and several classification methods ...

The use of phrases from query texts in information retrieval (poster session)
Masumi Narita, Yasushi Ogawa
Pages: 318-320
DOI: 10.1145/345508.345621

Pseudo-frequency method (poster session): an efficient document ranking retrieval method for n-gram indexing
Ogawa Yasushi
Pages: 321-323
DOI: 10.1145/345508.345622
Although n-gram (n successive characters) indexing is widely used in retrieval systems for documents in Japanese and other Asian languages, it is difficult to process ranking retrieval efficiently using n-gram indexing. This is because frequency ...

Lexical semantic relatedness and online new event detection (poster session)
Nicola Stokes, Paula Hatch, Joe Carthy
Pages: 324-325
DOI: 10.1145/345508.345623

Modeling question-response patterns by scaling and visualization (poster session)
Mark Rorvig
Pages: 326-327
DOI: 10.1145/345508.345624
The evaluation of question difficulty is usually considered the domain of Latent Trait Theory. However, these methods require standardized question sets normalized by large populations, rendering them inefficient for use in the numerous areas where questions ...

The effect of query type on subject searching behavior of image databases (poster session): an exploratory study
Efthimis N. Efthimiadis, Raya Fidel
Pages: 328-330
DOI: 10.1145/345508.345625

The role of a judge in a user based retrieval experiment (poster session)
Mingfang Wu, Michael Fuller, Ross Wilkinson
Pages: 331-333
DOI: 10.1145/345508.345628

Auto-construction of a live thesaurus from search term logs for interactive Web search (poster session)
Shui-Lung Chuang, Hsiao-Tieh Pu, Wen-Hsiang Lu, Lee-Feng Chien
Pages: 334-336
DOI: 10.1145/345508.345630
The purpose of this paper is to present an on-going research that is intended to construct a live thesaurus directly from search term logs of real-world search engines. Such a thesaurus designed can contain representative search terms, their frequency ...

Cognitive approach for building user model in an information retrieval context (poster session)
Amina Sayeb Belhassen, Nabil Ben Abdallah, Henda Hadjami Ben Ghezala
Pages: 337-338
DOI: 10.1145/345508.345632
The recent development of communication networks and multimedia system provides users with the availability of a huge amount of information making worse the problem of information overload [9]. The evolution of system design is necessary becoming more ...

Multimedia information retrieval from recorded presentations (poster session)
Wolfgang Hürst, Rainer Müller, Christoph Mayer
Pages: 339-341
DOI: 10.1145/345508.345636
In presentation recording special effort is usually put into the automation of the production process, that is in automatically creating high quality data files without much or any need for manual recording and post-editing [5]. With the advent of such ...

Influence of speech recognition errors on topic detection (poster session)
J. Scott McCarley, Martin Franz
Pages: 342-344
DOI: 10.1145/345508.345638
We investigate the effect of speech-recognition errors on a system for the unsupervised, nearly synchronous clustering of broadcast news stories, using the TDT (Topic Detection and Tracking) Corpora. Two questions are addressed: (1) Are speech recognition ...

Word document density and relevance scoring (poster session)
Martin Franz, J. Scott McCarley
Pages: 345-347
DOI: 10.1145/345508.345641
Previous work addressing the issue of word distribution in documents has shown the importance of word repetitiveness as an indicator of the word content-bearing characteristics. In this paper we propose a simple method using a measure of the tendency ...

Ranking digital images using combination of evidences (poster session)
Iadh Ounis
Pages: 348-350
DOI: 10.1145/345508.345643

Collaborative filtering and the generalized vector space model (poster session)
Ian Soboroff, Charles Nicholas
Pages: 351-353
DOI: 10.1145/345508.345646
Collaborative filtering is a technique for recommending documents to users based on how similar their tastes are to other users. If two users tend to agree on what they like, the system will recommend the same documents to them. The generalized vector ...

Theme-based retrieval of Web news (poster session)
Nuno Maria, Mário J. Silva
Pages: 354-356
DOI: 10.1145/345508.345648
We present our framework for classification of Web news, based on support vector machines, and some of the initial measurements of its accuracy.

Stemming and its effects on TFIDF ranking (poster session)
Mark Kantrowitz, Behrang Mohit, Vibhu Mittal
Pages: 357-359
DOI: 10.1145/345508.345650

Exploration of a heuristic approach to threshold learning in adaptive filtering (poster session)
Chengxiang Zhai, Peter Jansen, David A. Evans
Pages: 360-362
DOI: 10.1145/345508.345652
In this paper we examine the learning behavior of a heuristic threshold setting approach to information filtering. In particular, we study how different initial threshold settings and different updating parameter settings affect threshold learning. The ...

On the design and evaluation of a multi-dimensional approach to information retrieval (poster session)
M. Catherine McCabe, Jinho Lee, Abdur Chowdhury, David Grossman, Ophir Frieder
Pages: 363-365
DOI: 10.1145/345508.345656
We present a method of searching text collections that takes advantage of hierarchical information within documents and integrates searches of structured and unstructured data. We show that Multidimensional databases (MDB), designed for accessing ...

SWAMI (poster session): a framework for collaborative filtering algorithm development and evaluation
Danyel Fisher, Kris Hildrum, Jason Hong, Mark Newman, Megan Thomas, Rich Vuduc
Pages: 366-368
DOI: 10.1145/345508.345658
We present a Java-based framework, SWAMI (Shared Wisdom through the Amalgamation of Many Interpretations) for building and studying collaborative filtering systems. SWAMI consists of three components: a prediction engine, an evaluation system, and a ...

Learning probabilistic models of the Web (poster session)
Thomas Hofmann
Pages: 369-371
DOI: 10.1145/345508.345660

Effects of out of vocabulary words in spoken document retrieval (poster session)
P. C. Woodland, S. E. Johnson, P. Jourlin, K. Spärck Jones
Pages: 372-374
DOI: 10.1145/345508.345661
The effects of out-of-vocabulary (OOV) items in spoken document retrieval (SDR) are investigated. Several sets of transcriptions were created for the TREC-8 SDR task using a speech recognition system varying the vocabulary sizes and OOV rates, and the ...

Towards an adaptive and task-specific ranking mechanism in Web searching (poster session)
Chen Ding, Chi-Hung Chi
Pages: 375-376
DOI: 10.1145/345508.345663

Beyond the traditional query operators (poster session)
Chen Ding, Chi-Hung Chi
Pages: 377-378
DOI: 10.1145/345508.345664

Bayes optimal metasearch: a probabilistic model for combining the results of multiple retrieval systems (poster session)
Javed A. Aslam, Mark Montague
Pages: 379-381
DOI: 10.1145/345508.345665
We introduce a new, probabilistic model for combining the outputs of an arbitrary number of query retrieval systems. By gathering simple statistics on the average performance of a given set of query retrieval systems, we construct a Bayes optimal mechanism ...

Information access for context-aware appliances (poster session)
Gareth J. F. Jones, Peter J. Brown
Pages: 382-384
DOI: 10.1145/345508.345666
The emergence of networked context-aware mobile computing appliances potentially offers opportunities for remote access to huge online information resources. Information access in context-aware information appliances can utilize existing techniques developed ...

Finding relevant passages using noun-noun compounds (poster session): coherence vs. proximity
Eduard Hoenkamp, Rob de Groot
Pages: 385-387
DOI: 10.1145/345508.345667
Intuitively, words forming phrases are a more precise description of content than words as a sequence of keywords. Yet, evidence that phrases would be more effective for information retrieval is inconclusive. This paper isolates a neglected class of ...

Semantic Explorer: navigation in documents collections; Proxima Daily: learning personal newspaper (demonstration session)
Vadim Asadov, Serge Shumsky
Page: 388
DOI: 10.1145/345508.345668

Integrated search tools for newspaper digital libraries (demonstration session)
S. L. Mantzaris, B. Gatos, N. Gouraros, P. Tzavelis
Page: 389
DOI: 10.1145/345508.345670

Managing photos with AT&T Shoebox (demonstration session)
Timothy J. Mills, David Pye, David Sinclair, Kenneth R. Wood
Page: 390
DOI: 10.1145/345508.345671

ClusterBook, a tool for dual information access (demonstration session)
Gheorghe Mureşan, David J. Harper, Ayşe Göker, Peter Lowit
Page: 391
DOI: 10.1145/345508.345672

Uexküll (demonstration session): an interactive visual user interface for document retrieval in vector space
Michael Preminger, Sandor Daranyi
Page: 392
DOI: 10.1145/345508.345673

TimeMine (demonstration session): visualizing automatically constructed timelines
Russell Swan, James Allan
Page: 393
DOI: 10.1145/345508.345674

The Cambridge University Multimedia Document Retrieval demo system (demonstration session)
A. Tuerk, S. E. Johnson, P. Jourlin, K. Spärck Jones, P. C. Woodland
Page: 394
DOI: 10.1145/345508.345675
