Searched for keywords.author.keyword:"document representation" OR acmdlCCS:"document representation"
Searched The ACM Full-Text Collection: 472,798 records
6,116 results found
Result 1 – 20 of 6,116

1 published by ACM
July 2013 SIGIR '13: Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Publisher: ACM
Bibliometrics:
Citation Count: 4
Downloads (6 Weeks): 1,   Downloads (12 Months): 25,   Downloads (Overall): 227

Full text available: PDF
n-gram representations of documents may improve over a simple bag-of-word representation by relaxing the independence assumption of word and introducing context. However, this comes at a cost of adding features which are non-descriptive, and increasing the dimension of the vector space model exponentially. We present new representations that avoid both ...
Keywords: stringology, document representation, maximal repeats

2
April 2016 WWW '16 Companion: Proceedings of the 25th International Conference Companion on World Wide Web
Publisher: International World Wide Web Conferences Steering Committee
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 3,   Downloads (12 Months): 24,   Downloads (Overall): 35

Full text available: PDF
Offline evaluation for information retrieval aims to compare the performance of retrieval systems based on relevance judgments for a set of test queries. Since manual judgments are expensive, selective labeling has been developed to semi-automatically label documents, in the wake of the similarity relationship among retrieved documents. Intuitively, the agreement ...
Keywords: cluster hypothesis, low-cost evaluation, document representation

3 published by ACM
July 2008 ACM Transactions on Algorithms (TALG): Volume 4 Issue 3, June 2008
Publisher: ACM
Bibliometrics:
Citation Count: 15
Downloads (6 Weeks): 0,   Downloads (12 Months): 15,   Downloads (Overall): 456

Full text available: PDF
An ordinal tree is an arbitrary rooted tree where the children of each node are ordered. Succinct representations for ordinal trees with efficient query support have been extensively studied. The best previously known result is due to Geary et al. [2004b, pages 1–10]. The number of bits required by their ...
Keywords: Succinct data structures, XML document representation

4 published by ACM
February 2016 WSDM '16: Proceedings of the Ninth ACM International Conference on Web Search and Data Mining
Publisher: ACM
Bibliometrics:
Citation Count: 3
Downloads (6 Weeks): 25,   Downloads (12 Months): 238,   Downloads (Overall): 590

Full text available: PDF
We deal with the problem of document representation for the task of measuring semantic relatedness between documents. A document is represented as a compact concept graph where nodes represent concepts extracted from the document through references to entities in a knowledge base such as DBpedia. Edges represent the semantic and ...
Keywords: document representation, graph model, dbpedia, document semantic similarity, neural network

5
April 2016 WWW '16: Proceedings of the 25th International Conference on World Wide Web
Publisher: International World Wide Web Conferences Steering Committee
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 14,   Downloads (12 Months): 117,   Downloads (Overall): 159

Full text available: PDF
Many text mining approaches adopt bag-of-words or n-grams models to represent documents. Looking beyond just the words, i.e., the explicit surface forms, in a document can improve a computer's understanding of text. Being aware of this, researchers have proposed concept-based models that rely on a human-curated knowledge base to incorporate ...
Keywords: document representation, keyphrase extraction, noisy-or bayesian network, keyphrase inference

6 published by ACM
September 2016 ICTIR '16: Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 1,   Downloads (12 Months): 64,   Downloads (Overall): 64

Full text available: PDF
This paper presents a new bag-of-entities representation for document ranking, with the help of modern knowledge bases and automatic entity linking. Our system represents query and documents by bag-of-entities vectors constructed from their entity annotations, and ranks documents by their matches with the query in the entity space. Our experiments ...
Keywords: text representation, bag-of-entities, document representation, knowledge base

7 published by ACM
October 2008 CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management
Publisher: ACM
Bibliometrics:
Citation Count: 50
Downloads (6 Weeks): 2,   Downloads (12 Months): 29,   Downloads (Overall): 625

Full text available: PDF
Topic modeling has been a key problem for document analysis. One of the canonical approaches for topic modeling is Probabilistic Latent Semantic Indexing, which maximizes the joint probability of documents and terms in the corpus. The major disadvantage of PLSI is that it estimates the probability distribution of each document ...
Keywords: manifold regularization, probabilistic latent semantic indexing, document representation, generative model

8 published by ACM
July 2013 JCDL '13: Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries
Publisher: ACM
Bibliometrics:
Citation Count: 3
Downloads (6 Weeks): 2,   Downloads (12 Months): 14,   Downloads (Overall): 148

Full text available: PDF
We propose a new theory to quantify information in probability distributions and derive a new document representation model for text clustering. By extending Shannon entropy to accommodate a non-linear relation between information and uncertainty, the proposed Least Information theory (LIT) provides insight into how terms can be weighted based on ...
Keywords: document representation, term weighting, text clustering, information measure, semantic information

9 published by ACM
November 2007 CIKM '07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Publisher: ACM
Bibliometrics:
Citation Count: 29
Downloads (6 Weeks): 3,   Downloads (12 Months): 9,   Downloads (Overall): 345

Full text available: PDF
We consider the problem of document indexing and representation. Recently, Locality Preserving Indexing (LPI) was proposed for learning a compact document subspace. Different from Latent Semantic Indexing (LSI) which is optimal in the sense of global Euclidean structure, LPI is optimal in the sense of local manifold structure. However, LPI ...
Keywords: document representation and indexing, dimensionality reduction, regularized locality preserving indexing

10 published by ACM
October 2011 CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 6,   Downloads (12 Months): 46,   Downloads (Overall): 390

Full text available: PDF
In traditional clustering methods, a document is often represented as "bag of words" (in BOW model) or n-grams (in suffix tree document model) without considering the natural language relationships between the words. In this paper, we propose a novel approach DGDC (Dependency Graph-based Document Clustering algorithm) to address this issue. ...
Keywords: dependency graph, document representation model, document clustering, similarity measure

11 published by ACM
June 2016 JCDL '16: Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 8,   Downloads (12 Months): 153,   Downloads (Overall): 188

Full text available: PDF
As Wikipedia became the largest human knowledge repository, quality measurement of its articles received a lot of attention during the last decade. Most research efforts focused on classification of Wikipedia articles quality by using a different feature set. However, so far, no "golden feature set" was proposed. In this paper, ...
Keywords: document representation, quality assessment, wikipedia, feature engineering, deep learning

12 published by ACM
July 1991 ACM Transactions on Information Systems (TOIS) - Special issue on research and development in information retrieval: Volume 9 Issue 3, July 1991
Publisher: ACM
Bibliometrics:
Citation Count: 71
Downloads (6 Weeks): 9,   Downloads (12 Months): 28,   Downloads (Overall): 894

Full text available: PDF
Keywords: complex document representation, probabilistic indexing, linear indexing functions, relevance descriptions, linear retrieval functions, probabilistic retrieval

13 published by ACM
August 2005 SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Publisher: ACM
Bibliometrics:
Citation Count: 26
Downloads (6 Weeks): 2,   Downloads (12 Months): 15,   Downloads (Overall): 988

Full text available: PDF
We consider the problem of document indexing and representation. Recently, Locality Preserving Indexing (LPI) was proposed for learning a compact document subspace. Different from Latent Semantic Indexing which is optimal in the sense of global Euclidean structure, LPI is optimal in the sense of local manifold structure. However, LPI is ...
Keywords: document representation and indexing, similarity measure, dimensionality reduction, orthogonal locality preserving indexing, vector space model, locality preserving indexing

14 published by ACM
July 2004 SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Publisher: ACM
Bibliometrics:
Citation Count: 43
Downloads (6 Weeks): 2,   Downloads (12 Months): 21,   Downloads (Overall): 1,228

Full text available: PDF
Document representation and indexing is a key problem for document analysis and processing, such as clustering, classification and retrieval. Conventionally, Latent Semantic Indexing (LSI) is considered effective in deriving such an indexing. LSI essentially detects the most representative features for document representation rather than the most discriminative features. Therefore, LSI ...
Keywords: document representation and indexing, latent semantic indexing, similarity measure, dimensionality reduction, vector space model, locality preserving indexing

15
August 2011 WI-IAT '11: Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Volume 01
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 1,   Downloads (12 Months): 4,   Downloads (Overall): 25

Full text available: PDF
This paper proposes a novel text representation for Web pages written in Vietnamese. This representation is based on an analysis of Vietnamese documents at phonetic level in which each document will be represented as a bag of phonemes. It is designed to capture sound-based information in documents and to be ...
Keywords: Document representation, Classification

16 published by ACM
July 2011 SIGIR '11: Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Publisher: ACM
Bibliometrics:
Citation Count: 4
Downloads (6 Weeks): 2,   Downloads (12 Months): 18,   Downloads (Overall): 410

Full text available: PDF
Word ambiguity and vocabulary mismatch are critical problems in information retrieval. To deal with these problems, this paper proposes the use of translated words to enrich document representation, going beyond the words in the original source language to represent a document. In our approach, each original document is automatically translated ...
Keywords: document representation, machine translation

17 published by ACM
June 2009 KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Publisher: ACM
Bibliometrics:
Citation Count: 79
Downloads (6 Weeks): 13,   Downloads (12 Months): 122,   Downloads (Overall): 2,561

Full text available: PDF
In traditional text clustering methods, documents are represented as "bags of words" without considering the semantic information of each document. For instance, if two documents use different collections of core words to represent the same topic, they may be falsely assigned to different clusters due to the lack of shared ...
Keywords: Wikipedia, document representation, text clustering

18 published by ACM
October 2012 CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management
Publisher: ACM
Bibliometrics:
Citation Count: 3
Downloads (6 Weeks): 2,   Downloads (12 Months): 36,   Downloads (Overall): 292

Full text available: PDF
We present a new, robust and computationally efficient Hierarchical Bayesian model for effective topic correlation modeling. We model the prior distribution of topics by a Generalized Dirichlet distribution (GD) rather than a Dirichlet distribution as in Latent Dirichlet Allocation (LDA). We define this model as GD-LDA. This framework captures correlations ...
Keywords: document representation, statistical topic modeling

19 published by ACM
September 2013 DocEng '13: Proceedings of the 2013 ACM symposium on Document engineering
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 7,   Downloads (12 Months): 8,   Downloads (Overall): 45

Full text available: PDF
Digital Libraries collect, organize and provide to end users large quantities of selected documents. While these documents come in a variety of formats, it is desirable that they are delivered to final users in a uniform way. Web formats are a suitable choice for this purpose. While Web documents are ...
Keywords: document rendering, document representation, layout analysis

20 published by ACM
November 2009 PaIR '09: Proceedings of the 2nd international workshop on Patent information retrieval
Publisher: ACM
Bibliometrics:
Citation Count: 2
Downloads (6 Weeks): 3,   Downloads (12 Months): 8,   Downloads (Overall): 307

Full text available: PDF
Design rationale (DR) refers to the explanation of why an artifact is designed the way it is. The management of DR in engineering design is an important task since DR is often regarded as crucial information in design decision support, design analysis and design knowledge management. The existing DR systems ...
Keywords: patent document, representation model, design rationale




The ACM Digital Library is published by the Association for Computing Machinery. Copyright © 2017 ACM, Inc.