1
July 2013
SIGIR '13: Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Publisher: ACM
Bibliometrics:
Citation Count: 4
Downloads (6 Weeks): 1, Downloads (12 Months): 25, Downloads (Overall): 227
Full text available:
PDF
n-gram representations of documents may improve over a simple bag-of-word representation by relaxing the independence assumption of word and introducing context. However, this comes at a cost of adding features which are non-descriptive, and increasing the dimension of the vector space model exponentially. We present new representations that avoid both ...
Keywords:
stringology, document representation, maximal repeats
CCS:
Document representation
Primary CCS:
Document representation
2
April 2016
WWW '16 Companion: Proceedings of the 25th International Conference Companion on World Wide Web
Publisher: International World Wide Web Conferences Steering Committee
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 3, Downloads (12 Months): 24, Downloads (Overall): 35
Full text available:
PDF
Offline evaluation for information retrieval aims to compare the performance of retrieval systems based on relevance judgments for a set of test queries. Since manual judgments are expensive, selective labeling has been developed to semi-automatically label documents, in the wake of the similarity relationship among retrieved documents. Intuitively, the agreement ...
Keywords:
cluster hypothesis, low-cost evaluation, document representation
CCS:
Document representation
Abstract:
... representing documents, certain information is lost. We argue that better document representation can lead to better agreement with the cluster hypothesis. To this end, we investigate different document representations on established benchmarks in the context of low-cost evaluation, showing that different document representations vary in how well they capture document similarity relative to ...
Primary CCS:
Document representation
Title:
Cluster Hypothesis in Low-Cost IR Evaluation with Different Document Representations
Full Text:
... Meanwhile, in representing documents, certain information is lost. We argue that better document representation can lead to better agreement with the cluster hypothesis. To this end, we investigate different document representations on established benchmarks in the context of low-cost evaluation, showing that different document representations vary in how well they capture document similarity relative to a ... favor of the low-cost evaluation. To this end, we compare multiple document representations, including bag-of-words, latent semantic analysis [4], latent Dirichlet allocation [5] ...
... Since the choice of words can in- ... Table 1: Comparison of Document Representations on Different Benchmarks. Task Benchmark Bow Ebow Lda Lsa Para2vec Adhoc Task TripleTest ... method to encode the word embedding information from word2vec [7] into document representations, whereas document expansion is used in Ebow. 3. EVALUATION In this section, ... least one subtopic are used. In Lsa, Lda and Para2vec, the document representation is computed separately for each query, given the size of the ... improvements when compared against Bow. Intuitively, the comparisons among different document representations are in terms of their agreement degree to the desirable ...
3
July 2008
ACM Transactions on Algorithms (TALG): Volume 4 Issue 3, June 2008
Publisher: ACM
Bibliometrics:
Citation Count: 15
Downloads (6 Weeks): 0, Downloads (12 Months): 15, Downloads (Overall): 456
Full text available:
PDF
An ordinal tree is an arbitrary rooted tree where the children of each node are ordered. Succinct representations for ordinal trees with efficient query support have been extensively studied. The best previously known result is due to Geary et al. [2004b, pages 1--10]. The number of bits required by their ...
Keywords:
Succinct data structures, XML document representation
CCS:
Document representation
4
February 2016
WSDM '16: Proceedings of the Ninth ACM International Conference on Web Search and Data Mining
Publisher: ACM
Bibliometrics:
Citation Count: 3
Downloads (6 Weeks): 25, Downloads (12 Months): 238, Downloads (Overall): 590
Full text available:
PDF
We deal with the problem of document representation for the task of measuring semantic relatedness between documents. A document is represented as a compact concept graph where nodes represent concepts extracted from the document through references to entities in a knowledge base such as DBpedia. Edges represent the semantic and ...
Keywords:
document representation, graph model, dbpedia, document semantic similarity, neural network
CCS:
Document representation
Primary CCS:
Document representation
Full Text:
... Research, China zhuhuij@cn.ibm.com Shao Sheng Cao Xidian University shelsoncao@gmail.com ABSTRACT We deal with the problem of document representation for the task of measuring semantic relatedness between documents. A document ... a similarity measure between documents in terms of their respective representations. Most document representations map the documents to vectors of a fixed length, aiming to ...
... methods to represent concepts (as opposed to words) as continuous vectors. For document representation we build a graph whose nodes are explicit concepts from a ... a concept as a continuous vector of 200 dimensions. We evaluate our document representation and similarity measures on LP50 [10], a standard benchmark for document ... systems. To summarize, our contributions are the following: • We propose a document representation as a concept graph whose nodes and edges are weighted. In ...
... Techniques are specified in detail in section 4.2. The key idea of document representation by a concept graph is to link the concepts in the ...
5
April 2016
WWW '16: Proceedings of the 25th International Conference on World Wide Web
Publisher: International World Wide Web Conferences Steering Committee
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 14, Downloads (12 Months): 117, Downloads (Overall): 159
Full text available:
PDF
Many text mining approaches adopt bag-of-words or n-grams models to represent documents. Looking beyond just the words, i.e., the explicit surface forms, in a document can improve a computer's understanding of text. Being aware of this, researchers have proposed concept-based models that rely on a human-curated knowledge base to incorporate ...
Keywords:
document representation, keyphrase extraction, noisy-or bayesian network, keyphrase inference
CCS:
Document representation
Abstract:
... human-curated knowledge base to incorporate other related concepts in the document representation. But these methods are not desirable when applied to ... keyphrases, going beyond just explicit mentions. Compared with the state-of-art document representation approaches, LAKI fills the gap between bag-of-words and concept-based models ...
Primary CCS:
Document representation
Full Text:
... a human-curated knowledge base to incorporate other related concepts in the document representation. But these methods are not desirable when applied to vertical ... latent document keyphrases, going beyond just explicit mentions. Compared with the state-of-art document representation approaches, LAKI fills the gap between bag-of-words and concept-based models by using ...
... nodes represent domain keyphrases and content units respectively. ... forward interpretation for document representation, which is critical for model verification and for ensuring that ...
... Analysis (LSA) [7] is a topic modeling technique learning word and document representations by applying Singular Value Decomposition to the words-by-documents co-occurrence matrix • Latent ... Latent Keyphrase Inference (LAKI) is proposed in this work to derive document representation via inferring latent keyphrases in the text. Table 2 provides more ...
... Academia Yelp Table 7: Examples of document representation by LAKI with top-ranked document keyphrases in the vector (relatedness ...
6
September 2016
ICTIR '16: Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 1, Downloads (12 Months): 64, Downloads (Overall): 64
Full text available:
PDF
This paper presents a new bag-of-entities representation for document ranking, with the help of modern knowledge bases and automatic entity linking. Our system represents query and documents by bag-of-entities vectors constructed from their entity annotations, and ranks documents by their matches with the query in the entity space. Our experiments ...
Keywords:
text representation, bag-of-entities, document representation, knowledge base
CCS:
Document representation
Primary CCS:
Document representation
Full Text:
... by as much as 18% in standard document ranking tasks. Keywords: Text Representation, Document Representation, Knowledge Base, Bag-of-Entities. 1. INTRODUCTION In the earliest information retrieval systems, query ...
7
October 2008
CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management
Publisher: ACM
Bibliometrics:
Citation Count: 50
Downloads (6 Weeks): 2, Downloads (12 Months): 29, Downloads (Overall): 625
Full text available:
PDF
Topic modeling has been a key problem for document analysis. One of the canonical approaches for topic modeling is Probabilistic Latent Semantic Indexing, which maximizes the joint probability of documents and terms in the corpus. The major disadvantage of PLSI is that it estimates the probability distribution of each document ...
Keywords:
manifold regularization, probabilistic latent semantic indexing, document representation, generative model
CCS:
Document representation
References:
X. He, D. Cai, H. Liu, and W.-Y. Ma. Locality preserving indexing for document representation. In Proc. 2004 Int. Conf. on Research and Development in Information Retrieval (SIGIR'04), pages 96--103, Sheffield, UK, July 2004.
Full Text:
... and Indexing - Indexing methods. General Terms: Algorithms, Performance, Theory. Keywords: Probabilistic Latent Semantic Indexing, Manifold Regularization, Document Representation, Generative Model. Permission to make digital or hard copies of ...
... D. Cai, H. Liu, and W.-Y. Ma. Locality preserving indexing for document representation. In Proc. 2004 Int. Conf. on Research and Development in ...
8
July 2013
JCDL '13: Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries
Publisher: ACM
Bibliometrics:
Citation Count: 3
Downloads (6 Weeks): 2, Downloads (12 Months): 14, Downloads (Overall): 148
Full text available:
PDF
We propose a new theory to quantify information in probability distributions and derive a new document representation model for text clustering. By extending Shannon entropy to accommodate a non-linear relation between information and uncertainty, the proposed Least Information theory (LIT) provides insight into how terms can be weighted based on ...
Keywords:
document representation, term weighting, text clustering, information measure, semantic information
CCS:
Document representation
Full Text:
... and Retrieval - Clustering. General Terms: Theory, Algorithms, Performance, Experimentation. Keywords: term weighting, information measure, semantic information, document representation, text clustering. Permission to make digital or hard copies of ... clustering research, TF*IDF has been extensively used for term weighting and document representation [21,34]. While term frequency (TF) indicates the degree of a document's ...
... we apply the proposed least information theory to term weighting and document representation. A text document can be viewed as a set ...
... system based on the Weka data mining framework [32]. We implemented various document representation methods including the proposed term weighting schemes and TF*IDF based ...
... look at the impact of feature selection on the effectiveness of document representation for clustering. In this work, we selected features based on their ... for which the top Nf most frequent terms were kept for document representation. We performed this experimental analysis on three collections, namely, WebKB, 20Newsgroup, ...
... point degraded clustering performance when there were insufficient features for accurate document representation. As shown in Figures 3 and 4, similar patterns about ...
9
November 2007
CIKM '07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Publisher: ACM
Bibliometrics:
Citation Count: 29
Downloads (6 Weeks): 3, Downloads (12 Months): 9, Downloads (Overall): 345
Full text available:
PDF
We consider the problem of document indexing and representation. Recently, Locality Preserving Indexing (LPI) was proposed for learning a compact document subspace. Different from Latent Semantic Indexing (LSI) which is optimal in the sense of global Euclidean structure, LPI is optimal in the sense of local manifold structure. However, LPI ...
Keywords:
document representation and indexing, dimensionality reduction, regularized locality preserving indexing
CCS:
Document representation
Primary CCS:
Document representation
References:
X. He, D. Cai, H. Liu, and W.-Y. Ma. Locality preserving indexing for document representation. In Proc. 2004 Int. Conf. on Research and Development in Information Retrieval (SIGIR'04), pages 96--103, Sheffield, UK, July 2004.
Full Text:
... large size. 3. REGULARIZED LOCALITY PRESERVING INDEXING Although LPI can learn a compact document representation which is beneficial for many text analysis tasks such as clustering ...
... D. Cai, H. Liu, and W.-Y. Ma. Locality preserving indexing for document representation. In Proc. 2004 Int. Conf. on Research and Development in ...
10
October 2011
CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 6, Downloads (12 Months): 46, Downloads (Overall): 390
Full text available:
PDF
In traditional clustering methods, a document is often represented as "bag of words" (in BOW model) or n-grams (in suffix tree document model) without considering the natural language relationships between the words. In this paper, we propose a novel approach DGDC (Dependency Graph-based Document Clustering algorithm) to address this issue. ...
Keywords:
dependency graph, document representation model, document clustering, similarity measure
CCS:
Document representation
Full Text:
... Similarity Measure. 1. INTRODUCTION Document clustering techniques usually rely on four modules: document representation model, similarity measure, clustering model and the clustering algorithm which generates clusters based on the document representation model [5]. Among all these modules, the document representation model is very ... and crucial for the clustering results. The most basic model for document representation is the Vector Space Document (VSD) model. In this model, the document ... suggested in the natural language sentences are still ignored in these document representation models. In this paper, we propose a more informative document representation model, ...
... experimentally. The baseline approach utilizes the "bag of words" (BOW) model for document representation and weights each word in the feature vector by tf-idf measure. ...
11
June 2016
JCDL '16: Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 8, Downloads (12 Months): 153, Downloads (Overall): 188
Full text available:
PDF
As Wikipedia became the largest human knowledge repository, quality measurement of its articles received a lot of attention during the last decade. Most research efforts focused on classification of Wikipedia articles quality by using a different feature set. However, so far, no "golden feature set" was proposed. In this paper, ...
Keywords:
document representation, quality assessment, wikipedia, feature engineering, deep learning
CCS:
Document representation
12
July 1991
ACM Transactions on Information Systems (TOIS) - Special issue on research and development in information retrieval: Volume 9 Issue 3, July 1991
Publisher: ACM
Bibliometrics:
Citation Count: 71
Downloads (6 Weeks): 9, Downloads (12 Months): 28, Downloads (Overall): 894
Full text available:
PDF
Keywords:
complex document representation, probabilistic indexing, linear indexing functions, relevance descriptions, linear retrieval functions, probabilistic retrieval
CCS:
Document representation
Primary CCS:
Document representation
References:
CROFT, W. B. Document representation in probabilistic models of information retrieval. J. Am. Soc. Inf. Sci. 32 (1981), 451-457.
Full Text:
... Learning - parameter learning. General Terms: Experimentation, Theory. Additional Key Words and Phrases: Complex document representation, linear indexing functions, linear retrieval functions, probabilistic indexing, probabilistic ...
... representations D and Q of these objects. With the mapping a~, document representations D are derived from the original documents ~. In the same ... @ A ~J G D] of pairs of relevance judgments and document representations. Thus, two queries q1 and q2 have the same representation ...
... = 1, if t,E q:, and z~i ?? O otherwise. The document representation is not further specified in the BII model, and below we ...
... with multiple occurrences can be abandoned (some concepts for a more detailed document representation are described by Fuhr [13]). On the other hand, our approach ...
... structures). Document Representation. In order to test the effect of an improved document representation, we performed a few experiments with the indexing function e~a. ...
... Soc. Inf. Sci. 37, 2 (1986), 71-77. 6. CROFT, W. B. Document representation in probabilistic models of information retrieval. J. Am. Soc. Inf. Sci., ...
13
August 2005
SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Publisher: ACM
Bibliometrics:
Citation Count: 26
Downloads (6 Weeks): 2, Downloads (12 Months): 15, Downloads (Overall): 988
Full text available:
PDF
We consider the problem of document indexing and representation. Recently, Locality Preserving Indexing (LPI) was proposed for learning a compact document subspace. Different from Latent Semantic Indexing which is optimal in the sense of global Euclidean structure, LPI is optimal in the sense of local manifold structure. However, LPI is ...
Keywords:
document representation and indexing, similarity measure, dimensionality reduction, orthogonal locality preserving indexing, vector space model, locality preserving indexing
CCS:
Document representation
Primary CCS:
Document representation
References:
X. He, D. Cai, H. Liu, and W.-Y. Ma. Locality preserving indexing for document representation. In Proceedings of ACM SIGIR, 2004.
Full Text:
... Measurement, Performance, Experimentation, Theory. Keywords: Orthogonal Locality Preserving Indexing, Locality Preserving Indexing, Document Representation and Indexing, Similarity Measure, Dimensionality Reduction, Vector Space Model. Permission to ... Model (VSM) might be one of the most popular model for document representation. Each document is represented as a bag of words. Correspondingly, ...
... ), LSI (corresponding to document set DLSI ) and the original document representation (corresponding to document set D as baseline algorithm). In general, ...
... D. Cai, H. Liu, and W.-Y. Ma. Locality preserving indexing for document representation. In Proceedings of ACM SIGIR, 2004. [12] T. Hofmann. Probabilistic latent ...
14
July 2004
SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Publisher: ACM
Bibliometrics:
Citation Count: 43
Downloads (6 Weeks): 2, Downloads (12 Months): 21, Downloads (Overall): 1,228
Full text available:
PDF
Document representation and indexing is a key problem for document analysis and processing, such as clustering, classification and retrieval. Conventionally, Latent Semantic Indexing (LSI) is considered effective in deriving such an indexing. LSI essentially detects the most representative features for document representation rather than the most discriminative features. Therefore, LSI ...
Keywords:
document representation and indexing, latent semantic indexing, similarity measure, dimensionality reduction, vector space model, locality preserving indexing
CCS:
Document representation
Abstract:
Document representation and indexing is a key problem for document analysis and ... an indexing. LSI essentially detects the most representative features for document representation rather than the most discriminative features. Therefore, LSI might not ... space, LPI discovers the local structure and obtains a compact document representation subspace that best detects the essential semantic structure. We compare ...
Primary CCS:
Document representation
Title:
Locality preserving indexing for document representation
Full Text:
Locality Preserving Indexing for Document Representation Xiaofei He Computer Science Dept. University of Chicago xiaofei@cs.uchicago.edu Deng ... hfliu@cs.toronto.edu Wei-Ying Ma Microsoft Research Asia Beijing, China wyma@microsoft.com ABSTRACT Document representation and indexing is a key problem for document analysis and ... space, LPI discovers the local structure and obtains a compact document representation subspace that best detects the essential semantic structure. We compare ... Indexing, Similarity Measure, Dimensionality Reduction, Vector Space Model 1. INTRODUCTION Document representation and indexing is a fundamental problem for efficient clustering, classification, ... Many dimensionality reduction techniques [1][2][5][7][14][15] have been applied to document representation and indexing. Among these techniques, Latent Semantic Indexing (LSI) [7] ... the most representative features rather the most discriminative features for document representation. Therefore, LSI might not be optimal in discriminating documents ... propose a new approach called Locality Preserving Indexing (LPI) to document representation, which aims to discover the local geometrical structure of ...
... a semantic subspace. Section 3 introduces Locality Preserving Indexing for document representation. Theoretical analysis of LPP and its connections to LDA ... Locality Preserving Projections (LPP) [13], the core algorithm used for document representation and indexing in this paper. Different from LSI which assumes ...
... preserving subspace. Based on LPP, we describe our method for document representation and indexing. In the document analysis and processing problems one ...
... apply LPP to learn a low dimensional semantic space for document representation. 4.3 Discriminant Analysis of LPP Traditionally, document indexing and ...
15
August 2011
WI-IAT '11: Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Volume 01
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 1, Downloads (12 Months): 4, Downloads (Overall): 25
Full text available:
PDF
This paper proposes a novel text representation for Web pages written in Vietnamese. This representation is based on an analysis of Vietnamese documents at phonetic level in which each document will be represented as a bag of phonemes. It is designed to capture sound-based information in documents and to be ...
Keywords:
Document representation, Classification
References:
Nguyen, GS., Gao X. and Andreae P. Vietnamese Document Representation and Classification, AI 2009: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2009, Volume 5866/2009, 577 - 586.
Full Text:
... their favorite sound patterns. We aim to develop a new document representation using the phonemes. The vocabulary of phonemes is much smaller ...
... NY. 1997. [10] Nguyen, GS., Gao X. and Andreae P. Vietnamese Document Representation and Classification, AI 2009: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2009, Volume ...
16
July 2011
SIGIR '11: Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Publisher: ACM
Bibliometrics:
Citation Count: 4
Downloads (6 Weeks): 2, Downloads (12 Months): 18, Downloads (Overall): 410
Full text available:
PDF
Word ambiguity and vocabulary mismatch are critical problems in information retrieval. To deal with these problems, this paper proposes the use of translated words to enrich document representation, going beyond the words in the original source language to represent a document. In our approach, each original document is automatically translated ...
Keywords:
document representation, machine translation
Title:
Enriching document representation via translation for improved monolingual information retrieval
Full Text:
Enriching Document Representation via Translation for Improved Monolingual Information Retrieval Seung-Hoon Na Department of Computer Science National ...
... translated using the proposed method of expected frequency estimation, producing bilingual document representations (Section 5). When a new test query is given, the query ...
... follows: c(w, d_clu) = Σ_{d' ∈ Clu_d} c(w, d') We now additionally introduce d?clu to indicate the cluster-enhanced document representation of d, which is the weighted representation of the original source ... c(w,d) + ?clu c(w, d_clu) where ?clu is the weight of document representation to cluster representation. To estimate the smoothed cluster language model, ...
17
June 2009
KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Publisher: ACM
Bibliometrics:
Citation Count: 79
Downloads (6 Weeks): 13, Downloads (12 Months): 122, Downloads (Overall): 2,561
Full text available:
PDF
In traditional text clustering methods, documents are represented as "bags of words" without considering the semantic information of each document. For instance, if two documents use different collections of core words to represent the same topic, they may be falsely assigned to different clusters due to the lack of shared ...
Keywords:
Wikipedia, document representation, text clustering
Abstract:
... most common way to solve this problem is to enrich document representation with the background knowledge in an ontology. There are two ... text clustering method to address these two issues by enriching document representation with Wikipedia concept and category information. We develop two approaches, ... LA Times) show that clustering performance improves significantly by enriching document representation
Full Text:
... most common way to solve this problem is to enrich document representation with the background knowledge in an ontology. There are two ... text clustering method to address these two issues by enriching document representation with Wikipedia concept and category information. We develop two approaches, ... LA Times) show that clustering performance improves significantly by enriching document representation with Wikipedia concepts and categories. Categories and Subject Descriptors I.5.3 ... similarity measures. General Terms Algorithms, Experimentation Keywords Text Clustering, Wikipedia, Document Representation 1. INTRODUCTION Traditional clustering algorithms are usually based on the ... forms. One way to resolve this problem is to enrich document representation with the background knowledge represented by an ontology. An ontology ... relationships among concepts. All of them can be used for document representation and clustering. The most common way of applying ontologies for ... possible; and a proper matching method which can enrich the document representation by fully leveraging ontology terms and relations but without introducing ...
... clustering. As for how to integrate ontology concepts into the document representation and clustering process, in this paper, we propose two approaches ... baseline. The results show that, in agglomerative clustering method, enriching document representation with Wikipedia concepts and categories by both exact-match and relatedness ... propose a method to improve text classification performance by enriching document representation with Wikipedia concepts. The mapping between each document and Wikipedia ...
... news. The method in both [1] and [2] only augment document representation with Wikipedia concepts without considering the hierarchical relationship embedded in ... In our method, we also integrate Wikipedia category information into document representation based on the hierarchical structure of Wikipedia. We believe that ...
18
October 2012
CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management
Publisher: ACM
Bibliometrics:
Citation Count: 3
Downloads (6 Weeks): 2, Downloads (12 Months): 36, Downloads (Overall): 292
Full text available:
PDF
We present a new, robust and computationally efficient Hierarchical Bayesian model for effective topic correlation modeling. We model the prior distribution of topics by a Generalized Dirichlet distribution (GD) rather than a Dirichlet distribution as in Latent Dirichlet Allocation (LDA). We define this model as GD-LDA. This framework captures correlations ...
Keywords:
document representation, statistical topic modeling
Full Text:
... Filtering; G.3 [Probability and Statistics]: Statistical Computing. General Terms: Algorithms, Experimentation. Keywords: Statistical Topic Modeling, Document Representation. Main contact. Permission to make digital or hard copies of all ... research in topic modeling. This research has 2 main streams in document representation: 1) The exploration of super and subtopics as in ...
19
September 2013
DocEng '13: Proceedings of the 2013 ACM symposium on Document engineering
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 7, Downloads (12 Months): 8, Downloads (Overall): 45
Full text available:
PDF
Digital Libraries collect, organize and provide to end users large quantities of selected documents. While these documents come in a variety of formats, it is desirable that they are delivered to final users in a uniform way. Web formats are a suitable choice for this purpose. While Web documents are ...
Keywords:
document rendering, document representation, layout analysis
Full Text:
... STORAGE AND RETRIEVAL - Digital Libraries; I.7.2 [Computing Methodologies]: DOCUMENT AND TEXT PROCESSING - Document Preparation. General Terms: ALGORITHMS. Keywords: Layout Analysis; Document Representation; Document Rendering. 1. INTRODUCTION The wide spread of documents in digital ... called DoMInUS. The next Section introduces DoMInUS and its internal document representation. Then, Section 3 describes the identification and reconstruction of ...
... identifying the high-level geometrical structure of the document, and specifically to the document representation they are based on. Technical details on how this representation is ...
20
November 2009
PaIR '09: Proceedings of the 2nd international workshop on Patent information retrieval
Publisher: ACM
Bibliometrics:
Citation Count: 2
Downloads (6 Weeks): 3, Downloads (12 Months): 8, Downloads (Overall): 307
Full text available:
PDF
Design rationale (DR) refers to the explanation of why an artifact is designed the way it is. The management of DR in engineering design is an important task since DR is often regarded as crucial information in design decision support, design analysis and design knowledge management. The existing DR systems ...
Keywords:
patent document, representation model, design rationale
Full Text:
... the art search [11]. Konishi et al. extended the patent document representation model by integrating TF/IDF of International Patent Classification (IPC) and ... and patent document structure ontology, to introduce a content-based patent document representation schema for patent processing, e.g. patent retrieval, classification and clustering ...