August 2002
SIGIR '02: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Publisher: ACM
Bibliometrics:
Citation Count: 49
Downloads (6 Weeks): 2, Downloads (12 Months): 14, Downloads (Overall): 538
Keywords:
language models, information retrieval, biterms, n-grams
Title:
Biterm language models for document retrieval
Full Text:
... information retrieval than a document containing retrieval of information. To this end, biterm language models are introduced. Biterm language models are similar to bigram language models except that the constraint ... be assigned the same probability of generating the query using biterm language models. To distinguish unordered word-pairs from ordered word-pairs or bigrams in statistical language modeling terminology, we refer to the former as biterms. Unordered word-pairs have been explored as document features for document ... vector space models as well as text categorization [3] applications. The biterm probabilities can be approximated using the frequency of occurrence of terms. Three approximation methods are suggested here. In the first case, the biterm probability of {w_{i-1}, w_i} is viewed as an average of bigram ... documents. It is similar to the bigram probability in (3). The biterm probability of the term pair {w_{i-1}, w_i} is computed as the ... C(w_i, w_{i-1}|d) / min{C(w_{i-1}|d), C(w_i|d)} (6). P_BT2 and P_BT3 are ad-hoc approximations for biterm probabilities. The sparse data problem in representing documents using language models is handled by smoothing biterm probabilities using unigram probabilities. The unigram probability P(w_i|d) of a ... in turn smoothed using its corpus probability P(w_i|C). Thus the biterm and unigram probabilities are computed by P(w_i|w_{i-1}, d) = λ1 P_BT(w_i|w_{i-1}, ... λ2) P(w_i|C) (8), where λ1 and λ2 are constants. The weights for biterm
... to 40% and the weighting parameter for the bigram document model set to 10%. Biterm retrieval systems were implemented with different approximations for biterm probabilities. The interpolation parameters were set at 40% for the document model over the corpus model and 10% for biterm models over unigram models. BT1(40+10) uses the average of bigram probabilities as given by (4). BT2(40+10) is the biterm language model using the ad-hoc probability given by (5), and BT3(40+10) corresponds to the biterm probability in (6). For comparison, the language model suggested by Ponte ... experiments on a WSJ data set, the performance of the different biterm approximations is around that of the bigram language models. With fixed model weights, the ad-hoc approximations of biterm probabilities in the biterm language model perform better than bigram language models. The reduction in ... probabilities reducing the effect of term pairs. The ad-hoc approximation P_BT2 for biterms improves slightly over the bigram language model since it ignores the order ... a document, the effect of the co-occurrence of a term pair on the biterm probability is reduced. Choosing the minimum of the term occurrence ... BT3 performs better than the bigram language model. All approximations of biterm probabilities perform better than the Ponte and Croft language model (PCLM) and smoothed ... retrieval are different from those for speech recognition or machine translation. Biterm language models capture term co-occurrence better than bigram models. Different approximations for biterm probabilities have been shown to provide better average precision than bigram language ... approximations for biterm probabilities. We plan to explore better models to represent biterms.
While constant weighting parameters were used in our experiments, the ...
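The approximation and smoothing steps excerpted above can be sketched in code. This is a minimal illustration, not the authors' implementation: it computes the P_BT3 approximation of Eq. (6) from raw counts and interpolates it with a corpus-smoothed unigram probability in the spirit of Eqs. (7)-(8), using the 40%/10% weights mentioned in the excerpt. All function names are hypothetical, and the exact form of the partly elided interpolation in Eq. (8) is an assumption.

```python
from collections import Counter  # Counter-style mappings: missing keys count as 0


def unigram_prob(w, doc_counts, corpus_counts, corpus_total, lam_doc=0.4):
    # Smoothed unigram probability: document estimate interpolated with the
    # corpus probability P(w|C), as in the excerpt's Eq. (7) (form assumed).
    doc_total = sum(doc_counts.values())
    p_doc = doc_counts[w] / doc_total if doc_total else 0.0
    p_corpus = corpus_counts[w] / corpus_total
    return lam_doc * p_doc + (1 - lam_doc) * p_corpus


def biterm_prob_bt3(w_prev, w, pair_counts, doc_counts):
    # Eq. (6): unordered pair count divided by the smaller of the two term counts.
    c_pair = pair_counts[frozenset((w_prev, w))]
    denom = min(doc_counts[w_prev], doc_counts[w])
    return c_pair / denom if denom else 0.0


def smoothed_biterm_prob(w_prev, w, pair_counts, doc_counts,
                         corpus_counts, corpus_total, lam_bt=0.1):
    # Eq. (8)-style interpolation of the biterm probability with the smoothed
    # unigram probability; the 10% biterm weight follows the excerpt.
    p_bt = biterm_prob_bt3(w_prev, w, pair_counts, doc_counts)
    p_uni = unigram_prob(w, doc_counts, corpus_counts, corpus_total)
    return lam_bt * p_bt + (1 - lam_bt) * p_uni
```

With `doc_counts` and `pair_counts` built as `Counter`s over a document's terms and unordered term pairs, `smoothed_biterm_prob` mirrors how BT3(40+10) scores a query term pair against a document.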
May 2013
WWW '13: Proceedings of the 22nd international conference on World Wide Web
Publisher: ACM
Bibliometrics:
Citation Count: 44
Downloads (6 Weeks): 42, Downloads (12 Months): 538, Downloads (Overall): 2,217
Uncovering the topics within short texts, such as tweets and instant messages, has become an important task for many content analysis applications. However, directly applying conventional topic models (e.g. LDA and PLSA) on such short texts may not work well. The fundamental reason lies in that conventional topic models implicitly ...
Keywords:
short text, content analysis, topic model, biterm
Title:
A biterm topic model for short texts
Abstract:
... novel way for modeling topics in short texts, referred to as the biterm topic model (BTM). Specifically, in BTM we learn the topics ... by directly modeling the generation of word co-occurrence patterns (i.e. biterms) in the whole corpus. The major advantages of BTM ...
Full Text:
A Biterm Topic Model for Short Texts. Xiaohui Yan, Jiafeng Guo, Yanyan Lan, ... a novel way for modeling topics in short texts, referred to as the biterm topic model (BTM). Specifically, in BTM we learn the topics by directly modeling the generation of word co-occurrence patterns (i.e. biterms) in the whole corpus. The major advantages of BTM are that ... Information Search and Retrieval; I.5.3 [Pattern Recognition]: Clustering. Keywords: Short Text, Topic Model, Biterm, Content Analysis, document clustering. 1. INTRODUCTION. Short texts are prevalent on ...
... co-occurrence patterns for better revealing topics? Specifically, we propose a generative biterm topic model (BTM), which learns topics over short texts by directly modeling the generation of biterms in the whole corpus. Here, a biterm is an unordered word-pair co-occurring in a short context. The data ... in that 1) BTM explicitly models the word co-occurrence patterns (i.e. biterms), rather than documents, to enhance the topic learning; and 2) ...
... over short texts by directly modeling the generation of all the biterms (i.e. word co-occurrence patterns) in the whole corpus. Figure 1: Graphical representation ... LDA and mixture of unigrams, BTM models the generation procedure of biterms in a collection, rather than documents. For clarity, the fixed hyperparameters α, β are not presented. 3.1 Biterm Extraction. Without loss of generality, topics are represented as groups of correlated ... our BTM directly models the word co-occurrence patterns based on biterms. A biterm denotes an unordered word-pair co-occurring in a short context (i.e. an instance ... any two distinct words in a short text document as a biterm. For example, in the short text document "I visit apple store.", if we ignore the stop word "I", there are three biterms, i.e. "visit apple", "visit store", "apple store". The biterms extracted from all the documents in the collection compose the training data of BTM. 3.2 Biterm Topic Model. The key idea of BTM is to learn topics over short texts based on the aggregated biterms in the whole corpus to tackle the sparsity problem in single ... that the whole corpus is a mixture of topics, where each biterm is drawn from a specific topic independently. The probability that a biterm is drawn from a specific topic is further captured by the chances that both words in the biterm are drawn from the topic. Suppose θ and φ are the ... topic distribution θ ~ Dir(α) for the whole collection. (Footnote: strictly speaking, two biterms in a document sharing the same word occurrence are not independent ... the computation by considering BTM as a model built upon a biterm set.) 3. For each biterm b in the biterm set B: (a) draw a topic assignment z ~ Multi(θ); (b) draw ... word, and then enhance the learning of topics.
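The biterm extraction described in Section 3.1 (every unordered pair of distinct words in a short text, after stop-word removal) can be sketched as follows; the function name is hypothetical:

```python
from itertools import combinations


def extract_biterms(doc_tokens, stopwords=frozenset()):
    """Return the unordered word-pairs (biterms) of one short text document."""
    words = [w for w in doc_tokens if w not in stopwords]
    # Every pair of distinct positions yields one biterm; order is ignored,
    # so each pair is stored as a frozenset.
    return [frozenset(pair) for pair in combinations(words, 2)]


# The paper's example: "I visit apple store." with stop word "I" removed
# yields the three biterms {visit, apple}, {visit, store}, {apple, store}.
biterms = extract_biterms(["i", "visit", "apple", "store"],
                          stopwords=frozenset({"i"}))
```

The biterms extracted from all documents, pooled together, form the training set of BTM described in the excerpt.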
Moreover, all the biterms
... the disadvantage of mixture of unigrams by breaking documents into biterms. In this way, BTM not only can keep the correlation ... a document equals the expectation of the topic proportions of the biterms generated from the document: P(z|d) = Σ_b P(z|b) P(b|d). (3) In Eq.(3), ... obtain P(b|d). Here we simply take the empirical distribution of biterms in the document as the estimation P(b|d) = n_d(b) / Σ_b n_d(b), where n_d(b) is the frequency of the biterm b in the document d. In short texts, P(b|d) is nearly a uniform distribution over all biterms in the document d. Despite its simplicity, we find this ... algorithm for BTM. Input: the number of topics K, hyperparameters α, β, biterm set B. Output: multinomial parameters θ and φ. Initialize topic assignments randomly ... Consequently, we only have to sample the topic assignment for each biterm
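The document-level inference of Eq. (3) can be sketched directly: P(z|b) is computed from point estimates θ and φ by Bayes' rule (P(z|b) ∝ θ_z φ_{w_i|z} φ_{w_j|z}), and P(b|d) is the empirical biterm distribution of the document. A minimal sketch, assuming θ is a length-K list and φ a per-topic word-probability mapping; the function name is hypothetical:

```python
def infer_doc_topics(doc_biterms, theta, phi):
    """Eq. (3): P(z|d) = sum_b P(z|b) P(b|d), with P(b|d) empirical."""
    K = len(theta)
    p_zd = [0.0] * K
    if not doc_biterms:
        return p_zd
    for (wi, wj) in doc_biterms:
        # P(z|b) proportional to theta_z * phi[z][wi] * phi[z][wj],
        # normalized over the K topics.
        joint = [theta[k] * phi[k][wi] * phi[k][wj] for k in range(K)]
        norm = sum(joint) or 1.0
        for k in range(K):
            # Iterating the biterm list with weight 1/len(doc_biterms)
            # realizes the empirical P(b|d) = n_d(b) / sum_b n_d(b).
            p_zd[k] += joint[k] / norm / len(doc_biterms)
    return p_zd
```

Because each biterm in the list carries equal weight, repeated biterms automatically receive proportionally larger P(b|d), matching the empirical estimate in the excerpt.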
... the conditional distribution P(z | z_{-b}, B, α, β) for each biterm b = (w_i, w_j), where z_{-b} denotes the topic assignments for all biterms except b, and B is the global biterm set. By applying the chain rule on the joint probability of ... + Mβ)², (4) where n_z is the number of biterms assigned to the topic z, and n_{w|z} is the number ... use symmetric Dirichlet priors α and β. Note that once a biterm b is assigned to the topic z, the two words ... BTM is evaluating the conditional probability in Eq.(4) for all the biterms, with time complexity O(K|B|). During the entire process, we need ... the counters n_z, n_{w|z}, and the topic assignment z for each biterm, in total (K + MK + |B|) variables in memory. Note that ... part of the memory in BTM is used to store the biterms in the training data set. Therefore, BTM is a better choice for large ...
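The collapsed Gibbs sampling loop described above can be sketched as follows. It is a minimal illustration under the stated counters: n_z counts biterms per topic, n_{w|z} counts word occurrences per topic, and the denominator Σ_w n_{w|z} + Mβ equals 2·n_z + Mβ because each biterm contributes two word tokens. The numerator of Eq. (4) is partly elided in the excerpt, so the sampling weight below follows the standard BTM conditional and should be read as an assumption; all names are hypothetical.

```python
import random
from collections import defaultdict


def gibbs_btm(biterms, vocab, K, alpha=1.0, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for BTM (sketch following Eq. (4))."""
    rng = random.Random(seed)
    M = len(vocab)
    n_z = [0] * K            # number of biterms assigned to each topic
    n_wz = defaultdict(int)  # (topic, word) occurrence counts
    z_b = []
    # Random initialization of topic assignments, as in the excerpt.
    for (wi, wj) in biterms:
        z = rng.randrange(K)
        z_b.append(z)
        n_z[z] += 1
        n_wz[(z, wi)] += 1
        n_wz[(z, wj)] += 1
    for _ in range(n_iter):
        for i, (wi, wj) in enumerate(biterms):
            z = z_b[i]
            # Exclude the current biterm from the counters (the z_-b state).
            n_z[z] -= 1; n_wz[(z, wi)] -= 1; n_wz[(z, wj)] -= 1
            # Eq. (4), assumed form:
            # P(z|z_-b, B) ∝ (n_z + α)(n_wi|z + β)(n_wj|z + β) / (2 n_z + Mβ)².
            weights = []
            for k in range(K):
                denom = 2 * n_z[k] + M * beta
                weights.append((n_z[k] + alpha)
                               * (n_wz[(k, wi)] + beta)
                               * (n_wz[(k, wj)] + beta) / (denom * denom))
            z = rng.choices(range(K), weights=weights)[0]
            z_b[i] = z
            n_z[z] += 1; n_wz[(z, wi)] += 1; n_wz[(z, wj)] += 1
    # Point estimates of theta (topic proportions) and phi (topic-word dists).
    B = len(biterms)
    theta = [(n_z[k] + alpha) / (B + K * alpha) for k in range(K)]
    phi = {k: {w: (n_wz[(k, w)] + beta) / (2 * n_z[k] + M * beta)
               for w in vocab} for k in range(K)}
    return theta, phi, z_b
```

The counters n_z and n_{w|z} plus one assignment per biterm are all the state the sampler keeps, which is the (K + MK + |B|)-variable memory footprint the excerpt mentions.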
... any other processing. Note that in BTM, we need to extract biterms from the collection. This process is a little different from that in short texts. Recall that a biterm is defined as a word-pair co-occurring in a short context. It ... order to reduce meaningless and noisy biterms, the biterm set is constructed by extracting any two words that co-occur within a context ...
... At this point, the assumption that the two words in a biterm have the same topic will be less credible. Moreover, a larger context range threshold r will generate many more biterms, which increases the training cost. Therefore, for both effectiveness and ...
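For normal-length documents, the excerpt describes building the biterm set from any two words co-occurring within a context range threshold r. A minimal sketch of that windowed extraction; the exact window semantics (here: two words at positional distance less than r) are an assumption, and the function name is hypothetical:

```python
def extract_biterms_window(tokens, r):
    """Biterms from any two distinct words within a sliding context window.

    Pairs each word with the following words at distance < r, so the number
    of biterms grows with the context range threshold r.
    """
    out = []
    for i, wi in enumerate(tokens):
        for wj in tokens[i + 1:i + r]:
            if wi != wj:
                out.append(frozenset((wi, wj)))
    return out
```

With a small r this recovers roughly adjacent pairs; raising r adds longer-range pairs, which is the training-cost trade-off the excerpt describes.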
... paper, we propose a novel probabilistic topic model for short texts, namely the biterm topic model (BTM). BTM can well capture the topics within short ...