
ABSTRACT

Uncovering the topics within short texts, such as tweets and instant messages, has become an important task for many content analysis applications. However, directly applying conventional topic models (e.g. LDA and PLSA) to such short texts may not work well. The fundamental reason is that conventional topic models implicitly capture document-level word co-occurrence patterns to reveal topics, and thus suffer from the severe data sparsity in short documents. In this paper, we propose a novel way of modeling topics in short texts, referred to as the biterm topic model (BTM). Specifically, in BTM we learn the topics by directly modeling the generation of word co-occurrence patterns (i.e. biterms) in the whole corpus. The major advantages of BTM are that 1) BTM explicitly models the word co-occurrence patterns to enhance topic learning; and 2) BTM uses the aggregated patterns in the whole corpus for learning topics, which addresses the problem of sparse word co-occurrence patterns at the document level. We carry out extensive experiments on real-world short text collections. The results demonstrate that our approach can discover more prominent and coherent topics, and significantly outperforms baseline methods on several evaluation metrics. Furthermore, we find that BTM can outperform LDA even on normal texts, showing the potential generality and wider usage of the new topic model.
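
To make the idea of corpus-level co-occurrence patterns concrete, here is a minimal sketch of biterm extraction and aggregation. It assumes a biterm is an unordered pair of words appearing in the same short text, which the abstract implies but does not define precisely. The function names and the toy corpus are hypothetical, and the sketch covers only the counting step, not BTM's topic inference.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) in one short text.

    Assumption: the whole short text is treated as a single context,
    so any two word positions in it form a biterm.
    """
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def corpus_biterm_counts(docs):
    """Aggregate biterm counts over the whole corpus, ignoring document boundaries."""
    counts = Counter()
    for tokens in docs:
        counts.update(extract_biterms(tokens))
    return counts


# Toy corpus of already-tokenized short texts (illustrative data only).
docs = [
    ["apple", "iphone", "launch"],
    ["apple", "iphone", "price"],
    ["apple", "pie", "recipe"],
]

counts = corpus_biterm_counts(docs)
print(counts[("apple", "iphone")])  # 2
```

Each three-word text yields only three biterms, but pooling them across the corpus gives repeated pairs such as ("apple", "iphone") a count above one. This pooled collection is the kind of aggregated pattern the abstract says BTM learns from.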

AUTHORS



Xiaohui Yan

Bibliometrics: publication history
Publication years: 2011-2015
Publication count: 4
Citation count: 58
Available for download: 3
Downloads (6 weeks): 49
Downloads (12 months): 627
Downloads (cumulative): 3,003
Average downloads per article: 1,001.00
Average citations per article: 14.50


Jiafeng Guo

Bibliometrics: publication history
Publication years: 2008-2016
Publication count: 53
Citation count: 372
Available for download: 36
Downloads (6 weeks): 426
Downloads (12 months): 3,755
Downloads (cumulative): 18,273
Average downloads per article: 507.58
Average citations per article: 7.02


Yanyan Lan

Bibliometrics: publication history
Publication years: 2008-2016
Publication count: 33
Citation count: 160
Available for download: 20
Downloads (6 weeks): 193
Downloads (12 months): 1,902
Downloads (cumulative): 10,705
Average downloads per article: 535.25
Average citations per article: 4.85


Xueqi Cheng

Bibliometrics: publication history
Publication years: 2002-2016
Publication count: 133
Citation count: 785
Available for download: 79
Downloads (6 weeks): 483
Downloads (12 months): 4,317
Downloads (cumulative): 32,567
Average downloads per article: 412.24
Average citations per article: 5.90


CITED BY

44 Citations


PUBLICATION

Title: WWW '13 Proceedings of the 22nd international conference on World Wide Web
General Chairs: Daniel Schwabe (PUC-Rio - Brazil), Virgílio Almeida (UFMG - Brazil), Hartmut Glaser (CGI.br - Brazil)
Program Chairs: Ricardo Baeza-Yates (Yahoo! Labs - Spain & Chile), Sue Moon (KAIST - South Korea)
Pages: 1445-1456
Publication Date: 2013-05-13
Sponsors: CGIBR (Comite Gestor da Internet no Brazil), NICBR (Nucleo de Informacao e Coordenacao do Ponto BR)
In-Cooperation: SIGWEB (ACM Special Interest Group on Hypertext, Hypermedia, and Web)
Publisher: ACM, New York, NY, USA ©2013
ISBN: 978-1-4503-2035-1 doi>10.1145/2488388.2488514
Conference: WWW (International World Wide Web Conference)
Paper Acceptance Rate 125 of 831 submissions, 15%
Overall Acceptance Rate 1,770 of 10,827 submissions, 16%
Year                Submitted  Accepted  Rate
WWW '07                   753       111   15%
WWW '08                   880       103   12%
WWW '09                   823       198   24%
WWW '10                   754       105   14%
WWW '11                   283       166   59%
WWW '11                   658        81   12%
WWW '12                   885       108   12%
WWW '13                   831       125   15%
WWW '14                   645        84   13%
WWW '15                   929       131   14%
WWW '16 Companion         727       115   16%
WWW '16                   727       115   16%
WWW '17                   966       164   17%
WWW '17 Companion         966       164   17%
Overall                10,827     1,770   16%


Table of Contents

Proceedings of the 22nd international conference on World Wide Web
SESSION: Research papers
Real-time recommendation of diverse related articles
Sofiane Abbar, Sihem Amer-Yahia, Piotr Indyk, Sepideh Mahabadi
Pages: 1-12
doi>10.1145/2488388.2488390
News articles typically drive a lot of traffic in the form of comments posted by users on a news site. Such user-generated content tends to carry additional information such as entities and sentiment. In general, when articles are recommended to users, ...

Multi-label learning with millions of labels: recommending advertiser bid phrases for web pages
Rahul Agrawal, Archit Gupta, Yashoteja Prabhu, Manik Varma
Pages: 13-24
doi>10.1145/2488388.2488391
Recommending phrases from web pages for advertisers to bid on against search engine queries is an important research problem with direct commercial impact. Most approaches have found it infeasible to determine the relevance of all possible queries to ...

Hierarchical geographical modeling of user locations from social media posts
Amr Ahmed, Liangjie Hong, Alexander J. Smola
Pages: 25-36
doi>10.1145/2488388.2488392
With the availability of cheap location sensors, geotagging of messages in online social networks is proliferating. For instance, Twitter, Facebook, Foursquare, and Google+ provide these services both explicitly by letting users choose their location ...

Distributed large-scale natural graph factorization
Amr Ahmed, Nino Shervashidze, Shravan Narayanamurthy, Vanja Josifovski, Alexander J. Smola
Pages: 37-48
doi>10.1145/2488388.2488393
Natural graphs, such as social networks, email graphs, or instant messaging patterns, have become pervasive through the internet. These graphs are massive, often containing hundreds of millions of nodes and billions of edges. While some theoretical models ...

A CRM system for social media: challenges and experiences
Jitendra Ajmera, Hyung-iL Ahn, Meena Nagarajan, Ashish Verma, Danish Contractor, Stephen Dill, Matthew Denesuk
Pages: 49-58
doi>10.1145/2488388.2488394
The social Customer Relationship Management (CRM) landscape is attracting significant attention from customers and enterprises alike as a sustainable channel for tracking, managing and improving customer relations. Enterprises are taking a hard look ...

Here's my cert, so trust me, maybe?: understanding TLS errors on the web
Devdatta Akhawe, Bernhard Amann, Matthias Vallentin, Robin Sommer
Pages: 59-70
doi>10.1145/2488388.2488395
When browsers report TLS errors, they cannot distinguish between attacks and harmless server misconfigurations; hence they leave it to the user to decide whether continuing is safe. However, actual attacks remain rare. As a result, users quickly become ...

Towards a robust modeling of temporal interest change patterns for behavioral targeting
Mohamed Aly, Sandeep Pandey, Vanja Josifovski, Kunal Punera
Pages: 71-82
doi>10.1145/2488388.2488396
Modern web-scale behavioral targeting platforms leverage historical activity of billions of users to predict user interests and inclinations, and consequently future activities. Future activities of particular interest involve purchases or transactions, ...

The anatomy of LDNS clusters: findings and implications for web content delivery
Hussein A. Alzoubi, Michael Rabinovich, Oliver Spatscheck
Pages: 83-94
doi>10.1145/2488388.2488397
We present a large-scale measurement of clusters of hosts sharing the same local DNS servers. We analyze properties of these "LDNS clusters" from the perspective of content delivery networks, which commonly use DNS for load distribution. We found that ...

Steering user behavior with badges
Ashton Anderson, Daniel Huttenlocher, Jon Kleinberg, Jure Leskovec
Pages: 95-106
doi>10.1145/2488388.2488398
An increasingly common feature of online communities and social media sites is a mechanism for rewarding user achievements based on a system of badges. Badges are given to users for particular contributions to a site, such as performing a certain ...

Cascading tree sheets and recombinant HTML: better encapsulation and retargeting of web content
Edward O. Benson, David R. Karger
Pages: 107-118
doi>10.1145/2488388.2488399
Cascading Style Sheets (CSS) took a valuable step towards separating web content from presentation. But HTML pages still contain large amounts of "design scaffolding" needed to hierarchically layer content for proper presentation. This paper presents ...

CopyCatch: stopping group attacks by spotting lockstep behavior in social networks
Alex Beutel, Wanhong Xu, Venkatesan Guruswami, Christopher Palow, Christos Faloutsos
Pages: 119-130
doi>10.1145/2488388.2488400
How can web services that depend on user generated content discern fraudulent input by spammers from legitimate input? In this paper we focus on the social network Facebook and the problem of discerning ill-gotten Page Likes, made by spammers hoping ...

Inferring the demographics of search users: social data meets search queries
Bin Bi, Milad Shokouhi, Michal Kosinski, Thore Graepel
Pages: 131-140
doi>10.1145/2488388.2488401
Knowing users' views and demographic traits offers a great potential for personalizing web search results or related services such as query suggestion and query completion. Such signals however are often only available for a small fraction of search ...

Strategyproof mechanisms for competitive influence in networks
Allan Borodin, Mark Braverman, Brendan Lucier, Joel Oren
Pages: 141-150
doi>10.1145/2488388.2488402
Motivated by applications to word-of-mouth advertising, we consider a game-theoretic scenario in which competing advertisers want to target initial adopters in a social network. Each advertiser wishes to maximize the resulting cascade of influence, modeled ...

Reactive crowdsourcing
Alessandro Bozzon, Marco Brambilla, Stefano Ceri, Andrea Mauri
Pages: 153-164
doi>10.1145/2488388.2488403
An essential aspect for building effective crowdsourcing computations is the ability of "controlling the crowd", i.e. of dynamically adapting the behaviour of the crowdsourcing systems as response to the quantity and quality of completed tasks or to ...

On participation in group chats on Twitter
Ceren Budak, Rakesh Agrawal
Pages: 165-176
doi>10.1145/2488388.2488404
The success of a group depends on continued participation of its members through time. We study the factors that affect continued user participation in the context of educational Twitter chats. To predict whether a user that attended her first session ...

The role of web hosting providers in detecting compromised websites
Davide Canali, Davide Balzarotti, Aurélien Francillon
Pages: 177-188
doi>10.1145/2488388.2488405
Compromised websites are often used by attackers to deliver malicious content or to host phishing pages designed to steal private information from their victims. Unfortunately, most of the targeted websites are managed by users with little security background ...

Your browsing behavior for a big mac: economics of personal information online
Juan Pablo Carrascal, Christopher Riederer, Vijay Erramilli, Mauro Cherubini, Rodrigo de Oliveira
Pages: 189-200
doi>10.1145/2488388.2488406
Most online service providers offer free services to users and in part, these services collect and monetize personally identifiable information (PII), primarily via targeted advertisements. Against this backdrop of economic exploitation of PII, it is ...

Is this app safe for children?: a comparison study of maturity ratings on Android and iOS applications
Ying Chen, Heng Xu, Yilu Zhou, Sencun Zhu
Pages: 201-212
doi>10.1145/2488388.2488407
There is a rising concern among parents who have experienced unreliable content maturity ratings for mobile applications (apps) that result in inappropriate risk exposure for their children and adolescents. In reality, there is no consistent maturity ...

Traveling the silk road: a measurement analysis of a large anonymous online marketplace
Nicolas Christin
Pages: 213-224
doi>10.1145/2488388.2488408
We perform a comprehensive measurement analysis of Silk Road, an anonymous, international online marketplace that operates as a Tor hidden service and uses Bitcoin as its exchange currency. We gather and analyze data over eight months between the end ...

Group chats on Twitter
James Cook, Krishnaram Kenthapadi, Nina Mishra
Pages: 225-236
doi>10.1145/2488388.2488409
We report on a new kind of group conversation on Twitter that we call a group chat. These chats are periodic, synchronized group conversations focused on specific topics and they exist at a massive scale. The groups and the members of these groups are ...

How to grow more pairs: suggesting review targets for comparison-friendly review ecosystems
James Cook, Alex Fabrikant, Avinatan Hassidim
Pages: 237-248
doi>10.1145/2488388.2488410
We consider the algorithmic challenges behind a novel interface that simplifies consumer research of online reviews by surfacing relevant comparable review bundles: reviews for two or more of the items being researched, all generated in similar enough ...

A framework for benchmarking entity-annotation systems
Marco Cornolti, Paolo Ferragina, Massimiliano Ciaramita
Pages: 249-260
doi>10.1145/2488388.2488411
In this paper we design and implement a benchmarking framework for fair and exhaustive comparison of entity-annotation systems. The framework is based upon the definition of a set of problems related to the entity-annotation task, a set of measures to ...

A framework for learning web wrappers from the crowd
Valter Crescenzi, Paolo Merialdo, Disheng Qiu
Pages: 261-272
doi>10.1145/2488388.2488412
The development of solutions to scale the extraction of data from Web sources is still a challenging issue. High accuracy can be achieved by supervised approaches but the costs of training data, i.e., annotations over a set of sample pages, limit their ...

Lightweight server support for browser-based CSRF protection
Alexei Czeskis, Alexander Moshchuk, Tadayoshi Kohno, Helen J. Wang
Pages: 273-284
doi>10.1145/2488388.2488413
Cross-Site Request Forgery (CSRF) attacks are one of the top threats on the web today. These attacks exploit ambient authority in browsers (e.g. cookies, HTTP authentication state), turning them into confused deputies and causing undesired side effects ...

Aggregating crowdsourced binary ratings
Nilesh Dalvi, Anirban Dasgupta, Ravi Kumar, Vibhor Rastogi
Pages: 285-294
doi>10.1145/2488388.2488414
In this paper we analyze a crowdsourcing system consisting of a set of users and a set of binary choice questions. Each user has an unknown, fixed, reliability that determines the user's error rate in answering questions. The problem is to determine ...

Optimal hashing schemes for entity matching
Nilesh Dalvi, Vibhor Rastogi, Anirban Dasgupta, Anish Das Sarma, Tamas Sarlos
Pages: 295-306
doi>10.1145/2488388.2488415
In this paper, we consider the problem of devising blocking schemes for entity matching. There is a lot of work on blocking techniques for supporting various kinds of predicates, e.g. exact matches, fuzzy string-similarity matches, and spatial matches. ...

No country for old members: user lifecycle and linguistic change in online communities
Cristian Danescu-Niculescu-Mizil, Robert West, Dan Jurafsky, Jure Leskovec, Christopher Potts
Pages: 307-318
doi>10.1145/2488388.2488416
Vibrant online communities are in constant flux. As members join and depart, the interactional norms evolve, stimulating further changes to the membership and its social dynamics. Linguistic change --- in the sense of innovation that becomes accepted ...

Crowdsourced judgement elicitation with endogenous proficiency
Anirban Dasgupta, Arpita Ghosh
Pages: 319-330
doi>10.1145/2488388.2488417
Crowdsourcing is now widely used to replace judgement or evaluation by an expert authority with an aggregate evaluation from a number of non-experts, in applications ranging from rating and categorizing online content all the way to evaluation of student ...

Timespent based models for predicting user retention
Kushal S. Dave, Vishal Vaingankar, Sumanth Kolar, Vasudeva Varma
Pages: 331-342
doi>10.1145/2488388.2488418
Content discovery is fast becoming the preferred tool for user engagement on the web. Discovery allows users to get educated and entertained about their topics of interest. StumbleUpon is the largest personalized content discovery engine on the Web, ...

Attributing authorship of revisioned content
Luca de Alfaro, Michael Shavlovsky
Pages: 343-354
doi>10.1145/2488388.2488419
A considerable portion of web content, from wikis to collaboratively edited documents, to code posted online, is revisioned. We consider the problem of attributing authorship to such revisioned content, and we develop scalable attribution algorithms ...

ClausIE: clause-based open information extraction
Luciano Del Corro, Rainer Gemulla
Pages: 355-366
doi>10.1145/2488388.2488420
We propose ClausIE, a novel, clause-based approach to open information extraction, which extracts relations and their arguments from natural language text. ClausIE fundamentally differs from previous approaches in that it separates the detection of "useful" ...

Pick-a-crowd: tell me what you like, and I'll tell you what to do
Djellel Eddine Difallah, Gianluca Demartini, Philippe Cudré-Mauroux
Pages: 367-374
doi>10.1145/2488388.2488421
Crowdsourcing allows to build hybrid online platforms that combine scalable information systems with the power of human intelligence to complete tasks that are difficult to tackle for current algorithms. Examples include hybrid database systems that ...

Compact explanation of data fusion decisions
Xin Luna Dong, Divesh Srivastava
Pages: 379-390
doi>10.1145/2488388.2488422
Despite the abundance of useful information on the Web, different Web sources often provide conflicting data, some being out-of-date, inaccurate, or erroneous. Data fusion aims at resolving conflicts and finding the truth. Advanced fusion techniques ...

From query to question in one click: suggesting synthetic questions to searchers
Gideon Dror, Yoelle Maarek, Avihai Mejer, Idan Szpektor
Pages: 391-402
doi>10.1145/2488388.2488423
In Web search, users may remain unsatisfied for several reasons: the search engine may not be effective enough or the query might not reflect their intent. Years of research focused on providing the best user experience for the data available to the ...

Perception and understanding of social annotations in web search
Jennifer Fernquist, Ed H. Chi
Pages: 403-412
doi>10.1145/2488388.2488424
As web search increasingly becomes reliant on social signals, it is imperative for us to understand the effect of these signals on users' behavior. There are multiple ways in which social signals can be used in search: (a) to surface and rank important ...

AMIE: association rule mining under incomplete evidence in ontological knowledge bases
Luis Antonio Galárraga, Christina Teflioudi, Katja Hose, Fabian Suchanek
Pages: 413-422
doi>10.1145/2488388.2488425
Recent advances in information extraction have led to huge knowledge bases (KBs), which capture knowledge in a machine-readable format. Inductive Logic Programming (ILP) can be used to mine logical rules from the KB. These rules can help deduce and add ...

PrefixSolve: efficiently solving multi-source multi-destination path queries on RDF graphs by sharing suffix computations
Sidan Gao, Kemafor Anyanwu
Pages: 423-434
doi>10.1145/2488388.2488426
Uncovering the "nature" of the connections between a set of entities e.g. passengers on a flight and organizations on a watchlist can be viewed as a Multi-Source Multi-Destination (MSMD) Path Query problem on labeled graph data models such as RDF. Using ...

When tolerance causes weakness: the case of injection-friendly browsers
Yossi Gilad, Amir Herzberg
Pages: 435-446
doi>10.1145/2488388.2488427
We present a practical off-path TCP-injection attack for connections between current, non-buggy browsers and web-servers. The attack allows web-cache poisoning with malicious objects; these objects can be cached for long time period, exposing ...

Exploiting innocuous activity for correlating users across sites
Oana Goga, Howard Lei, Sree Hari Krishnan Parthasarathi, Gerald Friedland, Robin Sommer, Renata Teixeira
Pages: 447-458
doi>10.1145/2488388.2488428
We study how potential attackers can identify accounts on different social network sites that all belong to the same user, exploiting only innocuous activity that inherently comes with posted content. We examine three specific features on Yelp, Flickr, ...

The cost of annoying ads
Daniel G. Goldstein, R. Preston McAfee, Siddharth Suri
Pages: 459-470
doi>10.1145/2488388.2488429
Display advertisements vary in the extent to which they annoy users. While publishers know the payment they receive to run annoying ads, little is known about the cost such ads incur due to user abandonment. We conducted a two-experiment investigation ...

Researcher homepage classification using unlabeled data
Sujatha Das Gollapalli, Cornelia Caragea, Prasenjit Mitra, C. Lee Giles
Pages: 471-482
doi>10.1145/2488388.2488430
A classifier that determines if a webpage is relevant to a specified set of topics comprises a key component for focused crawling. Can a classifier that is tuned to perform well on training datasets continue to filter out irrelevant pages in the face ...

Google+ or Google-?: dissecting the evolution of the new OSN in its first year
Roberto Gonzalez, Ruben Cuevas, Reza Motamedi, Reza Rejaie, Angel Cuevas
Pages: 483-494
doi>10.1145/2488388.2488431
In the era when Facebook and Twitter dominate the market for social media, Google has introduced Google+ (G+) and reported a significant growth in its size while others called it a ghost town. This begs the question that "whether G+ can really attract ...

Probabilistic group recommendation via information matching
Jagadeesh Gorla, Neal Lathia, Stephen Robertson, Jun Wang
Pages: 495-504
doi>10.1145/2488388.2488432
Increasingly, web recommender systems face scenarios where they need to serve suggestions to groups of users; for example, when families share e-commerce or movie rental web accounts. Research to date in this domain has proposed two approaches: computing ...

WTF: the who to follow service at Twitter
Pankaj Gupta, Ashish Goel, Jimmy Lin, Aneesh Sharma, Dong Wang, Reza Zadeh
Pages: 505-514
doi>10.1145/2488388.2488433
WTF ("Who to Follow") is Twitter's user recommendation service, which is responsible for creating millions of connections daily between users based on shared interests, common connections, and other related factors. This paper provides an architectural ...

Mining expertise and interests from social media
Ido Guy, Uri Avraham, David Carmel, Sigalit Ur, Michal Jacovi, Inbal Ronen
Pages: 515-526
doi>10.1145/2488388.2488434
The rising popularity of social media in the enterprise presents new opportunities for one of the organization's most important needs--expertise location. Social media data can be very useful for expertise mining due to the variety of existing applications, ...

Measuring personalization of web search
Aniko Hannak, Piotr Sapiezynski, Arash Molavi Kakhki, Balachander Krishnamurthy, David Lazer, Alan Mislove, Christo Wilson
Pages: 527-538
doi>10.1145/2488388.2488435
Web search is an integral part of our daily lives. Recently, there has been a trend of personalization in Web search, where different users receive different results for the same search query. The increasing personalization is leading to concerns about ...

Estimating clustering coefficients and size of social networks via random walk
Stephen J. Hardiman, Liran Katzir
Pages: 539-550
doi>10.1145/2488388.2488436
Online social networks have become a major force in today's society and economy. The largest of today's social networks may have hundreds of millions to more than a billion users. Such networks are too large to be downloaded or stored locally, even if ...

Exploiting annotations for the rapid development of collaborative web applications
Matthias Heinrich, Franz Josef Grüneberger, Thomas Springer, Martin Gaedke
Pages: 551-560
doi>10.1145/2488388.2488437
Web application frameworks are a proven means to accelerate the development of interactive web applications. However, implementing collaborative real-time applications like Google Docs requires specific concurrency control services (i.e. document synchronization ...

Web usage mining with semantic analysis
Laura Hollink, Peter Mika, Roi Blanco
Pages: 561-570
doi>10.1145/2488388.2488438
Web usage mining has traditionally focused on the individual queries or query words leading to a web site or web page visit, mining patterns in such data. In our work, we aim to characterize websites in terms of the semantics of the queries that lead ...

Organizational overlap on social networks and its applications
Cho-Jui Hsieh, Mitul Tiwari, Deepak Agarwal, Xinyi (Lisa) Huang, Sam Shah
Pages: 571-582
doi>10.1145/2488388.2488439
Online social networks have become important for networking, communication, sharing, and discovery. A considerable challenge these networks face is the fact that an online social network is partially observed because two individuals might know each other, ...

Space-efficient data structures for Top-k completion
Bo-June (Paul) Hsu, Giuseppe Ottaviano
Pages: 583-594
doi>10.1145/2488388.2488440
Virtually every modern search application, either desktop, web, or mobile, features some kind of query auto-completion. In its basic form, the problem consists in retrieving from a string set a small number of completions, i.e. strings beginning with ...

Personalized recommendation via cross-domain triadic factorization
Liang Hu, Jian Cao, Guandong Xu, Longbing Cao, Zhiping Gu, Can Zhu
Pages: 595-606
doi>10.1145/2488388.2488441
Collaborative filtering (CF) is a major technique in recommender systems to help users find their potentially desired items. Since the data sparsity problem is quite commonly encountered in real-world scenarios, Cross-Domain Collaborative Filtering (CDCF) ...

Unsupervised sentiment analysis with emotional signals
Xia Hu, Jiliang Tang, Huiji Gao, Huan Liu
Pages: 607-618
doi>10.1145/2488388.2488442
The explosion of social media services presents a great opportunity to understand the sentiment of the public via analyzing its large-scale and opinion-rich data. In social media, it is easy to amass vast quantities of unlabeled data, but very costly ...

An analysis of socware cascades in online social networks
Ting-Kai Huang, Md Sazzadur Rahman, Harsha V. Madhyastha, Michalis Faloutsos, Bruno Ribeiro
Pages: 619-630
doi>10.1145/2488388.2488443
Online social networks (OSNs) have become a popular new vector for distributing malware and spam, which we refer to as socware. Unlike email spam, which is sent by spammers directly to intended victims, socware cascades through OSNs as compromised users ...

Measurement and analysis of child pornography trafficking on P2P networks
Ryan Hurley, Swagatika Prusty, Hamed Soroush, Robert J. Walls, Jeannie Albrecht, Emmanuel Cecchet, Brian Neil Levine, Marc Liberatore, Brian Lynn, Janis Wolak
Pages: 631-642
doi>10.1145/2488388.2488444
Peer-to-peer networks are the most popular mechanism for the criminal acquisition and distribution of child pornography (CP). In this paper, we examine observations of peers sharing known CP on the eMule and Gnutella networks, which were collected by ...

HeteroMF: recommendation in heterogeneous information networks using context dependent factor models
Mohsen Jamali, Laks Lakshmanan
Pages: 643-654
doi>10.1145/2488388.2488445
With the growing amount of information available online, recommender systems are starting to provide a viable alternative and complement to search engines, in helping users to find objects of interest. Methods based on Matrix Factorization (MF) models ...
Interactive exploratory search for multi page search results
Xiaoran Jin, Marc Sloan, Jun Wang
Pages: 655-666
doi>10.1145/2488388.2488446
Modern information retrieval interfaces typically involve multiple pages of search results, and users who are recall minded or engaging in exploratory search using ad hoc queries are likely to access more than one page. Document rankings for such queries ...
Spatio-temporal dynamics of online memes: a study of geo-tagged tweets
Krishna Y. Kamath, James Caverlee, Kyumin Lee, Zhiyuan Cheng
Pages: 667-678
doi>10.1145/2488388.2488447
We conduct a study of the spatio-temporal dynamics of Twitter hashtags through a sample of 2 billion geo-tagged tweets. In our analysis, we (i) examine the impact of location, time, and distance on the adoption of hashtags, which is important for understanding ...
Accountable key infrastructure (AKI): a proposal for a public-key validation infrastructure
Tiffany Hyun-Jin Kim, Lin-Shung Huang, Adrian Perrig, Collin Jackson, Virgil Gligor
Pages: 679-690
doi>10.1145/2488388.2488448
Recent trends in public-key infrastructure research explore the tradeoff between decreased trust in Certificate Authorities (CAs), resilience against attacks, communication overhead (bandwidth and latency) for setting up an SSL/TLS connection, and availability ...
DIGTOBI: a recommendation system for Digg articles using probabilistic modeling
Younghoon Kim, Yoonjae Park, Kyuseok Shim
Pages: 691-702
doi>10.1145/2488388.2488449
Digg is a social news website that lets people submit articles to share their favorite web pages (e.g. blog postings or news articles) and vote the articles posted by others. Digg service currently lists the articles in the front page by popularity without ...
Understanding latency variations of black box services
Darja Krushevskaja, Mark Sandler
Pages: 703-714
doi>10.1145/2488388.2488450
Data centers run many services that impact millions of users daily. In reality, the latency of each service varies from one request to another. Existing tools allow to monitor services for performance glitches or service disruptions, but typically they ...
Diversified recommendation on graphs: pitfalls, measures, and algorithms
Onur Küçüktunç, Erik Saule, Kamer Kaya, Ümit V. Çatalyürek
Pages: 715-726
doi>10.1145/2488388.2488451
Result diversification has gained a lot of attention as a way to answer ambiguous queries and to tackle the redundancy problem in the results. In the last decade, diversification has been applied on or integrated into the process of PageRank- or eigenvector-based ...
What is the added value of negative links in online social networks?
Jérôme Kunegis, Julia Preusse, Felix Schwagereit
Pages: 727-736
doi>10.1145/2488388.2488452
We investigate the "negative link" feature of social networks that allows users to tag other users as foes or as distrusted in addition to the usual friend and trusted links. To answer the question whether negative links have ...
Voices of victory: a computational focus group framework for tracking opinion shift in real time
Yu-Ru Lin, Drew Margolin, Brian Keegan, David Lazer
Pages: 737-748
doi>10.1145/2488388.2488453
Social media have been employed to assess public opinions on events, markets, and policies. Most current work focuses on either developing aggregated measures or opinion extraction methods like sentiment analysis. These approaches suffer from unpredictable ...
Rethinking the web as a personal archive
Siân E. Lindley, Catherine C. Marshall, Richard Banks, Abigail Sellen, Tim Regan
Pages: 749-760
doi>10.1145/2488388.2488454
In recent years the Web has evolved substantially, transforming from a place where we primarily find information to a place where we also leave, share and keep it. This presents a fresh set of challenges for the management of personal information, which ...
Expressive languages for selecting groups from graph-structured data
Vitaliy Liptchinsky, Benjamin Satzger, Rostyslav Zabolotnyi, Schahram Dustdar
Pages: 761-770
doi>10.1145/2488388.2488455
Many query languages for graph-structured data are based on regular path expressions, which describe relations among pairs of nodes. We propose an extension that allows to retrieve groups of nodes based on group structural characteristics and relations ...
Modeling/predicting the evolution trend of osn-based applications
Han Liu, Atif Nazir, Jinoo Joung, Chen-Nee Chuah
Pages: 771-780
doi>10.1145/2488388.2488456
While various models have been proposed for generating social/friendship network graphs, the dynamics of user interactions through online social network (OSN) based applications remain largely unexplored. We previously developed a growth model to capture ...
SoCo: a social network aided context-aware recommender system
Xin Liu, Karl Aberer
Pages: 781-802
doi>10.1145/2488388.2488457
Contexts and social network information have been proven to be valuable information for building accurate recommender system. However, to the best of our knowledge, no existing works systematically combine diverse types of such information to further ...
Using stranger as sensors: temporal and geo-sensitive question answering via social media
Yefeng Liu, Todorka Alexandrova, Tatsuo Nakajima
Pages: 803-814
doi>10.1145/2488388.2488458
MoboQ is a location-based real-time social question answering service deployed in the field in China. Using MoboQ, people can ask temporal and geo-sensitive questions, such as how long is the line at a popular business right now, and then receive answers ...
Imagen: runtime migration of browser sessions for javascript web applications
James Teng Kin Lo, Eric Wohlstadter, Ali Mesbah
Pages: 815-826
doi>10.1145/2488388.2488459
Due to the increasing complexity of web applications and emerging HTML5 standards, a large amount of runtime state is created and managed in the user's browser. While such complexity is desirable for user experience, it makes it hard for developers to ...
Gender swapping and user behaviors in online social games
Jing-Kai Lou, Kunwoo Park, Meeyoung Cha, Juyong Park, Chin-Laung Lei, Kuan-Ta Chen
Pages: 827-836
doi>10.1145/2488388.2488460
Modern Massively Multiplayer Online Role-Playing Games (MMORPGs) provide lifelike virtual environments in which players can conduct a variety of activities including combat, trade, and chat with other players. While the game world and the available actions ...
Mining structural hole spanners through information diffusion in social networks
Tiancheng Lou, Jie Tang
Pages: 825-836
doi>10.1145/2488388.2488461
The theory of structural holes suggests that individuals would benefit from filling the "holes" (called as structural hole spanners) between people or groups that are otherwise disconnected. A few empirical studies have verified that structural ...
On the evolution of the internet economic ecosystem
Richard T.B. Ma, John C.S. Lui, Vishal Misra
Pages: 849-860
doi>10.1145/2488388.2488462
The evolution of the Internet has manifested itself in many ways: the traffic characteristics, the interconnection topologies and the business relationships among the autonomous components. It is important to understand why (and how) this evolution came ...
Two years of short URLs internet measurement: security threats and countermeasures
Federico Maggi, Alessandro Frossi, Stefano Zanero, Gianluca Stringhini, Brett Stone-Gross, Christopher Kruegel, Giovanni Vigna
Pages: 861-872
doi>10.1145/2488388.2488463
URL shortening services have become extremely popular. However, it is still unclear whether they are an effective and reliable tool that can be leveraged to hide malicious URLs, and to what extent these abuses can impact the end users. With these questions ...
Know your personalization: learning topic level personalization in online services
Anirban Majumder, Nisheeth Shrivastava
Pages: 873-884
doi>10.1145/2488388.2488464
Online service platforms (OSPs), such as search engines, news-websites, ad-providers, etc., serve highly personalized content to the user, based on the profile extracted from her history with the OSP. In this paper, we capture OSP's personalization ...
Saving, reusing, and remixing web video: using attitudes and practices to reveal social norms
Catherine C. Marshall, Frank M. Shipman
Pages: 885-896
doi>10.1145/2488388.2488465
The growth of online videos has spurred a concomitant increase in the storage, reuse, and remix of this content. As we gain more experience with video content, social norms about ownership have evolved accordingly, spelling out what people think is appropriate ...
From amateurs to connoisseurs: modeling the evolution of user expertise through online reviews
Julian John McAuley, Jure Leskovec
Pages: 897-908
doi>10.1145/2488388.2488466
Recommending products to consumers means not only understanding their tastes, but also understanding their level of experience. For example, it would be a mistake to recommend the iconic film Seven Samurai simply because a user enjoys ...
The FLDA model for aspect-based opinion mining: addressing the cold start problem
Samaneh Moghaddam, Martin Ester
Pages: 909-918
doi>10.1145/2488388.2488467
Aspect-based opinion mining from online reviews has attracted a lot of attention recently. The main goal of all of the proposed methods is extracting aspects and/or estimating aspect ratings. Recent works, which are often based on Latent Dirichlet Allocation ...
Iolaus: securing online content rating systems
Arash Molavi Kakhki, Chloe Kliman-Silver, Alan Mislove
Pages: 919-930
doi>10.1145/2488388.2488468
Online content ratings services allow users to find and share content ranging from news articles (Digg) to videos (YouTube) to businesses (Yelp). Generally, these sites allow users to create accounts, declare friendships, upload and rate content, and ...
On cognition, emotion, and interaction aspects of search tasks with different search intentions
Yashar Moshfeghi, Joemon M. Jose
Pages: 931-942
doi>10.1145/2488388.2488469
The complex and dynamic nature of search processes surrounding information seeking have been exhaustively studied. Recent studies have highlighted search processes with different intentions, such as those for entertainment purposes or re-finding a visited ...
Ad impression forecasting for sponsored search
Abhirup Nath, Shibnath Mukherjee, Prateek Jain, Navin Goyal, Srivatsan Laxman
Pages: 943-952
doi>10.1145/2488388.2488470
A typical problem for a search engine (hosting sponsored search service) is to provide the advertisers with a forecast of the number of impressions his/her ad is likely to obtain for a given bid. Accurate forecasts have high business value, since they ...
Measurement and modeling of eye-mouse behavior in the presence of nonlinear page layouts
Vidhya Navalpakkam, LaDawn Jentzsch, Rory Sayres, Sujith Ravi, Amr Ahmed, Alex Smola
Pages: 953-964
doi>10.1145/2488388.2488471
As search pages are becoming increasingly complex, with images and nonlinear page layouts, understanding how users examine the page is important. We present a lab study on the effect of a rich informational panel to the right of the search result column, ...
Understanding and decreasing the network footprint of catch-up tv
Gianfranco Nencioni, Nishanth Sastry, Jigna Chandaria, Jon Crowcroft
Pages: 965-976
doi>10.1145/2488388.2488472
"Catch-up", or on-demand access of previously broadcast TV content over the public Internet, constitutes a significant fraction of peak time network traffic. This paper analyses consumption patterns of nearly 6 million users of a nationwide deployment ...
Sorry, i don't speak SPARQL: translating SPARQL queries into natural language
Axel-Cyrille Ngonga Ngomo, Lorenz Bühmann, Christina Unger, Jens Lehmann, Daniel Gerber
Pages: 977-988
doi>10.1145/2488388.2488473
Over the past years, Semantic Web and Linked Data technologies have reached the backend of a considerable number of applications. Consequently, large amounts of RDF data are constantly being made available across the planet. While experts can easily ...
Bitsquatting: exploiting bit-flips for fun, or profit?
Nick Nikiforakis, Steven Van Acker, Wannes Meert, Lieven Desmet, Frank Piessens, Wouter Joosen
Pages: 989-998
doi>10.1145/2488388.2488474
Over the last fifteen years, several types of attacks against domain names and the companies relying on them have been observed. The well-known cybersquatting of domain names gave way to typosquatting, the abuse of a user's mistakes when typing a URL ...
One-class collaborative filtering with random graphs
Ulrich Paquet, Noam Koenigstein
Pages: 999-1008
doi>10.1145/2488388.2488475
The bane of one-class collaborative filtering is interpreting and modelling the latent signal from the missing class. In this paper we present a novel Bayesian generative model for implicit collaborative filtering. It forms a core component of the Xbox ...
Latent credibility analysis
Jeff Pasternack, Dan Roth
Pages: 1009-1020
doi>10.1145/2488388.2488476
A frequent problem when dealing with data gathered from multiple sources on the web (ranging from booksellers to Wikipedia pages to stock analyst predictions) is that these sources disagree, and we must decide which of their (often mutually exclusive) ...
Predicting group stability in online social networks
Akshay Patil, Juan Liu, Jie Gao
Pages: 1021-1030
doi>10.1145/2488388.2488477
Social groups often exhibit a high degree of dynamism. Some groups thrive, while many others die over time. Modeling group stability dynamics and understanding whether/when a group will remain stable or shrink over time can be important in a number of ...
Predictive web automation assistant for people with vision impairments
Yury Puzis, Yevgen Borodin, Rami Puzis, I.V. Ramakrishnan
Pages: 1031-1040
doi>10.1145/2488388.2488478
The Web is far less usable and accessible for people with vision impairments than it is for sighted people. Web automation, a process of automating browsing actions on behalf of the user, has the potential to bridge the divide between the ways sighted ...
Mining collective intelligence in diverse groups
Guo-Jun Qi, Charu C. Aggarwal, Jiawei Han, Thomas Huang
Pages: 1041-1052
doi>10.1145/2488388.2488479
Collective intelligence, which aggregates the shared information from large crowds, is often negatively impacted by unreliable information sources with the low quality data. This becomes a barrier to the effective use of collective intelligence in a ...
Trade area analysis using user generated mobile location data
Yan Qu, Jun Zhang
Pages: 1053-1064
doi>10.1145/2488388.2488480
In this paper, we illustrate how User Generated Mobile Location Data (UGMLD) like Foursquare check-ins can be used in Trade Area Analysis (TAA) by introducing a new framework and corresponding analytic methods. Three key processes were created: identifying ...
Psychological maps 2.0: a web engagement enterprise starting in London
Daniele Quercia, Joao Paulo Pesce, Virgilio Almeida, Jon Crowcroft
Pages: 1065-1076
doi>10.1145/2488388.2488481
Planners and social psychologists have suggested that the recognizability of the urban environment is linked to people's socio-economic well-being. We build a web game that puts the recognizability of London's streets to the test. It follows as closely ...
Towards realistic team formation in social networks based on densest subgraphs
Syama Sundar Rangapuram, Thomas Bühler, Matthias Hein
Pages: 1077-1088
doi>10.1145/2488388.2488482
Given a task T, a set of experts V with multiple skills and a social network G(V, W) reflecting the compatibility among the experts, team formation is the problem of identifying a team C ⊆ V that ...
Efficient community detection in large networks using content and links
Yiye Ruan, David Fuhry, Srinivasan Parthasarathy
Pages: 1089-1098
doi>10.1145/2488388.2488483
In this paper we discuss a very simple approach of combining content and link information in graph structures for the purpose of community discovery, a fundamental task in network analysis. Our approach hinges on the basic intuition that many networks ...
Learning joint query interpretation and response ranking
Uma Sawant, Soumen Chakrabarti
Pages: 1099-1110
doi>10.1145/2488388.2488484
Thanks to information extraction and semantic Web efforts, search on unstructured text is increasingly refined using semantic annotations and structured knowledge bases. However, most users cannot become familiar with the schema of knowledge bases and ...
A model for green design of online news media services
Daniel Schien, Paul Shabajee, Stephen G. Wood, Chris Preist
Pages: 1111-1122
doi>10.1145/2488388.2488485
The use of information and communication technology and the web-based products it provides is responsible for significant emissions of greenhouse gases. In order to enable the reduction of emissions during the design of such products, it is necessary ...
Potential networks, contagious communities, and understanding social network structure
Grant Schoenebeck
Pages: 1123-1132
doi>10.1145/2488388.2488486
In this paper we study how the network of agents adopting a particular technology relates to the structure of the underlying network over which the technology adoption spreads. We develop a model and show that the network of agents adopting a particular ...
Do social explanations work?: studying and modeling the effects of social explanations in recommender systems
Amit Sharma, Dan Cosley
Pages: 1133-1144
doi>10.1145/2488388.2488487
Recommender systems associated with social networks often use social explanations (e.g. "X, Y and 2 friends like this") to support the recommendations. We present a study of the effects of these social explanations in a music recommendation context. ...
Question answering on interlinked data
Saeedeh Shekarpour, Axel-Cyrille Ngonga Ngomo, Sören Auer
Pages: 1145-1156
doi>10.1145/2488388.2488488
The Data Web contains a wealth of knowledge on a large number of domains. Question answering over interlinked data sources is challenging due to two inherent characteristics. First, different datasets employ heterogeneous schemas and each one may only ...
Pricing mechanisms for crowdsourcing markets
Yaron Singer, Manas Mittal
Pages: 1157-1166
doi>10.1145/2488388.2488489
Every day millions of crowdsourcing tasks are performed in exchange for payments. Despite the important role pricing plays in crowdsourcing campaigns and the complexity of the market, most platforms do not provide requesters appropriate tools for effective ...
Truthful incentives in crowdsourcing tasks using regret minimization mechanisms
Adish Singla, Andreas Krause
Pages: 1167-1178
doi>10.1145/2488388.2488490
What price should be offered to a worker for a task in an online labor market? How can one enable workers to express the amount they desire to receive for the task completion? Designing optimal pricing policies and determining the right monetary incentives ...
A predictive model for advertiser value-per-click in sponsored search
Eric Sodomka, Sébastien Lahaie, Dustin Hillard
Pages: 1179-1190
doi>10.1145/2488388.2488491
Sponsored search is a form of online advertising where advertisers bid for placement next to search engine results for specific keywords. As search engines compete for the growing share of online ad spend, it becomes important for them to understand ...
I know the shortened URLs you clicked on Twitter: inference attack using public click analytics and Twitter metadata
Jonghyuk Song, Sangho Lee, Jong Kim
Pages: 1191-1200
doi>10.1145/2488388.2488492
Twitter is a popular social network service for sharing messages among friends. Because Twitter restricts the length of messages, many Twitter users use URL shortening services, such as bit.ly and goo.gl, to share long URLs with friends. Some URL shortening ...
Exploring and exploiting user search behavior on mobile and tablet devices to improve search relevance
Yang Song, Hao Ma, Hongning Wang, Kuansan Wang
Pages: 1201-1212
doi>10.1145/2488388.2488493
In this paper, we present a log-based study on user search behavior comparisons on three different platforms: desktop, mobile and tablet. We use three-month search logs in 2012 from a commercial search engine for our study. Our objective is to better ...
Evaluating and predicting user engagement change with degraded search relevance
Yang Song, Xiaolin Shi, Xin Fu
Pages: 1213-1224
doi>10.1145/2488388.2488494
User engagement in search refers to the frequency for users (re-)using the search engine to accomplish their tasks. Among factors that affected users' visit frequency, relevance of search results is believed to play a pivotal role. While multiple work ...
Data-Fu: a language and an interpreter for interaction with read/write linked data
Steffen Stadtmüller, Sebastian Speiser, Andreas Harth, Rudi Studer
Pages: 1225-1236
doi>10.1145/2488388.2488495
An increasing amount of applications build their functionality on the utilisation and manipulation of web resources. Consequently REST gains popularity with a resource-centric interaction architecture that draws its flexibility from links between resources. ...
NIFTY: a system for large scale information flow tracking and clustering
Caroline Suen, Sandy Huang, Chantat Eksombatchai, Rok Sosic, Jure Leskovec
Pages: 1237-1248
doi>10.1145/2488388.2488496
The real-time information on news sites, blogs and social networking sites changes dynamically and spreads rapidly through the Web. Developing methods for handling such information at a massive scale requires that we think about how information content ...
When relevance is not enough: promoting diversity and freshness in personalized question recommendation
Idan Szpektor, Yoelle Maarek, Dan Pelleg
Pages: 1249-1260
doi>10.1145/2488388.2488497
What makes a good question recommendation system for community question-answering sites? First, to maintain the health of the ecosystem, it needs to be designed around answerers, rather than exclusively for askers. Next, it needs to scale to many questions ...
Mining acronym expansions and their meanings using query click log
Bilyana Taneva, Tao Cheng, Kaushik Chakrabarti, Yeye He
Pages: 1261-1272
doi>10.1145/2488388.2488498
Acronyms are abbreviations formed from the initial components of words or phrases. Acronym usage is becoming more common in web searches, email, text messages, tweets, blogs and posts. Acronyms are typically ambiguous and often disambiguated by context ...
Groundhog day: near-duplicate detection on Twitter
Ke Tao, Fabian Abel, Claudia Hauff, Geert-Jan Houben, Ujwal Gadiraju
Pages: 1273-1284
doi>10.1145/2488388.2488499
With more than 340 million messages that are posted on Twitter every day, the amount of duplicate content as well as the demand for appropriate duplicate detection mechanisms is increasing tremendously. Yet there exists little research that aims at detecting ...
Uncovering locally characterizing regions within geotagged data
Bart Thomee, Adam Rae
Pages: 1285-1296
doi>10.1145/2488388.2488500
We propose a novel algorithm for uncovering the colloquial boundaries of locally characterizing regions present in collections of labeled geospatial data. We address the problem by first modeling the data using scale-space theory, allowing us to represent ...
Spectral analysis of communication networks using Dirichlet eigenvalues
Alexander Tsiatas, Iraj Saniee, Onuttom Narayan, Matthew Andrews
Pages: 1297-1306
doi>10.1145/2488388.2488501
Good clustering can provide critical insight into potential locations where congestion in a network may occur. A natural measure of congestion for a collection of nodes in a graph is its Cheeger ratio, defined as the ratio of the size of its boundary ...
Subgraph frequencies: mapping the empirical and extremal geography of large graph collections
Johan Ugander, Lars Backstrom, Jon Kleinberg
Pages: 1307-1318
doi>10.1145/2488388.2488502
A growing set of on-line applications are generating data that can be viewed as very large collections of small, dense social graphs. These range from sets of social groups, events, or collaboration projects to the vast collection of graph neighborhoods ...
The self-feeding process: a unifying model for communication dynamics in the web
Pedro Olmo S. Vaz de Melo, Christos Faloutsos, Renato Assunção, Antonio Loureiro
Pages: 1319-1330
doi>10.1145/2488388.2488503
How often do individuals perform a given communication activity in the Web, such as posting comments on blogs or news? Could we have a generative model to create communication events with realistic inter-event time distributions (IEDs)? Which properties ...
Google+Ripples: a native visualization of information flow
Fernanda Viégas, Martin Wattenberg, Jack Hebert, Geoffrey Borggaard, Alison Cichowlas, Jonathan Feinberg, Jon Orwant, Christopher Wren
Pages: 1389-1398
doi>10.1145/2488388.2488504
G+ Ripples is a visualization of information flow that shows users how public posts are shared on Google+. Unlike other social network visualizations, Ripples exists as a "native" visualization: it is directly accessible from public posts on Google+. ...
Whom to mention: expand the diffusion of tweets by @ recommendation on micro-blogging systems
Beidou Wang, Can Wang, Jiajun Bu, Chun Chen, Wei Vivian Zhang, Deng Cai, Xiaofei He
Pages: 1331-1340
doi>10.1145/2488388.2488505
Nowadays, micro-blogging systems like Twitter have become one of the most important ways for information sharing. In Twitter, a user posts a message (tweet) and the others can forward the message (retweet). Mention is a new feature in micro-blogging ...
Wisdom in the social crowd: an analysis of quora
Gang Wang, Konark Gill, Manish Mohanlal, Haitao Zheng, Ben Y. Zhao
Pages: 1341-1352
doi>10.1145/2488388.2488506
Efforts such as Wikipedia have shown the ability of user communities to collect, organize and curate information on the Internet. Recently, a number of question and answer (Q&A) sites have successfully built large growing knowledge repositories, ...
Learning to extract cross-session search tasks
Hongning Wang, Yang Song, Ming-Wei Chang, Xiaodong He, Ryen W. White, Wei Chu
Pages: 1353-1364
doi>10.1145/2488388.2488507
Search tasks, comprising a series of search queries serving the same information need, have recently been recognized as an accurate atomic unit for modeling user search intent. Most prior research in this area has focused on short-term search tasks within ...
Content-aware click modeling
Hongning Wang, ChengXiang Zhai, Anlei Dong, Yi Chang
Pages: 1365-1376
doi>10.1145/2488388.2488508
Click models aim at extracting intrinsic relevance of documents to queries from biased user clicks. One basic modeling assumption made in existing work is to treat such intrinsic relevance as an atomic query-document-specific parameter, which is solely ...
Is it time for a career switch?
Jian Wang, Yi Zhang, Christian Posse, Anmol Bhasin
Pages: 1377-1388
doi>10.1145/2488388.2488509
Tenure is a critical factor for an individual to consider when making a job transition. For instance, software engineers make a job transition to senior software engineers in a span of 2 years on average, or it takes for approximately 3 ...
From cookies to cooks: insights on dietary patterns via analysis of web usage logs
Robert West, Ryen W. White, Eric Horvitz
Pages: 1399-1410
doi>10.1145/2488388.2488510
Nutrition is a key factor in people's overall health. Hence, understanding the nature and dynamics of population-wide dietary preferences over time and space can be valuable in public health. To date, studies have leveraged small samples of participants ...
Enhancing personalized search by mining and modeling task behavior
Ryen W. White, Wei Chu, Ahmed Hassan, Xiaodong He, Yang Song, Hongning Wang
Pages: 1411-1420
doi>10.1145/2488388.2488511
Personalized search systems tailor search results to the current user intent using historic search interactions. This relies on being able to find pertinent information in that user's search history, which can be challenging for unseen queries and for ...
Inferring dependency constraints on parameters for web services
Qian Wu, Ling Wu, Guangtai Liang, Qianxiang Wang, Tao Xie, Hong Mei
Pages: 1421-1432
doi>10.1145/2488388.2488512
Recently many popular websites such as Twitter and Flickr expose their data through web service APIs, enabling third-party organizations to develop client applications that provide functionalities beyond what the original websites offer. These client ...
Predicting advertiser bidding behaviors in sponsored search by rationality modeling
Haifeng Xu, Bin Gao, Diyi Yang, Tie-Yan Liu
Pages: 1433-1444
doi>10.1145/2488388.2488513
We study how an advertiser changes his/her bid prices in sponsored search, by modeling his/her rationality. Predicting the bid changes of advertisers with respect to their campaign performances is a key capability of search engines, since it can be used ...
A biterm topic model for short texts
Xiaohui Yan, Jiafeng Guo, Yanyan Lan, Xueqi Cheng
Pages: 1445-1456
doi>10.1145/2488388.2488514
Uncovering the topics within short texts, such as tweets and instant messages, has become an important task for many content analysis applications. However, directly applying conventional topic models (e.g. LDA and PLSA) on such short texts may not work ...
Unified entity search in social media community
Ting Yao, Yuan Liu, Chong-Wah Ngo, Tao Mei
Pages: 1457-1466
doi>10.1145/2488388.2488515
The search for entities is the most common search behavior on the Web, especially in social media communities where entities (such as images, videos, people, locations, and tags) are highly heterogeneous and correlated. While previous research usually ...
MATRI: a multi-aspect and transitive trust inference model
Yuan Yao, Hanghang Tong, Xifeng Yan, Feng Xu, Jian Lu
Pages: 1467-1476
doi>10.1145/2488388.2488516
Trust inference, which is the mechanism to build new pair-wise trustworthiness relationship based on the existing ones, is a fundamental integral part in many real applications, e.g., e-commerce, social networks, peer-to-peer networks, etc. State-of-the-art ...
Predicting positive and negative links in signed social networks by transfer learning
Jihang Ye, Hong Cheng, Zhe Zhu, Minghua Chen
Pages: 1477-1488
doi>10.1145/2488388.2488517
Full text: PDF

Different from a large body of research on social networks that has focused almost exclusively on positive relationships, we study signed social networks with both positive and negative links. Specifically, we focus on how to reliably and effectively ...
Sparse online topic models
Aonan Zhang, Jun Zhu, Bo Zhang
Pages: 1489-1500
doi>10.1145/2488388.2488518
Full text: PDF

Topic models have shown great promise in discovering latent semantic structures from complex data corpora, ranging from text documents and web news articles to images, videos, and even biological data. In order to deal with massive data collections and ...
TopRec: domain-specific recommendation through community topic mining in social network
Xi Zhang, Jian Cheng, Ting Yuan, Biao Niu, Hanqing Lu
Pages: 1501-1510
doi>10.1145/2488388.2488519
Full text: PDF

Traditionally, Collaborative Filtering assumes that similar users have similar responses to similar items. However, human activities exhibit heterogeneous features across multiple domains such that users with similar tastes in one domain may behave quite ...
Localized matrix factorization for recommendation based on matrix block diagonal forms
Yongfeng Zhang, Min Zhang, Yiqun Liu, Shaoping Ma, Shi Feng
Pages: 1511-1520
doi>10.1145/2488388.2488520
Full text: PDF

Matrix factorization on user-item rating matrices has achieved significant success in collaborative filtering based recommendation tasks. However, it also encounters the problems of data sparsity and scalability when applied in real-world recommender ...
Predicting purchase behaviors from social media
Yongzheng Zhang, Marco Pennacchiotti
Pages: 1521-1532
doi>10.1145/2488388.2488521
Full text: PDF

In the era of social commerce, users often connect from e-commerce websites to social networking venues such as Facebook and Twitter. However, there have been few efforts on understanding the correlations between users' social media profiles and their ...
Anatomy of a web-scale resale market: a data mining approach
Yuchen Zhao, Neel Sundaresan, Zeqian Shen, Philip S. Yu
Pages: 1533-1544
doi>10.1145/2488388.2488522
Full text: PDF

Reuse and remarketing of content and products is an integral part of the internet. As E-commerce has grown, online resale and secondary markets form a significant part of the commerce space. The intentions and methods for reselling are diverse. In this ...
Questions about questions: an empirical analysis of information needs on Twitter
Zhe Zhao, Qiaozhu Mei
Pages: 1545-1556
doi>10.1145/2488388.2488523
Full text: PDF

Conventional studies of online information seeking behavior usually focus on the use of search engines or question answering (Q&A) websites. Recently, the fast growth of online social platforms such as Twitter and Facebook has made it possible for ...
Which vertical search engines are relevant?
Ke Zhou, Ronan Cummins, Mounia Lalmas, Joemon M. Jose
Pages: 1557-1568
doi>10.1145/2488388.2488524
Full text: PDF

Aggregating search results from a variety of heterogeneous sources, so-called verticals, such as news, image and video, into a single interface is a popular paradigm in web search. Current approaches that evaluate the effectiveness of aggregated search ...
Making the most of your triple store: query answering in OWL 2 using an RL reasoner
Yujiao Zhou, Bernardo Cuenca Grau, Ian Horrocks, Zhe Wu, Jay Banerjee
Pages: 1569-1580
doi>10.1145/2488388.2488525
Full text: PDF

Triple stores implementing the RL profile of OWL 2 are becoming increasingly popular. In contrast to unrestricted OWL 2, the RL profile is known to enjoy favourable computational properties for query answering, and state-of-the-art RL reasoners such ...
Security implications of password discretization for click-based graphical passwords
Bin B. Zhu, Dongchen Wei, Maowei Yang, Jeff Yan
Pages: 1581-1591
doi>10.1145/2488388.2488526
Full text: PDF

Discretization is a standard technique used in click-based graphical passwords for tolerating input variance so that approximately correct passwords are accepted by the system. In this paper, we show for the first time that two representative discretization ...



The ACM Digital Library is published by the Association for Computing Machinery. Copyright © 2017 ACM, Inc.