
ABSTRACT

This paper proposes a neural language model that captures the interaction between text units of different levels, i.e., documents, paragraphs, sentences, and words, organized in a hierarchical structure. At each level, the model incorporates the Markov property among the units of that level, while each higher-level unit hierarchically influences the units it contains. This architecture enables the learned word embeddings to encode both global and local information. We evaluate the learned word embeddings, and the experiments demonstrate the effectiveness of our model.
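The abstract gives no equations, so the following is only a rough, hypothetical illustration of the idea it describes. It assumes that the context used to predict a word is the sum of (a) the previous word's embedding and the vectors of the previous sentence and previous paragraph, a first-order Markov dependency at each level, and (b) the vectors of the sentence, paragraph, and document that contain the word, the top-down hierarchical influence. The word is then scored with a softmax over the vocabulary. The class `HierarchicalLMSketch`, its helpers, `DIM`, and the additive combination are all assumptions. This is not the authors' model, and it includes no training.

```python
import math
import random

DIM = 8  # hypothetical embedding size


def rand_vec(rng, dim=DIM):
    """Small random vector standing in for a trained embedding."""
    return [rng.uniform(-0.1, 0.1) for _ in range(dim)]


def add(*vecs):
    """Element-wise sum of equal-length vectors."""
    return [sum(xs) for xs in zip(*vecs)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


class HierarchicalLMSketch:
    """Hypothetical forward pass only: no training, not the paper's equations."""

    def __init__(self, vocab, seed=0):
        self.rng = random.Random(seed)
        self.vocab = list(vocab)
        self.index = {w: i for i, w in enumerate(self.vocab)}
        self.word_in = {w: rand_vec(self.rng) for w in self.vocab}  # input word embeddings
        self.word_out = [rand_vec(self.rng) for _ in self.vocab]    # output softmax weights
        self.units = {}  # one vector per document, paragraph and sentence, created lazily

    def unit(self, key):
        if key not in self.units:
            self.units[key] = rand_vec(self.rng)
        return self.units[key]

    def log_likelihood(self, document, doc_id="d0"):
        """document: list of paragraphs; paragraph: list of sentences; sentence: list of words."""
        zero = [0.0] * DIM
        doc_vec = self.unit(("doc", doc_id))
        total, prev_para = 0.0, zero
        for p, paragraph in enumerate(document):
            para_vec = self.unit(("para", doc_id, p))
            prev_sent = zero
            for s, sentence in enumerate(paragraph):
                sent_vec = self.unit(("sent", doc_id, p, s))
                prev_word = zero
                for word in sentence:
                    # Markov part: only the immediately preceding unit at each level.
                    # Hierarchical part: the sentence, paragraph and document containing the word.
                    h = add(prev_word, prev_sent, prev_para, sent_vec, para_vec, doc_vec)
                    probs = softmax([dot(w, h) for w in self.word_out])
                    total += math.log(probs[self.index[word]])
                    prev_word = self.word_in[word]
                prev_sent = sent_vec
            prev_para = para_vec
        return total


if __name__ == "__main__":
    model = HierarchicalLMSketch(["the", "cat", "sat", "dog", "ran"], seed=1)
    # One document: two paragraphs, the first containing two sentences.
    doc = [[["the", "cat", "sat"], ["the", "dog", "ran"]], [["the", "cat", "ran"]]]
    print(model.log_likelihood(doc))  # negative; the exact value depends on the random init
```

Training, for example by gradient ascent on this log-likelihood, is omitted. In such a setup, the `word_in` vectors would play the role of the learned word embeddings. They would pick up local information through the previous-word term and more global information through the sentence, paragraph, and document terms.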

AUTHORS



Xun Wang

Bibliometrics: publication history
Publication years: 2015-2015
Publication count: 3
Citation Count: 0
Available for download: 3
Downloads (6 Weeks): 2
Downloads (12 Months): 23
Downloads (cumulative): 281
Average downloads per article: 93.67
Average citations per article: 0.00


Katsuhito Sudoh



Masaaki Nagata

Bibliometrics: publication history
Publication years: 1992-2016
Publication count: 46
Citation Count: 156
Available for download: 34
Downloads (6 Weeks): 101
Downloads (12 Months): 562
Downloads (cumulative): 6,394
Average downloads per article: 188.06
Average citations per article: 3.39




PUBLICATION

Title: CIKM '15: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management
General Chairs: James Bailey, The University of Melbourne; Alistair Moffat, The University of Melbourne
Program Chairs: Charu C. Aggarwal, IBM; Maarten de Rijke, University of Amsterdam; Ravi Kumar, Google; Vanessa Murdock, Microsoft; Timos Sellis, RMIT University; Jeffrey Xu Yu, Chinese University of Hong Kong
Pages: 1927-1930
Publication Date: 2015-10-17
Sponsors: SIGWEB (ACM Special Interest Group on Hypertext, Hypermedia, and Web); SIGIR (ACM Special Interest Group on Information Retrieval)
Publisher: ACM, New York, NY, USA ©2015
ISBN: 978-1-4503-3794-6
Order Number: 605159
DOI: 10.1145/2806416.2806637
Conference: CIKM (Conference on Information and Knowledge Management)
Paper Acceptance Rate: 165 of 646 submissions, 26%
Overall Acceptance Rate: 1,960 of 10,758 submissions, 18%
Year      Submitted  Accepted  Rate
CIKM '05        425        77   18%
CIKM '06        537        81   15%
CIKM '07        512        86   17%
CIKM '08        772       132   17%
CIKM '09        847       123   15%
CIKM '10        945       126   13%
CIKM '11        918       228   25%
CIKM '12      1,088       146   13%
CIKM '13        848       143   17%
CIKM '14        838       175   21%
CIKM '15        646       165   26%
CIKM '16        701       160   23%
CIKM '17        855       171   20%
CIKM '18        826       147   18%
Overall      10,758     1,960   18%
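Each rate in the acceptance table is accepted divided by submitted, shown as a whole percent. The short check below recomputes every row and the overall total from the table's own figures. The names `ROWS`, `rate_percent`, and `mismatches` are illustrative only. Python's built-in `round` rounds exact halves to even, but no rate in this table falls on an exact .5, so that choice does not affect the result.

```python
# (edition, submitted, accepted, listed rate in %) as printed in the acceptance table
ROWS = [
    ("CIKM '05", 425, 77, 18),
    ("CIKM '06", 537, 81, 15),
    ("CIKM '07", 512, 86, 17),
    ("CIKM '08", 772, 132, 17),
    ("CIKM '09", 847, 123, 15),
    ("CIKM '10", 945, 126, 13),
    ("CIKM '11", 918, 228, 25),
    ("CIKM '12", 1088, 146, 13),
    ("CIKM '13", 848, 143, 17),
    ("CIKM '14", 838, 175, 21),
    ("CIKM '15", 646, 165, 26),
    ("CIKM '16", 701, 160, 23),
    ("CIKM '17", 855, 171, 20),
    ("CIKM '18", 826, 147, 18),
]


def rate_percent(accepted, submitted):
    """Acceptance rate as a whole percent (accepted / submitted, rounded)."""
    return round(100 * accepted / submitted)


def mismatches(rows):
    """Editions whose listed rate differs from the recomputed one."""
    return [name for name, sub, acc, listed in rows if rate_percent(acc, sub) != listed]


total_submitted = sum(sub for _, sub, _, _ in ROWS)
total_accepted = sum(acc for _, _, acc, _ in ROWS)

if __name__ == "__main__":
    print(mismatches(ROWS))  # []
    print(total_accepted, total_submitted, rate_percent(total_accepted, total_submitted))  # 1960 10758 18
```

The recomputed values agree with every listed rate, and the totals match the "Overall" row (1,960 of 10,758, about 18%).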

APPEARS IN
Artificial Intelligence
Digital Content
Networking
Software



TABLE OF CONTENTS

Proceedings of the 24th ACM International on Conference on Information and Knowledge Management
SESSION: Keynote Address I
Session details: Keynote Address I
Alistair Moffat
DOI: 10.1145/3252277
Slow Search: Improving Information Retrieval Using Human Assistance
Jaime Teevan
Pages: 1-1
DOI: 10.1145/2806416.2806417

We live in a world where the pace of everything from communication to transportation is getting faster. In recent years a number of "slow movements" have emerged that advocate for reducing speed in exchange for increasing quality, including the slow ...
SESSION: Session 1A: Scalability
Session details: Session 1A: Scalability
Rui Zhang
DOI: 10.1145/3252278
External Data Access And Indexing In AsterixDB
Abdullah A. Alamoudi, Raman Grover, Michael J. Carey, Vinayak Borkar
Pages: 3-12
DOI: 10.1145/2806416.2806428

Traditional database systems offer rich query interfaces (SQL) and efficient query execution for data that they store. Recent years have seen the rise of Big Data analytics platforms offering query-based access to "raw" external data, e.g., file-resident ...
Dynamic Resource Management In a Massively Parallel Stream Processing Engine
Kasper Grud Skat Madsen, Yongluan Zhou
Pages: 13-22
DOI: 10.1145/2806416.2806449

The emerging interest in Massively Parallel Stream Processing Engines (MPSPEs), which are able to process long-standing computations over data streams with ever-growing velocity at a large-scale cluster, calls for efficient dynamic resource management ...
A Parallel GPU-Based Approach to Clustering Very Fast Data Streams
Pengtao Huang, Xiu Li, Bo Yuan
Pages: 23-32
DOI: 10.1145/2806416.2806545

Clustering data streams has become a hot topic in the era of big data. Driven by the ever increasing volume, velocity and variety of data, more efficient algorithms for clustering large-scale complex data streams are needed. In this paper, we present ...
Scalable Clustering Algorithm via a Triangle Folding Processing for Complex Networks
Ying Kang, Xiaoyan Gu, Weiping Wang, Dan Meng
Pages: 33-42
DOI: 10.1145/2806416.2806563

Facing up to the incessant growth of complex networks, more and more researchers start turning to a multilevel computing paradigm with high scalability for clustering. By virtue of iterative coarsening level by level, the clustering results which are ...
SESSION: Session 1B: Personal Search
Session details: Session 1B: Personal Search
Peter Bailey
DOI: 10.1145/3252279
Understanding the Impact of the Role Factor in Collaborative Information Retrieval
Lynda Tamine, Laure Soulier
Pages: 43-52
DOI: 10.1145/2806416.2806481

Collaborative information retrieval systems often rely on division of labor policies. Such policies allow work to be divided among collaborators with the aim of preventing redundancy and optimizing the synergic effects of collaboration. Most of the underlying ...
Experiments with a Venue-Centric Model for Personalised and Time-Aware Venue Suggestion
Romain Deveaud, M-Dyaa Albakour, Craig Macdonald, Iadh Ounis
Pages: 53-62
DOI: 10.1145/2806416.2806484

Location-based social networks (LBSNs), such as Foursquare, fostered the emergence of new tasks such as recommending venues a user might wish to visit. In the literature, recommending venues has typically been addressed using user-centric recommendation ...
Search Result Diversification Based on Hierarchical Intents
Sha Hu, Zhicheng Dou, Xiaojie Wang, Tetsuya Sakai, Ji-Rong Wen
Pages: 63-72
DOI: 10.1145/2806416.2806455

A large percentage of queries issued to search engines are broad or ambiguous. Search result diversification aims to solve this problem, by returning diverse results that can fulfill as many different information needs as possible. Most existing intent-aware ...
Category-Driven Approach for Local Related Business Recommendations
Yonathan Perez, Michael Schueppert, Matthew Lawlor, Shaunak Kishore
Pages: 73-82
DOI: 10.1145/2806416.2806495

When users search online for a business, the search engine may present them with a list of related business recommendations. We address the problem of constructing a useful and diverse list of such recommendations that would include an optimal combination ...
SESSION: Session 1C: Learning
Session details: Session 1C: Learning
Leif Azzopardi
DOI: 10.1145/3252280
A Soft Computing Approach for Learning to Aggregate Rankings
Javier Alvaro Vargas Muñoz, Ricardo da Silva Torres, Marcos André Gonçalves
Pages: 83-92
DOI: 10.1145/2806416.2806478

This paper presents an approach to combine rank aggregation techniques using a soft computing technique -- Genetic Programming -- in order to improve the results in Information Retrieval tasks. Previous work shows that by combining rank aggregation techniques ...
Approximate String Matching by End-Users using Active Learning
Lutz Büch, Artur Andrzejak
Pages: 93-102
DOI: 10.1145/2806416.2806453

Identifying approximately identical strings is key for many data cleaning and data integration processes, including similarity join and record matching. The accuracy of such tasks crucially depends on appropriate choices of string similarity measures ...
A Unified Posterior Regularized Topic Model with Maximum Margin for Learning-to-Rank
Shoaib Jameel, Wai Lam, Steven Schockaert, Lidong Bing
Pages: 103-112
DOI: 10.1145/2806416.2806482

While most methods for learning-to-rank documents only consider relevance scores as features, better results can often be obtained by taking into account the latent topic structure of the document collection. Existing approaches that consider latent ...
Collaborating between Local and Global Learning for Distributed Online Multiple Tasks
Xin Jin, Ping Luo, Fuzhen Zhuang, Jia He, Qing He
Pages: 113-122
DOI: 10.1145/2806416.2806553

This paper studies the novel learning scenarios of Distributed Online Multi-tasks (DOM), where the learning individuals with continuously arriving data are distributed separately and meanwhile they need to learn individual models collaboratively. It ...
SESSION: Session 1D: Text Processing
Session details: Session 1D: Text Processing
Tim Baldwin
DOI: 10.1145/3252281
Lifespan-based Partitioning of Index Structures for Time-travel Text Search
Animesh Nandi, Suriya Subramanian, Sriram Lakshminarasimhan, Prasad M. Deshpande, Sriram Raghavan
Pages: 123-132
DOI: 10.1145/2806416.2806442

Time-travel text search over a temporally evolving document collection is useful in various applications. Supporting a wide range of query classes demanded by these applications require different index layouts optimized for their respective query access ...
Contextual Text Understanding in Distributional Semantic Space
Jianpeng Cheng, Zhongyuan Wang, Ji-Rong Wen, Jun Yan, Zheng Chen
Pages: 133-142
DOI: 10.1145/2806416.2806517

Representing discrete words in a continuous vector space turns out to be useful for natural language applications related to text understanding. Meanwhile, it poses extensive challenges, one of which is due to the polysemous nature of human language. ...
External Knowledge and Query Strategies in Active Learning: a Study in Clinical Information Extraction
Mahnoosh Kholghi, Laurianne Sitbon, Guido Zuccon, Anthony Nguyen
Pages: 143-152
DOI: 10.1145/2806416.2806550

This paper presents a new active learning query strategy for information extraction, called Domain Knowledge Informativeness (DKI). Active learning is often used to reduce the amount of annotation effort required to obtain training data for machine learning ...
Ranking Deep Web Text Collections for Scalable Information Extraction
Pablo Barrio, Luis Gravano, Chris Develder
Pages: 153-162
DOI: 10.1145/2806416.2806581

Information extraction (IE) systems discover structured information from natural language text, to enable much richer querying and data mining than possible directly over the unstructured text. Unfortunately, IE is generally a computationally expensive ...
SESSION: Session 1E: Applications
Session details: Session 1E: Applications
Huizhi (Elly) Liang
DOI: 10.1145/3252282
Forming Online Support Groups for Internet and Behavior Related Addictions
Chih-Ya Shen, Hong-Han Shuai, De-Nian Yang, Yi-Feng Lan, Wang-Chien Lee, Philip S. Yu, Ming-Syan Chen
Pages: 163-172
DOI: 10.1145/2806416.2806423

While online social networks have become a part of many people's daily lives, Internet and social network addictions (ISNAs) have been noted recently. With increased patients in addictive Internet use, clinicians often form support groups to help patients. ...
Concept-Based Relevance Models for Medical and Semantic Information Retrieval
Chunye Wang, Ramakrishna Akella
Pages: 173-182
DOI: 10.1145/2806416.2806497

Relevance models provide an important approach for estimating probabilities of words in the relevant class. However, the associated bag-of-words assumption breaks dependencies between words, especially between those within a phrase. If such dependencies ...
PlateClick: Bootstrapping Food Preferences Through an Adaptive Visual Interface
Longqi Yang, Yin Cui, Fan Zhang, John P. Pollak, Serge Belongie, Deborah Estrin
Pages: 183-192
DOI: 10.1145/2806416.2806544

Food preference learning is an important component of wellness applications and restaurant recommender systems as it provides personalized information for effective food targeting and suggestions. However, existing systems require some form of food journaling ...
Data Driven Water Pipe Failure Prediction: A Bayesian Nonparametric Approach
Peng Lin, Bang Zhang, Yi Wang, Zhidong Li, Bin Li, Yang Wang, Fang Chen
Pages: 193-202
DOI: 10.1145/2806416.2806509

Water pipe failures can cause significant economic and social costs, hence have become the primary challenge to water utilities. In this paper, we propose a Bayesian nonparametric approach, namely the Dirichlet process mixture of hierarchical beta process ...
SESSION: Session 1F: Social Media 1
Session details: Session 1F: Social Media 1
Lynda Tamine
DOI: 10.1145/3252283
Tumblr Blog Recommendation with Boosted Inductive Matrix Completion
Donghyuk Shin, Suleyman Cetintas, Kuang-Chih Lee, Inderjit S. Dhillon
Pages: 203-212
DOI: 10.1145/2806416.2806578

Popular microblogging sites such as Tumblr have attracted hundreds of millions of users as a content sharing platform, where users can create rich content in the form of posts that are shared with other users who follow them. Due to the sheer amount ...
BiasWatch: A Lightweight System for Discovering and Tracking Topic-Sensitive Opinion Bias in Social Media
Haokai Lu, James Caverlee, Wei Niu
Pages: 213-222
DOI: 10.1145/2806416.2806573

We propose a lightweight system for (i) semi-automatically discovering and tracking bias themes associated with opposing sides of a topic; (ii) identifying strong partisans who drive the online discussion; and (iii) inferring the opinion bias of "regular" ...
Knowlywood: Mining Activity Knowledge From Hollywood Narratives
Niket Tandon, Gerard de Melo, Abir De, Gerhard Weikum
Pages: 223-232
DOI: 10.1145/2806416.2806583

Despite the success of large knowledge bases, one kind of knowledge that has not received attention so far is that of human activities. An example of such an activity is proposing to someone (to get married). For the computer, knowing that this involves ...
Entity and Aspect Extraction for Organizing News Comments
Radityo Eko Prasojo, Mouna Kacimi, Werner Nutt
Pages: 233-242
DOI: 10.1145/2806416.2806576

News websites give their users the opportunity to participate in discussions about published articles, by writing comments. Typically, these comments are unstructured making it hard to understand the flow of user discussions. Thus, there is a need for ...
SESSION: Session 2A: Graphs
Session details: Session 2A: Graphs
Sourav S. Bhowmick
DOI: 10.1145/3252284
HDRF: Stream-Based Partitioning for Power-Law Graphs
Fabio Petroni, Leonardo Querzoni, Khuzaima Daudjee, Shahin Kamali, Giorgio Iacoboni
Pages: 243-252
DOI: 10.1145/2806416.2806424

Balanced graph partitioning is a fundamental problem that is receiving growing attention with the emergence of distributed graph-computing (DGC) frameworks. In these frameworks, the partitioning strategy plays an important role since it drives the communication ...
Towards Scale-out Capability on Social Graphs
Haichuan Shang, Xiang Zhao, Uday Kiran, Masaru Kitsuregawa
Pages: 253-262
DOI: 10.1145/2806416.2806420

The development of cloud storage and computing has facilitated the rise of various big data applications. As a representative high performance computing (HPC) workload, graph processing is becoming a part of cloud computing. However, scalable computing ...
Identifying Top-k Structural Hole Spanners in Large-Scale Social Networks
Mojtaba Rezvani, Weifa Liang, Wenzheng Xu, Chengfei Liu
Pages: 263-272
DOI: 10.1145/2806416.2806431

Recent studies have shown that in social networks, users who bridge different communities, known as structural hole spanners, have great potentials to acquire available resources from these communities and gain access to multiple sources of information ...
Scalable Facility Location for Massive Graphs on Pregel-like Systems
Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, Mauro Sozio
Pages: 273-282
DOI: 10.1145/2806416.2806508

We propose a new scalable algorithm for the facility-location problem. We study the graph setting, where the cost of serving a client from a facility is represented by the shortest-path distance on a graph. This setting is applicable to various ...
SESSION: Session 2B: Retrieval Algorithms
Session details: Session 2B: Retrieval Algorithms
Guido Zuccon
DOI: 10.1145/3252285
Rank by Time or by Relevance?: Revisiting Email Search
David Carmel, Guy Halawi, Liane Lewin-Eytan, Yoelle Maarek, Ariel Raviv
Pages: 283-292
DOI: 10.1145/2806416.2806471

With Web mail services offering larger and larger storage capacity, most users do not feel the need to systematically delete messages anymore and inboxes keep growing. It is quite surprising that in spite of the huge progress of relevance ranking in ...
On the Cost of Extracting Proximity Features for Term-Dependency Models
Xiaolu Lu, Alistair Moffat, J. Shane Culpepper
Pages: 293-302
DOI: 10.1145/2806416.2806467

Sophisticated ranking mechanisms make use of term dependency features in order to compute similarity scores for documents. These features often include exact phrase occurrences, and term proximity estimates. Both cases build on the intuition that if ...
An Optimization Framework for Merging Multiple Result Lists
Chia-Jung Lee, Qingyao Ai, W. Bruce Croft, Daniel Sheldon
Pages: 303-312
DOI: 10.1145/2806416.2806489

Developing effective methods for fusing multiple ranked lists of documents is crucial to many applications. Federated web search, for instance, has become a common practice where a query is issued to different verticals and a single ranked list of blended ...
Searching and Stopping: An Analysis of Stopping Rules and Strategies
David Maxwell, Leif Azzopardi, Kalervo Järvelin, Heikki Keskustalo
Pages: 313-322
DOI: 10.1145/2806416.2806476

Searching naturally involves stopping points, both at a query level (how far down the ranked list should I go?) and at a session level (how many queries should I issue?). Understanding when searchers stop has been of much interest to the ...
SESSION: Session 2C: Text Analysis
Session details: Session 2C: Text Analysis
Krisztian Balog
DOI: 10.1145/3252286
Automated News Suggestions for Populating Wikipedia Entity Pages
Besnik Fetahu, Katja Markert, Avishek Anand
Pages: 323-332
DOI: 10.1145/2806416.2806531

Wikipedia entity pages are a valuable source of information for direct consumption and for knowledge-base construction, update and maintenance. Facts in these entity pages are typically supported by references. Recent studies show that as much as 20% ...
Mining Coordinated Intent Representation for Entity Search and Recommendation
Huizhong Duan, ChengXiang Zhai
Pages: 333-342
DOI: 10.1145/2806416.2806557

We study the problem of learning query intent representation for an entity search task such as product retrieval, where a user would use a keyword query to retrieve entities based on their structured attribute value descriptions. Existing intent representation ...
Sentiment Extraction by Leveraging Aspect-Opinion Association Structure
Li Zhao, Minlie Huang, Jiashen Sun, Hengliang Luo, Xiankai Yang, Xiaoyan Zhu
Pages: 343-352
DOI: 10.1145/2806416.2806525

Sentiment extraction aims to extract and group aspect and opinion words from online reviews. Previous works usually extract aspect and opinion words by leveraging association between a single pair of aspect and opinion ...
Leveraging Joint Interactions for Credibility Analysis in News Communities
Subhabrata Mukherjee, Gerhard Weikum
Pages: 353-362
DOI: 10.1145/2806416.2806537

Media seems to have become more partisan, often providing a biased coverage of news catering to the interest of specific groups. It is therefore essential to identify credible information content that provides an objective narrative of an event. ...
SESSION: Session 2D: Clustering
Session details: Session 2D: Clustering
Ravi Kumar
DOI: 10.1145/3252287
Clustering-based Active Learning on Sensor Type Classification in Buildings
Dezhi Hong, Hongning Wang, Kamin Whitehouse
Pages: 363-372
DOI: 10.1145/2806416.2806574

Commercial and industrial buildings account for a considerable portion of all energy consumed in the U.S., and thus reducing this energy consumption is a national grand challenge. Based on the large deployment of sensors in modern commercial buildings, ...
gSparsify: Graph Motif Based Sparsification for Graph Clustering
Peixiang Zhao
Pages: 373-382
DOI: 10.1145/2806416.2806543

Graph clustering is a fundamental problem that partitions vertices of a graph into clusters with an objective to optimize the intuitive notions of intra-cluster density and intercluster sparsity. In many real-world applications, however, ...
Incomplete Multi-view Clustering via Subspace Learning
Qiyue Yin, Shu Wu, Liang Wang
Pages: 383-392
DOI: 10.1145/2806416.2806526

Multi-view clustering, which explores complementary information between multiple distinct feature sets for better clustering, has a wide range of applications, e.g., knowledge management and information retrieval. Traditional multi-view clustering methods ...
Robust Subspace Clustering via Tighter Rank Approximation
Zhao Kang, Chong Peng, Qiang Cheng
Pages: 393-401
DOI: 10.1145/2806416.2806506

Matrix rank minimization problem is in general NP-hard. The nuclear norm is used to substitute the rank function in many recent studies. Nevertheless, the nuclear norm approximation adds all singular values together and the approximation error may depend ...
SESSION: Session 2E: Users and Predictions
Session details: Session 2E: Users and Predictions
James Caverlee
DOI: 10.1145/3252288
Interactive User Group Analysis
Behrooz Omidvar-Tehrani, Sihem Amer-Yahia, Alexandre Termier
Pages: 403-412
DOI: 10.1145/2806416.2806519

User data is becoming increasingly available in multiple domains ranging from phone usage traces to data on the social Web. The analysis of user data is appealing to scientists who work on population studies, recommendations, and large-scale data analytics. ...
Viewability Prediction for Online Display Ads
Chong Wang, Achir Kalra, Cristian Borcea, Yi Chen
Pages: 413-422
DOI: 10.1145/2806416.2806536

As a massive industry, display advertising delivers advertisers' marketing messages to attract customers through graphic banners on webpages. Advertisers are charged by ad serving, where their ads are shown in web pages. However, recent studies show ...
10 Bits of Surprise: Detecting Malicious Users with Minimum Information
Reza Zafarani, Huan Liu
Pages: 423-431
DOI: 10.1145/2806416.2806535

Malicious users are a threat to many sites and defending against them demands innovative countermeasures. When malicious users join sites, they provide limited information about themselves. With this limited information, sites can find it difficult to ...
MAPer: A Multi-scale Adaptive Personalized Model for Temporal Human Behavior Prediction
Sarah Masud Preum, John A. Stankovic, Yanjun Qi
Pages: 433-442
DOI: 10.1145/2806416.2806562

The primary objective of this research is to develop a simple and interpretable predictive framework to perform temporal modeling of individual user's behavior traits based on each person's past observed traits/behavior. Individual-level human behavior ...
SESSION: Session 2F: Heterogeneous Networks
Session details: Session 2F: Heterogeneous Networks
Michael Quan Z. Sheng
DOI: 10.1145/3252289
Classification with Active Learning and Meta-Paths in Heterogeneous Information Networks
Chang Wan, Xiang Li, Ben Kao, Xiao Yu, Quanquan Gu, David Cheung, Jiawei Han
Pages: 443-452
DOI: 10.1145/2806416.2806507

A heterogeneous information network (HIN) is used to model objects of different types and their relationships. Meta-paths are sequences of object types. They are used to represent complex relationships between objects beyond what links in a homogeneous ...
Semantic Path based Personalized Recommendation on Weighted Heterogeneous Information Networks
Chuan Shi, Zhiqiang Zhang, Ping Luo, Philip S. Yu, Yading Yue, Bin Wu
Pages: 453-462
DOI: 10.1145/2806416.2806528

Recently heterogeneous information network (HIN) analysis has attracted a lot of attention, and many data mining tasks have been exploited on HIN. As an important data mining task, recommender system includes a lot of object types (e.g., users, movies, ...
A Graph-based Recommendation across Heterogeneous Domains
Deqing Yang, Jingrui He, Huazheng Qin, Yanghua Xiao, Wei Wang
Pages: 463-472
DOI: 10.1145/2806416.2806523

Given the users from a social network site, who have been tagged with a set of terms, how can we recommend the movies tagged with a completely different set of terms hosted by another website? Given the users from a website dedicated to Type I and Type ...
Query Relaxation across Heterogeneous Data Sources
Verena Kantere, George Orfanoudakis, Anastasios Kementsietsidis, Timos Sellis
Pages: 473-482
DOI: 10.1145/2806416.2806529

The fundamental assumption for query rewriting in heterogeneous environments is that the mappings used for the rewriting are complete, i.e., every relation and attribute mentioned in the query is associated, through mappings, to relations ...
SESSION: Session 3A: Veracity
Session details: Session 3A: Veracity
Laure Berti-Équille
DOI: 10.1145/3252290
Approximated Summarization of Data Provenance
Eleanor Ainy, Pierre Bourhis, Susan B. Davidson, Daniel Deutch, Tova Milo
Pages: 483-492
DOI: 10.1145/2806416.2806429

Many modern applications involve collecting large amounts of data from multiple sources, and then aggregating and manipulating it in intricate ways. The complexity of such applications, combined with the size of the collected data, makes it difficult ...
An Integrated Bayesian Approach for Effective Multi-Truth Discovery
Xianzhi Wang, Quan Z. Sheng, Xiu Susie Fang, Lina Yao, Xiaofei Xu, Xue Li
Pages: 493-502
DOI: 10.1145/2806416.2806443

Truth-finding is the fundamental technique for corroborating reports from multiple sources in both data integration and collective intelligent applications. Traditional truth-finding methods assume a single true value for each data item and therefore ...
Approximate Truth Discovery via Problem Scale Reduction
Xianzhi Wang, Quan Z. Sheng, Xiu Susie Fang, Xue Li, Xiaofei Xu, Lina Yao
Pages: 503-512
DOI: 10.1145/2806416.2806444

Many real-world applications rely on multiple data sources to provide information on their interested items. Due to the noises and uncertainty in data, given a specific item, the information from different sources may conflict. To make reliable decisions ...
expand
SESSION: Session 3B: Social Networks 1
Session details: Session 3B: Social Networks 1
Niloy Ganguly
doi>10.1145/3252291
Organic or Organized?: Exploring URL Sharing Behavior
Cheng Cao, James Caverlee, Kyumin Lee, Hancheng Ge, Jinwook Chung
Pages: 513-522
doi>10.1145/2806416.2806572

URL sharing has become one of the most popular activities on many online social media platforms. Shared URLs are an avenue to interesting news articles, memes, photos, as well as low-quality content like spam, promotional ads, and phishing sites. While ...
Mining Brokers in Dynamic Social Networks
Chonggang Song, Wynne Hsu, Mong Li Lee
Pages: 523-532
doi>10.1145/2806416.2806468

The theory of brokerage in sociology suggests if contacts between two parties are enabled through a third party, the latter occupies a strategic position of controlling information flows. Such individuals are called brokers and they play a key ...
Who Will You "@"?
Yeyun Gong, Qi Zhang, Xuyang Sun, Xuanjing Huang
Pages: 533-542
doi>10.1145/2806416.2806458

In Twitter-like social networking services, people can use the "@" symbol to mention other users in tweets and send them a message or link to their profiles. In recent years, social media services are rapidly growing with thousands of millions of users ...
SESSION: Session 3C: Query Completion
Session details: Session 3C: Query Completion
Maarten de Rijke
doi>10.1145/3252292
Characterizing and Predicting Voice Query Reformulation
Ahmed Hassan Awadallah, Ranjitha Gurunath Kulkarni, Umut Ozertem, Rosie Jones
Pages: 543-552
doi>10.1145/2806416.2806491

Voice interactions are becoming more prevalent as the usage of voice search and intelligent assistants gains more popularity. Users frequently reformulate their requests in hope of getting better results either because the system was unable to recognize ...
A Hierarchical Recurrent Encoder-Decoder for Generative Context-Aware Query Suggestion
Alessandro Sordoni, Yoshua Bengio, Hossein Vahabi, Christina Lioma, Jakob Grue Simonsen, Jian-Yun Nie
Pages: 553-562
doi>10.1145/2806416.2806493

Users may strive to formulate an adequate textual query for their information need. Search engines assist the users by presenting query suggestions. To preserve the original search intent, suggestions should be context-aware and account for the previous ...
A Network-Aware Approach for Searching As-You-Type in Social Media
Paul Lagrée, Bogdan Cautis, Hossein Vahabi
Pages: 563-572
doi>10.1145/2806416.2806435

We present in this paper a novel approach for as-you-type top-k keyword search over social media. We adopt a natural "network-aware" interpretation for information relevance, by which information produced by users who are closer to the seeker ...
SESSION: Session 3D: Microblogs
Session details: Session 3D: Microblogs
Antoine Doucet
doi>10.1145/3252293
Improving Microblog Retrieval with Feedback Entity Model
Feifan Fan, Runwei Qiang, Chao Lv, Jianwu Yang
Pages: 573-582
doi>10.1145/2806416.2806461

When searching microblogs, users prefer queries that include terms representing specific entities. Meanwhile, tweets, though limited to 140 characters, often contain one or more entities. Entities, as an important ...
Extracting Situational Information from Microblogs during Disaster Events: a Classification-Summarization Approach
Koustav Rudra, Subham Ghosh, Niloy Ganguly, Pawan Goyal, Saptarshi Ghosh
Pages: 583-592
doi>10.1145/2806416.2806485

Microblogging sites like Twitter have become important sources of real-time information during disaster events. A significant amount of valuable situational information is available in these sites; however, this information is immersed among hundreds ...
Profession-Based Person Search in Microblogs: Using Seed Sets to Find Journalists
Mossaab Bagdouri, Douglas W. Oard
Pages: 593-602
doi>10.1145/2806416.2806466

We introduce the problem of searching for professionals in microblogging platforms. We describe a study of how a group of professional journalists with some common characteristics (e.g., works in a specific language, belongs to certain region, or specializes ...
SESSION: Session 3E: Graph-Based Analysis
Session details: Session 3E: Graph-Based Analysis
Lina Yao
doi>10.1145/3252294
Learning Entity Types from Query Logs via Graph-Based Modeling
Jingyuan Zhang, Luo Jie, Altaf Rahman, Sihong Xie, Yi Chang, Philip S. Yu
Pages: 603-612
doi>10.1145/2806416.2806498

Entities (e.g., person, movie or place) play an important role in real-world applications and learning entity types has attracted much attention in recent years. Most conventional automatic techniques use large corpora, such as news articles, to learn ...
Collaborative Prediction for Multi-entity Interaction With Hierarchical Representation
Qiang Liu, Shu Wu, Liang Wang
Pages: 613-622
doi>10.1145/2806416.2806530

With the rapid growth of Internet applications, there are more and more entities in interaction scenarios, and thus collaborative prediction for multi-entity interaction is becoming a significant problem. The state-of-the-art methods, e.g., tensor factorization ...
Learning to Represent Knowledge Graphs with Gaussian Embedding
Shizhu He, Kang Liu, Guoliang Ji, Jun Zhao
Pages: 623-632
doi>10.1145/2806416.2806502

The representation of a knowledge graph (KG) in a latent space recently has attracted more and more attention. To this end, some proposed models (e.g., TransE) embed entities and relations of a KG into a "point" vector space by optimizing ...
SESSION: Session 3F: Classification 1
Session details: Session 3F: Classification 1
Alexandra Uitdenbogerd
doi>10.1145/3252295
Associative Classification with Statistically Significant Positive and Negative Rules
Jundong Li, Osmar Zaiane
Pages: 633-642
doi>10.1145/2806416.2806524

Rule-based classifiers have proven popular for building many decision support systems, such as those for medical diagnosis and financial fraud detection. One major advantage is that the models are human-understandable and can be edited. Associative classifiers, ...
A Min-Max Optimization Framework For Online Graph Classification
Peng Yang, Peilin Zhao
Pages: 643-652
doi>10.1145/2806416.2806548

Traditional online learning for graph node classification adapts graph regularization into ridge regression, which may not be suitable when data is adversarially generated. To solve this issue, we propose a more general min-max optimization framework ...
An Inference Approach to Basic Level of Categorization
Zhongyuan Wang, Haixun Wang, Ji-Rong Wen, Yanghua Xiao
Pages: 653-662
doi>10.1145/2806416.2806533

Humans understand the world by classifying objects into an appropriate level of categories. This process is often automatic and subconscious. Psychologists and linguists call it Basic-level Categorization (BLC). BLC can benefit many applications ...
SESSION: Keynote Address II
Session details: Keynote Address II
James Bailey
doi>10.1145/3252296
Making Sense of Spatial Trajectories
Xiaofang Zhou, Kai Zheng, Hoyoung Jueng, Jiajie Xu, Shazia Sadiq
Pages: 671-672
doi>10.1145/2806416.2806418

Spatial trajectory data is widely available today. Over a sustained period of time, trajectory data has been collected from numerous GPS devices, smartphones, sensors and social media applications. Daily increases of real-time trajectory data have also ...
SESSION: Session 4A: Location-Based Services
Session details: Session 4A: Location-Based Services
Timos Sellis
doi>10.1145/3252297
ReverseCloak: Protecting Multi-level Location Privacy over Road Networks
Chao Li, Balaji Palanisamy
Pages: 673-682
doi>10.1145/2806416.2806437

With advances in sensing and positioning technology, fueled by the ubiquitous deployment of wireless networks, location-aware computing has become a fundamental model for offering a wide range of life enhancing services. However, the ability to locate ...
GLUE: a Parameter-Tuning-Free Map Updating System
Hao Wu, Chuanchuan Tu, Weiwei Sun, Baihua Zheng, Hao Su, Wei Wang
Pages: 683-692
doi>10.1145/2806416.2806425

Map data are widely used in mobile services, but most maps might not be complete. Updating the map automatically is an important problem because road networks are frequently changed with the development of the city. This paper studies the problem of ...
A Cost-based Method for Location-Aware Publish/Subscribe Services
Minghe Yu, Guoliang Li, Jianhua Feng
Pages: 693-702
doi>10.1145/2806416.2806427

Location-based services have attracted significant attention from both industry and academia, thanks to modern smartphones and the mobile Internet. To provide users with gratifications, location-aware publish/subscribe has recently been proposed, which ...
Probabilistic Forecasts of Bike-Sharing Systems for Journey Planning
Nicolas Gast, Guillaume Massonnet, Daniel Reijsbergen, Mirco Tribastone
Pages: 703-712
doi>10.1145/2806416.2806569

We study the problem of making forecasts about the future availability of bicycles in stations of a bike-sharing system (BSS). This is relevant in order to make recommendations guaranteeing that the probability that a user will be able to make a journey ...
SESSION: Session 4B: Query Explanation
Session details: Session 4B: Query Explanation
Sebastian Link
doi>10.1145/3252298
Efficient Computation of Polynomial Explanations of Why-Not Questions
Nicole Bidoit, Melanie Herschel, Aikaterini Tzompanaki
Pages: 713-722
doi>10.1145/2806416.2806426

Answering a Why-Not question consists in explaining why a query result does not contain some expected data, called missing answers. This paper focuses on processing Why-Not questions in a query-based approach that identifies the culprit query components. ...
Interruption-Sensitive Empty Result Feedback: Rethinking the Visual Query Feedback Paradigm for Semistructured Data
Sourav S Bhowmick, Curtis Dyreson, Byron Choi, Min-Hwee Ang
Pages: 723-732
doi>10.1145/2806416.2806432

The usability of visual querying schemes for tree and graph-structured data can be greatly enhanced by providing feedback during query construction, but feedback at inopportune times can hamper query construction. In this paper, we rethink the traditional ...
Implementing Query Completeness Reasoning
Werner Nutt, Sergey Paramonov, Ognjen Savkovic
Pages: 733-742
doi>10.1145/2806416.2806439

Data completeness is commonly regarded as one of the key aspects of data quality. With this paper we make two main contributions: (i) we develop techniques to reason about the completeness of a query answer over a partially complete database, taking ...
Towards Scalable and Complete Query Explanation with OWL 2 EL Ontologies
Zhe Wang, Mahsa Chitsaz, Kewen Wang, Jianfeng Du
Pages: 743-752
doi>10.1145/2806416.2806547

Ontology-mediated data access and management systems are rapidly emerging. Besides standard query answering, there is also a need for such systems to be coupled with explanation facilities, in particular to explain missing query answers (i.e. desired ...
SESSION: Session 4C: Crowds
Session details: Session 4C: Crowds
Falk Scholer
doi>10.1145/3252299
Crowdsourcing Pareto-Optimal Object Finding By Pairwise Comparisons
Abolfazl Asudeh, Gensheng Zhang, Naeemul Hassan, Chengkai Li, Gergely V. Zaruba
Pages: 753-762
doi>10.1145/2806416.2806451

This is the first study of crowdsourcing Pareto-optimal object finding over partial orders and by pairwise comparisons, which has applications in public opinion collection, group decision making, and information exploration. Departing from prior studies ...
Practical Aspects of Sensitivity in Online Experimentation with User Engagement Metrics
Alexey Drutsa, Anna Ufliand, Gleb Gusev
Pages: 763-772
doi>10.1145/2806416.2806496

Online controlled experiments, e.g., A/B testing, are the state-of-the-art approach used by modern Internet companies to improve their services based on data-driven decisions. The most challenging problem is to define an appropriate online metric of user ...
Generalized Team Draft Interleaving
Eugene Kharitonov, Craig Macdonald, Pavel Serdyukov, Iadh Ounis
Pages: 773-782
doi>10.1145/2806416.2806477

Interleaving is an online evaluation method that compares two ranking functions by mixing their results and interpreting the users' click feedback. An important property of an interleaving method is its sensitivity, i.e. the ability to obtain reliable ...
Exploiting Document Content for Efficient Aggregation of Crowdsourcing Votes
Martin Davtyan, Carsten Eickhoff, Thomas Hofmann
Pages: 783-790
doi>10.1145/2806416.2806460

The use of crowdsourcing for document relevance assessment has been found to be a viable alternative to corpus annotation by highly trained experts. The question of quality control is a recurring challenge that is often addressed by aggregating multiple ...
SESSION: Session 4D: Optimization
Session details: Session 4D: Optimization
Xuan Vinh Nguyen
doi>10.1145/3252300
L2Knng: Fast Exact K-Nearest Neighbor Graph Construction with L2-Norm Pruning
David C. Anastasiu, George Karypis
Pages: 791-800
doi>10.1145/2806416.2806534

The k-nearest neighbor graph is often used as a building block in information retrieval, clustering, online advertising, and recommender systems algorithms. The complexity of constructing the exact k-nearest neighbor graph is quadratic on the ...
Lingo: Linearized Grassmannian Optimization for Nuclear Norm Minimization
Qian Li, Wenjia Niu, Gang Li, Yanan Cao, Jianlong Tan, Li Guo
Pages: 801-809
doi>10.1145/2806416.2806532

As a popular heuristic for the matrix rank minimization problem, nuclear norm minimization attracts intensive research attention. Matrix-factorization-based algorithms can reduce the expensive computational cost of SVD for nuclear norm ...
Deep Collaborative Filtering via Marginalized Denoising Auto-encoder
Sheng Li, Jaya Kawale, Yun Fu
Pages: 811-820
doi>10.1145/2806416.2806527

Collaborative filtering (CF) has been widely employed within recommender systems to solve many real-world problems. Learning effective latent factors plays the most important role in collaborative filtering. Traditional CF methods based upon matrix factorization ...
Improving Latent Factor Models via Personalized Feature Projection for One Class Recommendation
Tong Zhao, Julian McAuley, Irwin King
Pages: 821-830
doi>10.1145/2806416.2806511

Latent Factor models, which transform both users and items into the same latent feature space, are one of the most successful and ubiquitous models in recommender systems. Most existing models in this paradigm define both users' and items' latent factors ...
SESSION: Session 4E: Social Networks 2
Session details: Session 4E: Social Networks 2
Hongzhi Yin
doi>10.1145/3252301
Node Immunization over Infectious Period
Chonggang Song, Wynne Hsu, Mong Li Lee
Pages: 831-840
doi>10.1145/2806416.2806522

Locating nodes to immunize in computer/social networks to control the spread of virus or rumors has become an important problem. In real world contagions, nodes may get infected by external sources when the propagation is underway. While most studies ...
Enterprise Social Link Recommendation
Jiawei Zhang, Yuanhua Lv, Philip Yu
Pages: 841-850
doi>10.1145/2806416.2806549

Many companies have started to use Enterprise Social Networks (ESNs), such as Yammer, to facilitate collaboration and communication amongst their employees in the business context. Social link recommendation, which finds and suggests whom one wants to ...
Exploiting Game Theoretic Analysis for Link Recommendation in Social Networks
Tong Zhao, H. Vicky Zhao, Irwin King
Pages: 851-860
doi>10.1145/2806416.2806510

The popularity of Online Social Networks (OSNs) has attracted great research interest in different fields. In Economics, researchers use game theory to analyze the mechanism of network formation, which is called the Network Formation Game. While in Computer ...
Extracting Interest Tags for Non-famous Users in Social Network
Wei He, Hongyan Liu, Jun He, Shu Tang, Xiaoyong Du
Pages: 861-870
doi>10.1145/2806416.2806514

Inferring interests of users in social network is important for many applications such as personalized search, recommender systems and online advertising. Most previous studies inferred users' interests based on text posted in social network, which is ...
SESSION: Session 4F: Matrix Factorization
Session details: Session 4F: Matrix Factorization
Jeffrey Chan
doi>10.1145/3252302
Robust Capped Norm Nonnegative Matrix Factorization: Capped Norm NMF
Hongchang Gao, Feiping Nie, Weidong Cai, Heng Huang
Pages: 871-880
doi>10.1145/2806416.2806568

As an important matrix factorization model, Nonnegative Matrix Factorization (NMF) has been widely used in information retrieval and data mining research. Standard Nonnegative Matrix Factorization is known to use the Frobenius norm to calculate the residual, ...
MF-Tree: Matrix Factorization Tree for Large Multi-Class Learning
Lei Liu, Pang-Ning Tan, Xi Liu
Pages: 881-890
doi>10.1145/2806416.2806540

Many big data applications require accurate classification of objects into one of possibly thousands or millions of categories. Such classification tasks are challenging due to issues such as class imbalance, high testing cost, and model interpretability ...
GraRep: Learning Graph Representations with Global Structural Information
Shaosheng Cao, Wei Lu, Qiongkai Xu
Pages: 891-900
doi>10.1145/2806416.2806512

In this paper, we present GraRep, a novel model for learning vertex representations of weighted graphs. This model learns low dimensional vectors to represent vertices appearing in a graph and, unlike existing work, integrates global structural information ...
Context-Adaptive Matrix Factorization for Multi-Context Recommendation
Tong Man, Huawei Shen, Junming Huang, Xueqi Cheng
Pages: 901-910
doi>10.1145/2806416.2806503

Data sparsity is a long-standing challenge for recommender systems based on collaborative filtering. A promising solution for this problem is multi-context recommendation, i.e., leveraging users' explicit or implicit feedback from multiple contexts. ...
SESSION: Session 5A: Trips and Trajectories
Session details: Session 5A: Trips and Trajectories
Iadh Ounis
doi>10.1145/3252303
Personalized Trip Recommendation with POI Availability and Uncertain Traveling Time
Chenyi Zhang, Hongwei Liang, Ke Wang, Jianling Sun
Pages: 911-920
doi>10.1145/2806416.2806558

As location-based social network (LBSN) services become increasingly popular, trip recommendation that recommends a sequence of points of interest (POIs) to visit for a user emerges as one of many important applications of LBSNs. Personalized ...
Range Search on Uncertain Trajectories
Liming Zhan, Ying Zhang, Wenjie Zhang, Xiaoyang Wang, Xuemin Lin
Pages: 921-930
doi>10.1145/2806416.2806430

The range search on trajectories is fundamental in a wide spectrum of applications such as environment monitoring and location based services. In practice, a large portion of spatio-temporal data in the above applications is generated with low sampling ...
Efficient Computation of Trips with Friends and Families
Tanzima Hashem, Sukarna Barua, Mohammed Eunus Ali, Lars Kulik, Egemen Tanin
Pages: 931-940
doi>10.1145/2806416.2806433

A group of friends located at their working places may want to plan a trip to visit a shopping center, have dinner at a restaurant, watch a movie at a theater, and then finally return to their homes with the minimum total trip distance. For a group of ...
Sampling Big Trajectory Data
Yanhua Li, Chi-Yin Chow, Ke Deng, Mingxuan Yuan, Jia Zeng, Jia-Dong Zhang, Qiang Yang, Zhi-Li Zhang
Pages: 941-950
doi>10.1145/2806416.2806422

The increasing prevalence of sensors and mobile devices has led to an explosive increase of the scale of spatio-temporal data in the form of trajectories. A trajectory aggregate query, as a fundamental functionality for measuring trajectory data, aims ...
SESSION: Session 5B: Retrieval Enhancements 1
Session details: Session 5B: Retrieval Enhancements 1
J. Shane Culpepper
doi>10.1145/3252304
EsdRank: Connecting Query and Documents through External Semi-Structured Data
Chenyan Xiong, Jamie Callan
Pages: 951-960
doi>10.1145/2806416.2806456

This paper presents EsdRank, a new technique for improving ranking using external semi-structured data such as controlled vocabularies and knowledge bases. EsdRank treats vocabularies, terms and entities from external data, as objects connecting query ...
A Probabilistic Framework for Temporal User Modeling on Microblogs
Jitao Sang, Dongyuan Lu, Changsheng Xu
Pages: 961-970
doi>10.1145/2806416.2806470

In social media, users have contributed enormous amounts of behavior data online, which can be leveraged for user modeling and personalized services. Temporal user modeling, which incorporates the timestamps of these behavior data and understands users' interest ...
Deriving Intensional Descriptions for Web Services
Maria Koutraki, Dan Vodislav, Nicoleta Preda
Pages: 971-980
doi>10.1145/2806416.2806447

Many data providers make their data available through Web service APIs. In order to unleash the potential of these sources for intelligent applications, the data has to be combined across different APIs. However, due to the heterogeneity of schemas, ...
An Optimization Framework for Propagation of Query-Document Features by Query Similarity Functions
Maxim Zhukovskiy, Tsimafei Khatkevich, Gleb Gusev, Pavel Serdyukov
Pages: 981-990
doi>10.1145/2806416.2806487

It is well known that a great number of query-document features which significantly improve the quality of ranking for popular queries, however, do not provide any benefit for new or rare queries since there is typically not enough data associated with ...
SESSION: Session 5C: Privacy
Session details: Session 5C: Privacy
James Thom
doi>10.1145/3252305
Rank Consistency based Multi-View Learning: A Privacy-Preserving Approach
Han-Jia Ye, De-Chuan Zhan, Yuan Miao, Yuan Jiang, Zhi-Hua Zhou
Pages: 991-1000
doi>10.1145/2806416.2806552

Complex media objects are often described by multi-view feature groups collected from diverse domains or information channels. Multi-view learning, which attempts to exploit the relationship among multiple views to improve learning performance, has drawn ...
Differentially Private Histogram Publication for Dynamic Datasets: an Adaptive Sampling Approach
Haoran Li, Li Xiong, Xiaoqian Jiang, Jinfei Liu
Pages: 1001-1010
doi>10.1145/2806416.2806441

Differential privacy has recently become a de facto standard for private statistical data release. Many algorithms have been proposed to generate differentially private histograms or synthetic data. However, most of them focus on "one-time" release of ...
WaveCluster with Differential Privacy
Ling Chen, Ting Yu, Rada Chirkova
Pages: 1011-1020
doi>10.1145/2806416.2806546

WaveCluster is an important family of grid-based clustering algorithms that are capable of finding clusters of arbitrary shapes. In this paper, we investigate techniques to perform WaveCluster while ensuring differential privacy. Our goal is to develop ...
Process-Driven Data Privacy
Weiyi Xia, Murat Kantarcioglu, Zhiyu Wan, Raymond Heatherly, Yevgeniy Vorobeychik, Bradley Malin
Pages: 1021-1030
doi>10.1145/2806416.2806580

The quantity of personal data gathered by service providers via our daily activities continues to grow at a rapid pace. The sharing, and the subsequent analysis of, such data can support a wide range of activities, but concerns around privacy often prompt ...
SESSION: Session 5D: Data Streams
Session details: Session 5D: Data Streams
Anthony Wirth
doi>10.1145/3252306
Unsupervised Feature Selection on Data Streams
Hao Huang, Shinjae Yoo, Shiva Prasad Kasiviswanathan
Pages: 1031-1040
doi>10.1145/2806416.2806521

Massive data streams are continuously being generated from sources such as social media, broadcast news, etc., and typically these datapoints lie in high-dimensional spaces (such as the vocabulary space of a language). Timely and accurate feature subset ...
Unsupervised Streaming Feature Selection in Social Media
Jundong Li, Xia Hu, Jiliang Tang, Huan Liu
Pages: 1041-1050
doi>10.1145/2806416.2806501

The explosive growth of social media sites brings about massive amounts of high-dimensional data. Feature selection is effective in preparing high-dimensional data for data analytics. The characteristics of social media present novel challenges for feature ...
Weighted Similarity Estimation in Data Streams
Konstantin Kutzkov, Mohamed Ahmed, Sofia Nikitaki
Pages: 1051-1060
doi>10.1145/2806416.2806515

Similarity computation between pairs of objects is often a bottleneck in many applications that have to deal with massive volumes of data. Motivated by applications such as collaborative filtering in large-scale recommender systems, and influence probabilities ...
Private Analysis of Infinite Data Streams via Retroactive Grouping
Rui Chen, Yilin Shen, Hongxia Jin
Pages: 1061-1070
doi>10.1145/2806416.2806454

With the rapid advances in hardware technology, data streams are being generated daily in large volumes, enabling a wide range of real-time analytical tasks. Yet data streams from many sources are inherently sensitive, and thus providing continuous privacy ...
SESSION: Session 5E: Classification 2
Session details: Session 5E: Classification 2
Ping Luo
doi>10.1145/3252307
Parallel Lazy Semi-Naive Bayes Strategies for Effective and Efficient Document Classification
Felipe Viegas, Marcos André Gonçalves, Wellington Martins, Leonardo Rocha
Pages: 1071-1080
doi>10.1145/2806416.2806565

Automatic Document Classification (ADC) is the basis of many important applications such as spam filtering and content organization. Naive Bayes (NB) approaches are a widely used classification paradigm, due to their simplicity, efficiency, absence of ...
A Novel Class Noise Estimation Method and Application in Classification
Lin Gui, Qin Lu, Ruifeng Xu, Minglei Li, Qikang Wei
Pages: 1081-1090
doi>10.1145/2806416.2806554

Noise in class labels of any training set can lead to poor classification results no matter what machine learning method is used. In this paper, we first present the problem of binary classification in the presence of random noise on the class labels, ...
Learning Task Grouping using Supervised Task Space Partitioning in Lifelong Multitask Learning
Meenakshi Mishra, Jun Huan
Pages: 1091-1100
doi>10.1145/2806416.2806570

Lifelong multitask learning is a multitask learning framework in which a learning agent faces the tasks that need to be learnt in an online manner. Lifelong multitask learning framework may be applied to a variety of applications such as image annotation, ...
KSGM: Keynode-driven Scalable Graph Matching
Xilun Chen, K. Selçuk Candan, Maria Luisa Sapino, Paulo Shakarian
Pages: 1101-1110
doi>10.1145/2806416.2806577

Understanding how a given pair of graphs align with each other (also known as the graph matching problem) is a critical task in many search, classification, and analysis applications. Unfortunately, the problem of maximum common subgraph isomorphism ...
SESSION: Session 5F: Sentiment and Content Analysis
Session details: Session 5F: Sentiment and Content Analysis
Ke Deng
doi>10.1145/3252308
Protecting Your Children from Inappropriate Content in Mobile Apps: An Automatic Maturity Rating Framework
Bing Hu, Bin Liu, Neil Zhenqiang Gong, Deguang Kong, Hongxia Jin
Pages: 1111-1120
doi>10.1145/2806416.2806579

Mobile applications (Apps) could expose children or adolescents to mature themes such as sexual content, violence and drug use, which results in an inappropriate security and privacy risk for them. Therefore, mobile platforms provide rating policies ...
The Role of Query Sessions in Interpreting Compound Noun Phrases
Marius Pasca
Pages: 1121-1129
doi>10.1145/2806416.2806571

The meaning of compound noun phrases can be approximated in the form of lexical interpretations extracted from text. The interpretations hint at the role that modifiers play relative to heads within the noun phrases. In a study examining the role of ...
Deep Semantic Frame-Based Deceptive Opinion Spam Analysis
Seongsoon Kim, Hyeokyoon Chang, Seongwoon Lee, Minhwan Yu, Jaewoo Kang
Pages: 1131-1140
doi>10.1145/2806416.2806551

User-generated content is becoming increasingly valuable to both individuals and businesses due to its usefulness and influence in e-commerce markets. As consumers rely more on such information, posting deceptive opinions, which can be deliberately used ...
Topic Modeling in Semantic Space with Keywords
Xiaojia Pu, Rong Jin, Gangshan Wu, Dingyi Han, Gui-Rong Xue
Pages: 1141-1150
doi>10.1145/2806416.2806584

A common and convenient way for a user to describe an information need is to provide a set of keywords. Therefore, techniques for understanding that need become crucial. In this paper, for an information need about a topic or category, we propose ...
SESSION: Session 6A: Time Series and Streams
Session details: Session 6A: Time Series and Streams
Jenny Xiuzhen Zhang
doi>10.1145/3252309
F1: Accelerating the Optimization of Aggregate Continuous Queries
Anatoli U. Shein, Panos K. Chrysanthis, Alexandros Labrinidis
Pages: 1151-1160
doi>10.1145/2806416.2806450

Data Stream Management Systems performing on-line analytics rely on the efficient execution of large numbers of Aggregate Continuous Queries (ACQs). The state-of-the-art WeaveShare optimizer uses the Weavability concept in order to selectively ...
Fast Distributed Correlation Discovery Over Streaming Time-Series Data
Tian Guo, Saket Sathe, Karl Aberer
Pages: 1161-1170
doi>10.1145/2806416.2806440

The dramatic rise of time-series data in a variety of contexts, such as social networks, mobile sensing, data centre monitoring, etc., has fuelled interest in obtaining real-time insights from such data using distributed stream processing systems. One ...
Time Series Analysis of Nursing Notes for Mortality Prediction via a State Transition Topic Model
Yohan Jo, Natasha Loghmanpour, Carolyn Penstein Rosé
Pages: 1171-1180
doi>10.1145/2806416.2806541

Accurate mortality prediction is an important task in intensive care units in order to channel prompt care to patients in the most critical condition and to reduce nurses' alarm fatigue. Nursing notes carry valuable information in this regard, but nothing ...
SESSION: Session 6B: Adaptive Learning
Session details: Session 6B: Adaptive Learning
Damiano Spina
doi>10.1145/3252310
Learning Relative Similarity from Data Streams: Active Online Learning Approaches
Shuji Hao, Peilin Zhao, Steven C.H. Hoi, Chunyan Miao
Pages: 1181-1190
doi>10.1145/2806416.2806464

Relative similarity learning, as an important learning scheme for information retrieval, aims to learn a bi-linear similarity function from a collection of labeled instance-pairs, and the learned function would assign a high similarity value for a similar ...
Ad Hoc Monitoring of Vocabulary Shifts over Time
Tom Kenter, Melvin Wevers, Pim Huijnen, Maarten de Rijke
Pages: 1191-1200
doi>10.1145/2806416.2806474

Word meanings change over time. Detecting shifts in meaning for particular words has been the focus of much research recently. We address the complementary problem of monitoring shifts in vocabulary over time. That is, given a small seed set of words, ...
Balancing Novelty and Salience: Adaptive Learning to Rank Entities for Timeline Summarization of High-impact Events
Tuan A. Tran, Claudia Niederee, Nattiya Kanhabua, Ujwal Gadiraju, Avishek Anand
Pages: 1201-1210
doi>10.1145/2806416.2806486

Long-running, high-impact events such as the Boston Marathon bombing often develop through many stages and involve a large number of entities in their unfolding. Timeline summarization of an event by key sentences eases story digestion, but does not ...
SESSION: Session 6C: Points-of-Interest
Session details: Session 6C: Points-of-Interest
Egemen Tanin
doi>10.1145/3252311
Location-Based Influence Maximization in Social Networks
Tao Zhou, Jiuxin Cao, Bo Liu, Shuai Xu, Ziqing Zhu, Junzhou Luo
Pages: 1211-1220
doi>10.1145/2806416.2806462

In this paper, we address product promotion in the O2O model and study location-based influence maximization on LBSN platforms. As offline consuming behavior exists under the O2O environment, the traditional online influence ...
Location and Time Aware Social Collaborative Retrieval for New Successive Point-of-Interest Recommendation
Wei Zhang, Jianyong Wang
Pages: 1221-1230
doi>10.1145/2806416.2806564

In location-based social networks (LBSNs), new successive point-of-interest (POI) recommendation is a newly formulated task which tries to regard the POI a user currently visits as his POI-related query and recommend new POIs the user has not visited ...
Where you Instagram?: Associating Your Instagram Photos with Points of Interest
Xutao Li, Tuan-Anh Nguyen Pham, Gao Cong, Quan Yuan, Xiao-Li Li, Shonali Krishnaswamy
Pages: 1231-1240
doi>10.1145/2806416.2806463

Instagram, an online photo-sharing platform, has gained increasing popularity. It allows users to take photos, apply digital filters and share them with friends instantaneously by using mobile devices. Instagram provides users with the functionality to ...
SESSION: Session 6D: Matrices
Session details: Session 6D: Matrices
Weidong Cai
doi>10.1145/3252312
Gradient-based Signatures for Efficient Similarity Search in Large-scale Multimedia Databases
Christian Beecks, Merih Seran Uysal, Judith Hermanns, Thomas Seidl
Pages: 1241-1250
doi>10.1145/2806416.2806459

With the continuous rise of multimedia, the question of how to access large-scale multimedia databases efficiently has become of crucial importance. Given a multimedia database comprising millions of multimedia objects, how to approximate the content-based ...
Cross-Modal Similarity Learning: A Low Rank Bilinear Formulation
Cuicui Kang, Shengcai Liao, Yonghao He, Jian Wang, Wenjia Niu, Shiming Xiang, Chunhong Pan
Pages: 1251-1260
doi>10.1145/2806416.2806469

The cross-media retrieval problem has received much attention in recent years due to the rapid increase of multimedia data on the Internet. A new approach to the problem has been raised which intends to match features of different modalities directly. ...
Efficient Sparse Matrix Multiplication on GPU for Large Social Network Analysis
Yong-Yeon Jo, Sang-Wook Kim, Duck-Ho Bae
Pages: 1261-1270
doi>10.1145/2806416.2806445

As a number of social network services have appeared online recently, there have been many attempts to analyze social networks for extracting valuable information. Most existing methods first represent a social network as a quite sparse adjacency matrix, ...
SESSION: Session 6E: Citation Networks
Session details: Session 6E: Citation Networks
Zhifeng Bao
doi>10.1145/3252313
The Role Of Citation Context In Predicting Long-Term Citation Profiles: An Experimental Study Based On A Massive Bibliographic Text Dataset
Mayank Singh, Vikas Patidar, Suhansanu Kumar, Tanmoy Chakraborty, Animesh Mukherjee, Pawan Goyal
Pages: 1271-1280
doi>10.1145/2806416.2806566

The impact and significance of a scientific publication is measured mostly by the number of citations it accumulates over the years. Early prediction of the citation profile of research articles is a significant as well as challenging problem. In this ...
Discovering Canonical Correlations between Topical and Topological Information in Document Networks
Yuan He, Cheng Wang, Changjun Jiang
Pages: 1281-1290
doi>10.1145/2806416.2806518

A document network is an intriguing kind of dataset that can provide both topical (textual content) and topological (relational link) information. A key point in viably modeling such datasets is to discover proper denominators beneath the two different ...
Chronological Citation Recommendation with Information-Need Shifting
Zhuoren Jiang, Xiaozhong Liu, Liangcai Gao
Pages: 1291-1300
doi>10.1145/2806416.2806567

As the volume of publications has increased dramatically, an urgent need has developed to assist researchers in locating high-quality, candidate-cited papers from a research repository. Traditional scholarly-recommendation approaches ignore the chronological ...
SESSION: Session 6F: Knowledge Bases
Session details: Session 6F: Knowledge Bases
Vanessa Murdock
doi>10.1145/3252314
Answering Questions with Complex Semantic Constraints on Open Knowledge Bases
Pengcheng Yin, Nan Duan, Ben Kao, Junwei Bao, Ming Zhou
Pages: 1301-1310
doi>10.1145/2806416.2806542

A knowledge-based question-answering system (KB-QA) is one that answers natural language questions with information stored in a large-scale knowledge base (KB). Existing KB-QA systems are either powered by curated KBs in which factual knowledge ...
Inducing Space Dirichlet Process Mixture Large-Margin Entity Relationship Inference in Knowledge Bases
Sotirios P. Chatzis
Pages: 1311-1320
doi>10.1145/2806416.2806499

In this paper, we focus on the problem of extending a given knowledge base by accurately predicting additional true facts based on the facts included in it. This is an essential problem of knowledge representation systems, since knowledge bases typically ...
Semi-Automated Exploration of Data Warehouses
Thibault Sellam, Emmanuel Müller, Martin Kersten
Pages: 1321-1330
doi>10.1145/2806416.2806538

Exploratory data analysis tries to discover novel dependencies and unexpected patterns in large databases. Traditionally, this process is manual and hypothesis-driven. However, analysts can come short of patience and imagination. In this paper, we introduce ...
Large-scale Knowledge Base Completion: Inferring via Grounding Network Sampling over Selected Instances
Zhuoyu Wei, Jun Zhao, Kang Liu, Zhenyu Qi, Zhengya Sun, Guanhua Tian
Pages: 1331-1340
doi>10.1145/2806416.2806513

Constructing large-scale knowledge bases has attracted much attention in recent years, for which Knowledge Base Completion (KBC) is a key technique. In general, inferring new facts in a large-scale knowledge base is not a trivial task. The large number ...
SESSION: Keynote Address III
Session details: Keynote Address III
Shonali Krishnaswamy
doi>10.1145/3252315
Large-Scale Analysis of Dynamics of Choice Among Discrete Alternatives
Andrew Tomkins
Pages: 1349-1349
doi>10.1145/2806416.2806419

The online world is rife with scenarios in which a user must select one from a finite set of alternatives: which movie to watch, which song to play, which camera to order, which website to visit. There is a long history of study of these types of questions ...
SESSION: Session 7A: Database Optimization
Session details: Session 7A: Database Optimization
Sven Helmer
doi>10.1145/3252316
On Gapped Set Intersection Size Estimation
Chen Chen, Jianbin Qin, Wei Wang
Pages: 1351-1360
doi>10.1145/2806416.2806438

There exists considerable literature on estimating the cardinality of set intersection results. In this paper, we consider a generalized problem for integer sets where, given a gap parameter δ, two elements are deemed as matches if their numeric ...
Inclusion Dependencies Reloaded
Henning Köhler, Sebastian Link
Pages: 1361-1370
doi>10.1145/2806416.2806539

Inclusion dependencies form one of the most fundamental classes of integrity constraints. Their importance in classical data management is reinforced by modern applications such as data cleaning and profiling, entity resolution and schema matching. Surprisingly, ...
Comprehensible Models for Reconfiguring Enterprise Relational Databases to Avoid Incidents
Ioana Giurgiu, Mirela Botezatu, Dorothea Wiesmann
Pages: 1371-1380
doi>10.1145/2806416.2806448

Configuring enterprise database management systems is a notoriously hard problem. The combinatorial parameter space makes it intractable to run and observe the DBMS behavior in all scenarios. Thus, the database administrator has the difficult task of ...
An Optimal Online Algorithm For Retrieving Heavily Perturbed Statistical Databases In The Low-Dimensional Querying Model
Krzysztof Marcin Choromanski, Afshin Rostamizadeh, Umar Syed
Pages: 1381-1390
doi>10.1145/2806416.2806421

We give the first Õ(1/√T)-error online algorithm for reconstructing noisy statistical databases, where T is the number of (online) sample queries received. The algorithm is optimal up to the poly(log(T)) factor ...
SESSION: Session 7B: Retrieval Enhancements 2
Session details: Session 7B: Retrieval Enhancements 2
Mark Sanderson
doi>10.1145/3252317
Aggregation of Crowdsourced Ordinal Assessments and Integration with Learning to Rank: A Latent Trait Model
Pavel Metrikov, Virgil Pavlu, Javed A. Aslam
Pages: 1391-1400
doi>10.1145/2806416.2806492

Existing approaches used for training and evaluating search engines often rely on crowdsourced assessments of document relevance with respect to a user query. To use such assessments for either evaluation or learning, we propose a new framework for the ...
Weakly Supervised Natural Language Processing Framework for Abstractive Multi-Document Summarization: Weakly Supervised Abstractive Multi-Document Summarization
Peng Li, Weidong Cai, Heng Huang
Pages: 1401-1410
doi>10.1145/2806416.2806494

In this paper, we propose a new weakly supervised abstractive news summarization framework using pattern based approaches. Our system first generates meaningful patterns from sentences. Then, in order to precisely cluster patterns, we propose a novel ...
Short Text Similarity with Word Embeddings
Tom Kenter, Maarten de Rijke
Pages: 1411-1420
doi>10.1145/2806416.2806475

Determining semantic similarity between texts is important in many tasks in information retrieval such as search, query suggestion, automatic summarization and image finding. Many approaches have been suggested, based on lexical matching, handcrafted ...
Building Representative Composite Items
Vincent Leroy, Sihem Amer-Yahia, Eric Gaussier, Hamid Mirisaee
Pages: 1421-1430
doi>10.1145/2806416.2806465

The problem of summarizing a large collection of homogeneous items has been addressed extensively, in particular in the case of geo-tagged datasets (e.g. Flickr photos and tags). In our work, we study the problem of summarizing large collections ...
SESSION: Session 7C: Search Mechanisms
Session details: Session 7C: Search Mechanisms
Justin Zobel
doi>10.1145/3252318
More Accurate Question Answering on Freebase
Hannah Bast, Elmar Haussmann
Pages: 1431-1440
doi>10.1145/2806416.2806472

Real-world factoid or list questions often have a simple structure, yet are hard to match to facts in a given knowledge base due to high representational and linguistic variability. For example, to answer "who is the ceo of apple" on Freebase requires ...
Improving Ranking Consistency for Web Search by Leveraging a Knowledge Base and Search Logs
Jyun-Yu Jiang, Jing Liu, Chin-Yew Lin, Pu-Jen Cheng
Pages: 1441-1450
doi>10.1145/2806416.2806479

In this paper, we propose a new idea called ranking consistency in web search. Relevance ranking is one of the biggest problems in creating an effective web search system. Given some queries with similar search intents, conventional approaches typically ...
Assessing the Impact of Syntactic and Semantic Structures for Answer Passages Reranking
Kateryna Tymoshenko, Alessandro Moschitti
Pages: 1451-1460
doi>10.1145/2806416.2806490

In this paper, we extensively study the use of syntactic and semantic structures obtained with shallow and deeper syntactic parsers in the answer passage reranking task. We propose several dependency-based structures enriched with Linked Open Data (LD) ...
Ranking Entities for Web Queries Through Text and Knowledge
Michael Schuhmacher, Laura Dietz, Simone Paolo Ponzetto
Pages: 1461-1470
doi>10.1145/2806416.2806480

When humans explain complex topics, they naturally talk about involved entities, such as people, locations, or events. In this paper, we aim at automating this process by retrieving and ranking entities that are relevant to understand free-text web-style ...
SESSION: Session 7D: Social Networks 3
Session details: Session 7D: Social Networks 3
Carsten Eickhoff
doi>10.1145/3252319
What Is a Network Community?: A Novel Quality Function and Detection Algorithms
Atsushi Miyauchi, Yasushi Kawase
Pages: 1471-1480
doi>10.1145/2806416.2806555

In this study, we introduce a novel quality function for a network community, which we refer to as the communitude. The communitude has a strong statistical background. Specifically, it measures the Z-score of a subset of vertices S with respect ...
DifRec: A Social-Diffusion-Aware Recommender System
Hossein Vahabi, Iordanis Koutsopoulos, Francesco Gullo, Maria Halkidi
Pages: 1481-1490
doi>10.1145/2806416.2806559

Recommender systems used in current online social platforms make recommendations by only considering how relevant an item is to a specific user but they ignore the fact that, thanks to mechanisms like sharing or re-posting across the underlying social ...
Who With Whom And How?: Extracting Large Social Networks Using Search Engines
Stefan Siersdorfer, Philipp Kemkes, Hanno Ackermann, Sergej Zerr
Pages: 1491-1500
doi>10.1145/2806416.2806582

Social network analysis is leveraged in a variety of applications such as identifying influential entities, detecting communities with special interests, and determining the flow of information and innovations. However, existing approaches for extracting ...
Modeling Individual-Level Infection Dynamics Using Social Network Information
Suppawong Tuarob, Conrad S. Tucker, Marcel Salathe, Nilam Ram
Pages: 1501-1510
doi>10.1145/2806416.2806575

Epidemic monitoring systems engaged in accurate discovery of infected individuals enable better understanding of the dynamics of epidemics and thus may promote effective disease mitigation or prevention. Currently, infection discovery systems require ...
SESSION: Session 8A: Query Evaluation
Session details: Session 8A: Query Evaluation
Yiqun Liu
doi>10.1145/3252320
Finding Probabilistic k-Skyline Sets on Uncertain Data
Jinfei Liu, Haoyu Zhang, Li Xiong, Haoran Li, Jun Luo
Pages: 1511-1520
doi>10.1145/2806416.2806452

Skyline is a set of points that are not dominated by any other point. Given uncertain objects, probabilistic skyline has been studied which computes objects with high probability of being skyline. While useful for selecting individual objects, it is ...
Ordering Selection Operators Under Partial Ignorance
Khaled H. Alyoubi, Sven Helmer, Peter T. Wood
Pages: 1521-1530
doi>10.1145/2806416.2806446

Optimising queries in real-world situations under imperfect conditions is still a problem that has not been fully solved. We consider finding the optimal order in which to execute a given set of selection operators under partial ignorance of their selectivities. ...
Querying Temporal Drifts at Multiple Granularities
Sofia Kleisarchaki, Sihem Amer-Yahia, Ahlame Douzal-Chouakria, Vassilis Christophides
Pages: 1531-1540
doi>10.1145/2806416.2806436

There exists a large body of work on online drift detection with the goal of dynamically finding and maintaining changes in data streams. In this paper, we adopt a query-based approach to drift detection. Our approach relies on a drift index, ...
Efficient Incremental Evaluation of Succinct Regular Expressions
Henrik Björklund, Wim Martens, Thomas Timm
Pages: 1541-1550
doi>10.1145/2806416.2806434

Regular expressions are omnipresent in database applications. They form the structural core of schema languages for XML, they are a fundamental ingredient for navigational queries in graph databases, and are being considered in languages for upcoming ...
SESSION: Session 8B: Web Search
Session details: Session 8B: Web Search
David Hawking
doi>10.1145/3252321
Struggling and Success in Web Search
Daan Odijk, Ryen W. White, Ahmed Hassan Awadallah, Susan T. Dumais
Pages: 1551-1560
doi>10.1145/2806416.2806488

Web searchers sometimes struggle to find relevant information. Struggling leads to frustrating and dissatisfying search experiences, even if searchers ultimately meet their search objectives. Better understanding of search tasks where people struggle ...
Behavioral Dynamics from the SERP's Perspective: What are Failed SERPs and How to Fix Them?
Julia Kiseleva, Jaap Kamps, Vadim Nikulin, Nikita Makarov
Pages: 1561-1570
doi>10.1145/2806416.2806483

Web search is always in a state of flux: queries, their intent, and the most relevant content are changing over time, in predictable and unpredictable ways. Modern search technology has made great strides in keeping pace with these changes, but ...
What Users Ask a Search Engine: Analyzing One Billion Russian Question Queries
Michael Völske, Pavel Braslavski, Matthias Hagen, Galina Lezina, Benno Stein
Pages: 1571-1580
doi>10.1145/2806416.2806457

We analyze the question queries submitted to a large commercial web search engine to get insights about what people ask, and to better tailor the search results to the users' needs. Based on a dataset of about one billion question queries submitted during ...
Does Vertical Bring more Satisfaction?: Predicting Search Satisfaction in a Heterogeneous Environment
Ye Chen, Yiqun Liu, Ke Zhou, Meng Wang, Min Zhang, Shaoping Ma
Pages: 1581-1590
doi>10.1145/2806416.2806473

The study of search satisfaction is one of the prime concerns in search performance evaluation research. Most existing works on search satisfaction primarily rely on the hypothesis that all results on search engine result pages (SERPs) are homogeneous. ...
SESSION: Session 8C: Social Media 2
Session details: Session 8C: Social Media 2
Karin Verspoor
doi>10.1145/3252322
Characterizing and Predicting Viral-and-Popular Video Content
David Vallet, Shlomo Berkovsky, Sebastien Ardon, Anirban Mahanti, Mohamed Ali Kaafar
Pages: 1591-1600
doi>10.1145/2806416.2806556

The proliferation of online video content has triggered numerous works on its evolution and popularity, as well as on the effect of social sharing on content propagation. In this paper, we focus on the observable dependencies between the virality of ...
Social Spammer and Spam Message Co-Detection in Microblogging with Social Context Regularization
Fangzhao Wu, Jinyun Shu, Yongfeng Huang, Zhigang Yuan
Pages: 1601-1610
doi>10.1145/2806416.2806560

The popularity of microblogging platforms, such as Twitter, makes them important for information dissemination and sharing. However, they are also recognized as ideal places by spammers to conduct social spamming. Massive social spammers and spam messages ...
Central Topic Model for Event-oriented Topics Mining in Microblog Stream
Min Peng, Jiahui Zhu, Xuhui Li, Jiajia Huang, Hua Wang, Yanchun Zhang
Pages: 1611-1620
doi>10.1145/2806416.2806561

Today, data is generated and arrives in the form of streams that propagate discussions of public events in microblog services. Discovering event-oriented topics from the stream will lead to a better understanding of the change of public concern. However, ...
Video Popularity Prediction by Sentiment Propagation via Implicit Network
Wanying Ding, Yue Shang, Lifan Guo, Xiaohua Hu, Rui Yan, Tingting He
Pages: 1621-1630
doi>10.1145/2806416.2806505

Video popularity prediction plays a foundational role in many aspects of life, such as recommendation systems and investment consulting. Because of its technological and economic importance, this problem has been extensively studied for years. However, ...
SESSION: Session 8D: Recommendation
Session details: Session 8D: Recommendation
Gangshan Wu
doi>10.1145/3252323
Joint Modeling of User Check-in Behaviors for Point-of-Interest Recommendation
Hongzhi Yin, Xiaofang Zhou, Yingxia Shao, Hao Wang, Shazia Sadiq
Pages: 1631-1640
doi>10.1145/2806416.2806500

Point-of-Interest (POI) recommendation has become an important means to help people discover attractive and interesting locations, especially when users travel out of town. However, the extreme sparsity of the user-POI matrix creates a severe challenge. To cope ...
ORec: An Opinion-Based Point-of-Interest Recommendation Framework
Jia-Dong Zhang, Chi-Yin Chow, Yu Zheng
Pages: 1641-1650
doi>10.1145/2806416.2806516

As location-based social networks (LBSNs) rapidly grow, it is a timely topic to study how to recommend users with interesting locations, known as points-of-interest (POIs). Most existing POI recommendation techniques only employ the check-in data ...
Toward Dual Roles of Users in Recommender Systems
Suhang Wang, Jiliang Tang, Huan Liu
Pages: 1651-1660
doi>10.1145/2806416.2806520

Users usually play dual roles in real-world recommender systems. One is as a reviewer who writes reviews for items with rating scores, and the other is as a rater who rates the helpfulness scores of reviews. Traditional recommender systems mainly consider ...
TriRank: Review-aware Explainable Recommendation by Modeling Aspects
Xiangnan He, Tao Chen, Min-Yen Kan, Xiao Chen
Pages: 1661-1670
doi>10.1145/2806416.2806504

Most existing collaborative filtering techniques have focused on modeling the binary relation of users to items by extracting from user ratings. Aside from users' ratings, their affiliated reviews often provide the rationale for their ratings and identify ...
SESSION: Short Papers: Databases
RoadRank: Traffic Diffusion and Influence Estimation in Dynamic Urban Road Networks
Tarique Anwar, Chengfei Liu, Hai L. Vu, Md. Saiful Islam
Pages: 1671-1674
doi>10.1145/2806416.2806588

With the rapidly growing population in urban areas, these days the urban road networks are expanding at a faster rate. The frequent movement of people on them leads to traffic congestions. These congestions originate from some crowded road segments, ...
On Query-Update Independence for SPARQL
Nicola Guido, Pierre Genevès, Nabil Layaïda, Cécile Roisin
Pages: 1675-1678
doi>10.1145/2806416.2806586

This paper investigates techniques for detecting independence of SPARQL queries from updates. A query is independent of an update when the execution of the update does not affect the result of the query. Determining independence is especially useful ...
A Structured Query Model for the Deep Relational Web
Hasan M. Jamil, Hosagrahar V. Jagadish
Pages: 1679-1682
doi>10.1145/2806416.2806589

The deep web is very large and diverse and queries evaluated against the deep web can provide great value. While there have been attempts at accessing the data in the deep web, these are clever "one-off" systems and techniques. In this paper, we describe ...
A Flash-aware Buffering Scheme using On-the-fly Redo
Kyosung Jeong, Sang-Wook Kim, Sungchae Lim
Pages: 1683-1686
doi>10.1145/2806416.2806587

In this paper, we address how to reduce the amount of page updates in flash-based DBMS equipped with SSD (Solid State Drive). We propose a novel buffering scheme that evicts a dirty page X without flushing it into SSD, and restores the right image of ...
Defragging Subgraph Features for Graph Classification
Haishuai Wang, Peng Zhang, Ivor Tsang, Ling Chen, Chengqi Zhang
Pages: 1687-1690
doi>10.1145/2806416.2806585

Graph classification is an important tool for analysing structured and semi-structured data, where subgraphs are commonly used as the feature representation. However, the number and size of subgraph features crucially depend on the threshold parameters ...
Structural Constraints for Multipartite Entity Resolution with Markov Logic Network
Tengyuan Ye, Hady W. Lauw
Pages: 1691-1694
doi>10.1145/2806416.2806590

Multipartite entity resolution seeks to match entity mentions across several collections. An entity mention is presumed unique within a collection, and thus could match at most one entity mention in each of the other collections. In addition to domain-specific ...
SESSION: Short Papers: Information Retrieval
Know Your Onions: Understanding the User Experience with the Knowledge Module in Web Search
Ioannis Arapakis, Luis A. Leiva, B. Barla Cambazoglu
Pages: 1695-1698
doi>10.1145/2806416.2806591

The increasing availability of large volumes of human-curated content is shifting web search towards a paradigm that seamlessly introduces more semantic information to search engine result pages. This trend has resulted in the design of a new element ...
Personalized Federated Search at LinkedIn
Dhruv Arya, Viet Ha-Thuc, Shakti Sinha
Pages: 1699-1702
doi>10.1145/2806416.2806615

LinkedIn has grown to become a platform hosting diverse sources of information ranging from member profiles, jobs, professional groups, slideshows etc. Given the existence of multiple sources, when a member issues a query like "software engineer", the ...
Balancing Exploration and Exploitation: Empirical Parameterization of Exploratory Search Systems
Kumaripaba Ahukorala, Alan Medlar, Kalle Ilves, Dorota Glowacka
Pages: 1703-1706
doi>10.1145/2806416.2806609

Exploratory searches are where a user has insufficient knowledge to define exact search criteria or does not otherwise know what they are looking for. Reinforcement learning techniques have demonstrated great potential for supporting exploratory search ...
On Predicting Deletions of Microblog Posts
Mossaab Bagdouri, Douglas W. Oard
Pages: 1707-1710
doi>10.1145/2806416.2806600

Among the many classification tasks on Twitter content, predicting whether a tweet will be deleted has to date received relatively little attention. Deletions occur for a variety of reasons, which can make the classification task challenging. Moreover, ...
Semi-Automated Text Classification for Sensitivity Identification
Giacomo Berardi, Andrea Esuli, Craig Macdonald, Iadh Ounis, Fabrizio Sebastiani
Pages: 1711-1714
doi>10.1145/2806416.2806597

Sensitive documents are those that cannot be made public, e.g., for personal or organizational privacy reasons. For instance, documents requested through Freedom of Information mechanisms must be manually reviewed for the presence of sensitive information ...
Identification of Microblogs Prominent Users during Events by Learning Temporal Sequences of Features
Imen Bizid, Nibal Nayef, Patrice Boursier, Sami Faiz, Antoine Doucet
Pages: 1715-1718
doi>10.1145/2806416.2806612

During specific real-world events, some users of microblogging platforms could provide exclusive information about those events. The identification of such prominent users depends on several factors such as the freshness and the relevance of their shared ...
A Real-Time Eye Tracking Based Query Expansion Approach via Latent Topic Modeling
Yongqiang Chen, Peng Zhang, Dawei Song, Benyou Wang
Pages: 1719-1722
doi>10.1145/2806416.2806602

Formulating and reformulating reliable textual queries have been recognized as a challenging task in Information Retrieval (IR), even for experienced users. Most existing query expansion methods, especially those based on implicit relevance feedback, ...
Clustered Semi-Supervised Relevance Feedback
Kripabandhu Ghosh, Swapan Kumar Parui
Pages: 1723-1726
doi>10.1145/2806416.2806596

In relevance feedback, first-round search results are used to boost second-round search results. Two forms have been traditionally considered: exhaustively labelled feedback, where all first-round results to depth k are annotated for relevance ...
On the Effect of "Stupid" Search Components on User Interaction with Search Engines
Lidia Grauer, Aleksandra Lomakina
Pages: 1727-1730
doi>10.1145/2806416.2806601

Using eye-tracking, we investigate how searchers interact with Web search engines which get affected by nonsensical results. We conduct a user survey to choose "stupid" components for our laboratory experiment and explore the most conspicuous ones. This ...
Social-Relational Topic Model for Social Networks
Weiyu Guo, Shu Wu, Liang Wang, Tieniu Tan
Pages: 1731-1734
doi>10.1145/2806416.2806611

Social networking services, such as Twitter and Sina Weibo, have gained tremendous popularity in recent years. Masses of short texts and social links are aggregated into these service platforms. To realize personalized services on social networks, topic inference ...
Building Effective Query Classifiers: A Case Study in Self-harm Intent Detection
Ashiqur R. KhudaBukhsh, Paul N. Bennett, Ryen W. White
Pages: 1735-1738
doi>10.1145/2806416.2806594

Query-based triggers play a crucial role in modern search systems, e.g., in deciding when to display direct answers on result pages. We address a common scenario in designing such triggers for real-world settings where positives are rare and search providers ...
Modelling the Usefulness of Document Collections for Query Expansion in Patient Search
Nut Limsopatham, Craig Macdonald, Iadh Ounis
Pages: 1739-1742
doi>10.1145/2806416.2806614

Dealing with the medical terminology is a challenge when searching for patients based on the relevance of their medical records towards a given query. Existing work used query expansion (QE) to extract expansion terms from different document collections ...
A Convolutional Click Prediction Model
Qiang Liu, Feng Yu, Shu Wu, Liang Wang
Pages: 1743-1746
DOI: 10.1145/2806416.2806603
Full text: PDF

The explosion in online advertising calls for better estimates of the click probability of ads. For click prediction on a single ad impression, we have access to pairwise relevance among elements in an impression, but not to global interaction among key features ...
A Study of Query Length Heuristics in Information Retrieval
Yuanhua Lv
Pages: 1747-1750
DOI: 10.1145/2806416.2806592
Full text: PDF

Query length has generally been regarded as a query-specific constant that does not affect document ranking. In this paper, we reveal that query length actually interacts with term frequency (TF) normalization, a key component of all effective retrieval ...
Detect Rumors Using Time Series of Social Context Information on Microblogging Websites
Jing Ma, Wei Gao, Zhongyu Wei, Yueming Lu, Kam-Fai Wong
Pages: 1751-1754
DOI: 10.1145/2806416.2806607
Full text: PDF

Automatically identifying rumors from online social media, especially microblogging websites, is an important research issue. Most existing work on rumor detection focuses on modeling features related to microblog contents, users and propagation patterns, ...
Query Auto-Completion for Rare Prefixes
Bhaskar Mitra, Nick Craswell
Pages: 1755-1758
DOI: 10.1145/2806416.2806599
Full text: PDF

Query auto-completion (QAC) systems typically suggest queries that have previously been observed in search logs. Given a partial user query, the system looks up this query prefix against a precomputed set of candidates, then orders them using ranking ...
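The lookup-then-rank scheme described in this abstract can be illustrated with a minimal "most popular completion" baseline. This is a hypothetical sketch, not the method proposed by Mitra and Craswell: the function name, the toy query log, and ranking purely by log frequency are assumptions made for illustration. It also shows why rare prefixes are a problem for such systems: a prefix that never occurs in the log yields no suggestions.

```python
from collections import Counter


def most_popular_completions(prefix: str, query_log: list[str], k: int = 3) -> list[str]:
    """Return up to k previously observed queries that start with `prefix`.

    Candidates are ranked by how often they occur in the log; ties are
    broken alphabetically so the output is deterministic.
    """
    # Precomputed candidate set: every distinct logged query with its frequency.
    counts = Counter(q.strip().lower() for q in query_log)
    prefix = prefix.lower()
    # Look up the prefix against the candidate set ...
    candidates = [q for q in counts if q.startswith(prefix)]
    # ... then order the matches (here simply by popularity).
    candidates.sort(key=lambda q: (-counts[q], q))
    return candidates[:k]


# Hypothetical toy query log.
log = ["weather today", "weather tomorrow", "weather today", "web mail", "weather radar"]
print(most_popular_completions("wea", log))
```

Here `most_popular_completions("wea", log)` returns `['weather today', 'weather radar', 'weather tomorrow']`, while an unseen prefix such as `"xyz"` returns an empty list.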
Pooled Evaluation Over Query Variations: Users are as Diverse as Systems
Alistair Moffat, Falk Scholer, Paul Thomas, Peter Bailey
Pages: 1759-1762
DOI: 10.1145/2806416.2806606
Full text: PDF

Evaluation of information retrieval systems with test collections makes use of a suite of fixed resources: a document corpus; a set of topics; and associated judgments of the relevance of each document to each topic. With large modern collections, exhaustive ...
The Influence of Pre-processing on the Estimation of Readability of Web Documents
João Rafael de Moura Palotti, Guido Zuccon, Allan Hanbury
Pages: 1763-1766
DOI: 10.1145/2806416.2806613
Full text: PDF

This paper investigates the effect that text pre-processing approaches have on the estimation of the readability of web pages. Readability has been highlighted as an important aspect of web search result personalisation in previous work. The most widely ...
Atypical Queries in eCommerce
Neeraj Pradhan, Vinay Deolalikar, Kang Li
Pages: 1767-1770
DOI: 10.1145/2806416.2806605
Full text: PDF

Understanding how specific, ambiguous, or broad the intent of a search query is, across all users of the system, is important in improving search relevance in eCommerce. There is scant literature on such a structural characterization of queries in eCommerce. ...
Bottom-up Faceted Search: Creating Search Neighbourhoods with Datacube Cells
Mark Sifer
Pages: 1771-1774
DOI: 10.1145/2806416.2806593
Full text: PDF

Browsing a collection can start with a keyword search. A user visits a library and performs a keyword search to find a few books of interest and their locations in the library. Then they go to these locations, the corresponding bookshelves, where they ...
Personalized Recommendation Meets Your Next Favorite
Qiang Song, Jian Cheng, Ting Yuan, Hanqing Lu
Pages: 1775-1778
DOI: 10.1145/2806416.2806598
Full text: PDF

A comprehensive understanding of users' item selection behavior is not only essential to many scientific disciplines, but also has a profound business impact on online recommendation. Recent research has discovered that users' favorites can be divided ...
Recommending Short-lived Dynamic Packages for Golf Booking Services
Robin Swezey, Young-joo Chung
Pages: 1779-1782
DOI: 10.1145/2806416.2806608
Full text: PDF

We introduce an approach to recommending short-lived dynamic packages for golf booking services. Two challenges are addressed in this work. The first is the short life of the items, which puts the system in a state of a permanent cold start. The second ...
Large-Scale Question Answering with Joint Embedding and Proof Tree Decoding
Zhenghao Wang, Shengquan Yan, Huaming Wang, Xuedong Huang
Pages: 1783-1786
DOI: 10.1145/2806416.2806616
Full text: PDF

Question answering (QA) over a large-scale knowledge base (KB) such as Freebase is an important natural language processing application. There are linguistically oriented semantic parsing techniques and machine learning motivated statistical methods. ...
Query Length, Retrievability Bias and Performance
Colin Wilkie, Leif Azzopardi
Pages: 1787-1790
DOI: 10.1145/2806416.2806604
Full text: PDF

Past work has shown that longer queries tend to lead to better retrieval performance. However, this comes at the cost of increased user effort and additional system processing. In this paper, we examine whether there are benefits of longer queries ...
Gauging Correct Relative Rankings For Similarity Search
Weiren Yu, Julie McCann
Pages: 1791-1794
DOI: 10.1145/2806416.2806610
Full text: PDF

One of the important tasks in link analysis is to quantify the similarity between two objects based on hyperlink structure. SimRank is an attractive similarity measure of this type. Existing work mainly focuses on absolute SimRank scores, and often harnesses ...
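For readers unfamiliar with SimRank, the measure can be computed by the classic naive fixed-point iteration: two distinct nodes are similar if their in-neighbors are similar, with s(a, a) = 1 and a decay constant C. The sketch below is a minimal illustration under stated assumptions (C = 0.8, a fixed number of iterations, and a hypothetical toy graph given as in-neighbor lists). It computes absolute scores only and is not the relative-ranking technique of Yu and McCann.

```python
def simrank(in_neighbors: dict[str, list[str]], c: float = 0.8,
            iterations: int = 10) -> dict[tuple[str, str], float]:
    """Naive SimRank iteration over a graph given as {node: [in-neighbors]}."""
    nodes = list(in_neighbors)
    # s_0(a, b) = 1 if a == b else 0
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iterations):
        new_sim = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new_sim[(a, b)] = 1.0
                    continue
                ia, ib = in_neighbors[a], in_neighbors[b]
                if not ia or not ib:
                    # A node without in-links is similar to no other node.
                    new_sim[(a, b)] = 0.0
                    continue
                # Average similarity over all pairs of in-neighbors, damped by c.
                total = sum(sim[(i, j)] for i in ia for j in ib)
                new_sim[(a, b)] = c * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


# Hypothetical hyperlink graph: a university page links to two professors,
# and each professor links to one student.
graph = {
    "univ": [],
    "profA": ["univ"],
    "profB": ["univ"],
    "studentA": ["profA"],
    "studentB": ["profB"],
}
scores = simrank(graph)
print(round(scores[("profA", "profB")], 3), round(scores[("studentA", "studentB")], 3))
```

With these settings the two professors score 0.8 and the two students 0.64 (that is, 0.8 squared), showing how similarity decays with distance from shared in-neighbors.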
Learning User Preferences for Topically Similar Documents
Mustafa Zengin, Ben Carterette
Pages: 1795-1798
DOI: 10.1145/2806416.2806617
Full text: PDF

Similarity measures have been used widely in information retrieval research. Most research has been done on query-document or document-document similarity without much attention to the user's perception of similarity in the context of the information ...
Modeling Parameter Interactions in Ranking SVM
Yaogong Zhang, Jun Xu, Yanyan Lan, Jiafeng Guo, Maoqiang Xie, Yalou Huang, Xueqi Cheng
Pages: 1799-1802
DOI: 10.1145/2806416.2806595
Full text: PDF

Ranking SVM, which formalizes the problem of learning a ranking model as that of learning a binary SVM on preference pairs of documents, is a state-of-the-art ranking model in information retrieval. The dual form solution of Ranking SVM model can be ...
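The reduction named in the first sentence of this abstract, learning a ranking model as a binary SVM on preference pairs of documents, can be sketched as a pairwise transformation of the training data. This is a minimal illustration assuming numeric relevance labels and per-query grouping; the function name and toy data are hypothetical, and the sketch does not model the parameter interactions studied in the paper. Any linear SVM trained without an intercept (for example, scikit-learn's `LinearSVC(fit_intercept=False)`) could then be fitted on the returned pairs.

```python
import numpy as np


def pairwise_transform(X: np.ndarray, relevance: np.ndarray,
                       query_ids: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Turn per-document feature vectors into labelled preference pairs.

    For every two documents of the same query with different relevance labels,
    emit the difference x_i - x_j, labelled +1 if document i should rank above
    document j and -1 otherwise.
    """
    diffs, labels = [], []
    n = len(X)
    for i in range(n):
        for j in range(i + 1, n):
            # Pairs across queries, or with equal relevance, carry no preference.
            if query_ids[i] != query_ids[j] or relevance[i] == relevance[j]:
                continue
            diffs.append(X[i] - X[j])
            labels.append(1 if relevance[i] > relevance[j] else -1)
    return np.array(diffs), np.array(labels)


X = np.array([[1, 0], [3, 1], [2, 1]])  # hypothetical feature vectors
rel = np.array([0, 2, 1])               # graded relevance labels
qid = np.array([1, 1, 1])               # all documents belong to one query
D, y = pairwise_transform(X, rel, qid)
print(D.tolist(), y.tolist())
```

The three documents yield three preference pairs with both positive and negative labels, which is what a binary classifier needs.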
SESSION: Short Papers: Knowledge Management
Best First Over-Sampling for Multilabel Classification
Xusheng Ai, Jian Wu, Victor S. Sheng, Yufeng Yao, Pengpeng Zhao, Zhiming Cui
Pages: 1803-1806
DOI: 10.1145/2806416.2806634
Full text: PDF

Learning from imbalanced multilabel data is a challenging task that has attracted considerable attention recently. In this paper we propose MultiLabel Best First Over-sampling (ML-BFO) to improve the performance of multilabel classification algorithms, ...
Co-clustering Document-term Matrices by Direct Maximization of Graph Modularity
Melissa Ailem, François Role, Mohamed Nadif
Pages: 1807-1810
DOI: 10.1145/2806416.2806639
Full text: PDF

We present Coclus, a novel diagonal co-clustering algorithm which is able to effectively co-cluster binary or contingency matrices by directly maximizing an adapted version of the modularity measure traditionally used for networks. While some effective ...
A Data-Driven Approach to Distinguish Cyber-Attacks from Physical Faults in a Smart Grid
Adnan Anwar, Abdun Naser Mahmood, Zubair Shah
Pages: 1811-1814
DOI: 10.1145/2806416.2806648
Full text: PDF

Recently, there has been a significant increase in interest in Smart Grid security. Researchers have proposed various techniques to detect cyber-attacks using sensor data. However, there has been little work to distinguish a cyber-attack from a power system ...
Improving Event Detection by Automatically Assessing Validity of Event Occurrence in Text
Andrea Ceroni, Ujwal Kumar Gadiraju, Marco Fisichella
Pages: 1815-1818
DOI: 10.1145/2806416.2806624
Full text: PDF

Manually inspecting text to assess whether an event occurs in a document collection is an onerous and time-consuming task. Although a manual inspection to discard the false events would increase the precision of automatically detected sets of events, ...
DAAV: Dynamic API Authority Vectors for Detecting Software Theft
Dong-Kyu Chae, Sang-Wook Kim, Seong-Je Cho, Yesol Kim
Pages: 1819-1822
DOI: 10.1145/2806416.2806646
Full text: PDF

This paper proposes a novel birthmark, a dynamic API authority vector (DAAV), for detecting software theft. DAAV satisfies four essential requirements for good birthmarks--credibility, resiliency, scalability, and packing-free--while existing birthmarks ...
Towards Multi-level Provenance Reconstruction of Information Diffusion on Social Media
Tom De Nies, Io Taxidou, Anastasia Dimou, Ruben Verborgh, Peter M. Fischer, Erik Mannens, Rik Van de Walle
Pages: 1823-1826
DOI: 10.1145/2806416.2806642
Full text: PDF

In order to assess the trustworthiness of information on social media, a consumer needs to understand where this information comes from, and which processes were involved in its creation. The entities, agents and activities involved in the creation of ...
Profiling Pedestrian Distribution and Anomaly Detection in a Dynamic Environment
Minh Tuan Doan, Sutharshan Rajasegarar, Mahsa Salehi, Masud Moshtaghi, Christopher Leckie
Pages: 1827-1830
DOI: 10.1145/2806416.2806645
Full text: PDF

Pedestrian movements have a major impact on the dynamics of cities and provide valuable guidance to city planners. In this paper we model the normal behaviours of pedestrian flows and detect anomalous events from pedestrian counting data of the City ...
A Clustering-based Approach to Detect Probable Outcomes of Lawsuits
Daniel Lemes Gribel, Maira Gatti de Bayser, Leonardo Guerreiro Azevedo
Pages: 1831-1834
DOI: 10.1145/2806416.2806640
Full text: PDF

The numerous lawsuits in progress or already judged by the Brazilian Supreme Court constitute a large amount of unstructured data. This leads to a large amount of hidden or unknown information, since some relationships between lawsuits are not explicit ...
Detecting Check-worthy Factual Claims in Presidential Debates
Naeemul Hassan, Chengkai Li, Mark Tremayne
Pages: 1835-1838
DOI: 10.1145/2806416.2806652
Full text: PDF

Public figures such as politicians make claims about "facts" all the time. Journalists and citizens spend a good amount of time checking the veracity of such claims. Toward automatic fact checking, we developed tools to find check-worthy factual claims ...
Where You Go Reveals Who You Know: Analyzing Social Ties from Millions of Footprints
Hsun-Ping Hsieh, Rui Yan, Cheng-Te Li
Pages: 1839-1842
DOI: 10.1145/2806416.2806653
Full text: PDF

This paper aims to investigate how the geographical footprints of users correlate with their social ties. While conventional wisdom holds that the more frequently two users co-locate geographically, the higher the probability that they are friends, we find that ...
Message Clustering based Matrix Factorization Model for Retweeting Behavior Prediction
Bo Jiang, Jiguang Liang, Ying Sha, Lihong Wang
Pages: 1843-1846
DOI: 10.1145/2806416.2806650
Full text: PDF

Retweeting is an important mechanism for information diffusion in social networks. Through retweeting, a message is reshared from one user to another, forming large cascades of message forwarding. Most existing research on predicting retweeting ...
Heterogeneous Multi-task Semantic Feature Learning for Classification
Xin Jin, Fuzhen Zhuang, Sinno Jialin Pan, Changying Du, Ping Luo, Qing He
Pages: 1847-1850
DOI: 10.1145/2806416.2806644
Full text: PDF

Multi-task Learning (MTL) aims to learn multiple related tasks simultaneously instead of separately to improve the generalization performance of each task. Most existing MTL methods assume that the multiple tasks to be learned have the same feature representation. ...
Top-k Reliable Edge Colors in Uncertain Graphs
Arijit Khan, Francesco Gullo, Thomas Wohler, Francesco Bonchi
Pages: 1851-1854
DOI: 10.1145/2806416.2806619
Full text: PDF

We study the fundamental problem of finding the set of top-k edge colors that maximizes the reliability between a source node and a destination node in an uncertain and edge-colored graph. Our top-k reliable color set problem naturally ...
Probabilistic Non-negative Inconsistent-resolution Matrices Factorization
Masahiro Kohjima, Tatsushi Matsubayashi, Hiroshi Sawada
Pages: 1855-1858
DOI: 10.1145/2806416.2806636
Full text: PDF

In this paper, we tackle the problem of analyzing datasets with different resolutions, such as a pair of a user's individual data and a user group's data, for example "userA visited shopA 5 times" and "users whose attributes are men purchased itemA 80 ...
Identifying Attractive News Headlines for Social Media
Sawa Kourogi, Hiroyuki Fujishiro, Akisato Kimura, Hitoshi Nishikawa
Pages: 1859-1862
DOI: 10.1145/2806416.2806631
Full text: PDF

In the past, leading newspaper companies and broadcasters were the sole distributors of news articles, and thus news consumers simply received news articles from those outlets at regular intervals. However, the growth of social media and smart devices ...
A Probabilistic Rating Auto-encoder for Personalized Recommender Systems
Huizhi Liang, Timothy Baldwin
Pages: 1863-1866
DOI: 10.1145/2806416.2806633
Full text: PDF

User profiling is a key component of personalized recommender systems, and is used to generate user profiles that describe individual user interests and preferences. The increasing availability of big data is driving the urgent need for user profiling ...
Real-time Rumor Debunking on Twitter
Xiaomo Liu, Armineh Nourbakhsh, Quanzhi Li, Rui Fang, Sameena Shah
Pages: 1867-1870
DOI: 10.1145/2806416.2806651
Full text: PDF

In this paper, we propose the first real-time rumor debunking algorithm for Twitter. We use cues from 'wisdom of the crowds', that is, the aggregate 'common sense' and investigative journalism of Twitter users. We concentrate on identification of a rumor ...
Fraud Transaction Recognition: A Money Flow Network Approach
Renxin Mao, Zhao Li, Jinhua Fu
Pages: 1871-1874
DOI: 10.1145/2806416.2806647
Full text: PDF

In this paper, we provide some insights into the analysis of fraud transaction recognition on Alipay's Money Flow Network. We first show that the Money Flow Network follows a power-law distribution on a daily, monthly, or yearly basis, based on which we propose ...
Identifying Top-k Consistent News-Casters on Twitter
Sahisnu Mazumder, Sameep Mehta, Dhaval Patel
Pages: 1875-1878
DOI: 10.1145/2806416.2806649
Full text: PDF

News-casters are Twitter users who periodically pick up interesting news from online news media and spread it to their followers' network. Existing work on Twitter user analysis has only analysed a pre-defined set of users for user modeling, ...
Mining the Minds of Customers from Online Chat Logs
Kunwoo Park, Jaewoo Kim, Jaram Park, Meeyoung Cha, Jiin Nam, Seunghyun Yoon, Eunhee Rhim
Pages: 1879-1882
DOI: 10.1145/2806416.2806621
Full text: PDF

This study investigates factors that may determine satisfaction in customer service operations. We utilized more than 170,000 online chat sessions between customers and agents to identify characteristics of chat sessions that resulted in dissatisfying experiences. ...
A Fast k-Nearest Neighbor Search Using Query-Specific Signature Selection
Youngki Park, Heasoo Hwang, Sang-goo Lee
Pages: 1883-1886
DOI: 10.1145/2806416.2806632
Full text: PDF

k-nearest neighbor (k-NN) search aims at finding k points nearest to a query point in a given dataset. k-NN search is important in various applications, but it becomes extremely expensive in a high-dimensional large dataset. To address this performance ...
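The k-NN problem defined in this abstract has an exact but expensive baseline: compute the distance from the query to every point and keep the k smallest. The minimal sketch below assumes Euclidean distance and NumPy arrays; the function name and data are hypothetical. It is the costly full scan, O(n·d) work per query, that such work seeks to accelerate, not the signature-selection method of Park et al.

```python
import numpy as np


def knn_brute_force(data: np.ndarray, query: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k rows of `data` nearest to `query` (Euclidean)."""
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and the number of data points")
    # One distance per data point: O(n * d) work for every query.
    dists = np.linalg.norm(data - query, axis=1)
    # argpartition isolates the k smallest distances without a full sort;
    nearest = np.argpartition(dists, k - 1)[:k]
    # only those k candidates are then sorted by distance.
    return nearest[np.argsort(dists[nearest])]


points = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.5, 0.0]])
print(knn_brute_force(points, np.array([0.0, 0.0]), k=2).tolist())
```

For the origin as query, the two nearest points are rows 0 and 3, returned in order of increasing distance.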
Core-Sets For Canonical Correlation Analysis
Saurabh Paul
Pages: 1887-1890
DOI: 10.1145/2806416.2806618
Full text: PDF

Canonical Correlation Analysis (CCA) is a technique that finds how "similar" the subspaces spanned by the columns of two different matrices $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{m \times \ell}$ are. ...
DeepCamera: A Unified Framework for Recognizing Places-of-Interest based on Deep ConvNets
Pai Peng, Hongxiang Chen, Lidan Shou, Ke Chen, Gang Chen, Chang Xu
Pages: 1891-1894
DOI: 10.1145/2806416.2806620
Full text: PDF

In this work, we present a novel project called DeepCamera (DC) for recognizing places-of-interest (POI) with smartphones. Our framework is based on deep convolutional neural networks (ConvNets), which are currently state-of-the-art solutions to visual recognition ...
Structured Sparse Regression for Recommender Systems
Mingjie Qian, Liangjie Hong, Yue Shi, Suju Rajan
Pages: 1895-1898
DOI: 10.1145/2806416.2806641
Full text: PDF

Feature-based collaborative filtering models, such as state-of-the-art factorization machines and regression-based latent factor models, rarely consider features' structural information, ignoring the heterogeneity of inter-type and intra-type relationships. ...
Analyzing Document Intensive Business Processes using Ontology
Suman Roychoudhury, Vinay Kulkarni, Nikhil Bellarykar
Pages: 1899-1902
DOI: 10.1145/2806416.2806638
Full text: PDF

Knowledge is manifested in an enterprise in various forms ranging from unstructured operational data, to structured information like programs, as well as relational data stored in databases to semi-structured information stored in XML files. This information ...
Transductive Domain Adaptation with Affinity Learning
Le Shu, Longin Jan Latecki
Pages: 1903-1906
DOI: 10.1145/2806416.2806643
Full text: PDF

We study the problem of domain adaptation, which aims to adapt classifiers trained on a labeled source domain to an unlabeled target domain. We propose a novel method to solve the domain adaptation task in a transductive setting. The proposed method ...
Update Summarization using Semi-Supervised Learning Based on Hellinger Distance
Dingding Wang, Sahar Sohangir, Tao Li
Pages: 1907-1910
DOI: 10.1145/2806416.2806628
Full text: PDF

Update summarization aims to generate brief summaries of recent documents to capture new information different from earlier documents. In this paper, we propose a new method to generate the sentence similarity graph using a novel similarity measure based ...
Multi-view Clustering via Structured Low-rank Representation
Dong Wang, Qiyue Yin, Ran He, Liang Wang, Tieniu Tan
Pages: 1911-1914
DOI: 10.1145/2806416.2806629
Full text: PDF

In this paper, we present a novel solution to multi-view clustering through a structured low-rank representation. Assuming that similar samples can be linearly reconstructed from each other, the resulting representational matrix reflects the cluster structure ...
Partially Labeled Data Tuple Can Optimize Multivariate Performance Measures
Jim Jing-Yan Wang, Xin Gao
Pages: 1915-1918
DOI: 10.1145/2806416.2806630
Full text: PDF

Multivariate performance measure optimization refers to learning predictive models such that a desired complex performance measure can be optimized over a training set, such as the F1 score. Up to now, all the existing multivariate performance measure ...
Modeling Infinite Topics on Social Behavior Data with Spatio-temporal Dependence
Peng Wang, Peng Zhang, Chuan Zhou, Zhao Li, Guo Li
Pages: 1919-1922
DOI: 10.1145/2806416.2806635
Full text: PDF

The problem of modeling topics on user behavior data in social networks has been widely studied in social marketing and social emotion analysis, where latent topic models are commonly used as the solutions. The user behavior data are highly related in ...
ASEM: Mining Aspects and Sentiment of Events from Microblog
Ruhui Wang, Weijing Huang, Wei Chen, Tengjiao Wang, Kai Lei
Pages: 1923-1926
DOI: 10.1145/2806416.2806622
Full text: PDF

Microblogs contain the most up-to-date and abundant opinion information on current events. Aspect-based opinion mining is a good way to get a comprehensive summarization of events. The most popular aspect-based opinion mining models are used in the field ...
Enhanced Word Embeddings from a Hierarchical Neural Language Model
Xun Wang, Katsuhoto Sudoh, Masaaki Nagata
Pages: 1927-1930
DOI: 10.1145/2806416.2806637
Full text: PDF

This paper proposes a neural language model to capture the interaction of text units of different levels, i.e., documents, paragraphs, sentences, and words, in a hierarchical structure. At each parallel level, the model incorporates the Markov property while ...
Improving Label Quality in Crowdsourcing Using Noise Correction
Jing Zhang, Victor S. Sheng, Jian Wu, Xiaoqin Fu, Xindong Wu
Pages: 1931-1934
DOI: 10.1145/2806416.2806627
Full text: PDF

This paper proposes a novel framework that introduces noise correction techniques to further improve label quality after ground truth inference in crowdsourcing. In the framework, an adaptive voting noise correction algorithm (AVNC) is proposed to identify ...
Improving Collaborative Filtering via Hidden Structured Constraint
Qing Zhang, Houfeng Wang
Pages: 1935-1938
DOI: 10.1145/2806416.2806623
Full text: PDF

Matrix factorization models, as one of the most powerful Collaborative Filtering approaches, have greatly advanced the recommendation tasks. However, few of them are able to explicitly consider structured constraint for modeling user interests. To solve ...
WORKSHOP SESSION: Workshop Reports
DOLAP 2015 Workshop Summary
Carlos Garcia-Alvarado, Carlos Ordonez, Il-Yeol Song
Pages: 1939-1940
DOI: 10.1145/2806416.2806876
Full text: PDF

The ACM DOLAP workshop presents research that bridges data warehousing, On-Line Analytical Processing (OLAP), and other large-scale data processing platforms. The program has four interesting sessions on data warehouse design, database modeling, query ...
DTMBIO 2015: International Workshop on Data and Text Mining in Biomedical Informatics
Min Song, Doheon Lee, Karin Verspoor
Pages: 1941-1942
DOI: 10.1145/2806416.2806880
Full text: PDF

Held each year in conjunction with one of the largest data management conferences, CIKM, the Ninth ACM International Workshop on Data and Text Mining in Biomedical Informatics (DTMBIO'15) is organized to bring together researchers interested in development ...
ECol 2015: First international workshop on the Evaluation on Collaborative Information Seeking and Retrieval
Leif Azzopardi, Jeremy Pickens, Tetsuya Sakai, Laure Soulier, Lynda Tamine
Pages: 1943-1944
DOI: 10.1145/2806416.2806881
Full text: PDF

Collaborative Information Seeking/Retrieval (CIS/CIR) has given rise to several challenges in terms of search behavior analysis, retrieval model formalization as well as interface design. However, the major issue of evaluation in CIS/CIR is still underexplored. ...
Eighth Workshop on Exploiting Semantic Annotations in Information Retrieval (ESAIR'15)
Krisztian Balog, Jeffrey Dalton, Antoine Doucet, Yusra Ibrahim
Pages: 1945-1946
DOI: 10.1145/2806416.2806879
Full text: PDF

The amount of structured content published on the Web has been growing rapidly, making it possible to address increasingly complex information access tasks. Recent years have witnessed the emergence of large scale human-curated knowledge bases as well ...
LSDS-IR'15: 2015 Workshop on Large-Scale and Distributed Systems for Information Retrieval
Ismail Sengor Altingovde, B. Barla Cambazoglu, Nicola Tonellotto
Pages: 1947-1948
DOI: 10.1145/2806416.2806877
Full text: PDF

The growth of the Web and other Big Data sources leads to important performance problems for large-scale and distributed information retrieval systems. The scalability and efficiency of such information retrieval systems have an impact on their effectiveness, ...
NWSearch 2015: International Workshop on Novel Web Search Interfaces and Systems
Davood Rafiei, Katsumi Tanaka
Pages: 1949-1950
DOI: 10.1145/2806416.2806882
Full text: PDF

Held for the first time in conjunction with the ACM International Conference on Information and Knowledge Management (CIKM), NWSearch 2015 aims to bring together researchers, developers and practitioners who are interested in pushing the search boundary ...
PIKM 2015: The 8th ACM Workshop for Ph.D. Students in Information and Knowledge Management
Mouna Kacimi, Nicoleta Preda, Maya Ramanath
Pages: 1951-1952
DOI: 10.1145/2806416.2806873
Full text: PDF

The PIKM workshop offers Ph.D. students the opportunity to bring their work to an international and interdisciplinary research community, and create a network of young researchers to exchange and develop new and promising ideas. Similar to the CIKM, ...
TM 2015 -- Topic Models: Post-Processing and Applications Workshop
Nikolaos Aletras, Jey Han Lau, Timothy Baldwin, Mark Stevenson
Pages: 1953-1954
DOI: 10.1145/2806416.2806875
Full text: PDF

The main objective of the workshop is to bring together researchers who are interested in applications of topic models and improving their output. Our goal is to create a broad platform for researchers to share ideas that could improve the usability ...
UCUI'15: The 1st International Workshop on Understanding the City with Urban Informatics
Yashar Moshfeghi, Iadh Ounis, Craig Macdonald, Joemon M. Jose, Peter Triantafillou, Mark Livingston, Piyushimita Thakuriah
Pages: 1955-1956
DOI: 10.1145/2806416.2806878
Full text: PDF

Urban Informatics aims to exploit the large quantities of information produced by modern cities in order to gain insights into how they function. These insights lay the foundation for improving the lives of citizens, by improving the efficacy and efficiency ...

The ACM Digital Library is published by the Association for Computing Machinery. Copyright © 2019 ACM, Inc.