ABSTRACT

This paper proposes a neural language model to capture the interaction of text units at different levels, i.e., documents, paragraphs, sentences, and words, in a hierarchical structure. At each parallel level, the model incorporates the Markov property, while each higher-level unit hierarchically influences the units it contains. Such an architecture enables the learned word embeddings to encode both global and local information. We evaluate the learned word embeddings, and experiments demonstrate the effectiveness of our model.
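
The abstract gives no equations, so the following is only one possible reading of the described dependencies, written as conditional distributions. All notation is hypothetical and not taken from the paper: d is a document, p_i its i-th paragraph, s_j the j-th sentence of a paragraph, w_k the k-th word of a sentence, and n an assumed Markov order. Each unit is conditioned on its n predecessors at the same level (the Markov property) and on the higher-level unit that contains it (the hierarchical influence).

```latex
% Hypothetical notation; a sketch of the conditioning structure only,
% not the authors' actual model or objective.
\begin{align*}
\text{paragraph level:} \quad & P\left(p_i \mid p_{i-n}, \dots, p_{i-1},\; d\right) \\
\text{sentence level:}  \quad & P\left(s_j \mid s_{j-n}, \dots, s_{j-1},\; p_i\right) \\
\text{word level:}      \quad & P\left(w_k \mid w_{k-n}, \dots, w_{k-1},\; s_j\right)
\end{align*}
```

Under this reading, a word's prediction depends on nearby words (local information) and, through the chain sentence, paragraph, document, on progressively wider context (global information).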

AUTHORS

Xun Wang

Bibliometrics: publication history
Publication years: 2015-2015
Publication count: 3
Citation count: 0
Available for download: 3
Downloads (6 weeks): 2
Downloads (12 months): 23
Downloads (cumulative): 281
Average downloads per article: 93.67
Average citations per article: 0.00

Katsuhoto Sudoh

Masaaki Nagata

Bibliometrics: publication history
Publication years: 1992-2016
Publication count: 46
Citation count: 156
Available for download: 34
Downloads (6 weeks): 101
Downloads (12 months): 562
Downloads (cumulative): 6,394
Average downloads per article: 188.06
Average citations per article: 3.39

PUBLICATION

Title: CIKM '15 Proceedings of the 24th ACM International on Conference on Information and Knowledge Management
General Chairs:
  James Bailey, The University of Melbourne
  Alistair Moffat, The University of Melbourne
Program Chairs:
  Charu C. Aggarwal, IBM
  Maarten de Rijke, University of Amsterdam
  Ravi Kumar, Google
  Vanessa Murdock, Microsoft
  Timos Sellis, RMIT University
  Jeffrey Xu Yu, Chinese University of Hong Kong
Pages: 1927-1930
Publication Date: 2015-10-17
Sponsors:
  SIGWEB, ACM Special Interest Group on Hypertext, Hypermedia, and Web
  SIGIR, ACM Special Interest Group on Information Retrieval
Publisher: ACM, New York, NY, USA ©2015
ISBN: 978-1-4503-3794-6
Order Number: 605159
DOI: 10.1145/2806416.2806637
Conference: CIKM, Conference on Information and Knowledge Management
Paper Acceptance Rate: 165 of 646 submissions, 26%
Overall Acceptance Rate: 1,960 of 10,758 submissions, 18%
Year       Submitted  Accepted  Rate
CIKM '05         425        77   18%
CIKM '06         537        81   15%
CIKM '07         512        86   17%
CIKM '08         772       132   17%
CIKM '09         847       123   15%
CIKM '10         945       126   13%
CIKM '11         918       228   25%
CIKM '12       1,088       146   13%
CIKM '13         848       143   17%
CIKM '14         838       175   21%
CIKM '15         646       165   26%
CIKM '16         701       160   23%
CIKM '17         855       171   20%
CIKM '18         826       147   18%
Overall       10,758     1,960   18%
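
The rates in the table follow from accepted / submitted, rounded to a whole percent. As a minimal sketch for checking the arithmetic, the snippet below recomputes each yearly rate and the overall row from the listed counts. The names CIKM_STATS, acceptance_rate, and overall are illustrative and not part of any ACM tooling, and the rounding convention is an assumption that happens to match the published figures.

```python
# Counts copied from the acceptance-rate table: year -> (submitted, accepted).
CIKM_STATS = {
    "CIKM '05": (425, 77),
    "CIKM '06": (537, 81),
    "CIKM '07": (512, 86),
    "CIKM '08": (772, 132),
    "CIKM '09": (847, 123),
    "CIKM '10": (945, 126),
    "CIKM '11": (918, 228),
    "CIKM '12": (1088, 146),
    "CIKM '13": (848, 143),
    "CIKM '14": (838, 175),
    "CIKM '15": (646, 165),
    "CIKM '16": (701, 160),
    "CIKM '17": (855, 171),
    "CIKM '18": (826, 147),
}


def acceptance_rate(submitted: int, accepted: int) -> int:
    """Return accepted/submitted as a percentage rounded to a whole number."""
    return round(100 * accepted / submitted)


def overall(stats: dict) -> tuple:
    """Sum submissions and acceptances over all years and compute the pooled rate."""
    total_submitted = sum(submitted for submitted, _ in stats.values())
    total_accepted = sum(accepted for _, accepted in stats.values())
    return total_submitted, total_accepted, acceptance_rate(total_submitted, total_accepted)


if __name__ == "__main__":
    for year, (submitted, accepted) in CIKM_STATS.items():
        print(f"{year}: {accepted}/{submitted} = {acceptance_rate(submitted, accepted)}%")
    print("Overall:", overall(CIKM_STATS))
```

Note that the overall rate is pooled (total accepted over total submitted), not the mean of the yearly percentages.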

Table of Contents

Proceedings of the 24th ACM International on Conference on Information and Knowledge Management
SESSION: Keynote Address I
Session details: Keynote Address I
Alistair Moffat
Slow Search: Improving Information Retrieval Using Human Assistance
Jaime Teevan
Pages: 1-1

We live in a world where the pace of everything from communication to transportation is getting faster. In recent years a number of "slow movements" have emerged that advocate for reducing speed in exchange for increasing quality, including the slow ...
SESSION: Session 1A: Scalability
Session details: Session 1A: Scalability
Rui Zhang
External Data Access And Indexing In AsterixDB
Abdullah A. Alamoudi, Raman Grover, Michael J. Carey, Vinayak Borkar
Pages: 3-12

Traditional database systems offer rich query interfaces (SQL) and efficient query execution for data that they store. Recent years have seen the rise of Big Data analytics platforms offering query-based access to "raw" external data, e.g., file-resident ...
Dynamic Resource Management In a Massively Parallel Stream Processing Engine
Kasper Grud Skat Madsen, Yongluan Zhou
Pages: 13-22

The emerging interest in Massively Parallel Stream Processing Engines (MPSPEs), which are able to process long-standing computations over data streams with ever-growing velocity at a large-scale cluster, calls for efficient dynamic resource management ...
A Parallel GPU-Based Approach to Clustering Very Fast Data Streams
Pengtao Huang, Xiu Li, Bo Yuan
Pages: 23-32

Clustering data streams has become a hot topic in the era of big data. Driven by the ever increasing volume, velocity and variety of data, more efficient algorithms for clustering large-scale complex data streams are needed. In this paper, we present ...
Scalable Clustering Algorithm via a Triangle Folding Processing for Complex Networks
Ying Kang, Xiaoyan Gu, Weiping Wang, Dan Meng
Pages: 33-42

Facing up to the incessant growth of complex networks, more and more researchers start turning to a multilevel computing paradigm with high scalability for clustering. By virtue of iterative coarsening level by level, the clustering results which are ...
SESSION: Session 1B: Personal Search
Session details: Session 1B: Personal Search
Peter Bailey
Understanding the Impact of the Role Factor in Collaborative Information Retrieval
Lynda Tamine, Laure Soulier
Pages: 43-52

Collaborative information retrieval systems often rely on division of labor policies. Such policies allow work to be divided among collaborators with the aim of preventing redundancy and optimizing the synergic effects of collaboration. Most of the underlying ...
Experiments with a Venue-Centric Model for Personalised and Time-Aware Venue Suggestion
Romain Deveaud, M-Dyaa Albakour, Craig Macdonald, Iadh Ounis
Pages: 53-62

Location-based social networks (LBSNs), such as Foursquare, fostered the emergence of new tasks such as recommending venues a user might wish to visit. In the literature, recommending venues has typically been addressed using user-centric recommendation ...
Search Result Diversification Based on Hierarchical Intents
Sha Hu, Zhicheng Dou, Xiaojie Wang, Tetsuya Sakai, Ji-Rong Wen
Pages: 63-72

A large percentage of queries issued to search engines are broad or ambiguous. Search result diversification aims to solve this problem, by returning diverse results that can fulfill as many different information needs as possible. Most existing intent-aware ...
Category-Driven Approach for Local Related Business Recommendations
Yonathan Perez, Michael Schueppert, Matthew Lawlor, Shaunak Kishore
Pages: 73-82

When users search online for a business, the search engine may present them with a list of related business recommendations. We address the problem of constructing a useful and diverse list of such recommendations that would include an optimal combination ...
SESSION: Session 1C: Learning
Session details: Session 1C: Learning
Leif Azzopardi
A Soft Computing Approach for Learning to Aggregate Rankings
Javier Alvaro Vargas Muñoz, Ricardo da Silva Torres, Marcos André Gonçalves
Pages: 83-92

This paper presents an approach to combine rank aggregation techniques using a soft computing technique -- Genetic Programming -- in order to improve the results in Information Retrieval tasks. Previous work shows that by combining rank aggregation techniques ...
Approximate String Matching by End-Users using Active Learning
Lutz Büch, Artur Andrzejak
Pages: 93-102

Identifying approximately identical strings is key for many data cleaning and data integration processes, including similarity join and record matching. The accuracy of such tasks crucially depends on appropriate choices of string similarity measures ...
A Unified Posterior Regularized Topic Model with Maximum Margin for Learning-to-Rank
Shoaib Jameel, Wai Lam, Steven Schockaert, Lidong Bing
Pages: 103-112

While most methods for learning-to-rank documents only consider relevance scores as features, better results can often be obtained by taking into account the latent topic structure of the document collection. Existing approaches that consider latent ...
Collaborating between Local and Global Learning for Distributed Online Multiple Tasks
Xin Jin, Ping Luo, Fuzhen Zhuang, Jia He, Qing He
Pages: 113-122

This paper studies the novel learning scenarios of Distributed Online Multi-tasks (DOM), where the learning individuals with continuously arriving data are distributed separately and meanwhile they need to learn individual models collaboratively. It ...
SESSION: Session 1D: Text Processing
Session details: Session 1D: Text Processing
Tim Baldwin
Lifespan-based Partitioning of Index Structures for Time-travel Text Search
Animesh Nandi, Suriya Subramanian, Sriram Lakshminarasimhan, Prasad M. Deshpande, Sriram Raghavan
Pages: 123-132

Time-travel text search over a temporally evolving document collection is useful in various applications. Supporting a wide range of query classes demanded by these applications requires different index layouts optimized for their respective query access ...
Contextual Text Understanding in Distributional Semantic Space
Jianpeng Cheng, Zhongyuan Wang, Ji-Rong Wen, Jun Yan, Zheng Chen
Pages: 133-142

Representing discrete words in a continuous vector space turns out to be useful for natural language applications related to text understanding. Meanwhile, it poses extensive challenges, one of which is due to the polysemous nature of human language. ...
External Knowledge and Query Strategies in Active Learning: a Study in Clinical Information Extraction
Mahnoosh Kholghi, Laurianne Sitbon, Guido Zuccon, Anthony Nguyen
Pages: 143-152

This paper presents a new active learning query strategy for information extraction, called Domain Knowledge Informativeness (DKI). Active learning is often used to reduce the amount of annotation effort required to obtain training data for machine learning ...
Ranking Deep Web Text Collections for Scalable Information Extraction
Pablo Barrio, Luis Gravano, Chris Develder
Pages: 153-162

Information extraction (IE) systems discover structured information from natural language text, to enable much richer querying and data mining than possible directly over the unstructured text. Unfortunately, IE is generally a computationally expensive ...
SESSION: Session 1E: Applications
Session details: Session 1E: Applications
Huizhi (Elly) Liang
Forming Online Support Groups for Internet and Behavior Related Addictions
Chih-Ya Shen, Hong-Han Shuai, De-Nian Yang, Yi-Feng Lan, Wang-Chien Lee, Philip S. Yu, Ming-Syan Chen
Pages: 163-172

While online social networks have become a part of many people's daily lives, Internet and social network addictions (ISNAs) have been noted recently. With increased patients in addictive Internet use, clinicians often form support groups to help patients. ...
Concept-Based Relevance Models for Medical and Semantic Information Retrieval
Chunye Wang, Ramakrishna Akella
Pages: 173-182

Relevance models provide an important approach for estimating probabilities of words in the relevant class. However, the associated bag-of-words assumption breaks dependencies between words, especially between those within a phrase. If such dependencies ...
PlateClick: Bootstrapping Food Preferences Through an Adaptive Visual Interface
Longqi Yang, Yin Cui, Fan Zhang, John P. Pollak, Serge Belongie, Deborah Estrin
Pages: 183-192

Food preference learning is an important component of wellness applications and restaurant recommender systems as it provides personalized information for effective food targeting and suggestions. However, existing systems require some form of food journaling ...
Data Driven Water Pipe Failure Prediction: A Bayesian Nonparametric Approach
Peng Lin, Bang Zhang, Yi Wang, Zhidong Li, Bin Li, Yang Wang, Fang Chen
Pages: 193-202

Water pipe failures can cause significant economic and social costs, hence have become the primary challenge to water utilities. In this paper, we propose a Bayesian nonparametric approach, namely the Dirichlet process mixture of hierarchical beta process ...
SESSION: Session 1F: Social Media 1
Session details: Session 1F: Social Media 1
Lynda Tamine
Tumblr Blog Recommendation with Boosted Inductive Matrix Completion
Donghyuk Shin, Suleyman Cetintas, Kuang-Chih Lee, Inderjit S. Dhillon
Pages: 203-212

Popular microblogging sites such as Tumblr have attracted hundreds of millions of users as a content sharing platform, where users can create rich content in the form of posts that are shared with other users who follow them. Due to the sheer amount ...
BiasWatch: A Lightweight System for Discovering and Tracking Topic-Sensitive Opinion Bias in Social Media
Haokai Lu, James Caverlee, Wei Niu
Pages: 213-222

We propose a lightweight system for (i) semi-automatically discovering and tracking bias themes associated with opposing sides of a topic; (ii) identifying strong partisans who drive the online discussion; and (iii) inferring the opinion bias of "regular" ...
Knowlywood: Mining Activity Knowledge From Hollywood Narratives
Niket Tandon, Gerard de Melo, Abir De, Gerhard Weikum
Pages: 223-232

Despite the success of large knowledge bases, one kind of knowledge that has not received attention so far is that of human activities. An example of such an activity is proposing to someone (to get married). For the computer, knowing that this involves ...
Entity and Aspect Extraction for Organizing News Comments
Radityo Eko Prasojo, Mouna Kacimi, Werner Nutt
Pages: 233-242

News websites give their users the opportunity to participate in discussions about published articles, by writing comments. Typically, these comments are unstructured making it hard to understand the flow of user discussions. Thus, there is a need for ...
SESSION: Session 2A: Graphs
Session details: Session 2A: Graphs
Sourav S. Bhowmick
HDRF: Stream-Based Partitioning for Power-Law Graphs
Fabio Petroni, Leonardo Querzoni, Khuzaima Daudjee, Shahin Kamali, Giorgio Iacoboni
Pages: 243-252

Balanced graph partitioning is a fundamental problem that is receiving growing attention with the emergence of distributed graph-computing (DGC) frameworks. In these frameworks, the partitioning strategy plays an important role since it drives the communication ...
Towards Scale-out Capability on Social Graphs
Haichuan Shang, Xiang Zhao, Uday Kiran, Masaru Kitsuregawa
Pages: 253-262

The development of cloud storage and computing has facilitated the rise of various big data applications. As a representative high performance computing (HPC) workload, graph processing is becoming a part of cloud computing. However, scalable computing ...
Identifying Top-k Structural Hole Spanners in Large-Scale Social Networks
Mojtaba Rezvani, Weifa Liang, Wenzheng Xu, Chengfei Liu
Pages: 263-272

Recent studies have shown that in social networks, users who bridge different communities, known as structural hole spanners, have great potentials to acquire available resources from these communities and gain access to multiple sources of information ...
Scalable Facility Location for Massive Graphs on Pregel-like Systems
Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, Mauro Sozio
Pages: 273-282

We propose a new scalable algorithm for the facility-location problem. We study the graph setting, where the cost of serving a client from a facility is represented by the shortest-path distance on a graph. This setting is applicable to various ...
SESSION: Session 2B: Retrieval Algorithms
Session details: Session 2B: Retrieval Algorithms
Guido Zuccon
Rank by Time or by Relevance?: Revisiting Email Search
David Carmel, Guy Halawi, Liane Lewin-Eytan, Yoelle Maarek, Ariel Raviv
Pages: 283-292

With Web mail services offering larger and larger storage capacity, most users do not feel the need to systematically delete messages anymore and inboxes keep growing. It is quite surprising that in spite of the huge progress of relevance ranking in ...
On the Cost of Extracting Proximity Features for Term-Dependency Models
Xiaolu Lu, Alistair Moffat, J. Shane Culpepper
Pages: 293-302

Sophisticated ranking mechanisms make use of term dependency features in order to compute similarity scores for documents. These features often include exact phrase occurrences, and term proximity estimates. Both cases build on the intuition that if ...
An Optimization Framework for Merging Multiple Result Lists
Chia-Jung Lee, Qingyao Ai, W. Bruce Croft, Daniel Sheldon
Pages: 303-312

Developing effective methods for fusing multiple ranked lists of documents is crucial to many applications. Federated web search, for instance, has become a common practice where a query is issued to different verticals and a single ranked list of blended ...
Searching and Stopping: An Analysis of Stopping Rules and Strategies
David Maxwell, Leif Azzopardi, Kalervo Järvelin, Heikki Keskustalo
Pages: 313-322

Searching naturally involves stopping points, both at a query level (how far down the ranked list should I go?) and at a session level (how many queries should I issue?). Understanding when searchers stop has been of much interest to the ...
SESSION: Session 2C: Text Analysis
Session details: Session 2C: Text Analysis
Krisztian Balog
Automated News Suggestions for Populating Wikipedia Entity Pages
Besnik Fetahu, Katja Markert, Avishek Anand
Pages: 323-332

Wikipedia entity pages are a valuable source of information for direct consumption and for knowledge-base construction, update and maintenance. Facts in these entity pages are typically supported by references. Recent studies show that as much as 20% ...
Mining Coordinated Intent Representation for Entity Search and Recommendation
Huizhong Duan, ChengXiang Zhai
Pages: 333-342

We study the problem of learning query intent representation for an entity search task such as product retrieval, where a user would use a keyword query to retrieve entities based on their structured attribute value descriptions. Existing intent representation ...
Sentiment Extraction by Leveraging Aspect-Opinion Association Structure
Li Zhao, Minlie Huang, Jiashen Sun, Hengliang Luo, Xiankai Yang, Xiaoyan Zhu
Pages: 343-352

Sentiment extraction aims to extract and group aspect and opinion words from online reviews. Previous works usually extract aspect and opinion words by leveraging association between a single pair of aspect and opinion ...
Leveraging Joint Interactions for Credibility Analysis in News Communities
Subhabrata Mukherjee, Gerhard Weikum
Pages: 353-362

Media seems to have become more partisan, often providing a biased coverage of news catering to the interest of specific groups. It is therefore essential to identify credible information content that provides an objective narrative of an event. ...
SESSION: Session 2D: Clustering
Session details: Session 2D: Clustering
Ravi Kumar
Clustering-based Active Learning on Sensor Type Classification in Buildings
Dezhi Hong, Hongning Wang, Kamin Whitehouse
Pages: 363-372

Commercial and industrial buildings account for a considerable portion of all energy consumed in the U.S., and thus reducing this energy consumption is a national grand challenge. Based on the large deployment of sensors in modern commercial buildings, ...
gSparsify: Graph Motif Based Sparsification for Graph Clustering
Peixiang Zhao
Pages: 373-382

Graph clustering is a fundamental problem that partitions vertices of a graph into clusters with an objective to optimize the intuitive notions of intra-cluster density and intercluster sparsity. In many real-world applications, however, ...
Incomplete Multi-view Clustering via Subspace Learning
Qiyue Yin, Shu Wu, Liang Wang
Pages: 383-392

Multi-view clustering, which explores complementary information between multiple distinct feature sets for better clustering, has a wide range of applications, e.g., knowledge management and information retrieval. Traditional multi-view clustering methods ...
Robust Subspace Clustering via Tighter Rank Approximation
Zhao Kang, Chong Peng, Qiang Cheng
Pages: 393-401

Matrix rank minimization problem is in general NP-hard. The nuclear norm is used to substitute the rank function in many recent studies. Nevertheless, the nuclear norm approximation adds all singular values together and the approximation error may depend ...
SESSION: Session 2E: Users and Predictions
Session details: Session 2E: Users and Predictions
James Caverlee
Interactive User Group Analysis
Behrooz Omidvar-Tehrani, Sihem Amer-Yahia, Alexandre Termier
Pages: 403-412

User data is becoming increasingly available in multiple domains ranging from phone usage traces to data on the social Web. The analysis of user data is appealing to scientists who work on population studies, recommendations, and large-scale data analytics. ...
Viewability Prediction for Online Display Ads
Chong Wang, Achir Kalra, Cristian Borcea, Yi Chen
Pages: 413-422

As a massive industry, display advertising delivers advertisers' marketing messages to attract customers through graphic banners on webpages. Advertisers are charged by ad serving, where their ads are shown in web pages. However, recent studies show ...
10 Bits of Surprise: Detecting Malicious Users with Minimum Information
Reza Zafarani, Huan Liu
Pages: 423-431

Malicious users are a threat to many sites and defending against them demands innovative countermeasures. When malicious users join sites, they provide limited information about themselves. With this limited information, sites can find it difficult to ...
MAPer: A Multi-scale Adaptive Personalized Model for Temporal Human Behavior Prediction
Sarah Masud Preum, John A. Stankovic, Yanjun Qi
Pages: 433-442

The primary objective of this research is to develop a simple and interpretable predictive framework to perform temporal modeling of individual user's behavior traits based on each person's past observed traits/behavior. Individual-level human behavior ...
SESSION: Session 2F: Heterogeneous Networks
Session details: Session 2F: Heterogeneous Networks
Michael Quan Z. Sheng
Classification with Active Learning and Meta-Paths in Heterogeneous Information Networks
Chang Wan, Xiang Li, Ben Kao, Xiao Yu, Quanquan Gu, David Cheung, Jiawei Han
Pages: 443-452

A heterogeneous information network (HIN) is used to model objects of different types and their relationships. Meta-paths are sequences of object types. They are used to represent complex relationships between objects beyond what links in a homogeneous ...
Semantic Path based Personalized Recommendation on Weighted Heterogeneous Information Networks
Chuan Shi, Zhiqiang Zhang, Ping Luo, Philip S. Yu, Yading Yue, Bin Wu
Pages: 453-462

Recently heterogeneous information network (HIN) analysis has attracted a lot of attention, and many data mining tasks have been exploited on HIN. As an important data mining task, recommender system includes a lot of object types (e.g., users, movies, ...
A Graph-based Recommendation across Heterogeneous Domains
Deqing Yang, Jingrui He, Huazheng Qin, Yanghua Xiao, Wei Wang
Pages: 463-472

Given the users from a social network site, who have been tagged with a set of terms, how can we recommend the movies tagged with a completely different set of terms hosted by another website? Given the users from a website dedicated to Type I and Type ...
Query Relaxation across Heterogeneous Data Sources
Verena Kantere, George Orfanoudakis, Anastasios Kementsietsidis, Timos Sellis
Pages: 473-482

The fundamental assumption for query rewriting in heterogeneous environments is that the mappings used for the rewriting are complete, i.e., every relation and attribute mentioned in the query is associated, through mappings, to relations ...
SESSION: Session 3A: Veracity
Session details: Session 3A: Veracity
Laure Berti-Équille
Approximated Summarization of Data Provenance
Eleanor Ainy, Pierre Bourhis, Susan B. Davidson, Daniel Deutch, Tova Milo
Pages: 483-492

Many modern applications involve collecting large amounts of data from multiple sources, and then aggregating and manipulating it in intricate ways. The complexity of such applications, combined with the size of the collected data, makes it difficult ...
An Integrated Bayesian Approach for Effective Multi-Truth Discovery
Xianzhi Wang, Quan Z. Sheng, Xiu Susie Fang, Lina Yao, Xiaofei Xu, Xue Li
Pages: 493-502

Truth-finding is the fundamental technique for corroborating reports from multiple sources in both data integration and collective intelligent applications. Traditional truth-finding methods assume a single true value for each data item and therefore ...
Approximate Truth Discovery via Problem Scale Reduction
Xianzhi Wang, Quan Z. Sheng, Xiu Susie Fang, Xue Li, Xiaofei Xu, Lina Yao
Pages: 503-512

Many real-world applications rely on multiple data sources to provide information on their interested items. Due to the noises and uncertainty in data, given a specific item, the information from different sources may conflict. To make reliable decisions ...
SESSION: Session 3B: Social Networks 1
Session details: Session 3B: Social Networks 1
Niloy Ganguly
Organic or Organized?: Exploring URL Sharing Behavior
Cheng Cao, James Caverlee, Kyumin Lee, Hancheng Ge, Jinwook Chung
Pages: 513-522

URL sharing has become one of the most popular activities on many online social media platforms. Shared URLs are an avenue to interesting news articles, memes, photos, as well as low-quality content like spam, promotional ads, and phishing sites. While ...
Mining Brokers in Dynamic Social Networks
Chonggang Song, Wynne Hsu, Mong Li Lee
Pages: 523-532

The theory of brokerage in sociology suggests if contacts between two parties are enabled through a third party, the latter occupies a strategic position of controlling information flows. Such individuals are called brokers and they play a key ...
Who Will You "@"?
Yeyun Gong, Qi Zhang, Xuyang Sun, Xuanjing Huang
Pages: 533-542

In Twitter-like social networking services, people can use the "@" symbol to mention other users in tweets and send them a message or link to their profiles. In recent years, social media services are rapidly growing with thousands of millions of users ...
SESSION: Session 3C: Query Completion
Session details: Session 3C: Query Completion
Maarten de Rijke
Characterizing and Predicting Voice Query Reformulation
Ahmed Hassan Awadallah, Ranjitha Gurunath Kulkarni, Umut Ozertem, Rosie Jones
Pages: 543-552

Voice interactions are becoming more prevalent as the usage of voice search and intelligent assistants gains more popularity. Users frequently reformulate their requests in hope of getting better results either because the system was unable to recognize ...
A Hierarchical Recurrent Encoder-Decoder for Generative Context-Aware Query Suggestion
Alessandro Sordoni, Yoshua Bengio, Hossein Vahabi, Christina Lioma, Jakob Grue Simonsen, Jian-Yun Nie
Pages: 553-562

Users may strive to formulate an adequate textual query for their information need. Search engines assist the users by presenting query suggestions. To preserve the original search intent, suggestions should be context-aware and account for the previous ...
A Network-Aware Approach for Searching As-You-Type in Social Media
Paul Lagrée, Bogdan Cautis, Hossein Vahabi
Pages: 563-572

We present in this paper a novel approach for as-you-type top-k keyword search over social media. We adopt a natural "network-aware" interpretation for information relevance, by which information produced by users who are closer to the seeker ...
SESSION: Session 3D: Microblogs
Session details: Session 3D: Microblogs
Antoine Doucet
Improving Microblog Retrieval with Feedback Entity Model
Feifan Fan, Runwei Qiang, Chao Lv, Jianwu Yang
Pages: 573-582
Full text: PDF

When searching microblogs, users prefer using queries including terms that represent some specific entities. Meanwhile, tweets, though limited to 140 characters, are often generated with one or more entities. Entities, as an important ...
Extracting Situational Information from Microblogs during Disaster Events: a Classification-Summarization Approach
Koustav Rudra, Subham Ghosh, Niloy Ganguly, Pawan Goyal, Saptarshi Ghosh
Pages: 583-592
Full text: PDF

Microblogging sites like Twitter have become important sources of real-time information during disaster events. A significant amount of valuable situational information is available in these sites; however, this information is immersed among hundreds ...
Profession-Based Person Search in Microblogs: Using Seed Sets to Find Journalists
Mossaab Bagdouri, Douglas W. Oard
Pages: 593-602
Full text: PDF

We introduce the problem of searching for professionals in microblogging platforms. We describe a study of how a group of professional journalists with some common characteristics (e.g., works in a specific language, belongs to certain region, or specializes ...
SESSION: Session 3E: Graph-Based Analysis
Session details: Session 3E: Graph-Based Analysis
Lina Yao
Learning Entity Types from Query Logs via Graph-Based Modeling
Jingyuan Zhang, Luo Jie, Altaf Rahman, Sihong Xie, Yi Chang, Philip S. Yu
Pages: 603-612
Full text: PDF

Entities (e.g., person, movie or place) play an important role in real-world applications and learning entity types has attracted much attention in recent years. Most conventional automatic techniques use large corpora, such as news articles, to learn ...
Collaborative Prediction for Multi-entity Interaction With Hierarchical Representation
Qiang Liu, Shu Wu, Liang Wang
Pages: 613-622
Full text: PDF

With the rapid growth of Internet applications, there are more and more entities in interaction scenarios, and thus collaborative prediction for multi-entity interaction is becoming a significant problem. The state-of-the-art methods, e.g., tensor factorization ...
Learning to Represent Knowledge Graphs with Gaussian Embedding
Shizhu He, Kang Liu, Guoliang Ji, Jun Zhao
Pages: 623-632
Full text: PDF

The representation of a knowledge graph (KG) in a latent space recently has attracted more and more attention. To this end, some proposed models (e.g., TransE) embed entities and relations of a KG into a "point" vector space by optimizing ...
SESSION: Session 3F: Classification 1
Session details: Session 3F: Classification 1
Alexandra Uitdenbogerd
Associative Classification with Statistically Significant Positive and Negative Rules
Jundong Li, Osmar Zaiane
Pages: 633-642
Full text: PDF

Rule-based classifiers have shown their popularity in building many decision support systems such as medical diagnosis and financial fraud detection. One major advantage is that the models are human understandable and can be edited. Associative classifiers, ...
A Min-Max Optimization Framework For Online Graph Classification
Peng Yang, Peilin Zhao
Pages: 643-652
Full text: PDF

Traditional online learning for graph node classification adapts graph regularization into ridge regression, which may not be suitable when data is adversarially generated. To solve this issue, we propose a more general min-max optimization framework ...
An Inference Approach to Basic Level of Categorization
Zhongyuan Wang, Haixun Wang, Ji-Rong Wen, Yanghua Xiao
Pages: 653-662
Full text: PDF

Humans understand the world by classifying objects into an appropriate level of categories. This process is often automatic and subconscious. Psychologists and linguists call it Basic-level Categorization (BLC). BLC can benefit lots of applications ...
SESSION: Keynote Address II
Session details: Keynote Address II
James Bailey
Making Sense of Spatial Trajectories
Xiaofang Zhou, Kai Zheng, Hoyoung Jueng, Jiajie Xu, Shazia Sadiq
Pages: 671-672
Full text: PDF

Spatial trajectory data is widely available today. Over a sustained period of time, trajectory data has been collected from numerous GPS devices, smartphones, sensors and social media applications. Daily increases of real-time trajectory data have also ...
SESSION: Session 4A: Location-Based Services
Session details: Session 4A: Location-Based Services
Timos Sellis
ReverseCloak: Protecting Multi-level Location Privacy over Road Networks
Chao Li, Balaji Palanisamy
Pages: 673-682
Full text: PDF

With advances in sensing and positioning technology, fueled by the ubiquitous deployment of wireless networks, location-aware computing has become a fundamental model for offering a wide range of life enhancing services. However, the ability to locate ...
GLUE: a Parameter-Tuning-Free Map Updating System
Hao Wu, Chuanchuan Tu, Weiwei Sun, Baihua Zheng, Hao Su, Wei Wang
Pages: 683-692
Full text: PDF

Map data are widely used in mobile services, but most maps might not be complete. Updating the map automatically is an important problem because road networks are frequently changed with the development of the city. This paper studies the problem of ...
A Cost-based Method for Location-Aware Publish/Subscribe Services
Minghe Yu, Guoliang Li, Jianhua Feng
Pages: 693-702
Full text: PDF

Location-based services have attracted significant attention from both industry and academia, thanks to modern smartphones and mobile Internet. To provide users with gratifications, location-aware publish/subscribe has been recently proposed, which ...
Probabilistic Forecasts of Bike-Sharing Systems for Journey Planning
Nicolas Gast, Guillaume Massonnet, Daniel Reijsbergen, Mirco Tribastone
Pages: 703-712
Full text: PDF

We study the problem of making forecasts about the future availability of bicycles in stations of a bike-sharing system (BSS). This is relevant in order to make recommendations guaranteeing that the probability that a user will be able to make a journey ...
SESSION: Session 4B: Query Explanation
Session details: Session 4B: Query Explanation
Sebastian Link
Efficient Computation of Polynomial Explanations of Why-Not Questions
Nicole Bidoit, Melanie Herschel, Aikaterini Tzompanaki
Pages: 713-722
Full text: PDF

Answering a Why-Not question consists in explaining why a query result does not contain some expected data, called missing answers. This paper focuses on processing Why-Not questions in a query-based approach that identifies the culprit query components. ...
Interruption-Sensitive Empty Result Feedback: Rethinking the Visual Query Feedback Paradigm for Semistructured Data
Sourav S Bhowmick, Curtis Dyreson, Byron Choi, Min-Hwee Ang
Pages: 723-732
Full text: PDF

The usability of visual querying schemes for tree and graph-structured data can be greatly enhanced by providing feedback during query construction, but feedback at inopportune times can hamper query construction. In this paper, we rethink the traditional ...
Implementing Query Completeness Reasoning
Werner Nutt, Sergey Paramonov, Ognjen Savkovic
Pages: 733-742
Full text: PDF

Data completeness is commonly regarded as one of the key aspects of data quality. With this paper we make two main contributions: (i) we develop techniques to reason about the completeness of a query answer over a partially complete database, taking ...
Towards Scalable and Complete Query Explanation with OWL 2 EL Ontologies
Zhe Wang, Mahsa Chitsaz, Kewen Wang, Jianfeng Du
Pages: 743-752
Full text: PDF

Ontology-mediated data access and management systems are rapidly emerging. Besides standard query answering, there is also a need for such systems to be coupled with explanation facilities, in particular to explain missing query answers (i.e. desired ...
SESSION: Session 4C: Crowds
Session details: Session 4C: Crowds
Falk Scholer
Crowdsourcing Pareto-Optimal Object Finding By Pairwise Comparisons
Abolfazl Asudeh, Gensheng Zhang, Naeemul Hassan, Chengkai Li, Gergely V. Zaruba
Pages: 753-762
Full text: PDF

This is the first study of crowdsourcing Pareto-optimal object finding over partial orders and by pairwise comparisons, which has applications in public opinion collection, group decision making, and information exploration. Departing from prior studies ...
Practical Aspects of Sensitivity in Online Experimentation with User Engagement Metrics
Alexey Drutsa, Anna Ufliand, Gleb Gusev
Pages: 763-772
Full text: PDF

Online controlled experiments, e.g., A/B testing, are the state-of-the-art approach used by modern Internet companies to improve their services based on data-driven decisions. The most challenging problem is to define an appropriate online metric of user ...
Generalized Team Draft Interleaving
Eugene Kharitonov, Craig Macdonald, Pavel Serdyukov, Iadh Ounis
Pages: 773-782
Full text: PDF

Interleaving is an online evaluation method that compares two ranking functions by mixing their results and interpreting the users' click feedback. An important property of an interleaving method is its sensitivity, i.e. the ability to obtain reliable ...
Exploiting Document Content for Efficient Aggregation of Crowdsourcing Votes
Martin Davtyan, Carsten Eickhoff, Thomas Hofmann
Pages: 783-790
Full text: PDF

The use of crowdsourcing for document relevance assessment has been found to be a viable alternative to corpus annotation by highly trained experts. The question of quality control is a recurring challenge that is often addressed by aggregating multiple ...
SESSION: Session 4D: Optimization
Session details: Session 4D: Optimization
Xuan Vinh Nguyen
L2Knng: Fast Exact K-Nearest Neighbor Graph Construction with L2-Norm Pruning
David C. Anastasiu, George Karypis
Pages: 791-800
Full text: PDF

The k-nearest neighbor graph is often used as a building block in information retrieval, clustering, online advertising, and recommender systems algorithms. The complexity of constructing the exact k-nearest neighbor graph is quadratic on the ...
Lingo: Linearized Grassmannian Optimization for Nuclear Norm Minimization
Qian Li, Wenjia Niu, Gang Li, Yanan Cao, Jianlong Tan, Li Guo
Pages: 801-809
Full text: PDF

As a popular heuristic to the matrix rank minimization problem, nuclear norm minimization attracts intensive research attention. Matrix factorization based algorithms can reduce the expensive computation cost of SVD for nuclear norm ...
Deep Collaborative Filtering via Marginalized Denoising Auto-encoder
Sheng Li, Jaya Kawale, Yun Fu
Pages: 811-820
Full text: PDF

Collaborative filtering (CF) has been widely employed within recommender systems to solve many real-world problems. Learning effective latent factors plays the most important role in collaborative filtering. Traditional CF methods based upon matrix factorization ...
Improving Latent Factor Models via Personalized Feature Projection for One Class Recommendation
Tong Zhao, Julian McAuley, Irwin King
Pages: 821-830
Full text: PDF

Latent Factor models, which transform both users and items into the same latent feature space, are one of the most successful and ubiquitous models in recommender systems. Most existing models in this paradigm define both users' and items' latent factors ...
SESSION: Session 4E: Social Networks 2
Session details: Session 4E: Social Networks 2
Hongzhi Yin
Node Immunization over Infectious Period
Chonggang Song, Wynne Hsu, Mong Li Lee
Pages: 831-840
Full text: PDF

Locating nodes to immunize in computer/social networks to control the spread of virus or rumors has become an important problem. In real world contagions, nodes may get infected by external sources when the propagation is underway. While most studies ...
Enterprise Social Link Recommendation
Jiawei Zhang, Yuanhua Lv, Philip Yu
Pages: 841-850
Full text: PDF

Many companies have started to use Enterprise Social Networks (ESNs), such as Yammer, to facilitate collaboration and communication amongst their employees in the business context. Social link recommendation, which finds and suggests whom one wants to ...
Exploiting Game Theoretic Analysis for Link Recommendation in Social Networks
Tong Zhao, H. Vicky Zhao, Irwin King
Pages: 851-860
Full text: PDF

The popularity of Online Social Networks (OSNs) has attracted great research interests in different fields. In Economics, researchers use game theory to analyze the mechanism of network formation, which is called Network Formation Game. While in Computer ...
Extracting Interest Tags for Non-famous Users in Social Network
Wei He, Hongyan Liu, Jun He, Shu Tang, Xiaoyong Du
Pages: 861-870
Full text: PDF

Inferring interests of users in social network is important for many applications such as personalized search, recommender systems and online advertising. Most previous studies inferred users' interests based on text posted in social network, which is ...
SESSION: Session 4F: Matrix Factorization
Session details: Session 4F: Matrix Factorization
Jeffrey Chan
Robust Capped Norm Nonnegative Matrix Factorization: Capped Norm NMF
Hongchang Gao, Feiping Nie, Weidong Cai, Heng Huang
Pages: 871-880
Full text: PDF

As an important matrix factorization model, Nonnegative Matrix Factorization (NMF) has been widely used in information retrieval and data mining research. Standard Nonnegative Matrix Factorization is known to use the Frobenius norm to calculate the residual, ...
MF-Tree: Matrix Factorization Tree for Large Multi-Class Learning
Lei Liu, Pang-Ning Tan, Xi Liu
Pages: 881-890
Full text: PDF

Many big data applications require accurate classification of objects into one of possibly thousands or millions of categories. Such classification tasks are challenging due to issues such as class imbalance, high testing cost, and model interpretability ...
GraRep: Learning Graph Representations with Global Structural Information
Shaosheng Cao, Wei Lu, Qiongkai Xu
Pages: 891-900
Full text: PDF

In this paper, we present GraRep, a novel model for learning vertex representations of weighted graphs. This model learns low dimensional vectors to represent vertices appearing in a graph and, unlike existing work, integrates global structural information ...
Context-Adaptive Matrix Factorization for Multi-Context Recommendation
Tong Man, Huawei Shen, Junming Huang, Xueqi Cheng
Pages: 901-910
Full text: PDF

Data sparsity is a long-standing challenge for recommender systems based on collaborative filtering. A promising solution for this problem is multi-context recommendation, i.e., leveraging users' explicit or implicit feedback from multiple contexts. ...
SESSION: Session 5A: Trips and Trajectories
Session details: Session 5A: Trips and Trajectories
Iadh Ounis
Personalized Trip Recommendation with POI Availability and Uncertain Traveling Time
Chenyi Zhang, Hongwei Liang, Ke Wang, Jianling Sun
Pages: 911-920
Full text: PDF

As location-based social network (LBSN) services become increasingly popular, trip recommendation that recommends a sequence of points of interest (POIs) to visit for a user emerges as one of many important applications of LBSNs. Personalized ...
Range Search on Uncertain Trajectories
Liming Zhan, Ying Zhang, Wenjie Zhang, Xiaoyang Wang, Xuemin Lin
Pages: 921-930
Full text: PDF

The range search on trajectories is fundamental in a wide spectrum of applications such as environment monitoring and location based services. In practice, a large portion of spatio-temporal data in the above applications is generated with low sampling ...
Efficient Computation of Trips with Friends and Families
Tanzima Hashem, Sukarna Barua, Mohammed Eunus Ali, Lars Kulik, Egemen Tanin
Pages: 931-940
Full text: PDF

A group of friends located at their working places may want to plan a trip to visit a shopping center, have dinner at a restaurant, watch a movie at a theater, and then finally return to their homes with the minimum total trip distance. For a group of ...
Sampling Big Trajectory Data
Yanhua Li, Chi-Yin Chow, Ke Deng, Mingxuan Yuan, Jia Zeng, Jia-Dong Zhang, Qiang Yang, Zhi-Li Zhang
Pages: 941-950
Full text: PDF

The increasing prevalence of sensors and mobile devices has led to an explosive increase of the scale of spatio-temporal data in the form of trajectories. A trajectory aggregate query, as a fundamental functionality for measuring trajectory data, aims ...
SESSION: Session 5B: Retrieval Enhancements 1
Session details: Session 5B: Retrieval Enhancements 1
J. Shane Culpepper
EsdRank: Connecting Query and Documents through External Semi-Structured Data
Chenyan Xiong, Jamie Callan
Pages: 951-960
Full text: PDF

This paper presents EsdRank, a new technique for improving ranking using external semi-structured data such as controlled vocabularies and knowledge bases. EsdRank treats vocabularies, terms and entities from external data, as objects connecting query ...
A Probabilistic Framework for Temporal User Modeling on Microblogs
Jitao Sang, Dongyuan Lu, Changsheng Xu
Pages: 961-970
Full text: PDF

In social media, users have contributed enormous behavior data online which can be leveraged for user modeling and conduct personalized services. Temporal user modeling, which incorporates the timestamp of these behavior data and understands users' interest ...
Deriving Intensional Descriptions for Web Services
Maria Koutraki, Dan Vodislav, Nicoleta Preda
Pages: 971-980
Full text: PDF

Many data providers make their data available through Web service APIs. In order to unleash the potential of these sources for intelligent applications, the data has to be combined across different APIs. However, due to the heterogeneity of schemas, ...
An Optimization Framework for Propagation of Query-Document Features by Query Similarity Functions
Maxim Zhukovskiy, Tsimafei Khatkevich, Gleb Gusev, Pavel Serdyukov
Pages: 981-990
Full text: PDF

It is well known that a great number of query-document features which significantly improve the quality of ranking for popular queries, however, do not provide any benefit for new or rare queries since there is typically not enough data associated with ...
SESSION: Session 5C: Privacy
Session details: Session 5C: Privacy
James Thom
Rank Consistency based Multi-View Learning: A Privacy-Preserving Approach
Han-Jia Ye, De-Chuan Zhan, Yuan Miao, Yuan Jiang, Zhi-Hua Zhou
Pages: 991-1000
Full text: PDF

Complex media objects are often described by multi-view feature groups collected from diverse domains or information channels. Multi-view learning, which attempts to exploit the relationship among multiple views to improve learning performance, has drawn ...
Differentially Private Histogram Publication for Dynamic Datasets: an Adaptive Sampling Approach
Haoran Li, Li Xiong, Xiaoqian Jiang, Jinfei Liu
Pages: 1001-1010
Full text: PDF

Differential privacy has recently become a de facto standard for private statistical data release. Many algorithms have been proposed to generate differentially private histograms or synthetic data. However, most of them focus on "one-time" release of ...
WaveCluster with Differential Privacy
Ling Chen, Ting Yu, Rada Chirkova
Pages: 1011-1020
Full text: PDF

WaveCluster is an important family of grid-based clustering algorithms that are capable of finding clusters of arbitrary shapes. In this paper, we investigate techniques to perform WaveCluster while ensuring differential privacy. Our goal is to develop ...
Process-Driven Data Privacy
Weiyi Xia, Murat Kantarcioglu, Zhiyu Wan, Raymond Heatherly, Yevgeniy Vorobeychik, Bradley Malin
Pages: 1021-1030
Full text: PDF

The quantity of personal data gathered by service providers via our daily activities continues to grow at a rapid pace. The sharing, and the subsequent analysis of, such data can support a wide range of activities, but concerns around privacy often prompt ...
SESSION: Session 5D: Data Streams
Session details: Session 5D: Data Streams
Anthony Wirth
Unsupervised Feature Selection on Data Streams
Hao Huang, Shinjae Yoo, Shiva Prasad Kasiviswanathan
Pages: 1031-1040
Full text: PDF

Massive data streams are continuously being generated from sources such as social media, broadcast news, etc., and typically these datapoints lie in high-dimensional spaces (such as the vocabulary space of a language). Timely and accurate feature subset ...
Unsupervised Streaming Feature Selection in Social Media
Jundong Li, Xia Hu, Jiliang Tang, Huan Liu
Pages: 1041-1050
Full text: PDF

The explosive growth of social media sites brings about massive amounts of high-dimensional data. Feature selection is effective in preparing high-dimensional data for data analytics. The characteristics of social media present novel challenges for feature ...
Weighted Similarity Estimation in Data Streams
Konstantin Kutzkov, Mohamed Ahmed, Sofia Nikitaki
Pages: 1051-1060
Full text: PDF

Similarity computation between pairs of objects is often a bottleneck in many applications that have to deal with massive volumes of data. Motivated by applications such as collaborative filtering in large-scale recommender systems, and influence probabilities ...
Private Analysis of Infinite Data Streams via Retroactive Grouping
Rui Chen, Yilin Shen, Hongxia Jin
Pages: 1061-1070
Full text: PDF

With the rapid advances in hardware technology, data streams are being generated daily in large volumes, enabling a wide range of real-time analytical tasks. Yet data streams from many sources are inherently sensitive, and thus providing continuous privacy ...
SESSION: Session 5E: Classification 2
Session details: Session 5E: Classification 2
Ping Luo
Parallel Lazy Semi-Naive Bayes Strategies for Effective and Efficient Document Classification
Felipe Viegas, Marcos André Gonçalves, Wellington Martins, Leonardo Rocha
Pages: 1071-1080
Full text: PDF

Automatic Document Classification (ADC) is the basis of many important applications such as spam filtering and content organization. Naive Bayes (NB) approaches are a widely used classification paradigm, due to their simplicity, efficiency, absence of ...
A Novel Class Noise Estimation Method and Application in Classification
Lin Gui, Qin Lu, Ruifeng Xu, Minglei Li, Qikang Wei
Pages: 1081-1090
Full text: PDF

Noise in class labels of any training set can lead to poor classification results no matter what machine learning method is used. In this paper, we first present the problem of binary classification in the presence of random noise on the class labels, ...
Learning Task Grouping using Supervised Task Space Partitioning in Lifelong Multitask Learning
Meenakshi Mishra, Jun Huan
Pages: 1091-1100
Full text: PDF

Lifelong multitask learning is a multitask learning framework in which a learning agent faces the tasks that need to be learnt in an online manner. Lifelong multitask learning framework may be applied to a variety of applications such as image annotation, ...
KSGM: Keynode-driven Scalable Graph Matching
Xilun Chen, K. Selçuk Candan, Maria Luisa Sapino, Paulo Shakarian
Pages: 1101-1110
Full text: PDF

Understanding how a given pair of graphs align with each other (also known as the graph matching problem) is a critical task in many search, classification, and analysis applications. Unfortunately, the problem of maximum common subgraph isomorphism ...
SESSION: Session 5F: Sentiment and Content Analysis
Session details: Session 5F: Sentiment and Content Analysis
Ke Deng
Protecting Your Children from Inappropriate Content in Mobile Apps: An Automatic Maturity Rating Framework
Bing Hu, Bin Liu, Neil Zhenqiang Gong, Deguang Kong, Hongxia Jin
Pages: 1111-1120
Full text: PDF

Mobile applications (Apps) could expose children or adolescents to mature themes such as sexual content, violence and drug use, which results in an inappropriate security and privacy risk for them. Therefore, mobile platforms provide rating policies ...
The Role of Query Sessions in Interpreting Compound Noun Phrases
Marius Pasca
Pages: 1121-1129
Full text: PDF

The meaning of compound noun phrases can be approximated in the form of lexical interpretations extracted from text. The interpretations hint at the role that modifiers play relative to heads within the noun phrases. In a study examining the role of ...
Deep Semantic Frame-Based Deceptive Opinion Spam Analysis
Seongsoon Kim, Hyeokyoon Chang, Seongwoon Lee, Minhwan Yu, Jaewoo Kang
Pages: 1131-1140
Full text: PDF

User-generated content is becoming increasingly valuable to both individuals and businesses due to its usefulness and influence in e-commerce markets. As consumers rely more on such information, posting deceptive opinions, which can be deliberately used ...
Topic Modeling in Semantic Space with Keywords
Xiaojia Pu, Rong Jin, Gangshan Wu, Dingyi Han, Gui-Rong Xue
Pages: 1141-1150
Full text: PDF

A common and convenient approach for a user to describe an information need is to provide a set of keywords. Therefore, the technique to understand the need becomes crucial. In this paper, for the information need about a topic or category, we propose ...
SESSION: Session 6A: Time Series and Streams
Session details: Session 6A: Time Series and Streams
Jenny Xuizhen Zhang
F1: Accelerating the Optimization of Aggregate Continuous Queries
Anatoli U. Shein, Panos K. Chrysanthis, Alexandros Labrinidis
Pages: 1151-1160
Full text: PDF

Data Stream Management Systems performing on-line analytics rely on the efficient execution of large numbers of Aggregate Continuous Queries (ACQs). The state-of-the-art WeaveShare optimizer uses the Weavability concept in order to selectively ...
Fast Distributed Correlation Discovery Over Streaming Time-Series Data
Tian Guo, Saket Sathe, Karl Aberer
Pages: 1161-1170
Full text: PDF

The dramatic rise of time-series data in a variety of contexts, such as social networks, mobile sensing, data centre monitoring, etc., has fuelled interest in obtaining real-time insights from such data using distributed stream processing systems. One ...
Time Series Analysis of Nursing Notes for Mortality Prediction via a State Transition Topic Model
Yohan Jo, Natasha Loghmanpour, Carolyn Penstein Rosé
Pages: 1171-1180
Full text: PDF

Accurate mortality prediction is an important task in intensive care units in order to channel prompt care to patients in the most critical condition and to reduce nurses' alarm fatigue. Nursing notes carry valuable information in this regard, but nothing ...
SESSION: Session 6B: Adaptive Learning
Session details: Session 6B: Adaptive Learning
Damiano Spina
Learning Relative Similarity from Data Streams: Active Online Learning Approaches
Shuji Hao, Peilin Zhao, Steven C.H. Hoi, Chunyan Miao
Pages: 1181-1190
Full text: PDF

Relative similarity learning, as an important learning scheme for information retrieval, aims to learn a bi-linear similarity function from a collection of labeled instance-pairs, and the learned function would assign a high similarity value for a similar ...
Ad Hoc Monitoring of Vocabulary Shifts over Time
Tom Kenter, Melvin Wevers, Pim Huijnen, Maarten de Rijke
Pages: 1191-1200
Full text: PDF

Word meanings change over time. Detecting shifts in meaning for particular words has been the focus of much research recently. We address the complementary problem of monitoring shifts in vocabulary over time. That is, given a small seed set of words, ...
Balancing Novelty and Salience: Adaptive Learning to Rank Entities for Timeline Summarization of High-impact Events
Tuan A. Tran, Claudia Niederee, Nattiya Kanhabua, Ujwal Gadiraju, Avishek Anand
Pages: 1201-1210
Full text: PDF

Long-running, high-impact events such as the Boston Marathon bombing often develop through many stages and involve a large number of entities in their unfolding. Timeline summarization of an event by key sentences eases story digestion, but does not ...
SESSION: Session 6C: Points-of-Interest
Session details: Session 6C: Points-of-Interest
Egemen Tanin
Location-Based Influence Maximization in Social Networks
Tao Zhou, Jiuxin Cao, Bo Liu, Shuai Xu, Ziqing Zhu, Junzhou Luo
Pages: 1211-1220
Full text: PDF

In this paper, we aim at the product promotion in O2O model and carry out the research of location-based influence maximization on the platform of LBSN. As offline consuming behavior exists under the O2O environment, the traditional online influence ...
Location and Time Aware Social Collaborative Retrieval for New Successive Point-of-Interest Recommendation
Wei Zhang, Jianyong Wang
Pages: 1221-1230
Full text: PDF

In location-based social networks (LBSNs), new successive point-of-interest (POI) recommendation is a newly formulated task which tries to regard the POI a user currently visits as his POI-related query and recommend new POIs the user has not visited ...
Where you Instagram?: Associating Your Instagram Photos with Points of Interest
Xutao Li, Tuan-Anh Nguyen Pham, Gao Cong, Quan Yuan, Xiao-Li Li, Shonali Krishnaswamy
Pages: 1231-1240
Full text: PDF

Instagram, an online photo-sharing platform, has gained increasing popularity. It allows users to take photos, apply digital filters and share them with friends instantaneously by using mobile devices. Instagram provides users with the functionality to ...
SESSION: Session 6D: Matrices
Session details: Session 6D: Matrices
Weidong Cai
Gradient-based Signatures for Efficient Similarity Search in Large-scale Multimedia Databases
Christian Beecks, Merih Seran Uysal, Judith Hermanns, Thomas Seidl
Pages: 1241-1250
Full text: PDF

With the continuous rise of multimedia, the question of how to access large-scale multimedia databases efficiently has become of crucial importance. Given a multimedia database comprising millions of multimedia objects, how to approximate the content-based ...
Cross-Modal Similarity Learning: A Low Rank Bilinear Formulation
Cuicui Kang, Shengcai Liao, Yonghao He, Jian Wang, Wenjia Niu, Shiming Xiang, Chunhong Pan
Pages: 1251-1260
Full text: PDF

The cross-media retrieval problem has received much attention in recent years due to the rapid increase of multimedia data on the Internet. A new approach to the problem has been raised which intends to match features of different modalities directly. ...
Efficient Sparse Matrix Multiplication on GPU for Large Social Network Analysis
Yong-Yeon Jo, Sang-Wook Kim, Duck-Ho Bae
Pages: 1261-1270
Full text: PDF

As a number of social network services appear online recently, there have been many attempts to analyze social networks for extracting valuable information. Most existing methods first represent a social network as a quite sparse adjacency matrix, ...
SESSION: Session 6E: Citation Networks
Session details: Session 6E: Citation Networks
Zhifeng Bao
The Role Of Citation Context In Predicting Long-Term Citation Profiles: An Experimental Study Based On A Massive Bibliographic Text Dataset
Mayank Singh, Vikas Patidar, Suhansanu Kumar, Tanmoy Chakraborty, Animesh Mukherjee, Pawan Goyal
Pages: 1271-1280
Full text: PDF

The impact and significance of a scientific publication is measured mostly by the number of citations it accumulates over the years. Early prediction of the citation profile of research articles is a significant as well as challenging problem. In this ...
Discovering Canonical Correlations between Topical and Topological Information in Document Networks
Yuan He, Cheng Wang, Changjun Jiang
Pages: 1281-1290
Full text: PDF

Document network is a kind of intriguing dataset which can provide both topical (textual content) and topological (relational link) information. A key point in viably modeling such datasets is to discover proper denominators beneath the two different ...
Chronological Citation Recommendation with Information-Need Shifting
Zhuoren Jiang, Xiaozhong Liu, Liangcai Gao
Pages: 1291-1300

As the volume of publications has increased dramatically, an urgent need has developed to assist researchers in locating high-quality, candidate-cited papers from a research repository. Traditional scholarly-recommendation approaches ignore the chronological ...
SESSION: Session 6F: Knowledge Bases
Session details: Session 6F: Knowledge Bases
Vanessa Murdock
Answering Questions with Complex Semantic Constraints on Open Knowledge Bases
Pengcheng Yin, Nan Duan, Ben Kao, Junwei Bao, Ming Zhou
Pages: 1301-1310

A knowledge-based question-answering system (KB-QA) is one that answers natural language questions with information stored in a large-scale knowledge base (KB). Existing KB-QA systems are either powered by curated KBs in which factual knowledge ...
Inducing Space Dirichlet Process Mixture Large-Margin Entity Relationship Inference in Knowledge Bases
Sotirios P. Chatzis
Pages: 1311-1320

In this paper, we focus on the problem of extending a given knowledge base by accurately predicting additional true facts based on the facts included in it. This is an essential problem of knowledge representation systems, since knowledge bases typically ...
Semi-Automated Exploration of Data Warehouses
Thibault Sellam, Emmanuel Müller, Martin Kersten
Pages: 1321-1330

Exploratory data analysis tries to discover novel dependencies and unexpected patterns in large databases. Traditionally, this process is manual and hypothesis-driven. However, analysts can come short of patience and imagination. In this paper, we introduce ...
Large-scale Knowledge Base Completion: Inferring via Grounding Network Sampling over Selected Instances
Zhuoyu Wei, Jun Zhao, Kang Liu, Zhenyu Qi, Zhengya Sun, Guanhua Tian
Pages: 1331-1340

Constructing large-scale knowledge bases has attracted much attention in recent years, for which Knowledge Base Completion (KBC) is a key technique. In general, inferring new facts in a large-scale knowledge base is not a trivial task. The large number ...
SESSION: Keynote Address III
Session details: Keynote Address III
Shonali Krishnaswamy
Large-Scale Analysis of Dynamics of Choice Among Discrete Alternatives
Andrew Tomkins
Pages: 1349-1349

The online world is rife with scenarios in which a user must select one from a finite set of alternatives: which movie to watch, which song to play, which camera to order, which website to visit. There is a long history of study of these types of questions ...
SESSION: Session 7A: Database Optimization
Session details: Session 7A: Database Optimization
Sven Helmer
On Gapped Set Intersection Size Estimation
Chen Chen, Jianbin Qin, Wei Wang
Pages: 1351-1360

There exists considerable literature on estimating the cardinality of set intersection result. In this paper, we consider a generalized problem for integer sets where, given a gap parameter δ, two elements are deemed as matches if their numeric ...
Inclusion Dependencies Reloaded
Henning Köhler, Sebastian Link
Pages: 1361-1370

Inclusion dependencies form one of the most fundamental classes of integrity constraints. Their importance in classical data management is reinforced by modern applications such as data cleaning and profiling, entity resolution and schema matching. Surprisingly, ...
Comprehensible Models for Reconfiguring Enterprise Relational Databases to Avoid Incidents
Ioana Giurgiu, Mirela Botezatu, Dorothea Wiesmann
Pages: 1371-1380

Configuring enterprise database management systems is a notoriously hard problem. The combinatorial parameter space makes it intractable to run and observe the DBMS behavior in all scenarios. Thus, the database administrator has the difficult task of ...
An Optimal Online Algorithm For Retrieving Heavily Perturbed Statistical Databases In The Low-Dimensional Querying Model
Krzysztof Marcin Choromanski, Afshin Rostamizadeh, Umar Syed
Pages: 1381-1390

We give the first Õ(1/√T)-error online algorithm for reconstructing noisy statistical databases, where T is the number of (online) sample queries received. The algorithm is optimal up to the poly(log(T)) factor ...
SESSION: Session 7B: Retrieval Enhancements 2
Session details: Session 7B: Retrieval Enhancements 2
Mark Sanderson
Aggregation of Crowdsourced Ordinal Assessments and Integration with Learning to Rank: A Latent Trait Model
Pavel Metrikov, Virgil Pavlu, Javed A. Aslam
Pages: 1391-1400

Existing approaches used for training and evaluating search engines often rely on crowdsourced assessments of document relevance with respect to a user query. To use such assessments for either evaluation or learning, we propose a new framework for the ...
Weakly Supervised Natural Language Processing Framework for Abstractive Multi-Document Summarization: Weakly Supervised Abstractive Multi-Document Summarization
Peng Li, Weidong Cai, Heng Huang
Pages: 1401-1410

In this paper, we propose a new weakly supervised abstractive news summarization framework using pattern based approaches. Our system first generates meaningful patterns from sentences. Then, in order to precisely cluster patterns, we propose a novel ...
Short Text Similarity with Word Embeddings
Tom Kenter, Maarten de Rijke
Pages: 1411-1420

Determining semantic similarity between texts is important in many tasks in information retrieval such as search, query suggestion, automatic summarization and image finding. Many approaches have been suggested, based on lexical matching, handcrafted ...
Building Representative Composite Items
Vincent Leroy, Sihem Amer-Yahia, Eric Gaussier, Hamid Mirisaee
Pages: 1421-1430

The problem of summarizing a large collection of homogeneous items has been addressed extensively in particular in the case of geo-tagged datasets (e.g. Flickr photos and tags). In our work, we study the problem of summarizing large collections ...
SESSION: Session 7C: Search Mechanisms
Session details: Session 7C: Search Mechanisms
Justin Zobel
More Accurate Question Answering on Freebase
Hannah Bast, Elmar Haussmann
Pages: 1431-1440

Real-world factoid or list questions often have a simple structure, yet are hard to match to facts in a given knowledge base due to high representational and linguistic variability. For example, to answer "who is the ceo of apple" on Freebase requires ...
Improving Ranking Consistency for Web Search by Leveraging a Knowledge Base and Search Logs
Jyun-Yu Jiang, Jing Liu, Chin-Yew Lin, Pu-Jen Cheng
Pages: 1441-1450

In this paper, we propose a new idea called ranking consistency in web search. Relevance ranking is one of the biggest problems in creating an effective web search system. Given some queries with similar search intents, conventional approaches typically ...
Assessing the Impact of Syntactic and Semantic Structures for Answer Passages Reranking
Kateryna Tymoshenko, Alessandro Moschitti
Pages: 1451-1460

In this paper, we extensively study the use of syntactic and semantic structures obtained with shallow and deeper syntactic parsers in the answer passage reranking task. We propose several dependency-based structures enriched with Linked Open Data (LD) ...
Ranking Entities for Web Queries Through Text and Knowledge
Michael Schuhmacher, Laura Dietz, Simone Paolo Ponzetto
Pages: 1461-1470

When humans explain complex topics, they naturally talk about involved entities, such as people, locations, or events. In this paper, we aim at automating this process by retrieving and ranking entities that are relevant to understand free-text web-style ...
SESSION: Session 7D: Social Networks 3
Session details: Session 7D: Social Networks 3
Carsten Eickhoff
What Is a Network Community?: A Novel Quality Function and Detection Algorithms
Atsushi Miyauchi, Yasushi Kawase
Pages: 1471-1480

In this study, we introduce a novel quality function for a network community, which we refer to as the communitude. The communitude has a strong statistical background. Specifically, it measures the Z-score of a subset of vertices S with respect ...
DifRec: A Social-Diffusion-Aware Recommender System
Hossein Vahabi, Iordanis Koutsopoulos, Francesco Gullo, Maria Halkidi
Pages: 1481-1490

Recommender systems used in current online social platforms make recommendations by only considering how relevant an item is to a specific user but they ignore the fact that, thanks to mechanisms like sharing or re-posting across the underlying social ...
Who With Whom And How?: Extracting Large Social Networks Using Search Engines
Stefan Siersdorfer, Philipp Kemkes, Hanno Ackermann, Sergej Zerr
Pages: 1491-1500

Social network analysis is leveraged in a variety of applications such as identifying influential entities, detecting communities with special interests, and determining the flow of information and innovations. However, existing approaches for extracting ...
Modeling Individual-Level Infection Dynamics Using Social Network Information
Suppawong Tuarob, Conrad S. Tucker, Marcel Salathe, Nilam Ram
Pages: 1501-1510

Epidemic monitoring systems engaged in accurate discovery of infected individuals enable better understanding of the dynamics of epidemics and thus may promote effective disease mitigation or prevention. Currently, infection discovery systems require ...
SESSION: Session 8A: Query Evaluation
Session details: Session 8A: Query Evaluation
Yiqun Liu
Finding Probabilistic k-Skyline Sets on Uncertain Data
Jinfei Liu, Haoyu Zhang, Li Xiong, Haoran Li, Jun Luo
Pages: 1511-1520

Skyline is a set of points that are not dominated by any other point. Given uncertain objects, probabilistic skyline has been studied which computes objects with high probability of being skyline. While useful for selecting individual objects, it is ...
Ordering Selection Operators Under Partial Ignorance
Khaled H. Alyoubi, Sven Helmer, Peter T. Wood
Pages: 1521-1530

Optimising queries in real-world situations under imperfect conditions is still a problem that has not been fully solved. We consider finding the optimal order in which to execute a given set of selection operators under partial ignorance of their selectivities. ...
Querying Temporal Drifts at Multiple Granularities
Sofia Kleisarchaki, Sihem Amer-Yahia, Ahlame Douzal-Chouakria, Vassilis Christophides
Pages: 1531-1540

There exists a large body of work on online drift detection with the goal of dynamically finding and maintaining changes in data streams. In this paper, we adopt a query-based approach to drift detection. Our approach relies on a drift index, ...
Efficient Incremental Evaluation of Succinct Regular Expressions
Henrik Björklund, Wim Martens, Thomas Timm
Pages: 1541-1550

Regular expressions are omnipresent in database applications. They form the structural core of schema languages for XML, they are a fundamental ingredient for navigational queries in graph databases, and are being considered in languages for upcoming ...
SESSION: Session 8B: Web Search
Session details: Session 8B: Web Search
David Hawking
Struggling and Success in Web Search
Daan Odijk, Ryen W. White, Ahmed Hassan Awadallah, Susan T. Dumais
Pages: 1551-1560

Web searchers sometimes struggle to find relevant information. Struggling leads to frustrating and dissatisfying search experiences, even if searchers ultimately meet their search objectives. Better understanding of search tasks where people struggle ...
Behavioral Dynamics from the SERP's Perspective: What are Failed SERPs and How to Fix Them?
Julia Kiseleva, Jaap Kamps, Vadim Nikulin, Nikita Makarov
Pages: 1561-1570

Web search is always in a state of flux: queries, their intent, and the most relevant content are changing over time, in predictable and unpredictable ways. Modern search technology has made great strides in keeping up to pace with these changes, but ...
What Users Ask a Search Engine: Analyzing One Billion Russian Question Queries
Michael Völske, Pavel Braslavski, Matthias Hagen, Galina Lezina, Benno Stein
Pages: 1571-1580

We analyze the question queries submitted to a large commercial web search engine to get insights about what people ask, and to better tailor the search results to the users' needs. Based on a dataset of about one billion question queries submitted during ...
Does Vertical Bring more Satisfaction?: Predicting Search Satisfaction in a Heterogeneous Environment
Ye Chen, Yiqun Liu, Ke Zhou, Meng Wang, Min Zhang, Shaoping Ma
Pages: 1581-1590

The study of search satisfaction is one of the prime concerns in search performance evaluation research. Most existing works on search satisfaction primarily rely on the hypothesis that all results on search engine result pages (SERPs) are homogeneous. ...
SESSION: Session 8C: Social Media 2
Session details: Session 8C: Social Media 2
Karin Verspoor
Characterizing and Predicting Viral-and-Popular Video Content
David Vallet, Shlomo Berkovsky, Sebastien Ardon, Anirban Mahanti, Mohamed Ali Kaafar
Pages: 1591-1600

The proliferation of online video content has triggered numerous works on its evolution and popularity, as well as on the effect of social sharing on content propagation. In this paper, we focus on the observable dependencies between the virality of ...
Social Spammer and Spam Message Co-Detection in Microblogging with Social Context Regularization
Fangzhao Wu, Jinyun Shu, Yongfeng Huang, Zhigang Yuan
Pages: 1601-1610

The popularity of microblogging platforms, such as Twitter, makes them important for information dissemination and sharing. However, they are also recognized as ideal places by spammers to conduct social spamming. Massive social spammers and spam messages ...
Central Topic Model for Event-oriented Topics Mining in Microblog Stream
Min Peng, Jiahui Zhu, Xuhui Li, Jiajia Huang, Hua Wang, Yanchun Zhang
Pages: 1611-1620

To date, data is generated and arrives in the form of streams that propagate discussions of public events in microblog services. Discovering event-oriented topics from the stream will lead to a better understanding of the change of public concern. However, ...
Video Popularity Prediction by Sentiment Propagation via Implicit Network
Wanying Ding, Yue Shang, Lifan Guo, Xiaohua Hu, Rui Yan, Tingting He
Pages: 1621-1630

Video popularity prediction plays a foundational role in many aspects of life, such as recommendation systems and investment consulting. Because of its technological and economic importance, this problem has been extensively studied for years. However, ...
SESSION: Session 8D: Recommendation
Session details: Session 8D: Recommendation
Gangshan Wu
Joint Modeling of User Check-in Behaviors for Point-of-Interest Recommendation
Hongzhi Yin, Xiaofang Zhou, Yingxia Shao, Hao Wang, Shazia Sadiq
Pages: 1631-1640

Point-of-Interest (POI) recommendation has become an important means to help people discover attractive and interesting locations, especially when users travel out of town. However, extreme sparsity of user-POI matrix creates a severe challenge. To cope ...
ORec: An Opinion-Based Point-of-Interest Recommendation Framework
Jia-Dong Zhang, Chi-Yin Chow, Yu Zheng
Pages: 1641-1650

As location-based social networks (LBSNs) rapidly grow, it is a timely topic to study how to recommend users with interesting locations, known as points-of-interest (POIs). Most existing POI recommendation techniques only employ the check-in data ...
Toward Dual Roles of Users in Recommender Systems
Suhang Wang, Jiliang Tang, Huan Liu
Pages: 1651-1660

Users usually play dual roles in real-world recommender systems. One is as a reviewer who writes reviews for items with rating scores, and the other is as a rater who rates the helpfulness scores of reviews. Traditional recommender systems mainly consider ...
TriRank: Review-aware Explainable Recommendation by Modeling Aspects
Xiangnan He, Tao Chen, Min-Yen Kan, Xiao Chen
Pages: 1661-1670

Most existing collaborative filtering techniques have focused on modeling the binary relation of users to items by extracting from user ratings. Aside from users' ratings, their affiliated reviews often provide the rationale for their ratings and identify ...
SESSION: Short Papers: Databases
RoadRank: Traffic Diffusion and Influence Estimation in Dynamic Urban Road Networks
Tarique Anwar, Chengfei Liu, Hai L. Vu, Md. Saiful Islam
Pages: 1671-1674

With the rapidly growing population in urban areas, urban road networks are now expanding at a faster rate. The frequent movement of people on them leads to traffic congestion. This congestion originates from some crowded road segments, ...
On Query-Update Independence for SPARQL
Nicola Guido, Pierre Genevès, Nabil Layaïda, Cécile Roisin
Pages: 1675-1678

This paper investigates techniques for detecting independence of SPARQL queries from updates. A query is independent of an update when the execution of the update does not affect the result of the query. Determining independence is especially useful ...
A Structured Query Model for the Deep Relational Web
Hasan M. Jamil, Hosagrahar V. Jagadish
Pages: 1679-1682

The deep web is very large and diverse and queries evaluated against the deep web can provide great value. While there have been attempts at accessing the data in the deep web, these are clever "one-of" systems and techniques. In this paper, we describe ...
A Flash-aware Buffering Scheme using On-the-fly Redo
Kyosung Jeong, Sang-Wook Kim, Sungchae Lim
Pages: 1683-1686

In this paper, we address how to reduce the number of page updates in a flash-based DBMS equipped with an SSD (Solid State Drive). We propose a novel buffering scheme that evicts a dirty page X without flushing it to the SSD, and restores the right image of ...
Defragging Subgraph Features for Graph Classification
Haishuai Wang, Peng Zhang, Ivor Tsang, Ling Chen, Chengqi Zhang
Pages: 1687-1690

Graph classification is an important tool for analysing structured and semi-structured data, where subgraphs are commonly used as the feature representation. However, the number and size of subgraph features crucially depend on the threshold parameters ...
Structural Constraints for Multipartite Entity Resolution with Markov Logic Network
Tengyuan Ye, Hady W. Lauw
Pages: 1691-1694

Multipartite entity resolution seeks to match entity mentions across several collections. An entity mention is presumed unique within a collection, and thus could match at most one entity mention in each of the other collections. In addition to domain-specific ...
SESSION: Short Papers: Information Retrieval
Know Your Onions: Understanding the User Experience with the Knowledge Module in Web Search
Ioannis Arapakis, Luis A. Leiva, B. Barla Cambazoglu
Pages: 1695-1698

The increasing availability of large volumes of human-curated content is shifting web search towards a paradigm that introduces seamlessly more semantic information to search engine result pages. This trend has resulted in the design of a new element ...
Personalized Federated Search at LinkedIn
Dhruv Arya, Viet Ha-Thuc, Shakti Sinha
Pages: 1699-1702

LinkedIn has grown to become a platform hosting diverse sources of information ranging from member profiles, jobs, professional groups, slideshows etc. Given the existence of multiple sources, when a member issues a query like "software engineer", the ...
Balancing Exploration and Exploitation: Empirical Parameterization of Exploratory Search Systems
Kumaripaba Ahukorala, Alan Medlar, Kalle Ilves, Dorota Glowacka
Pages: 1703-1706

Exploratory searches are where a user has insufficient knowledge to define exact search criteria or does not otherwise know what they are looking for. Reinforcement learning techniques have demonstrated great potential for supporting exploratory search ...
On Predicting Deletions of Microblog Posts
Mossaab Bagdouri, Douglas W. Oard
Pages: 1707-1710

Among the many classification tasks on Twitter content, predicting whether a tweet will be deleted has to date received relatively little attention. Deletions occur for a variety of reasons, which can make the classification task challenging. Moreover, ...
Semi-Automated Text Classification for Sensitivity Identification
Giacomo Berardi, Andrea Esuli, Craig Macdonald, Iadh Ounis, Fabrizio Sebastiani
Pages: 1711-1714

Sensitive documents are those that cannot be made public, e.g., for personal or organizational privacy reasons. For instance, documents requested through Freedom of Information mechanisms must be manually reviewed for the presence of sensitive information ...
Identification of Microblogs Prominent Users during Events by Learning Temporal Sequences of Features
Imen Bizid, Nibal Nayef, Patrice Boursier, Sami Faiz, Antoine Doucet
Pages: 1715-1718

During specific real-world events, some users of microblogging platforms could provide exclusive information about those events. The identification of such prominent users depends on several factors such as the freshness and the relevance of their shared ...
A Real-Time Eye Tracking Based Query Expansion Approach via Latent Topic Modeling
Yongqiang Chen, Peng Zhang, Dawei Song, Benyou Wang
Pages: 1719-1722

Formulating and reformulating reliable textual queries have been recognized as a challenging task in Information Retrieval (IR), even for experienced users. Most existing query expansion methods, especially those based on implicit relevance feedback, ...
Clustered Semi-Supervised Relevance Feedback
Kripabandhu Ghosh, Swapan Kumar Parui
Pages: 1723-1726

In relevance feedback, first-round search results are used to boost second-round search results. Two forms have been traditionally considered: exhaustively labelled feedback, where all first-round results to depth k are annotated for relevance ...
On the Effect of "Stupid" Search Components on User Interaction with Search Engines
Lidia Grauer, Aleksandra Lomakina
Pages: 1727-1730

Using eye-tracking, we investigate how searchers interact with Web search engines which get affected by nonsensical results. We conduct a user survey to choose "stupid" components for our laboratory experiment and explore the most conspicuous ones. This ...
Social-Relational Topic Model for Social Networks
Weiyu Guo, Shu Wu, Liang Wang, Tieniu Tan
Pages: 1731-1734

Social networking services, such as Twitter and Sina Weibo, have gained tremendous popularity in recent years. Masses of short texts and social links are aggregated into these service platforms. To realize personalized services on social networks, topic inference ...
Building Effective Query Classifiers: A Case Study in Self-harm Intent Detection
Ashiqur R. KhudaBukhsh, Paul N. Bennett, Ryen W. White
Pages: 1735-1738

Query-based triggers play a crucial role in modern search systems, e.g., in deciding when to display direct answers on result pages. We address a common scenario in designing such triggers for real-world settings where positives are rare and search providers ...
Modelling the Usefulness of Document Collections for Query Expansion in Patient Search
Nut Limsopatham, Craig Macdonald, Iadh Ounis
Pages: 1739-1742

Dealing with the medical terminology is a challenge when searching for patients based on the relevance of their medical records towards a given query. Existing work used query expansion (QE) to extract expansion terms from different document collections ...
A Convolutional Click Prediction Model
Qiang Liu, Feng Yu, Shu Wu, Liang Wang
Pages: 1743-1746

The explosion in online advertising creates an urgent need to better predict ad clicks. For click prediction on a single ad impression, we have access to pairwise relevance among elements in an impression, but not to global interaction among key features ...
A Study of Query Length Heuristics in Information Retrieval
Yuanhua Lv
Pages: 1747-1750

Query length has generally been regarded as a query-specific constant that does not affect document ranking. In this paper, we reveal that query length actually interacts with term frequency (TF) normalization, a key component of all effective retrieval ...
Detect Rumors Using Time Series of Social Context Information on Microblogging Websites
Jing Ma, Wei Gao, Zhongyu Wei, Yueming Lu, Kam-Fai Wong
Pages: 1751-1754

Automatically identifying rumors from online social media, especially microblogging websites, is an important research issue. Most existing work on rumor detection focuses on modeling features related to microblog contents, users and propagation patterns, ...
Query Auto-Completion for Rare Prefixes
Bhaskar Mitra, Nick Craswell
Pages: 1755-1758

Query auto-completion (QAC) systems typically suggest queries that have previously been observed in search logs. Given a partial user query, the system looks up this query prefix against a precomputed set of candidates, then orders them using ranking ...
Pooled Evaluation Over Query Variations: Users are as Diverse as Systems
Alistair Moffat, Falk Scholer, Paul Thomas, Peter Bailey
Pages: 1759-1762

Evaluation of information retrieval systems with test collections makes use of a suite of fixed resources: a document corpus; a set of topics; and associated judgments of the relevance of each document to each topic. With large modern collections, exhaustive ...
The Influence of Pre-processing on the Estimation of Readability of Web Documents
João Rafael de Moura Palotti, Guido Zuccon, Allan Hanbury
Pages: 1763-1766

This paper investigates the effect that text pre-processing approaches have on the estimation of the readability of web pages. Readability has been highlighted as an important aspect of web search result personalisation in previous work. The most widely ...
Atypical Queries in eCommerce
Neeraj Pradhan, Vinay Deolalikar, Kang Li
Pages: 1767-1770

Understanding how specific, ambiguous, or broad the intent of a search query is, across all users of the system, is important in improving search relevance in eCommerce. There is scant literature on such a structural characterization of queries in eCommerce. ...
Bottom-up Faceted Search: Creating Search Neighbourhoods with Datacube Cells
Mark Sifer
Pages: 1771-1774

Browsing a collection can start with a keyword search. A user visits a library, performs a keyword search to find a few books of interest; finding their location in the library. Then they go to these locations; the corresponding bookshelves, where they ...
Personalized Recommendation Meets Your Next Favorite
Qiang Song, Jian Cheng, Ting Yuan, Hanqing Lu
Pages: 1775-1778

A comprehensive understanding of users' item selection behavior is not only essential to many scientific disciplines, but also has a profound business impact on online recommendation. Recent research has discovered that users' favorites can be divided ...
Recommending Short-lived Dynamic Packages for Golf Booking Services
Robin Swezey, Young-joo Chung
Pages: 1779-1782

We introduce an approach to recommending short-lived dynamic packages for golf booking services. Two challenges are addressed in this work. The first is the short life of the items, which puts the system in a state of a permanent cold start. The second ...
Large-Scale Question Answering with Joint Embedding and Proof Tree Decoding
Zhenghao Wang, Shengquan Yan, Huaming Wang, Xuedong Huang
Pages: 1783-1786

Question answering (QA) over a large-scale knowledge base (KB) such as Freebase is an important natural language processing application. There are linguistically oriented semantic parsing techniques and machine learning motivated statistical methods. ...
Query Length, Retrievability Bias and Performance
Colin Wilkie, Leif Azzopardi
Pages: 1787-1790

Past work has shown that longer queries tend to lead to better retrieval performance. However, this comes at the cost of increased user effort and additional system processing. In this paper, we examine whether there are benefits of longer queries ...
Gauging Correct Relative Rankings For Similarity Search
Weiren Yu, Julie McCann
Pages: 1791-1794

One of the important tasks in link analysis is to quantify the similarity between two objects based on hyperlink structure. SimRank is an attractive similarity measure of this type. Existing work mainly focuses on absolute SimRank scores, and often harnesses ...
Learning User Preferences for Topically Similar Documents
Mustafa Zengin, Ben Carterette
Pages: 1795-1798

Similarity measures have been used widely in information retrieval research. Most research has been done on query-document or document-document similarity without much attention to the user's perception of similarity in the context of the information ...
Modeling Parameter Interactions in Ranking SVM
Yaogong Zhang, Jun Xu, Yanyan Lan, Jiafeng Guo, Maoqiang Xie, Yalou Huang, Xueqi Cheng
Pages: 1799-1802

Ranking SVM, which formalizes the problem of learning a ranking model as that of learning a binary SVM on preference pairs of documents, is a state-of-the-art ranking model in information retrieval. The dual form solution of Ranking SVM model can be ...
SESSION: Short Papers: Knowledge Management
Best First Over-Sampling for Multilabel Classification
Xusheng Ai, Jian Wu, Victor S. Sheng, Yufeng Yao, Pengpeng Zhao, Zhiming Cui
Pages: 1803-1806

Learning from imbalanced multilabel data is a challenging task. It has attracted considerable attention recently. In this paper we propose a MultiLabel Best First Over-sampling (ML-BFO) to improve the performance of multilabel classification algorithms, ...
Co-clustering Document-term Matrices by Direct Maximization of Graph Modularity
Melissa Ailem, François Role, Mohamed Nadif
Pages: 1807-1810

We present Coclus, a novel diagonal co-clustering algorithm which is able to effectively co-cluster binary or contingency matrices by directly maximizing an adapted version of the modularity measure traditionally used for networks. While some effective ...
A Data-Driven Approach to Distinguish Cyber-Attacks from Physical Faults in a Smart Grid
Adnan Anwar, Abdun Naser Mahmood, Zubair Shah
Pages: 1811-1814

Recently, there has been a significant increase in interest in Smart Grid security. Researchers have proposed various techniques to detect cyber-attacks using sensor data. However, there has been little work to distinguish a cyber-attack from a power system ...
Improving Event Detection by Automatically Assessing Validity of Event Occurrence in Text
Andrea Ceroni, Ujwal Kumar Gadiraju, Marco Fisichella
Pages: 1815-1818

Manually inspecting text to assess whether an event occurs in a document collection is an onerous and time consuming task. Although a manual inspection to discard the false events would increase the precision of automatically detected sets of events, ...
DAAV: Dynamic API Authority Vectors for Detecting Software Theft
Dong-Kyu Chae, Sang-Wook Kim, Seong-Je Cho, Yesol Kim
Pages: 1819-1822

This paper proposes a novel birthmark, a dynamic API authority vector (DAAV), for detecting software theft. DAAV satisfies four essential requirements for good birthmarks--credibility, resiliency, scalability, and packing-free--while existing birthmarks ...
Towards Multi-level Provenance Reconstruction of Information Diffusion on Social Media
Tom De Nies, Io Taxidou, Anastasia Dimou, Ruben Verborgh, Peter M. Fischer, Erik Mannens, Rik Van de Walle
Pages: 1823-1826
Full text: PDF

In order to assess the trustworthiness of information on social media, a consumer needs to understand where this information comes from, and which processes were involved in its creation. The entities, agents and activities involved in the creation of ...
Profiling Pedestrian Distribution and Anomaly Detection in a Dynamic Environment
Minh Tuan Doan, Sutharshan Rajasegarar, Mahsa Salehi, Masud Moshtaghi, Christopher Leckie
Pages: 1827-1830
Full text: PDF

Pedestrian movements have a major impact on the dynamics of cities and provide valuable guidance to city planners. In this paper we model the normal behaviours of pedestrian flows and detect anomalous events from pedestrian counting data of the City ...
A Clustering-based Approach to Detect Probable Outcomes of Lawsuits
Daniel Lemes Gribel, Maira Gatti de Bayser, Leonardo Guerreiro Azevedo
Pages: 1831-1834
Full text: PDF

The numerous lawsuits in progress or already judged by the Brazilian Supreme Court consist of a large amount of non-structured data. This leads to a large amount of hidden or unknown information, since some relationships between lawsuits are not explicit ...
Detecting Check-worthy Factual Claims in Presidential Debates
Naeemul Hassan, Chengkai Li, Mark Tremayne
Pages: 1835-1838
Full text: PDF

Public figures such as politicians make claims about "facts" all the time. Journalists and citizens spend a good amount of time checking the veracity of such claims. Toward automatic fact checking, we developed tools to find check-worthy factual claims ...
Where You Go Reveals Who You Know: Analyzing Social Ties from Millions of Footprints
Hsun-Ping Hsieh, Rui Yan, Cheng-Te Li
Pages: 1839-1842
Full text: PDF

This paper aims to investigate how the geographical footprints of users correlate with their social ties. While conventional wisdom tells us that the more frequently two users co-locate geographically, the higher the probability that they are friends, we find that ...
Message Clustering based Matrix Factorization Model for Retweeting Behavior Prediction
Bo Jiang, Jiguang Liang, Ying Sha, Lihong Wang
Pages: 1843-1846
Full text: PDF

Retweeting is an important mechanism for information diffusion in social networks. Through retweeting, a message is reshared from one user to another, forming large cascades of message forwarding. Most existing research on predicting retweeting ...
Heterogeneous Multi-task Semantic Feature Learning for Classification
Xin Jin, Fuzhen Zhuang, Sinno Jialin Pan, Changying Du, Ping Luo, Qing He
Pages: 1847-1850
Full text: PDF

Multi-task Learning (MTL) aims to learn multiple related tasks simultaneously instead of separately to improve the generalization performance of each task. Most existing MTL methods assume that the multiple tasks to be learned have the same feature representation. ...
Top-k Reliable Edge Colors in Uncertain Graphs
Arijit Khan, Francesco Gullo, Thomas Wohler, Francesco Bonchi
Pages: 1851-1854
Full text: PDF

We study the fundamental problem of finding the set of top-k edge colors that maximizes the reliability between a source node and a destination node in an uncertain and edge-colored graph. Our top-k reliable color set problem naturally ...
Probabilistic Non-negative Inconsistent-resolution Matrices Factorization
Masahiro Kohjima, Tatsushi Matsubayashi, Hiroshi Sawada
Pages: 1855-1858
Full text: PDF

In this paper, we tackle the problem of analyzing datasets with different resolutions, such as a pair consisting of individual users' data and user groups' data, for example "userA visited shopA 5 times" and "users whose attributes are men purchased itemA 80 ...
Identifying Attractive News Headlines for Social Media
Sawa Kourogi, Hiroyuki Fujishiro, Akisato Kimura, Hitoshi Nishikawa
Pages: 1859-1862
Full text: PDF

In the past, leading newspaper companies and broadcasters were the sole distributors of news articles, and thus news consumers simply received news articles from those outlets at regular intervals. However, the growth of social media and smart devices ...
A Probabilistic Rating Auto-encoder for Personalized Recommender Systems
Huizhi Liang, Timothy Baldwin
Pages: 1863-1866
Full text: PDF

User profiling is a key component of personalized recommender systems, and is used to generate user profiles that describe individual user interests and preferences. The increasing availability of big data is driving the urgent need for user profiling ...
Real-time Rumor Debunking on Twitter
Xiaomo Liu, Armineh Nourbakhsh, Quanzhi Li, Rui Fang, Sameena Shah
Pages: 1867-1870
Full text: PDF

In this paper, we propose the first real-time rumor debunking algorithm for Twitter. We use cues from the 'wisdom of the crowds', that is, the aggregate 'common sense' and investigative journalism of Twitter users. We concentrate on identification of a rumor ...
Fraud Transaction Recognition: A Money Flow Network Approach
Renxin Mao, Zhao Li, Jinhua Fu
Pages: 1871-1874
Full text: PDF

In this paper, we provide some insights into the analysis of fraud transaction recognition on Alipay's Money Flow Network. We first show that the Money Flow Network follows a power-law distribution on a daily, monthly or yearly basis, based on which we propose ...
Identifying Top-k Consistent News-Casters on Twitter
Sahisnu Mazumder, Sameep Mehta, Dhaval Patel
Pages: 1875-1878
Full text: PDF

News-casters are Twitter users who periodically pick up interesting news from online news media and spread it to their followers' network. Existing works on Twitter user analysis have only analysed a pre-defined set of users for user modeling, ...
Mining the Minds of Customers from Online Chat Logs
Kunwoo Park, Jaewoo Kim, Jaram Park, Meeyoung Cha, Jiin Nam, Seunghyun Yoon, Eunhee Rhim
Pages: 1879-1882
Full text: PDF

This study investigates factors that may determine satisfaction in customer service operations. We utilized more than 170,000 online chat sessions between customers and agents to identify characteristics of chat sessions that led to dissatisfying experiences. ...
A Fast k-Nearest Neighbor Search Using Query-Specific Signature Selection
Youngki Park, Heasoo Hwang, Sang-goo Lee
Pages: 1883-1886
Full text: PDF

k-nearest neighbor (k-NN) search aims at finding k points nearest to a query point in a given dataset. k-NN search is important in various applications, but it becomes extremely expensive in a high-dimensional large dataset. To address this performance ...
Core-Sets For Canonical Correlation Analysis
Saurabh Paul
Pages: 1887-1890
Full text: PDF

Canonical Correlation Analysis (CCA) is a technique that finds how "similar" the subspaces spanned by the columns of two different matrices A ∈ ℝ^{m×n} and B ∈ ℝ^{m×l} are. ...
DeepCamera: A Unified Framework for Recognizing Places-of-Interest based on Deep ConvNets
Pai Peng, Hongxiang Chen, Lidan Shou, Ke Chen, Gang Chen, Chang Xu
Pages: 1891-1894
Full text: PDF

In this work, we present a novel project called DeepCamera (DC) for recognizing places-of-interest (POI) with smartphones. Our framework is based on deep convolutional neural networks (ConvNets), which are currently state-of-the-art solutions to vision recognition ...
Structured Sparse Regression for Recommender Systems
Mingjie Qian, Liangjie Hong, Yue Shi, Suju Rajan
Pages: 1895-1898
Full text: PDF

Feature-based collaborative filtering models, such as state-of-the-art factorization machines and regression-based latent factor models, rarely consider features' structural information, ignoring the heterogeneity of inter-type and intra-type relationships. ...
Analyzing Document Intensive Business Processes using Ontology
Suman Roychoudhury, Vinay Kulkarni, Nikhil Bellarykar
Pages: 1899-1902
Full text: PDF

Knowledge is manifested in an enterprise in various forms ranging from unstructured operational data, to structured information like programs, as well as relational data stored in databases to semi-structured information stored in XML files. This information ...
Transductive Domain Adaptation with Affinity Learning
Le Shu, Longin Jan Latecki
Pages: 1903-1906
Full text: PDF

We study the problem of domain adaptation, which aims to adapt classifiers trained on a labeled source domain to an unlabeled target domain. We propose a novel method to solve the domain adaptation task in a transductive setting. The proposed method ...
Update Summarization using Semi-Supervised Learning Based on Hellinger Distance
Dingding Wang, Sahar Sohangir, Tao Li
Pages: 1907-1910
Full text: PDF

Update summarization aims to generate brief summaries of recent documents to capture new information different from earlier documents. In this paper, we propose a new method to generate the sentence similarity graph using a novel similarity measure based ...
Multi-view Clustering via Structured Low-rank Representation
Dong Wang, Qiyue Yin, Ran He, Liang Wang, Tieniu Tan
Pages: 1911-1914
Full text: PDF

In this paper, we present a novel solution to multi-view clustering through a structured low-rank representation. When assuming similar samples can be linearly reconstructed by each other, the resulting representational matrix reflects the cluster structure ...
Partially Labeled Data Tuple Can Optimize Multivariate Performance Measures
Jim Jing-Yan Wang, Xin Gao
Pages: 1915-1918
Full text: PDF

Multivariate performance measure optimization refers to learning predictive models such that a desired complex performance measure can be optimized over a training set, such as the F1 score. Up to now, all the existing multivariate performance measure ...
Modeling Infinite Topics on Social Behavior Data with Spatio-temporal Dependence
Peng Wang, Peng Zhang, Chuan Zhou, Zhao Li, Guo Li
Pages: 1919-1922
Full text: PDF

The problem of modeling topics on user behavior data in social networks has been widely studied in social marketing and social emotion analysis, where latent topic models are commonly used as the solutions. The user behavior data are highly related in ...
ASEM: Mining Aspects and Sentiment of Events from Microblog
Ruhui Wang, Weijing Huang, Wei Chen, Tengjiao Wang, Kai Lei
Pages: 1923-1926
Full text: PDF

Microblogs contain the most up-to-date and abundant opinion information on current events. Aspect-based opinion mining is a good way to get a comprehensive summarization of events. The most popular aspect-based opinion mining models are used in the field ...
Enhanced Word Embeddings from a Hierarchical Neural Language Model
Xun Wang, Katsuhoto Sudoh, Masaaki Nagata
Pages: 1927-1930
Full text: PDF

This paper proposes a neural language model to capture the interaction of text units of different levels, i.e., documents, paragraphs, sentences, and words, in a hierarchical structure. At each parallel level, the model incorporates the Markov property while ...
Improving Label Quality in Crowdsourcing Using Noise Correction
Jing Zhang, Victor S. Sheng, Jian Wu, Xiaoqin Fu, Xindong Wu
Pages: 1931-1934
Full text: PDF

This paper proposes a novel framework that introduces noise correction techniques to further improve label quality after ground truth inference in crowdsourcing. In the framework, an adaptive voting noise correction algorithm (AVNC) is proposed to identify ...
Improving Collaborative Filtering via Hidden Structured Constraint
Qing Zhang, Houfeng Wang
Pages: 1935-1938
Full text: PDF

Matrix factorization models, among the most powerful Collaborative Filtering approaches, have greatly advanced recommendation tasks. However, few of them are able to explicitly consider structured constraints for modeling user interests. To solve ...
WORKSHOP SESSION: Workshop Reports
DOLAP 2015 Workshop Summary
Carlos Garcia-Alvarado, Carlos Ordonez, Il-Yeol Song
Pages: 1939-1940
Full text: PDF

The ACM DOLAP workshop presents research that bridges data warehousing, On-Line Analytical Processing (OLAP), and other large-scale data processing platforms. The program has four interesting sessions on data warehouse design, database modeling, query ...
DTMBIO 2015: International Workshop on Data and Text Mining in Biomedical Informatics
Min Song, Doheon Lee, Karin Verspoor
Pages: 1941-1942
Full text: PDF

Held each year in conjunction with one of the largest data management conferences, CIKM, the Ninth ACM International Workshop on Data and Text Mining in Biomedical Informatics (DTMBIO'15) is organized to bring together researchers interested in development ...
ECol 2015: First international workshop on the Evaluation on Collaborative Information Seeking and Retrieval
Leif Azzopardi, Jeremy Pickens, Tetsuya Sakai, Laure Soulier, Lynda Tamine
Pages: 1943-1944
Full text: PDF

Collaborative Information Seeking/Retrieval (CIS/CIR) has given rise to several challenges in terms of search behavior analysis, retrieval model formalization as well as interface design. However, the major issue of evaluation in CIS/CIR is still underexplored. ...
Eighth Workshop on Exploiting Semantic Annotations in Information Retrieval (ESAIR'15)
Krisztian Balog, Jeffrey Dalton, Antoine Doucet, Yusra Ibrahim
Pages: 1945-1946
Full text: PDF

The amount of structured content published on the Web has been growing rapidly, making it possible to address increasingly complex information access tasks. Recent years have witnessed the emergence of large scale human-curated knowledge bases as well ...
LSDS-IR'15: 2015 Workshop on Large-Scale and Distributed Systems for Information Retrieval
Ismail Sengor Altingovde, B. Barla Cambazoglu, Nicola Tonellotto
Pages: 1947-1948
Full text: PDF

The growth of the Web and other Big Data sources leads to important performance problems for large-scale and distributed information retrieval systems. The scalability and efficiency of such information retrieval systems have an impact on their effectiveness, ...
NWSearch 2015: International Workshop on Novel Web Search Interfaces and Systems
Davood Rafiei, Katsumi Tanaka
Pages: 1949-1950
Full text: PDF

Held for the first time in conjunction with the ACM International Conference on Information and Knowledge Management (CIKM), NWSearch 2015 aims to bring together researchers, developers and practitioners who are interested in pushing the search boundary ...
PIKM 2015: The 8th ACM Workshop for Ph.D. Students in Information and Knowledge Management
Mouna Kacimi, Nicoleta Preda, Maya Ramanath
Pages: 1951-1952
Full text: PDF

The PIKM workshop offers Ph.D. students the opportunity to bring their work to an international and interdisciplinary research community, and create a network of young researchers to exchange and develop new and promising ideas. Similar to the CIKM, ...
TM 2015 -- Topic Models: Post-Processing and Applications Workshop
Nikolaos Aletras, Jey Han Lau, Timothy Baldwin, Mark Stevenson
Pages: 1953-1954
Full text: PDF

The main objective of the workshop is to bring together researchers who are interested in applications of topic models and improving their output. Our goal is to create a broad platform for researchers to share ideas that could improve the usability ...
UCUI'15: The 1st International Workshop on Understanding the City with Urban Informatics
Yashar Moshfeghi, Iadh Ounis, Craig Macdonald, Joemon M. Jose, Peter Triantafillou, Mark Livingston, Piyushimita Thakuriah
Pages: 1955-1956
Full text: PDF

Urban Informatics aims to exploit the large quantities of information produced by modern cities in order to gain insights into how they function. These insights lay the foundation for improving the lives of citizens, by improving the efficacy and efficiency ...


The ACM Digital Library is published by the Association for Computing Machinery. Copyright © 2019 ACM, Inc.