
ABSTRACT

Critical business applications in domains ranging from technical support to healthcare increasingly rely on large-scale, automatically constructed knowledge graphs. These applications use the results of complex queries over knowledge graphs to help users make crucial decisions, such as which drug to administer or whether certain actions comply with all regulatory requirements. However, these knowledge graphs constantly evolve, and newer versions may adversely affect the results of queries on which earlier business decisions were based. We propose a framework based on provenance polynomials to track the impact of knowledge graph changes on arbitrary SPARQL query results. Focusing on the deletion of facts, we show how to efficiently determine the queries impacted by a change, develop ways to incrementally maintain these polynomials, and present an efficient implementation on top of RDF graph databases. Our experimental evaluation over large-scale RDF/SPARQL benchmarks shows the effectiveness of our proposal.
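
The idea of tracking deletions through provenance polynomials can be illustrated with a minimal sketch. This is not the authors' implementation; it only assumes the standard reading in which a join multiplies the provenance of its inputs, a union adds them, and deleting a fact sets its variable to zero. The fact names (t1..t4), the answer identifier (a1), and all function names are hypothetical.

```python
from collections import Counter, defaultdict

# A polynomial is a Counter {monomial: coefficient}; a monomial is a sorted
# tuple of fact identifiers (a repeated identifier encodes an exponent).

def fact(name):
    """Polynomial consisting of a single fact variable."""
    return Counter({(name,): 1})

def join(p, q):
    """Product of two polynomials: facts that were used together."""
    out = Counter()
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            out[tuple(sorted(m1 + m2))] += c1 * c2
    return out

def union(p, q):
    """Sum of two polynomials: alternative derivations of one answer."""
    return p + q

def delete_fact(p, name):
    """Set a fact variable to zero: drop every monomial that mentions it."""
    return Counter({m: c for m, c in p.items() if name not in m})

# Hypothetical answer a1, derivable either from t1 and t2 or from t3 and t4.
answer = union(join(fact("t1"), fact("t2")), join(fact("t3"), fact("t4")))
answers = {"a1": answer}

# Inverted index from fact to the answers whose polynomial mentions it,
# so a deletion only touches the answers it can actually affect.
impacted_by = defaultdict(set)
for answer_id, poly in answers.items():
    for monomial in poly:
        for name in monomial:
            impacted_by[name].add(answer_id)

after = delete_fact(answer, "t1")
survives = bool(after)  # a non-empty polynomial means a derivation remains
```

In this sketch, deleting t1 leaves the t3, t4 derivation intact, so the answer survives; deleting t3 as well would leave an empty polynomial, meaning the answer no longer holds.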

AUTHORS

Garima Gaur


Bibliometrics: publication history
Publication years: 2017-2017
Publication count: 1
Citation count: 0
Available for download: 1
Downloads (6 weeks): 15
Downloads (12 months): 75
Downloads (cumulative): 125
Average downloads per article: 125.00
Average citations per article: 0.00

Srikanta J. Bedathur

Bibliometrics: publication history
Publication years: 2003-2017
Publication count: 42
Citation count: 373
Available for download: 29
Downloads (6 weeks): 129
Downloads (12 months): 765
Downloads (cumulative): 9,431
Average downloads per article: 325.21
Average citations per article: 8.88

Arnab Bhattacharya

Bibliometrics: publication history
Publication years: 2005-2017
Publication count: 33
Citation count: 84
Available for download: 19
Downloads (6 weeks): 183
Downloads (12 months): 764
Downloads (cumulative): 3,929
Average downloads per article: 206.79
Average citations per article: 2.55

PUBLICATION

Title CIKM '17 Proceedings of the 2017 ACM on Conference on Information and Knowledge Management
General Chairs Ee-Peng Lim Singapore Management University, Singapore
Marianne Winslett University of Illinois at Urbana-Champaign, USA, and Advanced Digital Sciences Center, Singapore
Program Chairs Mark Sanderson RMIT, Australia
Ada Fu Chinese University of Hong Kong, Hong Kong
Jimeng Sun Georgia Tech, USA
Shane Culpepper RMIT, Australia
Eric Lo Chinese University of Hong Kong, Hong Kong
Joyce Ho Emory University, USA
Debora Donato Mix Tech, Inc., USA
Rakesh Agrawal Data Insights Laboratories, USA
Yu Zheng Microsoft Research Asia, China
Carlos Castillo Qatar Computing Research Institute, Qatar
Aixin Sun Nanyang Technological University, Singapore
Vincent S. Tseng National Cheng Kung University, Taiwan
Chenliang Li Wuhan University, China
Pages 2079-2082
Publication Date 2017-11-06
Sponsors SIGWEB ACM Special Interest Group on Hypertext, Hypermedia, and Web
SIGIR ACM Special Interest Group on Information Retrieval
Publisher ACM New York, NY, USA ©2017
ISBN: 978-1-4503-4918-5 DOI: 10.1145/3132847.3133118
Conference CIKM Conference on Information and Knowledge Management
Paper Acceptance Rate 171 of 855 submissions, 20%
Overall Acceptance Rate 1,960 of 10,758 submissions, 18%
Year Submitted Accepted Rate
CIKM '05 425 77 18%
CIKM '06 537 81 15%
CIKM '07 512 86 17%
CIKM '08 772 132 17%
CIKM '09 847 123 15%
CIKM '10 945 126 13%
CIKM '11 918 228 25%
CIKM '12 1088 146 13%
CIKM '13 848 143 17%
CIKM '14 838 175 21%
CIKM '15 646 165 26%
CIKM '16 701 160 23%
CIKM '17 855 171 20%
CIKM '18 826 147 18%
Overall 10,758 1,960 18%


Table of Contents

Proceedings of the 2017 ACM on Conference on Information and Knowledge Management
SESSION: Keynote & Invited Talks
Machine Learning @ Amazon
Rajeev Rastogi
Pages: 1-1

In this talk, I will first provide an overview of key problem areas where we are applying Machine Learning (ML) techniques within Amazon such as product demand forecasting, product search, and information extraction from reviews, and associated technical ...
Deception Detection: When Computers Become Better than Humans
Rada Mihalcea
Pages: 3-3

Whether we like it or not, deception happens every day and everywhere: thousands of trials taking place daily around the world; little white lies: "I'm busy that day!" even if your calendar is blank; news "with a twist" (a.k.a. fake news) meant to attract ...
When Deep Learning Meets Transfer Learning
Qiang Yang
Pages: 5-5

Deep learning has achieved great success as evidenced by many practical applications and contests. However, deep learning developed so far also has some inherent limitations. In particular, deep learning is not yet very adaptable to different related ...
A Hyper-connected World
K. Ananth Krishnan
Pages: 7-7

As the world gets hyper-connected, cities are evolving into complex ecosystems, technically and behaviourally. Machines and humans interact continually, generating streams of data and behavior patterns. To be a true smart city in a hyper-connected world, ...
SESSION: Session 1A: Multimedia
Jointly Modeling Static Visual Appearance and Temporal Pattern for Unsupervised Video Hashing
Chao Li, Yang Yang, Jiewei Cao, Zi Huang
Pages: 9-17

Recently, hashing has been evidenced as an efficient and effective method to facilitate large-scale video retrieval. Most of existing hashing methods are based on visual features, which are expected to capture the appearance of videos. The intrinsic ...
Construction of a National Scale ENF Map using Online Multimedia Data
Hyunsoo Kim, Youngbae Jeon, Ji Won Yoon
Pages: 19-28

The frequency of power distribution networks in a power grid is called electrical network frequency (ENF). Because it provides the spatio-temporal changes of the power grid in a particular location, ENF is used in many application domains including the ...
Dual Learning for Cross-domain Image Captioning
Wei Zhao, Wei Xu, Min Yang, Jianbo Ye, Zhou Zhao, Yabing Feng, Yu Qiao
Pages: 29-38

Recent AI research has witnessed increasing interests in automatically generating image descriptions in text, which is coined as the image captioning problem. Significant progresses have been made in domains where plenty of labeled training data ...
A New Approach to Compute CNNs for Extremely Large Images
Sai Wu, Mengdan Zhang, Gang Chen, Ke Chen
Pages: 39-48

CNN (Convolution Neural Network) is widely used in visual analysis and achieves exceptionally high performances in image classification, face detection, object recognition, image recoloring, and other learning jobs. Using deep learning frameworks, such ...
SESSION: Session 1B: IR evaluation
Active Sampling for Large-scale Information Retrieval Evaluation
Dan Li, Evangelos Kanoulas
Pages: 49-58

Evaluation is crucial in Information Retrieval. The development of models, tools and methods has significantly benefited from the availability of reusable test collections formed through a standardized and thoroughly tested methodology, known as the ...
Intent Based Relevance Estimation from Click Logs
Prakash Mandayam Comar, Srinivasan H. Sengamedu
Pages: 59-66

Estimating the relevance of documents based on the user feedback is an essential component of search, retrieval and ranking problems. User click modeling in search has focused primarily on factoring out the position bias. It is easy to see that the query ...
A Comparison of Nuggets and Clusters for Evaluating Timeline Summaries
Gaurav Baruah, Richard McCreadie, Jimmy Lin
Pages: 67-76

There is growing interest in systems that generate timeline summaries by filtering high-volume streams of documents to retain only those that are relevant to a particular event or topic. Continued advances in algorithms and techniques for this task depend ...
Sensitive and Scalable Online Evaluation with Theoretical Guarantees
Harrie Oosterhuis, Maarten de Rijke
Pages: 77-86

Multileaved comparison methods generalize interleaved comparison methods to provide a scalable approach for comparing ranking systems based on regular user interactions. Such methods enable the increasingly rapid research and development of search engines. ...
SESSION: Session 1C: Sentiment
Users Are Known by the Company They Keep: Topic Models for Viewpoint Discovery in Social Networks
Thibaut Thonet, Guillaume Cabanac, Mohand Boughanem, Karen Pinel-Sauvagnat
Pages: 87-96

Social media platforms such as weblogs and social networking sites provide Internet users with an unprecedented means to express their opinions and debate on a wide range of issues. Concurrently with their growing importance in public communication, ...
Aspect-level Sentiment Classification with HEAT (HiErarchical ATtention) Network
Jiajun Cheng, Shenglin Zhao, Jiani Zhang, Irwin King, Xin Zhang, Hui Wang
Pages: 97-106

Aspect-level sentiment classification is a fine-grained sentiment analysis task, which aims to predict the sentiment of a text in different aspects. One key point of this task is to allocate the appropriate sentiment words for the given aspect. Recent ...
Dyadic Memory Networks for Aspect-based Sentiment Analysis
Yi Tay, Luu Anh Tuan, Siu Cheung Hui
Pages: 107-116

This paper proposes Dyadic Memory Networks (DyMemNN), a novel extension of end-to-end memory networks (memNN) for aspect-based sentiment analysis (ABSA). Originally designed for question answering tasks, memNN operates via a memory selection operation ...
Modeling Language Discrepancy for Cross-Lingual Sentiment Analysis
Qiang Chen, Chenliang Li, Wenjie Li
Pages: 117-126

Language discrepancy is inherent and be part of human languages. Thereby, the same sentiment would be expressed in different patterns across different languages. Unfortunately, the language discrepancy is overlooked by existing works of cross-lingual ...
SESSION: Session 1D: Network Embedding 1
Multi-view Clustering with Graph Embedding for Connectome Analysis
Guixiang Ma, Lifang He, Chun-Ta Lu, Weixiang Shao, Philip S. Yu, Alex D. Leow, Ann B. Ragin
Pages: 127-136

Multi-view clustering has become a widely studied problem in the area of unsupervised learning. It aims to integrate multiple views by taking advantages of the consensus and complimentary information from multiple views. Most of the existing works in ...
Attributed Signed Network Embedding
Suhang Wang, Charu Aggarwal, Jiliang Tang, Huan Liu
Pages: 137-146

The major task of network embedding is to learn low-dimensional vector representations of social-network nodes. It facilitates many analytical tasks such as link prediction and node clustering and thus has attracted increasing attention. The majority ...
Enhancing the Network Embedding Quality with Structural Similarity
Tianshu Lyu, Yuan Zhang, Yan Zhang
Pages: 147-156

Neural network techniques are widely used in network embedding, boosting the result of node classification, link prediction, visualization and other tasks in both aspects of efficiency and quality. All the state of art algorithms put effort on the neighborhood ...
On Embedding Uncertain Graphs
Jiafeng Hu, Reynold Cheng, Zhipeng Huang, Yixang Fang, Siqiang Luo
Pages: 157-166

Graph data are prevalent in communication networks, social media, and biological networks. These data, which are often noisy or inexact, can be represented by uncertain graphs, whose edges are associated with probabilities to indicate the chances that ...
SESSION: Session 1E: Web/App data
A Large Scale Prediction Engine for App Install Clicks and Conversions
Narayan Bhamidipati, Ravi Kant, Shaunak Mishra
Pages: 167-175

Predicting the probability of users clicking on app install ads and installing those apps comes with its own specific challenges. In this paper, we describe (a) how we built a scalable machine learning pipeline from scratch to predict the probability ...
Building Natural Language Interfaces to Web APIs
Yu Su, Ahmed Hassan Awadallah, Madian Khabsa, Patrick Pantel, Michael Gamon, Mark Encarnacion
Pages: 177-186

As the Web evolves towards a service-oriented architecture, application program interfaces (APIs) are becoming an increasingly important way to provide access to data, services, and devices. We study the problem of natural language interface to APIs ...
UFeed: Refining Web Data Integration Based on User Feedback
Ahmed El-Roby, Ashraf Aboulnaga
Pages: 187-196

One of the main challenges in large-scale data integration for relational schemas is creating an accurate mediated schema, and generating accurate semantic mappings between heterogeneous data sources and this mediated schema. Some applications can start ...
Extracting Records from the Web Using a Signal Processing Approach
Roberto Panerai Velloso, Carina F. Dorneles
Pages: 197-206

Extracting records from web pages enables a number of important applications and has immense value due to the amount and diversity of available information that can be extracted. This problem, although vastly studied, remains open because it is not a ...
SESSION: Session 1F: Graph data
A Scalable Graph-Coarsening Based Index for Dynamic Graph Databases
Akshay Kansal, Francesca Spezzano
Pages: 207-216

A graph database D is a collection of graphs. To speed up subgraph query answering on graph databases, indexes are commonly used. State-of-the-art graph database indexes do not adapt or scale well to dynamic graph database use; they are static, and their ...
Natural Language Question/Answering: Let Users Talk With The Knowledge Graph
Weiguo Zheng, Hong Cheng, Lei Zou, Jeffrey Xu Yu, Kangfei Zhao
Pages: 217-226

The ever-increasing knowledge graphs impose an urgent demand of providing effective and easy-to-use query techniques for end users. Structured query languages, such as SPARQL, offer a powerful expression ability to query RDF datasets. However, they are ...
Keyword Search on RDF Graphs - A Query Graph Assembly Approach
Shuo Han, Lei Zou, Jeffery Xu Yu, Dongyan Zhao
Pages: 227-236

Keyword search provides ordinary users an easy-to-use interface for querying RDF data. Given the input keywords, in this paper, we study how to assemble a query graph that is to represent user's query intention accurately and efficiently. Based on the ...
Region Representation Learning via Mobility Flow
Hongjian Wang, Zhenhui Li
Pages: 237-246

Increasing amount of urban data are being accumulated and released to public; this enables us to study the urban dynamics and address urban issues such as crime, traffic, and quality of living. In this paper, we are interested in learning vector representations ...
SESSION: Session 2A: Ranking
Learning Visual Features from Snapshots for Web Search
Yixing Fan, Jiafeng Guo, Yanyan Lan, Jun Xu, Liang Pang, Xueqi Cheng
Pages: 247-256

When applying learning to rank algorithms to Web search, a large number of features are usually designed to capture the relevance signals. Most of these features are computed based on the extracted textual elements, link analysis, and user logs. However, ...
DeepRank: A New Deep Architecture for Relevance Ranking in Information Retrieval
Liang Pang, Yanyan Lan, Jiafeng Guo, Jun Xu, Jingfang Xu, Xueqi Cheng
Pages: 257-266

This paper concerns a deep learning approach to relevance ranking in information retrieval (IR). Existing deep IR models such as DSSM and CDSSM directly apply neural networks to generate ranking scores, without explicit understandings of the relevance. ...
Learning to Un-Rank: Quantifying Search Exposure for Users in Online Communities
Asia J. Biega, Azin Ghazimatin, Hakan Ferhatosmanoglu, Krishna P. Gummadi, Gerhard Weikum
Pages: 267-276

Search engines in online communities such as Twitter or Facebook not only return matching posts, but also provide links to the profiles of the authors. Thus, when a user appears in the top-k results for a sensitive keyword query, she becomes widely ...
Balancing Speed and Quality in Online Learning to Rank for Information Retrieval
Harrie Oosterhuis, Maarten de Rijke
Pages: 277-286

In Online Learning to Rank (OLTR) the aim is to find an optimal ranking model by interacting with users. When learning from user behavior, systems must interact with users while simultaneously learning from those interactions. Unlike other Learning to ...
SESSION: Session 2B: Crowdsourcing 1
Crowd-enabled Pareto-Optimal Objects Finding Employing Multi-Pairwise-Comparison Questions
Chang Liu, Yinan Zhang, Lei Liu, Lizhen Cui, Dong Yuan, Chunyan Miao
Pages: 287-295

Today, Pareto-optimal objects finding has been applied in various fields, such as group decision making and opinion collection. Many of the existing solutions to this problem require explicit attributes for objects. However, these attributes cannot be ...
Destination-aware Task Assignment in Spatial Crowdsourcing
Yan Zhao, Yang Li, Yu Wang, Han Su, Kai Zheng
Pages: 297-306

With the proliferation of GPS-enabled smart devices and increased availability of wireless network, spatial crowdsourcing (SC) has been recently proposed as a framework to automatically request workers (i.e., smart device carriers) to perform location-sensitive ...
Crowdsourced Selection on Multi-Attribute Data
Xueping Weng, Guoliang Li, Huiqi Hu, Jianhua Feng
Pages: 307-316

Crowdsourced selection asks the crowd to select entities that satisfy a query condition, e.g., selecting the photos of people wearing sunglasses from a given set of photos. Existing studies focus on a single query predicate and in this paper we study ...
Select Your Questions Wisely: For Entity Resolution With Crowd Errors
Vijaya Krishna Yalavarthi, Xiangyu Ke, Arijit Khan
Pages: 317-326

Crowdsourcing is becoming increasingly important in entity resolution tasks due to their inherent complexity such as clustering of images and natural language processing. Humans can provide more insightful information for these difficult problems compared ...
SESSION: Session 2C: Recommendation 1
Reply With: Proactive Recommendation of Email Attachments
Christophe Van Gysel, Bhaskar Mitra, Matteo Venanzi, Roy Rosemarin, Grzegorz Kukla, Piotr Grudzien, Nicola Cancedda
Pages: 327-336

Email responses often contain items---such as a file or a hyperlink to an external document---that are attached to or included inline in the body of the message. Analysis of an enterprise email corpus reveals that 35% of the time when users include these ...
Learning and Transferring Social and Item Visibilities for Personalized Recommendation
Lin Xiao, Zhang Min, Zhang Yongfeng, Liu Yiqun, Ma Shaoping
Pages: 337-346

User feedback in the form of movie-watching history, item ratings, or product consumption is very helpful in training recommender systems. However, relatively few interactions between items and users can be observed. Instances of missing user--item entries ...
Joint Topic-Semantic-aware Social Recommendation for Online Voting
Hongwei Wang, Jia Wang, Miao Zhao, Jiannong Cao, Minyi Guo
Pages: 347-356

Online voting is an emerging feature in social networks, in which users can express their attitudes toward various issues and show their unique interest. Online voting imposes new challenges on recommendation, because the propagation of votings heavily ...
Interactive Social Recommendation
Xin Wang, Steven C.H. Hoi, Chenghao Liu, Martin Ester
Pages: 357-366

Social recommendation has been an active research topic over the last decade, based on the assumption that social information from friendship networks is beneficial for improving recommendation accuracy, especially when dealing with cold-start users ...
SESSION: Session 2D: Network Embedding 2
From Properties to Links: Deep Network Embedding on Incomplete Graphs
Dejian Yang, Senzhang Wang, Chaozhuo Li, Xiaoming Zhang, Zhoujun Li
Pages: 367-376

As an effective way of learning node representations in networks, network embedding has attracted increasing research interests recently. Most existing approaches use shallow models and only work on static networks by extracting local or global topology ...
Learning Community Embedding with Community Detection and Node Embedding on Graphs
Sandro Cavallari, Vincent W. Zheng, Hongyun Cai, Kevin Chen-Chuan Chang, Erik Cambria
Pages: 377-386

In this paper, we study an important yet largely under-explored setting of graph embedding, i.e., embedding communities instead of each individual nodes. We find that community embedding is not only useful for community-level applications such as graph ...
Attributed Network Embedding for Learning in a Dynamic Environment
Jundong Li, Harsh Dani, Xia Hu, Jiliang Tang, Yi Chang, Huan Liu
Pages: 387-396

Network embedding leverages the node proximity manifested to learn a low-dimensional node vector representation for each node in the network. The learned embeddings could advance various learning tasks such as node classification, network clustering, ...
Learning Node Embeddings in Interaction Graphs
Yao Zhang, Yun Xiong, Xiangnan Kong, Yangyong Zhu
Pages: 397-406

Node embedding techniques have gained prominence since they produce continuous and low-dimensional features, which are effective for various tasks. Most existing approaches learn node embeddings by exploring the structure of networks and are mainly focused ...
SESSION: Session 2E: Skyline Queries
Efficient Computation of Subspace Skyline over Categorical Domains
Md Farhadur Rahman, Abolfazl Asudeh, Nick Koudas, Gautam Das
Pages: 407-416

Platforms such as AirBnB, Zillow, Yelp, and related sites have transformed the way we search for accommodation, restaurants, etc. The underlying datasets in such applications have numerous attributes that are mostly Boolean or Categorical. Discovering ...
Fast Algorithms for Pareto Optimal Group-based Skyline
Wenhui Yu, Zheng Qin, Jinfei Liu, Li Xiong, Xu Chen, Huidi Zhang
Pages: 417-426

Skyline, aiming at finding a Pareto optimal subset of points in a multi-dimensional dataset, has gained great interest due to its extensive use for multi-criteria analysis and decision making. Skyline consists of all points that are not dominated by, ...
Probabilistic Skyline on Incomplete Data
Kaiqi Zhang, Hong Gao, Xixian Han, Zhipeng Cai, Jianzhong Li
Pages: 427-436

The skyline query is important in database community. In recent years, the researches on incomplete data have been increasingly considered, especially for the skyline query. However, the existing skyline definition on incomplete data cannot provide users ...
Communication-Efficient Distributed Skyline Computation
Haoyu Zhang, Qin Zhang
Pages: 437-446

In this paper we study skyline queries in the distributed computational model, where we have s remote sites and a central coordinator; each site holds a piece of data, and the coordinator wants to compute the skyline of the union of the s datasets. The ...
SESSION: Session 2F: Social Media Analysis
Bringing Salary Transparency to the World: Computing Robust Compensation Insights via LinkedIn Salary
Krishnaram Kenthapadi, Stuart Ambler, Liang Zhang, Deepak Agarwal
Pages: 447-455

The recently launched LinkedIn Salary product has been designed with the goal of providing compensation insights to the world's professionals and thereby helping them optimize their earning potential. We describe the overall design and architecture of ...
Efficient Document Filtering Using Vector Space Topic Expansion and Pattern-Mining: The Case of Event Detection in Microposts
Julia Proskurnia, Ruslan Mavlyutov, Carlos Castillo, Karl Aberer, Philippe Cudré-Mauroux
Pages: 457-466

Automatically extracting information from social media is challenging given that social content is often noisy, ambiguous, and inconsistent. However, as many stories break on social channels first before being picked up by mainstream media, developing ...
LARM: A Lifetime Aware Regression Model for Predicting YouTube Video Popularity
Changsha Ma, Zhisheng Yan, Chang Wen Chen
Pages: 467-476

Online content popularity prediction provides substantial value to a broad range of applications in the end-to-end social media systems, from network resource allocation to targeted advertising. While using historical popularity can predict the near-term ...
Modeling Affinity based Popularity Dynamics
Minkyoung Kim, Daniel A. McFarland, Jure Leskovec
Pages: 477-486

Information items draw collective attention across a heterogeneous social system, leading to great disparities of popularity. Unveiling underlying diffusion processes is very challenging, since a social system consists of time-evolving subgroups interacting ...
SESSION: Session 3A: Spatiotemporal
Scenic Routes Now: Efficiently Solving the Time-Dependent Arc Orienteering Problem
Ying Lu, Gregor Josse, Tobias Emrich, Ugur Demiryurek, Matthias Renz, Cyrus Shahabi, Matthias Schubert
Pages: 487-496

Due to the availability of large transportation (e.g., road network sensor data) and transportation-related (e.g., pollution, crime) data as well as the ubiquity of car navigation systems, recent route planning techniques need to optimize for multiple ...
Modeling Temporal-Spatial Correlations for Crime Prediction
Xiangyu Zhao, Jiliang Tang
Pages: 497-506

Crime prediction plays a crucial role in improving public security and reducing the financial loss of crimes. The vast majority of traditional algorithms performed the prediction by leveraging demographic data, which could fail to capture the dynamics ...
Spatiotemporal Event Forecasting from Incomplete Hyper-local Price Data
Xuchao Zhang, Liang Zhao, Arnold P. Boedihardjo, Chang-Tien Lu, Naren Ramakrishnan
Pages: 507-516

Hyper-local pricing data, e.g., about foods and commodities, exhibit subtle spatiotemporal variations that can be useful as crucial precursors of future events. Three major challenges in modeling such pricing data include: i) temporal dependencies underlying ...
Exploiting Spatio-Temporal User Behaviors for User Linkage
Wei Chen, Hongzhi Yin, Weiqing Wang, Lei Zhao, Wen Hua, Xiaofang Zhou
Pages: 517-526

Cross-device and cross-domain user linkage have been attracting a lot of attention recently. An important branch of the study is to achieve user linkage with spatio-temporal data generated by the ubiquitous GPS-enabled devices. The main task in this ...
SESSION: Session 3B: Short text retrieval
Similarity-based Distant Supervision for Definition Retrieval
Jiepu Jiang, James Allan
Pages: 527-536

Recognizing definition sentences from free text corpora often requires hand-crafted patterns or explicitly labeled training instances. We present a distant supervision approach addressing this challenge without using explicitly labeled data. We use plausibly ...
Hybrid BiLSTM-Siamese network for FAQ Assistance
Prerna Khurana, Puneet Agarwal, Gautam Shroff, Lovekesh Vig, Ashwin Srinivasan
Pages: 537-545

We describe an automated assistant for answering frequently asked questions; our system has been deployed, and is currently answering HR-related queries in two different areas (leave management and health insurance) to a large number of users. The needs ...
Regularized and Retrofitted models for Learning Sentence Representation with Context
Tanay Kumar Saha, Shafiq Joty, Naeemul Hassan, Mohammad Al Hasan
Pages: 547-556

Vector representation of sentences is important for many text processing tasks that involve classifying, clustering, or ranking sentences. For solving these tasks, bag-of-word based representation has been used for a long time. In recent years, distributed ...
Talking to Your TV: Context-Aware Voice Search with Hierarchical Recurrent Neural Networks
Jinfeng Rao, Ferhan Ture, Hua He, Oliver Jojic, Jimmy Lin
Pages: 557-566

We tackle the novel problem of navigational voice queries posed against an entertainment system, where viewers interact with a voice-enabled remote controller to specify the TV program to watch. This is a difficult problem for several reasons: such queries ...
SESSION: Session 3C: Community Detection
GPU-Accelerated Graph Clustering via Parallel Label Propagation
Yusuke Kozawa, Toshiyuki Amagasa, Hiroyuki Kitagawa
Pages: 567-576

Graph clustering has recently attracted much attention as a technique to extract community structures from various kinds of graph data. Since available graph data becomes increasingly large, the acceleration of graph clustering is an important issue ...
Temporally Like-minded User Community Identification through Neural Embeddings
Hossein Fani, Ebrahim Bagheri, Weichang Du
Pages: 577-586

We propose a neural embedding approach to identify temporally like-minded user communities, i.e., those communities of users who have similar temporal alignment in their topics of interest. Like-minded user communities in social networks are usually ...
Community-Based Network Alignment for Large Attributed Network
Zheng Chen, Xinli Yu, Bo Song, Jianliang Gao, Xiaohua Hu, Wei-Shih Yang
Pages: 587-596

Network alignment is becoming an active topic in network data analysis. Despite extensive research, we realize that efficient use of topological and attribute information for large attributed network alignment has not been sufficiently addressed in previous ...
A Non-negative Symmetric Encoder-Decoder Approach for Community Detection
Bing-Jie Sun, Huawei Shen, Jinhua Gao, Wentao Ouyang, Xueqi Cheng
Pages: 597-606

Community detection or graph clustering is crucial to understanding the structure of complex networks and extracting relevant knowledge from networked data. Latent factor model, e.g., non-negative matrix factorization and mixed membership block model, ...
SESSION: Session 3D: Time Series
Fast Word Recognition for Noise channel-based Models in Scenarios with Noise Specific Domain Knowledge
Marco Cristo, Raíza Hanada, André Carvalho, Fernando Anglada Lores, Maria da Graça C. Pimentel
Pages: 607-616

Word recognition is a challenging task faced by many applications, especially in very noisy scenarios. This problem is usually seen as the transmission of a word through a noisy channel, such that it is necessary to determine which known word of a lexicon ...
Detecting Multiple Periods and Periodic Patterns in Event Time Sequences
Quan Yuan, Jingbo Shang, Xin Cao, Chao Zhang, Xinhe Geng, Jiawei Han
Pages: 617-626

Periodicity is prevalent in the physical world, and many events involve more than one period, e.g., an individual's mobility, tide patterns, and massive transportation utilization. Knowing the true periods of events can benefit a number of applications, ...
Finding Periodic Discrete Events in Noisy Streams
Abhirup Ghosh, Christopher Lucas, Rik Sarkar
Pages: 627-636

Periodic phenomena are ubiquitous, but detecting and predicting periodic events can be difficult in noisy environments. We describe a model of periodic events that covers both idealized and realistic scenarios characterized by multiple kinds of noise. ...
Fast and Accurate Time Series Classification with WEASEL
Patrick Schäfer, Ulf Leser
Pages: 637-646

Time series (TS) occur in many scientific and commercial applications, ranging from earth surveillance to industry automation to the smart grids. An important type of TS analysis is classification, which can, for instance, improve energy load forecasting ...
SESSION: Session 3E: Query processing
QLever: A Query Engine for Efficient SPARQL+Text Search
Hannah Bast, Björn Buchhold
Pages: 647-656

We present QLever, a query engine for efficient combined search on a knowledge base and a text corpus, in which named entities from the knowledge base have been identified (that is, recognized and disambiguated). The query language is SPARQL extended ...
A Study of Main-Memory Hash Joins on Many-core Processor: A Case with Intel Knights Landing Architecture
Xuntao Cheng, Bingsheng He, Xiaoli Du, Chiew Tong Lau
Pages: 657-666

Advanced processor architectures have been driving new designs, implementations and optimizations of main-memory hash join algorithms recently. The newly released Intel Xeon Phi many-core processor of the Knights Landing architecture (KNL) embraces interesting ...
PQBF: I/O-Efficient Approximate Nearest Neighbor Search by Product Quantization
Yingfan Liu, Hong Cheng, Jiangtao Cui
Pages: 667-676

Approximate nearest neighbor (ANN) search in high-dimensional space plays an essential role in many multimedia applications. Recently, product quantization (PQ) based methods for ANN search have attracted enormous attention in the community of computer ...
ANS-Based Index Compression
Alistair Moffat, Matthias Petri
Pages: 677-686

Techniques for effectively representing the postings lists associated with inverted indexes have been studied for many years. Here we combine the recently developed "asymmetric numeral systems" (ANS) approach to entropy coding and a range of previous ...
SESSION: Session 3F: Temporal data
Covering the Optimal Time Window Over Temporal Data
Bin Cao, Chenyu Hou, Jing Fan
Pages: 687-696

In this paper, we propose a new problem: covering the optimal time window over temporal data. Given a duration constraint d and a set of users where each user has multiple time intervals, the goal is to find all time windows which (1) are greater than ...
Scaling Probabilistic Temporal Query Evaluation
Melisachew Wudage Chekol
Pages: 697-706

Open information extraction has driven automatic construction of (temporal) knowledge graphs (e.g. YAGO) that maintain probabilistic (temporal) facts and inference rules. One of the most important tasks in these knowledge graphs is query evaluation. ...
Efficient Discovery of Abnormal Event Sequences in Enterprise Security Systems
Boxiang Dong, Zhengzhang Chen, Hui (Wendy) Wang, Lu-An Tang, Kai Zhang, Ying Lin, Zhichun Li, Haifeng Chen
Pages: 707-715

Intrusion detection system (IDS) is an important part of enterprise security system architecture. In particular, anomaly-based IDS has been widely applied to detect single abnormal process events that deviate from the majority. However, intrusion activity ...
Temporal Analog Retrieval using Transformation over Dual Hierarchical Structures
Yating Zhang, Adam Jatowt, Katsumi Tanaka
Pages: 717-726

In recent years, we have witnessed a rapid increase of text content stored in digital archives such as newspaper archives or web archives. Many old documents have been converted to digital form and made accessible online. Due to the passage of time, ...
SESSION: Session 4A: Evaluation
Does That Mean You're Happy?: RNN-based Modeling of User Interaction Sequences to Detect Good Abandonment
Kyle Williams, Imed Zitouni
Pages: 727-736

Queries for which there are no clicks are known as abandoned queries. Differentiating between good and bad abandonment queries has become an important task in search engine evaluation since it allows for better measurement of search engine features that ...
Deep Sequential Models for Task Satisfaction Prediction
Rishabh Mehrotra, Ahmed Hassan Awadallah, Milad Shokouhi, Emine Yilmaz, Imed Zitouni, Ahmed El Kholy, Madian Khabsa
Pages: 737-746

Detecting and understanding implicit signals of user satisfaction are essential for experimentation aimed at predicting searcher satisfaction. As retrieval systems have advanced, search tasks have steadily emerged as accurate units not only to capture ...
Adaptive Persistence for Search Effectiveness Measures
Jiepu Jiang, James Allan
Pages: 747-756

Many search effectiveness evaluation measures penalize the importance of results at lower ranks. This is usually explained as an attempt to model users' persistence when sequentially examining results---lower ranked results are less important because ...
Beyond Success Rate: Utility as a Search Quality Metric for Online Experiments
Widad Machmouchi, Ahmed Hassan Awadallah, Imed Zitouni, Georg Buscher
Pages: 757-765

User satisfaction metrics are an integral part of search engine development as they help system developers to understand and evaluate the quality of the user experience. Research to date has mostly focused on predicting success or frustration as a proxy ...
SESSION: Session 4B: News and credibility
Linking News across Multiple Streams for Timeliness Analysis
Ida Mele, Seyed Ali Bahrainian, Fabio Crestani
Pages: 767-776

Linking multiple news streams based on the reported events and analyzing the streams' temporal publishing patterns are two very important tasks for information analysis, discovering newsworthy stories, studying the event evolution, and detecting untrustworthy ...
Growing Story Forest Online from Massive Breaking News
Bang Liu, Di Niu, Kunfeng Lai, Linglong Kong, Yu Xu
Pages: 777-785

We describe our experience of implementing a news content organization system at Tencent that discovers events from vast streams of breaking news and evolves news story structures in an online fashion. Our real-world system has distinct requirements ...
iFACT: An Interactive Framework to Assess Claims from Tweets
Wee Yong Lim, Mong Li Lee, Wynne Hsu
Pages: 787-796

Posts by users on microblogs such as Twitter provide diverse real-time updates to major events. Unfortunately, not all of this information is credible. Previous works that assess the credibility of information in Twitter have focused on extracting features ...
CSI: A Hybrid Deep Model for Fake News Detection
Natali Ruchansky, Sungyong Seo, Yan Liu
Pages: 797-806

The topic of fake news has drawn attention both from the public and the academic communities. Such misinformation has the potential of affecting public opinion, providing an opportunity for malicious parties to manipulate the outcomes of public events ...
SESSION: Session 4C: Outliers and Anomaly Detection
Selective Value Coupling Learning for Detecting Outliers in High-Dimensional Categorical Data
Guansong Pang, Hongzuo Xu, Longbing Cao, Wentao Zhao
Pages: 807-816

This paper introduces a novel framework, namely SelectVC and its instance POP, for learning selective value couplings (i.e., interactions between the full value set and a set of outlying values) to identify outliers in high-dimensional categorical data. ...
Outlier Detection in Sparse Data with Factorization Machines
Mengxiao Zhu, Charu C. Aggarwal, Shuai Ma, Hui Zhang, Jinpeng Huai
Pages: 817-826

In sparse data, a large fraction of the entries take on zero values. Some examples of sparse data include short text snippets (such as tweets in Twitter) or some feature representations of categorical data sets with a large number of values, in which ...
Anomaly Detection in Dynamic Networks using Multi-view Time-Series Hypersphere Learning
Xian Teng, Yu-Ru Lin, Xidao Wen
Pages: 827-836

Detecting anomalous patterns from dynamic and multi-attributed network systems has been a challenging problem due to the complication of temporal dynamics and the variations reflected in multiple data sources. We propose a Multi-view Time-Series Hypersphere ...
A Fast Trajectory Outlier Detection Approach via Driving Behavior Modeling
Hao Wu, Weiwei Sun, Baihua Zheng
Pages: 837-846

Trajectory outlier detection is a fundamental building block for many location-based service (LBS) applications, with a large application base. We dedicate this paper to detecting outliers in vehicle trajectories efficiently and effectively. In ...
SESSION: Session 4D: Graph Mining 1
BL-ECD: Broad Learning based Enterprise Community Detection via Hierarchical Structure Fusion
Jiawei Zhang, Limeng Cui, Philip S. Yu, Yuanhua Lv
Pages: 859-868

Employees in companies can be divided into different social communities, and those who frequently socialize with each other will be treated as close friends and are grouped in the same community. In the enterprise context, a large amount of information ...
Highly Efficient Mining of Overlapping Clusters in Signed Weighted Networks
Tuan-Anh Hoang, Ee-Peng Lim
Pages: 869-878

In many practical contexts, networks are weighted as their links are assigned numerical weights representing relationship strengths or intensities of inter-node interaction. Moreover, the links' weight can be positive or negative, depending on the relationship ...
To Be Connected, or Not to Be Connected: That is the Minimum Inefficiency Subgraph Problem
Natali Ruchansky, Francesco Bonchi, David Garcia-Soriano, Francesco Gullo, Nicolas Kourtellis
Pages: 879-888

We study the problem of extracting a selective connector for a given set of query vertices Q subset of V in a graph G = (V,E). A selective connector is a subgraph of G which exhibits some cohesiveness property, and contains the query vertices but does ...
MGAE: Marginalized Graph Autoencoder for Graph Clustering
Chun Wang, Shirui Pan, Guodong Long, Xingquan Zhu, Jing Jiang
Pages: 889-898

Graph clustering aims to discover community structures in networks, the task being fundamentally challenging mainly because the topology structure and the content of the graphs are difficult to represent for clustering analysis. Recently, graph ...
SESSION: Session 4E: Online learning, Stream mining
BoostVHT: Boosting Distributed Streaming Decision Trees
Theodore Vasiloudis, Foteini Beligianni, Gianmarco De Francisci Morales
Pages: 899-908

Online boosting improves the accuracy of classifiers for unbounded streams of data by chaining them into an ensemble. Due to its sequential nature, boosting has proven hard to parallelize, even more so in the online setting. This paper introduces BoostVHT, ...
Stream Aggregation Through Order Sampling
Nick Duffield, Yunhong Xu, Liangzhen Xia, Nesreen K. Ahmed, Minlan Yu
Pages: 909-918

This paper introduces a new single-pass reservoir weighted-sampling stream aggregation algorithm, Priority-Based Aggregation (PBA). While order sampling is a powerful and efficient method for weighted sampling from a stream of uniquely keyed items, there ...
FUSION: An Online Method for Multistream Classification
Ahsanul Haque, Zhuoyi Wang, Swarup Chandra, Bo Dong, Latifur Khan, Kevin W. Hamlen
Pages: 919-928

Traditional data stream classification assumes that data is generated from a single non-stationary process. On the contrary, multistream classification problem involves two independent non-stationary data generating processes. One of them is the source ...
SESSION: Session 5A: Tensor analysis
Maintaining Densest Subsets Efficiently in Evolving Hypergraphs
Shuguang Hu, Xiaowei Wu, T-H. Hubert Chan
Pages: 929-938

In this paper we study the densest subgraph problem, which plays a key role in many graph mining applications. The goal of the problem is to find a subset of nodes that induces a graph with maximum average degree. The problem has been extensively studied ...
Coupled Sparse Matrix Factorization for Response Time Prediction in Logistics Services
Yuqi Wang, Jiannong Cao, Lifang He, Wengen Li, Lichao Sun, Philip S. Yu
Pages: 939-947

Nowadays, there is an emerging way of connecting logistics orders and van drivers, where it is crucial to predict the order response time. Accurate prediction of order response time would not only facilitate decision making on order dispatching, but ...
Tensor Rank Estimation and Completion via CP-based Nuclear Norm
Qiquan Shi, Haiping Lu, Yiu-ming Cheung
Pages: 949-958

Tensor completion (TC) is a challenging problem of recovering missing entries of a tensor from its partial observation. One main TC approach is based on CP/Tucker decomposition. However, this approach often requires the determination of a tensor rank ...
Smart Infrastructure Maintenance Using Incremental Tensor Analysis: Extended Abstract
Nguyen Lu Dang Khoa, Ali Anaissi, Yang Wang
Pages: 959-967

Civil infrastructures are key to the flow of people and goods in urban environments. Structural Health Monitoring (SHM) is a condition-based maintenance technology, which provides and predicts actionable information on the current and future states of ...
SESSION: Session 5B: Application driven mining
Collaborative Filtering as a Case-Study for Model Parallelism on Bulk Synchronous Systems
Ariyam Das, Ishan Upadhyaya, Xiangrui Meng, Ameet Talwalkar
Pages: 969-977

Industrial-scale machine learning applications often train and maintain massive models that can be on the order of hundreds of millions to billions of parameters. Model parallelism thus plays a significant role to support these machine learning tasks. ...
Modeling Student Learning Styles in MOOCs
Yuling Shi, Zhiyong Peng, Hongning Wang
Pages: 979-988

The recorded student activities in Massive Open Online Course (MOOC) provide us a unique opportunity to model their learning behaviors, identify their particular learning intents, and enable personalized assistance and guidance in online education. In ...
Tracking Knowledge Proficiency of Students with Educational Priors
Yuying Chen, Qi Liu, Zhenya Huang, Le Wu, Enhong Chen, Runze Wu, Yu Su, Guoping Hu
Pages: 989-998

Diagnosing students' knowledge proficiency, i.e., the mastery degrees of a particular knowledge point in exercises, is a crucial issue for numerous educational applications, e.g., targeted knowledge training and exercise recommendation. Educational theories ...
Spreadsheet Property Detection With Rule-assisted Active Learning
Zhe Chen, Sasha Dadiomov, Richard Wesley, Gang Xiao, Daniel Cory, Michael Cafarella, Jock Mackinlay
Pages: 999-1008

Spreadsheets are a critical and widely-used data management tool. Converting spreadsheet data into relational tables would bring benefits to a number of fields, including public policy, public health, and economics. Research to date has focused on designing ...
SESSION: Session 5C: Deep Learning 1
Learning Knowledge Embeddings by Combining Limit-based Scoring Loss
Xiaofei Zhou, Qiannan Zhu, Ping Liu, Li Guo
Pages: 1009-1018

In knowledge graph embedding models, the margin-based ranking loss as the common loss function is usually used to encourage discrimination between golden triplets and incorrect triplets, which has proved effective in many translation-based models for ...
Length Adaptive Recurrent Model for Text Classification
Zhengjie Huang, Zi Ye, Shuangyin Li, Rong Pan
Pages: 1019-1027

In recent years, recurrent neural networks have been widely used for various text classification tasks. However, most of the recurrent architectures will not assign a class label to a text until they read the last word, while human beings are able to ...
Multi-Task Neural Network for Non-discrete Attribute Prediction in Knowledge Graphs
Yi Tay, Luu Anh Tuan, Minh C. Phan, Siu Cheung Hui
Pages: 1029-1038

Many popular knowledge graphs such as Freebase, YAGO or DBPedia maintain a list of non-discrete attributes for each entity. Intuitively, these attributes such as height, price or population count are able to richly characterize entities in knowledge ...
Movie Fill in the Blank with Adaptive Temporal Attention and Description Update
Jie Chen, Jie Shao, Fumin Shen, Chengkun He, Lianli Gao, Heng Tao Shen
Pages: 1039-1048

Recently, a new type of video understanding task called Movie-Fill-in-the-Blank (MovieFIB) has attracted much research attention. Given a movie clip and a description with one blank word as input, MovieFIB aims to automatically predict the blank ...
SESSION: Session 6A: Crowdsourcing 2
Crowdsourcing Cybersecurity: Cyber Attack Detection using Social Media
Rupinder Paul Khandpur, Taoran Ji, Steve Jan, Gang Wang, Chang-Tien Lu, Naren Ramakrishnan
Pages: 1049-1057

Social media is often viewed as a sensor into various societal events such as disease outbreaks, protests, and elections. We describe the use of social media as a crowdsourced sensor to gain insight into ongoing cyber-attacks. Our approach detects a ...
Budgeted Task Scheduling for Crowdsourced Knowledge Acquisition
Tao Han, Hailong Sun, Yangqiu Song, Zizhe Wang, Xudong Liu
Pages: 1059-1068

Knowledge acquisition (e.g., through labeling) is one of the most successful applications in crowdsourcing. In practice, collecting knowledge that is as specific as possible via crowdsourcing is very useful since specific knowledge can be generalized easily if ...
Hyper Questions: Unsupervised Targeting of a Few Experts in Crowdsourcing
Jiyi Li, Yukino Baba, Hisashi Kashima
Pages: 1069-1078

Quality control is one of the major problems in crowdsourcing. One of the primary approaches to rectify this issue is to assign the same task to different workers and then aggregate their answers to obtain a reliable answer. In addition to simple aggregation ...
Modeling Menu Bundle Designs of Crowdfunding Projects
Yusan Lin, Peifeng Yin, Wang-Chien Lee
Pages: 1079-1088

Offering products in the forms of menu bundles is a common practice in marketing to attract customers and maximize revenues. In crowdfunding platforms such as Kickstarter, rewards also play an important part in influencing project success. Designing ...
SESSION: Session 6B: User behavior and targeting
Forecasting Ad-Impressions on Online Retail Websites using Non-homogeneous Hawkes Processes
Krunal Parmar, Samuel Bushi, Sourangshu Bhattacharya, Surender Kumar
Pages: 1089-1098

Promotional listing of products or advertisements is a major source of revenue for online retail companies. These advertisements are often sold in the guaranteed delivery market, serving of which critically depends on the ability to predict supply or ...
Volume Ranking and Sequential Selection in Programmatic Display Advertising
Yuxuan Song, Kan Ren, Han Cai, Weinan Zhang, Yong Yu
Pages: 1099-1107

Programmatic display advertising, which enables advertisers to make real-time decisions on individual ad display opportunities so as to achieve a precise audience marketing, has become a key technique for online advertising. However, the constrained ...
On Migratory Behavior in Video Consumption
Huan Yan, Tzu-Heng Lin, Gang Wang, Yong Li, Haitao Zheng, Depeng Jin, Ben Y. Zhao
Pages: 1109-1118

Today's video streaming market is crowded with various content providers (CPs). For individual CPs, understanding user behavior, in particular how users migrate among different CPs, is crucial for improving users' on-site experience and the CP's chance ...
FM-Hawkes: A Hawkes Process Based Approach for Modeling Online Activity Correlations
Sha Li, Xiaofeng Gao, Weiming Bao, Guihai Chen
Pages: 1119-1128

Understanding and predicting user behavior on online platforms has proved to be of significant value, with applications spanning from targeted advertising, political campaigning, anomaly detection to user self-monitoring. With the growing functionality ...
SESSION: Session 6C: Deep Learning 2
Deep Learning Based Forecasting of Critical Infrastructure Data
Zahra Zohrevand, Uwe Glässer, Mohammad A. Tayebi, Hamed Yaghoubi Shahir, Mehdi Shirmaleki, Amir Yaghoubi Shahir
Pages: 1129-1138

Intelligent monitoring and control of critical infrastructure such as electric power grids, public water utilities and transportation systems produces massive volumes of time series data from heterogeneous sensor networks. Time Series Forecasting (TSF) ...
Augmented Variational Autoencoders for Collaborative Filtering with Auxiliary Information
Wonsung Lee, Kyungwoo Song, Il-Chul Moon
Pages: 1139-1148

Recommender systems offer critical services in the age of mass information. A good recommender system selects a certain item for a specific user by recognizing why the user might like the item. This awareness implies that the system should model the ...
DeepHawkes: Bridging the Gap between Prediction and Understanding of Information Cascades
Qi Cao, Huawei Shen, Keting Cen, Wentao Ouyang, Xueqi Cheng
Pages: 1149-1158

Online social media remarkably facilitates the production and delivery of information, intensifying the competition among vast information for users' attention and highlighting the importance of predicting the popularity of information. Existing approaches ...
CNN-IETS: A CNN-based Probabilistic Approach for Information Extraction by Text Segmentation
Meng Hu, Zhixu Li, Yongxin Shen, An Liu, Guanfeng Liu, Kai Zheng, Lei Zhao
Pages: 1159-1168

Information Extraction by Text Segmentation (IETS) aims at segmenting text inputs to extract implicit data values contained in them. The state-of-the-art IETS approaches mainly rely on machine learning techniques, either supervised or unsupervised. However, ...
SESSION: Session 7A: Health Analytics 1
A Personalized Predictive Framework for Multivariate Clinical Time Series via Adaptive Model Selection
Zitao Liu, Milos Hauskrecht
Pages: 1169-1177

Building of an accurate predictive model of clinical time series for a patient is critical for understanding of the patient condition, its dynamics, and optimal patient management. Unfortunately, this process is not straightforward. First, patient-specific ...
DiagTree: Diagnostic Tree for Differential Diagnosis
Yejin Kim, Jingyun Choi, Yosep Chong, Xiaoqian Jiang, Hwanjo Yu
Pages: 1179-1188

Differential diagnosis is detection of one disease among similar diseases using evidence such as pathologic tests. A Partially Observed Markov Decision Process (POMDP) formulates the complex differential diagnosis process into a probabilistic decision-making ...
Fine-grained Patient Similarity Measuring using Deep Metric Learning
Jiazhi Ni, Jie Liu, Chenxin Zhang, Dan Ye, Zhirou Ma
Pages: 1189-1198

Patient similarity measuring plays a significant role in many healthcare applications, such as cohort study and treatment comparative effectiveness research. Existing methods mainly rely on supervised metric learning method to study patient similarity ...
Differentially Private Regression for Discrete-Time Survival Analysis
Thông T. Nguyên, Siu Cheung Hui
Pages: 1199-1208

In survival analysis, regression models are used to understand the effects of explanatory variables (e.g., age, sex, weight, etc.) on the survival probability. However, for sensitive survival data such as medical data, there are serious concerns about ...
SESSION: Session 7B: Privacy Preserving Data Mining
From Fingerprint to Footprint: Revealing Physical World Privacy Leakage by Cyberspace Cookie Logs
Huandong Wang, Chen Gao, Yong Li, Zhi-Li Zhang, Depeng Jin
Pages: 1209-1218

It is well-known that online services resort to various cookies to track users through users' online service identifiers (IDs) - in other words, when users access online services, various "fingerprints" are left behind in the cyberspace. As they roam ...
Privacy-Preserving Collaborative Deep Learning with Application to Human Activity Recognition
Lingjuan Lyu, Xuanli He, Yee Wei Law, Marimuthu Palaniswami
Pages: 1219-1228

The proliferation of wearable devices has contributed to the emergence of mobile crowdsensing, which leverages the power of the crowd to collect and report data to a third party for large-scale sensing and collaborative learning. However, since the third ...
Privacy Aware Temporal Profiling of Emails in Distributed Setup
Sutapa Mondal, Manish Shukla, Sachin Lodha
Pages: 1229-1238

The enterprise email promises to be a rich source for knowledge discovery. This is made possible due to the direct nature of communication, support for diverse media types, active participation of entities and presence of chronological ordering of messages. ...
Name Disambiguation in Anonymized Graphs using Network Embedding
Baichuan Zhang, Mohammad Al Hasan
Pages: 1239-1248

In the real world, our DNA is unique but many people share names. This phenomenon often causes erroneous aggregation of documents of multiple persons who are namesakes of one another. Such mistakes deteriorate the performance of document retrieval, web search, ...
SESSION: Session 7C: Social Networks 1
Weakly-Guided User Stance Prediction via Joint Modeling of Content and Social Interaction
Rui Dong, Yizhou Sun, Lu Wang, Yupeng Gu, Yuan Zhong
Pages: 1249-1258

Social media websites have become a popular outlet for online users to express their opinions on controversial issues, such as gun control and abortion. Understanding users' stances and their arguments is a critical task for policy-making process and ...
Social Media for Opioid Addiction Epidemiology: Automatic Detection of Opioid Addicts from Twitter and Case Studies
Yujie Fan, Yiming Zhang, Yanfang Ye, Xin li, Wanhong Zheng
Pages: 1259-1267

Opioid (e.g., heroin and morphine) addiction has become one of the largest and deadliest epidemics in the United States. To combat such a deadly epidemic, there is an urgent need for novel tools and methodologies to gain new insights into the behavioral ...
Understanding and Predicting Weight Loss with Mobile Social Networking Data
Zhiwei Wang, Tyler Derr, Dawei Yin, Jiliang Tang
Pages: 1269-1278

It has become increasingly popular to use mobile social networking applications for weight loss and management. Users can not only create profiles and maintain their records but also perform a variety of social activities that shatter the barrier to ...
Tweet Geolocation: Leveraging Location, User and Peer Signals
Wen-Haw Chong, Ee-Peng Lim
Pages: 1279-1288

Which venue is a tweet posted from? We refer to this as fine-grained geolocation. To solve this problem effectively, we develop novel techniques to exploit each posting user's content history. This is motivated by our finding that most users do not share ...
SESSION: Session 7D: Application driven analysis
A Two-step Information Accumulation Strategy for Learning from Highly Imbalanced Data
Bin Liu, Min Zhang, Weizhi Ma, Xin Li, Yiqun Liu, Shaoping Ma
Pages: 1289-1298

Highly imbalanced data is common in the real world, and it is important but difficult to train an effective classifier on it. In this paper, our major point is that the imbalance is the observed phenomenon but not the cause of the problem. The challenge is ...
Understanding Database Performance Inefficiencies in Real-world Web Applications
Cong Yan, Alvin Cheung, Junwen Yang, Shan Lu
Pages: 1299-1308

Many modern database-backed web applications are built upon Object Relational Mapping (ORM) frameworks. While such frameworks ease application development by abstracting persistent data as objects, such convenience comes with a performance cost. In ...
Data Driven Chiller Plant Energy Optimization with Domain Knowledge
Hoang Dung Vu, Kok Soon Chai, Bryan Keating, Nurislam Tursynbek, Boyan Xu, Kaige Yang, Xiaoyan Yang, Zhenjie Zhang
Pages: 1309-1317

Refrigeration and chiller optimization is an important and well-studied topic in mechanical engineering, mostly taking advantage of physical models of the equipment that are designed on top of over-simplified assumptions. Conventional optimization techniques ...
Partitioning Orders in Online Shopping Services
Sreenivas Gollapudi, Ravi Kumar, Debmalya Panigrahy, Rina Panigrahy
Pages: 1319-1328

The rapid growth of the Internet has led to the widespread use of newer and richer models of online shopping and delivery services. The race to efficient large scale on-demand delivery has transformed such services into complex networks of shoppers (typically ...
SESSION: Session 7E: Text Mining
Taxonomy Induction Using Hypernym Subsequences
Amit Gupta, Rémi Lebret, Hamza Harkous, Karl Aberer
Pages: 1329-1338

We propose a novel, semi-supervised approach towards domain taxonomy induction from an input vocabulary of seed terms. Unlike all previous approaches, which typically extract direct hypernym edges for terms, our approach utilizes a novel probabilistic ...
Unsupervised Concept Categorization and Extraction from Scientific Document Titles
Adit Krishnan, Aravind Sankar, Shi Zhi, Jiawei Han
Pages: 1339-1348

This paper studies the automated categorization and extraction of scientific concepts from titles of scientific articles, in order to gain a deeper understanding of their key contributions and facilitate the construction of a generic academic knowledgebase. ...
MIKE: Keyphrase Extraction by Integrating Multidimensional Information
Yuxiang Zhang, Yaocheng Chang, Xiaoqing Liu, Sujatha Das Gollapalli, Xiaoli Li, Chunjing Xiao
Pages: 1349-1358

Traditional supervised keyphrase extraction models depend on the features of labelled keyphrases while prevailing unsupervised models mainly rely on structure of the word graph, with candidate words as nodes and edges capturing the co-occurrence information ...
QALink: Enriching Text Documents with Relevant Q&A Site Contents
Yixuan Tang, Weilong Huang, Qi Liu, Anthony K.H. Tung, Xiaoli Wang, Jisong Yang, Beibei Zhang
Pages: 1359-1368

With rapid development of Q&A sites such as Quora and StackExchange, high quality question-answer pairs have been produced by users. These Q&A contents cover a wide range of topics, and they are useful for users to resolve queries and obtain ...
SESSION: Session 7F: Efficient Learning
Sequence Modeling with Hierarchical Deep Generative Models with Dual Memory
Yanan Zheng, Lijie Wen, Jianmin Wang, Jun Yan, Lei Ji
Pages: 1369-1378

Deep Generative Models (DGMs) are able to extract high-level representations from massive unlabeled data and are explainable from a probabilistic perspective. Such characteristics favor sequence modeling tasks. However, it still remains a huge challenge ...
Active Learning for Large-Scale Entity Resolution
Kun Qian, Lucian Popa, Prithviraj Sen
Pages: 1379-1388

Entity resolution (ER) is the task of identifying different representations of the same real-world object across datasets. Designing and tuning ER algorithms is an error-prone, labor-intensive process, which can significantly benefit from data-driven, ...
Indexable Bayesian Personalized Ranking for Efficient Top-k Recommendation
Dung D. Le, Hady W. Lauw
Pages: 1389-1398

Top-k recommendation seeks to deliver a personalized recommendation list of k items to a user. The dual objectives are (1) accuracy in identifying the items a user is likely to prefer, and (2) efficiency in constructing the recommendation list in real ...
Latency Reduction via Decision Tree Based Query Construction
Aman Grover, Dhruv Arya, Ganesh Venkataraman
Pages: 1399-1407

LinkedIn as a professional network serves the career needs of 450 Million plus members. The task of a job recommendation system is to find the suitable job among a corpus of several million jobs and serve this in real time under tight latency constraints. ...
SESSION: Session 7G: Recommendation 2
Broad Learning based Multi-Source Collaborative Recommendation
Junxing Zhu, Jiawei Zhang, Lifang He, Quanyuan Wu, Bin Zhou, Chenwei Zhang, Philip S. Yu
Pages: 1409-1418

Anchor links connect information entities, such as entities of movies or products, across networks from different sources, and thus information in these networks can be transferred directly via anchor links. Therefore, anchor links have great value to ...
Neural Attentive Session-based Recommendation
Jing Li, Pengjie Ren, Zhumin Chen, Zhaochun Ren, Tao Lian, Jun Ma
Pages: 1419-1428

Given e-commerce scenarios in which user profiles are invisible, session-based recommendation is proposed to generate recommendation results from short sessions. Previous work only considers the user's sequential behavior in the current session, whereas ...
A Deep Recurrent Collaborative Filtering Framework for Venue Recommendation
Jarana Manotumruksa, Craig Macdonald, Iadh Ounis
Pages: 1429-1438

Venue recommendation is an important application for Location-Based Social Networks (LBSNs), such as Yelp, and has been extensively studied in recent years. Matrix Factorisation (MF) is a popular Collaborative Filtering (CF) technique that can suggest ...
Recommendation with Capacity Constraints
Konstantina Christakopoulou, Jaya Kawale, Arindam Banerjee
Pages: 1439-1448

In many recommendation settings, the candidate items for recommendation are associated with a maximum capacity, i.e., number of seats in a Point-of-Interest (POI) or number of item copies in the inventory. However, despite the prevalence of the capacity ...
SESSION: Session 8A: Recommendation 3
Joint Representation Learning for Top-N Recommendation with Heterogeneous Information Sources
Yongfeng Zhang, Qingyao Ai, Xu Chen, W. Bruce Croft
Pages: 1449-1458

The Web has accumulated a rich source of information, such as text, image, rating, etc., which represent different aspects of user preferences. However, the heterogeneous nature of this information makes it difficult for recommender systems to leverage ...
Interacting Attention-gated Recurrent Networks for Recommendation
Wenjie Pei, Jie Yang, Zhu Sun, Jie Zhang, Alessandro Bozzon, David M.J. Tax
Pages: 1459-1468

Capturing the temporal dynamics of user preferences over items is important for recommendation. Existing methods mainly assume that all time steps in user-item interaction history are equally relevant to recommendation, which however does not apply in ...
A Personalised Ranking Framework with Multiple Sampling Criteria for Venue Recommendation
Jarana Manotumruksa, Craig Macdonald, Iadh Ounis
Pages: 1469-1478

Recommending a ranked list of interesting venues to users based on their preferences has become a key functionality in Location-Based Social Networks (LBSNs) such as Yelp and Gowalla. Bayesian Personalised Ranking (BPR) is a popular pairwise recommendation ...
BayDNN: Friend Recommendation with Bayesian Personalized Ranking Deep Neural Network
Daizong Ding, Mi Zhang, Shao-Yuan Li, Jie Tang, Xiaotie Chen, Zhi-Hua Zhou
Pages: 1479-1488

Friendship is the cornerstone of building a social network. In online social networks, statistics show that the leading reason for a user to create a new friendship is recommendation. Thus the accuracy of recommendation matters. In this paper, we propose ...
SESSION: Session 8B: Text analysis
A Topic Model Based on Poisson Decomposition
Haixin Jiang, Rui Zhou, Limeng Zhang, Hua Wang, Yanchun Zhang
Pages: 1489-1498

Determining appropriate statistical distributions for modeling text corpora is important for accurate estimation of numerical characteristics. Based on the validity of the test on a claim that the data conforms to Poisson distribution we propose Poisson ...
A Matrix-Vector Recurrent Unit Model for Capturing Compositional Semantics in Phrase Embeddings
Rui Wang, Wei Liu, Chris McDonald
Pages: 1499-1507

The meaning of a multi-word phrase not only depends on the meaning of its constituent words, but also the rules of composing them to give the so-called compositional semantic. However, many deep learning models for learning compositional semantics target ...
Words are Malleable: Computing Semantic Shifts in Political and Media Discourse
Hosein Azarbonyad, Mostafa Dehghani, Kaspar Beelen, Alexandra Arkut, Maarten Marx, Jaap Kamps
Pages: 1509-1518

Recently, researchers started to pay attention to the detection of temporal shifts in the meaning of words. However, most (if not all) of these approaches restricted their efforts to uncovering change over time, thus neglecting other valuable dimensions ...
A Neural Candidate-Selector Architecture for Automatic Structured Clinical Text Annotation
Gaurav Singh, Iain J. Marshall, James Thomas, John Shawe-Taylor, Byron C. Wallace
Pages: 1519-1528

We consider the task of automatically annotating free texts describing clinical trials with concepts from a controlled, structured medical vocabulary. Specifically, we aim to build a model to infer distinct sets of (ontological) concepts describing complementary ...
SESSION: Session 8C: Adversarial IR
Sybil Defense in Crowdsourcing Platforms
Dong Yuan, Guoliang Li, Qi Li, Yudian Zheng
Pages: 1529-1538

Crowdsourcing platforms have been widely deployed to solve many computer-hard problems, e.g., image recognition and entity resolution. Quality control is an important issue in crowdsourcing, which has been extensively addressed by existing quality-control ...
HoloScope: Topology-and-Spike Aware Fraud Detection
Shenghua Liu, Bryan Hooi, Christos Faloutsos
Pages: 1539-1548

As online fraudsters invest more resources, including purchasing large pools of fake user accounts and dedicated IPs, fraudulent attacks become less obvious and their detection becomes increasingly challenging. Existing approaches such as average degree ...
Building a Dossier on the Cheap: Integrating Distributed Personal Data Resources Under Cost Constraints
Imrul Chowdhury Anindya, Harichandan Roy, Murat Kantarcioglu, Bradley Malin
Pages: 1549-1558

A wide variety of personal data is routinely collected by numerous organizations that, in turn, share and sell their collections for analytic investigations (e.g., market research). To preserve privacy, certain identifiers are often redacted, perturbed ...
DeMalC: A Feature-rich Machine Learning Framework for Malicious Call Detection
Yuhong Li, Dongmei Hou, Aimin Pan, Zhiguo Gong
Pages: 1559-1567

Malicious phone calls are a plague: unscrupulous salesmen or criminals make them to acquire money illegally from victims. As a result, there has been broad interest in developing systems to make the end-users vigilant when receiving such phone ...
SESSION: Session 8D: Health Analytics 2/ Top-k
FA*IR: A Fair Top-k Ranking Algorithm
Meike Zehlike, Francesco Bonchi, Carlos Castillo, Sara Hajian, Mohamed Megahed, Ricardo Baeza-Yates
Pages: 1569-1578

In this work, we define and solve the Fair Top-k Ranking problem, in which we want to determine a subset of k candidates from a large pool of n ≫ k candidates, maximizing utility (i.e., select the "best" candidates) subject to group ...
Capturing Feature-Level Irregularity in Disease Progression Modeling
Kaiping Zheng, Wei Wang, Jinyang Gao, Kee Yuan Ngiam, Beng Chin Ooi, Wei Luen James Yip
Pages: 1579-1588

Disease progression modeling (DPM) analyzes patients' electronic medical records (EMR) to predict the health state of patients, which facilitates accurate prognosis, early detection and treatment of chronic diseases. However, EMR are irregular because ...
Health Forum Thread Recommendation Using an Interest Aware Topic Model
Kishaloy Halder, Min-Yen Kan, Kazunari Sugiyama
Pages: 1589-1598

We introduce a general, interest-aware topic model (IATM), in which known higher-level interests on topics expressed by each user can be modeled. We then specialize the IATM for use in consumer health forum thread recommendation by equating each user's ...
SESSION: Session 8E: Social Networks 2
HotSpots: Failure Cascades on Heterogeneous Critical Infrastructure Networks
Liangzhe Chen, Xinfeng Xu, Sangkeun Lee, Sisi Duan, Alfonso G. Tarditi, Supriya Chinthavali, B. Aditya Prakash
Pages: 1599-1607

Critical Infrastructure Systems such as transportation, water and power grid systems are vital to our national security, economy, and public safety. Recent events, like the 2012 hurricane Sandy, show how the interdependencies among different CI networks ...
SOPER: Discovering the Influence of Fashion and the Many Faces of User from Session Logs using Stick Breaking Process
Lucky Dhakad, Mrinal Das, Chiranjib Bhattacharyya, Samik Datta, Mihir Kale, Vivek Mehta
Pages: 1609-1618

Recommending lifestyle articles is of immediate interest to the e-commerce industry and is beginning to attract research attention. Commonly followed strategies, such as recommending popular items, are inadequate for this vertical for two reasons. ...
Semi-Supervised Event-related Tweet Identification with Dynamic Keyword Generation
Xin Zheng, Aixin Sun, Sibo Wang, Jialong Han
Pages: 1619-1628

Twitter provides us with a convenient channel to access immediate information about major events. However, it is challenging to acquire a clean and complete set of event-related data due to the characteristics of tweets, e.g., short and noisy. ...
Distant Meta-Path Similarities for Text-Based Heterogeneous Information Networks
Chenguang Wang, Yangqiu Song, Haoran Li, Yizhou Sun, Ming Zhang, Jiawei Han
Pages: 1629-1638

Measuring network similarity is a fundamental data mining problem. The mainstream similarity measures mainly leverage the structural information regarding the entities in the network without considering the network semantics. In the real world, the ...
SESSION: Session 8F: Feature/Entity Selection
Unsupervised Feature Selection with Joint Clustering Analysis
Shuai An, Jun Wang, Jinmao Wei, Zhenglu Yang
Pages: 1639-1648

Unsupervised feature selection has attracted considerable interest in the past decade, due to its remarkable performance in reducing dimensionality without any prior class information. Preserving reliable locality information and achieving excellent cluster ...
Multi-Label Feature Selection using Correlation Information
Ali Braytee, Wei Liu, Daniel R. Catchpoole, Paul J. Kennedy
Pages: 1649-1656

High-dimensional multi-labeled data contain instances, where each instance is associated with a set of class labels and has a large number of noisy and irrelevant features. Feature selection has been shown to have great benefits in improving the classification ...
Content Recommendation by Noise Contrastive Transfer Learning of Feature Representation
Yiyang Li, Guanyu Tao, Weinan Zhang, Yong Yu, Jun Wang
Pages: 1657-1665

Personalized recommendation has been proved effective as a content discovery tool for many online news publishers. As fresh news articles are frequently coming to the system while the old ones are fading away quickly, building a consistent and coherent ...
NeuPL: Attention-based Semantic Matching and Pair-Linking for Entity Disambiguation
Minh C. Phan, Aixin Sun, Yi Tay, Jialong Han, Chenliang Li
Pages: 1667-1676

Entity disambiguation, also known as entity linking, is the task of mapping mentions in text to the corresponding entities in a given knowledge base, e.g. Wikipedia. Two key challenges are making use of a mention's context to disambiguate (i.e. local objective), ...
SESSION: Session 8G: Graph Mining 2
Relaxing Graph Pattern Matching With Explanations
Jia Li, Yang Cao, Shuai Ma
Pages: 1677-1686

Traditional graph pattern matching is based on subgraph isomorphism, which is often too restrictive to identify meaningful matches. To handle this, taxonomy subgraph isomorphism has been proposed to relax the label constraints in the matching. Nonetheless, ...
Active Network Alignment: A Matching-Based Approach
Eric Malmi, Aristides Gionis, Evimaria Terzi
Pages: 1687-1696

Network alignment is the problem of matching the nodes of two graphs, maximizing the similarity of the matched nodes and the edges between them. This problem is encountered in a wide array of applications, from biological networks to social networks ...
Discovering Graph Temporal Association Rules
Mohammad Hossein Namaki, Yinghui Wu, Qi Song, Peng Lin, Tingjian Ge
Pages: 1697-1706

Detecting regularities between complex events in temporal graphs is critical for emerging applications. This paper proposes graph temporal association rules (GTAR). A GTAR extends traditional association rules to discover temporal associations for complex ...
Minimizing Tension in Teams
Behzad Golshan, Evimaria Terzi
Pages: 1707-1715

In large organizations (e.g., companies, universities, etc.) individual experts with different work habits are asked to work together in order to complete projects or tasks. Oftentimes, the differences in the inherent work habits of these experts cause ...
SESSION: Session 9A: Queries
Interactive Spatial Keyword Querying with Semantics
Jiabao Sun, Jiajie Xu, Kai Zheng, Chengfei Liu
Pages: 1727-1736

Conventional spatial keyword queries confront the difficulty of returning desired objects that are synonyms but morphologically different to query keywords. To overcome this flaw, this paper investigates the interactive spatial keyword querying with ...
From Query-By-Keyword to Query-By-Example: LinkedIn Talent Search Approach
Viet Ha-Thuc, Yan Yan, Xianren Wu, Vijay Dialani, Abhishek Gupta, Shakti Sinha
Pages: 1737-1745

One key challenge in talent search is to translate complex criteria of a hiring position into a search query, while it is relatively easy for a searcher to list examples of suitable candidates for a given position. To improve search efficiency, we propose ...
Learning to Attend, Copy, and Generate for Session-Based Query Suggestion
Mostafa Dehghani, Sascha Rothe, Enrique Alfonseca, Pascal Fleury
Pages: 1747-1756

Users try to articulate their complex information needs during search sessions by reformulating their queries. To make this process more effective, search engines provide related queries to help users in specifying the information need in their search ...
Deep Context Modeling for Web Query Entity Disambiguation
Zhen Liao, Xinying Song, Yelong Shen, Saekoo Lee, Jianfeng Gao, Ciya Liao
Pages: 1757-1765

In this paper, we presented a new study for Web query entity disambiguation (QED), which is the task of disambiguating different candidate entities in a knowledge base given their mentions in a query. QED is particularly challenging because queries are ...
SESSION: Session 9B: Representation learning
An Attention-based Collaboration Framework for Multi-View Network Representation Learning
Meng Qu, Jian Tang, Jingbo Shang, Xiang Ren, Ming Zhang, Jiawei Han
Pages: 1767-1776

Learning distributed node representations in networks has been attracting increasing attention recently due to its effectiveness in a variety of applications. Existing approaches usually study networks with a single type of proximity between nodes, which ...
Representation Learning of Large-Scale Knowledge Graphs via Entity Feature Combinations
Zhen Tan, Xiang Zhao, Wei Wang
Pages: 1777-1786

Knowledge graphs are typical large-scale multi-relational structures, which comprise a large amount of fact triplets. Nonetheless, existing knowledge graphs are still sparse and far from being complete. To refine the knowledge graphs, representation ...
Learning Edge Representations via Low-Rank Asymmetric Projections
Sami Abu-El-Haija, Bryan Perozzi, Rami Al-Rfou
Pages: 1787-1796

We propose a new method for embedding graphs while preserving directed edge information. Learning such continuous-space vector representations (or embeddings) of nodes in a graph is an important first step for using network information (from social networks, ...
HIN2Vec: Explore Meta-paths in Heterogeneous Information Networks for Representation Learning
Tao-yang Fu, Wang-Chien Lee, Zhen Lei
Pages: 1797-1806

In this paper, we propose a novel representation learning framework, namely HIN2Vec, for heterogeneous information networks (HINs). The core of the proposed framework is a neural network model, also called HIN2Vec, designed to capture the rich ...
SESSION: Session 9C: Graph Mining 3
Core Decomposition and Densest Subgraph in Multilayer Networks
Edoardo Galimberti, Francesco Bonchi, Francesco Gullo
Pages: 1807-1816

Multilayer networks are a powerful paradigm to model complex systems, where various relations might occur among the same set of entities. Despite the keen interest in a variety of problems, algorithms, and analysis methods in this type of network, the ...
Fully Dynamic Algorithm for Top-k Densest Subgraphs
Muhammad Anis Uddin Nasir, Aristides Gionis, Gianmarco De Francisci Morales, Sarunas Girdzijauskas
Pages: 1817-1826

Given a large graph, the densest-subgraph problem asks to find a subgraph with maximum average degree. When considering the top-k version of this problem, a naïve solution is to iteratively find the densest subgraph and remove it in each iteration. ...
Minimizing Dependence between Graphs
Yu Rong, Hong Cheng
Pages: 1827-1836

In recent years, modeling the relation between two graphs has received unprecedented attention from researchers due to its wide applications in many areas, such as social analysis and bioinformatics. The nature of relations between two graphs can be ...
Exploiting Electronic Health Records to Mine Drug Effects on Laboratory Test Results
Mohamed Ghalwash, Ying Li, Ping Zhang, Jianying Hu
Pages: 1837-1846

The proliferation of Electronic Health Records (EHRs) challenges data miners to discover potential and previously unknown patterns from a large collection of medical data. One of the tasks that we address in this paper is to reveal previously unknown ...
SESSION: Session 9D: Relational Mining
Efficient Discovery of Ontology Functional Dependencies
Sridevi Baskaran, Alexander Keller, Fei Chiang, Lukasz Golab, Jaroslaw Szlichta
Pages: 1847-1856

Functional Dependencies (FDs) define attribute relationships based on syntactic equality, and, when used in data cleaning, they erroneously label syntactically different but semantically equivalent values as errors. We enhance dependency-based data cleaning ...
Automatic Navbox Generation by Interpretable Clustering over Linked Entities
Chenhao Xie, Lihan Chen, Jiaqing Liang, Kezun Zhang, Yanghua Xiao, Hanghang Tong, Haixun Wang, Wei Wang
Pages: 1857-1865

Few efforts have been devoted to generating the structured Navigation Box (Navbox) for Wikipedia articles. A Navbox is a table in a Wikipedia article page that provides a consistent navigation system for related entities. Navbox is critical for the readership ...
A Two-Stage Framework for Computing Entity Relatedness in Wikipedia
Marco Ponza, Paolo Ferragina, Soumen Chakrabarti
Pages: 1867-1876

Introducing a new dataset with human judgments of entity relatedness, we present a thorough study of all entity relatedness measures in recent literature based on Wikipedia as the knowledge graph. No clear dominance is seen between measures based on ...
Incorporating the Latent Link Categories in Relational Topic Modeling
Yuan He, Cheng Wang, Changjun Jiang
Pages: 1877-1886

The soaring of social media services has greatly propelled the prevalence of document networks. Rather than a set of plain texts, documents are nodes in graphs. An observable link connects the documents at its two ends, thus it implicitly reflects the ...
SESSION: Session 9E: User characteristics
Tone Analyzer for Online Customer Service: An Unsupervised Model with Interfered Training
Peifeng Yin, Zhe Liu, Anbang Xu, Taiga Nakamura
Pages: 1887-1895

Emotion analysis of online customer service conversation is important for good user experience and customer satisfaction. However, conventional metrics do not fit this application scenario. In this work, by collecting and labeling online conversations ...
Nationality Classification Using Name Embeddings
Junting Ye, Shuchu Han, Yifan Hu, Baris Coskun, Meizhu Liu, Hong Qin, Steven Skiena
Pages: 1897-1906

Nationality identification unlocks important demographic information, with many applications in biomedical and sociological research. Existing name-based nationality classifiers use name substrings as features and are trained on small, unrepresentative ...
Emotions in Social Networks: Distributions, Patterns, and Models
Shengmin Jin, Reza Zafarani
Pages: 1907-1916

Understanding the role emotions play in social interactions has been a central research question in the social sciences. However, the challenge of obtaining large-scale data on human emotions has left the most fundamental questions on emotions less explored: ...
Hike: A Hybrid Human-Machine Method for Entity Alignment in Large-Scale Knowledge Bases
Yan Zhuang, Guoliang Li, Zhuojian Zhong, Jianhua Feng
Pages: 1917-1926

With the vigorous development of the World Wide Web, many large-scale knowledge bases (KBs) have been generated. To improve the coverage of KBs, an important task is to integrate the heterogeneous KBs. Several automatic alignment methods have been proposed ...
SESSION: Session 9F: Engagement
Returning is Believing: Optimizing Long-term User Engagement in Recommender Systems
Qingyun Wu, Hongning Wang, Liangjie Hong, Yue Shi
Pages: 1927-1936

In this work, we propose to improve long-term user engagement in a recommender system from the perspective of sequential decision optimization, where users' click and return behaviors are directly modeled for online optimization. A bandit-based solution ...
Predicting Startup Crowdfunding Success through Longitudinal Social Engagement Analysis
Qizhen Zhang, Tengyuan Ye, Meryem Essaidi, Shivani Agarwal, Vincent Liu, Boon Thau Loo
Pages: 1937-1946

A key ingredient to a startup's success is its ability to raise funding at an early stage. Crowdfunding has emerged as an exciting new mechanism for connecting startups with potentially thousands of investors. Nonetheless, little is known about its effectiveness, ...
Optimizing Email Volume For Sitewide Engagement
Rupesh Gupta, Guanfeng Liang, Romer Rosales
Pages: 1947-1955

In this paper we focus on the problem of optimizing email volume for maximizing sitewide engagement of an online social networking service. Email volume optimization approaches published in the past have proposed optimization of email volume for maximization ...
Understanding Engagement through Search Behaviour
Mengdie Zhuang, Gianluca Demartini, Elaine G. Toms
Pages: 1957-1966

Evaluating user engagement with search is a critical aspect of understanding how to assess and improve information retrieval systems. While standard techniques for measuring user engagement use questionnaires, these are obtrusive to user interaction, ...
SESSION: Short Papers (alphabetical by lead authors' last names)
Citation Metadata Extraction via Deep Neural Network-based Segment Sequence Labeling
Dong An, Liangcai Gao, Zhuoren Jiang, Runtao Liu, Zhi Tang
Pages: 1967-1970

Citation metadata extraction plays an important role in academic information retrieval and knowledge management. Current works on this task generally use rule-based, template-based or learning-based approaches but these methods usually either rely on ...
A Novel Approach for Efficient Computation of Community Aware Ridesharing Groups
Samiul Anwar, Shuha Nabila, Tanzima Hashem
Pages: 1971-1974

The evolution of ridesharing services has reduced road traffic congestion in recent years. However, a major concern for ridesharing services is sharing rides with strangers. To address this issue, a few ridesharing approaches have considered social ...
Extracting Entities of Interest from Comparative Product Reviews
Jatin Arora, Sumit Agrawal, Pawan Goyal, Sayan Pathak
Pages: 1975-1978

This paper presents a deep learning based approach to extract product comparison information out of user reviews on various e-commerce websites. Any comparative product review has three major entities of information: the names of the products being compared, ...
A Neural Collaborative Filtering Model with Interaction-based Neighborhood
Ting Bai, Ji-Rong Wen, Jun Zhang, Wayne Xin Zhao
Pages: 1979-1982

Recently, deep neural networks have been widely applied to recommender systems. A representative work is to utilize deep learning for modeling complex user-item interactions. However, similar to traditional latent factor models by factorizing user-item ...
Profiling DRDoS Attacks with Data Analytics Pipeline
Laure Berti-Equille, Yury Zhauniarovich
Pages: 1983-1986

A large number of Distributed Reflective Denial-of-Service (DRDoS) attacks are launched every day, and our understanding of the modus operandi of their perpetrators is still very limited, as we are submerged with so much Big Data to analyze and do not have reliable ...
A Compare-Aggregate Model with Dynamic-Clip Attention for Answer Selection
Weijie Bian, Si Li, Zhao Yang, Guang Chen, Zhiqing Lin
Pages: 1987-1990

Answer selection for question answering is a challenging task, since it requires effective capture of the complex semantic relations between questions and answers. Previous remarkable approaches mainly adopt general Compare-Aggregate framework that performs ...
Learning Biological Sequence Types Using the Literature
Mohamed Reda Bouadjenek, Karin Verspoor, Justin Zobel
Pages: 1991-1994

We explore in this paper automatic biological sequence type classification for records in biological sequence databases. The sequence type attribute provides important information about the nature of a sequence represented in a record, and is often used ...
Detecting Social Bots by Jointly Modeling Deep Behavior and Content Information
Chiyu Cai, Linjing Li, Daniel Zeng
Pages: 1995-1998

Bots are regarded as the most common kind of malware in the era of Web 2.0. In recent years, the Internet has been populated by hundreds of millions of bots, especially on social media. Thus, the demand for effective and efficient bot detection algorithms ...
PMS: an Effective Approximation Approach for Distributed Large-scale Graph Data Processing and Mining
Yingjie Cao, Yangyang Zhang, Jianxin Li
Pages: 1999-2002

Recently, large-scale graph data processing and mining has drawn great attention, and many distributed graph processing systems have been proposed. However, large-scale graph processing remains a challenging problem. Because the computation time in some ...
Language Modeling by Clustering with Word Embeddings for Text Readability Assessment
Miriam Cha, Youngjune Gwon, H. T. Kung
Pages: 2003-2006

We present a clustering-based language model using word embeddings for text readability prediction. Presumably, a Euclidean semantic space hypothesis holds true for word embeddings whose training is done by observing word co-occurrences. We argue that ...
Compact Multiple-Instance Learning
Jing Chai, Weiwei Liu, Ivor W. Tsang, Xiaobo Shen
Pages: 2007-2010

The weakly supervised Multiple-Instance Learning (MIL) problem has been successfully applied in information retrieval tasks. Two related issues might affect the performance of MIL algorithms: how to cope with label ambiguities and how to deal with non-discriminative ...
Text Embedding for Sub-Entity Ranking from User Reviews
Chih-Yu Chao, Yi-Fan Chu, Hsiu-Wei Yang, Chuan-Ju Wang, Ming-Feng Tsai
Pages: 2011-2014

This paper attempts to conduct analysis for one certain type of user reviews; that is, the reviews on a super-entity (e.g., restaurant) involve descriptions for many sub-entities (e.g., dishes). To deal with such analysis, we propose a text embedding ...
Summarizing Significant Changes in Network Traffic Using Contrast Pattern Mining
Elaheh Alipour Chavary, Sarah M. Erfani, Christopher Leckie
Pages: 2015-2018

Extracting knowledge from the massive volumes of network traffic is an important challenge in network and security management. In particular, network managers require concise reports about significant changes in their network traffic. While most existing ...
Modeling Opinion Influence with User Dual Identity
Chengyao Chen, Zhitao Wang, Wenjie Li
Pages: 2019-2022

Exploring the mechanism that explains how a user's opinion changes under the influence of his/her neighbors is of practical importance (e.g., for predicting the sentiment of his/her future opinion) and has attracted wide attention from both enterprises ...
An Empirical Analysis of Pruning Techniques: Performance, Retrievability and Bias
Ruey-Cheng Chen, Leif Azzopardi, Falk Scholer
Pages: 2023-2026

Prior work on using retrievability measures in the evaluation of information retrieval (IR) systems has laid out the foundations for investigating the relation between retrieval performance and retrieval bias. While various factors influencing retrievability ...
Text Coherence Analysis Based on Deep Neural Network
Baiyun Cui, Yingming Li, Yaqing Zhang, Zhongfei Zhang
Pages: 2027-2030

In this paper, we propose a novel deep coherence model (DCM) using a convolutional neural network architecture to capture the text coherence. The text coherence problem is investigated with a new perspective of learning sentence distributional representation ...
Unsupervised Matrix-valued Kernel Learning For One Class Classification
Shaobo Dang, Xiongcai Cai, Yang Wang, Jianjia Zhang, Fang Chen
Pages: 2031-2034

This paper is concerned with the one-class classification (OCC) problem. By introducing the vector-valued function with regularizations in Y-valued Reproducing Kernel Hilbert Space (RKHS), we build an unsupervised classifier and discover ...
Analysis of Telegram, An Instant Messaging Service
Arash Dargahi Nobari, Negar Reshadatmand, Mahmood Neshati
Pages: 2035-2038

Telegram has become one of the most successful instant messaging services in recent years. In this paper, we developed a crawler to gather its public data. To the best of our knowledge, this paper is the first attempt to analyze the structural and topical ...
Estimating Event Focus Time Using Neural Word Embeddings
Supratim Das, Arunav Mishra, Klaus Berberich, Vinay Setty
Pages: 2039-2042

Time associated with news events has been leveraged as a complementary dimension to text in several applications such as temporal information retrieval, news event linking, etc. Short textual event descriptions (e.g., single sentences) are prevalent ...
Personalized Image Aesthetics Assessment
Xiang Deng, Chaoran Cui, Huidi Fang, Xiushan Nie, Yilong Yin
Pages: 2043-2046

Automatically assessing image quality from an aesthetic perspective is of great interest to the high-level vision research community. Existing methods are typically non-personalized and quantify image aesthetics with a universal label. However, given ...
Efficient Fault-Tolerant Group Recommendation Using alpha-beta-core
Danhao Ding, Hui Li, Zhipeng Huang, Nikos Mamoulis
Pages: 2047-2050

Fault-tolerant group recommendation systems based on subspace clustering successfully alleviate high-dimensionality and sparsity problems. However, the cost of recommendation grows exponentially with the size of dataset. To address this issue, we model ...
On Discovering the Number of Document Topics via Conceptual Latent Space
Nghia Duong-Trung, Lars Schmidt-Thieme
Pages: 2051-2054

Topic modeling is a widely used technique in knowledge discovery and data mining. However, finding the right number of topics in a given text source has remained a challenging issue. In this paper, we study the concept of conceptual stability via nonnegative ...
Chinese Named Entity Recognition with Character-Word Mixed Embedding
Shijia E, Yang Xiang
Pages: 2055-2058

Named Entity Recognition (NER) is an important basis for the tasks in natural language processing such as relation extraction, entity linking and so on. The common method of existing Chinese NER systems is to use the character sequence as the input, ...
An Empirical Study of Embedding Features in Learning to Rank
Faezeh Ensan, Ebrahim Bagheri, Amal Zouaq, Alexandre Kouznetsov
Pages: 2059-2062

This paper explores the possibility of using neural embedding features for enhancing the effectiveness of ad hoc document ranking based on learning to rank models. We have extensively introduced and investigated the effectiveness of features learnt based ...
Privacy of Hidden Profiles: Utility-Preserving Profile Removal in Online Forums
Sedigheh Eslami, Asia J. Biega, Rishiraj Saha Roy, Gerhard Weikum
Pages: 2063-2066

Users who wish to leave an online forum often do not have the freedom to erase their data completely from the service providers' (SP) system. The primary reason behind this is that analytics on such user data form a core component of many online providers' ...
QoS-Aware Scheduling of Heterogeneous Servers for Inference in Deep Neural Networks
Zhou Fang, Tong Yu, Ole J. Mengshoel, Rajesh K. Gupta
Pages: 2067-2070

Deep neural networks (DNNs) are popular in diverse fields such as computer vision and natural language processing. DNN inference tasks are emerging as a service provided by cloud computing environments. However, cloud-hosted DNN inference faces new challenges ...
Geographic and Temporal Trends in Fake News Consumption During the 2016 US Presidential Election
Adam Fourney, Miklos Z. Racz, Gireeja Ranade, Markus Mobius, Eric Horvitz
Pages: 2071-2074

We present an analysis of traffic to websites known for publishing fake news in the months preceding the 2016 US presidential election. The study is based on the combined instrumentation data from two popular desktop web browsers: Internet Explorer 11 ...
Inferring Appliance Energy Usage from Smart Meters using Fully Convolutional Encoder Decoder Networks
Felan Carlo C. Garcia, Erees Queen B. Macabebe
Pages: 2075-2078

Energy management presents one of the principal sustainability challenges within urban centers given that they account for 75% of the energy consumption worldwide. In the context of a smart city framework, the use of intelligent urban systems provides ...
Tracking the Impact of Fact Deletions on Knowledge Graph Queries using Provenance Polynomials
Garima Gaur, Srikanta J. Bedathur, Arnab Bhattacharya
Pages: 2079-2082

Critical business applications in domains ranging from technical support to healthcare increasingly rely on large-scale, automatically constructed knowledge graphs. These applications use the results of complex queries over knowledge graphs in order ...
An Euclidean Distance based on the Weighted Self-information Related Data Transformation for Nominal Data Clustering
Lei Gu, Liying Zhang, Yang Zhao
Pages: 2083-2086

Numerical data clustering is a tractable task since well-defined numerical measures like traditional Euclidean distance can be directly used for it, but nominal data clustering is a very difficult problem because there exists no natural relative ordering ...
Interest Diffusion in Heterogeneous Information Network for Personalized Item Ranking
Mukul Gupta, Pradeep Kumar, Rajhans Mishra
Pages: 2087-2090

Personalized item ranking for recommending top-N items of interest to a user is an interesting and challenging problem in e-commerce. Researchers and practitioners are continuously trying to devise new methodologies to improve the accuracy of recommendations. ...
Source Retrieval for Web-Scale Text Reuse Detection
Matthias Hagen, Martin Potthast, Payam Adineh, Ehsan Fatehifar, Benno Stein
Pages: 2091-2094

The first step of text reuse detection addresses the source retrieval problem: given a suspicious document, a set of candidate sources from which text might have been reused have to be retrieved by querying a search engine. Afterwards, in a second step, ...
Smart City Analytics: Ensemble-Learned Prediction of Citizen Home Care
Casper Hansen, Christian Hansen, Stephen Alstrup, Christina Lioma
Pages: 2095-2098

We present an ensemble learning method that predicts large increases in the hours of home care received by citizens. The method is supervised, and uses different ensembles of either linear (logistic regression) or non-linear (random forests) classifiers. ...
Fast K-means for Large Scale Clustering
Qinghao Hu, Jiaxiang Wu, Lu Bai, Yifan Zhang, Jian Cheng
Pages: 2099-2102

K-means algorithm has been widely used in machine learning and data mining due to its simplicity and good performance. However, the standard k-means algorithm would be quite slow for clustering millions of data into thousands of or even ...
Graph Ladder Networks for Network Classification
Ruiqi Hu, Shirui Pan, Jing Jiang, Guodong Long
Pages: 2103-2106

Numerous network representation-based algorithms for network classification have emerged in recent years, but many suffer from two limitations. First, they separate the network representation learning and node classification in networks into two steps, ...
A Communication Efficient Parallel DBSCAN Algorithm based on Parameter Server
Xu Hu, Jun Huang, Minghui Qiu
Pages: 2107-2110

Recent benchmark studies show that MPI-based distributed implementations of DBSCAN, e.g., PDSDBSCAN, outperform other implementations such as Apache Spark. However, the communication cost of MPI DBSCAN increases drastically with the number of processors, ...
KIEM: A Knowledge Graph based Method to Identify Entity Morphs
Longtao Huang, Lin Zhao, Shangwen Lv, Fangzhou Lu, Yue Zhai, Songlin Hu
Pages: 2111-2114

An entity on the web can be referred to by numerous morphs that are always ambiguous, implicit and informal, which makes it challenging to accurately identify all the morphs corresponding to a specific entity. In this paper, we introduce a novel method ...
Ontology-based Graph Visualization for Summarized View
Xin Huang, Byron Choi, Jianliang Xu, William K. Cheung, Yanchun Zhang, Jiming Liu
Pages: 2115-2118

Data summarization that presents a small subset of a dataset to users has been widely applied in numerous applications and systems. Many datasets are coded with hierarchical terminologies, e.g., the International Classification of Diseases-9, Medical ...
An Ad CTR Prediction Method Based on Feature Learning of Deep and Shallow Layers
Zai Huang, Zhen Pan, Qi Liu, Bai Long, Haiping Ma, Enhong Chen
Pages: 2119-2122

In online advertising, Click-Through Rate (CTR) prediction is a crucial task, as it may benefit the ranking and pricing of online ads. To the best of our knowledge, most of the existing CTR prediction methods are shallow layer models (e.g., Logistic ...
A Framework for Estimating Execution Times of IO Traces on SSDs
Yoonsuk Kang, Yong-Yeon Jo, Jaehyuk Cha, Wan D. Bae, Sang-Wook Kim
Pages: 2123-2126

With the NAND flash memory technology of solid-state drives (SSDs), the usage of SSDs is expanded to various devices. Due to the cost and time limitations of measuring the actual execution time of each application on SSDs, it is difficult for users to ...
Ranking Rich Mobile Verticals based on Clicks and Abandonment
Mami Kawasaki, Inho Kang, Tetsuya Sakai
Pages: 2127-2130

We consider the problem of ranking rich verticals, which we call "cards," for a given mobile search query. Examples of card types include "SHOP" (showing access and contact information of a shop), "WEATHER" (showing a weather forecast for a particular ...
Semantic Rules for Machine Diagnostics: Execution and Management
Evgeny Kharlamov, Ognjen Savković, Guohui Xiao, Rafael Penaloza, Gulnar Mehdi, Mikhail Roshchin, Ian Horrocks
Pages: 2131-2134

Rule-based diagnostics of equipment is an important task in industry. In this paper we present how semantic technologies can enhance diagnostics. In particular, we present our semantic rule language sigRL that is inspired by the real diagnostic languages ...
Machine Learning based Performance Modeling of Flash SSDs
Jaehyung Kim, Jinuk Park, Sanghyun Park
Pages: 2135-2138

Flash memory based solid state drives (SSDs) have alleviated the I/O bottleneck by exploiting their data-parallel design. In an enterprise environment, Flash SSDs are used in the form of a hybrid storage architecture to achieve better performance with lower ...
A Robust Named-Entity Recognition System Using Syllable Bigram Embedding with Eojeol Prefix Information
Sunjae Kwon, Youngjoong Ko, Jungyun Seo
Pages: 2139-2142

Korean named-entity recognition (NER) systems have been developed mainly on the morphological-level, and they are commonly based on a pipeline framework that identifies named-entities (NEs) following the morphological analysis. However, this framework ...
IDAE: Imputation-boosted Denoising Autoencoder for Collaborative Filtering
Jae-woong Lee, Jongwuk Lee
Pages: 2143-2146

In recent years, while deep neural networks have shown impressive performance to solve various recognition and classification problems, collaborative filtering (CF) received relatively little attention to utilize deep neural networks. Because of inherent ...
Computing Betweenness Centrality in B-hypergraphs
Kwang Hee Lee, Myoung Ho Kim
Pages: 2147-2150

The directed hypergraph (especially B-hypergraph) has hyperedges that represent relations of a set of source nodes to a single target node. Author-cited networks and cellular signaling pathways can be modeled as a B-hypergraph. In this paper every source ...
Structural-fitting Word Vectors to Linguistic Ontology for Semantic Relatedness Measurement
Yang-Yin Lee, Ting-Yu Yen, Hen-Hsen Huang, Hsin-Hsi Chen
Pages: 2151-2154

With the aid of recently proposed word embedding algorithms, the study of semantic relatedness has progressed and advanced rapidly. In this research, we propose a novel structural-fitting method that utilizes the linguistic ontology into vector space ...
Alternating Pointwise-Pairwise Learning for Personalized Item Ranking
Yu Lei, Wenjie Li, Ziyu Lu, Miao Zhao
Pages: 2155-2158

Pointwise and pairwise collaborative ranking are two major classes of algorithms for personalized item ranking. This paper proposes a novel joint learning method named alternating pointwise-pairwise learning (APPL) to improve ranking performance. APPL ...
Deep Multi-Similarity Hashing for Multi-label Image Retrieval
Tong Li, Sheng Gao, Yajing Xu
Pages: 2159-2162

Image retrieval based on deep hashing methods has attracted more and more attention from both academia and industry, due to the outstanding performance of deep neural networks in various tasks of computer vision. However, most of the hashing methods ...
Learning Graph-based Embedding For Time-Aware Product Recommendation
Yuqi Li, Weizheng Chen, Hongfei Yan
Pages: 2163-2166

In this paper, we propose a novel Product Graph Embedding (PGE) model to investigate time-aware product recommendation by leveraging the network representation learning technique. Our model captures the sequential influences of products by transforming ...
An Enhanced Topic Modeling Approach to Multiple Stance Identification
Junjie Lin, Wenji Mao, Yuhao Zhang
Pages: 2167-2170

People often publish online texts to express their stances, which reflect the essential viewpoints they hold. Stance identification has been an important research topic in text analysis and facilitates many applications in business, public security ...
TICC: Transparent Inter-Column Compression for Column-Oriented Database Systems
Hao Liu, Yudian Ji, Jiang Xiao, Haoyu Tan, Qiong Luo, Lionel M. Ni
Pages: 2171-2174

In this paper, we present TICC, an automatic data compression component that can transparently eliminate data redundancies across columns in column-oriented database systems. We further propose two approaches to integrate inter-column compression into ...
Exploiting User Consuming Behavior for Effective Item Tagging
Shen Liu, Hongyan Liu
Pages: 2175-2178

Automatic tagging techniques are important for many applications such as searching and recommendation, which has attracted many researchers' attention in recent years. Existing methods mainly rely on users' tagging behavior or items' content information ...
SEQ: Example-based Query for Spatial Objects
Siqiang Luo, Jiafeng Hu, Reynold Cheng, Jing Yan, Ben Kao
Pages: 2179-2182

Spatial object search is prevalent in map services (e.g., Google Maps). To rent an apartment, for example, one will take into account its nearby facilities, such as supermarkets, hospitals, and subway stations. Traditional keyword search solutions, such ...
Truth Discovery by Claim and Source Embedding
Shanshan Lyu, Wentao Ouyang, Huawei Shen, Xueqi Cheng
Pages: 2183-2186

Information gathered from multiple sources on the Web often exhibits conflicts. This phenomenon motivates the need of truth discovery, which aims to automatically find the true claim among multiple conflicting claims. Existing truth discovery methods ...
Automatic Catchphrase Identification from Legal Court Case Documents
Arpan Mandal, Kripabandhu Ghosh, Arindam Pal, Saptarshi Ghosh
Pages: 2187-2190

Automatically identifying catchphrases from legal court case documents is an important problem in Legal Information Retrieval, which has not been extensively studied. In this work, we propose an unsupervised approach for extraction and ranking of catchphrases ...
Learning Temporal Ambiguity in Web Search Queries
Behrooz Mansouri, Mohammad Sadegh Zahedi, Maseud Rahgozar, Farhad Oroumchian, Ricardo Campos
Pages: 2191-2194

Time has a strong influence on web search. The temporal intent of the searcher adds an important dimension to the relevance judgments of web queries. However, lack of understanding their temporal requirements increases the ambiguity of the queries, turning ...
Online Expectation-Maximization for Click Models
Ilya Markov, Alexey Borisov, Maarten de Rijke
Pages: 2195-2198

Click models allow us to interpret user click behavior in search interactions and to remove various types of bias from user clicks. Existing studies on click models consider a static scenario where user click behavior does not change over time. We show ...
Task Embeddings: Learning Query Embeddings using Task Context
Rishabh Mehrotra, Emine Yilmaz
Pages: 2199-2202

Continuous space word embeddings have been shown to be highly effective in many information retrieval tasks. Embedding representation models make use of local information available in immediately surrounding words to project nearby context words closer ...
Hierarchical RNN with Static Sentence-Level Attention for Text-Based Speaker Change Detection
Zhao Meng, Lili Mou, Zhi Jin
Pages: 2203-2206

Speaker change detection (SCD) is an important task in dialog modeling. Our paper addresses the problem of text-based SCD, which differs from existing audio-based studies and is useful in various scenarios, for example, processing dialog transcripts ...
Predicting Short-Term Public Transport Demand via Inhomogeneous Poisson Processes
Aditya Krishna Menon, Young Lee
Pages: 2207-2210

Forecasting short term passenger demand for public transport is a core problem in urban mobility. Typically, this is addressed using Poisson regression or homogeneous Poisson processes. However, such approaches have several limitations, including susceptibility ...
Analyzing Mathematical Content to Detect Academic Plagiarism
Norman Meuschke, Moritz Schubotz, Felix Hamborg, Tomas Skopal, Bela Gipp
Pages: 2211-2214

This paper presents, to our knowledge, the first study on analyzing mathematical expressions to detect academic plagiarism. We make the following contributions. First, we investigate confirmed cases of plagiarism to categorize the similarities of mathematical ...
Learning Entity Type Embeddings for Knowledge Graph Completion
Changsung Moon, Paul Jones, Nagiza F. Samatova
Pages: 2215-2218

Missing data is a severe problem for algorithms that operate over knowledge graphs (KGs). Most previous research in KG completion has focused on the problem of inferring missing entities and missing relation types between entities. However, in addition ...
Identifying Top-K Influential Nodes in Networks
Sara Mumtaz, Xiaoyang Wang
Pages: 2219-2222

Network Centrality is one of the core concepts in network analysis, which ranks the importance of a node in a network. A considerably extensive range of centrality measures exist that serve the purpose of quantifying the importance of a node according ...
Paraphrastic Fusion for Abstractive Multi-Sentence Compression Generation
Mir Tafseer Nayeem, Yllias Chali
Pages: 2223-2226

This paper presents a first attempt towards finding an abstractive compression generation system for a set of related sentences which jointly models sentence fusion and paraphrasing using continuous vector representations. Our paraphrastic fusion system ...
J-REED: Joint Relation Extraction and Entity Disambiguation
Dat Ba Nguyen, Martin Theobald, Gerhard Weikum
Pages: 2227-2230

Information extraction (IE) from text sources can either be performed as Model-based IE (i.e., by using a pre-specified domain of target entities and relations) or as Open IE (i.e., with no particular assumptions about the target domain). While Model-based ...
Collaborative Topic Regression with Denoising AutoEncoder for Content and Community Co-Representation
Trong T. Nguyen, Hady W. Lauw
Pages: 2231-2234

Personalized recommendation of items frequently faces scenarios where we have sparse observations on users' adoption of items. In the literature, there are two promising directions. One is to connect sparse items through similarity in content. The other ...
Accurate Sentence Matching with Hybrid Siamese Networks
Massimo Nicosia, Alessandro Moschitti
Pages: 2235-2238

Recent neural network approaches to sentence matching compute the probability of two sentences being similar by minimizing a logistic loss. In this paper, we learn sentence representations by means of a siamese network, which: (i) uses encoders that ...
Collaborative Sequence Prediction for Sequential Recommender
Shuzi Niu, Rongzhi Zhang
Pages: 2239-2242

With the surge of deep learning, more and more attention has been put on the sequential recommender. It can be cast as a sequence prediction problem, where we will predict the next item given the previous items. RNN approaches are able to capture the ...
Boolean Matrix Decomposition by Formal Concept Sampling
Petr Osicka, Martin Trnecka
Pages: 2243-2246

Finding interesting patterns is a classical problem in data mining. Boolean matrix decomposition is nowadays a standard tool that can find a set of patterns (also called factors) in Boolean data that explain the data well. We describe and experimentally ...
Enhancing Knowledge Graph Completion By Embedding Correlations
Soumajit Pal, Jacopo Urbani
Pages: 2247-2250

Despite their large sizes, modern Knowledge Graphs (KGs) are still highly incomplete. Statistical relational learning methods can detect missing links by "embedding" the nodes and relations into latent feature tensors. Unfortunately, these methods are ...
Robust Heterogeneous Discriminative Analysis for Single Sample Per Person Face Recognition
Meng Pang, Yiu-ming Cheung, Binghui Wang, Risheng Liu
Pages: 2251-2254

Single sample face recognition is one of the most challenging problems in face recognition (FR), where only one single sample per person (SSPP) is enrolled in the gallery set for training. Although patch-based methods have achieved great success in FR ...
Deep Neural Networks for News Recommendations
Keunchan Park, Jisoo Lee, Jaeho Choi
Pages: 2255-2258

A fundamental role of news websites is to recommend articles that are interesting to read. The key challenge of news recommendation is to recommend newly published articles. Unlike other domains, outdated items are considered to be irrelevant in the ...
TATHYA: A Multi-Classifier System for Detecting Check-Worthy Statements in Political Debates
Ayush Patwari, Dan Goldwasser, Saurabh Bagchi
Pages: 2259-2262

Fact-checking political discussions has become an essential cog in computational journalism. This task encompasses an important sub-task: identifying the set of statements with 'check-worthy' claims. Previous work has treated this as a simple text ...
A Collaborative Ranking Model for Cross-Domain Recommendations
Dimitrios Rafailidis, Fabio Crestani
Pages: 2263-2266

With the advent of social media, generating high quality cross-domain recommendations has become more and more important for users of heterogeneous domains. In this study, we propose a collaborative ranking model to generate cross-domain recommendations. ...
Combining Local and Global Word Embeddings for Microblog Stemming
Anurag Roy, Trishnendu Ghorai, Kripabandhu Ghosh, Saptarshi Ghosh
Pages: 2267-2270

Stemming is a vital step employed to improve retrieval performance through efficient unification of morphological variants of a word. We propose an unsupervised, context-specific stemming algorithm for microblogs, based on both local and global word ...
An Improved Test Collection and Baselines for Bibliographic Citation Recommendation
Dwaipayan Roy
Pages: 2271-2274

The problem of recommending bibliographic citations to an author who is writing an article has been well-studied. However, different researchers have used different datasets to evaluate proposed techniques, and have sometimes reported contradictory findings ...
A Way to Boost Semi-NMF for Document Clustering
Aghiles Salah, Melissa Ailem, Mohamed Nadif
Pages: 2275-2278

Semi-Non Negative Matrix Factorization (Semi-NMF) is one of the most popular extensions of NMF; it extends the applicable range of NMF models to data having mixed signs and strengthens their relation to clustering. However, Semi-NMF has been ...
Recipe Popularity Prediction with Deep Visual-Semantic Fusion
Satoshi Sanjo, Marie Katsurai
Pages: 2279-2282

Predicting the popularity of user-created recipes has great potential to be adopted in several applications on recipe-sharing websites. To ensure timely prediction when a recipe is uploaded, a prediction model needs to be trained based on the recipe's ...
Revealing the Hidden Links in Content Networks: An Application to Event Discovery
Antonia Saravanou, Ioannis Katakis, George Valkanas, Vana Kalogeraki, Dimitrios Gunopulos
Pages: 2283-2286

Social networks have become the de facto online resource for people to share, comment on and be informed about events pertinent to their interests and livelihood, ranging from road traffic or an illness to concerts and earthquakes, to economics and politics. ...
When Labels Fall Short: Property Graph Simulation via Blending of Network Structure and Vertex Attributes
Arun V. Sathanur, Sutanay Choudhury, Cliff Joslyn, Sumit Purohit
Pages: 2287-2290

Property graphs can be used to represent heterogeneous networks with labeled (attributed) vertices and edges. Given a property graph, simulating another graph of the same or greater size with the same statistical properties with respect to the labels and ...
Integrating the Framing of Clinical Questions via PICO into the Retrieval of Medical Literature for Systematic Reviews
Harrisen Scells, Guido Zuccon, Bevan Koopman, Anthony Deacon, Leif Azzopardi, Shlomo Geva
Pages: 2291-2294

The PICO process is a technique used in evidence based practice to frame and answer clinical questions. It involves structuring the question around four types of clinical information: population, intervention, control or comparison and outcome. The PICO ...
pm-SCAN: an I/O Efficient Structural Clustering Algorithm for Large-scale Graphs
Jung Hyuk Seo, Myoung Ho Kim
Pages: 2295-2298

Most existing algorithms for graph clustering, including SCAN, are not designed to cope with large volumes of data that cannot fit in main memory. When there is not enough memory, those algorithms will incur thrashing, i.e. result in huge I/O costs. ...
Knowledge Graph Embedding with Triple Context
Jun Shi, Huan Gao, Guilin Qi, Zhangquan Zhou
Pages: 2299-2302

Knowledge graph embedding, which aims to represent entities and relations in vector spaces, has shown outstanding performance on a few knowledge graph completion tasks. Most existing methods are based on the assumption that a knowledge graph is a set ...
Hybrid MemNet for Extractive Summarization
Abhishek Kumar Singh, Manish Gupta, Vasudeva Varma
Pages: 2303-2306

Extractive text summarization has been an extensive research problem in the field of natural language understanding. While the conventional approaches rely mostly on manually compiled features to generate the summary, few attempts have been made in developing ...
Denoising Clinical Notes for Medical Literature Retrieval with Convolutional Neural Model
Luca Soldaini, Andrew Yates, Nazli Goharian
Pages: 2307-2310

The rapid increase of medical literature poses a significant challenge for physicians, who have repeatedly reported struggling to keep up to date with developments in research. This gap is one of the main challenges in integrating recent advances in ...
SIMD-Based Multiple Sets Intersection with Dual-Scale Search Algorithm
Xingshen Song, Yuexiang Yang, Xiaoyong Li
Pages: 2311-2314

A conjunctive Boolean query is a fundamental operation for document retrieval in many information systems and databases. Various algorithms have been proposed to maximize query efficiency. In recent years, researchers began to exploit the ...
Soft Seeded SSL Graphs for Unsupervised Semantic Similarity-based Retrieval
Avikalp Srivastava, Madhav Datt
Pages: 2315-2318

Semantic similarity based retrieval is playing an increasingly important role in many IR systems such as modern web search, question-answering, similar document retrieval etc. Improvements in retrieval of semantically similar content are very significant ...
How Safe is Your (Taxi) Driver?
Rade Stanojevic
Pages: 2319-2322

For an auto insurer, understanding the risk of individual drivers is a critical factor in building a healthy and profitable portfolio. For decades, assessing the risk of drivers has relied on demographic information which allows the insurer to segment ...
Sentence Retrieval with Sentiment-specific Topical Anchoring for Review Summarization
Jiaxing Tan, Alexander Kotov, Rojiar Pir Mohammadiani, Yumei Huo
Pages: 2323-2326

We propose Topic Anchoring-based Review Summarization (TARS), a two-step extractive summarization method, which creates review summaries from the sentences that represent the most important aspects of a review. In the first step, the proposed method ...
Visualizing Deep Neural Networks with Interaction of Super-pixels
Shixin Tian, Ying Cai
Pages: 2327-2330

An effective way to visualize the prediction of deep neural networks on an image is to decompose the prediction into the contribution of units (pixels or patches). In the existing works, these units are largely considered independently, thus limiting ...
Collecting Non-Geotagged Local Tweets via Bandit Algorithms
Saki Ueda, Yuto Yamaguchi, Hiroyuki Kitagawa
Pages: 2331-2334

How can we collect non-geotagged tweets posted by users in a specific location as many as possible in a limited time span? How can we find such users if we do not have much information about the specified location? Although there are varieties of methods ...
A Temporal Attentional Model for Rumor Stance Classification
Amir Pouran Ben Veyseh, Javid Ebrahimi, Dejing Dou, Daniel Lowd
Pages: 2335-2338

Rumor stance classification is the task of determining the stance towards a rumor in text. This is the first step in effective rumor tracking on social media which is an increasingly important task. In this work, we analyze Twitter users' stance toward ...
Improving the Gain of Visual Perceptual Behaviour on Topic Modeling for Text Recommendation
Cheng Wang, Yujuan Fang, Zheng Tan, Yuan He
Pages: 2339-2342

Internet information services have been greatly improved profiting from the growing performance of interest mining technology. Visual perceptual behaviours, a new hotspot of mining user's interests, have resulted in great gains in some typical Internet ...
Semantic Annotation for Places in LBSN through Graph Embedding
Yan Wang, Zongxu Qin, Jun Pang, Yang Zhang, Jin Xin
Pages: 2343-2346

With the prevalence of location-based social networks (LBSNs), automated semantic annotation for places plays a critical role in many LBSN-related applications. Although a line of research continues to enhance labeling accuracy, there is still a lot ...
A Study of Feature Construction for Text-based Forecasting of Time Series Variables
Yiren Wang, Dominic Seyler, Shubhra Kanti Karmaker Santu, ChengXiang Zhai
Pages: 2347-2350

Time series are ubiquitous in the world since they are used to measure various phenomena (e.g., temperature, spread of a virus, sales, etc.). Forecasting of time series is highly beneficial (and necessary) for optimizing decisions, yet is a very challenging ...
Using Knowledge Graphs to Explain Entity Co-occurrence in Twitter
Yiwei Wang, Mark James Carman, Yuan-Fang Li
Pages: 2351-2354

Modern Knowledge Graphs such as DBPedia contain significant information regarding Named Entities and the logical relationships which exist between them. Twitter on the other hand, contains important information on the popularity and frequency with which ...
Integrating Side Information for Boosting Machine Comprehension
Yutong Wang, Yixin Xu, Min Yang, Zhou Zhao, Jun Xiao, Yueting Zhuang
Pages: 2355-2358

Machine Reading and Comprehension recently has drawn a fair amount of attention in the field of natural language processing. In this paper, we consider integrating side information to improve machine comprehension on answering cloze-style questions more ...
Unsupervised Feature Selection with Heterogeneous Side Information
Xiaokai Wei, Bokai Cao, Philip S. Yu
Pages: 2359-2362

Compared to supervised feature selection, unsupervised feature selection tends to be more challenging due to the lack of guidance from class labels. Along with the increasing variety of data sources, many datasets are also equipped with certain side ...
An Empirical Study of Community Overlap: Ground-truth, Algorithmic Solutions, and Implications
Joyce Jiyoung Whang
Pages: 2363-2366

In real-world social networks, communities tend to be overlapped with each other because a vertex can belong to multiple communities. To identify these overlapping communities, a number of overlapping community detection methods have been proposed over ...
Non-Exhaustive, Overlapping Co-Clustering
Joyce Jiyoung Whang, Inderjit S. Dhillon
Pages: 2367-2370

The goal of co-clustering is to simultaneously identify a clustering of the rows as well as the columns of a two dimensional data matrix. Most existing co-clustering algorithms are designed to find pairwise disjoint and exhaustive co-clusters. However, ...
Simulating Zero-Resource Spoken Term Discovery
Jerome White, Douglas W. Oard
Pages: 2371-2374

If search engines are ever to index all of the spoken content in the world, they will need to handle hundreds of languages for which no automatic speech recognition systems exist. Zero-resource spoken term discovery, in which repeated content is detected ...
Algorithmic Bias: Do Good Systems Make Relevant Documents More Retrievable?
Colin Wilkie, Leif Azzopardi
Pages: 2375-2378

Algorithmic bias presents a difficult challenge within Information Retrieval. Long has it been known that certain algorithms favour particular documents due to attributes of these documents that are not directly related to relevance. The evaluation of ...
Session-aware Information Embedding for E-commerce Product Recommendation
Chen Wu, Ming Yan
Pages: 2379-2382

Most of the existing recommender systems assume that user's visiting history can be constantly recorded. However, in recent online services, the user identification may be usually unknown and only limited online user behaviors can be used. It is of great ...
Conflict of Interest Declaration and Detection System in Heterogeneous Networks
Siyuan Wu, Leong Hou U, Sourav S. Bhowmick, Wolfgang Gatterbauer
Pages: 2383-2386

Peer review is the most critical process in evaluating an article to be accepted for publication in an academic venue. When assigning a reviewer to evaluate an article, the assignment should be aware of conflicts of interest (COIs) such that the reviews ...
Common-Specific Multimodal Learning for Deep Belief Network
Changsheng Xiang, Xiaoming Jin
Pages: 2387-2390

Multimodal Deep Belief Network has been widely used to extract representations for multimodal data by fusing the high-level features of each data modality into common representations. Such straightforward fusion strategy can benefit the classification ...
JointSem: Combining Query Entity Linking and Entity based Document Ranking
Chenyan Xiong, Zhengzhong Liu, Jamie Callan, Eduard Hovy
Pages: 2391-2394

Entity-based ranking systems often employ entity linking systems to align entities to query and documents. Previously, entity linking systems were not designed specifically for search engines and were mostly used as a preprocessing step. This work presents ...
Learning to Rank with Query-level Semi-supervised Autoencoders
Bo Xu, Hongfei Lin, Yuan Lin, Kan Xu
Pages: 2395-2398

Learning to rank utilizes machine learning methods to solve ranking problems by constructing ranking models in a supervised way, which needs fixed-length feature vectors of documents as inputs, and outputs the ranking models learned by iteratively reducing ...
MultiSentiNet: A Deep Semantic Network for Multimodal Sentiment Analysis
Nan Xu, Wenji Mao
Pages: 2399-2402

With the prevalence of more diverse and multiform user-generated content in social networking sites, multimodal sentiment analysis has become an increasingly important research topic in recent years. Previous work on multimodal sentiment analysis directly ...
Attentive Graph-based Recursive Neural Network for Collective Vertex Classification
Qiongkai Xu, Qing Wang, Chenchen Xu, Lizhen Qu
Pages: 2403-2406

Vertex classification is a critical task in graph analysis, where both contents and linkage of vertices are incorporated during classification. Recently, researchers proposed using deep neural network to build an end-to-end framework, which can capture ...
Bayesian Heteroscedastic Matrix Factorization for Conversion Rate Prediction
Hongxia Yang
Pages: 2407-2410

Display Advertising has generated billions of revenue and originated hundreds of scientific papers and patents, yet the accuracy of prediction technologies leaves much to be desired. Conversion rates (CVR) predictions can often be formulated as a matrix ...
SERM: A Recurrent Model for Next Location Prediction in Semantic Trajectories
Di Yao, Chao Zhang, Jianhui Huang, Jingping Bi
Pages: 2411-2414

Predicting the next location a user tends to visit is an important task for applications like location-based advertising, traffic planning, and tour recommendation. We consider the next location prediction problem for semantic trajectory data, wherein ...
Low-Rank Matrix Completion over Finite Abelian Group Algebras for Context-Aware Recommendation
Chia-An Yu, Tak-Shing Chan, Yi-Hsuan Yang
Pages: 2415-2418

The incorporation of contextual information is an important part of context-aware recommendation. Many context-aware recommendation systems adopt tensor completion to include contextual information. However, the symmetries between dimensions of a tensor ...
Spectrum-based Deep Neural Networks for Fraud Detection
Shuhan Yuan, Xintao Wu, Jun Li, Aidong Lu
Pages: 2419-2422

In this paper, we focus on fraud detection on a signed graph with only a small set of labeled training data. We propose a novel framework that combines deep neural networks and spectral graph analysis. In particular, we use the node projection (called ...
RATE: Overcoming Noise and Sparsity of Textual Features in Real-Time Location Estimation
Yu Zhang, Wei Wei, Binxuan Huang, Kathleen M. Carley, Yan Zhang
Pages: 2423-2426

Real-time location inference of social media users is the fundamental of some spatial applications such as localized search and event detection. While tweet text is the most commonly used feature in location estimation, most of the prior works suffer ...
Missing Value Learning
Zhi-Lin Zhao, Chang-Dong Wang, Kun-Yu Lin, Jian-Huang Lai
Pages: 2427-2430

Missing value is common in many machine learning problems and much effort has been made to handle missing data to improve the performance of the learned model. Sometimes, our task is not to train a model using those unlabeled/labeled data with missing ...
Local Ensemble across Multiple Sources for Collaborative Filtering
Jing Zheng, Fuzhen Zhuang, Chuan Shi
Pages: 2431-2434

Recently, Transfer Collaborative Filtering (TCF) methods across multiple source domains, which employ knowledge from different source domains to improve the recommendation performance in the target domain, have been applied in recommender systems. The ...
Cluster-level Emotion Pattern Matching for Cross-Domain Social Emotion Classification
Endong Zhu, Yanghui Rao, Haoran Xie, Yuwei Liu, Jian Yin, Fu Lee Wang
Pages: 2435-2438

This paper addresses the task of cross-domain social emotion classification of online documents. The cross-domain task is formulated as using abundant labeled documents from a source domain and a small amount of labeled documents from a target domain, ...
Knowledge-based Question Answering by Jointly Generating, Copying and Paraphrasing
Shuguang Zhu, Xiang Cheng, Sen Su, Shuang Lang
Pages: 2439-2442

With the development of large-scale knowledge bases, people are building systems which give simple answers to questions based on consolidate facts. In this paper, we focus on simple questions, which ask about only a subject and relation in the knowledge ...
SESSION: Demonstrations (alphabetical by lead authors' last names)
PODIUM: Procuring Opinions from Diverse Users in a Multi-Dimensional World
Yael Amsterdamer, Oded Goldreich
Pages: 2443-2446

The procurement of opinions is an important task in many contexts. When selecting members of a certain population to ask for their opinions, diversity inside the selected subset is a central consideration. People with diverse profiles are assumed to ...
VizQ: A System for Scalable Processing of Visibility Queries in 3D Spatial Databases
Arif Arman, Mohammed Eunus Ali, Farhana Murtaza Choudhury, Kaysar Abdullah
Pages: 2447-2450

In this demonstration, we present VizQ, an efficient, scalable, and interactive system to process and visualize a comprehensive collection of novel visibility queries in the presence of obstacles in 3D space. Specifically, we demonstrate four types of ...
CoreDB: a Data Lake Service
Amin Beheshti, Boualem Benatallah, Reza Nouri, Van Munin Chhieng, HuangTao Xiong, Xu Zhao
Pages: 2451-2454

The continuous improvement in connectivity, storage and data processing capabilities allow access to a data deluge from sensors, social-media, news, user-generated, government and private data sources. Accordingly, in a modern data-oriented landscape, ...
SimMeme: Semantic-Based Meme Search
Maya Ekron, Tova Milo, Brit Youngmann
Pages: 2455-2458

With the proliferation of social image-sharing applications, image search becomes an increasingly common activity. In this work, we focus on a particular class of images that convey semantic meaning beyond the visual appearance, and whose search presents ...
SummIt: A Tool for Extractive Summarization, Discovery and Analysis
Guy Feigenblat, Odellia Boni, Haggai Roitman, David Konopnicki
Pages: 2459-2462

We propose to demonstrate SummIt -- a tool for extractive summarization, discovery and analysis. The main goal of SummIt is to provide consumable summaries that are driven by users' information intents. To this end, SummIt discovers and analyzes potential ...
Rapid Analysis of Network Connectivity
Scott Freitas, Hanghang Tong, Nan Cao, Yinglong Xia
Pages: 2463-2466

This research focuses on accelerating the computational time of two base network algorithms (k-simple shortest paths and minimum spanning tree for a subset of nodes)---cornerstones behind a variety of network connectivity mining tasks---with the goal ...
HyPerInsight: Data Exploration Deep Inside HyPer
Nina Hubig, Linnea Passing, Maximilian E. Schüle, Dimitri Vorona, Alfons Kemper, Thomas Neumann
Pages: 2467-2470

Nowadays we are drowning in data of various varieties. For all these mixed types and categories of data there exist even more different analysis approaches, often done in single hand-written solutions. We propose to extend HyPer, a main memory database ...
Interactive System for Reasoning about Document Age
Adam Jatowt, Ricardo Campos
Pages: 2471-2474

Recently, many historical texts have become digitized and made accessible for search and browsing. Professionals who work with collections of such texts often need to verify the correctness of documents' key metadata - their creation dates. In this paper, ...
SemFacet: Making Hard Faceted Search Easier
Evgeny Kharlamov, Luca Giacomelli, Evgeny Sherkhonov, Bernardo Cuenca Grau, Egor V. Kostylev, Ian Horrocks
Pages: 2475-2478

Faceted search is a prominent search paradigm that became the standard in many Web applications and has also been recently proposed as a suitable paradigm for exploring and querying RDF graphs. One of the main challenges that hampers usability of faceted ...
Metacrate: Organize and Analyze Millions of Data Profiles
Sebastian Kruse, David Hahn, Marius Walter, Felix Naumann
Pages: 2483-2486

Databases are one of the great success stories in IT. However, they have been continuously increasing in complexity, hampering operation, maintenance, and upgrades. To face this complexity, sophisticated methods for schema summarization, data cleaning, ...
SemVis: Semantic Visualization for Interactive Topical Analysis
Tuan M. V. Le, Hady W. Lauw
Pages: 2487-2490

Exploratory analysis of a text corpus is an important task that can be aided by informative visualization. One spatially-oriented form of document visualization is a scatterplot, whereby every document is associated with a coordinate, and relationships ...
Exploring the Veracity of Online Claims with BackDrop
Julien Leblay, Weiling Chen, Steven Lynden
Pages: 2491-2494

Using the Web to assess the validity of claims presents many challenges. Whether the data comes from social networks or established media outlets, individual or institutional data publishers, one has to deal with scale and heterogeneity, as well as with ...
AliMe Assist: An Intelligent Assistant for Creating an Innovative E-commerce Experience
Feng-Lin Li, Minghui Qiu, Haiqing Chen, Xiongwei Wang, Xing Gao, Jun Huang, Juwei Ren, Zhongzhou Zhao, Weipeng Zhao, Lei Wang,