ABSTRACT

A similarity join aims to find all similar pairs between two collections of records. Established approaches usually deal with syntactic differences such as typos and abbreviations, but neglect the semantic relations between words. Such relations, however, are helpful for obtaining high-quality join results. In this paper, we leverage taxonomy knowledge (i.e., a set of IS-A hierarchical relations) to define a similarity measure that finds semantically similar records in two datasets. Based on this measure, we develop a similarity join algorithm built on the prefix filtering framework to prune irrelevant pairs effectively. Our technical contribution is an algorithm that judiciously selects critical parameters in a prefix filter to maximise its filtering power, supported by an estimation technique and a Monte Carlo simulation process. Empirical experiments show that our proposed methods exhibit high efficiency and scalability, outperforming the state of the art by a large margin.
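
For readers unfamiliar with prefix filtering, the following is a minimal sketch of the generic technique for a join between two collections of token sets. It is not the paper's algorithm: it assumes plain Jaccard similarity rather than the taxonomy-aware measure, it uses the textbook prefix length instead of the parameter selection described above, and the names `prefix_filter_join`, `jaccard`, and the sample records are hypothetical.

```python
import math
from collections import Counter, defaultdict


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    return len(a & b) / len(a | b) if (a or b) else 1.0


def prefix_filter_join(left, right, threshold):
    """Return [(i, j, sim)] for all pairs with Jaccard(left[i], right[j]) >= threshold.

    Assumption: records are lists of string tokens and 0 < threshold <= 1.
    Empty records are never reported, because they have no prefix tokens.
    """
    # Global token order: rarest tokens first, so prefixes are selective.
    freq = Counter(tok for rec in list(left) + list(right) for tok in set(rec))

    def ordered(rec):
        return sorted(set(rec), key=lambda tok: (freq[tok], tok))

    def prefix(tokens):
        # Two sets with Jaccard >= t share at least ceil(t * |x|) tokens,
        # so their prefixes of length |x| - ceil(t * |x|) + 1 must overlap.
        # The small epsilon guards against floating-point round-up.
        n = len(tokens)
        return tokens[: n - math.ceil(threshold * n - 1e-9) + 1]

    # Inverted index over the prefixes of the right-hand collection.
    index = defaultdict(set)
    for j, rec in enumerate(right):
        for tok in prefix(ordered(rec)):
            index[tok].add(j)

    results = []
    for i, rec in enumerate(left):
        candidates = set()
        for tok in prefix(ordered(rec)):
            candidates |= index.get(tok, set())
        # Verification step: compute the exact similarity for survivors only.
        for j in sorted(candidates):
            sim = jaccard(set(rec), set(right[j]))
            if sim >= threshold:
                results.append((i, j, sim))
    return results


left = [["apple", "pie", "recipe"], ["data", "base", "system"]]
right = [["apple", "pie", "recipes"], ["apple", "pie", "recipe", "easy"], ["graph", "data"]]
print(prefix_filter_join(left, right, 0.6))
```

The filter only decides which pairs are worth verifying; every reported pair is still checked with the exact similarity, so the result matches a brute-force comparison on non-empty records.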

AUTHORS

Pengfei Xu

Bibliometrics: publication history
Publication years: 2018-2018
Publication count: 1
Citation count: 0
Available for download: 1
Downloads (6 weeks): 11
Downloads (12 months): 63
Downloads (cumulative): 63
Average downloads per article: 63.00
Average citations per article: 0.00

Jiaheng Lu

Bibliometrics: publication history
Publication years: 2004-2018
Publication count: 55
Citation count: 668
Available for download: 29
Downloads (6 weeks): 111
Downloads (12 months): 1,040
Downloads (cumulative): 10,800
Average downloads per article: 372.41
Average citations per article: 12.15

PUBLICATION

Title CIKM '18 Proceedings of the 27th ACM International Conference on Information and Knowledge Management
General Chairs Alfredo Cuzzocrea University of Trieste, Italy
Program Chairs James Allan University of Massachusetts, USA
Norman Paton University of Manchester, United Kingdom
Divesh Srivastava AT&T Labs Research, USA
Rakesh Agrawal Data Insights Lab, USA
Andrei Broder Google Research, USA
Mohammed Zaki Rensselaer Polytechnic Institute, USA
Selcuk Candan Arizona State University, USA
Alexandros Labrinidis University of Pittsburgh, USA
Assaf Schuster Technion, Israel
Haixun Wang Google Research, USA
Pages 1563-1566
Publication Date 2018-10-17
Funding Source Academy of Finland
Sponsors SIGWEB ACM Special Interest Group on Hypertext, Hypermedia, and Web
SIGIR ACM Special Interest Group on Information Retrieval
Publisher ACM New York, NY, USA ©2018
ISBN: 978-1-4503-6014-2 DOI: 10.1145/3269206.3269236
Conference CIKM: Conference on Information and Knowledge Management
Paper Acceptance Rate 147 of 826 submissions, 18%
Overall Acceptance Rate 1,960 of 10,758 submissions, 18%
Year       Submitted  Accepted  Rate
CIKM '05         425        77   18%
CIKM '06         537        81   15%
CIKM '07         512        86   17%
CIKM '08         772       132   17%
CIKM '09         847       123   15%
CIKM '10         945       126   13%
CIKM '11         918       228   25%
CIKM '12       1,088       146   13%
CIKM '13         848       143   17%
CIKM '14         838       175   21%
CIKM '15         646       165   26%
CIKM '16         701       160   23%
CIKM '17         855       171   20%
CIKM '18         826       147   18%
Overall       10,758     1,960   18%

Table of Contents

Proceedings of the 27th ACM International Conference on Information and Knowledge Management
SESSION: Keynotes
Alexa and Her Shopping Journey
Yoelle Maarek
Pages: 1-1

Voice-enabled intelligent assistants, such as Amazon Alexa, Google Assistant, Microsoft Cortana or Apple Siri, are on their way to revolutionize the way humans interact with machines. Their ubiquitous presence in our homes, offices, cars, etc. and their ...
Shifting Information Interactions
Maarten de Rijke
Pages: 3-3

Modern information retrieval systems, such as search engines, recommender systems, and conversational agents, are best thought of as interactive systems, that is, systems that interact with and learn from user behavior. The ways in which people interact ...
Teaching Artificial Agents to Understand Language by Modelling Reward
Edward Grefenstette
Pages: 5-6

Recent progress in Deep Reinforcement Learning has shown that agents can be taught complex behaviour and solve difficult tasks, such as playing video games from pixel observations, or mastering the game of Go without observing human games, with relatively ...
SESSION: Session 1A: Content Understanding
Multi-Source Pointer Network for Product Title Summarization
Fei Sun, Peng Jiang, Hanxiao Sun, Changhua Pei, Wenwu Ou, Xiaobo Wang
Pages: 7-16

In this paper, we study the product title summarization problem in E-commerce applications for display on mobile devices. Comparing with conventional sentence summarization, product title summarization has some extra and essential constraints. For example, ...
Exploring a High-quality Outlying Feature Value Set for Noise-Resilient Outlier Detection in Categorical Data
Hongzuo Xu, Yongjun Wang, Li Cheng, Yijie Wang, Xingkong Ma
Pages: 17-26

Unavoidable noise in real-world categorical data presents significant challenges to existing outlier detection methods because they normally fail to separate noisy values from outlying values. Feature subspace-based methods inevitably mix noisy values ...
Neural Relational Topic Models for Scientific Article Analysis
Haoli Bai, Zhuangbin Chen, Michael R. Lyu, Irwin King, Zenglin Xu
Pages: 27-36

Topic modelling and citation recommendation of scientific articles are important yet challenging research problems in scientific article analysis. In particular, the inference on coherent topics can be easily affected by irrelevant contents in articles. ...
Mathematics Content Understanding for Cyberlearning via Formula Evolution Map
Zhuoren Jiang, Liangcai Gao, Ke Yuan, Zheng Gao, Zhi Tang, Xiaozhong Liu
Pages: 37-46

Although the scientific digital library is growing at a rapid pace, scholars/students often find reading Science, Technology, Engineering, and Mathematics (STEM) literature daunting, especially for the math-content/formula. In this paper, we propose ...
SESSION: Session 1B: Top-K
The Range Skyline Query
Theodoros Tzouramanis, Eleftherios Tiakas, Apostolos N. Papadopoulos, Yannis Manolopoulos
Pages: 47-56

The range skyline query retrieves the dynamic skyline for every individual query point in a range by generalizing the point-based dynamic skyline query. Its wide-ranging applications enable users to submit their preferences within an interval of 'ideally ...
FA + TA < FSA: Flexible Score Aggregation
Paolo Ciaccia, Davide Martinenghi
Pages: 57-66

The problem of aggregating scores, so as to provide a ranking of objects in a dataset according to different evaluation criteria, is central to many modern data-intensive applications. Although efficient (instance optimal) algorithms exist to this purpose ...
FALCON: A Fast Drop-In Replacement of Citation KNN for Multiple Instance Learning
Shuai Yang, Xipeng Shen
Pages: 67-76

Citation KNN is an important but compute-intensive algorithm for multiple instance learning (MIL). This paper presents FALCON, a fast replacement of Citation KNN. FALCON accelerates Citation KNN by removing unnecessary distance calculations through two ...
Secure Top-k Inner Product Retrieval
Zhilin Zhang, Ke Wang, Chen Lin, Weipeng Lin
Pages: 77-86

Secure top-k inner product retrieval allows the users to outsource encrypted data vectors to a cloud server and at some later time find the k vectors producing largest inner products giving an encrypted query vector. Existing solutions suffer poor performance ...
SESSION: Session 1C: Graph Learning 1
A Quest for Structure: Jointly Learning the Graph Structure and Semi-Supervised Classification
Xuan Wu, Lingxiao Zhao, Leman Akoglu
Pages: 87-96

Semi-supervised learning (SSL) is effectively used for numerous classification problems, thanks to its ability to make use of abundant unlabeled data. The main assumption of various SSL algorithms is that the nearby points on the data manifold are likely ...
TGNet: Learning to Rank Nodes in Temporal Graphs
Qi Song, Bo Zong, Yinghui Wu, Lu-An Tang, Hui Zhang, Guofei Jiang, Haifeng Chen
Pages: 97-106

Node ranking in temporal networks is often impacted by heterogeneous context from node content, temporal, and structural dimensions. This paper introduces TGNet, a deep learning framework for node ranking in heterogeneous temporal graphs. TGNet utilizes ...
Mining (maximal) Span-cores from Temporal Networks
Edoardo Galimberti, Alain Barrat, Francesco Bonchi, Ciro Cattuto, Francesco Gullo
Pages: 107-116

When analyzing temporal networks, a fundamental task is the identification of dense structures (i.e., groups of vertices that exhibit a large number of links), together with their temporal span (i.e., the period of time for which the high density holds). ...
REGAL: Representation Learning-based Graph Alignment
Mark Heimann, Haoming Shen, Tara Safavi, Danai Koutra
Pages: 117-126

Problems involving multiple networks are prevalent in many scientific and other domains. In particular, network alignment, or the task of identifying corresponding nodes in different networks, has applications across the social and natural sciences. ...
SESSION: Session 1D: Neural Recommendation
Attention-based Adaptive Model to Unify Warm and Cold Starts Recommendation
Shaoyun Shi, Min Zhang, Yiqun Liu, Shaoping Ma
Pages: 127-136

Nowadays, recommender systems provide essential web services on the Internet. There are mainly two categories of traditional recommendation algorithms: Content-Based (CB) and Collaborative Filtering (CF). CF methods make recommendations mainly according ...
CFGAN: A Generic Collaborative Filtering Framework based on Generative Adversarial Networks
Dong-Kyu Chae, Jin-Soo Kang, Sang-Wook Kim, Jung-Tae Lee
Pages: 137-146

Generative Adversarial Networks (GAN) have achieved big success in various domains such as image generation, music generation, and natural language generation. In this paper, we propose a novel GAN-based collaborative filtering (CF) framework to provide ...
ANR: Aspect-based Neural Recommender
Jin Yao Chin, Kaiqi Zhao, Shafiq Joty, Gao Cong
Pages: 147-156

Textual reviews, which are readily available on many e-commerce and review websites such as Amazon and Yelp, serve as an invaluable source of information for recommender systems. However, not all parts of the reviews are equally important, and the same ...
An Attentive Interaction Network for Context-aware Recommendations
Lei Mei, Pengjie Ren, Zhumin Chen, Liqiang Nie, Jun Ma, Jian-Yun Nie
Pages: 157-166

Context-aware Recommendations (CARS) have attracted a lot of attention recently because of the impact of contextual information on user behaviors. Recent state-of-the-art methods represent the relations between users/items and contexts as a tensor, with ...
SESSION: Session 1E: Interactive IR 1
Contrasting Search as a Learning Activity with Instructor-designed Learning
Felipe Moraes, Sindunuraga Rikarno Putra, Claudia Hauff
Pages: 167-176

The field of Search as Learning addresses questions surrounding human learning during the search process. Existing research has largely focused on observing how users with learning-oriented information needs behave and interact with search engines. What ...
Towards Conversational Search and Recommendation: System Ask, User Respond
Yongfeng Zhang, Xu Chen, Qingyao Ai, Liu Yang, W. Bruce Croft
Pages: 177-186

Conversational search and recommendation based on user-system dialogs exhibit major differences from conventional search and recommendation tasks in that 1) the user and system can interact for multiple semantically coherent rounds on a task through ...
Effective User Interaction for High-Recall Retrieval: Less is More
Haotian Zhang, Mustafa Abualsaud, Nimesh Ghelani, Mark D. Smucker, Gordon V. Cormack, Maura R. Grossman
Pages: 187-196

High-recall retrieval --- finding all or nearly all relevant documents --- is critical to applications such as electronic discovery, systematic review, and the construction of test collections for information retrieval tasks. The effectiveness of current ...
RIN: Reformulation Inference Network for Context-Aware Query Suggestion
Jyun-Yu Jiang, Wei Wang
Pages: 197-206

Search engine users always endeavor to reformulate queries during search sessions for articulating their information needs because it is not always easy to articulate the search intents. To further ameliorate the reformulation process, search engines ...
SESSION: Session 2A: Data Integration
Improving the Efficiency of Inclusion Dependency Detection
Nuhad Shaabani, Christoph Meinel
Pages: 207-216

The detection of all inclusion dependencies (INDs) in an unknown dataset is at the core of any data profiling effort. Apart from the discovery of foreign key relationships, INDs can help perform data integration, integrity checking, schema (re-)design, ...
Web Table Understanding by Collective Inference
San Kim, Guoliang Li, Jianhua Feng, Kaiyu Li
Pages: 217-226

Web tables have become very popular and important in many real applications, such as search engines and knowledge base enrichment. Due to its benefit, it is very urgent to understand web tables. An important task in web table understanding is the column-type ...
A Content-Based Approach for Modeling Analytics Operators
Ioannis Giannakopoulos, Dimitrios Tsoumakos, Nectarios Koziris
Pages: 227-236

The plethora of publicly available data sources has given birth to a wealth of new needs and opportunities. The ever increasing amount of data has shifted the analysts' attention from optimizing the operators for specific business cases, to focusing ...
SESSION: Session 2B: Knowledge Graph Learning
Coarse-to-Fine Annotation Enrichment for Semantic Segmentation Learning
Yadan Luo, Ziwei Wang, Zi Huang, Yang Yang, Cong Zhao
Pages: 237-246

Rich high-quality annotated data is critical for semantic segmentation learning, yet acquiring dense and pixel-wise ground-truth is both labor- and time-consuming. Coarse annotations (e.g., scribbles, coarse polygons) offer an economical alternative, ...
Shared Embedding Based Neural Networks for Knowledge Graph Completion
Saiping Guan, Xiaolong Jin, Yuanzhuo Wang, Xueqi Cheng
Pages: 247-256

Knowledge Graphs (KGs) have facilitated many real-world applications (e.g., vertical search and intelligent question answering). However, they are usually incomplete, which affects the performance of such KG based applications. To alleviate this problem, ...
Knowledge Graph Completion by Context-Aware Convolutional Learning with Multi-Hop Neighborhoods
Byungkook Oh, Seungmin Seo, Kyong-Ho Lee
Pages: 257-266

The main focus of relational learning for knowledge graph completion (KGC) lies in exploiting rich contextual information for facts. Many state-of-the-art models incorporate fact sequences, entity types, and even textual information. Unfortunately, most ...
SESSION: Session 2C: Data Quality
Smooth q-Gram, and Its Applications to Detection of Overlaps among Long, Error-Prone Sequencing Reads
Haoyu Zhang, Qin Zhang, Haixu Tang
Pages: 267-276

We propose smooth q-gram, the first variant of q-gram that captures q-gram pair within a small edit distance. We apply smooth q-gram to the problem of detecting overlapping pairs of error-prone reads produced by single molecule real time sequencing (SMRT), ...
Multi-View Group Anomaly Detection
Hongtao Wang, Pan Su, Miao Zhao, Hongmei Wang, Gang Li
Pages: 277-286

Multi-view anomaly detection is a challenging issue due to diverse data generation mechanisms and inconsistent cluster structures of different views. Existing methods of point anomaly detection are ineffective for scenarios where individual instances ...
Detecting Outliers in Data with Correlated Measures
Yu-Hsuan Kuo, Zhenhui Li, Daniel Kifer
Pages: 287-296

Advances in sensor technology have enabled the collection of large-scale datasets. Such datasets can be extremely noisy and often contain a significant amount of outliers that result from sensor malfunction or human operation faults. In order to utilize ...
SESSION: Session 2D: Online Learning
Insights from the Long-Tail: Learning Latent Representations of Online User Behavior in the Presence of Skew and Sparsity
Adit Krishnan, Ashish Sharma, Hari Sundaram
Pages: 297-306

This paper proposes an approach to learn robust behavior representations in online platforms by addressing the challenges of user behavior skew and sparse participation. Latent behavior models are important in a wide variety of applications: recommender ...
Inductive Framework for Multi-Aspect Streaming Tensor Completion with Side Information
Madhav Nimishakavi, Bamdev Mishra, Manish Gupta, Partha Talukdar
Pages: 307-316

Low rank tensor completion is a well studied problem and has applications in various fields. However, in many real world applications the data is dynamic, i.e., new data arrives at different time intervals. As a result, the tensors used to represent ...
Online Learning for Non-Stationary A/B Tests
Andrés Muñoz Medina, Sergei Vassilvitskii, Dong Yin
Pages: 317-326

The rollout of new versions of a feature in modern applications is a manual multi-stage process, as the feature is released to ever larger groups of users, while its performance is carefully monitored. This kind of A/B testing is ubiquitous, but suboptimal, ...
SESSION: Session 2E: Personalization
MEgo2Vec: Embedding Matched Ego Networks for User Alignment Across Social Networks
Jing Zhang, Bo Chen, Xianming Wang, Hong Chen, Cuiping Li, Fengmei Jin, Guojie Song, Yutao Zhang
Pages: 327-336

Aligning users across multiple heterogeneous social networks is a fundamental issue in many data mining applications. Methods that incorporate user attributes and network structure have received much attention. However, most of them suffer from error ...
Learning User Preferences and Understanding Calendar Contexts for Event Scheduling
Donghyeon Kim, Jinhyuk Lee, Donghee Choi, Jaehoon Choi, Jaewoo Kang
Pages: 337-346

With online calendar services gaining popularity worldwide, calendar data has become one of the richest context sources for understanding human behavior. However, event scheduling is still time-consuming even with the development of online calendars. ...
Personalizing Search Results Using Hierarchical RNN with Query-aware Attention
Songwei Ge, Zhicheng Dou, Zhengbao Jiang, Jian-Yun Nie, Ji-Rong Wen
Pages: 347-356

Search results personalization has become an effective way to improve the quality of search engines. Previous studies extracted information such as past clicks, user topical interests, query click entropy and so on to tailor the original ranking. However, ...
SESSION: Session 3A: Social Data Analytics 1
Adaptive Implicit Friends Identification over Heterogeneous Network for Social Recommendation
Junliang Yu, Min Gao, Jundong Li, Hongzhi Yin, Huan Liu
Pages: 357-366

The explicitly observed social relations from online social platforms have been widely incorporated into recommender systems to mitigate the data sparsity issue. However, the direct usage of explicit social relations may lead to an inferior performance ...
Nowcasting the Stance of Social Media Users in a Sudden Vote: The Case of the Greek Referendum
Adam Tsakalidis, Nikolaos Aletras, Alexandra I. Cristea, Maria Liakata
Pages: 367-376

Modelling user voting intention in social media is an important research area, with applications in analysing electorate behaviour, online political campaigning and advertising. Previous approaches mainly focus on predicting national general elections, ...
Inferring Probabilistic Contagion Models Over Networks Using Active Queries
Abhijin Adiga, Vanessa Cedeno-Mieles, Chris J. Kuhlman, Madhav V. Marathe, S. S. Ravi, Daniel J. Rosenkrantz, Richard E. Stearns
Pages: 377-386

The problem of inferring unknown parameters of a networked social system is of considerable practical importance. We consider this problem for the independent cascade model using an active query framework. More specifically, given a network whose edge ...
SESSION: Session 3B: Evaluation in IR 1
Trustworthy Experimentation Under Telemetry Loss
Jayant Gupchup, Yasaman Hosseinkashi, Pavel Dmitriev, Daniel Schneider, Ross Cutler, Andrei Jefremov, Martin Ellis
Pages: 387-396

Failure to accurately measure the outcomes of an experiment can lead to bias and incorrect conclusions. Online controlled experiments (aka AB tests) are increasingly being used to make decisions to improve websites as well as mobile and desktop applications. ...
When Rank Order Isn't Enough: New Statistical-Significance-Aware Correlation Measures
Mucahid Kutlu, Tamer Elsayed, Maram Hasanain, Matthew Lease
Pages: 397-406

Because it is expensive to construct test collections for Cranfield-based evaluation of information retrieval systems, a variety of lower-cost methods have been proposed. The reliability of these methods is often validated by measuring rank correlation ...
On Building Fair and Reusable Test Collections using Bandit Techniques
Ellen M. Voorhees
Pages: 407-416

While test collections are a vital piece of the research infrastructure for information retrieval, constructing fair, reusable test collections for large data sets is challenging because of the number of human relevance assessments required. Various ...
SESSION: Session 3C: Graphs
RippleNet: Propagating User Preferences on the Knowledge Graph for Recommender Systems
Hongwei Wang, Fuzheng Zhang, Jialin Wang, Miao Zhao, Wenjie Li, Xing Xie, Minyi Guo
Pages: 417-426

To address the sparsity and cold start problem of collaborative filtering, researchers usually make use of side information, such as social networks or item attributes, to improve recommendation performance. This paper considers the knowledge graph as ...
Exploiting Structural and Temporal Evolution in Dynamic Link Prediction
Huiyuan Chen, Jing Li
Pages: 427-436

Link prediction in dynamic networks is an important task with many real-life applications in different domains, such as social networks, cyber-physical systems, and bioinformatics. There are two key processes in dynamic networks: network structural evolution ...
Are Meta-Paths Necessary?: Revisiting Heterogeneous Graph Embeddings
Rana Hussein, Dingqi Yang, Philippe Cudré-Mauroux
Pages: 437-446

The graph embedding paradigm projects nodes of a graph into a vector space, which can facilitate various downstream graph analysis tasks such as node classification and clustering. To efficiently learn node embeddings from a graph, graph embedding techniques ...
SESSION: Session 3D: Facets and Entities
Distribution Distance Minimization for Unsupervised User Identity Linkage
Chaozhuo Li, Senzhang Wang, Philip S. Yu, Lei Zheng, Xiaoming Zhang, Zhoujun Li, Yanbo Liang
Pages: 447-456

Nowadays, it is common for one natural person to join multiple social networks to enjoy different services. Linking identical users across different social networks, also known as the User Identity Linkage (UIL), is an important problem of great research ...
Short Text Entity Linking with Fine-grained Topics
Lihan Chen, Jiaqing Liang, Chenhao Xie, Yanghua Xiao
Pages: 457-466

A wide range of web corpora are in the form of short text, such as QA queries, search queries and news titles. Entity linking for these short texts is quite important. Most of supervised approaches are not effective for short text entity linking. The ...
StuffIE: Semantic Tagging of Unlabeled Facets Using Fine-Grained Information Extraction
Radityo Eko Prasojo, Mouna Kacimi, Werner Nutt
Pages: 467-476

Recent knowledge extraction methods are moving towards ternary and higher-arity relations to capture more information about binary facts. An example is to include the time, the location, and the duration of a specific fact. These relations can be even ...
SESSION: Session 3E: Indexing
PSLSH: An Index Structure for Efficient Execution of Set Queries in High-Dimensional Spaces
Parth Nagarkar, K. Selçuk Candan
Pages: 477-486

Efficient implementations of range and nearest neighbor queries are critical in many large multimedia applications. Locality Sensitive Hashing (LSH) is a popular technique for performing approximate searches in high-dimensional multimedia, such as image ...
GYANI: An Indexing Infrastructure for Knowledge-Centric Tasks
Dhruv Gupta, Klaus Berberich
Pages: 487-496

In this work, we describe GYANI (gyan stands for knowledge in Hindi), an indexing infrastructure for search and analysis of large semantically annotated document collections. To facilitate the search for sentences or text regions for many knowledge-centric ...
From Neural Re-Ranking to Neural Ranking: Learning a Sparse Representation for Inverted Indexing
Hamed Zamani, Mostafa Dehghani, W. Bruce Croft, Erik Learned-Miller, Jaap Kamps
Pages: 497-506

The availability of massive data and computing power allowing for effective data driven neural approaches is having a major impact on machine learning and information retrieval research, but these models have a basic problem with efficiency. Current ...
SESSION: Session 4A: Stream Analytics 1
ChangeDAR: Online Localized Change Detection for Sensor Data on a Graph
Bryan Hooi, Leman Akoglu, Dhivya Eswaran, Amritanshu Pandey, Marko Jereminov, Larry Pileggi, Christos Faloutsos
Pages: 507-516

Given electrical sensors placed on the power grid, how can we automatically determine when electrical components (e.g. power lines) fail? Or, given traffic sensors which measure the speed of vehicles passing over them, how can we determine when traffic ...
Adversarial Training Model Unifying Feature Driven and Point Process Perspectives for Event Popularity Prediction
Qitian Wu, Chaoqi Yang, Hengrui Zhang, Xiaofeng Gao, Paul Weng, Guihai Chen
Pages: 517-526

This paper targets a general popularity prediction problem for event sequence, which has recently gained great attention due to its extensive applications in various domains. Feature driven method and point process method are two basic thinking paradigms ...
Learning under Feature Drifts in Textual Streams
Damianos P. Melidis, Myra Spiliopoulou, Eirini Ntoutsi
Pages: 527-536

Huge amounts of textual streams are generated nowadays, especially in social networks like Twitter and Facebook. As the discussion topics and user opinions on those topics change drastically with time, those streams undergo changes in data distribution, ...
CRPP: Competing Recurrent Point Process for Modeling Visibility Dynamics in Information Diffusion
Avirup Saha, Bidisha Samanta, Niloy Ganguly, Abir De
Pages: 537-546

Accurate modeling of how the visibility of a piece of information varies across time has a wide variety of applications. For example, in an e-commerce site like Amazon, it can help to identify which product is preferred over others; in Twitter, it can ...
SESSION: Session 4B: Network Models
Finding a Dense Subgraph with Sparse Cut
Atsushi Miyauchi, Naonori Kakimura
Pages: 547-556

Community detection is one of the fundamental tasks in graph mining, which has many real-world applications in diverse domains. In this study, we propose an optimization model for finding a community that is densely connected internally but sparsely ...
Signed Network Modeling Based on Structural Balance Theory
Tyler Derr, Charu Aggarwal, Jiliang Tang
Pages: 557-566

The modeling of networks, specifically generative models, has been shown to provide a plethora of information about the underlying network structures, as well as many other benefits behind their construction. There has been a considerable increase in ...
On Rich Clubs of Path-Based Centralities in Networks
Soumya Sarkar, Sanjukta Bhowmick, Animesh Mukherjee
Pages: 567-576

Many scale-free networks exhibit a "rich club" structure, where high degree vertices form tightly interconnected subgraphs. In this paper, we explore the emergence of "rich clubs" in the context of shortest path based centrality metrics. We term these ...
VTeller: Telling the Values Somewhere, Sometime in a Dynamic Network of Urban Systems
Yan Li, Tingjian Ge, Cindy Chen
Pages: 577-586

Dynamic networks are very common in urban systems today. As data are acquired, unfortunately, they are rarely complete observations of the whole system. It is important to reliably infer the unobserved attribute values anywhere in the graphs, at certain ...
SESSION: Session 4C: News
Open-Schema Event Profiling for Massive News Corpora
Quan Yuan, Xiang Ren, Wenqi He, Chao Zhang, Xinhe Geng, Lifu Huang, Heng Ji, Chin-Yew Lin, Jiawei Han
Pages: 587-596

With the rapid growth of online information services, a sheer volume of news data becomes available. To help people quickly digest the explosive information, we define a new problem - schema-based news event profiling - profiling events reported in open-domain ...
Newsfeed Filtering and Dissemination for Behavioral Therapy on Social Network Addictions
Hong-Han Shuai, Yen-Chieh Lien, De-Nian Yang, Yi-Feng Lan, Wang-Chien Lee, Philip S. Yu
Pages: 597-606

While the popularity of online social network (OSN) apps continues to grow, little attention has been drawn to the increasing cases of Social Network Addictions (SNAs). In this paper, we argue that by mining OSN data in support of online intervention ...
Hierarchical Modeling and Shrinkage for User Session Length Prediction in Media Streaming
Antoine Dedieu, Rahul Mazumder, Zhen Zhu, Hossein Vahabi
Pages: 607-616

An important metric of users' satisfaction and engagement within on-line streaming services is the user session length, i.e. the amount of time they spend on a service continuously without interruption. Being able to predict this value directly benefits ...
Question Headline Generation for News Articles
Ruqing Zhang, Jiafeng Guo, Yixing Fan, Yanyan Lan, Jun Xu, Huanhuan Cao, Xueqi Cheng
Pages: 617-626

In this paper, we introduce and tackle the Question Headline Generation (QHG) task. The motivation comes from the investigation of a real-world news portal where we find that news articles with question headlines often receive much higher click-through ...
SESSION: Session 4D: Joint Models for IR
Relevance Estimation with Multiple Information Sources on Search Engine Result Pages
Junqi Zhang, Yiqun Liu, Shaoping Ma, Qi Tian
Pages: 627-636

Relevance estimation is among the most important tasks in the ranking of search results because most search engines follow the Probability Ranking Principle. Current relevance estimation methodologies mainly concentrate on text matching between the query ...
JIM: Joint Influence Modeling for Collective Search Behavior
Shubhra Kanti Karmaker Santu, Liangda Li, Yi Chang, ChengXiang Zhai
Pages: 637-646

Previous work has shown that popular trending events are important external factors which pose significant influence on user search behavior and also provided a way to computationally model this influence. However, their problem formulation was based ...
Retrieve-and-Read: Multi-task Learning of Information Retrieval and Reading Comprehension
Kyosuke Nishida, Itsumi Saito, Atsushi Otsuka, Hisako Asano, Junji Tomita
Pages: 647-656

This study considers the task of machine reading at scale (MRS) wherein, given a question, a system first performs the information retrieval (IR) task of finding relevant passages in a knowledge source and then carries out the reading comprehension (RC) ...
Bug Localization by Learning to Rank and Represent Bug Inducing Changes
Pablo Loyola, Kugamoorthy Gajananan, Fumiko Satoh
Pages: 657-665

In software development, bug localization is the process of finding portions of source code associated with a submitted bug report. This task has been modeled as an information retrieval task at the source code file level, where the report is the query. In this work, ...
SESSION: Session 4E: Recommendation 1
CoNet: Collaborative Cross Networks for Cross-Domain Recommendation
Guangneng Hu, Yu Zhang, Qiang Yang
Pages: 667-676

The cross-domain recommendation technique is an effective way of alleviating the data sparse issue in recommender systems by leveraging the knowledge from relevant domains. Transfer learning is a class of algorithms underlying these techniques. In this ...
PARL: Let Strangers Speak Out What You Like
Libing Wu, Cong Quan, Chenliang Li, Donghong Ji
Pages: 677-686

Review-based methods are among the dominant approaches to addressing the data sparsity problem of recommender systems. However, the performance of most existing review-based methods will degrade when the reviews are also sparse. To this end, we propose a method ...
Regularizing Matrix Factorization with User and Item Embeddings for Recommendation
Thanh Tran, Kyumin Lee, Yiming Liao, Dongwon Lee
Pages: 687-696

Following recent successes in exploiting both latent factor and word embedding models in recommendation, we propose a novel Regularized Multi-Embedding (RME) based recommendation model that simultaneously encapsulates the following ideas via decomposition: ...
Point-of-Interest Recommendation: Exploiting Self-Attentive Autoencoders with Neighbor-Aware Influence
Chen Ma, Yingxue Zhang, Qinglong Wang, Xue Liu
Pages: 697-706

The rapid growth of Location-based Social Networks (LBSNs) provides a great opportunity to satisfy the strong demand for personalized Point-of-Interest (POI) recommendation services. However, with the tremendous increase of users and POIs, POI recommender ...
SESSION: Session 5A: Evaluation in IR 2
Meta-Analysis for Retrieval Experiments Involving Multiple Test Collections
Ian Soboroff
Pages: 713-722

Traditional practice recommends that information retrieval experiments be run over multiple test collections, to support, if not prove, that gains in performance are likely to generalize to other collections or tasks. However, because of the pooling ...
Presentation Ordering Effects On Assessor Agreement
Tadele T. Damessie, J. Shane Culpepper, Jaewon Kim, Falk Scholer
Pages: 723-732

Consistency of relevance judgments is a vital issue for the construction of test collections in information retrieval. As human relevance assessments are costly, and large collections can contain many documents of varying relevance, collecting reliable ...
Understanding Reading Attention Distribution during Relevance Judgement
Xiangsheng Li, Yiqun Liu, Jiaxin Mao, Zexue He, Min Zhang, Shaoping Ma
Pages: 733-742

Reading is a complex cognitive activity in many information retrieval related scenarios, such as relevance judgement and question answering. There exist plenty of works which model these processes as a matching problem, which focuses on how to estimate ...
SESSION: Session 5B: Health and Medical
KAME: Knowledge-based Attention Model for Diagnosis Prediction in Healthcare
Fenglong Ma, Quanzeng You, Houping Xiao, Radha Chitta, Jing Zhou, Jing Gao
Pages: 743-752

The goal of the diagnosis prediction task is to predict the future health information of patients from their historical Electronic Healthcare Records (EHR). The most important and challenging problem of diagnosis prediction is to design an accurate, robust ...
"Let Me Tell You About Your Mental Health!": Contextualized Classification of Reddit Posts to DSM-5 for Web-based Intervention
Manas Gaur, Ugur Kursuncu, Amanuel Alambo, Amit Sheth, Raminta Daniulaityte, Krishnaprasad Thirunarayan, Jyotishman Pathak
Pages: 753-762

Social media platforms are increasingly being used to share and seek advice on mental health issues. In particular, Reddit users freely discuss such issues on various subreddits, whose structure and content can be leveraged to formally interpret and ...
HeteroMed: Heterogeneous Information Network for Medical Diagnosis
Anahita Hosseini, Ting Chen, Wenjun Wu, Yizhou Sun, Majid Sarrafzadeh
Pages: 763-772

With the recent availability of Electronic Health Records (EHR) and great opportunities they offer for advancing medical informatics, there has been growing interest in mining EHR for improving quality of care. Disease diagnosis due to its sensitive ...
SESSION: Session 5C: Machine Learning 1
"Bridge": Enhanced Signed Directed Network Embedding
Yiqi Chen, Tieyun Qian, Huan Liu, Ke Sun
Pages: 773-782

Signed directed networks with positive or negative links convey rich information such as like or dislike, trust or distrust. Existing work of sign prediction mainly focuses on triangles (triadic nodes) motivated by balance theory to predict positive ...
Semi-Supervised Multi-Label Feature Selection by Preserving Feature-Label Space Consistency
Yuanyuan Xu, Jun Wang, Shuai An, Jinmao Wei, Jianhua Ruan
Pages: 783-792

Semi-supervised learning and multi-label learning pose different challenges for feature selection, which is one of the core techniques for dimension reduction, and the exploration of reducing feature space for multi-label learning with incomplete label ...
COPA: Constrained PARAFAC2 for Sparse & Large Datasets
Ardavan Afshar, Ioakeim Perros, Evangelos E. Papalexakis, Elizabeth Searles, Joyce Ho, Jimeng Sun
Pages: 793-802

PARAFAC2 has demonstrated success in modeling irregular tensors, where the tensor dimensions vary across one of the modes. An example scenario is modeling treatments across a set of patients with the varying number of medical encounters over time. Despite ...
SESSION: Session 5D: Similarity 1
The Impact of Name-Matching and Blocking on Author Disambiguation
Tobias Backes
Pages: 803-812

In this work, we address the problem of blocking in the context of author name disambiguation. We describe a framework that formalizes different ways of name-matching to determine which names could potentially refer to the same author. We focus on name ...
Parallel Hashing Using Representative Points in Hyperoctants
Chaomin Shen, Mixue Yu, Chenxiao Zhao, Yaxin Peng, Guixu Zhang
Pages: 813-822

The goal of hashing is to learn a low-dimensional binary representation of high-dimensional information, leading to a tremendous reduction of computational cost. Previous studies usually achieved this goal by applying projection or quantization methods. ...
SESSION: Session 5E: Neural Ranking
PRRE: Personalized Relation Ranking Embedding for Attributed Networks
Sheng Zhou, Hongxia Yang, Xin Wang, Jiajun Bu, Martin Ester, Pinggang Yu, Jianwei Zhang, Can Wang
Pages: 823-832

Attributed network embedding focuses on learning low-dimensional latent representations of nodes which can well preserve the original topological and node attributed proximity at the same time. Existing works usually assume that nodes with similar topology ...
Heterogeneous Neural Attentive Factorization Machine for Rating Prediction
Liang Chen, Yang Liu, Zibin Zheng, Philip Yu
Pages: 833-842

Heterogeneous Information Network (HIN) has been employed in recommender systems to represent heterogeneous types of data, and the meta path has been proposed to capture semantic relationships among objects. When applying HIN to recommendation, there are ...
Recurrent Neural Networks with Top-k Gains for Session-based Recommendations
Balázs Hidasi, Alexandros Karatzoglou
Pages: 843-852

RNNs have been shown to be excellent models for sequential data and in particular for data that is generated by users in a session-based manner. The use of RNNs provides impressive performance benefits over classical methods in session-based recommendations. ...
SESSION: Session 6A: Machine Learning 2
Interactions Modeling in Multi-Task Multi-View Learning with Consistent Task Diversity
Xiaoli Li, Jun Huan
Pages: 853-861

Multi-task Multi-view (MTMV) learning has recently undergone noticeable development for dealing with heterogeneous data. To exploit information from both related tasks and related views, a common strategy is to model task relatedness and view consistency ...
Distinguishing Trajectories from Different Drivers using Incompletely Labeled Trajectories
Tung Kieu, Bin Yang, Chenjuan Guo, Christian S. Jensen
Pages: 863-872

We consider a scenario that occurs often in the auto insurance industry. We are given a large collection of trajectories that stem from many different drivers. Only a small number of the trajectories are labeled with driver identifiers, and only some ...
Modeling Sequential Online Interactive Behaviors with Temporal Point Process
Renqin Cai, Xueying Bai, Zhenrui Wang, Yuling Shi, Parikshit Sondhi, Hongning Wang
Pages: 873-882

The massively available data about user engagement with online information service systems provides a gold mine about users' latent intents. It calls for quantitative user behavior modeling. In this paper, we study the problem by looking into users' ...
SESSION: Session 6B: Knowledge Modelling
Towards Practical Open Knowledge Base Canonicalization
Tien-Hsuan Wu, Zhiyong Wu, Ben Kao, Pengcheng Yin
Pages: 883-892

An Open Information Extraction (OIE) system processes textual data to extract assertions, which are structured data typically represented in the form of (subject; relation; object) triples. An Open Knowledge Base (OKB) is a collection of such assertions. ...
Semantically-Enhanced Topic Modeling
Felipe Viegas, Washington Luiz, Christian Gomes, Amir Khatibi, Sérgio Canuto, Fernando Mourão, Thiago Salles, Leonardo Rocha, Marcos André Gonçalves
Pages: 893-902

In this paper, we advance the state-of-the-art in topic modeling by means of the design and development of a novel (semi-formal) general topic modeling framework. The novel contributions of our solution include: (i) the introduction of new semantically-enhanced ...
METIC: Multi-Instance Entity Typing from Corpus
Bo Xu, Zheng Luo, Luyang Huang, Bin Liang, Yanghua Xiao, Deqing Yang, Wei Wang
Pages: 903-912

This paper addresses the problem of multi-instance entity typing from corpus. Current approaches mainly rely on the structured features (attributes, attribute-value pairs and tags) of the entities. However, their effectiveness is largely dependent ...
SESSION: Session 6C: Graph Learning 2
Semi-supervised Learning on Graphs with Generative Adversarial Nets
Ming Ding, Jie Tang, Jie Zhang
Pages: 913-922

We investigate how generative adversarial nets (GANs) can help semi-supervised learning on graphs. We first provide insights on working principles of adversarial learning over graphs and then present GraphSGAN, a novel approach to semi-supervised learning ...
Mining Frequent Patterns in Evolving Graphs
Cigdem Aslay, Muhammad Anis Uddin Nasir, Gianmarco De Francisci Morales, Aristides Gionis
Pages: 923-932

Given a labeled graph, the frequent-subgraph mining (FSM) problem asks to find all the k-vertex subgraphs that appear with frequency greater than a given threshold. FSM has numerous applications ranging from biology to network science, as it provides ...
Multiresolution Graph Attention Networks for Relevance Matching
Ting Zhang, Bang Liu, Di Niu, Kunfeng Lai, Yu Xu
Pages: 933-942

A large number of deep learning models have been proposed for the text matching problem, which is at the core of various typical natural language processing (NLP) tasks. However, existing deep models are mainly designed for the semantic matching between ...
SESSION: Session 6D: Information Credibility
Rumor Detection with Hierarchical Social Attention Network
Han Guo, Juan Cao, Yazi Zhang, Junbo Guo, Jintao Li
Pages: 943-951

Microblogs have become one of the most popular platforms for news sharing. However, due to its openness and lack of supervision, rumors could also be easily posted and propagated on social networks, which could cause huge panic and threat during its ...
Modeling Users' Exposure with Social Knowledge Influence and Consumption Influence for Recommendation
Jiawei Chen, Yan Feng, Martin Ester, Sheng Zhou, Chun Chen, Can Wang
Pages: 953-962

Users' consumption behaviors are affected by both their personal preference and their exposure to items (i.e. whether a user knows the items). Most of the recent works in social recommendation assume that people share similar preference with their socially ...
Exploring People's Attitudes and Behaviors Toward Careful Information Seeking in Web Search
Takehiro Yamamoto, Yusuke Yamamoto, Sumio Fujita
Pages: 963-972

This study investigates how people carefully search the Web to obtain credible and accurate information. The goal of this study is to better understand people's attitudes toward careful information seeking via Web search, and the relationship between ...
SESSION: Session 6E: Text Classification
Dataless Text Classification: A Topic Modeling Approach with Document Manifold
Ximing Li, Changchun Li, Jinjin Chi, Jihong Ouyang, Chenliang Li
Pages: 973-982

Recently, dataless text classification has attracted increasing attention. It trains a classifier using seed words of categories, rather than labeled documents that are expensive to obtain. However, a small set of seed words may provide very limited ...
Weakly-Supervised Neural Text Classification
Yu Meng, Jiaming Shen, Chao Zhang, Jiawei Han
Pages: 983-992

Deep neural networks are gaining increasing popularity for the classic text classification task, due to their strong expressive power and less requirement for feature engineering. Despite such attractiveness, neural text classification models suffer ...
Creating Scoring Rubric from Representative Student Answers for Improved Short Answer Grading
Smit Marvaniya, Swarnadeep Saha, Tejas I. Dhamecha, Peter Foltz, Renuka Sindhgatta, Bikram Sengupta
Pages: 993-1002

Automatic short answer grading remains one of the key challenges of any dialog-based tutoring system due to the variability in the student answers. Typically, each question may have no or few expert authored exemplary answers which make it difficult ...
SESSION: Session 7A: Social Data Analytics 2
Probabilistic Causal Analysis of Social Influence
Francesco Bonchi, Francesco Gullo, Bud Mishra, Daniele Ramazzotti
Pages: 1003-1012

Mastering the dynamics of social influence requires separating, in a database of information propagation traces, the genuine causal processes from temporal correlation, i.e., homophily and other spurious causes. However, most studies to characterize ...
Adversarial Learning of Answer-Related Representation for Visual Question Answering
Yun Liu, Xiaoming Zhang, Feiran Huang, Zhoujun Li
Pages: 1013-1022

Visual Question Answering (VQA) aims to learn a joint embedding of the question sentence and the corresponding image to infer the answer. Existing approaches that learn the joint embedding don't consider the answer-related information, which results in that ...
Stochastic Coupon Probing in Social Networks
Shaojie Tang
Pages: 1023-1031

CMO Council reports that 71% of internet users in the U.S. were influenced by coupons and discounts when making their purchase decisions. It has also been shown that offering coupons to a small fraction of users may affect the purchase decisions of many ...
Type Prediction Combining Linked Open Data and Social Media
Yaroslav Nechaev, Francesco Corcoglioniti, Claudio Giuliano
Pages: 1033-1042

Linked Open Data (LOD) and social media often contain the representations of the same real-world entities, such as persons and organizations. These representations are increasingly interlinked, making it possible to combine and leverage both LOD and ...
SESSION: Session 7B: Stream Analytics 2
Efficient and Reliable Estimation of Cell Positions
Mirela T. Cazzolato, Agma J. M. Traina, Klemens Böhm
Pages: 1043-1052

Sequences of microscopic images feature the dynamics of developing embryos. Automatically tracking the cells from such sequences of images allows understanding the dynamics which a living element demands to know its cells movement, which ideally should ...
On Real-time Detecting Passenger Flow Anomalies
Bo Tang, Hongyin Tang, Xinzhou Dong, Beihong Jin, Tingjian Ge
Pages: 1053-1062

In large and medium-sized cities, detecting unusual changes of crowds of people on the streets is needed for public security, transportation management, emergency control, and terrorism prevention. As public transportation has the capability to bring ...
A Novel Online Stacked Ensemble for Multi-Label Stream Classification
Alican Büyükçakir, Hamed Bonab, Fazli Can
Pages: 1063-1072

As data streams become more prevalent, the necessity for online algorithms that mine this transient and dynamic data becomes clearer. Multi-label data stream classification is a supervised learning problem where each instance in the data stream is classified ...
RESTFul: Resolution-Aware Forecasting of Behavioral Time Series Data
Xian Wu, Baoxu Shi, Yuxiao Dong, Chao Huang, Louis Faust, Nitesh V. Chawla
Pages: 1073-1082

Leveraging historical behavioral data (e.g., sales volume and email communication) for future prediction is of fundamental importance for practical domains ranging from sales to temporal link prediction. Current forecasting approaches often use only ...
SESSION: Session 7C: Hardware Approaches
Zoom-SVD: Fast and Memory Efficient Method for Extracting Key Patterns in an Arbitrary Time Range
Jun-Gi Jang, Dongjin Choi, Jinhong Jung, U Kang
Pages: 1083-1092

Given multiple time series data, how can we efficiently find latent patterns in an arbitrary time range? Singular value decomposition (SVD) is a crucial tool to discover hidden factors in multiple time series data, and has been used in many data mining ...
Disk-based Matrix Completion for Memory Limited Devices
Dongha Lee, Jinoh Oh, Christos Faloutsos, Byungju Kim, Hwanjo Yu
Pages: 1093-1102

More and more data need to be processed or analyzed within mobile devices for efficiency or privacy reasons, but performing machine learning tasks with large data within the devices is challenging because of their limited memory resources. For this reason, ...
Naive Parallelization of Coordinate Descent Methods and an Application on Multi-core L1-regularized Classification
Yong Zhuang, Yuchin Juan, Guo-Xun Yuan, Chih-Jen Lin
Pages: 1103-1112

It is well known that a direct parallelization of sequential optimization methods (e.g., coordinate descent and stochastic gradient methods) is often not effective. The reason is that at each iteration, the number of operations may be too small. We point ...
CUSNTF: A Scalable Sparse Non-negative Tensor Factorization Model for Large-scale Industrial Applications on Multi-GPU
Hao Li, Kenli Li, Jiyao An, Keqin Li
Pages: 1113-1122

Given a high-order, large-scale and sparse data from big data and industrial applications, how can we acquire useful patterns in a real-time and low memory overhead manner? Sparse Non-negative tensor factorization (SNTF) possesses high-order representation, ...
SESSION: Session 7D: Recommendation 2
DiVE: Diversifying View Recommendation for Visual Data Exploration
Rischan Mafrur, Mohamed A. Sharaf, Hina A. Khan
Pages: 1123-1132

To support effective data exploration, there has been a growing interest in developing solutions that can automatically recommend data visualizations that reveal interesting and useful data-driven insights. In such solutions, a large number of possible ...
Representing and Recommending Shopping Baskets with Complementarity, Compatibility and Loyalty
Mengting Wan, Di Wang, Jie Liu, Paul Bennett, Julian McAuley
Pages: 1133-1142

We study the problem of representing and recommending products for grocery shopping. We carefully investigate grocery transaction data and observe three important patterns: products within the same basket complement each other in terms of functionality ...
Recommendation Through Mixtures of Heterogeneous Item Relationships
Wang-Cheng Kang, Mengting Wan, Julian McAuley
Pages: 1143-1152

Recommender Systems have proliferated as general-purpose approaches to model a wide variety of consumer interaction data. Specific instances make use of signals ranging from user feedback, item relationships, geographic locality, social influence (etc.). ...
Fairness-Aware Tensor-Based Recommendation
Ziwei Zhu, Xia Hu, James Caverlee
Pages: 1153-1162

Tensor-based methods have shown promise in improving upon traditional matrix factorization methods for recommender systems. But tensors may achieve improved recommendation quality while worsening the fairness of the recommendations. Hence, we propose ...
SESSION: Session 7E: Interactive IR 2
Generating Keyword Queries for Natural Language Queries to Alleviate Lexical Chasm Problem
Xiaoyu Liu, Shunda Pan, Qi Zhang, Yu-Gang Jiang, Xuanjing Huang
Pages: 1163-1172

In recent years, the task of reformulating natural language queries has received considerable attention from both industry and academic communities. Because of the lexical chasm problem between natural language queries and web documents, if we directly ...
Attentive Neural Architecture for Ad-hoc Structured Document Retrieval
Saeid Balaneshinkordan, Alexander Kotov, Fedor Nikolaev
Pages: 1173-1182

The problem of ad-hoc structured document retrieval arises in many information access scenarios, from Web to product search. Yet neither deep neural networks, which have been successfully applied to ad-hoc information retrieval and Web search, nor the ...
Measuring User Satisfaction on Smart Speaker Intelligent Assistants Using Intent Sensitive Query Embeddings
Seyyed Hadi Hashemi, Kyle Williams, Ahmed El Kholy, Imed Zitouni, Paul A. Crook
Pages: 1183-1192

Intelligent assistants are increasingly being used on smart speaker devices, such as Amazon Echo, Google Home, Apple HomePod, and Harman Kardon Invoke with Cortana. Typically, user satisfaction measurement relies on user interaction signals, such as ...
Impact of Domain and User's Learning Phase on Task and Session Identification in Smart Speaker Intelligent Assistants
Seyyed Hadi Hashemi, Kyle Williams, Ahmed El Kholy, Imed Zitouni, Paul A. Crook
Pages: 1193-1202

Task and session identification is a key element of system evaluation and user behavior modeling in Intelligent Assistant (IA) systems. However, identifying tasks and sessions for IAs is challenging due to the multi-task nature of IAs and the differences ...
SESSION: Session 8A: Similarity 2
Engineering a Simplified 0-Bit Consistent Weighted Sampling
Edward Raff, Jared Sylvester, Charles Nicholas
Pages: 1203-1212

The Min-Hashing approach to sketching has become an important tool in data analysis, information retrieval, and classification. To apply it to real-valued datasets, the ICWS algorithm has become a seminal approach that is widely used, and provides state-of-the-art ...
A Scalable Algorithm for Higher-order Features Generation using MinHash
Pooja A, Naveen Nair, Rajeev Rastogi
Pages: 1213-1222

Linear models have been widely used in the industry for their low computation time, small memory footprint and interpretability. However, linear models are not capable of leveraging non-linear feature interactions in predicting the target. This limits ...
Multiperspective Graph-Theoretic Similarity Measure
Dung D. Le, Hady W. Lauw
Pages: 1223-1232

Determining the similarity between two objects is pertinent to many applications. When the basis for similarity is a set of object-to-object relationships, it is natural to rely on graph-theoretic measures. One seminal technique for measuring the structural-context ...
SESSION: Session 8B: Crowdsourcing
Explicit Preference Elicitation for Task Completion Time
Mohammadreza Esfandiari, Senjuti Basu Roy, Sihem Amer-Yahia
Pages: 1233-1242

Current crowdsourcing platforms provide little support for worker feedback. Workers are sometimes invited to post free text describing their experience and preferences in completing tasks. They can also use forums such as Turker Nation to ...
Network-wide Crowd Flow Prediction of Sydney Trains via Customized Online Non-negative Matrix Factorization
Yongshun Gong, Zhibin Li, Jian Zhang, Wei Liu, Yu Zheng, Christina Kirsch
Pages: 1243-1252

Crowd Flow Prediction (CFP) is one major challenge in the intelligent transportation systems of the Sydney Trains Network. However, most advanced CFP methods only focus on entrance and exit flows at the major stations or a few subway lines, neglecting ...
Studying Topical Relevance with Evidence-based Crowdsourcing
Oana Inel, Giannis Haralabopoulos, Dan Li, Christophe Van Gysel, Zoltán Szlávik, Elena Simperl, Evangelos Kanoulas, Lora Aroyo
Pages: 1253-1262

Information Retrieval systems rely on large test collections to measure their effectiveness in retrieving relevant documents. While the demand is high, the task of creating such test collections is laborious due to the large amounts of data that need ...
SESSION: Session 8C: Privacy
Randomized Bit Vector: Privacy-Preserving Encoding Mechanism
Lin Sun, Lan Zhang, Xiaojun Ye
Pages: 1263-1272

Recently, many methods have been proposed to prevent privacy leakage in record linkage by encoding record pair data into another anonymous space. Nevertheless, they cannot perform well in some circumstances due to high computational complexities, low ...
Privacy Protection for Flexible Parametric Survival Models
Thông T. Nguyễn, Siu Cheung Hui
Pages: 1273-1282

Data privacy is a major concern in modern society. In this work, we propose two solutions to the privacy-preserving problem of regression models on medical data. We focus on flexible parametric models which are powerful alternatives to the well-known ...
Privacy-Preserving Triangle Counting in Large Graphs
Xiaofeng Ding, Xiaodong Zhang, Zhifeng Bao, Hai Jin
Pages: 1283-1292

Triangle count is a critical parameter in mining relationships among people in social networks. However, directly publishing the findings obtained from triangle counts may bring potential privacy concern, which raises great challenges and opportunities ...
SESSION: Session 8D: Ranking
Differentiable Unbiased Online Learning to Rank
Harrie Oosterhuis, Maarten de Rijke
Pages: 1293-1302

Online Learning to Rank (OLTR) methods optimize rankers based on user interactions. State-of-the-art OLTR methods are built specifically for linear models. Their approaches do not extend well to non-linear models such as neural networks. We introduce ...
A Quantum Many-body Wave Function Inspired Language Modeling Approach
Peng Zhang, Zhan Su, Lipeng Zhang, Benyou Wang, Dawei Song
Pages: 1303-1312

The recently proposed quantum language model (QLM) aimed at a principled approach to modeling term dependency by applying the quantum probability theory. The latest development for a more effective QLM has adopted word embeddings as a kind of global ...
The LambdaLoss Framework for Ranking Metric Optimization
Xuanhui Wang, Cheng Li, Nadav Golbandi, Michael Bendersky, Marc Najork
Pages: 1313-1322

How to optimize ranking metrics such as Normalized Discounted Cumulative Gain (NDCG) is an important but challenging problem, because ranking metrics are either flat or discontinuous everywhere, which makes them hard to be optimized directly. Among existing ...
SESSION: Session 8E: Optimization
PolyHJ: A Polymorphic Main-Memory Hash Join Paradigm for Multi-Core Machines
Omar Khattab, Mohammad Hammoud, Omar Shekfeh
Pages: 1323-1332

Relational join is a central data management operation that influences the performance of almost every database query. In this paper, we show that different input features and hardware settings necessitate different main-memory hash join models. Subsequently, ...
When Optimizer Chooses Table Scans: How to Make Them More Responsive
Lijian Wan, Tingjian Ge
Pages: 1333-1342
Full text: PDF

Recent studies show that table scans are increasingly more common than using secondary indices. Given that the optimizer may choose table scans when the selectivity is as low as 0.5% with large data, it is important to make initial query results faster ...
Construction of Efficient V-Gram Dictionary for Sequential Data Analysis
Igor Kuralenok, Natalia Starikova, Aleksandr Khvorov, Julian Serdyuk
Pages: 1343-1352
Full text: PDF

This paper presents a new method for constructing an optimal feature set from sequential data. It creates a dictionary of n-grams of variable length (we call them v-grams), based on the minimum description length principle. The proposed method is a dictionary ...
SESSION: Session 9A: Collaborative Ranking
Neural Collaborative Ranking
Bo Song, Xin Yang, Yi Cao, Congfu Xu
Pages: 1353-1362
Full text: PDF

Recommender systems are aimed at generating a personalized ranked list of items that an end user might be interested in. With the unprecedented success of deep learning in computer vision and speech recognition, recently it has been a hot topic to bridge ...
Collaborative Multi-objective Ranking
Jun Hu, Ping Li
Pages: 1363-1372
Full text: PDF

This paper proposes to jointly resolve row-wise and column-wise ranking problems when an explicit rating matrix is given. The row-wise ranking problem, also known as personalized ranking, aims to build user-specific models such that the correct order ...
SESSION: Session 9B: IR Applications
Mix 'n Match: Integrating Text Matching and Product Substitutability within Product Search
Christophe Van Gysel, Maarten de Rijke, Evangelos Kanoulas
Pages: 1373-1382
Full text: PDF

Two products are substitutes if both can satisfy the same consumer need. Intrinsic incorporation of product substitutability - where substitutability is integrated within latent vector space models - is in contrast to the extrinsic re-ranking of result ...
In Situ and Context-Aware Target Apps Selection for Unified Mobile Search
Mohammad Aliannejadi, Hamed Zamani, Fabio Crestani, W. Bruce Croft
Pages: 1383-1392
Full text: PDF

With the recent growth in the use of conversational systems and intelligent assistants such as Google Assistant and Microsoft Cortana, mobile devices are becoming even more pervasive in our lives. As a consequence, users are getting engaged with mobile ...
Deep Autoencoder-like Nonnegative Matrix Factorization for Community Detection
Fanghua Ye, Chuan Chen, Zibin Zheng
Pages: 1393-1402
Full text: PDF

Community structure is ubiquitous in real-world complex networks. The task of community detection over these networks is of paramount importance in a variety of applications. Recently, nonnegative matrix factorization (NMF) has been widely adopted for ...
SESSION: Session 9C: Neural Prediction
Explicit State Tracking with Semi-Supervision for Neural Dialogue Generation
Xisen Jin, Wenqiang Lei, Zhaochun Ren, Hongshen Chen, Shangsong Liang, Yihong Zhao, Dawei Yin
Pages: 1403-1412
Full text: PDF

The task of dialogue generation aims to automatically provide responses given previous utterances. Tracking dialogue states is an important ingredient in dialogue generation for estimating users' intention. However, the expensive nature of state labeling ...
On Prediction of User Destination by Sub-Trajectory Understanding: A Deep Learning based Approach
Jing Zhao, Jiajie Xu, Rui Zhou, Pengpeng Zhao, Chengfei Liu, Feng Zhu
Pages: 1413-1422
Full text: PDF

Destination prediction is known as an important problem for many location based services (LBSs). Existing solutions generally apply probabilistic models to predict destinations over a sub-trajectory, but their accuracies in fine-granularity prediction ...
DeepCrime: Attentive Hierarchical Recurrent Networks for Crime Prediction
Chao Huang, Junbo Zhang, Yu Zheng, Nitesh V. Chawla
Pages: 1423-1432
Full text: PDF

As urban crimes (e.g., burglary and robbery) negatively impact our everyday life and must be addressed in a timely manner, predicting crime occurrences is of great importance for public safety and urban sustainability. However, existing methods do not ...
SESSION: Session 9D: Advertising
Learning Multi-touch Conversion Attribution with Dual-attention Mechanisms for Online Advertising
Kan Ren, Yuchen Fang, Weinan Zhang, Shuhao Liu, Jiajun Li, Ya Zhang, Yong Yu, Jun Wang
Pages: 1433-1442
Full text: PDF

In online advertising, Internet users may be exposed to a sequence of different ad campaigns, i.e., display ads, search, or referrals from multiple channels, before being led to any final sales conversion and transaction. For both campaigners and publishers, ...
Budget Constrained Bidding by Model-free Reinforcement Learning in Display Advertising
Di Wu, Xiujun Chen, Xun Yang, Hao Wang, Qing Tan, Xiaoxun Zhang, Jian Xu, Kun Gai
Pages: 1443-1451
Full text: PDF

Real-time bidding (RTB) is an important mechanism in online display advertising, where a proper bid for each page view plays an essential role for good marketing results. Budget constrained bidding is a typical scenario in RTB where the advertisers hope ...
SESSION: Session 9E: Image Similarity
Deep Semantic Hashing with Multi-Adversarial Training
Bingning Wang, Kang Liu, Jun Zhao
Pages: 1453-1462
Full text: PDF

As the amount of data has grown rapidly over recent decades, binary hashing has become an attractive approach for fast search over large databases, in which high-dimensional data such as images, video or text is mapped into a low-dimensional ...
Communication-Efficient Distributed Deep Metric Learning with Hybrid Synchronization
Yuxin Su, Michael Lyu, Irwin King
Pages: 1463-1472
Full text: PDF

Deep metric learning is widely used in extreme classification and image retrieval because of its powerful ability to learn the semantic low-dimensional embedding of high-dimensional data. However, the heavy computational cost of mining valuable pair ...
SESSION: Short Papers
A Dynamical System on Bipartite Graphs
Kishore Papineni, Pratik Worah
Pages: 1479-1482
Full text: PDF

This paper poses a non-linear dynamical system on bipartite graphs and shows its stability under certain conditions. The dynamical system changes the weights on the nodes of the graph in each time step. The underlying weight transformation is non-linear, ...
Abnormal Event Detection via Heterogeneous Information Network Embedding
Shaohua Fan, Chuan Shi, Xiao Wang
Pages: 1483-1486
Full text: PDF

Heterogeneous information networks (HINs) are ubiquitous in the real world, and discovering abnormal events plays an important role in understanding and analyzing an HIN. The abnormal event usually implies that the number of co-occurrences of entities ...
AceKG: A Large-scale Knowledge Graph for Academic Data Mining
Ruijie Wang, Yuchen Yan, Jialu Wang, Yuting Jia, Ye Zhang, Weinan Zhang, Xinbing Wang
Pages: 1487-1490
Full text: PDF

Most existing knowledge graphs (KGs) in academic domains suffer from problems of insufficient multi-relational information, name ambiguity and improper data format for large-scale machine processing. In this paper, we present AceKG, a new large-scale ...
An Adversarial Approach to Improve Long-Tail Performance in Neural Collaborative Filtering
Adit Krishnan, Ashish Sharma, Aravind Sankar, Hari Sundaram
Pages: 1491-1494
Full text: PDF

In recent times, deep neural networks have found success in Collaborative Filtering (CF) based recommendation tasks. By parametrizing latent factor interactions of users and items with neural architectures, they achieve significant gains in scalability ...
AQuPR: Attention based Query Passage Retrieval
Parth Pathak, Mithun Das Gupta, Niranjan Nayak, Harsh Kohli
Pages: 1495-1498
Full text: PDF

Search queries issued over the Web increasingly look like questions, especially as the domain becomes more specific. Finding a good response to such queries amounts to finding relevant passages from Web documents. Traditional information retrieval based ...
Attentive Encoder-based Extractive Text Summarization
Chong Feng, Fei Cai, Honghui Chen, Maarten de Rijke
Pages: 1499-1502
Full text: PDF

In previous work on text summarization, encoder-decoder architectures and attention mechanisms have both been widely used. Attention-based encoder-decoder approaches typically focus on taking the sentences preceding a given sentence in a document into ...
Calibration: A Simple Way to Improve Click Models
Alexey Borisov, Julia Kiseleva, Ilya Markov, Maarten de Rijke
Pages: 1503-1506
Full text: PDF

We show that click models trained with suboptimal hyperparameters suffer from the issue of bad calibration. This means that their predicted click probabilities do not agree with the observed proportions of clicks in the held-out data. To repair this ...
Can User Behaviour Sequences Reflect Perceived Novelty?
Mengdie Zhuang, Elaine G. Toms, Gianluca Demartini
Pages: 1507-1510
Full text: PDF

Serendipity is highly valued as a process for developing original solutions to problems and for innovation. However, it is difficult to capture and thus difficult to measure, but novelty is a key and critical indicator. In this work, we investigate the ...
Causal Dependencies for Future Interest Prediction on Twitter
Negar Arabzadeh, Hossein Fani, Fattane Zarrinkalam, Ahmed Navivala, Ebrahim Bagheri
Pages: 1511-1514
Full text: PDF

The accurate prediction of users' future topics of interests on social networks can facilitate content recommendation and platform engagement. However, researchers have found that future interest prediction, especially on social networks such as Twitter, ...
Challenges of Multileaved Comparison in Practice: Lessons from NTCIR-13 OpenLiveQ Task
Makoto P. Kato, Tomohiro Manabe, Sumio Fujita, Akiomi Nishida, Takehiro Yamamoto
Pages: 1515-1518
Full text: PDF

This paper discusses challenges of an online evaluation technique, multileaved comparison, based on the analysis of evaluation results in a community question-answering (cQA) search service. NTCIR-13 OpenLiveQ task offered a shared task in which participants ...
Compiling Questions into Balanced Quizzes about Documents
Cristina Menghini, Jessica Dehler Zufferey, Robert West
Pages: 1519-1522
Full text: PDF

In the educational framework, knowledge assessment is a critical component, and quizzes (sets of questions with concise answers) are a popular tool for this purpose. This paper focuses on the generation of balanced quizzes, i.e., quizzes that relate ...
Continuation Methods and Curriculum Learning for Learning to Rank
Nicola Ferro, Claudio Lucchese, Maria Maistro, Raffaele Perego
Pages: 1523-1526
Full text: PDF

In this paper we explore the use of Continuation Methods and Curriculum Learning techniques in the area of Learning to Rank. The basic idea is to design the training process as a learning path across increasingly complex training instances and objective ...
Correlated Time Series Forecasting using Multi-Task Deep Neural Networks
Razvan-Gabriel Cirstea, Darius-Valer Micu, Gabriel-Marcel Muresan, Chenjuan Guo, Bin Yang
Pages: 1527-1530
Full text: PDF

Cyber-physical systems often consist of entities that interact with each other over time. Meanwhile, as part of the continued digitization of industrial processes, various sensor technologies are deployed that enable us to record time-varying attributes ...
Cross-domain Aspect/Sentiment-aware Abstractive Review Summarization
Min Yang, Qiang Qu, Jia Zhu, Ying Shen, Zhou Zhao
Pages: 1531-1534
Full text: PDF

This study takes the lead in studying aspect/sentiment-aware abstractive review summarization in a domain adaptation scenario. The proposed model CASAS (neural attentive model for Cross-domain Aspect/Sentiment-aware Abstractive review Summarization) leverages ...
Data Structure for Efficient Line of Sight Queries
Swapnil Gaikwad, Melody Moh, David C. Anastasiu
Pages: 1535-1538
Full text: PDF

Given the great amounts of data being transmitted between devices in the 21st century, existing channels of wireless communication are getting congested. In the wireless space, the focus up to now has been on the microwave frequency range. ...
Detecting Parkinson's Disease from Interactions with a Search Engine: Is Expert Knowledge Sufficient?
Liron Allerhand, Brit Youngmann, Elad Yom-Tov, David Arkadir
Pages: 1539-1542
Full text: PDF

Parkinson's disease (PD) is a slowly progressing neurodegenerative disease with early manifestation of motor signs. Recently, there has been a growing interest in developing automatic tools that can assess motor function in PD patients. Here we show ...
DualBoost: Handling Missing Values with Feature Weights and Weak Classifiers that Abstain
Weihong Wang, Jie Xu, Yang Wang, Chen Cai, Fang Chen
Pages: 1543-1546
Full text: PDF

Missing values in real-world datasets are a common issue. Handling missing values is one of the key aspects of data mining, as it can seriously impact the performance of predictive models. In this paper we propose a unified Boosting framework that ...
An Effective Approach for Modelling Time Features for Classifying Bursty Topics on Twitter
Anjie Fang, Iadh Ounis, Craig MacDonald, Philip Habel, Xiaoyu Xiong, Hai-Tao Yu
Pages: 1547-1550
Full text: PDF

Several previous approaches attempted to predict bursty topics on Twitter. Such approaches have usually reported that the time information (e.g. the topic popularity over time) of hashtag topics contributes the most to the prediction of bursty topics. ...
Efficient and Effective Query Expansion for Web Search
Claudio Lucchese, Franco Maria Nardini, Raffaele Perego, Roberto Trani, Rossano Venturini
Pages: 1551-1554
Full text: PDF

Query Expansion (QE) techniques expand the user queries with additional terms, e.g., synonyms and acronyms, to enhance the system recall. State-of-the-art solutions employ machine learning methods to select the most suitable terms. However, most of them ...
Efficient Energy Management in Distributed Web Search
Matteo Catena, Ophir Frieder, Nicola Tonellotto
Pages: 1555-1558
Full text: PDF

Distributed Web search engines (WSEs) require warehouse-scale computers to deal with the ever-increasing size of the Web and the large amount of user queries they daily receive. The energy consumption of this infrastructure has a major impact on the ...
Efficient Pipeline Processing of Crowdsourcing Workflows
Ken Mizusawa, Keishi Tajima, Masaki Matsubara, Toshiyuki Amagasa, Atsuyuki Morishima
Pages: 1559-1562
Full text: PDF

This paper addresses the pipeline processing of sequential workflows in crowdsourcing. Sequential workflows consisting of several subtasks are ubiquitous in crowdsourcing. Our approach is to control the budget distribution to subtasks in order to balance ...
Efficient Taxonomic Similarity Joins with Adaptive Overlap Constraint
Pengfei Xu, Jiaheng Lu
Pages: 1563-1566
Full text: PDF

A similarity join aims to find all similar pairs between two collections of records. Established approaches usually deal with synthetic differences like typos and abbreviations, but neglect the semantic relations between words. Such relations, however, ...
Embedding Fuzzy K-Means with Nonnegative Spectral Clustering via Incorporating Side Information
Muhan Guo, Rui Zhang, Feiping Nie, Xuelong Li
Pages: 1567-1570
Full text: PDF

As one of the most widely used clustering techniques, fuzzy K-Means (also called FKM or FCM) assigns every data point to each cluster with a certain degree of membership. However, the conventional FKM approach relies on the square data fitting term which ...
Empirical Evidence for Search Effectiveness Models
Alfan Farizki Wicaksono, Alistair Moffat
Pages: 1571-1574
Full text: PDF

Given a SERP in response to a user-originated query, Moffat et al. (CIKM 2013; TOIS 2017) suggest that C(i), the conditional continuation probability of the user examining the (i+1)st element presented in the SERP, given that they are known to have examined ...
An Encoder-Memory-Decoder Framework for Sub-Event Detection in Social Media
Guandan Chen, Nan Xu, Weiji Mao
Pages: 1575-1578
Full text: PDF

Sub-event detection can help faster and deeper understanding of an event by providing human-friendly clusters, and thus has become an important research topic in Web mining and knowledge management. In existing sub-event detection methods, clustering ...
Enhanced Network Embeddings via Exploiting Edge Labels
Haochen Chen, Xiaofei Sun, Yingtao Tian, Bryan Perozzi, Muhao Chen, Steven Skiena
Pages: 1579-1582
Full text: PDF

Network embedding methods aim at learning low-dimensional latent representation of nodes in a network. While achieving competitive performance on a variety of network inference tasks such as node classification and link prediction, these methods treat ...
Enhancing Graph Kernels via Successive Embeddings
Giannis Nikolentzos, Michalis Vazirgiannis
Pages: 1583-1586
Full text: PDF

Graph kernels have recently emerged as a promising approach to perform machine learning on graph-structured data. A graph kernel implicitly embeds graphs in a Hilbert space and computes the inner product between these representations. However, the inner ...
Estimating Clickthrough Bias in the Cascade Model
Praveen Chandar, Ben Carterette
Pages: 1587-1590
Full text: PDF

Recently, there has been considerable interest in the use of historical logged user interaction data—queries and clicks—for evaluation of search systems in the context of counterfactual analysis [8,10]. Recent approaches attempt to de-bias ...
Exploring Neural Translation Models for Cross-Lingual Text Similarity
Kazuhiro Seki
Pages: 1591-1594
Full text: PDF

This paper explores a neural network-based approach to computing similarity of two texts written in different languages. Such similarity can be useful for a variety of applications including cross-lingual information retrieval and cross-lingual text ...
Extracting Figures and Captions from Scientific Publications
Pengyuan Li, Xiangying Jiang, Hagit Shatkay
Pages: 1595-1598
Full text: PDF

Figures and captions convey essential information in scientific publications. As such, there is a growing interest in mining published figures and in utilizing their respective captions as a source of knowledge. There is also much interest in image captioning ...
FactCheck: Validating RDF Triples Using Textual Evidence
Zafar Habeeb Syed, Michael Röder, Axel-Cyrille Ngonga Ngomo
Pages: 1599-1602
Full text: PDF

With the increasing uptake of knowledge graphs comes an increasing need for validating the knowledge contained in these graphs. However, the sheer size and number of knowledge bases used in real-world applications makes manual fact checking impractical. ...
Hierarchical Complementary Attention Network for Predicting Stock Price Movements with News
Qikai Liu, Xiang Cheng, Sen Su, Shuguang Zhu
Pages: 1603-1606
Full text: PDF

It has been shown that stock price movements are influenced by news. To predict stock movements with news, many existing works rely only on the news title since the news content may contain irrelevancies which seriously degrade the prediction accuracy. ...
Holistic Crowd-Powered Sorting via AID: Optimizing for Accuracies, Inconsistencies, and Difficulties
Shreya Rajpal, Aditya Parameswaran
Pages: 1607-1610
Full text: PDF

We revisit the fundamental problem of sorting objects using crowdsourced pairwise comparisons. Prior work either treats these comparisons as independent tasks -- in which case the resulting judgments may end up being inconsistent, or fails to capture ...
Homepage Augmentation by Predicting Links in Heterogenous Networks
Jianming Lv, Jiajie Zhong, Weihang Chen, Qinzhe Xiao, Zhenguo Yang, Qing Li
Pages: 1611-1614
Full text: PDF

Scholars' homepages are important places to show personal research interest and academic achievement through the Web. However, according to our observation, only a small portion of scholars update their publications and related events on their homepages ...
How Consistent is Relevance Feedback in Exploratory Search?
Alan Medlar, Dorota Glowacka
Pages: 1615-1618
Full text: PDF

Search activities involving knowledge acquisition, investigation and synthesis are collectively known as exploratory search. Exploratory search is challenging for users, who may be unable to formulate search queries, have ill-defined search goals or ...
HRAM: A Hybrid Recurrent Attention Machine for News Recommendation
Dhruv Khattar, Vaibhav Kumar, Vasudeva Varma, Manish Gupta
Pages: 1619-1622
Full text: PDF

Popular methods for news recommendation which are based on collaborative filtering and content-based filtering have multiple drawbacks. The former method does not account for the sequential nature of news reading and suffers from the problem of cold-start, ...
A Hybrid Approach for Automatic Model Recommendation
Roman Vainshtein, Asnat Greenstein-Messica, Gilad Katz, Bracha Shapira, Lior Rokach
Pages: 1623-1626
Full text: PDF

One of the challenges of automating machine learning applications is the automatic selection of an algorithmic model for a given problem. We present AutoDi, a novel and resource-efficient approach for model selection. Our approach combines two sources ...
Hybrid Deep Sequential Modeling for Social Text-Driven Stock Prediction
Huizhe Wu, Wei Zhang, Weiwei Shen, Jun Wang
Pages: 1627-1630
Full text: PDF

Beyond considering only stocks' price series, utilizing short and instant texts from social media such as Twitter has the potential to yield better stock market prediction. While some previous approaches have explored this direction, their results ...
Imbalanced Sentiment Classification with Multi-Task Learning
Fangzhao Wu, Chuhan Wu, Junxin Liu
Pages: 1631-1634
Full text: PDF

Supervised learning methods are widely used in sentiment classification. However, when sentiment distribution is imbalanced, the performance of these methods declines. In this paper, we propose an effective approach for imbalanced sentiment classification. ...
Impact of Document Representation on Neural Ad hoc Retrieval
Ebrahim Bagheri, Faezeh Ensan, Feras Al-Obeidat
Pages: 1635-1638
Full text: PDF

Neural embeddings have been effectively integrated into information retrieval tasks including ad hoc retrieval. One of the benefits of neural embeddings is they allow for the calculation of the similarity between queries and documents through vector ...
Implementation Notes for the Soft Cosine Measure
Vít Novotný
Pages: 1639-1642
Full text: PDF

The standard bag-of-words vector space model (VSM) is efficient, and ubiquitous in information retrieval, but it underestimates the similarity of documents with the same meaning, but different terminology. To overcome this limitation, Sidorov et al. ...
Improve Network Embeddings with Regularization
Yi Zhang, Jianguo Lu, Ofer Shai
Pages: 1643-1646
Full text: PDF

Learning network representations is essential for many downstream tasks such as node classification, link prediction, and recommendation. Many algorithms derived from SGNS (skip-gram with negative sampling) have been proposed, such as LINE, DeepWalk, ...
Improved and Robust Controversy Detection in General Web Pages Using Semantic Approaches under Large Scale Conditions
Jasper Linmans, Bob van de Velde, Evangelos Kanoulas
Pages: 1647-1650
Full text: PDF

Detecting controversy in general web pages is a daunting task, but increasingly essential to efficiently moderate discussions and effectively filter problematic content. Unfortunately, controversies occur across many topics and domains, with great changes ...
Improving Low-Rank Matrix Completion with Self-Expressiveness
Minsu Kwon, Han-Gyu Kim, Ho-Jin Choi
Pages: 1651-1654
Full text: PDF

In this paper, we improve the low-rank matrix completion algorithm by assuming that the data points lie in a union of low dimensional subspaces. We applied the self-expressiveness, which is a property of a dataset when the data points lie in a union ...
Incorporating Corporation Relationship via Graph Convolutional Neural Networks for Stock Price Prediction
Yingmei Chen, Zhongyu Wei, Xuanjing Huang
Pages: 1655-1658
Full text: PDF

In this paper, we propose to incorporate information of related corporations of a target company for its stock price prediction. We first construct a graph including all involved corporations based on investment facts from real market and learn a distributed ...
IntentsKB: A Knowledge Base of Entity-Oriented Search Intents
Darío Garigliotti, Krisztian Balog
Pages: 1659-1662
Full text: PDF

We address the problem of constructing a knowledge base of entity-oriented search intents. Search intents are defined on the level of entity types, each comprising a high-level intent category (property, website, service, or other), along with a cluster ...
Joint Dictionary Learning and Semantic Constrained Latent Subspace Projection for Cross-Modal Retrieval
Jianlong Wu, Zhouchen Lin, Hongbin Zha
Pages: 1663-1666
Full text: PDF

With the growth of multi-modal data on the internet, cross-modal retrieval has received a lot of attention in recent years. It aims to use one type of data as query and retrieve results of another type. For different modality data, how to reduce ...
K-core Minimization: An Edge Manipulation Approach
Weijie Zhu, Chen Chen, Xiaoyang Wang, Xuemin Lin
Pages: 1667-1670
Full text: PDF

In social networks, dense relationships among users contribute to stable networks. Breakdowns of some relationships may cause users to leave the network hence decrease the network stability. A popular metric to measure the stability of a network is k-core, ...
Label Propagation with Neural Networks
Aditya Pal, Deepayan Chakrabarti
Pages: 1671-1674
Full text: PDF

Label Propagation (LP) is a popular transductive learning method for very large datasets, in part due to its simplicity and ability to parallelize. However, it has limited ability to handle node features, and its accuracy can be sensitive to the number ...
Learning to Geolocalise Tweets at a Fine-Grained Level
Jorge David Gonzalez Paule, Yashar Moshfeghi, Craig Macdonald, Iadh Ounis
Pages: 1675-1678
Full text: PDF

Fine-grained geolocation of tweets has become an important feature for reliably performing a wide range of tasks such as real-time event detection, topic detection or disaster and emergency analysis. Recent work adopted a ranking approach to return a ...
Linked Causal Variational Autoencoder for Inferring Paired Spillover Effects
Vineeth Rakesh, Ruocheng Guo, Raha Moraffah, Nitin Agarwal, Huan Liu
Pages: 1679-1682
Full text: PDF

Modeling spillover effects from observational data is an important problem in economics, business, and other fields of research. It helps us infer the causality between two seemingly unrelated sets of events. For example, if consumer spending in the United ...
Local and Global Information Fusion for Top-N Recommendation in Heterogeneous Information Network
Binbin Hu, Chuan Shi, Wayne Xin Zhao, Tianchi Yang
Pages: 1683-1686
Full text: PDF

Since a heterogeneous information network (HIN) is able to integrate complex information and contains rich semantics, there has been a surge of HIN-based recommendation in recent years. Although existing methods have achieved performance improvement to some extent, ...
Long-Term RNN: Predicting Hazard Function for Proactive Maintenance of Water Mains
Bin Liang, Zhidong Li, Yang Wang, Fang Chen
Pages: 1687-1690
Full text: PDF

Failure event prediction is becoming increasingly important in wide applications, such as the planning of proactive maintenance, the active investment management, and disease surveillance. To address the issue, the hazard function in survival analysis ...
Low-Complexity Supervised Rank Fusion Models
André Mourão, João Magalhães
Pages: 1691-1694
Full text: PDF

Combining multiple retrieval functions can lead to notable gains in retrieval performance. Learning to Rank (LETOR) techniques achieve outstanding retrieval results, by learning models with no bounds on model complexity. Often, minor retrieval gains ...
Mining & Summarizing E-petitions for Enhanced Understanding of Public Opinion
Shreshtha Mundra, Sachin Kumar, Manjira Sinha, Sandya Mannarswamy
Pages: 1695-1698
Full text: PDF

Today electronic communications have become the prime medium for people to express their opinions and influence the policy preferences. One such popular channel reflecting the voice of the masses is electronic petitions. To understand people's perspective ...
MM: A new Framework for Multidimensional Evaluation of Search Engines
Joao Palotti, Guido Zuccon, Allan Hanbury
Pages: 1699-1702
Full text: PDF

In this paper, we propose a framework to evaluate information retrieval systems in the presence of multidimensional relevance. This is an important problem in tasks such as consumer health search, where the understandability and trustworthiness of information ...
Modeling Consumer Buying Decision for Recommendation Based on Multi-Task Deep Learning
Qiaolin Xia, Peng Jiang, Fei Sun, Yi Zhang, Xiaobo Wang, Zhifang Sui
Pages: 1703-1706
Full text: PDF

Although marketing researchers and sociologists have recognized the importance of buying decision process and its significant influence on consumer's purchasing behaviors, existing recommender systems do not explicitly model the consumer buying decision ...
Modeling Multi-way Relations with Hypergraph Embedding
Chia-An Yu, Ching-Lun Tai, Tak-Shing Chan, Yi-Hsuan Yang
Pages: 1707-1710
Full text: PDF

Hypergraph is a data structure commonly used to represent connections and relations between multiple objects. Embedding a hypergraph into a low-dimensional space and representing each vertex as a vector is useful in various tasks such as visualization, ...
More than Threads: Identifying Related Email Messages
Noa Avigdor-Elgrabli, Roei Gelbhart, Irena Grabovitch-Zuyev, Ariel Raviv
Pages: 1711-1714
Full text: PDF

In the typical state of an ever growing mailbox, it becomes essential to assist the user to better organize and quickly look up the content of his electronic life. Our work addresses this challenge, by identifying related messages within a user's mailbox. ...
MultiE: Multi-Task Embedding for Knowledge Base Completion
Zhao Zhang, Fuzhen Zhuang, Zheng-Yu Niu, Deqing Wang, Qing He
Pages: 1715-1718
Full text: PDF

Completing knowledge bases (KBs) with missing facts is of great importance, since most existing KBs are far from complete. To this end, many knowledge base completion (KBC) methods have been proposed. However, most existing methods embed each relation ...
Multi-Emotion Category Improving Embedding for Sentiment Classification
Shuo Wang, Xiaofeng Meng
Pages: 1719-1722
Full text: PDF

Sentiment analysis and opinion mining are significant and valuable for subject information extraction from the text. Word embedding that can map the words to low-dimensional vector representations has been widely used in natural language processing tasks. ...
Multiple Manifold Regularized Sparse Coding for Multi-View Image Clustering
Xiaofei Zhu, Khoi Duy Vo, Jiafeng Guo, Jiangwu Long
Pages: 1723-1726
Full text: PDF

Multi-view clustering has received increasing attention in many applications, where different views of objects can provide complementary information to each other. Existing approaches on multi-view clustering mainly focus on extending Non-negative ...
Multiple Pairwise Ranking with Implicit Feedback
Runlong Yu, Yunzhou Zhang, Yuyang Ye, Le Wu, Chao Wang, Qi Liu, Enhong Chen
Pages: 1727-1730
Full text: PDF

As users implicitly express their preferences for items in many real-world applications, implicit-feedback-based collaborative filtering has attracted much attention in recent years. Pairwise methods have shown state-of-the-art solutions for dealing ...
Neighborhood Voting: A Novel Search Scheme for Hashing
Yan Xiao, Jiafeng Guo, Yanyan Lan, Jun Xu, Xueqi Cheng
Pages: 1731-1734

Hashing techniques for approximate nearest neighbor search (ANNS) encode data points into a set of short binary codes, while trying to preserve the neighborhood structure of the original data as much as possible. With the binary codes, the task of ANNS ...
A Network-embedding Based Method for Author Disambiguation
Jun Xu, Siqi Shen, Dongsheng Li, Yongquan Fu
Pages: 1735-1738

Most existing author disambiguation work relies heavily on feature engineering or cannot use multiple paper relationships. In this work, we propose a network-embedding based method for author disambiguation. For each ambiguous name, we construct networks ...
Neural Retrieval with Partially Shared Embedding Spaces
Bo Li, Le Jia
Pages: 1739-1742

One category of neural information retrieval models tries to learn text representation in a common embedding space for both queries and documents. However, a single embedding space is not always sufficient, since queries and documents are different in ...
An Option Gate Module for Sentence Inference on Machine Reading Comprehension
Xuming Lin, Ruifang Liu, Yiwei Li
Pages: 1743-1746

In machine reading comprehension (MRC) tasks, sentence inference is an important but extremely difficult problem. Most MRC models model the interaction between articles and questions at the word level, which ignores inter- and intra-sentence information ...
Point Symmetry-based Deep Clustering
Jose G. Moreno
Pages: 1747-1750

Clustering is a central task in unsupervised learning. Recent advances that perform clustering into learned deep features (such as DEC [14], IDEC [6] or VaDe [10]) have shown improvements over classical algorithms, but most of them are based on the Euclidean ...
Predicting Personal Life Events from Streaming Social Content
Maryam Khodabakhsh, Hossein Fani, Fattane Zarrinkalam, Ebrahim Bagheri
Pages: 1751-1754

Researchers have shown that it is possible to identify reported instances of personal life events from users' social content, e.g., tweets. This is known as personal life event detection. In this paper, we take a step forward and explore the possibility ...
Query Tracking for E-commerce Conversational Search: A Machine Comprehension Perspective
Yunlun Yang, Yu Gong, Xi Chen
Pages: 1755-1758

With the development of dialog techniques, conversational search has attracted more and more attention as it enables users to interact with the search engine in a natural and efficient manner. However, compared with the natural language understanding ...
Query Understanding via Entity Attribute Identification
Arash Dargahi Nobari, Arian Askari, Faegheh Hasibi, Mahmood Neshati
Pages: 1759-1762

Understanding searchers' queries is an essential component of semantic search systems. In many cases, search queries involve specific attributes of an entity in a knowledge base (KB), which can be further used to find query answers. In this study, we ...
Ready for Use: Subject-Independent Movement Intention Recognition via a Convolutional Attention Model
Dalin Zhang, Lina Yao, Kaixuan Chen, Sen Wang
Pages: 1763-1766

Brain-Computer Interface (BCI) enables humans to communicate with and intuitively control an external device through brain signals. Movement intention recognition paves the path for developing BCI applications. The current state-of-the-art in EEG-based ...
Recommender Systems with Characterized Social Regularization
Tzu-Heng Lin, Chen Gao, Yong Li
Pages: 1767-1770

Social recommendation, which utilizes social relations to enhance recommender systems, has been gaining increasing attention recently with the rapid development of online social networks. Existing social recommendation methods are based on the fact that ...
Recommending Serendipitous Items using Transfer Learning
Gaurav Pandey, Denis Kotkov, Alexander Semenov
Pages: 1771-1774

Most recommender algorithms are designed to suggest relevant items, but suggesting these items does not always result in user satisfaction. Therefore, the efforts in recommender systems recently shifted towards serendipity, but generating serendipitous ...
A Recurrent Neural Network for Sentiment Quantification
Andrea Esuli, Alejandro Moreo Fernández, Fabrizio Sebastiani
Pages: 1775-1778

Quantification is a supervised learning task that consists in predicting, given a set of classes $\mathcal{C}$ and a set $D$ of unlabelled items, the prevalence (or relative frequency) $p_c(D)$ of each class $c \in \mathcal{C}$ in $D$. Quantification can in principle be solved ...
Re-evaluating Embedding-Based Knowledge Graph Completion Methods
Farahnaz Akrami, Lingbing Guo, Wei Hu, Chengkai Li
Pages: 1779-1782

Incompleteness of large knowledge graphs (KG) has motivated many researchers to propose methods to automatically find missing edges in KGs. A promising approach for KG completion (link prediction) is embedding a KG into a continuous vector space. There ...
Re-ranking Web Search Results for Better Fact-Checking: A Preliminary Study
Khaled Yasser, Mucahid Kutlu, Tamer Elsayed
Pages: 1783-1786

Even though Web search engines play an important role in finding documents relevant to user queries, there is little to no attention given to how they perform in terms of usefulness for fact-checking claims. In this paper, we introduce a new research ...
Sci-Blogger: A Step Towards Automated Science Journalism
Raghuram Vadapalli, Bakhtiyar Syed, Nishant Prabhu, Balaji Vasan Srinivasan, Vasudeva Varma
Pages: 1787-1790

Science journalism is the art of conveying a detailed scientific research paper in a form that non-scientists can understand and appreciate while ensuring that its underlying information is conveyed accurately. It plays a crucial role in making scientific ...
Semi-Supervised Collaborative Learning for Social Spammer and Spam Message Detection in Microblogging
Fangzhao Wu, Chuhan Wu, Junxin Liu
Pages: 1791-1794

It is important to detect social spammers and spam messages in microblogging platforms. Existing methods usually handle the detection of social spammers and spam messages as two separate tasks using supervised learning techniques. However, labeled samples ...
A Sequential Neural Information Diffusion Model with Structure Attention
Zhitao Wang, Chengyao Chen, Wenjie LI
Pages: 1795-1798

In this paper, we propose a novel sequential neural network with structure attention to model information diffusion. The proposed model explores both sequential nature of an information diffusion process and structural characteristics of user connection ...
A Supervised Learning Framework for Prediction of Incompatible Herb Pair in Traditional Chinese Medicine
Jiajing Zhu, Yongguo Liu, Shangming Yang, Shuangqing Zhai, Yi Zhang, Chuanbiao Wen
Pages: 1799-1802

Adverse drug-drug interaction has been a critical issue for the development of drugs. In Traditional Chinese Medicine, adverse herb-herb interaction is a negative reaction in patients after the absorption of decoction of Incompatible Herb Pair (IHP). ...
TED-KISS: A Known-Item Speech Video Search Benchmark
Fan Fang, Bo-Wen Zhang, Xu-Cheng Yin, Hai-Xia Man, Fang Zhou
Pages: 1803-1806

Known-item search is an everyday scenario in which we search for a specific thing (maybe a song) while only remembering some details about it. Existing benchmarks generally focus on brief user requests which specify some metadata like the title, ...
TEQUILA: Temporal Question Answering over Knowledge Bases
Zhen Jia, Abdalghani Abujabal, Rishiraj Saha Roy, Jannik Strötgen, Gerhard Weikum
Pages: 1807-1810

Question answering over knowledge bases (KB-QA) poses challenges in handling complex questions that need to be decomposed into sub-questions. An important case, addressed here, is that of temporal questions, where cues for temporal relations need to ...
Toward Automated Multiparty Privacy Conflict Detection
Haoti Zhong, Anna Squicciarini, David Miller
Pages: 1811-1814

In an effort to support users' decision-making process with regard to shared and co-managed online images, in this paper we present a novel model for the early detection of images which may be subject to possible conflicting access control decisions. We present ...
Towards a Quantum-Inspired Framework for Binary Classification
Prayag Tiwari, Massimo Melucci
Pages: 1815-1818

Machine Learning models learn the relationship between input and output by examples and then apply the learned models to relate unseen input. Although ML has successfully been used in almost every field, there is always room for improvement. To this ...
Towards Explainable Networked Prediction
Liangyue Li, Hanghang Tong, Huan Liu
Pages: 1819-1822

Networked prediction has attracted lots of research attention in recent years. Compared with the traditional learning setting, networked prediction is even harder to understand due to its coupled, multi-level nature. The learning process propagates ...
Towards Partition-Aware Lifted Inference
Melisachew Wudage Chekol, Heiner Stuckenschmidt
Pages: 1823-1826

There is an ever increasing number of rule learning algorithms and tools for automatic knowledge base (KB) construction. These tools often produce weighted rules and facts that make up a probabilistic KB (PKB). In such a PKB, probabilistic inference ...
Unsupervised Evaluation of Text Co-clustering Algorithms Using Neural Word Embeddings
François Role, Stanislas Morbieu, Mohamed Nadif
Pages: 1827-1830

Text clustering, which divides a dataset into groups of similar documents, plays an important role at various stages of the information retrieval process. Co-clustering is an extension of one-side clustering, and consists in simultaneously clustering ...
User Identification with Spatio-Temporal Awareness across Social Networks
Xing Gao, Wenli Ji, Yongjun Li, Yao Deng, Wei Dong
Pages: 1831-1834

User Identification with Spatio-Temporal awareness has been attracting much attention from academia. The existing methods not only handle temporal and spatial data separately but also do not consider conflictive check-in records. To tackle these problems, ...
Using Word Embeddings for Information Retrieval: How Collection and Term Normalization Choices Affect Performance
Dwaipayan Roy, Debasis Ganguly, Sumit Bhatia, Srikanta Bedathur, Mandar Mitra
Pages: 1835-1838

Neural word embedding approaches, due to their ability to capture semantic meanings of vocabulary terms, have recently gained attention of the information retrieval (IR) community and have shown promising results in improving ad hoc retrieval performance. ...
Variational Recurrent Model for Session-based Recommendation
Zhitao Wang, Chengyao Chen, Ke Zhang, Yu Lei, Wenjie LI
Pages: 1839-1842

Session-based recommendation performance has been significantly improved by Recurrent Neural Networks (RNN). However, existing RNN-based models do not expose the global knowledge of frequent click patterns or consider variability of sequential behaviors ...
vec2Link: Unifying Heterogeneous Data for Social Link Prediction
Fan Zhou, Bangying Wu, Yi Yang, Goce Trajcevski, Kunpeng Zhang, Ting Zhong
Pages: 1843-1846

Recent advances in network representation learning have enabled significant improvements in the link prediction task, which is at the core of many downstream applications. As an increasing amount of mobility data becomes available due to the development ...
W2E: A Worldwide-Event Benchmark Dataset for Topic Detection and Tracking
Tuan-Anh Hoang, Khoi Duy Vo, Wolfgang Nejdl
Pages: 1847-1850

Topic detection and tracking in document streams is a critical task in many important applications, hence has been attracting research interest in recent decades. With the large size of data streams, there have been a number of works from different approaches ...
Weakly-Supervised Generative Adversarial Nets with Auxiliary Information for Wireless Coverage Estimation
Zhuo Li, Hongwei Wang, Miao Zhao
Pages: 1851-1854

Wireless coverage is the received signal strength of a particular region, which is a key prerequisite to provide high quality mobile communication service. In this paper, we aim to estimate the wireless coverage of an area based on the randomly distributed ...
Weave&Rec: A Word Embedding based 3-D Convolutional Network for News Recommendation
Dhruv Khattar, Vaibhav Kumar, Vasudeva Varma, Manish Gupta
Pages: 1855-1858

An effective news recommendation system should harness the historical information of the user based on her interactions as well as the content of the articles. In this paper we propose a novel deep learning model for news recommendation which utilizes ...
Word-Driven and Context-Aware Review Modeling for Recommendation
Qianqian Wang, Si Li, Guang Chen
Pages: 1859-1862

Recently, convolutional neural networks (CNNs) have been demonstrated to effectively model reviews in recommender systems, due to the learning of contextual features such as surrounding words and word order for reviews. However, CNNs with max-pooling fail ...
SESSION: Demonstration Papers
ASTRO: A Datalog System for Advanced Stream Reasoning
Ariyam Das, Sahil M. Gandhi, Carlo Zaniolo
Pages: 1863-1866

The rise of the Internet of Things (IoT) and the recent focus on a gamut of 'Smart City' initiatives world-wide have pushed for new advances in data stream systems to (1) support complex analytics and evolving graph applications as continuous queries, ...
BBoxDB - A Scalable Data Store for Multi-Dimensional Big Data
Jan Kristof Nidzwetzki, Ralf Hartmut Güting
Pages: 1867-1870

BBoxDB is a distributed and highly available key-bounding-box-value store which enhances the classical key-value data model with an axis-parallel bounding box. The bounding box describes the location of the values in an n-dimensional space, and enables ...
Beacon in the Dark: A System for Interactive Exploration of Large Email Corpora
Tim Repke, Ralf Krestel, Jakob Edding, Moritz Hartmann, Jonas Hering, Dennis Kipping, Hendrik Schmidt, Nico Scordialo, Alexander Zenner
Pages: 1871-1874

The large amount of heterogeneous data in these email corpora renders experts' investigations by hand infeasible. Auditors or journalists, e.g., who are looking for irregular or inappropriate content or suspicious patterns, are in desperate need of ...
CAPatternMiner: Mining Ship Collision Avoidance Behavior from AIS Trajectory Data
Po-Ruey Lei, Li-Pin Xiao, Yu-Ting Wen, Wen-Chih Peng
Pages: 1875-1878

The improvement of collision avoidance for ship navigation in encounter situations is an important topic in maritime traffic safety. Most research on maritime collision avoidance has focused on planning a safe path for a ship to keep away from the approaching ...
CEC: Constraints based Explanation for Classifications
Daniel Deutch, Nave Frost
Pages: 1879-1882

Explaining the results of data-intensive computation via provenance has been extensively studied in the literature. We focus here on explaining the output of Machine Learning Classifiers, which are main components of many contemporary Data Science applications. ...
CurEx: A System for Extracting, Curating, and Exploring Domain-Specific Knowledge Graphs from Text
Michael Loster, Felix Naumann, Jan Ehmueller, Benjamin Feldmann
Pages: 1883-1886

The integration of diverse structured and unstructured information sources into a unified, domain-specific knowledge base is an important task in many areas. A well-maintained knowledge base enables data analysis in complex scenarios, such as risk analysis ...
Demonstration of GenoMetric Query Language
Stefano Ceri, Arif Canakoglu, Andrea Gulino, Abdulrahman Kaitoua, Marco Masseroli, Luca Nanni, Pietro Pinoli
Pages: 1887-1890

In the last ten years, genomic computing has made gigantic steps due to Next Generation Sequencing (NGS), a high-throughput, massively parallel technology; the cost of producing a complete human sequence dropped to US$1000 in 2015 and is expected to ...
DEVES: Interactive Signal Analytics for Drug Safety
Tabassum Kakar, Xiao Qin, Andrew Schade, Brian McCarthy, Huy Quoc Tran, Brian Zylich, Elke Rundensteiner, Lane Harrison, Sanjay K. Sahoo, Suranjan De
Pages: 1891-1894

Drug-drug interaction related adverse events (DIAE) signals are a major public health issue. Drug safety analysts must sift through thousands of adverse event reports submitted daily to the U.S. Food and Drug Administration (FDA) to discover unexpected DIAE ...
Distributed Ledger Technology for Document and Workflow Management in Trade and Logistics
Ziyuan Wang, Dain Yap Liffman, Dileban Karunamoorthy, Ermyas Abebe
Pages: 1895-1898

Logistics management plays a crucial role in the execution and success of international trade. Current solutions for logistics management suffer from a number of issues. One of the major areas for improvement is inconsistent data and lack of trust and ...
Every Word has its History: Interactive Exploration and Visualization of Word Sense Evolution
Adam Jatowt, Ricardo Campos, Sourav S. Bhowmick, Nina Tahmasebi, Antoine Doucet
Pages: 1899-1902

Human language constantly evolves due to the changing world and the need for easier forms of expression and communication. Our knowledge of language evolution is however still fragmentary despite significant interest of both researchers as well as wider ...
Exploring Diversified Similarity with Kundaha
Lucio F. D. Santos, Gustavo Blanco, Daniel de Oliveira, Agma J. M. Traina, Caetano Traina Jr., Marcos V. N. Bedo
Pages: 1903-1906

Exploring large medical image sets by means of traditional similarity query criteria (e.g., neighborhood) can be fruitless if retrieved images are too similar among themselves. This demonstration introduces Kundaha, an exploration tool that assists experts ...
Fouilla: Navigating DBpedia by Topic
Tanguy Raynaud, Julien Subercaze, Frédérique Laforest
Pages: 1907-1910

Navigating large knowledge bases made of billions of triples is very challenging. In this demonstration, we showcase Fouilla, a topical Knowledge Base browser that offers a seamless navigational experience of DBpedia. We propose an original approach ...
From Copernicus Big Data to Big Information and Big Knowledge: A Demo from the Copernicus App Lab Project
Konstantina Bereta, Hervé Caumont, Erwin Goor, Manolis Koubarakis, Despina-Athanasia Pantazi, George Stamoulis, Sam Ubels, Valentijn Venus, Firman Wahyudi
Pages: 1911-1914

Copernicus is the European program for monitoring the Earth. It consists of a set of complex systems that collect data from satellites and in-situ sensors, process this data and provide users with reliable and up-to-date information on a range of environmental ...
I4TSRS: A System to Assist a Data Engineer in Time-Series Dimensionality Reduction in Industry 4.0 Scenarios
Kevin Villalobos, Borja Diez, Arantza Illarramendi, Alfredo Goñi, José Miguel Blanco
Pages: 1915-1918

The massive captured data from industrial sensors (time-series data) that could serve as relevant indicators for predictive maintenance of equipment, fault diagnosis, etc. is generating a problem related to the considerable costs associated with their ...
IM Balanced: Influence Maximization Under Balance Constraints
Shay Gershtein, Tova Milo, Brit Youngmann, Gal Zeevi
Pages: 1919-1922

Influence Maximization (IM) is the problem of finding a set of influential users in a social network, so that their aggregated influence is maximized. IM has natural applications in viral marketing and has been the focus of extensive recent research. ...
MIaS: Math-Aware Retrieval in Digital Mathematical Libraries
Petr Sojka, Michal Růžička, Vít Novotný
Pages: 1923-1926

Digital mathematical libraries (DMLs) such as arXiv, Numdam, and EuDML contain mainly documents from STEM fields, where mathematical formulae are often more important than text for understanding. Conventional information retrieval (IR) systems are unable ...
Ontop-temporal: A Tool for Ontology-based Query Answering over Temporal Data
Elem Güzel Kalayci, Guohui Xiao, Vladislav Ryzhikov, Tahir Emre Kalayci, Diego Calvanese
Pages: 1927-1930

We present Ontop-temporal, an extension of the ontology-based data access system Ontop for query answering with temporal data and ontologies. Ontop is a system to answer SPARQL queries over various data stores, using standard R2RML mappings and an OWL2QL ...
Preference-driven Interactive Ranking System for Personalized Decision Support
Caitlin Kuhlman, MaryAnn VanValkenburg, Diana Doherty, Malika Nurbekova, Goutham Deva, Zarni Phyo, Elke Rundensteiner, Lane Harrison
Pages: 1931-1934

Manually constructing rankings is a tedious ad-hoc process, requiring extensive user effort to evaluate data attribute importance, and often leading to undesirable results. Meanwhile, sophisticated learning-to-rank algorithms are able to leverage large ...
Preserving Privacy of Fraud Detection Rule Sharing Using Intel's SGX
Daniel Deutch, Yehonatan Ginzberg, Tova Milo
Pages: 1935-1938

The collaboration of financial institutes against fraudsters is a promising path for reducing resource investments and increasing coverage. Yet, such collaboration is held back by two somewhat conflicting challenges: effective knowledge sharing and limiting ...
searchrefiner: A Query Visualisation and Understanding Tool for Systematic Reviews
Harrisen Scells, Guido Zuccon
Pages: 1939-1942

We present an open source tool, searchrefiner, for researchers that conduct medical systematic reviews to assist in formulating, visualising, and understanding Boolean queries. The searchrefiner web interface allows researchers to explore how Boolean ...
Single-Setup Privacy Enforcement for Heterogeneous Data Ecosystems
Issac Buenrostro, Abhishek Tiwari, Vasanth Rajamani, Erman Pattuk, Zhixiong Chen
Pages: 1943-1946

Strong member privacy in technology enterprises involves, among other objectives, deleting or anonymizing various kinds of data that a company controls. Those requirements are complicated in a heterogeneous data ecosystem where data is stored in multiple ...
SOURCERY: User Driven Multi-Criteria Source Selection
Edward Abel, John A. Keane, Norman W. Paton, Alvaro A.A. Fernandes, Martin Koehler, Nikolaos Konstantinou, Nurzety A. Azuan, Suzanne M. Embury
Pages: 1947-1950

Data scientists are usually interested in a subset of sources with properties that are most aligned to intended data use. The SOURCERY system supports interactive multi-criteria user-driven source selection. SOURCERY allows a user to identify criteria ...
Spark-parSketch: A Massively Distributed Indexing of Time Series Datasets
Oleksandra Levchenko, Djamel-Edine Yagoubi, Reza Akbarinia, Florent Masseglia, Boyan Kolev, Dennis Shasha
Pages: 1951-1954

A growing number of domains (finance, seismology, internet-of-things, etc.) collect massive time series. When the number of series grows to the hundreds of millions or even billions, similarity queries become intractable on a single machine. Further, ...
Traffic-Cascade: Mining and Visualizing Lifecycles of Traffic Congestion Events Using Public Bus Trajectories
Agus Trisnajaya Kwee, Meng-Fen Chiang, Philips Kokoh Prasetyo, Ee-Peng Lim
Pages: 1955-1958

As road transportation supports both economic and social activities in developed cities, it is important to maintain smooth traffic on all highways and local roads. Whenever possible, traffic congestion should be detected early and resolved quickly. ...
X-Rank: Explainable Ranking in Complex Multi-Layered Networks
Jian Kang, Scott Freitas, Haichao Yu, Yinglong Xia, Nan Cao, Hanghang Tong
Pages: 1959-1962

In this paper we present a web-based prototype for an explainable ranking algorithm in multi-layered networks, incorporating both network topology and knowledge information. While traditional ranking algorithms such as PageRank and HITS are important ...
YourDigitalSelf: A Personal Digital Trace Integration Tool
Varvara Kalokyri, Alexander Borgida, Amélie Marian
Pages: 1963-1966

Personal information is typically fragmented across multiple, heterogeneous, distributed sources and saved as small, heterogeneous data objects, or traces. The DigitalSelf project at Rutgers University focuses on developing tools and techniques to manage ...
SESSION: Industry and Case Study Papers
Automatic Conversational Helpdesk Solution using Seq2Seq and Slot-filling Models
Mayur Patidar, Puneet Agarwal, Lovekesh Vig, Gautam Shroff
Pages: 1967-1975

Helpdesk is a key component of any large IT organization, where users can log a ticket about any issue they face related to IT infrastructure, administrative services, human resource services, etc. Normally, users have to assign an appropriate set of labels ...
Behavior-based Community Detection: Application to Host Assessment In Enterprise Information Networks
Cheng Cao, Zhengzhang Chen, James Caverlee, Lu-An Tang, Chen Luo, Zhichun Li
Pages: 1977-1985

Community detection in complex networks is a fundamental problem that attracts much attention across various disciplines. Previous studies have been mostly focusing on external connections between nodes (i.e., topology structure) in the network whereas ...
Collaborative Alert Ranking for Anomaly Detection
Ying Lin, Zhengzhang Chen, Cheng Cao, Lu-An Tang, Kai Zhang, Wei Cheng, Zhichun Li
Pages: 1987-1995

Given a large number of low-quality heterogeneous categorical alerts collected from an anomaly detection system, how to characterize the complex relationships between different alerts and deliver trustworthy rankings to end users? While existing techniques ...
A Combined Representation Learning Approach for Better Job and Skill Recommendation
Vachik S. Dave, Baichuan Zhang, Mohammad Al Hasan, Khalifeh AlJadda, Mohammed Korayem
Pages: 1997-2005

Job recommendation is an important task for the modern recruitment industry. An excellent job recommender system not only recommends a higher-paying job that is maximally aligned with the skill set of the current job, but also suggests to ...
Deep Graph Embedding for Ranking Optimization in E-commerce
Chen Chu, Zhao Li, Beibei Xin, Fengchao Peng, Chuanren Liu, Remo Rohs, Qiong Luo, Jingren Zhou
Pages: 2007-2015

Matching buyers with the most suitable sellers providing relevant items (e.g., products) is essential for e-commerce platforms to guarantee customer experience. This matching process is usually achieved through modeling inter-group (buyer-seller) proximity ...
"Deep" Learning for Missing Value Imputation in Tables with Non-Numerical Data
Felix Biessmann, David Salinas, Sebastian Schelter, Philipp Schmidt, Dustin Lange
Pages: 2017-2025

The success of applications that process data critically depends on the quality of the ingested data. Completeness of a data source is essential in many cases. Yet, most missing value imputation approaches suffer from severe limitations. They are almost ...
DeepAuth: A Framework for Continuous User Re-authentication in Mobile Apps
Sara Amini, Vahid Noroozi, Amit Pande, Satyajit Gupte, Philip S. Yu, Chris Kanich
Pages: 2027-2035

With the increasing volume of transactions taking place online, mobile fraud has also increased. Mobile applications often authenticate the user only at install time. The user may then remain logged in for hours or weeks. Any unauthorized access may ...
Device-Aware Rule Recommendation for the Internet of Things
Beidou Wang, Xin Guo, Martin Ester, Ziyu Guan, Bandeep Singh, Yu Zhu, Jiajun Bu, Deng Cai
Pages: 2037-2045

With over 34 billion IoT devices to be installed by 2020, the Internet of Things (IoT) is fundamentally changing our lives. One of the greatest benefits of the IoT is the powerful automations achieved by applying rules to IoT devices. For instance, a ...
A Fast Linear Computational Framework for User Action Prediction in Tencent MyApp
Yaochen Hu, Di Niu, Jianming Yang
Pages: 2047-2055

User action modeling and prediction has long been a topic of importance to recommender systems and user profiling. The quality of the model or accuracy of prediction plays a vital role in related applications like recommendation, advertisement displaying, ...
FastInput: Improving Input Efficiency on Mobile Devices
Jingyuan Zhang, Xin Wang, Yue Feng, Mingming Sun, Ping Li
Pages: 2057-2065

Mobile devices (e.g., smartphones) play a crucial role in our daily lives nowadays. People rely heavily on mobile devices for searching online, sending emails, chatting with friends, etc. As a result, input efficiency becomes increasingly important for ...
A Globalization-Semantic Matching Neural Network for Paraphrase Identification
Miao Fan, Wutao Lin, Yue Feng, Mingming Sun, Ping Li
Pages: 2067-2075

Paraphrase identification (PI) aims at determining whether two natural language sentences have roughly identical meaning. PI has been conventionally formalized as a binary classification task and widely used in many tasks such as text summarization, ...
Heterogeneous Graph Neural Networks for Malicious Account Detection
Ziqi Liu, Chaochao Chen, Xinxing Yang, Jun Zhou, Xiaolong Li, Le Song
Pages: 2077-2085

We present GEM, the first heterogeneous graph neural network approach for detecting malicious accounts at Alipay, one of the world's leading mobile cashless payment platforms. Our approach, inspired by a connected subgraph approach, adaptively learns ...
Image Matters: Visually Modeling User Behaviors Using Advanced Model Server
Tiezheng Ge, Liqin Zhao, Guorui Zhou, Keyu Chen, Shuying Liu, Huimin Yi, Zelin Hu, Bochao Liu, Peng Sun, Haoyu Liu, Pengtao Yi, Sui Huang, Zhiqiang Zhang, Xiaoqiang Zhu, Yu Zhang, Kun Gai
Pages: 2087-2095

In Taobao, the largest e-commerce platform in China, billions of items are provided and typically displayed with their images. For better user experience and business effectiveness, Click Through Rate (CTR) prediction in online advertising system exploits ...
Inferring Trip Occupancies in the Rise of Ride-Hailing Services
Meng-Fen Chiang, Ee-Peng Lim, Wang-Chien Lee, Tuan-Anh Hoang
Pages: 2097-2105

Knowledge of all occupied and unoccupied trips made by self-employed drivers is essential for optimized vehicle dispatch by ride-hailing services (e.g., Didi Dache, Uber, Lyft, Grab, etc.). However, the occupancy status of vehicles is not always ...
In-Session Personalization for Talent Search
Sahin Cem Geyik, Vijay Dialani, Meng Meng, Ryan Smith
Pages: 2107-2115

Previous efforts in recommendation of candidates for talent search followed the general pattern of receiving initial search criteria and generating a set of candidates utilizing a pre-trained model. Traditionally, the generated recommendations are ...
Investigating Rumor News Using Agreement-Aware Search
Jingbo Shang, Jiaming Shen, Tianhang Sun, Xingbang Liu, Anja Gruenheid, Flip Korn, Adam D. Lelkes, Cong Yu, Jiawei Han
Pages: 2117-2125

Recent years have witnessed a widespread increase of rumor news generated by humans and machines. Therefore, tools for investigating rumor news have become an urgent necessity. One useful function of such tools is to see ways a specific topic or event ...
Multi-Task Learning for Email Search Ranking with Auxiliary Query Clustering
Jiaming Shen, Maryam Karimzadehgan, Michael Bendersky, Zhen Qin, Donald Metzler
Pages: 2127-2135

User information needs vary significantly across different tasks, and therefore their queries will also differ considerably in their expressiveness and semantics. Many studies have been proposed to model such query diversity by obtaining query types ...
Network-based Receivable Financing
Ilaria Bordino, Francesco Gullo
Pages: 2137-2145

Receivable financing -- the process whereby cash is advanced to firms against receivables their customers have yet to pay -- is a well-established funding source for businesses. In this paper we present a novel, collaborative approach to receivable financing: ...
Optimizing Boiler Control in Real-Time with Machine Learning for Sustainability
Yukun Ding, Jinglan Liu, Jinjun Xiong, Meng Jiang, Yiyu Shi
Pages: 2147-2154

In coal-fired power plants, it is critical to improve the operational efficiency of boilers for sustainability. In this work, we formulate real-time boiler control as an optimization problem that looks for the best distribution of temperature in different ...
Optimizing Generalized Linear Models with Billions of Variables
Yanbo Liang, Yongyang Yu, Mingjie Tang, Chaozhuo Li, Weiqing Yang, Weichen Xu, Ruifeng Zheng
Pages: 2155-2163

The use of large-scale machine learning (ML) is becoming ubiquitous in various domains ranging from business intelligence to self-driving cars. Many companies are building ML pipelines in a unified data processing environment, and leveraging well-tuned ...
Practical Diversified Recommendations on YouTube with Determinantal Point Processes
Mark Wilhelm, Ajith Ramanathan, Alexander Bonomo, Sagar Jain, Ed H. Chi, Jennifer Gillenwater
Pages: 2165-2173

Many recommendation systems produce result sets with large numbers of highly similar items. Diversifying these results is often accomplished with heuristics, which are impoverished models of users' desire for diversity. However, integrating more complex ...
Predictive Analysis by Leveraging Temporal User Behavior and User Embeddings
Charles Chen, Sungchul Kim, Hung Bui, Ryan Rossi, Eunyee Koh, Branislav Kveton, Razvan Bunescu
Pages: 2175-2182

The rapid growth of mobile devices has resulted in the generation of a large number of user behavior logs that contain latent intentions and user interests. However, exploiting such data in real-world applications is still difficult for service providers ...
PriPeARL: A Framework for Privacy-Preserving Analytics and Reporting at LinkedIn
Krishnaram Kenthapadi, Thanh T. L. Tran
Pages: 2183-2191

Preserving privacy of users is a key requirement of web-scale analytics and reporting applications, and has witnessed a renewed focus in light of recent data breaches and new regulations such as GDPR. We focus on the problem of computing robust, reliable ...
Real-Time Bidding with Multi-Agent Reinforcement Learning in Display Advertising
Junqi Jin, Chengru Song, Han Li, Kun Gai, Jun Wang, Weinan Zhang
Pages: 2193-2201

Real-time advertising allows advertisers to bid for each impression for a visiting user. To optimize specific goals such as maximizing revenue and return on investment (ROI) led by ad placements, advertisers not only need to estimate the relevance between ...
Recurrent Spatio-Temporal Point Process for Check-in Time Prediction
Guolei Yang, Ying Cai, Chandan K. Reddy
Pages: 2203-2211

We introduce a new problem, namely, check-in time prediction where the goal is to predict the time when a given user will check-in to a location of interest. We design a novel Recurrent Spatio-Temporal Point Process (RSTPP) model for check-in time prediction. ...
Scalable Entity Resolution Using Probabilistic Signatures on Parallel Databases
Yuhang Zhang, Kee Siong Ng, Tania Churchill, Peter Christen
Pages: 2213-2221

Accurate and efficient entity resolution is an open challenge of particular relevance to intelligence organisations that collect large datasets from disparate sources with differing levels of quality and standard. Starting from a first-principles formulation ...
tHoops: A Multi-Aspect Analytical Framework for Spatio-Temporal Basketball Data
Evangelos Papalexakis, Konstantinos Pelechrinis
Pages: 2223-2232

During the past few years, advancements in sports information systems and technology have allowed the collection of a number of detailed spatio-temporal data that capture various aspects of basketball. For example, shot charts, that is, maps capturing ...
The Title Says It All: A Title Term Weighting Strategy For eCommerce Ranking
Anthony Bell, Prathyusha Senthil Kumar, Daniel Miranda
Pages: 2233-2241

Search relevance is a very critical component in e-commerce applications. One of the strongest signals that determine the relevance of an item listing to an e-commerce query is the title of the item. Traditional methods for capturing this signal compare ...
Towards a Fair Marketplace: Counterfactual Evaluation of the trade-off between Relevance, Fairness & Satisfaction in Recommendation Systems
Rishabh Mehrotra, James McInerney, Hugues Bouchard, Mounia Lalmas, Fernando Diaz
Pages: 2243-2251

Two-sided marketplaces are platforms that have customers not only on the demand side (e.g. users), but also on the supply side (e.g. retailers, artists). While traditional recommender systems focused specifically on increasing consumer satisfaction ...
Towards Deep and Representation Learning for Talent Search at LinkedIn
Rohan Ramanath, Hakan Inan, Gungor Polatkan, Bo Hu, Qi Guo, Cagri Ozcaglar, Xianren Wu, Krishnaram Kenthapadi, Sahin Cem Geyik
Pages: 2253-2261

Talent search and recommendation systems at LinkedIn strive to match the potential candidates to the hiring needs of a recruiter or a hiring manager expressed in terms of a search query or a job posting. Recent work in this domain has mainly focused ...
Towards Effective Extraction and Linking of Software Mentions from User-Generated Support Tickets
Jianglei Han, Ka Hian Goh, Aixin Sun, Mohammad Akbari
Pages: 2263-2271

Software support tickets contain short and noisy text from the customers. Software products are often represented by various surface forms and informal abbreviations. Automatically identifying software mentions from support tickets and determining the ...
A Two-Layer Algorithmic Framework for Service Provider Configuration and Planning with Optimal Spatial Matching
Xijun Li, Jianguo Yao, Mingxuan Yuan, Jia Zeng
Pages: 2273-2281

Industrial telecommunication applications prefer to run at the optimal capacity configuration to achieve the required Quality of Service (QoS) at the minimum cost. The optimal capacity configuration is usually achieved through the selection of cell towers ...
Web-based Startup Success Prediction
Boris Sharchilev, Michael Roizner, Andrey Rumyantsev, Denis Ozornin, Pavel Serdyukov, Maarten de Rijke
Pages: 2283-2291

We consider the problem of predicting the success of startup companies at their early development stages. We formulate the task as predicting whether a company that has already secured initial (seed or angel) funding will attract a further round of investment ...
SESSION: Tutorials
From Big Data to Big Information and Big Knowledge: the Case of Earth Observation Data
Konstantina Bereta, Manolis Koubarakis, Stefan Manegold, George Stamoulis, Begüm Demir
Pages: 2293-2294

Some particularly important rich sources of open and free big geospatial data are the Earth observation (EO) programs of various countries such as the Landsat program of the US and the Copernicus programme of the European Union. EO data is a paradigmatic ...
GraphRep: Boosting Text Mining, NLP and Information Retrieval with Graphs
Michalis Vazirgiannis, Fragkiskos D. Malliaros, Giannis Nikolentzos
Pages: 2295-2296

Graphs have been widely used as modeling tools in Natural Language Processing (NLP), Text Mining (TM) and Information Retrieval (IR). Traditionally, the unigram bag-of-words representation is applied; that way, a document is represented as a multiset ...
Incremental Techniques for Large-Scale Dynamic Query Processing
Iman Elghandour, Ahmet Kara, Dan Olteanu, Stijn Vansummeren
Pages: 2297-2298

Many applications from various disciplines are now required to analyze fast evolving big data in real time. Various approaches for incremental processing of queries have been proposed over the years. Traditional approaches rely on updating the results ...
Knowledge Representation as Linked Data: Tutorial
Joachim Van Herwegen, Pieter Heyvaert, Ruben Taelman, Ben De Meester, Anastasia Dimou
Pages: 2299-2300

The process of extracting, structuring, and organizing knowledge requires processing large and originally heterogeneous data sources. Offering existing data as Linked Data increases its shareability, extensibility, and reusability. However, using Linking ...
Multi-model Databases and Tightly Integrated Polystores: Current Practices, Comparisons, and Open Challenges
Jiaheng Lu, Irena Holubová, Bogdan Cautis
Pages: 2301-2302

One of the most challenging issues in the era of Big Data is the Variety of the data. In general, there are two solutions to directly manage multi-model data currently: a single integrated multi-model database system or a tightly-integrated middleware ...
Semantic Technologies for Data Access and Integration
Diego Calvanese, Guohui Xiao
Pages: 2303-2304

Recently, semantic technologies have been successfully deployed to overcome the typical difficulties in accessing and integrating data stored in different kinds of legacy sources. In particular, knowledge graphs are being used as a mechanism to provide ...
Unbiased Learning to Rank: Theory and Practice
Qingyao Ai, Jiaxin Mao, Yiqun Liu, W. Bruce Croft
Pages: 2305-2306

Implicit feedback (e.g., user clicks) is an important source of data for modern search engines. While heavily biased [8, 9, 11, 27], it is cheap to collect and particularly useful for user-centric retrieval applications such as search ranking. To develop ...
User Group Analytics: Discovery, Exploration and Visualization
Behrooz Omidvar-Tehrani, Sihem Amer-Yahia
Pages: 2307-2308

User data is becoming increasingly available in various domains from the social Web to patient health records. User data is characterized by a combination of demographics (e.g., age, gender, occupation) and user actions (e.g., rating a movie, following ...
SESSION: CIKM 2018 Co-Located Workshops Summary
CIKM 2018 Co-Located Workshops Summary
Alfredo Cuzzocrea, Francesco Bonchi, Dimitris Gunopulos
Pages: 2309-2311

This paper provides an overview of the workshops co-located with the 27th ACM International Conference on Information and Knowledge Management (CIKM 2018), held during October 22-26, 2018 in Turin, Italy.
