DOI:
10.1145/2628194
IDEAS '14: Proceedings of the 18th International Database Engineering & Applications Symposium
ACM 2014 Proceeding
Publisher:
  • Association for Computing Machinery
  • New York
  • NY
  • United States
Conference:
IDEAS '14: 18th International Database Engineering & Applications Symposium, Porto, Portugal, July 7-9, 2014
ISBN:
978-1-4503-2627-8
Published:
07 July 2014
Sponsors:
ISEP, BytePress, Concordia University

Abstract

Databases are now part of our everyday lives, even if not always explicitly. Organized collections of data provide information that is essential to decision making in fields ranging from medicine to business. Web mining is important for improving human-computer interaction in general and, in particular, for exploiting information available on the Internet. This area benefits from knowledge, concepts and techniques from artificial intelligence, statistics, linguistics and graph theory, among other fields.

Every day we encounter many different kinds of data, arising from different sources. Combining data from several sources, stored using different technologies, provides a unified view of the data and empowers data processing and analysis.

Making data meaningful and worthy in a particular context is an imperative task. The logical structure of data is essential for the correct and efficient storage, organization and processing of data. Current technological developments allow the collection of huge amounts of data that can take decision-making processes to new levels. However, this is only possible if data can be transformed into knowledge. Various kinds of data mining algorithms are used to extract data patterns. The development of data preparation techniques is both a challenging and critical task.

The amount of private and personal data contained in databases has grown radically with the current digitalization of our lives. Moreover, access to databases is widespread and made easier by the interconnection of information systems. Database systems must be designed in a way that limits the disclosure of private information. Nowadays, business intelligence applications are widely used in organizations and their strategic importance is clearly recognized. The dissemination of data mining tools is increasing in the business intelligence field, as is the acknowledgement of the relevance of their usage in companies. Also, cloud computing relies on sharing computing resources rather than having local servers or personal devices handle applications. It enables collaborative work and gives cheaper and continuous access to computational resources.

Automatic collection and retention of data on end-user actions has become the norm. Typical approaches in mobile crowd-sensing applications collect and process sensor data on devices and apply local analytic algorithms to produce consumable data for users. Web crowd-sensing can also contribute detailed data where proprietary data are extremely costly.

research-article
DARM: a privacy-preserving approach for distributed association rules mining on horizontally-partitioned data

Extracting association rules helps data owners to unveil hidden patterns from their data for the purpose of analyzing and predicting the behavior of their clients. However, mining association rules in a distributed environment is not a trivial task due ...

research-article
Open Access
A method for predicting citations to the scientific publications of individual researchers

Any researcher's publications at any time can be ordered from the highest cited to the lowest cited, yielding a citation curve. We describe a novel method for predicting citation curves of researchers in the future. The method depends on treating the ...

research-article
Visual data integration based on description logic reasoning

Despite many innovative systems supporting the data integration process, designers advocate more abstract metaphors to master the inherent complexity of this activity. In fact, the visual notations provided in many modern data integration systems might ...

research-article
Semantic mediator querying

We present the whole querying process of our ontology-based data integration proposal, which we call Semantic Mediator. The global schema (a TBox) is composed of the source schemas (also TBoxes) and a taxonomy, which links the sources to each other. The ...

research-article
Discovering domain-specific public SPARQL endpoints: a life-sciences use-case

A significant portion of the LOD cloud consists of Life Sciences data sets, which together contain billions of clinical facts that interlink to form a "Web of Clinical Data". However, tools for new publishers to find relevant datasets that could ...

research-article
Mining named entities from search engine query logs

We present a seed expansion based approach to classify named entities in web search queries. Previous approaches to this classification problem relied on contextual clues in the form of keywords surrounding a named entity in the query. Here we propose ...

research-article
Named entities as privileged information for hierarchical text clustering

Text clustering is a text mining task which is often used to aid the organization, knowledge extraction, and exploratory search of text collections. Nowadays, automatic text clustering is becoming essential as the volume and variety of digital text ...

research-article
Multilevel refinement based on neighborhood similarity

The multilevel graph partitioning strategy aims to reduce the computational cost of the partitioning algorithm by applying it on a coarsened version of the original graph. This strategy is very useful when large-scale networks are analyzed. To improve ...

research-article
The state of data

We are currently experiencing an extraordinary acceleration in the growth rate of digital data. One of the reasons for this increase is the digitization of virtually all communications and records. This exponential growth is evidenced by the fact that ...

research-article
A scheme for privacy-preserving ontology mapping

Due to the rapid proliferation of ontology-based information systems and networks, there is strong demand for privacy-aware ontology mapping. To address this problem, we propose Privacy-Preserving Quick Ontology Mapping (P2QOM), a ...

research-article
Specifying complex correspondences between relational schemas and RDF models for generating customized R2RML mappings

The W3C RDB2RDF Working Group proposed a standard language to map relational data into RDF triples, called R2RML. However, creating R2RML mappings may sometimes be a difficult task because it involves the creation of views (within the mappings or not) ...

research-article
Ontology-based multi-domain metadata for research data management using triple stores

Most current research data management solutions rely on a fixed set of descriptors (e.g. Dublin Core Terms) for the description of the resources that they manage. These are easy to understand and use, but their semantics are limited to general concepts, ...

research-article
Automatic creation of stock market lexicons for sentiment analysis using StockTwits data

Sentiment analysis has been increasingly applied to the stock market domain. In particular, investor sentiment indicators can be used to model and predict stock market variables. In this context, the quality of the sentiment analysis is highly dependent ...

research-article
Dealing with incompleteness and inconsistency in P2P deductive databases

This paper proposes a logic framework for modeling the interaction among incomplete and inconsistent deductive databases in a P2P environment. Each peer joining a P2P system provides or imports data from its neighbors by using a set of mapping rules, ...

research-article
Personalized classifiers: evolving a classifier from a large reference knowledge graph

Identifying the right choice of categories for organizing and representing a large digital library of documents is a challenging task. A completely automated approach to category creation from the underlying collection could be prone to noise. On the ...

research-article
MV-IDX: indexing in multi-version databases

An index in a Multi-Version DBMS (MV-DBMS) has to reflect different tuple versions of a single data item. Existing approaches follow the paradigm of logically separating the tuple version data from the data item, e.g. an index is only allowed to return ...

research-article
A study of machine learning methods for detecting user interest during web sessions

The ability to have an automated real time detection of user interest during a web session is very appealing and can be very useful for a number of web intelligence applications. Low level interaction events associated with user interest manifestations ...

research-article
Improving MMDB distributed transactional concurrency

Main Memory Database Systems (MMDBs) have been studied since the 80s [3,4], when memory was quite costly ($1500 per MByte in 1984). We can now buy memory for about $10 per GByte. An advantage of MMDBs is that serial execution of a non-distributed ...

research-article
Open Access
Condensed representation of frequent itemsets

One of the major problems in pattern mining is still the problem of pattern explosion, i.e., the large amounts of patterns produced by the mining algorithms when analyzing a database with a predefined minimum support threshold. The approach we take to ...

research-article
Survey on open source platform-as-a-service solutions for education

While cloud computing is becoming popular in industry and companies take advantage of Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS) as well as Software-as-a-Service (SaaS) solutions, education is sometimes one step behind. SaaS ...

research-article
RSQL - a query language for dynamic data types

Database Management Systems (DBMS) are used by software applications to store, manipulate, and retrieve large sets of data. However, the requirements of current software systems pose various challenges to established DBMS. First, most software systems ...

research-article
CloudETL: scalable dimensional ETL for hive

Extract-Transform-Load (ETL) programs process data into data warehouses (DWs). Rapidly growing data volumes demand systems that scale out. Recently, much attention has been given to MapReduce for parallel handling of massive data sets in cloud ...

research-article
A methodology for social BI

Social BI (SBI) is the emerging discipline that aims at combining corporate data with textual user-generated content (UGC) to let decision-makers analyze their business based on the trends perceived from the environment. Despite the increasing diffusion ...

research-article
Optimizing query execution for variable-aligned length compression of bitmap indices

Indexing is a fundamental mechanism for efficient data access. Recently, we proposed the Variable-Aligned Length (VAL) bitmap index encoding framework, which generalizes the commonly used word-aligned compression techniques. VAL presented a variable-...

research-article
A fragmented data-declustering strategy for high skew tolerance and efficient failure recovery

Data declustering is a common technique to improve data I/O performance by retrieving data in parallel from multiple storage nodes. Data-declustering methods with replicated data also increase system availability, reliability and skew tolerance. Current ...

research-article
Optimizing database index performance for solid state drives

As Solid State Disk (SSD) drive technology matures and costs continue to decrease, it is becoming a viable replacement for traditional, rotational hard disk drives. SSDs are based on NAND flash technology, which results in different wear and performance ...

research-article
Algebraic optimization of grouped preference queries

SQL queries containing Group-by are common in data warehouse environments and OLAP. From this the concept of grouped Skyline queries emerged, wherein a Skyline of each group of tuples is requested. Grouped preference queries generalize this kind of ...

short-paper
Portable decision support system for heart failure detection and medical diagnosis

Heart disorders are among the most serious human health problems. There are currently many efforts to reduce the time to first assistance based on electronic systems that continuously record the heart's electrical activity (ECG), for further ...

short-paper
An experimental evaluation of similarity measures for uncertain time series

Uncertain time series analysis is important in applications such as wireless sensor networks and location-based services. This has been the subject of some recent studies, and a number of solution techniques have been proposed for similarity search ...

short-paper
Integration of linguistic and web information to improve biomedical terminology extraction

Comprehensive terminology is essential for a community to describe, exchange, and retrieve data. In multiple domains, the explosion of text data produced has reached a level for which automatic terminology extraction and enrichment is mandatory. ...

Contributors
  • Concordia University
  • University of Coimbra, Centre for Informatics and System

Acceptance Rates

Overall Acceptance Rate: 74 of 210 submissions, 35%

Year        Submitted   Accepted   Rate
IDEAS '20   57          27         47%
IDEAS '17   102         38         37%
IDEAS '13   51          9          18%
Overall     210         74         35%