Concepts in Aggregate queries on probabilistic record linkages
Aggregate data
In statistics, aggregate data are data combined from several measurements. When you aggregate data, you replace groups of observations with summary statistics based on those observations. In economics, aggregate data (or data aggregates) are high-level data composed from a combination of more individual, lower-level data.
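As a minimal sketch of replacing groups of observations with summary statistics, the Python below groups hypothetical (group, measurement) pairs and keeps only a count and a mean per group. The sample data, the `aggregate` helper, and the choice of statistics are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical observations: (group label, measurement) pairs.
observations = [
    ("north", 10.0),
    ("north", 14.0),
    ("south", 7.0),
    ("south", 9.0),
    ("south", 11.0),
]


def aggregate(rows):
    """Replace each group of observations with its count and mean."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {
        key: {"count": len(values), "mean": mean(values)}
        for key, values in groups.items()
    }


summary = aggregate(observations)
print(summary)
```

After aggregation the individual measurements are no longer available; only the per-group summaries remain.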
Record linkage
Record linkage (RL) refers to the task of finding records that refer to the same entity across different data sources (e.g., data files, books, websites, databases). Record linkage is necessary when joining data sets based on entities that may or may not share a common identifier, as may be the case due to differences in record shape, storage location, and/or curator style or preference.
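When no common identifier exists, one simple approach is to compare records by string similarity and link the pairs that score above a threshold. The sketch below uses `difflib.SequenceMatcher` from the Python standard library; the `normalize`, `similarity`, and `link_records` helpers, the sample names, and the 0.8 threshold are all assumptions for illustration, and the score is a similarity ratio, not a calibrated match probability.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lower-case, drop periods, and collapse whitespace (an assumed cleanup step)."""
    return " ".join(name.lower().replace(".", "").replace(",", " ").split())


def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(left, right, threshold=0.8):
    """Return (left_index, right_index, score) for every pair at or above the threshold."""
    links = []
    for i, a in enumerate(left):
        for j, b in enumerate(right):
            score = similarity(a, b)
            if score >= threshold:
                links.append((i, j, score))
    return links


# Hypothetical records from two sources with no shared identifier.
source_a = ["John A. Smith", "Maria Garcia"]
source_b = ["john a smith", "Marie Garcia", "Peter Jones"]

links = link_records(source_a, source_b)
for i, j, score in links:
    print(source_a[i], "<->", source_b[j], round(score, 3))
```

Comparing every pair is quadratic in the number of records; this sketch makes no attempt to reduce the number of comparisons.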
Counting
Counting is the action of finding the number of elements of a finite set of objects.
Joint probability distribution
In the study of probability, given two random variables X and Y that are defined on the same probability space, the joint distribution for X and Y defines the probability of events defined in terms of both X and Y. In the case of only two random variables, this is called a bivariate distribution, but the concept generalizes to any number of random variables, giving a multivariate distribution. The equation for the joint probability differs depending on whether the variables are dependent or independent.
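For discrete X and Y, the two cases can be written out as follows. The notation is a standard one chosen here for illustration, not taken from the source. The first identity holds in general whenever P(X = x) > 0; the second holds only when X and Y are independent.

```latex
% General (possibly dependent) case: chain rule
P(X = x,\, Y = y) = P(Y = y \mid X = x)\, P(X = x)

% Independent case: the conditional reduces to the marginal
P(X = x,\, Y = y) = P(X = x)\, P(Y = y)
```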
Data integration
Data integration involves combining data residing in different sources and providing users with a unified view of these data. This process becomes significant in a variety of situations, both commercial (for example, when two similar companies need to merge their databases) and scientific (for example, combining research results from different bioinformatics repositories). Data integration appears with increasing frequency as the volume of existing data and the need to share it grow.
Aggregate function
In computer science, an aggregate function is a function where the values of multiple rows are grouped together as input on certain criteria to form a single value of more significant meaning or measurement such as a set, a bag or a list. Common aggregate functions include: Average, Count, Maximum, Median, Minimum, Mode, and Sum. Aggregate functions are common in numerous programming languages such as Ruby, in spreadsheets, and in relational algebra.
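The common aggregate functions listed above can be sketched in Python with built-ins and the standard `statistics` module. The sample `values` list is hypothetical; each function reduces the whole list to a single value.

```python
from statistics import mean, median, mode

# Hypothetical column of values taken from multiple rows.
values = [4, 8, 8, 15, 16, 23]

aggregates = {
    "average": mean(values),
    "count": len(values),
    "maximum": max(values),
    "median": median(values),
    "minimum": min(values),
    "mode": mode(values),
    "sum": sum(values),
}

for name, result in aggregates.items():
    print(name, result)
```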
