Concepts in Evaluating the novelty of text-mined rules using lexical knowledge
Lexicon
In linguistics, the description of a language is split into two parts: the grammar, consisting of rules describing correct sentence formation, and the lexicon, listing the words and phrases that can be used in sentences. The lexicon of a language is its vocabulary. Statistically, most lexemes consist of a single morpheme. Lexemes composed of multiple morphemes, such as compound words, idiomatic expressions, and collocations, are also considered part of the lexicon.
WordNet
WordNet is a lexical database for the English language. It groups English words into sets of synonyms called synsets, provides short, general definitions, and records the various semantic relations between these synonym sets. The purpose is twofold: to produce a combination of dictionary and thesaurus that is more intuitively usable, and to support automatic text analysis and artificial intelligence applications.
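As a concrete illustration (not part of the excerpt above), the sketch below uses NLTK's WordNet interface to look up synsets, their glosses, and a few of the recorded semantic relations; the query word "rule" and the synset name rule.n.01 are arbitrary examples, and NLTK itself is an assumption about tooling.

```python
from nltk.corpus import wordnet as wn  # requires: nltk.download("wordnet")

# Every synset (synonym set) that contains the word "rule", with its gloss.
for syn in wn.synsets("rule"):
    print(syn.name(), "-", syn.definition())

# Semantic relations recorded between synsets.
rule = wn.synset("rule.n.01")
print(rule.lemma_names())     # synonymous lemmas grouped in this synset
print(rule.hypernyms())       # more general concepts ("is-a" parents)
print(rule.hyponyms()[:3])    # a few more specific concepts
```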
Semantic similarity
Semantic similarity or semantic relatedness is a metric defined over a set of documents or terms, where the distance between items is based on the likeness of their meaning or semantic content.
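A minimal sketch of one family of such metrics, assuming NLTK and its WordNet data are available; the synsets below are only examples, and the exact scores depend on the WordNet version.

```python
from nltk.corpus import wordnet as wn

dog = wn.synset("dog.n.01")
cat = wn.synset("cat.n.01")
car = wn.synset("car.n.01")

# Path similarity: based on the shortest path between synsets
# in the hypernym hierarchy.
print(dog.path_similarity(cat))   # closer in meaning, higher score
print(dog.path_similarity(car))   # more distant, lower score

# Wu-Palmer similarity: based on the depth of the two synsets and of their
# least common subsumer in the taxonomy.
print(dog.wup_similarity(cat))
```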
Word
In language, a word is the smallest element that may be uttered in isolation with semantic or pragmatic content (with literal or practical meaning). This contrasts with a morpheme, which is the smallest unit of meaning but will not necessarily stand on its own. A word may consist of a single morpheme, or several (rocks, redness, quickly, running, unexpected), whereas a morpheme may not be able to stand on its own as a word (in the words just mentioned, these are -s, -ness, -ly, -ing, un-, -ed).
Text mining
Text mining, sometimes alternately referred to as text data mining and roughly equivalent to text analytics, refers to the process of deriving high-quality information from text. High-quality information is typically derived by devising patterns and trends through means such as statistical pattern learning.
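As a toy illustration of deriving patterns from text (the corpus below is invented for the example, not taken from the paper), one of the simplest patterns is a pair of terms that frequently co-occur in the same document:

```python
from collections import Counter
from itertools import combinations

# Invented three-document corpus; real text mining works on large collections.
docs = [
    "aspirin reduces fever and pain",
    "ibuprofen reduces fever and inflammation",
    "aspirin thins the blood",
]

# Count how often each pair of distinct terms appears in the same document.
pair_counts = Counter()
for doc in docs:
    terms = sorted(set(doc.lower().split()))
    pair_counts.update(combinations(terms, 2))

# Frequent pairs are a crude stand-in for text-mined association patterns.
for pair, count in pair_counts.most_common(5):
    print(pair, count)
```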
Data mining
Data mining (the analysis step of the "Knowledge Discovery in Databases" process, or KDD), a relatively young and interdisciplinary field of computer science, is the process that results in the discovery of new patterns in large data sets. It utilizes methods at the intersection of artificial intelligence, machine learning, statistics, and database systems.
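One classical kind of discovered pattern is the association rule; the sketch below computes support and confidence for such a rule over a small invented set of transactions (the data and item names are illustrative only):

```python
# Invented transactions; real data mining operates on large databases.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
]

def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Estimate of P(consequent | antecedent) over the transactions."""
    return support(antecedent | consequent) / support(antecedent)

# The rule {bread} -> {milk} holds in 2 of 4 transactions (support 0.5)
# and in 2 of the 3 transactions that contain bread (confidence ~0.67).
print(support({"bread", "milk"}))
print(confidence({"bread"}, {"milk"}))
```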
Correlation and dependence
In statistics, dependence refers to any statistical relationship between two random variables or two sets of data. Correlation refers to any of a broad class of statistical relationships involving dependence. Familiar examples of dependent phenomena include the correlation between the physical statures of parents and their offspring, and the correlation between the demand for a product and its price.
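To make the parent/offspring example concrete, the sketch below computes a Pearson correlation coefficient with NumPy on a small invented height sample (the numbers are made up for illustration):

```python
import numpy as np

# Invented heights in centimetres, loosely mimicking the parent/offspring example.
parent_height = np.array([165.0, 170.0, 175.0, 180.0, 185.0])
child_height = np.array([168.0, 171.0, 174.0, 182.0, 186.0])

# Pearson correlation: +1 perfect positive linear relationship,
#  0 no linear relationship, -1 perfect negative linear relationship.
r = np.corrcoef(parent_height, child_height)[0, 1]
print(round(r, 3))
```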