Concepts in Automatic labeling of multinomial topic models
Latent Dirichlet allocation
In statistics, latent Dirichlet allocation (LDA) is a generative model in which sets of observations are explained by unobserved groups that account for why some parts of the data are similar. For example, if the observations are words collected into documents, LDA posits that each document is a mixture of a small number of topics and that each word is generated by one of the document's topics.
more from Wikipedia
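The generative story in the latent Dirichlet allocation entry above can be sketched in a few lines of Python. This is a minimal illustration and not a reference implementation: the two toy topics in `TOPICS`, the prior `alpha`, and the function names are all hypothetical, and the topic-word distributions are fixed by hand here, whereas full LDA also draws them from a Dirichlet prior.

```python
import random

def sample_dirichlet(alpha, rng):
    # Normalised independent Gamma(alpha_i, 1) draws form a Dirichlet sample.
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topics, alpha, length, rng):
    """Generate one document.

    topics: list of dicts mapping word -> probability (one dict per topic).
    alpha:  Dirichlet prior over topics (assumed values, for illustration).
    """
    theta = sample_dirichlet(alpha, rng)  # this document's topic mixture
    assignments, words = [], []
    for _ in range(length):
        # Pick the topic responsible for this word position.
        z = rng.choices(range(len(topics)), weights=theta)[0]
        # Draw the word from that topic's word distribution.
        vocab = list(topics[z])
        w = rng.choices(vocab, weights=[topics[z][v] for v in vocab])[0]
        assignments.append(z)
        words.append(w)
    return theta, assignments, words

# Hypothetical toy topics, chosen only for the example.
TOPICS = [
    {"gene": 0.5, "dna": 0.3, "cell": 0.2},
    {"data": 0.4, "model": 0.4, "topic": 0.2},
]

rng = random.Random(0)
theta, assignments, words = generate_document(TOPICS, [0.5, 0.5], 8, rng)
print(theta, assignments, words)
```

Each word in the output can be traced back to exactly one topic through `assignments`, which is the sense in which a word's creation is attributable to one of the document's topics.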
Mutual information
In probability theory and information theory, the mutual information (sometimes known by the archaic term transinformation) of two random variables is a quantity that measures the mutual dependence of the two random variables. The most common unit of measurement of mutual information is the bit, when logarithms to the base 2 are used.
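As a small worked example of the definition above, the following sketch computes the mutual information of two discrete random variables from their joint distribution, using base-2 logarithms so that the result is in bits. The function name and the two toy joint distributions are assumptions made for illustration.

```python
import math

def mutual_information(joint):
    """Return I(X;Y) in bits.

    joint: dict mapping (x, y) -> probability, summing to 1.
    """
    px, py = {}, {}
    for (x, y), p in joint.items():
        # Accumulate the marginal distributions of X and Y.
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    # Sum p(x,y) * log2( p(x,y) / (p(x) p(y)) ) over cells with mass.
    return sum(
        p * math.log2(p / (px[x] * py[y]))
        for (x, y), p in joint.items()
        if p > 0
    )

# Two independent fair coins: knowing one says nothing about the other.
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
# Two copies of the same fair coin: knowing one fully determines the other.
identical = {(0, 0): 0.5, (1, 1): 0.5}

print(mutual_information(independent))  # 0.0
print(mutual_information(identical))    # 1.0
```

The independent pair carries zero bits of shared information, while the perfectly dependent pair carries one bit, the full entropy of a fair coin.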
Text mining
Text mining, sometimes called text data mining and roughly equivalent to text analytics, is the process of deriving high-quality information from text. Such information is typically obtained by identifying patterns and trends through means such as statistical pattern learning.
Multinomial distribution
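In probability theory, the multinomial distribution is a generalization of the binomial distribution. The binomial distribution is the probability distribution of the number of "successes" in n independent Bernoulli trials, with the same probability of "success" on each trial. The multinomial distribution extends this to n independent trials that each result in one of k possible outcomes with fixed probabilities, and gives the probability of any particular combination of outcome counts.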
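To make the relationship to the binomial distribution concrete, here is a short sketch of the multinomial probability mass function, n! / (c_1! ... c_k!) * p_1^c_1 ... p_k^c_k. The function name and the example inputs are illustrative choices, not taken from the source.

```python
import math

def multinomial_pmf(counts, probs):
    """Probability of observing `counts` outcomes in sum(counts) trials.

    counts: number of times each of the k outcomes occurred.
    probs:  probability of each outcome on a single trial (sums to 1).
    """
    n = sum(counts)
    # Multinomial coefficient n! / (c_1! * ... * c_k!), kept as an integer.
    coeff = math.factorial(n)
    for c in counts:
        coeff //= math.factorial(c)
    p = float(coeff)
    for c, q in zip(counts, probs):
        p *= q ** c
    return p

# With k = 2 outcomes this reduces to the binomial distribution:
# 2 successes in 3 fair trials has probability C(3, 2) / 8.
print(multinomial_pmf([2, 1], [0.5, 0.5]))  # 0.375

# Three equally likely outcomes, each seen once in 3 trials: 3! / 27.
print(multinomial_pmf([1, 1, 1], [1 / 3, 1 / 3, 1 / 3]))
```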
Mathematical model
A mathematical model is a description of a system using mathematical concepts and language. The process of developing a mathematical model is termed mathematical modelling. Mathematical models are used not only in the natural sciences and engineering disciplines, but also in the social sciences; physicists, engineers, statisticians, operations research analysts and economists use mathematical models most extensively.
Topic model
In machine learning and natural language processing, a topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. An early topic model was described by Papadimitriou, Raghavan, Tamaki and Vempala in 1998. Another one, called Probabilistic latent semantic indexing (PLSI), was created by Thomas Hofmann in 1999.
Set (mathematics)
A set is a collection of well defined and distinct objects, considered as an object in its own right. Sets are one of the most fundamental concepts in mathematics. Developed at the end of the 19th century, set theory is now a ubiquitous part of mathematics, and can be used as a foundation from which nearly all of mathematics can be derived.
Data
Data are values of qualitative or quantitative variables, belonging to a set of items. Data in computing are often represented by a combination of items organized in rows and multiple variables organized in columns. Data are typically the results of measurements and can be visualised using graphs or images. Data as an abstract concept can be viewed as the lowest level of abstraction from which information and then knowledge are derived.