Concepts in Experiments with non-parametric topic models
Topic model
In machine learning and natural language processing, a topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. An early topic model was described by Papadimitriou, Raghavan, Tamaki and Vempala in 1998. Another one, called Probabilistic latent semantic indexing (PLSI), was created by Thomas Hofmann in 1999.
more from Wikipedia
Non-parametric statistics
In statistics, the term non-parametric statistics has at least two different meanings. The first meaning of non-parametric covers techniques that do not rely on data belonging to any particular distribution. These include, among others, distribution-free methods, which do not rely on assumptions that the data are drawn from a given probability distribution. As such, it is the opposite of parametric statistics. It includes non-parametric statistical models, inference and statistical tests.
Gibbs sampling
In statistics and in statistical physics, Gibbs sampling or a Gibbs sampler is a Markov chain Monte Carlo (MCMC) algorithm for obtaining a sequence of random samples from a multivariate probability distribution (i.e. from the joint probability distribution of two or more random variables), when direct sampling is difficult. This sequence can be used to approximate the joint distribution.
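As a rough illustration of the idea, the sketch below runs a Gibbs sampler on a standard bivariate normal distribution with correlation rho. The target distribution, the function names and the burn-in length are assumptions chosen for the example, not something taken from the excerpt. Each step draws one variable from its conditional distribution given the current value of the other.

```python
import random


def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Illustrative assumption: the conditionals are known in closed form,
    x | y ~ N(rho * y, 1 - rho^2) and y | x ~ N(rho * x, 1 - rho^2).
    """
    rng = random.Random(seed)
    cond_sd = (1.0 - rho * rho) ** 0.5
    x, y = 0.0, 0.0  # arbitrary starting point
    samples = []
    for step in range(burn_in + n_samples):
        # Update each coordinate in turn from its conditional distribution.
        x = rng.gauss(rho * y, cond_sd)
        y = rng.gauss(rho * x, cond_sd)
        if step >= burn_in:  # discard the early, start-dependent draws
            samples.append((x, y))
    return samples


def sample_correlation(samples):
    """Empirical correlation of the sampled (x, y) pairs."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in samples) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in samples) / n
    var_y = sum((y - mean_y) ** 2 for _, y in samples) / n
    return cov / (var_x * var_y) ** 0.5


draws = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
print(round(sample_correlation(draws), 2))
```

With enough draws, the empirical correlation of the samples should land close to the rho passed in, which is one simple way to check that the chain approximates the joint distribution.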
Hierarchical Dirichlet process
In statistics, the hierarchical Dirichlet process is a nonparametric Bayesian approach to modeling grouped data. It uses a Dirichlet process, whose base distribution is itself drawn from a Dirichlet process. This method allows clusters to share statistical strength. If a single Dirichlet process is used instead, no sharing can occur between groups of data as the atoms drawn for each group will be different with probability one.
Language model
A statistical language model assigns a probability to a sequence of m words by means of a probability distribution. Language modeling is used in many natural language processing applications such as speech recognition, machine translation, part-of-speech tagging, parsing and information retrieval. In speech recognition and in data compression, such a model tries to capture the properties of a language, and to predict the next word in a speech sequence.
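To make "assigns a probability to a sequence of m words" concrete, here is a minimal sketch of a bigram model with add-one smoothing. The class name, the `<s>` and `</s>` boundary markers, the toy training sentences and the choice of smoothing are all assumptions made for illustration; the excerpt does not prescribe any particular model.

```python
from collections import Counter


class BigramModel:
    """Toy bigram language model with add-one (Laplace) smoothing.

    Assumes a closed vocabulary: every word scored later was seen in training.
    """

    def __init__(self, sentences):
        self.context_counts = Counter()  # how often each word is a context
        self.bigram_counts = Counter()   # how often each (prev, word) pair occurs
        self.vocab = {"</s>"}            # tokens that can follow a context
        for sentence in sentences:
            tokens = ["<s>"] + sentence.split() + ["</s>"]
            self.vocab.update(tokens[1:])
            for prev, word in zip(tokens, tokens[1:]):
                self.context_counts[prev] += 1
                self.bigram_counts[(prev, word)] += 1

    def word_prob(self, prev, word):
        """Smoothed estimate of P(word | prev)."""
        numerator = self.bigram_counts[(prev, word)] + 1
        denominator = self.context_counts[prev] + len(self.vocab)
        return numerator / denominator

    def sequence_prob(self, sentence):
        """Probability of the whole sentence as a product of bigram terms."""
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        prob = 1.0
        for prev, word in zip(tokens, tokens[1:]):
            prob *= self.word_prob(prev, word)
        return prob


model = BigramModel(["the cat sat", "the dog sat"])
print(model.sequence_prob("the cat sat") > model.sequence_prob("sat cat the"))
```

A word order seen in training receives a higher probability than a scrambled one, which is the property that applications such as speech recognition rely on when ranking candidate word sequences.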
Information retrieval
Information retrieval (IR) is the area of study concerned with searching for documents, for information within documents, and for metadata about documents, as well as that of searching structured storage, relational databases, and the World Wide Web. There is overlap in the usage of the terms data retrieval, document retrieval, information retrieval, and text retrieval, but each also has its own body of literature, theory, praxis, and technologies.
Symmetric matrix
In linear algebra, a symmetric matrix is a square matrix that is equal to its transpose: A = A^T. The entries of a symmetric matrix are symmetric with respect to the main diagonal (top left to bottom right), so if the entries are written as A = (a_ij), then a_ij = a_ji for all indices i and j. For example, the 3×3 matrix with rows (1, 7, 3), (7, 4, -5), (3, -5, 6) is symmetric. Every diagonal matrix is symmetric, since all off-diagonal entries are zero.
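The definition translates directly into a check that each entry equals its mirror image across the main diagonal. The following is a small sketch using plain nested lists; the function name and the sample matrices are illustrative choices.

```python
def is_symmetric(matrix):
    """Return True if a list-of-rows matrix is square and equal to its transpose."""
    n = len(matrix)
    # A symmetric matrix must be square.
    if any(len(row) != n for row in matrix):
        return False
    # Compare each entry above the diagonal with its mirror below it.
    return all(
        matrix[i][j] == matrix[j][i]
        for i in range(n)
        for j in range(i + 1, n)
    )


example = [
    [1, 7, 3],
    [7, 4, -5],
    [3, -5, 6],
]
diagonal = [
    [2, 0, 0],
    [0, 5, 0],
    [0, 0, 9],
]
print(is_symmetric(example), is_symmetric(diagonal), is_symmetric([[1, 2], [3, 4]]))
```

Only the entries above the diagonal need to be visited, since each comparison covers the matching entry below it and diagonal entries are trivially equal to themselves.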