Concepts inDetecting splogs via temporal dynamics using self-similarity analysis
Spam blog
A spam blog, sometimes referred to by the neologism splog, is a blog which the author uses to promote affiliated websites, to increase the search engine rankings of associated sites or to simply sell links/ads. The purpose of a splog can be to increase the PageRank or backlink portfolio of affiliate websites, to artificially inflate paid ad impressions from visitors, and/or use the blog as a link outlet to sell links or get new sites indexed.
more from Wikipedia
Self-similarity
In mathematics, a self-similar object is exactly or approximately similar to a part of itself (i.e. the whole has the same shape as one or more of the parts). Many objects in the real world, such as coastlines, are statistically self-similar: parts of them show the same statistical properties at many scales. Self-similarity is a typical property of fractals.
more from Wikipedia
Data analysis
Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of highlighting useful information, suggesting conclusions, and supporting decision making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, in different business, science, and social science domains.
more from Wikipedia
Semantic similarity
Semantic similarity or semantic relatedness is a concept whereby a set of documents or terms within term lists are assigned a metric based on the likeness of their meaning / semantic content.
more from Wikipedia
Linear discriminant analysis
Linear discriminant analysis (LDA) and the related Fisher's linear discriminant are methods used in statistics, pattern recognition and machine learning to find a linear combination of features which characterizes or separates two or more classes of objects or events. The resulting combination may be used as a linear classifier, or, more commonly, for dimensionality reduction before later classification.
more from Wikipedia
Feature vector
In pattern recognition and machine learning, a feature vector is an n-dimensional vector of numerical features that represent some object. Many algorithms in machine learning require a numerical representation of objects, since such representations facilitate processing and statistical analysis. When representing images, the feature values might correspond to the pixels of an image, when representing texts perhaps to term occurrence frequencies.
more from Wikipedia
Web search engine
A web search engine is designed to search for information on the World Wide Web. The search results are generally presented in a list of results often referred to as search engine results pages (SERPs). The information may consist of web pages, images, information and other types of files. Some search engines also mine data available in databases or open directories.
more from Wikipedia
Correlation and dependence
In statistics, dependence refers to any statistical relationship between two random variables or two sets of data. Correlation refers to any of a broad class of statistical relationships involving dependence. Familiar examples of dependent phenomena include the correlation between the physical statures of parents and their offspring, and the correlation between the demand for a product and its price.
more from Wikipedia