Concepts in Distributed similarity search in high dimensions using locality sensitive hashing

Locality-sensitive hashing

Locality-sensitive hashing (LSH) is a method of performing probabilistic dimension reduction of high-dimensional data. The basic idea is to hash the input items so that similar items are mapped to the same buckets with high probability (the number of buckets being much smaller than the universe of possible input items). Note how locality-sensitive hashing, in many ways, mirrors data clustering.
more from Wikipedia
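
As a minimal sketch of the bucketing idea described above, the following uses random-hyperplane hashing, one common LSH family suited to cosine similarity. The excerpt does not name a specific hash family, so this choice and the helper names `make_hyperplanes`, `lsh_signature`, and `build_buckets` are illustrative assumptions.

```python
import random

def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplane normals in dim dimensions (assumed helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def lsh_signature(vector, hyperplanes):
    """Hash a vector to a bit tuple: one bit per hyperplane, set by the side it falls on."""
    bits = []
    for plane in hyperplanes:
        dot = sum(p * x for p, x in zip(plane, vector))
        bits.append(1 if dot >= 0 else 0)
    return tuple(bits)

def build_buckets(items, hyperplanes):
    """Group item names by signature; items pointing in similar directions tend to collide."""
    buckets = {}
    for name, vec in items.items():
        buckets.setdefault(lsh_signature(vec, hyperplanes), []).append(name)
    return buckets
```

With `num_bits` hyperplanes there are at most `2 ** num_bits` buckets, far fewer than the number of possible input vectors. Two vectors separated by a small angle agree on each bit with high probability, so they usually share a bucket.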

Curse of dimensionality

The curse of dimensionality refers to various phenomena that arise when analyzing and organizing high-dimensional spaces (often with hundreds or thousands of dimensions) that do not occur in low-dimensional settings such as the physical space commonly modeled with just three dimensions. There are multiple phenomena referred to by this name in domains such as sampling, combinatorics, machine learning and data mining.
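
One such phenomenon, relevant to similarity search, is that distances between random points tend to become nearly equal as the dimension grows. The sketch below illustrates this under assumed conditions (points drawn uniformly from the unit cube, Euclidean distance); the function name `relative_contrast` is hypothetical and the excerpt itself does not single out this effect.

```python
import math
import random

def relative_contrast(dim, num_points=200, seed=0):
    """Return (farthest - nearest) / nearest for one random query point.

    Points are sampled uniformly from the unit cube in `dim` dimensions.
    A small value means the nearest and farthest points are almost equally far.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(num_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    nearest, farthest = min(dists), max(dists)
    return (farthest - nearest) / nearest
```

Comparing `relative_contrast(2)` with `relative_contrast(500)` typically shows the contrast shrinking sharply in the higher dimension, which is part of why exact nearest neighbor search becomes hard there.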

Nearest neighbor search

Nearest neighbor search (NNS), also known as proximity search, similarity search or closest point search, is an optimization problem for finding closest points in metric spaces. The problem is: given a set S of points in a metric space M and a query point q ∈ M, find the closest point in S to q. In many cases, M is taken to be d-dimensional Euclidean space and distance is measured by Euclidean distance or Manhattan distance. Donald Knuth in vol.
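
The problem statement above can be sketched as an exact linear scan over S. This is only a baseline under the stated definitions, not a scalable method; the function names `euclidean`, `manhattan`, and `nearest_neighbor` are illustrative assumptions.

```python
import math

def euclidean(p, q):
    """Euclidean (L2) distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """Manhattan (L1) distance between two equal-length points."""
    return sum(abs(a - b) for a, b in zip(p, q))

def nearest_neighbor(points, query, distance=euclidean):
    """Return the point in `points` closest to `query` under `distance`."""
    if not points:
        raise ValueError("points must be non-empty")
    return min(points, key=lambda p: distance(p, query))
```

The scan costs one distance computation per point in S, and the answer can depend on the metric: for the query `(0, 0)` and the points `(2, 2)` and `(0, 3)`, the Euclidean choice and the Manhattan choice differ.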

Euclidean space

In mathematics, Euclidean space is the Euclidean plane and three-dimensional space of Euclidean geometry, as well as the generalizations of these notions to higher dimensions. The term "Euclidean" distinguishes these spaces from the curved spaces of non-Euclidean geometry and Einstein's general theory of relativity, and is named for the Greek mathematician Euclid of Alexandria.

Load balancing (computing)

Load balancing is a computer networking methodology to distribute workload across multiple computers or a computer cluster, network links, central processing units, disk drives, or other resources, to achieve optimal resource utilization, maximize throughput, minimize response time, and avoid overload. Using multiple components with load balancing, instead of a single component, may increase reliability through redundancy.

Computer network

A computer network, often simply referred to as a network, is a collection of hardware components and computers interconnected by communication channels that allow sharing of resources and information. Where at least one process in one device is able to send/receive data to/from at least one process residing in a remote device, then the two devices are said to be in a network.