Concepts in Detecting near-duplicates for web crawling
Web crawler
A Web crawler is a computer program that browses the World Wide Web in a methodical, automated manner. Other terms for Web crawlers are ants, automatic indexers, bots, Web spiders, Web robots, or (especially in the FOAF community) Web scutters. The process itself is called Web crawling or spidering. Many sites, in particular search engines, use spidering as a means of providing up-to-date data.
more from Wikipedia
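As a rough illustration of the "methodical, automated" browsing described above, the following is a minimal sketch of a breadth-first crawler. It is not taken from the source: the names `LinkParser`, `crawl`, `fetch`, `max_pages` and the in-memory `SITE` dictionary (with `example.test` URLs) are all assumptions made for the example, and a real crawler would also need HTTP fetching, politeness delays and robots.txt handling.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=100):
    """Visit pages breadth-first from seed and return URLs in visit order.

    fetch is a caller-supplied function mapping a URL to HTML text,
    or to None when the page cannot be retrieved (an assumption of this sketch).
    """
    seen = {seed}            # URLs already queued, so no page is visited twice
    queue = deque([seed])    # frontier of URLs still to visit
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue         # skip pages that could not be fetched
        order.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order


# Hypothetical three-page site held in memory instead of fetched over HTTP.
SITE = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.test/b": '<a href="/missing">gone</a>',
}

visited = crawl("http://example.test/", SITE.get)
```

Passing `fetch` as a parameter keeps the traversal logic separate from network access, so the same function could be tried against a real HTTP client or, as here, a dictionary lookup.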
Web page
A web page or webpage is a document or information resource that is suitable for the World Wide Web and can be accessed through a web browser and displayed on a monitor or mobile device. This information is usually in HTML or XHTML format, and may provide navigation to other web pages via hypertext links. Web pages frequently subsume other resources such as style sheets, scripts and images into their final presentation.
Web document
A web document is similar in concept to a web page, but also satisfies the following broader definition: "... Every Web document has its own URI. Note that a Web document is not the same as a file: a single Web document can be available in many different formats and languages, and a single file, for example a PHP script, may be responsible for generating a large number of Web documents with different URIs."
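The distinction between a document and a file can be sketched in a few lines. This is only an illustration under stated assumptions: `ITEMS`, `render` and `uri_for` are hypothetical names, and the `example.test` URIs are placeholders. One function plays the role of the single script, yet it yields a separate document per URI and more than one format per document.

```python
import json

# Hypothetical in-memory catalogue standing in for a database.
ITEMS = {1: "crawler", 2: "fingerprint"}


def render(item_id, fmt="html"):
    """Return one representation (HTML or JSON) of the document for item_id."""
    name = ITEMS[item_id]
    if fmt == "json":
        return json.dumps({"id": item_id, "name": name})
    return f"<h1>{name}</h1>"


def uri_for(item_id):
    """Each document gets its own URI even though one function serves them all."""
    return f"http://example.test/item/{item_id}"


# One "file" (this module) generates a distinct Web document per URI.
documents = {uri_for(i): render(i) for i in ITEMS}
```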
Web search engine
A web search engine is designed to search for information on the World Wide Web. The search results are generally presented in a list, often referred to as search engine results pages (SERPs). The results may consist of web pages, images and other types of files. Some search engines also mine data available in databases or open directories.
Advertising
Advertising is a form of communication used to encourage or persuade an audience (viewers, readers or listeners; sometimes a specific group of people) to continue or take some new action. Most commonly, the desired result is to drive consumer behavior with respect to a commercial offering, although political and ideological advertising is also common. The purpose of advertising may also be to reassure employees or shareholders that a company is viable or successful.
Fingerprint
A fingerprint in its narrow sense is an impression left by the friction ridges of a human finger. In a wider use of the term, fingerprints are the traces of an impression from the friction ridges of any part of a human or other primate hand. A print from the foot can also leave an impression of friction ridges. A friction ridge is a raised portion of the epidermis on the digits, the palm of the hand or the sole of the foot, consisting of one or more connected ridge units of friction ridge skin.