DOI: 10.1109/WI.2006.201 (WI Conference Proceedings)

WISE: Hierarchical Soft Clustering of Web Page Search Results Based on Web Content Mining Techniques

ABSTRACT

Search engines typically respond to a query with low precision, retrieving many useless web pages, and they also miss some important ones. In this paper, we study the problem of hierarchically clustering web page search results. In particular, we propose an architecture called WISE [1], a meta-search engine that automatically builds clusters of related web pages, each embodying one meaning of the query. These clusters are then organized hierarchically, and each is labeled with a phrase representing the key concept of the cluster and of its web documents. The system, which has a web-based interface (soon available at wise.di.ubi.pt), introduces several new ideas: the pre-selection of the retrieved web pages, the statistical detection of phrases within documents, and the representation of documents by their most relevant key concepts using web content mining techniques. The final step of the system is supported by a graph-based overlapping clustering algorithm, which groups the selected documents into a hierarchy of clusters.
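
To make the idea of graph-based overlapping (soft) clustering concrete, the following is a minimal sketch and not the algorithm used in WISE, which the abstract does not specify. It assumes each document is already reduced to a set of key concepts, links two documents when the Jaccard similarity of those sets reaches a threshold, and takes the maximal cliques of the resulting graph as clusters, so an ambiguous page may fall into several clusters. The document names, concept sets, threshold value, and all function names are hypothetical, and the sketch produces a flat set of clusters rather than a hierarchy.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of key concepts."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_graph(docs, threshold):
    """Link two documents when their key-concept sets are similar enough."""
    graph = {name: set() for name in docs}
    for u, v in combinations(docs, 2):
        if jaccard(docs[u], docs[v]) >= threshold:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def maximal_cliques(graph):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            if current:  # skip the empty clique of an empty graph
                cliques.append(frozenset(current))
            return
        for node in list(candidates):
            expand(current | {node},
                   candidates & graph[node],
                   excluded & graph[node])
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(graph), set())
    return cliques


def overlapping_clusters(docs, threshold=0.4):
    """Return clusters as sorted lists; a document may appear in several."""
    graph = similarity_graph(docs, threshold)
    clusters = [sorted(clique) for clique in maximal_cliques(graph)]
    return sorted(clusters, key=lambda c: (-len(c), c))


# Hypothetical results for the ambiguous query "jaguar".
docs = {
    "d1": {"jaguar", "car", "engine"},
    "d2": {"jaguar", "car", "dealer"},
    "d3": {"jaguar", "cat", "habitat"},
    "d4": {"jaguar", "cat", "car"},  # mentions both senses
}
clusters = overlapping_clusters(docs, threshold=0.4)
print(clusters)  # [['d1', 'd2', 'd4'], ['d3', 'd4']]
```

In this toy run, d4 appears in both the "car" cluster and the "cat" cluster, which is the defining property of a soft clustering: one page can support more than one meaning of the query.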
