A vector space model for automatic indexing

Abstract

In a document retrieval or other pattern-matching environment, where stored entities (documents) are compared with each other or with incoming patterns (search requests), it appears that the best indexing (property) space is one where each entity lies as far away from the others as possible. In these circumstances the value of an indexing system may be expressible as a function of the density of the object space; in particular, retrieval performance may correlate inversely with space density. An approach based on space density computations is used to choose an optimum indexing vocabulary for a collection of documents. Typical evaluation results are shown, demonstrating the usefulness of the model.
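The idea of scoring an indexing vocabulary by space density can be illustrated with a minimal sketch. The abstract does not define the density measure, so this sketch assumes it is the average pairwise cosine similarity between document vectors, and it assumes a term's usefulness can be scored as the change in density when that term is dropped. The function names (`space_density`, `discrimination_value`) and the toy collection are hypothetical and serve only as illustration.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a document with no remaining terms matches nothing
    return dot / (norm_u * norm_v)


def space_density(docs):
    """Assumed density measure: mean cosine similarity over all document pairs."""
    pairs = list(combinations(docs, 2))
    if not pairs:
        return 0.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def discrimination_value(docs, term):
    """Density without the term minus density with it.

    A positive value means the term spreads documents apart (removing it
    makes the space denser); a negative value means the term pulls
    documents together.
    """
    reduced = [[w for i, w in enumerate(d) if i != term] for d in docs]
    return space_density(reduced) - space_density(docs)


# Toy collection (invented): rows are documents, columns are terms 0..2.
# Term 0 occurs in every document; terms 1 and 2 occur in one document each.
docs = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
]

for term in range(3):
    print(term, round(discrimination_value(docs, term), 4))
```

Under these assumptions, the term shared by every document receives a negative score and the two rarer terms receive positive scores, which matches the intuition that a good indexing vocabulary keeps documents far apart.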

References

  1. Salton, G. Automatic Information Organization and Retrieval. McGraw-Hill, New York, 1968, Ch. 4.
  2. Salton, G., and Yang, C.S. On the specification of term values in automatic indexing. J. Documen. 29, 4 (Dec. 1973), 351-372.
  3. Sparck Jones, K. A statistical interpretation of term specificity and its application to retrieval. J. Documen. 28, 1 (March 1972), 11-20.
  4. Williamson, R.E. Real-time document retrieval. Ph.D. Th., Computer Sci. Dep., Cornell U., June 1974.
  5. Wong, A. An investigation of the effects of different indexing methods on the document space configuration. Sci. Rep. ISR-22, Computer Sci. Dep., Cornell U., Section II, Nov. 1974.
  6. Salton, G. A theory of indexing. Regional Conference Series in Applied Mathematics No. 18, SIAM, Philadelphia, Pa., 1975.
  7. Salton, G., Yang, C.S., and Yu, C.T. Contribution to the theory of indexing. Proc. IFIP Congress 74, Stockholm, August 1974. American Elsevier, New York, 1974.
