ABSTRACT
In this note, we present results concerning the theory and practice of determining, for a given document, which of several categories it best fits. We describe a mathematical model of classification schemes and the one scheme that can be proved optimal among all those based on word frequencies. Finally, we report the results of an experiment that illustrates the efficacy of this classification method.
Index Terms
(auto-classified) Document classification by machine: theory and practice