Abstract
The rapid growth of the Internet and other computing facilities in recent years has produced a large amount of text in electronic form, which has increased interest in automatic text processing applications such as keyword extraction and term indexing. Although keywords are useful for many applications, most documents available online are not provided with keywords. We describe a method for extracting keywords from Arabic documents. The method identifies keywords by combining linguistic and statistical analysis of the text, without using prior knowledge of its domain or information from any related corpus. The text is first preprocessed to extract the main linguistic information, such as the roots and morphological patterns of derivative words. A cleaning phase then eliminates meaningless words from the text. The most frequent terms are clustered into equivalence classes, in which derivative words generated from the same root and non-derivative words generated from the same stem are placed together and their counts are accumulated. A vector space model is then used to capture the most frequent N-grams in the text. Experiments on a real-world dataset show that the proposed method achieves an average precision of 31% and an average recall of 53% when tested against manually assigned keywords.
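The equivalence-class step can be illustrated with a minimal sketch. It is not the authors' implementation: the lookup table `TOY_ROOTS`, the stop-word set `STOP_WORDS`, and the function names are hypothetical stand-ins for a real Arabic morphological analyzer and cleaning phase, which the abstract does not specify. Words that share a root are counted under one class, and words with no entry in the table are treated as their own stem.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon mapping a derivative word to its root.
# A real system would obtain this from a morphological analyzer.
TOY_ROOTS = {
    "كتاب": "كتب",   # book
    "كاتب": "كتب",   # writer
    "مكتبة": "كتب",  # library
    "درس": "درس",    # lesson
    "مدرسة": "درس",  # school
}

# Hypothetical stop-word list standing in for the cleaning phase.
STOP_WORDS = {"في", "من", "على"}


def class_key(word):
    """Return the root for known derivative words, else the word itself."""
    return TOY_ROOTS.get(word, word)


def equivalence_classes(tokens):
    """Group tokens by class key and accumulate one count per class."""
    counts = Counter()
    members = defaultdict(set)
    for token in tokens:
        if token in STOP_WORDS:
            continue  # cleaning phase: skip meaningless words
        key = class_key(token)
        counts[key] += 1
        members[key].add(token)
    return counts, members


tokens = ["كتاب", "في", "مكتبة", "من", "كاتب", "مدرسة", "درس", "إنترنت"]
counts, members = equivalence_classes(tokens)

# Print classes from most to least frequent with their member words.
for key, count in counts.most_common():
    print(key, count, sorted(members[key]))
```

In this toy input, three different surface words contribute to a single class for the root كتب, so the class outranks any individual word. This shows why accumulating counts per class can surface terms that raw word frequency would miss.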
Index Terms
Keyword Extraction from Arabic Documents using Term Equivalence Classes