Abstract—The main goal of stemming is to standardize words by reducing each word to its root (stem). In this paper a new stemming algorithm for the Farsi (Persian) language is presented. The stemmer is based on removing suffixes and prefixes, and a database of exceptions is used to reduce the error rate. The proposed method improves both the speed of the stemmer and its error rate. Evaluation on prototype document collections shows a significant improvement in precision and recall compared with other well-known methods.
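The abstract describes the stemmer only at a high level: affixes are stripped, and an exception store guards against over-stemming. The Python sketch below illustrates that general idea under stated assumptions. It is not the authors' multi-phase algorithm. The affix lists, `MIN_STEM_LEN`, and the `EXCEPTIONS` dictionary (an in-memory stand-in for the exception database) are hypothetical and were chosen only for illustration.

```python
from collections.abc import Mapping

# Zero-width non-joiner, often written between a Persian stem and its affix.
ZWNJ = "\u200c"

# Hypothetical, deliberately small affix lists (illustration only).
PREFIXES = ["نمی", "می", "بی"]          # nemi-, mi-, bi-
SUFFIXES = ["ترین", "تر", "ها", "ان"]   # -tarin, -tar, -ha, -an

# Shortest stem allowed after stripping an affix (assumed value).
MIN_STEM_LEN = 2

# Hypothetical exception table standing in for the paper's exception database:
# maps a surface form to the stem to return, bypassing further stripping.
EXCEPTIONS = {
    "ایران": "ایران",  # ends in "ان" but is not a plural
    "بیمار": "بیمار",  # starts with "بی" but is not bi- + stem
    "دختر": "دختر",    # ends in "تر" but is not a comparative
    "کتب": "کتاب",     # Arabic broken plural of "کتاب"
}


def _by_length(affixes):
    """Try longer affixes first so that "ترین" wins over "تر"."""
    return sorted(affixes, key=len, reverse=True)


def stem(word: str, exceptions: Mapping[str, str] = EXCEPTIONS) -> str:
    word = word.strip().strip(ZWNJ)

    # Phase 1: repeatedly strip suffixes until none applies
    # or the current form is a known exception.
    while word not in exceptions:
        for suffix in _by_length(SUFFIXES):
            if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
                word = word[: -len(suffix)].rstrip(ZWNJ)
                break
        else:
            break  # no suffix matched

    if word in exceptions:
        return exceptions[word]

    # Phase 2: strip at most one prefix.
    for prefix in _by_length(PREFIXES):
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LEN:
            word = word[len(prefix):].lstrip(ZWNJ)
            break

    return exceptions.get(word, word)
```

With these assumptions, "کتاب‌ها" (books) reduces to "کتاب" and "کتب" maps directly to "کتاب" through the exception table. "بیماران" (patients) stops at "بیمار" instead of being over-stemmed to "مار" (snake), which is what plain affix stripping would produce. Stripping suffixes before prefixes lets the exception lookup catch such words before a misleading prefix is removed.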
Index Terms—Farsi, Persian, language, stemming.
Somayye Estahbanati is with the Department of Computer Engineering, Islamic Azad University, Science and Research Branch, Ahvaz, Iran (Email: s.estahbanati@gmail.com).
Reza Javidan and Mehdi Nikkhah are with the Department of Computer Engineering, Islamic Azad University, Beyza Branch, Beyza, Iran (Email: reza.javidan@gmail.com; Nikkhah@biau.ac.ir).
Cite: Somayyeh Estahbanati, Reza Javidan, and Mehdi Nikkhah, "A New Multi-Phase Algorithm for Stemming in Farsi Language Based on Morphology," International Journal of Computer Theory and Engineering, vol. 3, no. 5, pp. 623-627, 2011.