Article type: Research article
Authors
1 Department of General Linguistics, Faculty of Literature and Foreign Languages, Allameh Tabataba'i University, Tehran, Iran
2 Department of General Linguistics, Allameh Tabataba'i University, Tehran, Iran
3 Department of Computer Science, Faculty of Statistics, Mathematics and Computer Science, Allameh Tabataba'i University, Tehran, Iran
4 Faculty member, Islamic Azad University, Karaj
Abstract
Corpora are now widely used in authorship attribution. In this research, a corpus of contemporary Persian texts was used to identify the authors of texts, and the effectiveness of function words and content words for this task was compared. To this end, seven contemporary writers, Hoshang Golshiri, Bozorg Alavi, Ahmad Mahmoud, Mahmoud Dolatabadi, Nader Ebrahimi, Jalal Al Ahmad and Gholamhossein Saedi, were selected and their books were collected. Using this corpus, the effectiveness of function and content words was then evaluated with deep learning algorithms such as the multilayer perceptron (MLP) and long short-term memory (LSTM). The results indicated that the method based on function words outperformed the method based on content words in authorship attribution. In addition, among the types of function words, pronouns, especially demonstrative and personal pronouns, were the most effective in determining the author of a text. Features based on conjunctions and auxiliary verbs were also valuable for recognizing Persian writers.
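As a rough illustration of the function-word approach described above, the following Python sketch turns each text into a vector of relative frequencies of function words and assigns a new text to the author whose average profile is closest. It is a minimal sketch under stated assumptions, not the authors' method: the `FUNCTION_WORDS` list is a tiny hypothetical sample of Persian pronouns, conjunctions and auxiliaries, tokenization is plain whitespace splitting, and a nearest-centroid classifier stands in for the MLP and LSTM models used in the study. The function names `function_word_profile`, `train` and `predict` are illustrative only.

```python
import math
from collections import Counter

# Hypothetical sample of Persian function words (demonstrative and personal
# pronouns, conjunctions, auxiliary verbs). The study's actual lists are not given.
FUNCTION_WORDS = ["این", "آن", "من", "او", "و", "که", "اما", "است", "بود", "خواهد"]


def tokenize(text: str) -> list[str]:
    # Naive whitespace tokenization; real Persian text needs normalization
    # (e.g. zero-width non-joiners, Arabic vs. Persian letter forms).
    return text.split()


def function_word_profile(text: str) -> list[float]:
    """Relative frequency of each word in FUNCTION_WORDS within the text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def train(samples: dict[str, list[str]]) -> dict[str, list[float]]:
    """Map each author to the mean (centroid) profile of their training texts."""
    model = {}
    for author, texts in samples.items():
        profiles = [function_word_profile(t) for t in texts]
        model[author] = [sum(col) / len(profiles) for col in zip(*profiles)]
    return model


def predict(model: dict[str, list[float]], text: str) -> str:
    """Return the author whose centroid is closest in Euclidean distance.

    Nearest-centroid is a simple stand-in for the MLP/LSTM classifiers.
    """
    profile = function_word_profile(text)
    return min(model, key=lambda author: math.dist(profile, model[author]))


if __name__ == "__main__":
    # Toy, made-up training texts; not data from the study's corpus.
    training = {
        "author_a": ["من و او این را دیدیم", "این کتاب و آن کتاب"],
        "author_b": ["او گفت که اما رفت", "که او بود اما نه"],
    }
    model = train(training)
    print(predict(model, "این و آن"))  # prints "author_a"
```

Replacing `FUNCTION_WORDS` with a list of content words would give a comparable content-word baseline, which is the kind of contrast the abstract reports; a realistic setup would use much larger word lists and a trained neural classifier.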