Studying light verbs in two Persian specialized corpora


Author
Computational Linguistics Research Group, Information Science Research Department, Iranian Research Institute for Information Science and Technology (IranDoc), Tehran, Iran.
Abstract
One of the topics studied in corpus linguistics is the compound verb. A Persian compound verb consists of two parts: a "non-verbal element" and a "light verb." In this study, light-verb frequencies were extracted from two specialized corpora, "PAZHUHESHNAME" and "PEKA" (the corpus of IranDoc books), and the frequencies of the three most frequent light verbs in these corpora were compared with their frequencies in a general corpus, the "Persian Text Corpus."
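The extraction step can be illustrated with a minimal frequency count. This is a sketch under stated assumptions, not the authors' pipeline: it assumes the corpus has already been tokenized and lemmatized so that every inflected form (e.g. "kard", "mikonad") is reduced to its romanized infinitive, and the names `LIGHT_VERBS`, `light_verb_frequencies` and `top_n` are hypothetical. It also counts every occurrence of a lemma. A real study would additionally need to separate light-verb uses from full-verb uses (for example, "dadan" as a lexical "give").

```python
from collections import Counter

# Hypothetical lexicon: the fifteen light verbs named in the abstract, as
# romanized infinitives. Assumes tokens are already lemmatized to these forms.
LIGHT_VERBS = {
    "zadan", "dashtan", "kardan", "sepordan", "gereftan",
    "amadan", "dadan", "oftadan", "khordan", "keshidan",
    "avardan", "namoudan", "raftan", "bordan", "andakhtan",
}


def light_verb_frequencies(lemmas):
    """Count how often each light-verb lemma occurs in a lemmatized corpus.

    Note: every occurrence is counted; light vs. full-verb use is not separated.
    """
    return Counter(lemma for lemma in lemmas if lemma in LIGHT_VERBS)


def top_n(freqs, n=3):
    """Return the n most frequent light verbs as (lemma, count) pairs."""
    return freqs.most_common(n)


if __name__ == "__main__":
    # Toy lemmatized tokens for illustration only, not real corpus data.
    toy = ["tahlil", "kardan", "anjam", "dadan", "gereftan", "kardan",
           "natije", "gereftan", "kardan"]
    print(top_n(light_verb_frequencies(toy)))  # [('kardan', 3), ('gereftan', 2), ('dadan', 1)]
```

Running the same two functions over each corpus separately gives the per-corpus frequency tables that the comparison with the general corpus starts from.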
In these two specialized corpora, which together contain nearly eight million words, the following light verbs were analyzed: "zadan" (to hit), "dashtan" (to have), "kardan" (to do), "sepordan" (to entrust), "gereftan" (to take), "amadan" (to come), "dadan" (to give), "oftadan" (to fall), "khordan" (to eat), "keshidan" (to pull), "avardan" (to bring), "namoudan" (to do), "raftan" (to go), "bordan" (to carry), and "andakhtan" (to throw). Among these, "kardan" (frequency 5,848) and "dadan" (frequency 5,037) ranked first and second. These two light verbs are also among the most frequent in the general corpus. "Gereftan," with a frequency of 3,246, ranks third in the specialized corpora; in the general corpus, however, it ranks below "kardan," "dadan," "dashtan," and "namoudan."
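Because the specialized and general corpora differ in size, raw counts are usually compared after normalization. The sketch below ranks the three frequencies reported above and converts them to occurrences per million words. Two assumptions should be kept in mind: the corpus size of 8,000,000 is only an approximation of "nearly eight million words," and per-million normalization is a common corpus-linguistic convention, not necessarily the measure used in this study. The names `REPORTED`, `rank` and `per_million` are hypothetical. For the general corpus the abstract gives only an ordering, so the code derives nothing more than a lower bound on the rank of "gereftan."

```python
# Frequencies reported in the abstract for the three most frequent light
# verbs in the two specialized corpora combined.
REPORTED = {"kardan": 5848, "dadan": 5037, "gereftan": 3246}

# Assumption: the abstract says only "nearly eight million words", so this
# size is an approximation, not the exact token count of the corpora.
APPROX_CORPUS_SIZE = 8_000_000

# From the abstract: in the general corpus, gereftan ranks after these four.
AHEAD_OF_GEREFTAN_IN_GENERAL = ["kardan", "dadan", "dashtan", "namoudan"]


def rank(freqs):
    """Rank lemmas by descending frequency (1 = most frequent)."""
    ordered = sorted(freqs, key=freqs.get, reverse=True)
    return {lemma: position for position, lemma in enumerate(ordered, start=1)}


def per_million(count, corpus_size):
    """Normalize a raw count to occurrences per million words."""
    return count * 1_000_000 / corpus_size


if __name__ == "__main__":
    ranks = rank(REPORTED)
    for lemma in sorted(ranks, key=ranks.get):
        rate = per_million(REPORTED[lemma], APPROX_CORPUS_SIZE)
        print(f"{ranks[lemma]}. {lemma}: {REPORTED[lemma]} "
              f"({rate:.1f} per million, approx.)")
    # The abstract's wording supports only a lower bound here.
    print("gereftan, general corpus: rank >=",
          len(AHEAD_OF_GEREFTAN_IN_GENERAL) + 1)
```

Under the approximate size, "kardan" works out to about 731 occurrences per million words. A comparison with the general corpus would need that corpus's own counts and size, which the abstract does not report.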
After the frequencies of the light verbs were extracted and examined, compound verbs formed with the three most frequent light verbs ("kardan," "dadan," and "gereftan") were analyzed for the likelihood of a gap between the non-verbal element and the light verb within a fixed context window (five tokens before and five tokens after the target light verb). The analysis of high-frequency compound verbs in the specialized corpora showed that the authors of specialized texts, mainly researchers, students, and professors, are less inclined to separate the non-verbal element from the light verb. In most compound verbs, therefore, the two parts appear together, with no syntactic category or phrase between them. In general texts, by contrast, authors are more likely than in specialized texts to insert phrases or other syntactic categories between the non-verbal element and the light verb.
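The gap measure can be sketched as follows. Given a light verb at a known position, the code looks for the compound verb's non-verbal element within five tokens on either side and counts the tokens strictly between the two. A gap of 0 means the parts are adjacent. This is an illustrative reconstruction of the windowed check described above, not the authors' procedure. It assumes lemmatized tokens and assumes that the non-verbal element belonging to each compound verb is already known (e.g. from a hypothetical compound-verb lexicon). The function names `gap_in_window` and `separation_rate` are hypothetical, and the sketch does not check whether a matching token really belongs to the compound.

```python
def gap_in_window(tokens, lv_index, nonverbal, window=5):
    """Return the number of tokens between the light verb at lv_index and the
    nearest occurrence of its non-verbal element within +/-window tokens,
    or None if the element does not occur in that window."""
    lo = max(0, lv_index - window)
    hi = min(len(tokens), lv_index + window + 1)
    best = None
    for j in range(lo, hi):
        if j != lv_index and tokens[j] == nonverbal:
            gap = abs(lv_index - j) - 1  # tokens strictly between the two parts
            if best is None or gap < best:
                best = gap
    return best


def separation_rate(occurrences, window=5):
    """Share of located compound-verb occurrences whose parts are separated.

    occurrences: iterable of (tokens, lv_index, nonverbal) triples.
    Occurrences whose non-verbal element is not found in the window are skipped.
    """
    found = separated = 0
    for tokens, lv_index, nonverbal in occurrences:
        gap = gap_in_window(tokens, lv_index, nonverbal, window)
        if gap is None:
            continue
        found += 1
        if gap > 0:
            separated += 1
    return separated / found if found else 0.0


if __name__ == "__main__":
    # Toy lemmatized examples of "tasmim gereftan" (to decide).
    adjacent = (["ma", "tasmim", "gereftan"], 2, "tasmim")
    split = (["ma", "tasmim", "mohemmi", "gereftan"], 3, "tasmim")
    print(separation_rate([adjacent, split]))  # 0.5
```

Computing `separation_rate` separately for each compound verb in the specialized corpora and in the general corpus yields the kind of comparison the abstract reports: lower rates would indicate that the non-verbal element and the light verb tend to appear together.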