Article type: Research article
Authors
1 Institute for Humanities and Cultural Studies
2 Linguistics, Research Institute of Linguistics, Institute for Humanities and Cultural Studies, Tehran, Iran
3 Institute for Cognitive Sciences
Abstract
Keywords
Article title [English]
Authors [English]
Iran's Comprehensive Scientific Roadmap calls for promoting Persian as a language of science alongside other international scientific languages, and proposes expanding the use of Persian as one way to reach this goal. Pursuing this goal, which falls within Persian language policy making, requires a better understanding of the linguistic properties of textbook content and of the basic concepts that textbooks teach to students. A description of these features can inform the preparation of language content. In this research, we develop and annotate a corpus of textbooks from grades 1 to 6 (the primary school period) containing around 208,000 words. The courses covered are Farsi, Experimental Sciences, Social Studies, and Heavenly Gifts. All sentences are stored in plain text files, separated by grade and course. After normalization during pre-processing, they are automatically annotated at four levels: broad transliteration, lemmatization, part-of-speech tagging, and syntactic constituency parsing. The results can improve understanding of textbook content and be useful for education and for policy making in language planning.
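The pre-processing step described above could look roughly like the following minimal Python sketch. The abstract does not list the normalization rules or the tools used, so the character mappings, the `normalize` and `count_words` names, and the in-memory `(grade, course)` layout are all assumptions made for illustration, not the authors' implementation.

```python
import re
from collections import Counter

# Hypothetical normalization table: the source does not specify its rules.
# These are common Persian clean-up steps, assumed here for illustration.
CHAR_MAP = str.maketrans({
    "\u064A": "\u06CC",  # Arabic yeh -> Persian yeh
    "\u0643": "\u06A9",  # Arabic kaf -> Persian kaf
    "\u0640": None,      # drop tatweel (kashida)
})


def normalize(sentence: str) -> str:
    """Unify character variants and collapse runs of whitespace."""
    sentence = sentence.translate(CHAR_MAP)
    return re.sub(r"\s+", " ", sentence).strip()


def count_words(sentences_by_file: dict) -> Counter:
    """Count whitespace-separated tokens per (grade, course) key.

    `sentences_by_file` maps a (grade, course) tuple to a list of raw
    sentences, mirroring one plain text file per grade and course.
    """
    counts = Counter()
    for key, sentences in sentences_by_file.items():
        for sentence in sentences:
            counts[key] += len(normalize(sentence).split())
    return counts
```

Counting whitespace-separated tokens is only a rough proxy for the corpus size; the annotation levels named in the abstract (transliteration, lemmatization, part-of-speech tagging, constituency parsing) would need dedicated tools that this sketch does not attempt to reproduce.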
Keywords [English]