An Introduction to the Process of the Design and Production of a Standard Persian Emotional Speech Database

Authors
1 Institute for Cognitive Sciences Studies (ICSS), Tehran, Iran
2 Professor of Electrical and Computer Engineering, Amirkabir University of Technology, Tehran, Iran
Abstract
Different environmental stimuli can influence and change human emotional states. Speech expresses changes in emotional state in two ways: verbally, through vocabulary and syntax, and non-verbally, through tone and intonation. Tone and intonation play a paralinguistic role and can alter the meaning of speech. The quantitative processing and study of emotions was first introduced in computer science with the concept of affective computing. The main idea was that a machine could recognize and interpret human emotional states and respond or behave appropriately. The quantitative study of emotional speech is known as speech emotion recognition. Recognizing or classifying emotional speech means identifying the speaker's emotional state by analyzing the speech signal. The first step in such a study is to obtain a rich, standard, high-quality, and appropriately sized dataset for evaluating speech emotion recognition algorithms. A wide variety of standard speech emotion datasets exist for widely spoken languages. The lack of such a database for Persian has left researchers in speech emotion recognition with a limited understanding of emotional patterns and their impact in the Persian language. This underscores the need to create a speech emotion dataset in Persian. In this paper, we describe the design, preparation, and production of a speech emotion dataset in standard Persian that can be used in speech emotion recognition studies, following an approach similar to that of the Berlin Emotional Speech Database.