Knowledge Graph Embedding for Improving Persian Question Answering Systems

Asad, Ali; Momtazi, Saeedeh

doi:10.30465/lsi.2025.50863.1791


Article type: Research paper

Authors

Ali Asad ¹

Saeedeh Momtazi ²

¹ B.Sc. in Computer Engineering, Amirkabir University, Tehran, Iran

² Associate Professor, Department of Computer Engineering, Amirkabir University, Tehran, Iran


Abstract

By storing data in knowledge graphs, one can infer and retrieve implicit relations in addition to explicit ones. This property lets question answering systems handle questions that go beyond the triples explicitly seen in the graph. Knowledge graph embedding, i.e., representing the nodes and edges of a graph as numerical vectors, serves exactly this purpose. This paper investigates the problem of answering Persian questions using Persian knowledge graph embedding. Several knowledge graph embedding models have been trained to represent the explicit and latent characteristics of the nodes as vectors. In addition, Persian questions are embedded with language models so that each question embedding represents the implicit or explicit edge between the question and its corresponding answer. With this approach, the system can answer questions whose corresponding triple is not directly present in the graph, and it can go further to answer more complex questions that require traversing several edges. The proposed model, based on embedding the Persian knowledge graph Fars-Wiki-KG, achieves 85% accuracy on a Persian question answering dataset containing both simple and complex questions.
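To illustrate why vector representations let a system reason beyond the stored triples, here is a minimal sketch of TransE (Bordes et al., 2013), one translational embedding model. The entity and relation names and the 2-dimensional vectors are hypothetical and hand-picked; they are not taken from Fars-Wiki-KG or from the models trained in this paper. Summing relation vectors to follow a multi-hop path is a property of TransE's translation assumption and is shown only as an illustration, not as the paper's method.

```python
import math

# Hypothetical 2-dimensional embeddings picked by hand for illustration;
# a trained model would learn these vectors from the graph's triples.
ENTITY = {
    "Breaking Bad": [1.0, 0.0],
    "USA": [1.0, 1.0],
    "English": [2.0, 1.0],
    "Iran": [0.0, -1.0],
}
RELATION = {
    "country_of_origin": [0.0, 1.0],
    "language": [1.0, 0.0],
}


def transe_score(h, r, t):
    """TransE plausibility: -||h + r - t||_2 (higher means more plausible)."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_tails(head, path):
    """Rank every entity as the tail reached from `head` via the relation `path`.

    Under the translation assumption, a multi-hop path is approximated by
    summing the relation vectors along it.
    """
    h = ENTITY[head]
    r = [sum(parts) for parts in zip(*(RELATION[p] for p in path))]
    return sorted(ENTITY, key=lambda t: transe_score(h, r, ENTITY[t]), reverse=True)


print(rank_tails("Breaking Bad", ["country_of_origin"])[0])              # USA
print(rank_tails("Breaking Bad", ["country_of_origin", "language"])[0])  # English
```

No triple list appears anywhere in the sketch: both answers are recovered purely from vector geometry, which is the sense in which an embedding can surface relations the graph never stores explicitly.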

Keywords

Persian knowledge graph

Knowledge graph embedding

Knowledge graph-based question answering system

Article title (English)

Knowledge graph embedding for improving Persian question answering

Authors (English)

Ali Asad ¹

Saeedeh Momtazi ²

¹ Amirkabir University of Technology

² Amirkabir University of Technology

Abstract (English)

This paper explores the task of answering Persian questions using embeddings derived from a Persian knowledge graph, Fars-Wiki-KG (Shirmardi et al., 2021). This graph contains more than 2 million nodes and 7.5 million triples and covers over 90% of the Persian Wikipedia infoboxes. Based on the nodes and edges of this knowledge graph, 1,370 question patterns were created to build a question answering dataset. These patterns include both simple and multi-hop questions. For example, a simple pattern might be "Which country is the producer of X?", whereas a multi-hop pattern could be "What language is spoken in the producer country of X?". The first question is answered by following a single edge, while the second involves two hops: first finding the country of X, and then determining the language associated with that country. By instantiating these patterns and extracting the corresponding answers from Fars-Wiki-KG, we built a dataset of question-answer pairs. For a question such as "Which country is the producer of 'Breaking Bad'?", the head entity is 'Breaking Bad', the relation is country of origin, and the tail entity, 'USA', is the answer.
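The pattern-based dataset construction described above can be sketched as follows. This is a minimal illustration under stated assumptions: the triples, relation names, and the second head entity are hypothetical toy data rather than Fars-Wiki-KG content, and each pattern is assumed to be stored as a question template plus the relation path that leads from the head entity X to the answer.

```python
# Hypothetical toy triples (head, relation, tail); the relation names are
# illustrative and are not Fars-Wiki-KG's actual identifiers.
TRIPLES = [
    ("Breaking Bad", "country_of_origin", "USA"),
    ("USA", "language", "English"),
    ("Shahrzad", "country_of_origin", "Iran"),
    ("Iran", "language", "Persian"),
]

# Each (hypothetical) pattern: a question template and the relation path from X.
PATTERNS = [
    ("Which country is the producer of {x}?", ["country_of_origin"]),
    ("What language is spoken in the producer country of {x}?",
     ["country_of_origin", "language"]),
]


def follow(entity, path, triples):
    """Return the set of entities reached from `entity` by following `path`."""
    frontier = {entity}
    for rel in path:
        frontier = {t for (h, r, t) in triples if r == rel and h in frontier}
    return frontier


def build_dataset(heads, patterns, triples):
    """Instantiate every pattern with every head entity that has an answer."""
    dataset = []
    for template, path in patterns:
        for x in heads:
            answers = follow(x, path, triples)
            if answers:  # skip instantiations with no answer in the graph
                dataset.append((template.format(x=x), x, path, sorted(answers)))
    return dataset


for row in build_dataset(["Breaking Bad", "Shahrzad"], PATTERNS, TRIPLES):
    print(row)
```

Each row pairs a generated question with its head entity, relation path, and gold answers. Storing the path alongside the question keeps simple (one-hop) and multi-hop questions in a single format, and skipping empty instantiations avoids questions the graph cannot answer.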

Various knowledge graph embedding models, such as translational and semantic matching models, are trained and evaluated on subsets of the knowledge graph to embed its nodes. This process captures both explicit and latent information about the nodes in numerical vector form. In parallel, Persian questions are embedded using XLM-RoBERTa, a multilingual variant of the RoBERTa language model (Liu et al., 2019), so that each question embedding represents the relation between the question and its corresponding answer. Each question contains a head entity, together with a relation that describes how to reach the tail entity, which is the answer to the question. Using the head entity embedding and the relation embedding (which carries the information in the question), a score is assigned to every other node in the graph. This score indicates how likely each node is to be the tail entity, i.e., a possible answer to the question. This approach, inspired by Saxena et al. (2020), allows the system to answer questions even when the relevant triples are not explicitly included in the graph. Moreover, it enables the system to answer more complex questions that require traversing multiple edges of the graph, known as multi-hop questions.
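The answer-scoring step can be sketched in the style of Saxena et al. (2020). This sketch assumes ComplEx (Trouillon et al., 2016) as the scoring function, the model Saxena et al. use; the paper itself compares several embedding models, so this choice is only illustrative. The entity embeddings and the two question vectors are hypothetical stand-ins for trained entity embeddings and projected XLM-RoBERTa question embeddings.

```python
# Hypothetical complex-valued embeddings of dimension 2. In the described
# approach these come from a trained KG embedding model; here they are fixed.
ENTITIES = {
    "Breaking Bad": [1 + 0j, 1j],
    "USA": [1 + 0j, 1 + 0j],
    "English": [1j, 1 + 0j],
}


def complex_score(h, r, t):
    """ComplEx score: Re(sum_i h_i * r_i * conj(t_i))."""
    return sum(hi * ri * ti.conjugate() for hi, ri, ti in zip(h, r, t)).real


def answer(head, question_vec, entities=ENTITIES):
    """Score every candidate tail (except the head itself); return the best one."""
    h = entities[head]
    scores = {
        name: complex_score(h, question_vec, vec)
        for name, vec in entities.items()
        if name != head
    }
    return max(scores, key=scores.get), scores


# Hypothetical stand-ins for projected question embeddings; a real system
# would obtain them from XLM-RoBERTa plus a learned projection.
q_simple = [1 + 0j, 0j]   # "Which country is the producer of Breaking Bad?"
q_two_hop = [1j, 0j]      # "What language is spoken in the producer country of Breaking Bad?"

print(answer("Breaking Bad", q_simple)[0])   # USA
print(answer("Breaking Bad", q_two_hop)[0])  # English
```

A single question vector plays the role of the whole relation path, so the two-hop question is answered without first retrieving the intermediate entity (USA). Excluding the head entity from the candidate set is a design choice of this sketch.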

Keywords (English)

Persian knowledge graph

knowledge graph embedding

knowledge graph-based question answering

Abdollahi, A. (1401 SH). Implementation of an integrated Persian knowledge graph system with ontology and question answering [Master's thesis]. Amirkabir University of Technology.

Asgari-Bidhendi, M., Hadian, A., & Minaei, B. (2019). FarsBase: The Persian knowledge graph. Semantic Web, 10(5), 1–15.

Bollacker, K. D., Evans, C., Paritosh, P. K., Sturge, T., & Taylor, J. (2008). Freebase: A collaboratively created graph database for structuring human knowledge. Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, 1247–1250. https://doi.org/10.1145/1376616.1376746

Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., & Yakhnenko, O. (2013). Translating embeddings for modeling multi-relational data. Advances in Neural Information Processing Systems, 26, 2787–2795.

Choudhary, S., Luthra, T., Mittal, A., & Singh, R. (2021). A survey of knowledge graph embedding and their applications. arXiv preprint arXiv:2107.07842.

Huang, X., Zhang, J., Li, D., & Li, P. (2019). Knowledge graph embedding based question answering. Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, 105–113. https://doi.org/10.1145/3289600.3290956

Lehmann, J., Isele, R., Jakob, M., Jentzsch, A., Kontokostas, D., Mendes, P. N., ... & Bizer, C. (2015). DBpedia—A large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web, 6(2), 167–195. https://doi.org/10.3233/SW-140134

Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., ... & Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.

Nickel, M., Tresp, V., & Kriegel, H. P. (2011). A three-way model for collective learning on multi-relational data. Proceedings of the 28th International Conference on Machine Learning, 809–816.

Ren, H., Hu, W., & Leskovec, J. (2020). Query2box: Reasoning over knowledge graphs in vector space using box embeddings. International Conference on Learning Representations.

Rossi, A., Barbosa, D., Firmani, D., Matinata, A., & Merialdo, P. (2021). Knowledge graph embedding for link prediction: A comparative analysis. ACM Transactions on Knowledge Discovery from Data, 15(2), 1–49. https://doi.org/10.1145/3441454

Saxena, A., Tripathi, A., & Talukdar, P. (2020). Improving multi-hop question answering over knowledge graphs using knowledge base embeddings. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 4498–4507. https://doi.org/10.18653/v1/2020.acl-main.412

Shirmardi, F., Hosseini, S. M. H., & Momtazi, S. (2021). Farswikikg: An automatically constructed knowledge graph for Persian. International Journal of Web Research, 4(2), 25–30.

Sun, H., Bedrax-Weiss, T., & Cohen, W. (2019). PullNet: Open domain question answering with iterative retrieval on knowledge bases and text. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2380–2390. https://doi.org/10.18653/v1/D19-1242

Sun, H., Dhingra, B., Zaheer, M., Mazaitis, K., Salakhutdinov, R., & Cohen, W. (2018). Open domain question answering using early fusion of knowledge bases and text. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 4231–4242. https://doi.org/10.18653/v1/D18-1455

Sun, Z., Deng, Z. H., Nie, J. Y., & Tang, J. (2019). RotatE: Knowledge graph embedding by relational rotation in complex space. International Conference on Learning Representations.

Trouillon, T., Welbl, J., Riedel, S., Gaussier, É., & Bouchard, G. (2016). Complex embeddings for simple link prediction. Proceedings of the 33rd International Conference on Machine Learning, 2071–2080.

Vrandečić, D., & Krötzsch, M. (2014). Wikidata: A free collaborative knowledge base. Communications of the ACM, 57(10), 78–85. https://doi.org/10.1145/2629489

Wang, Q., Mao, Z., Wang, B., & Guo, L. (2017). Knowledge graph embedding: A survey of approaches and applications. IEEE Transactions on Knowledge and Data Engineering, 29(12), 2724–2743. https://doi.org/10.1109/TKDE.2017.2754499

Yang, B., Yih, W. T., He, X., Gao, J., & Deng, L. (2015). Embedding entities and relations for learning and inference in knowledge bases. International Conference on Learning Representations.

Volume 20, Issue 40
Esfand 1403 (February–March 2025)
Pages 131–152
