Journal «Highly Available Systems», № 2, 2025
Article in issue:
Access automation to information for navigating through semantic library data and integrating the knowledge graph with the language model
Type of article: scientific article
DOI: https://doi.org/10.18127/j20729472-202502-01
UDC: 004
Authors:

V.I. Budzko1, O.M. Ataeva2, N.P. Tuchkova3

1–3 Federal Research Center «Computer Science» of the Russian Academy of Sciences (Moscow, Russia)
1 vbudzko@ipiran.ru, 2 oataeva@frccsc.ru, 3 ntuchkova@frccsc.ru

Abstract:

The paper studies the problem of integrating language models (LMs) with knowledge graphs (KGs). The KG is built in LibMeta, a semantic library of scientific subject areas, for navigating scientific publications. Using the KG of the mathematical subject domain (SjD) as an example, it is shown that with this approach the LM stays within the SjD, which yields more relevant answers to queries. The descriptions of the mathematical SjD are based on the mathematical encyclopedias of the Soviet mathematical school, and the library of subject areas is populated by integrating the subject areas of specialized mathematical journals. Using mathematics and its applications as an example, the paper considers the problem of creating an environment for a Russian-language digital assistant that helps users master scientific knowledge in a local SjD and access scientific research. The LM is tuned to the SjD by creating a set of instructions and checking the truth of the answers obtained with them. The results are expected to be applied in mathematical knowledge systems and in library and journal systems to support business processes and the search and analysis of scientific publications.

The research aims to create a technology that supports scientific research during the search for and analysis of scientific information. The proposed approach reduces information noise when working with scientific publications.

A methodology has been developed for the interaction between an LM and the KG of the mathematical SjD, based on instructions applied to the KG description of the SjD.
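The abstract does not specify how instructions are derived from the KG or how answers are checked. As an illustration only, the following Python sketch assumes a toy in-memory set of triples; `Triple`, `KG`, `TEMPLATES`, `make_instructions` and `check_answer` are hypothetical names, not LibMeta APIs. It shows one simple way to turn KG triples into instruction/expected-answer pairs and to check an LM answer against the KG.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str


# Hypothetical toy fragment of a mathematical subject-domain KG (not LibMeta data).
KG = [
    Triple("Bessel equation", "is_a", "ordinary differential equation"),
    Triple("Bessel equation", "has_solution", "Bessel function"),
    Triple("Tricomi equation", "is_a", "mixed type equation"),
]

# Hypothetical instruction templates, one per KG predicate.
TEMPLATES = {
    "is_a": "What class of equations does the {s} belong to?",
    "has_solution": "Which special function is a solution of the {s}?",
}


def make_instructions(kg):
    """Build (instruction, expected answer) pairs from triples that have a template."""
    return [
        (TEMPLATES[t.predicate].format(s=t.subject), t.obj)
        for t in kg
        if t.predicate in TEMPLATES
    ]


def check_answer(answer, expected):
    """Naive truth check: the expected KG object must occur in the LM answer."""
    return expected.lower() in answer.lower()


if __name__ == "__main__":
    for instruction, expected in make_instructions(KG):
        print(instruction, "->", expected)
```

Substring matching is deliberately crude; a practical check would also need to accept synonyms and paraphrases of the expected term.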

The proposed approach makes it possible to use multiple instructions to simplify working with an LM when searching for specialized information, while reducing LM hallucinations and without involving experts. Given the intensification of scientific work caused by the growing flow of information, a retrieval-augmented generation (RAG) solution is proposed.
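The RAG solution is not detailed in the abstract. The following is a minimal Python sketch under stated assumptions: the KG is a toy in-memory list of triples, and `retrieve` and `build_prompt` are hypothetical helpers. Triples whose subject is mentioned in the query are retrieved, expanded by one hop along object-to-subject links, and placed in the prompt as context. Queries with no matching KG concept are not forwarded to the LM, which is one simple way to keep answers inside the subject domain. The LM call itself is omitted.

```python
# Hypothetical toy KG as (subject, predicate, object) triples; not LibMeta data.
KG = [
    ("Tricomi equation", "is_a", "mixed type equation"),
    ("mixed type equation", "studied_in", "mathematical physics"),
    ("Bessel function", "solves", "Bessel equation"),
]


def retrieve(query, kg, hops=1):
    """Seed with triples whose subject is mentioned in the query, then follow
    object -> subject links for `hops` further steps."""
    q = query.lower()
    found = [t for t in kg if t[0].lower() in q]
    frontier = {o for _, _, o in found}
    for _ in range(hops):
        nxt = [t for t in kg if t[0] in frontier and t not in found]
        found.extend(nxt)
        frontier = {o for _, _, o in nxt}
    return found


def build_prompt(query, kg):
    """Return a KG-grounded prompt, or None if the query is outside the KG's domain."""
    facts = retrieve(query, kg)
    if not facts:
        return None  # no matching concept: do not send the query to the LM
    context = "\n".join(f"- {s} {p.replace('_', ' ')} {o}" for s, p, o in facts)
    return (
        "Answer using only the facts below; if they are insufficient, say so.\n"
        f"Facts:\n{context}\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    print(build_prompt("What type is the Tricomi equation?", KG))
```

Matching subjects by plain substring is a simplification; a real system would rely on the KG's term variants or an entity linker.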

Pages: 5-20
For citation

Budzko V.I., Ataeva O.M., Tuchkova N.P. Access automation to information for navigating through semantic library data and integrating the knowledge graph with the language model. Highly Available Systems. 2025. V. 21. № 2. P. 5–20. DOI: https://doi.org/10.18127/j20729472-202502-01 (in Russian)

References
  1. Russell S., Norvig P. Artificial Intelligence: A Modern Approach. Global Edition. 4th ed. Munich: Pearson. 2021. 1168 p.
  2. ISO 25964 the international standard for thesauri and interoperability with other vocabularies. https://www.niso.org/schemas/iso25964
  3. Ataeva O., Serebryakov V., Tuchkova N. Development of the semantic space 'Mathematics' by integrating a subspace of its applied area. Lobachevskii Journal of Mathematics. 2022. V. 43. № 12. P. 3435–3446. https://doi.org/10.1134/S1995080222150069
  4. Kaddour J., Harris J., Mozes M., Bradley H., Raileanu R., McHardy R. Challenges and Applications of Large Language Models. 2023. https://doi.org/10.48550/arXiv.2307.10169
  5. Singh A., Ehtesham A., Kumar T., Khoei T. Agentic retrieval-augmented generation: a survey on agentic RAG. 2025. https://doi.org/10.48550/arXiv.2501.09136
  6. Annepaka Y., Pakray P. Large language models: a survey of their development, capabilities, and applications. Knowledge and Information Systems. 2025. V. 67. P. 2967–3022. https://doi.org/10.1007/s10115-024-02310-4
  7. Minaee S., Mikolov T., Nikzad N., Chenaghlu M., Socher R., Amatriain X., Gao J. Large Language Models: A Survey. arXiv:2402.06196v3. 2025. https://doi.org/10.48550/arXiv.2402.06196
  8. Liu Z., Gan C., Wang J., Zhang Y., Bo Z., Sun M., Chen H., Zhang W. OntoTune: Ontology-Driven Self-training for Aligning Large Language Models. arXiv:2502.05478. 2025. https://doi.org/10.48550/arXiv.2502.05478
  9. Xu D., Zhang Z., Lin Z., Wu X., Zhu Z., Xu T., Zhao X., Zheng Y., Chen E. Multi-perspective Improvement of Knowledge Graph Completion with Large Language Models. 2024. https://arxiv.org/pdf/2403.01972
  10. Azerbayev Z., Schoelkopf H., Paster K., Dos Santos M., McAleer S., Jiang A.Q., Deng J., Biderman S., Welleck S. LLEMMA: an open language model for mathematics. 12th International Conference on Learning Representations (ICLR 2024, hybrid). Vienna, Austria. 2024.
  11. Hendrycks D., Burns C., Kadavath S., Arora A., Basart S., Tang E., Song D., Steinhardt J. Measuring mathematical problem solving with the MATH dataset. NeurIPS. 2021. https://doi.org/10.48550/arXiv.2103.03874
  12. Shao Z., Wang P., Zhu Q., Xu R., Song J., Bi X., Zhang H., Zhang M., Li Y.K., Wu Y., Guo D. DeepSeekMath: pushing the limits of mathematical reasoning in open language models. https://doi.org/10.48550/arXiv.2402.03300
  13. Vinogradov I.M. Matematicheskaya e`nciklopediya. V 5 t. M.: Sovetskaya e`nciklopediya. 1977–1985.
  14. Moiseev E.I., Muromskij A.A., Tuchkova N.P. Tezaurus informacionno-poiskovy`j po predmetnoj oblasti «oby`knovenny`e differencial`ny`e uravneniya». M.: MAKS Press. 2005. 116 s.
  15. Pan S., Luo L., Wang Y., Chen C., Wang J., Wu X. Unifying Large Language Models and Knowledge Graphs: A Roadmap. IEEE Transactions on Knowledge and Data Engineering. 2024. V. 36. № 7. P. 3580–3599. https://doi.org/10.1109/TKDE.2024.3352100
  16. Luo L., Zhao Z., Gong C., Haffari G., Pan S. Graph-constrained reasoning: Faithful reasoning on knowledge graphs with large language models. arXiv:2410.13080. 2024. https://doi.org/10.48550/arXiv.2410.13080
  17. De Santis A., Balduini M., De Santis F., Proia A., Leo A., Brambilla M., Della Valle E. Integrating Large Language Models and Knowledge Graphs for Extraction and Validation of Textual Test Data. arXiv:2408.01700v1. 2024. https://doi.org/10.1007/978-3-031-77847-6_17
  18. Liu Z., Gan C., Wang J., Zhang Y., Bo Z., Sun M., Chen H., Zhang W. OntoTune: ontology-driven self-training for aligning large language models. https://doi.org/10.48550/arXiv.2502.05478
  19. Ataeva O., Serebryakov V., Tuchkova N. Ontological approach to a knowledge graph construction in a semantic library. Lobachevskii Journal of Mathematics. 2023. V. 44. № 6. P. 2229–2239. https://doi.org/10.1134/S1995080223060471
  20. Xu D., Zhang Z., Lin Z., Wu X., Zhu Z., Xu T., Zhao X., Zheng Y., Chen E. Multi-perspective Improvement of Knowledge Graph Completion with Large Language Models. https://doi.org/10.48550/arXiv.2403.01972
  21. Steinfeldt C., Mihaljević H. Evaluation and Domain Adaptation of Similarity Models for Short Mathematical Texts. Intelligent Computer Mathematics. 17th International Conference, CICM 2024, Montréal, QC, Canada, August 5–9, 2024. P. 241–260. https://doi.org/10.1007/978-3-031-66997-2_14
  22. Budzko V.I., Medennikov V.I. Sistemny`j analiz obrazovatel`ny`x cifrovy`x e`kosistem v APK. Sistemy` vy`sokoj dostupnosti. 2023. T. 19. № 1. S. 46–58. https://doi.org/10.18127/j20729472-202301-04
  23. Encyclopedia of Mathematics. https://www.encyclopediaofmath.org/index.php/Main_Page
  24. Matematicheskaya e`nciklopediya. E`nciklopediya / Gl. red. L.D. Faddeev. M.: Bol`shaya russkaya e`nciklopediya. 1998. 692 s.
  25. Bry`chkov Yu.A. Special`ny`e funkcii. Proizvodny`e, integraly`, ryady` i drugie formuly`. Spravochnik. M.: Binom. 509 s.
  26. Ataeva O., Serebryakov V., Tuchkova N. From Texts to Knowledge Graph in the Semantic Library LibMeta. Lobachevskii Journal of Mathematics. 2024. V. 45. P. 2211–2219. https://doi.org/10.1134/S1995080224602625
  27. Ataeva O.M., Serebryakov V.A., Tuchkova N.P. Mathematical Physics Branches: Identifying Mixed Type Equations. Lobachevskii Journal of Mathematics. 2019. V. 40. № 7. P. 876–886. https://doi.org/10.1134/S1995080219070047
  28. Ataeva O., Serebryakov V., Tuchkova N. Ontology-driven knowledge graph construction in the mathematics semantic library. Pattern Recognition and Image Analysis. 2024. V. 34. № 3. P. 451–458. https://doi.org/10.1134/S1054661824700196
Date of receipt: 14.04.2025
Approved after review: 30.04.2025
Accepted for publication: 30.05.2025