Access automation to information for navigating through semantic library data and integrating the knowledge graph with the language model

Journal Highly Available Systems №2 for 2025

Type of article: scientific article

DOI: https://doi.org/10.18127/j20729472-202502-01

UDC: 004

Keywords: library of subject areas; large language model; knowledge graph; automation of access to scientific information; integration of knowledge graph and language model; systems of information support of scientific research; mathematical subject area; industrial engineering; ontological design

Authors:

V.I. Budzko1, O.M. Ataeva2, N.P. Tuchkova3

1–3 Federal Research Center «Computer Science» of the Russian Academy of Sciences (Moscow, Russia)
1 vbudzko@ipiran.ru, 2 oataeva@frccsc.ru, 3 ntuchkova@frccsc.ru

Abstract:

The paper studies the problem of integrating language models (LM) and knowledge graphs (KG). The KG is built in LibMeta, a semantic library of scientific subject areas, for navigating scientific publications. Using the KG of a mathematical subject area (SjD) as an example, it is shown that with this approach the LM does not go beyond the SjD, which yields more relevant answers to queries. The descriptions of the mathematical SjD are based on mathematical encyclopedias of the Soviet mathematical school, and the library of subject areas is populated by integrating the subject areas of specialized mathematical journals. Using the mathematical SjD and its applications as an example, the paper considers the problem of creating an environment in which a Russian-language digital assistant can be used to master scientific knowledge in a local SjD and to access scientific research. The LM is tuned to the SjD by creating a set of instructions and checking the truth of the answers based on them. The research results are expected to be applied in mathematical knowledge systems and in library and journal systems to support business processes and the search and analysis of scientific publications.

The research aims to create a technology for information support of scientific research during the search for and analysis of scientific information. The proposed approach reduces information noise when working with scientific publications.

A methodology for the interaction between an LM and the KG of a mathematical SjD has been developed; it is based on instructions applied to the description of the SjD represented as a KG.

Applying the proposed approach makes it possible to use multiple instructions to simplify work with an LM when searching for specialized information, while reducing LM hallucinations and without involving expert advice. Given the intensification of scientific work caused by a growing flow of information, a retrieval-augmented generation (RAG) solution is proposed.
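
The general idea of grounding an LM in a subject-area KG can be illustrated with a minimal sketch. It is not the authors' implementation: the toy triples, the relation names, and the functions `retrieve`, `build_prompt`, and `is_grounded` are hypothetical and chosen only for illustration, and no real LM is called. The sketch retrieves KG facts that match a query, wraps them in an instruction that keeps the answer inside the SjD, and applies a crude check that an answer refers to the retrieved entities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    relation: str
    obj: str


# Toy knowledge graph for a mathematical subject area (illustrative data only).
KG = [
    Triple("Bessel function", "is_solution_of", "Bessel equation"),
    Triple("Bessel equation", "is_a", "ordinary differential equation"),
    Triple("Tricomi equation", "is_a", "mixed type equation"),
]


def retrieve(query: str, kg: list[Triple]) -> list[Triple]:
    """Return triples whose subject or object is mentioned in the query."""
    q = query.lower()
    return [t for t in kg if t.subject.lower() in q or t.obj.lower() in q]


def build_prompt(query: str, facts: list[Triple]) -> str:
    """Build an instruction that restricts the answer to the retrieved facts."""
    if not facts:
        # Nothing in the KG matches: instruct the model to decline.
        return (
            "The question is outside the subject area. "
            "Reply that no answer is available.\n"
            f"Question: {query}"
        )
    lines = [f"- {t.subject} {t.relation} {t.obj}" for t in facts]
    return (
        "Answer using only the facts below.\n"
        + "\n".join(lines)
        + f"\nQuestion: {query}"
    )


def is_grounded(answer: str, facts: list[Triple]) -> bool:
    """Crude check: the answer mentions at least one retrieved entity."""
    a = answer.lower()
    return any(t.subject.lower() in a or t.obj.lower() in a for t in facts)


if __name__ == "__main__":
    question = "What is a Bessel function?"
    found = retrieve(question, KG)
    print(build_prompt(question, found))
    print(is_grounded("A Bessel function solves the Bessel equation.", found))
```

Substring matching stands in here for the ontology-based navigation a semantic library would provide; a real system would query the KG through its own interface and pass the prompt to an LM.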

Pages: 5-20

For citation

Budzko V.I., Ataeva O.M., Tuchkova N.P. Access automation to information for navigating through semantic library data and integrating the knowledge graph with the language model. Highly Available Systems. 2025. V. 21. № 2. P. 5−20. DOI: https://doi.org/10.18127/j20729472-202502-01 (in Russian)

Date of receipt: 14.04.2025

Approved after review: 30.04.2025

Accepted for publication: 30.05.2025