Journal Highly Available Systems, № 1, 2026
Article in issue:
Methods of semantic annotation and ontological modeling of mathematical texts in LaTeX format
Type of article: scientific article
DOI: https://doi.org/10.18127/j20729472-202601-18
UDC: 004.62
Authors:

M.G. Kobuk1, O.M. Ataeva2

1-2 Moscow Witte University (Moscow, Russia)

2 Federal Research Center "Computer Science and Control" of the RAS (Moscow, Russia)

1 mikhail.kobuk@mail.ru; 2 oataeva@frccsc.ru

Abstract:

Problem statement. This work addresses the problem of structuring scientific articles for building data corpora within the SciLibAIRU semantic ecosystem, and the transition from a document-oriented data representation to a format suitable for automated analysis and semantic search.

Objective. The objective of this work is to investigate an optimal method for vector-based fuzzy search over mathematical texts and to implement it in conjunction with a parser for mathematical LaTeX texts.

Results. A prototype vector search system is proposed that is capable of ingesting LaTeX versions of scientific texts and providing a fuzzy search interface over textual fragments within the SciLibAIRU library.
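
The general idea of fuzzy vector search over LaTeX fragments can be illustrated with a minimal sketch. It does not reproduce the prototype described above: the regex-based text extraction stands in for a real LaTeX parser, the character n-gram vectors stand in for a learned embedding model, and all function names (latex_to_fragments, embed, cosine, search) are hypothetical.

```python
import math
import re
from collections import Counter


def latex_to_fragments(source: str) -> list[str]:
    """Split LaTeX source into plain-text paragraphs (illustrative, not a full parser)."""
    text = re.sub(r"(?<!\\)%.*", "", source)             # drop comments
    text = re.sub(r"\$[^$]*\$", " ", text)               # drop inline math
    text = re.sub(r"\\(begin|end)\{[^}]*\}", " ", text)  # drop environment markers
    text = re.sub(r"\\[a-zA-Z]+\*?", " ", text)          # drop command names, keep arguments
    text = text.replace("{", " ").replace("}", " ")
    fragments = [" ".join(p.split()) for p in re.split(r"\n\s*\n", text)]
    return [f for f in fragments if f]


def embed(text: str, n: int = 3) -> Counter:
    """Bag of character n-grams: an assumed stand-in for a neural text embedding."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, fragments: list[str], top_k: int = 3) -> list[tuple[float, str]]:
    """Rank fragments by similarity to the query, best match first."""
    q = embed(query)
    scored = [(cosine(q, embed(f)), f) for f in fragments]
    return sorted(scored, key=lambda item: item[0], reverse=True)[:top_k]


if __name__ == "__main__":
    sample = r"""
\section{Compactness}
Every continuous function on a compact set is bounded. % classical result

\begin{theorem}
A sequence of real numbers converges if and only if it is a Cauchy sequence.
\end{theorem}
"""
    # The query is deliberately misspelled to show the fuzzy matching.
    for score, fragment in search("cauchy sequnce convergence", latex_to_fragments(sample)):
        print(f"{score:.3f}  {fragment}")
```

Character n-grams tolerate typos and inflection differences, which is why they serve as a simple proxy here; a production system would more plausibly use a trained sentence encoder and an approximate nearest-neighbour index.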

Practical significance. The results are applicable to library and editorial information systems.

Pages: 90-94
For citation

Kobuk M.G., Ataeva O.M. Methods of semantic annotation and ontological modeling of mathematical texts in LaTeX format. Highly Available Systems. 2026. V. 22. № 1. P. 90−94. DOI: https://doi.org/10.18127/j20729472-202601-18 (in Russian)

References
  1. Hoftich M. TeX4ht: LaTeX to Web Publishing. TUGboat. 2019. V. 40. № 1. P. 76–81.
  2. Frankston C. et al. Using HTML Papers on arXiv: Why It’s Important, and How We Made It Happen. arXiv:2402.08954, 2024.
  3. Serebryakov V.A., Galochkin M.P., Gonchar D.R., Furugyan M.G. Theory and Implementation of Programming Languages. 2nd ed. Moscow: MZ-Press. 2006. 352 p. (in Russian)
  4. Hopcroft J., Motwani R., Ullman J. Introduction to Automata Theory, Languages, and Computation. Moscow: Williams. 2002. 528 p. (in Russian)
  5. Aho A.V., Lam M.S., Sethi R., Ullman J.D. Compilers: Principles, Techniques, and Tools. 2nd ed. Moscow: Williams. 2008. 1184 p. (in Russian)
  6. Peters M., Neumann M., Iyyer M., Gardner M., Clark C., Lee K., Zettlemoyer L. Deep contextualized word representations. arXiv:1802.05365v2, 2018. DOI: 10.48550/arXiv.1802.05365
  7. Pennington J., Socher R., Manning C. GloVe: Global Vectors for Word Representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014. P. 1532–1543. DOI: 10.3115/v1/D14-1162
  8. Joulin A., Grave E., Bojanowski P., Mikolov T. Bag of Tricks for Efficient Text Classification. Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Vol. 2. Short Papers. Valencia, Spain, April 2017. P. 427–431. DOI: 10.18653/v1/E17-2068
  9. Feng F., Yang Y., Cer D., Arivazhagan N., Wang W. Language-agnostic BERT Sentence Embedding. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL). Dublin, Ireland. May 2022. P. 878–891. DOI: 10.18653/v1/2022.acl-long.62
  10. Zmitrovich D. et al. A Family of Pretrained Transformer Language Models for Russian. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). Torino, Italia. May 2024. P. 507–524. arXiv:2309.10931. DOI: 10.48550/arXiv.2309.10931
  11. Kuratov Y., Arkhipov M. Adaptation of Deep Bidirectional Multilingual Transformers for Russian Language. Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference «Dialogue 2019». Moscow. May-June 2019. arXiv:1905.07213. DOI: 10.48550/arXiv.1905.07213
  12. Nikolich A., Puchkova A. Fine-tuning GPT-3 for Russian Text Summarization. arXiv preprint 2021. arXiv:2108.03502. DOI: 10.48550/arXiv.2108.03502
  13. Mikolov T., Sutskever I., Chen K., Corrado G., Dean J. Distributed Representations of Words and Phrases and their Compositionality. Advances in Neural Information Processing Systems (NIPS 26). 2013. P. 3111–3119. DOI: 10.5555/2999792.2999959
  14. Gerasimenko N., Vatolin A., Ianina A., Vorontsov K. SciRus: Tiny and Powerful Multilingual Encoder for Scientific Texts. Doklady Mathematics. 2024. V. 110. № 1. P. S193–S202. DOI: 10.1134/S1064562424602178
Date of receipt: 24.02.2026
Approved after review: 26.02.2026
Accepted for publication: 10.03.2026