Journal: Highly Available Systems, №1, 2026
Article in issue:
Using preliminary segmentation to increase the detail of semantic similarity measurement for scientific texts
Type of article: scientific article
DOI: https://doi.org/10.18127/j20729472-202601-15
UDC: 004.92
Authors:

M.S. Gavrilov1

1 V.A. Trapeznikov Institute of Control Sciences, Russian Academy of Sciences (Moscow, Russia)

1 cobraj@yandex.ru

Abstract:

A combined approach to assessing the semantic similarity of scientific texts is proposed. The texts are first segmented thematically into meaningful blocks (e.g., "objectives", "methods", "results") and then compared block by block, which yields a multidimensional similarity measure. Experiments demonstrate improved document clustering quality compared to full-text analysis.
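
The idea of comparing documents segment by segment can be illustrated with a minimal sketch. It is not the author's implementation: it assumes the segmentation has already been done (each document is a dict from segment label to text), and it uses a toy bag-of-words vector as a stand-in for a real embedding model. The names `embed`, `cosine`, `segment_similarity` and `full_text_similarity`, as well as the two sample documents, are hypothetical.

```python
import math
import re
from collections import Counter

# Segment labels taken from the abstract's examples.
SEGMENTS = ("objectives", "methods", "results")


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def segment_similarity(doc_a, doc_b):
    """Return one similarity score per segment label."""
    return {
        label: cosine(embed(doc_a.get(label, "")), embed(doc_b.get(label, "")))
        for label in SEGMENTS
    }


def full_text_similarity(doc_a, doc_b):
    """Baseline: a single score computed over the concatenated text."""
    text_a = " ".join(doc_a.get(label, "") for label in SEGMENTS)
    text_b = " ".join(doc_b.get(label, "") for label in SEGMENTS)
    return cosine(embed(text_a), embed(text_b))


# Hypothetical, already segmented documents: same goal, different methods.
paper_a = {
    "objectives": "improve clustering of scientific papers",
    "methods": "neural embeddings with cosine distance",
    "results": "clustering quality improved",
}
paper_b = {
    "objectives": "improve clustering of scientific papers",
    "methods": "manual expert annotation",
    "results": "annotation agreement was low",
}

if __name__ == "__main__":
    print(segment_similarity(paper_a, paper_b))
    print(full_text_similarity(paper_a, paper_b))
```

In this toy example the per-segment scores show that the two papers share objectives but differ in methods and results, a distinction that the single full-text score blurs into one intermediate value.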

Pages: 76-80
For citation

Gavrilov M.S. Using preliminary segmentation to increase the detail of semantic similarity measurement for scientific texts. Highly Available Systems. 2026. V. 22. № 1. P. 76−80. DOI: https://doi.org/10.18127/j20729472-202601-15 (in Russian)

References
  1. Wang J., Dong Y. Measurement of text similarity: a survey. Information. 2020. V. 11. № 9. P. 421.
  2. Govindaraju V., Ramanathan K. Similar document search and recommendation. Journal of Emerging Technologies in Web Intelligence. 2012. V. 4. № 1. P. 84–93.
  3. Caracciolo C., de Rijke M. Generating and retrieving text segments for focused access to scientific documents. Advances in Information Retrieval: 28th European Conference on IR Research, ECIR 2006. London, UK. April 10–12. 2006. P. 350–361.
  4. Wan S., Lan Y., Guo J., Xu J., Pang L., Cheng X. A deep architecture for semantic matching with multiple positional sentence representations. Proceedings of the AAAI Conference on Artificial Intelligence. 2016. V. 30. № 1. https://doi.org/10.1609/aaai.v30i1.10342
  5. Chernobaev I., Surkova A. Obzor metodov tematicheskoj segmentacii tekstovy`x danny`x. Informacionny`e sistemy` i texnologii IST-2018. 2018. S. 1079–1083.
  6. Mishenin A. Tematicheskaya segmentaciya semanticheski odnorodny`x dokumentov. Vestnik Sankt-Peterburgskogo universiteta. Prikladnaya matematika. Informatika. Processy` upravleniya. 2011. № 3. S. 127–133.
  7. SciRus-tiny: model to obtain embeddings of scientific texts in Russian and English. URL: https://huggingface.co/mlsa-iai-msu-lab/sci-rus-tiny (accessed 22.04.2025).
  8. Liu et al. DeepSeek-V3 technical report. arXiv:2412.19437. 2024. https://doi.org/10.48550/arXiv.2412.19437
Date of receipt: 24.02.2026
Approved after review: 26.02.2026
Accepted for publication: 10.03.2026