Journal Neurocomputers. № 5. 2025.
Article in the issue:
Evaluation of semantic similarity between sentences in Russian using sentence transformers
Type of article: scientific article
DOI: https://doi.org/10.18127/j19998554-202505-02
UDC: 004.89
Authors:

N.D. Todosiev1, G.I. Afanasyev2, V.B. Timofeev3, Yu.E. Gapanyuk4
1–4 Bauman Moscow State Technical University (Moscow, Russia)

1 todosievnik@gmail.ru, 2 gaipcs@bmstu.ru, 3 vbtimofeev@yandex.ru, 4 gapyu@bmstu.ru

Abstract:

Existing semantic similarity metrics are poorly suited to the Russian language because they compare sentences lexically rather than semantically. For text generation tasks in Russian, this problem is particularly acute. This article is devoted to the development of a new metric that is as close as possible to human judgment and best serves the task of evaluating semantic similarity in Russian. The aim of the article is to develop a new automatic metric of the semantic similarity of two sentences for the task of generating texts in Russian. In the course of the work, a new metric was developed on the basis of existing sentence transformers, trained first to understand the Russian language and then to understand Russian semantics. The results suggest that the new inter-sentence semantic similarity metric estimates the similarity between two sentences closest to human judgment. The resulting metric is of practical value for developers of question-answering and other systems that use text generation in Russian. The evaluation model is planned for use in text generation tasks in Russian.
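For readers who want to reproduce the general approach, a minimal Python sketch using the sentence-transformers library of Reimers and Gurevych [11, 12] is given below. The public multilingual checkpoint and the cosine-similarity scoring are illustrative assumptions, not the authors' exact Russian-tuned model or scoring procedure.

    # A minimal sketch of the kind of scoring the abstract describes: embed two
    # Russian sentences with a sentence transformer and take cosine similarity.
    # Assumption: the public multilingual checkpoint below stands in for the
    # authors' Russian-tuned model.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("paraphrase-multilingual-mpnet-base-v2")

    def semantic_similarity(sentence_a: str, sentence_b: str) -> float:
        """Cosine similarity between the two sentence embeddings, in [-1, 1]."""
        emb = model.encode([sentence_a, sentence_b], convert_to_tensor=True)
        return util.cos_sim(emb[0], emb[1]).item()

    # Paraphrases should score high despite low lexical overlap.
    print(semantic_similarity("Кошка сидит на ковре.", "На ковре сидит кот."))

Unlike lexical metrics such as BLEU or ROUGE, a score of this kind rewards paraphrases that share meaning but not wording, which is the property the proposed metric targets for Russian.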

Pages: 17–22
For citation:

Todosiev N.D., Afanasyev G.I., Timofeev V.B., Gapanyuk Yu.E. Evaluation of semantic similarity between sentences in Russian using sentence transformers. Neurocomputers. 2025. V. 27. № 5. P. 17–22. DOI: https://doi.org/10.18127/j19998554-202505-02 (in Russian)

References
  1. OpenAI. ChatGPT: Optimizing language models for dialogue [Electronic resource]. OpenAI. 2022. URL: https://openai.com/blog/chatgpt/ (accessed: 11.02.2024).
  2. Bang Y. et al. A multitask, multilingual, multimodal evaluation of ChatGPT on reasoning, hallucination, and interactivity. arXiv [cs.CL]. 2023.
  3. Papineni K. et al. BLEU: A method for automatic evaluation of machine translation [Electronic resource]. URL: https://aclanthology.org/P02-1040.pdf (accessed: 11.02.2024).
  4. Reiter E. A structured review of the validity of BLEU. Computational Linguistics. 2018. V. 44. № 3. P. 393–401.
  5. Post M. A call for clarity in reporting BLEU scores. arXiv [cs.CL]. 2018.
  6. Callison-Burch C., Osborne M., Koehn P. Re-evaluating the role of BLEU in machine translation research [Electronic resource]. URL: https://aclanthology.org/E06-1032.pdf (accessed: 22.01.2024).
  7. Lin C.-Y. ROUGE: A package for automatic evaluation of summaries. Proceedings of the Workshop on Text Summarization Branches Out. Barcelona, Spain. 2004. P. 74–81.
  8. Banerjee S., Lavie A. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. Ann Arbor, Michigan. 2005. P. 65–72.
  9. Mikolov T. et al. Efficient estimation of word representations in vector space. arXiv [cs.CL]. 2013.
  10. Zhang T. et al. BERTScore: Evaluating text generation with BERT. arXiv [cs.CL]. 2019.
  11. Reimers N., Gurevych I. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. arXiv [cs.CL]. 2019.
  12. Reimers N., Gurevych I. Making monolingual sentence embeddings multilingual using knowledge distillation. arXiv [cs.CL]. 2020.
  13. Conneau A. et al. Unsupervised cross-lingual representation learning at scale. arXiv [cs.CL]. 2019.
  14. Common Crawl [Electronic resource]. URL: https://commoncrawl.org/ (accessed: 17.02.2024).
  15. Cer D. et al. SemEval-2017 Task 1: Semantic textual similarity multilingual and crosslingual focused evaluation. Proceedings of the 11th International Workshop on Semantic Evaluation. 2017.
  16. Ziganshina L.E. et al. Assessing human post-editing efforts to compare the performance of three machine translation engines for English to Russian translation of cochrane plain language health information: results of a randomised comparison. Informatics. MDPI. 2021. V. 8. P. 9.
  17. Shavrina T. et al. RussianSuperGLUE: A Russian language understanding evaluation benchmark. arXiv [cs.CL]. 2020.
  18. Zobnin A.I., Nosyrev G.V. Morfologicheskij analizator MyStem 3.0 [The MyStem 3.0 morphological analyzer]. Trudy Instituta russkogo yazyka im. V.V. Vinogradova. 2015. V. 6. P. 300–310. (in Russian)
Date of receipt: 02.07.2025
Approved after review: 16.07.2025
Accepted for publication: 23.09.2025