Development of a method for assessing the quality of machine translation based on ensemble methods in machine learning

Journal: Science Intensive Technologies, № 2, 2021

Type of article: scientific article

DOI: https://doi.org/10.18127/j19998465-202102-06

UDC: 004.021

Keywords: machine translation quality assessment; machine translation; machine learning; Bagging; Extra Tree; Random Forest; regression; ensemble methods

Authors:

A.V. Kozina, Yu.S. Belov

Kaluga Branch of the Bauman MSTU (Kaluga, Russia)

Abstract:

Automatic assessment of machine translation quality is an important yet challenging task in machine translation research. Here, translation quality assessment is understood as predicting translation quality without access to a reference translation. Translation quality depends on the specific machine translation system, and the output often requires post-editing. Manual editing is slow and expensive, and as the need to determine translation quality quickly grows, the assessment has to be automated. In this paper, we propose a quality assessment method based on ensemble supervised machine learning methods. The WMT 2019 bilingual corpus for the English-Russian language pair was used as data. The corpus comprises 17,089 sentences; 85% of the data was used for training and 15% for testing the model. Linguistic features extracted from the text in the source and target languages were used to train the system, since these are the characteristics that most accurately describe a translation in terms of quality. Two tools were used for feature extraction: a free language modeling tool based on SRILM and the Stanford POS Tagger part-of-speech tagger. The text was preprocessed before training. The model was trained using three regression methods: Bagging, Extra Trees, and Random Forest. The algorithms were implemented in Python using the scikit-learn library. The parameters of the Random Forest method were optimized using grid search. Model performance was assessed by the mean absolute error (MAE) and the root mean square error (RMSE), as well as by the Pearson correlation coefficient, which measures correlation with human judgment. Testing was carried out on the output of three machine translation systems: the Google and Bing neural systems and the Moses statistical machine translation system in its phrase-based and syntax-based variants. Of the three regression methods, Extra Trees performed best. In addition, for all metrics considered, the best results were achieved with the Google machine translation system. The developed method produced results close to human judgment. The system can be used in further research on translation quality assessment.
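
The training and evaluation procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The synthetic data from `make_regression` is a placeholder for the linguistic features and human quality scores, and the number of features, the ensemble sizes, the grid of Random Forest parameters and the random seeds are all assumptions chosen only to keep the example small. Only the 85/15 split, the three regressors, the grid search and the three metrics come from the abstract.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    BaggingRegressor,
    ExtraTreesRegressor,
    RandomForestRegressor,
)
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Placeholder data (assumption): synthetic features and scores stand in for
# the linguistic features and human quality judgments used in the paper.
X, y = make_regression(n_samples=400, n_features=10, noise=10.0, random_state=0)

# 85% of the data for training, 15% for testing, as stated in the abstract.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=0
)

# The three ensemble regressors named in the abstract. Only Random Forest is
# tuned by grid search; the grid values here are hypothetical.
models = {
    "Bagging": BaggingRegressor(n_estimators=50, random_state=0),
    "Extra Trees": ExtraTreesRegressor(n_estimators=50, random_state=0),
    "Random Forest": GridSearchCV(
        RandomForestRegressor(random_state=0),
        param_grid={"n_estimators": [25, 50], "max_depth": [None, 8]},
        scoring="neg_mean_absolute_error",
        cv=3,
    ),
}


def evaluate(model):
    """Fit on the training split and return (MAE, RMSE, Pearson r) on the test split."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    mae = float(mean_absolute_error(y_test, pred))
    rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
    pearson = float(np.corrcoef(y_test, pred)[0, 1])
    return mae, rmse, pearson


results = {name: evaluate(model) for name, model in models.items()}
for name, (mae, rmse, pearson) in results.items():
    print(f"{name}: MAE={mae:.3f} RMSE={rmse:.3f} Pearson={pearson:.3f}")
```

With real data, `X` would hold one row of extracted features per sentence pair and `y` the corresponding human quality score. Lower MAE and RMSE and a higher Pearson coefficient indicate closer agreement with human judgment.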

Pages: 52-58

For citation

Kozina A.V., Belov Yu.S. Development of a method for assessing the quality of machine translation based on ensemble methods in machine learning. Science Intensive Technologies. 2021. V. 22. № 2. P. 52−58. DOI: https://doi.org/10.18127/j19998465-202102-06 (In Russian).

Date of receipt: 02.02.2021

Approved after review: 20.02.2021

Accepted for publication: 09.03.2021