Journal Neurocomputers № 2, 2023
Article in the issue:
Analysis of neural network language models for solving problems of text data processing
Type of article: scientific article
DOI: https://doi.org/10.18127/j19998554-202302-01
UDC: 004.89
Authors:

G.S. Ivanova1, P.A. Martynyuk2

1–2 Bauman Moscow State Technical University (Moscow, Russia)

Abstract:

Problem setting. There is a growing need to integrate into software systems tools that enable automatic or automated processing of text data. This is driven by the rapidly growing volume of information presented as text, both in individual information systems and on the global Internet. The development of modern machine learning and deep learning technologies has, in turn, increased the popularity of neural network models. This article analyzes the neural network models used to solve classical problems of natural language text processing.

Objective. To clarify the range of natural language processing tasks addressed by each classical neural network language model.

Results. For each of the considered models, the architectural features and operating principles are described, and the strengths and weaknesses of the models are highlighted. The relationship between model architectures and the range of tasks they solve is established.

Practical significance. The results of the analysis can be of practical value to developers of text data processing systems. The article provides basic information about the most popular neural network models, which can help specialists choose a specific neural network architecture.

Pages: 5–20
For citation

Ivanova G.S., Martynyuk P.A. Analysis of neural network language models for solving problems of text data processing. Neurocomputers. 2023. V. 25. № 2. P. 5–20. DOI: https://doi.org/10.18127/j19998554-202302-01 (In Russian)

References
  1. Malte A., Ratadiya P. Evolution of Transfer Learning in Natural Language Processing. arXiv preprint arXiv:1910.07370. 2019.
  2. Shelukhin O.I., Kostin D.V. Classification of abnormal states of computer systems by means of intelligent analysis of system logs. Neurocomputers: development, application. 2020. V. 22. № 1. P. 66–76. DOI 10.18127/j19998554-2001-07. (in Russian)
  3. Glazkova A.V. Comparison of neural network models for classification of text fragments containing biographical information. Software products and systems. 2019. №2. P. 263–267. (in Russian)
  4. Sozykin A.V. An overview of methods for deep learning in neural networks. Bulletin of the South Ural State University. Series: Computational Mathematics and Software Engineering. 2017. V. 6. № 3. P. 28–59. (in Russian)
  5. Schuster M., Paliwal K.K. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing. 1997. V. 45. № 11. P. 2673–2681.
  6. Lipton Z.C., Berkowitz J., Elkan Ch. A Critical Review of Recurrent Neural Networks for Sequence Learning. arXiv preprint arXiv:1506.00019v4. 2015.
  7. Hochreiter S., Bengio Y., Frasconi P., Schmidhuber J. Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. 2003. 15 p.
  8. Pustynnyj Ya.N. Solving the problem of vanishing gradient using long short-term memory neural networks. Innovations and Investments. 2020. № 2. P. 130–132. (in Russian)
  9. Hochreiter S., Schmidhuber J. Long short-term memory. Neural Computation. 1997. V. 9. Is. 8. P. 1735–1780.
  10. Gers F.A., Schmidhuber J., Cummins F. Learning to Forget: Continual Prediction with LSTM. Neural Computation. 2000. V. 12. Is. 10. P. 2451–2471.
  11. Gers F., Schmidhuber J. Recurrent nets that time and count. Proceedings of the International Joint Conference on Neural Networks. 2000. V. 3. P. 189–194.
  12. Cho K. et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014. P. 1724–1734.
  13. Józefowicz R., Zaremba W., Sutskever I. An Empirical Exploration of Recurrent Network Architectures. Proceedings of the 32nd International Conference on International Conference on Machine Learning (ICML). V. 37. 2015. P. 2342–2350.
  14. Zhou G.B. et al. Minimal Gated Unit for Recurrent Neural Networks. International Journal of Automation and Computing. 2016. V. 13. DOI 10.1007/s11633-016-1006-2.
  15. Sutskever I., Vinyals O., Le Q. Sequence to sequence learning with neural networks. Advances in neural information processing systems. 2014. P. 3104–3112.
  16. Thakur A. LSTM RNN in Keras: Examples of One-to-Many, Many-to-One & Many-to-Many. Weights & Biases: machine learning experiment tracking, dataset versioning, and model evaluation. Electronic resource. Access mode: https://wandb.ai/fully-connected (accessed 01.10.2022).
  17. Peters M.E. et al. Deep Contextualized Word Representations. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2018. V. 1. P. 2227–2237.
  18. Devlin J., Chang M.-W., Lee K., Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2019. V. 1. P. 4171–4186.
  19. Aitken K., Ramasesh V., Cao Yu., Maheswaranathan N. Understanding How Encoder-Decoder Architectures Attend. arXiv preprint arXiv:2110.15253. 2021.
  20. Bahdanau D., Cho K., Bengio Y. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473. 2014.
  21. Vaswani A. et al. Attention is all you need. Advances in Neural Information Processing Systems. 2017. V. 30. P. 5998–6008.
  22. Alammar J. The Illustrated Transformer. Electronic resource. Access mode: http://jalammar.github.io/illustrated-transformer/ (accessed 05.10.2022).
  23. Hugging Face. The AI community building the future. Electronic resource. Access mode: https://huggingface.co/ (accessed 05.10.2022).
  24. Emon E.A., Rahman S., Banarjee J., Das A.K., Mittra T. A Deep Learning Approach to Detect Abusive Bengali Text. Proceedings of the 7th International Conference on Smart Computing & Communications (ICSCC). 2019. P. 1–5.
  25. Merkx D., Frank S. Comparing Transformers and RNNs on predicting human sentence processing data. arXiv preprint arXiv:2005.09471. 2020.
  26. Lakew S.M., Cettolo M., Federico M. A Comparison of Transformer and Recurrent Neural Networks on Multilingual Neural Machine Translation. Proceedings of the 27th International Conference on Computational Linguistics. 2018. P. 641–652.
  27. Pramodya A., Pushpananda R., Weerasinghe R. A Comparison of Transformer, Recurrent Neural Networks and SMT in Tamil to Sinhala MT. Proceedings of the 20th International Conference on Advances in ICT for Emerging Regions (ICTer). 2020. P. 155–160.
  28. Karita Sh. et al. A Comparative Study on Transformer vs RNN in Speech Applications. Proceedings of the 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). 2019. P. 449–456.
  29. Merity S. Single Headed Attention RNN: Stop Thinking With Your Head. arXiv preprint arXiv:1911.11423. 2019.
Date of receipt: 15.02.2023
Approved after review: 28.02.2023
Accepted for publication: 20.03.2023