Journal Neurocomputers, No. 3, 2019
Article in this issue:
Modification of the algorithm for identification and categorization of scientific terms through a neural network
Type of article: scientific article
DOI: 10.18127/j19998554-201903-02
UDC: 004.89
Authors:

V. V. Bakhtin – Post-graduate Student, Department of Automatics and Telemechanics, Perm State National Research University; Design Programmer, “Satellite” LLC (Perm)

E-mail: bakhtin_94@bk.ru

Abstract:

The paper reviews a project on automating the construction of term systems. TSBuilder (Term System Builder) was developed in 2014 as a multilayer Rosenblatt perceptron trained by supervised machine learning to identify terms of one to three words in natural language texts and to assign each of them to a single, rigid category. The program has been modified to make categorization less rigid, which brings text mining more in line with human thinking. We have expanded the set of semantic, morphological and syntactic parameters used for categorization and removed the three-word limit on term length. We apply convolution over a continuous sequence of terms and output the probabilities of a term falling into each of the categories: instead of assigning a single category to a term, the neural network gives N answers (where N is the number of predefined classes), each of which, O ∈ [0, 1], is the probability that the term belongs to the corresponding class.
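
The soft categorization described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the TSBuilder implementation: the term feature vectors, weights, convolution kernel and helper names (`conv1d_same`, `dense`, `categorize`) are all hypothetical, and using a softmax output layer (so that the N probabilities sum to 1) is an assumption; the paper only states that each output lies in [0, 1].

```python
import math
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def conv1d_same(seq: List[Vector], kernel: Vector) -> List[Vector]:
    """Hypothetical context step: slide an odd-width kernel over a sequence of
    term feature vectors with zero padding, so each output mixes a term with
    its neighbours. As in most NN libraries, the kernel is not flipped."""
    half = len(kernel) // 2
    dim = len(seq[0])
    out = []
    for t in range(len(seq)):
        acc = [0.0] * dim
        for j, w in enumerate(kernel):
            idx = t + j - half
            if 0 <= idx < len(seq):  # positions outside the sequence count as zeros
                for d in range(dim):
                    acc[d] += w * seq[idx][d]
        out.append(acc)
    return out


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def dense(x: Vector, weights: Matrix, bias: Vector,
          activation: Callable[[float], float]) -> Vector:
    """One fully connected perceptron layer: activation(W x + b)."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def categorize(x: Vector, w_hidden: Matrix, b_hidden: Vector,
               w_out: Matrix, b_out: Vector) -> Vector:
    """Return N class probabilities (one per output neuron), each in [0, 1],
    instead of a single hard category."""
    hidden = dense(x, w_hidden, b_hidden, sigmoid)
    logits = dense(hidden, w_out, b_out, lambda z: z)
    return softmax(logits)


if __name__ == "__main__":
    # Hypothetical 3-feature vectors for a sequence of four candidate terms.
    terms = [[1.0, 0.0, 0.5], [0.2, 1.0, 0.0], [0.0, 0.3, 1.0], [0.7, 0.7, 0.1]]
    context = conv1d_same(terms, kernel=[0.25, 0.5, 0.25])
    # Hypothetical weights: 3 inputs -> 2 hidden neurons -> N = 3 classes.
    w_hidden = [[0.8, -0.4, 0.3], [-0.5, 0.9, 0.2]]
    b_hidden = [0.0, 0.1]
    w_out = [[1.2, -0.7], [-0.3, 1.1], [0.4, 0.4]]
    b_out = [0.0, 0.0, -0.2]
    for vec in context:
        probs = categorize(vec, w_hidden, b_hidden, w_out, b_out)
        print([round(p, 3) for p in probs])
```

A softmax makes the classes compete for probability mass; if a term may belong to several categories at once, independent sigmoid outputs per class would be the natural alternative. Either way, every output stays in [0, 1].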

The article consists of an introduction, three sections and a conclusion. The introduction substantiates the relevance of the study and sets the task of improving the accuracy of automated identification and classification of terminological units using neural networks. The first section briefly describes the algorithm implemented in the first version of the TSBuilder software package. The second section describes methods for improving the term identification and classification algorithm and their impact on the final result. The third section analyzes the results of implementing these methods. Based on the results of the study, the authors conclude that the algorithm has become more flexible and its range of possible applications has expanded.

Pages: 14-19
Date of receipt: 27 June 2019