Journal Nonlinear World, № 1, 2025
Article in issue:
Construction of hierarchical vector representations of OKVED codes using fully connected adapters
Type of article: scientific article
DOI: https://doi.org/10.18127/j20700970-202501-07
UDC: 519.171.2 + 004.852
Authors:

N.V. Blokhin 1

1 Financial University under the Government of the Russian Federation (Moscow, Russia)
1 nvblokhin@fa.ru

Abstract:

Industry classification is a crucial characteristic of a company, influencing its business environment, investment risks, and other key operational aspects. In Russia, industry affiliation is described by OKVED codes, which have a hierarchical structure and an uneven distribution; both properties make them difficult to represent. Traditional machine learning methods often struggle to analyze and predict outcomes from such hierarchical attributes. Modern neural network techniques make it possible to construct compact vector representations of OKVED codes that capture their hierarchical nature and serve as informative features for economic analysis tasks.
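To make the hierarchy concrete: in OKVED2 (OK 029-2014) the digital part of a code is refined by appending digits, from class (XX) through subclass (XX.X), group (XX.XX) and subgroup (XX.XX.X) down to type (XX.XX.XX). The minimal sketch below, assuming exactly these formats, splits a code into its chain of ancestors, which is the per-level input a hierarchical embedding would consume. The function name `okved_levels` is hypothetical, and the letter section (A to U) is left out because it is not encoded in the digits.

```python
import re

# Assumed OKVED2 digit formats, coarsest to finest:
# class XX, subclass XX.X, group XX.XX, subgroup XX.XX.X, type XX.XX.XX.
_OKVED2 = re.compile(r"\d{2}(\.\d|\.\d{2}(\.\d{1,2})?)?")


def okved_levels(code: str) -> list[str]:
    """Return the ancestor chain of an OKVED2 code, coarsest level first.

    The letter section is not part of the digit code, so it is omitted;
    mapping classes to sections would need a separate lookup table.
    """
    if not _OKVED2.fullmatch(code):
        raise ValueError(f"not an OKVED2 digit code: {code!r}")
    digits = code.replace(".", "")
    levels = []
    for n in range(2, len(digits) + 1):
        prefix = digits[:n]
        # Re-insert a dot after every pair of digits: "62011" -> "62.01.1".
        levels.append(".".join(prefix[i:i + 2] for i in range(0, n, 2)))
    return levels


print(okved_levels("62.01"))  # ['62', '62.0', '62.01']
```

Codes stop at different depths, so the resulting chains have different lengths; any model built on them has to accept a variable number of levels.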

The proposed approach builds the vector for an industry code iteratively, producing representations for progressively more detailed levels of the hierarchy. At each step, a new vector for the next level of depth is combined with the vector from the previous level. This layering significantly increases the expressiveness of the vectors, particularly for underrepresented codes. Fully connected adapter layers handle the transition between hierarchical levels. The study also explores adding attention mechanisms at these transitions, which lets the model focus dynamically on the most informative level representations and should, in theory, further improve the quality of the generated vectors.
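The abstract does not specify how level vectors are combined or how the adapters and attention are parameterized, so the following is only a forward-pass sketch under loudly stated assumptions: one lookup table per level, one single-layer ReLU adapter per level transition, summation as the combination step, and an optional scaled dot-product attention pooling with a single learned query over the level vectors. The class `HierarchicalEmbedder` and all its parameters are hypothetical; training is omitted, and plain Python lists stand in for a tensor library.

```python
import math
import random


class HierarchicalEmbedder:
    """Hypothetical forward pass of a level-by-level code embedding.

    Assumed recurrence (not confirmed by the source):
        v_1 = E_1[p_1]
        v_k = relu(W_k v_{k-1} + b_k) + E_k[p_k]
    where p_k is the code's ancestor at level k.
    """

    def __init__(self, vocabularies, dim, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        # One embedding table per level: prefix -> vector.
        self.tables = [
            {p: [rng.gauss(0, 0.1) for _ in range(dim)] for p in vocab}
            for vocab in vocabularies
        ]
        # One fully connected adapter (W, b) per transition between levels.
        self.adapters = [
            ([[rng.gauss(0, 1 / math.sqrt(dim)) for _ in range(dim)] for _ in range(dim)],
             [0.0] * dim)
            for _ in range(len(vocabularies) - 1)
        ]
        # Learned query for the optional attention pooling over levels.
        self.query = [rng.gauss(0, 0.1) for _ in range(dim)]

    def _adapt(self, k, v):
        """Apply adapter k (dense layer + ReLU) to vector v."""
        w, b = self.adapters[k]
        return [max(0.0, sum(wij * vj for wij, vj in zip(row, v)) + bi)
                for row, bi in zip(w, b)]

    def level_vectors(self, prefixes):
        """Return v_1..v_K for a code given its ancestor prefixes."""
        vectors = [list(self.tables[0][prefixes[0]])]
        for k in range(1, len(prefixes)):
            adapted = self._adapt(k - 1, vectors[-1])
            own = self.tables[k][prefixes[k]]
            vectors.append([a + e for a, e in zip(adapted, own)])
        return vectors

    def embed(self, prefixes, attention=False):
        vectors = self.level_vectors(prefixes)
        if not attention:
            return vectors[-1]  # vector of the deepest available level
        # Scaled dot-product scores against the query, then a stable softmax.
        scores = [sum(q * x for q, x in zip(self.query, v)) / math.sqrt(self.dim)
                  for v in vectors]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Attention-weighted sum of the level vectors.
        return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(self.dim)]


vocabs = [["62", "63"], ["62.0", "63.1"], ["62.01", "62.02", "63.11"]]
model = HierarchicalEmbedder(vocabs, dim=4)
vec = model.embed(["62", "62.0", "62.01"], attention=True)
print(len(vec))  # 4
```

In this sketch each level's vector depends only on the code's ancestors, so sibling codes such as 62.01 and 62.02 share their upper-level vectors; this is one way a rare code can borrow information from a better-populated parent.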

The effectiveness of the embeddings is demonstrated by adding them to a default prediction model, whose performance improves as a result. Experiments show a notable increase in the Gini score when the embeddings are used, compared both with a model that ignores industry affiliation entirely and with models that use simpler encodings such as one-hot encoding or plain lookup tables. We also compared the proposed hierarchical representations with embeddings obtained by concatenating independent vectors for the different levels of the classifier and showed that our method outperforms these approaches as well.
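The Gini score used to evaluate default models is conventionally the rank-based Gini coefficient, which relates to the area under the ROC curve as Gini = 2·AUC − 1. A minimal pure-Python sketch follows; the toy labels and scores are illustrative and are not data from the study, and the pairwise loop is O(n²), chosen for clarity (in practice a library routine such as scikit-learn's `roc_auc_score` would typically be used).

```python
def roc_auc(labels, scores):
    """P(random defaulter scores above random non-defaulter); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def gini(labels, scores):
    """Rank-based Gini coefficient: 2 * AUC - 1."""
    return 2 * roc_auc(labels, scores) - 1


# 1 = default, 0 = no default; higher score = higher predicted risk.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.3, 0.6, 0.6, 0.1]
print(round(gini(labels, scores), 4))  # 0.8333
```

A Gini of 1 means every defaulter is ranked above every non-defaulter, 0 corresponds to random ranking, so an "improvement in Gini" is an improvement in how well the model orders borrowers by risk.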

The results of this research can be applied to various economic tasks, including financial risk assessment, market segmentation, and industry analysis. The proposed models and methods for generating OKVED code embeddings can be validated using publicly available datasets and extended to other applied scenarios. By incorporating hierarchical structures into vector representations, this approach enhances the interpretability and usability of industry classification data in machine learning applications.

Pages: 59-71
For citation

Blokhin N.V. Construction of hierarchical vector representations of OKVED codes using fully connected adapters. Nonlinear World. 2025. V. 23. № 1. P. 59–71. DOI: https://doi.org/10.18127/j20700970-202501-07 (In Russian)

Date of receipt: 26.01.2025
Approved after review: 16.02.2025
Accepted for publication: 26.02.2025