Solving the traveling salesman problem using reinforcement learning

350 rub

Journal Nonlinear World №2 for 2024 г.

Article in number:

Type of article: scientific article

DOI: https://doi.org/10.18127/j20700970-202402-02

UDC: 519.171.2 + 004.852

Keywords: Reinforcement learning traveling salesman task optimization algorithm graph banchmark

Authors:

N.V. Blokhin1, K.A. Zinovyeva2, R.A. Kochkarov3

1–3 Financial University under the Government of the Russian Federation (Moscow, Russia)
1 nvblokhin@fa.ru, 2 215888@edu.fa.ru ru, 3 rkochkarov@fa.ru

Abstract:

The article considers the solution of the traveling salesman problem using reinforcement learning. This task is to find the optimal route that passes through the specified cities at least once and then returns to the original city. The traveling salesman problem belongs to the class of NP-hard problems, and therefore finding an exact solution for sufficiently large data sets using classical mathematical methods turns out to be impossible in a reasonable time.

The aim of the work is to increase the efficiency and speed of solving the traveling salesman problem with a large number of vertices by using the reinforcement learning method.

A variant of solving the traveling salesman problem using the Q-learning algorithm is considered, which allows the agent to learn the optimal behavior strategy in the process of interacting with the environment, which is a set of cities and roads between them, which are modeled as a graph. Within the framework of this study, a software package was developed to solve the traveling salesman problem using reinforcement learning. A series of experiments related to the appearance of unforeseen conditions complicating the work of a traveling salesman, such as the appearance of "problem" zones or zones, the passage through which gives a reduction in time. The results of the experiments showed the capabilities of the developed algorithm for adaptation to changes and obstacles of the environment. The resulting algorithm shows itself to be more stable and reliable compared to the algorithms obtained by classical solution methods.

The solution of this problem can be used in practice in a variety of real situations, including optimization of deliveries or network routing, assistance in the work of logistics departments of various transport, postal and courier services.

Pages: 18-25

For citation

Blokhin N.V., Zinovyeva K.A., Kochkarov R.A. Solving the traveling salesman problem using reinforcement learning. Nonlinear World. 2024. V. 22. № 2. P. 18–25. DOI: https://doi.org/10.18127/ j20700970-202402-02 (In Russian)

References

Sajt Problemno-orientirovannoj informacionno-vychislitel'noj sistemy [Elektronnyj resurs]. URL: http://poivs.tsput.ru/ru/ Math/DiscreteMath/GraphTheory/ProblemOfTheTravelingSalesman (data obrashcheniya 24.08.2023 g.) (In Russian).
Marco Dorigo, Thomas Stutzle Ant colony optimization / «A Bradford book». Massachusetts Institute of Technology. 2004. 321 r. aco-book.pdf. [Elektronnyj resurs]. URL: cmu.edu (data obrashcheniya 20.08.2023 g.)
Konspekty universiteta ITMO [Elektronnyj resurs]. URL: https://neerc.ifmo.ru/wiki/index.php?title=%D0%9E%D0%B1% D1%83%D1%87%D0%B5%D0%BD%D0%B8%D0%B5_%D1%81_%D0%BF%D0%BE%D0%B4%D0%BA%D1%80%D0%B5%D0%BF%D0%BB%D0%B5%D0%BD%D0%B8%D0%B5%D0%BC (data obrashcheniya 26.08.2023) (In Russian).
Q-Learning introduction and Q Table – Reinforcement Learning w/ Python Tutorial p.1, r.2. // [Elektronnyj resurs]. URL: https://
pythonprogramming.net/q-learning-reinforcement-learning-python-tutorial/ (data obrashcheniya 07.08.2023 g.).
TSPLIB – biblioteka benchmarkov Gerharda Rejnel'ta [Elektronnyj resurs]. URL: http://comopt.ifi.uni-heidelberg.de/software/ TSPLIB95/tsp/ (data obrashcheniya 26.08.2023) (In Russian).
Realizaciya klassa Q agent [Elektronnyj resurs]. URL: kseniaznvv/tsp (github.com) (data obrashcheniya 20.08.2023 g.) (In Russian).
Théo Alves Da Costa Solving the traveling salesman problem with reinforcement learning, November 3, 2021 [Elektronnyj resurs]. URL: https://ekimetrics.github.io/blog/2021/11/03/tsp/ (data obrashcheniya 07.08.2023 g.).
Adam Paszke, Mark Towers Reinforcement learning (dqn) tutorial [Elektronnyj resurs]. URL: https://pytorch.org/tutorials/
intermediate/reinforcement_q_learning.html (data obrashcheniya 07.08.2023 g.).
Mele U.J., Gambardella L.M., Montemanni R. A1-14 Machine Learning Approaches for the Traveling Salesman Problem: A Survey [Elektronnyj resurs]. URL: A1-14 Machine Learning Approaches for the Traveling Salesman Problem- A Survey.pdf – Google Disk (data obrashcheniya 20.08.2023 g.).
Concorde TSP Benchmarks [Elektronnyj resurs]. URL: https://www.math.uwaterloo.ca/tsp/index.html (data obrashcheniya 20.08.2023 g.).

Date of receipt: 14.03.2024

Approved after review: 04.04.2024

Accepted for publication: 23.05.2024