Journal Neurocomputers, №12, 2011
Article in issue:
Separate state-value calculation in frontal cortex: reward-guided and punishment-guided learning
Authors:
I. A. Smirnitskaya
Abstract:
The implementation of reinforcement learning (RL) theory in the brain is discussed, in particular how cortical regions represent the value function. We assume that positive values (in the case of reward) and negative values (in the case of punishment) are calculated in different ways. The orbitofrontal cortex is known to calculate values of both signs. We argue that negative value calculation does not use RL methods, and we propose an algorithm for amygdala participation in negative value calculation. On the whole, the full algorithm of state-value calculation includes reinforcement learning in the case of reward and learning by means of amygdala influence in the case of punishment.
Pages: 33-44
References
  1. Gorbachevskaya, A.I., Chivileva, O.G., Morphological analysis of information conduction pathways in the basal ganglia of mammals // Uspekhi Fiziologicheskikh Nauk. 2003. V. 34. № 2. P. 46-63. [in Russian]
  2. Merzhanova, G.Kh., Dolbakyan, E.E., Khokhlova, V.N., Interneuronal fronto-hippocampal interactions in cats trained to choose the quality of reinforcement // Zhurnal Vysshei Nervnoi Deyatelnosti. 2003. V. 53. № 3. P. 290-298. [in Russian]
  3. Sutton, R.S., Barto, A.G., Reinforcement Learning: translated from English by E. O. Romanov / ed. by Yu. V. Tyumentsev. Moscow: Binom. 2011. [in Russian]
  4. Silkis, I.G., Participation of dopamine in the enhancement of cortical signals activating NMDA receptors in the striatum (a hypothetical mechanism) // Rossiiskii Fiziologicheskii Zhurnal im. I.M. Sechenova. 2001. V. 87. № 12. P. 1569-1578. [in Russian]
  5. Smirnitskaya, I.A., Frolov, A.A., Merzhanova, G.Kh., A model of reward choice based on reinforcement learning theory // Neirokompyutery: Razrabotka, Primenenie. 2006. V. 57. № 2. P. 133-143. [in Russian]
  6. Alexander, G.E., Crutcher, M.D., Functional architecture of basal ganglia circuits: neural substrates of parallel processing // Trends Neurosci. 1990. V. 13. № 7. P. 266-271.
  7. Christensen, M.S., Lundbye-Jensen, J., Petersen, N., Geertsen, S.S., Paulson, O.B., Nielsen, J.B., Watching Your Foot Move - An fMRI Study of Visuomotor Interactions during Foot Movement // Cerebral Cortex. 2007. V. 17. P. 1906-1917.
  8. Hosokawa, T., Kato, K., Inoue, M., Mikami, A., Neurons in the macaque orbitofrontal cortex code relative preference of both rewarding and aversive outcomes // Neuroscience Research. 2007. V. 57. P. 434-445.
  9. Höistad, M., Barbas, H., Sequence of information processing for emotions through pathways linking temporal and insular cortices with the amygdala // Neuroimage. 2008. V. 40. № 3. P. 1016-1033.
  10. Ito, M., Doya, K., Multiple representations and algorithms for reinforcement learning in the cortico-basal ganglia circuit // Curr. Opinion in Neurobiol. 2011. V. 21. P. 368-373.
  11. Kennerley, S. W., Walton, M. E., Decision Making and Reward in Frontal Cortex: Complementary Evidence From Neurophysiological and Neuropsychological Studies // Behav. Neurosci. 2011. V. 125. № 3. P. 297-317.
  12. Kennerley, S.W., Walton, M.E., Behrens, T.E., Buckley, M.J., Rushworth, M.F., Optimal decision making and the anterior cingulate cortex // Nat. Neurosci. 2006. V. 9. № 7. P. 940-947.
  13. Khamassi, M., Lallée, S., Procyk, E., Dominey, P.F., Robot cognitive control with a neurophysiologically inspired reinforcement learning model // Front. Neurosci. July 2011. V. 5.
  14. Medalla, M., Barbas, H., Anterior cingulate synapses in prefrontal areas 10 and 46 suggest differential influence in cognitive control // J. Neurosci. 2010. V. 30. № 48. P. 16068-16081.
  15. Ongur, D., Price, J. L., The organization of networks within the orbital and medial prefrontal cortex of rats, monkeys and humans // Cerebral Cortex. 2000. V. 10. P. 206-219.
  16. Paré, D., Quirk, G. J., LeDoux, J. E., New Vistas on Amygdala Networks in Conditioned Fear // J. Neurophysiol. 2004. V. 92. P. 1-9.
  17. Paton, J. J., Belova, M. A., Morrison, S. E., Salzman, C. D., Primate amygdala represents positive and negative value of visual stimuli during learning // Nature. 2006. V. 439. № 7078. P. 865-870.
  18. Phelps, E. A., Delgado, M. R., Nearing, K. I., LeDoux, J. E., Extinction Learning in Humans: Role of the Amygdala and vmPFC // Neuron. 2004. V. 43. P. 897-905.
  19. Polli, F. E., Barton, J.J.S., Thakkar, K. N., Greve, D.N., Goff, D.C., Rauch, S.L., Manoach, D.S., Reduced error-related activation in two anterior cingulate circuits is related to impaired performance in schizophrenia // Brain. 2008. V. 131. P. 971-986.
  20. Quirk, G. J., Likhtik, E., Pelletier, J. G., Paré, D., Stimulation of medial prefrontal cortex decreases the responsiveness of central amygdala output neurons // J. Neurosci. 2003. V. 23. P. 8800-8807.
  21. Quirk, G. J., Mueller, D., Neural mechanisms of extinction learning and retrieval // Neuropsychopharmacology. 2008. V. 33. P. 56-72.
  22. Salzman, C. D., Fusi, S., Emotion, Cognition, and Mental State Representation in Amygdala and Prefrontal Cortex // Annu. Rev. Neurosci. 2010. V. 33. P. 173-202.
  23. Sesack, S. R., Grace, A. A., Cortico-Basal Ganglia Reward Network: Microcircuitry // Neuropsychopharmacology. 2010. V. 35. № 1. P. 27-47.
  24. Schoenbaum, G., Chiba, A.A., Gallagher, M., Neural Encoding in Orbitofrontal Cortex and Basolateral Amygdala during Olfactory Discrimination Learning // J. Neurosci. 1999. V. 19. № 5. P. 1876-1884.
  25. Schultz, W., Predictive reward signal of dopamine neurons // J. Neurophysiol. 1998. V. 80. P. 1-27.
  26. Silvetti, M., Seurinck, R., Verguts, T., Value and prediction error in medial frontal cortex: integrating the single-unit and systems levels of analysis // Front. Neurosci. August 2011. V. 5.
  27. Simmons, D. A., Brooks, B. M., Neill, D. B., GABAergic inactivation of basolateral amygdala alters behavioral processes other than primary reward of ventral tegmental self-stimulation // Behav. Brain Res. 2007. V. 181. P. 110-117.
  28. Wallis, J. D., Kennerley, S. W., Heterogeneous reward signals in prefrontal cortex // Current Opinion in Neurobiology. 2010. V. 20. P. 191-198.