An exploration-exploitation model based on norepinepherine and dopamine activity

Samuel McClure, Mark S. Gilzenrat, Jonathan D. Cohen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

44 Citations (Scopus)

Abstract

We propose a model by which dopamine (DA) and norepinephrine (NE) combine to alternate behavior between relatively exploratory and exploitative modes. The model is developed for a target detection task for which single-neuron recording data are available from locus coeruleus (LC) NE neurons. An exploration-exploitation trade-off is elicited by regularly switching which of the two stimuli is rewarded. DA functions within the model to change synaptic weights according to a reinforcement learning algorithm. Exploration is mediated by the state of LC firing, with higher tonic and lower phasic activity producing greater response variability. The opposite state of LC function, with lower baseline firing rate and greater phasic responses, favors exploitative behavior. Changes in LC firing mode result from combined measures of response conflict and reward rate, where response conflict is monitored using models of anterior cingulate cortex (ACC). Increased long-term response conflict and decreased reward rate, which occur following a reward contingency switch, favor the higher tonic state of LC function and NE release. This increases exploration and facilitates discovery of the new target.
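The loop described in the abstract can be illustrated with a toy simulation. The following is a minimal sketch under stated assumptions, not the authors' model: the function names (`softmax2`, `simulate`), the delta-rule update standing in for DA-driven learning, the softmax gain standing in for LC mode, the `reward_rate - conflict` mode signal, and all parameter values are hypothetical choices made only for illustration.

```python
import math
import random


def softmax2(w, gain):
    """Probability of choosing stimulus 0 under a two-option softmax."""
    return 1.0 / (1.0 + math.exp(-gain * (w[0] - w[1])))


def simulate(n_trials=400, switch_every=100, alpha=0.2, tau=0.05,
             gain_lo=1.0, gain_hi=10.0, seed=0):
    """Toy two-stimulus task with periodic reward contingency switches."""
    rng = random.Random(seed)
    w = [0.5, 0.5]       # learned value of each stimulus (DA-trained weights)
    reward_rate = 0.5    # running average of reward
    conflict = 0.0       # running average of response conflict (ACC stand-in)
    target = 0           # index of the currently rewarded stimulus
    gains, rewards = [], []
    for t in range(n_trials):
        if t > 0 and t % switch_every == 0:
            target = 1 - target  # reward contingency switch
        # Assumed mode signal in [0, 1]: high reward rate and low conflict
        # give high gain (exploit); the reverse gives low gain (explore).
        mode = max(0.0, min(1.0, reward_rate - conflict))
        gain = gain_lo + (gain_hi - gain_lo) * mode
        p0 = softmax2(w, gain)
        choice = 0 if rng.random() < p0 else 1
        r = 1.0 if choice == target else 0.0
        # Delta-rule update driven by the reward prediction error.
        w[choice] += alpha * (r - w[choice])
        # Update the running averages that set the next trial's gain.
        reward_rate += tau * (r - reward_rate)
        conflict += tau * (4.0 * p0 * (1.0 - p0) - conflict)
        gains.append(gain)
        rewards.append(r)
    return gains, rewards


if __name__ == "__main__":
    gains, rewards = simulate()
    print("mean reward:", sum(rewards) / len(rewards))
```

In this sketch a low gain makes choices more variable, loosely mirroring the high-tonic LC state, and a high gain makes them more deterministic, loosely mirroring the phasic state. Plotting `gains` against trial number is one way to inspect how the assumed mode signal responds around each switch.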

Original language: English (US)
Title of host publication: Advances in Neural Information Processing Systems
Pages: 867-874
Number of pages: 8
State: Published - 2005
Externally published: Yes
Event: 2005 Annual Conference on Neural Information Processing Systems, NIPS 2005 - Vancouver, BC, Canada
Duration: Dec 5 2005 to Dec 8 2005

Other

Other: 2005 Annual Conference on Neural Information Processing Systems, NIPS 2005
Country: Canada
City: Vancouver, BC
Period: 12/5/05 to 12/8/05

Fingerprint

Neurons
Data recording
Reinforcement learning
Target tracking
Learning algorithms
Switches
Dopamine

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

Cite this

McClure, S., Gilzenrat, M. S., & Cohen, J. D. (2005). An exploration-exploitation model based on norepinepherine and dopamine activity. In Advances in Neural Information Processing Systems (pp. 867-874)

@inproceedings{43f100e3d6784a059b563d58c9317dc9,
title = "An exploration-exploitation model based on norepinepherine and dopamine activity",
author = "Samuel McClure and Gilzenrat, {Mark S.} and Cohen, {Jonathan D.}",
year = "2005",
language = "English (US)",
isbn = "9780262232531",
pages = "867--874",
booktitle = "Advances in Neural Information Processing Systems",

}

TY - GEN

T1 - An exploration-exploitation model based on norepinepherine and dopamine activity

AU - McClure, Samuel

AU - Gilzenrat, Mark S.

AU - Cohen, Jonathan D.

PY - 2005

Y1 - 2005

UR - http://www.scopus.com/inward/record.url?scp=34250317199&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34250317199&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:34250317199

SN - 9780262232531

SP - 867

EP - 874

BT - Advances in Neural Information Processing Systems

ER -