Reinforcement Learning

This section documents the infrastructure for performing reinforcement learning with neural systems. The neural system acts as the agent, while the environment presents a series of sensory stimuli to the model. The agent's action is the classification of each stimulus, with the choice made according to a policy. Learning occurs as the connection weights of the system are updated according to a learning rule after each choice the agent makes.
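
Before the API reference, the following self-contained toy (plain Julia, not the NeurobloxPharma API) sketches that loop: a weight matrix stands in for the agent's connections, each trial presents a random stimulus, a greedy choice picks the class with the strongest response, and a feedback-gated Hebbian rule updates the weights. Every name and value in it is illustrative.

using Random
Random.seed!(1)

n_features, n_classes, n_trials = 4, 2, 200
W = 0.1 .* rand(n_features, n_classes)            # the agent's "connection weights"
stimuli = [rand(n_features) for _ in 1:n_trials]  # the environment: one stimulus per trial
labels  = [x[1] > x[2] ? 1 : 2 for x in stimuli]  # correct class for each stimulus

K, W_lim = 0.05, 2.0                              # learning rate and weight ceiling
for (x, label) in zip(stimuli, labels)
    activation = W' * x                           # "neural" response to the stimulus
    choice     = argmax(activation)               # greedy policy: pick the strongest response
    feedback   = choice == label ? 1.0 : 0.0      # binary correctness signal
    # feedback-gated Hebbian update of the weights onto the chosen class
    W[:, choice] .+= feedback .* K .* x .* activation[choice] .* (W_lim .- W[:, choice])
end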

NeurobloxPharma.Agent (Type)
Agent(g::GraphSystem; t_block=missing, u0=[], p=[], kwargs...)

A reinforcement learning agent, used to interact with an AbstractEnvironment to simulate a learning task.

Arguments:

  • g : A GraphSystem containing the model that the agent is using to make choices and update its connections during reinforcement learning.

Keyword arguments:

  • u0 : Initial conditions for the model in g. If not provided then default values will be used.
  • p : Parameter values for the model in g. If not provided then default values will be used.
  • t_block : The time period of a PeriodicCallback which will reset the cumulative spike counter of neurons in the model. This is optional and can be useful when the plasticity rules require number of spikes within specific time windows to update connection weights.
  • kwargs... : All other keyword arguments are passed to the ODEProblem that is constructed inside the agent and solved during run_experiment! and run_trial!.
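
As a minimal construction sketch, assuming a GraphSystem g has already been assembled from the model's blocks (graph construction not shown) and using an illustrative t_block value:

using NeurobloxPharma

# g is a previously built GraphSystem; default u0 and p are used since neither is given
agent = Agent(g; t_block = 90.0)
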
NeurobloxPharma.ClassificationEnvironment (Type)
ClassificationEnvironment(stim::ImageStimulus, N_trials; name, namespace, t_stimulus, t_pause)

Create an environment for reinforcement learning. A set of images is presented to the agent to be classified. This struct stores the correct class for each image, and the current trial of the experiment.

Arguments:

  • stim: The ImageStimulus, created from a set of images
  • N_trials: Number of trials. The agent performs one classification each trial.
  • t_stimulus: The length of time the stimulus is on (ms)
  • t_pause: The length of time the stimulus is off (ms)
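
A construction sketch, assuming an ImageStimulus named stim has already been created from a labelled image set (see the ImageStimulus documentation for its constructor); the trial count, namespace, and timings below are illustrative:

using NeurobloxPharma

env = ClassificationEnvironment(stim, 100;           # 100 trials, one classification per trial
                                name = :env,
                                namespace = :rl_model,
                                t_stimulus = 600.0,  # stimulus on for 600 ms
                                t_pause = 1000.0)    # 1000 ms with the stimulus off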

The following policies are implemented in Neuroblox. Policies are represented by the AbstractActionSelection type. Policies are added as nodes to the graph, with the set of actions represented by incoming connections.

NeurobloxPharma.GreedyPolicy (Type)
GreedyPolicy(; name, t_decision, namespace, competitor_states = Symbol[])

A policy that makes a choice by picking the state with the highest value among competitor_states which represent each available choice. t_decision is the time of the decision.
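
A construction sketch with illustrative name, namespace, and decision time; the available actions are defined later, when the neuron populations representing each choice are connected into this node in the graph:

using NeurobloxPharma

policy = GreedyPolicy(; name = :action_selection,
                        namespace = :rl_model,
                        t_decision = 180.5)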

The following learning rules are implemented in Neuroblox:

NeurobloxPharma.HebbianPlasticity (Type)
HebbianPlasticity(; K, W_lim,
                  state_pre = nothing,
                  state_post = nothing,
                  t_pre = nothing,
                  t_post = nothing)

Hebbian plasticity rule. The connection weight is updated according to:

\[ w_{j+1} = w_j + \text{feedback} × K x_\text{pre} x_\text{post} (W_\text{lim} - w_j)\]

where feedback is a binary indicator of the correctness of the model's action, and x indicates the activity of the pre- and post-synaptic neuron states state_pre and state_post at timepoints t_pre and t_post respectively.

Arguments:

  • K : the learning rate of the connection
  • W_lim : the maximum weight for the connection
  • state_pre : state of the presynaptic neuron that is used in the plasticity rule (by default this is state V in neurons).
  • state_post : state of the postsynaptic neuron that is used in the plasticity rule (by default this is state V in neurons).
  • t_pre : timepoint at which state_pre is evaluated to be used in the plasticity rule.
  • t_post : timepoint at which state_post is evaluated to be used in the plasticity rule.

See also HebbianModulationPlasticity.
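
A construction sketch with illustrative parameter values. With these numbers, a single correct trial in which x_pre = x_post = 1 and w_j = 1 changes the weight by 5e-4 × (7 - 1) = 3e-3:

using NeurobloxPharma

hebbian = HebbianPlasticity(K = 5e-4, W_lim = 7.0,
                            t_pre = 600.0, t_post = 600.0)  # evaluate both states at 600 ms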

NeurobloxPharma.HebbianModulationPlasticity (Type)
HebbianModulationPlasticity(; K, decay, α, θₘ,
                            state_pre = nothing,
                            state_post = nothing,
                            t_pre = nothing,
                            t_post = nothing,
                            t_mod = nothing,
                            modulator = nothing)

Hebbian plasticity rule, modulated by the dopamine reward prediction error. The weight update is largest when the reward prediction error is far from the modulation threshold θₘ.

\[ ϵ = \text{feedback} - (\text{DA}_b - \text{DA}) \]
\[ w_{j+1} = w_j + \max(K x_\text{pre} x_\text{post} ϵ(ϵ + θₘ) dσ(α(ϵ + θₘ)) - \text{decay} × w_j, -w_j)\]

where feedback is a binary indicator of the correctness of the model's action, DA_b is the baseline dopamine level, DA is the modulator's dopamine release, dσ is the derivative of the logistic function, and x indicates the activity of the pre- and post-synaptic neuron states state_pre and state_post at timepoints t_pre and t_post respectively. The decay term prevents the weights from diverging.

Arguments:

  • K : the learning rate of the connection
  • decay : decay of the weight update
  • α : the selectivity of the derivative of the logistic function
  • θₘ : the modulation threshold for the reward prediction error
  • state_pre : state of the presynaptic neuron that is used in the plasticity rule (by default this is state V in neurons).
  • state_post : state of the postsynaptic neuron that is used in the plasticity rule (by default this is state V in neurons).
  • t_pre : timepoint at which state_pre is evaluated to be used in the plasticity rule.
  • t_post : timepoint at which state_post is evaluated to be used in the plasticity rule.

See also HebbianPlasticity.
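
A construction sketch in which dopamine_source stands for whichever block in the model releases dopamine (for example an SNc-like population); all numerical values are illustrative:

using NeurobloxPharma

hebbian_mod = HebbianModulationPlasticity(K = 0.05, decay = 0.01,
                                          α = 2.5, θₘ = 1.0,
                                          t_pre = 600.0, t_post = 600.0,
                                          t_mod = 90.0,    # when the modulator's dopamine release is read
                                          modulator = dopamine_source)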

NeurobloxBase.weight_gradient (Function)
weight_gradient(lr::AbstractLearningRule, sol, w, feedback)

Calculate how the weight w should change, based on the solution sol of the reinforcement learning trial and the feedback received.

The following functions are used to run reinforcement learning experiments.

NeurobloxBase.run_warmup (Function)
run_warmup(agent::AbstractAgent, env::AbstractEnvironment, t_warmup; kwargs...)

Run the initial solve of the RL experiment for t_warmup.

NeurobloxBase.run_trial! (Function)
run_trial!(agent::AbstractAgent, env::AbstractEnvironment, weights, u0; kwargs...)

Run a single trial of an RL experiment. Update the connection weights according to the learning rules.

NeurobloxBase.run_experiment! (Function)
run_experiment!(agent::AbstractAgent, env::AbstractEnvironment; verbose=false, t_warmup=0, kwargs...)

Perform a full RL experiment with agent in the environment env. The experiment runs until the maximum number of trials in env is reached.
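
Putting the pieces together, a full run might look like the sketch below; the solver-related keyword arguments are assumptions (they are assumed to be forwarded to the underlying ODE solves) and all values are illustrative:

using NeurobloxPharma, OrdinaryDiffEq

run_experiment!(agent, env;
                alg = Vern7(),                # assumed to be forwarded to the ODE solves
                reltol = 1e-9, abstol = 1e-9,
                t_warmup = 200.0,             # settle the model before the first trial
                verbose = true)               # report progress during the run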
