Synaptic Plasticity and Reinforcement Learning

Jupyter Notebook: Please work on learning.ipynb.

Introduction

In Neuroblox, we can add plasticity rules to our circuit models. The symbolic weight defined for each connection is then updated according to its plasticity rule after every simulation run; during reinforcement learning, Neuroblox handles these weight updates automatically after each simulation. This is the topic we will cover here.

We will consider two examples: first, a cortical circuit with Hebbian plasticity, and then an extended circuit that implements reinforcement learning to categorize image stimuli into two categories without any a priori knowledge of the categorization rule.

In reinforcement learning (RL) we typically talk about an agent acting on an environment, and the environment in turn returning new stimuli and rewards to the agent. We use the same semantics in Neuroblox, as we will see shortly. The Agent is constructed from the model graph, and the ClassificationEnvironment is constructed from a Blox of multiple stimuli, which also contains information about the true category of each stimulus.
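Conceptually, each trial presents a stimulus, the agent's circuit integrates it and produces an action, and the environment compares that action against the true category to generate feedback for the plasticity rules. Below is a minimal conceptual sketch of this loop; all helper names are hypothetical, since run_experiment! (shown later) performs the whole loop internally:

for trial in 1:N_trials
    stimulus, category = present!(env)         ## hypothetical: environment emits the next stimulus
    sol = simulate(agent, stimulus, trial_dur) ## hypothetical: integrate the circuit for one trial
    action = select_action(agent, sol)         ## hypothetical: read out a choice (e.g. a greedy policy)
    feedback = (action == category)            ## was the choice correct?
    update_weights!(agent, feedback)           ## hypothetical: plasticity rules update the weights
end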

Learning goals

In this tutorial we will learn how to attach plasticity rules to specific connections of a circuit model, and how to construct an Agent and a ClassificationEnvironment to run a reinforcement learning experiment on a categorization task.

Cortico-Cortical Plasticity

using Neuroblox
using OrdinaryDiffEq ## to build and solve the ODE problem; provides access to multiple solvers
using Random ## for generating random variables
using CairoMakie ## for customized plotting recipes for blox
using CSV ## to read data from CSV files
using DataFrames ## to format the data into DataFrames
using Downloads ## to download image stimuli files

N_trials = 10 ## number of trials
trial_dur = 1000 ## in ms

# download the stimulus images
image_set = CSV.read(Downloads.download("https://raw.githubusercontent.com/Neuroblox/NeurobloxDocsHost/refs/heads/main/data/stimuli_set.csv"), DataFrame) ## reading data into DataFrame format

model_name = :g
# define stimulus Blox
# t_stimulus: how long the stimulus is on (in ms)
# t_pause : how long the stimulus is off (in ms)
@named stim = ImageStimulus(image_set; namespace=model_name, t_stimulus=trial_dur, t_pause=0);

# Cortical Bloxs
@named VAC = CorticalBlox(; namespace=model_name, N_wta=4, N_exci=5, density=0.05, weight=1)
@named AC = CorticalBlox(; namespace=model_name, N_wta=2, N_exci=5, density=0.05, weight=1)
# ascending system Blox, modulating frequency set to 16 Hz
@named ASC1 = NextGenerationEIBlox(; namespace=model_name, Cₑ=2*26, Cᵢ=1*26, alpha_invₑₑ=10.0/26, alpha_invₑᵢ=0.8/26, alpha_invᵢₑ=10.0/26, alpha_invᵢᵢ=0.8/26, kₑᵢ=0.6*26, kᵢₑ=0.6*26)

# learning rule
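# HebbianPlasticity strengthens a connection when pre- and postsynaptic activity
# coincide. Schematically (an interpretation of the parameters below, not
# necessarily the exact equation used internally):
#   Δw ∝ K * (presynaptic activity) * (postsynaptic activity) * (W_lim - w)
# where K is the learning rate, W_lim caps the weight, and t_pre/t_post set the
# windows over which pre- and postsynaptic activity are measured.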
hebbian_cort = HebbianPlasticity(K=5e-4, W_lim=15, t_pre=trial_dur, t_post=trial_dur)

g = MetaDiGraph()

add_edge!(g, stim => VAC, weight=14)
add_edge!(g, ASC1 => VAC, weight=44)
add_edge!(g, ASC1 => AC, weight=44)
add_edge!(g, VAC => AC, weight=3, density=0.1, learning_rule = hebbian_cort) ## pass learning rule as a keyword argument

agent = Agent(g; name=model_name);
env = ClassificationEnvironment(stim, N_trials; name=:env, namespace=model_name);

fig = Figure(size = (1600, 800))

adjacency(fig[1,1], agent; title="Initial weights", colorrange=(0,7))

run_experiment!(agent, env; t_warmup=200.0, alg=Vern7())

adjacency(fig[1,2], agent; title="Final weights", colorrange=(0,7))
fig

Notice how the weight values in the upper left corner (the connections with HebbianPlasticity) have changed after the simulation.

Cortico-Striatal Circuit performing Category Learning

This is a simplified biological instantiation of an RL system. It carries out a simple RL behavior but does not faithfully simulate physiology. The experiment we simulate here is the category learning experiment of [Antzoulatos2014], which was successfully modeled with a detailed corticostriatal model [2].

time_block_dur = 90.0 ## ms (size of discrete time blocks)
N_trials = 100 ## number of trials
trial_dur = 1000 ## ms

image_set = CSV.read(Downloads.download("raw.githubusercontent.com/Neuroblox/NeurobloxDocsHost/refs/heads/main/data/stimuli_set.csv"), DataFrame) ## reading data into DataFrame format

# additional Striatum Bloxs
@named STR1 = Striatum(; namespace=model_name, N_inhib=5)
@named STR2 = Striatum(; namespace=model_name, N_inhib=5)

@named tan_pop1 = TAN(κ=10; namespace=model_name)
@named tan_pop2 = TAN(κ=10; namespace=model_name)

@named SNcb = SNc(κ_DA=1; namespace=model_name)

# action selection Blox, necessary for making a choice
@named AS = GreedyPolicy(; namespace=model_name, t_decision=2*time_block_dur)

# learning rules
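# HebbianModulationPlasticity gates the Hebbian update by a dopaminergic reward
# signal from the modulator Blox (here SNcb), so cortico-striatal weights change
# according to reward feedback (a schematic description, not the exact equation).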
hebbian_mod = HebbianModulationPlasticity(K=0.06, decay=0.01, α=2.5, θₘ=1, modulator=SNcb, t_pre=trial_dur, t_post=trial_dur, t_mod=time_block_dur)
hebbian_cort = HebbianPlasticity(K=5e-4, W_lim=7, t_pre=trial_dur, t_post=trial_dur)

g = MetaDiGraph()

add_edge!(g, stim => VAC, weight=14)
add_edge!(g, ASC1 => VAC, weight=44)
add_edge!(g, ASC1 => AC, weight=44)
add_edge!(g, VAC => AC, weight=3, density=0.1, learning_rule = hebbian_cort)
add_edge!(g, AC => STR1, weight = 0.075, density = 0.04, learning_rule = hebbian_mod)
add_edge!(g, AC => STR2, weight = 0.075, density = 0.04, learning_rule = hebbian_mod)
add_edge!(g, tan_pop1 => STR1, weight = 1, t_event = time_block_dur)
add_edge!(g, tan_pop2 => STR2, weight = 1, t_event = time_block_dur)
add_edge!(g, STR1 => tan_pop1, weight = 1)
add_edge!(g, STR2 => tan_pop1, weight = 1)
add_edge!(g, STR1 => tan_pop2, weight = 1)
add_edge!(g, STR2 => tan_pop2, weight = 1)
add_edge!(g, STR1 => STR2, weight = 1, t_event = 2*time_block_dur)
add_edge!(g, STR2 => STR1, weight = 1, t_event = 2*time_block_dur)
add_edge!(g, STR1 => SNcb, weight = 1)
add_edge!(g, STR2 => SNcb, weight = 1)
# action selection connections
add_edge!(g, STR1 => AS);
add_edge!(g, STR2 => AS);

The last two connections add the ability to output actions. The AS Blox is a GreedyPolicy, meaning that it compares the activity of the two Striatum Bloxs STR1 and STR2 and selects whichever is greater. If STR1 wins, the model makes the left choice (category 1); if STR2 wins, it makes the right choice (category 2).
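To make the selection rule concrete, here is a minimal sketch of greedy action selection over two competing readouts (a hypothetical standalone function, not the internal GreedyPolicy implementation):

## pick the action whose accumulated activity is larger
greedy_choice(act_STR1, act_STR2) = act_STR1 >= act_STR2 ? 1 : 2

greedy_choice(0.8, 0.3) ## returns 1, i.e. the left choice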

agent = Agent(g; name=model_name, t_block = time_block_dur); ## define agent
env = ClassificationEnvironment(stim, N_trials; name=:env, namespace=model_name)

fig = Figure(size = (1600, 800))

adjacency(fig[1,1], agent; title = "Before Learning", colorrange=(0,0.2))

trace = run_experiment!(agent, env; t_warmup=200.0, alg=Vern7(), verbose=true)
Trial = 1, Category choice = 1, Response = False
Trial = 2, Category choice = 2, Response = False
Trial = 3, Category choice = 2, Response = False
Trial = 4, Category choice = 1, Response = False
Trial = 5, Category choice = 2, Response = False
Trial = 6, Category choice = 1, Response = False
Trial = 7, Category choice = 2, Response = Correct
Trial = 8, Category choice = 2, Response = Correct
Trial = 9, Category choice = 2, Response = Correct
Trial = 10, Category choice = 2, Response = Correct
Trial = 11, Category choice = 2, Response = Correct
Trial = 12, Category choice = 2, Response = Correct
Trial = 13, Category choice = 1, Response = Correct
Trial = 14, Category choice = 2, Response = Correct
Trial = 15, Category choice = 1, Response = Correct
Trial = 16, Category choice = 1, Response = Correct
Trial = 17, Category choice = 2, Response = False
Trial = 18, Category choice = 1, Response = Correct
Trial = 19, Category choice = 2, Response = False
Trial = 20, Category choice = 2, Response = Correct
Trial = 21, Category choice = 1, Response = Correct
Trial = 22, Category choice = 1, Response = Correct
Trial = 23, Category choice = 2, Response = False
Trial = 24, Category choice = 1, Response = False
Trial = 25, Category choice = 1, Response = Correct
Trial = 26, Category choice = 2, Response = False
Trial = 27, Category choice = 2, Response = False
Trial = 28, Category choice = 1, Response = Correct
Trial = 29, Category choice = 1, Response = False
Trial = 30, Category choice = 1, Response = Correct
Trial = 31, Category choice = 1, Response = False
Trial = 32, Category choice = 1, Response = Correct
Trial = 33, Category choice = 1, Response = False
Trial = 34, Category choice = 2, Response = Correct
Trial = 35, Category choice = 1, Response = False
Trial = 36, Category choice = 1, Response = Correct
Trial = 37, Category choice = 1, Response = Correct
Trial = 38, Category choice = 2, Response = False
Trial = 39, Category choice = 2, Response = Correct
Trial = 40, Category choice = 1, Response = Correct
Trial = 41, Category choice = 2, Response = Correct
Trial = 42, Category choice = 2, Response = False
Trial = 43, Category choice = 2, Response = False
Trial = 44, Category choice = 2, Response = Correct
Trial = 45, Category choice = 2, Response = False
Trial = 46, Category choice = 1, Response = Correct
Trial = 47, Category choice = 2, Response = False
Trial = 48, Category choice = 1, Response = False
Trial = 49, Category choice = 2, Response = Correct
Trial = 50, Category choice = 2, Response = Correct
Trial = 51, Category choice = 2, Response = False
Trial = 52, Category choice = 1, Response = Correct
Trial = 53, Category choice = 2, Response = Correct
Trial = 54, Category choice = 2, Response = Correct
Trial = 55, Category choice = 1, Response = False
Trial = 56, Category choice = 1, Response = False
Trial = 57, Category choice = 2, Response = False
Trial = 58, Category choice = 2, Response = False
Trial = 59, Category choice = 2, Response = False
Trial = 60, Category choice = 2, Response = False
Trial = 61, Category choice = 1, Response = False
Trial = 62, Category choice = 2, Response = Correct
Trial = 63, Category choice = 2, Response = Correct
Trial = 64, Category choice = 2, Response = False
Trial = 65, Category choice = 1, Response = Correct
Trial = 66, Category choice = 2, Response = Correct
Trial = 67, Category choice = 2, Response = False
Trial = 68, Category choice = 2, Response = Correct
Trial = 69, Category choice = 2, Response = Correct
Trial = 70, Category choice = 1, Response = Correct
Trial = 71, Category choice = 1, Response = Correct
Trial = 72, Category choice = 2, Response = Correct
Trial = 73, Category choice = 1, Response = False
Trial = 74, Category choice = 2, Response = False
Trial = 75, Category choice = 1, Response = Correct
Trial = 76, Category choice = 1, Response = False
Trial = 77, Category choice = 2, Response = False
Trial = 78, Category choice = 1, Response = Correct
Trial = 79, Category choice = 1, Response = Correct
Trial = 80, Category choice = 1, Response = Correct
Trial = 81, Category choice = 2, Response = Correct
Trial = 82, Category choice = 2, Response = False
Trial = 83, Category choice = 2, Response = False
Trial = 84, Category choice = 1, Response = False
Trial = 85, Category choice = 1, Response = False
Trial = 86, Category choice = 2, Response = False
Trial = 87, Category choice = 1, Response = Correct
Trial = 88, Category choice = 1, Response = Correct
Trial = 89, Category choice = 2, Response = Correct
Trial = 90, Category choice = 2, Response = False
Trial = 91, Category choice = 1, Response = False
Trial = 92, Category choice = 2, Response = False
Trial = 93, Category choice = 1, Response = False
Trial = 94, Category choice = 1, Response = False
Trial = 95, Category choice = 2, Response = False
Trial = 96, Category choice = 2, Response = Correct
Trial = 97, Category choice = 2, Response = Correct
Trial = 98, Category choice = 2, Response = Correct
Trial = 99, Category choice = 1, Response = Correct
Trial = 100, Category choice = 1, Response = Correct
(trial = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100], correct = Bool[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1], action = [1, 2, 2, 1, 2, 1, 2, 2, 2, 2, 2, 2, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 2, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 1, 2, 2, 2, 1, 2, 2, 1, 1, 2, 2, 2, 2, 1, 2, 2, 2, 1, 2, 2, 2, 2, 1, 1, 2, 1, 2, 1, 1, 2, 1, 1, 1, 2, 2, 2, 1, 1, 2, 1, 1, 2, 2, 1, 2, 1, 1, 2, 2, 2, 2, 1, 1])

trace is a NamedTuple containing useful outcomes for each trial of the experiment:

trace.trial ## trial indices
trace.correct ## whether the response was correct or not on each trial
trace.action; ## which response was made on each trial, 1 is left and 2 is right
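Since trace.correct is a vector of Booleans, we can, for example, compute and plot a running accuracy curve to visualize learning progress across trials. This is a suggested follow-up using the already-loaded CairoMakie, not part of the original script:

running_accuracy = cumsum(trace.correct) ./ trace.trial ## fraction of correct responses up to each trial

fig_acc = Figure()
ax = Axis(fig_acc[1,1]; xlabel="Trial", ylabel="Running accuracy")
lines!(ax, trace.trial, running_accuracy)
fig_acc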

adjacency(fig[1,2], agent; title = "After Learning", colorrange=(0,0.2))
fig

Notice the changes in weight values after the RL experiment.

Challenge Problems

References

[Antzoulatos2014] Antzoulatos, E. G., & Miller, E. K. (2014). Increases in functional connectivity between dorsolateral prefrontal cortex and striatum during category learning. Neuron.
