Synaptic Plasticity and Reinforcement Learning
Jupyter Notebook: Please work on learning.ipynb.
Video: https://www.youtube.com/embed/pBvgcIHK6GY
Introduction
In Neuroblox, we can add plasticity rules to our circuit models. Every connection carries a symbolic weight, and these weights are updated according to the plasticity rules after every simulation run. When doing reinforcement learning in Neuroblox, the weight updates are handled automatically after each simulation. This is the topic we cover here.
We will consider two examples: first, a cortical circuit with Hebbian plasticity, and then an extended circuit that implements reinforcement learning to sort image stimuli into two categories without any a priori knowledge of the categorization rule.
In reinforcement learning (RL) we typically talk about an agent acting on an environment, and the environment then returning new stimuli and rewards to the agent. Neuroblox uses the same semantics, as we will see shortly. The Agent is constructed from the model graph, and the ClassificationEnvironment is constructed from a stimulus Blox that holds multiple stimuli together with the true category of each stimulus.
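The agent-environment exchange described above can be sketched in plain Julia. This is only an illustrative toy, not how Neuroblox builds its Agent or ClassificationEnvironment: the types ToyEnvironment and ToyAgent, the function names, and the stimulus values are all hypothetical.

```julia
# Hypothetical toy environment: each stimulus is a feature vector with a true category (1 or 2).
struct ToyEnvironment
    stimuli::Vector{Vector{Float64}}
    categories::Vector{Int}
end

# Hypothetical toy agent: chooses category 1 when the first feature exceeds a threshold.
struct ToyAgent
    threshold::Float64
end

act(agent::ToyAgent, stimulus) = stimulus[1] > agent.threshold ? 1 : 2

# One pass over the environment: the agent acts on each stimulus,
# and the environment answers with a reward (true when the action matches the category).
function run_toy_experiment(agent::ToyAgent, env::ToyEnvironment)
    rewards = Bool[]
    for (stimulus, category) in zip(env.stimuli, env.categories)
        action = act(agent, stimulus)
        push!(rewards, action == category)
    end
    return rewards
end

toy_env = ToyEnvironment([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]], [1, 2, 2])
toy_agent = ToyAgent(0.5)
rewards = run_toy_experiment(toy_agent, toy_env)
```

In this toy the reward is simply whether the chosen action matches the true category of the stimulus, which is the role the environment plays in a classification task. The toy agent never learns; it only illustrates the exchange of stimuli, actions, and rewards.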
Learning goals
- Simulate circuits with Hebbian synaptic plasticity.
- Simulate circuits performing reinforcement learning in a behavioral task.
Cortico-Cortical Plasticity
using Neuroblox
using OrdinaryDiffEqDefault, OrdinaryDiffEqVerner ## to build and solve the ODE problem, with access to multiple solvers
using Random ## for generating random variables
using CairoMakie ## for customized plotting recipes for blox
using CSV ## to read data from CSV files
using DataFrames ## to format the data into DataFrames
using Downloads ## to download image stimuli files
N_trials = 10 ## number of trials
trial_dur = 1000 ## in ms
# download the stimulus images
image_set = CSV.read(Downloads.download("https://raw.githubusercontent.com/Neuroblox/NeurobloxDocsHost/refs/heads/main/data/stimuli_set.csv"), DataFrame) ## reading data into DataFrame format
# define stimulus Blox
# t_stimulus: how long the stimulus is on (in ms)
# t_pause : how long the stimulus is off (in ms)
model_name = :g
@graph g begin
@nodes begin
stim = ImageStimulus(image_set; t_stimulus=trial_dur, t_pause=0);
# Cortical Bloxs
VAC = Cortical(; N_wta=4, N_exci=5, density=0.05, weight=1)
AC = Cortical(; N_wta=2, N_exci=5, density=0.05, weight=1)
# ascending system Blox, modulating frequency set to 16 Hz
ASC1 = NextGenerationEI(; Cₑ=2*26,Cᵢ=1*26, alpha_invₑₑ=10.0/26, alpha_invₑᵢ=0.8/26, alpha_invᵢₑ=10.0/26, alpha_invᵢᵢ=0.8/26, kₑᵢ=0.6*26, kᵢₑ=0.6*26)
end
# learning rule
hebbian_cort = HebbianPlasticity(K=5e-4, W_lim=15, t_pre=trial_dur, t_post=trial_dur)
@connections begin
stim => VAC, [weight=14]
ASC1 => VAC, [weight=44]
ASC1 => AC, [weight=44]
VAC => AC, [weight=3, density=0.1, learning_rule = hebbian_cort] ## pass learning rule as a keyword argument
end
end
agent = Agent(g; name=:g);
env = ClassificationEnvironment(stim, N_trials; name=:env, namespace=model_name);
fig = Figure(size = (1600, 800))
adjacency(fig[1,1], agent; title="Initial weights", colorrange=(0,7))
run_experiment!(agent, env; t_warmup=200.0, alg=Vern7())
adjacency(fig[1,2], agent; title="Final weights", colorrange=(0,7))
fig
┌ Warning: No action selection provided
└ @ NeurobloxBase.GraphDynamicsInterop ~/actions-runner/_work/NeurobloxDev/NeurobloxDev/NeurobloxBase/src/GraphDynamicsInterop/learning_interop.jl:22
Notice how the weight values in the upper left corner (connections with HebbianPlasticity) have changed after simulation.
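To build intuition for why the weights drift over trials, the sketch below applies a textbook soft-bounded Hebbian rule. This is an assumption made for illustration and may differ from the rule Neuroblox applies internally: hebbian_update, simulate_weight, pre, and post are hypothetical names, the activity values are made up, and W_lim is assumed to act as an upper bound on the weight.

```julia
# Textbook soft-bounded Hebbian rule (an assumption, not Neuroblox's internal code):
# the weight grows with the product of pre- and postsynaptic activity,
# and the growth shrinks as the weight approaches the assumed bound W_lim.
hebbian_update(w, pre, post; K=5e-4, W_lim=15.0) = w + K * pre * post * (W_lim - w)

# Apply the rule once per trial with constant, made-up activity values.
function simulate_weight(w0, pre, post, n_trials; kwargs...)
    w = w0
    for _ in 1:n_trials
        w = hebbian_update(w, pre, post; kwargs...)
    end
    return w
end

w_one_trial = hebbian_update(3.0, 1.0, 1.0)            # one small step from the initial weight 3
w_many_trials = simulate_weight(3.0, 10.0, 10.0, 200)  # approaches W_lim without crossing it
```

Under this assumed rule, a small K produces a gradual change per trial, and the factor (W_lim - w) keeps the weight below the bound however many trials are run.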
Cortico-Striatal Circuit Performing Category Learning
This is one simplified biological instantiation of an RL system. It carries out a simple RL behavior but does not faithfully simulate physiology. The experiment we simulate here is the category learning experiment [2], which was successfully modeled through a detailed corticostriatal model [1].
time_block_dur = 90.0 ## ms (size of discrete time blocks)
N_trials = 100 ## number of trials
trial_dur = 1000 ## ms
image_set = CSV.read(Downloads.download("https://raw.githubusercontent.com/Neuroblox/NeurobloxDocsHost/refs/heads/main/data/stimuli_set.csv"), DataFrame) ## reading data into DataFrame format
# additional Striatum Bloxs
@graph g begin
@nodes begin
STR1 = Striatum(; namespace=model_name, N_inhib=5)
STR2 = Striatum(; namespace=model_name, N_inhib=5)
tan_pop1 = TAN(κ=10; namespace=model_name)
tan_pop2 = TAN(κ=10; namespace=model_name)
SNcb = SNc(κ_DA=1; namespace=model_name)
# action selection Blox, necessary for making a choice
AS = GreedyPolicy(; namespace=model_name, t_decision=2*time_block_dur)
end
# learning rules
hebbian_mod = HebbianModulationPlasticity(K=0.06, decay=0.01, α=2.5, θₘ=1, modulator=SNcb, t_pre=trial_dur, t_post=trial_dur, t_mod=time_block_dur)
hebbian_cort = HebbianPlasticity(K=5e-4, W_lim=7, t_pre=trial_dur, t_post=trial_dur)
@connections begin
stim => VAC, [weight=14]
ASC1 => VAC, [weight=44]
ASC1 => AC, [weight=44]
VAC => AC, [weight=3, density=0.1, learning_rule = hebbian_cort]
AC => STR1, [weight = 0.075, density = 0.04, learning_rule = hebbian_mod]
AC => STR2, [weight = 0.075, density = 0.04, learning_rule = hebbian_mod]
tan_pop1 => STR1, [weight = 1, t_event = time_block_dur]
tan_pop2 => STR2, [weight = 1, t_event = time_block_dur]
STR1 => tan_pop1, [weight = 1]
STR2 => tan_pop1, [weight = 1]
STR1 => tan_pop2, [weight = 1]
STR2 => tan_pop2, [weight = 1]
STR1 => STR2, [weight = 1, t_event = 2*time_block_dur]
STR2 => STR1, [weight = 1, t_event = 2*time_block_dur]
STR1 => SNcb, [weight = 1]
STR2 => SNcb, [weight = 1]
# action selection connections
STR1 => AS;
STR2 => AS;
end
end
GraphSystem(...10 nodes and 18 connections...)
The last two connections add the ability to output actions. The AS Blox is a GreedyPolicy, meaning that it compares the activity of the two Striatum Bloxs STR1 and STR2 and selects the one with the highest value. If STR1 wins, the model makes the left choice; if STR2 wins, it makes the right choice. Each choice corresponds to one of the two stimulus categories.
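The greedy comparison can be illustrated with a few lines of plain Julia. This is a hypothetical stand-in rather than Neuroblox's GreedyPolicy implementation: greedy_action, action_labels, and the activity values are made up for the example.

```julia
# Hypothetical stand-in for a greedy choice between two competing populations:
# return the index of the population with the highest activity.
greedy_action(activities) = argmax(activities)

action_labels = ("left", "right")   # action 1 is left, action 2 is right

str_activity = [0.42, 0.57]         # made-up activity values for STR1 and STR2
choice = greedy_action(str_activity)
chosen_label = action_labels[choice]
```

Note that argmax returns the first index when both values are equal, so this sketch would break ties in favor of the left choice; how Neuroblox resolves ties is not covered here.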
agent = Agent(g; name=model_name, t_block = time_block_dur); ## define agent
env = ClassificationEnvironment(stim, N_trials; name=:env, namespace=model_name)
fig = Figure(title="Adjacency matrix", size = (1600, 800))
adjacency(fig[1,1], agent; title = "Before Learning", colorrange=(0,0.2))
trace = run_experiment!(agent, env; t_warmup=200.0, alg=Vern7(), verbose=true)
(trial = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10 … 91, 92, 93, 94, 95, 96, 97, 98, 99, 100], correct = Bool[0, 0, 0, 1, 0, 0, 1, 1, 0, 0 … 0, 0, 0, 0, 0, 0, 0, 1, 0, 1], action = [1, 2, 2, 2, 2, 1, 2, 2, 1, 1 … 1, 2, 1, 1, 2, 1, 1, 2, 2, 1], time = [7.769071343, 2.123154995, 2.217163443, 3.568730881, 2.195470896, 3.525855001, 2.026462844, 3.705968676, 2.03252582, 2.04580149 … 2.177738743, 3.494262425, 2.011296202, 2.062249981, 2.093992872, 2.17503347, 3.506377375, 2.040967988, 2.199399045, 2.031941169])
trace is a NamedTuple of vectors containing useful outcomes for each trial of the experiment:
trace.trial ## trial indices
trace.correct ## whether the response was correct or not on each trial
trace.action; ## what response was made on each trial, 1 is left and 2 is right
adjacency(fig[1,2], agent; title = "After Learning", colorrange=(0,0.2))
fig
Notice the changes in weight values after the RL experiment.
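The per-trial outcomes in trace.correct can be summarized with base Julia alone. The sketch below is a minimal example that uses only the first ten outcomes printed in the experiment output above as a stand-in vector; running_accuracy and overall_accuracy are hypothetical names, and with a real run you would pass trace.correct instead.

```julia
# First ten outcomes copied from the printed `correct` field (1 = correct, 0 = incorrect).
correct = Bool[0, 0, 0, 1, 0, 0, 1, 1, 0, 0]

# Fraction of correct responses up to and including each trial.
running_accuracy = cumsum(correct) ./ (1:length(correct))

# Overall fraction of correct responses across these trials.
overall_accuracy = sum(correct) / length(correct)
```

A cumulative average is only one possible summary; a windowed average over recent trials would track changes in performance more closely.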
Challenge Problems
- Visualize the model's performance in the category learning task as a function of time (trials). Hint: For correct trials trace.correct = 1 and for incorrect trials trace.correct = 0.
- Since this is an oversimplified instantiation of a cortico-striatal model, it is highly sensitive to the parameters, such that if we change the values of TAN parameters (maximum activity κ), or learning rates in HebbianModulationPlasticity (learning rate constant K), the system won't be able to learn. Try playing with these parameters and figure out at what range of these parameters the model works.
References
- [1] Pathak A., Brincat S., Organtzidis H., Strey H., Senneff S., Antzoulatos E., Mujica-Parodi L., Miller E., Granger R. Biomimetic model of corticostriatal micro-assemblies discovers new neural code. bioRxiv 2023.11.06.565902, 2024.
- [2] Antzoulatos EG, Miller EK. Increases in functional connectivity between prefrontal cortex and striatum during category learning. Neuron. 2014 Jul 2;83(1):216-25. doi: 10.1016/j.neuron.2014.05.005. Epub 2014 Jun 12. PMID: 24930701; PMCID: PMC4098789.
This page was generated using Literate.jl.