Learning Parameterized Expressions
Note: Parametric expressions are currently considered experimental and may change in the future.
Parameterized expressions in SymbolicRegression.jl allow you to discover symbolic expressions that contain optimizable parameters. This is particularly useful when you have data that follows different patterns based on some categorical variable, or when you want to learn an expression with constants that should be optimized during the search.
In this tutorial, we'll generate synthetic data with class-dependent parameters and use symbolic regression to discover the parameterized expressions.
The Problem
Let's create a synthetic dataset where the underlying function changes based on a class label:
\[
y = 2\cos(x_2 + 0.1) + x_1^2 - 3.2 \quad \text{[class 1]}
\]
or
\[
y = 2\cos(x_2 + 1.5) + x_1^2 - 0.5 \quad \text{[class 2]}
\]
We will need to simultaneously learn the symbolic expression and per-class parameters!
using SymbolicRegression
using Random: MersenneTwister
using MLJBase: machine, fit!, predict, report
using Test
Now, we generate synthetic data with these two classes.
X = let rng = MersenneTwister(0), n = 30
(; x1=randn(rng, n), x2=randn(rng, n), class=rand(rng, 1:2, n))
end
(x1 = [-0.7587307822993239, 0.03249717326229417, 0.04868971510118324, 0.426553609186312, -0.6455341387712752, 0.16047914065004126, -1.174243139542269, -0.8590577126607076, 0.8166486156723918, -1.3610991864137623 … -1.4738536651017287, -2.3922557197621166, -1.004193438685409, -0.15671091559738035, -0.40851221296291546, -0.02542189385385727, -1.507668525428601, -0.14180202282588494, -1.1251781589718115, -1.6634915201296359],
x2 = [0.9225430229784556, -0.3431658216750271, 2.1494079717206267, -0.3614340720711092, -0.17101357024903552, 1.4044957064877326, -1.14561382828759, 1.199962308278444, -0.974684107278398, 0.517299511539031 … 0.15367872415413888, 0.1617663833102913, 1.6815067955745, 0.08878524390300123, 2.997069453443845, 1.3160268541638762, 0.10493050742285542, 0.17279914285302175, 0.39613226871496143, 1.2560661845396246],
class = [1, 1, 2, 2, 2, 2, 1, 2, 2, 1 … 1, 2, 2, 1, 1, 2, 1, 1, 1, 2],)
Now, we generate target values using the true model that has class-dependent parameters:
y = let P1 = [0.1, 1.5], P2 = [3.2, 0.5]
[2 * cos(x2 + P1[class]) + x1^2 - P2[class] for (x1, x2, class) in zip(X.x1, X.x2, X.class)]
end
30-element Vector{Float64}:
-1.5819329379648106
-1.257782764921467
-2.2452471837347545
0.5197422215634926
0.3956348400637564
-2.4182943336237583
-0.8184112159299595
-1.5701319117010097
1.8972460993013578
0.28348012694263547
⋮
5.041198154342794
-1.4900026104671513
-1.210975832899709
-5.031135783951781
-2.394293547335875
1.0312146396920676
-1.2538511817635862
-0.1751135457735593
0.4140028695829825
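As a quick sanity check, we can recompute the first target value by hand. The first sample belongs to class 1, so the true model applies the phase 0.1 and the offset 3.2:
# class[1] == 1, so the true model uses P1[1] = 0.1 and P2[1] = 3.2
@test y[1] ≈ 2 * cos(X.x2[1] + 0.1) + X.x1[1]^2 - 3.2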
Setting up the Search
We'll configure the symbolic regression search to use template expressions with parameters that vary by class.
Get number of categories from the data
n_categories = length(unique(X.class))
2
Create a template expression specification with two parameter vectors, each holding one value per class
expression_spec = @template_spec(
expressions = (f,), parameters = (p1=n_categories, p2=n_categories),
) do x1, x2, class
f(x1, x2, p1[class], p2[class])
end
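The search must now discover both the body of the inner function f and the parameter vectors p1 and p2, each indexed by the sample's class. For intuition, here is a hand-written stand-in with the true structure (true_f is a hypothetical name used only for illustration):
# if the search recovers the true structure, `f` will behave like this:
true_f(x1, x2, α, β) = 2 * cos(x2 + α) + x1^2 - β
# plugging in the true per-class parameters reproduces the targets exactly:
P1, P2 = [0.1, 1.5], [3.2, 0.5]
@test [true_f(x1, x2, P1[c], P2[c]) for (x1, x2, c) in zip(X.x1, X.x2, X.class)] ≈ y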
model = SRRegressor(;
niterations=100,
binary_operators=[+, *, /, -],
unary_operators=[cos, exp],
populations=30,
expression_spec=expression_spec,
);
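Template expressions are not the only way to obtain class-dependent constants: the package also provides ParametricExpressionSpec, which lets the search itself decide where per-class parameters appear, up to a fixed budget. A minimal sketch of that alternative (it would replace the expression_spec above):
expression_spec_alt = ParametricExpressionSpec(; max_parameters=2)  # at most 2 per-class parameters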
Now, let's set up the machine and fit it:
mach = machine(model, X, y)
untrained Machine; caches model-specific representations of data
model: SRRegressor(defaults = nothing, …)
args:
1: Source @780 ⏎ ScientificTypesBase.Table{Union{AbstractVector{ScientificTypesBase.Continuous}, AbstractVector{ScientificTypesBase.Count}}}
2: Source @942 ⏎ AbstractVector{ScientificTypesBase.Continuous}
At this point, you would run:
fit!(mach)
You can extract the best expression and parameters with:
report(mach).equations[end]
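Once fitting has finished, the machine can also make predictions, applying the learned expression together with the per-class parameters looked up from the class column. A minimal sketch, assuming fit!(mach) has been run (ŷ is just an illustrative name):
ŷ = predict(mach, X)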
This page was generated using Literate.jl.