Learning Parameterized Expressions
Note: Parametric expressions are currently considered experimental and may change in the future.
Parameterized expressions in SymbolicRegression.jl allow you to discover symbolic expressions that contain optimizable parameters. This is particularly useful when you have data that follows different patterns based on some categorical variable, or when you want to learn an expression with constants that should be optimized during the search.
In this tutorial, we'll generate synthetic data with class-dependent parameters and use symbolic regression to discover the parameterized expressions.
The Problem
Let's create a synthetic dataset where the underlying function changes based on a class label:
\[
y = 2\cos(x_2 + 0.1) + x_1^2 - 3.2 \quad \text{[class 1]}
\]
or
\[
y = 2\cos(x_2 + 1.5) + x_1^2 - 0.5 \quad \text{[class 2]}
\]
We will need to simultaneously learn the symbolic expression and per-class parameters!
using SymbolicRegression
using Random: MersenneTwister
using MLJBase: machine, fit!, predict, report
using Test
Now, we generate synthetic data with these two classes.
X = let rng = MersenneTwister(0), n = 30
(; x1=randn(rng, n), x2=randn(rng, n), class=rand(rng, 1:2, n))
end
(x1 = [-0.7587307822993239, 0.03249717326229417, 0.04868971510118324, 0.426553609186312, -0.6455341387712752, 0.16047914065004126, -1.174243139542269, -0.8590577126607076, 0.8166486156723918, -1.3610991864137623 … -1.4738536651017287, -2.3922557197621166, -1.004193438685409, -0.15671091559738035, -0.40851221296291546, -0.02542189385385727, -1.507668525428601, -0.14180202282588494, -1.1251781589718115, -1.6634915201296359],
x2 = [0.9225430229784556, -0.3431658216750271, 2.1494079717206267, -0.3614340720711092, -0.17101357024903552, 1.4044957064877326, -1.14561382828759, 1.199962308278444, -0.974684107278398, 0.517299511539031 … 0.15367872415413888, 0.1617663833102913, 1.6815067955745, 0.08878524390300123, 2.997069453443845, 1.3160268541638762, 0.10493050742285542, 0.17279914285302175, 0.39613226871496143, 1.2560661845396246],
class = [1, 1, 2, 2, 2, 2, 1, 2, 2, 1 … 1, 2, 2, 1, 1, 2, 1, 1, 1, 2],)
Now, we generate target values using the true model that has class-dependent parameters:
y = let P1 = [0.1, 1.5], P2 = [3.2, 0.5]
[2 * cos(x2 + P1[class]) + x1^2 - P2[class] for (x1, x2, class) in zip(X.x1, X.x2, X.class)]
end
30-element Vector{Float64}:
-1.5819329379648106
-1.257782764921467
-2.2452471837347545
0.5197422215634926
0.3956348400637564
-2.4182943336237583
-0.8184112159299595
-1.5701319117010097
1.8972460993013578
0.28348012694263547
⋮
5.041198154342794
-1.4900026104671513
-1.210975832899709
-5.031135783951781
-2.394293547335875
1.0312146396920676
-1.2538511817635862
-0.1751135457735593
0.4140028695829825
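As a quick sanity check, we can recompute the first target value by hand. The first sample belongs to class 1, so the true model applies the phase 0.1 and the offset 3.2:
# class[1] == 1, so the true model uses P1[1] = 0.1 and P2[1] = 3.2
@test y[1] ≈ 2 * cos(X.x2[1] + 0.1) + X.x1[1]^2 - 3.2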
Setting up the Search
We'll configure the symbolic regression search to use template expressions with parameters that vary by class.
Get number of categories from the data
n_categories = length(unique(X.class))
2
Create a template expression specification with two parameter vectors, each holding one value per class
expression_spec = @template_spec(
expressions = (f,), parameters = (p1=n_categories, p2=n_categories),
) do x1, x2, class
f(x1, x2, p1[class], p2[class])
end
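The search must now discover both the body of the inner function f and the parameter vectors p1 and p2, each indexed by the sample's class. For intuition, here is a hand-written stand-in with the true structure (true_f is a hypothetical name used only for illustration):
# if the search recovers the true structure, `f` will behave like this:
true_f(x1, x2, α, β) = 2 * cos(x2 + α) + x1^2 - β
# plugging in the true per-class parameters reproduces the targets exactly:
P1, P2 = [0.1, 1.5], [3.2, 0.5]
@test [true_f(x1, x2, P1[c], P2[c]) for (x1, x2, c) in zip(X.x1, X.x2, X.class)] ≈ y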
model = SRRegressor(;
niterations=100,
binary_operators=[+, *, /, -],
unary_operators=[cos, exp],
populations=30,
expression_spec=expression_spec,
);
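Template expressions are not the only way to obtain class-dependent constants: the package also provides ParametricExpressionSpec, which lets the search itself decide where per-class parameters appear, up to a fixed budget. A minimal sketch of that alternative (it would replace the expression_spec above):
expression_spec_alt = ParametricExpressionSpec(; max_parameters=2)  # at most 2 per-class parameters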
Now, let's set up the machine and fit it:
mach = machine(model, X, y)
untrained Machine; caches model-specific representations of data
model: SRRegressor(defaults = nothing, …)
args:
1: Source @780 ⏎ ScientificTypesBase.Table{Union{AbstractVector{ScientificTypesBase.Continuous}, AbstractVector{ScientificTypesBase.Count}}}
2: Source @942 ⏎ AbstractVector{ScientificTypesBase.Continuous}
At this point, you would run:
fit!(mach)
You can extract the best expression and parameters with:
report(mach).equations[end]
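Once fitting has finished, the machine can also make predictions, applying the learned expression together with the per-class parameters looked up from the class column. A minimal sketch, assuming fit!(mach) has been run (ŷ is just an illustrative name):
ŷ = predict(mach, X)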
This page was generated using Literate.jl.