# Tuning and Workflow Tips
I give a short guide below on how I like to tune PySR for my applications.
First, my general tips would be to avoid using redundant operators, like how `pow` can do the same things as `square`, or how `-` (binary) and `neg` (unary) are equivalent. The fewer operators the better! Only use operators you need.
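For example, a deliberately lean operator set might look like the sketch below (the particular operators are just an illustration):

```python
from pysr import PySRRegressor

# A small, non-redundant operator set: no `pow` (squares can be built from `*`),
# and no unary `neg` (binary `-` already covers it).
model = PySRRegressor(
    binary_operators=["+", "-", "*", "/"],
    unary_operators=["cos", "exp"],
)
```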
When running PySR, I usually do the following:
I run from IPython (Jupyter Notebooks don't work as well[^1]) on the head node of a slurm cluster. Passing `cluster_manager="slurm"` will make PySR set up a run over the entire allocation. I set `procs` equal to the total number of cores over my entire allocation. I use the tensorboard feature for experiment tracking.
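As a rough sketch of that setup (the core count and log directory are placeholders, and the TensorBoard logger assumes a recent PySR version that provides `TensorBoardLoggerSpec`):

```python
from pysr import PySRRegressor, TensorBoardLoggerSpec

# Hypothetical slurm allocation: 4 nodes x 32 cores = 128 cores in total.
model = PySRRegressor(
    cluster_manager="slurm",  # spread workers over the whole slurm allocation
    procs=128,                # total number of cores across the allocation
    logger_spec=TensorBoardLoggerSpec(log_dir="logs/run"),  # experiment tracking
)
```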
- I start by using the default parameters.
- I use only the operators I think it needs and no more.
- Increase `populations` to `3*num_cores`.
- If my dataset is more than 1000 points, I either subsample it (low-dimensional and not much noise) or set `batching=True` (high-dimensional or very noisy, so it needs to evaluate on all the data).
- While you might leave the default `ncycles_per_iteration` on a laptop or single-node machine, on a cluster with ~100 cores I like to set `ncycles_per_iteration` to maybe `5000` or so, until the head node occupation is under `10%`. (A larger value means the workers talk less frequently to each other, which is useful when you have many workers!)
- Set `constraints` and `nested_constraints` as strict as possible. These can help quite a bit with exploration. Typically, if I am using `pow`, I would set `constraints={"pow": (9, 1)}`, so that power laws can only have a variable or constant as their exponent. If I am using `sin` and `cos`, I also like to set `nested_constraints={"sin": {"sin": 0, "cos": 0}, "cos": {"sin": 0, "cos": 0}}`, so that sin and cos can't be nested, which seems to happen frequently. (Although in practice I would just use `sin`, since the search could always add a phase offset!)
- Set `maxsize` a bit larger than the final size you want. e.g., if you want a final equation of size `30`, you might set this to `35`, so that it has a bit of room to explore.
- I typically don't use `maxdepth`, but if I do, I set it strictly, while also leaving a bit of room for exploration. e.g., if you want a final equation limited to a depth of `5`, you might set this to `6` or `7`, so that it has a bit of room to explore.
- Set `parsimony` equal to about the minimum loss you would expect, divided by 5-10. e.g., if you expect the final equation to have a loss of `0.001`, you might set `parsimony=0.0001`.
- Set `weight_optimize` to some larger value, maybe `0.001`. This is very important if `ncycles_per_iteration` is large, so that optimization happens more frequently.
- Set `turbo` to `True`. This turns on advanced loop vectorization, but is still quite experimental. It should give you a nice 20% or more speedup.
- For final runs, after I have tuned everything, I typically set `niterations` to some very large value, and just let it run for a week until my job finishes (genetic algorithms tend not to converge; they can look like they settle down, but then find a new family of expressions and explore a new space). If I am satisfied with the current equations (which are visible either in the terminal or in the saved csv file), I quit the job early. (A combined sketch of several of these settings is shown just after this list.)
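Putting several of the settings above together, a cluster-scale configuration might look roughly like the sketch below. The numbers assume a ~100-core allocation and are only illustrative; I also spell the power operator `pow` to match the constraint examples above, though depending on your PySR version you may write it as `^`.

```python
from pysr import PySRRegressor

model = PySRRegressor(
    binary_operators=["+", "-", "*", "/", "pow"],
    unary_operators=["sin"],
    populations=300,                   # ~3 * num_cores on a ~100-core allocation
    ncycles_per_iteration=5000,        # larger => workers communicate less often
    constraints={"pow": (9, 1)},       # exponents must stay very simple
    nested_constraints={"sin": {"sin": 0}},  # no sin(sin(...))
    maxsize=35,                        # a bit above the target size of ~30
    parsimony=0.0001,                  # ~ expected final loss (0.001) / 10
    weight_optimize=0.001,             # optimize constants more frequently
    turbo=True,                        # experimental loop vectorization
    batching=True,                     # for large, noisy datasets
    niterations=1_000_000,             # effectively "run until I stop it"
)
```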
Since I am running in IPython, I can just hit `q` and then `<enter>` to stop the job, tweak the hyperparameters, and then start the search again. I can also use `warm_start=True` if I wish to continue where I left off (though note that changing some parameters, like `maxsize`, is incompatible with warm starts).
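In practice that loop looks something like the sketch below (the data here is synthetic, just to make the example self-contained):

```python
import numpy as np
from pysr import PySRRegressor

X = np.random.randn(500, 3)
y = 2.5 * np.cos(X[:, 0]) + X[:, 1] ** 2

model = PySRRegressor(niterations=10_000, warm_start=True)
model.fit(X, y)  # from IPython, hit "q" then <enter> to stop early

model.set_params(parsimony=0.0001)  # tweak a warm-start-compatible parameter
model.fit(X, y)  # resumes from the previous population rather than restarting
```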
Some things I try out to see if they help:
- Play around with `complexity_of_operators`. Set operators you dislike (e.g., `pow`) to have a larger complexity.
- Try setting `adaptive_parsimony_scaling` a bit larger, maybe up to `1000`.
- Sometimes I try using `warmup_maxsize_by`. This is useful if you find that the search finds a very complex equation very quickly and then gets stuck. It basically forces it to start at the simpler equations and build up complexity slowly.
- Play around with different losses:
    - I typically try `L2DistLoss()` and `L1DistLoss()`. L1 loss is more robust to outliers compared to L2 (L1 finds the median, while L2 finds the mean of a random variable), so it is often a good choice for a noisy dataset.
    - I might also provide the `weights` parameter to `fit` if there is some reasonable choice of weighting. For example, maybe I know the signal-to-noise ratio of a particular row of `y` - I would set that SNR equal to the weights. Or, perhaps I do some sort of importance sampling, and weight the rows by importance. (A sketch combining these ideas follows this list.)
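A sketch combining a few of these knobs is below. The per-row SNR array is made up for illustration; the loss keyword is `elementwise_loss` in recent PySR versions (older versions call it `loss`), and I again spell the power operator `pow` to match the examples above.

```python
import numpy as np
from pysr import PySRRegressor

X = np.random.randn(1000, 3)
y = X[:, 0] ** 2 + np.sin(X[:, 1])
snr = np.random.uniform(1.0, 10.0, size=len(y))  # hypothetical per-row signal-to-noise

model = PySRRegressor(
    binary_operators=["+", "-", "*", "/", "pow"],
    unary_operators=["sin"],
    complexity_of_operators={"pow": 3},  # make pow "cost" more than default operators
    adaptive_parsimony_scaling=1000.0,
    warmup_maxsize_by=0.5,               # ramp maxsize up over the first half of the run
    elementwise_loss="L1DistLoss()",     # robust to outliers; or "L2DistLoss()"
)
model.fit(X, y, weights=snr)             # weight each row by its signal-to-noise
```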
Very rarely I might also try tuning the mutation weights, the crossover probability, or the optimization parameters. I never use `denoise` or `select_k_features` as I find they aren't very useful.
For large datasets I usually just randomly sample ~1000 points or so. In case all the points matter, I might use `batching=True`.
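For the subsampling route, a minimal sketch (the large dataset here is synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50_000, 5))                    # stand-in for a large dataset
y = X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=len(X))

idx = rng.choice(len(X), size=1000, replace=False)  # random ~1000-point subsample
X_small, y_small = X[idx], y[idx]
```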
If I find the equations get very complex and I'm not sure if they are numerically precise, I might set `precision=64`.
Once a run is finished, I use the `PySRRegressor.from_file` function to load the saved search in a different process (requires the pickle file, and possibly also the `.csv` file if you quit early). I can then explore the equations, convert them to LaTeX, and plot their output.
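As a sketch (the exact argument depends on your PySR version: newer versions take a `run_directory`, older ones the path to the saved equation file; the directory name below is a placeholder):

```python
from pysr import PySRRegressor

# Load a finished (or early-stopped) search in a fresh process.
model = PySRRegressor.from_file(run_directory="outputs/20240101_000000_xxxxxx")

print(model)          # table of discovered equations
print(model.latex())  # LaTeX for the selected equation
expr = model.sympy()  # SymPy expression, e.g. for plotting its output
```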
## More Tips
You might also wish to explore the discussions page for more tips, and to see if anyone else has had similar questions. Be sure to also read through the reference.
[^1]: Jupyter Notebooks are supported by PySR, but miss out on some useful features available in IPython and Python: the progress bar, and early stopping with "q". In Jupyter you cannot interrupt a search once it has started; you have to restart the kernel. See this issue for updates.