Selecting an Algorithm#
This guide will help you select a fitting algorithm and its corresponding hyper-parameters for fitting your SED(s). Lightning currently has three fitting algorithms to select from: the adaptive MCMC algorithm (Algorithm 4) from Andrieu & Thoms (2008), the affine-invariant MCMC algorithm from Goodman & Weare (2010), and a Levenberg–Marquardt algorithm implemented via Craig Markwardt’s MPFIT. Below, we first compare the MPFIT and MCMC algorithms and give advice on how to select between them. Then, we explain how the algorithms are initialized. Finally, we describe each algorithm and its corresponding hyper-parameters in more detail.
MCMC vs MPFIT#
Like all other settings in Lightning, the choice of algorithm will depend on your research goals. In practical terms, the main differences between the MCMC algorithms and the MPFIT algorithm are their speed and the form of the uncertainties on the free parameters. Since the MPFIT algorithm only searches for the best-fit solution rather than probing the shape of the posterior distribution like the MCMC algorithms, it requires significantly fewer loss function evaluations and hence is significantly faster than the MCMC algorithms. However, this speed comes at a cost in the uncertainty estimates. The MPFIT algorithm simply outputs a \(1\sigma\) uncertainty estimate on each free parameter, whereas the MCMC algorithms give a full Bayesian posterior distribution for each free parameter. Therefore, if you are looking for a quick best-fit solution to your SEDs with uncertainties you will not heavily rely on for other analyses, we recommend using the MPFIT algorithm. However, if you are looking for a full Bayesian posterior distribution with well-estimated uncertainties, we recommend using the MCMC algorithms.
Random Initialization#
All three algorithms in Lightning require an initial starting value for each free parameter. Currently, Lightning automatically selects the starting values by randomly sampling the prior distribution of each parameter independently. Since uniform priors can have a much larger range than we would want to randomly sample, we allow the user to input an initialization range. This limits the random sampling of the prior to only occur within the initialization range.
To give an example, say we have a uniform prior for a parameter with a range of 0 to 100, but we typically expect this parameter’s solution to be less than 10. If we were to randomly initialize using the full prior and happened to select a value near the maximum limit, the algorithm may not reach the best solution within its allotted iteration limit. Therefore, by setting the initialization range to be between 0 and 10, we allow for a more appropriate initial value and increase our chances of reaching the best solution.
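As a schematic illustration of this restricted sampling, consider the sketch below. It is plain Python with a hypothetical helper name (Lightning handles initialization internally, and its actual routines differ); the point is only that the draw is uniform over the initialization range rather than the full prior range.

```python
import numpy as np

def initialize_chains(prior_range, init_range=None, n_chains=10, seed=0):
    """Draw starting values uniformly from the prior, optionally restricted
    to a narrower initialization range (hypothetical helper, not Lightning's API)."""
    lo, hi = init_range if init_range is not None else prior_range
    # The initialization range must lie inside the prior range
    assert prior_range[0] <= lo < hi <= prior_range[1]
    rng = np.random.default_rng(seed)
    return rng.uniform(lo, hi, size=n_chains)

# Uniform prior of 0 to 100, but initialize only between 0 and 10
starts = initialize_chains(prior_range=(0.0, 100.0), init_range=(0.0, 10.0))
```

Every chain then begins near the expected solution while the full prior range is still available to the sampler afterward.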
Adaptive MCMC#
Description#
The adaptive MCMC algorithm is an adaptive version of the standard Metropolis–Hastings algorithm created by Andrieu & Thoms (2008). The algorithm simply adjusts the proposal density distribution to achieve an optimal acceptance ratio (~25%). Additionally, this adjustment of the proposal density is vanishing, meaning the adaptiveness decreases with each subsequent iteration. Therefore, after many iterations the adaptiveness is insignificant and the algorithm is practically equivalent to the standard Metropolis–Hastings algorithm.
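The adaptation rule can be sketched in a few lines. Below is a minimal 1-D illustration (in Python; Lightning's own implementation is more general and differs in detail), in which the log of the proposal width is nudged toward a 25% acceptance rate with a vanishing step size proportional to t raised to the power of a negative exponent, playing the role of BETA_EXPONENT:

```python
import numpy as np

def adaptive_metropolis(log_prob, x0, n_trials, beta_exponent=0.35,
                        target_accept=0.25, seed=0):
    """Minimal 1-D adaptive Metropolis sketch with vanishing adaptation."""
    rng = np.random.default_rng(seed)
    x, lp = x0, log_prob(x0)
    log_width = 0.0                       # log of the proposal width, adapted below
    chain = np.empty(n_trials)
    n_accept = 0
    for t in range(1, n_trials + 1):
        proposal = x + np.exp(log_width) * rng.normal()
        lp_prop = log_prob(proposal)
        accepted = np.log(rng.uniform()) < lp_prop - lp
        if accepted:
            x, lp = proposal, lp_prop
            n_accept += 1
        # Vanishing adaptation: the step size t**-beta_exponent shrinks with each
        # iteration, so the sampler tends toward standard Metropolis-Hastings.
        log_width += t ** -beta_exponent * ((1.0 if accepted else 0.0) - target_accept)
        chain[t - 1] = x
    return chain, n_accept / n_trials

# Sample a standard normal target
chain, accept_rate = adaptive_metropolis(lambda x: -0.5 * x * x, 0.0, 20000)
```

A larger exponent makes the t**-beta_exponent factor decay faster, freezing the proposal width in fewer trials; this is exactly the trade-off controlled by BETA_EXPONENT below.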
Hyper-parameters#
The adaptive MCMC algorithm has three hyper-parameters: NTRIALS, NPARALLEL, and BETA_EXPONENT.
The value of NTRIALS gives the number of MCMC trials (iterations) to run for each parallel chain, the number of which is given by NPARALLEL. Setting NPARALLEL = 1 will run only a single MCMC chain, while NPARALLEL > 1 will run that many independent parallel MCMC chains. Running a single MCMC chain presents the issue of not being able to determine whether we have reached the best solution or are stuck in a local minimum in \(\chi^2\) space, since we have no way of comparing the chain’s solution to the best solution. By running multiple parallel chains from different starting locations, we can compare the ending segment of each parallel chain. If all or most of these segments have reached the same solution, then we can be confident that this is likely the best solution (see Convergence Metrics for more details on what qualifies as the same solution). Therefore, we recommend setting NPARALLEL to a value between 5 and 20 to determine if the chains have reached the best solution. A larger value will allow you to be more confident that you have reached the best solution, but will require more computational time. Additionally, we recommend setting NTRIALS to an integer multiple of 5e4 for each model component you are using. If your parallel chains are not converging (see the Convergence Metrics description for more details), we recommend increasing this value by a few extra integer multiples to give more iterations for convergence to occur.
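To make the idea of comparing ending segments concrete, one widely used diagnostic of this kind is the Gelman–Rubin statistic, sketched below. This is an illustration only; it is not necessarily the exact metric Lightning computes (see Convergence Metrics for Lightning's actual criteria).

```python
import numpy as np

def gelman_rubin(chains):
    """Gelman-Rubin statistic for the ending segments of parallel chains.
    chains: array of shape (n_chains, n_samples) for a single parameter."""
    n = chains.shape[1]
    B = n * chains.mean(axis=1).var(ddof=1)   # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()     # mean within-chain variance
    var_hat = (n - 1) / n * W + B / n         # pooled variance estimate
    return np.sqrt(var_hat / W)

# Five chains sampling the same distribution vs. one chain stuck elsewhere
rng = np.random.default_rng(1)
segments = rng.normal(size=(5, 2000))
stuck = segments.copy()
stuck[0] += 3.0                               # one chain trapped in a local minimum
rhat_good, rhat_bad = gelman_rubin(segments), gelman_rubin(stuck)
```

Values near 1 indicate the segments agree (the chains have likely found the same solution), while values well above 1 flag a chain that ended in a different region of parameter space.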
BETA_EXPONENT is the parameter unique to the adaptive MCMC algorithm. It determines how fast the adaptiveness decreases with each subsequent iteration. Values must be positive, and larger values stop the adaptiveness in fewer trials. We recommend using the default value for BETA_EXPONENT in most cases. However, if your parallel chains are not converging even after increasing the number of trials, we recommend decreasing this value slightly (e.g., from 0.35 to 0.3).
Post-Processed Chain#
During post-processing, only a portion of the raw MCMC chain from fitting is kept for each parameter. For the adaptive MCMC, this post-processed chain portion is determined as follows:
1. the parallel chains have BURN_IN initial trials discarded as the burn-in phase,
2. these truncated chains are then thinned by keeping only every THIN_FACTOR elements,
3. the thinned and truncated chain with the highest probability is selected,
4. the final FINAL_CHAIN_LENGTH elements from this highest probability chain are kept as the post-processed chain.
Here, we explain our reason for selecting the highest probability chain in step 3. Assuming convergence of the parallel chains has been reached, all parallel chains will have very similar distributions, and it does not matter which one we select. Selecting the one with the highest probability additionally guarantees that the kept chain is at the best solution even when convergence is not reached.
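The post-processing steps above can be sketched as follows. The helper below is illustrative Python (array shapes and names are assumptions, not Lightning's internal routines), with BURN_IN, THIN_FACTOR, and FINAL_CHAIN_LENGTH playing the roles of the post-processing settings named above:

```python
import numpy as np

def postprocess_adaptive(chains, log_probs, burn_in, thin_factor, final_length):
    """Sketch of the adaptive-MCMC post-processing steps (illustrative only).
    chains: (NPARALLEL, NTRIALS) samples of one parameter
    log_probs: (NPARALLEL, NTRIALS) log-probability of each sample."""
    # Steps 1-2: discard the burn-in trials, then keep every thin_factor-th element
    kept = chains[:, burn_in::thin_factor]
    lp = log_probs[:, burn_in::thin_factor]
    # Step 3: select the chain containing the highest-probability sample
    best = np.unravel_index(np.argmax(lp), lp.shape)[0]
    # Step 4: keep the final final_length elements of that chain
    return kept[best, -final_length:]

# Toy example: 3 parallel chains of 20 trials, peak log-probability in chain 2
chains = np.arange(60, dtype=float).reshape(3, 20)
log_probs = np.zeros((3, 20))
log_probs[2, 16] = 10.0
final = postprocess_adaptive(chains, log_probs, burn_in=4, thin_factor=2, final_length=3)
```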
Affine-Invariant MCMC#
Description#
The affine-invariant MCMC uses an ensemble of samplers to adjust the proposal density distribution and sample the posterior distribution. This ensemble consists of multiple chains (or walkers) that are run in parallel and allowed to interact with one another so that they can adapt their proposal densities. Partial re-sampling, a generalized version of Gibbs sampling, is implemented to achieve this interaction. In our implementation, we use the affine-invariant stretch move method as presented in Goodman & Weare (2010).
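For concreteness, below is a minimal sketch of the stretch move of Goodman & Weare (2010) in Python. It illustrates the method itself, not Lightning's implementation: each walker is stretched toward a randomly chosen complementary walker by a factor z drawn from the distribution g(z) proportional to 1/sqrt(z) on [1/a, a].

```python
import numpy as np

def run_stretch(log_prob, walkers, n_trials, affine_a=2.0, seed=0):
    """Minimal stretch-move ensemble sampler (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    walkers = np.array(walkers, float)
    n, ndim = walkers.shape
    lp = np.array([log_prob(w) for w in walkers])
    samples = []
    for _ in range(n_trials):
        for k in range(n):
            # Choose a complementary walker j != k
            j = rng.integers(n - 1)
            if j >= k:
                j += 1
            # Draw the stretch factor z from g(z) ~ 1/sqrt(z) on [1/a, a]
            z = ((affine_a - 1.0) * rng.uniform() + 1.0) ** 2 / affine_a
            prop = walkers[j] + z * (walkers[k] - walkers[j])
            lp_prop = log_prob(prop)
            # The acceptance probability includes the z**(ndim - 1) volume factor
            if np.log(rng.uniform()) < (ndim - 1) * np.log(z) + lp_prop - lp[k]:
                walkers[k], lp[k] = prop, lp_prop
        samples.append(walkers.copy())
    return np.asarray(samples)      # shape (n_trials, n_walkers, ndim)

# Sample a 2-D standard normal with 20 walkers
start = np.random.default_rng(42).normal(size=(20, 2))
trace = run_stretch(lambda th: -0.5 * th @ th, start, n_trials=400)
flat = trace[200:].reshape(-1, 2)   # discard burn-in and flatten the ensemble
```

The constant affine_a here is the move scaling constant discussed as AFFINE_A below: larger values permit larger stretches at the cost of a lower acceptance rate.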
Hyper-parameters#
The affine-invariant MCMC algorithm has three hyper-parameters: NTRIALS, NPARALLEL, and AFFINE_A.
The value of NPARALLEL gives the number of walkers to include in the ensemble, and NTRIALS gives the number of MCMC trials (iterations) to run for each walker. Unlike the adaptive MCMC, the affine-invariant MCMC must have NPARALLEL > 1. Specifically, NPARALLEL must be greater than the number of free parameters plus one, and ideally, it should be at least twice the number of free parameters for optimal sampling. Therefore, we recommend setting NPARALLEL to a value between 50 and 100, which is 3 to 5 times the maximum number of free parameters that is expected from Lightning’s most complex models. Additionally, we recommend setting NTRIALS to an integer multiple of 1e4 for each model component you are using. If your ensemble is not converging (see the Convergence Metrics description for more details), we recommend increasing this value by a few extra integer multiples to give more iterations for convergence to occur.
AFFINE_A is the parameter unique to the affine-invariant MCMC algorithm. It specifies the move scaling constant, which defines the maximum and minimum step size of the stretch move. Values must be greater than or equal to 1, and larger values allow for larger stretch moves in parameter space. We recommend using the default value for AFFINE_A in most cases. However, if your ensemble is not converging even after increasing the number of trials or has a low overall acceptance rate (< 20%), we recommend decreasing this value slightly (e.g., from 2 to 1.8).
Post-Processed Chain#
During post-processing, only a portion of the raw MCMC ensemble from fitting is kept for each parameter. For the affine-invariant MCMC, the post-processed chain portion is determined as follows:
1. each walker in the ensemble has BURN_IN initial trials discarded as the burn-in phase,
2. if a walker’s acceptance fraction is more than AFFINE_STRANDED_DEVIATION standard deviations below the median acceptance fraction, we consider it a stranded walker and remove it from the ensemble,
3. the non-stranded truncated ensemble is then thinned by keeping only every THIN_FACTOR elements,
4. the thinned and truncated ensemble is flattened element-wise into a single chain,
5. the final FINAL_CHAIN_LENGTH elements from this flattened chain are kept as the post-processed chain.
Here, we explain our reason for removing stranded walkers in step 2. Due to the boundaries of the free parameters, the affine-invariant MCMC can have trouble accepting moves of walkers separated from the ensemble when the ensemble is near a boundary. This results in the walkers becoming stranded with very low acceptance rates, since they are failing to have any proposal jumps accepted. With enough iterations, these walkers will eventually get lucky and have a jump that rejoins them with the ensemble. However, we do not have an infinite number of iterations to allow for this to occur. Therefore, once our iteration limit has been reached, we want to remove any stranded walkers that may remain. We have found that the most effective method for correctly identifying stranded walkers is to compare each walker’s acceptance fraction with the median of the ensemble. Those that have abnormally low acceptance fractions compared to the rest of the ensemble are usually stranded.
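The five steps, including the stranded-walker cut, can be sketched as follows. This is illustrative Python (the helper name and the exact flattening order are assumptions, not Lightning's internals), with the hyper-parameters above appearing as arguments:

```python
import numpy as np

def postprocess_affine(chains, accept_frac, burn_in, thin_factor,
                       final_length, stranded_deviation=2.0):
    """Sketch of the affine-invariant post-processing steps (illustrative only).
    chains: (NPARALLEL, NTRIALS) samples of one parameter
    accept_frac: (NPARALLEL,) acceptance fraction of each walker."""
    # Step 2: flag walkers whose acceptance fraction falls too far below the median
    cut = np.median(accept_frac) - stranded_deviation * np.std(accept_frac)
    keep = accept_frac >= cut
    # Steps 1 and 3: discard the burn-in, drop stranded walkers, then thin
    kept = chains[keep, burn_in::thin_factor]
    # Step 4: flatten the ensemble into a single chain (ordering is illustrative)
    flat = kept.T.reshape(-1)
    # Step 5: keep the final final_length elements
    return flat[-final_length:]

# Toy example: 4 walkers of 10 trials; the last walker is clearly stranded
chains = np.arange(40, dtype=float).reshape(4, 10)
accept_frac = np.array([0.30, 0.31, 0.29, 0.0])
flat_final = postprocess_affine(chains, accept_frac, burn_in=2,
                                thin_factor=2, final_length=5)
```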
Note
We find that only a few walkers within the ensemble become stranded when using a standard number of iterations. Therefore, having AFFINE_STRANDED_DEVIATION = 2 effectively removes these walkers without removing non-stranded ones. However, when using a smaller number of iterations for quick sampling, more walkers may end up remaining stranded. Therefore, we recommend setting AFFINE_STRANDED_DEVIATION = 1 to account for the increase in the ensemble’s standard deviation and better classify stranded walkers.
Adaptive vs Affine-Invariant MCMC#
The main differences between the affine-invariant MCMC and the adaptive MCMC algorithms are their speed and consistency in reaching the best solution. From some general tests, we find that both algorithms result in very similar posterior distributions if the best solution is reached, as should be expected. However, the adaptive MCMC algorithm is less effective at searching parameter space for the best solution. It can spend a significant portion, if not all, of its trials stuck in local minima if it does not start near the best solution, especially with more complex models. In comparison, we find the affine-invariant MCMC algorithm regularly reaches the best solution rapidly without getting stuck in local minima. Therefore, we recommend using the affine-invariant MCMC over the adaptive MCMC algorithm, as it more consistently reaches the best solution and requires fewer cost function evaluations to get the needed posterior distribution.
MPFIT#
Description#
The MPFIT algorithm is Craig Markwardt’s implementation of the Levenberg–Marquardt algorithm, a gradient-based method used to solve non-linear least squares problems. The MPFIT implementation allows for several constraints necessary in Lightning, such as fixing parameters and setting parameter bounds. Additionally, the algorithm calculates the parameter covariance matrix to give estimated parameter uncertainties.
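To illustrate the method (though not MPFIT itself, which is written in IDL and far more featured), here is a compact Levenberg–Marquardt sketch in Python. It shows the damped step that interpolates between gradient descent and Gauss–Newton, an FTOL-style relative-decrease termination test, and \(1\sigma\) uncertainties taken from the parameter covariance matrix, as described above.

```python
import numpy as np

def levenberg_marquardt(resid, jac, p0, maxiter=200, ftol=1e-10):
    """Tiny Levenberg-Marquardt sketch for illustration (not MPFIT itself)."""
    p = np.asarray(p0, float)
    lam = 1e-3                                   # damping parameter
    cost = 0.5 * np.sum(resid(p) ** 2)
    for _ in range(maxiter):
        r, J = resid(p), jac(p)
        H, g = J.T @ J, J.T @ r
        # Damped normal equations: large lam -> gradient descent,
        # small lam -> Gauss-Newton
        step = np.linalg.solve(H + lam * np.diag(np.diag(H)), -g)
        new_cost = 0.5 * np.sum(resid(p + step) ** 2)
        if new_cost < cost:
            p, lam = p + step, lam / 3.0         # accept the step
            if cost - new_cost <= ftol * cost:   # FTOL-style termination
                cost = new_cost
                break
            cost = new_cost
        else:
            lam *= 3.0                           # reject the step; increase damping
    # 1-sigma uncertainties from the parameter covariance matrix
    J = jac(p)
    cov = np.linalg.inv(J.T @ J)
    return p, np.sqrt(np.diag(cov))

# Fit y = a * exp(-b * x) to noiseless synthetic data
x = np.linspace(0.0, 4.0, 30)
y = 3.0 * np.exp(-0.7 * x)
model_resid = lambda p: p[0] * np.exp(-p[1] * x) - y
model_jac = lambda p: np.stack([np.exp(-p[1] * x),
                                -p[0] * x * np.exp(-p[1] * x)], axis=1)
params, sigma = levenberg_marquardt(model_resid, model_jac, [1.0, 0.1])
```

Like MPFIT, this sketch terminates either on a tolerance or on the iteration cap, which is why the tolerance and MAXITER hyper-parameters below control when fitting stops.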
Hyper-parameters#
The MPFIT algorithm has five hyper-parameters: NSOLVERS, FTOL, GTOL, XTOL, and MAXITER. The value of NSOLVERS gives the number of “solvers” to run in parallel, where each solver is a fit to the SED using a different starting location in parameter space. Numerous solvers are necessary since, like the adaptive MCMC algorithm, running a single solver presents the issue of not being able to determine whether we have reached the best solution or are stuck in a local minimum. By running multiple solvers from different starting locations, we can compare each solver’s solution. If the majority of the solvers have reached the same solution, then we can be confident that this is likely the best solution. Therefore, we recommend setting NSOLVERS to an integer multiple of 50 for each model component you are using to determine if the solvers have reached the best solution. A larger value will allow you to be more confident that you have reached the best solution, but will require more computational time.
FTOL, GTOL, and XTOL give the tolerances indicating when the MPFIT algorithm should terminate. Smaller values of the tolerances mean MPFIT will continue to run until smaller differences are produced in the relative error. We recommend using the default values for each of the tolerances. However, if you find that the majority of your solvers are not finding the same best solution, then we recommend decreasing FTOL or XTOL slightly (e.g., from 1d-10 to 1d-12) if the MPFIT STATUS is 1 or 2, respectively. This may allow for some solvers to escape local minima and reach the best solution.
Finally, MAXITER gives the maximum number of MPFIT iterations to perform per solver. If the MPFIT algorithm has not already terminated from reaching one of the tolerances, then it will terminate after performing this maximum number of iterations. This limit prevents MPFIT from potentially running indefinitely if the tolerances cannot be reached. We recommend using the default value for MAXITER. However, if a sizable portion of your solvers is reaching the maximum iterations, then you can increase this value to allow more iterations for the tolerances to be reached.
Post-Processed Fits#
The post-processing for the MPFIT algorithm is simple. The solver with the best-fit solution (i.e., lowest \(\chi^2\)) is kept, and its parameter and uncertainty estimations are used as the best fit. The rest are discarded as they were only needed to test convergence to the best solution.
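This selection amounts to a simple minimum over the solvers, sketched below with hypothetical helper names (Lightning's actual interface differs). The toy "solver" deterministically falls into whichever of two \(\chi^2\) minima is nearest its starting point, mimicking how only some starting locations reach the global best fit.

```python
import numpy as np

def best_of_solvers(fit_from, draw_start, n_solvers=50, seed=0):
    """Run NSOLVERS fits from random starting points and keep the one
    with the lowest chi^2 (hypothetical helpers, not Lightning's API)."""
    rng = np.random.default_rng(seed)
    fits = [fit_from(draw_start(rng)) for _ in range(n_solvers)]
    # Each fit is (params, chi2); keep the lowest chi^2 and discard the rest
    return min(fits, key=lambda fit: fit[1])

# Toy solver: starts below 5.5 converge to the global minimum at p = 3
# (chi^2 = 0); starts above fall into a local minimum at p = 8 (chi^2 = 0.5).
toy_fit = lambda p0: (3.0, 0.0) if p0 < 5.5 else (8.0, 0.5)
params, chi2 = best_of_solvers(toy_fit, lambda rng: rng.uniform(0.0, 10.0))
```

With enough solvers, at least one start lands in the global minimum's basin, so the kept fit is the best-fit solution.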