Convergence Metrics
Note
We will reference the convergence metrics defined in the Convergence Metrics and Goodness of Fit Outputs, so it may be helpful to review them as you read this discussion.
Convergence for our algorithms is the concept that enough walkers, chains, or solvers have reached the same solution after their allowed iterations for us to be confident that it is the best solution and that the parameter space around the solution is well sampled. The metrics used to test for convergence are therefore essential to ensuring that the algorithm successfully fit each SED. Below, we give a detailed discussion on the metrics for each algorithm and how to determine if they indicate that convergence was or was not reached.
Before discussing the individual metrics for each algorithm, we note that the output CONVERGENCE_FLAG
is a quick and easy way to check if convergence was reached. If the value is 0, the metrics for the given
algorithm allow us to confidently say that convergence has been reached and that we can trust our solution
for that SED. If the value is 1, convergence may still have been reached, but it is more ambiguous.
Therefore, the offending metric(s) will need to be inspected to check whether they truly indicate a
lack of convergence.
Affine-Invariant MCMC
The affine-invariant MCMC algorithm has only two metrics to determine convergence: the acceptance
fraction (ACCEPTANCE_FRAC) and the autocorrelation time (AUTOCORR_TIME). The acceptance fraction gives,
for each individual walker in the ensemble, the fraction of the total iterations in which that walker's
proposed move was accepted. Depending on the number of free parameters, the acceptance fraction for each
walker is typically expected to be within 20-50%. Lower fractions can indicate that the algorithm is taking
steps that are too large in parameter space and failing to properly sample the posterior, while larger
fractions can indicate steps that are too small. If any walkers have the ACCEPTANCE_FLAG set, this does
not mean the ensemble as a whole failed to converge. As discussed for the Affine-Invariant MCMC, if only
a few walkers have abnormally low acceptance rates, we label them as stranded and exclude them from the
post-processed chain portion (i.e., they have no effect on convergence). Therefore, we recommend comparing
the STRANDED_FLAG with the ACCEPTANCE_FLAG, and if they are both set for the same walkers only, then you
can be confident that the acceptance fraction metric is not indicating failed convergence.
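The flag comparison described above can be sketched in a few lines of Python (used here purely for illustration, since Lightning itself runs in IDL). This sketch assumes the per-walker ACCEPTANCE_FLAG and STRANDED_FLAG outputs have already been loaded as equal-length sequences of 0s and 1s. The function name and the example values are hypothetical.

```python
def unexplained_acceptance_flags(acceptance_flag, stranded_flag):
    """Return indices of walkers whose acceptance flag is set
    but which were NOT labeled as stranded."""
    if len(acceptance_flag) != len(stranded_flag):
        raise ValueError("both flag arrays need one entry per walker")
    return [
        i
        for i, (acc, stranded) in enumerate(zip(acceptance_flag, stranded_flag))
        if acc == 1 and stranded != 1
    ]


# Hypothetical flags for an 8-walker ensemble (not real Lightning output).
acceptance_flag = [0, 0, 1, 0, 0, 1, 0, 0]
stranded_flag = [0, 0, 1, 0, 0, 1, 0, 0]

suspect = unexplained_acceptance_flags(acceptance_flag, stranded_flag)
if suspect:
    print("Inspect ACCEPTANCE_FRAC for walkers:", suspect)
else:
    print("All flagged walkers were stranded and excluded.")
```

An empty list means every flagged walker was already removed as stranded, so the acceptance fraction alone gives no reason to doubt convergence.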
Additionally, in some of our tests with more complex models, we have found that most walkers in the ensemble
can have acceptance fractions at or just below 20% when using the default AFFINE_A in the
Configuration Settings. This does not necessarily indicate failed convergence, especially if the
acceptance fraction is just below 20% (i.e., >18%) and consistent for the vast majority of the ensemble.
However, it does indicate that the algorithm is not sampling efficiently. Therefore, we recommend
rerunning Lightning using a slightly smaller value for AFFINE_A, which should improve sampling
and increase the acceptance fraction into the expected range.
As for the autocorrelation time, it is a measure of how many steps it takes for a walker to “forget” where it started. We recommend the emcee Autocorrelation Analysis & Convergence documentation for more details. To summarize, the MCMC algorithm needs to run for a number of iterations equal to some factor (e.g., they recommend ~50) times the autocorrelation time in order for us to trust that the autocorrelation time estimate is accurate. A factor smaller than ~50 can cause the autocorrelation time to be underestimated, which could result in a post-processed chain with a section of highly correlated samples.
You explicitly set the minimum value you are willing to tolerate for this factor
using the TOLERANCE value in the Configuration Settings, such that
if AUTOCORR_TIME times TOLERANCE is greater than NTRIALS (i.e., the factor NTRIALS divided by
AUTOCORR_TIME is below TOLERANCE), the AUTOCORR_FLAG will be set.
We recommend using a TOLERANCE value of 50, since if the factor is above that, you can be confident
that your autocorrelation time is accurate and your walkers converged. However, if a parameter has the
AUTOCORR_FLAG set, we recommend first checking the actual factor (which can be calculated by dividing
NTRIALS by AUTOCORR_TIME) before assuming convergence failed. If this calculated factor is >45,
the autocorrelation time estimate is likely still accurate and convergence was likely still reached.
However, below this we urge caution, as AUTOCORR_TIME can become underestimated and convergence may
have failed. If you are not getting calculated factors large enough to be confident of convergence,
we recommend simply increasing NTRIALS until you get a reasonable value. (Note that this will very
likely require more trials than just AUTOCORR_TIME times TOLERANCE, as AUTOCORR_TIME
is underestimated and will increase with more trials.)
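As a rough illustration of this check, the following Python sketch computes the factor NTRIALS / AUTOCORR_TIME for each parameter and sorts it into the three cases discussed above. It assumes the flag marks parameters whose factor falls below TOLERANCE. The function names, the labels, and the example values are hypothetical, and the 45 cutoff is simply the rule of thumb quoted in the text.

```python
def autocorr_factors(ntrials, autocorr_time):
    """Return NTRIALS / AUTOCORR_TIME for each parameter."""
    return [ntrials / tau for tau in autocorr_time]


def assess_factor(factor, tolerance=50.0, caution=45.0):
    """Classify one factor following the guidance in the text."""
    if factor >= tolerance:
        return "ok"                  # flag would not be set
    if factor > caution:
        return "likely ok"           # flagged, but estimate likely still accurate
    return "increase NTRIALS"        # estimate may be underestimated


# Hypothetical run: 20000 trials, three free parameters.
ntrials = 20000
autocorr_time = [310.0, 420.0, 520.0]

for tau, factor in zip(autocorr_time, autocorr_factors(ntrials, autocorr_time)):
    print(f"tau = {tau:6.1f}  factor = {factor:5.1f}  -> {assess_factor(factor)}")
```

Because the autocorrelation time of an under-run chain tends to grow as trials are added, the factor should be recomputed after each rerun rather than extrapolated from the first estimate.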
Adaptive MCMC
The adaptive MCMC algorithm has three metrics to determine convergence: the acceptance
fraction (ACCEPTANCE_FRAC), the Gelman-Rubin convergence metric (GELMAN_RUBIN_R_HAT),
and the Brooks-Gelman multidimensional convergence metric (BROOKS_GELMAN_R_HAT).
Exactly as in the affine-invariant MCMC algorithm, the acceptance fraction
gives the fraction of accepted trials for each parallel chain. However, unlike in the affine-invariant MCMC
algorithm, this metric should never be flagged as being outside the expected range of 20-50%.
This is due to the design of the adaptive algorithm, which adapts the proposal distribution to keep
the acceptance fraction between 20-50%. Therefore, if this metric is flagged for the adaptive MCMC algorithm,
you have either run for an extreme number of trials or have too large a value for BETA_EXPONENT.
The Gelman-Rubin metric is the best indicator of convergence for the adaptive MCMC algorithm.
The metric compares the within-chain variance and the between-chain variance
to test if multiple parallel chains have converged to the same solution for each parameter. The metric results
in a value that is greater than or equal to 1, where a value of 1 indicates that the parallel chains are identical.
As discussed in their original paper, if the square root of this metric is less than 1.2, it can be concluded
that the chains have converged to the same solution. Therefore, if the square root of GELMAN_RUBIN_R_HAT
is greater than 1.2, the GELMAN_RUBIN_FLAG is set for the offending parameter. If the flag is set, we recommend
double checking the actual metric to see how close it is to 1.44 (i.e., \(1.2^2\)). If it is above 1.44 but
below 1.69 (i.e., \(1.3^2\)), then convergence may still have been reached for the parameter. However,
any value above 1.69 should be considered non-convergence for the parameter, and the SED should be refit.
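The thresholds above translate directly into a small lookup. The Python sketch below (illustrative only, with hypothetical function, label, and parameter names) sorts each per-parameter GELMAN_RUBIN_R_HAT value into the three ranges. A value of exactly 1.69 is not covered by the text, so placing it in the ambiguous range is an assumption made here.

```python
R_HAT_FLAG = 1.44    # 1.2 squared: above this the flag is set
R_HAT_REFIT = 1.69   # 1.3 squared: above this the SED should be refit


def classify_r_hat(r_hat):
    """Sort one Gelman-Rubin value into the ranges given in the text."""
    if r_hat <= R_HAT_FLAG:
        return "converged"
    if r_hat <= R_HAT_REFIT:
        return "ambiguous"       # flagged, but convergence may still hold
    return "not converged"


# Hypothetical per-parameter values (parameter names are placeholders).
r_hat = {"param_a": 1.03, "param_b": 1.52, "param_c": 1.91}

for name, value in r_hat.items():
    print(f"{name}: R_hat = {value:.2f} -> {classify_r_hat(value)}")
```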
The Brooks-Gelman multidimensional metric is very similar to the Gelman-Rubin metric. The only difference is
that the Brooks-Gelman metric looks at all parameters collectively, whereas the Gelman-Rubin metric looks at
each parameter individually. Therefore, it is more sensitive to slight differences in the parallel chains.
Just like the GELMAN_RUBIN_FLAG, the BROOKS_GELMAN_FLAG is set if the BROOKS_GELMAN_R_HAT is larger than 1.44.
If this flag is set, we still highly recommend checking the GELMAN_RUBIN_R_HAT first before concluding that
convergence has not been reached. In our tests, we have found several cases where the BROOKS_GELMAN_R_HAT
can be > 2, while all parameters can have GELMAN_RUBIN_R_HAT values very close to 1 (i.e., < 1.05).
This discrepancy is due to the increased sensitivity of the Brooks-Gelman metric across the whole chain.
Therefore, we recommend relying on the Gelman-Rubin metric to determine if convergence has failed.
MPFIT
The MPFIT algorithm has four metrics to determine convergence: the status code (STATUS), the iteration
fraction (ITER_FRAC), the stuck fraction (STUCK_FRAC), and the similarity.
The status code is the success status of the MPFIT algorithm. If the code is greater than 0, then the algorithm
executed successfully. This should always be the case when using the MPFIT algorithm in Lightning, since any
errors that could occur in the input or configuration should be detected by Lightning before running. Therefore,
the STATUS_FLAG will rarely be set. The only time it will be is if the random initialization occurs near an
edge of the parameter bounds.
The goal of running multiple solvers for the MPFIT algorithm is the expectation that at least
a majority of them will converge to the same solution. Therefore, the
solvers that did not make it to the same solution need to be filtered out.
The iteration fraction does this by giving the fraction of the maximum iterations used by each solver
to reach its final solution. If a solver used the maximum allotted iterations, then it was likely still
searching for the best solution before it was terminated by the algorithm. Solvers that reach the maximum
number of iterations have their ITER_FLAG set. If only a small minority have their ITER_FLAG set,
then convergence of the other solvers may have still occurred. In this case,
we recommend checking the next two metrics to determine if convergence has been reached.
To first check if the solvers reached the same solution, a check needs to be performed for how many solvers
reached a similar value in \(\chi^2\) space and how many did not. The fraction of all solvers that
reached a similar value of \(\chi^2\) is given by the stuck fraction. The STUCK_FLAG
is then set if this fraction is less than 50%, meaning only a minority of solvers had similar \(\chi^2\) values.
We define a similar value of \(\chi^2\) as one within 4 of the best-fit solver’s \(\chi^2\). This value is
completely arbitrary, and we note that solvers with a \(\chi^2\) more than 4 above that of the best-fit solver
could still have reached the same solution. Therefore, if this fraction is less than 50%, we recommend
comparing the \(\chi^2\) of each solver to see how much worse each solver’s fit was compared to the
best-fit solver.
Note
The \(\chi^2\) of each solver can be recalculated from PVALUE and DOF using the
IDL function CHISQR_CVF (i.e., chisqr = CHISQR_CVF(PVALUE, DOF)).
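The \(\chi^2\) comparison described above can be sketched in Python as follows (illustrative only, not Lightning's implementation). The sketch assumes that the per-solver \(\chi^2\) values have already been recovered, that the fraction compared against 50% is the fraction of solvers whose \(\chi^2\) lies within 4 of the best-fit solver, and that "within 4" includes a difference of exactly 4. The function names and example values are hypothetical.

```python
def chi2_offsets(chi2):
    """Return each solver's chi2 minus the best (smallest) chi2."""
    best = min(chi2)
    return [value - best for value in chi2]


def similar_chi2_fraction(chi2, max_delta=4.0):
    """Fraction of solvers whose chi2 is within max_delta of the best fit."""
    offsets = chi2_offsets(chi2)
    n_similar = sum(1 for delta in offsets if delta <= max_delta)
    return n_similar / len(chi2)


# Hypothetical chi2 values for 8 solvers.
chi2 = [12.1, 12.3, 13.0, 15.9, 19.4, 25.0, 12.2, 30.1]

fraction = similar_chi2_fraction(chi2)
print(f"fraction with similar chi2: {fraction:.3f}")
print("below 50%:", fraction < 0.5)

# Per-solver offsets show how much worse each fit is than the best one.
for i, delta in enumerate(chi2_offsets(chi2)):
    print(f"solver {i}: delta chi2 = {delta:.1f}")
```

Listing the offsets rather than only the fraction makes it easier to spot solvers that sit just outside the arbitrary cutoff of 4.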
Finally, to check if the solvers reached the same solution in parameter space, the parameter values of the
non-stuck solvers need to be compared for similarity with the best-fit solver. Parameter values that are within
a 1% difference of the best-fit solver’s parameter values are considered to have converged to the same solution.
If parameters have a larger difference, this can indicate that a multi-modal solution may exist and convergence
to a common solution may not be possible with MPFIT. The SIMILAR_FLAG is set if any of the non-stuck solvers
had differences greater than 1% in their solutions compared to the best-fit solver. If your SIMILAR_FLAG is set,
you will likely need to check the percent differences in parameter values using the PARAMETER_VALUES output.
The higher the percent difference, the less likely it is that your solvers converged to the same solution.
Therefore, we recommend not settling for percent differences greater than 5%. Above this level, you risk
having a multi-modal solution, which MPFIT is not designed to evaluate.
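A minimal Python sketch of this percent-difference check is given below. It assumes the percent difference is measured relative to the best-fit solver's value for each parameter, which the text does not spell out, and that the parameter values have already been extracted from the PARAMETER_VALUES output. The function names, labels, and example numbers are hypothetical.

```python
def max_percent_difference(best_params, solver_params):
    """Largest percent difference between one solver and the best-fit solver."""
    diffs = []
    for best, value in zip(best_params, solver_params):
        if best == 0:
            # A relative difference is undefined here; handle such parameters separately.
            raise ValueError("best-fit value of 0 has no percent difference")
        diffs.append(abs(value - best) / abs(best) * 100.0)
    return max(diffs)


def assess_similarity(percent_diff, flag_level=1.0, caution_level=5.0):
    """Apply the 1% and 5% guidance from the text."""
    if percent_diff <= flag_level:
        return "same solution"
    if percent_diff <= caution_level:
        return "flagged, inspect"
    return "possible multi-modal solution"


# Hypothetical best-fit parameters and three non-stuck solvers.
best = [0.35, 2.0, 10.0]
solvers = {
    "A": [0.351, 2.01, 10.05],
    "B": [0.36, 2.0, 10.0],
    "C": [0.50, 2.0, 10.0],
}

for name, params in solvers.items():
    diff = max_percent_difference(best, params)
    print(f"solver {name}: max difference = {diff:.2f}% -> {assess_similarity(diff)}")
```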