Bayesian filtering

Bayesian filtering is a general framework for state estimation in dynamical systems. In Bayesian filtering, we aim to estimate the current state of a system based on noisy observations and a mathematical model of the system dynamics. The state, sometimes augmented by the system’s parameters, is assumed to evolve over time according to a stochastic process, and the observations are assumed to be corrupted by random (e.g., Gaussian) noise. The goal of the Bayesian filter is to recursively update the probability distribution of the system’s state as new observations become available, using Bayes’ theorem. The most commonly used Bayesian filters are the Kalman filter and the sequential Monte Carlo filter, also known as particle filter.

Bayes’ theorem

Humans are Bayesian machines, constantly using Bayesian reasoning to make decisions and predictions about the world around them. Bayes’ theorem is the mathematical foundation for this process, allowing us to update our beliefs in the face of new evidence.

\[p(A|B) = \frac{p(B|A) p(A)}{p(B)}.\]

Where \(p(A|B)\) is the posterior probability of hypothesis \(A\) given evidence \(B\) has been observed, \(p(B|A)\) is the likelihood of observing evidence \(B\) given hypothesis \(A\), \(p(A)\) is the prior probability of hypothesis \(A\), and \(p(B)\) is the prior probability of evidence \(B\).

At its core, Bayes’ theorem is a simple concept: the probability of a hypothesis given some observed evidence is proportional to the product of the prior probability of the hypothesis and the likelihood of the evidence given the hypothesis. In other words, Bayes’ theorem tells us how to update our beliefs in light of new data. So whether we’re deciding what to eat for breakfast or making predictions about the weather, we’re all Bayesian at heart.

The inference module

The inference module contains classes that infer the probability distribution of model parameters from observation data, also known as inverse analysis or data assimilation, in many engineering disciplines. The output of the inference module is the probability distribution of model states \(\vec{x}_t\), sometimes augmented by the parameters \(\vec{\Theta}\), conditioned on observation data \(\vec{y}_t\) at time \(t\).

Sequential Monte Carlo

The method currently available for statistical inference is Sequential Monte Carlo. It recursively updates the probability distribution of the augmented model state \(\hat{\vec{x}}_T=(\vec{x}_T, \vec{\Theta})\) from the sequences of observation data \(\hat{\vec{y}}_{0:T}\) from time \(t = 0\) to time \(T\). The posterior distribution of the augmented model state is approximated by a set of samples, where each sample instantiates a realization of the augmented model state. The samples are drawn from a proposal density, which can be either informative or non-informative.

Via Bayes’ rule, the posterior distribution of the augmented model state reads

\[p(\hat{\vec{x}}_{0:T}|\hat{\vec{y}}_{1:T}) \propto \prod_{t_i=1}^T p(\hat{\vec{y}}_{t_i}|\hat{\vec{x}}_{t_i}) p(\hat{\vec{x}}_{t_i}|\hat{\vec{x}}_{{t_i}-1}) p(\hat{\vec{x}}_0),\]

Where \(p(\hat{\vec{x}}_0)\) is the so-called prior distribution, which is the initial distribution of the model state. We can rewrite this equation in the recursive form, so that the posterior distribution gets updated at every time step \(t\), according to

\[p(\hat{\vec{x}}_{0:t}|\hat{\vec{y}}_{1:t}) \propto p(\hat{\vec{y}}_t|\hat{\vec{x}}_t)p(\hat{\vec{x}}_t|\hat{\vec{x}}_{t-1})p(\hat{\vec{x}}_{1:t-1}|\hat{\vec{y}}_{1:t-1}),\]

Where \(p(\hat{\vec{y}}_t|\hat{\vec{x}}_t)\) and \(p(\hat{\vec{x}}_t|\hat{\vec{x}}_{t-1})\) are the likelihood distribution and the transition distribution, respectively. The likelihood distribution is the probability distribution of the observation data given the model state. The transition distribution is the probability distribution of the model state at time \(t\) given the model state at time \(t-1\). We apply no perturbation in the parameters \(\vec{\Theta}\) nor in the model state \(\vec{x}_t\) over time, because most material models describe time or history dependent behavior and are expensive to evaluate. Therefore, the transition distribution is deterministic.

Importance sampling

These distributions can be efficiently evaluated via importance sampling. The idea is to have samples that are more important than others when approximating a target distribution. The measure of this importance is the so-called importance weight.

Therefore, we draw samples, \(\vec{\Theta}^{(i)} \ (i=1,...,N_p)\), from a proposal density, leading to an ensemble of model states \(\vec{x}_t^{(i)}\). The importance weights \(w_t^{(i)}\) are updated recursively, via

\[w_t^{(i)} \propto p(\hat{\vec{y}}_t|\hat{\vec{x}}_t^{(i)})p(\hat{\vec{x}}_t^{(i)}|\hat{\vec{x}}_{t-1}^{(i)}) w_{t-1}^{(i)}.\]

The likelihood \(p(\hat{\vec{y}}_t|\hat{\vec{x}}_t^{(i)})\) can be simply a multivariate Gaussian, which is computed by the function get_likelihoods of the SMC class.

\[p(\hat{\vec{y}}_t|\hat{\vec{x}}_t^{(i)}) \propto \exp \{-\frac{1}{2}[\hat{\vec{y}}_t-\mathbf{H}(\vec{x}^{(i)}_t)]^T {\mathbf{\Sigma}_t^D}^{-1} [\hat{\vec{y}}_t-\mathbf{H}(\vec{x}^{(i)}_t)]\},\]

where \(\mathbf{H}\) is the observation model that reduces to a diagonal matrix for uncorrelated observables, and \(\mathbf{\Sigma}_t^D\) is the covariance matrix SMC.cov_matrices calculated from \(\hat{\vec{y}}_t\) and the user-defined normalized variance DynamicSystem.sigma_max, in SMC.get_covariance_matrices.

By making use of importance sampling, the posterior distribution \(p(\hat{\vec{y}}_t|\hat{\vec{x}}_t^{(i)})\) gets updated over time in SMC.data_assimilation_loop — this is known as Bayesian updating.

Ensemble predictions

Since the importance weight on each sample \(\vec{\Theta}^{(i)}\) is discrete and the sample \(\vec{\Theta}^{(i)}\) and model state \(\vec{x}_t^{(i)}\) are in a one-to-one relationship, the ensemble mean DynamicSystem.estimated_params and variance DynamicSystem.estimated_params_cv can be computed as

\[ \begin{align}\begin{aligned}\mathrm{\widehat{E}}[f_t(\hat{\vec{x}}_t)|\hat{\vec{y}}_{1:t}] & = \sum_{i=1}^{N_p} w_t^{(i)} f_t(\hat{\vec{x}}_t^{(i)}),\\\mathrm{\widehat{Var}}[f_t(\hat{\vec{x}}_t)|\hat{\vec{y}}_{1:t}] & = \sum_{i=1}^{N_p} w_t^{(i)} (f_t(\hat{\vec{x}}_t^{(i)})-\mathrm{\widehat{E}}[f_t(\hat{\vec{x}}_t)|\hat{\vec{y}}_{1:t}])^2,\end{aligned}\end{align} \]

where \(f_t\) describes an arbitrary quantity of interest as a function of the model’s state and parameters \(\hat{\vec{x}}_t^{(i)}\).

simulation versus observation data — Ensemble predictions, top three fits, and the observation data

The sampling module

The sampling module allows drawing samples from

a non-informative uniform distribution
a proposal density that is designed and optimized to make the inference efficient

Sampling from low-discrepancy sequences

Since we typically don’t know the prior distribution of model parameters, we start with a non-informative, uniform sampling using quasi-random or near-random numbers. We make use of the Quasi-Monte Carlo generators of scipy

You can choose to sample the parameter space from

by specifying the initial sampling method :IterativeBayesianFilter:initial_sampling when initializing the UQ method.

Initialize the Bayesian calibration method

ibf_cls = IterativeBayesianFilter.from_dict(
    {
        "inference":{
            "ess_target": 0.3,
            "scale_cov_with_max": True
        },
        "sampling":{
            "max_num_components": 2
        }
        "initial_sampling": "Halton"
    }
)

Quasi-Monte Carlo generator — Samples generated with the Halton sequence, Sobol sequence and Latin Hypercube sampling.

Sampling from a proposal density function

An initial uniform sampling is unbiased, but it can be very inefficient since the correlation structure is not sampled. If we have some vague idea of the posterior distribution, we can come up with a proposal density. For that, we can use the :class:.GaussianMixtureModel class which is a wrapper of BayesianGaussianMixture of scikit-learn. Note that BayesianGaussianMixture is based on a variational Bayesian estimation of a Gaussian mixture, meaning the parameters of a Gaussian mixture distribution are inferred. For example, the number of components is optimized to fit the data, rather than an input of the Gaussian mixture.

The non-parametric GaussianMixtureModel.gmm model can be trained with previously generated samples and their importance weights (i.e., an approximation of the posterior distribution) obtained from inference to construct a smooth proposal density function. New samples are then drawn from this proposal density in GaussianMixtureModel.regenerate_params.

Resampling via a Gaussian mixture — Resampling of parameter space via a Gaussian mixture model.

Iterative Bayesian filter

The iterative Bayesian filtering algorithm combines sequential Monte Carlo filtering for inference and non-parametric Gaussian mixtures for (re)sampling. Sequential Monte Carlo combined with quasi- or near-random sequence sampling leads to the so-called sequential quasi-Monte Carlo (SQMC) filter with the necessary number of samples proportional to \(d\log{d}\). Although the SQMC filter is unbiased, it is very inefficient and ensured to degenerate as time proceeds.

The idea of iterative Bayesian filtering is to solve the inverse problem all over again, from time \(0\) to \(T\) with new samples drawn from a more sensible proposal density, effectively performing multi-level resampling to avoid weight degeneracy and improve efficiency. The essential steps include

IterativeBayesianFilter.initialize generates the initial samples using a low-discrepancy sequence.
Currently, IterativeBayesianFilter.inference uses SMC to quantify the evolution of the posterior distribution of model parameters over time.
When running SMC filtering via IterativeBayesianFilter.run_inference, it is crucial to ensure that the effective sample size is large enough, so that the ensemble does not degenerate into a few samples with very large weights and GaussianMixtureModel are trained with sufficient data.
IterativeBayesianFilter.run_sampling generates new samples from GaussianMixtureModel as the proposal density, trained with the previous ensemble (i.e., samples and associated weights).
IterativeBayesianFilter.solve combines the steps above and is used by BayesianCalibration for high-level operations, such as BayesianCalibration.run_one_iteration, BayesianCalibration.load_and_run_one_iteration, etc.