11 Unscented Kalman Filter
Main reference: Simon (2006)
The EKF is the most widely applied state estimation algorithm for nonlinear systems. However, the EKF can be difficult to tune and often gives unreliable estimates if the system nonlinearities are severe.
This is because the EKF relies on linearization to propagate the mean and covariance of the state. The unscented Kalman filter (UKF) is an extension of the Kalman filter that reduces the linearization errors of the EKF.
In this chapter, the following are discussed:
Investigating how mean and covariance propagate in nonlinear systems
Unscented transformation
Derivation of the UKF and how it reduces linearization error
Modifications to the standard UKF that give more accurate or faster filtering results
11.1 Mean and covariance of nonlinear transformation
Linear approximations can result in errors in the transformation of means and covariances when a random variable is operated on by a nonlinear function.
In this section, a simple nonlinear transformation, the polar-to-rectangular coordinate transformation (\(y_1 = r \cos(\theta)\), \(y_2 = r \sin(\theta)\)), is used to demonstrate the error introduced by linear approximation.
This has nothing to do with the Kalman filter itself, but it is the basis for the unscented transformation.
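The effect can be checked numerically. The following is a minimal sketch in Python with NumPy, not the worked example from the reference: the means (\(r = 1\), \(\theta = \pi/2\)), the uniform spreads (\(\pm 0.01\) and \(\pm 0.35\) rad), and all variable names are illustrative assumptions. It compares the first-order (linearized) mean of \(\mathbf{y}\) with a Monte Carlo estimate of the true mean.

```python
import numpy as np

def polar_to_rect(r, theta):
    """Nonlinear map y1 = r cos(theta), y2 = r sin(theta)."""
    return np.stack([r * np.cos(theta), r * np.sin(theta)], axis=-1)

# Illustrative assumptions: r and theta are independent and uniformly
# distributed around their means.
r_mean, theta_mean = 1.0, np.pi / 2
r_half_width, theta_half_width = 0.01, 0.35

rng = np.random.default_rng(0)
n_samples = 100_000
r = rng.uniform(r_mean - r_half_width, r_mean + r_half_width, n_samples)
theta = rng.uniform(theta_mean - theta_half_width,
                    theta_mean + theta_half_width, n_samples)

# First-order (linearized) mean: the function evaluated at the input mean.
linearized_mean = polar_to_rect(r_mean, theta_mean)

# Monte Carlo estimate of the true mean of y.
mc_mean = polar_to_rect(r, theta).mean(axis=0)

# Closed form for this setup (theta centred at pi/2):
# E[y2] = E[r] * sin(a) / a, where a is the half-width of theta.
exact_y2_mean = r_mean * np.sin(theta_half_width) / theta_half_width

print("linearized mean :", linearized_mean)
print("Monte Carlo mean:", mc_mean)
print("exact E[y2]     :", exact_y2_mean)
```

Under these assumptions the linearized mean of \(y_2\) is exactly 1, while the true mean is smaller, because averaging \(\sin(\theta)\) over a spread of angles around its maximum can only pull the value down.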
11.2 Unscented transformation
The EKF handles nonlinearity by linearizing \(f\) and \(h\) around the mean using Jacobians. This works if \(f\) and \(h\) are nearly linear but fails when they’re strongly nonlinear.
The unscented transform takes a different approach: instead of linearizing the nonlinear function, it propagates carefully chosen sample points (called sigma points) through the true nonlinear function. It then reconstructs the new mean and covariance from the transformed points.
So, EKF: approximates the function by Jacobians, UKF: approximates the distribution by sigma points.
Unscented meaning: The name is often read informally as saying that the transform captures the mean and covariance of a random variable under a nonlinear mapping without “scenting” (or distorting) the distribution with linearization errors. In that reading, unscented loosely means unbiased or unperturbed.
11.2.1 Mathematically: Unscented Transform
Given a random vector \(\mathbf{x}\) with mean \(\bar{\mathbf{x}}\) and covariance \(\mathbf{P}_x\),
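Generate a set of \(2L+1\) sigma points: \[ \mathbf{X}_0 = \bar{\mathbf{x}}, \ \ \mathbf{X}_i = \bar{\mathbf{x}}+\left(\sqrt{\left(L+\lambda\right)\mathbf{P}_x}\right)_i, \ \ \mathbf{X}_{i+L} = \bar{\mathbf{x}}-\left(\sqrt{\left(L+\lambda\right)\mathbf{P}_x}\right)_i, \ \ i=1,\dots,L \] where \(L\) is the dimension of \(\mathbf{x}\), \(\lambda\) is a scaling parameter, and \(\left(\cdot\right)_i\) denotes the \(i\)th column (or row, depending on the square-root convention) of the matrix square root.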
Propagate each sigma point through the nonlinear function: \[ \mathbf{Y}_i = f\left(\mathbf{X}_i\right) \]
Compute the new mean and covariance using weighted sums: \[ \bar{\mathbf{y}} = \sum_{i}{W}_i^{(m)}\mathbf{Y}_i, \ \ \ \mathbf{P}_y = \sum_{i}W_i^{(c)}\left(\mathbf{Y}_i-\bar{\mathbf{y}}\right)\left(\mathbf{Y}_i-\bar{\mathbf{y}}\right)^T \]
This transformation captures the mean and covariance accurately up to third order of the Taylor series (for Gaussian inputs), whereas the EKF is accurate only to first order.
Unscented transform: a statistical method for computing the transformed mean and covariance without linearization.
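The three steps above can be sketched in Python with NumPy. The text introduces \(\lambda\) and the weights \(W_i^{(m)}\), \(W_i^{(c)}\) without defining them, so this sketch assumes the commonly used scaled form: \(\lambda = \alpha^2(L+\kappa) - L\), \(W_0^{(m)} = \lambda/(L+\lambda)\), \(W_0^{(c)} = W_0^{(m)} + (1-\alpha^2+\beta)\), and \(W_i^{(m)} = W_i^{(c)} = 1/(2(L+\lambda))\) for \(i \ge 1\). The function name, the default values of `alpha`, `beta` and `kappa`, and the use of a Cholesky factor as the matrix square root are illustrative choices, not part of the source.

```python
import numpy as np

def scaled_unscented_transform(f, x_mean, P_x, alpha=1.0, beta=2.0, kappa=0.0):
    """Propagate (x_mean, P_x) through f using 2L+1 weighted sigma points.

    alpha, beta and kappa are tuning parameters of the scaled form assumed
    here; the defaults are illustrative, not recommended values.
    """
    x_mean = np.asarray(x_mean, dtype=float)
    L = x_mean.size
    lam = alpha**2 * (L + kappa) - L

    # Lower-triangular S with S @ S.T == (L + lam) * P_x.
    S = np.linalg.cholesky((L + lam) * P_x)

    # Sigma points as rows: the mean, then mean +/- each column of S.
    sigma = np.vstack([x_mean, x_mean + S.T, x_mean - S.T])

    # Weights for the mean (w_m) and for the covariance (w_c).
    w_m = np.full(2 * L + 1, 1.0 / (2.0 * (L + lam)))
    w_c = w_m.copy()
    w_m[0] = lam / (L + lam)
    w_c[0] = lam / (L + lam) + (1.0 - alpha**2 + beta)

    # Propagate every sigma point through the nonlinear function.
    Y = np.array([np.atleast_1d(f(s)) for s in sigma])

    # Weighted mean and covariance of the transformed points.
    y_mean = w_m @ Y
    dY = Y - y_mean
    P_y = (w_c[:, None] * dY).T @ dY
    return y_mean, P_y

# Usage on the polar-to-rectangular map (input statistics are made up).
def polar_to_rect(x):
    r, theta = x
    return np.array([r * np.cos(theta), r * np.sin(theta)])

y_mean, P_y = scaled_unscented_transform(
    polar_to_rect, np.array([1.0, np.pi / 2]), np.diag([1e-4, 0.04]))
print(y_mean)
print(P_y)
```

For a linear function \(f(\mathbf{x}) = \mathbf{A}\mathbf{x}\) this sketch returns \(\mathbf{A}\bar{\mathbf{x}}\) and \(\mathbf{A}\mathbf{P}_x\mathbf{A}^T\) up to rounding, which is a quick sanity check on the sigma points and weights.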
The notes above on the unscented transformation were sourced from ChatGPT; the notes below follow Simon (2006).
An unscented transformation is based on two fundamental principles.
It is easy to perform a nonlinear transformation on a single point, rather than an entire pdf.
It is not hard to find a set of individual points in state space whose sample pdf approximates the true pdf of a state vector.
The key to unscented transformation is described below.
Let \(\bar{\mathbf{x}},\mathbf{P}\) be the mean and covariance of a vector \(\mathbf{x}\).
Find a set of deterministic vectors called sigma points whose ensemble mean and covariance are equal to \(\bar{\mathbf{x}}\) and \(\mathbf{P}\).
Apply the known nonlinear function \(\mathbf{y}=\mathbf{h}(\mathbf{x})\) to each deterministic vector to obtain transformed vectors.
The ensemble mean and covariance of the transformed vectors give a good estimate of the true mean and covariance of \(\mathbf{y}\).
Section 14.2 of Simon (2006) derives the result that both the mean and the covariance are approximated to third order. The unscented transformation algorithm is given below.
The unscented transformation algorithm
We begin with an \(n\)-element vector \(\mathbf{x}\) with known mean \(\bar{\mathbf{x}}\) and covariance \(\mathbf{P}\). Given a known nonlinear transformation \(\mathbf{y} = \mathbf{h}(\mathbf{x})\), we want to estimate the mean and covariance of \(\mathbf{y}\), denoted as \(\bar{\mathbf{y}}_u\) and \(\mathbf{P}_u\).
Form \(2n\) sigma point vectors \(\mathbf{x}^{(i)}\) as follows: \[ \begin{align} \mathbf{x}^{(i)} &= \bar{\mathbf{x}} + \tilde{\mathbf{x}}^{(i)}, \ \ \ i=1,\dots,2n \\ \tilde{\mathbf{x}}^{(i)} &= \left(\sqrt{n\mathbf{P}}\right)^T_i, \ \ \ i=1,\dots,n \\ \tilde{\mathbf{x}}^{(n+i)} &= -\left(\sqrt{n\mathbf{P}}\right)^T_i, \ \ \ i=1,\dots,n \end{align} \] where \(\sqrt{n\mathbf{P}}\) is the matrix square root of \(n\mathbf{P}\) such that \(\left(\sqrt{n\mathbf{P}}\right)^T\sqrt{n\mathbf{P}} = n\mathbf{P}\), and \(\left(\sqrt{n\mathbf{P}}\right)_i\) is the \(i\)th row of \(\sqrt{n\mathbf{P}}\).
Transform the sigma points as follows: \[ \mathbf{y}^{(i)} = \mathbf{h}\left(\mathbf{x}^{(i)}\right), \ \ \ i = 1,\dots,2n \]
Approximate the mean and covariance of \(\mathbf{y}\) as follows: \[ \begin{align} \bar{\mathbf{y}}_u &= \frac{1}{2n}\sum_{i=1}^{2n}\mathbf{y}^{(i)} \\ \mathbf{P}_u &= \frac{1}{2n}\sum_{i=1}^{2n}\left(\mathbf{y}^{(i)}-\bar{\mathbf{y}}_u\right)\left(\mathbf{y}^{(i)}-\bar{\mathbf{y}}_u\right)^T \end{align} \]
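The three steps above map directly onto a few lines of NumPy. This is a minimal sketch, assuming a Cholesky factor is used as the matrix square root (so \(\mathbf{P}\) must be positive definite); the function names and the input statistics in the usage example are illustrative, not taken from the reference.

```python
import numpy as np

def unscented_transform(h, x_mean, P):
    """Equal-weight unscented transformation with 2n sigma points."""
    x_mean = np.asarray(x_mean, dtype=float)
    n = x_mean.size

    # Upper-triangular root with root.T @ root == n * P.
    # Its rows are the sigma-point offsets.
    root = np.linalg.cholesky(n * P).T

    # Sigma points x^(i) = x_mean +/- (row i of root), stacked as 2n rows.
    sigma = np.vstack([x_mean + root, x_mean - root])

    # Push every sigma point through the nonlinear function.
    Y = np.array([np.atleast_1d(h(s)) for s in sigma])

    # Ensemble mean and covariance of the transformed points.
    y_mean = Y.mean(axis=0)
    dY = Y - y_mean
    P_y = dY.T @ dY / (2 * n)
    return y_mean, P_y

def polar_to_rect(x):
    r, theta = x
    return np.array([r * np.cos(theta), r * np.sin(theta)])

# Illustrative input statistics: variances of uniform +/-0.01 and +/-0.35 rad.
x_mean = np.array([1.0, np.pi / 2])
P = np.diag([0.01**2 / 3, 0.35**2 / 3])

y_u, P_u = unscented_transform(polar_to_rect, x_mean, P)
print("unscented mean      :", y_u)
print("unscented covariance:")
print(P_u)
```

With these inputs the unscented mean of \(y_2\) falls below 1, as the true mean does, whereas evaluating \(\mathbf{h}\) at \(\bar{\mathbf{x}}\) alone would return exactly 1.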
11.3 Unscented Kalman filtering algorithm
The EKF equations are replaced with unscented transformations to obtain the UKF algorithm. The algorithm is given below.
We have an \(n\)-state discrete-time nonlinear system given by \[ \begin{align} \mathbf{x}_{k} &= \mathbf{f}\left(\mathbf{x}_{k-1},\mathbf{u}_k,t_k\right)+\mathbf{w}_{k-1} \\ \mathbf{y}_k &= \mathbf{h}(\mathbf{x}_k,t_k)+\mathbf{v}_k \\ \mathbf{w}_k &\sim (0,\mathbf{Q}_k) \\ \mathbf{v}_k &\sim (0,\mathbf{R}_k) \end{align} \]
The UKF is initialized as follows \[ \begin{align} \hat{\mathbf{x}}_0^+ &= E(\mathbf{x}_0) \\ \mathbf{P}_0^+ &= E\left[\left(\mathbf{x}_0 - \hat{\mathbf{x}}_0^+\right)\left(\mathbf{x}_0 - \hat{\mathbf{x}}_0^+\right)^T\right] \end{align} \]
The following time update equations are used to propagate the state estimate and covariance from one measurement time to the next.
- To propagate from time step \((k-1)\) to \(k\), first choose sigma points \(\hat{\mathbf{x}}_{k-1}^{(i)}\) as in the unscented transformation, with appropriate changes since the current best guesses for the mean and covariance of \(\mathbf{x}_k\) are \(\hat{\mathbf{x}}_{k-1}^+\) and \(\mathbf{P}_{k-1}^+\).
\[ \begin{align} \hat{\mathbf{x}}_{k-1}^{(i)} &= \hat{\mathbf{x}}_{k-1}^+ + \tilde{\mathbf{x}}^{(i)}, \ \ \ i=1,\dots,2n \\ \tilde{\mathbf{x}}^{(i)} &= \left(\sqrt{n\mathbf{P}_{k-1}^+}\right)^T_i, \ \ \ i=1,\dots,n \\ \tilde{\mathbf{x}}^{(n+i)} &= - \left(\sqrt{n\mathbf{P}_{k-1}^+}\right)_i^T, \ \ \ i=1,\dots,n \end{align} \]
Use the known nonlinear system equations \(\mathbf{f}(.)\) to transform the sigma points into \(\hat{\mathbf{x}}_k^{(i)}\) vectors. \[ \hat{\mathbf{x}}_k^{(i)} = \mathbf{f}\left(\hat{\mathbf{x}}_{k-1}^{(i)}, \mathbf{u}_k,t_k\right) \]
Combine the \(\hat{\mathbf{x}}_k^{(i)}\) vectors to obtain the a priori state estimate at time \(k\). \[ \hat{\mathbf{x}}_k^- = \frac{1}{2n}\sum_{i=1}^{2n}\hat{\mathbf{x}}_k^{(i)} \]
Estimate the a priori error covariance, adding \(\mathbf{Q}_{k-1}\) to take the process noise into account. \[ \mathbf{P}_k^- = \frac{1}{2n}\sum_{i=1}^{2n}\left(\hat{\mathbf{x}}_k^{(i)} - \hat{\mathbf{x}}_k^-\right)\left(\hat{\mathbf{x}}_k^{(i)} - \hat{\mathbf{x}}_k^-\right)^T + \mathbf{Q}_{k-1} \]
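The time update steps above can be sketched as follows in Python with NumPy. The function names, the argument order `f(x, u, t)`, and the use of a Cholesky factor as the matrix square root are assumptions made for illustration.

```python
import numpy as np

def sigma_points(x_hat, P):
    """Return the 2n sigma points (as rows) for mean x_hat and covariance P."""
    n = x_hat.size
    root = np.linalg.cholesky(n * P).T  # root.T @ root == n * P
    return np.vstack([x_hat + root, x_hat - root])

def ukf_time_update(f, x_post, P_post, Q_prev, u=None, t=None):
    """Propagate (x_post, P_post) at step k-1 to the a priori pair at step k."""
    x_post = np.asarray(x_post, dtype=float)
    n = x_post.size

    # Sigma points around the previous a posteriori estimate.
    X_prev = sigma_points(x_post, P_post)

    # Propagate each sigma point through the process model.
    X_pred = np.array([f(x, u, t) for x in X_prev])

    # A priori state estimate: ensemble mean of the propagated points.
    x_prior = X_pred.mean(axis=0)

    # A priori covariance: ensemble covariance plus process noise.
    dX = X_pred - x_prior
    P_prior = dX.T @ dX / (2 * n) + Q_prev
    return x_prior, P_prior, X_pred

# Usage with a made-up two-state nonlinear model.
def f(x, u, t):
    return np.array([x[0] + 0.1 * x[1], x[1] - 0.1 * np.sin(x[0])])

x_prior, P_prior, _ = ukf_time_update(
    f, np.array([0.5, 0.0]), 0.1 * np.eye(2), 0.01 * np.eye(2))
print(x_prior)
print(P_prior)
```

The propagated sigma points are returned as well, so a caller can reuse them in the measurement update instead of drawing a new set.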
Now that the time update equations are done, we implement the measurement-update equations.
Choose sigma points \(\hat{\mathbf{x}}_k^{(i)}\) based on the current best guess for the mean and covariance, \(\hat{\mathbf{x}}_k^-\) and \(\mathbf{P}_k^-\). \[ \begin{align} \hat{\mathbf{x}}_k^{(i)} &= \hat{\mathbf{x}}_k^- + \tilde{\mathbf{x}}^{(i)}, \ \ \ i=1,\dots,2n \\ \tilde{\mathbf{x}}^{(i)} &= \left(\sqrt{n\mathbf{P}_k^-}\right)^T_i, \ \ \ i=1,\dots,n \\ \tilde{\mathbf{x}}^{(n+i)} &= -\left(\sqrt{n\mathbf{P}_k^-}\right)^T_i, \ \ \ i=1,\dots,n \end{align} \] This step can be omitted if desired. That is, instead of generating new sigma points, we can reuse the sigma points that were obtained from the time update. This saves computational effort at some cost in performance.
Use the known nonlinear measurement equation \(\mathbf{h}(.)\) to transform the sigma points into \(\hat{\mathbf{y}}_k^{(i)}\) vectors (predicted measurements). \[ \hat{\mathbf{y}}_k^{(i)} = \mathbf{h}(\hat{\mathbf{x}}_k^{(i)},t_k) \]
Combine the \(\hat{\mathbf{y}}_k^{(i)}\) vectors to obtain the predicted measurement at time \(k\). \[ \hat{\mathbf{y}}_k = \frac{1}{2n}\sum_{i=1}^{2n}\hat{\mathbf{y}}_k^{(i)} \]
Estimate the covariance of the predicted measurement, adding \(\mathbf{R}_k\) to take the measurement noise into account. \[ \mathbf{P}_y = \frac{1}{2n}\sum_{i=1}^{2n} \left(\hat{\mathbf{y}}_k^{(i)} - \hat{\mathbf{y}}_k\right)\left(\hat{\mathbf{y}}_k^{(i)} - \hat{\mathbf{y}}_k\right)^T + \mathbf{R}_k \]
Estimate the cross covariance between \(\hat{\mathbf{x}}_k^-\) and \(\hat{\mathbf{y}}_k\). \[ \mathbf{P}_{xy} = \frac{1}{2n}\sum_{i=1}^{2n}\left(\hat{\mathbf{x}}_k^{(i)}-\hat{\mathbf{x}}_k^-\right)\left(\hat{\mathbf{y}}_k^{(i)}-\hat{\mathbf{y}}_k\right)^T \]
The measurement update of the state estimate can be performed using the normal Kalman filter equations (derived based on statistics given in section 10.5.1 Simon (2006)) \[ \begin{align} \mathbf{K}_k &= \mathbf{P}_{xy}\mathbf{P}_y^{-1} \\ \hat{\mathbf{x}}_k^+ &= \hat{\mathbf{x}}_k^- + \mathbf{K}_k\left(\mathbf{y}_k -\hat{\mathbf{y}}_k\right) \\ \mathbf{P}_k^+ &= \mathbf{P}_k^- - \mathbf{K}_k\mathbf{P}_y\mathbf{K}_k^T \end{align} \]
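The measurement update steps above can be sketched in the same style. As before, this is an illustrative sketch: the function names and the argument order `h(x, t)` are assumptions, new sigma points are drawn from \(\hat{\mathbf{x}}_k^-\) and \(\mathbf{P}_k^-\) rather than reused, and `R` is expected as a two-dimensional array even for a scalar measurement.

```python
import numpy as np

def sigma_points(x_hat, P):
    """Return the 2n sigma points (as rows) for mean x_hat and covariance P."""
    n = x_hat.size
    root = np.linalg.cholesky(n * P).T  # root.T @ root == n * P
    return np.vstack([x_hat + root, x_hat - root])

def ukf_measurement_update(h, x_prior, P_prior, y, R, t=None):
    """Correct the a priori pair (x_prior, P_prior) with the measurement y."""
    x_prior = np.asarray(x_prior, dtype=float)
    n = x_prior.size

    # New sigma points around the a priori estimate.
    X = sigma_points(x_prior, P_prior)

    # Predicted measurement for every sigma point, and their mean.
    Y = np.array([np.atleast_1d(h(x, t)) for x in X])
    y_hat = Y.mean(axis=0)

    # Measurement covariance (with R) and state-measurement cross covariance.
    dX = X - x_prior
    dY = Y - y_hat
    P_y = dY.T @ dY / (2 * n) + R
    P_xy = dX.T @ dY / (2 * n)

    # K = P_xy P_y^{-1}, computed with a linear solve instead of an inverse.
    K = np.linalg.solve(P_y.T, P_xy.T).T

    x_post = x_prior + K @ (np.atleast_1d(y) - y_hat)
    P_post = P_prior - K @ P_y @ K.T
    return x_post, P_post, K

# Usage with a made-up range measurement of a two-dimensional position.
def h(x, t):
    return np.hypot(x[0], x[1])

x_post, P_post, K = ukf_measurement_update(
    h, np.array([3.0, 4.0]), 0.2 * np.eye(2), y=5.3, R=np.array([[0.05]]))
print(x_post)
print(P_post)
```

For a linear measurement \(\mathbf{h}(\mathbf{x}) = \mathbf{H}\mathbf{x}\), the sigma-point sums reduce to \(\mathbf{P}_{xy} = \mathbf{P}_k^-\mathbf{H}^T\) and \(\mathbf{P}_y = \mathbf{H}\mathbf{P}_k^-\mathbf{H}^T + \mathbf{R}_k\), so the gain coincides with the standard Kalman gain.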
The algorithm above assumes that the noise enters the process and measurement equations additively. In general, the noise may enter nonlinearly as well, as shown below.
\[ \begin{align} \mathbf{x}_{k} &= \mathbf{f}(\mathbf{x}_{k-1},\mathbf{u}_k,\mathbf{w}_{k-1},t_k) \\ \mathbf{y}_k &= \mathbf{h}(\mathbf{x}_k,\mathbf{v}_k,t_k) \end{align} \]
To handle this situation, we can augment the noise onto the state vector \[ \mathbf{x}_k^{(a)} = \begin{bmatrix} \mathbf{x}_k \\ \mathbf{w}_k \\ \mathbf{v}_k \end{bmatrix} \]
Then we can use the UKF to estimate the augmented state \(\mathbf{x}_k^{(a)}\). The UKF is initialized as \[ \begin{align} \hat{\mathbf{x}}_0^{a+} &= \begin{bmatrix} E(\mathbf{x}_0) \\ 0 \\ 0 \\ \end{bmatrix} \\ \mathbf{P}_0^{a+} &= \begin{bmatrix} E\left[\left(\mathbf{x}_0-\hat{\mathbf{x}}_0\right)\left(\mathbf{x}_0-\hat{\mathbf{x}}_0\right)^T\right] & 0 & 0 \\ 0 & \mathbf{Q}_0 & 0 \\ 0 & 0 & \mathbf{R}_0 \end{bmatrix} \end{align} \]
Then we use the UKF algorithm presented above, except that we are now estimating the augmented mean and covariance, so we remove \(\mathbf{Q}_{k-1}\) and \(\mathbf{R}_k\) from the covariance equations given in the above algorithm.
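A rough sketch of the time update with an augmented state is given below, in Python with NumPy. It is a simplified variant rather than the exact procedure described above: instead of carrying the full augmented estimate through the filter, it rebuilds the augmented mean and a block-diagonal augmented covariance from \(\mathbf{P}\), \(\mathbf{Q}\) and \(\mathbf{R}\) at each step. The function name and the argument order `f(x, u, w, t)` are assumptions, and \(\mathbf{Q}\) and \(\mathbf{R}\) must be positive definite for the Cholesky factor to exist.

```python
import numpy as np

def augmented_time_update(f, x_post, P_post, Q, R, u=None, t=None):
    """Time update with sigma points drawn from the augmented state [x; w; v]."""
    x_post = np.asarray(x_post, dtype=float)
    n, q, r = x_post.size, Q.shape[0], R.shape[0]
    N = n + q + r

    # Augmented mean [x; 0; 0] and block-diagonal covariance diag(P, Q, R).
    xa = np.concatenate([x_post, np.zeros(q + r)])
    Pa = np.zeros((N, N))
    Pa[:n, :n] = P_post
    Pa[n:n + q, n:n + q] = Q
    Pa[n + q:, n + q:] = R

    # 2N augmented sigma points (rows); root.T @ root == N * Pa.
    root = np.linalg.cholesky(N * Pa).T
    Xa = np.vstack([xa + root, xa - root])

    # Each sigma point carries its own process-noise sample s[n:n+q],
    # which is passed to f, so no Q term is added afterwards.
    X_pred = np.array([f(s[:n], u, s[n:n + q], t) for s in Xa])

    x_prior = X_pred.mean(axis=0)
    dX = X_pred - x_prior
    P_prior = dX.T @ dX / (2 * N)
    return x_prior, P_prior

# Usage with a made-up model in which the noise multiplies the second state.
def f(x, u, w, t):
    return np.array([x[0] + 0.1 * x[1], x[1] * (1.0 + w[0])])

x_prior, P_prior = augmented_time_update(
    f, np.array([1.0, 2.0]), 0.1 * np.eye(2),
    Q=np.array([[0.01]]), R=np.array([[0.1]]))
print(x_prior)
print(P_prior)
```

When the noise is additive, \(\mathbf{f}(\mathbf{x},\mathbf{u},\mathbf{w},t) = \mathbf{A}\mathbf{x} + \mathbf{w}\), this sketch reproduces \(\mathbf{A}\mathbf{P}\mathbf{A}^T + \mathbf{Q}\), which matches the non-augmented time update. The measurement update can be treated analogously, passing the \(\mathbf{v}\) part of each sigma point to \(\mathbf{h}\).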
Other unscented transformations are also briefly discussed in Simon (2006). They are:
- General unscented transformations
- Simplex unscented transformation
- Spherical unscented transformation
They can be consulted in the reference if needed.
11.4 Summary
The unscented Kalman filter can give greatly improved estimation performance compared with the EKF for nonlinear systems (accurate to third order in the Taylor series, compared with first order for the EKF).
The EKF requires the computation of Jacobians, whereas the UKF does not. This makes the UKF useful when the system equations are not available in an analytical form from which Jacobians can be computed.
The UKF is relatively new (1995) but is becoming popular.