4 Alternate Kalman filter formulations

This chapter discusses mathematically equivalent alternate forms of Kalman filter from the perspective of applications. The following alternate forms were discussed

Sequential Kalman filtering - it avoids matrix inversions
Information filtering - it is computationally cheaper under certain conditions
Square root filtering - it prevents divergence and instability
U-D filtering - another way of implementing square root filtering

The main reference for this chapter is Simon (2006).

4.1 Sequential Kalman filtering

Recalling the measurement update equations (Eq. 3.5)

\[ \begin{align} \mathbf{y}_k &= \mathbf{H}_k\mathbf{x}_k+\mathbf{v}_k \\ \mathbf{K}_k &= \mathbf{P}^-_{k}\mathbf{H}_k^T\left(\mathbf{H}_k\mathbf{P}^-_{k}\mathbf{H}_k^T+\mathbf{R}_k\right)^{-1} \\ \hat{\mathbf{x}}^+_k &= \hat{\mathbf{x}}^-_{k} + \mathbf{K}_k\left(\mathbf{y}_k - \mathbf{H}_k\hat{\mathbf{x}}^-_{k}\right) \\ \mathbf{P}^+_k &= \left(\mathbf{I} - \mathbf{K}_k\mathbf{H}_k\right)\mathbf{P}_{k}^- \end{align} \tag{4.1}\]

Here, the \(\mathbf{K}_k\) equation has a matrix inverse term that needs to be simplified.

The idea of Sequential Kalman filtering is to decompose the measurement vector \(\mathbf{y}_k\) into its individual components \(y_{ik}\) and use those components to compute the Kalman gain in a piecewise manner.

Assume that the covariance measurement matrix \(\mathbf{R}_k\) is diagonal. That is, \[ \begin{equation} \mathbf{R}_k = \begin{bmatrix} R_{1k} & \dots & 0 \\ \vdots & \cdots & \vdots \\ 0 & \dots & R_{rk} \end{bmatrix} \end{equation} \]

where \(r\) is the length of the measurement vector \(\mathbf{y}_k\).

Let \(y_{ik}\) denote the \(i\)^th element in the measurement vector \(\mathbf{y}_k\) and let \(\mathbf{H}_{ik}\) denote the \(i\)^th row in the \(\mathbf{H}\) matrix. Then,

\[ \begin{align} y_{ik} &= \mathbf{H}_{ik}\mathbf{x}_k + v_{ik} \\ v_{ik} &\sim (0,R_{ik}) . \end{align} \]

It is to be noted that \[ \begin{align} \hat{x}_{0k}^+ &= \hat{x}_k^- \\ \mathbf{P}_{0k}^+ &= \mathbf{P}_k^- \end{align} \]

That is, \(\hat{x}_{0k}^+\) is the estimate after zero measurements have been processed, so it is equal to the priori estimate, and the same applies for the state error covariance matrix.

So, for \(i=1,2,\dots,r\), the update equations will be rewritten as

\[ \begin{align} \mathbf{K}_{ik} &= \mathbf{P}^+_{i-1,k}\mathbf{H}_k^T\left(\mathbf{H}_{ik}\mathbf{P}^+_{i-1,k}\mathbf{H}_{ik}^T+\mathbf{R}_{ik}\right)^{-1} \\ \hat{x}^+_{ik} &= \hat{x}^+_{i-1,k} + \mathbf{K}_{ik}\left(y_{ik} - \mathbf{H}_{ik}\hat{x}^+_{i-1,k}\right) \\ \mathbf{P}^+_{ik} &= \left(\mathbf{I} - \mathbf{K}_{ik}\mathbf{H}_{ik}\right)\mathbf{P}_{i-1,k}^+ \end{align} \tag{4.2}\]

After all \(r\) measurements are processed, we set \(\hat{\mathbf{x}}_k^+ = \hat{x}_{rk}^+\) and \(\mathbf{P}_k^+ = \mathbf{P}_{rk}^+\).

The summary and algorithm can be found in chapter 6 of Simon (2006).

The algorithm assumes that \(\mathbf{R}_k\) is diagonal. However, there is an alternate way to handle non-diagonal \(\mathbf{R}_k\) matrix. It can be referred in Simon (2006). In summary, the sequential Kalman filter can be used if one of the following two conditions hold

The \(\mathbf{R}_k\) is diagonal
The \(\mathbf{R}_k\) is a constant (i.e. time-invariant)

Note

The term sequential filtering is sometimes used synonymously with the Kalman filter. It is synonym for recursive and not sequential which is discussed in this section.

Thus, sometimes the standard Kalman filter is called the batch Kalman filter to distinguish it from the sequential Kalman filter.

Here, the batch represents the vector of measurements \(\mathbf{y}_k\) at given time step, not the batch of time step data.

4.2 Information filtering

Here, the implemented Kalman filter propagates the \(\mathbf{P}^{-1}\) rather than \(\mathbf{P}\).

\(\mathbf{P}\) is the uncertainty in the state estimation, while the \(\mathcal{I} = \mathbf{P}^{-1}\) represents the certainty in the estimation.

Now, from the update equations (Eq. 3.5), the 2^nd measurement update equation for \(\mathbf{P}_k\) can be written as

\[ \begin{align} \mathcal{I}^+_k &= \left(\mathbf{P}_k\right)^{-1} = \left(\mathbf{P}_k^-\right)^{-1} + \mathbf{H}_k^T\mathbf{R}_k^{-1}\mathbf{H}_k \\ &= \mathcal{I}^-_k + \mathbf{H}_k^T\mathbf{R}_k^{-1}\mathbf{H}_k \end{align} \]

And correspondingly the estimation equation for \(\mathcal{I}_k\) is obtained as

\[ \begin{align} \mathbf{P}_k^- &= \mathbf{F}_{k-1}\mathbf{P}_{k-1}^+\mathbf{F}_{k-1}^T+\mathbf{Q}_{k-1} \\ \mathcal{I}_k^- &= \left[\mathbf{F}_{k-1}\left(\mathcal{I}_{k-1}^+\right)^{-1}\mathbf{F}_{k-1}^T+\mathbf{Q}_{k-1}\right]^{-1} \end{align} \]

The above equation can be further expanded, and the expanded form can be obtained from Simon (2006) along with the algorithm for the information filtering.

The key points on this filtering approach are as follows.

The standard Kalman filter requires inversion of an \(r\times r\) matrix were \(r\) is the number of measurements. While information filter require at least a couple of \(n\times n\) matrix inversions, where \(n\) is the number of states. Therefore, if \(r \gg n\) (i.e. more measurements than states, per time) then information filter is more efficient, computationally.
If the initial uncertainty is infinite, it is difficult to numerically set \(\mathbf{P}_0^+=\infty\), but we can set \(\mathcal{I}_0^+ = 0\). It makes information filter to be mathematically more precise. However, the converse is also the drawback for information filter i.e. the case with perfect knowledge of \(\mathbf{x}_0\): \(\mathbf{P}_0^+=0\) while \(\mathcal{I}_0^+ = \infty\).

4.3 Square root filtering

In early applications of Kalman filter, numerical problems arose due to finite and low hardware precision. Square root filtering is a way to mathematically increase the precision when hardware precision is unavailable.

This arose mainly due to inconsistent range of values under same units across states. That is, if one state has cm as unit and its values range between 0 and 1 while the other state will have same unit cm but its range will be higher. This leads to multiple ranges of covariances in \(\mathbf{P}_k\) matrix, affecting its condition number. The matrix \(\mathbf{P}_k\) can be mathematically non-singular but computationally can be singular due to finite precision.

The basic idea behind Square root filtering is to find a matrix \(\mathbf{S}\) such that \[ \mathbf{P} = \mathbf{S}\mathbf{S}^T. \]

\(\mathbf{S}\) is called the square root of matrix \(\mathbf{P}\). There are multiple definitions for matrix square root, like \(\mathbf{P} = \mathbf{S}^2\), but the above one is used in Simon (2006).

The \(\mathbf{S}\) will be propagated instead of \(\mathbf{P}\), although it is expensive. Here, the trade off between computational cost and numerical precision is done.

The Cholesky matrix square root algorithm was given in the book to compute \(\mathbf{S}\).

The square root time-update equation (i.e. the priori equation for \(\mathbf{S}\)) was derived and later the Potter’s square root measurement-update equation (i.e. the posteriori equation for \(\mathbf{S}\)) was derived. An alternate method based on triangularization for Potter’s method was also given.

The key points in Square root filtering are as follows.

The mathematical precision on propagating \(\mathbf{P}\) is increased by propagating \(\mathbf{S}\) instead of \(\mathbf{P}\), which is a matrix square root of \(\mathbf{P}\).
This precision problem could arise only on online mode with embedded systems that have low hardware precision. It is mostly not to worry for offline 64 bit precision computations.

4.4 U-D Filtering

It is another way to increase the numerical precision of the Kalman filter.

The idea is to decompose \(n \times n\) matrix \(\mathbf{P}\) into \(\mathbf{U}\mathbf{D}\mathbf{U}^T\), where \(\mathbf{U}\) is an upper triangular matrix with 1’s on the diagonal and \(\mathbf{D}\) is a diagonal matrix.

The U-D factorization is not much complicated, it is expensive but not as much as the square root filtering.

The method also applies sequential filtering (Section 4.1), so, the following conditions apply here as well.

Either the \(\mathbf{R}_k\) is diagonal, or
The \(\mathbf{R}_k\) is a constant (i.e. time-invariant)

All the equations and algorithm were given in the book.

4.5 Summary

In this chapter, the different choices when implementing Kalman filter was discussed. The choices vary based on the need of accuracy or simplicity. The choices were

Covariance filtering or information filtering
Standard filtering, square root filtering, or U-D filtering
Batch filtering, or sequential filtering

Any of these choices can be made independently of the other choices.