$$ x^+_0 = E[x_0] = \frac{1}{N}\sum_{i=0}^{N}x_i $$
$$ \boldsymbol{P}^+_0 = E \left[ (x_0 - x^+_0)(x_0 - x^+_0)^T \right] $$
The initialization of the EKF occurs at $k=0$. The variable $x^+_0$ is defined as being the mean value of $x$ over N iterations. Following this, $x_0$ is derived from the IMU measurements and used to obtain $P^+_0$.
$$ \hat{x}_{k|k-1} = f(\hat{x}_{k-1|k-1}, u_k) \qquad \rightleftharpoons \qquad \begin{aligned} x_k^- & = \boldsymbol{\Phi}(x^+_{k-1}, u_{k-1}) \\ & = \boldsymbol{A}_{k-1} x^+_{k-1} + \boldsymbol{B}_{k-1}u_{k-1} \end{aligned} $$ $$ x_k^- [7\times1] = [7\times7] [7\times1] + [7\times3][3\times1] $$
The first step of the EKF Algorithm is to produce an estimated value $x^-_k$. The inputs for this step is the previous value $x^+_{k-1}$, and the Gyroscope measurements for the previous time step, $u_{k-1}$, which should be in radians/second.
The A and B matrices are application dependent, however for an IMU implementation containing an Accelerometer, Gyroscope and Magnetometer, they are shown below:
$$ A_{k-1} = \begin{bmatrix} I_{4x4} & -\frac{T}{2}S(q) \\ 0_{3x4} & I_{3x3} \\ \end{bmatrix}_{k-1} = \begin{bmatrix} 1 & 0 & 0 & 0 & -\frac{T}{2}(-q_1) & -\frac{T}{2}(-q_2) & -\frac{T}{2}(-q_3) \\ 0 & 1 & 0 & 0 & -\frac{T}{2}(\phantom{-}q_0) & -\frac{T}{2}(-q_3) & -\frac{T}{2}(\phantom{-}q_2) \\ 0 & 0 & 1 & 0 & -\frac{T}{2}(\phantom{-}q_3) & -\frac{T}{2}(\phantom{-}q_0) & -\frac{T}{2}(-q_1) \\ 0 & 0 & 0 & 1 & -\frac{T}{2}(-q_2) & -\frac{T}{2}(\phantom{-}q_1) & -\frac{T}{2}(\phantom{-}q_0) \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ \end{bmatrix}_{k-1} \qquad x^+_{k-1} = \begin{bmatrix} q_0 \\ q_1 \\ q_2 \\ q_3 \\ b_x \\ b_y \\ b_z \\ \end{bmatrix} $$
$$ B_{k-1} = \begin{bmatrix} \frac{T}{2}S(q) \\ 0_\mathrm{3x3} \\ \end{bmatrix}_{k-1} = \frac{T}{2} \begin{bmatrix} -q_1 & -q_2 & -q_3 \\ \phantom{-}q_0 & -q_3 & \phantom{-}q_2 \\ \phantom{-}q_3 & \phantom{-}q_0 & -q_1 \\ -q_2 & \phantom{-}q_1 & \phantom{-}q_0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ \end{bmatrix}_{k-1} \qquad u_{k-1} = \begin{bmatrix} w_x \\ w_y \\ w_z \\ \end{bmatrix}_{k-1} $$
Where $S(q)$ is the system dynamic equation using Quaternions.
$$$$
Next, we compute the estimated Covariance matrix, $P^{-}_{k}$. Where $Q_{k-1}$ is the Process noise covariance.
$$ \boldsymbol{P}_k^- = \boldsymbol{A}_{k-1} \boldsymbol{P}^+_{k-1} \boldsymbol{A}^T_{k-1} + \boldsymbol{Q}_{k-1} $$
$$ P_k^- [7\times7] = [7\times7][7\times7][7\times7]^T + [7\times7] $$
The first step is to calculate the Measurement Innovation, which is the error between the measurement values and the rotated estimation of the predicted values.
$$ \tilde{y}_k = \left[ z_k - h_k(\hat{x}_{k|k-1}) \right] \qquad \rightleftharpoons \qquad d_k = \left[ z_k - h_k(x^-_k)V_x^{ref} \right] = \left[ z_k - d^-_k \right] $$
$$ d_k[6\times1] = \left( [6\times1] - \begin{bmatrix} \left[3\times3\right] & \left[3\times1\right] \\ \left[3\times3\right] & \left[3\times1\right] \\ \end{bmatrix} \right) = [6\times1] - [6\times1] $$
The measurement function, $h_{k}\left(*\right)$, which is typically a transformation from the calculated coordinate system to the measurement body coordinate system. Due to our predicted quaternion, $x^-_k$, being calculated in the NED-Frame (Global), we must rotate the quaternion from the NED-Frame to the Body-Frame (Local) using $R^b_n$.
$$ h(q_k) = h(\hat{x}_{k|k-1}) = h(x^-_k) = R^b_n $$
We then multiply the Rotated matrix by the reference vector, $V^{ref}_x$. The $V^{ref}_x$ vectors are the North and Down vectors for the NED-coordinate frame, as defined below:
$$ V^{ref}_x = \begin{bmatrix} V_x \\ V_y \\ V_z \end{bmatrix} \qquad \rightarrow \qquad V^{ref}_N = \begin{bmatrix} M_x \\ M_y \\ 0 \end{bmatrix} \qquad V^{ref}_D = g* \begin{bmatrix} \phantom{-}0 \\ \phantom{-}0 \\ -1 \end{bmatrix} $$
$$$$
Next, we are able to calculate the Innovation Covariance matrix, $S_k$: $$ \boldsymbol{S}_k = \boldsymbol{H}_k \boldsymbol{P}_{k|k-1} \boldsymbol{H}^T_k + \boldsymbol{R}_k \qquad \rightleftharpoons \qquad \boldsymbol{S}_k = \boldsymbol{H}_k \boldsymbol{P}^-_k \boldsymbol{H}^T_k + \boldsymbol{R}_k $$
$$ \boldsymbol{S}_k [6\times6] = [6\times7][7\times7][7\times6] + [6\times6] $$
Where $H_k$ is the Jacobian matrix of the measurement function, $h(*)$.
$$ H_k = \left. \frac{\partial h}{\partial x}\right\vert_{\hat{x}_{k|k-1}} = \left.\frac{\partial h (x_k, u_k)}{\partial x}\right\vert_{x_k = \hat{x}^-_k} $$
The generalized Jacobian Matrix, $H_k$, may be calculated using a reference vector as shown below. The predicted value of the quaternion, $x^-_\mathrm{k}$ is contained within the calculation of the Jacobian.
$$ H_k = \left.\frac{\partial h^V(q_k)}{\partial(q_k)} \right\vert_\mathrm{q_\mathrm{k|k-1}} = \left.\frac{\partial h(x_k,u_k)}{\partial x}\right\vert_\mathrm{x_k=x^-_k} $$
$$ H_k = 2 * \begin{bmatrix} \begin{bmatrix} \phantom{-}q_0 & \phantom{-}q_3 & -q_2 \end{bmatrix} V_x^\mathrm{ref} & \begin{bmatrix} \phantom{-}q_1 & \phantom{-}q_2 & \phantom{-}q_3 \end{bmatrix} V_x^{ref} & \begin{bmatrix} -q_2 & \phantom{-}q_1 & -q_0 \end{bmatrix} V_x^{ref} & \begin{bmatrix} -q_3 & \phantom{-}q_0 & \phantom{-}q_1 \end{bmatrix} V_x^{ref} \\[0.5em] \begin{bmatrix} -q_3 & \phantom{-}q_0 & \phantom{-}q_1 \end{bmatrix} V_x^{ref} & \begin{bmatrix} \phantom{-}q_2 & -q_1 & \phantom{-}q_0 \end{bmatrix} V_x^{ref} & \begin{bmatrix} \phantom{-}q_1 & \phantom{-}q_2 & \phantom{-}q_3 \end{bmatrix} V_x^{ref} & \begin{bmatrix} -q_0 & -q_3 & \phantom{-}q_2 \end{bmatrix} V_x^{ref} \\[0.5em] \begin{bmatrix} \phantom{-}q_2 & -q_1 & \phantom{-}q_0 \end{bmatrix} V_x^{ref} & \begin{bmatrix} \phantom{-}q_3 & -q_0 & -q_1 \end{bmatrix} V_x^{ref} & \begin{bmatrix} \phantom{-}q_0 & \phantom{-}q_3 & -q_2 \end{bmatrix} V_x^{ref} & \begin{bmatrix} \phantom{-}q_1 & \phantom{-}q_2 & \phantom{-}q_3 \end{bmatrix} V_x^{ref} \end{bmatrix}_{k} $$
Where $V_x^\mathrm{ref}$ are the reference vectors defined above.
$$$$
Now we are able to calculate the Kalman Gain by using the predicted Covariance matrix, $P^-_k$.
$$ \boldsymbol{K}_{k} = \boldsymbol{P}^{-}_{k} \boldsymbol{H}^{T}_{k} \left( \boldsymbol{H}_{k} \boldsymbol{P}^{-}_{k} \boldsymbol{H}^{T}_{k} + \boldsymbol{R}_k \right)^{-1} $$
$$ \boldsymbol{K}_k[7\times6] = [7\times7][7\times6]\left([6\times7][7\times7][7\times6] + [6\times6]\right)^\mathrm{-1} = [7\times6]\left( [6\times6]\right) $$
$$$$
Once the Kalman Gain has been determined, we can use it to calculate our current state estimate, $x^+_k$, by using the previously predicted estimate state,$x^-_k$, and our measurement innovation, $d_k$.
$$ x^+_k = x^-_k + \boldsymbol{K}_k [ d_k ] = x^-_k + \boldsymbol{K}_k \left[ z_k - h_k(x^-_k)V^\mathrm{ref}_x \right] $$
$$ x^+_k[7\times1] = [7\times1] + [7\times6][6\times1] $$
$$$$
Next, we use the estimated value of $x^-_k$, the Kalman Gain, $K_k$, the measured values of the sensors, $z_k$, and the estimated values of the sensors based on the rotated value of the predicted quaternion, $x^{-}_{k-1}$.
$$ \boldsymbol{P}^+_k = \left( \boldsymbol{I} - \boldsymbol{K}_k \boldsymbol{H}_k \right) \boldsymbol{P}^-_k $$
$$ \boldsymbol{P}^+_k[7\times7] = \left([7\times7] - [7\times6][6\times7]\right)[7\times7] = [7\times7][7\times7] $$
Now we can calculate the Covariance Matrix, $P_k$. The identity matrix, I, is used to limit the divergence of the EKF.
$$$$
After updating the system state variables, $x^+_k$ and $P^+_k$, we are able to calculate the system residual error, $\varepsilon_k$.
$$ \varepsilon_k = \left[ z_k - h_k(x^+_k)V^\mathrm{ref}_x \right] $$
$$ \varepsilon_k[6\times1] = \left( [6\times1] - \begin{bmatrix} \left[3\times3\right] & \left[3\times1\right] \\ \left[3\times3\right] & \left[3\times1\right] \\ \end{bmatrix} \right) = [6\times1] - [6\times1] $$
With the system residual error, $\varepsilon_k$, we can now update the Process Noise Covariance, $Q_k$, and Measurement Noise Covariance, $R_k$, using the equations below.
$$ \boldsymbol{R}_k = \alpha\boldsymbol{R}_{k-1} + (1-\alpha)\left( \varepsilon_k \varepsilon^T_k + \boldsymbol{H}_k \boldsymbol{P}^-_{k} \boldsymbol{H}^T_k \right) \qquad \boldsymbol{Q}_k = \alpha\boldsymbol{Q}_{k-1} + (1 - \alpha)\left( \boldsymbol{K}_k d_k d^T_k \boldsymbol{K}^T_k \right) $$
Following this, we can update our previous value variables, $x_\mathrm{k-1}$ and $P_\mathrm{k-1}$.
$$ x^{+}_{k-1} = x^+_k \qquad \boldsymbol{P}^{+}_{k-1} = \boldsymbol{P}^+_k $$