State Space Models for Time Series Analysis and the dlm package

State space modelling is a popular technique for forecasting and smoothing time series data. There are two main components which make up state space models, an observed data and the unobserved states,

    \[ \begin{aligned} \text{observed data:  }& y_1, \dots, y_T \\ \text{unobserved data:  }& x_1, \dots, x_T \end{aligned} \]

The observed data are conditionally independent given the states. The transition probabilities of the states are defined as P(x_t|x_{t-1}) implying the Markov property of the probability of moving to the next state is only dependent on the previous state for all t=1, \dots, T. The probability of observing y at time t is given by P(y_t|x_t) implying the observed data is conditional on the current state.

Dynamic linear models are a special case of state space models where the errors of the state and observed components are normally distributed. The formulation of these models is as follows

    \[ \begin{aligned} x_0 & \sim N(m_0, C_0) \\ x_t|x_{t-1} & \sim N(G_t x_{t-1}, W_t) \\ y_t|x_t & \sim N(F_t x_t, V_t) \end{aligned} \]

Where

  • F_t is a p \times m matrix
  • G_t is a p \times p matrix
  • V_t is an m \times m matrix
  • W_t is a p \times p matrix

The formulation as a set of equations is given by

    \[ \begin{aligned} y_t &= F_t x_t + \nu_t \\ x_t &= G_t x_{t-1} + \omega_t \end{aligned} \]

where the errors are iid

    \[ \begin{aligned} \nu_t &\sim N(0, V_t) \\ \omega_t &\sim N(0, W_t) \\ x_0 &\sim N(m_0, C_0) \end{aligned} \]

Under this formulation there is huge flexibility and can specify a wide range of models such as high order polynomials, regression and arima models with seasonal components.

Time series analysis with the DLM package and the Kalman filter

The dlm package in r is fantastic. It is modular so you have the freedom to build models with multiple components for example you can specify a linear trend model with a quarterly seasonal component. In this example we will look at forecasting the co2 and air passenger data by specifying our own state space model. This will include applying a Kalman filter, Kalman smoothing, estimation of the parameters and finally forecasting the 6 years of observations.

We should all know the co2 data

data(co2)
suppressPackageStartupMessages(library(forecast)) 
suppressPackageStartupMessages(library(dlm))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(zoo))
suppressPackageStartupMessages(library(gridExtra))

# the co2 dataset
ggplot(data.frame(x=index(co2), y=co2), aes(x=x, y=y)) + geom_line()
## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.

plot of chunk unnamed-chunk-29

 

It’s a classic data set with a very obvious increasing trend and cyclic component. There is 1 observation for each month, therefore we’re looking at a 12 order seasonal component. Data like this doesn’t usually happen in the real world! It’s often much noisier and unclear what the underlying pattern is. Let’s make this more interesting by adding in a bit of noise.

# noisy co2
co2 <- co2 + rnorm(length(co2), 0, 2)
ggplot(data.frame(x=index(co2), y=co2), aes(x=x, y=y)) + geom_line()
## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.

plot of chunk unnamed-chunk-30

 

Much better! Firstly we’ll write a function to build the state space model.

# forecasting using state sapce models
model.build <- function(p) {
    return(
      dlmModPoly(2, dV=p[1], dW=p[2:3]) +
      dlmModSeas(12, dV=p[4])
    )
}

 

If we know the parameters we can plug them straight into the model, however that’s not usually the case. Instead we estimate them using MLE.

model.mle <- dlmMLE(co2, parm=c(0.1, 0, 1, 1), build=model.build)
if(model.mle$convergence==0) print("converged") else print("did not converge")
## [1] "did not converge"
model.mle$par
## [1] 9.036397e-01 0.000000e+00 9.934028e-06 1.803640e+00

 

It’s best practice to check the convergence of the MLE procedure. As with any MLE process we need to have appropriate initial starting points to ensure the algorithm will converge to the right maximum. The dlmMLE function calls the base R function optim so there is some flexibility in optimisation techniques if needed. These ML estimates are now passed to the model function to build the final parameterisation of the model.

model.fit <- model.build(model.mle$par)

 

Now we can apply a Kalman filter to the data using the dlmFilter function.

model.filtered <- dlmFilter(co2, model.fit)

 

It’s worth spending some time with this output to understand what has been calculated. There are 3 main output components:

  1. a is the state forecast for one time step and is calculated by

        \[a_t = G_t m_{t-1}\]

    where

        \[x_t|y_{t-1} \sim N(a_t, R_t)\]

    and

        \[R_t = G_t C_{t-1} G_t^T + W_t\]

  2. f is the forecast step of the observations and is calculated by

        \[f_t = F_t a_t\]

    where

        \[y_t|y_{t-1} \sim N(f_t, Q_t)\]

    and

        \[Q_t = F_t R_{t-1} F_t^T + V_t\]

  3. m is the posterior and is calculated by

        \[m_t = R_t f_t Q_t^{-1}(y_t-f_t)\]

    and

        \[C_t = R_t - R_t F_t^T Q_t^{-1} F_t R_t\]

The a and m components have 13 columns where columns 3 to 13 are the seasonal components of the model. The first 2 columns are the linear trend components. This is convenient to know since we can easily plot the trend without the seasonal component if we wish. If we wish to plot the forecast observations we would use f. Just to convince ourselves that this is correct let’s calculate a from the output and check.

all(model.filtered$m %*% t(model.fit$GG) == model.filtered$a) 
## TRUE

 

We can also add a smoother to the data as follows.

# kalman smoother
model.smoothed <- dlmSmooth(co2, model.fit)

 

The main output here is s. Again there are 13 columns, the first 2 from the linear trend model and the last columns from the seasonal component. Plotting these together we can see the relationship and accuracy of the forecasting.

n <- 6*12
model.forecast <- dlmForecast(model.filtered, nAhead=n)

x <- index(co2)
xf <- seq(max(x), max(x)+n/12, 1/12)
aa <- model.forecast$a[,-1]*(-1)
aa <- cbind(model.forecast$a[,1], aa)
a <- drop(model.forecast$a%*%t(FF(model.fit)))
a <- c(tail(co2,1), a)
df <- rbind(
  data.frame(x=x, y=as.numeric(co2), series="original"),
  data.frame(x=x, y=apply(model.filtered$m[-1,1:2], 1, sum), series="filtered"),
  data.frame(x=x, y=apply(model.smoothed$s[-1,1:2], 1, sum), series="smoothed"),
  data.frame(x=xf, y=a, series="forecast")
)
g.dlm <- ggplot(subset(df, x>1970), aes(x=x, y=y, colour=series)) + geom_line()
g.dlm

plot of chunk unnamed-chunk-37

 

As you can see this forecasts fairly well. The filtered and smoothed series are pretty similar. Here we have ignored the seasonal components and just plot the trend components to demonstrate it’s utility.

For reference the same data will be forecast using the a traditional ARIMA model.

# forecasting using arima
model <- auto.arima(co2)
model.forecast <- forecast(model, h = 6*12)
plot(model.forecast)

plot of chunk unnamed-chunk-38

 

Interestingly it doesn’t pick up on the seasonal component like the state space model. We could formulate the ARIMA model manually to improve on this (usually auto.arima does a pretty good job though).

Air passenger data

The air passenger dataset is another classic. It has an obvious increasing trend and an annual seasonal cycle, therefore we can essentially use the SSM we build above, only re-estimating the parameters. We do need to apply a log transform to the data first to counter the heteroscedastic variance.

air <- AirPassengers
log.air <- log(air)
grid.arrange(
  qplot(x=index(air), y=air, geom = "line"),
  qplot(x=index(air), y=log.air, geom = "line"),
  nrow=2
)
## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.
## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.

plot of chunk unnamed-chunk-39

 

Now applying the state space model as we did above for the co2 data. For this data we’ll forecast the last 2 years and compare with the observed data.

# forecasting using state sapce models
model.build <- function(p) {
  return(
    dlmModPoly(2, dV=p[1], dW=p[2:3]) + 
      dlmModSeas(12, dV=p[4])
  )
}

#log.air <- log(air) + rnorm(length(log.air), 0, 0.15)
log.air <- log(air)
train <- log.air[1:120]
test <- log.air[121:144]

model.mle <- dlmMLE(train, parm=c(1, 1, 1, 1), build=model.build)
model.fit <- model.build(model.mle$par)
model.filtered <- dlmFilter(train, model.fit)
model.smoothed <- dlmSmooth(train, model.fit)

n <- 2*12
model.forecast <- dlmForecast(model.filtered, nAhead=n)

x <- index(log.air)
a <- drop(model.forecast$a%*%t(FF(model.fit)))
df <- rbind(
  data.frame(x=index(log.air), y=as.numeric(log.air), series="original"),
  data.frame(x=x[1:120], y=apply(model.filtered$m[-1,1:2], 1, sum), series="filtered"),
  data.frame(x=x[1:120], y=apply(model.smoothed$s[-1,1:2], 1, sum), series="smoothed"),
  data.frame(x=x[121:144], y=a, series="forecast")
)
g.dlm <- ggplot(subset(df, x>1950), aes(x=x, y=y, colour=series)) + geom_line()
g.dlm

plot of chunk unnamed-chunk-40

 

The forecast series and the original observed series are very close. In this case the state space model and done very well in forecasting the time series.

Summary

In summary here we have

  1. Formulated state space models under the dynamic linear model framework where the errors are assumed to be normally distributed.
  2. Forecast a noisy co2 series with a seasonal and trend component and
  3. Forecast the air passenger data using the linear trend plus seasonal model.

The dlm package is very flexible to handle a number of different model specifications. What I really like about this package is that it’s not a black box, you are required to think about the best specification for your model and interpret the output in context of what you aim to achieve.

There is a lot more to explore with this package such as implementing a regression component, ARIMA model under a state space framework and MCMC approaches for parameter estimation. I aim to dive deeper in later posts.

Follow me on social media: