The Kalman filter is an algorithm that estimates the values of unknown, or latent, variables from a series of noisy measurements taken over time. The Kalman filter has numerous applications in finance and different areas of engineering, and in this blog post, we will show how the Kalman filter can apply to student learning. We will use a Kalman filter to infer an approximation to a student’s ability given their response times to a set of questions.
To fully understand the concepts below, you’ll need a background in basic probability and statistics.
To begin, we are going to describe our inference problem and a model of the system – a set of assumptions of how student learning works.
Given a series of scalar measurements:
where denotes the natural logarithm of the time it took our student to answer question ,
We want to infer the scalar latent variables:
where the student has ability at the time question is answered.
We model the change in student ability over time as a Gaussian random walk, meaning that the current ability value is based on the previous ability value, plus some Gaussian noise:
, where is drawn from , where is the time the student spent between questions, and is a hyperparameter that corresponds to the variance for this latent process (1).
Having the variance of the noise increase linearly with the time difference makes our equation consistent with a continuous Gaussian random walk, and is consistent with our intuition that a student’s ability needs time to change. In other words, a student is unlikely to experience significant change in ability if they’re between questions in a quiz, but if it’s been a day since they’ve answered a question, they’re more likely to have taken the time to learn more about the concept, or, conversely, they might have forgotten the material. Because the latent state variance is time-dependent, our filter is technically called a hybrid Kalman filter, since it assumes a continuous-time model for a discrete set of observations.
We don’t assume that the student ability accounts for all the variability in the log of the student response time, . For example, it’s possible that the student is familiar with this particular problem or is distracted by her environment. Therefore, we say that the observed times are likewise corrupted by some Gaussian noise, :
, where is drawn from , where is a hyperparameter that corresponds to the variance of the Gaussian noise (2).
The resulting model is pictured by the diagram above: the ability of the student at the previous question determines the log response time . The ability of the student at current question is determined by the ability at the previous question , and determines the current log response time, .
The Kalman filter is an algorithm for estimating each a posteriori — that is, it computes an estimate for each given observations . It does this recursively, which means that it only needs an estimate of , along with observation , to output an estimate for .
To make a new estimate, we first need to compute two intermediate pieces of information: a prior distribution and a likelihood distribution.
A note on syntax:
denotes our observations through , and represents our estimate of ability at the k-1th question given observations ; likewise represents our estimate of the variance given observations .
denotes the Gaussian probability density function (pdf) with mean and variance :
Calculating our prior distribution
Our prior term represents the knowledge we have of current latent state having seen everything but our current observation .
To calculate our prior, we start off with an estimate of , represented as a Gaussian distribution with mean and variance :
From equation 1, we see that is simply the addition of two independent random Gaussian random variables, and .
From probability theory, we know that the probability density of the addition of two independent random variables can be expressed as a convolution of the two composite probability densities. It happens that the convolution of two Gaussians is also a Gaussian:
We call the prior knowledge we have about . The next step is to look at the current observation, and see what information it adds.
Calculating our likelihood distribution
Our likelihood term represents the information our current observation gives us about our latent state .
From equation 2, we see that likelihood of our observation given our hidden variable is simply a Gaussian centered at . This becomes our likelihood term:
Combining prior and likelihood
The Kalman filter combines the prior knowledge we have about and our likelihood term in accordance with Bayes’ rule, by multiplying the prior term with the likelihood term. We call this resulting distribution the posterior, or our estimate of given all the information we have.
Luckily, the multiplication of two Gaussians is still a Gaussian, although unnormalized:
, where is a normalizing constant, and where:
To summarize, given a Gaussian posterior distribution for (Equation 3) and a new observation , the Kalman filter estimates a new Gaussian posterior for (Equation 6). By updating the Kalman filter as we receive new observations, we can obtain fast, real-time estimates of our latent state.
Open Source Code
An open source implementation of this hybrid Kalman filter algorithm is on Knewton’s GitHub:
Sophie Chou is a senior at Columbia University majoring in Computer Science. She’s interested in Machine Learning, Natural Language Processing, and becoming a better teacher. You can follow her on twitter @mpetitchou.
Andersen Chen is a senior at Brown University, majoring in Mathematics and Computer Science. He’s interested in data science, startups, and machine learning.
What's this? You're reading N choose K, the Knewton tech blog. We're crafting the Knewton Adaptive Learning Platform that uses data from millions of students to continuously personalize the presentation of educational content according to learners' needs. Sound interesting? We're hiring.