In this post, we explore a probabilistic model of student learning and assessment that can be used to *recommend personalized lesson sequences*. A student using Knewton-powered products currently receives content recommendations one step at a time. The model presented in this blog post is motivated by the idea that such a student could receive additional context and motivation by also seeing the most likely trajectory, or sequence of recommendations, leading to her goals, given her current state.

**Problem**

Students using Knewton-powered adaptive learning products are working in an online environment, acquiring knowledge that enables them to pass specific assignments and achieve their learning goals. These students receive recommendations from Knewton about what to work on, in a sequence that is determined by the work the student has previously done.

The goal of this blog post is to explore a new way of evaluating pathways through content based on their ability to improve student learning. We leverage existing log data (entries that look like *Student A completed Lesson B* and *Student C passed assessment D*) to compare the performance of different lesson sequences. Moreover, our method can be leveraged to generate displayable lesson sequences that maximize educational gains while meeting practical time constraints (i.e., the student has at most days to maximize her probability of passing goal assessments). The additional context these sequences of recommendations provide could enhance students’ motivation, and consequently their overall performance.

**General Approach**

We use historical student-module interactions to learn a joint *probabilistic embedding* of the latent skills of students, skill requirements of assessments, and skill gains from lessons (modulated by prerequisite knowledge). Through offline evaluations on log data, we have shown that this model can successfully distinguish between lesson sequences that lead students to mastery or failure. In this post, we will outline how the model can be used to recommend lesson sequences in real time.

**Model**

Probabilistic embeddings are used for a variety of tasks, including music playlist prediction,^{1} feedback propagation in MOOCs,^{2} visualization of high-dimensional data,^{3} and language modeling for natural language processing.^{4} We are interested in embeddings that perform nonlinear dimensionality reduction on high-dimensional data, and also can be used to learn useful relationships between objects that are not explicitly expressed in the data. We will use a probabilistic embedding to capture the dynamics of student learning and testing, and in the process we will describe a method to recommend personalized lesson sequences. Our approach resembles a proposed sparse factor analysis method.^{5}

In our model, students, lessons, and assessments are jointly embedded in a low-dimensional semantic space that we call the “latent skill space.” Students have trajectories through the latent skill space, while assessments and lessons have fixed locations. Formally, a student is represented as a set of latent skill levels that can change over time; a lesson module is represented as a vector of skill gains and a set of prerequisite skill requirements ; an assessment module is represented as a set of skill requirements .

Students interact with lessons and assessments in the following way. First, a student can be tested on an assessment module with a pass-fail result , where the likelihood of passing is high when a student has skill levels that exceed all of the assessment requirements and vice versa. Second, a student can work on lesson modules to improve skill levels over time. To fully realize the skill gains associated with completing a lesson module, a student must satisfy prerequisites, though fulfilling some of the prerequisites to some extent will result in relatively small gains. Time is discretized such that at every timestep , a student completes a lesson and may complete zero or many assessments. The evolution of student knowledge can be formalized as an input-output hidden Markov model.

There are two equations that characterize the dynamics of the latent skill space: the “assessment result likelihood,” and the “learning update.” Both are discussed in the following two sections.

**Modeling Assessment Results**

For student , assessment , and pass-fail result ,

where (\phi\) is the sigmoid function, and

where . A pass result is indicated by , and a fail by . The term can be rewritten as , where is the angle between and ; it can be interpreted as “relevant skill”. The term can be interpreted as assessment difficulty. The bias term is a student-specific term that captures a student’s general (assessment-invariant and time-invariant) ability to pass; it can be interpreted as a measure of how often the student guesses correct answers. The bias term is a module-specific term that captures an assessment’s general (student-invariant and time-invariant) difficulty. These bias terms are analogous to the bias terms used for modeling song popularity in playlists.^{1}

**Modeling Student Learning from Lessons**

For student who worked on a lesson with skill gains and prerequisites at time , the updated student state is

where and is a constant. The intuition behind this equation is that the skill gain from a lesson should be weighted according to how well a student satisfies the lesson prerequisites. A student can compensate for lack of prerequisites in one skill through excess strength in another skill, but the extent to which this trade-off is possible depends on the lesson prerequisites. The same principle applies to satisfying assessment skill requirements. Figure 0 illustrates the vector field of skill gains possible for different students under fixed lesson prerequisites. Without prerequisites, the vector field is uniform.

**Parameter Estimation**

How do we take a bunch of logged student-module interactions and turn them into an embedding? We compute maximum a posteriori (MAP) estimates of model parameters by maximizing the following objective function:

where is the set of assessment interactions, is the set of lesson interactions, is a regularization term that penalizes the norms of embedding parameters to reduce overfitting, and is a regularization parameter. Non-negativity constraints on embedding parameters are enforced. We solve the optimization problem using the L-BFGS-B algorithm, which extends the popular L-BFGS algorithm to handle box constraints. We randomly initialize parameters and run the iterative optimization until the relative difference between consecutive objective function evaluations is less than . Averaging validation accuracy over multiple runs during cross-validation reduces sensitivity to the random initializations (since the objective function is non-convex).

**Toy Examples**

To illustrate how an embedding can capture the underlying geometry of different scenarios, we conducted a series of experiments on small, synthetically-generated interaction histories.

**Application**

We have some evidence that an embedding can make good sequence recommendations in an offline evaluation. How can we use an embedding to make sequence recommendations in an online setting? Formally, we must address the following problem: *Given a student’s current skill levels and a set of assessments the student wants to pass, what is the optimal sequence of lessons for the student?* We can formulate the problem two ways: specify a minimum pass likelihood for the goal assessments and find the shortest path to mastery, or specify a time constraint and find the path that leads to the highest pass likelihood for the goal assessments while not exceeding the prescribed length.

In general, both problems can be tackled by using a trained Latent Skill Embedding model to specify a Markov Decision Process, in which the possible states are given by the student embedding, the set of possible actions corresponds to the set of lesson modules that the student can work on, the transition function is the learning update (Equation 2), and the reward function is the likelihood of passing all goal assessments (Equation 1) discounted by the expected time taken to complete the lesson sequence. Various methods for finding the optimal policy for an MDP with a continuous state space, discrete action space, and probabilistic transitions have been explored in the literature.^{6}

For an embedding model without lesson prerequisites, the order in which lessons are completed is irrelevant since sums are commutative. Time-constrained lesson recommendation thus reduces to the 0-1 knapsack problem. In this case, each lesson contributes skill gains, but costs the student an expected amount of time,which can be predicted using matrix factorization. The 0-1 knapsack problem can be solved efficiently using dynamic programming.

In practice, content recommendations should intersperse assessments in the lesson sequence to keep students engaged and make sure they are learning. We are currently working on an algorithm that incorporates this idea into the sequence optimization problem.

**Implementation**

A Python implementation of the Latent Skill Embedding model is available at https://github.com/Knewton/lentil. The IPython notebooks used to perform data exploration, model evaluations, sensitivity analyses, and synthetic experiments are available at https://github.com/rddy/lentil. The data sets from Knewton used for these experiments are not available to the public. The project will be maintained at http://siddharth.io/lentil.

**Authors**

Siddharth Reddy is a junior at Cornell University, studying computer science and applied mathematics. He’s interested in machine learning, adaptive education, and startups.

This work was a collaboration between Siddharth Reddy, Igor Labutov, and Thorsten Joachims at Cornell University, in conjunction with the data science team at Knewton. The project was presented at the Machine Learning for Education workshop at ICML 2015 (see http://dsp.rice.edu/ML4Ed_ICML2015 for details).

**References**

- Shuo Chen, Joshua L Moore, Douglas Turnbull, and Thorsten Joachims.

Playlist prediction via metric embedding. In Proceedings of the 18th ACM

SIGKDD international conference on Knowledge discovery and data mining,

pages 714–722. ACM, 2012.

- Chris Piech, Jonathan Huang, Andy Nguyen, Mike Phulsuksombati, Mehran

Sahami, and Leonidas Guibas. Learning program embeddings to propagate

feedback on student code. arXiv preprint arXiv:1505.05969, 2015.

- Laurens Van der Maaten and Geoffrey Hinton. Visualizing data using t-sne.

Journal of Machine Learning Research, 9(2579-2605):85, 2008.

- Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. Linguistic regularities

in continuous space word representations. In HLT-NAACL, pages 746–751,

2013.

- Andrew S Lan, Christoph Studer, and Richard G Baraniuk. Time-varying

learning and content analytics via sparse factor analysis. In Proceedings of

the 20th ACM SIGKDD international conference on Knowledge discovery

and data mining, pages 452–461. ACM, 2014.

- Marco Wiering and Martijn van Otterlo. Reinforcement Learning: State-of-the-art,

volume 12. Springer Science & Business Media, 2012.