Redux at Knewton

About two years ago it became apparent that our frontend architecture was showing its limits. We dreaded adding new features because of side-effect bugs. Changing a dropdown could break unrelated parts of the UI. These issues occurred mostly because of the way we managed our application state, or actually didn’t: the DOM was the state.

It’s always the state

It’s easy to fall into this trap: you start with a simple frontend application with little to no JavaScript, because you set out to make an application that could work without JavaScript enabled. Then one day you need some form of widget that native HTML doesn’t support, so you add a jQuery plugin, and then another, and then a wizard form (we called it warlock because we’re funny). Before you know it, your application is a patchwork minefield of jQuery plugins, orchestrated by a master script that looks for data attributes or, even worse, class names.

Knewton, we have a problem

React

So we started looking into React. This is a commit from March 2015:

commit 89bca222aa25aca6f405be3b608acc189da9c739
Author: Desperate Engineer
Date:   Fri Mar 27 15:06:27 2015 -0400

    [FIX] Making the class warlock behave
    React save our souls

The next week we started rewriting our code base to use React. In our new state management, container components were in charge of maintaining an internal state and defining callbacks, and presentational components turned our state into pretty boxes and widgets. Container components retrieved the data from the backend, stored it in their internal state along with some metadata like the request state, and then rendered a loader icon or the content. Our React templates were declarative and composable, and lived in the same place as the branching logic. We were even unit testing frontend code with Jest!

Then the state got out of hand again.

The container component approach works well when the presentational components that require data are close to the container components that maintain it (in the react component tree). But what happens if two distant components need the same data? You have to either duplicate the data, or make a container component high enough in the tree to be a common ancestor. The first option is bad because you can have inconsistencies and you must maintain the state in two places, and the second option is tedious because you have to pass down props through a lot of components that don’t need them.

An alternative is to extract the state out of the component tree, and make containers subscribe to it. This way you can have container components close to the presentational components, but the container components are only replicas of the actual state. This is the approach in the Flux architecture. This state outside of the component tree is called a store in Flux.

Adding a store

Sort of Flux

In addition to having a global store, Flux uses a one-way data flow, which is a fancy way of saying that components are not allowed to update the store directly: components must send actions that a dispatcher will process to update the store. This approach was new to a lot of web engineers, but it was not exactly a new idea.
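To make the one-way flow concrete, here is a minimal, framework-free sketch of a store whose only write path is dispatch. It is not the Flux or Redux implementation: createStore, counterUpdate and the INCREMENT action are names made up for this illustration.

```javascript
// Minimal sketch of a store with one-way data flow (illustrative only).
// The state lives in a closure: the only way to change it is dispatch().
function createStore(update, initialState) {
    let state = initialState
    const listeners = []

    return {
        getState() {
            return state
        },
        // Components send actions; they never write to the state directly.
        dispatch(action) {
            state = update(state, action)
            listeners.slice().forEach(listener => listener())
            return action
        },
        // Containers subscribe and re-read the state when notified.
        subscribe(listener) {
            listeners.push(listener)
            return function unsubscribe() {
                const index = listeners.indexOf(listener)
                if (index !== -1) listeners.splice(index, 1)
            }
        }
    }
}

// Hypothetical update function, used only to exercise the store.
function counterUpdate(state, action) {
    switch (action.type) {
        case 'INCREMENT':
            return { ...state, count: state.count + 1 }
        default:
            return state
    }
}

const store = createStore(counterUpdate, { count: 0 })
const seen = []
const unsubscribe = store.subscribe(() => seen.push(store.getState().count))

store.dispatch({ type: 'INCREMENT' })
store.dispatch({ type: 'INCREMENT' })
unsubscribe()
store.dispatch({ type: 'INCREMENT' })

// seen is [1, 2]: the listener stopped being notified after unsubscribe(),
// while the store itself reached a count of 3.
console.log(seen, store.getState())
```

Because every change goes through dispatch, there is a single place to log, test, or replay what happened to the state.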

Our first attempt at (partially) implementing Flux relied on a store mixin, a deprecated React pattern that shares behaviors across components by overloading component lifecycle methods.

It solved our problems for simple use cases, but our Teacher dashboard data could be updated by multiple components. Our mixin solution did not implement unidirectional data flow, so this ownership was shared between the container components and even some presentational components. In spite of code reviews, we also had slightly different styles in the way we wrote components, and the sources lacked structure. We did not always know where a callback that changed the state was implemented.

Flux pattern

Because the state didn’t have a cohesive way of being updated, we had to rely on more mixins and shared functions to make components interact with the store. We needed the actions and the dispatcher from Flux.

Redux

Redux is a JavaScript library implementing Flux. It uses the React context to pass the store down to container components, each wrapped in a higher-order component. A higher-order component is a component that wraps another one: instead of overloading the wrapped component’s lifecycle methods the way a mixin does, it relies on its own. Facebook explained why they decided to deprecate mixins in this post: Mixins Considered Harmful.
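The wrapping idea can be sketched without React. In the example below, "components" are modeled as plain functions from props to a string so that it runs on its own. connectToStore is a hypothetical helper, not the react-redux API, and the store is passed explicitly rather than through the context. A real wrapper would also subscribe to the store and re-render on changes.

```javascript
// Framework-free sketch of the higher-order component pattern.
// connectToStore is a hypothetical helper written for this example.
function connectToStore(store, mapStateToProps) {
    return function wrap(Component) {
        // The wrapper reads the store and hands the result down as props.
        return function Connected(ownProps) {
            const stateProps = mapStateToProps(store.getState())
            return Component({ ...ownProps, ...stateProps })
        }
    }
}

// Stand-in store exposing only what the wrapper needs.
const fakeStore = {
    getState: () => ({ course: { name: 'Algebra I', students: 24 } })
}

// Presentational component: it knows nothing about the store.
function CourseTitle(props) {
    return `${props.prefix}${props.name} (${props.students} students)`
}

const ConnectedCourseTitle = connectToStore(
    fakeStore,
    state => ({ name: state.course.name, students: state.course.students })
)(CourseTitle)

console.log(ConnectedCourseTitle({ prefix: 'Course: ' }))
// Course: Algebra I (24 students)
```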

Programming modern web applications is hard. Just like React, Redux is not a silver bullet, and just like React, Redux has helped us solve problems that can be solved without it. It brings its own set of challenges, and after about 8 months of Redux, there are things that we’re happy about, and things that still need improvement.

Open Source versus Homemade, and the occasional contributor

Unless you have a lot of resources, open source solutions will often be superior to homemade solutions like our store mixin. It’s simple really: more people are using it, more people are encountering the same bugs, and more people have struggled with it, so there is better documentation. In the case of Redux, the documentation is exemplary, and is probably one of the biggest reasons behind Redux’s success over other Flux libraries. Dan Abramov (co-author of Redux) also released a series of video tutorials. This is critical in our team, where both backend and frontend engineers contribute to the frontend code. Having documented standards and tutorials helps to speed up onboarding, and requires less support from frontend engineers. It becomes easier for occasional contributors to jump into the code, look at the way things were done, and fix a bug or implement a new feature.

Redux encourages splitting out features into multiple files: one for the reducer (Flux’s dispatcher), one for the actions and the action creators, and one for the connected containers. In addition to these files, we also have route files for react-router, and saga files (more on this later). This is tedious at first, because you need to maintain more files, but it creates an explicit pattern that makes the sources easier to navigate. This structure makes it harder to write code in the wrong place.

Another benefit of using an open source library is the tooling and the middleware extensions developed by the community. The Redux chrome extension displays the actions and the state in real time, and can undo actions or display the diffs between two states. We use middleware to handle things that Redux does not support, like asynchronous actions.

How we finally tamed state management

Redux embraces the immutable app architecture (described in this excellent talk by Lee Byron). To keep state management simple, Redux recommends returning a new state after every action rather than mutating the current one. We do not use immutable data structures. Instead, we rely on a combination of code reviews and a Redux middleware in our dev environment to catch mistakes: redux-freeze. It is not as robust as using immutable data structures like the ones offered by Immutable.js, but it allows us to use native JS data structures, and we’re OK with the trade-offs.
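The idea behind freezing the state in development can be sketched in a few lines. This is only an illustration of the principle, not the redux-freeze implementation: deepFreeze and buggyReducer are hypothetical names.

```javascript
// Recursively freeze a state tree so that an accidental mutation throws
// instead of silently corrupting the store.
function deepFreeze(value) {
    // Already-frozen objects are skipped, which also guards against cycles.
    if (value === null || typeof value !== 'object' || Object.isFrozen(value)) {
        return value
    }
    Object.freeze(value)
    Object.keys(value).forEach(key => deepFreeze(value[key]))
    return value
}

// A buggy reducer that mutates its input instead of returning a new object.
function buggyReducer(state, action) {
    'use strict' // assignments to frozen objects only throw in strict mode
    state.step = state.step + 1
    return state
}

const frozenState = deepFreeze({ step: 0, history: [0] })

let caught = null
try {
    buggyReducer(frozenState, { type: 'CAROUSEL_NEXT' })
} catch (error) {
    caught = error
}

// The mutation was rejected with a TypeError and the state is unchanged.
console.log(caught instanceof TypeError, frozenState.step)
```

Running a check like this only in development keeps the cost of walking the tree out of production builds.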

Every action processed by a reducer produces a new state instead of mutating the old one. This can be costly in terms of performance if you use the wrong approach, such as deep copies. Immutable.js can help enforce immutability with little to no performance penalty, but at the cost of translating your data structures back and forth. Since we don’t use Immutable.js, we have to make sure every reducer function is a pure function. By using structural sharing, as Immutable.js does under the hood, we can achieve decent performance: most of the tree is reused and only the nodes with changes are replaced.

const defaultState = {
    step: 0,
    finalStep: 4
}
function carouselReducer(state = defaultState, action) {
    switch(action.type) {
        case CAROUSEL_NEXT:
            return {
              ...state,
              step: Math.min(state.step + 1, state.finalStep)
            }
        case CAROUSEL_PREVIOUS:
            return {
              ...state,
              step: Math.max(state.step - 1, 0)
            }
        default:
            return state
    }
}

The object property spread feature ({...state}) helps keep reducers concise and readable. It was not yet part of the standard at the time of writing (it has since been standardized in ES2018), but it is cleaner than the vanilla JavaScript equivalent:

case CAROUSEL_NEXT:
    return Object.assign({}, state,
      {
        step: Math.min(state.step + 1, state.finalStep)
      }
    )
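The structural sharing mentioned earlier shows up once the state is nested. In this sketch, only the path that changes is copied and every other branch is reused by reference. The state shape and renameCourse are made up for the illustration.

```javascript
// Structural sharing with plain objects and the spread syntax.
const before = {
    courses: {
        c1: { id: 'c1', name: 'Algebra' },
        c2: { id: 'c2', name: 'Biology' }
    },
    ui: { sidebarOpen: true }
}

function renameCourse(state, id, name) {
    return {
        ...state,
        courses: {
            ...state.courses,
            [id]: { ...state.courses[id], name }
        }
    }
}

const after = renameCourse(before, 'c1', 'Algebra I')

console.log(after === before)                        // false: new root
console.log(after.courses.c1 === before.courses.c1)  // false: changed node
console.log(after.courses.c2 === before.courses.c2)  // true: reused as-is
console.log(after.ui === before.ui)                  // true: reused as-is
console.log(before.courses.c1.name)                  // Algebra: old state untouched
```

The reference checks are also what makes change detection cheap: a component can compare the old and new branch with === instead of walking the data.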

Normalization and routing

Redux also helped us normalize our state. Instead of making nested payloads, we now return flatter payloads that will be normalized in the state. This normalization translates into reducers storing dictionaries of entities indexed by id. They can be used as a cache: if the entity does not need to be fresh and is already in the store, we can reuse it. Our payloads are smaller since entities can be referenced by id.

It seems like a minor victory for the memory footprint, but the biggest difference comes from the number of requests we need to make. Requests to the backend slow down the experience considerably, especially expensive analytics calls or history requests in our dashboards. Since our application is mostly single page with react-router, we were frequently loading data that had already been loaded. By normalizing it in the state, we are able to reuse it across the screens, and only load the missing analytics. This also means that you can start showing some of the data before everything is loaded, giving the user a snappier experience.

A reducer with courses indexed by id
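As a rough idea of what such a reducer can look like, here is a minimal sketch. The COURSES_LOADED action type, the payload shape and the missingCourseIds helper are assumptions for this example, not our production code.

```javascript
// Sketch of a reducer that stores courses in a dictionary indexed by id.
const COURSES_LOADED = 'COURSES_LOADED'

function coursesById(state = {}, action) {
    switch (action.type) {
        case COURSES_LOADED: {
            const next = { ...state }
            action.courses.forEach(course => {
                next[course.id] = course
            })
            return next
        }
        default:
            return state
    }
}

// Cache lookup: only the ids missing from the store need a request.
function missingCourseIds(state, ids) {
    return ids.filter(id => !Object.prototype.hasOwnProperty.call(state, id))
}

let courses = coursesById(undefined, { type: '@@INIT' })
courses = coursesById(courses, {
    type: COURSES_LOADED,
    courses: [
        // Related entities are referenced by id instead of being nested.
        { id: 'c1', name: 'Algebra', assignmentIds: ['a1', 'a2'] },
        { id: 'c2', name: 'Biology', assignmentIds: [] }
    ]
})

console.log(Object.keys(courses))                    // [ 'c1', 'c2' ]
console.log(missingCourseIds(courses, ['c1', 'c3'])) // [ 'c3' ]
```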

Although we do not use it at Knewton, normalizr can help automate the normalization by describing the API’s schemas.

Asynchronous actions

With React, the standard approach to loading data is to use componentDidMount() in a container component. When the component mounts, componentDidMount() loads the data asynchronously with jQuery or the fetch API. You can use the state of the component to keep track of the status of the request to display a loading icon or store an error message.

With Redux, however, the idea is to store as little data as possible in the state of components. What if two components need access to the same data? Which one should trigger a request? How do you even trigger a request and update the store?

We have two problems on our hands here:
1. How to make asynchronous requests to a backend using redux (what actions are you firing?)
2. What is triggering these requests to load data (componentDidMount in a container? which one is responsible then?)

It’s easier to explain Redux with a todo list app, but as soon as you start working on a real world application you will invariably run into these problems.

To solve them, most people add one of these middleware libraries to their Redux setup:
– The Redux docs suggest using redux-thunk. With redux-thunk, components can dispatch functions (thunks) instead of plain actions, and each function is in charge of dispatching the actions.
– redux-saga’s approach is to watch the actions being fired and trigger side effects (sagas). This is a simple yet powerful approach, and it means components still dispatch regular actions instead of dispatching complex flows. Interestingly, redux-saga uses generators to handle asynchronicity, which makes the sagas look more like synchronous code.
– Netflix uses redux-observable. Like redux-saga, it focuses on triggering side effects when certain actions happen. The middleware monitors action types and triggers “epics”, similar to the sagas in redux-saga.
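To illustrate the thunk approach from the first item, here is a sketch that uses a tiny hand-written store instead of Redux and redux-thunk. createThunkStore, loadCourse, fakeFetchCourse and the action types are hypothetical names. The only point is to show a dispatch that accepts functions.

```javascript
// Sketch of what a thunk-aware dispatch does.
function createThunkStore(reducer, initialState) {
    let state = initialState
    const getState = () => state
    function dispatch(action) {
        // A function is treated as a thunk: call it and let it dispatch
        // plain actions whenever it is ready.
        if (typeof action === 'function') {
            return action(dispatch, getState)
        }
        state = reducer(state, action)
        return action
    }
    return { dispatch, getState }
}

function courseReducer(state, action) {
    switch (action.type) {
        case 'COURSE_REQUESTED':
            return { ...state, isLoading: true }
        case 'COURSE_LOADED':
            return { ...state, isLoading: false, course: action.course }
        default:
            return state
    }
}

// Stand-in for a backend call.
function fakeFetchCourse(id) {
    return Promise.resolve({ id, name: 'Algebra' })
}

// Action creator returning a thunk instead of a plain action object.
function loadCourse(id) {
    return dispatch => {
        dispatch({ type: 'COURSE_REQUESTED' })
        return fakeFetchCourse(id).then(course =>
            dispatch({ type: 'COURSE_LOADED', course })
        )
    }
}

const thunkStore = createThunkStore(courseReducer, { isLoading: false, course: null })
const pending = thunkStore.dispatch(loadCourse('c1'))
// The first plain action was dispatched synchronously.
const loadingWhilePending = thunkStore.getState().isLoading

pending.then(() => console.log(loadingWhilePending, thunkStore.getState()))
```

Note that the component dispatching loadCourse now owns the whole flow, which is the coupling the saga approach avoids.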

The side-effect approach is powerful because it decouples the action (usually dispatched in your view) from the effects of the action in your application. It allows us to support behaviors like: “When a user navigates to screen A, update the page title in the browser and load the required data if not already available”. We were able to dramatically reduce the number of requests made to the backend to load data by combining sagas and our normalized redux state.

Although we will not dive into testing details, sagas are straightforward to test because you can easily control what the generator returns at every step. One of the downsides of sagas is that the sources transpiled for generators are hard to read: the transpiler turns all the possible states of the generator into a state machine (a big switch statement in a loop).
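Here is a sketch of why stepping through a saga is convenient. The select, call and put functions below are simplified, hand-rolled stand-ins for the redux-saga helpers of the same name, so the example runs without the library. fetchAnalytics, getAnalytics and the action types are hypothetical.

```javascript
// Effects are plain descriptions of what should happen, not the work itself.
const select = selector => ({ type: 'SELECT', selector })
const call = (fn, ...args) => ({ type: 'CALL', fn, args })
const put = action => ({ type: 'PUT', action })

const fetchAnalytics = assignmentId => Promise.resolve({ assignmentId })
const getAnalytics = (state, id) => state.analyticsById[id]

// "When an assignment is opened, load its analytics unless already there."
function* openAssignmentSaga(action) {
    const cached = yield select(state => getAnalytics(state, action.assignmentId))
    if (cached) {
        return
    }
    const analytics = yield call(fetchAnalytics, action.assignmentId)
    yield put({ type: 'ANALYTICS_LOADED', analytics })
}

// Testing: drive the generator by hand and decide what each yield returns.
const saga = openAssignmentSaga({ type: 'OPEN_ASSIGNMENT', assignmentId: 'a1' })

const first = saga.next()                                   // select effect
const second = saga.next(undefined)                         // nothing cached: call effect
const third = saga.next({ assignmentId: 'a1', score: 0.8 }) // fake response: put effect
const last = saga.next()

console.log(first.value.type, second.value.type, third.value.type, last.done)
// SELECT CALL PUT true
```

No network call is ever made in the test: fetchAnalytics is only referenced inside the yielded description, and the test supplies the response itself.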

Backend requests and the spinner problem

Sagas still don’t solve one of our problems: how to make asynchronous requests to the backend? We also need to be able to support loading indicators like our React app did. This means that the application needs to be aware of the current state of a request (not started, started, completed, error).

To make the actual request, we replaced our jQuery calls with the fetch API. It uses native promises, so we had to do a little plumbing to make the saga dispatch actions that describe the state of the request. The end result is a function similar to the one described in this GitHub issue. This abstraction is important to reduce the boilerplate code.

It takes a promise-based api function, the name of the action to dispatch, and arguments to make the request:

yield* apiCall(getAssignmentAnalytics, API_GET_ASSIGNMENT_ANALYTICS, assignmentId)
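A rough sketch of what such a helper might look like follows. It is not our actual implementation: the effects are hand-rolled objects standing in for redux-saga's call and put so the example runs on its own, and the payload shapes and the "@kn.api.error/" prefix are assumptions.

```javascript
// Hand-rolled stand-ins for redux-saga's call and put effects.
const call = (fn, ...args) => ({ type: 'CALL', fn, args })
const put = action => ({ type: 'PUT', action })

// Wraps a promise-based API function with init/success/error actions.
function* apiCall(apiFn, actionName, ...args) {
    yield put({ type: `@kn.api.init/${actionName}`, args })
    try {
        // The saga runner resolves the promise and resumes with its value.
        const response = yield call(apiFn, ...args)
        yield put({ type: `@kn.api.success/${actionName}`, args, response })
        return response
    } catch (error) {
        yield put({ type: `@kn.api.error/${actionName}`, args, error })
    }
}

// Hypothetical API function and action name, stepped by hand.
const getAssignmentAnalytics = id => Promise.resolve({ id })
const API_GET_ASSIGNMENT_ANALYTICS = 'API_GET_ASSIGNMENT_ANALYTICS'

const ok = apiCall(getAssignmentAnalytics, API_GET_ASSIGNMENT_ANALYTICS, 'a1')
const okTypes = [
    ok.next().value.action.type,            // init action
    ok.next().value.type,                   // CALL effect
    ok.next({ id: 'a1' }).value.action.type // success action
]

// Simulate a failed request by throwing into the generator.
const failing = apiCall(getAssignmentAnalytics, API_GET_ASSIGNMENT_ANALYTICS, 'a1')
failing.next()
failing.next()
const errorType = failing.throw(new Error('500')).value.action.type

console.log(okTypes, errorType)
```

Every request goes through the same three-step lifecycle, so reducers and tooling can rely on the action names.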

The sequence of actions that happens when an instructor opens an assignment analytics screen:

@@INIT
@@router/LOCATION_CHANGE
OPEN_COURSE
@ReduxToastr/toastr/ADD
OPEN_ASSIGNMENT
@kn.api.init/API_GET_ASSIGNMENT_ANALYTICS
@kn.api.init/API_GET_QUESTION_COUNT
@kn.api.success/API_GET_ASSIGNMENT_ANALYTICS
@kn.api.success/API_GET_QUESTION_COUNT

These @ prefixes are used as namespaces for the actions. They are not part of any standard, but this practice is now common in Redux libraries.

If the analytics for this assignment were already loaded by a previous screen, the saga will skip the call, and the screen will load faster.

The loading indicator problem is now solved. The reducers can look for the api init action (@kn.api.init/API_GET_ASSIGNMENT_ANALYTICS) to flip an isLoading property in the state. When the success or error action is dispatched, the property is reset to false and the spinner is hidden.
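A minimal sketch of such a reducer is shown here, assuming a state shape of { isLoading, error } and an "@kn.api.error/" prefix for failures. Both are illustrative choices rather than a description of our code.

```javascript
// Tracks the request state for one API action.
const NAME = 'API_GET_ASSIGNMENT_ANALYTICS'
const initialState = { isLoading: false, error: null }

function analyticsRequestReducer(state = initialState, action) {
    switch (action.type) {
        case `@kn.api.init/${NAME}`:
            return { ...state, isLoading: true, error: null }
        case `@kn.api.success/${NAME}`:
            return { ...state, isLoading: false }
        case `@kn.api.error/${NAME}`:
            return { ...state, isLoading: false, error: action.error }
        default:
            return state
    }
}

let requestState = analyticsRequestReducer(undefined, { type: '@@INIT' })
const spinnerHistory = [requestState.isLoading]

requestState = analyticsRequestReducer(requestState, { type: `@kn.api.init/${NAME}` })
spinnerHistory.push(requestState.isLoading)

requestState = analyticsRequestReducer(requestState, { type: `@kn.api.success/${NAME}` })
spinnerHistory.push(requestState.isLoading)

console.log(spinnerHistory) // [ false, true, false ]
```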

You can keep waiting, it won't go away

A lot of work for a loading indicator

Redux now

After a year of using Redux, we can draw a couple of conclusions:

Things we’re happy about:

– We can share the state across components thanks to the one global state, instead of having multiple local component states.
– It provides a strong structure for our file hierarchy, and has opinions on how to do things the Right Way.
– Components are decoupled from the state logic, which makes them easier to write and reuse.
– Testing stateless components and pure reducer functions is a breeze. Since we started using Redux, our test coverage has expanded dramatically.

Things we still need to work on:

– Sanitizing api payloads is still a hard task, and finding the right compromise between multiple entity specific endpoints and bloated api payloads is still a problem. GraphQL offers a good alternative to Redux, but the cost of migrating existing REST services is high.
– We don’t use selectors and memoized selectors enough (see reselect). They should be the standard way of accessing the state.
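For readers unfamiliar with memoized selectors, the idea can be sketched as follows. createMemoizedSelector is a hand-written stand-in and not the reselect API, and the state shape is hypothetical.

```javascript
// Recompute the derived value only when the input slice changes by reference.
function createMemoizedSelector(inputSelector, compute) {
    let lastInput
    let lastResult
    let hasRun = false
    return function selector(state) {
        const input = inputSelector(state)
        if (!hasRun || input !== lastInput) {
            lastInput = input
            lastResult = compute(input)
            hasRun = true
        }
        return lastResult
    }
}

let computeCount = 0
const selectCourseNames = createMemoizedSelector(
    state => state.coursesById,
    coursesById => {
        computeCount += 1
        return Object.keys(coursesById).map(id => coursesById[id].name).sort()
    }
)

const stateA = {
    coursesById: { c1: { name: 'Biology' }, c2: { name: 'Algebra' } },
    ui: { open: false }
}
// An unrelated change reuses the coursesById branch, so the cache still hits.
const stateB = { ...stateA, ui: { open: true } }

const namesA = selectCourseNames(stateA)
const namesB = selectCourseNames(stateB)

console.log(namesA, namesA === namesB, computeCount)
// [ 'Algebra', 'Biology' ] true 1
```

This only works because reducers never mutate the state: an unchanged reference is a reliable sign that the data has not changed.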

Some downsides:

– A lot more files need to be maintained, and there is an overhead for some features that would have been trivial to implement with vanilla React.
– There is a learning curve for React, Redux and Redux-saga. The good news is that they each have solid documentation, a lot of blog articles, videos, and tutorials to help get started.
– Library churn is perceived negatively. React and Redux libraries have very short development cycles and their APIs are deprecated quickly. Updating our react-router files for the third time in less than 2 years can be discouraging, but the upside is that bugs get fixed quickly and the APIs are always improving.

The JavaScript landscape changes quickly, and we engineers sometimes tend to blindly adopt the new hot frameworks. If you are starting a new React project, think about whether you really need a state management library, and whether Redux is the best one for you. Is GraphQL more suited? Are React states enough? Do you favor fast but potentially dirty iterations or a more stable and structured architecture at a higher cost?

In our case, Redux was a good solution. It proved easy enough to introduce into an existing code base without changing the backend endpoints, and we were able to make it coexist with the old system while we migrated.

What's this? You're reading N choose K, the Knewton tech blog. We're crafting the Knewton Adaptive Learning Platform that uses data from millions of students to continuously personalize the presentation of educational content according to learners' needs. Sound interesting? We're hiring.