Churn Modelling Part 1:
Time Series Data

“Time flows, not you.” – Kenneth Slessor

Despite possessing many of the characteristics typical of more traditional machine learning classification problems, churn modelling (and time series analysis in general) presents several complexities that make it a far more involved affair than working with data that is independent of time.

In this blog series I walk through many of the conceptual aspects of churn modelling, based on my experience implementing such a model in my workplace. The series assumes the reader has some familiarity with applying predictive analytics in a work or project setting.

The series is split up as follows:

  • Part 1 is a general overview of time series and churn modelling.
  • Part 2 dives into the specifics of window and period selection.
  • Part 3 is concerned with the complexities around constructing the model itself, such as feature selection and how to effectively measure success.

If you’re already comfortable with the fundamentals of time series data, I recommend you move straight on to Part 2 where I really start diving into the nitty-gritty.

What is Churn Modelling Exactly?

Churn modelling, in the context of ML, is a type of classification problem that involves predicting the likelihood of a user churning from a particular service, product, or feature. Since many businesses are built around the acquisition and retention of customers, churn modelling is often among the first, and most impactful, types of predictive analysis that a company will perform.

One distinct aspect of churn modelling is its use of time series data. Unlike static data, where the features change little (if at all) once you have obtained them, time series data is fundamentally dynamic in nature, and may require the data scientist to rerun their model on the same individual multiple times while fully expecting different results.

Differences in Data Types

The previous section might seem abstract, so let’s make it concrete. Consider the following image. What do you think it represents?

Photo by Manja Vitolic on Unsplash

Ten points if you guessed a cat! Now, consider an ML model tasked with identifying this creature out of a database filled with pictures of all sorts of other animals. It would likely break down various components of the image as features to make its final assessment. These could include:

  • Ears
  • Fur
  • Whiskers
  • Paws
  • The domestic setting

If well trained, the model will ultimately classify the animal in this image as a cat. And, barring any Schrödinger-related malarkey, this representation of a living, breathing cat will continue to be a representation of a living, breathing cat forever onward.

Now, let’s pivot away from cats for a moment and imagine we’re looking at a randomly selected individual who is subscribed to a B2C company [1]. Let’s call him Bob.

Photo by Irene Strong on Unsplash

We want to ascertain whether Bob will continue to be subscribed to our product in two weeks’ time. What information do we have about Bob? Well, he…

  • Actively used the service twelve times in the last week
  • Gave the product 5/5 stars in a random survey request the previous month
  • Has been subscribed for six months now

From the above information, our model might classify Bob as a “safe” user, based on information we have about what a “safe” user looks like. Sure enough, two weeks pass with nary a single churning Bob.

So far so good. But that is not the end of the story.

Have a look at the previous features for Bob. What do you notice about them that’s different from the features we broke down in the picture of the cat?

  • Actively used the service twelve times in the last week
  • Gave the product 5/5 stars in a random survey request the previous month
  • Has been subscribed for six months now

The answer is that for Bob, we are dealing with time series data! While the cat’s ears in the picture will always be cat’s ears, we cannot reasonably expect Bob’s use of the service in the past week to remain twelve forever. And sure enough, if we reassess Bob’s status in six months’ time, we find that he…

  • Has not used the service in the last week
  • Ignored our past three random survey requests
  • Has been subscribed for twelve months now

Well now, the situation looks mighty different from before, doesn’t it! [2]

Whether the above scenario leads to a prediction that Bob will churn or not isn’t as important as understanding that rerunning the model on the same individual may result in a different classification depending on when we do so.
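To make the idea concrete, here is a minimal sketch of scoring the same user at two snapshot dates. Everything in it is invented for illustration — the feature names, the hand-written scoring rule (standing in for a trained classifier), and the thresholds — but it shows how identical code, run at different points in time, can yield different classifications for the same individual:

```python
from dataclasses import dataclass

@dataclass
class UserSnapshot:
    """Features for one user, computed as of a particular date."""
    uses_last_week: int
    months_subscribed: int
    responded_to_last_survey: bool

def churn_risk(s: UserSnapshot) -> str:
    """Toy scoring rule standing in for a trained churn classifier."""
    score = 0
    if s.uses_last_week == 0:          # inactivity is the strongest signal here
        score += 2
    if not s.responded_to_last_survey:  # disengagement from outreach
        score += 1
    return "at risk" if score >= 2 else "safe"

# Bob, scored at two different snapshot dates.
bob_january = UserSnapshot(uses_last_week=12, months_subscribed=6,
                           responded_to_last_survey=True)
bob_july = UserSnapshot(uses_last_week=0, months_subscribed=12,
                        responded_to_last_survey=False)

print(churn_risk(bob_january))  # safe
print(churn_risk(bob_july))     # at risk
```

The point is not the rule itself but the shape of the workflow: the model is fixed, while the snapshot it consumes is recomputed each time it runs.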

This, then, is the nature of time series data with applications in churn modelling.

What are the Implications of This?

Time series data cannot be dealt with in quite the same way as purely static data, both in construction and intent.

Very often, your model may have to be run on the same individual at different points in time with the full expectation that the resulting classification may be different. This is particularly true for churn modelling; the continual analysis of churn likelihood is essential to obtain information in advance of the event occurring. Classifying a user today as “unlikely to churn” does not mean they’ll be classified the same way tomorrow.

In the immortal words of Pink Floyd: “Here today, gone tomorrow.”

Similarly, a single observation (say, a user) may have many, many pieces of information generated about them over time. Wrangling this data into a simple format that is useful for classification is often the most laborious and critical part of the time series modelling process.

Do you use the last month’s worth of data? The last two months? Do you include the number of emails they opened as a feature worth considering? If so, how does one account for the fact that we didn’t have marketing emails a year ago?
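As a sketch of what this wrangling can look like in practice, one common pattern is to collapse an event-level log into a single feature row per user, keeping only events inside a chosen lookback window. The event log, field names, and feature definitions below are all hypothetical:

```python
from datetime import date, timedelta

# Hypothetical event log: one entry per user action.
events = [
    {"user": "bob",   "date": date(2023, 6, 25), "type": "login"},
    {"user": "bob",   "date": date(2023, 6, 28), "type": "login"},
    {"user": "bob",   "date": date(2023, 5, 2),  "type": "email_open"},
    {"user": "alice", "date": date(2023, 6, 29), "type": "login"},
]

def features_as_of(events, user, as_of, lookback_days=30):
    """Collapse the event log into one feature row for `user`,
    using only events inside the lookback window ending at `as_of`."""
    window_start = as_of - timedelta(days=lookback_days)
    in_window = [e for e in events
                 if e["user"] == user and window_start <= e["date"] <= as_of]
    return {
        "logins": sum(e["type"] == "login" for e in in_window),
        "email_opens": sum(e["type"] == "email_open" for e in in_window),
    }

row = features_as_of(events, "bob", as_of=date(2023, 6, 30))
print(row)  # {'logins': 2, 'email_opens': 0}
```

Note that the May email-open falls outside the 30-day window and so is excluded; shifting `as_of` would change the row, which is exactly the dynamism discussed above.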

Finally, one of the most important things that needs to be done for time series data is window selection. Window selection refers to establishing appropriate time periods around the observation, both in terms of the period we choose to extract features from and how far in advance we want to make the prediction. Because window selection is both extremely important and quite complex, it serves as a natural place for us to segue to Part 2 of this series.
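The two windows can be sketched as a pair of date ranges around a reference date. The default lengths below are arbitrary placeholders, not recommendations — choosing them well is precisely what Part 2 is about:

```python
from datetime import date, timedelta

def build_windows(reference_date, feature_days=30, horizon_days=14):
    """Split time around a reference date into two windows:
    - the feature window: the past period we extract features from;
    - the label window: how far ahead we look for the churn event."""
    feature_window = (reference_date - timedelta(days=feature_days),
                      reference_date)
    label_window = (reference_date,
                    reference_date + timedelta(days=horizon_days))
    return feature_window, label_window

fw, lw = build_windows(date(2023, 6, 30))
print(fw)  # (datetime.date(2023, 5, 31), datetime.date(2023, 6, 30))
print(lw)  # (datetime.date(2023, 6, 30), datetime.date(2023, 7, 14))
```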


[1] B2C = Business-to-consumer. Examples might include Netflix, a telecom provider, or a magazine subscription service.

[2] We call this scenario ‘the impermanence of time’, and there’s a great overview of it here.