About Me

An experienced data professional with a background in Data Science and Data Engineering, interested in the intersection of Machine Learning and Engineering.



Retention Model Usage

What can we do with a Retention Model?

Data teams commonly build a retention model, or even a couple. It seems like a great idea: which members are likely to leave, and which are likely to stay? There are plenty of use cases that can be derived from this.

I'm going to run through a few common use cases and two pitfalls that can arise with retention models.

As a prerequisite, remember that a retention model only models the company's definition of retention. Defining retention is a complex question, but it's well worth digging into before the model is built, because the definition has a meaningful impact on how the scores are interpreted. Is the model answering "will a new member retain after 30 days?" or "will any member retain after 6 months?" This initial definition guides both the interpretation of the model and its usage.
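
To make the definition question concrete, here is a minimal Python sketch. It assumes you know each member's start date and cancellation date (if any). The function name `is_retained` and the use of 182 days as a stand-in for "6 months" are illustrative choices, not a standard. The same member can end up with different labels under the two definitions:

```python
from datetime import date, timedelta
from typing import Optional


def is_retained(start: date, cancel: Optional[date], horizon_days: int,
                observed_through: date) -> Optional[bool]:
    """Return True if the member is still active `horizon_days` after `start`.

    `start` is the signup date for a "new member" definition, or a snapshot
    date for an "any member" definition (members who cancelled before the
    snapshot should be filtered out upstream). Returns None when the window
    has not fully elapsed, so the label is not observable yet.
    """
    window_end = start + timedelta(days=horizon_days)
    if window_end > observed_through:
        return None  # too recent to label either way
    # Cancelling after the window closes still counts as retained.
    return cancel is None or cancel > window_end


signup, cancel = date(2024, 1, 1), date(2024, 3, 15)
today = date(2024, 12, 31)

# "Will a new member retain after 30 days?"
print(is_retained(signup, cancel, 30, today))               # True
# "Will any member retain after 6 months?" (snapshot taken 2024-01-15)
print(is_retained(date(2024, 1, 15), cancel, 182, today))   # False
```

Members whose window hasn't closed yet (the `None` case) can't be labeled, so they should stay out of the training data.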

Retention or Churn model?

Essentially, both are the same model, just inverted. A retention model predicts how likely members are to stay, and a churn model predicts how likely they are to leave. As a matter of preference, I recommend building a retention model, since the developer will be discussing it endlessly.

Would you rather talk about members staying or about members leaving? Personally, I find it depressing to spend a chunk of my life thinking about the negative side of churn, but I find it delightful to discuss all the positive aspects of retention.
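
Because the two are complements, switching between them is trivial. The quick Python illustration below assumes the model outputs a probability for a binary stay/leave label. The member names and scores are made up:

```python
from typing import Dict


def to_churn_scores(retention_scores: Dict[str, float]) -> Dict[str, float]:
    """Convert P(stay) into P(leave); for a binary label the two sum to 1."""
    return {member: 1.0 - p for member, p in retention_scores.items()}


retention = {"ann": 0.9, "bo": 0.4, "cy": 0.7}   # hypothetical scores
churn = to_churn_scores(retention)

# With no tied scores, ranking by churn exactly reverses ranking by retention.
by_retention = sorted(retention, key=retention.get, reverse=True)
by_churn = sorted(churn, key=churn.get, reverse=True)
print(by_retention)  # ['ann', 'cy', 'bo']
print(by_churn)      # ['bo', 'cy', 'ann']
```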

Use Cases

Here are a few examples of projects I've seen come up in industry. All of them can be hard to implement, because businesses have existing processes and methods and can be hesitant to change. Go slow, iterate in tiny chunks, and do what you can to align with higher-level projects and goals.

All these projects require:

  • Raw numeric scores from the retention classifier, generated for each member periodically (usually nightly or weekly)
  • AB testing to determine the effect of using the scores
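
A nightly or weekly scoring job can be as simple as the minimal Python sketch below. It rests on a few assumptions:

  • Members arrive as dicts.
  • `toy_model` is a hypothetical stand-in for a trained classifier.
  • Each run's output is stamped with its date, so later AB test analysis can see which scores were live at the time.

```python
from datetime import date
from typing import Callable, Dict, List


def score_members(members: List[Dict], model: Callable[[Dict], float],
                  run_date: date) -> List[Dict]:
    """Score every member and stamp each row with the run date.

    Stores the raw probability rather than a thresholded label, so downstream
    uses (deciles, budgets, forecasts) can slice the scores however they need.
    """
    rows = []
    for member in members:
        score = model(member)
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range for {member['member_id']}: {score}")
        rows.append({"member_id": member["member_id"],
                     "score_date": run_date,
                     "retention_score": score})
    return rows


def toy_model(member: Dict) -> float:
    """Hypothetical stand-in for a trained classifier: longer tenure, higher score."""
    return min(1.0, 0.2 + 0.05 * member["tenure_months"])


members = [{"member_id": "a1", "tenure_months": 2},
           {"member_id": "b2", "tenure_months": 20}]
for row in score_members(members, toy_model, date(2024, 6, 1)):
    print(row)
```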

Coupon Targeting

This is overwhelmingly the most popular project I've seen discussed. Companies often spend $$$ sending out coupons and discounts, and a lot of that is wasted. Some members are highly likely to stay anyway and don't need a coupon, while others are highly likely to leave and use the coupon right before they cancel.

Businesses can use retention scores to better target the members in the "middle": those who might cancel and might not.

Often the scores are split into deciles, so that the top decile can safely be treated as retaining and the bottom decile as churning. The middle 8 deciles can then be targeted with coupons.
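
The decile split takes only a few lines of Python. This sketch assumes the scores live in a dict keyed by member ID and that decile 1 holds the lowest scores. The function names are hypothetical:

```python
import random
from typing import Dict, List


def assign_deciles(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank members by retention score and bucket them into deciles 1-10.

    Decile 1 holds the lowest scores (likely churners), decile 10 the
    highest (likely retainers).
    """
    ranked = sorted(scores, key=scores.get)  # ascending by score
    n = len(ranked)
    return {member: i * 10 // n + 1 for i, member in enumerate(ranked)}


def coupon_targets(scores: Dict[str, float]) -> List[str]:
    """Members in deciles 2-9: neither safely retaining nor safely churning."""
    deciles = assign_deciles(scores)
    return [m for m, d in deciles.items() if 2 <= d <= 9]


random.seed(0)
scores = {f"m{i}": random.random() for i in range(1000)}  # made-up scores
targets = coupon_targets(scores)
print(len(targets))  # 800 (deciles 2-9 hold 100 members each)
```

The deciles here are by rank, so each one holds a tenth of the members (give or take one) regardless of how the scores are distributed.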

This allows for a couple of nice knobs and dials. A campaign can balance the number of members targeted against the budget for the offer. AB testing can measure the impact of any particular strategy, which makes it possible to optimize the strategy over time.
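
The Python sketch below shows one hypothetical way to wire up the budget knob and the AB test together. It assumes three things:

  • Each coupon has a fixed cost.
  • The lowest scores within the targeted band get priority.
  • A 50/50 random split is used, so the control group shows what would have happened without the coupon.

```python
import random
from typing import Dict, List, Tuple


def plan_campaign(targets: List[str], scores: Dict[str, float],
                  budget: float, coupon_cost: float,
                  seed: int = 42) -> Tuple[List[str], List[str]]:
    """Size a coupon campaign to the budget and hold out an equal control group.

    Returns (treatment, control). Only the treatment arm receives coupons,
    so it is capped at budget // coupon_cost members.
    """
    max_coupons = int(budget // coupon_cost)
    # Pool both arms; hypothetical priority: lowest scores in the band first.
    pool = sorted(targets, key=scores.get)[: 2 * max_coupons]
    rng = random.Random(seed)
    rng.shuffle(pool)  # random assignment keeps the two arms comparable
    half = len(pool) // 2
    return pool[:half], pool[half:]


scores = {f"m{i}": i / 100 for i in range(100)}                # made-up scores
targets = [m for m, s in scores.items() if 0.1 <= s < 0.9]     # 80 members
treatment, control = plan_campaign(targets, scores, budget=300.0, coupon_cost=10.0)
print(len(treatment), len(control))  # 30 30
```

Both arms are drawn from the same pool before the random split. Any retention difference between them can therefore be attributed to the coupon rather than to who was picked.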

Marketing Campaign Evaluation

Say marketing runs 5 campaigns over a week. Which one is better?

How could this be measured? One way is the number of members signed up; another is the quality of those members. Enter a retention model. With retention scores, a business can compare a marketing campaign that signs up 10,000 members who intend to leave quickly with one that signs up 1,500 members who really seem to like the service.

Having the retention score can help a business balance the goals of acquiring and keeping members.
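
A simple way to fold quality into the comparison is to sum each campaign's retention scores. If the scores are calibrated probabilities (an assumption worth checking), the sum is the expected number of sign-ups who retain. The scores below are hypothetical and only echo the 10,000 vs 1,500 comparison:

```python
from typing import Dict, List


def campaign_summary(signup_scores: List[float]) -> Dict[str, float]:
    """Summarize a campaign by volume and by the quality of its sign-ups.

    Assuming calibrated scores, their sum is the expected number of
    sign-ups still around at the end of the retention window.
    """
    n = len(signup_scores)
    expected = sum(signup_scores)
    return {"signups": n,
            "mean_score": expected / n if n else 0.0,
            "expected_retained": expected}


campaign_a = [0.05] * 10_000   # many sign-ups that intend to leave quickly
campaign_b = [0.80] * 1_500    # fewer sign-ups that like the service
for name, campaign in [("A", campaign_a), ("B", campaign_b)]:
    s = campaign_summary(campaign)
    print(name, s["signups"], round(s["expected_retained"]))
# A 10000 500
# B 1500 1200
```

Under these made-up scores, campaign B is expected to keep more members despite signing up far fewer.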

Membership Count Prediction

It's a lot easier to predict the number of members you will have in 6 months if you have an idea of how many existing members are likely to quit in the near-term, and how many are likely to stay.

A straight time series analysis can work well here too, but coupling it with a simulation driven by retention scores can make the forecast more convincing. Maybe it will be more accurate too!
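
Here is one way such a simulation could look: a minimal Monte Carlo sketch in Python. It makes three assumptions, and all numbers are hypothetical:

  • Each existing member's score is a calibrated probability of still being a member at the forecast date.
  • Members churn independently.
  • `expected_new` comes from a separate forecast, such as the time series model.

```python
import random
import statistics
from typing import List


def simulate_member_count(stay_probs: List[float], expected_new: int,
                          n_sims: int = 500, seed: int = 7) -> List[int]:
    """Monte Carlo projection of the member count at a future date.

    Each existing member independently stays with probability equal to their
    retention score; `expected_new` (a separate sign-up forecast) is added as
    a constant to every simulated outcome.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_sims):
        # One Bernoulli draw per member: stays if the uniform draw is below p.
        survivors = sum(1 for p in stay_probs if rng.random() < p)
        counts.append(survivors + expected_new)
    return counts


# Hypothetical 6-month retention scores for 1,000 current members.
probs = [0.9] * 500 + [0.3] * 500
counts = simulate_member_count(probs, expected_new=1_500)
cuts = statistics.quantiles(counts, n=20)   # 19 cut points: 5th ... 95th percentile
print(f"mean {statistics.mean(counts):.0f}, 90% range {cuts[0]:.0f} to {cuts[-1]:.0f}")
```

Independence is a simplifying assumption. Correlated churn (after a price change, say) would make the real range wider than the simulated one.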

Pitfalls

Endless Analysis

Sometimes business stakeholders are deeply concerned about the changes this model could bring to their processes. They will want to know more about the retention scores, and more, and more, until 6 months have passed and the scores are still not being used.

It's important to put some guardrails around analysis versus implementation. The business has good questions, and it's tempting to answer them all, but then the scores can take a long time to start providing value. Instead, consider using AB testing as the safety net, rather than waiting until every question about the scores has been answered.

AB Testing

Without AB testing infrastructure, the impact of the scores on business metrics will remain unknown. Why would a business pay for an in-house model if its impact gets attributed to other sources?

On the plus side, setting up good AB testing infrastructure can immediately help any process, even one with no retention scores or modeling at all.
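
Once assignment is in place, a standard two-proportion z-test is a reasonable first way to measure a retention difference between two groups. This Python sketch uses the pooled normal approximation, which assumes reasonably large groups. The counts in the example are hypothetical:

```python
import math
from typing import Tuple


def two_proportion_ztest(retained_a: int, n_a: int,
                         retained_b: int, n_b: int) -> Tuple[float, float]:
    """Compare retention rates of a control (a) and a treatment (b) group.

    Returns (z, two-sided p-value) using the pooled-variance normal
    approximation.
    """
    p_a, p_b = retained_a / n_a, retained_b / n_b
    pooled = (retained_a + retained_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical results: control retains 4,000 of 5,000; treatment 4,150 of 5,000.
z, p = two_proportion_ztest(4_000, 5_000, 4_150, 5_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```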

Conclusion

Businesses often want retention scores, but it's important to get commitments on how they will be used before the model is built.

A super simple v1 can often provide a lot of value with little modeling effort, as long as the AB testing system is well understood.
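
To illustrate how simple a v1 can be, here is a hypothetical Python baseline that scores each member with the historical retention rate of their tenure bucket. The bucket edges and history below are made up. The point is that something like this could go into an AB test well before a full model is ready:

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple


def bucket_of(tenure_months: int, edges: List[int]) -> int:
    """Index of the tenure bucket; edges must be sorted and start at 0."""
    return bisect.bisect_right(edges, tenure_months) - 1


def fit_baseline(history: List[Tuple[int, bool]], edges: List[int]) -> Dict[int, float]:
    """Observed retention rate per tenure bucket from past members."""
    totals: Dict[int, int] = defaultdict(int)
    kept: Dict[int, int] = defaultdict(int)
    for tenure, retained in history:
        b = bucket_of(tenure, edges)
        totals[b] += 1
        kept[b] += int(retained)
    return {b: kept[b] / totals[b] for b in totals}


def score(tenure_months: int, rates: Dict[int, float], edges: List[int],
          default: float) -> float:
    """Score a member with their bucket's historical rate (default if unseen)."""
    return rates.get(bucket_of(tenure_months, edges), default)


edges = [0, 3, 12]   # hypothetical buckets: <3 months, 3-11 months, 12+ months
history = [(1, False), (2, True), (5, True), (8, True), (9, False), (24, True)]
rates = fit_baseline(history, edges)
print(rates)                          # {0: 0.5, 1: 0.6666666666666666, 2: 1.0}
print(score(30, rates, edges, 0.5))   # 1.0
```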

Go! Build a retention model! It's usually pretty easy to convince your manager and your business partners, but be sure to do some diligence on measurement beforehand.