Is RFM still king? A data science evaluation


Predicting and preventing customer churn has a strong impact on the success of e-commerce businesses. Many businesses understand the importance of churn and build predictive models to identify users who are likely to churn. User churn can be defined in several ways in e-commerce. One commonly used definition is “the probability that a user will cease buying from an e-commerce business in the future.”

However, not all businesses have the resources to build, tune, and run a churn prediction model in a live production environment. Predicting accurately on a user level is hard, and constructing marketing campaigns to leverage these predictions is harder still.

The Data Science team at Retention Science implemented a generalized end-to-end customer churn prediction framework that has been applied to businesses across a wide variety of industry verticals. The framework includes several churn prediction algorithms, and the best-performing one is chosen for each business. It’s important to know that each model is trained only on the business’s own data, so the same approach ends up with a different configuration for each business.

This article evaluates the performance of three well-documented user churn prediction methods:

  1. Basic RFM model
  2. Clustering based on RFM transactional and engagement features
  3. Classification based on order, engagement, and demographic features

We apply these three approaches to data from 85 selected e-commerce companies that partner with Retention Science and evaluate the results.



Model Definitions

Decile RFM Model

The classical RFM model is the most frequently adopted churn segmentation technique. It comprises three measures: recency, frequency, and monetary value. These are combined into a three-digit RFM cell code, with each measure divided into 10 equal deciles (10% groups). Among the three RFM measures, recency is often regarded as the most important one. Frequency and monetary value can be considered functions of each other and add a secondary dimension for ranking users.

The simple hypothesis of this approach is that “people who have recently and frequently engaged with your business” are likely to buy again (they have not churned). In our implementation we rank users based on these deciles and assign uniform probability scores to each unique rank-order. This can be considered an unsupervised, rule-based algorithm. In practice, it performs really well when the above hypothesis holds.
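The decile ranking described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the production implementation: the function names, the tie-breaking rule, and the choice of evenly spaced scores between 0 and 1 are all assumptions made for the example.

```python
from datetime import date


def decile_ranks(values, higher_is_better=True):
    """Assign each value a decile from 1 (worst) to 10 (best) by rank order.

    Ties are broken by input order, which is a simplification.
    """
    n = len(values)
    # Sort positions from worst to best.
    order = sorted(range(n), key=lambda i: values[i], reverse=not higher_is_better)
    deciles = [0] * n
    for rank, idx in enumerate(order):
        deciles[idx] = rank * 10 // n + 1
    return deciles


def rfm_churn_scores(orders_by_user, today):
    """orders_by_user maps user_id -> list of (order_date, amount)."""
    ids = list(orders_by_user)
    recency = [(today - max(d for d, _ in orders_by_user[u])).days for u in ids]
    frequency = [len(orders_by_user[u]) for u in ids]
    monetary = [sum(a for _, a in orders_by_user[u]) for u in ids]

    r = decile_ranks(recency, higher_is_better=False)  # fewer days is better
    f = decile_ranks(frequency)
    m = decile_ranks(monetary)

    cells = {u: (r[i], f[i], m[i]) for i, u in enumerate(ids)}
    # Rank the unique cells with recency as the leading key, worst cell first.
    unique_cells = sorted(set(cells.values()))
    k = len(unique_cells)
    # Assumption: evenly spaced scores in (0, 1); the worst cell gets the
    # highest pseudo churn score.
    cell_score = {c: 1 - (pos + 0.5) / k for pos, c in enumerate(unique_cells)}
    return {u: cell_score[cells[u]] for u in ids}


# Hypothetical toy data: four users with a handful of orders each.
orders = {
    "a": [(date(2023, 12, 20), 50.0), (date(2023, 11, 1), 30.0)],
    "b": [(date(2023, 3, 1), 20.0)],
    "c": [(date(2023, 12, 30), 200.0), (date(2023, 12, 1), 100.0), (date(2023, 10, 1), 50.0)],
    "d": [(date(2023, 8, 15), 40.0)],
}
scores = rfm_churn_scores(orders, today=date(2024, 1, 1))
```

In this toy data, user "b" (one old, small order) ends up with the highest churn score and user "c" (recent, frequent, high spend) with the lowest.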

Clustering based on RFM features

It’s interesting to note that the RFM method has evolved from its original formulation. There are more than 50 different flavors of the RFM model [1]. Some of them are as simple as rank ordering by recency of purchase, and some are very complex involving clustering and assigning churn probabilities based on the R, F and M values.

In our variation we compute RFM values not only from each user’s transactions but also from their engagement behavior (app/website/email). Once these features are obtained, we assign users to clusters (using traditional clustering algorithms) and assign a churn probability to each cluster.

Its advantage over the classic decile RFM model is that it can incorporate recency and frequency information from browsing behavior. Clustering also quantizes the feature vector space better than a naive decile ranking does.
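As a rough sketch of the clustering idea, the example below runs k-means (via scikit-learn) on a synthetic feature matrix that mixes transaction and engagement features. The article does not say which clustering algorithm was used or how cluster scores were derived, so both are assumptions here: clusters are simply ranked by the purchase recency of their centroid and given evenly spaced pseudo-probabilities.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def cluster_churn_scores(features, n_clusters=3, seed=0):
    """Cluster users and give each cluster a pseudo churn score.

    features: array of shape (n_users, n_features); column 0 must be
    days since last purchase. Returns (cluster labels, per-user scores).
    """
    # Standardize so that no single feature dominates the distance metric.
    X = StandardScaler().fit_transform(features)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X)
    # Assumption: rank clusters by centroid purchase recency (column 0).
    # The cluster with the most recent purchases gets the lowest score.
    order = np.argsort(km.cluster_centers_[:, 0])
    cluster_score = np.empty(n_clusters)
    cluster_score[order] = (np.arange(n_clusters) + 0.5) / n_clusters
    return km.labels_, cluster_score[km.labels_]


# Synthetic users. Columns: purchase recency (days), purchase frequency,
# monetary value, engagement recency (days), engagement frequency.
rng = np.random.default_rng(0)
active = rng.normal([5, 12, 600, 2, 40], [2, 2, 50, 1, 5], size=(30, 5))
lapsing = rng.normal([60, 4, 200, 30, 10], [10, 1, 30, 5, 2], size=(30, 5))
dormant = rng.normal([300, 1, 40, 200, 1], [20, 0.3, 10, 20, 0.5], size=(30, 5))
features = np.vstack([active, lapsing, dormant])

labels, scores = cluster_churn_scores(features)
```

Because the engagement columns sit in the same feature matrix as the transaction columns, browsing behavior influences which cluster a user lands in, which is the advantage over plain decile RFM.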

Binary Classification based on Multi Modal Features

With the advent of machine learning tools and libraries, the binary classification approach to predicting churn has become the state-of-the-art solution. It is capable of sifting through any number of user features and can reveal the important ones for the task of predicting churn (through feature ranking and selection). Some pros and cons of the RFM model are outlined in Table 1 below.

For the feature set, recency, frequency, and monetary inputs for retail transactions can be used, as in the RFM model. Additional features based on online browsing behavior and user demographics are also included.

Since this is a supervised learning algorithm, it requires the modeler to annotate churn labels for users. This is tricky and usually involves going back in time and labeling users as churned or not churned based on whether they made a transaction in the held-out future period.
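The labeling step can be sketched as follows. The function name, the order-log format, and the 90-day horizon (an approximation of three months) are assumptions for illustration. Features would be computed only from orders on or before the cutoff date, so that nothing from the future window leaks into the model's inputs.

```python
from datetime import date, timedelta


def label_churn(orders, cutoff, horizon_days=90):
    """Label users as churned (1) or not churned (0).

    orders: list of (user_id, order_date).
    Only users with at least one order on or before `cutoff` are labeled.
    A user is labeled 0 if they ordered again within `horizon_days` after
    the cutoff, and 1 otherwise.
    """
    horizon_end = cutoff + timedelta(days=horizon_days)
    known = {u for u, d in orders if d <= cutoff}
    returned = {u for u, d in orders if cutoff < d <= horizon_end}
    return {u: 0 if u in returned else 1 for u in known}


# Hypothetical order log.
orders = [
    ("a", date(2023, 1, 10)), ("a", date(2023, 7, 15)),  # comes back in the window
    ("b", date(2023, 5, 1)),                             # never comes back
    ("c", date(2023, 8, 1)),                             # first order is after the cutoff
    ("d", date(2023, 6, 1)), ("d", date(2023, 12, 1)),   # comes back too late
]
labels = label_churn(orders, cutoff=date(2023, 6, 30))
```

User "c" is left out because they were not yet a customer at the cutoff, and user "d" counts as churned under this definition even though they eventually returned after the window closed.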

Now that we have labels and features, this becomes a well-understood machine learning problem where we can use powerful classification techniques like boosting/ensembling (maybe deep learning too!) to learn the rules for classification. For our classifier we trained an ensemble model by combining a Random Forest and a Logistic Regression classifier. Experiments showed that this ensemble was the most predictive.
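A minimal sketch of such an ensemble with scikit-learn is shown below. The article does not say how the two models were combined, so soft voting (averaging the predicted probabilities) is an assumption, as are the hyperparameters and the synthetic data standing in for real user features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for per-user features (RFM, engagement, demographics)
# and churn labels (1 = churned).
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        # Logistic regression benefits from scaled inputs.
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ],
    voting="soft",  # assumption: average the two models' class probabilities
)
ensemble.fit(X_train, y_train)

# Probability of the positive class (churn) for each held-out user.
churn_proba = ensemble.predict_proba(X_test)[:, 1]
```

Soft voting keeps the output a probability, which matters later when the models are compared on Log Loss.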



We present experimental results comparing the three methods across 85 businesses. For each business, all three models were given exactly the same data. A separate model was trained for each business and evaluated by comparing its predictions with observations from a future time period. Recalling the definition “the probability that a user will cease buying from an e-commerce business in the future”, we define the future (the observation window) to be three months in this case. Similar results were obtained for six months and twelve months, but they are not shown here in an effort to keep this blog post digestible.

We look at the following two metrics to determine which model performs better:

  1. Area under the curve for the Receiver Operating Characteristic (AUC-ROC)
  2. Log Loss Value
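For readers less familiar with the first metric, AUC-ROC can be read as the probability that a randomly chosen churner receives a higher churn score than a randomly chosen non-churner. The pairwise sketch below computes it directly from that definition; it is quadratic in the number of users and meant only as an illustration (the function name and toy data are made up for the example).

```python
def auc_roc(y_true, y_score):
    """AUC-ROC as the fraction of (churner, non-churner) pairs ranked correctly.

    y_true: 1 for churned, 0 for not churned.
    y_score: predicted churn score or probability. Ties count as half.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one example of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: three churners and three non-churners.
y_true = [1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.7, 0.3, 0.4, 0.2]
auc = auc_roc(y_true, y_score)  # 8 of the 9 pairs are ordered correctly
```

Because AUC-ROC depends only on how users are ordered, it can be applied to the rank-based RFM and clustering scores as well as to classifier probabilities.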




The X axis of these graphs is the area under the ROC curve (AUC-ROC).
The Y axis shows the 85 businesses arranged in increasing AUC order (of RFM in Fig 1.1 and 1.2, and of the classifier in Fig 1.3).

For almost all businesses, both the clustering and classification methods perform better than RFM (Fig 1.1 and 1.2). The average AUC-ROC values observed for the three methods were:

A 10.22% increase in AUC-ROC is observed using the Clustering method and a 12.43% increase is observed using the classifier method over the basic RFM method.

The comparison of the classifier approach to the clustering approach is much more interesting (see Fig 1.3). Out of the 85 businesses depicted, the classifier has better discrimination power in 65 cases. Also, the AUC-ROC is higher by an average of ~2% for the classifier approach.


  1. The traditional naive RFM method does worse than the clustering method, though not badly.
  2. The one advantage of the clustering method is that it can incorporate browsing behavior, which is indicative of future purchases.
  3. In cases where the belief that recent users are not churning doesn’t hold, both the RFM and the clustering methods suffer severely.
  4. The classification approach does not work effectively in a small number of use cases. One prominent case is when there is high class imbalance between churners and non-churners.

METRIC : Log Loss

Log Loss quantifies the quality of a classifier’s predicted probabilities by penalizing false classifications, with confident wrong predictions penalized most heavily. A lower Log Loss usually goes hand in hand with higher accuracy, although the two are not strictly equivalent.
In order to calculate Log Loss, the churn prediction approach must assign a probability to each class rather than simply yielding the most likely class. To some extent this tells us how reliable the model’s probabilities are. The lower the metric, the better the prediction.
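The sketch below computes binary Log Loss from its standard definition, the average of -[y log(p) + (1 - y) log(1 - p)] over users, and compares three hypothetical sets of predictions. The function name, the clipping constant, and the toy numbers are assumptions for illustration.

```python
import math


def log_loss(y_true, p_churn, eps=1e-15):
    """Average binary Log Loss.

    y_true: 1 for churned, 0 for not churned.
    p_churn: predicted probability of churn for each user.
    Probabilities are clipped to [eps, 1 - eps] to avoid log(0).
    """
    total = 0.0
    for y, p in zip(y_true, p_churn):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


y_true = [1, 0, 1, 0]

confident_right = log_loss(y_true, [0.9, 0.1, 0.9, 0.1])  # all correct, confident
hedged_right = log_loss(y_true, [0.6, 0.4, 0.6, 0.4])     # all correct, unsure
one_bad_miss = log_loss(y_true, [0.9, 0.1, 0.9, 0.9])     # last user confidently wrong
```

The first two sets of predictions classify every user correctly at a 0.5 threshold, yet the confident one has a much lower Log Loss, and a single confident mistake in the third set outweighs three good predictions. This is why the metric rewards well-calibrated probabilities rather than just correct labels.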

Strictly speaking, there is no notion of probabilities for the RFM and clustering approaches; the ranking scores are merely converted into pseudo-probabilities. Below, we compare the three methods on this evaluation criterion.

The RFM and the clustering approach both perform poorly when considering the Log Loss metric. The Clustering approach is slightly better.

This metric is where the classification approach really shines. Its probability estimates are far more reliable than those of the other two methods.

For the 85 businesses considered, the average Log Loss values are as follows:



So which one are you going to choose, the RFM approach, Clustering approach or the Classifier? As is often the case, which approach is best depends on your specific needs, use cases, and engineering resources.

If your business follows the RFM principle, the classical simple method still works really well. However, with the advent of machine learning tools, more advanced methods can be much more predictive and identify churning customers in advance; in the results above they lifted AUC-ROC by roughly 10-12% over basic RFM. Using these models to better understand your customer cohorts, and to predict which users are at risk of leaving your business, will save you lost revenue year round. Add some automation to these predictions, and you can save customers without lifting a finger. That seems like “king” to us.

[1] Wei, J.T., Lin, S.Y. and Wu, H.H., 2010. A review of the application of RFM model. African Journal of Business Management, 4(19), p.4199.



About the author

Shubham Gupta is a Data Scientist at Retention Science. He has over 7 years of experience solving challenging problems at the intersection of Software Engineering and Data Science at Google and Snapchat. He obtained his Master’s in Mathematics from the University of Waterloo, and he loves applying his knowledge to solve real-world problems.


