Feature Engineering: A Closer Look, Part 2 [RS Labs]

At Retention Science, we want to capture as much of the variability in customer behavior as we can, so that we can model it: calculating purchase probability, predicting customer lifetime value, and deciding which discounts are most appropriate for which customers. Once we acquire the raw data from our clients, we derive thousands of relevant features from that data. For instance, from user data we can derive average order value, location, age, which items a user browsed recently, and so forth. These features are then fed into our models to generate predictions and analyses, which in turn power our dashboard visualizations and marketing automation.

Feature Engineering In Practice

One of the core models we produce here at RS predicts whether someone will remain a customer or not. Losing a customer is called churn. At first glance, the basic user-level features that might feed this model include age, gender, location, and order value.

While conventional attributes such as gender and location do work well, we wondered if we’d be able to add some extra predictive power by using attributes such as the mobile device someone uses or the operating system of their computer. Our hypothesis was that these deeper features might reflect some latent buying behavior in an audience.

To find out, we ran a feature engineering task to gather user device information. The device information is passed into our models as a “user agent” string that represents the device category (PC, Smartphone, Tablet, etc.), the operating system (Android, iOS, Linux, etc.), the manufacturer of the device (Apple, Google, etc.), and a few other relevant attributes. In total, for this study, we gathered 12.2M such features from 7.8M user sessions (i.e., separate interactions with a website) across 17 sample clients.
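To make the idea concrete, here is a minimal sketch of turning a raw user agent string into the three attributes described above. The function name `parse_user_agent` and its keyword rules are illustrative assumptions, not the parser we run in production; real user agents are far messier, and a maintained parsing library is usually the safer choice.

```python
import re
from typing import Dict


def parse_user_agent(ua: str) -> Dict[str, str]:
    """Map a raw user-agent string to coarse device features (heuristic sketch)."""
    ua_l = ua.lower()

    # Operating system. Mobile platforms are checked first because Android
    # user agents also contain "linux" and iOS ones contain "mac os x".
    if "android" in ua_l:
        os_name = "Android"
    elif re.search(r"iphone|ipad|ipod", ua_l):
        os_name = "iOS"
    elif "windows" in ua_l:
        os_name = "Windows"
    elif "mac os x" in ua_l or "macintosh" in ua_l:
        os_name = "OS X"
    elif "linux" in ua_l:
        os_name = "Linux"
    else:
        os_name = "Other"

    # Device category. Assumption: Android user agents without the
    # "Mobile" token are treated as tablets.
    if "ipad" in ua_l or (os_name == "Android" and "mobile" not in ua_l):
        category = "Tablet"
    elif os_name in ("Android", "iOS"):
        category = "Smartphone"
    elif os_name in ("Windows", "OS X", "Linux"):
        category = "PC"
    else:
        category = "Other"

    # Manufacturer, from a few hand-picked (and incomplete) keywords.
    if os_name in ("iOS", "OS X"):
        manufacturer = "Apple"
    elif "pixel" in ua_l or "nexus" in ua_l:
        manufacturer = "Google"
    elif "samsung" in ua_l or re.search(r"\bsm-[a-z0-9]+", ua_l):
        manufacturer = "Samsung"
    else:
        manufacturer = "Unknown"

    return {"category": category, "os": os_name, "manufacturer": manufacturer}


ipad_ua = ("Mozilla/5.0 (iPad; CPU OS 10_3 like Mac OS X) AppleWebKit/603.1.30 "
           "(KHTML, like Gecko) Version/10.0 Mobile/14E277 Safari/602.1")
print(parse_user_agent(ipad_ua))
```

Each session then contributes a small set of categorical values rather than an opaque string, which is the form a model can actually use.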

So Did the Features Help?

Given these new features, our next task was to evaluate their predictive performance. As we’ve discussed before, Retention Science has a churn model that makes predictions on a user level. Because of this, our evaluation involved plugging these new device features into our current model, and comparing what the old model predicted about the user to what the new model would predict about the user. The goal was to see if the new model would more accurately predict whether someone will remain a customer or not.
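One common way to plug categorical device attributes into an existing numeric feature vector is one-hot encoding. The sketch below assumes that approach; the vocabularies, the `extend_features` helper, and the example values are hypothetical and only illustrate the shape of the change, not how our churn model is actually wired.

```python
from typing import Dict, List

# Hypothetical vocabularies; a real pipeline would derive these from the data.
CATEGORIES = ["PC", "Smartphone", "Tablet", "Other"]
OPERATING_SYSTEMS = ["Android", "iOS", "Windows", "OS X", "Linux", "Other"]


def one_hot(value: str, vocabulary: List[str]) -> List[float]:
    """Return a 0/1 indicator vector; unseen values fall back to 'Other'."""
    if value not in vocabulary:
        value = "Other"
    return [1.0 if value == v else 0.0 for v in vocabulary]


def extend_features(base: List[float], device: Dict[str, str]) -> List[float]:
    """Append one-hot device columns to an existing numeric feature vector."""
    return (base
            + one_hot(device["category"], CATEGORIES)
            + one_hot(device["os"], OPERATING_SYSTEMS))


# "Old" model input: e.g. age and average order value (made-up values).
old_features = [34.0, 52.5]
new_features = extend_features(old_features, {"category": "Tablet", "os": "iOS"})
print(len(old_features), len(new_features))
```

The "old" model sees only the base vector, while the "new" model sees the base vector plus the device columns, so any difference in performance can be attributed to the added features.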

Our main measure of performance for this test is Area Under the ROC Curve (AUC), which measures how well a predictive model discriminates between two classes; in our case, staying a customer or not. To evaluate the new features, we ran both models on a number of randomly chosen days to reduce any date-specific effects, and measured the average increase in AUC from the model without the device features (the “old” model) to the one with the device features (the “new” model).
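As a rough sketch of this comparison, the code below computes AUC directly from its pairwise definition (the probability that a randomly chosen positive example is scored above a randomly chosen negative one) and averages the gain over several evaluation days. The function names and the toy labels and scores are made up for illustration, and measuring the gain as an absolute difference in AUC is an assumption.

```python
from statistics import mean
from typing import List, Sequence, Tuple

Day = Tuple[Sequence[int], Sequence[float], Sequence[float]]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_auc_gain(days: List[Day]) -> float:
    """Average (new AUC - old AUC) over days of (labels, old_scores, new_scores)."""
    return mean(auc(y, new) - auc(y, old) for y, old, new in days)


# Toy data for two evaluation days (not real results).
days = [
    ([1, 0, 1, 0], [0.6, 0.5, 0.4, 0.7], [0.8, 0.3, 0.6, 0.5]),
    ([1, 1, 0, 0], [0.9, 0.2, 0.3, 0.1], [0.9, 0.4, 0.3, 0.1]),
]
print(mean_auc_gain(days))
```

The pairwise loop is quadratic in the number of users, so it only suits small samples; at scale, a rank-based implementation from a standard library would be the usual choice.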

On average, the device-level features increased our model’s performance by 7.5%, which is a great improvement. Therefore, we were able to conclude that there appears to be some latent signal in customer behavior based on the devices they are using to interact with our clients.

Since there was an interesting positive effect on our model, we decided to investigate further. We looked at one of our existing clients, with a user base of 2.5M customers, and at the device features of those users. We wanted to see whether we could uncover any new insights that would be of value to our client, and the question was also an interesting one from a feature engineering standpoint.

Closer look: User Distribution Based on Device Category and OS

[Figure: user distribution by device category and operating system]

The figure above, which details what we found out about our client, shows that only 30% of the customers are using PCs. Clearly, this indicates that mobile- and tablet-based marketing are crucial to this client. And since we’ve determined that device signals are a factor in determining churn, this is an important piece of information. Further, a large number of customers use an Android or iOS device, so having a decent application for both might be useful.

Closer look: AOV Distribution Based on User Device and User Device OS

[Figure: AOV distribution by user device and by user device operating system]

Digging deeper, this feature can add richness not only to churn models but also to models of customer behavior. The two graphs above break down those 2.5M users by average order value (AOV) and rank them (left to right) accordingly. It is clear from the graphs that the most valuable customers are those who purchase from an Apple device, and in particular the iPad (tablet). Further, Apple users (on either OS X or iOS) spend significantly more than those on Android devices. So there may well be value in segmenting offers based on this feature.
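A breakdown like this can be sketched as a simple group-and-average over orders tagged with a device segment. The function name `aov_by_segment` and the order values below are made up for illustration; they are not the client's data, and averaging per order rather than per user is a simplifying assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def aov_by_segment(orders: Iterable[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Average order value per segment, ranked from highest to lowest."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for segment, value in orders:
        totals[segment] += value
        counts[segment] += 1
    aov = {segment: totals[segment] / counts[segment] for segment in totals}
    return sorted(aov.items(), key=lambda item: item[1], reverse=True)


# Toy (device, order value) pairs, not real client data.
orders = [
    ("iPad", 120.0),
    ("iPad", 80.0),
    ("Android phone", 40.0),
    ("Android phone", 60.0),
    ("Mac", 90.0),
]
for segment, value in aov_by_segment(orders):
    print(f"{segment}: {value:.2f}")
```

The same function works for any segment label, so swapping the device name for the operating system gives the second view.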

This article has covered quite a bit on the topic of feature engineering, but in many ways it has only scratched the surface. There is a rich literature on automatically selecting the best features, creating derivative features (e.g. composing features together), and even reducing the feature space to something more compact (e.g. representing your many complex features mathematically in fewer dimensions).

But hopefully we’ve helped you gain a greater appreciation of what feature engineering is, where it fits in, and how to do it, and, more importantly, what kind of value it brings.


About the Author

Avi Sanadhya is a Data Scientist at Retention Science. His interest lies in solving real-world problems at the intersection of Big Data and Machine Learning. He is pursuing his Master's in Data Science at the University of Southern California and received his B.E. in Computer Science from Manipal University.