Solving Cold Start Product Recommendations in e-Commerce

Recommendation algorithms have evolved over the last decade to provide a personalized experience to every shopper. In most cases, these algorithms rely on behavioral (implicit) or transactional (explicit) inputs from the user. Pure collaborative filtering and matrix factorization are well-known recommendation techniques that have shown high returns.

However, new incoming users usually experience the “cold start problem”, defined as a data profile that is too shallow to provide enough signals for a prediction. The above methods usually don’t help much under these constraints. Having no preference information or implicit signals from the user makes this a tough problem to solve. In this blog post, we surface a technique to provide predictive personalization as soon as a “cold” user signs up (this technique already comes out of the box for all the recommendation schemes in Cortex).


A user is defined as “cold start” if they DO NOT have any of the following:

  • Transaction history
  • Browsing history
  • Ratings or any other explicit feedback on any item
  • Interaction with an item

Their email, name, and/or IP address may be known.
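
The definition above can be expressed as a simple predicate. This is a minimal sketch, not the Cortex implementation: the `UserProfile` class and its field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class UserProfile:
    # Identity fields that may be known at signup (hypothetical names).
    email: Optional[str] = None
    name: Optional[str] = None
    ip_address: Optional[str] = None
    # Behavioral and transactional signals; empty for a brand-new user.
    transactions: List[str] = field(default_factory=list)
    page_views: List[str] = field(default_factory=list)
    ratings: List[str] = field(default_factory=list)
    item_interactions: List[str] = field(default_factory=list)


def is_cold_start(user: UserProfile) -> bool:
    """Return True when the user has none of the four signal types."""
    return not (
        user.transactions
        or user.page_views
        or user.ratings
        or user.item_interactions
    )
```

Note that the identity fields play no part in the check: a user with a known email and IP address but no activity still counts as cold.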

To understand the severity of the problem, we examined data from 100 e-commerce companies. The percentage of users who qualified as cold start was surprisingly high (see Fig 1.1). It’s concerning that most of these companies had more than 45% of their user base as cold start. The likely reasons are email capture forms that are optimized for conversions (fewer fields) and users simply not generating enough site activity before they sign up.

[Fig 1.1 Number of companies Vs % of cold start users]

A naïve approach to solving this problem would be to recommend the best sellers to cold users. On the surface, it is a simple rule with a sound mathematical basis: the prior probability of conversion on these items is high. Moreover, weekly bestseller lists can identify what’s hot.
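
As a rough sketch of this baseline, a weekly bestseller list can be built by counting purchases over the last seven days. The `(item_id, purchase_time)` record format and the function name are assumptions made for this example.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

# Hypothetical order record: (item_id, purchase_time).
Order = Tuple[str, datetime]


def weekly_best_sellers(
    orders: Iterable[Order], now: datetime, k: int = 10
) -> List[str]:
    """Return the k most purchased items in the 7 days before `now`."""
    cutoff = now - timedelta(days=7)
    counts = Counter(item for item, ts in orders if cutoff <= ts <= now)
    return [item for item, _ in counts.most_common(k)]
```

Every cold user receives the same list, which is exactly the limitation the rest of this post tries to address.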

The above naïve approach works well in certain cases, especially when the number of products is small and there isn’t much diversity in the user personas. However, when these conditions aren’t met, we can do much better. More sophisticated algorithms can turn one group of shoppers’ past habits into custom recommendations for these new users.


Most solutions rely on associating the new user with one cohort from a pre-defined set. To do so, we need to predict some information about the new user that connects them to a cohort. If a user can be profiled in some way, we can leverage prior information about that profile and exploit it in the form of a product recommendation.

Generally, e-commerce sites collect this information at minimum from user signups:

  1. An email address and / or a name
  2. IP addresses (captured from the browser)

Email Address:
Email addresses can be a good predictor of the first name of a user, which in turn can be leveraged as a gender predictor. While ReSci has a proprietary gender prediction algorithm, there exist third-party data sources that can achieve similar results.
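
To make the idea concrete, here is a deliberately simple sketch. It is not ReSci’s proprietary algorithm: it assumes the first alphabetic token of the email’s local part is the first name, and it uses a tiny hypothetical `NAME_TO_GENDER` table in place of a real dataset or model.

```python
import re
from typing import Dict, Optional

# Hypothetical, tiny name-to-gender table; a real system would rely on a
# much larger dataset or a trained model.
NAME_TO_GENDER: Dict[str, str] = {"john": "male", "maria": "female"}


def first_name_from_email(email: str) -> Optional[str]:
    """Guess a first name from the part of the email before the '@'."""
    local_part = email.split("@", 1)[0].lower()
    # Split on anything that is not a letter: "john.smith84" -> john, smith
    tokens = [t for t in re.split(r"[^a-z]+", local_part) if t]
    return tokens[0] if tokens else None


def predict_gender(email: str) -> Optional[str]:
    """Return a gender guess, or None when no confident guess is possible."""
    name = first_name_from_email(email)
    return NAME_TO_GENDER.get(name) if name else None
```

Returning `None` for unknown names matters in practice: a profile with a missing attribute can still be matched on its remaining attributes, whereas a wrong guess sends the user to the wrong cohort.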

IP Address:
An IP address is a predictor of the user’s location: country, state, and zip code. The zip code can also help estimate the buying power or median income of the user.
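
The chain from IP address to buying power might be sketched as follows. Everything here is an assumption for illustration: the `GEO_TABLE` mapping uses documentation-only IP ranges, the median incomes are placeholders rather than real statistics, and the single income threshold is arbitrary. A production system would use a geo-IP database and census-style income data instead.

```python
import ipaddress
from typing import Dict, NamedTuple, Optional


class Location(NamedTuple):
    country: str
    state: str
    zip_code: str


# Hypothetical geo-IP table keyed by CIDR block (documentation-only ranges).
GEO_TABLE: Dict[str, Location] = {
    "192.0.2.0/24": Location("US", "CA", "90405"),
    "198.51.100.0/24": Location("US", "CA", "90020"),
}

# Placeholder median incomes per zip code; NOT real statistics.
MEDIAN_INCOME: Dict[str, int] = {"90405": 40_000, "90020": 120_000}


def locate(ip: str) -> Optional[Location]:
    """Map an IP address to a location, or None if no block matches."""
    addr = ipaddress.ip_address(ip)
    for cidr, location in GEO_TABLE.items():
        if addr in ipaddress.ip_network(cidr):
            return location
    return None


def buying_power(zip_code: str, threshold: int = 75_000) -> Optional[str]:
    """Bucket a zip code into 'High' or 'Low' by its median income."""
    income = MEDIAN_INCOME.get(zip_code)
    if income is None:
        return None
    return "High" if income >= threshold else "Low"
```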

Fig 1.2 shows how these two pieces of information are tied together to produce a rich, low-dimensional vector representation of a user. A lookup table is continuously updated with the top products mined for each profile.

The lookup table could look like this for a sporting goods client:


ID     Gender  State  Zip    Buying Power  Top Items
12345  Male    CA     90405  Low           Rapid Hiking shoes, Protein powder, …
09876  Female  CA     90020  High          V neck summer top, Sports Bra, …
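
At serving time, a table like this can be queried with the predicted profile, falling back to best sellers when no profile matches. The sketch below is an assumption about how such a lookup could work, not a description of the production system: the key omits the zip code (its information is carried by buying power here), and the `BEST_SELLERS` fallback list is made up.

```python
from typing import Dict, List, Optional, Tuple

# Profile key: (gender, state, buying power). Any part may be unknown.
ProfileKey = Tuple[Optional[str], Optional[str], Optional[str]]

# Rows taken from the example table above.
LOOKUP: Dict[ProfileKey, List[str]] = {
    ("Male", "CA", "Low"): ["Rapid Hiking shoes", "Protein powder"],
    ("Female", "CA", "High"): ["V neck summer top", "Sports Bra"],
}

# Hypothetical fallback list for profiles with no entry in the table.
BEST_SELLERS: List[str] = ["Water bottle", "Running socks"]


def recommend(
    gender: Optional[str],
    state: Optional[str],
    buying_power: Optional[str],
    k: int = 5,
) -> List[str]:
    """Return up to k items for the profile, else the best sellers."""
    items = LOOKUP.get((gender, state, buying_power), BEST_SELLERS)
    return items[:k]
```

Because the lookup is a single dictionary access, it is cheap enough to run in real time at signup.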




[Fig 1.2 Overview of a real time cold start recommender]

Rec Lookup Table:

It’s important that our lookup table is weighted by recency, so that users see trending items. Background workers mine transactional and behavioral data so that different definitions of “top items” can be A/B tested.
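
One way to weight by recency is exponential decay, where an event’s contribution halves every fixed number of days. This is an assumed weighting scheme chosen for illustration; the event format `(profile, item_id, age_in_days)` and the 7-day half-life are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical event: (profile_key, item_id, age_in_days).
Event = Tuple[str, str, float]


def recency_weighted_top_items(
    events: Iterable[Event], half_life_days: float = 7.0, k: int = 3
) -> Dict[str, List[str]]:
    """Rank items per profile, halving an event's weight every half-life."""
    scores: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for profile, item, age_days in events:
        scores[profile][item] += 0.5 ** (age_days / half_life_days)
    return {
        profile: [
            item
            for item, _ in sorted(
                items.items(), key=lambda kv: kv[1], reverse=True
            )[:k]
        ]
        for profile, items in scores.items()
    }
```

With a 7-day half-life, a single purchase made today outweighs three purchases made four weeks ago, which is what lets trending items rise to the top.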

Some ideas to explore:

  • Top Items by revenue per profile
  • Top Items by click through rate per profile
  • Top Items by AOV (average order value) per profile
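
These definitions differ only in the scoring function, so they can share one ranking routine. The sketch below assumes pre-aggregated statistics per profile and item; the `ItemStats` fields are hypothetical, and AOV is taken here to mean the average value of orders containing the item.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, NamedTuple


class ItemStats(NamedTuple):
    # Hypothetical aggregates for one (profile, item) pair.
    profile: str
    item: str
    revenue: float      # total revenue attributed to the item
    impressions: int    # times the item was shown
    clicks: int         # times the item was clicked
    orders: int         # number of orders containing the item
    order_value: float  # summed value of those orders


def by_revenue(s: ItemStats) -> float:
    return s.revenue


def by_ctr(s: ItemStats) -> float:
    return s.clicks / s.impressions if s.impressions else 0.0


def by_aov(s: ItemStats) -> float:
    return s.order_value / s.orders if s.orders else 0.0


def top_items(
    stats: Iterable[ItemStats],
    score: Callable[[ItemStats], float],
    k: int = 3,
) -> Dict[str, List[str]]:
    """Rank items within each profile by the given scoring function."""
    grouped: Dict[str, List[ItemStats]] = defaultdict(list)
    for s in stats:
        grouped[s.profile].append(s)
    return {
        profile: [s.item for s in sorted(rows, key=score, reverse=True)[:k]]
        for profile, rows in grouped.items()
    }
```

Swapping `by_revenue` for `by_ctr` or `by_aov` yields a different lookup table from the same data, which makes each definition easy to test against the others.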


The approach looks good in theory, but how does it perform against a baseline of recommending top sellers? Using our A/B testing platform, we tested the above algorithm against that baseline. The results were encouraging: a ~19% average increase in click-to-open rates and a ~16% increase in conversion rates (see Table 1.1).

Table 1.1. Comparing the Cold start algorithm with a baseline of using top sellers
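
For readers who want to run a similar comparison, the sketch below shows how a relative increase over a baseline is typically computed from raw counts. It assumes the reported increases are relative lifts of this form; the function names are hypothetical, and no numbers from our experiment are reproduced here.

```python
def rate(successes: int, trials: int) -> float:
    """Fraction of trials that succeeded, e.g. conversions per email sent."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    return successes / trials


def relative_lift(control_rate: float, treatment_rate: float) -> float:
    """Relative change of the treatment rate over the control rate."""
    if control_rate <= 0:
        raise ValueError("control_rate must be positive")
    return (treatment_rate - control_rate) / control_rate
```

A lift of 0.16 from this function corresponds to a 16% increase over the baseline rate, not a 16 percentage point increase.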

About the Author

Vedant Dhandhania is a Machine Learning Engineer at Retention Science. He helps predict customer behavior using advanced machine learning algorithms. His passion lies at the intersection of Signal Processing and Deep Learning.