Building Better Recommendation Engines through Redicto [RS Labs]
Product recommendations are a powerful tool for customer retention: they can boost average order value by 50%, increase revenues by 300%, and improve conversion. Product recommendations can often seem straightforward. But recommending items related to previous purchases, taking certain attributes into consideration, may not be as simple as showing Bob something that Ted likes, even though they may have similar interests (basketball, sports) and live in the same city (Los Angeles). That’s a decent way to start, but what if Bob is a dad and Ted isn’t? Though they both like the Lakers, they are at different stages in life. As a retailer, your great recommendation to Bob of “Lakers-print baby diapers” won’t resonate at all with Ted. This is how naive models can fall flat.
As a business, you have complex logic that determines a good or bad recommendation. We recognize that each user is unique and each business is unique, so it only makes sense that product recommendation engines should be built accordingly.
Today, popular recommender algorithms like collaborative filtering are readily available in easy-to-use packages, but often these algorithms fall short of creating great recommendations for unique business needs. Fancy machine learning toolboxes and third-party providers have made it straightforward to generate recommendations, but frequently, using those recommendations straight out of the box yields poor results.
This rang true for many of our own clients, who found that the recommendations generated from these packaged algorithms were not quite right for their respective businesses. To address this issue, we developed Redicto, a backend system to customize models based on each of our clients’ unique needs. Redicto is a multi-layered modeling approach that allows us to ensemble state-of-the-art recommendation algorithms with knowledge-based models. In short, it helps us build the best product recommendation engine for each of our clients, tailored to their specific use cases.
What are knowledge-based rules?
Consider these examples:
- As a shoe retailer, you don’t want to recommend the same shoe multiple times in different sizes.
- As a clothing retailer, you probably want to prevent recommending women’s dresses to your male customers.
- As a video platform, you might want to apply certain rules and restrictions around recommendations, based upon the age of the user.
We refer to these business rules as knowledge-based rules. A generic algorithm doesn’t have the domain knowledge to avoid these pitfalls. At Retention Science, we employ our recommendation algorithms across many different industry verticals. We saw the problem firsthand when multiple clients complained of poor results from a well-known recommendation algorithm. We decided to design a flexible system that could be customized to each of our clients’ needs without becoming a monster for us to maintain and expand, and thus Redicto was born.
We began by putting together the system requirements.
Our system needed to generate recommendations for hundreds of clients, with an average of 100M+ users per client. It needed to be robust enough to run multiple recommendation algorithms, and then also ensemble these models with custom knowledge-based models, applying rules and logic. These custom rules and logic might involve additional predictive models (e.g., predicted item gender). In addition, the rules may be chained on top of one another in series (e.g., gender-filter -> age-filter -> geographical-filter). Most importantly, we would need it to finish processing all of this within a couple of minutes.
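To make the chaining requirement concrete, here is a minimal sketch in plain Python rather than the Scala/Spark stack this post describes. It is not Redicto's code: the `User` and `Item` fields, the three filter functions, and the `chain` helper are hypothetical names chosen for illustration. Each filter takes a user and a list of items and returns a shorter list, so filters can be applied in series.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, FrozenSet, List


@dataclass(frozen=True)
class User:
    user_id: str
    gender: str
    age: int
    region: str


@dataclass(frozen=True)
class Item:
    item_id: str
    gender: str              # "any" for unisex; could come from a predictive model
    min_age: int             # minimum age allowed to see this item
    regions: FrozenSet[str]  # regions where the item is available


# Every filter shares one signature, which is what makes chaining possible.
Filter = Callable[[User, List[Item]], List[Item]]


def gender_filter(user: User, items: List[Item]) -> List[Item]:
    return [i for i in items if i.gender in ("any", user.gender)]


def age_filter(user: User, items: List[Item]) -> List[Item]:
    return [i for i in items if user.age >= i.min_age]


def geo_filter(user: User, items: List[Item]) -> List[Item]:
    return [i for i in items if user.region in i.regions]


def chain(*filters: Filter) -> Filter:
    """Compose filters so each one runs on the output of the previous one."""
    def run(user: User, items: List[Item]) -> List[Item]:
        return reduce(lambda acc, f: f(user, acc), filters, items)
    return run


pipeline = chain(gender_filter, age_filter, geo_filter)

bob = User("bob", gender="m", age=34, region="US")
recs = [
    Item("dress", "f", 0, frozenset({"US"})),
    Item("jersey", "any", 0, frozenset({"US", "CA"})),
    Item("whiskey-glass", "any", 21, frozenset({"US"})),
    Item("scarf", "m", 0, frozenset({"UK"})),
]
kept = pipeline(bob, recs)
print([i.item_id for i in kept])
```

Because every filter has the same input and output shape, reordering or removing a step only changes the argument list passed to `chain`.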
Other fundamental requirements included:
- Emphasis on flexibility, scalability, and reusability
- Generalized approach, so that we could apply to any client dataset
- Standard inputs & outputs, so that we can swap different models in and out, extend filters, etc.
- FAST (filtering and selection within a couple minutes)
- Process large data sets AND scale easily (e.g., ~100M users × ~1M items × ~100 clients)
- Internal-facing API interface to interact with our application, RS Insights
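One way to read the "standard inputs & outputs" requirement is sketched below in plain Python. This is an assumption about how such an interface could look, not a description of Redicto: the `Scores`, `Model`, and `Filter` types, the weighted-sum `ensemble`, and the toy models are all hypothetical. The idea is that every model returns the same shape (item id to score), so models can be ensembled and filters can be layered on top without knowing which algorithm produced the scores.

```python
from typing import Callable, Dict, List, Set, Tuple

Scores = Dict[str, float]                 # item_id -> score, shared by every model
Model = Callable[[str], Scores]           # user_id -> Scores
Filter = Callable[[str, Scores], Scores]  # (user_id, Scores) -> Scores


def ensemble(models: List[Tuple[Model, float]]) -> Model:
    """Combine models by a weighted sum of their scores (one simple choice)."""
    def combined(user_id: str) -> Scores:
        total: Scores = {}
        for model, weight in models:
            for item_id, score in model(user_id).items():
                total[item_id] = total.get(item_id, 0.0) + weight * score
        return total
    return combined


def exclude_items(blocked: Set[str]) -> Filter:
    """A knowledge-based filter: drop items the business never wants shown."""
    def f(user_id: str, scores: Scores) -> Scores:
        return {i: s for i, s in scores.items() if i not in blocked}
    return f


def top_n(scores: Scores, n: int) -> List[str]:
    # Highest score first; ties broken by item id for a stable order.
    return sorted(scores, key=lambda i: (-scores[i], i))[:n]


# Toy stand-ins for real algorithms; the numbers are made up.
def collaborative(user_id: str) -> Scores:
    return {"a": 0.9, "b": 0.4}


def popularity(user_id: str) -> Scores:
    return {"b": 0.8, "c": 0.6}


recommend = ensemble([(collaborative, 0.5), (popularity, 0.5)])
scores = exclude_items({"a"})("bob", recommend("bob"))
print(top_n(scores, 2))
```

Swapping in a different algorithm then means passing a different `Model` to `ensemble`; nothing downstream has to change.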
Design & Prototype
We decided to use Scala for this service. We’re big Spark users for much of our machine learning/data processing stack, so the language was already one of our team’s favorites. Scala also supports functional programming, which is a great way to write efficient and clean data pipelines.
In addition, Spark allows us to start with a small cluster, and scale up if needed. At this stage, we focused on building out a quick prototype to measure performance, speed, scalability, and so forth.
We wanted a system that could run logic over recommendations in parallel, and handle chaining of logic, so that we could linearly apply multiple rules, and do so in a distributed manner.
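The shape of that idea can be shown without Spark. The sketch below is a plain-Python stand-in under stated assumptions: `partition`, `apply_rules`, and `run_parallel` are hypothetical helpers, and a thread pool plays the role that Spark partitions and executors would play in a real cluster. Rules run in order for each user, while separate partitions of users are processed independently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Recs = Dict[str, List[str]]                    # user_id -> ordered item ids
Rule = Callable[[str, List[str]], List[str]]   # (user_id, items) -> items


def partition(recs: Recs, n: int) -> List[Recs]:
    """Split users round-robin into n independent chunks."""
    parts: List[Recs] = [{} for _ in range(n)]
    for idx, (user_id, items) in enumerate(sorted(recs.items())):
        parts[idx % n][user_id] = items
    return parts


def apply_rules(part: Recs, rules: List[Rule]) -> Recs:
    """Within one partition, apply the rules in series for every user."""
    out: Recs = {}
    for user_id, items in part.items():
        for rule in rules:
            items = rule(user_id, items)
        out[user_id] = items
    return out


def run_parallel(recs: Recs, rules: List[Rule], n_partitions: int = 4) -> Recs:
    # Threads only illustrate the structure; a real cluster would spread
    # partitions across machines.
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        results = list(pool.map(lambda p: apply_rules(p, rules),
                                partition(recs, n_partitions)))
    merged: Recs = {}
    for r in results:
        merged.update(r)
    return merged


def dedupe(user_id: str, items: List[str]) -> List[str]:
    return list(dict.fromkeys(items))  # keeps first occurrence, preserves order


def cap_at_two(user_id: str, items: List[str]) -> List[str]:
    return items[:2]


recs = {"u1": ["a", "a", "b", "c"], "u2": ["x", "y", "y"], "u3": ["m"]}
result = run_parallel(recs, [dedupe, cap_at_two], n_partitions=2)
print(result)
```

Since no rule here looks at another user's recommendations, the partitions never need to communicate, which is what lets this kind of work scale out by adding workers.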
For our design/prototype phase, we put together a couple of specific real-world use cases, including:
- I) Filter out colors – For one client, we have a user-attribute to match every user with a color preference. If a user had a preference for red, for example, their recommendations should only be from the red color category.
- II) Increasing sizes – A user should only be recommended one size of an item, e.g., you shouldn’t receive the same shoe recommendation in both size 10 and size 11.5 (see image below).
Group A shows the recommendations from a standard collaborative filtering recommendation algorithm. Group B shows the output after running through Redicto’s custom knowledge-based filters, specifically tailored for this shoe store.
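The two use cases above can be sketched as small filters in plain Python. This is an illustration under assumptions, not the production filters: the `Rec` fields (`sku`, `style`, `color`, `size`, `score`) and both function names are hypothetical, and keeping the highest-scoring size per style is just one reasonable way to pick which size survives.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Rec:
    sku: str      # size-specific product id
    style: str    # the shoe model, shared across all of its sizes
    color: str
    size: float
    score: float  # score from the upstream recommendation algorithm


def color_filter(recs: List[Rec], preferred_color: Optional[str]) -> List[Rec]:
    """Use case I: keep only items in the user's preferred color, if known."""
    if preferred_color is None:
        return list(recs)
    return [r for r in recs if r.color == preferred_color]


def one_size_per_style(recs: List[Rec]) -> List[Rec]:
    """Use case II: keep the highest-scoring size of each style."""
    best: Dict[str, Rec] = {}
    for r in recs:
        if r.style not in best or r.score > best[r.style].score:
            best[r.style] = r
    return sorted(best.values(), key=lambda r: -r.score)


# "Group A": raw output with the same shoe in two sizes (made-up data).
group_a = [
    Rec("runner-10", "runner", "red", 10, 0.9),
    Rec("runner-11.5", "runner", "red", 11.5, 0.8),
    Rec("trainer-10", "trainer", "blue", 10, 0.7),
    Rec("loafer-9", "loafer", "red", 9, 0.6),
]

# "Group B": after both knowledge-based filters.
group_b = one_size_per_style(color_filter(group_a, preferred_color="red"))
print([r.sku for r in group_b])
```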
Initial performance tests produced extremely promising results. We tested a few algorithms and were able to run ~5M user predictions through 3 chained rules in about 30 seconds on each node in our cluster. Because Spark is distributed, we can maintain this performance as the number of users increases by scaling the cluster accordingly.
This shows the result after applying two domain filters. The number of raw recommendations was 18M, which was reduced to 15M after a color filter was applied, and further reduced to 4M after a region filter was applied.
This shows the result after applying three domain filters. The number of raw recommendations was 19.5M, which was reduced to 8M after a category filter was applied. It was further reduced by only a few hundred after an item-conditional filter was applied (don’t include item A if item B was recommended), and reduced to 7.9M after an item-purchased filter was applied (only include item A if item B was purchased).
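The item-conditional and item-purchased filters described above, along with the per-stage counts, can be sketched as follows. This is plain Python on toy data, not the Spark job that produced the numbers in this post; the function names, the `run_with_counts` helper, and the sample items and purchases are all hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

Recs = Dict[str, List[str]]                     # user_id -> item ids
Filter = Callable[[str, List[str]], List[str]]


def category_filter(allowed: Set[str], category_of: Dict[str, str]) -> Filter:
    def f(user_id: str, items: List[str]) -> List[str]:
        return [i for i in items if category_of.get(i) in allowed]
    return f


def item_conditional_filter(item_a: str, item_b: str) -> Filter:
    """Don't include item_a if item_b was recommended."""
    def f(user_id: str, items: List[str]) -> List[str]:
        if item_b in items:
            return [i for i in items if i != item_a]
        return items
    return f


def item_purchased_filter(item_a: str, item_b: str,
                          purchases: Dict[str, Set[str]]) -> Filter:
    """Only include item_a if the user has purchased item_b."""
    def f(user_id: str, items: List[str]) -> List[str]:
        if item_b in purchases.get(user_id, set()):
            return items
        return [i for i in items if i != item_a]
    return f


def run_with_counts(recs: Recs, stages: List[Tuple[str, Filter]]):
    """Apply each stage in order and record how many recommendations remain."""
    counts = [("raw", sum(len(v) for v in recs.values()))]
    for name, f in stages:
        recs = {u: f(u, items) for u, items in recs.items()}
        counts.append((name, sum(len(v) for v in recs.values())))
    return recs, counts


category_of = {"phone": "electronics", "case": "accessories",
               "charger": "accessories", "cable": "accessories",
               "sock": "apparel"}
raw = {"u1": ["phone", "case", "charger", "sock"],
       "u2": ["case", "cable", "sock"]}
purchases = {"u2": {"phone"}}

final, counts = run_with_counts(raw, [
    ("category", category_filter({"electronics", "accessories"}, category_of)),
    ("item-conditional", item_conditional_filter("cable", "case")),
    ("item-purchased", item_purchased_filter("charger", "phone", purchases)),
])
print(counts)
```

Recording the count after every stage makes it easy to spot a filter that removes far more, or far fewer, recommendations than expected.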
Now that the initial version is running nicely in production, what’s next?
Extend Redicto to handle other Retention Science predictions:
We employ many models: purchase probability calculation, CLV prediction, incentive optimization, as well as a dozen other predictions. Our goal is to eventually serve any of these predictions internally via Redicto. Just as we apply custom knowledge-based rules to our recommendations, we want to do the same for all of our other models.
Better tools for our data scientists:
The design of Redicto allows us to “plug in” different layers of filtering or rules logic for any client by simply defining a configuration. This does not impact our clients or recommendations directly, but it makes our lives easier as data scientists when we run different rules and implement new recommendation schemes. A slick UI is in the works. This will allow our data scientists to quickly apply different variations of a recommendation scheme and tune parameters and filters to boost prediction accuracy.
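A configuration-driven pipeline of this kind could look roughly like the sketch below. It is written in plain Python and makes several assumptions: the config format (a list of steps with a `rule` name and `params`), the `REGISTRY`, and the two example rules are hypothetical and do not reflect Redicto's actual configuration schema.

```python
from typing import Any, Callable, Dict, List

Entity = Dict[str, Any]
Rule = Callable[[Entity, List[Entity]], List[Entity]]

# Maps a rule name used in configs to a factory that builds the rule.
REGISTRY: Dict[str, Callable[..., Rule]] = {}


def register(name: str):
    def deco(factory: Callable[..., Rule]) -> Callable[..., Rule]:
        REGISTRY[name] = factory
        return factory
    return deco


@register("match_user_attribute")
def match_user_attribute(attribute: str) -> Rule:
    """Keep items whose attribute equals the user's value, if the user has one."""
    def rule(user: Entity, items: List[Entity]) -> List[Entity]:
        wanted = user.get(attribute)
        return [i for i in items if wanted is None or i.get(attribute) == wanted]
    return rule


@register("max_results")
def max_results(limit: int) -> Rule:
    return lambda user, items: items[:limit]


def build_pipeline(config: List[Dict[str, Any]]) -> Rule:
    """Turn a per-client config into one callable that runs the rules in order."""
    rules = [REGISTRY[step["rule"]](**step.get("params", {})) for step in config]

    def run(user: Entity, items: List[Entity]) -> List[Entity]:
        for rule in rules:
            items = rule(user, items)
        return items
    return run


# A per-client configuration: changing behaviour means editing data, not code.
client_config = [
    {"rule": "match_user_attribute", "params": {"attribute": "color"}},
    {"rule": "max_results", "params": {"limit": 2}},
]

pipeline = build_pipeline(client_config)
user = {"id": "bob", "color": "red"}
items = [{"id": "a", "color": "red"}, {"id": "b", "color": "blue"},
         {"id": "c", "color": "red"}, {"id": "d", "color": "red"}]
result = pipeline(user, items)
print([i["id"] for i in result])
```

With this layout, trying a new recommendation scheme for a client means writing a new list of steps, and a UI could generate that list directly.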
Better tools for better marketers
We talk a lot about personalization at Retention Science, and for good reason: it’s clear that personalization is key in keeping your customers engaged with your brand. Redicto was built with the same principle in mind: just as every customer deserves a customized, tailored shopping experience, every business should have the advantage of marketing tools that have been custom-built to its specifications. Just as each shopper is different, every one of our clients has unique use cases. We like to practice what we preach; Redicto, we think, is proof of that.
About the Author
Andrew Waage is co-founder and CTO at Retention Science, where he drives technology initiatives that bring data science to marketers so they can retain and delight their customers.