Female, Male, or Neutral? Filtering Based on Gender


In predictive recommendation systems, results will sometimes appear counterintuitive. As a marketer, you may notice some male users in your customer list receiving recommendations for women’s skin care products. Sometimes, this is simply because those male users were shopping for their mom, wife, girlfriend, daughter, etc., and their browsing history indicated this. Other times, those male users may have purchased a health food product with some correlation to those female skin care products, and the models picked up on those similarities. This is not necessarily bad, as truly predictive systems will often pick up hidden signals that are impossible to uncover with generic marketing tools.

However, some businesses will have specific reasons not to show female items to male users, and vice versa. This is a good opportunity for businesses to add their own domain knowledge to complement and improve a recommendation system. To solve this use case, we implemented a gender-match filter in Redicto. This post describes the approach and details a recent improvement to our gender-tagging process.

Female, Male, or Neutral?

Our gender-match filter requires that we first tag users, with our best guess at their gender preference, and items, with our best guess at their gender-specificity. For both, we assign either Male, Female, or Neutral.

Inferring User Genders

For some of our clients, users can provide their gender during account signup, but this information is often omitted. On the other hand, a user’s first name is usually a required field during account signup, and so is most often present. In these cases, we can infer their gender using population statistics. There are many public datasets which can inform us of the gender distribution for common names. For example, the US Social Security Administration maintains actuarial tables with population counts by name, birth year, and gender (FiveThirtyEight has a nice explanation of how this dataset can be used to infer age). When a name is missing or is not strongly associated with one gender, we tag the user as Neutral.
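A minimal sketch of name-based inference in Python follows. The name counts are made-up placeholders rather than real population figures, and the 0.9 cutoff and the function name infer_user_gender are assumptions for illustration only.

```python
from typing import Dict, Tuple

# Made-up (female_count, male_count) totals per first name, for illustration.
# Real counts would come from a public name dataset.
NAME_COUNTS: Dict[str, Tuple[int, int]] = {
    "mary": (4_000_000, 15_000),
    "james": (23_000, 5_000_000),
    "jordan": (130_000, 370_000),
}


def infer_user_gender(first_name: str, min_share: float = 0.9) -> str:
    """Return "Female", "Male", or "Neutral" for a first name."""
    female, male = NAME_COUNTS.get(first_name.strip().lower(), (0, 0))
    total = female + male
    if total == 0:
        return "Neutral"  # unknown name: no evidence either way
    if female / total >= min_share:
        return "Female"
    if male / total >= min_share:
        return "Male"
    return "Neutral"  # name is used by both genders too often to call
```

With these placeholder counts, "Mary" is tagged Female, "James" is tagged Male, and "Jordan" stays Neutral because neither gender reaches the cutoff.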

Inferring Item Genders

The second step is to infer the gender-specificity of the items. The first way we did this is using natural language processing (read: counting keywords). By looking at product names and other metadata from a variety of sources, we have built up a set of keywords (see Fig 1.1) which are highly indicative of male-specific and female-specific items. If a product contains one or more of these keywords, and they are not contradictory, then we can tag that product as for Males or Females. Otherwise, as with ambiguous users, we tag ambiguous products as Neutral.

Fig 1.1 Sample of top keywords for item gender tagging, determined from pre-tagged sources of gender-targeted product names
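The keyword rule can be sketched roughly as below. The keyword sets here are short hypothetical stand-ins for the lists mined from pre-tagged product names, and tag_item_gender is an illustrative name, not part of any real API.

```python
import re
from typing import Set

# Hypothetical keyword lists, for illustration only.
FEMALE_KEYWORDS: Set[str] = {"women", "womens", "dress", "bra", "her"}
MALE_KEYWORDS: Set[str] = {"men", "mens", "tie", "beard", "him"}


def tag_item_gender(product_name: str) -> str:
    """Tag a product as "Female", "Male", or "Neutral" from its name."""
    # Drop apostrophes so "women's" becomes "womens", then split into words.
    cleaned = product_name.lower().replace("'", "").replace("’", "")
    tokens = set(re.findall(r"[a-z]+", cleaned))
    has_female = bool(tokens & FEMALE_KEYWORDS)
    has_male = bool(tokens & MALE_KEYWORDS)
    if has_female and not has_male:
        return "Female"
    if has_male and not has_female:
        return "Male"
    return "Neutral"  # no keywords found, or contradictory keywords
```

Matching on whole tokens rather than substrings matters here: a substring search for "men" would also fire on "women".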

Going Further

Recently, we noticed an issue with a clothing e-commerce retailer where our keyword-based item gender tagging was not sufficient. They have both men’s and women’s clothing, but these categories were not given in our data, so we were inferring gender mostly from the item names. For some items (dresses, ties, etc.) this worked fine, but for others there were no gender-specific clues. In fact, for some items, the names were exactly the same, such as a generic “White V-Neck Tee” which came in both a men’s and a women’s variant.

To solve this problem, we looked at the genders of the users who purchased these items. True, in some cases users may purchase items which are not marketed for their gender—for example, they may be buying a gift, or be using a partner’s account, or simply like a product regardless of the manufacturer’s intended audience. However, on the whole, the numbers match our expectations. For the women’s v-neck tee, 91% of buyers with known (or inferred) gender were female. For the men’s v-neck tee, 90% were male.

Using this kind of data-driven approach results in much more accurate tagging compared to making assumptions based on metadata. The primary drawback of this method is handling new products, which won’t have generated enough purchases to determine whether a gender-specificity exists. A secondary issue is handling clients with imbalanced user bases (skewed toward female or male users). Our approach had to handle these cases in a general way across multiple e-commerce clients.

Creating a Gender Bias Score

Our implementation aims to generate a single gender bias score for each product which can be used by our gender-match recommendation filter.

A naive approach to formulate a Gender Score S could be:

S = (M – F) / (M + F)
where F is the number of female users who have affinity to the item, and
M is the number of male users who have affinity to the item.

The score ranges from -1 to +1, where -1 means 100% of the buyers were female, and +1 means 100% were male. If an equal number of males and females bought the item, that implies a score of 0.

After placing products on this scale, we can choose a suitable threshold to obtain our gender match filter.

For example, if a client wants a strong filter, we can set a threshold of 0.9: if S < -0.9, then the item is female-specific; if S > 0.9, then it is male-specific; otherwise it’s neutral.
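A minimal Python sketch of the naive score and the threshold rule is shown here. The function names are illustrative, and returning 0 for an item with no buyers of known gender is an assumption, since the formula is undefined in that case.

```python
def gender_score(male_buyers: int, female_buyers: int) -> float:
    """Naive score S = (M - F) / (M + F), ranging from -1 to +1."""
    total = male_buyers + female_buyers
    if total == 0:
        return 0.0  # assumption: no known-gender buyers is treated as neutral
    return (male_buyers - female_buyers) / total


def classify(score: float, threshold: float = 0.9) -> str:
    """Map a score to a gender tag using a symmetric threshold."""
    if score < -threshold:
        return "Female"
    if score > threshold:
        return "Male"
    return "Neutral"
```

Note how strict a 0.9 threshold is: an item with 91 female and 9 male buyers scores -0.82 and would still be tagged Neutral, while a threshold of 0.8 would tag it Female.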

Fig 1.2. Examples of female, male and neutral items along with the gender score

Smoothing with a Pseudo-Count

Unfortunately, this score would be very noisy for items with very few purchases. If there is a new item with only one purchase by a female user, then it would get a score of -1, but we certainly don’t have enough evidence to assume it is a female-specific item. This is a good use case for a pseudo-count. We can pretend that all items have been bought, say, 10 times—by 5 male and 5 female users—which will give the scores some inertia that has to be disproven by evidence:

S = (M – F) / (M + F + Pseudo-count)

Increasing the pseudo-count drives scores toward zero, so choosing a value depends on the overall user population and what a “typical” product’s purchase count is:

The distribution of the Item Gender Score for different choices of pseudo-counts
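The effect of the pseudo-count is easy to check with a small Python sketch. The default of 10 follows the example above, and the function name and the positivity check are assumptions for illustration.

```python
def smoothed_gender_score(
    male_buyers: int, female_buyers: int, pseudo_count: float = 10.0
) -> float:
    """Smoothed score S = (M - F) / (M + F + pseudo_count)."""
    if pseudo_count <= 0:
        raise ValueError("pseudo_count must be positive")
    return (male_buyers - female_buyers) / (
        male_buyers + female_buyers + pseudo_count
    )
```

A new item with a single female buyer now scores -1/11, or about -0.09, instead of -1. An item needs substantial one-sided evidence to approach the extremes: 100 female buyers and no male buyers gives -100/110, or about -0.91.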

Accounting for an Unbalanced User Base

This would work if a client has similar numbers of female and male users. But if a site has 10000 female and 50 male users, then the scores will all be heavily skewed toward female users. Put another way, if an item is purchased by 50 women and 50 men, that means it was purchased by 0.5% of the women but 100% of the men, and is very likely a male-specific item. In this case, we can add a scale factor to give each male more weight:

S = (A * M – F) / (A * M + F + Pseudo-count)

Where A is that scaling factor, calculated as the ratio of female users to male users in the entire user base. In our example above, A would be 10000 / 50 = 200.
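Putting the scale factor and the pseudo-count together, the score can be sketched in Python as follows. The function and parameter names are illustrative assumptions, as is raising an error when the user base has no male users.

```python
def balanced_gender_score(
    male_buyers: int,
    female_buyers: int,
    total_male_users: int,
    total_female_users: int,
    pseudo_count: float = 10.0,
) -> float:
    """Score S = (A*M - F) / (A*M + F + pseudo_count), with A = females / males."""
    if total_male_users <= 0:
        raise ValueError("need at least one male user to compute A")
    if pseudo_count <= 0:
        raise ValueError("pseudo_count must be positive")
    a = total_female_users / total_male_users  # weight given to each male buyer
    weighted_male = a * male_buyers
    return (weighted_male - female_buyers) / (
        weighted_male + female_buyers + pseudo_count
    )
```

For the example above (50 female and 50 male buyers on a site with 10000 female and 50 male users), A is 200 and the score is 9950 / 10060, or about 0.99, so the item is tagged male-specific even under a 0.9 threshold. Without the scale factor the same item would score exactly 0.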

In the case of our clothing retailer, the user base happened to be quite balanced. The scaling factor was A = 0.8 (about 4 women for every 5 men). Because A was below 1, each male buyer counted slightly less than each female buyer, which shifted the scores very slightly to the left (toward female):

For this client, adding the balance factor to give male and female users equal total weight shifted the scores to the left.

Why a Filter?

It would be reasonable to ask: why can’t the recommendation model just take care of the filtering itself? There are many algorithms that can incorporate attributes about a user, such as their gender, in making predictions. Ideally, these should learn that male users tend not to buy female-use items, and vice versa. However, this doesn’t give us any control over how conservative or liberal the model is regarding gender matching. Moreover, we want the freedom to choose different models for different use cases. For example, do the recommendations need to be made as real-time responses to onsite actions, or can they wait for batch processing? Does the client want to emphasize item categorizations, or user-specific behavior? As a result, many of the models we use do not explicitly use the user attributes. Adding gender-based filtering as a post-processing step gets past this hurdle and gives us fine-grained control to ensure that any or all of our recommendations avoid surprising users with items that don’t match their gender.
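As a post-processing step, the filter itself can be very small. The Python sketch below assumes that users and items have already been tagged Male, Female, or Neutral, and that untagged items default to Neutral. The function name and data shapes are hypothetical.

```python
from typing import Dict, List


def gender_match_filter(
    user_gender: str,
    recommended_items: List[str],
    item_genders: Dict[str, str],
) -> List[str]:
    """Drop recommendations tagged for the opposite gender, keeping order."""
    if user_gender == "Neutral":
        return list(recommended_items)  # nothing to filter for neutral users
    return [
        item
        for item in recommended_items
        # Assumption: items without a tag are treated as Neutral.
        if item_genders.get(item, "Neutral") in ("Neutral", user_gender)
    ]
```

Because the filter only needs the ranked list and the tags, it works the same way regardless of which model produced the recommendations.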

Usage and Next Steps

This filter is only useful for clients that have products intended for both male and female buyers. If a client’s products are targeted only at females, then any male users would of course be using the site to purchase those items and would not need their results filtered. But in that case, we should see very few items with gender skews anyway.

We run this gender-match filter by default for all of our clients. Generally, we see 2-10% of our product recommendations removed by this filter, allowing other, more relevant items to be recommended instead.

For the next step in this feature, we can use the same method to improve our inferences about the gender preferences of users. That is, we can look at whether users purchase male-specific or female-specific items, rather than just assuming their gender based on their name. This raises an interesting problem because the item gender tagging depends on the user gender tagging, and adding this step would introduce a cyclical dependency. An iterative approach like Expectation-Maximization should work well.


Making sure a user’s recommendations are appropriate for their gender requires knowing their gender preference as well as the gender skew of all products in the catalogue. In the absence of explicit metadata, we can infer both. Textual analysis is a convenient and reliable method for doing so, but it makes many assumptions that may not be accurate for individual users or products. Here we’ve shown some details of how we perform this inference for identifying gender-specific items, giving our models the flexibility to support business-specific use cases.

About the Authors

Eric Doi is a data scientist at Retention Science. His goal is to improve every day, just like gradient boosted learners. He studied Computer Science at UC San Diego and Harvey Mudd College.

Kai Wang is a data scientist at Retention Science.

Vedant Dhandhania is a Machine Learning Engineer at Retention Science. He helps predict customer behavior using advanced machine learning algorithms. His passion lies at the intersection of Signal Processing and Deep Learning.