Example: LinkedIn has launched the first version of its People You May Know feature. How would you isolate the impact of the algorithm behind it from the effect of the UI change?



 

Whenever you launch the first version of a data product, i.e. a new product powered by machine learning, you are making a lot of changes to the site. Consider the People You May Know feature. The first time it was launched, it meant adding a new box with clickable links to the user's newsfeed. That new box with additional links has, by itself, a high chance of moving the target metric, regardless of how good the algorithm used to suggest people is.
 

It is therefore hard to tell how much of the metric change is driven by the algorithm behind the new feature vs. the UI changes needed to accommodate it.
 

In these cases, you need to test each component separately. After all, the whole point of A/B testing is to isolate the effect of a single change. A way to exactly isolate the two components is to run 3 versions of the site at the same time:

1. Version 1: the current site, without the People You May Know feature (control)
2. Version 2: the new feature, with suggestions generated by the machine learning algorithm
3. Version 3: the new feature with exactly the same UI, but with random suggestions instead of the algorithm's output
 

The difference between versions 2 and 3 will tell you the gains coming from the model alone, while the difference between versions 1 and 3 captures the effect of the UI change by itself.
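As a rough sketch, here is how the version 2 vs. version 3 comparison could be analyzed once the test has run. The metric (the share of exposed users who acted on a suggestion) and all the counts are hypothetical, purely to illustrate the calculation:

```python
# Minimal sketch: compare version 2 (ML model) against version 3 (random
# suggestions) with a two-proportion z-test. Counts below are made up.
from statsmodels.stats.proportion import proportions_ztest

conversions = [4_300, 3_100]    # users who acted on a suggestion, per arm
exposed     = [50_000, 50_000]  # users shown the feature, per arm

z_stat, p_value = proportions_ztest(count=conversions, nobs=exposed)
lift = conversions[0] / exposed[0] - conversions[1] / exposed[1]

print(f"absolute lift attributable to the model: {lift:.2%}, p-value: {p_value:.4f}")
```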
 

This approach is risky, though, because users in version 3 might lose faith in the feature, and once users decide a new feature is bad, it is really hard to make them change their minds.
 

A milder approach would be to replace the random suggestions with something very basic that machine learning should easily beat, but that still makes sense to users. For instance, you could use a history-based model (suggest users whose profiles that user visited in the past) or simply suggest the users with the highest number of shared connections, as in the sketch below. These versions still give you a baseline to compare your model against, without giving users in one test group the impression that your new feature is very stupid.
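For illustration, here is a minimal sketch of the shared-connections baseline. The graph format (a dict mapping each user to the set of users they are already connected to) and all the names are made up:

```python
# Rank non-connected users by the number of shared connections.
from collections import Counter

def suggest_by_mutual_connections(user, connections, top_k=5):
    counts = Counter()
    for friend in connections.get(user, set()):
        for candidate in connections.get(friend, set()):
            # Skip the user themselves and people they already know.
            if candidate != user and candidate not in connections[user]:
                counts[candidate] += 1
    return [candidate for candidate, _ in counts.most_common(top_k)]

# Tiny hypothetical connection graph
graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave", "erin"},
    "carol": {"alice", "dave"},
    "dave": {"bob", "carol"},
    "erin": {"bob"},
}
print(suggest_by_mutual_connections("alice", graph))  # ['dave', 'erin']
```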



By the way, this case study is another example of a recurring theme in this course: you run an A/B test not only to find a winner, but also to gain clean insights. Version 3 in this example is not expected to win, but it is crucial for gaining a better understanding of the new feature.


 

Full course in Product Data Science