Network Based Experimentation

Network Based Experimentation

At Facebook, we routinely work with very large data sets and perform lots of experiments to test products, features and educational and marketing materials.

We constantly run experiments: from simple A|B tests to advanced set ups involving network partition.

A network experiment is an experiment that takes into account the structure of the underlying network on which an experiment is being performed. We suspect, or otherwise know, before even running our experiment that a user's reaction to a condition in the experiment may depend on other users' assignments to conditions and therefore want to incorporate that knowledge into our experimental procedures.

This dependence of your response on other users is often referred to as "network effects".

A recent campaign: “India Your Facebook, Your Rules” that showcased ads and videos to empower women to take control of their posts and photos went viral to the point that a large percentage of the "control" group was exposed (we held out a group of users of similar characteristics temporarily to be able to measure the effect of the campaign).

Network Clusters

A typical experiment randomly selects a group of users (from a common and homogeneous pool) that dot not received the experience being tested. This experience is very often a new product feature, or a marketing message.

However, randomizing users does not account for the network structure as heavily connected users can be assigned to different groups.

Our method consists of randomly assigning network clusters, sets of users that are tightly connected in the social graph, to experimental conditions. This random assignment of clusters makes it more likely that a user and their set of friends will end up in the same experimental condition, and therefore are "closer" to the situations of interest where everyone is in the same experimental condition.

In general, this type of assignment is expected to reduce the bias caused by network interference.

However as is generally the case with bias reduction techniques in statistics and machine learning, this comes with a corresponding cost in terms of more variance (i.e. less precision).

Network Partitioning

At Facebook, we use the open source Giraph iterative graph processing framework, built on top of Apache Hadoop. (http://giraph.apache.org/intro.html)

The group that supports it provides both applications and infrastructure for developing and running large-scale, parallel and iterative applications at Facebook scale (i.e. billions of users and trillions of edges).

The Facebook's 1.5 billion user base gets partitioned into a smaller number of clusters (currently less than 100K); this recursive problem is solved by iterating on a high performance memory grid.

There are many applications for (offline) processing of large graphs. A classic example is PageRank which computes a score for web pages based on links between pages. Another example is clustering a very large graph, with each cluster containing "similar" nodes.  Such large scale graph analytic applications are extremely compute intensive.

The Experiment

Once the clusters are calculated, we perform randomization at this level and further balance on behavioral metrics (such as # days active in the past 28 or L28); that way we make sure we get the network isolation to avoid 'contamination' by sharing and also reduce variance.

Attributing causality to the treatment becomes much easier this way.

To view or add a comment, sign in

More articles by Mario A. Vinasco

  • The way I see the scandal Facebook is caught on

    Two or three years ago, researchers created an app and attracted around 270 thousand people; these people could have…

  • Experimentación en Redes Sociales

    Este artículo es un pequeña introduccion en español a la versión en Inglés. En Facebook, constantemente estamos…

    1 Comment
  • The day I realized I work for Facebook

    Recently, facebook announced a new experience that lets you create personalized video cards for your friends on the…

    2 Comments
  • What I learned from Simulation and applied to Marketing Analytics

    In the world of information sciences, there are Emulations and Simulations. Emulation means a perfect and exact…

  • Tableau Tips&Tricks - Multi Pass Metrics

    Tableau has been gaining a lot of momentum and developers are in high demand. In this post I want to share how I did a…

  • Predicting eCommerce Buying

    Predicting eCommerce Buying - “My Synthesis on Demystifying Logistic Regression” When I present in conferences about…

    1 Comment

Others also viewed

Explore content categories