Network Based Experimentation

Mario A. Vinasco

Published Oct 12, 2015

At Facebook, we routinely work with very large data sets and run many experiments to test products, features, and educational and marketing materials.

We constantly run experiments, from simple A/B tests to advanced setups involving network partitioning.

A network experiment is an experiment that takes into account the structure of the underlying network on which an experiment is being performed. We suspect, or otherwise know, before even running our experiment that a user's reaction to a condition in the experiment may depend on other users' assignments to conditions and therefore want to incorporate that knowledge into our experimental procedures.

This dependence of one user's response on other users' assignments is often referred to as "network effects".

A recent campaign, “India Your Facebook, Your Rules”, showcased ads and videos to empower women to take control of their posts and photos. We temporarily held out a group of users with similar characteristics as a "control" group so that we could measure the effect of the campaign, but the campaign went viral to the point that a large percentage of that control group was exposed to it.

Network Clusters

A typical experiment randomly selects a group of users (from a common and homogeneous pool) who do not receive the experience being tested. This experience is very often a new product feature or a marketing message.

However, randomizing individual users does not account for the network structure, because heavily connected users can be assigned to different groups.
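To make the problem concrete, here is a minimal sketch that measures how many friendships end up split across conditions under plain user-level randomization. The toy graph and the function names (`user_level_assignment`, `cross_condition_share`) are hypothetical and chosen only for illustration; they are not part of any Facebook tooling.

```python
import random

def user_level_assignment(users, seed=0):
    """Assign each user independently to 'test' or 'control' with equal probability."""
    rng = random.Random(seed)
    return {u: rng.choice(["test", "control"]) for u in users}

def cross_condition_share(edges, assignment):
    """Fraction of friendship edges whose two endpoints sit in different conditions."""
    if not edges:
        return 0.0
    crossing = sum(1 for a, b in edges if assignment[a] != assignment[b])
    return crossing / len(edges)

# Toy graph (made-up data): two tight friend groups joined by a single edge.
edges = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
users = sorted({u for edge in edges for u in edge})

assignment = user_level_assignment(users)
share = cross_condition_share(edges, assignment)
print(f"{share:.0%} of friendships cross conditions")
```

Every crossing edge is a path along which a treated user can expose a control user, which is how a held-out group gets contaminated.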

Our method consists of randomly assigning network clusters, sets of users that are tightly connected in the social graph, to experimental conditions. This random assignment of clusters makes it more likely that a user and their friends will end up in the same experimental condition, which brings us "closer" to the situations of interest, where everyone is in the same experimental condition.
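A minimal sketch of cluster-level assignment, assuming the clusters are already known, could look like the following. The `user_to_cluster` mapping and the function name `assign_clusters` are hypothetical, and the even test/control split is an assumption made for simplicity.

```python
import random

def assign_clusters(user_to_cluster, seed=0):
    """Randomize whole clusters; every user inherits the condition of their cluster."""
    rng = random.Random(seed)
    clusters = sorted(set(user_to_cluster.values()))
    rng.shuffle(clusters)
    half = len(clusters) // 2
    # First half of the shuffled clusters goes to test, the rest to control.
    cluster_condition = {
        c: ("test" if i < half else "control") for i, c in enumerate(clusters)
    }
    return {u: cluster_condition[c] for u, c in user_to_cluster.items()}

# Made-up example: eight users in four clusters.
user_to_cluster = {
    "u1": "A", "u2": "A",
    "u3": "B", "u4": "B",
    "u5": "C", "u6": "C",
    "u7": "D", "u8": "D",
}
user_condition = assign_clusters(user_to_cluster)
```

Because the unit of randomization is the cluster, two users in the same cluster can never land in different conditions; only friendships that span clusters can still cross.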

In general, this type of assignment is expected to reduce the bias caused by network interference.

However, as is generally the case with bias reduction techniques in statistics and machine learning, this comes at a corresponding cost: more variance (i.e. less precision).
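One rough way to gauge that cost is the usual design-effect approximation for cluster sampling, 1 + (m - 1) * rho, where m is the cluster size and rho is the intra-cluster correlation of the outcome. This formula is a standard textbook approximation and is not taken from the original analysis; it assumes equal-sized clusters, and the numbers below are made up.

```python
def design_effect(cluster_size, icc):
    """Variance inflation from randomizing clusters of equal size instead of users.

    icc is the intra-cluster correlation of the outcome (0 means users in a
    cluster are no more alike than strangers).
    """
    return 1.0 + (cluster_size - 1) * icc

def effective_sample_size(n_users, cluster_size, icc):
    """Number of independent users that would give the same precision."""
    return n_users / design_effect(cluster_size, icc)

# Made-up numbers: 10,000 users in clusters of 101 with a small correlation.
deff = design_effect(cluster_size=101, icc=0.01)
n_eff = effective_sample_size(n_users=10_000, cluster_size=101, icc=0.01)
print(f"design effect: {deff:.2f}, effective sample size: {n_eff:.0f}")
```

Even a small correlation inside large clusters inflates the variance noticeably, which is why balancing and other variance reduction steps matter in this kind of design.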

Network Partitioning

At Facebook, we use the open source Giraph iterative graph processing framework, built on top of Apache Hadoop. (http://giraph.apache.org/intro.html)

The group that supports it provides both applications and infrastructure for developing and running large-scale, parallel and iterative applications at Facebook scale (i.e. billions of users and trillions of edges).

Facebook's user base of 1.5 billion is partitioned into a much smaller number of clusters (currently fewer than 100K); this recursive problem is solved by iterating on a high-performance memory grid.

There are many applications for (offline) processing of large graphs. A classic example is PageRank, which computes a score for web pages based on the links between pages. Another example is clustering a very large graph, with each cluster containing "similar" nodes. Such large-scale graph analytics applications are extremely compute-intensive.
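To show what "iterative" means here, the following is a single-machine sketch of PageRank by power iteration. It is only an illustration of the algorithm on a toy graph, not Giraph code; the damping factor of 0.85 and the fixed iteration count are conventional choices assumed for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank. links maps each page to the pages it links to."""
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
            else:
                # Dangling page with no outgoing links: spread its rank over all pages.
                for q in pages:
                    new_rank[q] += damping * rank[p] / n
        rank = new_rank
    return rank

# Made-up three-page web: both "a" and "b" link to "c", and "c" links back to "a".
links = {"a": ["c"], "b": ["c"], "c": ["a"]}
scores = pagerank(links)
```

Each round only needs a node's current score and its neighbors, which is the property that lets this kind of computation be spread across many machines for graphs with billions of nodes.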

The Experiment

Once the clusters are calculated, we randomize at the cluster level and further balance on behavioral metrics (such as the number of days active in the past 28 days, or L28). That way we get the network isolation needed to avoid 'contamination' through sharing, and we also reduce variance.
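The article does not spell out how the balancing is done. One common approach, shown here purely as an assumption, is matched-pair randomization: sort the clusters by their average L28, pair up neighbors, and flip a coin within each pair. The input `cluster_l28` and the function name are hypothetical.

```python
import random

def paired_cluster_assignment(cluster_l28, seed=0):
    """Matched-pair randomization of clusters on a behavioral metric.

    cluster_l28 maps a cluster id to the mean L28 of its users (hypothetical input).
    """
    rng = random.Random(seed)
    # Sort by metric (cluster id breaks ties) so adjacent clusters are similar.
    ordered = sorted(cluster_l28, key=lambda c: (cluster_l28[c], c))
    assignment = {}
    for i in range(0, len(ordered) - 1, 2):
        pair = [ordered[i], ordered[i + 1]]
        rng.shuffle(pair)  # coin flip within the pair
        assignment[pair[0]] = "test"
        assignment[pair[1]] = "control"
    if len(ordered) % 2 == 1:
        # A leftover unpaired cluster is assigned by a plain coin flip.
        assignment[ordered[-1]] = rng.choice(["test", "control"])
    return assignment

def group_mean(cluster_l28, assignment, condition):
    """Average of the cluster-level metric within one condition."""
    values = [cluster_l28[c] for c, a in assignment.items() if a == condition]
    return sum(values) / len(values)

# Made-up cluster-level averages of days active in the past 28 days.
cluster_l28 = {"c1": 5, "c2": 6, "c3": 14, "c4": 15, "c5": 26, "c6": 27}
balanced = paired_cluster_assignment(cluster_l28)
gap = group_mean(cluster_l28, balanced, "test") - group_mean(cluster_l28, balanced, "control")
```

Because each pair contributes one cluster to each condition, the two groups start out with nearly the same L28 distribution, so less of the difference in outcomes is due to chance imbalance.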

Attributing causality to the treatment becomes much easier this way.
