How does Uber compress its incoming trip data by 86%, making the same storage last roughly 7x longer?
Uber is the face of startup disruption, challenging every startup growth story we have seen so far. But the company is also well known for its engineering: it conducts millions of trips each day, pushing terabytes of data through its platform.
Now, that’s a massive amount of data!
Let’s assume each trip sends 20 KB of data in JSON format. Do the math, and Uber needs 20 GB of storage for just 1 million trips, which is roughly the number of trips Uber conducts in a day.
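That back-of-envelope figure is easy to check (a minimal sketch, using decimal units where 1 GB = 10^9 bytes):

```python
# Back-of-envelope storage estimate (decimal units: 1 GB = 10**9 bytes).
BYTES_PER_TRIP = 20_000      # ~20 KB of JSON per trip (assumed above)
TRIPS_PER_DAY = 1_000_000    # ~1 million trips per day

daily_bytes = BYTES_PER_TRIP * TRIPS_PER_DAY
print(daily_bytes / 10**9)   # 20.0 GB of raw JSON per day
```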
Let’s take a benchmark of 30 TB and see how quickly Uber would run out of that storage space:
- At 1 million trips a day, it would take them roughly four years to consume 30 TB
- At 4 million trips a day, they would consume 30 TB in roughly a year
- Scaling up to a highly successful location-based startup conducting 10 million trips a day, they would consume 30 TB in about five months.
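A quick sketch of that runway math, treating the volumes above as daily trip counts (consistent with the 1-million-trips-a-day figure earlier) at 20 KB per trip, in decimal units:

```python
# How long a fixed storage budget lasts at a given daily trip volume.
BYTES_PER_TRIP = 20_000       # ~20 KB of raw JSON per trip (assumed)
BUDGET_BYTES = 30 * 10**12    # the 30 TB benchmark

def days_until_full(trips_per_day: int) -> float:
    """Days before the 30 TB budget is exhausted at this volume."""
    return BUDGET_BYTES / (trips_per_day * BYTES_PER_TRIP)

print(days_until_full(1_000_000))   # 1500.0 days, roughly four years
print(days_until_full(4_000_000))   # 375.0 days, roughly a year
print(days_until_full(10_000_000))  # 150.0 days, about five months
```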
For context, look at how many rides location-based taxi startups such as Lyft, Uber, and Curb conduct over a month.
The graph below illustrates how quickly Uber would consume 30 TB of storage.
But guess what?
Uber has 45 million active riders and often conducts more than 70 million trips in a month. If we do the math, that is roughly 1.4 TB of raw JSON a month, so 30 TB buys well under two years, before accounting for replication or further growth. At this scale, minimizing storage space became a priority for Uber.
Uber decided to use algorithms to optimize the raw JSON files. The objective was to compress the data without sacrificing performance, keeping encoding and decoding fast enough to preserve system efficiency.
By evaluating 10 encoding protocols (including Thrift, Protocol Buffers, Avro, and MessagePack, among others) and 3 compression libraries (Snappy, zlib, and bzip2), they were able to shrink the 20 KB payloads to 2,822 bytes, a reduction of roughly 86%.
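A minimal sketch of the idea using only the Python standard library, with json plus zlib standing in for the codec combinations Uber actually tested (the trip record below is invented for illustration, not Uber's schema):

```python
import json
import zlib

# A made-up trip record standing in for the ~20 KB JSON payloads in the article.
trip = {
    "trip_id": "abc123",
    "rider_id": 42,
    "driver_id": 7,
    "route": [{"lat": 37.7749 + i * 1e-4, "lng": -122.4194 + i * 1e-4}
              for i in range(200)],
    "fare_cents": 1875,
}

raw = json.dumps(trip).encode("utf-8")    # plain JSON encoding
compressed = zlib.compress(raw, level=9)  # zlib at maximum compression

ratio = 1 - len(compressed) / len(raw)
print(f"{len(raw)} -> {len(compressed)} bytes ({ratio:.0%} smaller)")
```

In practice, binary encodings such as MessagePack or Avro replace the JSON step entirely, which is where the savings beyond a plain compression pass come from.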
This not only saved space but also significantly reduced data processing time. Redoing the earlier calculation, the same 30 TB now lasts roughly seven times longer, on the order of a decade at current trip volumes rather than under two years.
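The multiplier follows directly from the per-trip sizes quoted above (assuming a decimal 20 KB for the raw payload):

```python
# Storage lifetime scales linearly with the per-trip compression ratio.
RAW_BYTES = 20_000        # ~20 KB per trip before optimization
COMPRESSED_BYTES = 2_822  # per-trip size after encoding + compression

ratio = RAW_BYTES / COMPRESSED_BYTES
saving = 1 - COMPRESSED_BYTES / RAW_BYTES
print(f"{ratio:.1f}x smaller")   # roughly 7.1x
print(f"{saving:.0%} reduction") # roughly 86%
```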
Keep in mind that the compatibility of these libraries and protocols varies depending on the database and server you choose.
Bottom line:
Uber’s approach was to test various protocols and libraries in combination and build a customized solution from the winners. With high-volume, high-velocity data coming in, it is critical for location-based startups to put a data compression strategy in place and optimize data storage.
What has been your experience dealing with data-heavy applications?