Task-1: How Facebook stores thousands of terabytes of data and manages it

Facebook stores a huge amount of data, much of it its users’ own content. That content is the most important asset on the service, and users need to believe it’s secure; otherwise they won’t share. Getting storage right is critical, and it is helping define how Facebook designs its data centers.

In the early days, Facebook storage engineer Jeff Qin said in a talk at the Storage Visions conference, the company's storage system grew using standard filers, which took 10 I/O operations to save a photo and wasted several more on directory traversals. While company hack days added features such as photo tagging, the first real change to the service was the deployment of its own Haystack storage service in 2010. Haystack, a RAID-6 storage service with global replication for photographs, uses a single I/O operation (IOP) per photo request.
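To see why a single I/O operation per photo is possible, here is a minimal sketch of the general idea: photos are appended to one large file, and a small in-memory index records where each one lives. This is a toy illustration, not Facebook's actual Haystack code; the class name `NeedleStore`, its methods, and the sample photo IDs are assumptions made for the example.

```python
import io


class NeedleStore:
    """Toy append-only photo store (hypothetical, for illustration only)."""

    def __init__(self, volume):
        self.volume = volume  # any seekable binary file object
        self.index = {}       # photo_id -> (offset, size), kept in memory

    def put(self, photo_id, data):
        # Append the photo to the end of the volume and remember its location.
        self.volume.seek(0, io.SEEK_END)
        offset = self.volume.tell()
        self.volume.write(data)
        self.index[photo_id] = (offset, len(data))

    def get(self, photo_id):
        # The lookup happens in memory, so no directory traversal hits the disk.
        offset, size = self.index[photo_id]
        self.volume.seek(offset)
        return self.volume.read(size)  # the one read against the volume


# An in-memory buffer stands in for a large on-disk volume file.
store = NeedleStore(io.BytesIO())
store.put("cat.jpg", b"\xff\xd8cat-bytes")
store.put("dog.jpg", b"\xff\xd8dog-bytes")
photo = store.get("cat.jpg")
```

Because the index lives in memory, fetching a photo costs one seek and one read, instead of the several metadata lookups a general-purpose filer needs.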

With photo storage one of the biggest demands on the service, it’s important for Facebook to understand just what users are doing with photos, as how they’re viewed and shared determines how photo storage needs to be designed. It turns out that photograph access cools down quickly, with initial views ten times higher than after 18 months.

How Big Are Facebook’s Server Farms?

As Facebook grows, its data center requirements are growing along with it. The data center in Prineville, Oregon, was initially announced as being 147,000 square feet. But as construction got rolling, the company announced plans to add a second phase to the project, which added another 160,000 square feet, bringing the total size of the campus to 307,000 square feet – larger than two Wal-Mart stores. Last year, Facebook secured permits to build another 487,000-square-foot data center in Prineville.

Why does the Big Data problem arise?

There has been a huge explosion in the data available. Look back a few years and compare it with today, and you will see an exponential increase in the data that enterprises can access. The volume now exceeds what many systems can store, process, and retrieve. The challenge is not so much the availability of this data as its management. With statistics claiming that by 2020 the world's data, if stacked up on physical storage devices, would stretch 6.6 times the distance between the Earth and the Moon, this is definitely a challenge.

Along with the rise in unstructured data, there has also been a rise in the number of data formats. Video, audio, social media, and smart device data are just a few examples.

Hadoop?

“Facebook runs the world’s largest Hadoop cluster,” says Jay Parikh, Vice President of Infrastructure Engineering at Facebook.

The cluster spans more than 4,000 machines and stores hundreds of millions of gigabytes (that is, hundreds of petabytes) of data.

Hadoop provides Facebook with a common infrastructure that is both efficient and reliable. From search, log processing, recommendation systems, and data warehousing to video and image analysis, Hadoop supports this social networking platform in many ways. Facebook Messenger was the company's first user-facing application built on the Hadoop database, Apache HBase.
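Log processing on Hadoop is commonly expressed in the MapReduce style, which Hadoop includes. The sketch below imitates that style in plain Python inside a single process; it does not use any Hadoop API, and the log format, function names, and sample data are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(line):
    # Hypothetical log format: "<user> <action>". Emit one (action, 1) pair.
    _user, action = line.split()
    yield action, 1


def reduce_phase(key, values):
    # Combine all counts emitted for the same action.
    return key, sum(values)


def run_job(lines):
    # "Shuffle" step: group the mapped values by key before reducing.
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_phase(line):
            groups[key].append(value)
    return dict(reduce_phase(key, values) for key, values in groups.items())


logs = ["alice like", "bob share", "carol like"]
counts = run_job(logs)
```

On a real cluster the map and reduce steps run in parallel on many machines, each working on its own slice of the logs; the structure of the computation stays the same.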

 A distributed storage system is infrastructure that can split data across multiple physical servers, and often across more than one data center. It typically takes the form of a cluster of storage units, with a mechanism for data synchronization and coordination between cluster nodes.
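As a rough sketch of how data can be split across servers, the example below hashes each key to pick a starting node and then stores copies on the following nodes. The node names, the `place` function, and the replica count are all hypothetical; real systems generally use more elaborate placement schemes.

```python
import hashlib


def place(key, nodes, replicas=2):
    """Return the nodes that should hold copies of `key` (illustrative only)."""
    if not 0 < replicas <= len(nodes):
        raise ValueError("replicas must be between 1 and the number of nodes")
    # Hash the key so that placement is deterministic and evenly spread.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    # Take consecutive nodes (wrapping around) so that replicas are distinct.
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]


# Hypothetical cluster: two servers in each of two data centers.
nodes = ["dc1-node1", "dc1-node2", "dc2-node1", "dc2-node2"]
owners = place("photo:12345", nodes)
```

Any client that knows the node list can compute where a key lives without asking a central directory. One caveat: with plain modulo placement, adding or removing a node moves most keys, which is why production systems often prefer consistent hashing.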

Thanks for reading this article.

More articles by Deepak Saini