The Landscape of Big Data: Cloud Computing

When we lived in the mountains near Taipei, Taiwan, flat-bottomed, puffy cumulus clouds sometimes visited our lawn. Hanging and floating there, they had no edges or ends. Today's cloud architectures offer terabytes of storage and elasticity, enabling bigger data accumulations. Like their vaporous cousins, today's clouds share amorphous edges but vary in their physical characteristics. Let's look at some of the technical and financial realities that tether today's data clouds to the ground.

Types of Clouds

One of the chief benefits of clouds is that they provide access to resources that would be expensive and difficult to operate in-house. Here's a short list of the three cloud types.

  1. Storage and computing environments (Amazon's Simple Storage Service (S3), Google Compute Engine, Windows Azure, Rackspace, etc.),
  2. Hosted services (Amazon Relational Database Service (RDS), Google BigQuery, etc.), and
  3. Vertical applications (Salesforce, Splunk, Tableau, etc.)

Across all three cloud types, a cloud environment offers some clear potential optimizations. First, clouds make it easy to access reliable distributed storage across multiple sites. If you were doing this in-house, you would have to be concerned with synchronization across sites, which is not a trivial matter given the size of the data being synchronized and the mission criticality of accessing the most up-to-date versions of the data. Second, cloud storage is cost-effective and efficient, and offers an elastic environment that expands when needed, allowing you to pay for what you use rather than buying more physical storage for your own facility.
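The pay-for-what-you-use point can be made concrete with a little arithmetic. The sketch below is only an illustration: the monthly usage figures and the unit price are made up, and it assumes the same price per terabyte for both options, which real in-house and cloud pricing will not share.

```python
def fixed_capacity_cost(monthly_usage_tb, price_per_tb_month):
    """Cost of provisioning for the peak month and paying for it every month."""
    return max(monthly_usage_tb) * price_per_tb_month * len(monthly_usage_tb)


def elastic_cost(monthly_usage_tb, price_per_tb_month):
    """Cost of paying only for the storage actually used each month."""
    return sum(monthly_usage_tb) * price_per_tb_month


# Hypothetical usage: terabytes stored in each of four months, with one spike.
usage_tb = [10, 12, 40, 15]
unit_price = 1.0  # hypothetical price per TB per month, in arbitrary units

print(fixed_capacity_cost(usage_tb, unit_price))  # 160.0
print(elastic_cost(usage_tb, unit_price))         # 77.0
```

The gap between the two numbers comes entirely from the spike in the third month: fixed capacity has to be sized for it, while elastic capacity only pays for it once.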

Cloud drawbacks include security and privacy concerns, the time and expense of importing data into or exporting data from a cloud environment, and the dreaded "lock-in," which makes it difficult to transfer your digital assets to another environment or integrate with one.

In spite of the dreaded lock-in, large cloud providers deliver financial, technical, and optimization advantages due to the scale of their operations. Not only can they buy in volume and at a lower cost, but they can also use resources that sit idle in non-peak times for internal workloads. Running workloads in parallel can also dramatically shorten computation time. For example, a provider can distribute a job across 100 servers rather than having one server crank away for 100 hours; in the ideal case, the job then finishes in about an hour.
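Here is a minimal sketch of the 100-servers idea, under some loud assumptions: the work divides perfectly with no coordination overhead, and local threads stand in for servers. The function names are invented for this illustration, and the threads show the split-and-combine pattern rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def ideal_wall_clock_hours(total_server_hours, servers):
    """Best-case elapsed time if the work divides perfectly across servers."""
    if servers < 1:
        raise ValueError("servers must be at least 1")
    return total_server_hours / servers


def chunk(items, parts):
    """Split a list into `parts` contiguous slices of near-equal length."""
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def parallel_sum(items, workers):
    """Sum each slice on its own worker, then combine the partial sums."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum, chunk(items, workers)))


print(ideal_wall_clock_hours(100, 100))    # 1.0
print(parallel_sum(list(range(1000)), 4))  # 499500
```

Real jobs rarely hit the ideal figure, because splitting the work, moving the data, and merging the results all take time of their own.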

The NIST Definition of the aaS's

Yes, if you really want to geek out, look up the National Institute of Standards and Technology (NIST) Definition of Cloud Computing, where you can read about the differences between Infrastructure, Software, and Platform "as a Service" and find a great reference to all of the different elements that make up cloud computing.

With all of the aaS offerings, you avoid buying a room of servers. A quick and dirty way to think of each offering is:
Infrastructure: You have control of the operating systems, storage, and deployed applications.
Software: You have access to software applications, and the files created using the software.
Platform: You deploy applications onto the cloud that were created using programming languages and tools from the provider. You control the configuration of the applications but not the hardware.
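The quick and dirty summary above can also be written down as a small lookup table. This is only a sketch of the three descriptions in this article, not the full NIST wording, and the names `CUSTOMER_CONTROLS` and `customer_controls` are made up for the example.

```python
# What the customer controls under each offering, per the summary above.
CUSTOMER_CONTROLS = {
    "Infrastructure": {"operating systems", "storage", "deployed applications"},
    "Platform": {"deployed applications", "application configuration"},
    "Software": {"files created using the software"},
}


def customer_controls(offering, item):
    """Return True if the customer controls `item` under the given offering."""
    return item in CUSTOMER_CONTROLS[offering]


print(customer_controls("Infrastructure", "operating systems"))  # True
print(customer_controls("Platform", "operating systems"))        # False
```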

There are two other aspects of cloud offerings that may impact your selection. First, you will want to choose between a public and a private cloud. Public clouds are shared by multiple tenants, and some of the well-known public cloud offerings include Google Compute Engine, Amazon, and Windows Azure. Private clouds allocate a set of dedicated resources for one organization and can be hosted either on or off your own premises. Rackspace allows both options.

The second aspect to consider is whether these providers have open or proprietary interfaces. Some of the standard "open interfaces" today are MySQL, Hadoop, and MapReduce. In the proprietary category, there are new types of computation being offered, including products like Amazon's DynamoDB, a key-value store that provides high-performance distributed storage for reading and writing small objects. It is based on Dynamo, which is Amazon's internal technology for storing web session data. Caveat emptor! If you do decide to use DynamoDB, code written against its interface will not be able to talk to other databases. Amazon is not alone in this approach. Our friends at Google offer Google BigQuery, a service for launching SQL computations on Google's internal infrastructure. Its APIs are not known for being compatible with other databases either.
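To see why interfaces matter, compare how the same session lookup looks against a key-value store and against a SQL database. This is a rough sketch: `KeyValueStore` is a hypothetical in-memory stand-in, not DynamoDB's actual API, and SQLite from the Python standard library stands in for a SQL database.

```python
import sqlite3


class KeyValueStore:
    """Hypothetical in-memory stand-in for a key-value service."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        self._items[key] = value

    def get(self, key):
        return self._items.get(key)


# Key-value style: the whole object is stored and fetched by a single key.
kv = KeyValueStore()
kv.put("session:42", {"username": "ada", "cart": ["book"]})
kv_result = kv.get("session:42")

# SQL style: the data lives in a table and is retrieved with a query.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY, username TEXT)")
db.execute("INSERT INTO sessions VALUES (?, ?)", ("42", "ada"))
sql_result = db.execute(
    "SELECT username FROM sessions WHERE id = ?", ("42",)
).fetchone()

print(kv_result["username"], sql_result[0])  # ada ada
```

Both lookups return the same answer, but the calling code shares nothing. Moving from one style of store to the other means rewriting every call site, which is the lock-in risk in miniature.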

Conclusion:
The purpose of laying out the basic characteristics of cloud computing is to ensure that those envisioning or provisioning big data solutions become familiar with these terms, and understand the trade-offs of the various offerings. These fundamental characteristics lay the foundation for next week's discussion on some of the security challenges of cloud computing as well as some potential remedies and responses. While today's clouds may seem limitless in many ways, it is important to understand their limitations, so you can appropriately house your own big data for the performance attributes that support your mission.

More articles by Reed MacMillan