Big Data
What is big data?
Big data is a combination of structured, semi-structured and unstructured data that organizations collect, analyze and mine for information and insights. It's used in machine learning projects, predictive modeling and other advanced analytics applications.
Systems that process and store big data have become a common component of data management architectures in organizations. They're combined with tools that support big data analytics applications. Big data is often characterized by the three V's:
The large volume of data in many environments.
The wide variety of data types frequently stored in big data systems.
The high velocity at which the data is generated, collected and processed.
Doug Laney first identified these three V's of big data in 2001 when he was an analyst at consulting firm Meta Group Inc. Gartner popularized them after it acquired Meta Group in 2005. More recently, several other V's have been added to different descriptions of big data, including veracity, value and variability.
Although big data doesn't equate to any specific volume of data, big data deployments often involve terabytes, petabytes and even exabytes of data points created and collected over time.
Why is big data important and how is it used?
Companies use big data in their systems to improve operational efficiency, provide better customer service, create personalized marketing campaigns and take other actions that can increase revenue and profits. Businesses that use big data effectively hold a potential competitive advantage over those that don't because they're able to make faster and more informed business decisions.
For example, big data provides valuable insights into customers that companies can use to refine their marketing, advertising and promotions to increase customer engagement and conversion rates. Both historical and real-time data can be analyzed to assess the evolving preferences of consumers or corporate buyers, enabling businesses to become more responsive to customer wants and needs.
Medical researchers use big data to identify disease signs and risk factors. Doctors use it to help diagnose illnesses and medical conditions in patients. In addition, a combination of data from electronic health records, social media sites, the web and other sources gives healthcare organizations and government agencies up-to-date information on infectious disease threats and outbreaks.
Here are some more examples of how organizations in various industries use big data:
Big data helps oil and gas companies identify potential drilling locations and monitor pipeline operations. Likewise, utilities use it to track electrical grids.
Financial services firms use big data systems for risk management and real-time analysis of market data.
Manufacturers and transportation companies rely on big data to manage their supply chains and optimize delivery routes.
Government agencies use big data for emergency response, crime prevention and smart city initiatives.
How big data analytics works
To get valid and relevant results from big data analytics applications, data scientists and other data analysts must have a detailed understanding of the available data and a sense of what they're looking for in it. That makes data preparation a crucial first step in the analytics process. It includes profiling, cleansing, validation and transformation of data sets.
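As a minimal sketch of those preparation steps, the Python snippet below cleanses, validates and transforms a small batch of hypothetical customer records. The field names and cleaning rules are illustrative assumptions, not any particular tool's API:

```python
# Hypothetical raw records, as they might arrive from different source systems.
raw_records = [
    {"customer_id": " 001 ", "email": "ANA@EXAMPLE.COM", "spend": "120.50"},
    {"customer_id": "002", "email": "bob@example.com", "spend": "n/a"},
    {"customer_id": "002", "email": "bob@example.com", "spend": "n/a"},  # duplicate
]

def prepare(records):
    """Cleanse, validate and transform a batch of records."""
    seen = set()
    clean = []
    for rec in records:
        cid = rec["customer_id"].strip()   # cleanse: trim stray whitespace
        if cid in seen:                    # validate: drop duplicate IDs
            continue
        seen.add(cid)
        email = rec["email"].lower()       # transform: normalize case
        try:
            spend = float(rec["spend"])    # transform: cast text to a number
        except ValueError:
            spend = 0.0                    # cleanse: replace unparseable values
        clean.append({"customer_id": cid, "email": email, "spend": spend})
    return clean

prepared = prepare(raw_records)
```

Real pipelines apply the same ideas at much larger scale, typically with distributed processing frameworks rather than a single in-memory loop.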
Once the data has been gathered and prepared for analysis, various data science and advanced analytics disciplines can be applied to run different applications, using tools that provide big data analytics features and capabilities. Those disciplines include machine learning and its deep learning subset, predictive modeling, data mining, statistical analysis, streaming analytics and text mining.
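To illustrate one of those disciplines, predictive modeling, here is a minimal ordinary-least-squares line fit in plain Python. The spend and revenue figures are invented for illustration:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b, a minimal predictive model."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var            # slope: how strongly y moves with x
    b = mean_y - a * mean_x  # intercept: baseline value of y
    return a, b

# Hypothetical monthly ad spend (x) vs. revenue (y), in arbitrary units.
spend = [1.0, 2.0, 3.0, 4.0]
revenue = [2.1, 4.0, 6.2, 7.9]
slope, intercept = fit_line(spend, revenue)
predicted = slope * 5.0 + intercept  # forecast revenue at a new spend level
```

Production predictive models use many more variables and far larger data sets, but the principle is the same: learn a relationship from historical data, then apply it to new inputs.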
Using customer data as an example, the different branches of analytics that can be done with sets of big data include the following:
Comparative analysis. This examines customer behavior metrics and real-time customer engagement to compare a company's products, services and branding with those of its competitors.
Social media listening. This analyzes what people are saying on social media about a business or product, which can help identify potential problems and target audiences for marketing campaigns.
Marketing analytics. This provides information that can be used to improve marketing campaigns and promotional offers for products, services and business initiatives.
Sentiment analysis. All the data that's gathered on customer experience can be analyzed to reveal how customers feel about a company or brand, their satisfaction levels, potential issues and how customer service could be improved.
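As a rough sketch of how sentiment analysis can work at its simplest, the snippet below scores text against a tiny hand-made sentiment lexicon. Production systems use far larger vocabularies or trained models; the word lists and reviews here are purely illustrative:

```python
# Hypothetical sentiment lexicon; real systems use much larger vocabularies
# or machine-learned models.
POSITIVE = {"great", "love", "fast", "helpful"}
NEGATIVE = {"slow", "broken", "awful", "refund"}

def sentiment_score(text):
    """Return a score in [-1, 1]: positive hits minus negative hits,
    normalized by total hits; 0.0 when no lexicon words are found."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

reviews = [
    "Great product, love the fast shipping!",
    "Awful experience, support was slow and I want a refund.",
]
scores = [sentiment_score(r) for r in reviews]
```

Applied across millions of reviews or social media posts, even a simple score like this can surface trends in customer satisfaction over time.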
The future of big data
A number of emerging technologies are likely to affect how big data is collected and used. The following tech trends will have the most influence on big data's future:
AI and machine learning analysis. Large data sets keep growing, making them increasingly impractical for humans to analyze unaided. AI and machine learning algorithms are becoming key to performing large-scale analyses and even preliminary tasks, such as data set cleansing and preprocessing. Automated machine learning tools are likely to be helpful in this area.
Improved storage with increased capacity. Cloud storage capabilities are continually improving. Data lakes and warehouses, which can be either on-premises or in the cloud, are attractive options for storing big data.
Emphasis on governance. Data governance and regulations will become more comprehensive and commonplace as the amount of data in use increases, requiring more effort to safeguard and regulate it.
Quantum computing. Although less known than AI, quantum computing can also expedite big data analyses with improved processing power. It's in its early stages of development and only available to large enterprises with access to extensive resources.