What is Data Mining?

Data Mining is an analytic process designed to explore data (usually large amounts of data, typically business or market related, also known as "big data") in search of consistent patterns and/or systematic relationships between variables, and then to validate the findings by applying the detected patterns to new subsets of data. The ultimate goal of data mining is prediction: predictive data mining is the most common type of data mining and the one with the most direct business applications. The process of data mining consists of three stages:

Stage 1: Exploration. This stage usually starts with data preparation, which may involve cleaning data, transforming data, selecting subsets of records and, in the case of data sets with large numbers of variables ("fields"), performing some preliminary feature selection to bring the number of variables down to a manageable range (depending on the statistical methods being considered). Then, depending on the nature of the analytic problem, this first stage may involve anything from a simple choice of straightforward predictors for a regression model to elaborate exploratory analyses using a wide variety of graphical and statistical methods, in order to identify the most relevant variables and determine the complexity and/or the general nature of the models that can be considered in the next stage.
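As a rough illustration of Stage 1, here is a minimal sketch in plain Python that drops records with missing values and then ranks candidate fields by their absolute Pearson correlation with a target. The field names, the toy records, and the helper functions (`clean`, `select_features`) are hypothetical and invented for this example; correlation ranking is only one of many possible preliminary feature selection methods.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)

def clean(records, fields):
    """Data preparation: drop records with a missing (None) value in any field."""
    return [r for r in records if all(r.get(f) is not None for f in fields)]

def select_features(records, candidates, target, k):
    """Keep the k candidate fields most correlated (in absolute value) with the target."""
    ys = [r[target] for r in records]
    scores = {f: abs(pearson([r[f] for r in records], ys)) for f in candidates}
    ranked = sorted(candidates, key=lambda f: scores[f], reverse=True)
    return ranked[:k], scores

# Hypothetical toy data: field names and values are invented for illustration.
records = [
    {"visits": 1, "age": 40, "noise": 5, "spend": 12},
    {"visits": 2, "age": 35, "noise": 1, "spend": 19},
    {"visits": 3, "age": None, "noise": 4, "spend": 33},  # dropped: missing age
    {"visits": 4, "age": 30, "noise": 2, "spend": 41},
    {"visits": 5, "age": 52, "noise": 5, "spend": 50},
    {"visits": 6, "age": 28, "noise": 1, "spend": 62},
]
candidates = ["visits", "age", "noise"]
cleaned = clean(records, candidates + ["spend"])
selected, scores = select_features(cleaned, candidates, "spend", k=1)
print(selected, scores)
```

On this toy data the `visits` field is retained, since it tracks `spend` almost linearly while the other two fields do not. In practice the cut-off `k` would depend on the modeling methods under consideration.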

Stage 2: Model building and validation. This stage involves considering various models and choosing the best one based on its predictive performance (i.e., how well it explains the variability in question and whether it produces stable results across samples). This may sound like a simple operation, but it sometimes involves a very elaborate process. A variety of techniques have been developed to achieve that goal, many of which are based on so-called "competitive evaluation of models," that is, applying different models to the same data set and then comparing their performance to choose the best. These techniques, which are often considered the core of predictive data mining, include: Bagging (Voting, Averaging), Boosting, Stacking (Stacked Generalizations), and Meta-Learning.
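The idea of competitive evaluation can be sketched in a few lines of plain Python. The example below is an illustration under stated assumptions, not a reference implementation: it fits three hypothetical candidates (a mean baseline, a least-squares line, and a bagged line that averages lines fitted to bootstrap resamples) to the same invented data set and compares them by cross-validated mean squared error. The function names, the data, the fold scheme, and the number of bagging rounds are all choices made for this example.

```python
import random

def fit_mean(xs, ys):
    """Baseline model: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Ordinary least-squares line with a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0  # degenerate sample: fall back to the mean
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def fit_bagged_line(xs, ys, rounds=25, seed=0):
    """Bagging by averaging: fit a line to each bootstrap resample, average predictions."""
    rng = random.Random(seed)
    models = []
    for _ in range(rounds):
        idx = [rng.randrange(len(xs)) for _ in xs]  # sample with replacement
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(m(x) for m in models) / len(models)

def cv_mse(fit, xs, ys, folds):
    """Mean squared error averaged over held-out folds (every folds-th record)."""
    errors = []
    for k in range(folds):
        train = [i for i in range(len(xs)) if i % folds != k]
        test = [i for i in range(len(xs)) if i % folds == k]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors.append(sum((model(xs[i]) - ys[i]) ** 2 for i in test) / len(test))
    return sum(errors) / folds

# Hypothetical toy data, roughly y = 2x plus small noise.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1, 18.0]

candidates = {
    "mean baseline": fit_mean,
    "least-squares line": fit_line,
    "bagged line (averaging)": fit_bagged_line,
}
scores = {name: cv_mse(fit, xs, ys, folds=3) for name, fit in candidates.items()}
best = min(scores, key=scores.get)
print(best, scores)
```

Scoring each candidate only on records it was not fitted to is what makes the comparison a check of predictive performance rather than of fit to the training sample.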

Stage 3: Deployment. This final stage involves taking the model selected as best in the previous stage and applying it to new data in order to generate predictions or estimates of the expected outcome.
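A minimal sketch of deployment, assuming the selected model is a simple least-squares line: the fitted coefficients are stored as JSON, reloaded, and applied to records that were not part of the training data. The storage format, the function names, and the data are hypothetical; real deployments vary widely.

```python
import json

def fit_line(xs, ys):
    """Fit a least-squares line and return its coefficients as a plain dict."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"intercept": mean_y - slope * mean_x, "slope": slope}

def predict(params, x):
    """Apply stored coefficients to one new observation."""
    return params["intercept"] + params["slope"] * x

# Training side: fit on historical data (invented here) and serialize the model.
params = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
stored = json.dumps(params)

# Deployment side: reload the stored model and score new, unseen inputs.
loaded = json.loads(stored)
new_inputs = [4, 5]
predictions = [predict(loaded, x) for x in new_inputs]
print(predictions)
```

Keeping the stored model as plain coefficients means the scoring side only needs the `predict` function, not the fitting code.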

The concept of Data Mining is becoming increasingly popular as a business information management tool, where it is expected to reveal knowledge structures that can guide decisions under conditions of limited certainty. Recently, there has been increased interest in developing new analytic techniques specifically designed to address the issues relevant to business Data Mining (e.g., Classification Trees). However, Data Mining is still based on the conceptual principles of statistics, including traditional Exploratory Data Analysis (EDA) and modeling, and it shares with them both some components of its general approach and some specific techniques.

Data Mining is often considered to be "a blend of statistics, AI (artificial intelligence), and database research".

Reference source: http://documents.software.dell.com/statistics/textbook/data-mining-techniques
