Business Intelligence, Knowledge Discovery, and Data Mining
How Understanding the Business Problem Relates to Knowledge Discovery
Understanding the business problem is essential to knowledge discovery (KD). Two aspects of KD in particular depend on it: business analytics and model building. According to Sabherwal and Becerra-Fernandez, domains that have historically used KD include marketing, banking, target manufacturing, insurance, telecom, operations management, retail sales forecasting, system diagnostics, and, most recently, crime fighting. This is a highly diverse set of domains, and building models and performing business analytics for each of them is a demanding task. Businesses that want to pursue business analytics strategies must embed analytics into their most important business processes. When it comes to model building, it is essential to understand the business in order to model it. If we know very little about the business and build a model anyway, the model's output will be of equally little value; this is the garbage-in-garbage-out (GIGO) principle. We must therefore know the business well enough to build a model that yields useful, valuable results, and a good model must be built on a foundation of good historical data. KD and data mining require an understanding of the domain and an awareness of the potential variables that could influence the outcome. Finally, the results of business analytics and models must be integrated with actions.
Barriers to the Use of Data Mining
There are, indeed, barriers to the use of data mining. The factors driving data mining include exploding data volumes, increasing decision complexity, the need for quick reflexes, and technological progress, and each of these brings its own obstacles. Data volumes have become an issue in recent years because data storage is inexpensive: with more data being stored, mining that data has become increasingly hard, since it requires substantial computing power and memory to perform complex mathematical calculations. Increasing decision complexity is a barrier because the data must span industries and countries and must incorporate both structured and unstructured data; unstructured data, in particular, demands an entirely different set of mining algorithms, and building data mining tools that are applicable across the entire spectrum of needs and corporate cultures is genuinely difficult. The need for quick reflexes, meaning the ability of data mining to respond efficiently and effectively in changing environments, is another barrier to overcome. Finally, technological progress itself is a barrier: as more powerful computing has become accessible and less expensive, the requirements placed on data mining tools have grown increasingly complex.
With respect to the data mining process itself, the steps of the process are barriers in their own right: they are time consuming, iterative, and far from easy. The first step, understanding the business, is difficult but essential; it requires strong knowledge of the particular organization as well as its strategy and goals, which are hard to acquire without being directly inside the organization. Understanding the data is another hard step, since the data is not always structured and can be difficult to describe or explore. Preparing the data is yet another demanding task, because the data must be selected, cleaned and filtered, formatted, and integrated. Building models is an art unto itself and is extremely complex; it requires mathematically talented individuals as well as a large base of historical data. Evaluating a model's results is also challenging, since it takes an analytically minded individual who understands model output and can link the results back to the business. Two errors are commonly made in the evaluation of statistical results, referred to as type 1 and type 2 errors. A type 1 error (a false positive) happens when the evaluator concludes "yes" when the answer should have been "no"; a type 2 error (a false negative) happens when the evaluator concludes "no" when the answer should have been "yes". Both must be guarded against. The model must also be validated, which is itself no easy endeavor. Finally, deploying a data mining tool is a challenging task: the deployment must be planned well in advance, and the target of the deployment must be known from the very start of the project. All of these are barriers and challenges to data mining projects.
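The distinction between the two error types can be made concrete with a small sketch. Assuming a binary setting where 1 means "yes" and 0 means "no" (the function name and sample data here are illustrative, not from the source), the following Python snippet tallies type 1 errors (false positives) and type 2 errors (false negatives) against known outcomes:

```python
def error_counts(actual, predicted):
    """Tally type 1 and type 2 errors for binary (0/1) outcomes.

    Type 1 (false positive): the model says "yes" (1) when the truth is "no" (0).
    Type 2 (false negative): the model says "no" (0) when the truth is "yes" (1).
    """
    type1 = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    type2 = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return type1, type2

# Known outcomes vs. model predictions (illustrative data)
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 1, 0, 1, 0, 1, 1, 0]

t1, t2 = error_counts(actual, predicted)
print(f"type 1 errors: {t1}, type 2 errors: {t2}")  # type 1 errors: 2, type 2 errors: 1
```

Which error type matters more depends on the business question; for example, in fraud detection a type 2 error (a missed fraud case) is often costlier than a type 1 error (a false alarm).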
Pitfalls to Avoid for Data Mining
The eight pitfalls that should be avoided when implementing DM applications are:
1. Being overly impressed by vendor demonstrations. Many DM vendors make grand presentations promising "the world". These slick demonstrations are far cries from actual working implementations. Instead, work with the vendor to run the tool on actual company data; this gives a much better idea of what the tool can actually produce.

2. Putting the right tool in the wrong hands. This is a very inefficient use of the tool. Select a user who is capable of and prepared for using the tool; that user may well be the manager who needs the output data.

3. Failing to get the data to the people who will use it. Reports are typically generated for managers by employees at lower levels. Getting the data to the manager who needs it, and having managers who will act on that data, is key. Organizations should appoint someone responsible for ensuring that mechanisms are in place to turn findings into actions and business decisions.

4. Neglecting user training. Training must occur at the beginning and continue along the way, because DM users become increasingly versed in the tool and will need to expand their skills. New versions of the software that provide more complex capabilities will often require new training.

5. Always going for the quick win. Going for the quick win all too often means doing what is easy: building a DM tool for a quick and simple task rather than one that is valuable across the enterprise. Organizations should take the time to carefully design the system needed at the enterprise level.

6. Going for the big bang. The grandiose big-bang project is not always the right path. Building the DM tool with an iterative, evolving approach may be wiser, and it may also allow the project to be funded in increments rather than all up front.

7. Neglecting data quality. Data quality is essential to a good DM process. Data stewards and an appropriate governance structure must be built into the process to keep data quality high and to establish the processes necessary to keep it that way.

8. Failing to demonstrate value. The organization must demonstrate the value of buying, implementing, and using the DM tool; if the value is not demonstrated, the project could be deemed a failure and a loss. To be deemed a success, the tool must produce data and reports that help manage the company, and the decisions made from those reports must be actionable and bring about better business performance.
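As a minimal sketch of what an automated data-quality gate might look like before records enter the mining process, the Python snippet below (the function name, field names, and sample records are all invented for illustration) flags missing or blank required fields and exact duplicate records:

```python
def quality_report(records, required_fields):
    """Flag common data-quality problems in a batch of records:
    missing or blank required fields, and exact duplicate records."""
    missing = []       # (record index, field name) pairs with absent/blank values
    duplicates = 0     # count of records identical to one already seen
    seen = set()
    for i, rec in enumerate(records):
        for field in required_fields:
            if field not in rec or rec[field] in (None, ""):
                missing.append((i, field))
        key = tuple(sorted(rec.items()))  # hashable fingerprint of the record
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {"missing_values": missing, "duplicate_records": duplicates}

# Illustrative customer records with deliberate defects
records = [
    {"customer_id": 1, "region": "south"},
    {"customer_id": 2, "region": ""},       # blank value
    {"customer_id": 1, "region": "south"},  # exact duplicate
]
print(quality_report(records, ["customer_id", "region"]))
```

In practice a data steward would extend such checks with domain-specific rules (valid ranges, referential integrity, and the like), but even this simple gate illustrates how quality enforcement can be made routine rather than ad hoc.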
References:
Sabherwal, R., & Becerra-Fernandez, I. (2011). Business intelligence: Practices, technologies, and management. New Jersey: John Wiley & Sons.
Williams, S., & Williams, N. (2011). The profit impact of business intelligence. New York: Morgan Kaufmann Publishers.
AUTHOR BIO: Dr. Charles D. Madewell holds Bachelor and Master of Science degrees in Engineering from the University of Alabama in Huntsville, Alabama, and a Doctor of Computer Science from Colorado Technical University in Colorado Springs, Colorado. His articles focus on topics important to computer science and information technology.