6 Steps in Data Gathering for Process Analytics

It is said that a journey of a thousand miles begins with the first step. The first step in analysis is to gather data. Without data, all that is available is opinions. What data should you collect?

1.     As much raw data as you can get your hands on

Gather as much of the raw data as possible. It can feel like “drinking from a firehose”, but it is always easier to trim data you have than to conjure data that was never collected. To keep the volume manageable, collect only the “raw” data values. Many control systems also store values calculated from the raw ones; skip those, and if something needs to be calculated, recompute it during analysis using the same formulas the control system uses, verifying the control system's calculations at the same time.
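As a minimal sketch of that idea, the snippet below recomputes a derived value from raw historian samples and cross-checks it against the control system's stored calculation. The square-root flow formula, the k-factor, and all the numbers are hypothetical illustrations, not values from any real system.

```python
# Sketch: recompute a derived value from raw data during analysis,
# rather than trusting the historian's stored calculated value.
# The flow formula and k-factor here are hypothetical examples.

def flow_from_dp(dp_raw, k=10.0):
    """Orifice-style flow estimate from a raw differential-pressure reading."""
    return k * max(dp_raw, 0.0) ** 0.5

# Raw differential-pressure samples pulled from the historian
raw_dp = [4.0, 9.0, 16.0]

# Recompute the flow ourselves...
recomputed = [flow_from_dp(dp) for dp in raw_dp]

# ...and cross-check against the control system's stored values,
# verifying its calculation at the same time.
stored_flow = [20.0, 30.0, 40.0]
mismatches = [abs(a - b) > 1e-6 for a, b in zip(recomputed, stored_flow)]
print(recomputed)        # [20.0, 30.0, 40.0]
print(any(mismatches))   # False -> stored calculation checks out
```

If the recomputed and stored values disagree, either the control system calculation or your understanding of it is wrong, and both are worth knowing before the analysis goes further.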

2.     Actual values

Gathering the “actual” values means avoiding “averaged” values, which can hide problems. If averaging is needed, do it during the analysis phase. Averaging is a valuable analysis technique, but it can also mask issues that could otherwise be corrected. Also be aware of data compression, a technique used by process data historians and control systems to reduce stored data, disk space and network traffic. Unfortunately, data compression can hide what is actually happening. For example, one customer was collecting data from a pH meter. pH values run from 0 to 14, and the data collection system was configured to store a new value only when it changed by more than 1 full pH unit. A change of 1 in pH is VERY significant! Important analysis information was being lost to compression.
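The compression in that story behaves like a deadband filter: a new sample is stored only when it moves more than the deadband away from the last stored value. The sketch below simulates this; the pH trace is illustrative, not the customer's actual data.

```python
# Sketch of deadband compression: a sample is stored only when it
# differs from the last stored value by more than the deadband.
# The pH trace below is illustrative.

def compress(samples, deadband):
    stored = [samples[0]]
    for s in samples[1:]:
        if abs(s - stored[-1]) > deadband:
            stored.append(s)
    return stored

ph = [7.0, 7.3, 7.5, 7.2, 6.8, 7.0]   # a +/-0.5 swing around neutral

# With a deadband of 1 full pH unit, the entire excursion vanishes.
print(compress(ph, 1.0))   # [7.0]

# With a deadband sized for the process (0.1 pH), the swing is kept.
print(compress(ph, 0.1))   # [7.0, 7.3, 7.5, 7.2, 6.8, 7.0]
```

The first call is exactly the customer's configuration: a real half-unit pH excursion is recorded as a flat line.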

3.     High-frequency data

In office building air conditioning work, it takes about 20 minutes for the impact of a temperature change to be measurable in a room, so sub-second data collection isn't necessary. However, if you are analyzing electrical problems on drive motors, 5 minutes is an eternity! A good rule of thumb is to collect data at least 4 times as fast as the fastest dynamic of the problem you are analyzing. When in doubt, go with the higher frequency: it is easy to trim or downsample data later, but creating data you never collected isn't accurate.
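The asymmetry is easy to see in code: thinning out high-frequency data is a one-liner, while the reverse direction requires inventing samples. The sketch below also applies the 4x rule of thumb to the 20-minute air conditioning example; the numbers are illustrative.

```python
# Sketch: downsampling already-collected high-frequency data is easy;
# data that was never collected cannot be recovered later.
# The sample values and rates are illustrative.

def downsample(samples, factor):
    """Keep every `factor`-th sample."""
    return samples[::factor]

# 1-second data collected from the historian (1 minute's worth)
one_second = list(range(60))

# Trim it down to 10-second data during analysis
ten_second = downsample(one_second, 10)
print(len(ten_second))                # 6

# Rule of thumb: sample at least 4x faster than the fastest dynamic
# you care about. For a 20-minute room-temperature response, that
# puts the slowest acceptable sample period at 5 minutes (300 s).
fastest_dynamic_s = 20 * 60
max_period_s = fastest_dynamic_s / 4
print(max_period_s)                   # 300.0
```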

4.     High resolution data

The pH example above also illustrates resolution. If the pH measurement is only accurate to a change in pH of 1, and your process is highly sensitive to changes in the range of +/- 0.1 pH, then the pH meter's resolution isn't good enough. Don't confuse the numeric value with instrument accuracy. A C-language double can hold a value like 7.000000000000002, but the meter may only be accurate to within +/- 1 pH. The value on the display can look highly precise even when the instrument isn't that accurate.
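One way to keep this distinction visible in an analysis is to quantize each reading to the instrument's true resolution, so the stored precision matches what the meter can actually resolve. A minimal sketch, with illustrative values:

```python
# Sketch: a float carries far more digits than the instrument can
# actually resolve. Quantizing a reading to the instrument's real
# resolution makes the usable precision explicit.

def quantize(value, resolution):
    """Round a reading to the instrument's true resolution."""
    return round(value / resolution) * resolution

displayed = 7.000000000000002   # what the double holds

# A meter only good to +/- 1 pH cannot distinguish this from 7
print(quantize(displayed, 1.0))   # 7.0

# A process sensitive to +/- 0.1 pH needs a meter resolving 0.1 or better
print(quantize(7.04, 0.1))        # effectively 7.0
```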

5.     Wide ranging data

If you are analyzing a dryer operation, capturing data only when the dryer is at one load and the incoming product is stable doesn't provide much information. Gather data across as many operating conditions as possible. It is also a good idea to capture the transition periods as well as the times when the unit is stable.
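A simple way to make sure both regimes end up in the dataset is to tag each sample as stable or in-transition based on its rate of change. The sketch below does this for an illustrative dryer temperature trace; the threshold and values are hypothetical.

```python
# Sketch: tag each sample as "stable" or "transition" based on its rate
# of change, so both regimes are represented in the analysis dataset.
# The temperature trace and threshold are illustrative.

def tag_samples(values, threshold):
    tags = ["stable"]                  # first sample has no predecessor
    for prev, cur in zip(values, values[1:]):
        tags.append("transition" if abs(cur - prev) > threshold else "stable")
    return tags

dryer_temp = [150, 150, 151, 158, 165, 166, 166]   # ramp in the middle

tags = tag_samples(dryer_temp, threshold=2)
print(tags)
# ['stable', 'stable', 'stable', 'transition', 'transition', 'stable', 'stable']
```

Filtering on the tag then lets you analyze steady-state behavior and transitions separately, instead of losing the transitions entirely.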

6.     Repeatable data

Many times a single event could be attributed to multiple causes. While doing data analysis on a drying application, there was a product that was rarely run, so we determined the correct gas-flow setting for that specific product. Unfortunately, the data collection wasn't repeated to verify the findings, and it was later discovered that the incoming product being dried at the time was much different than usual. Had we collected several instances of that particular product grade, we would have realized that one of those instances was an anomaly.
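With several repeated runs of the same product grade, the anomalous run stands out mechanically: its average sits far from the rest of the group. A minimal sketch, with made-up gas-flow numbers standing in for real run data:

```python
# Sketch: given several repeated runs of the same product grade, flag
# runs whose average deviates far from the group median as anomalies.
# The gas-flow averages below are illustrative.

def flag_outliers(run_averages, tolerance):
    """Flag runs whose average deviates from the group median."""
    ordered = sorted(run_averages)
    median = ordered[len(ordered) // 2]
    return [abs(avg - median) > tolerance for avg in run_averages]

# Average gas-flow setting found in five repeated runs;
# the third run happened to have unusual incoming product.
runs = [41.8, 42.1, 55.0, 42.0, 41.9]

print(flag_outliers(runs, tolerance=2.0))
# [False, False, True, False, False]
```

With only the single (third) run collected, there is nothing to compare against, and the anomaly becomes the baseline.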

Gathering good, high-quality data that is repeatable makes analysis much easier, and it also allows you to defend the decisions you make.
