Predicting Faulty Bearings using Sensor Data and Machine Learning

Introduction

It is well known that two-dimensional convolutional neural networks are powerful tools for image recognition. A lesser-known relative, the one-dimensional convolutional neural network (1D CNN), has many applications of its own, including time series forecasting using regression and classification use cases such as faulty component detection in manufacturing machinery, natural language processing (NLP), human activity monitoring, patient-specific ECG classification, structural health monitoring and anomaly detection in power electronic circuitry (Kiranyaz, 2019).

In this paper, the focus is the classification of faulty components in manufacturing machinery using sensor data. Feature engineering is the heart of the classic approach to signal processing and machine learning: features are extracted from the output of algorithms such as the Fast Fourier Transform (FFT) and the Discrete Wavelet Transform (DWT). The contemporary approach uses a 1D CNN with raw signals as input. There is no need to hand-craft features for the 1D CNN; the model defines its own features (feature maps) automatically through backpropagation. Both approaches are used in this paper and their results compared. Finally, the classic and contemporary approaches are combined by extracting features from the 1D CNN, FFT and DWT and then using conventional classification algorithms to predict faults.

Manufacturing is undergoing a major change with Industry 4.0, the fourth industrial revolution. Industry 4.0 is a trend toward automation and data exchange in manufacturing technologies and processes, encompassing cyber-physical systems (CPS), the internet of things (IoT), the industrial internet of things (IIoT), cloud computing and artificial intelligence (Wikipedia, 2019), which together enable the creation of smart factories. In a smart factory, interconnected sensor-enabled equipment continuously generates data for every aspect of the manufacturing process; data that can be collected and analyzed in real time (IVEDIX, 2017).

Machine learning is a key component for predicting equipment failures using both raw sensor data and features engineered from that data, providing foresight for predictive maintenance before an unscheduled outage occurs. The promise of Industry 4.0 is enhanced manufacturing productivity, and sensors will play a major role in the push toward manufacturing automation and the smart factory. This project uses sensor data collected from monitoring ball bearing assemblies.

This project uses accelerometer data to predict failures of industrial equipment. The data set, created by Case Western Reserve University, contains recordings of normal and faulty bearings. Experiments were conducted using a 2 HP Reliance Electric motor, and acceleration was measured by sensors at locations near to and remote from the motor bearings. The following picture shows the machinery.

[Image: the Case Western Reserve bearing test rig]

Accelerometers were placed on the fan end and drive end of the motor as well as on the motor base. Outer raceway faults are stationary faults; therefore, placement of the fault relative to the load zone of the bearing has a direct impact on the vibration response of the motor/bearing system. To quantify this effect, experiments were conducted for both fan end and drive end bearings with outer raceway faults located at 3 o'clock (directly in the load zone), at 6 o'clock (orthogonal to the load zone), and at 12 o'clock. Vibration signals were collected using a 16-channel DAT recorder and post-processed into MATLAB format (CaseWestern). Vibration data was collected for both inner and outer raceways at both the drive end and fan end of the motor.

The following figure shows the components of an SKF bearing assembly.

[Image: components of an SKF bearing assembly]

This project used only the drive end, inner raceway sensor data. Data was collected under normal operation with workloads of 0 HP, 1 HP, 2 HP and 3 HP applied to the system. Faults were introduced into bearings using a process called electro-discharge machining (EDM), with defect sizes of .007, .014, .021 and .028 inches, and the four workloads were applied to the faulty bearings as well. There are 4 classes of normal operation and 16 classes of faulty bearing operation, for a total of 20 different classifications of signal data.

Data Acquisition and Exploratory Data Analysis

Sensor data was collected at a rate of 12,000 samples per second and stored in multiple MATLAB-formatted files. Python was used to concatenate all data files into one large array, which was then divided into segments of 256 samples each. Each segment was classified as belonging to one of twenty different classes, as the following chart shows. This data is for the inner raceway, drive end sensor.
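The segmentation step above can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the project's actual code: `segment_signal` is a hypothetical helper, and the synthetic random signal stands in for one recording loaded from the MATLAB files (e.g. via `scipy.io.loadmat`).

```python
import numpy as np

SEGMENT_LEN = 256  # samples per segment, as described above

def segment_signal(signal, label, segment_len=SEGMENT_LEN):
    """Chop a 1-D sensor signal into fixed-length segments.

    Returns (segments, labels): segments has shape (n, segment_len),
    and labels repeats the class label for every segment. Leftover
    samples at the end of the signal are discarded.
    """
    n = len(signal) // segment_len
    segments = np.asarray(signal[: n * segment_len]).reshape(n, segment_len)
    labels = np.full(n, label)
    return segments, labels

# Synthetic stand-in for ~10 seconds of one recording at 12 kHz
raw = np.random.default_rng(0).normal(size=120_000)
segs, labs = segment_signal(raw, label=0)
```

Each row of `segs` is one 256-sample segment carrying the class label of the recording it came from.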

[Image: chart of the twenty signal classes]

The following shows what typical normal baseline signals of sample size 256 look like.

[Image: typical normal baseline signals, 256 samples each]

Here are some signals of sample size 256 representing faulty bearings.

[Image: faulty bearing signals, 256 samples each]

A total of 14,234 segments were created. All 14,234 segments in their raw form are used as input to the contemporary approach, and the same raw signals feed the feature extraction processes of the classic approach.

The solution must not only determine whether a signal is normal or defective, but also classify the workload applied (0 HP, 1 HP, 2 HP or 3 HP). In addition, for a defective signal, the defect size (.007, .014, .021 or .028 inches) must be determined.

The classic method requires domain knowledge for feature definition and extraction. This approach uses the Fast Fourier Transform and the Discrete Wavelet Transform to extract features through a feature engineering process.

Fast Fourier Transform Feature Extraction

The Fast Fourier Transform (FFT) is an efficient implementation of the Discrete Fourier Transform (DFT). It converts time domain samples to the frequency domain. Each 256-sample segment is input to the FFT, which outputs real and imaginary coefficients for the segment.

From the FFT output, amplitudes, power spectral densities and autocorrelations are computed and passed to the feature extraction process, which simply extracts the peaks of the resulting amplitude and power spectral density curves in the frequency domain, as well as the peaks of the autocorrelation wave.

The following is one of the raw signal segments which is input to FFT.

[Image: a raw signal segment input to the FFT]

The FFT is invoked and the feature extraction process highlights the peaks of each of our extractors (Taspinar, Machine Learning with Signal Processing Techniques, 2018).

[Image: detected peaks of the amplitude, power spectral density and autocorrelation plots]

Note that many peaks are detected. Only the top 10 peaks of each plot are retained as features for the classification algorithms; the number of peaks retained, n, is a tuning parameter.
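The top-n-peaks idea can be sketched as follows. This is a minimal illustration, not the project's code: `top_n_peak_features` is a hypothetical helper that handles only the amplitude spectrum (the article applies the same idea to the power spectral density and autocorrelation), and the 3 kHz test tone is synthetic.

```python
import numpy as np
from scipy.signal import find_peaks

FS = 12_000  # sampling rate in Hz, as in the data set

def top_n_peak_features(segment, n=10):
    """Return frequencies and heights of the n tallest peaks
    of one segment's amplitude spectrum."""
    spectrum = np.abs(np.fft.rfft(segment))
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / FS)
    peaks, props = find_peaks(spectrum, height=0)
    # keep only the n tallest peaks, tallest first
    order = np.argsort(props["peak_heights"])[::-1][:n]
    top = peaks[order]
    return freqs[top], spectrum[top]

# Example: a 3 kHz tone buried in light noise
t = np.arange(256) / FS
sig = np.sin(2 * np.pi * 3000 * t) \
      + 0.1 * np.random.default_rng(1).normal(size=256)
peak_freqs, peak_heights = top_n_peak_features(sig, n=10)
```

The tallest detected peak lands on the 3 kHz tone; the remaining, much smaller peaks come from the noise floor.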

Discrete Wavelet Transform Feature Extraction

The Discrete Wavelet Transform (DWT) operates in both the time and frequency domains, allowing a signal to be analyzed in both at once. The DWT analyzes signals using a concept called a wavelet: a predefined wave that lasts for a short time and has a mean of zero. There are many families of wavelets, and to perform a DWT you must provide a wavelet as input. Choosing the right wavelet is important, and the best way to choose one is by trial and error.

Some wavelet families are haar, db, sym, coif, bior, rbio, dmey, gaus, mexh, morl, cgau, shan, fbsp and cmor.

The following is a set of sym wavelets. They are shown here because the “sym2” wavelet was found to perform best on the bearing sensor data.

[Image: the sym wavelet family]

There are two fundamental operations in the DWT: scaling and shifting. Scaling stretches or shrinks the wavelet as it passes over the signal in time; moving the wavelet over the signal in time is called shifting. Signals typically consist of slowly changing waves with abrupt short-term changes (Devleker). It is the abrupt changes that are of specific interest to machine learning, as they provide a blueprint of the signal's behavior at that time. To find these slow and abrupt changes, the DWT uses wavelets along with high-pass and low-pass filter banks. The signal is split into high-frequency and low-frequency components at each level of filtering. The following shows how high-pass and low-pass filtering works.

[Image: high-pass/low-pass filter bank diagram]

The DWT is often used in noise reduction, as it separates the base signal from the noise. The Python function pywt.wavedec() returns the final set of low-frequency coefficients along with the high-frequency coefficient sets from each level of the process. These are then fed into three functions which calculate entropy, statistical metrics (such as percentiles, means and standard deviations) and crossings for each set of coefficients (Taspinar, 2018). The following shows the result of running the DWT against an input raw signal.

[Image: DWT coefficient sets for a raw input signal]

Each set of coefficients returned by wavedec() is fed into the feature extraction functions, and the results are combined into one set of features.
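The decomposition and per-coefficient-set features can be sketched without any wavelet library. As an assumption for illustration, the Haar wavelet stands in for the sym2 wavelet the project actually uses (with Haar, one level is just paired sums and differences scaled by √2), and `coeff_features` computes only a few of the statistics the article names.

```python
import numpy as np

def haar_dwt(x, levels=3):
    """Multi-level Haar DWT. Returns [cA_n, cD_n, ..., cD_1],
    the same ordering pywt.wavedec() uses. Haar is a stand-in
    here for the sym2 wavelet used in the project."""
    coeffs = []
    approx = np.asarray(x, dtype=float)
    for _ in range(levels):
        a = (approx[0::2] + approx[1::2]) / np.sqrt(2)  # low-pass branch
        d = (approx[0::2] - approx[1::2]) / np.sqrt(2)  # high-pass branch
        coeffs.append(d)
        approx = a
    coeffs.append(approx)
    return coeffs[::-1]

def coeff_features(c):
    """A few statistics plus a zero-crossing count for one
    coefficient set (the article also computes entropy)."""
    crossings = int(np.sum(np.diff(np.sign(c)) != 0))
    return [np.mean(c), np.std(c), np.percentile(c, 5),
            np.percentile(c, 95), crossings]

segment = np.sin(np.linspace(0, 8 * np.pi, 256))
features = [f for c in haar_dwt(segment) for f in coeff_features(c)]
```

A 3-level decomposition of a 256-sample segment yields four coefficient sets (lengths 32, 32, 64, 128), so with 5 statistics each this sketch produces 20 features per segment.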

1D Convolutional Neural Networks and Raw Signals

The benefit of the 1D CNN is that it does not need the feature engineering step, which requires domain knowledge. The features are created as part of training the model, and the input to the model is the raw signals segmented into 256 samples each (Zang, 2017). The following represents the architecture of the 1D CNN used for this project.

[Image: architecture of the 1D CNN]

Convolutional neural networks build features automatically through a process called filtering, and the filters are tuned through backpropagation. The Convolution/ReLU/Pooling layers can be stacked to create multiple sets of layers. The final set of layers looks as follows:
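One Convolution/ReLU/Pooling stage can be illustrated in plain NumPy. This is a conceptual sketch only: in the real network the kernel weights are learned by backpropagation, whereas here a fixed edge-detector kernel is used, and a real layer would apply many kernels in parallel.

```python
import numpy as np

def conv1d_relu_maxpool(x, kernel, pool=2):
    """One Conv -> ReLU -> MaxPool stage of a 1D CNN.
    Kernel weights are fixed here for illustration; a CNN
    would learn them via backpropagation."""
    k = len(kernel)
    # 'valid' 1D convolution (cross-correlation, as CNN frameworks do it)
    conv = np.array([np.dot(x[i:i + k], kernel)
                     for i in range(len(x) - k + 1)])
    relu = np.maximum(conv, 0.0)           # non-linearity
    n = len(relu) // pool                  # non-overlapping max pooling
    pooled = relu[: n * pool].reshape(n, pool).max(axis=1)
    return pooled

x = np.sin(np.linspace(0, 4 * np.pi, 256))          # stand-in raw segment
feature_map = conv1d_relu_maxpool(x, kernel=np.array([-1.0, 0.0, 1.0]))
```

A 256-sample input with a width-3 kernel gives 254 convolution outputs, which pooling halves to a 127-value feature map; stacking such stages shrinks the time axis while building richer feature maps.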

[Image: final stacked Convolution/ReLU/Pooling layers]

Feature Extraction using 1D CNN

1D CNNs can be used for feature engineering and extraction as well. In this way, a new model can be created by combining the classic and contemporary approaches: features are extracted using the FFT, DWT and 1D CNN and used as input to the XGBoost classification algorithm. For the 1D CNN, the network is executed as usual; however, the final classification layer with voting is not performed. Instead, the flattened layer is extracted as features for the XGBoost classifier, as follows:

[Image: flattened 1D CNN layer extracted as features for XGBoost]

Once this is done, the 1D CNN features are concatenated with the FFT and DWT features. 
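The concatenation is a simple column-wise stack of the three per-segment feature matrices. The feature widths below (60 FFT, 20 DWT, 128 flattened-layer values) and the random contents are illustrative assumptions, not the project's actual dimensions.

```python
import numpy as np

n_segments = 100  # illustrative; the project has 14,234 segments

# Hypothetical per-segment feature sets from the three extractors
fft_feats = np.random.default_rng(0).normal(size=(n_segments, 60))   # FFT peaks
dwt_feats = np.random.default_rng(1).normal(size=(n_segments, 20))   # DWT statistics
cnn_feats = np.random.default_rng(2).normal(size=(n_segments, 128))  # flattened layer

# One row per segment, all feature sources side by side
combined = np.hstack([fft_feats, dwt_feats, cnn_feats])
```

The `combined` matrix would then be passed to the XGBoost classifier in place of any single feature set.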

Data Preparation and Execution of Models

The data preparation process for the contemporary approach consists of segmenting the signals into 256-sample segments and performing a 70%/30% train/test split. The training portion was further divided into 70% training and 30% validation. The 1D CNN training process feeds the raw signals directly into the model as follows:

[Image: 1D CNN training pipeline fed with raw signals]

The test set is then applied to the model, and model accuracy, loss and gain plots, a classification report and a confusion matrix are produced.

The data preparation process for the classic approach is much the same as for the contemporary approach. However, the raw signals are not fed into the model directly; a feature engineering layer extracts the features for the classification models as follows:

[Image: feature engineering layer feeding the classic classification models]

Each feature set (the FFT feature set and the DWT feature set) is input to the classification models separately, and then the FFT and DWT feature sets are combined and input to the classification models as one set of features.

Note there is no need to further subdivide the training input into training and validation sets as was done for the 1D CNN. Once the features are extracted, a series of machine learning algorithms is run against them, and the test data is then predicted as usual. Accuracy scores, classification reports and confusion matrices are created.

The final model combines the classic and contemporary models by extracting features with all three techniques: FFT, DWT and the 1D CNN. The combined feature set is then used as input to the XGBoost algorithm.

[Image: combined FFT/DWT/1D CNN feature pipeline into XGBoost]

The Results

Both the contemporary and classic approaches performed well. The contemporary approach outperformed the classic approach with an accuracy score of .94. The classic approach scored well when the FFT features were combined with the DWT features into one set. Combining the contemporary and classic approaches through feature engineering performed the best overall.

[Image: accuracy results for all models]

The three best scores: the combined FFT & DWT engineered features ran in 17 minutes, the 1D CNN ran in 23 minutes and the combined approach ran in 36 minutes. I guess the saying 'the best things in life are worth waiting for' applies here. I only ran the results using CPUs (central processing units); using GPUs (graphics processing units), the times would have been much lower.

[Image: run times and misclassification summary of the three best runs]

The table on the right shows the misclassification summary of the three best runs.

The results showed that the classic and contemporary approaches are both well suited for machine learning using sensor data and combining the two results is an even more powerful approach.

Supporting Jupyter Notebooks

Signal Analysis for Feature Engineering:

https://github.com/paulscheibal/SBDataScienceCert/blob/master/CapstoneP2/Notebooks/SignalAnalysisforFeatureEngineering.ipynb

Feature Engineering with Bearing Sensor Data:

https://github.com/paulscheibal/SBDataScienceCert/blob/master/CapstoneP2/Notebooks/SignalFeatureEngineeringforBearingData.ipynb

Execution of all Machine Learning Algorithms:

https://github.com/paulscheibal/SBDataScienceCert/blob/master/CapstoneP2/Notebooks/SignalMachineLearning_BearingData.ipynb

Bibliography

Ahamed, N. (2015). Bearing basics SKF. Retrieved from https://www.slideshare.net/NaushadAhamed/bearing-basics-skf

CaseWestern. (n.d.). Case Western Reserve University Bearing Data Center Website. Case Western Reserve University. Retrieved from https://csegroups.case.edu/bearingdatacenter/pages/welcome-case-western-reserve-university-bearing-data-center-website

Devleker, K. (n.d.). Understanding Wavelets, Parts 1, 2, 3. MathWorks. Retrieved from https://www.mathworks.com/videos/understanding-wavelets-part-3-an-example-application-of-the-discrete-wavelet-transform-121284.html

Goel, D. (2017). What is industry 4.0 and how it increases machine efficiency? Retrieved from https://thingtrax.com/2017/10/05/industry-4-0-increases-machine-efficiency/

IVEDIX. (2017). The Smart Factory. IVEDIX. Retrieved from https://ivedix.com/industry-4-0-sensors-analytics-and-the-smart-factory/

Kiranyaz, S., et al. (2019). 1-D Convolutional Neural Networks for Signal Processing Applications. Retrieved from https://ieeexplore.ieee.org/document/8682194

Taspinar, A. (2018). A guide for using Wavelet Transform in Machine Learning. Ahmet Taspinar. Retrieved from http://ataspinar.com/2018/12/21/a-guide-for-using-the-wavelet-transform-in-machine-learning/

Taspinar, A. (2018). Machine Learning with Signal Processing Techniques. Retrieved from http://ataspinar.com/2018/04/04/machine-learning-with-signal-processing-techniques/

Tower-Clark, C. (2019). Big Data, AI & IoT Part Two: Driving Industry 4.0 One Step At A Time. Forbes. Retrieved from https://www.forbes.com/sites/charlestowersclark/2019/02/20/big-data-ai-iot-part-two-driving-industry-4-0-one-step-at-a-time/

Wikipedia. (2019). Industry 4.0. Wikipedia. Retrieved from https://en.wikipedia.org/wiki/Industry_4.0

Zang, R., et al. (2017). Fault Diagnosis from Raw Sensor Data Using Deep Neural Networks Considering Temporal Coherence. Retrieved from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5375835/

Special Thanks

I would like to thank Dhiraj Khanna for helping me over the technical hurdles of understanding the Fast Fourier Transform.

I would also like to thank Ahmet Taspinar for the excellent articles and programs describing machine learning related to signal processing and feature engineering using FFT and DWT python functions.
