Support Vector Machine with Wavelet Decomposition Method for Fault Diagnosis of Tapered Roller Bearings by Modelling Manufacturing Defects

Tapered roller bearings are widely used in machines and transmission gearboxes. During manufacturing, the outer ring, the inner ring and the rollers are often damaged, and detecting and classifying these defects is a challenging task. This paper presents an efficient method for fault classification using support vector machines. The faults on the bearing parts, created by a laser beam machine, have a shape and surface topography similar to those of grinding faults from the manufacturing process. The vibration signature is collected by a sensitive transducer and a high-resolution data acquisition unit. A test rig was constructed to model the operating conditions of built-in tapered roller bearings. Moreover, the test rig was designed to mitigate harmful vibration components from the environment that influence the precision of the vibration measurement. Feature extraction is performed by wavelet decomposition. The decomposition level is determined by FFT, considering the structural frequencies of the bearing elements. The proper wavelet is selected from the Daubechies and Symlet wavelet families by the Energy-to-Shannon Entropy criterion. Fault classification is carried out in the R (CRAN) software using support vector machine classifiers. Time domain parameters of the vibration signature, namely kurtosis, skewness, crest factor and range, are provided to the classifier. In all cases in the study, the classification rates are high enough to confirm the efficiency of the method.


Introduction
Screening out defective bearing parts in manufacturing is vital to ensure safety and a high level of quality. This paper introduces a machine learning method for fault detection of tapered roller bearings, combined with wavelet decomposition and the maximum Energy-to-Shannon entropy criterion.
In the following, we give an overview of some important studies in the field of bearing diagnostics that are related to this research. Wavelet transform is widely used in technical diagnosis to detect faults of machines and bearings. Support Vector Machine (SVM) offers an efficient classification method for machine fault diagnosis and bearing fault diagnosis.
Paya et al. [1] applied an artificial neural network to gears and bearings; defects on the inner race of the bearing and gear tooth irregularity were analyzed with Daubechies4 wavelets, with a 96% classification rate.
Nikolaou and Antoniadis [2] carried out experiments with rolling element bearings using the Daubechies_12 wavelet; the mean and standard deviation of the wavelet packet coefficients were used in their experiments.
Prabhakar et al. [3] investigated rolling element bearings with one scratch mark each on the inner race and the outer race using the Daubechies_04 wavelet, where the RMS and kurtosis of the signal were the inputs for the machine learning system. Saravanan et al. [4] measured gear tooth breakage and a gear with a crack at the root with face wear using the Morlet wavelet. Statistical fea-
The applicability of SVM has been demonstrated in different fields of engineering system analysis. Mankovits et al. [11] optimized the shape of axi-symmetric rubber bumpers by support vector regression, which shows the efficiency of the method in other engineering applications. Vámosi [12] solved a nonlinear classification problem of rubber elements with support vector classification. Manickam [13] applied soft computing methods, namely a back propagation neural network, to predict shell moulding parameters, which showed the efficiency of machine learning methods. Kalácska et al. [14] analyzed the sliding properties of steels on other materials in research combined with classification. Deák et al. [15] investigated the defect size of tapered roller bearings with wavelet transform by entropy optimization.
According to this overview of the literature, the efficiency of SVM classification of tapered roller bearings with modelled manufacturing faults has not yet been investigated.

Background of SVM for fault classification
Support Vector Machine is a classification and regression method that can be interpreted as a transformation of lower-dimensional data into a higher-dimensional space. A support vector machine constructs a hyperplane or a set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification or regression. The hyperplanes in the higher-dimensional space are defined as the set of points whose dot product with a vector in that space is constant. In this way, SVM turns patterns that are not separable in the original space into separable patterns. An existing or incipient failure becomes easier to identify because the failure diagnostics is carried out in the higher-dimensional space. More important features receive higher weights, while less important ones receive small or nearly zero weights.
In the SVM formulation, the goal is to classify the data set $\{x_1, \dots, x_n\}$ correctly by requiring

$$ y_i \left( w^T x_i + b \right) \ge 1, \quad i = 1, \dots, n, $$

where $y_i \in \{-1, +1\}$ are the labels of the points, $w$ is the normal vector of the separating hyperplane and $b$ is a scalar threshold. To define support vector machines, and linear classifiers first, a data point is viewed as a $p$-dimensional vector, and the purpose is to decide whether the data set can be separated by a $(p - 1)$-dimensional hyperplane with a positive margin. This is called a linear classifier. The best hyperplane is the one that gives the largest margin between the two classes. The maximum margin is obtained from

$$ \arg\min_{w,\,b} \; \tfrac{1}{2} \lVert w \rVert^2 $$

subject to the constraints above.
In this case, the distance from the hyperplane to the nearest data point on each side is maximized. If such a hyperplane exists, it is known as the maximum-margin hyperplane, and the linear classifier it defines is known as a maximum-margin classifier. Support vector machines use a hypothesis space of linear functions in a high-dimensional feature space. An SVM can be trained with a learning algorithm from optimization theory.
The hyperplane can be expressed with the use of the support vectors (SVs) as:

$$ w^T x + b = 0, $$

where the vector $w$ defines the boundary, $x$ is the input vector of dimension $N$ and $b$ is a scalar threshold. At the margins, where the SVs are located, the equations for classes $A$ and $B$, respectively, are

$$ w^T x + b = 1 \quad \text{and} \quad w^T x + b = -1. $$

Good separation is achieved by the hyperplane that has the largest distance to the nearest training data point of any class, the functional margin. The SVM decision function is an application of the kernel function, and the Lagrangian optimization method is used to obtain the optimal decision function from the training data [16]. SVM is generally suitable for two-class tasks. As SVs correspond to the extremities of the data for a given class, a decision function can be created to specify whether a given data point belongs to either $A$ or $B$. This is defined as

$$ f(x) = \operatorname{sign}\left( w^T x + b \right). $$

The optimal hyperplane can be obtained as a solution to the optimization problem:

$$ \min_{w,\,b} \; \tfrac{1}{2} \lVert w \rVert^2 \quad \text{subject to} \quad y_i \left( w^T x_i + b \right) \ge 1, \quad i = 1, \dots, n, $$

where $n$ is the number of training samples. The solution of the constrained quadratic programming optimization problem can be obtained as

$$ w = \sum_{i} v_i x_i, $$

where $x_i$ are the SVs obtained from training. In cases where a linear boundary in the input space is not able to separate the two classes accurately, a hyperplane is created that allows linear separation in a higher-dimensional feature space. Vapnik suggested a way to create nonlinear classifiers by applying the kernel trick to maximum-margin hyperplanes.
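To make the linear case concrete, the following minimal Python sketch evaluates the decision function sign(w^T x + b), checks whether every training point lies on or outside its margin, and computes the distance 2/||w|| between the two margin hyperplanes. The toy points, the weight vector `w` and the threshold `b` are hypothetical values chosen for illustration; they are not taken from the bearing measurements.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def decision(w, b, x):
    """Linear decision function: +1 for class A, -1 for class B."""
    return 1 if dot(w, x) + b >= 0 else -1

def margin_width(w):
    """Distance between the margin hyperplanes w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(dot(w, w))

def satisfies_constraints(w, b, points, labels):
    """True if y_i * (w.x_i + b) >= 1 holds for every training point."""
    return all(y * (dot(w, x) + b) >= 1.0 for x, y in zip(points, labels))

# Hypothetical, linearly separable toy data (not measured values).
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
labels = [1, 1, -1, -1]
w, b = (0.5, 0.5), -1.0  # assumed separating hyperplane

predictions = [decision(w, b, x) for x in points]
feasible = satisfies_constraints(w, b, points, labels)
width = margin_width(w)
```

In this toy example the points (2, 2) and (0, 0) lie exactly on the margins, so they play the role of the support vectors, and the margin width equals the distance between them.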
This is achieved through the use of a transformation $\Phi$, which maps the data from the $N$-dimensional input space to a $Q$-dimensional feature space:

$$ \Phi : \mathbb{R}^N \to \mathbb{R}^Q. \tag{2} $$

The kernel function $K(x, y)$, which must be continuous and positive definite, is defined as

$$ K(x, y) = \Phi(x)^T \Phi(y). $$

The decision function is accordingly modified as

$$ f(x) = \operatorname{sign}\left( \sum_{i} v_i K(x, x_i) + b \right). $$

The parameters $v_i$ are used as weighting factors to determine which of the input vectors are actually support vectors.
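The kernel form of the decision function can be sketched as follows. The example is a minimal illustration under stated assumptions: it uses the linear kernel K(x, y) = x^T y (for which the transformation is the identity), and the training vectors, the weights `weights` (standing for the weighting factors v_i) and the threshold `b` are hypothetical values, not results from the paper. Vectors with zero weight do not contribute to the sum, so only the support vectors matter.

```python
def linear_kernel(x, y):
    """Linear kernel K(x, y) = x.y, i.e. the feature map is the identity."""
    return sum(a * b for a, b in zip(x, y))

def kernel_decision(x, vectors, weights, b, kernel):
    """Evaluate sign(sum_i v_i * K(x, x_i) + b) for one input vector x."""
    total = sum(v * kernel(x, xi) for v, xi in zip(weights, vectors) if v != 0.0)
    return 1 if total + b >= 0 else -1

# Hypothetical training vectors and weighting factors v_i (illustration only).
vectors = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
weights = [0.25, 0.0, -0.25, 0.0]
b = -1.0

# Input vectors with a nonzero weight are the support vectors.
support_vectors = [xi for v, xi in zip(weights, vectors) if v != 0.0]
predictions = [kernel_decision(x, vectors, weights, b, linear_kernel) for x in vectors]
```

Passing the kernel as an argument keeps the decision function unchanged when a nonlinear kernel is substituted for the linear one.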
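Commonly used kernel functions include the radial basis kernel:

$$ K(x, y) = \exp\left( -\frac{\lVert x - y \rVert^2}{2\sigma^2} \right) $$

and the sigmoid kernel:

$$ K(x, y) = \tanh\left( \kappa \, x^T y + \theta \right), $$

where $\sigma$ is the RBF width and $\kappa$ and $\theta$ are the parameters of the sigmoid kernel. In the non-separable case, the constraints are relaxed by slack variables $\xi_i \ge 0$, and a parameter $C$ is introduced as a penalty constant for those sample points which are mis-separated by the optimal separating plane. The role of $C$ is to strike a proper balance between the calculation complexity and the separating error. For the separable case, $C$ is infinity, while for the non-separable case it may be varied, depending on the number of allowable errors in the trained solution: few errors are permitted for high $C$, while low $C$ allows a higher proportion of errors in the solution. To control the generalization capability of SVM, there are a few free parameters, such as the limiting term $C$ and kernel parameters such as the RBF width $\sigma$. In this context, with penalty constant $C$, the SV classification is to minimize

$$ \frac{1}{2} \lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i \quad \text{subject to} \quad y_i \left( w^T x_i + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0. $$

It leads to a maximization problem that can be solved by using Lagrange multipliers $\alpha_i$:

$$ \max_{\alpha} \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \quad \text{subject to} \quad 0 \le \alpha_i \le C, \quad \sum_{i=1}^{n} \alpha_i y_i = 0, $$

where the multipliers are related to the weighting factors by $v_i = \alpha_i y_i$. The Sequential Minimal Optimization (SMO) algorithm gives an efficient way of solving the dual problem arising from the derivation of the SVM. SMO decomposes the overall quadratic programming problem into quadratic programming sub-problems.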
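The two kernels and the role of the penalty constant C can be illustrated with a short Python sketch. It assumes the common textbook forms of the radial basis kernel (width `sigma`) and the sigmoid kernel (the parameter names `kappa` and `theta` are assumptions introduced here), and it evaluates the soft-margin objective for a linear classifier with slacks max(0, 1 - y_i (w^T x_i + b)). All numerical values are hypothetical.

```python
import math

def rbf_kernel(x, y, sigma=1.0):
    """Radial basis kernel exp(-||x - y||^2 / (2 * sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))

def sigmoid_kernel(x, y, kappa=1.0, theta=0.0):
    """Sigmoid kernel tanh(kappa * x.y + theta)."""
    return math.tanh(kappa * sum(a * b for a, b in zip(x, y)) + theta)

def soft_margin_objective(w, b, points, labels, C):
    """0.5 * ||w||^2 + C * sum of slacks for a linear classifier."""
    slack_sum = 0.0
    for x, y in zip(points, labels):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        slack_sum += max(0.0, 1.0 - y * score)
    return 0.5 * sum(wi * wi for wi in w) + C * slack_sum

# Hypothetical data: the third point lies inside the margin (slack = 1).
points = [(2.0, 2.0), (0.0, 0.0), (1.0, 1.0)]
labels = [1, -1, 1]
w, b = (0.5, 0.5), -1.0

objective_strict = soft_margin_objective(w, b, points, labels, C=10.0)
objective_tolerant = soft_margin_objective(w, b, points, labels, C=0.1)
```

With the same hyperplane, the single margin violation is penalized heavily for the large C and only mildly for the small C, which is the trade-off described above.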
In SVM theory, a smaller number of support vectors generally improves the generalization ability. Furthermore, as the decision function is composed of SVs, having fewer SVs reduces the computational complexity. The optimal solution of the SVM is obtained by solving a quadratic optimization problem. The convexity of the formulation makes the solution unique. The SVM uses the Lagrangian optimization method to solve this problem.

Maximum Energy-to-Shannon Entropy ratio criterion for wavelet selection
Fault detection procedures based on time-frequency methods usually rely on visual observation of contour plots. It is also known that if the wavelet matches the shape of the signal well at a specific scale and location, a large transform value is obtained, whereas a low transform value is obtained if the signal and the wavelet do not correlate well. To avoid the shortcomings of visual observation, a more precise way of determining the best suited wavelet is presented here.
The ratio of the energy to the Shannon entropy of the wavelet coefficients of the signal, denoted the Energy-to-Shannon Entropy ratio, is an appropriate indicator for choosing the best wavelet for diagnosis, and it can be calculated in the following form [6,7]:

$$ \frac{E}{S} = \frac{\sum_{i=1}^{m} \lvert C_i \rvert^2}{-\sum_{i=1}^{m} p_i \log_2 p_i}, \qquad p_i = \frac{\lvert C_i \rvert^2}{E}, $$

where $C_i$ are the $m$ wavelet coefficients, $E$ is their energy and $S$ is their Shannon entropy. Seven different wavelets are considered in the present study for the wavelet decomposition. The comparison covered the Daubechies and Symlet wavelet families, which are appropriate for bearing fault diagnosis according to the literature. An appropriate base wavelet should extract the maximum amount of energy while minimizing the Shannon entropy of the corresponding wavelet coefficients. The averaged values of the Energy-to-Shannon Entropy ratios for the faults on the different bearing parts are given in Table 1. The Symlet_08 wavelet provided the highest E/S ratio, meaning the best efficiency for wavelet decomposition.
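The selection criterion can be sketched in a few lines of Python. This is a minimal illustration, assuming that the energy is the sum of squared coefficients and that the Shannon entropy is computed with a base-2 logarithm from the normalized energy distribution of the coefficients. The coefficient lists and the names `wavelet_A` and `wavelet_B` are hypothetical placeholders; in practice the coefficients would come from the wavelet decomposition of the measured signal with each candidate wavelet.

```python
import math

def energy_to_shannon_entropy_ratio(coeffs):
    """Return E/S for a sequence of wavelet coefficients."""
    energy = sum(c * c for c in coeffs)
    if energy == 0.0:
        raise ValueError("coefficients carry no energy")
    entropy = 0.0
    for c in coeffs:
        p = c * c / energy          # share of the total energy
        if p > 0.0:                 # 0 * log(0) is treated as 0
            entropy -= p * math.log2(p)
    # A single nonzero coefficient has zero entropy: the ratio is unbounded.
    return math.inf if entropy == 0.0 else energy / entropy

# Hypothetical coefficient sets with the same energy (illustration only).
candidates = {
    "wavelet_A": [1.0, 1.0, 1.0, 1.0],            # energy spread evenly
    "wavelet_B": [math.sqrt(3.0), 1.0, 0.0, 0.0],  # energy concentrated
}
ratios = {name: energy_to_shannon_entropy_ratio(c) for name, c in candidates.items()}
best_wavelet = max(ratios, key=ratios.get)
```

For equal energy, the candidate whose coefficients concentrate the energy in fewer values has the lower entropy and therefore the higher ratio, which is why it is selected.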

Feature extraction and fault classification by SVM in the R (CRAN) software
For this study, an experimental test rig (Fig. 1) was constructed to measure the vibration signatures of the tapered roller bearings properly. The shaft in the test rig is supported by two tapered roller bearings. The one under investigation is a No. 30205 tapered roller bearing. Four tapered roller bearings with different manufacturing defect widths on the outer race (OR1-4) were investigated in our experiments (Table 1). The shaft is driven by an alternating current motor (made by Cemer) with a power of 0.75 kW, a frequency of 50 Hz and a nominal speed of 2770 rpm, which is reduced to 1800 rpm by a variable speed drive. A rubber V-belt between the electric motor and the shaft provides smooth running and low vibration, which helps accurate and precise measurements. Rubber bumpers are installed to reduce the transmission of vibration from the electric motor to the bearing housing in order to minimize harmful vibrations. An additional acoustic chamber provides the possibility of acoustic measurements. Different speeds can be set with a Schneider ATV32HU22M2 variable speed drive. The test bearing is preloaded by a screw mechanism to supply sufficient axial force for the measurements. The constant preload force during the measurements is monitored by strain gauges in a Wheatstone bridge arrangement, based on the measured voltage difference.
The sampling frequency was 25600 Hz in this experiment and 102400 samples were collected, so the length of each measurement was 4.0 s. Loading conditions were essentially constant during the measurement, with no significant speed variation. An NI 9234 DAQ module was used, which is a 4-channel C Series dynamic signal acquisition module with 102 dB of dynamic range and 16 bit resolution. An IMI 603C01 vibration transducer was used to sample the vibration signature. The accelerometer is mounted with a screw on the previously ground top surface of the bearing housing, exactly perpendicular to the axis of rotation of the shaft. A 32-bit AMD Athlon II X2 M300 2.0 GHz processor was used for data processing. Wavelet decomposition was executed in the LabVIEW environment, and SVM fault classification was done in the open-source R software.
The healthy state and the outer race (OR) fault, inner race (IR) fault, roller fault (RF), multi fault (MF) and inner race back support fault (IRB) were analyzed in the experiment, both for the raw signal and for the 3rd level wavelet decomposition. Fig. 5 shows some of the faults under investigation. The defects on the outer race are rectangular, with a width of 0.1 mm, and were created by a precision laser beam machine. The defect parameters were measured with both a Mahr Perthometer and a Garant MM1-200 video microscope. The surface topography of these artificial faults is similar to that of real grinding faults from manufacturing, although the edges at the entry and exit points of the rollers are somewhat smoother. This model correlates well with real conditions and makes it possible to analyze the classification capability of the SVM. Fig. 4 shows the wavelet decomposition tree. Level cD3, the 3rd level of the wavelet decomposition, provides the frequency range in which the structural frequencies of the bearing elements are emphasized. These frequencies were obtained from the FFT of the time domain signals of the different bearing elements in the experiment (Figs. 2 and 3 present the signal in the time domain and its Fourier spectrum). The structural frequencies of the inner ring fault, roller fault, inner race back support fault and multi fault could be analyzed by Fourier transform. However, the defect size cannot be easily detected by traditional signal processing methods such as the Fourier transform, even though tiny faults of this size can reduce the lifetime of the bearing after it is installed in the machine. Wavelet transform is used instead of Fourier transform because it detects sharp edges and sudden changes in the vibration signature more efficiently; a higher SVM classification rate is therefore expected. Previous experiments showed that the Fourier transform is not effective enough for this kind of fault detection of bearing elements.
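The frequency range covered by each decomposition level follows from the sampling frequency. The sketch below assumes a standard dyadic (Mallat-type) decomposition with ideal half-band filters, so the values are nominal band edges only; real wavelet filters overlap at the band edges. The function names are hypothetical, while the sampling frequency and the number of samples are the values reported for this experiment.

```python
def detail_band(fs, level):
    """Nominal band (low, high) in Hz of the detail coefficients at a level."""
    return fs / 2 ** (level + 1), fs / 2 ** level

def approximation_band(fs, level):
    """Nominal band (low, high) in Hz of the approximation at a level."""
    return 0.0, fs / 2 ** (level + 1)

fs = 25600.0          # sampling frequency in Hz
n_samples = 102400    # samples per measurement

bands = {f"cD{level}": detail_band(fs, level) for level in (1, 2, 3)}
bands["cA3"] = approximation_band(fs, 3)
duration = n_samples / fs  # measurement length in seconds
```

Under these assumptions, cD3 nominally covers 1600 Hz to 3200 Hz, while cD1 and cD2 cover 6400 Hz to 12800 Hz and 3200 Hz to 6400 Hz, respectively.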
R is a language and environment for statistical computing and graphics [17]. Sample training and testing vectors were created for R. In total, 432 instances of the healthy bearing were used, and 4 time domain statistical features were extracted for further study (range, kurtosis, skewness and crest factor), giving 192 parameters in total for the software. Experiments were executed from 60 to 2880 rpm. In the case of the OR fault, IR fault, roller fault, multi fault and IR back fault, 20 revolutions were measured as control and check data for the SVM. This is in line with the 20% rule of SVM classification. The raw data were measured by the NI 9234 DAQ module, and the processed data after the 3rd level wavelet decomposition were calculated by a LabVIEW VI. In the whole experiment, 824 data records were processed for the overall SVM classification.
Crest factor is the ratio of the peak acceleration to the RMS value, and it is an accepted descriptor of small size defects. However, when localized damage grows, the value of the crest factor decreases significantly because of the increasing RMS. Kurtosis, the fourth normalized statistical moment, characterizes the peakedness of the data. For an undamaged bearing, whose vibration signal is approximately normally distributed, its value is close to three. Skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. Significant skewness and kurtosis clearly indicate that the data are not normal and that the bearing is damaged.
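The four time domain features can be computed as in the following minimal Python sketch. It assumes population (biased) central moments and the non-excess definition of kurtosis, so that a normally distributed signal gives a value of about three; the function name and the two short test signals are hypothetical and serve only as an illustration.

```python
import math

def time_domain_features(signal):
    """Return range, kurtosis, skewness and crest factor of a signal."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]
    m2 = sum(c ** 2 for c in centered) / n   # variance
    m3 = sum(c ** 3 for c in centered) / n   # third central moment
    m4 = sum(c ** 4 for c in centered) / n   # fourth central moment
    rms = math.sqrt(sum(s * s for s in signal) / n)
    peak = max(abs(s) for s in signal)
    return {
        "range": max(signal) - min(signal),
        "kurtosis": m4 / m2 ** 2,
        "skewness": m3 / m2 ** 1.5,
        "crest_factor": peak / rms,
    }

# Hypothetical signals: a smooth alternating one and one with a single impulse.
smooth = time_domain_features([1.0, -1.0, 1.0, -1.0])
impulsive = time_domain_features([0.0] * 7 + [4.0])
```

The impulsive signal yields a clearly higher kurtosis and crest factor than the alternating one, which reflects the sensitivity of these two features to sharp, localized events.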

Results and discussion
The recognition rate is calculated by the formula:

$$ \text{Recognition rate} = \frac{\text{Number of correctly classified samples}}{\text{Total number of testing samples}} \times 100\,\% $$

The effectiveness of the SVM classification is shown in Tables 2-4. A sigmoid kernel was applied for the SVM classification. Wavelet decomposition successfully removed the additional noise components from the vibration signature, behaved as a band-pass filter and enhanced the useful frequency content that carries the unique features of the faults themselves. Using the filtered signal yields higher classification rates. As the results show, the classification of inner race back support faults is an unambiguous task. A possible explanation is that the axial force depends on the preload of the bearing and is usually lower than the radial forces, where higher acceleration values are generated and fault classification is therefore easier.
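The recognition rate can be computed directly from the true and predicted class labels of the testing samples, as in the following short Python sketch. The function name and the label lists are hypothetical and are used only to illustrate the formula; they are not results of the study.

```python
def recognition_rate(true_labels, predicted_labels):
    """Percentage of testing samples that were classified correctly."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)

# Hypothetical testing labels (illustration only).
true_labels = ["OR", "OR", "IR", "RF", "IRB"]
predicted_labels = ["OR", "OR", "IR", "RF", "RF"]
rate = recognition_rate(true_labels, predicted_labels)
```

Here four of the five hypothetical samples are classified correctly, so the rate is 80%.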

Conclusion
A novel method was presented for the classification of tapered roller bearings with grinding faults from the manufacturing process, using an SVM classifier and optimal wavelet decomposition according to the Energy-to-Shannon entropy principle. The faults on the bearing parts, created by a laser beam machine, have a shape and surface topography similar to those of grinding faults from the manufacturing process. Therefore, they could be used for modelling the real problem and for training the SVM classifier. A test rig was designed and constructed that allows the bearings to be measured at an essentially low structural vibration level. However, the raw vibration signature contained additional noise, so wavelet decomposition up to the 3rd level was applied to obtain a clear signal for training the SVM classifier. The optimal wavelet for the decomposition was selected by the Energy-to-Shannon entropy principle. The Symlet_08 wavelet proved to be the best for the faults in the experiment. Statistical parameters of the time domain data were calculated, namely kurtosis, skewness, crest factor and range. All of these parameters were good indicators of the bearing status and revealed the faults of the bearing elements. Both the raw data set and the filtered data set after wavelet decomposition were provided to the SVM classifier. In all cases, the classification effectiveness was higher with the filtered data set than with the raw data. A classification rate of 96.4% was obtained for the outer race fault, which is remarkable for industrial application. The average classification rate was 93.5% when wavelet decomposition was used.