Process Capability Analysis and Predicting the Type of Assembly Fit with an XGBoost Machine Learning Model Using Python

Figure: Shaft and hole assembly

The manufacturing industry often requires two parts to be assembled, most commonly as hole-and-shaft assemblies.

A great deal of time, cost, and effort is wasted tackling assembly fitment problems.

Although the design specification satisfies the intent of the fitment, process variation affects the assembly process and the fit of the components.

This underlying problem is our business problem, and our business objectives are to:

1. Minimize rejections and repair.

2. Maximize first-pass yield.

Now, let us further drill down the assembly fitment problem.

Here, we consider a shaft and hole assembly, as shown in the figure above.

The relationship between two mating components is known as the “fit”, and pertains to how tight or loose the items should be when joined together.

There are three types of fit commonly referenced in manufacturing.

1. Clearance fit: allows for loose mating.

2. Interference fit: much tighter than a clearance fit.

3. Transition fit: falls between a clearance fit and an interference fit.

The figure below illustrates the three fits.

In our case, with the dimensions mentioned, a transition fit is the required fit.

Referring to a standard such as the American National Standard Preferred Hole Basis Metric Clearance Fits (ANSI B4.2–1978, R1984), we can find the maximum and minimum diameter differences between shaft and hole for the various fits.

These are summarized in the table below.

The phenomenon explained above, in which process variation produces different diameter differences and therefore different fits, is our business problem.

Because three different fits are possible, in machine learning terms this is a three-class classification problem, which is dealt with in the latter part of this article.
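To make the three classes concrete, here is a minimal sketch of how one shaft/hole pair could be labelled from its diameter difference. The two limit values and the function name classify_fit are assumptions made for illustration; they are not taken from ANSI B4.2 or from the summary table.

```python
# Hypothetical limits on the diameter difference (hole - shaft), in mm.
# Placeholders for illustration only, not values from ANSI B4.2.
INTERFERENCE_MAX = -0.010  # a difference below this is labelled interference
CLEARANCE_MIN = 0.015      # a difference above this is labelled clearance


def classify_fit(shaft_d: float, hole_d: float) -> str:
    """Label one shaft/hole pair by its diameter difference."""
    diff = hole_d - shaft_d
    if diff < INTERFERENCE_MAX:
        return "interference"
    if diff > CLEARANCE_MIN:
        return "clearance"
    return "transition"


# Three hypothetical (shaft, hole) pairs in mm.
pairs = [(25.020, 25.000), (25.000, 25.005), (24.990, 25.015)]
labels = [classify_fit(d, h) for d, h in pairs]
print(labels)
```

In a real study the two limits would come from the fit table for the chosen tolerance grades.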

Now, let us first understand the cause of this effect, i.e., process variation, by carrying out a process capability analysis.

Process capability analysis represents a significant component of the Measure phase of the DMAIC (Define, Measure, Analyze, Improve, Control) cycle during a Six Sigma project. This analysis measures how well process performance fits the customer’s requirements, which are translated into specification limits for the characteristics of the product to be manufactured or produced. The results of this analysis may help to identify variation within a process and to develop further action plans that lead to better yield, lower variation, and fewer defects.

Specifications

Specifications are the voice of the customer. Every process should be capable of fulfilling the customer’s requirements, which must be quantified to be attainable. Specification limits are the numerical expressions of the customer requirements. Due to natural variations within the process, specifications usually are a range with upper and lower bounds. USL (Upper Specification Limit) is a value above which the process performance is unacceptable, while LSL (Lower Specification Limit) is a value below which the process performance is unacceptable.

Process Performance

Process performance is the voice of the process. A process can be considered right when it approximates the target with as little variation as possible. In the Six Sigma approach, the most common process performance measures are:

• Yield (Y): the number of good products or items produced by the process. It can be assessed once the process is finished, by counting the items that fit the specifications.

• First-time yield (FTY): takes into consideration the rework in the middle of the process. Thus, regardless of the number of correct items at the end of the process, it counts only the items that were correct the first time.

• Defects per unit (DPU): number of nonconformities per unit. Defects are the complement of the yield.

• Defects per million opportunities (DPMO): number of nonconformities per million opportunities. It is mainly used as a long-term performance measure of a process.
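The four measures above can be sketched in a few lines of Python. The formulas are the common textbook forms and are assumed here, since the original formula images are not available; the function names and all counts are hypothetical.

```python
def process_yield(total_units: int, defective_units: int) -> float:
    """Share of units that meet specification at the end of the process."""
    return (total_units - defective_units) / total_units


def first_time_yield(total_units: int, defective_units: int, reworked_units: int) -> float:
    """Share of units that were good the first time, i.e. without rework."""
    return (total_units - defective_units - reworked_units) / total_units


def defects_per_unit(defects: int, total_units: int) -> float:
    """Average number of nonconformities per unit."""
    return defects / total_units


def dpmo(defects: int, total_units: int, opportunities_per_unit: int) -> float:
    """Nonconformities per million defect opportunities."""
    return defects / (total_units * opportunities_per_unit) * 1_000_000


# Hypothetical counts for one production batch.
y = process_yield(total_units=1000, defective_units=30)
fty = first_time_yield(total_units=1000, defective_units=30, reworked_units=50)
dpu = defects_per_unit(defects=45, total_units=1000)
batch_dpmo = dpmo(defects=45, total_units=1000, opportunities_per_unit=5)
print(f"Y={y:.3f}  FTY={fty:.3f}  DPU={dpu:.3f}  DPMO={batch_dpmo:.0f}")
```

Note that a single unit can carry several defects, which is why the defect count (45) is kept separate from the count of defective units (30) in this sketch.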

Process vs. Specifications

The sigma score of a process (Z) is a simple number that conveys how well a process fits the customer specifications. Processes that reach a sigma level of 6 may be considered “almost perfectly” designed (i.e. with almost zero defects). A sigma level of 6 implies that no more than 3.4 DPMO (defects per million opportunities) will be attained. The sigma score is the number of standard deviations that fit between the process mean and the nearest specification limit: Z = min(USL - mean, mean - LSL) / standard deviation.
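As a rough sketch, the sigma score can be computed from measured data as the distance from the process mean to the nearest specification limit, in units of the sample standard deviation. The measurements, the limits, and the function name sigma_score below are hypothetical.

```python
from statistics import fmean, stdev


def sigma_score(data, lsl: float, usl: float) -> float:
    """Standard deviations between the process mean and the nearest spec limit."""
    mean = fmean(data)
    sd = stdev(data)  # sample standard deviation
    return min(usl - mean, mean - lsl) / sd


# Hypothetical shaft diameters in mm, with hypothetical specification limits.
shaft = [25.001, 24.998, 25.003, 24.997, 25.000, 25.002, 24.999, 25.000]
z = sigma_score(shaft, lsl=24.98, usl=25.02)
print(f"Z = {z:.2f}")
```

Taking the nearer of the two limits means an off-center process is penalized even when its spread is small.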

DPMO through sigma scores
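The usual DPMO-by-sigma-level table can be regenerated with a short sketch. It assumes the conventional 1.5 sigma long-term shift and a one-sided normal tail, which is the convention behind the 3.4 DPMO figure at a sigma level of 6. The function name dpmo_from_sigma is hypothetical.

```python
from math import erfc, sqrt


def dpmo_from_sigma(sigma_level: float, shift: float = 1.5) -> float:
    """DPMO for a sigma level, assuming a one-sided normal tail and a 1.5 sigma shift."""
    z = sigma_level - shift
    tail = 0.5 * erfc(z / sqrt(2))  # P(Z > z) for a standard normal variable
    return tail * 1_000_000


for level in range(1, 7):
    print(f"{level} sigma -> {dpmo_from_sigma(level):,.1f} DPMO")
```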

Capability Indices

Capability indices directly compare the customer specifications with the performance of the process. They are based on the fact that the natural or effective limits of a process lie at the mean ± 3 standard deviations (i.e. where 99.7% of the data is contained). The capability of a process (Cp) is calculated as Cp = (USL - LSL) / (6σ), where σ is the process standard deviation.

However, Cp does not show whether the process is centered between the specification limits (which is desirable). To deal with this issue, the adjusted capability index (Cpk) is calculated as Cpk = min(USL - μ, μ - LSL) / (3σ), where μ is the process mean.

Like the sigma score, capability indices help to determine how well a process is meeting customer specifications. In general, a Cpk of at least 1.33 is considered acceptable, and the greater its value, the better.
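A minimal sketch of both indices, using the standard definitions Cp = (USL - LSL) / (6σ) and Cpk = min(USL - μ, μ - LSL) / (3σ). The hole measurements and specification limits are hypothetical, chosen so that the process is off-center.

```python
from statistics import fmean, stdev


def cp(data, lsl: float, usl: float) -> float:
    """Potential capability: specification width over six standard deviations."""
    return (usl - lsl) / (6 * stdev(data))


def cpk(data, lsl: float, usl: float) -> float:
    """Capability adjusted for centering: nearest spec limit over three standard deviations."""
    mean, sd = fmean(data), stdev(data)
    return min(usl - mean, mean - lsl) / (3 * sd)


# Hypothetical hole diameters in mm; the mean sits closer to the lower limit.
hole = [25.011, 25.008, 25.013, 25.007, 25.010, 25.012, 25.009, 25.010]
hole_cp = cp(hole, lsl=25.000, usl=25.030)
hole_cpk = cpk(hole, lsl=25.000, usl=25.030)
print(f"Cp = {hole_cp:.2f}, Cpk = {hole_cpk:.2f}")
```

Cpk can never exceed Cp, and the gap between the two indicates how far the process mean sits from the middle of the specification range.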

Now, let us first conduct a process capability analysis on the inputs using Python. We have two input variables, (1) shaft diameter d and (2) hole diameter H, and one output variable, Y, the fitment class.

Both inputs contain 100 random values with variation around the base dimension.
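One way such inputs could be generated is sketched below. The nominal diameters, the spread, the seed, and the choice of a normal distribution are all assumptions for illustration; the original dataset is not reproduced here.

```python
import random
from statistics import fmean, stdev

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical base dimensions and process spread, in mm.
NOMINAL_SHAFT = 25.000
NOMINAL_HOLE = 25.003
SPREAD = 0.010  # assumed standard deviation of each process
N = 100

shaft_d = [random.gauss(NOMINAL_SHAFT, SPREAD) for _ in range(N)]
hole_d = [random.gauss(NOMINAL_HOLE, SPREAD) for _ in range(N)]

# Basic summary of each input before any capability calculation.
for name, values in (("shaft d", shaft_d), ("hole H", hole_d)):
    print(f"{name}: mean={fmean(values):.4f}  sd={stdev(values):.4f}  "
          f"min={min(values):.4f}  max={max(values):.4f}")
```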

Below are the key steps, plots, and process capability summary.

Machine Learning Model Building

Machine learning (ML) is a type of artificial intelligence (AI) that allows software applications to become more accurate at predicting outcomes without being explicitly programmed to do so. Machine learning algorithms use historical data as input to predict new output values.

A machine learning model is a file that has been trained to recognize certain types of patterns. You train a model over a set of data, providing it an algorithm that it can use to reason over and learn from those data.

Once you have trained the model, you can use it to reason over data that it hasn't seen before, and make predictions about those data.

For our business problem, we are going to build an Extreme Gradient Boosting (XGBoost) model.

Gradient boosting refers to a class of ensemble machine learning algorithms that can be used for classification or regression predictive modelling problems. The ensemble is built from decision trees added one at a time, each fit to correct the prediction errors made by the trees before it.

Models are fit using any arbitrary differentiable loss function and a gradient descent optimization algorithm.
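The idea can be illustrated with a deliberately tiny from-scratch sketch: squared-error loss, one-split regression stumps as the weak learners, and a single input feature. It is a teaching example under those assumptions, not how XGBoost is implemented, and the toy data and function names are hypothetical.

```python
from statistics import fmean


def fit_stump(x, residuals):
    """Fit a one-split regression stump to the residuals over a 1-D input."""
    best = None
    for t in sorted(set(x))[:-1]:  # skip the largest value so the right side is never empty
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        lm, rm = fmean(left), fmean(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm


def gradient_boost(x, y, n_rounds=50, learning_rate=0.1):
    """Boost stumps on the negative gradient of the squared-error loss."""
    base = fmean(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For the loss 0.5 * (y - F)^2 the negative gradient is the residual y - F.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * stump(xi) for xi, pi in zip(x, pred)]
    return lambda xi: base + learning_rate * sum(s(xi) for s in stumps)


def mse(y_true, y_pred):
    return fmean([(a - b) ** 2 for a, b in zip(y_true, y_pred)])


# Toy data: a curved relationship that a single stump cannot capture.
x = list(range(10))
y = [float(xi ** 2) for xi in x]
model = gradient_boost(x, y)
mse_baseline = mse(y, [fmean(y)] * len(y))
mse_boosted = mse(y, [model(xi) for xi in x])
print(f"baseline MSE={mse_baseline:.1f}, boosted MSE={mse_boosted:.1f}")
```

Each round takes a small step, scaled by the learning rate, in the direction that reduces the loss, which is the sense in which boosting performs gradient descent.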

The two main reasons to use XGBoost are execution speed and model performance.

Below are the key steps, plots, and model summary.

Model Result Interpretation

1. For the evaluated shaft and hole diameter process capability, about 70% of assemblies will fall into the required fit, 20% into clearance fit, and 10% into interference fit.

2. Based on the model results, necessary cost-effective measures can be initiated at the shaft or hole process to achieve the desired fit.

3. The entire practice of process capability analysis and predictive model building can be deployed horizontally to other manufacturing processes.
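To see how a change at the shaft or hole process might shift these proportions, the share of each fit can be estimated directly from the process parameters. The sketch below assumes independent, normally distributed diameters and hypothetical fit limits; it does not reproduce the percentages reported above.

```python
from math import erf, sqrt


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X <= x) for a normal variable with mean mu and standard deviation sigma."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))


def fit_proportions(mu_shaft, sd_shaft, mu_hole, sd_hole,
                    interference_max=-0.010, clearance_min=0.015):
    """Expected share of each fit, assuming independent normal diameters."""
    mu = mu_hole - mu_shaft                 # mean of the difference (hole - shaft)
    sd = sqrt(sd_shaft ** 2 + sd_hole ** 2)  # variances of independent variables add
    p_interference = normal_cdf(interference_max, mu, sd)
    p_clearance = 1 - normal_cdf(clearance_min, mu, sd)
    return {
        "interference": p_interference,
        "transition": 1 - p_interference - p_clearance,
        "clearance": p_clearance,
    }


# Hypothetical current process versus a process with half the spread.
current = fit_proportions(25.000, 0.010, 25.003, 0.010)
improved = fit_proportions(25.000, 0.005, 25.003, 0.005)
for name, shares in (("current", current), ("improved", improved)):
    print(name, {k: round(v, 3) for k, v in shares.items()})
```

Comparing scenarios this way gives a quick first estimate of which process change, reducing spread or re-centering the mean, would bring more assemblies into the required fit.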

Sunil Patil
