Machine Learning in Industrial Level Applications: Challenges and Approaches


Over the past year I’ve read a lot of articles about Machine Learning applications, new algorithms and new models, but I came to realize that most of them present theoretical or conceptual ideas, or are sometimes just debates comparing approaches that in practice improve the overall model metric by only 1-2%.

Okay, all of that helps advance Data Science and Machine Learning, but I’m a pragmatic person: in my view, something only has value if it can be applied to real-world problems in a practical and feasible way.

With that in mind, the idea of this article is to delve into ML applications at the industrial level.

A lot of companies are trying to implement models in their production lines, but few have succeeded. Why is that?

Unlike Kaggle datasets, online courses or graduation projects, real-world data is messy, noisy and sparse, and most likely spread across various locations in different types of databases.


This brings us to our first challenge: when developing an industrial-level model, the seemingly simple task of gathering data may take months of work and involve teams across several locations, and still a lot of data may be missed.

Sometimes we literally have to plug a laptop into an equipment or machine terminal to access and extract the data.

Now let’s try to understand why this happens. In most production plants, equipment and machinery are years old, come from different manufacturers and use diverse communication protocols, sometimes even in different languages. Besides all that, a single production plant may have several different databases.

It is literally a data ocean. Note that I am not using the term data lake: lakes are steady and behave predictably, whereas oceans can be tricky and dangerous, and are known for overwhelming unwary sailors.


How do we solve this problem?! Well, I really wish there were a practical and easy solution, but unfortunately it requires investment in new hardware, new communication protocols, infrastructure and software, and above all, retraining plant operators. This could disrupt production during the learning curve of the new technologies.

The simpler solution: gather a task force, roll up your sleeves, go on a field trip and do your best to catch them all…

Great, we got the data, or at least most of it. Now comes our second challenge: preprocessing everything to bring it to a common base.

Here it is basically a matter of time and fully dedicated FTEs (full-time equivalents) to transform the data. Normally, this type of work is performed by Data Engineers, who create data ingestion pipelines and set up cloud services.
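To make "bringing data to a common base" concrete, here is a minimal sketch in Python. It assumes two hypothetical sources that describe the same kind of measurement with different field names, timestamp formats and temperature units; the field names, equipment IDs and the assumption that source A's naive timestamps are UTC are all illustrative, not taken from any real plant.

```python
from datetime import datetime, timezone

# Hypothetical raw records from two sources with different field names,
# timestamp formats and temperature units (illustrative assumptions only).
SOURCE_A = [{"ts": "2023-05-01 08:00:00", "temp_C": 71.5, "eq": "PUMP-01"}]
SOURCE_B = [{"time": 1682928000, "temperature_F": 160.7, "asset": "pump_02"}]


def from_source_a(rec):
    """Map a source-A record (naive timestamp string, assumed UTC; Celsius) to the common schema."""
    ts = datetime.strptime(rec["ts"], "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return {"timestamp": ts, "equipment_id": rec["eq"].upper(), "temperature_c": rec["temp_C"]}


def from_source_b(rec):
    """Map a source-B record (Unix epoch seconds; Fahrenheit) to the common schema."""
    ts = datetime.fromtimestamp(rec["time"], tz=timezone.utc)
    temp_c = (rec["temperature_F"] - 32.0) * 5.0 / 9.0
    # Normalize IDs like "pump_02" to the "PUMP-02" convention used by source A.
    equipment_id = rec["asset"].upper().replace("_", "-")
    return {"timestamp": ts, "equipment_id": equipment_id, "temperature_c": round(temp_c, 2)}


def to_common_base():
    """Convert every source to one schema and order rows by time, then equipment."""
    rows = [from_source_a(r) for r in SOURCE_A] + [from_source_b(r) for r in SOURCE_B]
    return sorted(rows, key=lambda r: (r["timestamp"], r["equipment_id"]))


for row in to_common_base():
    print(row)
```

In a real plant there would be one such adapter per source, which is exactly why this stage consumes so much time.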


This second challenge is not difficult to solve, but it will take time, lots of it; a large share of a data science project’s time will probably be spent here.

After preprocessing, the Data Science team can finally get their hands on workable data and start developing the model. Let’s assume the team does well and manages to create a model with good metrics, something that will bring value to the company.

Here our third challenge begins: model scalability. As the DS team worked through the data, they most likely applied feature engineering, creating statistical columns (polynomial expansion, max, min, rolling averages, etc.).

In other words, they heavily transformed the data for model training, and to deploy the model into production, the same transformations must be performed by a data pipeline at the operations level.
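One way to make sure production computes exactly what training computed is to keep the feature logic in a single function that both sides call. The sketch below, in plain Python, assumes a single sensor stream and a hypothetical window of 3 samples, and reproduces the kinds of features mentioned above (a degree-2 polynomial term, rolling mean, max and min). It illustrates the idea rather than any particular team's pipeline.

```python
from statistics import mean

WINDOW = 3  # hypothetical rolling-window length, in samples


def build_features(readings, window=WINDOW):
    """Turn a list of sensor readings into one feature row per complete window.

    The same function is meant to be imported by both the training code and the
    production pipeline, so the model always sees identically computed features.
    """
    rows = []
    for i in range(window - 1, len(readings)):
        win = readings[i - window + 1 : i + 1]  # the last `window` samples up to i
        x = readings[i]
        rows.append({
            "value": x,
            "value_sq": x ** 2,      # degree-2 polynomial term
            "roll_mean": mean(win),
            "roll_max": max(win),
            "roll_min": min(win),
        })
    return rows


print(build_features([1.0, 2.0, 3.0, 4.0]))
```

Re-implementing these transformations separately in a notebook and in the plant pipeline invites subtle mismatches between what the model was trained on and what it receives in production; sharing one function avoids that.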

The tricky part is having enough computing capacity on site to perform all transformations inline, especially if the operation requires a real-time feed. Depending on the processing capacity required, this could be an initiative killer.
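When on-site hardware is limited, one common way to reduce the cost of inline features is to update them incrementally as each sample arrives, instead of recomputing over the whole window. The following Python sketch assumes a single real-time sensor feed and shows a running-sum rolling mean plus a monotonic-deque rolling max, both amortized O(1) per sample. The class name and window size are hypothetical.

```python
from collections import deque


class StreamingRollingFeatures:
    """Incrementally maintains rolling mean and max over the last `window` samples.

    Each update costs amortized O(1) instead of recomputing over the whole window,
    which keeps per-sample work small on modest on-site hardware.
    """

    def __init__(self, window):
        if window < 1:
            raise ValueError("window must be >= 1")
        self.window = window
        self.values = deque()          # samples currently inside the window
        self.total = 0.0               # running sum of the window
        self.max_candidates = deque()  # (index, value) pairs with decreasing values
        self.count = 0                 # number of samples seen so far

    def update(self, x):
        idx = self.count
        self.count += 1
        # Rolling mean: add the new sample, subtract the one leaving the window.
        self.values.append(x)
        self.total += x
        if len(self.values) > self.window:
            self.total -= self.values.popleft()
        # Rolling max: drop candidates that can never be the max again.
        while self.max_candidates and self.max_candidates[-1][1] <= x:
            self.max_candidates.pop()
        self.max_candidates.append((idx, x))
        # Expire the front candidate once it falls out of the window.
        if self.max_candidates[0][0] <= idx - self.window:
            self.max_candidates.popleft()
        return {
            "roll_mean": self.total / len(self.values),
            "roll_max": self.max_candidates[0][1],
        }


feats = StreamingRollingFeatures(window=3)
for sample in [1, 5, 2, 3, 1]:
    print(feats.update(sample))
```

Over very long streams a running sum can accumulate floating-point error, so a real deployment might periodically recompute it from the buffered window.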

What about cloud services? Some operations can’t depend on web services, either because they are in a location with poor telecommunications infrastructure or simply because they can’t risk leaving an entire operation dependent on an internet connection.


Finally, after overcoming all the previous obstacles, a fourth one may arise: similar equipment may have different sensor setups. In other words, there is no guarantee the model will work for all equipment of the same type.

What?! Yes, this is a typical scenario in production plants: you have similar equipment built by different manufacturers, so the information they output will probably come in different formats and structures... great, right?
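A lightweight first step is to map each manufacturer's sensor tags to the feature names the model was trained on, and to check up front which features a given setup cannot provide. In the Python sketch below, the vendor names, tag names and model feature set are all hypothetical placeholders.

```python
# Hypothetical vendor-specific tag names mapped to the canonical feature names
# the model was trained on (all names here are illustrative assumptions).
TAG_MAPS = {
    "vendor_a": {"MOT_TMP": "motor_temp_c", "VIB_RMS": "vibration_rms", "RPM": "speed_rpm"},
    "vendor_b": {"MotorTemperature": "motor_temp_c", "Speed": "speed_rpm"},
}
MODEL_FEATURES = {"motor_temp_c", "vibration_rms", "speed_rpm"}


def canonicalize(vendor, raw):
    """Rename a vendor's raw readings to canonical names, dropping unmapped tags."""
    mapping = TAG_MAPS[vendor]
    return {mapping[tag]: value for tag, value in raw.items() if tag in mapping}


def missing_features(vendor):
    """Return the model features this vendor's sensor setup cannot provide."""
    return MODEL_FEATURES - set(TAG_MAPS[vendor].values())


for vendor in TAG_MAPS:
    print(vendor, "missing:", sorted(missing_features(vendor)))
```

A non-empty result flags equipment where the model, as trained, cannot be applied directly, even though it belongs to the same type group.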

Geez, it never gets easy…


Bottom line: industrial plants have huge potential for Machine Learning applications, but tapping it won’t be a trivial or simple task. Production plants normally hold decades of information and data, but it doesn’t reside in a tidy, easy format (a data lake); it is an ocean, where storms of data can lead astray even the most experienced Data Scientists.

Smooth seas do not make skillful sailors.


More articles by Bruno Marques

