The Manufacturing Guide to Machine Learning
An Introduction to AI & Machine Learning
Applications in Manufacturing
When it comes to automotive applications of artificial intelligence, most people will think of autonomous vehicles. It’s understandable: self-driving cars make it easy to visualize something as abstract as machine learning. We’ve all seen the videos of what an autonomous vehicle “sees” through its cameras and—aside from the unhelpful yet ongoing debates about trolley problems—the execution seems fairly straightforward.
Basically, we can conceive of how autonomous vehicles drive because we’re able to drive ourselves.
It’s much more difficult to understand how artificial intelligence and machine learning apply to something like automotive manufacturing. The sheer number of variables involved in an assembly line is enormous, far more than a single person can manage on their own.
Deciding when or how to change manufacturing lines is nothing like deciding when or how to change lanes.
And yet, machine learning has the potential to be much more valuable in the factory than on the road, at least for the foreseeable future.
Machine learning can augment an engineer’s capabilities, processing the huge volumes of data generated during production so that they can make decisions with as much information as possible. Unfortunately, there remains considerable skepticism toward machine learning in manufacturing, no doubt due to its close ties to buzzwords like “digitalization” and “Industry 4.0”.
But machine learning isn’t just a buzzword. It’s also a valuable tool and the next logical step in the long history of manufacturing’s evolution. To understand why, we need to understand what machine learning is and how it can actually be useful in an industrial environment. But first, we need to look at manufacturing data.
The What, Why and How of Manufacturing Data
What Data are Manufacturers Collecting?
There are many potential sources of information in manufacturing, not just the obvious ones such as sensors and PLCs. Other manufacturing data sources, including business transactions, maintenance records, geospatial data, and RFID scans, can all provide insights into industrial operations. Of course, that data needs to be accessible in order to derive those insights.
More often than not, industrial data is siloed, meaning that it’s only accessible by one department or division and otherwise isolated from the rest of the organization, limiting its usefulness. By some estimates, more than two thirds of all manufacturing data collected goes unused. This so-called “dark data”, often retained purely for compliance purposes, incurs storage costs for manufacturers without generating commensurate value. Breaking down data silos in order to recoup those costs and realize a better ROI is a well-known problem that has spawned a whole host of manufacturing data visibility and data governance solutions.
Given the challenge of manufacturing data management, one might wonder whether the costs of instrumentation, data collection and industrial data storage are really worthwhile.
Why Collect Industrial Data?
There are three main factors that have contributed to the exponential growth of manufacturing data over the past two decades:
- The Diminishing Cost of Sensors
- The Increase in Industrial Connectivity
- The Availability of Advanced Analytics
Diminishing Sensor Costs
Like transistors, sensors have been steadily dropping in cost. According to the Microsoft 2019 Manufacturing Trends Report, the average cost of a sensor in 2018 was just $0.44, compared to $1.30 in 2004. The longer this trend continues, the more incentive manufacturers have to install sensors on legacy equipment, retrofitting it in anticipation of Industry 4.0. By collecting temperature, vibration, and other forms of time series data, manufacturers can capture previously unmeasured values and use them to improve operational efficiency.
Increasing Industrial Connectivity
It was not so long ago that connectivity was a feature worth noting for a machine tool. These days, it’s practically a given that a new piece of industrial equipment will have industrial ethernet, fieldbus or wireless capabilities. According to a 2018 study, these three network technologies account for 85% of the global industrial market, with the remainder taken up by cloud technologies and open-source protocols. Moreover, that trend is accelerating rapidly, with Juniper Research projecting that the number of Industrial IoT connections globally will increase from 17.7 billion in 2020 to 36.8 billion by 2025, an overall growth rate of 107%.
Advanced Manufacturing Analytics
It’s all well and good to take advantage of less expensive sensors and more available connectivity to collect industrial data, but the cost of doing so still needs to be justified. Cloud computing has been expanding rapidly and will continue to proliferate for the foreseeable future. Grand View Research estimates that the global cloud computing market will expand at a compound annual growth rate (CAGR) of 14.9% from 2020 to 2027.
This expansion is driven by the demand for advanced analytics in manufacturing but it’s also driving that demand by making artificial intelligence and machine learning more available across industries. Deloitte has estimated that 70% of companies which adopt AI will obtain it via cloud-based enterprise software. This has led some industry experts to ask whether cloud and AI are becoming two sides of the same coin.
In any event, as advanced manufacturing analytics become more available and the potential applications for AI in manufacturing multiply, the incentive to collect industrial data grows along with them.
How Do You Get The Most Value From Manufacturing Data?
The road to digitalization in manufacturing may seem clear: increase production data collection through a combination of legacy equipment instrumentation and industrial connectivity, then leverage that data with advanced manufacturing analytics. Add to that the sheer number of manufacturing applications for AI in Industry 4.0, and one could easily draw the conclusion that the biggest challenge in generating value from manufacturing data is deciding where to start.
Unfortunately, it’s been shown time and again that the majority of digital transformations fail at the proof-of-concept phase, putting the success rate at less than one in three. There are a myriad of explanations for this, from lack of employee engagement to inconsistent communication strategies. Ultimately, the key to realizing value from manufacturing data is having a clear understanding of the problem you’re trying to solve and how tools such as machine learning can solve it.
Used in the right way, machine learning can help manufacturers achieve new levels of product quality, but to do that, they need to understand machine learning.
3 Types of Machine Learning for Manufacturing
Machine learning involves utilizing a variety of algorithms to solve problems. These algorithms are frequently grouped according to how they learn. The learning style of an algorithm depends on how it ingests data, and whether that data is labelled or unlabelled.
Labelled data has a tag or classification, whereas unlabelled data does not. Think of units coming off an end-of-line test: if those units are identified as passing or failing in the data, then the data set is labelled. If there is no indication of which units passed and which failed, then the data set is unlabelled.
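The distinction is easy to see in a few lines of Python (the measurements and PASS/FAIL tags below are invented purely for illustration):

```python
# Hypothetical end-of-line (EOL) test measurements for four units.
# Each row: [vibration_rms, temperature_c] -- illustrative values only.
measurements = [
    [0.12, 41.0],
    [0.98, 55.2],
    [0.15, 42.3],
    [1.10, 58.7],
]

# Labelled data set: each unit carries a PASS/FAIL tag.
labels = ["PASS", "FAIL", "PASS", "FAIL"]
labelled = list(zip(measurements, labels))

# Unlabelled data set: the same measurements with no tags at all.
unlabelled = measurements
```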
There are three types of machine learning styles:
- Supervised Learning: algorithms that use labelled data
- Unsupervised Learning: algorithms that use unlabelled data
- Semi-Supervised Learning: algorithms that use partially labelled data
Which machine learning style works best will depend on the available manufacturing data, and the problem you’re trying to solve. Hence, manufacturers may find it best to use a combination of supervised, unsupervised, and semi-supervised learning across facilities or even across lines within a single facility.
Supervised Machine Learning Applications in Manufacturing
End-of-Line (EOL) Testing
Supervised learning uses labelled data to solve problems that have a well-defined failure mode or threshold for failure. By feeding a machine learning model historical testing data, for example from a noise, vibration and harshness (NVH) test, the model can learn to identify “normal” behaviour and use that to predict whether a unit will pass or fail, before it actually reaches the test station. The model is then able to classify and label new parts in real time.
After a supervised learning model has been trained on historical data, it can be deployed via a machine learning platform or in a Docker container directly on the line. The model can then conduct real-time data analysis on parts coming off the line, identifying defects in testing and predicting, for example, problematic frequencies from NVH testing.
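As a rough sketch of the idea, assuming scikit-learn is available, a classifier can be trained on synthetic “historical” test data and then asked to classify a new unit before it reaches the test station. The signal names, failure threshold, and values are all invented for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic "historical" EOL data: 200 units, 3 signals each
# (e.g., amplitudes in three frequency bands -- purely illustrative).
X_train = rng.normal(loc=1.0, scale=0.3, size=(200, 3))
# Assume units with high energy in the third band tended to fail.
y_train = (X_train[:, 2] > 1.3).astype(int)  # 1 = failed the test

# max_features=1.0: consider every signal at each split.
model = RandomForestClassifier(n_estimators=50, max_features=1.0, random_state=0)
model.fit(X_train, y_train)

# Classify a new unit in real time, before it reaches the test station.
new_unit = np.array([[1.0, 0.9, 2.0]])
prediction = model.predict(new_unit)[0]  # 1 -> predicted to fail
```

In practice the trained model would be serialized and served from a platform or container on the line, as described above; the training step itself is unchanged.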
Machine Vision in Manufacturing
One common application of supervised learning in manufacturing is machine vision. These systems use image processing to analyze parts automatically, predicting failures and preventing stoppages on the line. They can also be used in automated inspection and process control to ensure better quality output.
In production, image processing commonly involves tracking objects in two-dimensional space, such as components on an assembly line. As the technology continues to improve, however, 3D machine vision systems are increasingly able to coordinate the complex movement of components in three-dimensional space.
Similar to the testing example discussed above, machine vision systems can be trained to identify broken or damaged parts on the line and then classify new parts as either broken or normal. However, vision systems are inherently limited to visual differences. By the time a vision system identifies an error in a part, the only recourse is to rework or scrap it. Ideally, machine learning should eliminate problems before they happen, rather than simply replacing human inspectors.
Unsupervised Machine Learning Applications in Manufacturing
Supply Chain Optimization
Unsupervised learning is very useful for finding the underlying structure of a dataset. It can analyze the complex interactions between different signals, providing insights into subtle trends and identifying otherwise invisible issues with an assembly line or even a whole supply chain.
Currently, manufacturers are using AI and machine learning algorithms to identify the factors most likely to impact production volume, such as weather, consumer demand, or political considerations. These predictions are then used to optimize allocation of staff and inventory, among other resources, to ensure that operations run smoothly from end to end.
Predicting Remaining Useful Life (RUL)
Unsupervised machine learning can also be used to determine when machinery might fail, comparing signals such as temperature, pressure, and usage frequency (among others) to historical failures. While there are obviously many consumer applications for improving estimations of remaining useful life, it’s also useful for manufacturers, as it reduces the frequency of and time spent on maintenance.
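One common unsupervised approach to this kind of problem is anomaly detection: train on normal operating data, then flag readings that drift away from it. The sketch below uses scikit-learn's IsolationForest on invented condition-monitoring signals; it is an illustration of the technique, not a definitive RUL method:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)

# Synthetic signals from a healthy machine: columns might be
# temperature, pressure, and usage frequency (values illustrative).
healthy = rng.normal(loc=[60.0, 2.0, 5.0], scale=[2.0, 0.1, 0.5], size=(500, 3))

# Train on (mostly) healthy operating data; no failure labels needed.
detector = IsolationForest(contamination=0.01, random_state=1)
detector.fit(healthy)

# A reading far outside normal operating conditions scores as anomalous.
degraded = np.array([[75.0, 2.9, 9.0]])
flag = detector.predict(degraded)[0]  # -1 = anomaly, +1 = normal
```

A rising frequency of anomalous readings can then serve as an early warning that maintenance is due, reducing both unplanned downtime and unnecessary scheduled maintenance.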
Semi-Supervised Machine Learning Applications in Manufacturing
Although it’s less well known than supervised and unsupervised machine learning, semi-supervised learning can be a valuable tool for improving machine learning outputs.
In the context of manufacturing, supervised learning models are trained on data with failures that have been flagged as such (since the data is labelled), and unsupervised learning models are trained on data that may or may not contain failures (since the data is unlabelled).
But labelling data is a time-consuming, often impractical process, especially in manufacturing. Semi-supervised learning is therefore a good option when dealing with partially labelled data. Semi-supervised models are trained on a combination of labelled and unlabelled data, typically with much more of the latter than the former.
For example, a manufacturer could use semi-supervised machine learning algorithms to process a dataset from a production run in which some, but not all, units were tested. By comparing the labelled passing or failing units with the unlabelled (i.e., untested) units, a semi-supervised model can generate predictions of which untested units would be most likely to pass or fail the test, reducing the risk of quality escapes without comprehensive testing.
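A sketch of that workflow, assuming scikit-learn: unlabelled (untested) units are marked with -1, and a self-training wrapper pseudo-labels the high-confidence ones and refits. All of the data here is synthetic and illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

rng = np.random.default_rng(2)

# Synthetic production run: 300 units, 2 signals each (illustrative).
X = rng.normal(size=(300, 2))
true_fail = (X[:, 0] + X[:, 1] > 1.0).astype(int)

# Only the first 30 units went through the EOL test;
# scikit-learn marks unlabelled samples with -1.
y = np.full(300, -1)
y[:30] = true_fail[:30]

# Self-training: fit on the labelled units, then iteratively
# pseudo-label the high-confidence untested units and refit.
model = SelfTrainingClassifier(LogisticRegression())
model.fit(X, y)

# Predict pass/fail for every untested unit.
predictions = model.predict(X[30:])
```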
Introduction to Machine Learning Model Building
Although supervised, unsupervised, and semi-supervised learning are all viable machine learning approaches for manufacturing, there are several steps between knowing which learning style is best suited to solve a given problem, and actually generating a machine learning solution.
Since manufacturing data is generated by disparate sources, it needs to be cleaned, with all null, missing, or duplicate values removed. After that, the data can then be transformed so that it can be ingested and processed by a machine learning model.
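With pandas, for example, the cleaning step might look like the following sketch (the column names and values are invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical raw readings merged from two sources (names illustrative).
raw = pd.DataFrame({
    "unit_id": [101, 102, 102, 103, 104],
    "torque":  [12.1, 11.8, 11.8, np.nan, 12.4],
    "temp_c":  [40.2, 41.0, 41.0, 39.8, None],
})

# Drop exact duplicate rows, then rows with null or missing values.
clean = raw.drop_duplicates().dropna()
```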
Some transformations are mandatory. For example, since machine learning algorithms are only able to process numerical values, non-numerical features (like a string) need to be converted and represented numerically. Some linear models and neural networks also have a fixed number of input nodes, which means the size of all inputs must be the same.
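One common way to represent a string feature numerically is one-hot encoding, where each category becomes its own 0/1 column. A minimal pandas sketch, with a hypothetical `station` column:

```python
import pandas as pd

# A non-numeric feature: which station built the part (illustrative).
df = pd.DataFrame({
    "station": ["A", "B", "A", "C"],
    "torque":  [12.1, 11.8, 12.4, 12.0],
})

# One-hot encoding: the string feature becomes numeric indicator columns
# (station_A, station_B, station_C), which a model can ingest.
encoded = pd.get_dummies(df, columns=["station"])
```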
There are also optional transformations (referred to as “quality” transformations) which can be used to enhance the model’s overall performance, but are not strictly required.
Normalization is an example of a quality transformation, where features are transformed to a similar scale. There are several techniques that can be used for normalization, based on how the data is structured. Scaling to a range, for example, converts features from their original range to a standard range (usually 0 to 1) and is best suited to data sets with a fairly uniform distribution.
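Scaling to a range can be written directly from its definition; the temperature values below are illustrative:

```python
import numpy as np

def scale_to_range(x, new_min=0.0, new_max=1.0):
    """Min-max normalization: map values to [new_min, new_max]."""
    x = np.asarray(x, dtype=float)
    return new_min + (x - x.min()) * (new_max - new_min) / (x.max() - x.min())

# Example: spindle temperatures in deg C (illustrative values).
temps = [40.0, 45.0, 50.0, 60.0]
scaled = scale_to_range(temps)  # -> [0.0, 0.25, 0.5, 1.0]
```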
Data scientists can also transform the data by constructing new features based on existing input features, in a process called feature engineering. With the right set of skills and domain expertise, feature engineering can boost a machine learning model’s performance considerably.
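As a simple illustration, a physically meaningful feature can be constructed from two raw signals; the column names here are hypothetical:

```python
import math
import pandas as pd

# Raw signals from a machining operation (names and values illustrative).
df = pd.DataFrame({
    "spindle_torque_nm": [10.0, 12.0, 14.0],
    "spindle_speed_rpm": [3000, 3000, 2800],
})

# Engineered feature: mechanical power drawn by the spindle.
# P [W] = torque [N*m] * angular speed [rad/s]; rpm * 2*pi/60 -> rad/s
df["spindle_power_w"] = (
    df["spindle_torque_nm"] * df["spindle_speed_rpm"] * 2 * math.pi / 60
)
```

A feature like power can correlate with tool wear far more directly than torque or speed alone, which is where the domain expertise pays off.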
After the data is transformed, it can be divided into two sets: a training set and a test set. How the data is divided depends on the type of data set.
For example, when building a machine learning model for predictive maintenance problems, the time sequence is crucial and therefore the dataset must be divided carefully. In contrast, for classification problems, such as distinguishing defective and non-defective parts, the time sequence is irrelevant but there should be an equal distribution of defective and non-defective parts in the training and the test set.
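Both splitting strategies can be sketched in a few lines, assuming scikit-learn (the data is synthetic and the 10% defect rate is invented for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 4))
y = np.array([0] * 90 + [1] * 10)  # 10% defective (illustrative)

# Classification: stratify so training and test sets keep
# the same proportion of defective parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Time series (e.g., predictive maintenance): never shuffle --
# keep chronological order and hold out the most recent observations.
split = int(len(X) * 0.8)
X_tr_ts, X_te_ts = X[:split], X[split:]
```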
Once the data is prepared, it can be used for training and validation. Usually, manufacturing problems fall into one of two umbrella categories: classification (e.g., pass vs fail from an end-of-line test) or regression (e.g., predicting cutting tool wear over time).
Since each and every problem is unique, machine learning models must also be tuned and tweaked to yield the best performance possible. The performance of different models can be compared, ideally on an ongoing basis, and the one with the best fit selected.
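A minimal sketch of that comparison, assuming scikit-learn and using cross-validation accuracy as the yardstick (the candidate models and data are illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)  # synthetic pass/fail target (illustrative)

# Compare candidate models with 5-fold cross-validation
# and keep the one with the best fit.
candidates = {
    "logistic": LogisticRegression(),
    "forest": RandomForestClassifier(n_estimators=30, random_state=0),
}
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
```

In production this comparison would be rerun periodically as new data arrives, so the deployed model is always the current best fit.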
Machine Learning & MES
A frequent victim of RAS Syndrome, a manufacturing execution system (MES) monitors the process of transforming raw materials into finished products. The MES is where all product and production data is collected and aggregated. MES is often connected to an organization’s enterprise resource planning (ERP), in which case everything from procurement, to accounting, to maintenance—even scrap and rework rates—is all available in one place.
This is why the MES is valuable for machine learning.
Implementation of MES varies from plant to plant and even within a single facility. Older production lines might not use a manufacturing execution system at all, instead aggregating data manually. Relatively newer lines might cover the basics of MES and collect rudimentary data, such as downtime reports or scrap rates, while lines launched even more recently will collect data from everything.
When this integration is done well—meaning that the line utilizes a sophisticated data collection methodology—plant engineers can trace parts back through each operation and see the parameters from every critical process.
Data Traceability & Data Completeness
In terms of industrial data, the two biggest differences between a line with MES and one without (or one that hasn’t integrated it completely) are traceability and completeness.
Traceability, in this context, is what gives a unit on an assembly line its identity. If an error occurs partway through the assembly process, resulting in a significant proportion of the unit being scrapped, traceability is what decides whether or not it’s still the same unit.
Traceability is also essential for machine learning applications in manufacturing. If you want to use machine learning to determine how OP10 is affecting OP20, you need traceability to link the data coming from those two operations.
Data completeness can be understood in terms of granularity. Think of measuring the distance between two planar features on a part. In most cases, a line worker would use three probes for each surface, but only record one number for the distance. That may have been good enough in the past, but it’s not sufficient if you want to leverage machine learning to improve manufacturing quality.
Machine learning models need all three measurements in order to represent the part’s planar features as accurately as possible. Without them, the math will assume that both planes are flat even if they aren’t. Put simply: the more granular the data, the more opportunities for insight.
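To make this concrete, here is a sketch of what the three probe measurements can reveal that a single recorded distance cannot: the tilt of the surface, recoverable from the plane through the three points (all coordinates are invented for illustration):

```python
import numpy as np

# Three probe points on a surface (x, y, z in mm -- illustrative values).
# A single recorded "distance" would hide any tilt in the plane.
probes = np.array([
    [0.0,  0.0, 10.00],
    [50.0, 0.0, 10.20],   # surface rises 0.2 mm over 50 mm in x
    [0.0, 50.0, 10.00],
])

# Normal vector of the plane through the three probe points.
normal = np.cross(probes[1] - probes[0], probes[2] - probes[0])
normal = normal / np.linalg.norm(normal)

# Tilt of the surface relative to a perfectly flat (z-up) plane, in degrees.
tilt_deg = np.degrees(np.arccos(abs(normal[2])))
```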
It should be noted that while it’s possible to attain the levels of data traceability and completeness necessary for machine learning without a MES, those who do have a MES tend to be in a much better position to use machine learning compared to those who don’t.
On the other hand, having a MES in place doesn’t guarantee that a manufacturer has all the data necessary to utilize machine learning. Manufacturing execution systems don’t record time-based data: for a stamping operation, for example, they will store only the final or maximum force and the time of day when the operation took place, rather than recording force vs. time data throughout each operation. Machine learning models need the latter sort of information to deliver the most accurate results.
Hence, while they’re technically neither necessary nor sufficient for implementing machine learning in a manufacturing environment, manufacturing execution systems can make the deployment of machine learning tools much easier. In addition, the presence of a MES indicates a level of sophistication in a company’s data strategy which puts it that much closer to taking the next step with machine learning.
Benefits of Machine Learning in Manufacturing
Machine learning is not suitable for every manufacturer, but for those at the right stage of their digital transformation, its potential cannot be overstated.
At the most basic level, incorporating a machine learning platform into a facility’s or organization’s IT infrastructure gives engineers more insight into their production processes. Tasks that have traditionally taken hours of manual labor, such as aggregating line data to identify trends, can be automated and completed in minutes or less. In this case, machine learning isn’t competing with statistical process control (SPC) or other traditional manufacturing methodologies; it’s augmenting them.
A similar point applies for diagnosing product or production issues. Root cause analysis can involve dealing with hundreds or thousands of signals from the production line, depending on the scope of the issue and how much data is collected. Using automated signal pruning, machine learning can reduce the number of signals requiring manual investigation by more than 99%, cutting the time for root cause analysis from weeks to hours.
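One simple form of signal pruning ranks signals by a tree ensemble's feature importances and keeps only the top contributors for manual investigation. In the synthetic sketch below (assuming scikit-learn; all numbers are invented), exactly one of 200 signals actually drives the defect:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(5)

# 500 parts x 200 line signals; in this synthetic example,
# only signal 7 actually drives the defect.
X = rng.normal(size=(500, 200))
y = (X[:, 7] > 1.0).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Prune: keep only the signals that explain most of the model's decisions.
importance = model.feature_importances_
top_signals = np.argsort(importance)[::-1][:5]  # investigate these first
```

Instead of manually inspecting 200 signals, the engineer starts with a handful, which is where the order-of-magnitude reduction in root cause analysis time comes from.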
In addition to providing visibility and reducing the need for manual data analysis, machine learning can also benefit manufacturing by enhancing end-of-line testing, particularly for complex assemblies, such as engines or transmissions. By generating predictions of whether and how likely units are to pass or fail an end-of-line test, machine learning removes the need to test every unit at the end of the line. Instead, manufacturers can focus on the subset of units where confidence in the predictions falls below a specified threshold.
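A sketch of that thresholding logic, assuming scikit-learn (the data, model, and the 0.9 confidence cutoff are all illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(6)

# Train on historical tested units (synthetic, illustrative).
X_hist = rng.normal(size=(400, 3))
y_hist = (X_hist[:, 0] > 1.0).astype(int)  # 1 = failed the EOL test
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_hist, y_hist)

# New units: physically test only those the model is unsure about.
X_new = rng.normal(size=(50, 3))
proba = model.predict_proba(X_new).max(axis=1)  # confidence in the prediction
needs_testing = np.where(proba < 0.9)[0]        # send these to the test station
```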
Finally, by combining the insights gained from incorporating machine learning into a production line, manufacturers can not only solve production problems but actually avoid them altogether. By identifying the key decision points during assembly which contribute to product failures, a machine learning platform can recommend which parts to mate together in order to minimize the potential for quality escapes from completed assemblies. This approach has been shown to reduce the rework rate for axle assemblies by 65%.
Ultimately, how beneficial machine learning can be to manufacturing depends on the particularities of your application and the data being collected from it. With the right set of information and the right machine learning approach, manufacturers can reduce their scrap and rework rates, improve product quality, and increase production throughput.
Glossary of Terms
- Artificial Intelligence (AI): The implementation of human-like reasoning in computational systems
- Classification: A predictive modeling problem where a class label is predicted based on discrete input data
- Deep Learning: An approach to machine learning that uses artificial neural networks consisting of individual nodes connected and structured into input layers, hidden layers, and output layers
- Labelled Data: Data that includes tags or metadata which provide additional information about individual data points, e.g., PASS/FAIL
- Learning Style: A way to group machine learning algorithms according to their input data (i.e., labelled or unlabelled)
- Machine Learning: An approach to artificial intelligence that uses algorithms or statistical models to perform tasks without explicit instructions
- MES: Manufacturing Execution System, used for monitoring the production process
- Regression: A predictive modeling problem where the predicted output is continuous, or a real value
- Semi-Supervised Learning: A method of machine learning in which algorithms utilize partially labelled data
- Supervised Learning: A method of machine learning in which a model is trained on labelled data and presented with a desired output; typically used to solve classification or regression problems
- Test Set: A subset of data used to evaluate a trained machine learning model’s performance
- Training Set: A subset of data used to teach a machine learning model how to predict a target outcome
- Unlabelled Data: Data that lacks tags or explanations of what the individual data points represent
- Unsupervised Learning: A method of machine learning in which models utilize unlabelled data to group points or features; typically used to solve clustering or dimensionality reduction problems