Manufacturing data collection: today’s big challenge

Last updated on March 27th, 2024

In 2012, General Electric found that one of its factories could generate 5,000 data samples every 33 milliseconds. Just one product line could produce as much as 4 trillion data points per year. Manufacturing connectivity and industrial data collection methods have continued to advance, increasing the volume of data that can be collected and the number of sources from which manufacturers can collect information on their processes.
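As a rough sanity check, the two figures above are consistent with each other if you assume the quoted sampling rate runs around the clock for a full year. That assumption is ours, not the original report's, and the variable names below are purely illustrative.

```python
# Back-of-the-envelope check of the figures quoted above.
SAMPLES_PER_BURST = 5_000               # samples per burst (from the text)
BURST_INTERVAL_S = 0.033                # 33 milliseconds (from the text)
SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # assumption: nonstop operation

samples_per_second = SAMPLES_PER_BURST / BURST_INTERVAL_S
samples_per_year = samples_per_second * SECONDS_PER_YEAR

print(f"{samples_per_second:,.0f} samples per second")
print(f"{samples_per_year / 1e12:.1f} trillion samples per year")
```

Under those assumptions the total lands a little under 5 trillion samples per year, the same order of magnitude as the 4 trillion quoted for a single product line.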

And yet, much of this data sits idle or isolated in data silos, even as the industry continues to invest in data collection and other Industry 4.0 technologies. Manufacturers keep investing because, when treated properly, industrial data can provide a new level of insight into their operations.

To really make use of industrial data, simply collecting it is not enough. The data you collect needs to be usable, which means making it accessible after it has been collected and stored. More importantly, you need to understand how the data you’re collecting can be used to solve your problem(s). 

The majority of digital transformations fail at the proof-of-concept phase: fewer than one in three are successful. That is because realizing value from manufacturing data requires a clear understanding of the problem you’re trying to solve, a good idea of how the data you collect from the line relates to that problem, and a sense of how manufacturing analytics tools, such as machine learning, can solve it.

The growth of big data in manufacturing​

There are three main factors that have contributed to the exponential growth of manufacturing data over the past two decades:

1. Diminishing Sensor Costs​

Like the cost of transistors, the cost of sensors has been steadily dropping. According to the Microsoft 2019 Manufacturing Trends Report, the average cost of a sensor in 2018 was just $0.44, compared to $1.30 in 2004. The longer this trend continues, the more incentive manufacturers have to install sensors on legacy equipment, retrofitting it in anticipation of Industry 4.0. By collecting temperature, vibration, and other forms of time series data, manufacturers can use these previously unavailable measurements to improve operational efficiency.

2. Increasing Industrial Connectivity​

It was not so long ago that connectivity was a feature worth noting for a machine tool. These days, it’s practically a given that a new piece of industrial equipment will have industrial ethernet, fieldbus or wireless capabilities. According to a 2018 study, these three network technologies account for 85% of the global industrial market, with the remainder taken up by cloud technologies and open-source protocols. Moreover, that trend is accelerating rapidly, with Juniper Research projecting that the number of Industrial IoT connections globally will increase from 17.7 billion in 2020 to 36.8 billion by 2025, an overall growth rate of 107%. 
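To put the Juniper Research projection in perspective, here is a minimal sketch that works out the overall growth and the constant yearly rate it implies. The function names are our own, and the inputs are the rounded figures quoted above, so the result (about 108%) can differ slightly from the published 107%.

```python
def overall_growth(start: float, end: float) -> float:
    """Total growth over the whole period, as a fraction of the start value."""
    return (end - start) / start


def implied_annual_growth(start: float, end: float, years: int) -> float:
    """Constant yearly rate that would take `start` to `end` in `years` years."""
    return (end / start) ** (1 / years) - 1


connections_2020 = 17.7e9   # from the text
connections_2025 = 36.8e9   # from the text

total = overall_growth(connections_2020, connections_2025)
yearly = implied_annual_growth(connections_2020, connections_2025, years=5)

print(f"overall growth: {total:.1%}")
print(f"implied annual growth: {yearly:.1%}")
```

In other words, roughly doubling in five years corresponds to growth of somewhere around 15 to 16% per year, compounded.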

3. Advanced Manufacturing Analytics​

It’s all well and good to take advantage of less expensive sensors and more available connectivity to collect industrial data, but the cost of doing so still needs to be justified. That justification comes from analytics, which are increasingly delivered through the cloud. Cloud computing has been expanding rapidly and will continue to proliferate for the foreseeable future. Grand View Research estimates that the global cloud computing market will expand at a compound annual growth rate (CAGR) of 14.9% from 2020 to 2027.

This expansion is driven by the demand for advanced analytics in manufacturing, but it’s also driving that demand by making artificial intelligence and machine learning more available across industries. Deloitte has estimated that 70% of companies that adopt AI will obtain it via cloud-based enterprise software. This has led some industry experts to ask whether cloud and AI are becoming two sides of the same coin.

In any event, as advanced manufacturing analytics become more available and the potential applications for AI in manufacturing grow, the incentive to collect industrial data grows along with them.

How Do You Get The Most Value From Manufacturing Data?​

The road to digitalization in manufacturing may seem clear: increase production data collection through a combination of legacy equipment instrumentation and industrial connectivity, then leverage that data with advanced manufacturing analytics. Add to that the sheer number of manufacturing applications for AI in Industry 4.0, and one could easily conclude that the biggest challenge in generating value from manufacturing data is deciding where to start. Unfortunately, it’s been shown time and again that the majority of digital transformations fail at the proof-of-concept phase, with fewer than one in three succeeding. There are myriad explanations for this, from lack of employee engagement to inconsistent communication strategies. Ultimately, the key to realizing value from manufacturing data is having a clear understanding of the problem you’re trying to solve and of how tools such as machine learning can solve it. Used in the right way, machine learning can help manufacturers achieve new levels of product quality, but to get there, they first need to understand machine learning.

Challenges Accessing Stored Data​

The data collected from manufacturing facilities is diverse. Data formats, granularity, and collection intervals vary depending on the line, operation, and the specific sensor used. Although PLCs are commonly regarded as the primary source of industrial data, other manufacturing data sources–including business transactions, maintenance records, geospatial data, and RFID scans–can also provide insights into industrial operations. 
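To make the point about differing collection intervals concrete, here is a minimal sketch of lining up two sources that report at different rates. The readings, the field names and the "latest value at or before" rule are all hypothetical choices for illustration; real plants may need interpolation or windowed averages instead.

```python
from bisect import bisect_right

# Hypothetical readings as (seconds since shift start, value).
vibration = [(0.0, 0.12), (0.5, 0.15), (1.0, 0.11), (1.5, 0.31), (2.0, 0.29)]  # every 0.5 s
temperature = [(0.0, 71.2), (2.0, 71.9)]                                       # every 2 s


def latest_at_or_before(series, t):
    """Return the most recent value in `series` recorded at or before time t."""
    times = [ts for ts, _ in series]
    i = bisect_right(times, t)
    return series[i - 1][1] if i else None


# Attach the most recent temperature to every vibration reading.
aligned = [
    {"t": t, "vibration": v, "temperature": latest_at_or_before(temperature, t)}
    for t, v in vibration
]

for row in aligned:
    print(row)
```

Even in this toy case, the slower signal has to be repeated across several rows, which is one reason mixed-granularity data takes deliberate preparation before it can be analyzed.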

But when data is collected from different sources, it becomes increasingly difficult to access. Oftentimes, manufacturing data is “siloed”, or isolated by division, which cuts it off from the rest of the organization. In fact, estimates suggest that more than two-thirds of manufacturing data goes unused.

So, even though manufacturers have lots of data, only a fraction of it is in a position to be put to work. Raw data, which has no value on its own, is not being transformed into usable information. 

This represents an added cost for manufacturers, who have to store this information without a plan to generate a return on their investment. Such information is often referred to as “dark data”: data that is kept for compliance purposes but never analyzed or used to inform decision-making.

Hence, data management is crucial to maximizing the results of data analysis and minimizing the amount spent on storing dark data.

Why Manufacturers Collect Data​

If manufacturing data management is such a challenge, one might wonder if all the costs associated with instrumentation, data collection and industrial data storage are even worth it.

Yet despite the challenges, it’s become easier and easier to justify data collection over the past decade. The cost of a sensor in 2004 was almost three times the cost of a sensor in 2018. Today’s machine tools are already equipped with industrial ethernet, fieldbus or wireless capabilities. And the cloud-based enterprise solution market is expanding, enabling companies to adopt advanced data analytics and AI solutions without requiring sophisticated on-site infrastructure.

Plus, the cost of data collection is expected to continue to decline. Juniper Research, for example, projected that the number of Industrial IoT connections globally will increase from 17.7 billion in 2020 to 36.8 billion by 2025, an overall growth rate of 107%. Similarly, Grand View Research estimated that the global cloud computing market will expand at a compound annual growth rate (CAGR) of 14.9% from 2020 to 2027.

Figure: The cost of collecting data is decreasing

It makes digitalization in manufacturing sound deceptively simple: improve your data collection through a combination of legacy equipment instrumentation and industrial connectivity, then leverage that data with advanced manufacturing analytics. Given the variety of manufacturing applications for AI in Industry 4.0, it’s easy to assume that the challenge isn’t generating value from manufacturing data, but deciding which problem to address.

Yet, as mentioned earlier, most digitalization journeys fail. Why?

There are several explanations for this, from lack of employee engagement to inconsistent communication strategies. To improve their odds of success, manufacturers need to understand how machine learning can be leveraged, which brings us back to manufacturing data.

What Manufacturers Need to Know About Data​

There is a lot of information you could learn about your manufacturing data, but to launch a successful machine learning initiative, you don’t need to know everything. Before engaging with manufacturing data analytics, you should at least have a basic idea of:

  • The quantity and frequency of the data you’re generating
  • The format(s) in which your data is stored (CSV, JSON, HDF5, etc.)
  • The type of data you’re generating (labeled vs unlabeled, time-series vs single value, etc.)

Usually, “how much” and “how often” are easy questions to answer: the quantity of data is the total number of signals or individual points in a dataset, whereas frequency is how many data points you collect over a specific time period. Each point in a dataset can be thought of as an individual unit that’s described by its features. If you’re looking at the data from an NVH testing station, for example, amplitude would be a feature.
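As a minimal sketch, quantity and frequency can be read straight off a list of timestamps. The timestamps below are synthetic (one reading every 33 ms) and the `describe` helper is a hypothetical name, not part of any particular tool.

```python
def describe(timestamps_s):
    """Return (quantity, frequency in readings per second) for sorted timestamps."""
    quantity = len(timestamps_s)
    if quantity < 2:
        return quantity, None  # frequency is undefined for fewer than two points
    duration = timestamps_s[-1] - timestamps_s[0]
    # n readings span n - 1 intervals
    frequency_hz = (quantity - 1) / duration if duration > 0 else None
    return quantity, frequency_hz


# Synthetic example: one reading every 33 ms.
timestamps_s = [i * 0.033 for i in range(1001)]
quantity, frequency_hz = describe(timestamps_s)

print(f"{quantity} readings at about {frequency_hz:.1f} readings per second")
```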

That being said, data quantity is far less important than data quality. What matters is how dense the dataset is, meaning how much valuable information and how many relationships it contains; dense datasets also tend to be larger. If your dataset contains limited information, it can’t be used to solve your problem, no matter how much of it you have.

It’s a principle that applies whether you’re talking about data science or manufacturing: garbage in, garbage out.

Similarly, data needs to be stored in specific formats to be usable. Data can be kept in file formats such as CSV, JSON or HDF5, or in SQL databases, and stored in key-value stores, data warehouses, or data lakes paired with data catalogues.
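Format matters in practice because the same records behave differently depending on how they were stored. The sketch below loads two hypothetical end-of-line test records from CSV and from JSON using only Python's standard library; the field names and values are invented for illustration.

```python
import csv
import io
import json

# The same two hypothetical records in two formats.
csv_text = "part_id,amplitude,result\nA100,0.42,Pass\nA101,0.97,Fail\n"
json_text = (
    '[{"part_id": "A100", "amplitude": 0.42, "result": "Pass"},'
    ' {"part_id": "A101", "amplitude": 0.97, "result": "Fail"}]'
)


def from_csv(text):
    """Parse CSV text into a list of dicts, converting amplitude to a number."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["amplitude"] = float(row["amplitude"])  # CSV stores every value as text
        rows.append(dict(row))
    return rows


def from_json(text):
    """Parse JSON text; numbers arrive as numbers without extra conversion."""
    return json.loads(text)


print(from_csv(csv_text) == from_json(json_text))
```

The two loaders only agree because the CSV path converts `amplitude` explicitly. Without that step, the CSV values stay as strings, which is the kind of detail worth knowing before handing data to an analytics tool.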

From the perspective of machine learning, the most important distinction between data types is labeled or unlabeled. Labeled data has a tag or classification, whereas unlabeled data does not. Imagine parts coming off an end-of-line test: If the manufacturer tags the parts as “Pass” or “Fail”, their dataset is labeled. If they don’t have any indication of which parts passed and which parts failed, then the dataset is unlabeled.
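In code, the difference can be as small as one optional field. This is a minimal sketch using made-up part IDs and amplitudes; `EndOfLineResult` and `is_labeled` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EndOfLineResult:
    part_id: str
    amplitude: float              # a feature measured at the test station
    label: Optional[str] = None   # "Pass" or "Fail", or None if never tagged


# Same measurements, with and without the Pass/Fail tag.
labeled = [EndOfLineResult("A100", 0.42, "Pass"), EndOfLineResult("A101", 0.97, "Fail")]
unlabeled = [EndOfLineResult("A100", 0.42), EndOfLineResult("A101", 0.97)]


def is_labeled(dataset):
    """A dataset counts as labeled only if every record carries a label."""
    return all(record.label is not None for record in dataset)


print(is_labeled(labeled), is_labeled(unlabeled))
```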

Whether data is labeled or unlabeled matters when it comes to the types of algorithms used to analyze it. Machine learning problems are solved using a variety of algorithms, most often grouped by “learning style”. The learning style of any given algorithm is based on how that algorithm ingests data, and whether the data is labeled or unlabeled.

There are three main types of machine learning styles:

  • Supervised Learning, or algorithms that use labeled data
  • Unsupervised Learning, or algorithms that use unlabeled data, and 
  • Semi-Supervised Learning, or algorithms that use partially labeled data

Which machine learning style you use depends on what manufacturing data is available, and the problem you’re trying to solve. So, the machine learning style that best suits a manufacturer’s application sometimes differs from plant to plant, or from line to line. 
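The mapping from available labels to learning style can be sketched as a simple rule of thumb. This is a deliberate simplification: as noted above, the problem you’re trying to solve matters too, and the `learning_style` function is a hypothetical helper, not part of any library.

```python
from typing import Optional, Sequence


def learning_style(labels: Sequence[Optional[str]]) -> str:
    """Suggest a learning style from how many records carry a label."""
    if not labels:
        raise ValueError("no data to inspect")
    n_labeled = sum(label is not None for label in labels)
    if n_labeled == len(labels):
        return "supervised"        # every record is labeled
    if n_labeled == 0:
        return "unsupervised"      # no record is labeled
    return "semi-supervised"       # only some records are labeled


print(learning_style(["Pass", "Fail", "Pass"]))
print(learning_style([None, None, None]))
print(learning_style(["Pass", None, None]))
```

Running the same check on each line’s data is one quick way to see why the best-suited style can differ from plant to plant, or from line to line.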

Once you’ve collected data, what can you do with it? 
