Machine learning and data collection for manufacturing
Last updated on October 25th, 2022
In 2012, General Electric found that one of its factories could generate 5,000 data samples every 33 milliseconds. Just one product line could produce as much as 4 trillion data points per year. Manufacturing connectivity and industrial data collection methods have continued to advance, increasing the volume of data that can be collected and the number of sources from which manufacturers can collect information on their processes.
And yet, much of this data sits idle or isolated in data silos, even as the industry continues to invest in data collection and other Industry 4.0 technologies. The investment continues because, when treated properly, industrial data can provide manufacturers with a new level of insight into their operations.
If you really want to impact your line using industrial data, simply collecting information is not enough. The data you collect needs to be usable, which means making it accessible after it has been collected and stored. More importantly, you need to understand how the data you’re collecting can be used to solve your problem(s).
The majority of digital transformations fail at the proof-of-concept phase – fewer than one in three are successful. This is because the keys to realizing value from manufacturing data are a clear understanding of the problem you’re trying to solve, a good idea of how the data you collect from the line relates to that problem, and a sense of how manufacturing analytics tools, such as machine learning, can solve it.
Challenges Accessing Stored Data
The data collected from manufacturing facilities is diverse. Data formats, granularity, and collection intervals vary depending on the line, operation, and the specific sensor used. Although PLCs are commonly regarded as the primary source of industrial data, other manufacturing data sources – including business transactions, maintenance records, geospatial data, and RFID scans – can also provide insights into industrial operations.
But when data is collected from different sources, it becomes increasingly difficult to access. Oftentimes, manufacturing data is “siloed”, or isolated by division, which cuts it off from the rest of the organization. In fact, estimates suggest that more than two-thirds of manufacturing data goes unused.
So, even though manufacturers have lots of data, only a fraction of it is in a position to be put to work. Raw data, which has no value on its own, is not being transformed into usable information.
This represents an added cost for manufacturers, who have to store this information without a plan to generate a return on their investment. Such information is often referred to as “dark data”: data that is kept for compliance purposes but never analyzed or used to inform decision making.
Hence, data management is crucial to maximizing the results of data analysis and minimizing the amount spent on storing dark data.
Why Manufacturers Collect Data
If manufacturing data management is such a challenge, one might wonder if all the costs associated with instrumentation, data collection and industrial data storage are even worth it.
Yet despite the challenges, it’s become easier and easier to justify data collection over the past decade. The cost of a sensor in 2004 was almost three times the cost of a sensor now, and today’s machine tools are already equipped with industrial Ethernet, fieldbus or wireless capabilities. The cloud-based enterprise solution market is also expanding, enabling companies to adopt advanced data analytics and AI solutions without requiring sophisticated on-site infrastructure.
Plus, the cost of data collection is expected to continue to decline. Juniper Research, for example, projected that the number of Industrial IoT connections globally will increase from 17.7 billion in 2020 to 36.8 billion by 2025, an overall growth rate of 107%. Similarly, Grand View Research estimated that the global cloud computing market will expand at a compound annual growth rate (CAGR) of 14.9% from 2020 to 2027.
It makes digitalization in manufacturing sound deceptively simple: improve your data collection through a combination of legacy equipment instrumentation and industrial connectivity, then leverage that data with advanced manufacturing analytics. Given the variety of manufacturing applications for AI in Industry 4.0, it’s easy to assume that the challenge isn’t generating value from manufacturing data, but deciding which problem to address.
Yet, as mentioned earlier, most digitalization journeys fail. Why?
There are several explanations for this, from lack of employee engagement to inconsistent communication strategies. To ensure success, manufacturers need to understand how machine learning can be leveraged, which brings us back to manufacturing data.
What Manufacturers Need to Know About Data
There is a lot of information you could learn about your manufacturing data, but to launch a successful machine learning initiative, you don’t need to know everything. Before engaging with manufacturing data analytics, you should at least have a basic idea of:
- The quantity and frequency of the data you’re generating
- The format(s) in which your data is stored (CSV, JSON, HDF5, etc.)
- The type of data you’re generating (labeled vs unlabeled, time-series vs single value, etc.)
Usually, how much and how often are easy questions to answer: The quantity of data is the total number of signals or individual points in a data set, whereas frequency is how many data points you collect over a specific time period. Each point in a dataset can be thought of as an individual unit that’s described by its features. If you’re looking at the data from an NVH testing station, for example, amplitude would be a feature.
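As a rough illustration of the difference between quantity and frequency, the sketch below computes both for a handful of timestamped readings. The sample values, the `readings` list, and the function names are all hypothetical; real data would come from your own historian or data store.

```python
from datetime import datetime

# Hypothetical readings from a single sensor: (ISO timestamp, amplitude).
readings = [
    ("2022-10-25T08:00:00.000", 0.12),
    ("2022-10-25T08:00:00.500", 0.15),
    ("2022-10-25T08:00:01.000", 0.11),
    ("2022-10-25T08:00:01.500", 0.19),
    ("2022-10-25T08:00:02.000", 0.14),
]


def quantity(rows):
    """Total number of data points in the dataset."""
    return len(rows)


def frequency_hz(rows):
    """Average number of data points collected per second."""
    times = [datetime.fromisoformat(ts) for ts, _ in rows]
    span_seconds = (max(times) - min(times)).total_seconds()
    if span_seconds == 0:
        raise ValueError("need readings spanning a non-zero time window")
    # N points bound N - 1 sampling intervals.
    return (len(rows) - 1) / span_seconds


print("quantity:", quantity(readings))
print("frequency (Hz):", frequency_hz(readings))
```

Here amplitude is the single feature attached to each point; a real dataset would typically carry several features per point.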
That being said, data quantity is far less important than data quality. The denser the dataset – meaning it contains valuable information and relationships, and is generally larger – the better. If your dataset contains limited information then it can’t be used to solve your problem, no matter how much you have.
It’s a principle that applies whether you’re talking about data science or manufacturing: garbage in, garbage out.
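One simple way to get a first impression of data quality is to look for features that cannot carry information, such as columns that never change or that are mostly empty. The sketch below is only an illustration of that idea: the `dataset` contents and the `quality_report` helper are hypothetical, and a real quality assessment would look at far more than these two checks.

```python
from statistics import pvariance

# Hypothetical dataset: each key is a feature, each list holds its values.
dataset = {
    "amplitude": [0.12, 0.47, 0.15, 0.52],
    "line_id": [3, 3, 3, 3],                  # constant on every row
    "temperature": [21.0, None, 21.4, None],  # half the values are missing
}


def quality_report(data):
    """Flag features that are constant or that have missing values."""
    report = {}
    for name, values in data.items():
        present = [v for v in values if v is not None]
        report[name] = {
            "missing_fraction": 1 - len(present) / len(values),
            # A feature with zero variance cannot distinguish one part from another.
            "is_constant": len(present) < 2 or pvariance(present) == 0,
        }
    return report


for feature, stats in quality_report(dataset).items():
    print(feature, stats)
```

A constant feature like `line_id` adds rows and columns to the dataset without adding anything a model could learn from, which is the sense in which more data is not automatically better data.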
Similarly, data needs to be stored in specific formats to be usable. Common file formats include CSV, JSON, and HDF5. Data can also be kept in SQL databases, key-value stores, data warehouses, or data lakes paired with data catalogues.
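To show why format matters in practice, here is a minimal sketch that loads the same two records from CSV and from JSON using only the Python standard library. The field names (`part_id`, `amplitude`, `result`) and the values are made up for illustration.

```python
import csv
import io
import json

# Hypothetical end-of-line test records; field names are illustrative only.
csv_text = "part_id,amplitude,result\nA1,0.12,Pass\nA2,0.47,Fail\n"
json_text = (
    '[{"part_id": "A1", "amplitude": 0.12, "result": "Pass"},'
    ' {"part_id": "A2", "amplitude": 0.47, "result": "Fail"}]'
)


def load_csv(text):
    """Parse CSV rows. Every CSV value arrives as a string, so cast numbers."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        record = dict(row)
        record["amplitude"] = float(record["amplitude"])
        rows.append(record)
    return rows


def load_json(text):
    """Parse a JSON array of objects. JSON keeps numbers as numbers."""
    return json.loads(text)


# Once parsed, both formats yield the same records.
print(load_csv(csv_text) == load_json(json_text))
```

The CSV loader needs an explicit cast because CSV has no notion of data types, while JSON carries them in the file. Differences like this are part of what you need to know about your formats before analysis starts.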
From the perspective of machine learning, the most important distinction between data types is labeled or unlabeled. Labeled data has a tag or classification, whereas unlabeled data does not. Imagine parts coming off an end-of-line test: If the manufacturer tags the parts as “Pass” or “Fail”, their dataset is labeled. If they don’t have any indication of which parts passed and which parts failed, then the dataset is unlabeled.
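The end-of-line example can be sketched in a few lines of Python. The records, the `result` field name, and both helper functions are hypothetical; they only illustrate what "labeled" means at the level of individual records.

```python
# Hypothetical end-of-line test records. "result" plays the role of the label.
labeled = [
    {"part_id": "A1", "amplitude": 0.12, "result": "Pass"},
    {"part_id": "A2", "amplitude": 0.47, "result": "Fail"},
]
unlabeled = [
    {"part_id": "B1", "amplitude": 0.15},
    {"part_id": "B2", "amplitude": 0.52},
]


def split_features_and_labels(records, label_key="result"):
    """Separate each record into its features and its label (None if absent)."""
    features = [{k: v for k, v in r.items() if k != label_key} for r in records]
    labels = [r.get(label_key) for r in records]
    return features, labels


def is_labeled(records, label_key="result"):
    """Treat a dataset as labeled only when every record carries a label."""
    return all(r.get(label_key) is not None for r in records)


print(is_labeled(labeled))
print(is_labeled(unlabeled))
```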
Whether data is labeled or unlabeled matters when it comes to the types of algorithms, or learning styles, used to analyze the data. Machine learning problems are solved using a variety of algorithms, most often grouped by “learning style”. The learning style of any given algorithm is based on how that algorithm ingests data, and whether the data is labeled or unlabeled.
There are three main types of machine learning styles:
- Supervised Learning, or algorithms that use labeled data,
- Unsupervised Learning, or algorithms that use unlabeled data, and
- Semi-Supervised Learning, or algorithms that use partially labeled data
Which machine learning style you use depends on what manufacturing data is available, and the problem you’re trying to solve. So, the machine learning style that best suits a manufacturer’s application sometimes differs from plant to plant, or from line to line.
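As a simplified sketch of how label availability maps onto the three learning styles, the hypothetical helper below inspects a list of labels in which `None` marks a missing label. It deliberately ignores the second factor mentioned above, the problem you’re trying to solve, so treat it as a starting point rather than a decision rule.

```python
def choose_learning_style(labels):
    """Map label availability to a learning style. None marks a missing label."""
    if not labels:
        raise ValueError("dataset is empty")
    labeled_count = sum(label is not None for label in labels)
    if labeled_count == len(labels):
        return "supervised"       # every data point is labeled
    if labeled_count == 0:
        return "unsupervised"     # no data point is labeled
    return "semi-supervised"      # only some data points are labeled


# Hypothetical label columns from three different lines.
print(choose_learning_style(["Pass", "Fail", "Pass"]))
print(choose_learning_style([None, None, None]))
print(choose_learning_style(["Pass", None, "Fail", None]))
```

Because the answer depends only on the labels each line happens to record, two lines in the same plant can land on different learning styles.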
Once you’ve collected data, what can you do with it?