There is a lot of information you could learn about your manufacturing data, but to launch a successful machine learning initiative, you don’t need to know everything. Before engaging with manufacturing data analytics, you should at least have a basic idea of:
- The quantity and frequency of the data you’re generating
- The format(s) in which your data is stored (CSV, JSON, HDF5, etc.)
- The type of data you’re generating (labeled vs unlabeled, time-series vs single value, etc.)
How much and how often are usually easy questions to answer. Quantity is the total number of signals or individual data points in a dataset, whereas frequency is how many data points you collect over a specific time period. Each point in a dataset can be thought of as an individual unit that’s described by its features. If you’re looking at the data from an NVH (noise, vibration, and harshness) testing station, for example, amplitude would be a feature.
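As a minimal sketch of the two measures, the snippet below counts the points in a small set of time-stamped readings and divides by the elapsed time. The `readings` values and the `quantity_and_frequency` helper are hypothetical, made up for illustration; real station data will have its own columns and units.

```python
from datetime import datetime

# Hypothetical readings from a test station: (ISO timestamp, amplitude).
readings = [
    ("2024-01-01T08:00:00", 0.42),
    ("2024-01-01T08:00:10", 0.45),
    ("2024-01-01T08:00:20", 0.40),
    ("2024-01-01T08:00:30", 0.47),
]

def quantity_and_frequency(rows):
    """Return (number of points, points per second) for time-stamped rows."""
    times = sorted(datetime.fromisoformat(ts) for ts, _ in rows)
    quantity = len(times)
    if quantity < 2:
        return quantity, None  # frequency is undefined for fewer than two points
    span_seconds = (times[-1] - times[0]).total_seconds()
    if span_seconds == 0:
        return quantity, None  # all points share one timestamp
    # Number of intervals between points, divided by the elapsed time.
    return quantity, (quantity - 1) / span_seconds

quantity, frequency = quantity_and_frequency(readings)
print(quantity, frequency)
```

Here amplitude is the single feature attached to each point; a richer dataset would carry several features per reading.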
That being said, data quantity is far less important than data quality. The denser the dataset – meaning it contains valuable information and relationships, and is generally larger – the better. If your dataset contains limited information, then it can’t be used to solve your problem, no matter how much of it you have.
It’s a principle that applies whether you’re talking about data science or manufacturing: garbage in, garbage out.
Similarly, data needs to be stored in specific formats to be usable. Data can be saved in file formats such as CSV, JSON, and HDF5, or held in SQL databases, and it can be stored in key-value stores, data warehouses, or data lakes paired with data catalogues.
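To illustrate why the format matters, here is a small sketch that reads the same two measurements from CSV and from JSON using only the Python standard library. The sample text and the `load_csv` and `load_json` helpers are assumptions made for this example. HDF5 is left out because it needs a third-party library.

```python
import csv
import io
import json

# Hypothetical sample: the same two measurements in CSV and in JSON.
csv_text = "serial,amplitude\nA1,0.42\nA2,0.91\n"
json_text = '[{"serial": "A1", "amplitude": 0.42}, {"serial": "A2", "amplitude": 0.91}]'

def load_csv(text):
    """Parse CSV text into a list of records."""
    reader = csv.DictReader(io.StringIO(text))
    # CSV has no types: every value arrives as a string,
    # so numeric columns need explicit conversion.
    return [
        {"serial": row["serial"], "amplitude": float(row["amplitude"])}
        for row in reader
    ]

def load_json(text):
    """Parse JSON text into a list of records; numbers keep their type."""
    return json.loads(text)

csv_records = load_csv(csv_text)
json_records = load_json(json_text)
print(csv_records == json_records)
```

Once both sources are converted to the same record structure, the rest of an analysis does not need to know which format the data came from.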
From the perspective of machine learning, the most important distinction between data types is whether the data is labeled or unlabeled. Labeled data has a tag or classification, whereas unlabeled data does not. Imagine parts coming off an end-of-line test: If the manufacturer tags the parts as “Pass” or “Fail”, their dataset is labeled. If they don’t have any indication of which parts passed and which parts failed, then the dataset is unlabeled.
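A quick way to see where a dataset stands is to measure how many records actually carry a label. The sketch below assumes each part is a dictionary with an optional `result` field; that field name, the sample `parts`, and the `label_coverage` helper are all hypothetical.

```python
def label_coverage(records, label_key="result"):
    """Return the fraction of records that carry a non-empty label."""
    if not records:
        return 0.0
    labeled = sum(1 for r in records if r.get(label_key) not in (None, ""))
    return labeled / len(records)

# Hypothetical end-of-line test records; two of the four parts were tagged.
parts = [
    {"serial": "A1", "amplitude": 0.42, "result": "Pass"},
    {"serial": "A2", "amplitude": 0.91, "result": "Fail"},
    {"serial": "A3", "amplitude": 0.44, "result": None},
    {"serial": "A4", "amplitude": 0.40},
]

coverage = label_coverage(parts)
print(coverage)
```

A coverage of 1.0 means the dataset is fully labeled, 0.0 means it is unlabeled, and anything in between means it is only partially labeled.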
Whether data is labeled or unlabeled determines which types of algorithms can be used to analyze it. Machine learning algorithms are most often grouped by “learning style”, which reflects how an algorithm ingests data and whether that data is labeled or unlabeled.
There are three main machine learning styles:
- Supervised Learning, or algorithms that use labeled data,
- Unsupervised Learning, or algorithms that use unlabeled data, and
- Semi-Supervised Learning, or algorithms that use partially labeled data
Which machine learning style you use depends on what manufacturing data is available, and the problem you’re trying to solve. So, the machine learning style that best suits a manufacturer’s application sometimes differs from plant to plant, or from line to line.
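The mapping from label availability to learning style can be sketched in a few lines. This is an illustrative simplification: the `suggest_learning_style` helper is hypothetical, it treats `None` as “no label”, and it looks only at the labels, not at the problem you’re trying to solve, which matters just as much.

```python
def suggest_learning_style(labels):
    """Suggest a learning style from a list of labels, where None means unlabeled."""
    total = len(labels)
    if total == 0:
        raise ValueError("no data points to inspect")
    labeled = sum(1 for label in labels if label is not None)
    if labeled == total:
        return "supervised"       # every point is labeled
    if labeled == 0:
        return "unsupervised"     # no point is labeled
    return "semi-supervised"      # only some points are labeled

# Hypothetical label columns from three different lines.
line_a = ["Pass", "Fail", "Pass"]
line_b = [None, None, None]
line_c = ["Pass", None, "Fail", None]

for name, labels in [("A", line_a), ("B", line_b), ("C", line_c)]:
    print(name, suggest_learning_style(labels))
```

The three lines end up with three different suggestions, which is one reason the best-suited style can differ from line to line within the same plant.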
Once you’ve collected data, what can you do with it?
How do you build a machine learning model, and what can it tell you about your process?
Check out The Manufacturing Guide to Machine Learning to learn more!