What makes manufacturing data different?
Last updated on August 26th, 2022
You’ve probably heard the phrase “Data is the New Oil” bandied about, though the recent consensus seems to be running against that notion. Captivating as it is, the metaphor breaks down under scrutiny: for one thing, oil is limited while data is cumulative. Still, issues with this particular analogy aside, the underlying idea is easy enough to grasp: data is a valuable commodity, and its value is increasing.
The information economy is being “fueled” by data just as the industrial economy has been fueled by oil, but in both cases (to salvage the analogy somewhat) that fuel needs to be refined in order to be useful. Exactly how it’s refined and what the final product looks like depends on the data itself as well as the use case to which it’s being applied. To appreciate this, one need only contrast the data from two very different sources. Drawing on my own personal experience, let’s compare the data from finance and automotive manufacturing.
Similarities in Manufacturing & Financial Data
Although their sources and applications differ radically, there are still significant similarities between the data in finance and in manufacturing. First and foremost, both sectors work extensively with time-series, i.e., data points indexed in chronological order. It should be noted, however, that manufacturing data also consists of single-value observations—such as tolerances—which can be related to time series data but are importantly different.
The other major similarity between finance and manufacturing, perhaps surprisingly, is data volume. On the surface, it seems obvious that the financial sector produces much more time-series data, even on a per-stock basis: at any given timestamp, there are many different bids and asks depending on what people are willing to buy or sell a stock for. Manufacturing data generally involves much less information being recorded, with one important exception that’s crucial in the automotive industry: end-of-line (EOL) testing.
The CAN bus in a vehicle on the road normally reports data at a rate of either 100 Hz or 10 Hz, with roughly 1,000 signals being recorded. With financial data, reporting rarely goes beyond milliseconds, even in scenarios involving high-frequency trading, and the order books normally have 10 bid/ask levels. Hence, a vehicle on the road generates roughly 100 (or 10) timestamps per second × 1,000 signals, while a single stock or futures contract generates roughly 1,000 timestamps per second × 10 levels × 2 (bid and ask) features.
Very similar.
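To put rough numbers on that comparison, here’s a quick back-of-the-envelope calculation in Python. The sampling rates, signal counts, and order-book depth are the ballpark figures quoted above, not measurements from any particular vehicle or exchange.

```python
# Rough data-volume comparison between a vehicle on the road and a single
# stock's order book, using the ballpark figures quoted above.

# Vehicle: CAN bus sampled at 100 Hz (or 10 Hz), with ~1,000 signals recorded.
vehicle_values_per_second_fast = 100 * 1_000   # 100,000 values/s at 100 Hz
vehicle_values_per_second_slow = 10 * 1_000    # 10,000 values/s at 10 Hz

# Stock: ~1,000 timestamps per second, 10 order-book levels, bid and ask each.
stock_values_per_second = 1_000 * 10 * 2       # 20,000 values/s

print(f"Vehicle  : {vehicle_values_per_second_slow:,} to {vehicle_values_per_second_fast:,} values/s")
print(f"One stock: {stock_values_per_second:,} values/s")
```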
Manufacturing vs Finance - Data Stationarity
Perhaps the biggest difference between the character of automotive and financial data is stationarity, i.e., the extent to which the statistical properties of a time series—mean, variances, etc.—remain constant over time.
Strictly speaking, if you look at a car or engine test in manufacturing, the time series is not stationary. A typical EOL engine test, for example, consists of several blocks: a ramp-up, a ramp-down, and stretches held at a fixed level of, say, RPM. So, when you look at all the data for an EOL test, it’s not stationary. But because it’s a physical process and you know what stage you’re in (i.e., ramping up, ramping down, or holding constant), it is stationary within each block, because the test is predetermined.
In contrast, it’s very hard to predict when the market will change from bull to bear. Shifts in sentiment, major news, or an unexpected tweet can completely overturn standard dynamics. Everyone can agree, based on historical data, when the ideal time to buy or sell a stock was, but it’s extremely difficult to predict that in advance.
So, the data from EOL tests isn’t stationary, but it can be split into chunks that are, if you have knowledge of the mechanical testing process. The same goes for cars on the road: they accelerate, decelerate and run at different constant speeds, so you can take the same approach of chunking the data in order to analyze it.
This is why domain knowledge matters: understanding how to break down an EOL test significantly simplifies your analysis. The sketch below illustrates the idea.
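Here is a minimal sketch in Python of chunking an EOL-style RPM trace by its (known) test schedule and computing statistics per block. The schedule, timings, and signal are entirely made up for illustration; a real test plan would come from the manufacturer.

```python
import numpy as np
import pandas as pd

# Hypothetical EOL engine test: the schedule is known in advance, so the RPM
# trace can be split into blocks that are approximately stationary.
# Block names and boundaries here are illustrative, not from a real test plan.
test_schedule = [
    ("ramp_up",   0.0, 10.0),   # (block name, start time in s, end time in s)
    ("hold_3000", 10.0, 40.0),
    ("ramp_down", 40.0, 50.0),
]

# Fake 100 Hz RPM signal shaped to match the schedule above.
t = np.arange(0, 50, 0.01)
rpm = np.piecewise(
    t,
    [t < 10, (t >= 10) & (t < 40), t >= 40],
    [lambda x: 300 * x, 3000, lambda x: 3000 - 300 * (x - 40)],
)
df = pd.DataFrame({"time_s": t, "rpm": rpm})

# Analyze each block separately: within a block the statistics are stable.
for name, start, end in test_schedule:
    block = df[(df["time_s"] >= start) & (df["time_s"] < end)]
    print(f"{name:10s} mean={block['rpm'].mean():8.1f} std={block['rpm'].std():7.1f}")
```

The same chunk-then-analyze approach applies to on-road data, with the difference that the driving phases have to be detected from the signals rather than read off a predetermined schedule.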
Manufacturing vs Finance - Data Organization
It may come as a shock to hear that stock market data is much more organized than manufacturing data, but it’s true. A moment’s reflection reveals why: there are so many resources built to track quotes that it’s relatively easy to obtain a database from the financial sector that’s both fairly clean and granular.
The simple fact is that, with minimal effort, you can download 20 years of clean, thorough (i.e., down to the millisecond level) data for almost 2,000 stocks and futures, plus 80 years of daily trades.
Compare that with manufacturing, where a lot of companies are just getting into the idea of Industry 4.0, and it’s understandable why there are a lot more issues with data recording. In manufacturing, having even three years of (acceptable quality) historical data is significant.
Think of it this way: whether you download data from the New York Stock Exchange or the London Stock Exchange, it’s going to be in the same format. You can analyze the data or run the same trading models on both datasets without having to make any adjustments. Compare that with two manufacturers—even when they’re making the same parts—and their processes, what they’re recording, how they’re recording it, and the overall quality of their data can all be different.
That’s why Acerta builds models specifically for each client.
Manufacturing vs Finance - Data Labelling, Metrics & Optimization
Three other differences between manufacturing and financial data are worth touching upon because they’re related.
The first is labelling: most manufacturing data is labelled (e.g., parts identified as Passed/Failed/Warranty), whereas financial data is not, because the stock market’s inherent volatility shifts what counts as “okay.” That’s why traders set their own risk/reward tolerances: what’s “good” in stock market data is relative.
This naturally leads to a second difference: metrics.
Since most manufacturing data is labelled, the success metrics for machine learning tend to be concentrated around confusion matrices, in which predicted and actual values (e.g., Passed and Failed) are compared to determine an algorithm’s performance. The goal is to generate a model that’s as accurate as possible, in terms of minimizing false positives and false negatives.
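For illustration, here’s a minimal confusion matrix example with scikit-learn, using made-up pass/fail labels (1 = failed part, 0 = passed part). It’s only a sketch of the kind of evaluation described above, not Acerta’s actual pipeline.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Made-up end-of-line outcomes: 1 = failed part, 0 = passed part.
y_true = [0, 0, 0, 1, 0, 1, 0, 0, 1, 0]   # what actually happened at the test stand
y_pred = [0, 0, 1, 1, 0, 0, 0, 0, 1, 0]   # what the model predicted beforehand

# Rows are actual classes, columns are predicted classes.
print(confusion_matrix(y_true, y_pred))
# [[6 1]   <- 6 true passes, 1 false alarm (false positive)
#  [1 2]]  <- 1 missed failure (false negative), 2 caught failures

print("precision:", precision_score(y_true, y_pred))  # of predicted failures, how many were real
print("recall   :", recall_score(y_true, y_pred))     # of real failures, how many were caught
```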
In contrast, since financial data is unlabelled, the success metrics are framed in terms of risk/reward ratios, such as the Sharpe ratio or the Sortino ratio. The goal in this case is to achieve the maximum return at the lowest volatility.
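And a comparable sketch for the financial side: computing annualized Sharpe and Sortino ratios from simulated daily returns with NumPy. The return series is random, the risk-free rate is assumed to be zero, and the Sortino denominator uses one common convention (the standard deviation of negative excess returns).

```python
import numpy as np

# Simulated daily returns for a hypothetical strategy; annualize with ~252 trading days.
rng = np.random.default_rng(0)
daily_returns = rng.normal(loc=0.0005, scale=0.01, size=252)
risk_free_daily = 0.0  # assumed zero for simplicity

excess = daily_returns - risk_free_daily

# Sharpe: mean excess return over total volatility.
sharpe = np.sqrt(252) * excess.mean() / excess.std()

# Sortino: same numerator, but only downside volatility in the denominator.
downside = excess[excess < 0]
sortino = np.sqrt(252) * excess.mean() / downside.std()

print(f"Sharpe  : {sharpe:.2f}")
print(f"Sortino : {sortino:.2f}")
```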
This brings us to the third difference between manufacturing and financial data: optimization.
The disparities in data labelling and metrics between the two sectors necessitate different types of models. While both machine learning applications require models to be frequently retrained, the reasons for retraining differ. For financial data, the absence of stationarity leads to models being reoptimized over short horizons. In the case of manufacturing data, models need to be retrained in response to changes in the assembly process. That’s why Acerta’s LinePulse platform incorporates an auto-retraining framework to update models automatically.
There’s a common misperception, especially in manufacturing, that machine learning models are static: a client wants to reduce EOL test failures, so Acerta deploys a model that predicts those failures before the assemblies are actually tested. If that were the end of the story, we’d be doing our clients a disservice; anyone who offers to sell you a single machine learning model to solve your problem is fooling you.
That’s because there is no single perfect model for manufacturing, just as there is no single algorithm for the stock market. In both cases, the models need to be retrained. For the stock market, one of the common approaches to getting around the lack of stationarity is to deploy short-term models and retrain them as market conditions change. For manufacturing, assembly conditions change—new tooling comes in, seasons change, a third shift is added, etc.—and as a result, the measurements will drift away from where they started.
That’s why we prefer to think of our machine learning models from the perspective of continuous improvement rather than optimization. We’re trying to infer line worker behavior from data alone, and our experience helps us understand what they’re doing.
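To make the retraining trigger concrete, here is a deliberately simple, hypothetical drift check in Python: it flags when the recent mean of a measurement moves too far from the baseline the model was trained on. This is a generic sketch of the idea, not how LinePulse’s auto-retraining framework actually works; production systems typically use more robust tests (population stability index, Kolmogorov-Smirnov, etc.).

```python
import numpy as np

def drift_detected(baseline: np.ndarray, recent: np.ndarray, z_threshold: float = 3.0) -> bool:
    """Flag drift when the recent mean of a measurement sits more than
    z_threshold standard errors away from the baseline mean.

    A deliberately simple heuristic for illustration only."""
    baseline_mean = baseline.mean()
    std_err = baseline.std(ddof=1) / np.sqrt(len(recent))
    z = abs(recent.mean() - baseline_mean) / std_err
    return z > z_threshold

# Hypothetical usage: torque measurements shift after new tooling is installed.
rng = np.random.default_rng(1)
baseline = rng.normal(50.0, 2.0, size=5_000)  # measurements the model was trained on
recent = rng.normal(51.5, 2.0, size=200)      # measurements from the current shift

if drift_detected(baseline, recent):
    print("Measurement drift detected -> schedule model retraining")
```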
Automotive Industry 4.0
If I can leave any manufacturers who might be reading with one piece of advice, it would be this:
Don’t wait!
Jump into Industry 4.0 as fast as you can, because the sooner you do, the sooner you’ll be able to ensure that your data is properly stored and organized, and the faster you’ll be able to derive insights to improve your business.