Building a data science strategy

Last updated on August 26th, 2022

The auto industry is generating more data than ever before, from the moment the first pieces of a vehicle are cut or stamped to the day that vehicle rolls off the assembly line and beyond. And yet, it’s been estimated that as much as 70% of data generated during production goes unused, suggesting that automakers are sitting on a largely untapped resource.

Data science is frequently cited as one of the best ways for manufacturers to leverage their data to gain insights into production and thereby improve KPIs, but it’s not a panacea. If a company isn’t collecting the right data or storing it properly then its chances of achieving anything meaningful through data science will be slim to none.

Acerta recently hosted a webinar on this topic, Building an Industrial Data Science Strategy, featuring our CTO, Jean-Christophe Petkovich, and Conway MacKenzie Managing Director Matt Townsend.

What follows are some highlights from their discussion.

What Manufacturing Data Should You Be Collecting?


“You should endeavour to choose the set of information that you feel best represents your parts or your products,” Petkovich began. “It’s difficult to come up with a selection of one-size-fits-all metrics, but there are some universal needs that apply in most cases. Collecting performance metrics about your plants is an obvious place to start, but you also need to link those metrics back to what influences them on the line. Ultimately, you want to know everything about the units that pass as well as those that fail or get re-worked, not just one or the other.”

“Finally, and this is something that we’ve seen a lot of manufacturers ignore,” Petkovich continued, “you need traceability information about your data, meaning how a given piece of information is connected to all the other pieces you have. That’s ultimately what lets you weave all the information together into a tapestry or story about your product.”
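In practice, traceability often comes down to a shared key, such as a part serial number, that lets you join each unit's process history to its final outcome. The sketch below illustrates the idea in plain Python; all field names and records are hypothetical, not taken from any real plant system.

```python
# Illustrative sketch: linking process measurements to final test outcomes
# via a shared serial number, so each unit's full "story" can be rebuilt.
# All field names and values here are made up for demonstration.

process_data = [
    {"serial": "A100", "station": "press", "force_kN": 312.5},
    {"serial": "A100", "station": "weld",  "current_A": 148.0},
    {"serial": "A101", "station": "press", "force_kN": 298.1},
    {"serial": "A101", "station": "weld",  "current_A": 152.3},
]

test_results = [
    {"serial": "A100", "result": "pass"},
    {"serial": "A101", "result": "fail"},
]

def trace(serial, process_data, test_results):
    """Assemble the full production history for one unit."""
    steps = [r for r in process_data if r["serial"] == serial]
    outcome = next(r["result"] for r in test_results if r["serial"] == serial)
    return {"serial": serial, "steps": steps, "result": outcome}

# Reconstruct everything known about the failed unit A101.
history = trace("A101", process_data, test_results)
```

Without that linking key recorded consistently across stations, the join above is impossible, which is exactly the gap Petkovich describes.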

Townsend agreed, adding that, “What most people fail to do is reach out with what they don’t know. When you have a certain set of criteria for passing or failing, nine times out of ten, the process engineer will say, ‘If I move this widget, I can change that outcome,’ but you need to recognize that there are other things you don’t know about that have influence on top of that.”

“So,” Townsend concluded, “my suggestion is to grab environmental data, along with process data, through sensors. Grab as much data as you possibly can from whatever you think might influence the outcome.”

Formulating a Data Storage Strategy

“This links back to some of what we’ve talked about already,” Petkovich noted. “What objective do you have in mind? That will determine your data needs. For example, how much data you need to collect and what the processing requirements are to accomplish your goal will determine whether you should use a cloud-based solution, an edge-based solution, or a hybrid between the two.”

“If you want to produce a result in near real time,” Petkovich continued, “you might need more edge compute so that you can respond as quickly as possible. If you’re looking at more global metrics, you need a system linking all of your data together in a centralized location, so you’re basically collecting the data locally and then shipping it.”

“On the same note,” Townsend commented, “you’re going to have data coming from multiple sources: sensor data and environmental data as well as process data. Then there’s training: many people don’t realize that machine learning models need to be retrained continually.”

“We [at Conway MacKenzie] believe that the hybrid method you mentioned is best,” Townsend continued, “because you have the ability to take all of the different nodes that are producing process data, move them up into a central location that allows you to do training, and then have the engine running in real time on-prem. You need to ensure that you’re training the model correctly and then bring that trained model back down to actually do the work.”
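The train-centrally, infer-at-the-edge pattern Townsend describes can be sketched as follows. This is a toy stand-in for a real ML pipeline, assuming a simple threshold "model" learned from historical readings; the numbers and names are illustrative only.

```python
import json
import statistics

# --- Central side: "train" on aggregated historical data -------------------
# Toy stand-in for model training: learn a pass/fail threshold from
# historical sensor readings of known-good parts (mean + 3 standard
# deviations). Readings are illustrative.
good_readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2]
model = {
    "threshold": statistics.mean(good_readings)
                 + 3 * statistics.stdev(good_readings),
}

# Serialize the trained artifact; in a real deployment this would be
# shipped down to the edge device.
artifact = json.dumps(model)

# --- Edge side: load the trained artifact and score in real time -----------
edge_model = json.loads(artifact)

def score(reading, model):
    """Flag a reading that exceeds the trained threshold."""
    return "fail" if reading > model["threshold"] else "pass"
```

The key property of the pattern is that the heavy lifting (aggregation and training) happens centrally, while the edge device only runs the cheap, latency-sensitive scoring step.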

“I completely agree,” Petkovich replied. “There’s a lot of wisdom in taking the hybrid approach: it can make the rollout of the data infrastructure a lot easier. You could have a working application based on deployments of just the edge compute devices on one line, which lets you experiment and expand out from there.”

Getting Real-Time Insights into Production

“I can tell you, as a guy whose job is to connect equipment, this is the hardest thing to do,” Townsend said. “The biggest challenge in manufacturing today is the fact that 90% of the companies in the US are using legacy equipment that has been around for 40 or 50 years, so ‘How do we make those smart?’ is the biggest question.”

“Another aspect of the ‘real-time’ component of this question goes back to the point about hybrid vs cloud vs edge,” Petkovich added. “Hybrid lets you take a more future-proof approach: if you decide later that ‘real-time’ for you only means every minute or every hour, you might be okay with a cloud solution, but if it means every second, you’re going to be in trouble. It goes to show that how you distribute your compute devices and how much throughput you’re able to handle are also important factors here.”
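Petkovich's point can be expressed as a simple decision rule driven by the required response time. The thresholds below are illustrative assumptions for the sake of the sketch, not fixed industry numbers.

```python
# Rough rule of thumb for choosing where inference should run, based on
# how the plant defines "real time". The cutoff values are illustrative
# assumptions, not established standards.

def suggest_compute(required_response_s: float) -> str:
    """Suggest a compute placement for a given response-time requirement."""
    if required_response_s < 1.0:
        return "edge"    # sub-second: cloud round trips are too risky
    if required_response_s < 60.0:
        return "hybrid"  # seconds to a minute: mix local and central
    return "cloud"       # minute-or-hour cadence: centralized is fine
```

For example, a sub-second process adjustment would map to `"edge"`, while hourly global metrics would map to `"cloud"`, which is exactly the distinction Petkovich draws.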

“That’s a great point,” Townsend replied. “The definition of ‘real-time’ varies between manufacturers. Maybe it’s adjusting the cutter path on a CNC mill, or it could be making decisions on the fly (sub-one second) in order to change a process as it’s happening. It all comes back to being able to network the equipment back to whatever infrastructure you’re trying to solve for.”

Where Should Data Science Fit in Your Innovation Roadmap?


“It’s an interesting question,” Petkovich began, “because it matters what part of the data science equation you’re executing at a given time. It’s kind of a chicken-and-egg problem: you can’t really do much data science without data, but at the same time, you can’t really decide what data to collect without doing at least a little data science.”

“Do you want to train models?” Petkovich asked. “Are you looking for more visibility or do you want to make global or local decisions? You need to answer these questions before you do the actual data aggregation or start structuring your roadmap.”

“I deal with this question every day,” Townsend said. “My suggestion is to approach innovation through multiple methods: set a roadmap of what you want your factory to look like in the future, and pick something that’s going to deliver an ROI as soon as possible. That will help get people on board and make an Industry 4.0 culture part of their everyday life.”

“And while you’re doing that,” Townsend continued, “have your process engineering team start to put the building blocks in place and develop what you would consider a best case for your proof of concept, so that you know what you’re working towards: I want my factory in the future to look like this.”

There are many more insights to be gained from listening to the full discussion. You can download the audio highlights here or check out our other webinars for more automotive intelligence.
