The untapped potential of vehicle data

Last updated on August 26th, 2022

If you can remember the turn of the millennium, you can probably remember how we used to talk about data. In their 2003 study, “How Much Information?” Peter Lyman and Hal Varian estimated that print, film, magnetic and optical storage media produced about five exabytes of new information in 2002. To explain that amount in terms their audience could understand, they used the following analogy:

“If digitized, the nineteen million books and other print collections in the Library of Congress would contain about ten terabytes of information; five exabytes of information is equivalent in size to the information contained in half a million new libraries the size of the Library of Congress print collections.”

Remember the picture of Bill Gates showing how much information a CD-ROM could store?

You may have noticed that we don’t generally make these kinds of comparisons anymore.

Why not?

Maybe they simply don’t resonate in a world where paper is no longer the default medium for storing information. More likely, it’s because the sheer scale of data generated these days makes any comparison to physical information storage mind-boggling. Eric Schmidt once estimated that we create as much information every two days as we did from the dawn of history to 2003, and that was a decade ago.

Every day, we’re producing more data, and more sources of data, than ever before.

Your car is a prime example.

The Value of Vehicle Data

Acerta has estimated (and we checked our math) that connected vehicles generated approximately 94 PB of data in 2019. That’s a fairly conservative appraisal, and it’s dwarfed by the potential of autonomous vehicles, which already generate 11 TB of data per day just from testing.

Our grandparents’ cars ran on oil; our grandchildren’s cars will run on data.

Conventional wisdom used to hold that data is the new oil. That may not be true of an individual’s personal data, but it does seem to hold in the automotive world. McKinsey has estimated that, “[T]he value pool from car data and shared mobility could add up to more than USD 1.5 trillion by 2030, and the foreseeable proliferation of new features and services will turn ‘car data’ into a key theme on the agenda of the auto industry.”

This raises an obvious question: What can you do with vehicle data that makes it so valuable?

Leveraging Vehicle Data in 2020

You don’t need to be driving the latest connected car to be able to take advantage of vehicle data. On-board diagnostics (OBD) have been standard on US and European vehicles since the mid-90s, enabling a variety of applications using data extracted via the OBD connector. For example, since cars compliant with the OBD-II standard (1996 and newer in the US) store trouble codes for emissions, testers can connect directly to a vehicle’s computer, rather than taking readings from the exhaust system. OBD-II data can also be combined with other information, such as GPS location data, to monitor idling times, speeding and fuel efficiency, all of which are important metrics for fleet managers and insurance providers.
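To make the idling example concrete, here is a minimal sketch of how engine speed read over OBD-II could be combined with GPS ground speed to estimate idle time. The `Sample` record, the `idle_seconds` function and the 1 km/h threshold are illustrative assumptions, not part of any standard or product; a real fleet system would also need to handle GPS drift, dropped samples and stop-start engines.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Sample:
    """One time-aligned reading (hypothetical record layout)."""
    t: float              # seconds since the start of the trip
    rpm: float            # engine speed, e.g. read via OBD-II PID 0x0C
    gps_speed_kmh: float  # ground speed reported by the GPS receiver


def idle_seconds(samples: Sequence[Sample], speed_threshold_kmh: float = 1.0) -> float:
    """Sum the time during which the engine runs but the vehicle is not moving.

    Each interval between consecutive samples is attributed to the state
    observed at the start of that interval.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        engine_running = prev.rpm > 0
        stationary = prev.gps_speed_kmh < speed_threshold_kmh
        if engine_running and stationary:
            total += cur.t - prev.t
    return total


# Made-up trip: 20 s idling, 10 s driving, then engine off.
trip = [
    Sample(0, 800, 0.0),
    Sample(10, 800, 0.0),
    Sample(20, 2000, 30.0),
    Sample(30, 0, 0.0),
    Sample(40, 0, 0.0),
]
print(idle_seconds(trip))
```

Neither signal is enough on its own: RPM alone cannot tell idling from driving, and GPS speed alone cannot tell idling from being parked with the engine off.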

There’s even more potential when it comes to connected vehicle data. An early forerunner of connected car applications, OnStar has already been around for more than a decade, helping police track down stolen vehicles and contacting emergency services when accidents occur. Today, connected vehicle data is being used for commerce, predictive maintenance, and even monitoring driver drowsiness, and the applications are only going to get more diverse as vehicle connectivity and processing power improve. Think of where smartphones were ten years ago, recognize that’s where cars are today, and you’ll start to get a sense of the real potential for vehicle data. That being said, there’s still one catalyst missing.

Vehicle Data + Machine Learning

As the volume of vehicle data continues to increase, automotive engineers will require new tools to manage that data and extract information from it. There’s simply too much for us to handle, even taking other methods of statistical analysis into account. The good news is that machine learning doesn’t just make automotive data more manageable; it also unlocks new applications.

For example, Hyundai recently announced the development of a smart cruise control system (SCC) based on machine learning (SCC-ML). The idea is to take smart cruise control to the next level, not simply maintaining distance from the vehicle ahead but doing so in a way that feels natural to the driver because it’s based on their own driving habits — minus the bad ones, of course.

To take an example from one of our case studies, applying machine learning to vehicle data can help identify the optimal wheel slip ratios for different surfaces, which is important information for anti-lock braking systems (ABS), as well as torque vectoring and transmission shift timing. In one instance, AutoPulse was able to detect changes in the road surface in less than a second without the usual “hints” that come from rapid changes in acceleration.
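For readers unfamiliar with the term, the wheel slip ratio under braking is commonly defined as the relative difference between vehicle speed and the wheel's circumferential speed. The sketch below computes it from that textbook definition; the function and parameter names are illustrative, and it does not represent how AutoPulse estimates slip or surface type.

```python
def braking_slip_ratio(vehicle_speed_ms: float,
                       wheel_speed_rads: float,
                       wheel_radius_m: float) -> float:
    """Longitudinal slip ratio under braking: (v - omega * r) / v.

    0.0 means the wheel rolls freely; 1.0 means it is fully locked.
    Returns 0.0 for a stationary vehicle to avoid dividing by zero.
    """
    if vehicle_speed_ms <= 0.0:
        return 0.0
    wheel_surface_speed = wheel_speed_rads * wheel_radius_m
    return (vehicle_speed_ms - wheel_surface_speed) / vehicle_speed_ms


# Vehicle at 20 m/s, wheel (radius 0.3 m) turning at 50 rad/s,
# so the tyre surface moves at roughly 15 m/s.
print(round(braking_slip_ratio(20.0, 50.0, 0.3), 3))
```

The slip ratio that yields peak braking force differs between surfaces such as dry asphalt, wet asphalt and ice, which is why knowing the surface matters to ABS.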

Another use case for leveraging vehicle data with machine learning is what might be called “peripheral predictive maintenance,” where ML is used to infer the condition of components that do not have sensors from the data produced by those that do. Acerta took this approach in another AutoPulse project, achieving accuracy rates of 95.8% and 100% for loose suspension and steering misalignment, respectively.
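As a rough illustration of the idea, the sketch below labels the state of an uninstrumented component using features derived from sensors that do exist. It uses a simple nearest-centroid classifier, which is a stand-in chosen for brevity and not the method used in the AutoPulse project. The feature names, values and labels are made up for the example.

```python
from statistics import mean


def fit_centroids(features, labels):
    """Average the feature vectors belonging to each class label."""
    centroids = {}
    for label in set(labels):
        rows = [f for f, l in zip(features, labels) if l == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    def dist2(center):
        return sum((a - b) ** 2 for a, b in zip(x, center))
    return min(centroids, key=lambda label: dist2(centroids[label]))


# Hypothetical features from existing sensors:
# [vertical acceleration RMS (g), steering angle offset when driving straight (deg)]
train_x = [
    [0.10, 0.2], [0.12, 0.1],   # healthy
    [0.35, 0.3], [0.40, 0.2],   # loose suspension
    [0.11, 3.0], [0.13, 2.8],   # steering misalignment
]
train_y = [
    "healthy", "healthy",
    "loose_suspension", "loose_suspension",
    "misaligned", "misaligned",
]

model = fit_centroids(train_x, train_y)
print(predict(model, [0.38, 0.25]))
```

A production system would need far more data, features on comparable scales, and validation on held-out vehicles before its accuracy figures could be trusted.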

Finally, combining on-road data and machine learning effectively closes the loop between automotive design, manufacturing, and servicing. Automakers can use the insights generated by applying machine learning to vehicle data to identify components that are operating inefficiently or underperforming on reliability, and then apply those insights in their next design or manufacturing cycle.

No doubt, there will be numerous other applications that emerge as the volume of automotive data grows and data science expertise proliferates. The flood of vehicle data is coming, and the best way to stay afloat is with machine learning.
