Data Hygiene: The Real Cost of Dirty Data and 5 Tips to Improve Data Quality

Growing up, we’re taught to keep our hands clean. As children, we’re told ‘wash your hands before dinner’ and ‘wash your hands after playing outside.’ Even as adults, public health programs show us how to wash our hands effectively to prevent disease.

Now, as primary data custodians for our respective organizations, we still need to keep things clean.

This time we’re not talking about your hands, but your data.

The term dirty data is not just a catchy alliteration. It is, in fact, an extremely serious problem.

How serious is it, you might ask?


Numbers to Prove Dirty Data Hurts

According to Experian’s 2019 Global Data Management Research: “We see, year after year, that despite our ambitions, many businesses fail to take full advantage of the opportunity that data can provide to improve customer interactions to increase business performance.”

In the US alone, an IBM survey revealed that bad data costs the economy $3.1 trillion every year. Moreover, more than 30% of business leaders are not confident in the data they use to make key business decisions, and another 27% of respondents are unsure.

Dirty data typically refers to data that is poorly structured, inaccurate, or incomplete. Its impact varies from industry to industry, but whichever industry an enterprise operates in, the consequences damage the business’s overall health.
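The three kinds of problems in that definition can often be caught mechanically. The following is a minimal Python sketch that flags each one in a single smart meter reading. The record layout (`meter_id`, `timestamp`, `kwh`) and the `find_issues` helper are hypothetical, chosen only for illustration. Treating a negative kWh value as an error is also an assumption of this sketch: net-metered sites that export solar power, for example, can legitimately report negative values, so real rules should come from your own data definitions.

```python
from datetime import datetime
from typing import List

# Hypothetical layout for one smart meter reading; field names are illustrative only.
REQUIRED_FIELDS = ("meter_id", "timestamp", "kwh")

def find_issues(record: dict) -> List[str]:
    """Return the data-quality problems found in one reading (empty list = clean)."""
    issues = []

    # Incomplete: a required field is missing or empty.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            issues.append(f"missing {field}")

    # Poorly structured: the timestamp is present but is not an ISO 8601 string.
    ts = record.get("timestamp")
    if ts not in (None, ""):
        try:
            datetime.fromisoformat(ts)
        except (TypeError, ValueError):
            issues.append("unparseable timestamp")

    # Inaccurate (under this sketch's assumed rule): non-numeric or negative consumption.
    kwh = record.get("kwh")
    if kwh not in (None, ""):
        if not isinstance(kwh, (int, float)):
            issues.append("non-numeric kwh")
        elif kwh < 0:
            issues.append("negative kwh")

    return issues


print(find_issues({"meter_id": "M-001", "timestamp": "03/01/2024", "kwh": -5}))
# ['unparseable timestamp', 'negative kwh']
```

Checks like these only catch problems after the fact; they are most useful when the failures are traced back to the system or process that created the record.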

In the financial services industry, dirty data goes beyond financial loss. Inaccurate and incomplete data can lead to regulatory breaches, decisions delayed by manual checks, and suboptimal trading strategies, to name just a few.

Businesses that rely on a CRM for lead nurturing and customer segmentation are affected too. Industry statistics show that while 67% of businesses use CRM data for customer targeting, 60% consider their overall data health unreliable.

Even the healthcare industry is not spared. According to Healthcare Finance, supply management is one of the areas most affected by dirty data. Supply costs account for 20% to 30% of operating expenses, yet this area is often mismanaged because of incorrect or incomplete data. Dirty data also affects inventory management, where making sure medical supplies are available when and where they are needed can spell the difference between life and death.

How can dirty data inflict so much damage? Thomas Redman, known as “the Data Doc,” explains, “The reason dirty data costs so much is that decision makers, managers, knowledge workers, data scientists, and others must accommodate it in their everyday work. And doing so is both time-consuming and expensive. The data they need has plenty of errors, and in the face of a critical deadline, many individuals simply make corrections themselves to complete the task at hand. They don’t think to reach out to the data creator, explain their requirements, and help eliminate root causes.”

Dirty data can affect businesses of every type and industry, and it can wreak havoc on even today’s most advanced digital projects.

How Dirty Data Hampers the Progress of AI and Data Governance Projects

Initiatives involving artificial intelligence and modern data governance are prime examples of how dirty data affects digital transformation projects.

According to a study by market research firm Dimensional Research, 8 out of 10 AI and machine learning projects have stalled due to poor data quality, while 96% “have run into problems with data quality, data labeling required to train AI, and building model confidence.”

This is bad news, because AI, big data, and machine learning now rank among the top priorities of many companies’ digital transformation initiatives.

Dirty Data and the Utilities Industry: A Closer Look

The utility sector is one of the industries hit hardest when dirty data stalls AI, machine learning, and data governance initiatives. Smart grids, smart meters, digital twins, and microgrids are remarkable innovations, but they bring a deluge of data. For example, electric companies used to read meters 12 times a year. Today, smart meters can relay readings every 15 minutes, or even more often. They are creating terabytes of valuable data that must be managed.

As a result, many transmission and distribution (T&D) companies are finding that the volume of incoming information is outpacing their data handling and data provisioning capacity.
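To put the jump from 12 reads a year to one read every 15 minutes in perspective, here is a back-of-the-envelope calculation for a single meter. It assumes a 365-day year and exactly one reading per 15-minute interval; real meters may report more often and send several values per read.

```python
# Back-of-the-envelope comparison of meter-read volumes for ONE meter.
# Assumptions: a 365-day year and exactly one reading per 15-minute interval.
MINUTES_PER_DAY = 24 * 60
DAYS_PER_YEAR = 365

monthly_reads_per_year = 12                                        # traditional monthly reads
interval_reads_per_day = MINUTES_PER_DAY // 15                     # 96 reads per day
interval_reads_per_year = interval_reads_per_day * DAYS_PER_YEAR   # 35,040 reads per year

growth_factor = interval_reads_per_year // monthly_reads_per_year  # 2,920 times as many reads

print(f"{interval_reads_per_year:,} reads/year vs {monthly_reads_per_year} -> {growth_factor:,}x")
# 35,040 reads/year vs 12 -> 2,920x
```

Multiplied across every meter in a service territory, that factor explains why provisioning capacity built for monthly reads struggles with interval data.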

Utility Dive’s Herman Trabish explains, “Figuring out how to manage those data could hold the key to new revenue streams and improved grid operation, if utilities can find software tools to integrate multiple grid technologies and handle ever-escalating quantities of information.”


Data Hygiene: 5 Tips to Improve Your Data Quality

Sanitizing dirty data may not be as easy as washing your hands, but it’s also not impossible. Here are 5 tips to help you clean your data:

1. Determine if you have the internal capacity and a modern platform for data provisioning

There’s no shame in admitting that you don’t have the manpower, technology, and other resources to ensure trustworthy and timely data delivery, i.e. data provisioning. Depending on your budget, there are two ways to put a data provisioning program in place. You can build the capability in-house by investing in a domain-specific integration platform as a service (iPaaS), bringing experts on board, and purchasing data provisioning technology. Alternatively, you can partner with a company that performs data provisioning on your behalf.

2. Explore the feasibility of using ML

As noted above, machine learning is one of the digital transformation initiatives most affected by dirty data. However, it can also offer one of the most potent remedies. ML can create and enrich data assets efficiently, support data quality through proactive and reactive data maintenance protocols, and encourage data use by making data easier for relevant parties to discover.

3. Empower those who know the information best to prepare the data

Agile data preparation practices allow the subject-matter experts in your organization to provision or prepare data themselves. This helps ensure that the data is processed, organized, and presented accurately and in its most useful form.

4. Remove data preparation silos

Data access should never be an interdepartmental competition, and data should never be kept in silos, especially when multiple departments or stakeholders rely on the same information.

Establish working groups that are primarily in charge of data collection, preparation, and provisioning. This will break down data silos and promote collaboration.

5. Standardize data definitions

If your energy company defines “Report Week” as “calendar week beginning at 12:01 a.m. on Sunday and ending at midnight on Saturday,” make sure everyone who deals with data understands it the same way. Even the slightest deviation from the agreed definition can make your data incorrect and, therefore, dirty.
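One way to keep a definition like this consistent is to encode it once, in code every team reuses. The Python sketch below is illustrative only: the `report_week` and `in_report_week` names are hypothetical, timestamps are assumed to be naive local times, and it interprets the week as running from Sunday 00:00 up to, but not including, the next Sunday 00:00. That half-open interval is itself a choice; the wording above can leave readings between midnight and 12:01 a.m. on Sunday without a clear home, which is exactly the kind of gap a shared definition should close.

```python
from datetime import datetime, time, timedelta
from typing import Tuple

def report_week(ts: datetime) -> Tuple[datetime, datetime]:
    """Return (start, end) of the hypothetical Report Week containing ts.

    Assumed convention: the week starts Sunday 00:00 (inclusive) and ends at
    the next Sunday 00:00 (exclusive), so every timestamp lands in exactly one week.
    """
    # datetime.weekday(): Monday == 0 ... Sunday == 6, so this counts days back to Sunday.
    days_since_sunday = (ts.weekday() + 1) % 7
    start = datetime.combine(ts.date() - timedelta(days=days_since_sunday), time.min)
    end = start + timedelta(days=7)
    return start, end

def in_report_week(ts: datetime, week_start: datetime) -> bool:
    """True if ts falls in the week beginning at week_start (half-open interval)."""
    return week_start <= ts < week_start + timedelta(days=7)


start, end = report_week(datetime(2024, 1, 3, 15, 30))  # a Wednesday
# start == datetime(2023, 12, 31, 0, 0), end == datetime(2024, 1, 7, 0, 0)
```

With a half-open interval, a reading stamped exactly at Sunday 00:00 belongs to the new week and never to both, so two teams applying the same function cannot disagree about it.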

It’s Time for a Data Detox

You’re only as good as the data you use. Muddled data equals muddled business decisions. Taking both proactive and reactive steps to ensure the completeness, accuracy, and usefulness of the data you collect allows you to gain a competitive advantage, expedite your digital transformation, and achieve better business results overall.

Greenbird offers out-of-the-box system integration for utilities. We are a true DevOps company, delivering exceptional time-to-market and reliability. We were named a Gartner ‘Cool Vendor’ in 2018 for our domain-specific and flexible integration capabilities, which are crucial for creating easy-to-consume integrated solutions. Utilihive empowers utilities to manage their data flows faster and more smoothly than traditional system integration models, while accelerating the journey toward the energy revolution. To learn how you can unleash the value of data while removing silos, get the executive brief on Utilihive.
