How to prevent common data quality issues


Data can be an organization’s most valuable tool, but not if your database is full of people named ‘Mickey Mouse’ or has out-of-date addresses. 

According to Michael Lee, solution engineer at data verification solution provider Melissa, the most common issues in a database tend to be the simple ones, such as typos, inaccurate data, or mistakes introduced when transferring data. People sometimes mistype when filling out a form, or they may intentionally enter fake data.

“Let’s say, for example, they’re signing up for a marketing item or signing up for a website,” he said. “Sometimes you don’t want to use your correct contact information. So you might put in, like fake email addresses or disposable email addresses.”

Issues can also arise when you transfer data, because any transfer requires properly handling things like encoding, data types, delimiters, and nulls. An extra column could be added accidentally, for example, which will cause errors down the line.
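As a rough illustration of those transfer checks, the Python sketch below decodes a delimited export with an explicit encoding, rejects an unexpected header, flags rows with the wrong number of fields (the accidental extra column), and normalizes null markers. The column names, the null markers, and the `load_rows` helper are assumptions made up for this example, not part of any particular tool.

```python
import csv
import io

# Hypothetical schema and null markers, chosen only for illustration.
EXPECTED_COLUMNS = ["name", "email", "postal_code"]
NULL_MARKERS = {"", "NULL", "N/A"}


def load_rows(raw_bytes, encoding="utf-8", delimiter=","):
    """Decode and parse a delimited export; return (good_rows, problems)."""
    # Decoding explicitly surfaces a wrong encoding as UnicodeDecodeError
    # instead of silently producing garbled characters.
    text = raw_bytes.decode(encoding)
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)

    header = next(reader)
    if header != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {header}")

    good, problems = [], []
    for row in reader:
        # A stray extra (or missing) column shows up as a field-count mismatch.
        if len(row) != len(EXPECTED_COLUMNS):
            problems.append(
                (reader.line_num,
                 f"expected {len(EXPECTED_COLUMNS)} fields, got {len(row)}")
            )
            continue
        # Map the various textual null markers to a real None.
        record = {
            col: (None if val.strip() in NULL_MARKERS else val.strip())
            for col, val in zip(EXPECTED_COLUMNS, row)
        }
        good.append(record)
    return good, problems
```

Rows that fail the check are reported with their line number rather than dropped silently, so someone can review them before they cause errors downstream.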

As bad as these issues can be for data quality, they are preventable, and Lee says the best way to avoid problems is to have preventative measures in place. “It all starts from how the data is first gathered,” he said. 

For example, you might prevent a name field from accepting numbers or symbols. Lee noted, however, that a blanket rule could reject some international customers’ names, so customizing the rule to allow certain characters can help ensure everyone can enter their information correctly.
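One possible way to write such a rule, sketched in Python: accept any Unicode letter or combining mark instead of only A to Z, plus a small allow-list of punctuation that appears in real names. The `is_plausible_name` function and the contents of the allow-list are assumptions for illustration and would need tuning for a given audience.

```python
import unicodedata

# Assumed allow-list of punctuation found in real names (space, hyphen,
# straight and curly apostrophe, period). Adjust for your own users.
ALLOWED_PUNCTUATION = {" ", "-", "'", "\u2019", "."}


def is_plausible_name(value: str) -> bool:
    """Return True if value contains only letters, marks, and allowed punctuation."""
    name = unicodedata.normalize("NFC", value.strip())
    if not name:
        return False

    has_letter = False
    for ch in name:
        category = unicodedata.category(ch)
        if category.startswith("L"):
            # Any Unicode letter: Latin, accented, CJK, Cyrillic, and so on.
            has_letter = True
        elif category.startswith("M") or ch in ALLOWED_PUNCTUATION:
            # Combining marks and allowed punctuation are accepted.
            continue
        else:
            # Digits, symbols, and other punctuation are rejected.
            return False
    return has_letter
```

A check like this only validates the shape of the input. A well-formed fake such as "Mickey Mouse" still passes, so it complements verification rather than replacing it.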

Another example is a date of birth field where, rather than allowing someone to type in a date, you would provide a calendar picker or drop-down to ensure dates are in the proper format.
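Even with a picker on the form, it can be worth checking the value again on the server. The Python sketch below assumes the form submits an ISO 8601 string such as 1990-02-28; the `parse_birth_date` helper and the 130-year sanity limit are illustrative choices, not a standard.

```python
from datetime import date
from typing import Optional

MAX_AGE_YEARS = 130  # assumed sanity limit, not an official figure


def parse_birth_date(value: str, today: Optional[date] = None) -> date:
    """Parse an ISO-formatted date of birth and reject implausible values."""
    today = today or date.today()
    # fromisoformat raises ValueError for malformed strings and for
    # impossible dates such as February 30.
    born = date.fromisoformat(value)
    if born > today:
        raise ValueError("date of birth is in the future")
    if today.year - born.year > MAX_AGE_YEARS:
        raise ValueError("date of birth is implausibly old")
    return born
```

Free-text input such as "02/28/1990" fails at the parsing step, which is the kind of format ambiguity a picker is meant to remove in the first place.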

Once the data is submitted, you could also have prompts that ask users to verify the information is correct. For example, if an address entered is different from the verified one, you may prompt them to choose the correct one.

There are tools that can ensure everything in your database is in a standardized format and is validated. Once the data is in your system, it’s also important to make certain that it is up to date and doesn’t “go stale.” Things like email addresses and mailing addresses can change, and people may not think to update them themselves.

According to Lee, the frequency at which you do these checks really depends on your use case. For example, a company that sends out mass emails on a monthly basis may want to do its checks on a weekly or monthly basis, while a company using that data less frequently may be able to get away with annual cleanups, he explained.
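To make the idea of a use-case-dependent schedule concrete, here is a minimal Python sketch that selects the records due for re-verification given a maximum age in days. The `last_verified` field and the `records_due_for_check` helper are hypothetical names invented for this example.

```python
from datetime import date, timedelta


def records_due_for_check(records, max_age_days, today=None):
    """Return records never verified or last verified more than max_age_days ago."""
    today = today or date.today()
    cutoff = today - timedelta(days=max_age_days)
    return [
        r for r in records
        # A missing last_verified date is treated as "never checked".
        if r.get("last_verified") is None or r["last_verified"] < cutoff
    ]
```

A monthly mailer might call this with `max_age_days=30`, while a team that touches its data rarely might use 365. The threshold is a business decision; the code only applies it.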

Another factor in how often to check is how sensitive the information is and how important its accuracy is, for example at companies that create reports based on the data.

“We do highly recommend that it’s not just a one-time cleansing thing,” he said. “There’s going to be maintenance required over time.”

Finally, Lee mentioned making sure your data quality initiatives align with your business goals. “The last consideration is cost and resources. If the frequency is changed, how much change is expected? How much more will it cost? Theoretically, you can continually run data quality tools every week for a massive database to get the most recent updates. This may guarantee that you have the latest changes, but it may not be required, nor will it be cost effective.”

So while you can cut down on errors at the start by carefully planning how you collect data, it’s also important to make data hygiene an ongoing process.