We’ve all come across the phrase “garbage in, garbage out.” If you work in a data field, it hits differently.

In business, the use of data has become closely tied to productivity and efficiency. High-quality data is needed to get relevant and reliable results, which later shape decisions. You use data to plan your business processes, run analyses, and interpret the predictions and insights that give you an edge. If you are still wondering why we are talking about data quality, read on.
This brings us back to “GIGO.” When we feed low-quality data into our analyses or predictions, the results and insights we get are also of low quality, and they often contribute little to planning or to the efficiency of business activities. No one wants to base business decisions on ill-informed insights or inputs: it means lost revenue and a weaker competitive edge. High-quality data, on the other hand, means your decisions are informed, which gives you an advantage over your rivals and helps increase your revenue. It also reduces the cost of preparing the data for whatever analysis you have in mind. So how do we measure data quality?
Dimensions of data quality
Data quality is measured along six dimensions:
- Completeness
- Accuracy
- Validity
- Consistency
- Integrity
- Uniqueness
Let’s explore each of these in simpler terms.
Completeness
This dimension asks whether the data has all the vital attributes needed to be usable and to solve the problem at hand. If you want to map your customers' locations, you need their addresses, so any record missing an address is incomplete. When collecting data, make sure you capture all the information that the specific purpose requires. Data is complete only if it includes this essential information, that is, the information needed to answer the questions you have at hand.
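A completeness check can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the `customers` records, the field names, and the choice of `name` and `address` as required fields are all assumptions made for the example.

```python
# Hypothetical customer records; the field names are assumptions for illustration.
customers = [
    {"id": 1, "name": "Ada", "address": "12 High Street"},
    {"id": 2, "name": "Ben", "address": ""},
    {"id": 3, "name": "Cy", "address": None},
    {"id": 4, "name": "Dee"},
]

# Fields the analysis cannot do without (an assumption for this example).
REQUIRED_FIELDS = ("name", "address")


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent, None, or blank."""
    missing = []
    for field in required:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            missing.append(field)
    return missing


# Map each incomplete record's id to the fields it is missing.
incomplete = {r["id"]: missing_fields(r) for r in customers if missing_fields(r)}

# Share of records that have every required field filled in.
completeness = 1 - len(incomplete) / len(customers)

print(incomplete)
print(f"{completeness:.0%} of records are complete")
```

Which fields count as required depends on the question you are trying to answer, so the list is worth revisiting for each analysis.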
Accuracy
Accuracy describes how well the data represents the real world. When collecting data about your customers, you must make sure that the phone number and date of birth recorded for each customer are correct. If the phone numbers are wrong, they cannot be used to contact your customers, so the data is inaccurate.
Validity
Let’s discuss validity using the previous scenario, where we were collecting dates of birth. If you check your data and find customers over 100 years old and others under 15, the data is likely to be invalid. However, this also depends on your business activities. If you are collecting data on care homes for the elderly, ages above 90 and even 100 are likely to appear in some cases. If your data is about children’s games, players recorded as 20, 30, or 50 years old are likely to be invalid, though that again depends on the context.
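Because the valid range depends on the business, one way to sketch a validity check is to pass the bounds in as parameters. The example below is an illustration under stated assumptions: the records, the fixed reference date, and the care-home bounds of 60 to 115 are hypothetical, while the 15 to 100 range mirrors the example in the text.

```python
from datetime import date


def age_on(birth_date, today):
    """Whole years between birth_date and today."""
    had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
    return today.year - birth_date.year - (0 if had_birthday else 1)


def invalid_ages(records, min_age, max_age, today):
    """Return ids of records whose age falls outside [min_age, max_age]."""
    return [
        r["id"]
        for r in records
        if not min_age <= age_on(r["date_of_birth"], today) <= max_age
    ]


# Hypothetical records and a fixed reference date so the result is reproducible.
records = [
    {"id": 1, "date_of_birth": date(1990, 5, 17)},
    {"id": 2, "date_of_birth": date(1915, 1, 2)},
    {"id": 3, "date_of_birth": date(2015, 9, 30)},
]
today = date(2024, 1, 1)

# The same data judged against two different business contexts.
flagged_retail = invalid_ages(records, min_age=15, max_age=100, today=today)
flagged_care_home = invalid_ages(records, min_age=60, max_age=115, today=today)

print(flagged_retail)
print(flagged_care_home)
```

The same record can be valid in one context and suspicious in another, which is why the bounds are arguments rather than constants.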
Consistency
Data consistency is closely related to data accuracy. Simply put, it checks whether the data stored in one location matches the same data stored in another. If you have customer data in your sales records and similar records in your accounts department, the consistency dimension checks whether the two agree and to what extent. If the data represents the same clients, it should be consistent and accurate in both places. If records that are supposed to match differ because of errors or discrepancies, the data quality is suspect and needs to be monitored and corrected.
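A consistency check between two sources can be sketched as a field-by-field comparison of records that share an identifier. The `sales` and `accounts` dictionaries, the customer ids, and the compared fields below are all hypothetical and only serve to illustrate the idea.

```python
def find_inconsistencies(source_a, source_b, fields):
    """Compare records that share an id and report fields whose values differ."""
    mismatches = {}
    for record_id in source_a.keys() & source_b.keys():
        a, b = source_a[record_id], source_b[record_id]
        diffs = {f: (a.get(f), b.get(f)) for f in fields if a.get(f) != b.get(f)}
        if diffs:
            mismatches[record_id] = diffs
    return mismatches


# Hypothetical customer records keyed by customer id in two departments.
sales = {
    101: {"name": "Ada Obi", "email": "ada@example.com"},
    102: {"name": "Ben Roy", "email": "ben@example.com"},
}
accounts = {
    101: {"name": "Ada Obi", "email": "ada@example.com"},
    102: {"name": "Ben Roy", "email": "b.roy@example.com"},
    103: {"name": "Cy Lee", "email": "cy@example.com"},
}

# Fields that disagree for customers present in both sources.
mismatches = find_inconsistencies(sales, accounts, fields=("name", "email"))

# Customers that appear in only one of the two sources.
only_in_one = sales.keys() ^ accounts.keys()

print(mismatches)
print(only_in_one)
```

A mismatch only tells you that the two sources disagree; deciding which one is correct is an accuracy question and usually needs a trusted reference.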
Integrity
Data integrity covers the maintenance and assurance of data accuracy and consistency throughout the data’s entire life cycle. In other words, it is about preserving the data’s true state from collection to retirement. When data integrity is lost, the affected records become invalid, inaccurate, or inconsistent, which lowers overall quality. In other contexts, data integrity also refers to the safety of the data and its compliance with data protection regulations such as GDPR.
Uniqueness
Each record should appear only once in the dataset. Uniqueness means there are no duplicate records, which reduces overlap. Having a single entry for each data point also supports compliance and customer engagement.
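A uniqueness check can be sketched by grouping records on a normalised key and reporting any key that occurs more than once. Treating the email address as the identifying field, and ignoring case and surrounding whitespace, are assumptions made for this example; the records themselves are hypothetical.

```python
from collections import defaultdict


def find_duplicates(records, key_fields):
    """Group record ids by a normalised key and return keys seen more than once."""
    groups = defaultdict(list)
    for r in records:
        # Normalise so that trivial differences in case or spacing do not hide duplicates.
        key = tuple(str(r[f]).strip().lower() for f in key_fields)
        groups[key].append(r["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Hypothetical records; ids 1 and 2 describe the same customer.
records = [
    {"id": 1, "email": "ada@example.com"},
    {"id": 2, "email": "Ada@Example.com "},
    {"id": 3, "email": "ben@example.com"},
]

duplicates = find_duplicates(records, key_fields=("email",))
print(duplicates)
```

Choosing the key is the hard part in practice: too strict and near-duplicates slip through, too loose and distinct customers get merged.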
These are the dimensions to focus on when checking data quality, and they are also the main checks when cleaning data for analysis. Next, we will look at how to make sure we collect quality data during the collection process.