“Any fool can know. The point is to understand.” — Albert Einstein
When looking at raw data, we need to be mindful of its limitations, and aware of its potential. Data can reveal the most intricate subtleties about a subject, and help us to understand the meaning behind the numbers. It can help us to build useful documents like reports, graphs and budgets that make planning and forecasting much easier. But it’s not the fountain of all knowledge… at least, not in its raw form.
What Does Data Look Like?
What do you think of when you imagine data?
Reams of 1s and 0s? Computers printing out page after page of characters? Letters and numbers trickling down a green monochrome monitor, a bit like those scenes in The Matrix?
All of those examples are quite good at describing what data is, and what it’s for.
That is because data is, by its very nature, unsorted. Even when it’s accurate, it’s difficult for us to understand because it doesn’t have any scales, or headings, or pie charts. Data is the material that computers use to come up with solutions. As humans, we find it difficult to work with.
So how do human brains work with data? By turning it into a format we can understand. To do this, we need two things: data quality solutions to sift out the noise (and ensure the data we have is accurate), and the development of our clean data into more understandable formats.
About the Data Pyramid
Using data means organizing and interpreting it, and using it as the basis for decision making. The DIKW (Data, Information, Knowledge, Wisdom) hierarchy is a visual way of describing this process.
Fundamentally, the DIKW hierarchy is based on the concept of building blocks. Data sits at the base of the pyramid, and the rest of the theory goes like this:
Data (D) is our building block, or foundation.
From clean data, we can interpret certain facts and come up with reliable Information (I). This is the second tier in the pyramid.
By compiling useful information in the right way, we create Knowledge (K) from our Information.
By retaining Knowledge, and knowing when to apply it, we formulate Wisdom (W).
Russell Lincoln Ackoff is widely credited with setting out the DIKW model in 1988, in an address to the International Society for General Systems Research. As our use of data has broadened, it has become an enduring illustration of how good data produces quality results, and bad data creates weak foundations.
The Pyramid in Practice
Russell Lincoln Ackoff described data as an observation. This is a good way to look at the building block. In order to make use of data – to build a pyramid on top of it – we need to organize it in some way.
Consider this example:
On a car dashboard, we see the digits “60.” This is raw data. What does it mean? We don’t know yet, as we don’t have any context for the digits. The Data is the building block, but we need further processing to make decisions.
Looking around the digits, we see that there is a needle pointing at the “60” on the dashboard. On the left, we see the sequence 0, 20, 40. And on the right, we see 80 and 100. Our “60” now has context. We know that it is a measurement of something – a scale that rises from 0 to 100. That is our Information.
Assuming we have travelled in a car before, and possibly passed a driving test, we have been taught that this particular dial tells us our speed of travel. And the unit of measurement will be either kilometres per hour, or miles per hour, depending on local norms; in the UK, it’s the latter. Now, we have Knowledge: we’re in the UK, and the car is travelling at 60 miles per hour.
Looking at the road ahead, we are aware that we are driving on a motorway. In the UK, the speed limit on the motorway is 70 miles per hour. Wisdom lets us deduce that we are travelling within the legal speed limit, and we can decide that we do not need to slow down.
This primitive example explains what data is.
The initial reading has to be interpreted and put into context, and then we need to apply our own knowledge and judgment to it.
We cannot simply use data to make a snap decision – we must formulate an understanding, as Einstein pointed out in the quote at the top of this article.
The DIKW pyramid shows that data is not information. It also shows that there is a process we must follow to put data to good use.
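For readers who think in code, the dashboard example can be sketched as a small pipeline, with one step per tier of the pyramid. This is only an illustration, not a standard implementation: the function names, the `Information` and `Knowledge` classes, and the rule that maps a country to a unit are all assumptions made up for this sketch.

```python
from dataclasses import dataclass

UK_MOTORWAY_LIMIT_MPH = 70  # the limit quoted in the example above


@dataclass
class Information:
    """A reading placed on a scale (hypothetical helper type)."""
    value: int
    scale_min: int
    scale_max: int


@dataclass
class Knowledge:
    """A reading we know to be a speed, with its unit (hypothetical helper type)."""
    speed: int
    unit: str


def to_information(raw: str, scale: list[int]) -> Information:
    """Data -> Information: give the raw digits context by placing them on the dial."""
    value = int(raw)
    if not min(scale) <= value <= max(scale):
        raise ValueError(f"{value} is outside the dial's scale")
    return Information(value, min(scale), max(scale))


def to_knowledge(info: Information, country: str) -> Knowledge:
    """Information -> Knowledge: the dial is a speedometer; the unit depends on locale."""
    # Deliberately simplified: only the UK case from the article is modelled as mph.
    unit = "mph" if country == "UK" else "km/h"
    return Knowledge(info.value, unit)


def to_wisdom(knowledge: Knowledge, limit_mph: int) -> str:
    """Knowledge -> Wisdom: apply what we know to reach a decision."""
    if knowledge.unit != "mph":
        raise ValueError("this sketch only compares speeds in mph")
    return "maintain speed" if knowledge.speed <= limit_mph else "slow down"


raw = "60"  # Data: just the digits on the dashboard
info = to_information(raw, [0, 20, 40, 60, 80, 100])
knowledge = to_knowledge(info, "UK")
decision = to_wisdom(knowledge, UK_MOTORWAY_LIMIT_MPH)
print(decision)  # maintain speed
```

Each function only works if the tier beneath it is sound, which is the point of the pyramid: the decision at the top is no better than the "60" at the bottom.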
In our example, we relied on the fact that the speed reading – “60” – was correct. What if it wasn’t?
What if the speedometer was broken, and the car was actually traveling at 75 miles per hour?
What if the car had two speedometers showing different readings?
In order to achieve quality results, we must ensure that our raw materials are not polluted or unclean. The DIKW pyramid demonstrates the importance of data quality, and it shows us how easy it is for unclean data to wreak havoc, and cost money.
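The broken or conflicting speedometer is the kind of problem a basic quality check is meant to catch before a reading is trusted. The sketch below is one possible approach, not a prescribed method: `validate_speed`, its tolerance of 2 mph and its plausibility ceiling of 200 mph are hypothetical choices made for illustration.

```python
def validate_speed(
    readings_mph: list[float],
    tolerance_mph: float = 2.0,       # assumed: how far two sources may differ
    max_plausible_mph: float = 200.0,  # assumed: anything above this is rejected
) -> float:
    """Return a speed we are willing to trust, or raise ValueError."""
    if not readings_mph:
        raise ValueError("no readings to validate")
    for reading in readings_mph:
        if not 0 <= reading <= max_plausible_mph:
            raise ValueError(f"implausible reading: {reading}")
    # If independent sources disagree, at least one of them is wrong.
    if max(readings_mph) - min(readings_mph) > tolerance_mph:
        raise ValueError(f"readings disagree: {readings_mph}")
    return sum(readings_mph) / len(readings_mph)


print(validate_speed([60.0, 61.0]))  # 60.5

try:
    # The dial says 60, but a second source says 75.
    validate_speed([60.0, 75.0])
except ValueError as error:
    print(f"rejected: {error}")
```

Note that a single broken speedometer cannot be caught by a check like this on its own; it takes a second, independent source to reveal that the "60" is wrong.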
Without data quality, we have no hope of being able to put data in context. We cannot guarantee that we will derive pure information that is reliable. It becomes more difficult to find a meaningful context for data. Reporting becomes inaccurate; people start to look for meaning elsewhere. And as we build on bad foundations, the businesses we run begin to hemorrhage money trying to stem the flow of bad data.
Wasted effort, wasted resource, wasted insight. It all adds up. If we cannot draw a reliable conclusion, based on the data we have, everything that follows is probably junk.
Good data quality helps us to polish those data diamonds. We can remove the dirt and the artifacts, and work only with the clean, raw material that we need.