Data Analysis – The Second Requisite

To recap, the first requisite of a competent data analyst is the ability to look.  On reflection, we might sharpen this by saying that it’s the willingness to demonstrate the ability to look.  I mean, let’s face it – given sufficient and/or correct things to look at, anyone should be able to look, right?  Well, what about this?  What about being willing to show that you can look?

Let’s take a short side-excursion and explore this notion, shall we?

Consider a kid.  First or second grade maybe, a little shy (not being entirely used to being out of the home for such long periods of time) but eager to make friends, and full of the idea that if he can do something that people will admire, friendships will occur.  So he takes the one skill that he’s been praised so much for (by his parents): let’s say it’s playing the violin.

He brings his violin to school one day and has the guts to stand up in front of the whole class and play a song.  The class sits silently, then from the back comes a stifled chortle.  From the side, a raspberry.  A titter from the pretty girl in the second row.  The teacher intervenes loudly, but the damage is done.  The kids don’t want to see talent, they want to see the latest fashion, the latest electronic gizmo, the (fill in the blank – it’s something that our burningly embarrassed protagonist doesn’t have).  So he sits down, puts his violin away, and resolves to never, ever show talent again.

Far-fetched?  Ask any K-12 teacher.  I dare you.

Point is, you can look all you want, but if you’re not willing to demonstrate your ability, you might as well have blinders on.

So now let’s go on.  Given that a person can look, what’s next?

Next is the second requisite of a competent data analyst: the ability to see.

Wait a minute.  The eyes are open, there’s something there in front of the guy’s face, there’s nothing wrong with his optic nerve, what do you mean, the ability to see?

Well, let’s do an experiment.  This evening, when you go to bed, bring a book with you.  A nice, thick book that you’ve been promising yourself you’d read for quite some time now but could never get started on.  Atlas Shrugged, maybe.  Whatever it is, start reading.  Eventually, if you’re like most people, you’ll reach the end of a page and suddenly realize that, for whatever reason, your eyes have scanned every line but the words you “saw” somehow never made it to your brain.

So did you “see” them?  Of course you did.

Or did you?

I’m not going to get into cognitive psychology in this discussion – merely point out that just because someone is observing does not mean that he is observant.

The consequences of having someone “see” data and not really see it are, of course, similar to the consequences of driving a train while texting.

I’ll give you an example.

There’s a well-known data aggregator that uses, as part of its DQ arsenal, a procedure it calls “stare and compare”.  Two data files are put side by side: on one side, the previous file; on the other, the new one.  The analyst then pages through the two documents, eyes scanning from one to the other, until he’s satisfied that the data is satisfactory.  Any changes between the two files must be researched and explained in terms of known source-data changes or known process changes; otherwise it’s back in the barrel for the production team.

So what happens?  The analyst’s eyes glaze over and God only knows what gets passed on to the customer.
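One obvious mitigation is to take that first pass away from human eyes entirely and have a program flag the changes for the analyst to research.  Here’s a minimal sketch of what an automated “stare and compare” might look like – the CSV layout, the `id` key column, and the function name are my assumptions for illustration, not the aggregator’s actual tooling:

```python
import csv

def stare_and_compare(old_path, new_path, key="id"):
    """Compare two CSV snapshots keyed on `key`, returning a list of
    human-readable differences: the same findings a 'stare and
    compare' analyst would have to research and explain."""
    def load(path):
        with open(path, newline="") as f:
            return {row[key]: row for row in csv.DictReader(f)}

    old, new = load(old_path), load(new_path)
    diffs = []
    # Walk the union of keys so drops, adds, and changes all surface.
    for k in sorted(old.keys() | new.keys()):
        if k not in new:
            diffs.append(f"{key}={k}: row dropped from new file")
        elif k not in old:
            diffs.append(f"{key}={k}: row added in new file")
        else:
            for col, old_val in old[k].items():
                new_val = new[k].get(col)
                if old_val != new_val:
                    diffs.append(
                        f"{key}={k}: {col} changed {old_val!r} -> {new_val!r}"
                    )
    return diffs
```

Each line of output is exactly the kind of change the analyst is supposed to research and explain – and the glazing-over problem disappears, because the machine never gets bored on page two hundred.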

As with the ability to look, the ability to see can be enhanced by training and practice.  And as with the ability to look, the ability to see is followed by yet another requisite for a competent data analyst.  We’ll look at that (and hopefully, see it too) next time.

Data Analysis – The First Requisite

Okay, as promised, we’re going to take a look at what it takes to do data analysis.

First though, let’s clear away some underbrush.

We’re not going to talk about programs here.  If you’ve been reading along, you’ll know that I’m not the sort of person who stands in awe of digital manipulation tools.  Sure, number-crunchers have their uses and give you results that you can’t get any other way, but in terms of getting face-to-face (f2f) with the data, there is absolutely nothing in the world that beats actually getting f2f with the data.

It follows, then, that data analysis has something to do with data quality.  The relationship is obvious: if you can’t see what’s going on, your DQ efforts are going to go nowhere.  (On the plus side, of course, they’ll go nowhere fast, but that may not be the sort of upside you’re looking for.)

So what makes a good data analyst?  If anyone with two or three brain cells can be trained to use the tools, then the answer to this question does not lie in the realm of commercially available (or, for that matter, open source) programs.  It lies within the individual, and if it can be developed and enhanced, that development slash enhancement will probably not come in classrooms, seminars, or tutorials.

In my experience, the first requisite for data analysis is the ability to look.  Notice that this does not say what to look at, or what to look for.  Those are skills that can be learned and drilled to the point of competency.  The simple ability to look underlies those skills and forms the basis of all observation.

For whatever reason, individuals vary in their ability – or willingness – to confront what’s in front of them.  They sit in movie theaters and cover their faces with their hands and exclaim, “Oh, I can’t watch!”  People shy away from things they cannot confront.  Some can’t look at chaos, at evil, at mayhem; and while chaos, evil, and mayhem are hardly what one sees when one looks at data, there are others who similarly can’t look at tables of numbers, printed directions, and computer monitors.

Conversely, there are those who are able to look at what’s there without flinching, without experiencing an emotional reaction, and without selectively looking at, or looking for, that which coincides with pre-existing belief.  Those are the people who have the first attribute of a (potentially) successful DQ analyst.  Those are the people who have a foundation on which good data analysis practices can be built, and from whom effective data quality measures can flow.

Now this is a quick-and-dirty exposition, and there’s more to be said both about the ability to look and the other core competencies that form the complete foundation of competent DQ.  Next time, we’ll look at number two.

There’s Madness to My Method

Before we swing into any sort of discussion of fixing DQ errors – much less finding the pesky little things – let’s take a look at the wonderful array of tools that are available to us.  These tools, every one of them electronic, virtually guarantee that we WILL have errors to find and fix.

Wait!  What’s that?  The tools that we’re using to handle data quality issues actually CREATE those issues?  That’s crazy!

Or is it?

[pause for effect]

Alright now, settle down and let’s take a look at how this may be possible.  After all, if it IS possible then we, as DQ professionals, better know about it so we don’t all get snookered!

Let’s take a hypothetical scenario.  We have a data shop of some sort, and they have the latest and greatest Cadillac of all data manipulation programs.  I won’t name names since we all know what programs we’re talking about here – they’re the industry leaders, the “must-have-on-resume”s, the ones with the bells and the whistles that management can’t seem to do without.

Now this particular program has been through the development mill.  It’s been beaten and forged and hammered and tonged to within an inch of its binary little life, and by all accounts, even among the developers who were responsible for the hammering and tonging, it does a pretty good job.

Of course it has its limitations – after all, there will be a new version coming out in a year or two to fix some of those limitations – but by and large, it does as good a job as an electronic program can reasonably be expected to do.

Perfect?  No.

Pretty darn close?  You bet.  That’s why you bought it.

But wait.  If it’s not perfect, then there are situations – some known, some ready to rise up and bite you in the keister – where it will fail to one extent or another.  It’ll misplace a decimal point.  It’ll fail to round.  It’ll truncate.  It’ll do something unanticipated in some situation, most likely (a) when you’re least expecting it and (b) when the downside for failure is greatest.
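These failure modes are easy to demonstrate in any language.  Here’s a quick Python illustration – generic floating-point behavior, not tied to any particular product:

```python
# Misplaced precision: binary floats cannot represent 0.1 exactly,
# so the error surfaces in ordinary arithmetic.
print(0.1 + 0.2)         # 0.30000000000000004, not 0.3
print(0.1 + 0.2 == 0.3)  # False

# Failure to round the way you expect: Python 3's round() uses
# round-half-to-even ("banker's rounding"), so ties go to the even digit.
print(round(2.5))        # 2, not 3
print(round(3.5))        # 4

# Truncation: int() simply drops the fractional part.
print(int(2.99))         # 2
```

None of these are bugs; they’re the documented, “correct” behavior of binary floating point and of round-half-to-even.  Which is exactly the point: even a program doing precisely what it was built to do will do something its user didn’t anticipate.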

The problem isn’t that the program has flaws.  Everybody knows that the program has flaws!  Just ask the developers!

The problem is that somebody somewhere thinks that it is flawless.  “Look, we ran it through the logic and the results speak for themselves,” they say, as if the Voice of the Machine can never stumble, stutter, or mispronounce a word.

The assumption of perfection – the eyes made glassy watching PowerPoint presentations – collides with the reality of oops, and the result is a trainwreck.

Now that’s half the story.

The other half is the human element: the nut behind the wheel.  And again, it’s assumptions that make the whole thing collapse.  Let’s see how it works.

Back in our scenario, we now have a perfect program!  It works flawlessly in all circumstances and there won’t be another version out until Windows becomes Doors, or whatever the next OS paradigm is.  The program is sitting there, ready to launch, and at the keyboard sits Joe Collegegrad, head filled with stuff that he’s absorbed (more or less, between parties) at school.

Does he know how to run the program?  Yes.

Does he have the manual in case he gets into trouble?  Yes.

Does he know who to ask in case the manual doesn’t say what to do?  Yes.

Does he know where the little boys’ (or girls’) room is?  Of course!  He’s an Employee, and he’s ready to Rock and Roll!

So he launches the program and sets it to work, and we notice, sooner or later, that a curious thing has happened.  You see, it really doesn’t make any difference who this Employee is, how long and how well he was trained, or how much experience he has – sooner or later, he’s gonna goof up.  He’ll sneeze and his little finger’ll hit the CONTROL key by mistake.  It’ll be something, and all of a sudden, whether or not you know it, you have errors in the data.

And again, it’s the result of a collision between the assumption (that the guy went to school and had the experience and therefore knows what he’s doing) and the reality (that sooner or later, he’s gonna trip and fall) that throws the sand into the gears.

So what can be done?  Should we throw the programs away and fire the employees who run them?  Of course not.

We should – and here’s what we’ll be talking about next time – keep the limitations of the program and the employee in mind.  Don’t assume that just because you have the program, it’ll work flawlessly every time.  Don’t assume that the guy sitting at the keyboard will be perceptive enough to see what’s going wrong, as it happens.  Having a Program and an Employee puts one in a box: the box that’s wrapped in paper printed with the slogan “It’s all OK”.

If you don’t want to live in that box, you need to cultivate the skills needed to stay out of it.

That’s what we’ll talk about next time.