Year after year, industry analysis puts Test Data on the naughty step as one of the biggest blockers in Testing. But does Test Data really deserve its bad-boy image, especially when its elder brother, data, is being cast as the new oil?
Taking the oil analogy a little further, some challenge how far the comparison holds, questioning whether organizations can find value in the data they hold to match the wealth generated from crude oil. Companies know exactly how to refine oil to create tons of value (money), and for me this is where it’s strangely hard, though I believe commonplace, to carry on the comparison, especially if you add the word Test before Data. I question whether people really understand the value they are trying to derive from Test Data.
Almost 10 years ago, I was handed what I thought was the poisoned chalice of creating a Test Data service in my organization to meet what, in the pre-GDPR days, was a Data Protection Act requirement: solving a security and regulatory problem. For those who have trodden this path, or those still to walk it in the name of GDPR, no doubt you will be planning to make the same mistake we did: trying to solve a problem perceived as solely security-related with a technology solution, by buying a Test Data masking tool to mask all your “live” records. Hopefully, I can help you avoid the mistakes we made, or at least go into it with your eyes wide open.
Test data has more than one customer
Understanding an organization’s data classifications is key to satisfying the Security customer. All data is live! It’s the personally and/or commercially sensitive data that we need to target, and it is for each organization to define what that means. Once that’s understood, technology can satisfy the requirement through masking or synthetic data capabilities.
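To make the point concrete, here is a minimal sketch of classification-driven masking in Python. Everything in it is an assumption for illustration: the field names, the classification labels, the salt and the helper functions are hypothetical, not the tooling we actually used. The idea it shows is simply that the classification decides what gets masked, not the tool.

```python
import hashlib

# Hypothetical classification map: each organization defines its own.
CLASSIFICATION = {
    "customer_id": "public",
    "name": "personal",
    "email": "personal",
    "balance": "commercial",
    "product_code": "public",
}

# Labels that this illustrative organization treats as sensitive.
SENSITIVE = {"personal", "commercial"}


def mask_value(field, value, salt="demo-salt"):
    """Replace a value with a stable token derived from a salted hash."""
    digest = hashlib.sha256(f"{salt}:{field}:{value}".encode("utf-8")).hexdigest()
    return f"{field}_{digest[:10]}"


def mask_record(record, classification=CLASSIFICATION):
    """Mask only the fields classified as sensitive.

    Fields missing from the classification map are treated as personal,
    so unclassified data is masked rather than leaked.
    """
    masked = {}
    for field, value in record.items():
        label = classification.get(field, "personal")
        masked[field] = mask_value(field, value) if label in SENSITIVE else value
    return masked


record = {
    "customer_id": 42,
    "name": "Ada Lovelace",
    "email": "ada@example.com",
    "balance": 1050.25,
    "product_code": "ISA",
}
masked = mask_record(record)
```

Because the token is derived from the original value, the same input always masks to the same output, which keeps joins between tables intact. A real masking tool would usually also preserve the format and type of each field, which this sketch does not attempt.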
But we have more than one customer for test data. Security is satisfied: we are keeping the right data safe and meeting the regulatory requirements expected of us. Testers, however, are forgotten, or they might be unintentionally excluding themselves. If I had a pound for every time I heard a tester say “I just need 100 rows of generic data” and later raise a data-related defect because the data did not contain the edge case they were supposedly proving, I would be very wealthy. Structured testing, and indeed the need for defined, predictable outcomes when designing tests, is key to delivering effective data. Obvious, right? Something that we are all doing?
“You have to understand, most IT Managers / Practitioners are not ready to be unplugged. And many of them are so inured, so hopelessly dependent on the system, that they will fight to protect it.” (Adapted from Morpheus in The Matrix, to set the scene.)
Stuck in the Matrix of technology
Most of us would identify the industry we work in as IT, or Information Technology to give it its proper title. But all too often, strategies seem to lose the Information element in favor of majoring on technology. Over time we’ve lost the “I” part, for which the “T” was an enabler. We’ve stopped refining data into information and we seem oblivious to it, instead finding comfort in upgrading technology or looking at the next technology solution. The Matrix has got us! Take the blue pill and the story ends, and you believe whatever you want to believe. Take the red pill and I show you how deep our rabbit hole goes.
Creating test data is far more than “just mask it”, and sadly that one saying is often the biggest barrier to overcome with the many stakeholders who believe test data is just a test data problem. Indeed, I believe the fact that test data is called out as a blocker at all reflects a misunderstanding of the role data really plays in the ecosystem of testing. Data is unrefined oil. As a tester, although I will ask you for oil, I really want petrol. Refined data is information, which in turn gives me knowledge, and with knowledge comes wisdom (Fig 1). I’m starting to unlock value... maybe data is the new oil!
Data is the new Oil!
The raw product of data in Test can be seen as simply an item that triggers behavior in a system. If I know which behavior I want to trigger, getting data is easy; in fact, it’s a commodity. I can mass produce it and really cash in on the value I’m unlocking: at the time of writing, my small data team has delivered in excess of 4 billion masked or synthetic data items this calendar year.
If I don’t know what data I need, then I lack the knowledge to create it. More worryingly, I should be concerned about my ability to prove the system I’m being asked to test. Do your testers have a good understanding of the test data they ask for?
The most obvious warning sign is often the very sales pitch of many data masking vendors. Masking a full copy of production for functional testing is simply not needed, and is a waste of time and resources. You can assume a ratio of roughly 1:1 of Test Case to Test Data, so why would you need millions of records unless you are running millions of tests? I mentioned we made technology mistakes, and one was missing this warning sign: we created a technology solution to mask an entire data store of 20 million records, mapped to no test cases whatsoever. After learning from our mistakes, that mammoth masking exercise is now a lean, profiled 200k gold data set made up of masked, conditioned and synthetic data. More importantly, that data set is profiled against a well-defined set of test cases where the objective of each test is clear and the data is just a means to exercise the desired outcome of the test. Back to the commodity point: it is repeatable, ready for a world of continuous testing in my DevOps pipelines.
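As a rough illustration of what profiling a data set against test cases could look like, here is a small Python sketch. The test case IDs, objectives, record fields and the profile function are all hypothetical, invented for this example rather than taken from our gold data set. Each test case states the data it needs as a simple rule, and the profile reports two things: test cases with no data, and data mapped to no test case.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CaseSpec:
    """A test case together with the data it needs (illustrative only)."""
    case_id: str
    objective: str
    needs: Callable[[dict], bool]  # returns True when a record suits this case


def profile(cases, records):
    """Map records to test cases and report the gaps on both sides."""
    matches = {}
    used = set()
    for case in cases:
        ids = [r["id"] for r in records if case.needs(r)]
        matches[case.case_id] = ids
        used.update(ids)
    uncovered = [case_id for case_id, ids in matches.items() if not ids]
    unmapped = [r["id"] for r in records if r["id"] not in used]
    return matches, uncovered, unmapped


# Hypothetical test cases and records.
cases = [
    CaseSpec("TC-01", "Overdrawn account is charged a fee", lambda r: r["balance"] < 0),
    CaseSpec("TC-02", "Minor cannot open a loan", lambda r: r["age"] < 18),
    CaseSpec("TC-03", "Closed account rejects deposits", lambda r: r["status"] == "closed"),
]
records = [
    {"id": "R1", "balance": -20, "age": 40, "status": "open"},
    {"id": "R2", "balance": 500, "age": 16, "status": "open"},
    {"id": "R3", "balance": 100, "age": 30, "status": "open"},
]

matches, uncovered, unmapped = profile(cases, records)
```

In this toy data, TC-03 has no closed account to run against, and record R3 serves no test case at all. Scale that second finding up and you have our 20 million masked records mapped to nothing.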
More than production
The data set is more than a profile of production, and of course it is: the spread of production data is very happy path. In testing we need data that production simply does not carry, to trigger negative scenarios that would never exist in production (ironically, if such data did exist there, our past testing efforts would have missed something).
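One simple way to get data that production does not carry is to start from a valid record and deliberately break one rule at a time. The Python sketch below assumes a hypothetical account record and a hand-written list of invalid values per field. None of these names or values come from a real system.

```python
import copy

# Hypothetical happy-path record, the kind production is full of.
BASE = {"account_id": "A-1001", "age": 34, "balance": 250.00, "postcode": "AB1 2CD"}

# Hypothetical invalid or boundary values, chosen per field by the test designer.
INVALID_VALUES = {
    "age": [-1, 17, 151],
    "balance": [-0.01],
    "postcode": ["", "NOT A POSTCODE"],
}


def negative_variants(base, invalid_values):
    """Build copies of a valid record, each breaking exactly one field."""
    variants = []
    for field, bad_values in invalid_values.items():
        for bad in bad_values:
            record = copy.deepcopy(base)  # leave the base record untouched
            record[field] = bad
            variants.append({"breaks": field, "record": record})
    return variants


variants = negative_variants(BASE, INVALID_VALUES)
```

Each variant records which field it breaks, so the expected outcome of the test stays predictable: one broken rule, one negative scenario.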
The Evolution of Test Data Management
An effective, mature Test Data Management strategy (Fig 2) has to be more than simply creating masked or synthetic data. In fact, I would go further and say a standalone data strategy is doomed to fail. Much like Test Automation, Test Data has to be an integral part of an enterprise Test Strategy, where data is an intrinsic enabler and any blockers perceived as test data problems are truly understood and traced to their origin. In our experience, the symptom of a test data issue really comes back to a failure to acknowledge that a knowledge gap sits within the IT system we are custodians of. Having an effective Test Data capability is a great sign that your organization is facing into the Matrix. Don’t get caught in its grip, and beware: the Matrix had sequels, and so might the Technology Matrix. In the first instance choose “I” over “AI”. If you don’t know, what chance has the robot ever got of being trained to make the right decision?
Written by Richard Jordan, Test Engineering Manager at Nationwide.