Big data applications

Richard J Self, Research Fellow – Big Data Lab, University of Derby, examines the role of software testing in the achievement of effective information and corporate governance.

As a reminder, software testing is about both verifying that software meets its specification and validating that the software system meets the business requirements. Most of the activity of software testing teams is aimed at verifying that the code meets the specification. A small amount of validation occurs during user acceptance testing, at which point it is normal to discover many issues where the system does not do what the user needs or wants.

It is only too clear that current approaches to software testing have not, so far, guaranteed successful systems development and implementation.

IT project success

The Standish Group have been reporting annually on the success and failure of IT-related projects since the original CHAOS report of 1994, using major surveys of projects of all sizes. They use three simple definitions, of successful, challenged and failed projects, as follows:

Project successful:

The project is completed on time and on budget, with all features and functions as initially specified.

Project challenged:

The project is completed and operational but over‑budget, over the time estimate, and offers fewer features and functions than originally specified.

Project failed:

The project is cancelled at some point during the development cycle.

Because of significant disquiet amongst CIOs about a definition of success that required delivering the contracted functionality in a globalised and rapidly changing world, the Standish Group changed the definition in 2013 to:

Project successful:

The project is completed on time and on budget, with a satisfactory result.

This is, in some ways, a lower bar.

As the graph in Figure 1 shows, the levels of project success, challenge and failure have remained remarkably stable over time.

Figure 1. Standish Group IT project success and failure rates.

It is clear that, as an industry, IT is remarkably unsuccessful in delivering satisfactory products. Estimates of the resulting costs of challenged and failed projects range from approximately US$500 billion to US$6 trillion, compared with an annual ICT spend of US$3 trillion and a world GDP of approximately US$65 trillion.

Clearly something needs to be done.

The list of types of systems and software failures is too long to include here, but a few examples are the recent announcements by Yahoo of the loss of between 500 and 700 million sets of personal data in 2012 and 2014, the loss of 75 million sets of personal and financial data by Target in 2013, and the regular failures of operating system updates for iOS and Windows.

Common themes, verification and validation

Evaluating the primary causes of this long list of failures suggests some common themes, including incomplete requirements capture, unit testing failures, volume test failures caused by test environments and data sets that are too small, inappropriate HCI (human-computer interaction) factors, and an inability to understand effectively what machine learning is doing.

If we use the waterfall process as a way of understanding what is fundamentally happening, even in agile and DevOps approaches, we can see that software verification happens close to the end of the process, just before implementation.

As professionals, we recognise that there is little effective verification and validation activity earlier in the process.

The fundamental question for systems developers is, therefore, whether the skills and processes of software testing can be brought forward into earlier stages of the systems development cycle, so as to ensure more effectively that requirements specifications, architectures and designs, software, data structures, interfaces, APIs and so on are fully verified and validated.

Impact of big data

As we move into the world of big data and the internet of things, the problems become ever more complex and important. We have the three traditional Vs of big data: velocity, volume and variety. These stress the infrastructure, make it harder to keep data dictionaries consistent between the various database silos, and undermine our ability to guarantee valid and correct connections between corporate master data and the data found in other databases and social media.

Improved project governance

If the IT industry is to become more successful, it needs stronger information and project governance, based on a holistic approach to the overall project, that ensures a more effectively validated requirements specification and far more effectively verified and validated non-functional requirements, especially in the areas of security by design and human-computer interfaces.

It is also vital to ensure that adequate contingencies are added to project estimates. The 2001 Extreme Chaos report observed that, for many of the successful projects, the IT executives took the best estimates, multiplied them by 2 and then added another 50%. This is in direct contrast to most modern projects, where the best and most informed estimates are reduced by some large percentage and a ‘challenging target’ is presented to the project team. Inevitably, the result is a challenged or failed project.
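The Extreme Chaos rule of thumb is easy to misread, so the arithmetic is worth spelling out. The following minimal Python sketch assumes that the extra 50% is applied to the doubled figure, giving a net multiplier of 3; the function names and the 25% cut used to model a ‘challenging target’ are hypothetical values chosen only for illustration.

```python
def contingency_estimate(best_estimate: float) -> float:
    """Apply the 'multiply by 2, then add another 50%' contingency rule.

    Assumption: the extra 50% is applied to the doubled figure,
    giving a net multiplier of 2 * 1.5 = 3.
    """
    if best_estimate < 0:
        raise ValueError("estimate must be non-negative")
    doubled = best_estimate * 2
    return doubled * 1.5


def challenging_target(best_estimate: float, cut_fraction: float) -> float:
    """Model the common practice of cutting the best estimate by some fraction."""
    if not 0 <= cut_fraction < 1:
        raise ValueError("cut_fraction must be in [0, 1)")
    return best_estimate * (1 - cut_fraction)


if __name__ == "__main__":
    best = 100.0  # hypothetical best estimate, e.g. in person-days
    print(contingency_estimate(best))      # 300.0
    print(challenging_target(best, 0.25))  # 75.0 (hypothetical 25% cut)
```

Under these assumptions the two practices differ by a factor of four: the contingency-based plan allows 300 units where the ‘challenging target’ allows 75.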

If we can achieve more effective project governance, with effective verification and validation of all aspects from the beginning of the project, the rewards will be very large: much more successful software that truly meets the needs of all the stakeholders involved.

12 Vs of project governance and big data

One effective approach is to develop a set of questions that can be asked of the various stakeholders, the requirements, the designs, the data, the technologies and the processing logic.

In the field of information security, ISO 27002 provides a very wide range of questions that can help an organisation of any size identify the most important issues it needs to address. By analogy, a set of 12 Vs has been developed at the University of Derby. Known as the ‘12 Vs of IT Project Governance’, they pose 12 critical questions that can be used both with big data and IoT projects and with more traditional projects.

The 12 Vs are:

Volume (size).

Velocity (speed).

Variety (sources/format/type).

Variability (temporal).

Value (what/whom/when?).

Veracity (truth).

Validity (applicable).

Volatility (temporal).

Verbosity (text).

Vulnerability (security/reputation).

Verification (trust/accuracy).

Visualisation (presentation).

As an example, the Value question leads towards topics such as:

Is the project really business-focused? What questions can the project answer, and will the answers really add value to the organisation? Who will benefit, and what is the benefit? Is it monetary? Is it usability? Is it tangible or intangible?

What is the value that can be found in the data? Is the data of good enough quality?

The Vulnerability question leads towards: Is security designed into the system, or added as an afterthought? A security failure can have major consequences, including significant damage to reputation.

Incorrect processing also leads to reputational damage.

The Veracity question is developed from the observation by J Easton [2] that 80% of all data is of uncertain veracity: we cannot be certain which data are correct or incorrect, nor by how much the incorrect data are incorrect (the magnitude of the errors).

Data sourced from social media is of highly uncertain veracity: it is difficult to detect irony, and people lie and change their likes and dislikes. Data from sensor networks suffers from random levels of sensor calibration drift over time, and smart device location services using assisted GPS have very variable levels of accuracy. A fundamental question that needs to be asked of all these data is: how can our ETL processes detect the anomalies? A second question is: to what extent do undetected errors affect the Value of the analyses and the decisions being made?
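The first of these questions can be made more concrete. The following Python sketch shows three simple, generic checks that an ETL stage might apply to sensor readings: a plausible-range check, a robust outlier test based on the median absolute deviation (MAD), and a crude drift check comparing early and recent averages. The function names, thresholds and window sizes are hypothetical assumptions for illustration; no such checks can catch every error, which is exactly why the second question matters.

```python
from statistics import mean, median
from typing import List, Sequence


def range_violations(readings: Sequence[float], low: float, high: float) -> List[int]:
    """Indices of readings outside a plausible physical range (a basic validity check)."""
    return [i for i, r in enumerate(readings) if not low <= r <= high]


def robust_outliers(readings: Sequence[float], threshold: float = 3.5) -> List[int]:
    """Indices whose modified z-score exceeds `threshold`.

    The modified z-score uses the median and the MAD, so a few extreme
    values cannot hide themselves by inflating the spread, as they can
    with a mean/standard-deviation test.
    """
    med = median(readings)
    mad = median(abs(r - med) for r in readings)
    if mad == 0:
        return []  # no spread to measure against; skip rather than divide by zero
    return [i for i, r in enumerate(readings)
            if 0.6745 * abs(r - med) / mad > threshold]


def drift_detected(readings: Sequence[float], window: int, tolerance: float) -> bool:
    """True if the mean of the last `window` readings differs from the mean
    of the first `window` readings by more than `tolerance`.

    A sustained shift like this may indicate calibration drift rather than noise.
    """
    if window <= 0 or len(readings) < 2 * window:
        raise ValueError("need a positive window and at least two full windows of data")
    baseline = mean(readings[:window])
    recent = mean(readings[-window:])
    return abs(recent - baseline) > tolerance
```

In a pipeline, records flagged by checks like these would typically be routed for review rather than silently dropped, and the thresholds (3.5 is a commonly quoted default for the modified z-score) would need tuning for each data source.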

Formal testing of BI and analytics

One further fundamental issue, identified by the attendees at The Software Testing Conference North 2016 [3], was that formal software testing teams are very rarely involved in big data analytics projects. The data scientists, apparently, ‘do their own thing’, and the business makes many business-critical decisions based on their ‘untested’ work. One attendee commented that the models developed by the data scientists produced different results depending on the order in which the data were presented, when the results should have been independent of the sequence.
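The order-dependence problem described by that attendee is straightforward to test for. The Python sketch below shows one possible metamorphic-style check: shuffle the input data and confirm that the model's output does not change beyond a small tolerance. The two toy models (`least_squares_slope`, `sgd_slope`) and the helper `is_order_independent` are hypothetical illustrations, not the data scientists' actual models; single-pass stochastic gradient descent is used only because it is a simple, well-known example of an order-dependent computation.

```python
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def least_squares_slope(data: Sequence[Point]) -> float:
    """Order-independent toy model: ordinary least-squares slope of y on x."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    return sxy / sxx


def sgd_slope(data: Sequence[Point], lr: float = 0.01) -> float:
    """Order-dependent toy model: one pass of stochastic gradient descent
    fitting y = w * x, updating w after each point in turn."""
    w = 0.0
    for x, y in data:
        w -= lr * 2 * (w * x - y) * x  # gradient of (w*x - y)**2 with respect to w
    return w


def is_order_independent(model: Callable[[List[Point]], float],
                         data: Sequence[Point],
                         trials: int = 20,
                         tol: float = 1e-9,
                         seed: int = 0) -> bool:
    """Metamorphic check: shuffling the input must not change the output
    by more than `tol` (allowing for floating-point rounding)."""
    rng = random.Random(seed)  # fixed seed keeps the test repeatable
    reference = model(list(data))
    for _ in range(trials):
        shuffled = list(data)
        rng.shuffle(shuffled)
        if abs(model(shuffled) - reference) > tol:
            return False
    return True
```

With a small data set such as `[(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]`, the check passes for the least-squares model and fails for the single-pass gradient-descent model, whose answer depends on the order in which the points arrive. In practice the tolerance should reflect the differences that are acceptable for the business decision being made.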

In conclusion, the fundamental challenge for the testing profession is to determine how its skills, knowledge, experience, processes and procedures can be applied earlier in the development lifecycle, in order to deliver better validated and verified projects that qualify as a ‘successful project’ (in Standish Group terms). Are there opportunities to ensure more comprehensive and correct requirements specifications?

This article is based on the presentation delivered on the 28th September 2016 at The Software Testing Conference North 2016.

This article first appeared in the November 2016 issue of TEST Magazine. Edited for web by Jordan Platt.



1. PMBOK Guide, 4th edition.
2. IBM, 2012.