Software Testing Challenges in AI-based Healthcare Domains

Artificial intelligence (AI) applications in healthcare are growing rapidly, with potential uses being demonstrated across various domains of medicine and healthcare. However, there are currently few software testing tools and strategies for testing AI-based healthcare products. This article explores the main challenges and limitations of software testing in AI-based healthcare domains and considers the steps required to improve current testing strategies, which could help move these technologies from research into clinical practice.



“The exciting promise of artificial intelligence (AI) in healthcare has been widely reported, with potential applications across many different domains of medicine” (Topol, 2019). Healthcare organizations around the world have welcomed AI-based healthcare systems. Nevertheless, they have not yet realized the full potential of AI.

However, a growing number of products and systems have started to use AI algorithms in clinical practice. The following section gives a brief overview of AI in healthcare.


The growing potential of artificial intelligence in healthcare

A growing body of literature has demonstrated various applications of AI in healthcare, in areas such as radiograph interpretation, cancer, and other long-term health conditions (Hwang et al., 2019).

“Analysis of the immense volume of data collected from electronic health records (EHRs) offers promise in extracting clinically relevant information and making diagnostic evaluations (Escobar et al., 2016) as well as in providing real-time risk scores for transfer to intensive care (Liang et al., 2016), predicting in-hospital mortality, readmission risk, prolonged length of stay and discharge diagnoses (Rajkumar et al., 2019), predicting future deterioration, including acute kidney injury (Tomašev et al., 2019), improving decision-making strategies, including weaning of mechanical ventilation (Prasad et al., 2019) and management of sepsis, and learning treatment policies from observational data.”

AI has great potential to improve healthcare outcomes rapidly. AI tools and software will shape the future of healthcare delivery with a more individualized, patient-centered approach.


Software testing in AI and ML

Testing is a core element of developing machine learning and AI algorithms. It can be compared to unit testing in conventional software testing. AI/ML engineers develop an algorithm and verify that the model built from the training data does a good enough job of classifying or regressing data, generalizing well without overfitting or underfitting. Engineers also use validation techniques, which play a role similar to test data in software testing.
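The generalization check described above can be sketched as a comparison between training accuracy and validation accuracy. This is a minimal illustration rather than a prescribed method: the function names `accuracy` and `diagnose_fit` and the thresholds `min_accuracy` and `max_gap` are hypothetical and would need to be agreed for each project.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal length")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def diagnose_fit(train_acc, val_acc, min_accuracy=0.80, max_gap=0.10):
    """Classify a model's fit from its training and validation accuracy.

    The default thresholds are illustrative assumptions, not clinical standards.
    """
    if train_acc < min_accuracy:
        return "underfitting"  # the model does not even fit the training data
    if train_acc - val_acc > max_gap:
        return "overfitting"   # fits the training data but generalizes poorly
    return "acceptable"


# Hypothetical predictions and labels for a binary classifier.
train_acc = accuracy([1, 0, 1, 1], [1, 0, 1, 1])
val_acc = accuracy([1, 0, 0, 0], [1, 1, 0, 1])
verdict = diagnose_fit(train_acc, val_acc)
```

In this made-up case the model is perfect on the training data but right only half the time on validation data, so the gap between the two flags it as overfitting.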

AI-based software combines algorithms and data with hyperparameter configuration and associated metadata, all of which work together to produce the results. If the validation phase shows that the parameters are wrong, the results we are looking for may be affected. To get more accurate results, the engineer needs to revisit the algorithm itself, change the hyperparameters, and rebuild the model, perhaps with better training data. This can be compared to system testing, in which the tester works to understand the behaviour of the system.
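The tune-and-rebuild loop described above might look like the following sketch. It assumes a deliberately simple model in which a single decision threshold stands in for a hyperparameter; the function names, the candidate values, and the validation data are all hypothetical.

```python
def predict(scores, threshold):
    """Flag a case as positive (1) when its risk score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def validation_accuracy(scores, labels, threshold):
    """Accuracy of the thresholded predictions on the validation set."""
    preds = predict(scores, threshold)
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def tune_threshold(scores, labels, candidates):
    """Return the (threshold, accuracy) pair with the best validation accuracy."""
    results = [(t, validation_accuracy(scores, labels, t)) for t in candidates]
    return max(results, key=lambda pair: pair[1])


# Hypothetical validation data: risk scores and true outcomes.
val_scores = [0.10, 0.35, 0.40, 0.65, 0.80, 0.90]
val_labels = [0, 0, 0, 1, 1, 1]

best_threshold, best_accuracy = tune_threshold(val_scores, val_labels, [0.3, 0.5, 0.7])
```

Each candidate value is scored against the same validation set, and the best one is kept. Real models have many more hyperparameters, but the loop of changing a setting, rebuilding, and re-validating is the same.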

In addition, AI engineers may need to do a good deal of work to understand the behaviour of the algorithms. Sometimes the algorithms and models work well in development, yet produce many errors once they are deployed to the real world, even though we did everything we were supposed to do in the training phase. The model met expectations during training, but it fails in the “inference” phase, when the model is operationalized. This means we need a QA approach for dealing with models in production.
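One possible ingredient of a QA approach for models in production is a check that the data seen at inference time still resembles the training data. The sketch below is one simple option among many, assuming a single numeric input feature; the function names and the `z_limit` value are hypothetical.

```python
import math
import statistics


def mean_shift_z(training_values, production_values):
    """Z-score of the production mean relative to the training distribution."""
    mu = statistics.mean(training_values)
    sigma = statistics.stdev(training_values)
    if sigma == 0:
        raise ValueError("training values have zero variance")
    standard_error = sigma / math.sqrt(len(production_values))
    return (statistics.mean(production_values) - mu) / standard_error


def has_drifted(training_values, production_values, z_limit=3.0):
    """True when the production mean is implausibly far from the training mean."""
    return abs(mean_shift_z(training_values, production_values)) > z_limit


# Hypothetical patient ages seen in training and in two production batches.
training_ages = [50, 55, 60, 65, 70]
similar_batch = [58, 62, 61, 59]
younger_batch = [20, 25, 30, 22]

similar_flag = has_drifted(training_ages, similar_batch)
younger_flag = has_drifted(training_ages, younger_batch)
```

A batch of much younger patients than the model was trained on is flagged, while a batch with a similar age profile is not. A flag of this kind does not prove the model is wrong, only that its training-phase results may no longer apply.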

The next section will discuss the QA approaches in AI-based healthcare domains.


Software testing approach in AI-ML-based Healthcare Domains

Standard healthcare domain testing is the process of testing healthcare applications against factors such as safety, compliance, and cross-dependency with other entities. The tester ensures that the quality, reliability, performance, safety, and efficiency of the healthcare application are in place and that the software behaves as expected. Current AI-based tools and software come with algorithmic and logical tests that AI engineers have already carried out.

However, the challenging part for the tester is to test how the algorithm behaves within the software and the wider system. QA teams need domain knowledge of healthcare systems and of algorithms, and an understanding of how the two work together. Testing an AI-based healthcare solution requires a strategic approach tailored to each specific scenario and to the needs of the healthcare system.

Most healthcare algorithms are complex, and their behaviour is difficult for software testers to predict. The algorithm is run through training and testing sets, producing meaningful output associated with human behaviours. An insufficient, incomplete, or low-quality data set can lead to biases in the solution: the system may be over-trained to see the same thing, or not trained enough to make an accurate judgment.
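One way a tester might look for the data gaps described above is to check how well each patient group is represented in a data set. The following is a minimal sketch under stated assumptions: the attribute name `age_band`, the function name, and the 10% `min_share` cut-off are hypothetical.

```python
from collections import Counter


def underrepresented_groups(records, attribute, min_share=0.10):
    """Return the groups whose share of the records is below min_share."""
    counts = Counter(record[attribute] for record in records)
    total = sum(counts.values())
    return sorted(group for group, n in counts.items() if n / total < min_share)


# Hypothetical data set: 19 working-age patients and 1 older patient.
records = [{"age_band": "18-64"}] * 19 + [{"age_band": "65+"}]

flagged = underrepresented_groups(records, "age_band")
```

Here the older age band makes up only 5% of the records, so it is flagged. A model trained on such data may not have seen enough cases from that group to judge them accurately.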

Another challenge that testers face when testing AI-based healthcare systems is the amount of data required. Testing with a restricted set of data items will not provide statistical assurance about the system. This leads to a further challenge: what skills should a tester have, and how should they interact with systems of this complexity?
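The link between the amount of test data and statistical assurance can be illustrated with a confidence interval for a measured accuracy. The sketch below uses the Wilson score interval, which is one common choice and not something the approach above prescribes; the function name and the example counts are hypothetical.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials ** 2)) / denom
    return centre - half_width, centre + half_width


# The same 90% observed accuracy, measured on 10 cases and on 1000 cases.
small_low, small_high = wilson_interval(9, 10)
large_low, large_high = wilson_interval(900, 1000)
```

With only 10 test cases, an observed accuracy of 90% is compatible with a true accuracy anywhere from roughly 60% to 98%. With 1000 cases the interval is far narrower, which is why a small test set gives little assurance.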


Which skills and approaches are needed for AI-ML-based healthcare domain testing?

As discussed above, creating and training the algorithms is a partly manual, partly automated process that already includes some testing elements. Testers mainly use boundary testing and dual coding to resolve most of the issues related to complexity. Testers need some knowledge of data, and familiarity with algorithms is an essential skill.
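Boundary testing, mentioned above, can be sketched as follows. The example assumes a hypothetical rule that flags a fever at or above a fixed temperature; the threshold, the plausible range, and the function names are illustrative only and are not clinical guidance.

```python
FEVER_THRESHOLD_C = 38.0  # hypothetical threshold, for illustration only


def flags_fever(temperature_c):
    """Return True when the temperature reaches the (hypothetical) threshold."""
    if not 25.0 <= temperature_c <= 45.0:
        raise ValueError("temperature outside plausible range")
    return temperature_c >= FEVER_THRESHOLD_C


def boundary_cases(threshold, step=0.1):
    """Values just below, exactly at, and just above the threshold."""
    return [round(threshold - step, 1), threshold, round(threshold + step, 1)]


# Exercise the rule on each side of the boundary and on the boundary itself.
results = {t: flags_fever(t) for t in boundary_cases(FEVER_THRESHOLD_C)}
```

The test values sit immediately around the threshold because off-by-one mistakes, such as using `>` instead of `>=`, only show up there. Out-of-range inputs are rejected rather than silently classified.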

Sometimes, because of the algorithm used, the data volumes, or the complexity of the solution, testing these systems can be as complex as the solutions themselves. It requires extensive technical and data science expertise from the testers, which makes the AI tester’s job different from that of a manual or automation tester. However, an agile and interdisciplinary approach to the testing process helps testers understand more about the algorithms and their functionality.

In conclusion, the number of AI-based healthcare products will keep growing day by day. As testers, we need to be ready to test some of the most complex algorithms and logic, work that could potentially save lives and protect people.


Article written by Dr. Ali Yildirim, Oxford Health EHR Software Test Lead