09 Jan 24

Machine Learning in Healthcare: Cancer Detection

Max Collins

Skin cancer is a prevalent and wide-ranging health concern, affecting millions of individuals globally. As with most illnesses, early detection of cancer significantly improves the likelihood of successful treatment. Because of this, it is important to diagnose as quickly and accurately as possible. Machine Learning in healthcare has started to play a pivotal role in this regard.

Dermatologists traditionally rely on visual inspection for diagnosing skin conditions. However, the human eye has limitations, especially when dealing with subtle or complex patterns. The various categories of skin lesions can look very similar to a person, and especially under time pressure a misdiagnosis can be easily made.

Machine Learning in Healthcare: Skin Cancer Detection

Overview of Image Classification

Image classification is a machine-learning technique that assigns images to a set of predefined labels or categories. This powerful tool has found applications across various fields, including medicine, where it proves invaluable in automating and augmenting diagnostic processes.

With the current advances in machine learning and especially neural networks, image classification is more suited than ever for faster and potentially more accurate classification of potential skin cancer cases. By using machine learning in healthcare, we can negate many disadvantages of traditional early cancer screening methods and invite new advantages, saving lives.

One of the primary advantages of employing machine learning in skin cancer diagnosis is the potential to save time and reduce the strain on healthcare resources. Automated image classification would be nearly instantaneous and, even if it served only as a first opinion, it would free up time, enabling healthcare professionals to focus on more complex cases and patient care.

Finding a Dataset

The foundation of any machine learning project is a dataset of sufficient quality and quantity. You need enough good-quality data that the machine learning algorithm can learn the differences between the categories it needs to distinguish. For this project, the HAM10000 dataset (published on Kaggle as "Skin Cancer MNIST: HAM10000") is available, which is particularly beneficial for developing a skin cancer detection AI. This dataset includes over 10,000 images of skin lesions with their associated diagnoses, providing a diverse and comprehensive set for training and evaluation.

The dataset includes examples of seven of the most common categories of skin lesions:

Actinic Keratoses: Precancerous growths induced by sun exposure.
Basal Cell Carcinoma: A common form of skin cancer found in sun-exposed areas.
Benign Keratosis-like Lesions: Non-cancerous skin lesions, including benign warty growths.
Dermatofibroma: Benign, hard nodules, often on the lower legs.
Melanoma: A malignant form of skin cancer originating from melanocytes, and a primary focus of skin cancer detection efforts.
Melanocytic Nevi: Common, benign moles (nevi).
Vascular Lesions: Non-cancerous blood vessel tumours and port-wine stains.

As well as the images, the dataset includes additional patient information such as age, sex, and the specific location of the lesion. This additional data can also be used by the image classification model, potentially increasing its accuracy. For machine learning in healthcare, such comprehensive data is invaluable.

Exploring the Dataset

As with any data-based project, the first step was to perform exploratory data analysis (EDA). This foundational process is crucial for machine learning in healthcare: you go through the dataset looking for quality issues and interesting high-level insights. During this, I noticed that the dataset is predominantly composed of melanocytic nevi (moles). Given the widespread occurrence of benign moles, this composition aligns with expectations and mirrors patterns often encountered in skin cancer detection AI projects.

However, the prevalence of one category raises concerns about dataset imbalance. Addressing this imbalance is critical to prevent biased model training: left alone, the model may simply favour the majority class because doing so yields good accuracy. As an extreme example, imagine a dataset where 90% of the images belong to one category. By always predicting that class, the model would achieve 90% accuracy without having learnt what we wanted, and it would be heavily biased. The main ways of addressing this are to resample the dataset to balance it, assign different weights to each class, or use a different performance metric.
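The 90% example above can be reproduced in a few lines. This is a minimal sketch on made-up labels, not the project's code: the label names and counts are assumptions chosen to mirror the example. The class-weight formula, n_samples / (n_classes * class_count), is the common "balanced" heuristic, and balanced accuracy (the mean of per-class recall) stands in for "a different performance metric".

```python
from collections import Counter

# Hypothetical labels mirroring the 90% example: 90 moles ("nv"), 10 melanomas ("mel").
labels = ["nv"] * 90 + ["mel"] * 10


def majority_baseline_accuracy(labels):
    """Accuracy of a model that always predicts the most common class."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    recalls = []
    for cls in set(y_true):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


always_nv = ["nv"] * len(labels)  # a "model" that only ever predicts the majority class

print(majority_baseline_accuracy(labels))    # 0.9
print(balanced_class_weights(labels)["mel"])  # 5.0
print(balanced_accuracy(labels, always_nv))  # 0.5
```

The same lazy model that scores 90% on plain accuracy scores only 50% on balanced accuracy, which is why the choice of metric matters as much as the resampling or weighting strategy.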

Another noteworthy aspect is the dataset’s predominant representation of images of Caucasian skin. This limitation prompts consideration of potential challenges in accurately classifying skin lesions in individuals with darker skin tones.

Convolutional Neural Networks

Convolutional Neural Networks (CNNs) work especially well with image data, making them the architecture of choice for image classification tasks. As the name suggests, CNNs employ convolutional layers, which extract patterns from images.
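To make "extracting patterns" concrete, here is a minimal pure-Python sketch of what a single convolutional filter computes. The toy image and the hand-written edge kernel are assumptions for illustration; in a real CNN the kernel values are learnt during training, and the operation runs over many filters at once in an optimised library.

```python
def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation, the operation deep-learning libraries call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Toy 4x4 greyscale patch: dark on the left, bright on the right.
patch = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-written vertical-edge kernel (illustrative; a CNN would learn these values).
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d(patch, vertical_edge)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The output is non-zero only where the kernel sits across the dark-to-bright boundary, which is the sense in which a convolutional layer responds to a specific pattern wherever it appears in the image.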

Pretrained Models

Because the dataset is large and made up of images, which hold a lot of data, training for this task can be lengthy. To speed it up, pre-trained models such as VGG16 and MobileNet can be used. These have been trained for general image classification on much larger datasets. Although they haven't been trained specifically on skin lesion classification, much of their training is still useful. Transfer learning allows these models to be adapted for skin cancer classification: their general feature extraction capabilities are kept, while they are trained to distinguish the categories for this task. They proved a very useful tool for quickly building effective models.
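As a rough sketch of what transfer learning can look like in Keras (assuming TensorFlow is installed; this is not the exact code used in the project), the pre-trained MobileNet is frozen and only a new seven-way classification head is trained. The input size, pooling layer, optimiser, and loss are all assumptions.

```python
NUM_CLASSES = 7              # the seven lesion categories in the dataset
INPUT_SHAPE = (224, 224, 3)  # assumed input size (MobileNet's default)


def build_transfer_model(num_classes=NUM_CLASSES, input_shape=INPUT_SHAPE, weights="imagenet"):
    """Frozen MobileNet feature extractor with a new, trainable classification head."""
    import tensorflow as tf  # imported lazily so the constants above load without TensorFlow

    base = tf.keras.applications.MobileNet(
        include_top=False, weights=weights, input_shape=input_shape
    )
    base.trainable = False  # keep the general-purpose features fixed

    inputs = tf.keras.Input(shape=input_shape)
    # MobileNet's preprocessing expects raw pixel values in the 0-255 range.
    x = tf.keras.applications.mobilenet.preprocess_input(inputs)
    x = base(x, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # assumes integer class labels
        metrics=["accuracy"],
    )
    return model
```

Calling build_transfer_model() downloads the ImageNet weights on first use. Because only the final dense layer is trainable, each epoch is much cheaper than training the whole network from scratch.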

Custom Model Architectures

While pre-trained models offer training efficiency, I still wanted to try some custom architectures that I could tailor to the specifics of this task. Because the dataset has additional information about the lesion, such as patient age, sex, and lesion location, I wanted to utilize multi-input neural networks. These combine a convolutional network for the image with a regular, fully connected deep neural network for the patient data, allowing all of the information in the dataset to be used.
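The idea behind a multi-input network can be sketched without any deep-learning library: the patient data is turned into a numeric vector and joined to the features coming out of the convolutional branch. The category lists, the age scaling constant, and the function names below are illustrative assumptions, not the dataset's exact values or the project's code.

```python
SEX_VALUES = ["female", "male", "unknown"]                           # assumed categories
SITE_VALUES = ["back", "face", "lower extremity", "trunk", "other"]  # illustrative subset
MAX_AGE = 100.0                                                      # assumed scaling constant


def one_hot(value, vocabulary):
    """Return a vector with 1.0 at the position of value and 0.0 elsewhere."""
    return [1.0 if value == v else 0.0 for v in vocabulary]


def encode_metadata(age, sex, site):
    """Turn one patient record into a flat numeric vector for the dense branch."""
    sex = sex if sex in SEX_VALUES else "unknown"
    site = site if site in SITE_VALUES else "other"
    return [age / MAX_AGE] + one_hot(sex, SEX_VALUES) + one_hot(site, SITE_VALUES)


def combine(image_features, metadata_features):
    """Concatenate the outputs of the two branches before the final classifier."""
    return list(image_features) + list(metadata_features)


meta = encode_metadata(age=55, sex="male", site="back")
print(meta)  # [0.55, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]

# Stand-in for the CNN branch output; a real model would produce hundreds of values.
joint = combine([0.1, 0.2], meta)
print(len(joint))  # 11
```

In a real network the concatenation happens inside the model, so the layers after it learn how much weight to give the image and the patient data respectively.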

Grad-CAM: A Visual Insight into Model Interpretation

When using machine learning in healthcare, model transparency is a very common issue. It is often unclear what pattern the network is using to classify the input. For these images, it is important to check that the model is focused on the part of the image that contains the lesion, as this indicates that its diagnosis is actually based on the lesion. Grad-CAM (Gradient-weighted Class Activation Mapping) serves as a valuable tool to assess where the model is directing its attention within the image.
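The core of Grad-CAM is a short calculation. Given the feature maps of the last convolutional layer and the gradients of the predicted class score with respect to those maps, each channel is weighted by its average gradient, the weighted maps are summed, and negative values are clipped. The sketch below shows only that combination step on toy numbers; obtaining the real feature maps and gradients requires a deep-learning framework and is assumed to have been done already.

```python
def grad_cam(feature_maps, gradients):
    """Combine K feature maps and their gradients (lists of 2D lists) into one heatmap in [0, 1]."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])

    # 1. Channel weight = gradient averaged over all spatial positions.
    weights = [sum(sum(row) for row in g) / (height * width) for g in gradients]

    # 2. Weighted sum of the feature maps, followed by ReLU.
    heatmap = [
        [
            max(0.0, sum(w * fm[r][c] for w, fm in zip(weights, feature_maps)))
            for c in range(width)
        ]
        for r in range(height)
    ]

    # 3. Normalise to [0, 1] so the map can be overlaid on the image.
    peak = max(max(row) for row in heatmap)
    if peak > 0:
        heatmap = [[v / peak for v in row] for row in heatmap]
    return heatmap


# Toy example with two 2x2 channels.
maps = [
    [[1.0, 0.0], [0.0, 0.0]],  # channel 0 fires in the top-left corner
    [[0.0, 0.0], [0.0, 1.0]],  # channel 1 fires in the bottom-right corner
]
grads = [
    [[2.0, 2.0], [2.0, 2.0]],      # channel 0 supports the predicted class
    [[-1.0, -1.0], [-1.0, -1.0]],  # channel 1 counts against it
]

print(grad_cam(maps, grads))  # [[1.0, 0.0], [0.0, 0.0]]
```

Only the top-left region is highlighted, because the channel that fires in the bottom-right has a negative weight and is removed by the ReLU. On a lesion image, the equivalent check is whether the bright region of the heatmap sits on the lesion rather than on background skin, hair, or a ruler.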

Making the Models Accessible and User-Friendly

To make these powerful models accessible to a broader audience, I created a Flask web application to host them. Flask serves as a framework to showcase the various neural network models and allows users to assess the performance of each model.

The user interface then allows them to upload their images of skin lesions, select their preferred model, and receive a classification of the uploaded images. This approach simplifies the user experience, making it much more accessible than running models through the Python scripts I used.
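A stripped-down version of such an upload-and-classify endpoint might look like the sketch below. It assumes Flask is installed; the /predict route, the form field names ("image" and "model"), the allowed file types, and the create_app helper are all hypothetical and not taken from the actual application.

```python
ALLOWED_EXTENSIONS = {"jpg", "jpeg", "png"}  # assumed upload types


def allowed_file(filename):
    """Accept only filenames with an image extension we expect."""
    return "." in filename and filename.rsplit(".", 1)[1].lower() in ALLOWED_EXTENSIONS


def create_app(models):
    """Build the app. models maps a model name to a callable: image bytes -> predicted label."""
    from flask import Flask, jsonify, request  # imported lazily; requires Flask

    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        upload = request.files.get("image")
        model_name = request.form.get("model", "")
        if upload is None or not upload.filename or not allowed_file(upload.filename):
            return jsonify(error="Upload a JPG or PNG image."), 400
        if model_name not in models:
            return jsonify(error="Unknown model."), 400
        label = models[model_name](upload.read())
        return jsonify(model=model_name, prediction=label)

    return app
```

To try it, pass in a dictionary of predictors, for example create_app({"demo": lambda image_bytes: "nv"}).run(debug=True) with a placeholder that always answers "nv", then post an image and a model name to /predict. Real predictors would decode the bytes, resize the image, and call the trained network.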

Machine Learning in Healthcare: Future Work

Recognizing the importance of diverse representation in medical datasets, future work for machine learning in healthcare involves finding additional data that has a broader range of skin tones. This would improve the model’s generalization across different demographic groups, making it a useful tool for everyone.

The models are currently achieving accuracies of around 70%, so continued improvement remains a priority. At this level of performance, they are not yet reliable enough for practical use, which underscores the need for further refinement through model architecture adjustments and targeted training. However, the project does serve as a valuable proof of concept, and I firmly believe these techniques will be used in the future.

The use of machine learning in healthcare for image classification is a very promising tool that could save healthcare professionals a lot of time. Using a large, publicly available dataset, I was able to adapt pre-trained models and design custom architectures to classify various types of skin lesions. This was then made accessible through a user-friendly web interface, providing a promising proof of concept for how this technology could be made widely available in the future. There is still some way to go, but the development of these models is of paramount importance, as early classification has the potential to save lives.

Frequently Asked Questions on the topic of Machine Learning in Healthcare:

How is machine learning used in healthcare?

Machine learning and AI are making substantial strides in healthcare, impacting drug discovery, disease prevention, clinical research, and more. These technologies are revolutionizing diagnostics, enabling precision in medical imaging analysis, and facilitating personalized medicine. With continuous advancements, machine learning and AI are poised to further transform healthcare, enhancing patient outcomes and streamlining clinical processes.

What are some examples of AI and machine learning in healthcare?

AI and machine learning are revolutionizing healthcare by:

- Enhancing medical image analysis and offering virtual assistant services.

- Improving surgical precision with robotics.

- Advancing patient monitoring systems.

- Streamlining healthcare data management.

- Facilitating genome sequencing for personalized medicine.

- Utilising AI in mental health for predictive treatment plans.

These technologies are making healthcare more accessible and tailored to individual needs, significantly boosting the efficiency of diagnoses and treatments.

How big is the machine learning in healthcare market?

The global AI in healthcare market, valued at USD 20.9 billion in 2024, is forecasted to reach USD 148.4 billion by 2029, growing at a CAGR of 48.1%. This growth is fueled by the generation of large healthcare datasets, the need for cost reduction, improved computing power, and the increasing use of AI for enhanced healthcare services.

What are the risks of using AI in healthcare?

The risks of machine learning in healthcare, as highlighted by the UK AI Safety Summit 2023, include issues around accountability, privacy, fairness, and transparency. The summit's focus on collaborative international action for AI safety, including the development and testing of AI models, reflects the complexity of these ethical considerations. Balancing AI's potential benefits against these risks is crucial for its responsible integration into healthcare.
