Technical Articles and Newsletters

Diagnosis of Thyroid Nodules from Medical Ultrasound Images with Deep Learning

By Eunjung Lee, School of Mathematics and Computing (CSE), Yonsei University


Small lumps or growths on the thyroid gland are usually benign and cause no symptoms. A small percentage of these thyroid nodules, however, are malignant. Physicians use high-resolution ultrasonography1 to diagnose thyroid nodules, following up with a biopsy for nodules that exhibit common signs of malignancy; these include solidity, irregular margins, microcalcifications, and a shape that is taller than it is wide (Figure 1).

Figure 1. Ultrasonography images of a benign nodule (left) and a malignant nodule (right). 

While the characteristics of malignant nodules are well established, diagnosing malignancy from ultrasound images remains a challenge. The accuracy of the diagnosis depends on the experience of the radiologist, and radiologists assessing the same nodule can arrive at different diagnoses.

Our research team at Yonsei University and Severance Hospital (Seoul, Korea) used MATLAB® to design and train convolutional neural networks (CNNs) to identify malignant and benign thyroid nodules. Diagnostic tests have shown that these CNNs perform as well as expert radiologists. We validated the CNNs against data sets from multiple hospitals, packaged them with a user interface, and deployed them as a web application. The application is used by medical students as part of their training and by experienced radiologists who need an objective second opinion on diagnoses. 

Previous Machine Learning and Deep Learning Approaches

Before exploring the use of deep learning for thyroid nodule diagnosis, we tried conventional machine learning models. We performed feature engineering and applied a variety of machine learning methods available in MATLAB, including support vector machine (SVM) and random forest classification. These models performed about as well as a radiologist with 10 to 15 years of experience. Our goal, however, was software that matched experienced radiologists while delivering consistent, objective results every time, so we began evaluating deep learning approaches.

One difficulty in using deep learning for medical imaging classification is the lack of available data. Medical data, including images, is protected by numerous privacy regulations, making it difficult to assemble a data set large enough to train a CNN. To address this challenge, we worked with 17 pretrained networks in MATLAB, including AlexNet, SqueezeNet, ResNet, and Inception.

We saw that each pretrained CNN classified an image in a slightly different way, and we quickly realized that a combination would perform better than any single network. Combining networks proved difficult, however, because we had initially used a variety of programming languages and environments to create them. MATLAB gave us a single environment in which to preprocess images; design, train, and combine CNNs; analyze and visualize results; and deploy the CNNs as a web app.

Designing, Training, and Validating the CNNs

Before we began training CNNs in MATLAB, we performed several preprocessing steps on ultrasound images from four different hospitals in Korea. For example, we performed normalization to ensure that the pixel values of all the grayscale images were within the 0 to 255 range (Figure 2). For each image, we extracted a region of interest that focused on the nodule. We performed left-right mirroring on several hundred images of benign nodules to give us an equal number of benign and malignant nodules in our training data set.
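Our preprocessing pipeline was built in MATLAB; as a rough illustration of the same steps, here is a minimal NumPy sketch. The function names and the fixed-size rectangular region of interest are assumptions for illustration, not the article's actual code:

```python
import numpy as np

def normalize_to_uint8(img):
    """Linearly rescale a grayscale image so its pixel values span 0-255."""
    img = img.astype(np.float64)
    lo, hi = img.min(), img.max()
    if hi == lo:                                   # flat image: map to zeros
        return np.zeros_like(img, dtype=np.uint8)
    return np.round((img - lo) / (hi - lo) * 255).astype(np.uint8)

def crop_roi(img, top, left, height, width):
    """Extract a rectangular region of interest around the nodule."""
    return img[top:top + height, left:left + width]

def mirror_lr(img):
    """Left-right mirroring, used to balance benign and malignant counts."""
    return img[:, ::-1]
```

In practice, normalizing before cropping keeps the rescaling consistent across images from the same scanner, while the mirroring is applied only to the underrepresented class.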

Figure 2. An example of normalization performed on a region of interest, shown before and after normalization.

We trained 17 different pretrained networks on a data set of more than 14,000 images collected under a protocol approved by the institutional review board (IRB) of Severance Hospital (approval no. 4-2019-0163). Based on the performance of each network, we chose a subset that included AlexNet, GoogLeNet, SqueezeNet, and InceptionResNetV2 to use in classification ensembles.

We experimented with two approaches for creating the ensembles, one that combined features and another that combined probabilities. For the feature-based combination, we used the outputs of the final fully connected layer in each CNN as features to train an SVM or random forest classifier. For the probability-based combination, we calculated a weighted average of the classification probability produced by each CNN. For example, if one CNN classified a nodule as benign with a 55% probability and another classified the same nodule as malignant with a 90% probability, then depending on the weighting, the ensemble was likely to classify the nodule as malignant.
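The probability-based combination amounts to a weighted average of the per-network outputs. The sketch below is a hypothetical illustration (the weights we actually used are not given here); with equal weights, the example from the text — one network calling a nodule benign at 55% (i.e., 45% malignant) and another calling it malignant at 90% — combines to 67.5% malignant:

```python
import numpy as np

def ensemble_malignancy(p_malignant, weights=None, threshold=0.5):
    """Combine per-CNN malignancy probabilities by weighted averaging.

    p_malignant : P(malignant) from each network, each in [0, 1]
    weights     : per-network weights (equal weighting by default)
    threshold   : decision cutoff (0.5 is an assumed placeholder)
    """
    p = np.asarray(p_malignant, dtype=float)
    w = np.ones_like(p) if weights is None else np.asarray(weights, dtype=float)
    combined = float(np.sum(w * p) / np.sum(w))
    label = "malignant" if combined >= threshold else "benign"
    return combined, label
```

For instance, `ensemble_malignancy([0.45, 0.90])` returns `(0.675, "malignant")`; shifting the weights toward the first network pulls the combined probability back toward benign.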

To validate the diagnostic performance of the ensembles, we generated receiver operating characteristic (ROC) curves and compared the area under the curve (AUC) for each ensemble against the AUC for expert radiologists. We performed this comparison on the internal data set from Severance Hospital (part of the Yonsei University Health System) and on external data from three other hospitals. On the internal test set, the AUC of the AlexNet-GoogLeNet-SqueezeNet-InceptionResNetV2 ensemble was significantly higher than that of the radiologists. On the external test sets, the AUC was about the same (Figure 3).
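The ROC analysis itself was done in MATLAB; as a from-scratch illustration of what it computes, the NumPy sketch below builds an ROC curve and its AUC from binary labels and model scores. (Note it returns the conventional 1 − specificity for the x-axis, whereas Figure 3 plots specificity directly.)

```python
import numpy as np

def roc_curve(labels, scores):
    """ROC points from binary labels (1 = malignant) and model scores.

    Sorts cases by descending score and accumulates true and false
    positives, yielding (1 - specificity, sensitivity) pairs at every
    threshold implied by the scores.
    """
    order = np.argsort(-np.asarray(scores, dtype=float))
    y = np.asarray(labels)[order]
    tpr = np.cumsum(y == 1) / max(np.sum(y == 1), 1)   # sensitivity
    fpr = np.cumsum(y == 0) / max(np.sum(y == 0), 1)   # 1 - specificity
    return np.concatenate(([0.0], fpr)), np.concatenate(([0.0], tpr))

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
```

A classifier that ranks every malignant nodule above every benign one yields an AUC of 1.0; random ranking yields about 0.5.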

Figure 3. ROC curves of an AlexNet-GoogLeNet-SqueezeNet-InceptionResNetV2 ensemble and of expert radiologists for differentiating thyroid nodules across four data sets2. Each plot shows specificity on the x-axis and sensitivity on the y-axis.

Deploying the SERA Web App

To make our CNNs available in the hospitals that Yonsei University works with, we created a web app named SERA and deployed it with MATLAB Web App Server™. Accessible via a web browser, the SERA app is used for academic purposes only at present. Doctors use SERA to get a second opinion in the diagnostic process. The app is also used to train first- and second-year doctors.

A simple user interface enables radiologists to run SERA on newly captured ultrasound images (Figure 4). Once an image is loaded, the CNNs diagnose the thyroid nodule in the image and the app displays a classifier accuracy score along with the probability that the nodule is malignant. Based on this probability, the app may recommend a fine needle aspiration (FNA) biopsy to confirm the diagnosis.
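The app's decision logic amounts to reporting the ensemble's malignancy probability and recommending follow-up above a cutoff. Here is a minimal sketch, assuming a 0.5 threshold (the clinical cutoff SERA uses is not stated in this article):

```python
def sera_report(p_malignant, fna_threshold=0.5):
    """Format a SERA-style result for one nodule.

    p_malignant   : ensemble probability that the nodule is malignant
    fna_threshold : assumed cutoff above which a biopsy is recommended
    """
    label = "malignant" if p_malignant >= fna_threshold else "benign"
    lines = [f"Prediction: {label} ({p_malignant:.2%} probability of malignancy)"]
    if p_malignant >= fna_threshold:
        lines.append("Recommendation: fine needle aspiration (FNA) biopsy")
    return "\n".join(lines)
```

For the nodule shown in Figure 4, `sera_report(0.941)` would report a malignant prediction at 94.10% and recommend an FNA biopsy.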

Figure 4. The SERA web app, here classifying a nodule as malignant with a probability of 94.10% (classifier accuracy: 87%).

CNN Explainability and Extensibility

Our team is currently working on the explainability of our CNNs—that is, why and how they come to a decision when classifying thyroid nodules. To do this, we are studying specific layers in the CNNs—in particular, filters in the convolution layer—to understand which image features the network is using to make its decision. We plan to meet with our most experienced radiologists to determine to what degree highly trained humans and trained CNNs use the same image features in thyroid nodule diagnoses.

We are also planning to develop CNN-based applications for diagnosing breast cancer and skin cancer.


1 Ultrasonography images usually have low resolution. Because ultrasonography is noninvasive and less harmful than other medical imaging modalities, however, high-resolution ultrasonography is widely used, even for pregnant women and infants.

2 Koh, Jieun, Eunjung Lee, Kyunghwa Han, Eun‑Kyung Kim, Eun Ju Son, Yu‑Mee Sohn, Mirinae Seo, Mi‑ri Kwon, Jung Hyun Yoon, Jin Hwa Lee, Young Mi Park, Sungwon Kim, Jung Hee Shin, and Jin Young Kwak. “Diagnosis of thyroid nodules on ultrasonography by a deep convolutional neural network.” Scientific Reports 10, no. 1 (September 2020): 15245.

Published 2022
