Organ volumes, tumor dimensions, and lymph node sizes are examples of measurements that give radiologists valuable information in diagnostic processes. Think of brain volumetry on MR images to track atrophy development in dementia cases, MRI-based prostate volume measurements to determine PSA density in prostate cancer diagnostics, or tumor volume evaluation in just about any organ that can be imaged. However, obtaining precise volumes is not quick or easy if done manually. Structures need to be delineated slice by slice, making volume measurements a tedious and time-consuming job. Alternatively, the volume can be estimated rather than measured exactly. In prostate volume determination, for example, this is a widely used approach: a radiologist measures the prostate in three directions, applies a simple formula, and obtains an estimate of the prostate volume. This is, however, exactly what it promises to be: an estimate. A more accurate approach would give more trustworthy outcomes and so improve the quality of the diagnosis.
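The "simple formula" is not named here. A common choice for this kind of three-diameter estimate is the ellipsoid formula, volume = π/6 × length × width × height, and PSA density is the PSA level divided by the prostate volume. The sketch below assumes that formula; the function names and the example measurements are made up for illustration.

```python
import math

def ellipsoid_volume_ml(length_cm, width_cm, height_cm):
    """Estimate a volume in mL (= cm^3) from three orthogonal diameters.

    Assumes the structure is roughly ellipsoid-shaped.
    """
    return math.pi / 6.0 * length_cm * width_cm * height_cm

def psa_density(psa_ng_per_ml, volume_ml):
    """PSA density: serum PSA divided by the prostate volume."""
    return psa_ng_per_ml / volume_ml

# Illustrative numbers only, not taken from a real patient.
volume = ellipsoid_volume_ml(4.0, 5.0, 3.0)
density = psa_density(6.0, volume)
print(f"estimated volume: {volume:.1f} mL")   # estimated volume: 31.4 mL
print(f"PSA density: {density:.2f} ng/mL/mL")  # PSA density: 0.19 ng/mL/mL
```

Any prostate that is not a neat ellipsoid will make this estimate drift, which is exactly the limitation described above.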
Enter artificial intelligence (AI). AI radiology algorithms are exquisitely suited to obtaining volumes from medical images. They do not require the radiologist to spend hours segmenting organs and other structures. In addition, automatic analysis eliminates inter- and intra-observer differences, as a deterministic algorithm will always return exactly the same segmentation for the same input image. Deep learning-based methods in particular have achieved some impressive results over the past few years. But how do these algorithms work exactly? And why are they so suitable for segmentation?
In this article we will explore what medical image segmentation exactly is and how an AI algorithm approaches this task. We will address a few basic segmentation algorithms that have been around for a long time and we will discuss the more recent deep learning-based approaches of convolutional neural networks.
What is medical image segmentation?
Medical image segmentation refers to marking the area (in 2D) or volume (in 3D) that a specific anatomical structure occupies in a medical image. This structure can be (part of) an organ, a tumor, or another type of anatomical or pathological structure. Input images can range from X-rays and ultrasound images to CT and MRI scans.
Typical for a segmentation algorithm is that it outputs a segmentation. Yes, that seems like a no-brainer, but it is important to realize what that means. Other types of algorithms can return an answer to a very simple yes or no question (for example, does this image of the brain contain a tumor?), they can assign a certain class to an image (what PI-RADS score should we assign to this prostate tumor? 1, 2, 3, 4, or 5?), or they can return a continuous value based on the input image (what oxygenation level does this tissue have?). A segmentation algorithm, however, returns an image. It may be just a matrix filled with ones and zeroes denoting which voxels belong to the segmented structure and which do not. But it is always something with the same size as the input image.
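To make the "matrix filled with ones and zeroes" concrete, here is a minimal sketch using NumPy. It builds a toy mask with the same shape as the image and turns it into a volume by counting voxels. The function name `mask_volume_ml` and the voxel spacing are assumptions for illustration.

```python
import numpy as np

def mask_volume_ml(mask, voxel_spacing_mm):
    """Volume of a binary mask: number of voxels times the volume of one voxel."""
    voxel_volume_mm3 = float(np.prod(voxel_spacing_mm))
    return float(mask.sum()) * voxel_volume_mm3 / 1000.0  # 1 mL = 1000 mm^3

# Toy 3D image: 4 slices of 8 x 8 voxels.
image = np.zeros((4, 8, 8), dtype=np.float32)

# The segmentation has the same shape as the image: 1 = structure, 0 = not.
mask = np.zeros(image.shape, dtype=np.uint8)
mask[1:3, 2:6, 2:6] = 1  # 2 * 4 * 4 = 32 voxels marked as structure

# Assumed spacing: 3 mm between slices, 0.5 mm x 0.5 mm in-plane.
volume_ml = mask_volume_ml(mask, (3.0, 0.5, 0.5))
print(mask.shape == image.shape)  # True
print(volume_ml)                  # 32 voxels * 0.75 mm^3 = 24 mm^3 = 0.024 mL
```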
Segmenting a medical image, how does a computer do it?
Now that we understand the characteristics and the requirements of a segmentation algorithm, let’s have a look at how a computer goes about segmenting a medical image.
2 ways to segment an anatomical structure
There are roughly two types of approaches an algorithm can use to describe the volume of an anatomical structure: a contour-based approach or a voxel-based approach.
A contour-based approach searches for the border between the structure-to-be-segmented and its surroundings. This is quite similar to how a radiologist would manually segment an organ: draw a line between the organ and other body parts.
A voxel-based approach resembles classification: the algorithm looks at each voxel separately and decides whether or not it is part of the structure you want to segment. Once every voxel in the image has been addressed, the result is a segmentation, i.e. for each voxel it has been decided whether it belongs to the structure or not.
Contour-based segmentation, an example
A classic contour-based segmentation method is the active contour model, also known as snakes. This algorithm starts with an initial contour close to the object that you want to segment. It then looks at the voxels right next to those on the contour to see whether they are brighter or darker than the contour voxels. Based on this information the contour shifts a little and checks again, and this repeats until the contour has found its optimal position. What this optimal position is depends on the rules that are provided. These rules can be simple, such as “move towards the brighter voxels”, but they can also impose restrictions, such as “avoid ending up with sharp corners”.
So, this is not a voxel-based method, but it still uses information from voxels? That is indeed confusing. The difference is that a contour-based method does not analyze every voxel in the image: the algorithm starts with a contour and adjusts its shape until it has found edges. If the initial contour is close to the object, most voxels in the image will probably never be looked at.
Figure 1: A snake model adjusts a contour in multiple steps to get to a delineation, a segmentation, of a structure.
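The sketch below is a heavily simplified, greedy variant of a snake in 2D, written with NumPy. It is not the original active contour formulation. Each contour point looks only at its eight direct neighbors and moves to the one with the best trade-off between two rules: "move towards stronger edges" and "stay close to the midpoint of your two neighboring points" (which discourages sharp corners). The synthetic image, the `smoothness` weight, and all function names are assumptions for illustration.

```python
import numpy as np

def edge_strength(image):
    """Gradient magnitude: large where the intensity changes quickly (an edge)."""
    grad_rows, grad_cols = np.gradient(image.astype(float))
    return np.hypot(grad_rows, grad_cols)

# (0, 0) comes first, so a point stays put unless a neighbor is strictly better.
OFFSETS = [(0, 0)] + [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                      if (dr, dc) != (0, 0)]

def greedy_snake(image, contour, smoothness=0.001, max_iterations=100):
    """Shift contour points (row, col) one pixel at a time towards edges."""
    edges = edge_strength(image)
    contour = np.rint(np.asarray(contour)).astype(int)
    n_points = len(contour)
    for _ in range(max_iterations):
        moved = False
        for i in range(n_points):
            # Midpoint of the two neighboring contour points (contour is closed).
            midpoint = (contour[i - 1] + contour[(i + 1) % n_points]) / 2.0
            best_energy, best_point = np.inf, contour[i].copy()
            for dr, dc in OFFSETS:
                r, c = contour[i, 0] + dr, contour[i, 1] + dc
                if not (0 <= r < image.shape[0] and 0 <= c < image.shape[1]):
                    continue
                bend = (r - midpoint[0]) ** 2 + (c - midpoint[1]) ** 2
                energy = -edges[r, c] + smoothness * bend  # lower is better
                if energy < best_energy:
                    best_energy, best_point = energy, np.array([r, c])
            if not np.array_equal(best_point, contour[i]):
                contour[i] = best_point
                moved = True
        if not moved:  # no point wants to move any more
            break
    return contour

def mean_radius(points, center=32):
    return float(np.mean(np.hypot(points[:, 0] - center, points[:, 1] - center)))

# Synthetic image: a bright disk of radius 12 with a soft edge.
rows, cols = np.mgrid[0:64, 0:64]
radius_map = np.hypot(rows - 32, cols - 32)
image = 1.0 / (1.0 + np.exp((radius_map - 12.0) / 3.0))

# Initial contour: a circle of radius 20 around the disk.
angles = np.linspace(0, 2 * np.pi, 24, endpoint=False)
initial = np.stack([32 + 20 * np.sin(angles), 32 + 20 * np.cos(angles)], axis=1)

final = greedy_snake(image, initial)
print("mean radius before:", mean_radius(initial))
print("mean radius after: ", mean_radius(final))
```

Note that only voxels within one step of the contour are ever inspected, which is the difference with voxel-based methods described above. The soft edge matters here: with a perfectly sharp edge, a contour that starts several pixels away would see no gradient at all and would not move.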
Voxel-based segmentation, an example
A very basic example of a voxel-based segmentation algorithm is simple thresholding. This is one of the most straightforward methods to use as it just looks at the intensity of a voxel, compares it to a set threshold, marks it as being part of the structure if it is above a certain value and marks it as being “surroundings” if it is below this value (or the other way around of course in case of hypo-intense structures). Of course, this only works well for organs and other structures that have a very different intensity from their surroundings and are not affected too much by noise. It is not without reason that we call it “simple” thresholding.
Figure 2: Simple thresholding is used to segment an image based on voxel intensity. As is already apparent from the image, it is tricky to segment only one structure with this approach as structures are rarely the only objects in an image that have an intensity above or below a certain value.
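Simple thresholding fits in a few lines of NumPy. The sketch below is a minimal version; the function name, the `hypo_intense` switch, and the toy intensities are assumptions for illustration.

```python
import numpy as np

def threshold_segment(image, threshold, hypo_intense=False):
    """Mark each voxel as structure (1) or surroundings (0) by intensity alone."""
    if hypo_intense:
        # Dark structures: keep the voxels below the threshold instead.
        return (image < threshold).astype(np.uint8)
    return (image > threshold).astype(np.uint8)

image = np.array([[10, 12, 80],
                  [11, 90, 85],
                  [ 9, 10, 95]])

mask = threshold_segment(image, threshold=50)
print(mask)
# [[0 0 1]
#  [0 1 1]
#  [0 0 1]]
```

Every voxel above the threshold ends up in the mask, whether or not it belongs to the structure of interest, which is the weakness illustrated in Figure 2.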
What about registration-based methods?
Interesting one! The general idea of a registration-based segmentation method is that you already have a segmented structure in one image (or a set of images), and you want to have this same structure segmented in a second image. What you then need is a very exact description of how you need to deform the first image so that it becomes the second image. We call this a transformation. Applying this same transformation to the segmentation of the first image will give you the segmentation for the second image. Technically speaking though, this method belongs to the voxel-based methods as it addresses all voxels.
Figure 3: A registration-based segmentation algorithm first determines what adjustment, or transformation, is needed to change image one into image two. Then, applying this exact same transformation to an already available segmentation on image one leads to a segmentation of the same structure in image two.
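The two steps can be sketched with the simplest possible transformation: a whole-voxel shift. Real registration methods estimate far richer (often deformable) transformations, so treat the brute-force search below as a stand-in under that assumption. The function name `find_best_shift` and the toy images are hypothetical.

```python
import numpy as np

def find_best_shift(fixed, moving, max_shift=5):
    """Step 1: find the (row, col) shift that makes `moving` look most like `fixed`."""
    best_shift, best_cost = (0, 0), np.inf
    for dr in range(-max_shift, max_shift + 1):
        for dc in range(-max_shift, max_shift + 1):
            shifted = np.roll(moving, (dr, dc), axis=(0, 1))
            cost = np.sum((shifted - fixed) ** 2)  # sum of squared differences
            if cost < best_cost:
                best_shift, best_cost = (dr, dc), cost
    return best_shift

# Image one with an existing segmentation.
image_one = np.zeros((32, 32))
image_one[8:16, 10:18] = 1.0
mask_one = (image_one > 0.5).astype(np.uint8)

# Image two: the same structure, displaced by 3 rows down and 2 columns left.
image_two = np.roll(image_one, (3, -2), axis=(0, 1))

# Step 2: apply the transformation found on the images to the segmentation.
shift = find_best_shift(fixed=image_two, moving=image_one)
mask_two = np.roll(mask_one, shift, axis=(0, 1))
print(shift)  # (3, -2)
```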
Medical image segmentation using deep learning, how does it work?
The previous section describes a few very basic examples of the many algorithm-based methods suitable for segmentation on medical images. However, over the past years deep learning has proven to be exquisitely suitable for the task. Why is this so? What characteristics do deep learning-based algorithms have that make them so fitting for this job?
In this section we will briefly discuss a few examples of deep learning-based methods, the way they work and what makes them so suitable for segmentation.
Convolutional neural networks
A convolutional neural network, or CNN for short, is a specific form of deep neural network. It combines convolution steps, i.e. applying a certain filter to each pixel in the image, with pooling steps, i.e. down-sampling the image. These steps are performed in what we call the layers of the neural network.
After applying both steps a couple of times, the algorithm has filtered out the most important information in the image and is able to determine what the image contains (or classify the image content).
Figure 4: A convolutional neural network applies special convolution filters to a medical image to deduce the structures or organs present in the image.
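The two building blocks can be written out by hand in NumPy. This is a minimal sketch of one convolution step and one pooling step on a 2D image. In a real CNN the filter values are learned during training; the fixed edge filter and the function names here are assumptions for illustration.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the filter over the image and take a weighted sum at each position.

    As in most deep learning libraries the kernel is not flipped, and no
    padding is used, so the output is slightly smaller than the input.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def max_pool2d(feature_map, size=2):
    """Down-sample by keeping the maximum of each size x size block."""
    h = feature_map.shape[0] // size * size
    w = feature_map.shape[1] // size * size
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)  # responds to left-to-right intensity increase

features = convolve2d(image, kernel)  # shape (4, 4)
pooled = max_pool2d(features)         # shape (2, 2)
print(features.shape, pooled.shape)   # (4, 4) (2, 2)
```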
Not every CNN is explicitly suited for segmenting objects. Actually, the most basic form we just discussed is mostly suited for classification, as its output answers one simple question: what does the image contain? With a little bit of creativity, it is possible to use this type of CNN for segmentation. The way to do this is to apply the CNN to each voxel separately: a patch around the voxel is selected and given to the CNN as input, and the CNN classifies this one voxel. Then it moves on to the next voxel, and the next, etc.
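The patch-by-patch loop looks roughly like the sketch below. The function `classify_patch` is a hypothetical placeholder for a trained classification CNN (here it just checks the mean intensity of the patch), so only the loop structure should be taken from this example.

```python
import numpy as np

def classify_patch(patch):
    """Hypothetical stand-in for a trained CNN: 1 = structure, 0 = surroundings."""
    return int(patch.mean() > 0.5)

def segment_by_patches(image, patch_size=3):
    """Classify every pixel from the patch centered on it."""
    half = patch_size // 2
    padded = np.pad(image, half, mode="edge")  # so border pixels get a full patch
    mask = np.zeros(image.shape, dtype=np.uint8)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            patch = padded[r:r + patch_size, c:c + patch_size]
            mask[r, c] = classify_patch(patch)  # one classifier call per pixel
    return mask

image = np.zeros((8, 8))
image[2:6, 2:6] = 1.0

mask = segment_by_patches(image)
print(mask.shape == image.shape)  # True
```

One classifier call per voxel adds up quickly, and neighboring patches overlap almost entirely, so much of the computation is repeated.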
Specific forms of CNNs, such as fully convolutional networks and U-nets, have the ability to directly output a segmentation.
Fully convolutional networks
A fully convolutional network (FCN) can contain convolutional as well as pooling layers. However, it compensates for the down-sampling in the pooling layers by adding up-sampling layers that increase the dimensions of the image until it is back at its original size. Hence, the output can be a segmentation that can be overlaid on the original input image. [1]
Figure 5: A fully convolutional neural network ensures that the segmentation output of the network has the same size as the input medical image by using up-sampling layers.
Another example of a CNN that has the dedicated characteristics for image segmentation is a U-net. It contains both convolutional layers and pooling layers in the first part (the left side of the U in the image below), hence this is where the image gets down-sampled. However, the network will up-sample the image again by using up-sampling layers until the dimensions are the same as those of the input layer (the right side of the U in the image below). A U-net is very similar to an FCN; you could even say it is a type of FCN. The characteristic factor is that a U-net has connections between the downward path and the upward path (grey arrows in the image below). Thus, it uses information in the up-sampling process that would otherwise be lost during down-sampling. [2]
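The size bookkeeping can be sketched in NumPy: pool once, up-sample once, and the output is back at the input size. Nearest-neighbor up-sampling is used here as a simple stand-in; FCNs typically learn their up-sampling filters. The function names are assumptions for illustration.

```python
import numpy as np

def max_pool2d(feature_map, size=2):
    """Down-sample: keep the maximum of each size x size block."""
    h, w = feature_map.shape
    return feature_map.reshape(h // size, size, w // size, size).max(axis=(1, 3))

def upsample_nearest(feature_map, factor=2):
    """Up-sample: repeat every value factor x factor times."""
    return np.repeat(np.repeat(feature_map, factor, axis=0), factor, axis=1)

image = np.zeros((8, 8))
image[3, 3] = 1.0  # a single bright pixel

coarse = max_pool2d(image)           # shape (4, 4)
restored = upsample_nearest(coarse)  # shape (8, 8), same as the input

print(coarse.shape, restored.shape)  # (4, 4) (8, 8)
print(restored.sum())                # 4.0: the single pixel has become a 2 x 2 block
```

The output has the right size again, but the fine detail is gone: the network no longer knows which of the four pixels was the bright one.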
Figure 6: A U-net is a specific type of fully convolutional network, named after the U that is present in most schematic representations of such a network (just as in the image above).
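The connection between the downward and upward path can be sketched as a concatenation of feature maps along the channel axis. This minimal NumPy example leaves out all convolutions and uses random numbers as feature maps, so it only shows how the shapes fit together. The names and sizes are assumptions for illustration.

```python
import numpy as np

def max_pool2d(features, size=2):
    """Down-sample a (channels, height, width) stack of feature maps."""
    c, h, w = features.shape
    return features.reshape(c, h // size, size, w // size, size).max(axis=(2, 4))

def upsample_nearest(features, factor=2):
    """Up-sample a (channels, height, width) stack by repeating values."""
    return features.repeat(factor, axis=1).repeat(factor, axis=2)

rng = np.random.default_rng(0)
encoder_features = rng.random((4, 16, 16))         # downward path, full resolution

bottleneck = max_pool2d(encoder_features)          # (4, 8, 8): detail is lost here
decoder_features = upsample_nearest(bottleneck)    # (4, 16, 16): size restored

# Skip connection: hand the full-resolution encoder features to the decoder.
merged = np.concatenate([encoder_features, decoder_features], axis=0)
print(merged.shape)  # (8, 16, 16)
```

The layers that follow in the upward path can then combine the coarse, context-rich channels with the fine-grained channels from the downward path.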
Dilated convolutional network
Another type of network specifically developed for segmentation is the dilated CNN. Once you understand the basic idea of CNNs, the adjustment to get to dilated CNNs is a small step to grasp. Where regular CNNs use a filter of voxels that are all next to each other, dilated CNNs use a more “spread” filter: the filter uses information from voxels that are not direct neighbors, but that are spread over a wider area. What are the benefits of this method? Dilated CNNs use multiple layers of these special filters with different “dilation”, i.e. the distance between the voxels used in the filter is not always the same. This allows the network to include a lot of surrounding information to get to a decision for all voxels in the image without the use of pooling layers, hence the network does not down-sample. So it also does not need to up-sample to get to the right size for the output segmentation. Think of it as a similar approach to how a radiologist would look at a larger part of the image to recognize the anatomy. [3]
Figure 7: A dilated convolutional neural network uses dilation to collect information from a wider patch of pixels without using pooling layers, hence no up-sampling is needed.
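A dilated filter is easy to write out in NumPy: the filter still has 3 x 3 weights, but they are applied to pixels that lie `dilation` steps apart. The sketch below stacks three such layers with dilations 1, 2 and 4 and measures how far a single bright pixel spreads. The all-ones filter, the zero padding, and the function name are assumptions for illustration.

```python
import numpy as np

def dilated_convolve2d(image, kernel, dilation=1):
    """Apply an odd-sized filter whose taps are `dilation` pixels apart.

    Zero padding keeps the output the same size as the input.
    """
    kh, kw = kernel.shape
    span_h = (kh - 1) * dilation + 1  # image area covered by the spread-out filter
    span_w = (kw - 1) * dilation + 1
    pad_h, pad_w = span_h // 2, span_w // 2
    padded = np.pad(image, ((pad_h, pad_h), (pad_w, pad_w)))
    out = np.zeros(image.shape, dtype=float)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            window = padded[r:r + span_h:dilation, c:c + span_w:dilation]
            out[r, c] = np.sum(window * kernel)
    return out

image = np.zeros((17, 17))
image[8, 8] = 1.0  # one bright pixel in the center
kernel = np.ones((3, 3))

response = image
for dilation in (1, 2, 4):
    response = dilated_convolve2d(response, kernel, dilation)

print(response.shape)              # (17, 17): no down-sampling happened
print(np.count_nonzero(response))  # 225, i.e. a 15 x 15 area
```

Three small filters already cover a 15 x 15 area, while the image size never changes, so no up-sampling is needed afterwards.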
As you can see, there are many different ways to approach image segmentation. None is necessarily “better” than the others: this very much depends on your training data, your input scans, and what it actually is you are trying to segment (to name a few factors). Deep learning methods, and FCNs in particular, are exquisitely suited for image segmentation. Many research groups have used such networks, or variations on them, for medical segmentation purposes. Results have been impressive, but this does not mean that they can directly be applied in a clinical setting. Curious to learn more about the segmentation products we have available or are planning to release soon? Check our product website.
1. Long, J., Shelhamer, E. & Darrell, T. Fully Convolutional Networks for Semantic Segmentation. CVPR (2015).
2. Ronneberger, O., Fischer, P. & Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. MICCAI 1–8 (2015).
3. Yu, F. & Koltun, V. Multi-scale context aggregation by dilated convolutions. ICLR (2016).