Just wanted to take a moment and share some quick stain-normalization experimental results. We have a trained in-house nuclei segmentation model which works fairly well when the test images have similar stain presentation properties, but when new datasets arrive which look notably different, we tend to see decreased classifier performance.
Here we look at one of these images and ways of improving classifier robustness.
We take an input image (left) and apply a nuclei segmentation model to produce the output mask (right), where we can see the quality is in fact very poor:
When we compare this test image to the types of images in the training set, we can immediately see that they don’t live in a similar color space:

One very fast way to see whether color normalization would help is to take the test image and perform a per-channel histogram match to an image from the training set (we'll call it the template image, "ref"). This takes only a few lines of MATLAB code:
```matlab
out=zeros(size(io),'like',io); % preallocate output to match the input's size and type
for zz=1:3
    out(:,:,zz)=imhistmatch(io(:,:,zz),ref(:,:,zz));
end
```
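For readers outside MATLAB, here is a rough Python equivalent of the per-channel histogram match, using only NumPy. The function names are mine, and this quantile-interpolation approach approximates what `imhistmatch` does rather than reproducing it exactly:

```python
import numpy as np

def match_channel(src, ref):
    """Remap src intensities so their distribution matches ref's.

    Operates on a single channel of any shape; returns the same shape.
    """
    src_flat = src.ravel()
    # Rank each source pixel, then express the rank as a quantile in [0, 1]
    quantiles = np.argsort(np.argsort(src_flat)) / max(src_flat.size - 1, 1)
    # Look up the reference intensity at the same quantile
    ref_sorted = np.sort(ref.ravel())
    matched = np.interp(quantiles, np.linspace(0, 1, ref_sorted.size), ref_sorted)
    return matched.reshape(src.shape)

def match_histograms_rgb(io, ref):
    """Per-channel histogram matching of RGB image io to template ref."""
    out = np.empty(io.shape, dtype=np.float64)
    for zz in range(3):  # one channel at a time, as in the MATLAB loop
        out[..., zz] = match_channel(io[..., zz].astype(np.float64),
                                     ref[..., zz].astype(np.float64))
    return out
```

Treating each channel independently ignores inter-channel correlations, which is exactly the "crude" aspect of this sanity check.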
A version which supports ignoring white space:

```matlab
back=rgb2gray(io)>200;  % near-white pixels are treated as background
idx=find(~back);        % match histograms over the tissue pixels only
for zz=1:3
    ioc=io(:,:,zz);
    refc=ref(:,:,zz);
    ioutt=imhistmatch(ioc(idx),refc(idx));
    ioc(idx)=ioutt;
    io(:,:,zz)=ioc;
end
```
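A Python sketch of the mask-aware variant (function names and the luminance weights are my own choices; it matches histograms only over the tissue, i.e. non-white, pixels, leaving the background untouched):

```python
import numpy as np

def match_histograms_masked(io, ref, white_thresh=200):
    """Per-channel histogram matching that ignores near-white background.

    io, ref: uint8 RGB arrays. Returns a float64 copy of io with tissue
    pixels remapped; background pixels are left as-is.
    """
    # Luminance-style grayscale; pixels brighter than the threshold
    # are treated as background (white space)
    gray = io.astype(np.float64) @ np.array([0.299, 0.587, 0.114])
    tissue = gray <= white_thresh
    out = io.astype(np.float64).copy()
    for zz in range(3):
        src = io[..., zz][tissue].astype(np.float64)
        tgt = np.sort(ref[..., zz].ravel().astype(np.float64))
        # Quantile-based matching restricted to the tissue pixels
        q = np.argsort(np.argsort(src)) / max(src.size - 1, 1)
        out[..., zz][tissue] = np.interp(q, np.linspace(0, 1, tgt.size), tgt)
    return out
```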
This produces the same input image in a slightly different color space. Note that there are even obvious image artifacts from this crude process:

Now we take this new image and apply the same classifier:

where we can see the normalization has had a profound effect, and has arguably yielded results similar to those from an image in the original training cohort.
So, in this particular case, even gross stain normalization produces a significant improvement in results.
The real question now: is it possible to perform data augmentation at training time so that these images work "naturally" (without normalizing each one individually)? To that end, I wrote an augmentation layer for Caffe which randomly modifies the color space during training, in hopes of improving robustness.
I estimated the parameters by looking at how a test image differs from a training image:
```matlab
for zz=1:3
    c=double(io(:,:,zz));
    cmean=mean(c(:));
    cref=double(iref(:,:,zz));
    crefmean=mean(cref(:));
    [cmean,crefmean,cmean/crefmean]
end
```

with output:

```
ans = 210.0452  172.8703  1.2150
ans = 152.0753  113.6565  1.3380
ans = 207.2472  152.5671  1.3584
```
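The same per-channel comparison can be written in Python (a hypothetical helper of my own; `io` is the test image and `iref` the training template, as in the MATLAB above):

```python
import numpy as np

def channel_mean_ratios(io, iref):
    """Per-channel mean of test and reference image, plus their ratio.

    The ratios give rough bounds for a multiplicative color augmentation.
    """
    io = io.astype(np.float64)
    iref = iref.astype(np.float64)
    rows = []
    for zz in range(3):
        cmean = io[..., zz].mean()
        crefmean = iref[..., zz].mean()
        rows.append((cmean, crefmean, cmean / crefmean))
    return rows
```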
And finally settled on:
```
param_str: '{"rotate": True, "color": True, "color_a_max": 1.5, "color_a_min": .3}'
```
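This is not the actual Caffe layer, but a minimal Python sketch of the multiplicative color augmentation those parameters describe, assuming `color_a_min`/`color_a_max` bound an independent per-channel scale factor:

```python
import numpy as np

def random_color_augment(img, a_min=0.3, a_max=1.5, rng=None):
    """Randomly rescale each color channel by an independent factor
    drawn from [a_min, a_max], simulating stain/color-space variation."""
    rng = rng or np.random.default_rng()
    a = rng.uniform(a_min, a_max, size=3)  # one factor per channel
    out = img.astype(np.float64) * a       # broadcast over H x W x 3
    return np.clip(out, 0, 255)
```

Applying a fresh random factor per training patch exposes the network to the kind of channel-wise shifts measured above.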
Then I retrained the network and produced output on the unaltered input image:
This provides evidence for the hypothesis that if you can quantitatively measure differences between the training and testing sets, and account for them using data augmentation at training time, the DL model will learn to be robust.
Another idea I tested is to completely remove the mean file from both the training and testing phases. Instead, each patch is individually mean-centered on the fly. Essentially this should help combat obvious color-space translations, which tend to show up as brightness shifts. By still applying the augmentation layer before this per-patch mean subtraction, we work in a coordinate system centered around zero regardless of stain presentation. The results are stunning as well.
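Per-patch mean centering amounts to subtracting each patch's own channel means on the fly; a minimal sketch (my own function, not the pipeline code):

```python
import numpy as np

def center_patch(patch):
    """Subtract the patch's own per-channel mean, so every patch is
    centered at zero regardless of overall stain brightness."""
    patch = patch.astype(np.float64)
    return patch - patch.mean(axis=(0, 1), keepdims=True)
```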
An approach like this is likely something we should look into further, especially in the context of cross-site normalization.
So there is still no standard normalization pipeline in the field of digital pathology, right?
It is a very, very challenging field, especially in the context of validation. While a number of approaches have been published, I'm not aware of a "standard" pipeline which is universally suitable and employed.
Would you please recommend some publications about normalization in which the approaches were widely used?
The main point is that there is no "widely used" approach :) You can check out some of these papers:
https://warwick.ac.uk/fac/sci/dcs/people/research/csrlai/publications/tbme-00001-2013-r1-preprint.pdf
https://www.ncbi.nlm.nih.gov/pubmed/27373749
https://openreview.net/forum?id=SkjdxkhoG
https://arxiv.org/pdf/1804.01601.pdf
here is a nice review: https://www.sciencedirect.com/science/article/pii/S0968432818300982
https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7460968
Thank you for all your posts and replies, which helped me with my first deep learning network for digital pathology images. I will follow you all the way!
What does the variable/function ‘io’ represent?
From lines 1 and 3:

```matlab
out=size(io);
out(:,:,zz)=imhistmatch(io(:,:,zz),ref(:,:,zz));
```
`io` is the original image.