One of the challenges of working in digital pathology is that the associated images can be excessively large: too large to load fully into memory, and too large to use in common pipelines. For example, the Aperio SVS file that we’ll look at today is 60,000 x 42,600 pixels. If we tried to load such an image uncompressed in RGB space, it would require ~7GB, making it too large to consider using in our deep learning pipelines, as there wouldn’t be enough RAM on the GPU for both the data and the filter activations.
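As a quick sanity check on that figure, here is a minimal Python sketch of the arithmetic. It assumes 8 bits per channel, which the text doesn’t state explicitly but is what the ~7GB estimate implies.

```python
# Back-of-the-envelope memory estimate for an uncompressed RGB image.
width, height = 60_000, 42_600  # pixel dimensions quoted in the text
channels = 3                    # RGB
bytes_per_channel = 1           # assumption: 8 bits per channel

total_bytes = width * height * channels * bytes_per_channel

print(f"{total_bytes:,} bytes")
print(f"{total_bytes / 1e9:.2f} GB (decimal)")
print(f"{total_bytes / 2**30:.2f} GiB (binary)")
```

Either way you count it, that is a single image before any network weights or activations are placed on the GPU.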
The obvious way of managing this situation is to split the image into smaller tiles, operate on them separately, and merge the results back together. While we have wrappers which do this in a reasonable fashion in common languages, I was looking for a much more generic way, which affords the opportunity for additional speed-ups. As a result, I’ll run through a process I developed using various snippets of code on the net.
The basic premise is that we can use Matlab to split the image in an organized way, with as much code re-use as possible. In fact, this isn’t nearly as difficult as one might expect, except that SVS files contain multiple pages, which contain the same image at different magnifications for improved image navigation. Let’s look at some code.
Breaking the image apart
The code for this part is very similar to this blog post which leverages the idea of using image adapters to define how large images are read.
- tileSize = [2000, 2000]; % has to be a multiple of 16.
- input_svs_page=3; %the page of the svs file we're interested in loading
- input_svs_file='36729.svs';
- [~,baseFilename,~]=fileparts(input_svs_file);
- svs_adapter =PagedTiffAdapter(input_svs_file,input_svs_page); %create an adapter which modulates how the large svs file is accessed
- tic
- fun=@(block) imwrite(block.data,sprintf('%s_%d_%d.png',baseFilename,block.location(1),block.location(2))); %make a function which saves the individual tile with the row/column information in the filename so that we can refind this tile later
- blockproc(svs_adapter,tileSize,fun); %perform the splitting
- toc
While the tileSize needs to be a multiple of 16, that isn’t a constraint at this stage. It’s actually a requirement for saving large tif images, as described in the following part. We can see here that the basic premise is straightforward: we use blockproc to iterate through the image and save sub-images. At this point, if we want to process the tiles we can (for example by resizing them smaller, or actually doing the analysis). I didn’t opt for that here for two reasons: (a) the deep learning pipeline I have isn’t in Matlab, it’s in Python, so these tiles will be used outside of this development environment, and (b) since we now have all the tiles, we can easily parallelize whatever processing we’re interested in and compute the output for multiple tiles at the same time.
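To make the tiling pattern concrete, here is a small Python sketch of the grid that a non-overlapping split produces. It is not blockproc itself, just an illustration of the idea: the function name and the example dimensions are hypothetical, and the 1-based coordinates are chosen to mirror the row/column values that end up in the tile filenames.

```python
def tile_grid(height, width, tile_h, tile_w):
    """Yield (row, col, h, w) for non-overlapping tiles.

    row/col are 1-based upper-left corners (MATLAB-style); tiles on the
    bottom and right edges are clipped to the image, so they can be
    smaller than tile_h x tile_w.
    """
    for r in range(0, height, tile_h):
        for c in range(0, width, tile_w):
            yield (r + 1, c + 1, min(tile_h, height - r), min(tile_w, width - c))


# Hypothetical page size, not the real dimensions of the SVS page used above.
tiles = list(tile_grid(height=5000, width=4500, tile_h=2000, tile_w=2000))
for row, col, h, w in tiles:
    print(f"tile_{row}_{col}.png -> {h} x {w}")
```

Because each tile is fully described by its row/column origin, the list can be handed to separate workers in any order.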
So, we start with this image:
And after splitting, we can see that there are now multiple, non-overlapping tiles:
Note that not all the images are 2000 x 2000, since the tiles along the edges are cut short. Also, we can see that some of them consist entirely of background. We can leverage this fact to skip computation on such tiles entirely should we desire (for example, by requiring a minimum number of pixels to be non-background via a color threshold).
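A rough Python sketch of that kind of background check is below. The function names and both threshold values are assumptions picked for illustration (near-white pixels are treated as background), so they would need tuning against real slides.

```python
def tissue_fraction(pixels, white_threshold=220):
    """Return the fraction of (r, g, b) pixels that are not near-white.

    A pixel counts as background only if all three channels are at or
    above white_threshold (hypothetical default).
    """
    pixels = list(pixels)
    if not pixels:
        return 0.0
    non_background = sum(1 for r, g, b in pixels if min(r, g, b) < white_threshold)
    return non_background / len(pixels)


def should_process(pixels, min_fraction=0.05, white_threshold=220):
    """Decide whether a tile has enough non-background pixels to analyze."""
    return tissue_fraction(pixels, white_threshold) >= min_fraction


# Toy tiles: one all white, one with 10% stained (purple-ish) pixels.
blank_tile = [(245, 245, 245)] * 100
tissue_tile = [(245, 245, 245)] * 90 + [(150, 80, 160)] * 10
print(should_process(blank_tile), should_process(tissue_tile))
```

In practice the pixel list would come from whatever image library the pipeline already uses; the point is only that a cheap test can skip tiles before the expensive step.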
Interlude
Now that we have our tiles, we can compute their respective output. Nothing too surprising here 🙂
We can see the output here:
Putting it back together
Having the output means it’s time to stitch the images back together.
- tic
- outFile='36729.tif'; %desired output filename
- inFileInfo=imfinfo(input_svs_file); %need to figure out what the final output size should be to create the empty tif that will be filled in
- inFileInfo=inFileInfo(input_svs_page); %imfinfo returns a struct for each individual page, we again select the page we're interested in
- outFileWriter = bigTiffWriter(outFile, inFileInfo.Height, inFileInfo.Width, tileSize(1), tileSize(2),true); %create another image adapter for output writing (tile height, tile width, then the compression flag)
- fun=@(block) imresize(repmat(imread(sprintf('%s_%d_%d_prob.png',baseFilename,block.location(1),block.location(2))),[1 1 3]),1.666666666); %load the output image, which has an expected filename (the two locations added). In this case my output is 0.6x the size of the original tile, so i'll scale it back up by 1/0.6
- blockproc(svs_adapter,tileSize,fun,'Destination',outFileWriter); %do the blockproc again, which will result in the same row/column coordinates, except now we specify the output image adapter to write the file outwards
- outFileWriter.close(); %close the file when we're done
- toc
This process should be straightforward. We need to specify the desired output filename (ending in .tif, of course). Then we leverage the bigTiffWriter provided here to incrementally fill in the final image. Notice that we’ve made the strong assumption that blockproc is deterministic, in that given the same image it will always crop the tiles at the same places, which is in fact true. The only small difference here is that my output images are of a different size (due to the deep learning pipeline that created the output), so I take this opportunity to scale them back up to the expected size. Also, I’ve added the option to my bigTiffWriter to support compression, which is the 6th argument in the constructor.
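The 1.666666666 factor in the stitching code is a truncated 1/0.6, so the scaled tile lands a hair under the target size and the result depends on how the resize routine rounds. The Python sketch below just walks through that arithmetic for a full 2000-pixel tile; the 0.6 output scale is an assumption inferred from the factor, and nothing here claims to reproduce any particular library’s rounding.

```python
import math

original_tile = 2000   # tile edge used in the splitting step
output_scale = 0.6     # assumption: pipeline output is 0.6x the tile size
output_tile = round(original_tile * output_scale)

approx_factor = 1.666666666   # the factor used in the stitching code
scaled = output_tile * approx_factor

print(f"output tile edge: {output_tile}")
print(f"scaled edge before rounding: {scaled:.7f}")
print("floor:", math.floor(scaled), "round:", round(scaled), "ceil:", math.ceil(scaled))
```

If the stitched image ever comes out a pixel short per tile, resizing to the explicit target dimensions instead of a scale factor is one way to rule this out.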
This final image is 10MB, and is nicely stitched back together. I lovingly call this process humpty dumpty. We can see that the DL pipeline is doing a great job of identifying the cribriform pattern, but that conversation is for another day : )
Code is available here.
Another approach using ImageMagick:
This will split it into 1000 x 1000 images:
convert -crop 1000x1000 INPUT_IMAGE_NAME cropped_%d.png
This will merge them back together. “9x” specifies 9 tiles across, which I got by dividing the image width by 1,000 pixels (the tile width from the previous command) and then taking the ceiling:
montage `ls -1 cr* | sort -V` -tile 9x -geometry +0+0 result_prob.png
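The number of tiles across is simple enough to compute by hand, but here is a tiny Python sketch of it for completeness. The helper name and the example widths are hypothetical; only the divide-and-round-up rule comes from the description above.

```python
def tiles_across(image_width, tile_width=1000):
    """Number of tile columns: image width divided by tile width, rounded up."""
    return -(-image_width // tile_width)  # integer ceiling division


example_width = 8500  # hypothetical width; anything from 8001 to 9000 gives 9
print(f"-tile {tiles_across(example_width)}x")
```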
I do not have svs files but jp2 files. Do you have any suggestions for splitting them? I open them in the Aperio ImageScope too but I do not have svs files of them!
Hi Nik,
How did your code look at the end? I’m working on jp2 files too and it just won’t work. Cheers
I got my answer! blockproc supports TIFF and JPEG2000 (jp2) natively. So there is no need to use an image adapter for this data format.
you can also use “convert” from imagemagick, the command is in a comment above
Hi Nik,
I want to work with pathology jp2 images which are large. I am a beginner in this research . Do you know how I could split big jp2 images to blocks?
Thank you
try the convert process discussed in the comments above
Dear Nik,
can you please explain how we can use this code if my image is in jpg format.
thanks
i would recommend using the imagemagick convert command (described in the comments). have you tried that?
I tried using this code but my output images consist of only black pixels(empty tiff images are being created) I am not able to understand why
the matlab version? what kind of input image is it? some of the codecs aren’t supported, unfortunately
if you scroll up the comments, you’ll find a linux command line version using convert from imagemagick, which may be more robust. they have pretty robust image format support
input image is an svs image. when i run imfinfo on it, i get only 4 rows. Also I am able to display the thumbnail version using page 2. Any other page number gives a black image.
Matlab version 2017b
it must be an old svs file (or from an older scanner), they use some compression which matlab can never seem to read correctly. if you want to use matlab, i recommend using the openslide library to read the image
Thank you for the post.
How do I create overlapping blocks?
never really looked into that, seems like it would be more difficult to merge back together? this is an interesting post which may help: https://imagemagick.org/discourse-server/viewtopic.php?t=22942
ultimately, any approach is going to be 2 nested for loops, so that may be the way to go directly for customized output
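For what it’s worth, a minimal Python sketch of the two-loop idea with overlap might look like the following. The function name, tile size, and overlap are all hypothetical, and it only produces crop coordinates; reading and writing the pixels is left to whatever library is in use.

```python
def overlapping_tiles(height, width, tile, overlap):
    """Yield (top, left, bottom, right) crop boxes (0-based, end-exclusive).

    Neighbouring tiles share `overlap` pixels; tiles on the bottom and
    right edges are clipped to the image bounds.
    """
    stride = tile - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than the tile size")
    # Stop once the previous tile already reaches the image edge.
    for top in range(0, max(height - overlap, 1), stride):
        for left in range(0, max(width - overlap, 1), stride):
            yield top, left, min(top + tile, height), min(left + tile, width)


boxes = list(overlapping_tiles(height=2500, width=2500, tile=1000, overlap=200))
for box in boxes:
    print(box)
```

When merging, one simple option is to keep only the central, non-overlapping part of each tile so the seams line up.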
Hi, Can you tell me what and where do I have to make changes if I read my file through openslide. I am trying but, I cannot change the code appropriately.
sorry, i don’t use this code anymore. the imagemagick convert approach discussed above is my preferred method. have you tried it? if you really want to use matlab + openslide, i’d suggest just using 2 for loops and avoiding the PagedTiffAdapter approach
Can you go into further detail on how you used the imagemagick convert approach? Where do we execute this command? What are the inputs/parameters?
not sure what you want to know? i typed the command into a linux terminal and voila 🙂 you can find the documentation here https://www.imagemagick.org/script/command-line-options.php#crop
I see! It seems to be working now. However, for large .svs files, the cropping function takes quite a long time. Do you know if there is any way to accelerate this process by using the computer’s GPU?
if you look at the system resources while the conversion is taking place, i think you’ll find that the bottleneck is getting the file off of the hard drive and writing the tiles. there is essentially no “computation” which takes place, just loading and saving, so if anything i imagine adding a GPU to the mix will make things slower instead of faster.
I have tried to use this method to split images with 4K resolution and, after processing, merge them together. However, when I try to merge the split images, the output .tif file doesn’t show me anything; instead it prompts that the format is not supported. I tried to change the RGB scale value and compression rate too, but nothing worked in merging the images, although the split part is working fine.
Anyone with heads up kindly help. Thanks
have you tried the imagemagick convert version discussed in the comments?
please help me. How can I input those commands on aperio? I don’t have that much experience in writing scripts, but could you at least help me on how to input those commands on aperio? your reply will be greatly appreciated.
I have .svs file by the way.
i think you’ll need to find someone with some experience to help walk you through this. none of these works are designed to be used through the imageviewer, but instead through programming environments
is there an easy program to reduce file size in svs? mine are coming out as 2-3GB at least per slide
sorry not sure i understand your questions, svs files tend to be about 2GB simply because they contain very large images of e.g., 100k x 100k pixels. if you’re interested in reducing the size you could reprocess them with heavy compression, but you’ll start to see artifacts and drops in overall algorithmic performance as a result. we wrote a paper about that here: https://pubmed.ncbi.nlm.nih.gov/32155093/