Serialization and storage of GeoJSON in Digital Pathology

GeoJSON, a widely used format based on JSON (JavaScript Object Notation), is designed for encoding a variety of geographic data structures. It represents simple geographical features, such as points, lines, and polygons, along with their non-spatial attributes. In digital pathology, GeoJSON has emerged as a common format for storing annotations, enabling precise documentation of regions of interest, cellular structures, and other critical details within pathology images. Its popularity in this field is bolstered by broad support across numerous tools (e.g., QuPath), which facilitates integration and analysis in digital pathology workflows.

Despite its widespread adoption, there are several open questions regarding the efficient use of GeoJSON that can significantly impact performance. One key concern is the best method for storing GeoJSON in a compressed format to minimize storage requirements while preserving the integrity of the data. Efficient compression techniques are crucial, especially when dealing with large-scale pathology datasets.
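As a minimal sketch of the storage question, the snippet below serializes a small GeoJSON annotation collection, compresses it with gzip from the Python standard library, and checks that the round trip is lossless. The polygon, the `classification` property, and the choice of gzip are illustrative assumptions, not necessarily the approach the full post recommends.

```python
import gzip
import json

# Hypothetical annotation: one square region of interest, with coordinates
# given in pixel units of the slide image.
feature = {
    "type": "Feature",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[0, 0], [100, 0], [100, 100], [0, 100], [0, 0]]],
    },
    "properties": {"classification": "tumor"},  # assumed property name
}

# Repeat the feature to mimic a slide with many annotations.
feature_collection = {"type": "FeatureCollection", "features": [feature] * 1000}

# Compact separators avoid storing unnecessary whitespace.
raw = json.dumps(feature_collection, separators=(",", ":")).encode("utf-8")
compressed = gzip.compress(raw)

# Decompress and parse again to confirm nothing was lost.
restored = json.loads(gzip.decompress(compressed).decode("utf-8"))

print(f"raw: {len(raw)} bytes, gzip: {len(compressed)} bytes")
```

Real annotation files are less repetitive than this toy example, so the compression ratio seen here should not be taken as representative.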

Data Exploration Of Features For Outcome Association In Digital Pathology

Introduction

In digital pathology, a frequent approach to creating image-based biomarkers involves extracting features from scanned pathology slides. These features, which are often related to the morphology or spatial distribution of various tissue or cell types, provide valuable insights into the underlying biology of diseases. In cancer research, it is particularly important to examine how these features correlate with clinical outcomes such as overall survival (OS), progression-free survival (PFS), or binary outcomes (e.g., response to a specific treatment).

Here we release Python code that can be executed in a notebook to facilitate this process. It accepts a pandas DataFrame and generates a one-page summary PDF file for analyzing individual features and their potential correlation with clinical outcomes.
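To illustrate the kind of question such an exploration asks, here is a minimal standard-library sketch that splits patients at the median of one feature and compares the response rate in each half. The function name, the feature values, and the outcomes are all made up for illustration; this is not the released code, which works on a pandas DataFrame and produces a PDF.

```python
from statistics import median

def response_rate_by_median_split(feature_values, responded):
    """Return (low_rate, high_rate): response rates at or below vs. above the median."""
    cutoff = median(feature_values)
    low = [r for v, r in zip(feature_values, responded) if v <= cutoff]
    high = [r for v, r in zip(feature_values, responded) if v > cutoff]
    return sum(low) / len(low), sum(high) / len(high)

# Hypothetical data: one morphology feature per patient and a binary
# treatment response (1 = responded, 0 = did not respond).
feature_values = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
responded = [0, 0, 1, 0, 1, 1, 0, 1]

low_rate, high_rate = response_rate_by_median_split(feature_values, responded)
print(f"response rate, low feature: {low_rate:.2f}, high feature: {high_rate:.2f}")
```

A difference between the two rates only hints at an association; a proper analysis would also test for significance and, for OS or PFS, use survival methods.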

Ray: An Open-Source API For Easy, Scalable Distributed Computing In Python – Part 3 Intro to Serving Models

Through a series of 4 blog posts, we’ll discuss and provide working examples of how one can use the open-source library Ray to (a) scale computing locally (single machine), (b) distribute scaling remotely (multiple machines), and (c) serve deep learning models across a cluster (two posts on this topic, basic and advanced). Please note that the blog posts in this series increase in difficulty!

This is the second-to-last blog post in the series (the first one here, second one here), where we will go into greater detail about how we can use Ray Serve to set up a server that waits to respond to our processing requests. These last two are the most complex blog posts in the series and require some understanding of how HTTP, REST, and web services work. You can find relevant prereading here.

Ray Serve is a scalable model serving library for building online inference APIs. Serve is framework agnostic, so you can use a single toolkit to serve everything from deep learning models built with frameworks like PyTorch, TensorFlow, and Keras, to scikit-learn models, to arbitrary Python business logic.

Ray: An Open-Source API For Easy, Scalable Distributed Computing In Python – Part 2 Distributed Scaling

Through a series of 4 blog posts, we’ll discuss and provide working examples of how one can use the open-source library Ray to (a) scale computing locally (single machine), (b) distribute scaling remotely (multiple machines), and (c) serve deep learning models across a cluster (basic/advanced). Please note that the blog posts in this series increase in difficulty!

This is the second blog post in the series (the first one here), where we will go into greater detail about how Ray Cluster creation works, the associated terminology, and the requirements for successful execution, and then extend our previous local-only example to a distributed environment.

Ray: An Open-Source API For Easy, Scalable Distributed Computing In Python – Part 1 Local Scaling

Through a series of 4 blog posts, we’ll discuss and provide working examples of how one can use the open-source library Ray to (a) scale computing locally (single machine), (b) distribute scaling remotely (multiple machines), and (c) serve deep learning models across a cluster (basic/advanced). Please note that the blog posts in this series increase in difficulty!

I am personally very excited by the opportunities afforded by Ray; it’s been a long-time desire to have such an easy-to-use library!

Okay, let’s start off by talking about scaling local computation with Ray!

Approach for Easy Visual Comparison between ground-truth and predicted classes

Although classification metrics are good for summarizing a model’s performance on a dataset, they disconnect the user from the data itself. Similarly, a confusion matrix might tell us that performance is suffering because of false positives, but it obscures information about what patterns may have caused those misclassifications and what types of false positives there might be. 

One way to gain interpretability is to group sampled images by the category of their output (true negative, false negative, false positive, true positive) and display them in a PowerPoint file for easy review. These visualizable categories make it easy to identify patterns in misclassified data that can be exploited to improve performance (e.g., hard negative mining or image-analysis-based filtering).
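The grouping step can be sketched in a few lines of plain Python. The file names, the labels, and the helper names `confusion_category` and `group_by_category` below are hypothetical; building the slide deck itself is left out, since it depends on the presentation library used.

```python
from collections import defaultdict

def confusion_category(truth: int, predicted: int) -> str:
    """Map a binary ground-truth/prediction pair to its confusion-matrix cell."""
    if truth == 1:
        return "true positive" if predicted == 1 else "false negative"
    return "false positive" if predicted == 1 else "true negative"

def group_by_category(samples):
    """Group image names by confusion category, preserving input order."""
    groups = defaultdict(list)
    for name, truth, predicted in samples:
        groups[confusion_category(truth, predicted)].append(name)
    return dict(groups)

# Hypothetical (image name, ground truth, prediction) triples.
samples = [
    ("patch_001.png", 1, 1),
    ("patch_002.png", 0, 1),
    ("patch_003.png", 1, 0),
    ("patch_004.png", 0, 0),
    ("patch_005.png", 0, 1),
]

groups = group_by_category(samples)
for category, names in groups.items():
    print(category, names)
```

Each group could then be laid out on its own slide, so that, for example, all false positives can be reviewed side by side.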

This blog post describes and demonstrates a workflow that automatically produces such a PowerPoint slide deck for review.

Using QuPath To Help Identify An Optimal Threshold For A Deep Or Machine Learning Classifier

Digital pathology projects often require assigning a class to cells/objects. For example, you may have a segmentation of cells/glomeruli/tubules and want to identify the ones which are lymphocytes/sclerotic/distal. This classification can be done using machine or deep learning classifiers by supplying the object in question and receiving an output score that indicates the likelihood that the object is of a particular type.

This blog post will demonstrate an efficient way of using QuPath to help find the ideal likelihood threshold for your classifier.
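To make the role of the threshold concrete, the sketch below counts how many objects would be called positive at several candidate thresholds. The scores and the helper `count_positive` are hypothetical, and this is plain Python rather than the QuPath-based workflow the post goes on to describe.

```python
def count_positive(scores, threshold):
    """Count objects whose classifier score meets or exceeds the threshold."""
    return sum(score >= threshold for score in scores)

# Hypothetical likelihood scores for six segmented objects.
scores = [0.05, 0.20, 0.45, 0.55, 0.80, 0.95]

# Raising the threshold labels fewer objects as positive.
for threshold in (0.25, 0.50, 0.75):
    print(f"threshold {threshold:.2f}: {count_positive(scores, threshold)} positive objects")
```

Counts alone do not say which threshold is best; that judgment requires looking at which objects flip class as the threshold moves.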

A masterclass in Scientific CV writing

Introduction: Another day, another application form

Writing applications for jobs, grants and all manner of other reviews is a continual process within the scientific world. Forms tend to ask for specific, nuanced information, leading to more of our precious time being spent digging up decades’ worth of buried events just to evidence ‘A time I have communicated with a diverse audience’ than actually writing. Then, we have the doubt to contend with: What if I missed something? Surely I have a better example! I remember doing that – but when was it?

Given A) how short academic contracts can be and B) how many distinct workplaces our generation tends to work in over the course of a career, writing CVs can consume a considerable chunk of our adult lives. The application process is not going anywhere in the near future. We need to ask ourselves how we can make it as painless and efficient as possible.

Well, there are a few ‘hacks’. Apply for a few jobs and you will start to notice themes in the application process and in the ‘winning’ CVs. Let’s go over these themes and learn to not only ‘hack’ our time but more importantly, our success rate. Doing so, we can earn back so much more time to do the things we love – science!

How to Select the Correct Magnification and Patch Size for Digital Pathology Projects

In digital pathology, input data is often far too large for DL models to process directly, with Whole Slide Images (WSI) around 100k x 100k pixels. This post provides a quantitative and qualitative method, with code, to help optimize important digital pathology specific hyperparameters: patch size and magnification. Optimizing these variables can decrease training times, lower hardware requirements, and reduce the amount of data required to effectively train a model.
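The interplay between the two hyperparameters can be shown with some simple arithmetic. The sketch below assumes a base scan at 40x with 0.25 microns per pixel (a common but scanner-dependent value) and uses hypothetical helper names; it is not the selection method from the post, only an illustration of why the choice matters.

```python
BASE_MAGNIFICATION = 40
BASE_MICRONS_PER_PIXEL = 0.25  # assumption: varies between scanners

def microns_per_pixel(magnification):
    """Pixel size after downsampling from the base magnification."""
    return BASE_MICRONS_PER_PIXEL * BASE_MAGNIFICATION / magnification

def patch_field_of_view_um(patch_size_px, magnification):
    """Side length, in microns, of the tissue covered by one square patch."""
    return patch_size_px * microns_per_pixel(magnification)

def patches_per_slide(base_width_px, base_height_px, patch_size_px, magnification):
    """Number of whole, non-overlapping patches that fit in the rescaled slide."""
    scale = magnification / BASE_MAGNIFICATION
    width = int(base_width_px * scale)
    height = int(base_height_px * scale)
    return (width // patch_size_px) * (height // patch_size_px)

# Compare a 256 x 256 patch on a 100k x 100k slide at two magnifications.
for magnification in (40, 10):
    fov = patch_field_of_view_um(256, magnification)
    count = patches_per_slide(100_000, 100_000, 256, magnification)
    print(f"{magnification}x: {fov:.0f} um per patch side, {count} patches")
```

Under these assumptions, dropping from 40x to 10x makes each patch cover four times the tissue width and cuts the number of patches by roughly a factor of 16, at the cost of fine cellular detail.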
