All posts by choosehappy

Serialization and storage of GeoJSON in Digital Pathology

GeoJSON, a widely used format based on JSON (JavaScript Object Notation), is specifically designed for encoding a variety of geographic data structures. This versatile format excels in representing simple geographical features, such as points, lines, and polygons, along with their non-spatial attributes. In the realm of digital pathology, GeoJSON has emerged as a common format for storing annotations, enabling precise documentation of regions of interest, cellular structures, and other critical details within pathology images. The popularity of GeoJSON in this field is bolstered by its broad support across numerous tools (e.g., QuPath), which facilitates seamless integration and analysis in digital pathology workflows.

Despite its widespread adoption, there are several open questions regarding the efficient use of GeoJSON that can significantly impact performance. One key concern is the best method for storing GeoJSON in a compressed format to minimize storage requirements while preserving the integrity of the data. Efficient compression techniques are crucial, especially when dealing with large-scale pathology datasets.
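As a rough illustration of the storage question, the sketch below serializes a small synthetic FeatureCollection to compact JSON, compresses it with gzip from the Python standard library, and checks that the round trip is lossless. The helper names (`make_square`, `to_compressed_bytes`, `from_compressed_bytes`) and the `label` property are hypothetical, and gzip is only one of several possible choices, not necessarily the method the full post recommends.

```python
import gzip
import json


def make_square(x, y, size, label):
    """Build a GeoJSON Feature for a square polygon annotation (hypothetical helper)."""
    # A GeoJSON polygon ring must be closed: the first and last points are identical.
    ring = [[x, y], [x + size, y], [x + size, y + size], [x, y + size], [x, y]]
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {"label": label},  # assumed property name, for illustration only
    }


def to_compressed_bytes(feature_collection):
    """Serialize to compact JSON (no extra whitespace) and gzip the result."""
    raw = json.dumps(feature_collection, separators=(",", ":")).encode("utf-8")
    return gzip.compress(raw)


def from_compressed_bytes(blob):
    """Invert to_compressed_bytes: gunzip, then parse the JSON."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


# Synthetic annotations standing in for real pathology objects.
collection = {
    "type": "FeatureCollection",
    "features": [make_square(i * 10, i * 5, 8, "tumor") for i in range(1000)],
}

raw_size = len(json.dumps(collection).encode("utf-8"))
blob = to_compressed_bytes(collection)
restored = from_compressed_bytes(blob)

print(f"raw: {raw_size} bytes, gzip: {len(blob)} bytes")
```

Because gzip is lossless, the restored object compares equal to the original, so the integrity of the annotations is preserved while the stored size shrinks.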


Ray: An Open-Source API For Easy, Scalable Distributed Computing In Python – Part 3 Intro to Serving Models

Through a series of 4 blog posts, we’ll discuss and provide working examples of how one can use the open-source library Ray to (a) scale computing locally (single machine), (b) distribute scaling remotely (multiple machines), and (c) serve deep learning models across a cluster (two posts on this topic, basic and advanced). Please note that the blog posts in this series increase in difficulty!

This is the second-to-last blog post in the series (the first one here, second one here), where we will go into greater detail about how we can use Ray Serve to set up a server waiting to respond to our requests for processing. These last two are the most complex blog posts in the series and require some understanding of how HTTP, REST, and web services work. You can find relevant prereading here.

Ray Serve is a scalable model serving library for building online inference APIs. Serve is framework-agnostic, so you can use a single toolkit to serve everything from deep learning models built with frameworks like PyTorch, TensorFlow, and Keras, to Scikit-Learn models, to arbitrary Python business logic.


Ray: An Open-Source API For Easy, Scalable Distributed Computing In Python – Part 2 Distributed Scaling

Through a series of 4 blog posts, we’ll discuss and provide working examples of how one can use the open-source library Ray to (a) scale computing locally (single machine), (b) distribute scaling remotely (multiple machines), and (c) serve deep learning models across a cluster (basic/advanced). Please note that the blog posts in this series increase in difficulty!

This is the second blog post in the series (the first one here), where we will go into greater detail about how Ray Cluster creation works, the associated terminology, and the requirements for successful execution, and then extend our previous local-only example to a distributed environment.


Ray: An Open-Source API For Easy, Scalable Distributed Computing In Python – Part 1 Local Scaling

Through a series of 4 blog posts, we’ll discuss and provide working examples of how one can use the open-source library Ray to (a) scale computing locally (single machine), (b) distribute scaling remotely (multiple machines), and (c) serve deep learning models across a cluster (basic/advanced). Please note that the blog posts in this series increase in difficulty!

I am personally very excited by the opportunities afforded by Ray; it's been a long-time desire of mine to have such an easy-to-use library!

Okay, let's start off by talking about scaling local computation with Ray!


Using QuPath To Help Identify An Optimal Threshold For A Deep Or Machine Learning Classifier

Digital pathology projects often require assigning a class to cells/objects. For example, you may have a segmentation of cells/glomeruli/tubules and want to identify the ones which are lymphocytes/sclerotic/distal. This classification process can be done using machine or deep learning classifiers by supplying the object in question and receiving an output score which indicates the likelihood that that particular object is of that particular type.

This blog post will demonstrate an efficient way of using QuPath to help find the ideal likelihood threshold for your classifier.
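To make the idea of a likelihood threshold concrete, here is a minimal sketch that sweeps candidate thresholds over a handful of labeled scores and keeps the one maximizing Youden's J (sensitivity + specificity - 1). This is a generic numerical criterion swapped in for illustration; it is not the QuPath-based workflow the post goes on to describe, and the function name `best_threshold` and the toy scores and labels are made up.

```python
def best_threshold(scores, labels):
    """Return (threshold, J) maximizing Youden's J over the observed scores.

    An object is called positive when its score is >= threshold.
    labels must contain 1 for positive objects and 0 for negative ones.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")

    best_t, best_j = None, -1.0
    # Only the observed scores need to be tried as candidate thresholds.
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:  # ties keep the lowest threshold
            best_t, best_j = t, j
    return best_t, best_j


# Toy classifier outputs (hypothetical): likelihood scores and ground-truth labels.
scores = [0.05, 0.20, 0.35, 0.40, 0.55, 0.70, 0.80, 0.95]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

threshold, j = best_threshold(scores, labels)
print(threshold, round(j, 3))
```

A purely numerical criterion like this gives a starting point; visually checking the resulting classifications on the image is what helps confirm whether the chosen cutoff behaves sensibly on real tissue.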


Application of ICC profiles to digital pathology images

Background on Color Calibration

Digital whole slide image scanners are designed to take stained tissue on glass slides and digitize them into bytes for usage in the digital world. The process by which slide scanners perform this operation does not produce a perfect digital equivalent of the original slide, as the hardware involved (LED/bulb, camera sensor, quantizer) can introduce some biases during the sampling process. For example, different camera sensors may detect colors with different levels of specificity/accuracy/density, resulting in similar but not perfect representations of the associated real-world subjects.

Concretely, there is often a difference between the color you perceive in the real world under a microscope versus what you would see if you looked at the corresponding digital copy of the same slide. This blog post discusses how to correct for this discrepancy using ICC profiles.


Tutorial: Quick Annotator for Tubule Segmentation

The manual labeling of large numbers of objects is a frequent occurrence when training deep learning classifiers in the digital histopathology domain. Often this can become extremely tedious and potentially even insurmountable.

To aid people in this annotation process we have developed and released Quick Annotator (QA), a tool which employs a deep learning backend to simultaneously learn and aid the user in the annotation process. A pre-print explaining this tool in more detail is available [here].


Transferring data FASTER to the GPU With Compression

Utilization of current GPUs is often limited by the ability to get the data onto and off the device quickly. More precisely, taking data from host RAM and transferring it over the PCIe bus to GPU RAM is the bottleneck in many deep learning use cases.


The noise in our digital pathology slides

While adding new features to HistoQC, I stumbled upon a very interesting insight that I thought I would take a moment to share. The amount of noise and artifacts in digital pathology (DP) whole slide images (WSI) is far more extensive than I had previously thought.
