All posts by Jackson Jacobs

Insertion and Query of Spatial Databases

As part of scaling up our QuickAnnotator tool, we ran a series of tests to benchmark backend technologies. In particular, we were interested in storage mechanisms for polygons that support not only persistence but, most importantly, spatial queries. This means we could push geometries into a database and then, as part of a query, submit a second polygon (or bounding box) to identify those that intersect it (among other spatial operations). We were aiming for at least 1 million rows in the database, as this is on the order of the number of unique cells within a typical whole slide image.
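To make the pattern concrete, below is a minimal sketch of the insert-then-query workflow using SQLite’s R*Tree module as one illustrative backend. This is not our benchmark code: the table name, grid layout, and query window are hypothetical, and the R*Tree only indexes bounding boxes, so the hits are candidates rather than exact polygon intersections.

```python
import sqlite3

# Minimal sketch: store 1 million bounding boxes in SQLite's R*Tree
# virtual table, then query them with a second bounding box. Requires a
# SQLite build with the R*Tree module enabled (the default in most
# distributions); all names and coordinates are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE cells USING rtree(id, min_x, max_x, min_y, max_y)")

def boxes():
    # Lay 50-unit boxes out on a 1,000 x 1,000 grid, 60 units apart.
    for i in range(1_000_000):
        x, y = (i % 1000) * 60.0, (i // 1000) * 60.0
        yield (i, x, x + 50.0, y, y + 50.0)

conn.executemany("INSERT INTO cells VALUES (?, ?, ?, ?, ?)", boxes())

# Standard box-overlap test: a stored box intersects the query window
# (qx0..qx1, qy0..qy1) iff the ranges overlap on both axes.
qx0, qx1, qy0, qy1 = 0.0, 500.0, 0.0, 500.0
hits = conn.execute(
    "SELECT id FROM cells WHERE min_x <= ? AND max_x >= ? "
    "AND min_y <= ? AND max_y >= ?",
    (qx1, qx0, qy1, qy0),
).fetchall()
print(len(hits), "candidate intersections")
```

For exact polygon-vs-polygon tests, a real backend would refine these bounding-box candidates with a geometry predicate, which is what spatial databases such as PostGIS do under the hood.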

We also wanted to consider different scalability options. It should not be surprising that extremely high throughput typically requires more than one machine working in concert, which comes at the cost of additional setup and support complexity. It is not yet clear to us how much modern hardware has eliminated this cost; in other words, is 1 million rows “a lot” or “a little” with current technology? Regardless, one interesting way of managing complexity is through an abstraction layer, so that the backend can be readily swapped with little or no impact on the code used to interact with the database.
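One way to picture such an abstraction layer is as a small, backend-agnostic interface that application code depends on exclusively. The sketch below is hypothetical (the class and method names are ours for illustration); it includes a naive in-memory reference backend built on Shapely, while a SQLite- or PostGIS-backed class would implement the same two methods.

```python
from abc import ABC, abstractmethod
from typing import List

from shapely import wkt  # assumed dependency for the reference backend

class SpatialStore(ABC):
    """Hypothetical backend-agnostic interface for polygon storage and query."""

    @abstractmethod
    def insert(self, object_id: int, polygon_wkt: str) -> None: ...

    @abstractmethod
    def query_intersecting(self, polygon_wkt: str) -> List[int]: ...

class InMemoryStore(SpatialStore):
    """Naive reference backend: a linear scan over Shapely geometries."""

    def __init__(self) -> None:
        self._geoms = {}

    def insert(self, object_id: int, polygon_wkt: str) -> None:
        self._geoms[object_id] = wkt.loads(polygon_wkt)

    def query_intersecting(self, polygon_wkt: str) -> List[int]:
        query = wkt.loads(polygon_wkt)
        return [oid for oid, g in self._geoms.items() if g.intersects(query)]

# Swapping in a database-backed implementation leaves callers untouched:
store = InMemoryStore()
store.insert(1, "POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))")
print(store.query_intersecting("POLYGON((5 5, 15 5, 15 15, 5 15, 5 5))"))  # [1]
```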

Continue reading Insertion and Query of Spatial Databases

Approach for Easy Visual Comparison between ground-truth and predicted classes

Although classification metrics are good for summarizing a model’s performance on a dataset, they disconnect the user from the data itself. For example, a confusion matrix might tell us that performance is suffering because of false positives, but it obscures what patterns may have caused those misclassifications and what types of false positives there are.

One way to gain interpretability is to group sampled images by the category of their output (true negative, false negative, false positive, true positive) and display them in a PowerPoint file for easy review. These visualizable categories make it easy to identify patterns in misclassified data that can be exploited to improve performance (e.g., hard negative mining, or image-analysis-based filtering).
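The grouping step itself is straightforward; a hypothetical helper for the binary case might look like the following (the function name and the 0/1 label convention are assumptions):

```python
from collections import defaultdict

def bucket_by_outcome(paths, y_true, y_pred):
    """Group image paths by confusion-matrix outcome for a binary classifier.

    `paths`, `y_true`, and `y_pred` are parallel sequences; labels are 0/1.
    """
    names = {(0, 0): "true_negative", (0, 1): "false_positive",
             (1, 0): "false_negative", (1, 1): "true_positive"}
    buckets = defaultdict(list)
    for path, t, p in zip(paths, y_true, y_pred):
        buckets[names[(t, p)]].append(path)
    return buckets
```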

This blog post describes and demonstrates a workflow that automatically produces such a PowerPoint slide deck for review.
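As a rough illustration of the deck-building step, here is a sketch using the python-pptx library; the function name, slide layout index, and tiling parameters are assumptions rather than the exact workflow from the full post.

```python
from pptx import Presentation
from pptx.util import Inches

def build_review_deck(buckets, out_path="review.pptx", per_row=6, thumb=1.2):
    """One slide per outcome category, with sampled images tiled onto it.

    `buckets` maps a category name to a list of image paths (e.g., the
    output of the hypothetical bucket_by_outcome() above).
    """
    prs = Presentation()
    title_only = prs.slide_layouts[5]  # "Title Only" in the default template
    for category, paths in buckets.items():
        slide = prs.slides.add_slide(title_only)
        slide.shapes.title.text = category
        for i, path in enumerate(paths):
            left = Inches(0.5 + (i % per_row) * (thumb + 0.2))
            top = Inches(1.5 + (i // per_row) * (thumb + 0.2))
            slide.shapes.add_picture(path, left, top,
                                     width=Inches(thumb), height=Inches(thumb))
    prs.save(out_path)
```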

Continue reading Approach for Easy Visual Comparison between ground-truth and predicted classes