As part of scaling up our QuickAnnotator tool, we ran a series of tests to benchmark backend technologies. In particular, we were interested in storage mechanisms for polygons that support not only storing them but, more importantly, querying them spatially. That is, we can push geometries into a database and then, as part of a query, submit a second polygon (or bounding box) to identify the stored geometries that intersect it (among other spatial operations). We targeted at least 1 million rows in the database, as this is on the order of the number of unique cells within a typical whole slide image.
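To make the idea of "insert geometries, then query with a bounding box" concrete, here is a minimal sketch using only Python's built-in `sqlite3` module. It is an illustration and not the setup benchmarked in this post: the table name `cells`, the helper names, and the choice to store each polygon's bounding box in plain columns are all assumptions. It also only tests bounding-box overlap, which a real spatial database would typically follow with an exact geometry test.

```python
import sqlite3


def make_store():
    """Create an in-memory table holding each polygon's bounding box and WKT."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cells ("
        "id INTEGER PRIMARY KEY, "
        "minx REAL, miny REAL, maxx REAL, maxy REAL, "
        "wkt TEXT)"
    )
    return conn


def polygon_to_wkt(points):
    """Serialize a list of (x, y) vertices as a closed WKT polygon ring."""
    ring = list(points) + [points[0]]
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON(({coords}))"


def insert_polygon(conn, points):
    """Store one polygon and return its row id."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    cur = conn.execute(
        "INSERT INTO cells (minx, miny, maxx, maxy, wkt) VALUES (?, ?, ?, ?, ?)",
        (min(xs), min(ys), max(xs), max(ys), polygon_to_wkt(points)),
    )
    return cur.lastrowid


def query_bbox(conn, minx, miny, maxx, maxy):
    """Return ids of polygons whose bounding box overlaps the query window."""
    rows = conn.execute(
        "SELECT id FROM cells "
        "WHERE minx <= ? AND maxx >= ? AND miny <= ? AND maxy >= ? "
        "ORDER BY id",
        (maxx, minx, maxy, miny),
    )
    return [row[0] for row in rows]
```

With two stored polygons, a window that covers only the first returns just that id, and a window that touches neither returns an empty list. At the scale of a million rows, an index (or a dedicated spatial index) on the bounding-box columns would be the natural next step.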
We also wanted to consider different scalability options. It should not be surprising that extremely high throughput typically requires more than one machine working in concert, which comes at the cost of additional setup and support complexity. It is not yet clear to us how much modern hardware has eliminated this cost, i.e., whether 1 million rows is “a lot” or “a little” with current technology. Regardless, one interesting way of managing complexity is through an abstraction layer, so that the backend can be changed easily with no impact (or as small an impact as possible) on the code that interacts with the database.
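As a rough sketch of what such an abstraction layer could look like, the Python example below defines a small interface and two interchangeable backends. The names `SpatialBackend`, `InMemoryBackend`, `SQLiteBackend`, and `find_hits` are hypothetical and chosen for illustration; they are not taken from QuickAnnotator. Geometries are reduced to bounding boxes to keep the example short.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple

BBox = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


class SpatialBackend(ABC):
    """Interface that calling code depends on, regardless of the storage used."""

    @abstractmethod
    def insert(self, obj_id: int, bbox: BBox) -> None: ...

    @abstractmethod
    def query(self, window: BBox) -> List[int]: ...


class InMemoryBackend(SpatialBackend):
    """Keeps everything in a dict and scans all rows on each query."""

    def __init__(self) -> None:
        self._rows: Dict[int, BBox] = {}

    def insert(self, obj_id: int, bbox: BBox) -> None:
        self._rows[obj_id] = bbox

    def query(self, window: BBox) -> List[int]:
        qminx, qminy, qmaxx, qmaxy = window
        return sorted(
            obj_id
            for obj_id, (minx, miny, maxx, maxy) in self._rows.items()
            if minx <= qmaxx and maxx >= qminx and miny <= qmaxy and maxy >= qminy
        )


class SQLiteBackend(SpatialBackend):
    """Stores the same data in an in-memory SQLite table."""

    def __init__(self) -> None:
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE boxes (id INTEGER PRIMARY KEY, "
            "minx REAL, miny REAL, maxx REAL, maxy REAL)"
        )

    def insert(self, obj_id: int, bbox: BBox) -> None:
        self._conn.execute(
            "INSERT INTO boxes VALUES (?, ?, ?, ?, ?)", (obj_id, *bbox)
        )

    def query(self, window: BBox) -> List[int]:
        qminx, qminy, qmaxx, qmaxy = window
        rows = self._conn.execute(
            "SELECT id FROM boxes WHERE minx <= ? AND maxx >= ? "
            "AND miny <= ? AND maxy >= ? ORDER BY id",
            (qmaxx, qminx, qmaxy, qminy),
        )
        return [row[0] for row in rows]


def find_hits(backend: SpatialBackend, boxes: Dict[int, BBox], window: BBox) -> List[int]:
    """Backend-agnostic code: load the boxes, then run one window query."""
    for obj_id, bbox in boxes.items():
        backend.insert(obj_id, bbox)
    return backend.query(window)
```

Because `find_hits` only talks to the `SpatialBackend` interface, swapping `InMemoryBackend()` for `SQLiteBackend()` (or, in principle, a multi-machine backend) requires no change to the calling code.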