A performance comparison of s3fs and Goofys for opening and reading WSIs with OpenSlide via AWS S3

Background

These days, developers in digital pathology frequently use OpenSlide to read whole slide images (WSIs). Since WSIs are typically large, transferring and sharing them can be challenging. Cloud storage provides a scalable, durable, and secure solution for storing and sharing WSIs remotely. By mounting cloud storage as a local file system, existing tools can access WSIs without additional modifications.

AWS S3 (Simple Storage Service) and MinIO (S3-compatible, on-premises, and open-source) are widely used, robust, and highly scalable object storage services. While Mountpoint for Amazon S3 is a promising option for treating cloud storage as a local file system, several open-source projects also allow users to mount an S3 bucket as a local file system. Among the most well-known are s3fs and Goofys. In this post, we’ll compare the performance of these two popular open-source solutions.

What are s3fs and Goofys?

s3fs is an open-source FUSE (Filesystem in Userspace) module that allows users to mount an Amazon S3 (or MinIO) bucket as a local file system on Linux or macOS. This enables seamless interaction with S3-like storage using standard file operations, making it appear as part of the local file system.

Goofys, on the other hand, is an open-source FUSE-based file system designed for performance. It optimizes metadata operations and makes trade-offs in POSIX compliance to achieve higher efficiency. Goofys allows users to mount an S3 bucket as a local file system on Linux, macOS, and other Unix-like systems, offering a faster and more efficient way to access cloud storage.

What is the difference between s3fs and Goofys?

s3fs aims for POSIX-compatible behavior, meaning it supports standard file operations like chmod, mv, and chown. However, this compatibility comes at a cost: s3fs frequently communicates with S3 to maintain consistency, making it slower, especially for workloads involving frequent small file operations. While it supports caching to improve performance, its reliance on S3’s object storage model can still be inefficient for high-throughput tasks. It is best suited for scenarios requiring POSIX compatibility, such as applications expecting a traditional filesystem interface. To use s3fs, you need either a configured AWS profile or a .passwd-s3fs credentials file.
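As a minimal sketch of the credentials-file option, the snippet below writes a file in the ACCESS_KEY_ID:SECRET_ACCESS_KEY form that s3fs reads and restricts it to owner-only permissions. The function name write_passwd_s3fs and the example path are hypothetical, chosen here for illustration; check the s3fs documentation for the options that apply to your setup.

```python
import os
import stat
from pathlib import Path


def write_passwd_s3fs(access_key_id: str, secret_access_key: str, path) -> Path:
    """Write an s3fs credentials file (hypothetical helper, not part of s3fs)."""
    path = Path(path)
    # Create the file with owner-only permissions (0600) from the start.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        # s3fs expects a single line: ACCESS_KEY_ID:SECRET_ACCESS_KEY
        handle.write(f"{access_key_id}:{secret_access_key}\n")
    # Re-apply 0600 in case the file already existed with looser permissions.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return path
```

A call such as write_passwd_s3fs("AKIA...", "...", Path.home() / ".passwd-s3fs") would create the file in the home directory; the key values shown are placeholders.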

Goofys, on the other hand, prioritizes performance over full POSIX compliance. It handles S3 metadata more efficiently, making it significantly faster than s3fs for many operations, particularly for large files and streaming reads/writes. However, since it does not fully support operations like chmod, chown, or truncate, it may not work well with software that depends on those features. Goofys is a great choice for workloads that need high-speed access to S3, such as data processing pipelines or applications that don’t require strict file system semantics.

If you have any questions about installing or configuring s3fs or Goofys, please check each project's GitHub repository.

Test environment setup

For the s3fs test, we mounted S3 directly and ran s3fs on bare metal.

For the Goofys test, we evaluated two scenarios. First, we mounted the S3 bucket directly and ran Goofys on bare metal. Second, we mounted the S3 bucket using Goofys on the host OS and then mounted that folder into a running Docker container. The test results showed no significant difference between the two approaches.

The following steps outline the setup process in a Docker container for Goofys:

  • Edit /etc/fuse.conf and enable user_allow_other
  • Mount the S3 bucket as a local folder using Goofys:
    goofys -o allow_other <s3_bucket> <local/path/folder>
  • Run the Docker container and mount the folder:
    docker run -it -v <local/path/folder>:<container/path/folder> <docker_image> bash

Tip: To allow other users (including processes inside Docker) to access the mounted S3 bucket, you must enable user_allow_other in /etc/fuse.conf and use the -o allow_other flag when running Goofys.
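The setup steps above can be sketched in Python as a pre-flight check plus two command builders. This is only an illustration: the function names, the bucket name, and the paths are hypothetical, and the commands are built as argument lists rather than executed, so nothing is mounted by this code.

```python
import shlex
from typing import List


def user_allow_other_enabled(fuse_conf_text: str) -> bool:
    """Return True if fuse.conf text contains an uncommented user_allow_other."""
    for line in fuse_conf_text.splitlines():
        # Strip trailing comments and whitespace before comparing.
        if line.split("#", 1)[0].strip() == "user_allow_other":
            return True
    return False


def goofys_mount_command(bucket: str, mount_dir: str) -> List[str]:
    """Build the Goofys mount command used in the steps above."""
    return ["goofys", "-o", "allow_other", bucket, mount_dir]


def docker_run_command(mount_dir: str, container_dir: str, image: str) -> List[str]:
    """Build the docker run command that bind-mounts the Goofys folder."""
    return ["docker", "run", "-it", "-v", f"{mount_dir}:{container_dir}", image, "bash"]


if __name__ == "__main__":
    # Placeholder values; replace with your own bucket, paths, and image.
    print(shlex.join(goofys_mount_command("my-wsi-bucket", "/mnt/wsi")))
    print(shlex.join(docker_run_command("/mnt/wsi", "/data/wsi", "my-image")))
```

Passing the contents of /etc/fuse.conf to user_allow_other_enabled before mounting gives an early warning when the option is still commented out.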

Software and hardware specifications used for testing:
s3fs – 1.90
Goofys – 0.24.0
OS: Ubuntu 22.04.4 LTS 64-bit
CPU: 13th Gen Intel® Core™ i5-1350P × 16
Memory: 32GB
Disk: 1TB SSD

Test results

The test compared s3fs on bare metal with Goofys on bare metal. We used 7 WSI files of various sizes as test samples. For each WSI, we used OpenSlide to measure the time needed to initialize an OpenSlide object and to call the get_thumbnail and read_region functions in three modes: local, s3fs-mounted, and Goofys-mounted.

local: read the WSIs directly from local storage.
s3fs: mounted the S3 bucket directly via s3fs and read the WSIs from the mounted folder.
goofys: mounted the S3 bucket directly via Goofys and read the WSIs from the mounted folder.
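A minimal sketch of how such timings might be collected with the openslide-python package is shown below. The helper names (timed, benchmark_slide) are hypothetical, and the thumbnail size and region coordinates are assumptions, since the post does not state the values used in the actual test.

```python
import time
from typing import Any, Callable, Dict, Tuple


def timed(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call func and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def benchmark_slide(
    path: str,
    thumbnail_size: Tuple[int, int] = (512, 512),    # assumed, not from the post
    region_location: Tuple[int, int] = (0, 0),       # assumed, not from the post
    region_level: int = 0,                           # assumed, not from the post
    region_size: Tuple[int, int] = (1024, 1024),     # assumed, not from the post
) -> Dict[str, float]:
    """Time OpenSlide initialization, get_thumbnail, and read_region for one WSI."""
    import openslide  # imported lazily so the timing helper works without it

    slide, open_time = timed(openslide.OpenSlide, path)
    try:
        _, thumbnail_time = timed(slide.get_thumbnail, thumbnail_size)
        _, read_time = timed(
            slide.read_region, region_location, region_level, region_size
        )
    finally:
        slide.close()
    return {
        "open_time": open_time,
        "thumbnail_time": thumbnail_time,
        "read_time": read_time,
    }
```

Running benchmark_slide once per file on the local path, the s3fs mount, and the Goofys mount would yield one row per mode. Repeated runs on the same mount may be affected by caching, so a fresh mount per run gives a fairer comparison.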

The following table shows the test results:

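| name | size (MB) | mode | open_time (s) | thumbnail_time (s) | read_time (s) |
|---|---|---|---|---|---|
| 0590XY112001_01_06.svs | 1585.873 | local | 0.0022 | 0.2053 | 0.0214 |
| | | goofys | 0.0517 | 0.2752 | 0.0358 |
| | | s3fs | 3049.7225 | 395.4129 | 1033.0831 |
| 0590XY112001_01_07.svs | 2224.478 | local | 0.0024 | 0.2467 | 0.0232 |
| | | goofys | 0.0575 | 0.2927 | 0.0387 |
| | | s3fs | 2724.0224 | 613.4214 | 977.6505 |
| 0590XY112001_01_08.svs | 1715.34 | local | 0.0037 | 0.1849 | 0.0214 |
| | | goofys | 0.0598 | 0.237 | 0.0325 |
| | | s3fs | 2935.5364 | 401.4624 | 959.4518 |
| HTT-TILS-001-03B.ndpi | 229.806 | local | 0.0009 | 0.0059 | 0.0104 |
| | | goofys | 0.0059 | 0.0068 | 0.046 |
| | | s3fs | 72.155 | 0.193 | 6800.1757 |
| HTT-TILS-001-26B.svs | 301.336 | local | 0.0044 | 0.1381 | 0.0486 |
| | | goofys | 0.0686 | 0.145 | 0.0575 |
| | | s3fs | 941.999 | 70.6746 | 640.7553 |
| HTT-TILS-001-27B.ndpi | 238.319 | local | 0.0009 | 0.011 | 0.015 |
| | | goofys | 0.0052 | 0.0118 | 0.0635 |
| | | s3fs | 71.6927 | 0.2181 | 5610.9871 |
| HTT-TILS-001-37B.svs | 1126.718 | local | 0.0078 | 0.4373 | 0.0489 |
| | | goofys | 0.1063 | 0.5584 | 0.0613 |
| | | s3fs | 1733.025 | 173.3216 | 748.63 |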

We then calculated the mean, minimum, and maximum time for each operation (initializing OpenSlide, read_region, and get_thumbnail) in each access mode.


Time to initialize the OpenSlide object:

Open time (s)
| mode | mean | min | max |
|---|---|---|---|
| goofys | 0.050714 | 0.0052 | 0.1063 |
| local | 0.003186 | 0.0009 | 0.0078 |
| s3fs | 1646.879000 | 71.6927 | 3049.7225 |


Time to execute read_region:

Read time (s)
| mode | mean | min | max |
|---|---|---|---|
| goofys | 0.047900 | 0.0325 | 0.0635 |
| local | 0.026986 | 0.0104 | 0.0489 |
| s3fs | 2395.819071 | 640.7553 | 6800.1757 |


Time to execute get_thumbnail:

Thumbnail time (s)
| mode | mean | min | max |
|---|---|---|---|
| goofys | 0.218129 | 0.0068 | 0.5584 |
| local | 0.175600 | 0.0059 | 0.4373 |
| s3fs | 236.386286 | 0.1930 | 613.4214 |
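As an illustration of how these summary figures follow from the per-file measurements, the sketch below recomputes the open-time statistics from the seven open_time values reported for each mode. The variable and function names are chosen here for illustration only.

```python
from statistics import fmean
from typing import Dict, List

# open_time (s) per WSI, copied from the per-file results above.
open_times: Dict[str, List[float]] = {
    "local": [0.0022, 0.0024, 0.0037, 0.0009, 0.0044, 0.0009, 0.0078],
    "goofys": [0.0517, 0.0575, 0.0598, 0.0059, 0.0686, 0.0052, 0.1063],
    "s3fs": [3049.7225, 2724.0224, 2935.5364, 72.155, 941.999, 71.6927, 1733.025],
}


def summarize(values: List[float]) -> Dict[str, float]:
    """Return the mean (rounded to 6 decimals), min, and max of the timings."""
    return {"mean": round(fmean(values), 6), "min": min(values), "max": max(values)}


summary = {mode: summarize(values) for mode, values in open_times.items()}

# Ratio of mean open time on a Goofys mount to mean open time on local disk.
goofys_slowdown = summary["goofys"]["mean"] / summary["local"]["mean"]

if __name__ == "__main__":
    for mode, stats in summary.items():
        print(f"{mode:7s} mean={stats['mean']:.6f} min={stats['min']} max={stats['max']}")
    print(f"goofys/local mean open-time ratio: {goofys_slowdown:.1f}")
```

By this calculation, opening a slide over Goofys takes roughly 16 times as long as opening it locally on average, yet still only about 0.05 s in absolute terms. The same function applies unchanged to the read_region and get_thumbnail columns.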

Note: The s3fs and Goofys results are for reference only, since they depend on network speed and conditions at the time of testing.

Conclusion

Mountpoint for Amazon S3, Goofys, and s3fs are tools that enable applications expecting traditional file systems to interact with Amazon S3. In our tests, Goofys stayed within roughly 0.12 s of local access for every measured operation, while initializing OpenSlide or calling read_region through s3fs took from just over a minute to nearly two hours. s3fs is ideal for scenarios requiring POSIX compatibility, such as backup, archiving, and caching, though it comes with performance trade-offs. Goofys prioritizes speed and efficiency, making it a great choice for lightweight, resource-constrained environments where full POSIX support isn’t necessary. For enterprise-ready, high-performance S3 access, Mountpoint for Amazon S3 is the best option, offering fast, reliable throughput, particularly for read-heavy applications like data lakes and machine learning. When choosing an S3 access method, consider whether compatibility, simplicity, or performance best aligns with your project’s needs.
