Is POSIX Outdated in the Cloud Era?

POSIX (the Portable Operating System Interface) is the standard API used by UNIX and conforming non-UNIX (including Windows) operating systems to maintain compatibility with each other. The POSIX standard allows applications written for one operating system to run on others with little or no rework, depending on which other APIs they rely on. This greatly reduces the complexity and time required to write cross-platform applications.

POSIX is widely adopted, well understood by developers, and well maintained with established compliance tests. But it’s no longer ubiquitous, especially for file access in the cloud. Cloud object storage systems like Amazon S3 instead offer REST APIs built around buckets, objects, and HTTP requests, far removed from POSIX concepts such as directories, file descriptors, and in-place writes. The rapid emergence of new technologies and use cases raises the question: is POSIX still relevant, or a legacy technology awaiting replacement?

POSIX has been around for a while; why hasn’t something replaced it?

Whether or not POSIX is outdated or too slow to evolve to meet new requirements is an old debate:

“Over the years, we’ve done lots of nice ‘extended functionality’ stuff. Nobody ever uses them. The only thing that gets used is the standard stuff that everybody else does too.” (Linus Torvalds)

POSIX hasn’t been replaced or seen any major evolution for a simple reason: developers want maximum compatibility and longevity for the code they write, so any contender for the throne would need to combine a large improvement in functionality with enough momentum to overcome POSIX’s dominance.

However, while POSIX hasn’t been superseded, it isn’t being adopted in some prominent new use cases — particularly for interacting with files stored in low-cost cloud object storage.

One of the most widely adopted storage APIs is Amazon’s S3 API — and it's not POSIX

Object storage is not a filesystem, and it was developed with different use cases in mind. Unlike the hierarchical filesystems POSIX is built to interact with, object storage uses a flat namespace: each bucket holds objects addressed by a key, and “directories” exist only as shared key prefixes. This makes it readily scalable, including by pooling storage across different locations. Because of this flat structure, an object can be retrieved in a single request if you have its key (or the metadata to find it), with no directory hierarchy to traverse.
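To make the flat structure concrete, here is a minimal in-memory sketch. `FlatObjectStore` is a hypothetical toy class, not any vendor’s SDK: object keys are plain strings, and a “directory” exists only as a shared key prefix. The `list_prefix` method loosely mimics the `Prefix`/`Delimiter` grouping that S3’s ListObjectsV2 call uses to report common prefixes.

```python
from typing import Dict, List, Set, Tuple


class FlatObjectStore:
    """Toy object store: one flat mapping from key strings to bytes (hypothetical)."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        # No parent directories are created: "logs/2024/app.log" is just a string key.
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        # Direct lookup by key, with no path traversal.
        return self._objects[key]

    def list_prefix(self, prefix: str = "", delimiter: str = "/") -> Tuple[List[str], List[str]]:
        """Return (keys, common_prefixes) under `prefix`, emulating one directory level."""
        keys: List[str] = []
        common: Set[str] = set()
        for key in sorted(self._objects):
            if not key.startswith(prefix):
                continue
            rest = key[len(prefix):]
            if delimiter and delimiter in rest:
                # Everything up to the next delimiter acts like a "subdirectory".
                common.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
            else:
                keys.append(key)
        return keys, sorted(common)
```

After `put("logs/2024/app.log", ...)` and `put("logs/readme.txt", ...)`, calling `list_prefix("logs/")` returns `(["logs/readme.txt"], ["logs/2024/"])`. The “folder” `logs/2024/` is inferred from the key names; nothing was ever created for it.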

Because of these inherent differences, filesystems and object storage work in completely different ways, including how they are accessed. Object storage is usually accessed through RESTful HTTP APIs, which makes it a real-world example of file access without POSIX. Still, thanks to its affordability and scalability, object storage has become incredibly popular and is often treated like any other storage medium. It is used to host websites and backups, to stage files and file transfers, and even as a stand-in for regular file storage through creative use of its API and through filesystem adapters.
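For contrast with a POSIX `open()` and `read()`, the sketch below builds the HTTP address used to fetch an object through a RESTful API, following S3’s virtual-hosted-style URL format (`https://<bucket>.s3.<region>.amazonaws.com/<key>`). The bucket, region, and key are placeholders, and the URL is unsigned: real requests for private objects also need AWS Signature Version 4 authentication, which is deliberately left out here.

```python
from urllib.parse import quote


def s3_object_url(bucket: str, region: str, key: str) -> str:
    """Build an unsigned, virtual-hosted-style S3 URL for one object."""
    # Percent-encode the key (spaces, commas, etc.) but keep '/' readable,
    # since '/' inside a key is just a character, not a directory separator.
    encoded_key = quote(key, safe="/")
    return f"https://{bucket}.s3.{region}.amazonaws.com/{encoded_key}"
```

For example, `s3_object_url("mybucket", "us-east-1", "reports/q1 summary,v2.csv")` yields `https://mybucket.s3.us-east-1.amazonaws.com/reports/q1%20summary%2Cv2.csv`. A client would then issue an HTTP `GET` for that URL (for a publicly readable object, `urllib.request.urlopen` would do), where POSIX code would simply open a path. Note how characters such as commas must be encoded correctly on the wire, which is exactly where vendor implementations can diverge.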

The API implemented by AWS’s S3 has become the de facto standard for object storage and has been adopted by other vendors including Google Cloud Platform, Oracle Cloud, and on-premises object storage systems. The adoption of the S3 API, despite its proprietary nature, provides a view of what establishing a new standard for filesystem access might look like — different provider and user requirements pushing and pulling against each other, all while users are trying to apply these incomplete implementations to novel use cases. It’s a mess.

Standards exist for a reason

Cloud storage vendors may claim to be 100% S3-compatible, but developers often find that software that works with Amazon S3 does not work with competitors’ supposedly S3-compatible offerings, or that software that once worked no longer does. This is partly due to differences in the underlying technologies, and partly because all of these object storage solutions are under active development: key APIs may be missing, especially when every vendor is copying whatever AWS does after the fact.

At cunoFS, we’ve worked with a number of developers on diverse object-storage-backed solutions for high-performance and big data purposes. The issues we’ve faced when working with supposedly 100% S3-compatible storage include:

  • Commas and other special characters not working while using the official AWS C++ SDK with Google Cloud Storage
  • Failed transfers caused by incorrect responses when operating under high load with a popular public cloud vendor
  • Object key listings not returned in lexicographical order (as the S3 spec requires) by another widely used public-cloud S3 storage vendor
  • Inconsistency issues, and deleted directory keys persisting for months as phantom keys in a major on-premises object storage system
  • Errors when not using requester-pays for another public-cloud S3 storage vendor
  • Performance issues with non-AWS providers caused by various missing or faulty APIs that force us to fall back to slower alternatives
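Problems like the out-of-order listings above often have to be detected defensively on the client side. AWS documents that, for general purpose buckets, ListObjectsV2 returns keys in lexicographical (UTF-8 binary) order. The minimal sketch below assumes the keys from a listing have already been collected into a Python iterable; the function names are hypothetical.

```python
from typing import Iterable, List, Optional


def utf8_sort_key(key: str) -> bytes:
    # Order keys by their UTF-8 bytes, not by locale-aware collation.
    return key.encode("utf-8")


def is_listing_ordered(keys: Iterable[str]) -> bool:
    """True if keys arrive in non-decreasing UTF-8 binary order."""
    previous: Optional[bytes] = None
    for key in keys:
        current = utf8_sort_key(key)
        if previous is not None and current < previous:
            return False
        previous = current
    return True


def normalize_listing(keys: Iterable[str]) -> List[str]:
    """Fallback for non-conforming stores: sort the collected keys client-side."""
    return sorted(keys, key=utf8_sort_key)
```

In byte order, `"Z"` sorts before `"a"`, so `["Z", "a/1", "a/2", "b"]` is a correctly ordered listing. Re-sorting only repairs the keys already in hand; it cannot fix pagination that relies on the server’s ordering (for example, resuming after the last key seen), which is why a non-conforming store can force slower, full-listing fallbacks.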

Adding rudimentary support for a cloud provider’s object storage to an application is not difficult — the APIs are documented and SDKs are available. But, as our experience in the field shows, the challenge is getting decent performance with wide compatibility, consistency, and fault handling. 
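Fault handling is where much of that effort goes. One common pattern is to retry failed requests with capped exponential backoff and “full jitter”. The sketch below assumes the storage client raises a hypothetical `TransientError` for throttling or malformed responses; the sleep function and random source are injectable so the policy can be tested without waiting.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error type for retryable failures (e.g. throttling, 5xx)."""


def with_retries(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call `operation`, retrying TransientError with capped exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error to the caller.
            # Full jitter: wait a random time up to the capped exponential delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng.uniform(0, cap))
    raise AssertionError("unreachable")
```

Randomizing the wait spreads retries from many clients over time, so a provider that is already struggling under load is not hit by synchronized bursts of retries. Non-transient errors (such as a missing object) propagate immediately rather than being retried.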

This highlights the need for a standard that is able to bridge the compatibility gap and can be readily adopted and implemented by solutions that are performant and reliable under high load.

Every cloud provider has its own incomplete S3 API implementation

There are companies investing billions of dollars in their object storage solutions, trying to catch up to AWS, while relying on it to provide the API specification they will use to be compatible. Amazon has no incentive to consult with these competitors about establishing a true set of standards for object storage.

In the meantime, vendors such as Azure and Google, and projects like OpenStack, have published their own APIs. This lack of standardization negatively impacts developers, hurts businesses’ bottom line by locking them into a single provider, and reduces the value of applications already developed, because they must be reworked to remain compatible with other platforms, or even with the same platform if its API changes.

Image: “Standards”, xkcd #927, courtesy of https://xkcd.com/927

Developers need to be able to write once and run anywhere, with decent performance. That’s why a single standardized layer of abstraction makes sense.
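As a rough illustration of “write once, run anywhere”, the sketch below writes application logic against one small interface and plugs in interchangeable backends. `BlobStore`, `LocalDirStore`, `MemoryStore`, and `copy_blob` are hypothetical names; `MemoryStore` merely stands in for an object-storage backend. A real standardized layer (the argument here is that it should be POSIX itself) would need far more operations, such as listing, metadata, and partial reads.

```python
import os
from typing import Dict, Protocol


class BlobStore(Protocol):
    """Minimal hypothetical interface that application code depends on."""

    def read(self, name: str) -> bytes: ...

    def write(self, name: str, data: bytes) -> None: ...


class LocalDirStore:
    """Backend that maps names onto files under a local directory."""

    def __init__(self, root: str) -> None:
        self.root = root

    def _path(self, name: str) -> str:
        return os.path.join(self.root, name)

    def read(self, name: str) -> bytes:
        with open(self._path(name), "rb") as f:
            return f.read()

    def write(self, name: str, data: bytes) -> None:
        path = self._path(name)
        parent = os.path.dirname(path)
        if parent:
            os.makedirs(parent, exist_ok=True)  # Real filesystems need parent dirs.
        with open(path, "wb") as f:
            f.write(data)


class MemoryStore:
    """Stand-in for an object-storage backend, kept in memory for the sketch."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def read(self, name: str) -> bytes:
        return self.objects[name]

    def write(self, name: str, data: bytes) -> None:
        self.objects[name] = data


def copy_blob(src: BlobStore, dst: BlobStore, name: str) -> None:
    # Application logic never mentions a specific provider.
    dst.write(name, src.read(name))
```

The design choice is that only the backends know about paths, buckets, or vendor quirks; `copy_blob` works unchanged with any pair of them.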

Extending POSIX to solve the biggest issues developers face with object storage

Discarding an established, proven standard rather than extending it to new use cases could be considered reckless. Just because the novel approach to accessing object storage uses a RESTful API doesn’t mean it improves on existing file access APIs; it solves a problem that other solutions had not yet addressed, and it comes with its own laundry list of issues.

POSIX is an established standard that developers know how to interact with, providing them with a less complicated and less error-prone development process. Lots of existing software is built on the POSIX filesystem paradigm, ready to run in any environment that supports it or be integrated into new toolchains to reduce overall development time. POSIX stays relevant not because of bleeding-edge features, but because developers like its reliability — they only use object storage APIs because they have to.

Extending POSIX to the cloud would let developers code against a familiar POSIX model with existing tools, skipping the security, consistency, and compatibility complexities of integrating object storage directly into their applications. Done well, this will keep POSIX not just relevant but an important part of the cloud storage landscape. Tools such as s3fs and goofys already work to bridge this gap, letting developers treat affordable object storage as if it were a local filesystem; however, issues remain around performance, compatibility, and reliability.
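The appeal of that approach is that ordinary POSIX-style code needs no changes. The function below uses only standard directory walking and `stat` calls. Pointed at a local directory it behaves as usual; pointed at a bucket exposed as a filesystem (for example, a hypothetical s3fs or goofys mount point such as `/mnt/mybucket`), it would in principle run unmodified, subject to the performance and reliability caveats just mentioned.

```python
import os


def total_bytes(root: str) -> int:
    """Sum the sizes of all files under `root` using plain POSIX-style calls."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # os.stat wraps the stat(2) system call.
            total += os.stat(path).st_size
    return total
```

Note that on object storage each `stat` and directory listing may become one or more HTTP requests, which is where adapter performance matters.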

cunoFS is our high-performance POSIX compatibility layer for S3-compatible object storage. It works with all major vendors, providing a consistent POSIX environment for developing and running your workflows.

cunoFS is up to 50x faster than other solutions (including the native AWS CLI), and files are kept in their original format, so there is no platform lock-in. Part of our solution was to extend POSIX rather than introduce new methods: cunoFS uses ultra-fast syscall and POSIX API interception, which lets us extend POSIX more flexibly for use in the cloud. This is much faster than FUSE-based approaches, which are notoriously slow, partly because each filesystem request is passed from the kernel to a userspace daemon and back. We’ve also built on existing POSIX operations so that they work with URI paths (e.g., s3://mybucket/myobject) as well as traditional file paths, and we’re continuing to push for the further evolution of the standard for cloud applications.
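To illustrate what accepting both URI paths and traditional paths involves, here is a hypothetical helper that decides whether a path refers to object storage or the local filesystem. It is only a sketch of the idea using plain string handling; it is not how cunoFS implements its syscall interception. It splits on the first `/` after `s3://` rather than using a URL parser, so keys containing characters like `#` or `?` are kept intact.

```python
from typing import NamedTuple, Optional

S3_SCHEME = "s3://"


class Location(NamedTuple):
    scheme: str            # "s3" for object storage, "file" for local paths
    bucket: Optional[str]  # bucket name for object storage, else None
    key: str               # object key, or the local path unchanged


def resolve(path: str) -> Location:
    """Split an s3:// URI into bucket and key; pass other paths through."""
    if path.startswith(S3_SCHEME):
        bucket, _sep, key = path[len(S3_SCHEME):].partition("/")
        if not bucket:
            raise ValueError(f"missing bucket in {path!r}")
        return Location("s3", bucket, key)
    return Location("file", None, path)
```

With this, `resolve("s3://mybucket/myobject")` gives `Location("s3", "mybucket", "myobject")`, while `resolve("/home/user/data.csv")` is passed through as a local path. An interception layer would then route each case to the appropriate backend behind the same POSIX call.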

The future of POSIX and storage APIs

We believe that there are many more ways to extend POSIX to the cloud domain. The increasing availability of POSIX-compliant access to object storage will see many developers move away from object storage APIs and return to writing their applications to work in cross-platform POSIX environments. 

As evidence that the concepts behind POSIX have a future, we point to the WebAssembly System Interface (WASI), which brings POSIX-like file and system access even to WebAssembly and will lead to powerful new applications.

These advancements must, however, be planned carefully: stable standards quickly lose their appeal to developers if they are disrupted.
