S3 API Compatibility

By Alex Merced

S3 API Compatibility

S3 API Compatibility refers to the industry-wide phenomenon where competing cloud providers, independent software vendors, and hardware manufacturers have adopted the proprietary Application Programming Interface (API) originally designed by Amazon Web Services for its Simple Storage Service (S3). In the modern data ecosystem, the “S3 API” is no longer just a way to talk to AWS; it has become the de facto universal standard for interacting with object storage, serving as the foundational communication protocol for the open data lakehouse.

The Evolution of a Standard

When Amazon launched S3 in 2006, they created a simple, RESTful HTTP interface to manage data. Developers used HTTP PUT to upload a file, GET to download it, and DELETE to remove it. This simplicity, combined with Amazon’s massive first-mover advantage, led to rapid and widespread adoption.

As the volume of data grew globally, organizations realized they needed object storage outside of AWS—either in their own private data centers (for compliance and cost control) or in competing clouds. However, developers had already spent years writing applications, backup scripts, and data engineering pipelines specifically coded against the AWS S3 SDK.

To break into the market, competitors realized that forcing developers to rewrite millions of lines of code to use a new, proprietary API was a losing battle. Instead, they chose to emulate Amazon. Vendors like MinIO, Ceph, Cloudian, and hardware giants like Dell (ECS) and Pure Storage built their storage systems to intercept and understand S3 API calls.

From the perspective of an application (like an Apache Spark job), writing data to an on-premises MinIO cluster looks entirely indistinguishable from writing to AWS S3. The application uses the standard AWS SDK, but simply points the endpoint URL to the local server instead of Amazon’s cloud.

The Core S3 API Operations

To achieve S3 compatibility, a storage system must accurately implement the semantics of the core S3 operations. This includes:

  • Bucket Operations: CreateBucket, DeleteBucket, ListObjects.
  • Object Operations: PutObject, GetObject, DeleteObject, CopyObject.
  • Multipart Uploads: A critical feature for big data. The API must support breaking a massive 100GB Parquet file into smaller chunks, uploading them in parallel, and reassembling them on the server (CreateMultipartUpload, UploadPart, CompleteMultipartUpload).
  • Security and Authentication: The system must implement AWS Signature Version 4 (SigV4), the complex cryptographic signing process Amazon uses to authenticate API requests securely.

The Lakehouse Enabler

S3 API compatibility is the glue that holds the open data lakehouse together. The lakehouse relies on decoupling compute from storage. This decoupling is only viable if the compute engines share a common language with the storage layers.

Because the S3 API is the undisputed standard, the creators of open table formats (Apache Iceberg, Delta Lake) and compute engines (Trino, Dremio, DuckDB) only need to build robust support for one protocol.

If Trino can read an Iceberg table using the S3 API, that exact same Trino engine can immediately query data stored in AWS, an on-premises MinIO appliance, or a Ceph cluster, without requiring any custom driver development. This interoperability prevents vendor lock-in at both the storage and compute layers. An organization can seamlessly migrate their petabyte-scale data lake from AWS to a private on-premises cloud simply by migrating the data and repointing the S3 endpoint URL in their compute engine configuration.

Summary and Tradeoffs

The S3 API represents a rare instance in the technology industry where a proprietary, vendor-owned interface organically became the universal, open standard simply by virtue of utility and massive adoption. It is the lingua franca of object storage.

The primary challenge with S3 API compatibility is the definition of “compatibility.” The S3 API is massive and constantly evolving as AWS adds new features (like S3 Select, Object Lock, or complex lifecycle policies). Most “S3-compatible” vendors only implement a core subset of the API.

If a data engineer utilizes a highly specific, niche AWS feature in their pipeline, they may find that their code breaks when pointed at an S3-compatible vendor that hasn’t implemented that specific endpoint. Therefore, when evaluating storage solutions for a hybrid-cloud lakehouse, data architects must rigorously test their specific application workloads to ensure that the chosen vendor’s level of S3 compatibility meets all required operational semantics, particularly around multipart uploads and security signing.

Visual Architecture

S3 Api Compatibility