Branching (WAP)
Software engineering relies heavily on version control systems like Git. Developers create branches, write code, test it in isolation, and only merge it into the main production branch when it passes all quality checks.
Historically, data engineering did not have this luxury. When an ETL job wrote data to a Hive table, the data was immediately live. If the ETL job wrote corrupt data or dropped millions of rows by accident, downstream reports and dashboards were instantly broken.
Apache Iceberg introduces Git-like semantics directly into the data lakehouse through Branching.
The Concept of Branching
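Because an Iceberg table is defined by a chain of Snapshots, each of which records its parent, it is cheap to “fork” that chain: a branch is just a named reference that points at a Snapshot, so creating one copies no data.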
You can create an audit-branch that diverges from the main branch. When your ETL pipelines write new data, they write exclusively to the audit-branch. They create new Snapshots that are completely isolated.
During this time, if an end-user runs a SELECT * query against the table, the catalog directs them to the main branch. They see exactly what the table looked like before the ETL job started. They are completely shielded from the incomplete or potentially corrupted data being written in the background.
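The isolation described above can be illustrated with a small in-memory model. This is only a sketch and not the Iceberg API: `ToyTable`, `Snapshot`, and their methods are hypothetical names invented for illustration, and rows are kept in memory rather than in data files. It assumes a branch is nothing more than a named pointer to a snapshot and that a write advances only the branch it targets.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Snapshot:
    """An immutable table state: its rows plus a pointer to the parent snapshot."""
    snapshot_id: int
    parent_id: Optional[int]
    rows: tuple


@dataclass
class ToyTable:
    """Hypothetical stand-in for a table: snapshots plus named branch references."""
    snapshots: Dict[int, Snapshot] = field(default_factory=dict)
    refs: Dict[str, int] = field(default_factory=dict)
    _next_id: int = 1

    def __post_init__(self):
        # Start with an empty root snapshot that "main" points to.
        self.refs["main"] = self._commit(parent_id=None, rows=())

    def _commit(self, parent_id, rows):
        sid = self._next_id
        self._next_id += 1
        self.snapshots[sid] = Snapshot(sid, parent_id, tuple(rows))
        return sid

    def create_branch(self, name, from_branch="main"):
        # Creating a branch copies no data: it is only a new named pointer.
        self.refs[name] = self.refs[from_branch]

    def append(self, rows, branch="main"):
        # A write creates a new snapshot and advances only the target branch.
        head = self.snapshots[self.refs[branch]]
        self.refs[branch] = self._commit(head.snapshot_id, head.rows + tuple(rows))

    def read(self, branch="main"):
        # Readers that name no branch are served from "main".
        return list(self.snapshots[self.refs[branch]].rows)


table = ToyTable()
table.append(["order-1", "order-2"])              # committed to main
table.create_branch("audit-branch")               # fork from main's head
table.append(["order-3"], branch="audit-branch")  # ETL writes to the branch only

main_rows = table.read()                 # what an end-user query sees
audit_rows = table.read("audit-branch")  # what the ETL job has staged
```

In this model the reader of `main` never observes `order-3`, because the write moved only the `audit-branch` pointer.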
Diagram 1: Branching in Apache Iceberg

The Write-Audit-Publish (WAP) Workflow
Branching is the foundational technology that enables the Write-Audit-Publish (WAP) pattern, a gold standard for data quality engineering.
- Write: The ETL pipeline executes, transforming data and writing it to the isolated audit-branch.
- Audit: A Data Quality engine (like Great Expectations or dbt tests) runs a suite of tests against the audit-branch. It checks for null values, schema violations, unexpected row counts, and anomalies.
- Publish: If the tests fail, the pipeline alerts the engineering team, and the audit-branch can simply be discarded or investigated. If the tests pass, a simple fast-forward operation moves the main branch pointer to the head of the audit-branch.
The fast-forward is a single atomic pointer update, so at the moment of publish the new, fully audited data becomes visible to all downstream consumers at once. Under the WAP pattern, consumers of the main branch only ever see data that has passed the audit, which means the protection is as strong as the tests in the audit suite.
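The three steps can be sketched end to end with a toy in-memory model. This is an illustration under stated assumptions, not the Iceberg API: `ToyTable`, `audit`, and `write_audit_publish` are hypothetical names, the audit runs only two simple checks (no null rows, no duplicates), and a failed branch is discarded rather than kept for investigation.

```python
class ToyTable:
    """Hypothetical stand-in for a table with snapshots and branch pointers."""

    def __init__(self, rows):
        self.snapshots = {1: (None, tuple(rows))}  # id -> (parent id, rows)
        self.refs = {"main": 1}                    # branch name -> head snapshot id

    def create_branch(self, name, from_branch="main"):
        self.refs[name] = self.refs[from_branch]

    def drop_branch(self, name):
        del self.refs[name]

    def append(self, rows, branch):
        parent = self.refs[branch]
        new_id = max(self.snapshots) + 1
        self.snapshots[new_id] = (parent, self.snapshots[parent][1] + tuple(rows))
        self.refs[branch] = new_id

    def read(self, branch="main"):
        return list(self.snapshots[self.refs[branch]][1])

    def is_ancestor(self, ancestor, descendant):
        # Walk parent pointers from descendant back toward the root.
        current = descendant
        while current is not None:
            if current == ancestor:
                return True
            current = self.snapshots[current][0]
        return False

    def fast_forward(self, target, source):
        # Only legal when target's head is already in source's history,
        # so the publish is a pure pointer move with no merge logic.
        if not self.is_ancestor(self.refs[target], self.refs[source]):
            raise ValueError(f"{target} has diverged from {source}")
        self.refs[target] = self.refs[source]


def audit(rows):
    # Illustrative checks only: no null values and no duplicate rows.
    return all(r is not None for r in rows) and len(rows) == len(set(rows))


def write_audit_publish(table, new_rows, branch="audit-branch"):
    table.create_branch(branch)
    table.append(new_rows, branch)       # Write: main is untouched
    if not audit(table.read(branch)):    # Audit: test the staged state
        table.drop_branch(branch)        # failed: discard, main never changed
        return False
    table.fast_forward("main", branch)   # Publish: one pointer update
    table.drop_branch(branch)            # branch has served its purpose
    return True


table = ToyTable(["order-1", "order-2"])
good = write_audit_publish(table, ["order-3"])
bad = write_audit_publish(table, ["order-4", None])  # null row fails the audit
final_rows = table.read()
```

The second batch is written and then rejected, yet `main` still points at the snapshot published by the first batch, which is the behavior the Publish step relies on.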
Diagram 2: The WAP Workflow
