Apache® Spark™ News

Diving Into Delta Lake: DML Internals (Update, Delete, Merge)

In previous blogs Diving Into Delta Lake: Unpacking The Transaction Log and Diving Into Delta Lake: Schema Enforcement & Evolution, we described how the Delta Lake transaction log works and the internals of schema enforcement and evolution.  Delta Lake supports DML (data manipulation language) commands including `DELETE`, `UPDATE`, and `MERGE`. These commands simplify change data capture (CDC), audit and governance, and GDPR/CCPA workflows, among others. In this post, we will demonstrate how to use each of these DML commands, describe what Delta Lake is doing behind the scenes when you run one, and offer some performance tuning tips for each one.  More specifically: