Today, we’re celebrating an important milestone for the Spark project: it has now been five years since Spark was first open sourced. When we first decided to release our research code at UC Berkeley, none of us knew how far Spark would make it, but we believed we had built some really neat technology that we wanted to share with the world. In the five years since, we’ve been simply awed by the numerous contributors and users who have made Spark the leading-edge computing framework it is today. Indeed, to our knowledge, Spark has now become the most active open source project in big data (measured by either contributors per month or commits per month). Beyond its contributors, it has built up hundreds of production use cases, from batch analytics to stream processing.