In 2016, Apache Spark released version 2.0, its second major release, and outgrew our wildest expectations: 4X growth in meetup members, reaching 240,000 globally, and 2X growth in code contributors, reaching 1,000.
Apache® Spark™ News
A short and easy paper from the Databricks team to end the week. Given the pace of development in the Apache Spark world, a paper published in 2015 about enhancements to Spark will of course be a little dated. But this paper nicely captures some of the considerations in the transition from research project to commercial software – we see two years of that journey.
The growth of data volumes in industry and research poses tremendous opportunities, as well as tremendous computational challenges. As data sizes have outpaced the capabilities of single machines, users have needed new systems to scale out computations to multiple nodes. As a result, there has been an explosion of new cluster programming models targeting diverse computing workloads.
Today, the Datanami Readers’ and Editors’ Choice Awards recognized the sweeping changes Apache Spark is bringing to the Big Data landscape with four awards:
Readers’ Choice – Best Big Data Product or Technology: Machine Learning
Readers’ Choice – Best Big Data Product or Technology: Real-Time Analytics
Readers’ and Editors’ Choice – Top 5 Open Source Projects to Watch
Readers’ Choice – Best Big Data Startup: Databricks
In an interview with SiliconANGLE at the summit, Zaharia provided a primer on Spark, explained how he hopes to make it accessible to more mainstream business analysts, and gave his view on how open source business models are evolving. This is an edited version of the conversation.
An overview of 13 core Apache Spark concepts, presented with focus and clarity in mind. A great beginner's overview of essential Spark terminology.
Thanks to an impressive grab bag of improvements in version 2.0, Spark's quasi-streaming solution has become more powerful and easier to manage.
The global big data market is poised to explode over the next decade, according to a new forecast, topping an estimated $92 billion by 2026 as new streaming analytics technologies emerge.
Matei Zaharia, the creator of Apache Spark, recently detailed three "exciting" improvements to the open source Big Data analytics project coming soon in version 2. Zaharia started the whole Spark thing while pursuing his PhD at UC Berkeley. He's now an assistant professor of computer science at MIT and the CTO of Databricks Inc., a company he co-founded that now serves as the commercial steward of the popular data processing engine.
The folks at Databricks last week gave a glimpse of what’s to come in Spark 2.0, and among the changes that are sure to capture the attention of Spark users is the new Structured Streaming engine, which leans on the Spark SQL API to simplify the development of real-time, continuous big data apps. In his keynote at Spark Summit East, Spark creator Matei Zaharia said the new Structured Streaming API, which will debut later this year in Spark 2.0, will enable the creation of applications that combine real-time, interactive, and batch components.