Apache® Spark™ News

Spark 1.1: The State of Spark Streaming

With Spark 1.1 recently released, we’d like to feature one of the most popular Spark components, Spark Streaming, and highlight who is using it and why.

Announcing Spark 1.1

Today we’re thrilled to announce the release of Spark 1.1! Spark 1.1 introduces many new features along with scale and stability improvements. This post will introduce some key features of Spark 1.1 and provide context on the priorities of Spark for this and the next release.

Statistics Functionality in Spark 1.1

One of our philosophies in Spark is to provide rich and friendly built-in libraries so that users can easily assemble data pipelines. With Spark, and MLlib in particular, quickly gaining traction among data scientists and machine learning practitioners, we’re observing a growing demand for data analysis support outside of model fitting. To address this need, we have started to add scalable implementations of common statistical functions to support the various stages of a data pipeline.

Shark, Spark SQL, Hive on Spark, and the future of SQL on Spark

With the introduction of Spark SQL and the new Hive on Spark effort (HIVE-7292), we get asked a lot about our position on these two projects and how they relate to Shark. At the Spark Summit today, we announced that we are ending development of Shark and will focus our resources on Spark SQL, which will provide a superset of Shark’s features so that existing Shark users can move forward. In particular, Spark SQL will provide both a seamless upgrade path from the Shark 0.9 server and new features such as integration with general Spark programs.

Announcing Spark 1.0

Today, we’re very proud to announce the release of Apache Spark 1.0. Spark 1.0 is a major milestone for the Spark project that brings both numerous new features and strong API compatibility guarantees. The release is also a huge milestone for the Spark developer community: with more than 110 contributors over the past 4 months, it is Spark’s largest release yet, continuing a trend that has quickly made Spark the most active project in the Hadoop ecosystem.

Making Spark Easier to Use in Java with Java 8

One of Spark’s main goals is to make big data applications easier to write. Spark has always had concise APIs in Scala and Python, but its Java API was verbose due to the lack of function expressions. With the addition of lambda expressions in Java 8, we’ve updated Spark’s API to transparently support these expressions, while staying compatible with old versions of Java. This new support will be available in Spark 1.0.
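
To make the difference in verbosity concrete, here is a minimal sketch in plain Java. It does not use Spark itself: the `Func` interface and the `countMatching` helper are hypothetical stand-ins for a single-method function interface and an API that accepts one. The same helper is called first with a pre-Java 8 anonymous inner class and then with a Java 8 lambda expression.

```java
import java.util.Arrays;
import java.util.List;

public class LambdaSketch {
    // Hypothetical single-method function interface (an assumption for
    // illustration; this is not Spark's actual interface).
    public interface Func<T, R> {
        R call(T input);
    }

    // Hypothetical helper: counts the items for which the predicate returns true.
    public static <T> int countMatching(List<T> items, Func<T, Boolean> predicate) {
        int count = 0;
        for (T item : items) {
            if (predicate.call(item)) {
                count++;
            }
        }
        return count;
    }

    // Pre-Java 8 style: the function is passed as an anonymous inner class.
    public static int countErrorsVerbose(List<String> lines) {
        return countMatching(lines, new Func<String, Boolean>() {
            @Override
            public Boolean call(String line) {
                return line.contains("ERROR");
            }
        });
    }

    // Java 8 style: the same function written as a lambda expression.
    public static int countErrorsLambda(List<String> lines) {
        return countMatching(lines, line -> line.contains("ERROR"));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("INFO start", "ERROR disk full", "ERROR timeout");
        System.out.println(countErrorsVerbose(lines));
        System.out.println(countErrorsLambda(lines));
    }
}
```

Because the helper only requires an interface with a single abstract method, both call styles work against the same signature, which is the general mechanism that lets an API accept lambdas while remaining usable from older Java code.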

Spark 0.9.1 Released

We are happy to announce the availability of Spark 0.9.1! This is a maintenance release with bug fixes, performance improvements, better stability with YARN, and improved parity between the Scala and Python APIs. We recommend that all 0.9.0 users upgrade to this stable release.

Spark Now a Top-level Apache Project

We are delighted by the recent announcement from the Apache Software Foundation that Spark has become a top-level Apache project. This recognizes the fantastic work done by the Spark open source community, which now counts over 140 developers from 30+ companies.