Apache® Spark™ News

Spark 0.8.1 Released

We are happy to announce the release of Apache Spark 0.8.1. In addition to performance and stability improvements, this release adds three new features. First, Spark now supports the newest versions of YARN (2.2+). Second, the standalone cluster manager supports a high-availability mode in which it can tolerate master failures. Third, shuffles have been optimized to create fewer files, which improves shuffle performance drastically in some settings.

Putting Spark to Use – Fast In-Memory Computing for Your Big Data Applications

Apache Hadoop has revolutionized big data processing, enabling users to store and process huge amounts of data at very low cost. MapReduce has proven to be an ideal platform for implementing complex batch applications as diverse as sifting through system logs, running ETL, computing web indexes, and powering personal recommendation systems. However, its reliance on persistent storage to provide fault tolerance and its one-pass computation model make MapReduce a poor fit for low-latency applications and for iterative computations, such as machine learning and graph algorithms, which revisit the same data many times.
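
To make the cost of iterative workloads concrete, here is a minimal toy sketch in plain Python. It is not Spark or Hadoop code: the `Storage` class, the `step` function, and both driver functions are hypothetical names invented for illustration. It only counts how often an iterative job reads its input from storage when every pass starts from disk, compared with keeping the data in memory after the first read.

```python
# Toy model only (an assumption for illustration, not Spark or Hadoop code):
# count how often an iterative job reads its input from persistent storage.


class Storage:
    """Hypothetical stand-in for a distributed file system."""

    def __init__(self, values):
        self._values = list(values)
        self.reads = 0  # number of full scans of the dataset

    def load(self):
        self.reads += 1
        return list(self._values)


def step(data, estimate):
    # One gradient-descent step that moves the estimate toward the data mean.
    gradient = sum(estimate - x for x in data) / len(data)
    return estimate - 0.5 * gradient


def one_pass_per_iteration(storage, iterations):
    # Each iteration is an independent pass that starts from storage.
    estimate = 0.0
    for _ in range(iterations):
        data = storage.load()
        estimate = step(data, estimate)
    return estimate


def cached_in_memory(storage, iterations):
    # The data is read once and reused from memory on every iteration.
    data = storage.load()
    estimate = 0.0
    for _ in range(iterations):
        estimate = step(data, estimate)
    return estimate
```

Both functions compute the same estimate, but the first scans storage once per iteration while the second scans it once in total. Real systems differ in many other ways, so this shows only the access pattern, not a performance measurement.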