Apache® Spark™ News

Understanding your Spark application through visualization

In the past, the Spark UI has been instrumental in helping users debug their applications. In the latest Spark 1.4 release, we are happy to announce that the data visualization wave has found its way to the Spark UI. The new visualization additions in this release include three main components:

MemSQL’s Essential Resources for Apache Spark

There’s no doubt about it. Apache Spark is well on its way to becoming a ubiquitous technology. Over the past year, we’ve created resources to help our users understand the real-world use cases for Spark as well as showcase how our technologies complement one another. Now, we’ve organized and consolidated those materials into this very post.

Databricks is now Generally Available

We are excited to announce today, at Spark Summit 2015, the general availability of Databricks, a hosted data platform from the team that created Apache Spark. With Databricks, you can effortlessly launch Spark clusters, explore data interactively, run production jobs, and connect third-party applications. We believe Databricks is the easiest way to use big data.

Databricks and IBM Collaborate to Enhance Apache Spark Machine Learning

At today’s Spark Summit, Databricks and IBM announced a joint effort to contribute key machine learning capabilities to the Apache Spark Project. Over the course of the next few months, Databricks and IBM will collaborate to expand Spark’s machine learning capabilities. The companies plan to introduce new domain-specific algorithms to the Spark ecosystem and add new machine learning primitives to the Apache Spark Project. IBM and Databricks will also collaborate to integrate IBM’s SystemML, a robust machine-learning engine for large-scale data, with the Spark platform.

Announcing Apache Spark 1.4

Today I’m excited to announce the general availability of Spark 1.4! Spark 1.4 introduces SparkR, an R API targeted towards data scientists. It also evolves Spark’s DataFrame API with a large number of new features. Spark’s ML pipelines API, first introduced in Spark 1.3, graduates from an alpha component. Finally, Spark Streaming and Core add visualization and monitoring to aid in production debugging. We’ll be publishing in-depth posts covering Spark’s new features over the coming weeks. Here I’ll briefly outline some of the major themes and features in this release.

Announcing SparkR: R on Spark

I am excited to announce that the upcoming Apache Spark 1.4 release will include SparkR, an R package that allows data scientists to analyze large datasets and interactively run jobs on them from the R shell.

Statistical and Mathematical Functions with DataFrames in Spark

We introduced DataFrames in Spark 1.3 to make Apache Spark much easier to use. Inspired by data frames in R and Python, DataFrames in Spark expose an API that’s similar to the single-node data tools that data scientists are already familiar with. Statistics is an important part of everyday data science. We are happy to announce improved support for statistical and mathematical functions in the upcoming 1.4 release.

Databricks Launches MOOC: Data Science on Spark

For the past several months, we have been working in collaboration with professors from the University of California, Berkeley and the University of California, Los Angeles to produce two freely available Massive Open Online Courses (MOOCs). We are proud to announce that both MOOCs will launch in June on the edX platform!