Streaming data is changing the way people look at Big Data processing. The benefit of streaming is that data can be used as it comes in, rather than waiting for it to be sorted, stored and evaluated. In a market where data-driven customer interactions happen in seconds, this is a huge advantage. To help bring streaming data into focus, Jeff Frick and George Gilbert, cohosts of theCUBE, from the SiliconANGLE Media team, joined Reynold Xin at the Spark Summit East 2016 conference. Xin is the cofounder and chief architect for Spark at Databricks, Inc.
Apache® Spark™ News
Databricks 2015 Year In Review: Democratizing Access to Data
To learn more about Spark, attend Spark Summit East in New York in Feb 2016.
Announcing Spark 1.6
Guest Blog: Streamliner – An Open Source Spark Streaming Application
This is a guest blog from Ankur Goyal, VP of Engineering at MemSQL
Announcing Spark 1.5
The inaugural Spark Summit Europe will be held in Amsterdam this October. Check out the full agenda and get your ticket before it sells out!
Diving into Spark Streaming’s Execution Model
With so many distributed stream processing engines available, people often ask us about the unique benefits of Spark Streaming. From early on, Apache Spark has provided a unified engine that natively supports both batch and streaming workloads. This is different from other systems that either have a processing engine designed only for streaming, or have similar batch and streaming APIs but compile internally to different engines. Spark’s single execution engine and unified programming model for batch and streaming lead to some unique benefits over traditional streaming systems. In particular, four major aspects are:
Four Things to Know about Reliable Spark Streaming with Typesafe and Databricks
Last week, we were happy to have a Typesafe co-webinar with Databricks, the company founded by the team that started the Spark research project at UC Berkeley that later became Apache Spark. Our Big Data Architect Dean Wampler and Databricks' Lead Engineer for Spark Streaming, Tathagata Das (TD), provided a 1-hour presentation with Q&A on Spark Streaming, which makes it easy to build scalable fault-tolerant streaming applications with Apache Spark. In this webinar, we reviewed:
New Visualizations for Understanding Spark Streaming Applications
Earlier, we presented new visualizations introduced in Spark 1.4.0 to understand the behavior of Spark applications. Continuing the theme, this blog highlights new visualizations introduced specifically for understanding Spark Streaming applications. We have updated the Streaming tab of the Spark UI to show the following:
Databricks is now Generally Available
We are excited to announce today, at Spark Summit 2015, the general availability of Databricks, a hosted data platform from the team that created Apache Spark. With Databricks, you can effortlessly launch Spark clusters, explore data interactively, run production jobs, and connect third-party applications. We believe Databricks is the easiest way to use big data.
Recent performance improvements in Apache Spark: SQL, Python, DataFrames, and More
In this post, we look back and cover recent performance efforts in Spark. In a follow-up blog post next week, we will look forward and share with you our thoughts on the future evolution of Spark’s performance.