Apache® Spark™ News

Four Things to Know about Reliable Spark Streaming with Typesafe and Databricks

Last week, we were happy to host a Typesafe co-webinar with Databricks, the company founded by the team that started the Spark research project at UC Berkeley that later became Apache Spark. Our Big Data Architect Dean Wampler and Databricks' Lead Engineer for Spark Streaming, Tathagata Das (TD), gave a 1-hour presentation with Q&A on Spark Streaming, which makes it easy to build scalable fault-tolerant streaming applications with Apache Spark. In this webinar, we reviewed: ...

Python Versus R in Apache Spark

The June update to Apache Spark brought support for R, a significant enhancement that opens the big data platform to a large audience of new potential users. Support for R in Spark 1.4 also gives users an alternative to Python. But which language will emerge as the winner for doing data science in Spark? We spoke to Databricks' Ali Ghodsi for answers.

Configuring and Deploying Apache Spark

I gave this talk at the inaugural SF Spark and Friends Meetup group in San Francisco during the week of the Spark Summit this year. While researching this talk, I realized there is very little material out there giving an overview of the many rich options for deploying and configuring Apache Spark. There are some specific articles by vendors, targeting YARN, DSE, and so on, but I think what developers really want is a broad overview. So, this post will give you that, but you will have to look through the slides here to get to the meat of it. ...

A Spark is Lit in HDInsight

Apache Spark has garnered a lot of developer attention and is often at the top of the agenda in my customer interactions. Since we announced support for Spark in HDP, we have seen broad customer adoption of our Spark offering. Our customers love Spark for the simplicity of its API, its speed of development, and its runtime performance. Spark is also democratizing Machine Learning and making it easier and more approachable for more developers. Today Microsoft announced support for Spark in HDInsight – this is a big step towards driving customer adoption for Spark workloads on Hadoop clusters in Azure.

How-to: Do Data Quality Checks using Apache Spark DataFrames

Apache Spark’s ability to support data quality checks via DataFrames is progressing rapidly. This post explains the state of the art and future possibilities. Apache Hadoop and Apache Spark make Big Data accessible and usable so we can easily find value, but that data has to be correct, first. This post will focus on this problem and how to solve it with Apache Spark 1.3 and Apache Spark 1.4 using DataFrames. (Note: although relatively new to Spark and thus not yet supported by Cloudera at the time of this writing, DataFrames are highly worthy of exploration and experimentation. Learn more about Cloudera’s support for Apache Spark here.)

Leaf in the Wild: Stratio Integrates Apache Spark and MongoDB to Unlock New Customer Insights for One of World’s Largest Banks

There is no question that Apache Spark is on fire. It’s the most active big data project in the Apache Software Foundation, and was recently “blessed” by IBM who committed 3,500 engineers to advancing it. While some are still confused by what it is, or claiming it will kill Hadoop (which it won’t, or at least not the non-MapReduce parts of it), there are already companies today harnessing its power to build next generation analytics applications. Stratio are one such company. With an impressive client list including BBVA, Just Eat, Santander, SAP, Sony and Telefonica, Stratio claims more projects and clients with its Apache Spark-certified Big Data (BD) platform than pretty much anyone else.

Apache Spark in the Enterprise and in China

IBM’s announcements at the recent Spark Summit in SF bode well for enterprise adoption of Spark. Ben Horowitz jokingly referred to IBM’s endorsement as akin to a Rabbi blessing Spark as kosher for use in an enterprise. I recently sat down with a set of luminaries at the Spark Summit and asked them how Spark is perceived in enterprises. Below is a selection of responses...

Guest blog: PMML Support in Spark MLlib

This is a guest blog from our friend Vincenzo Selvaggio who contributed this feature. He is a Senior Java Technical Architect and Project Manager, focusing on delivering advanced business process solutions for investment banks.

Couchbase Spark Connector 1.0 Beta Release

More or less exactly two months after the second developer preview, I'm delighted to announce that we've shipped the first (and hopefully only) beta release of the Couchbase Spark Connector. It is a major step forward, bringing Spark 1.4 support as well as official documentation and lots of smaller enhancements. In particular: ....