Apache® Spark™ News

Analyze one year of radio station songs aired with Apache Spark SQL, Apache Spark, Spotify, and Databricks

This article presents the year 2016 for four major French radio stations through fun SQL queries; we then connect each song to the Spotify API to build each station's musical profile. We use the Databricks Community Edition to visualize our data. All SQL queries and results are available in this notebook. It is the "backstage" of this article, where the magic happens, so to speak.

Why Spark Is Proving So Valuable for Data Science in the Enterprise

What do Bloomberg, Capital One, and Comcast have in common? If you said they're operationalizing data science using Apache Spark, then give yourself a gold star. These companies shared their stories of the upstart analytics toolbox during the recent Spark Summit East conference, and as the stories show, not only is Spark helping enterprises achieve their analytic dreams, but those enterprises are also accelerating the development of Spark along the way. If you're a regular reader of this publication, then you already know that Apache Spark is currently the hottest project in the big data analytics and data science community. The Hadoop distributors have long since jumped on the Spark bandwagon, and even IBM is now singing the praises of the free and open source distributed analytic framework that competes with so many of its proprietary offerings.

Why you should use Spark for machine learning

As organizations create more diverse and more user-focused data products and services, there is a growing need for machine learning, which can be used to develop personalizations, recommendations, and predictive insights. Traditionally, data scientists have been able to solve these problems using familiar and popular tools such as R and Python. But as organizations amass greater volumes and greater varieties of data, data scientists are spending the majority of their time supporting their infrastructure instead of building the models to solve their data problems. To help solve this problem, Spark provides a general machine learning library, MLlib, that is designed for simplicity, scalability, and easy integration with other tools. With the scalability, language compatibility, and speed of Spark, data scientists can solve and iterate through their data problems faster. As can be seen in both the expanding diversity of use cases and the large number of developer contributions, MLlib's adoption is growing quickly.

How Apache Spark Helped Eight Companies Grow Their Businesses

As real-time applications become more mainstream and companies continue to collect massive amounts of data, users have embraced Apache Spark for its ability to do sophisticated analytics at scale. First developed in the AMPLab at UC Berkeley, Apache Spark is a powerful open-source processing engine built around speed, ease of use, and sophisticated analytics that's powering millions of real-time applications every single day. Spark lets you quickly write applications in Java, Scala, or Python and supports SQL queries, streaming data, machine learning, and graph data processing. Developers can use these capabilities stand-alone or combine them in a single data pipeline. Spark has quickly become the largest open-source community in big data, with more than 750 contributors from 200-plus organizations. For this slide show, eWEEK combed through online archives and the Apache Spark website and worked with in-memory database company MemSQL to develop a list of companies that are using Spark to support and grow their businesses.

Mesosphere Infinity: You’re 4 Words Away from a Complete Big Data System

One command, four words and users will have a next-generation big data system in place, capable of processing the streams of information flowing into their companies every second of every day. That’s the promise we’re making with the announcement of Mesosphere Infinity, a new product that combines a best-of-breed real-time analytics stack into a single package in our Datacenter Operating System (DCOS).

Featurizing Data: Spark and Beyond

In this Hortonworks partner guest blog post, Abhimanyu Aditya, Senior Product Manager and co-founder at Skytree, explains how Skytree APIs solve challenges facing data engineers and simplify data preparation and data transformation, using Apache Spark on YARN with Hortonworks Data Platform (HDP).
