With Spark 1.3, MLlib now supports Latent Dirichlet Allocation (LDA), one of the most successful topic models. LDA is also the first MLlib algorithm built upon GraphX. In this blog post, we provide an overview of LDA and its use cases, and we explain how GraphX was a natural choice for implementation.
Apache® Spark™ News
What’s new for Spark SQL in Spark 1.3
The Spark 1.3 release represents a major milestone for Spark SQL. In addition to several major features, we are very excited to announce that the project has officially graduated from Alpha, after being introduced only a little under a year ago. In this blog post we will discuss exactly what this step means for compatibility moving forward, as well as highlight some of the major features of the release.
Using MongoDB with Spark
This is a guest blog from Matt Kalan, a Senior Solution Architect at MongoDB.
Announcing Spark 1.3!
Today I’m excited to announce the general availability of Spark 1.3! Spark 1.3 introduces the widely anticipated DataFrame API, an evolution of Spark’s RDD abstraction designed to make crunching large datasets simple and fast. Spark 1.3 also boasts a large number of improvements across the stack, from Streaming, to ML, to SQL. The release has been posted today on the Apache Spark website.
Introducing DataFrames in Spark for Large Scale Data Science
Today, we are excited to announce a new DataFrame API designed to make big data processing even easier for a wider audience.
Spark: A review of 2014 and looking ahead to 2015 priorities
2014 has been a year of tremendous growth for Apache Spark. It became the most active open source project in the Big Data ecosystem with over 400 contributors, and was adopted by many platform vendors – including all of the major Hadoop distributors. Through our ecosystem of products, partners, and training at Databricks, we also saw over 200 enterprises deploying Spark in production.
Apache Spark selected for InfoWorld 2015 Technology of the Year Award
Recently InfoWorld unveiled the 2015 Technology of the Year Award winners, which range from open source software to stellar consumer technologies like the iPhone. As the creators of and driving force behind Spark, Databricks is thrilled to see Spark in their ranks. In fact, we built our flagship product, Databricks Cloud, on top of Spark with the ambition to revolutionize big data processing in ways similar to how the iPhone revolutionized the mobile experience.
An introduction to JSON support in Spark SQL
In this blog post, we introduce Spark SQL’s JSON support, a feature we have been working on at Databricks to make it dramatically easier to query and create JSON data in Spark. With the prevalence of web and mobile applications, JSON has become the de facto interchange format for web service APIs as well as long-term storage. With existing tools, users often engineer complex pipelines to read and write JSON data sets within analytical systems. Spark SQL’s JSON support, released in version 1.1 and enhanced in Spark 1.2, vastly simplifies the end-to-end experience of working with JSON data.
Spark Summit East 2015 Agenda is Now Available
We are thrilled to announce the availability of the agenda for Spark Summit East 2015! This inaugural New York City event on March 18-19, 2015 has over thirty jam-packed sessions – offering a combination of longer deep-dive presentations and shorter intensive talks. You will have the opportunity to engage the speakers and your peers in discussion and a cross-pollination of ideas.
Spark Certified Developer exams available online!
Complementing our ongoing direct and partner-led Spark training efforts, Databricks has teamed up with O’Reilly to offer the industry’s first standard for measuring and validating a developer’s expertise with Spark.