While the above might be obvious, users are often surprised to discover that:
Apache® Spark™ News
To take advantage of this, we worked to introduce a new millisecond low-latency mode of streaming called continuous mode in Apache Spark 2.3, now available in Databricks Runtime 4.0 as part of Databricks Unified Analytics Platform.
Since we introduced Structured Streaming in Apache Spark 2.0, it has supported joins (inner join and some type of outer joins) between a streaming and a static DataFrame/Dataset. With the release of Apache Spark 2.3.0, now available in Databricks Runtime 4.0 as part of Databricks Unified Analytics Platform, we now support stream-stream joins. In this post, we will explore a canonical case of how to use stream-stream joins, what challenges we resolved, and what type of workloads they enable. Let’s start with the canonical use case for stream-stream joins – ad monetization.
In recent years, machine learning has become ubiquitous in industry and production environments. Both academic and industry institutions had previously focused on training and producing models, but the focus has shifted to productionizing the trained models. Now we hear more and more machine learning practitioners really trying to find the right model deployment options.
The open source community has been working over the past year to enable first-class support for data processing, data analytics and machine learning workloads in Kubernetes. New extensibility features in Kubernetes, such as custom resources and custom controllers, can be used to create deep integrations with individual applications and frameworks.
Today we are happy to announce the availability of Apache Spark 2.3.0 on Databricks as part of its Databricks Runtime 4.0. We want to thank the Apache Spark community for all their valuable contributions to Spark 2.3 release.
In an earlier blog post, we analyzed the performance impact of Meltdown and Spectre on big data workloads in the cloud. In this blog post, we explain these exploits, their mitigation strategies and how they impact Databricks from a security and performance perspective.
Last week, the details of two industry-wide security vulnerabilities, known as Meltdown and Spectre, were released. These exploits enable cross-VM and cross-process attacks by allowing untrusted programs to scan other programs’ memory.
We are excited to announce the general availability of Databricks Cache, a Databricks Runtime feature as part of the Unified Analytics Platform that can improve the scan speed of your Apache Spark workloads up to 10x, without any application code change.
We are excited to announce the general availability of DBIO caching, a Databricks Runtime feature as part of the Unified Analytics Platform that can improve the scan speed of your Apache Spark workloads up to 10x, without any application code change.