Apache® Spark™ News

How to build a Quality of Service (QoS) analytics solution for streaming video services

As traditional pay TV continues to stagnate, content owners have embraced direct-to-consumer (D2C) subscription and ad-supported streaming for monetizing their libraries of content. For companies whose entire business model revolved around producing great content which they then licensed to distributors, the shift to now owning the entire glass-to-glass experience has required new capabilities such as building media supply chains for content delivery to consumers, supporting apps for a myriad of devices and operating systems, and performing customer relationship functions like billing and customer service.

Faster SQL Queries on Delta Lake with Dynamic File Pruning

There are two time-honored optimization techniques for making queries run faster in data systems: process data at a faster rate or simply process less data by skipping non-relevant data. This blog post introduces Dynamic File Pruning (DFP), a new data-skipping technique, which can significantly improve queries with selective joins on non-partition columns on tables in Delta Lake, now enabled by default in Databricks Runtime.”

COVID-19 Datasets Now Available on Databricks: How the Data Community Can Help

With the massive disruption of the current COVID-19 pandemic, many data engineers and data scientists are asking themselves “How can the data community help?” The data community is already doing some amazing work in a short amount of time including (but certainly not limited to) one of the most commonly used COVID-19 data sources: the 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE. The following animated GIF is a visual representation of the proportional number of confirmed cases (counties) and deaths (circles) spanning from March 22nd to April 14th.

On-Demand Webinar: Granular Demand Forecasting At Scale

We recently hosted a live webinar — How Starbucks Forecasts Demand at Scale with Facebook Prophet and Databricks — During this webinar we learnt why Demand Forecasting is critical to Retail/ CPG firms and how it enables 22 other use cases. Brendan O’Shaughnessy, Data Science Manager at Starbucks walked us through how Starbucks does demand forecasting at scale. We also did a step by step demo on how to perform fine-grained demand forecasts on a day/store/SKU level with Databricks and Facebook’s Prophet

Automating Digital Pathology Image Analysis with Machine Learning on Databricks

With technological advancements in imaging and the availability of new efficient computational tools, digital pathology has taken center stage in both research and diagnostic settings. Whole Slide Imaging (WSI) has been at the center of this transformation, enabling us to rapidly digitize pathology slides into high resolution images. By making slides instantly shareable and analyzable, WSI has already improved reproducibility and enabled enhanced education and remote pathology services.

Fine-Grained Time Series Forecasting At Scale With Facebook Prophet And Apache Spark

Advances in time series forecasting are enabling retailers to generate more reliable demand forecasts. The challenge now is to produce these forecasts in a timely manner and at a level of granularity that allows the business to make precise adjustments to product inventories. Leveraging Apache Spark™ and Facebook Prophet, more and more enterprises facing these challenges are finding they can overcome the scalability and accuracy limits of past solutions.