The subscription model is experiencing a renaissance. The penny music CD clubs of the past have given way to an ever-increasing assortment of digital streaming services that deliver music, videos and more directly to consumers’ devices in exchange for a modest recurring fee. Today, 70% of US households subscribe to at least one streaming service, with an average of 3.4 such subscriptions per subscriber household.
How to Extract Market Drivers at Scale Using Alternative Data
Watch the on-demand webinar Alternative Data Analytics with Python for a demonstration of the solution discussed in this blog, or download the accompanying notebooks to try it yourself.
A data-driven approach to Environmental, Social and Governance
The future of finance goes hand in hand with social responsibility, environmental stewardship and corporate ethics. To stay competitive, Financial Services Institutions (FSIs) are disclosing more and more information about their environmental, social and governance (ESG) performance. By better understanding and quantifying the sustainability and societal impact of any investment in a company or business, FSIs can mitigate reputational risk and maintain the trust of both their clients and their shareholders. At Databricks, we increasingly hear from our customers that ESG has become a C-suite priority. This is driven not solely by altruism but also by economics: higher ESG ratings are generally positively correlated with valuation and profitability and negatively correlated with volatility. In this blog post, we offer a novel approach to sustainable investing that combines natural language processing (NLP) techniques and graph analytics to extract key strategic ESG initiatives, learn the relationships between companies in a global market, and measure their impact on market risk calculations.
Allow Simple Cluster Creation with Full Admin Control Using Cluster Policies
A Databricks cluster policy is a template that restricts the way users interact with cluster configuration. Today, any user with cluster creation permissions can launch an Apache Spark™ cluster with any configuration, which leads to a few issues.
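To make the idea of a policy as a "template that restricts" cluster configuration more concrete, here is a minimal Python sketch built on a much-simplified policy format. The rule types `fixed`, `range` and `allowlist` and the `minValue`/`maxValue` keys are loosely modeled on Databricks policy definitions. However, `POLICY`, `check_cluster` and all example values (runtime key, node types, limits) are hypothetical illustrations, and this is not how Databricks actually evaluates policies.

```python
from typing import Any, Dict, List

# Hypothetical, simplified policy: each cluster attribute maps to one rule.
# Rule names echo Databricks policy definitions, but this is only a toy model.
POLICY: Dict[str, Dict[str, Any]] = {
    "spark_version": {"type": "fixed", "value": "7.0.x-scala2.12"},
    "autotermination_minutes": {"type": "range", "minValue": 10, "maxValue": 120},
    "node_type_id": {"type": "allowlist", "values": ["i3.xlarge", "i3.2xlarge"]},
}


def check_cluster(conf: Dict[str, Any], policy: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return human-readable policy violations for a requested cluster config."""
    errors: List[str] = []
    for attr, rule in policy.items():
        if attr not in conf:
            continue  # in this sketch, attributes the user did not set are not checked
        value = conf[attr]
        kind = rule["type"]
        if kind == "fixed":
            # The user may not change a fixed attribute.
            if value != rule["value"]:
                errors.append(f"{attr} must be {rule['value']!r}, got {value!r}")
        elif kind == "range":
            low = rule.get("minValue", float("-inf"))
            high = rule.get("maxValue", float("inf"))
            if not low <= value <= high:
                errors.append(f"{attr}={value!r} is outside [{low}, {high}]")
        elif kind == "allowlist":
            if value not in rule["values"]:
                errors.append(f"{attr}={value!r} is not one of {rule['values']}")
        else:
            raise ValueError(f"unsupported rule type: {kind}")
    return errors


# Usage: a request that asks for an oversized node and a very long idle timeout.
request = {
    "spark_version": "7.0.x-scala2.12",
    "autotermination_minutes": 600,
    "node_type_id": "p3.16xlarge",
}
for problem in check_cluster(request, POLICY):
    print(problem)
```

The design point is that administrators express limits declaratively once, and every cluster request is checked against them. Users therefore keep a simple creation flow while admins retain control over cost-sensitive settings.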
Announcing GPU-aware scheduling and enhanced deep learning capabilities
Databricks is pleased to announce the release of Databricks Runtime 7.0 for Machine Learning (Runtime 7.0 ML) which provides preconfigured GPU-aware scheduling and adds enhanced deep learning capabilities for training and inference workloads.
Time Traveling with Delta Lake: A Retrospective of the Last Year
Try out Delta Lake 0.7.0 with Spark 3.0 today!
Customer Lifetime Value Part 2: Estimating Future Spend
Download the Customer Lifetimes Part 2 notebook to demo the solution covered below, and watch the on-demand virtual workshop to learn more. You can also go to Part 1 to learn how to estimate customer lifetime duration.
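The excerpt does not say which model the notebook uses to estimate future spend. One widely used option is Fader and Hardie's Gamma-Gamma model, which the open-source `lifetimes` package implements as `GammaGammaFitter`. As a hedged sketch, assuming the population parameters `p`, `q` and `v` have already been fitted, the model's conditional expected average transaction value for a customer with `frequency` repeat purchases averaging `monetary_value` can be computed as follows. The function name and parameter values below are illustrative only.

```python
def expected_avg_spend(p: float, q: float, v: float,
                       frequency: int, monetary_value: float) -> float:
    """Gamma-Gamma conditional expected average transaction value.

    frequency: number of repeat transactions observed for the customer (x)
    monetary_value: mean value of those repeat transactions (m_x)
    p, q, v: population-level parameters, assumed to be already fitted
    """
    if q <= 1:
        raise ValueError("q must be greater than 1 for the expectation to exist")
    # Algebraically equal to a weighted average of the population mean
    # p*v/(q-1) and the customer's observed mean m_x, where the weight on
    # m_x is p*x / (p*x + q - 1) and therefore grows with the number of purchases.
    return p * (v + frequency * monetary_value) / (p * frequency + q - 1)


# Illustrative parameter values, not fitted to any real data set.
params = dict(p=6.25, q=3.74, v=15.44)
for x, m in [(0, 0.0), (2, 50.0), (20, 50.0)]:
    print(x, m, round(expected_avg_spend(**params, frequency=x, monetary_value=m), 2))
```

Customers with no repeat purchases receive the population mean. As purchases accumulate, the estimate shrinks toward that customer's own observed average.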
On-Demand Virtual Session: Customer Lifetime Value
Before you can provide personalized services and offers to your customers, you need to know who they are. In this virtual workshop, retail and media experts demonstrate how to build advanced customer lifetime value (CLV) models. With these models, companies can decide how much to invest in each customer and create personalized offers, retention ("save") tactics, and experiences.
Simplify Data Conversion from Apache Spark to TensorFlow and PyTorch
Petastorm is a popular open-source library from Uber that enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. We are excited to announce that Petastorm 0.9.0 supports easy conversion of data from Apache Spark DataFrames to TensorFlow Datasets and PyTorch DataLoaders. The new Spark Dataset Converter API makes it easier to run distributed model training and inference on massive data from multiple data sources. The Spark Dataset Converter API was contributed by Xiangrui Meng, Weichen Xu, and Liang Zhang (Databricks), in collaboration with Yevgeni Litvin and Travis Addair (Uber).
Accelerating Somatic Variant Calling with the Databricks TNSeq Pipeline
Genetic analyses are a critical tool in revolutionizing how we treat cancer. By understanding the mutations present in tumor cells, researchers can gain clues that lead to drug targets and, eventually, new therapies. At the same time, genetic characterization of individual tumors enables physicians to tailor treatments to individual patients, improving outcomes while reducing side effects.