We are excited to announce the release of Delta Lake 0.5.0, which introduces Presto/Athena support and improved concurrency.
Apache® Spark™ News
We recently hosted a live webinar — Geospatial Analytics and AI in Public Sector — during which we covered top geospatial analysis use cases in the Public Sector along with live demos showcasing how to build scalable analytics and machine learning pipelines on geospatial data at sale.
Advances in time series forecasting are enabling retailers to generate more reliable demand forecasts. The challenge now is to produce these forecasts in a timely manner and at a level of granularity that allows the business to make precise adjustments to product inventories. Leveraging Apache Spark™ and Facebook Prophet, more and more enterprises facing these challenges are finding they can overcome the scalability and accuracy limits of past solutions.
We started Databricks in 2013 in a tiny little office in Berkeley with the belief that data has the potential to solve the world’s toughest problems. We entered 2020 as a global organization with over 1000 employees and a customer base spanning from two-person startups to Fortune 10s.
Machine learning models can seem like magical savants. They can distinguish hot dogs from not-hot-dogs, but that’s long since an easy trick. My aunt’s parrot can do that too. But machine-learned models power voice-activated assistants that effortlessly understand noisy human speech, and cars that drive themselves more or less safely. It’s no wonder we assume these are at some level artificially ‘intelligent’.
The evolution and convergence of technology has fueled a vibrant marketplace for timely and accurate geospatial data. Every day billions of handheld and IoT devices along with thousands of airborne and satellite remote sensing platforms generate hundreds of exabytes of location-aware data. This boom of geospatial big data combined with advancements in machine learning is enabling organizations across industry to build new products and capabilities.
Cross posted from the Glow blog.
Companies rely on their big data and analytics platforms to support innovation and digital transformation strategies. However, many Hadoop users struggle with complexity, unscalable infrastructure, excessive maintenance overhead and overall, unrealized value. We help customers navigate their Hadoop migrations to modern cloud platforms such as Databricks and our partner products and solutions, and in this post, we’ll share what we’ve learned.
In the post Using AutoML Toolkit to Automate Loan Default Predictions, we had shown how the Databricks Labs’ AutoML Toolkit simplified Machine Learning model feature engineering and model building optimization (MBO). It also had improved the area-under-the-curve (AUC) from 0.6732 (handmade XGBoost model) to 0.723 (AutoML XGBoost model). With AutoML Toolkit’s Release 0.6.1, we have upgraded to MLflow version 1.3.0 and introduced a new Pipeline API that simplifies feature generation and inference.
The original blog is from Viacheslav Inozemtsev, Senior Data Engineer at Zalando, reproduced with permission.