Apache® Spark™ News

Why you should use Spark for machine learning

As organizations build more diverse, user-focused data products and services, they increasingly need machine learning to deliver personalization, recommendations, and predictive insights. Traditionally, data scientists have solved these problems with familiar, popular tools such as R and Python. But as organizations amass greater volumes and varieties of data, data scientists spend most of their time supporting infrastructure instead of building the models that solve their data problems. To address this, Spark provides a general-purpose machine learning library, MLlib, that is designed for simplicity, scalability, and easy integration with other tools. With the scalability, language compatibility, and speed of Spark, data scientists can iterate on their data problems faster. MLlib’s adoption is growing quickly, as shown by both the expanding diversity of use cases and the large number of developer contributions.