At Collective we are in programmatic advertisement business, it means that all our advertisement decisions (what ad to show, to whom and at what time) are driven by models. We do a lot of machine learning, build thousands predictive models and use them to make millions decision per second.
Apache® Spark™ News
I am excited to announce that the upcoming Apache Spark 1.4 release will include SparkR, an R package that allows data scientists to analyze large datasets and interactively run jobs on them from the R shell.
Join us at Spark Summit to hear more about new functionalities of Apache Spark. Use the code Databricks20 to receive a 20% discount!
We introduced DataFrames in Spark 1.3 to make Apache Spark much easier to use. Inspired by data frames in R and Python, DataFrames in Spark expose an API that’s similar to the single-node data tools that data scientists are already familiar with. Statistics is an important part of everyday data science. We are happy to announce improved support for statistical and mathematical functions in the upcoming 1.4 release.
For the past several months, we have been working in collaboration with professors from the University of California Berkeley and University of California Los Angeles to produce two freely available Massive Open Online Courses (MOOCs). We are proud to announce that both MOOCs will launch in June on the edX platform!
We’re proud to announce that the new Spark Summit website is live! This includes the full list of community talks along with the first set of keynotes. With over 260 submissions this year, the Program Committee had its work cut out narrowing the list to 54 talks. We would like to thank everyone who submitted and invite everyone to submit a presentation for a future Spark Summit (just in case you have not heard as yet, we are taking the Spark Summit to Amsterdam this fall.).
In a previous blog post, we looked back and surveyed performance improvements made to Spark in the past year. In this post, we look forward and share with you the next chapter, which we are calling Project Tungsten. 2014 witnessed Spark setting the world record in large-scale sorting and saw major improvements across the entire engine from Python to SQL to machine learning. Performance optimization, however, is a never ending process.
In this post, we look back and cover recent performance efforts in Spark. In a follow-up blog post next week, we will look forward and share with you our thoughts on the future evolution of Spark’s performance.
This is a guest blog post from Huawei’s big data global team.
This is a guest post from Bob DuCharme.