Apache® Spark™ News

100x Faster Bridge between Apache Spark and R with User-Defined Functions on Databricks

The SparkR UDF API transfers data back and forth between the Spark JVM and the R process. Inside the UDF, the user gets a wonderful island of R with access to the entire R ecosystem. Unfortunately, the bridge between R and the JVM is far from efficient. It currently allows only one “car” to cross at a time, and the “car” here is a single field in a Row of a SparkDataFrame. It should be no surprise that traffic on the bridge is very slow.
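
To make the "one car at a time" picture concrete, here is a toy sketch in Python. It is not SparkR's actual serialization code: the function names, the use of `pickle`, and the sample data are all assumptions chosen for illustration. It only counts how many separate messages cross the "bridge" when every field is sent on its own, compared with sending one message per column. The per-column variant is there purely for contrast and is not a claim about how any particular fix is implemented.

```python
import pickle

def transfer_per_field(rows):
    """Hypothetical model of the slow path: each field of each row
    is serialized as its own message (one "car" per field)."""
    return [pickle.dumps(field) for row in rows for field in row]

def transfer_per_column(rows):
    """Hypothetical contrast case: each column is serialized once,
    so the number of messages depends on column count, not row count."""
    columns = list(zip(*rows))  # transpose rows into columns
    return [pickle.dumps(column) for column in columns]

# Made-up sample data: 1000 rows with three fields each.
rows = [(i, float(i) * 0.5, str(i)) for i in range(1000)]

per_field = transfer_per_field(rows)
per_column = transfer_per_column(rows)

print(len(per_field))   # 3000 messages: rows x fields
print(len(per_column))  # 3 messages: one per column
```

In this model the per-field message count grows with the number of rows, so any fixed per-message overhead is paid for every single field, which is the kind of cost the bridge analogy describes.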