Winning Daytona GraySort: Shuffle + Network + CPU Cache and Perf Optimizations

September 3, 2015 @ 6:30 pm - 9:00 pm

A code-level deep dive into the optimizations that allowed Spark to win the Daytona GraySort Challenge:

https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html

We’ll discuss the following at a code level:

1) Sort-based Shuffle (uses fewer OS resources)

https://issues.apache.org/jira/browse/SPARK-2045

2) Netty-based Network module (epoll, async, ByteBuffer reuse)

https://issues.apache.org/jira/browse/SPARK-2468

3) External Shuffle Service (also allows for auto-scaling of Worker nodes)

https://issues.apache.org/jira/browse/SPARK-3796

4) AlphaSort style cache locality optimizations

http://www.slideshare.net/SparkSummit/deep-dive-into-project-tungsten-josh-rosen (slide 22)

https://issues.apache.org/jira/browse/SPARK-7082

5) https://issues.apache.org/jira/browse/SPARK-9850
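
For readers unfamiliar with item 4, the key-prefix idea usually associated with AlphaSort can be sketched roughly as follows. This is only an illustration under stated assumptions, not Spark's code: the names `PREFIX_LEN`, `key_prefix`, and `prefix_sort` are hypothetical, and records are assumed to be `(key, value)` pairs with `bytes` keys. The sort works on a compact list of (prefix, index) entries and looks at the full key only when two prefixes tie, which is what keeps most comparisons inside the small, cache-friendly array.

```python
from functools import cmp_to_key

PREFIX_LEN = 8  # hypothetical prefix width, chosen for illustration


def key_prefix(key: bytes) -> int:
    """Pack the first PREFIX_LEN bytes of the key (zero-padded) into an int."""
    return int.from_bytes(key[:PREFIX_LEN].ljust(PREFIX_LEN, b"\x00"), "big")


def prefix_sort(records):
    """Sort (key, value) records by key using compact (prefix, index) entries."""
    # Compact entries: a fixed-width prefix plus a pointer (here, a list index).
    entries = [(key_prefix(key), i) for i, (key, _) in enumerate(records)]

    def compare(a, b):
        # Fast path: the prefixes alone decide the order.
        if a[0] != b[0]:
            return -1 if a[0] < b[0] else 1
        # Slow path: prefixes tie, so follow the pointers to the full keys.
        ka, kb = records[a[1]][0], records[b[1]][0]
        return (ka > kb) - (ka < kb)

    entries.sort(key=cmp_to_key(compare))
    return [records[i] for _, i in entries]
```

In a lower-level implementation the entries would typically live in a flat array of fixed-width values, so the fast path touches no record memory at all; the Python list here only mirrors that layout conceptually.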

 

Details

Date:
September 3, 2015
Time:
6:30 pm - 9:00 pm
Website:
http://www.meetup.com/Advanced-Apache-Spark-Meetup/events/223665950/

Venue

IBM Spark Technology Center
425 Market St
San Francisco, CA United States