Parallel Data Analysis | Tutorials
An overview of parallel computing techniques available from Python, with hands-on experience in a variety of frameworks.

This course has two primary goals:

1. Teach students how to reason about parallel computing
2. Provide hands-on experience with a variety of different parallel computing frameworks

Students will walk away with a high-level understanding of parallel problems and of how to select and use an appropriate parallel computing framework for their problem. They will get hands-on experience using tools both on their personal laptops and in a cluster environment that will be provided for them at the tutorial.

In the first half, we cover programming patterns for parallelism found across many tools, notably map, futures, and big-data collections. We investigate these common APIs by diving into a sequence of examples that require increasingly complex tools. We learn the benefits and costs of each API and the sorts of problems where each is appropriate.

In the second half, we focus on the performance aspects of frameworks and give intuition on how to pick the right tool for the job. This includes common challenges in parallel analysis, such as communication costs and debugging parallel code, as well as deployment and setup strategies.
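As a rough illustration of the map and futures patterns named above, here is a minimal sketch using Python's standard-library `concurrent.futures` module. The abstract does not say which libraries the tutorial uses, so the choice of module, the `square` function, and the input list are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def square(x):
    """Hypothetical stand-in for a more expensive task."""
    return x * x


inputs = [1, 2, 3, 4]

with ThreadPoolExecutor(max_workers=2) as pool:
    # "map" pattern: apply one function to many inputs; results
    # come back in the same order as the inputs.
    mapped = list(pool.map(square, inputs))

    # "futures" pattern: submit tasks one at a time and collect each
    # result as it finishes, which may be out of submission order.
    futures = {pool.submit(square, x): x for x in inputs}
    results = {futures[f]: f.result() for f in as_completed(futures)}
```

The map form is the simpler of the two when every task has the same shape. Futures give finer control, for example when tasks are submitted at different times or when results should be handled as soon as each one is ready.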
Ben Zaitlen
Ben is a data scientist and developer at Continuum Analytics. He has several years of experience with Python and is passionate about any and all forms of data. His duties at Continuum include exploring a vast array of data (social networks, climate, astronomy, biology, finance, etc.).
Matthew Rocklin
Matthew is a full-time open source developer at Continuum Analytics, where he builds Python tools for parallel data analysis.
Min Ragan-Kelley
Min has been a core developer of IPython (and now Jupyter) since 2006. He holds a PhD from UC Berkeley in Applied Science & Technology, with an emphasis in computational plasma physics. He now works as a postdoctoral researcher at Simula Research Laboratory in Oslo, Norway, on the Jupyter and OpenDreamKit projects, focusing on JupyterHub and the Jupyter protocols for interactive computing.
Room 7
Thursday, 18th May, 13:20 - 16:40