Anaconda: Accelerating your Python Data Science code with Dask and Numba | Sponsor tutorials
Anyone doing numerical computing with Python will have run into performance barriers. Anaconda is a good starting point: it provides a suite of extension packages whose underlying data structures and algorithms are written in C or Fortran. We'll briefly review the state of numerical computing in Python, look at some examples that show why you should use NumPy-based packages whenever possible, and then focus on two options for acceleration: faster serial computing and parallelization. Continuum Analytics has developed two popular open source packages to address these issues: Numba, which provides an LLVM-based JIT compiler that is accessed through a simple decorator; and Dask, which provides a distributed computing framework and high-quality data structures similar to a Pandas DataFrame or a NumPy ndarray. To follow along interactively, participants should have the latest release of Anaconda installed and some familiarity with Python. During the tutorial we'll learn how to use Dask and Numba efficiently.
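As a rough illustration of the two approaches named in the abstract (a minimal sketch, not material from the tutorial itself), the snippet below compiles a plain Python loop with Numba's `njit` decorator and computes the same reduction over a chunked Dask array. The function name `sum_of_squares`, the array size, and the chunk size are hypothetical choices made for this example. The `try`/`except` fallbacks are also an assumption added here so the sketch still runs, without any speedup, when Numba or Dask is not installed.

```python
import numpy as np

try:
    from numba import njit  # Numba's JIT decorator, if available
except ImportError:
    # Fallback (assumption for this sketch): a no-op decorator, no speedup.
    def njit(func):
        return func


@njit
def sum_of_squares(values):
    # An explicit loop is slow in pure Python; Numba compiles it
    # to machine code the first time the function is called.
    total = 0.0
    for i in range(values.shape[0]):
        total += values[i] * values[i]
    return total


data = np.arange(100_000, dtype=np.float64)
numba_result = sum_of_squares(data)

try:
    import dask.array as da

    # Split the array into chunks that Dask can process in parallel.
    chunked = da.from_array(data, chunks=25_000)
    # Operations are lazy; compute() triggers the actual work.
    dask_result = float((chunked * chunked).sum().compute())
except ImportError:
    # Fallback (assumption for this sketch): plain NumPy.
    dask_result = float((data * data).sum())

print(numba_result, dask_result)
```

The two results should agree up to floating-point rounding, since the serial loop and the chunked reduction add the terms in a different order.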
Ian Stokes-Rees
Ian is a computational scientist and engineer at Continuum Analytics. He loves Python and finding great ways to use it to solve big, hairy problems in scientific computing, data analysis, and visualization. Ian helped develop a Python-based computational infrastructure for the CERN LHCb experiment during his PhD at Oxford, and followed that with work on distributed Monte Carlo (MC) option pricing algorithms as a postdoctoral researcher at INRIA (France). Prior to joining Continuum, Ian spent five years at Harvard, first developing a science gateway for computational biology (in Python, of course), and then as a lecturer in the School of Engineering.
Room B118-119
Thursday, 18th May, 13:30 - 15:00