No More Sad Pandas: Optimizing Pandas Code for Speed and Efficiency
When I first began working with the Python Pandas library, I was told by an experienced Python engineer: 'Pandas is fine for prototyping a few calculations, but it's too slow for any time-sensitive applications.' Over multiple years of working with the Pandas library, I have realized that this is only true when not enough care is put into identifying proper ways to optimize the code's performance. This talk will review some of the most common beginner pitfalls that can cause otherwise perfectly good Pandas code to grind to a screeching halt, and walk through a set of tips and tricks to avoid them. Using a series of examples, we will review the process for identifying the elements of the code that may be causing a slowdown, and discuss a series of optimizations, ranging from good practices for storing and reading input data, to the best methods for avoiding inefficient iterations, to using the power of vectorization to optimize functions for Pandas dataframes.
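As a rough illustration of the contrast the abstract draws between inefficient iteration and vectorization, the sketch below computes the same column two ways. It is not taken from the talk: the data, the column names (`price`, `quantity`), and both helper functions are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical data: the column names and the formula are illustrative only.
df = pd.DataFrame({
    "price": [10.0, 20.0, 30.0, 40.0],
    "quantity": [1, 2, 3, 4],
})


def total_with_loop(frame):
    """Row-by-row iteration: easy to write, but typically slow on large frames."""
    totals = []
    for _, row in frame.iterrows():
        totals.append(row["price"] * row["quantity"])
    return pd.Series(totals, index=frame.index)


def total_vectorized(frame):
    """The same calculation expressed on whole columns at once."""
    return frame["price"] * frame["quantity"]


loop_result = total_with_loop(df)
vectorized_result = total_vectorized(df)
```

Both functions return the same values; on a frame this small the timing difference is negligible, so any speedup would need to be measured on realistically sized data.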
Sofia Heisler
Sofia Heisler is the Lead Data Scientist at Upside Travel, where she develops pricing and product selection algorithms for the travel industry. Previously, she headed up data analytics for a D.C. startup dedicated to connecting small businesses to vendors, and performed data analysis on behalf of some of the largest Fortune 500 companies as a Senior Consultant at an economic consulting company. She holds a Master’s degree in Predictive Analytics from Northwestern University, as well as a B.A. and a B.S. in Economics with a concentration in Statistics from the University of Pennsylvania.
Oregon Ballroom 201-202
Saturday, 20th May, 16:30 - 17:00