The Wild West of Data Wrangling
Introductory data science courses might give you the impression that dealing with data is neat, tidy, and simple. They present you with a simplistic dataset and the scikit-learn or Pandas documentation, and a day or so later, you're done! Piece of cake, right? The real world of data isn't that easy! As a data scientist who has worked in industry for several years, I have a lot of experience dealing with messy, inaccurate, incomplete data, and I want to share those experiences with you. I'll talk through three real-world situations where I've had to analyze and build models on untidy and complex data, covering how I preprocessed the data and prepared it for modeling. You'll leave with an understanding of how a data scientist thinks about data and what she does when the data is complicated.
Sarah Guido
Sarah is a Senior Data Scientist at Mashable, where she studies user behavior through data. She is the chair of the Machine Learning/Artificial Intelligence track at the 2017 SciPy Conference and an accomplished conference speaker. She is also an O'Reilly Media author, having co-authored Introduction to Machine Learning with Python. Community involvement is very important to Sarah, and she is a co-organizer of the NYC Python Meetup, the largest Python meetup in the world. Sarah attended graduate school at the University of Michigan's School of Information.
Portland Ballroom 254-255
Friday, 19th May, 17:10 - 17:40