PyCon 2017

Saturday, 20th May

14:35 - 15:05

Human-Machine Collaboration for Improved Analytical Processes | Talks

Over the last several years, Python developers interested in data science and analytics have acquired a variety of tools and libraries that aim to facilitate analytical processes. Libraries such as Pandas, Statsmodels, Scikit-learn, Matplotlib, Seaborn, and Yellowbrick have made tasks such as data wrangling, statistical modeling, machine learning, and data visualization much quicker and easier. They have accomplished this by automating and abstracting away some of the more tedious, repetitive processes involved in analyzing and modeling data.

Over the next few years, we are sure to witness the introduction of new tools that are increasingly intelligent and able to automate more complex analytical processes. However, as we begin using these tools (and developing new ones), we should strongly consider the level of automation that is most appropriate for each case. Some analytical processes are technically difficult to automate and therefore require a large degree of human steering. Others are relatively easy to automate but perhaps should not be, either because the results are unpredictable or because the outputs require a level of compassionate decision-making that machines simply don’t possess. Such processes would benefit greatly from collaboration between automated machine tasks and uniquely human ones. After all, systems that combine human and machine intelligence often achieve better results than either could on its own.

In this talk, we will discuss human-machine collaboration as it applies to analyzing data with Python. We will review a framework for exploratory data analysis with the goal of identifying which tasks should be automated, which tasks should not, and which tasks would benefit from a more interactive, symbiotic, and collaborative process between the human and the machine. We will explore Python libraries that we can use to build tools that allow us to perform different types of analysis. We’ll also introduce the Cultivar project, an example of a hybrid analytics tool that combines the Django framework with JavaScript visualizations and Celery for task management to facilitate more efficient and effective human-machine systems for data analysis.

Tony Ojeda

[Tony Ojeda](https://www.linkedin.com/in/tonyojeda) is a data scientist, author, and entrepreneur with expertise in streamlining business processes and over a decade of experience creating innovative data products. He is the Founder of District Data Labs and a Co-founder and former President of Data Community DC. Tony has an MS in Finance from Florida International University and an MBA in Strategy and Entrepreneurship from DePaul University. He co-authored the Practical Data Science Cookbook, published by Packt, and is also a co-author of the forthcoming O'Reilly book Applied Text Analytics with Python.

Portland Ballroom 252-253
