Speakers

Alex Dadgar
Alex Dadgar (Project Lead HashiCorp)

Alex is the project lead for Nomad, a distributed, highly-available cluster scheduler by HashiCorp. Prior to joining HashiCorp, Alex worked at Google where he architected a streaming-processing system to handle terabytes of YouTube data a day. Having seen the dream of infrastructure at Google, he joined HashiCorp to build it for the rest of the world!


Ali Zaidi
Ali Zaidi (Data Scientist Microsoft)

Ali is a data scientist in the Algorithms and Data Science team at Microsoft. He spends his day trying to make distributed computing in the cloud easier, more efficient, and more enjoyable for data scientists and developers alike. He focuses on R, Spark, and Bayesian learning.


David Ojika
David Ojika (Doctoral Student University of Florida)

David Ojika is an Intel-fellowship recipient and a 4th-year doctoral student of computer engineering at the University of Florida. He completed several internships at Intel, working on near-memory accelerators and on heterogeneous platforms (Xeon+FPGA). Working with Dr. Darin Acosta and Dr. Ann Gordon-Ross, his research focuses on the intersection of computing and physics by investigating machine learning systems that enhance the study of high-energy particles (such as muons) at CERN. In the summer of 2017, David will join Microsoft’s AI & Research group to embark on an internship with the group’s Project Catapult.


Derek Bennett
Derek Bennett (Platform Infrastructure Team Lead Stitch Fix)

Derek Bennett is the lead for the Platform Infrastructure team in the Algorithms group at Stitch Fix. He and his team develop and support Stitch Fix's Spark capabilities and its event logging infrastructure built on Amazon Kinesis and Apache Kafka, along with associated tools and applications that help make data available and usable. Derek holds a Ph.D. in Operations Research from UC Berkeley.


Felix Cheung
Felix Cheung (PMC/Committer Microsoft)

Felix Cheung is a committer on Apache Spark and a PMC member and committer on Apache Zeppelin. He has been active in the big data space for 3+ years. He is a co-organizer of the Seattle Spark Meetup, has presented several times, and was a teaching assistant for the very popular edX MOOCs Introduction to Big Data with Apache Spark and Scalable Machine Learning in the summer of 2015.


Gwen Shapira
Gwen Shapira (Product Manager Confluent)

Gwen is a product manager at Confluent. She has 15 years of experience working with code and customers to build scalable data architectures, integrating relational and big data technologies. Gwen is the author of “Kafka – The Definitive Guide” and “Hadoop Application Architectures”, and a frequent presenter at industry conferences. Gwen is a PMC member on the Apache Kafka project and a committer on Apache Sqoop. When Gwen isn’t building data pipelines or thinking up new features, you can find her pedaling on her bike, exploring the roads and trails of California and beyond.


Hossein Falaki
Hossein Falaki (Software Engineer Databricks)

Hossein Falaki is a software engineer and data scientist at Databricks, working on the next big thing. Prior to that he was a data scientist at Apple’s personal assistant, Siri. He graduated with a Ph.D. in Computer Science from UCLA, where he was a member of the Center for Embedded Networked Sensing (CENS).


J White Bear
J White Bear (IBM)

University of Michigan: Computer Science, Databases, Machine Learning/Computational Biology, Cryptography
University of California San Francisco: Computational Biology/Bioinformatics, Machine Learning/Multi-Objective Optimization/Statistical Mechanics for Protein-Protein Interactions
McGill University: Machine Learning/Multi-Objective Optimization for Path Planning/Cryptography


Jennifer Shin
Jennifer Shin (Founder 8 Path Solutions)



Jim Dowling
Jim Dowling (Associate Professor KTH Royal Institute of Technology)

Jim Dowling is an Associate Professor at the School of Information and Communications Technology in the Department of Software and Computer Systems at KTH Royal Institute of Technology as well as a Senior Researcher at SICS – Swedish ICT. He received his Ph.D. in Distributed Systems from Trinity College Dublin (2005) and worked at MySQL AB (2005-2007). He is a distributed systems researcher and his research interests are in the area of large-scale distributed computer systems. He is lead architect of Hadoop Open Platform-as-a-Service (www.hops.io), a next generation distribution of Hadoop for Humans.


Jonathan Bloom
Jonathan Bloom (Co-Founder, Hail Team Broad Institute of MIT and Harvard)

Jonathan Bloom is a mathematician, engineer, and co-founder of the Hail team at the Broad Institute of MIT and Harvard. Prior to joining the Broad, he did research in geometry and algebraic topology as a Moore Instructor and NSF Fellow in Mathematics at the Massachusetts Institute of Technology. While there, he re-architected the department’s introductory course on probability and statistics, now available on MIT OpenCourseWare. He received his B.A. from Harvard University and Ph.D. from Columbia University in Mathematics.


Jordan Volz
Jordan Volz (Systems Engineer Cloudera)

Jordan Volz is a Systems Engineer at Cloudera. He helps clients design and implement big data solutions using Cloudera’s Distribution of Hadoop across a variety of industry verticals. Previously, he worked as a consultant for HP Autonomy, delivering compliance archiving, e-Discovery, and electronic surveillance solutions to regulated financial services companies, and as a developer at Epic Systems, building HIPAA-compliant EMR software.


Joseph Bradley
Joseph Bradley (Software Engineer Databricks)

Joseph Bradley is a Spark Committer working on MLlib at Databricks. Previously, he was a postdoc at UC Berkeley after receiving his Ph.D. in Machine Learning from Carnegie Mellon University in 2013. His research included probabilistic graphical models, parallel sparse regression, and aggregation mechanisms for peer grading in MOOCs.


Kimoon Kim
Kimoon Kim (Pepperdata)

Kimoon joined Pepperdata in 2013. Previously, he worked for the Google Search and Yahoo Search teams for many years. Kimoon has hands-on experience with large distributed systems processing massive data sets.


Leah McGuire
Leah McGuire (Technical Staff Salesforce.com)

Leah McGuire is a Lead Member of Technical Staff at Salesforce, building platforms to enable the integration of machine learning into Salesforce products. Before joining Salesforce, Leah was a Senior Data Scientist on the data products team at LinkedIn working on personalization, entity resolution, and relevance for a variety of LinkedIn data products. She completed a PhD and a Postdoctoral Fellowship in Computational Neuroscience at the University of California, San Francisco, and at University of California, Berkeley, where she studied the neural encoding and integration of sensory signals.


Matteo Interlandi
Matteo Interlandi (Scientist Microsoft CISL)

Matteo Interlandi recently joined Microsoft CISL as a Research Scientist. Prior to joining Microsoft, Matteo was a Postdoctoral Scholar at the University of California, Los Angeles. His research lies at the intersection of databases, distributed systems, and declarative languages. In particular, he loves to build systems and tools that make it easier to design and implement data-driven distributed applications.


Michael Malak
Michael Malak (Oracle)

Michael Malak is the lead author of Spark GraphX In Action and has been developing Spark solutions at two Fortune 200 companies since early 2013. He has been programming computers since before they could be bought pre-assembled in stores.


Min Shen
Min Shen (Engineer LinkedIn)

Min Shen is an engineer on LinkedIn’s Hadoop infrastructure development team, where he builds services and tools to tackle scaling challenges in operating large-scale multi-tenancy Hadoop deployment. Recently, he has been helping with creating tools to support operating Spark at scale as well as developing and running Spark jobs easily at LinkedIn.


Nan Zhu
Nan Zhu (Software Engineer Microsoft)

Nan Zhu is a Software Engineer at Microsoft, where he works on serving Spark Streaming/Structured Streaming on Azure HDInsight. He is a contributor to Apache Spark (known as CodingCat) and also serves as a committee member of the Distributed Machine Learning Community (DMLC) and Apache MXNet (incubating).


Nikita Shamgunov
Nikita Shamgunov (CTO MemSQL)

Nikita Shamgunov co-founded MemSQL and has served as CTO since inception. Prior to co-founding the company, Nikita worked on core infrastructure systems at Facebook. He served as a senior database engineer at Microsoft SQL Server for more than half a decade. Nikita holds a bachelor’s, master’s and doctorate in computer science, has been awarded several patents and was a world medalist in ACM programming contests.


Patrick Stuedi
Patrick Stuedi (Research Staff Member IBM)

I’m a member of the research staff at IBM Research Zurich. My research interests are in distributed systems, networking, and operating systems. I graduated with a PhD from ETH Zurich in 2008 and spent two years (2008-2010) as a postdoc at Microsoft Research Silicon Valley. My current work is about exploiting fast network and storage hardware in data processing systems.


Prabhu Kasinathan
Prabhu Kasinathan (Chief Data Engineer PayPal)

Prabhu Kasinathan is the chief data engineer in Big Data Platform at PayPal, with 5+ years of big data experience. He is creating APIs, tools, and services for the Spark platform to support multi-tenancy and large-scale, computation-intensive applications. He is an expert in building data warehousing solutions on the Hadoop and Teradata platforms, with 11+ years of data experience.


Ross Gardler
Ross Gardler (VP Apache Software Foundation)

Ross Gardler has been involved with open source in one form or another since the mid-’90s. He is a member of the Apache Software Foundation, where he currently serves as the foundation’s President. He works at Microsoft on the Linux Compute team in Azure, where he is responsible for the Azure Container Service.


Ryan Blue
Ryan Blue (Netflix)

Ryan Blue works on open source projects, including Spark, Avro, and Parquet, at Netflix.


Ryan Williams
Ryan Williams (Software Developer Mount Sinai School of Medicine)

Ryan writes tools for analyzing genomic data using Spark at Hammer Lab.


Sam Penrose
Sam Penrose (Mozilla)

Sam Penrose loves how working with data at scale for Mozilla brings out the power and beauty of mathematics. Previously he helped Industrial Light and Magic bring the power and beauty of giant robots out to movie screens everywhere.


Shay Nativ
Shay Nativ (Software Developer Redis Labs)

Shay is an experienced software developer, architect, and entrepreneur. He was the founder and VP R&D of Peak-Dynamics, an energy-saving solution for water utilities, and CTO at Utab, a web platform for musicians. Shay loves solving complex problems and writing performant code.


Songtao Guo
Songtao Guo (Principal Data Scientist LinkedIn)

Songtao Guo is a Principal Data Scientist and tech lead of the Data Mining team at LinkedIn, where he leads many data-driven products and analytics systems. His work involves building a large-scale knowledge base, inventing data mining platforms to scale business analytics, and partnering with product, sales, and marketing to deliver impactful solutions. Before joining LinkedIn, Songtao was a senior researcher at AT&T Interactive, focusing on improving data quality and search relevancy for local business search. He holds a PhD in computer science from the University of North Carolina at Charlotte.


Ted Malaska
Ted Malaska (Technical Group Architect Blizzard Inc.)

Ted is working on the Battle.net team at Blizzard, helping support great titles like World of Warcraft, Overwatch, Hearthstone, and many more. Previously, he was a Principal Solutions Architect at Cloudera, helping clients be successful with Hadoop and the Hadoop ecosystem. Before that, he was a Lead Architect at the Financial Industry Regulatory Authority (FINRA). He has also contributed code to Apache Flume, Apache Avro, Apache YARN, Apache HDFS, Apache Spark, Apache Sqoop, and many more. Ted is also a co-author of O’Reilly’s “Hadoop Application Architectures”, a frequent speaker at conferences, and a frequent blogger on data architectures.


Tejas Patil
Tejas Patil (Software Engineer Facebook)

Tejas is a software engineer at Facebook. For the past 3 years, he has been part of the Data Infrastructure group at Facebook, where he primarily works on building large-scale distributed data processing systems responsible for handling batch workloads. He is currently a PMC member and committer of Apache Nutch and has contributed to several open source projects. Tejas obtained a Master’s Degree in Computer Science from the University of California, Irvine.


Timothy Poterba
Timothy Poterba (Engineer and Computational Biologist Broad Institute of MIT and Harvard)

Tim Poterba is an engineer and computational biologist on the Hail team at the Broad Institute of MIT and Harvard. Prior to joining the Broad, he studied protein folding dynamics at the Max Planck Institute for Biochemistry on a Fulbright Scholarship. He received his B.A. in Biophysics from Amherst College in 2013.


Wei Di
Wei Di (Business Analytic Data mining team LinkedIn)

Wei Di is currently a staff member on the Business Analytic Data Mining team. She is passionate about creating smart and scalable solutions that can impact millions of individuals and empower successful businesses. She has wide interests covering artificial intelligence, machine learning, and computer vision. She was previously associated with eBay Human Language Technology and eBay Research Labs, with a focus on large-scale image understanding and joint learning from visual and text information. Prior to that, she was with Ancestry.com, working in the areas of record linkage and search relevance. She received her PhD from Purdue University in 2011.


Xiangrui Meng
Xiangrui Meng (Software Engineer Databricks)

Xiangrui Meng is an Apache Spark PMC member and a software engineer at Databricks. His main interests center around developing and implementing scalable algorithms for scientific applications. He has been actively involved in the development and maintenance of Spark MLlib since he joined Databricks. Before Databricks, he worked as an applied research engineer at LinkedIn, where he was the main developer of an offline machine learning framework in Hadoop MapReduce. His Ph.D. work at Stanford is on randomized algorithms for large-scale linear regression problems.


Yin Huai
Yin Huai (Software Engineer Databricks)

Yin Huai is a Software Engineer at Databricks and mainly works on Spark SQL. Before joining Databricks, he was a PhD student at The Ohio State University and was advised by Xiaodong Zhang. His interests include storage systems, database systems, and query optimization. He is also an Apache Hive committer.

