Fuzzy Search Algorithms: How and When to Use Them
Fuzzy searching, or approximate string matching, is powerful because text data is often messy. For example, shorthand and abbreviations are common in many data sets, and output from OCR or voice-to-text conversion tends to be messy or imperfect. We want to make the most of such data by extracting as much information from it as possible.

In this talk, we will explore the main approaches to fuzzy string matching and demonstrate how they can be used as a feature in a model or as a component in your Python code. We will dive deep into how algorithms such as Soundex, trigram/n-gram search, and Levenshtein distance work and what their best use cases are. We will also discuss situations where it is important to take into account the meaning or intent of a word, and demonstrate approaches for measuring semantic similarity using NLTK and word2vec. Finally, we will demonstrate via live coding how to implement some of these fuzzy search algorithms using Python and/or the built-in fuzzy search functions in PostgreSQL.
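To give a flavor of two of the algorithms named above, here is a minimal Python sketch of Levenshtein distance and trigram similarity. It is not taken from the talk itself: the function names are hypothetical, the trigram padding (two leading spaces, one trailing space) is only loosely modeled on the convention used by PostgreSQL's pg_trgm extension, and the similarity score is assumed to be a plain Jaccard ratio over trigram sets.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or match for free
            ))
        previous = current
    return previous[-1]


def trigrams(text: str) -> set:
    """Set of 3-character windows over the lowercased, padded text.
    Padding is an assumption of this sketch (see the note above)."""
    padded = "  " + text.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two trigram sets, from 0.0 to 1.0."""
    set_a, set_b = trigrams(a), trigrams(b)
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(trigram_similarity("Portland", "Portlnd"))
```

Edit distance tends to suit short strings with typos, while trigram overlap tolerates reordered or partially missing text and is cheap to index, which is one reason databases favor it for fuzzy lookup.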
Eleanor Stribling
Eleanor Stribling is a product manager, developer, and team builder for tech startups. Since 2015, she has been the VP of Product at Kevala, an energy analytics startup. She was previously an early employee and VP of Product Management and Consumer Insights at TubeMogul, an ad tech company (NASDAQ: TUBE). Outside of work, she volunteers for healthcare, gun safety, and political causes. Eleanor earned her MBA at the Massachusetts Institute of Technology and a BA at the University of Toronto. She lives in San Francisco with her family.
Oregon Ballroom 203-204
Saturday, 20th May, 17:10 - 17:40