
#90 Data Wrangling with Python

Talk Python To Me

English - December 21, 2016 08:00 - 1 hour - 56.7 MB - ★★★★★ - 418 ratings


Do you have a dirty, messy data problem? Whether you work as a software developer or as a data scientist, you've surely run across data that was malformed, incomplete, or maybe even wrong. Don't let messy data wreck your apps or generate wrong results.



What should you do? Listen to this episode of Talk Python To Me with Katharine Jarmul about the book she co-authored called Data Wrangling with Python and her PyCon UK presentation entitled How to Automate your Data Cleanup with Python.



Links from the show:



Katharine on the web: kjamistan.com


Katharine on Twitter: @kjam


Book: Data Wrangling with Python: Tips and Tools to Make Your Life Easier: amzn.to/2fGc0Cx


PyCon 2016: How to Automate your Data Cleanup with Python: youtube.com/watch?v=gp-ngPV_ZX8



Packages from Data Cleanup talk


Dedupe Python Library: github.com/datamade/dedupe


probablepeople: github.com/datamade/probablepeople


usaddress: github.com/datamade/usaddress


jellyfish: github.com/jamesturk/jellyfish


Fuzzywuzzy: github.com/seatgeek/fuzzywuzzy


scrubadub: github.com/datascopeanalytics/scrubadub


pint: pint.readthedocs.io


arrow: github.com/crsmithdev/arrow


pdftables.six: github.com/vnaydionov/pdftables


Datacleaner: github.com/rhiever/datacleaner


Parserator: github.com/datamade/parserator


Gensim: radimrehurek.com/gensim


Faker: github.com/joke2k/faker


Dask: dask.pydata.org


spaCy: spacy.io


Airflow: airflow.incubator.apache.org


Luigi: luigi.readthedocs.io


Hypothesis (testing): hypothesis.works



Katharine's courses



Data Pipelines with Python


shop.oreilly.com/product/0636920055334.do


Data Wrangling & Analysis with Python. Learn Pandas


shop.oreilly.com/product/0636920051831.do



Sponsors


Rollbar: rollbar.com/talkpythontome


GoCD: go.cd




