
#283 Web scraping, the 2020 edition

Talk Python To Me

English - September 23, 2020 08:00 - 48 minutes - 22.9 MB - ★★★★★ - 418 ratings


Web scraping means pulling down the HTML of a website and parsing useful data out of it. The use cases are endless. Is there a bunch of data on government sites that is only published online as HTML, with no download? There's an API for that! Do you want to keep abreast of what your competitors are featuring on their sites? There's an API for that. Need alerts for changes on a website, for example when enrollment opens at your college and you want to be first in line to avoid the 8am Monday morning course slot? There's an API for that.



That API is screen scraping, and Attila Tóth from Scrapinghub is here to tell us all about it.
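
To make the "pull down the HTML, then parse data out of it" idea concrete, here is a minimal sketch of the parsing half using only Python's standard library. It is not taken from the episode: the `LinkCollector` class, the `extract_links` helper, and the `SAMPLE_HTML` snippet are all hypothetical, and the HTML is an inline string so the example runs without a network connection. A real scraper would first download the page and would more likely use a framework such as Scrapy.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (link text, href) pairs from <a> tags. Hypothetical helper."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an <a> tag with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(html):
    """Return a list of (text, href) tuples found in the given HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Stand-in for HTML that a real scraper would have downloaded (assumption).
SAMPLE_HTML = """
<ul>
  <li><a href="https://scrapy.org">Scrapy project</a></li>
  <li><a href="https://talkpython.fm">Talk Python</a></li>
</ul>
"""

for text, href in extract_links(SAMPLE_HTML):
    print(f"{text}: {href}")
```

The same pattern extends to the use cases above: swap the inline string for a downloaded page and change which tags and attributes the parser looks for.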



Links from the show



Attila Tóth on LinkedIn: linkedin.com

Scrapy project: scrapy.org

Scrapinghub on Twitter: @scrapinghub

Scrapinghub: scrapinghub.com

cookiecutter template for Scrapy projects: github.com

Splash: headless browser designed specifically for web scraping: scrapinghub.com/splash

Awesome Web Scraping list: github.com



Talk Python episode 50 on web scraping: talkpython.fm

How Web Scraping is Revealing Lobbying and Corruption in Peru: blog.scrapinghub.com

Web Data Extraction Summit event: extractsummit.io


Sponsors



Talk Python Training

Linode
