Showing posts from December, 2017

004 - HTML Scraping with Beautiful Soup

Stream Our Mistakes EP 004


In this episode, Matt walks us through HTML/web scraping using the popular Python library, Beautiful Soup.
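As a rough idea of what scraping with Beautiful Soup looks like, here is a minimal sketch that parses a small HTML string and pulls out the page title and its links. The sample markup and the helper names `page_title` and `extract_links` are made up for illustration and are not taken from the session; it also assumes the `bs4` package is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, invented for this sketch (not the session's HTML).
SAMPLE_HTML = """
<html><head><title>Sample page</title></head>
<body>
<p class="intro">Two links follow.</p>
<a href="http://example.com/one" id="link1">One</a>
<a href="http://example.com/two" id="link2">Two</a>
</body></html>
"""


def page_title(html):
    """Return the text of the <title> tag, or None if the page has no title."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.title.string if soup.title else None


def extract_links(html):
    """Return (link text, href) pairs for every <a> tag that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        (a.get_text(strip=True), a["href"])
        for a in soup.find_all("a", href=True)
    ]


if __name__ == "__main__":
    print(page_title(SAMPLE_HTML))
    for text, href in extract_links(SAMPLE_HTML):
        print(text, href)
```

To try the same helpers on a live page, one option is to fetch the markup with `urllib.request.urlopen(url).read()` and pass the result in place of `SAMPLE_HTML`.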



Here's the code snippet from the session, along with the reference links:

# Created for Stream Our Mistakes
# https://streamourmistakes.blogspot.com/

# Reference:
# https://docs.python.org/3/library/urllib.request.html
# https://www.crummy.com/software/BeautifulSoup/bs4/doc/

from bs4 import BeautifulSoup
import urllib.request

'''
# local html to play with from documentation Uncomment to enable
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://exa…