I needed a list of all Wisconsin (WI) schools and their addresses. The WI Department of Public Instruction (DPI) site had plenty of data, but not what I needed, so I got the information by scraping www.greatschools.org instead.
Using the BeautifulSoup docs as a reference, I worked my way down into the HTML gradually. The hardest part is usually figuring out which nested element and class holds the data you need, and it's often several layers down.
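To illustrate reaching an element several layers down, here is a minimal sketch using BeautifulSoup's CSS selector support. The markup and the class names (`district`, `school-list`, `school`, `school-name`, `address`) are hypothetical stand-ins, not greatschools.org's real page structure, which this post doesn't show and which has likely changed since. A single descendant selector replaces a chain of `find()` calls.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page structure will differ.
HTML = """
<div class="district">
  <ul class="school-list">
    <li class="school">
      <a class="school-name" href="/wi/example-elementary/">Example Elementary</a>
      <div class="address">123 Example St, Madison, WI 53703</div>
    </li>
    <li class="school">
      <a class="school-name" href="/wi/sample-middle/">Sample Middle</a>
      <div class="address">456 Sample Ave, Madison, WI 53704</div>
    </li>
  </ul>
</div>
"""

def extract_schools(html):
    """Return (name, address) pairs for every li.school in the page."""
    soup = BeautifulSoup(html, "html.parser")
    schools = []
    # One descendant selector walks three layers down:
    # div.district -> ul.school-list -> li.school
    for item in soup.select("div.district ul.school-list li.school"):
        # select_one() searches only inside the current <li>.
        name = item.select_one("a.school-name").get_text(strip=True)
        address = item.select_one("div.address").get_text(strip=True)
        schools.append((name, address))
    return schools

print(extract_schools(HTML))
```

Browser developer tools ("Inspect element") are a quick way to read off the selector path before writing code like this.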
First, I collected the links to all the school district pages with the get_school_urls() function and wrote them to a .csv file. Then the get_school_info() function grabs the name and address of every school in each district and writes them to a second .csv file.
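The two-stage flow can be sketched as below. This is not the code from GitHub: `parse_district_urls()` and `parse_school_info()` are hypothetical stand-ins for get_school_urls() and get_school_info(), and the selectors (`a.district-link`, `li.school`, `.school-name`, `.address`) are assumed. Fetching is left out so the sketch runs offline. In practice each page's HTML would come from something like `requests.get(url).text`.

```python
import csv
import io
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.greatschools.org"

def parse_district_urls(index_html):
    """Stage 1: collect absolute district-page URLs (assumed a.district-link)."""
    soup = BeautifulSoup(index_html, "html.parser")
    return [urljoin(BASE_URL, a["href"]) for a in soup.select("a.district-link")]

def parse_school_info(district_html):
    """Stage 2: one {"name", "address"} dict per school on a district page."""
    soup = BeautifulSoup(district_html, "html.parser")
    rows = []
    for item in soup.select("li.school"):
        rows.append({
            "name": item.select_one(".school-name").get_text(strip=True),
            "address": item.select_one(".address").get_text(strip=True),
        })
    return rows

def write_csv(rows, fieldnames, fh):
    """Write dict rows with a header; csv quotes fields that contain commas."""
    writer = csv.DictWriter(fh, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)

# Hypothetical sample pages standing in for downloaded HTML.
index_html = '<a class="district-link" href="/wisconsin/madison/">Madison</a>'
district_html = (
    '<ul><li class="school"><span class="school-name">Example Elementary</span>'
    '<span class="address">123 Example St, Madison, WI 53703</span></li></ul>'
)

urls = parse_district_urls(index_html)
url_buf = io.StringIO()
write_csv([{"url": u} for u in urls], ["url"], url_buf)

rows = parse_school_info(district_html)
school_buf = io.StringIO()
write_csv(rows, ["name", "address"], school_buf)
print(school_buf.getvalue())
```

The `io.StringIO` buffers keep the example self-contained. Writing real files would use `open("schools.csv", "w", newline="")` instead. Keeping the two stages separate means a failed district page can be retried without re-crawling the index.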
I had to do some minor post-processing in Excel to make the data more usable, but it only took a couple of minutes. If I needed to run this script more often, I would refine it further.
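The post doesn't say what the Excel cleanup involved. As one hypothetical example of moving such a step into the script, the sketch below splits a single address string into street, city, state, and ZIP columns. It assumes addresses come out as "street, city, ST zip", and it returns `None` for anything that doesn't fit so those rows can be checked by hand.

```python
import re

# Assumed format: "street, city, ST 12345" (optionally ZIP+4).
# The greedy street group lets streets contain commas ("..., Suite 4").
ADDRESS_RE = re.compile(
    r"^(?P<street>.+),\s*(?P<city>[^,]+),\s*"
    r"(?P<state>[A-Z]{2})\s+(?P<zip>\d{5}(?:-\d{4})?)$"
)

def split_address(address):
    """Return a dict of address parts, or None if the format doesn't match."""
    match = ADDRESS_RE.match(address.strip())
    return match.groupdict() if match else None

print(split_address("123 Example St, Madison, WI 53703"))
print(split_address("123 Example St, Suite 4, Madison, WI 53703-1234"))
print(split_address("PO Box 5"))
```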
The code is on GitHub.