Language: Python v3.5
Libraries Used:
- Native Library
  - `urllib`: Requesting web responses
  - `os`: Creating a file
  - `csv`: Creating a `.csv` file
- Third-Party Library
  - `BeautifulSoup`: Reading and parsing HTML
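The request step with `urllib` can be sketched as below. This is a minimal sketch that assumes the pages are served as UTF-8. The helper name `fetch_html` and the `User-Agent` value are hypothetical and do not come from the original script.

```python
import urllib.request


def fetch_html(url, timeout=10):
    """Download a page and return its body as text.

    Assumption: the server sends UTF-8; undecodable bytes are replaced.
    """
    # Some sites reject urllib's default User-Agent, so send a generic one
    # (hypothetical value, adjust as needed).
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

With this helper, `fetch_html(baseURL)` and `fetch_html(degree_programURL)` would return the raw HTML to hand to BeautifulSoup.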
Web Scraping Logic
- Read the HTML from `baseURL` and `degree_programURL`.
- Find the target tags (`div` -> `ul` -> `li`).
- Store them in a dictionary (`Link` and `Major Name`).
- Grab all the links present.
- From each link, go to the specific tags (`div` -> `section` -> `p`).
- Grab all the links present there and parse any link that contains `#`.
- Finally, write the results to the `.csv` file.
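The two parsing passes described above might look roughly like this with BeautifulSoup. This is a sketch under several assumptions. The function names `parse_programs` and `parse_section_links` are hypothetical. The CSS selectors use descendant matching for the `div` -> `ul` -> `li` and `div` -> `section` -> `p` nesting. "Parse it if it has `#`" is read here as splitting the link into its page URL and fragment with `urllib.parse.urldefrag`.

```python
from urllib.parse import urldefrag, urljoin

from bs4 import BeautifulSoup


def parse_programs(html, base_url):
    """Map each major name to its absolute link (div -> ul -> li -> a)."""
    soup = BeautifulSoup(html, "html.parser")
    programs = {}
    # Descendant selector for the assumed div > ul > li > a nesting
    for anchor in soup.select("div ul li a[href]"):
        name = anchor.get_text(strip=True)
        programs[name] = urljoin(base_url, anchor["href"])
    return programs


def parse_section_links(html, base_url):
    """Collect links under div -> section -> p, splitting off any '#' fragment."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for anchor in soup.select("div section p a[href]"):
        url = urljoin(base_url, anchor["href"])
        if "#" in url:
            # Assumption: "parse" means separating the page URL from its fragment
            url, fragment = urldefrag(url)
        else:
            fragment = ""
        links.append((url, fragment))
    return links
```

For example, `parse_programs('<div><ul><li><a href="/cs">Computer Science</a></li></ul></div>', 'https://example.edu/programs/')` returns `{'Computer Science': 'https://example.edu/cs'}`. Descendant selectors tolerate wrapper elements between the tags; `div > ul > li > a` would be stricter if the page nests them directly.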
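The final write step could combine `os` and `csv` roughly as follows. The function name `write_programs_csv`, the output path, and the column order (`Major Name`, `Link`) are assumptions for illustration, not details taken from the original script.

```python
import csv
import os


def write_programs_csv(programs, path):
    """Write {major name: link} rows to a CSV file, creating its folder if needed."""
    directory = os.path.dirname(path)
    if directory:
        # exist_ok avoids an error when the folder already exists (Python 3.2+)
        os.makedirs(directory, exist_ok=True)
    # newline="" stops csv from writing blank lines between rows on Windows
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Major Name", "Link"])
        # Sort by major name so the output order is deterministic
        for name, link in sorted(programs.items()):
            writer.writerow([name, link])
```

Passing `newline=""` to `open` is what the `csv` module documentation recommends for files handed to `csv.writer`.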