scraping-examples-py

This repo contains web scraping examples written in Python.

Scrapers

IMDb

imdb_chart

imdb_chart.py scrapes IMDb's most popular movies (Moviemeter chart) or TV shows (TVmeter chart).

Parameters

-o: output CSV file to store results.
-t: optional flag. If specified, the script scrapes the IMDb TV shows chart (TVmeter) instead of Moviemeter.
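The two parameters map naturally onto argparse. Below is a minimal sketch, assuming -o is required and -t is a boolean switch. The chart URLs are an assumption based on IMDb's public chart pages, and build_parser and chart_url are hypothetical names, not taken from imdb_chart.py.

```python
import argparse

# Assumed chart URLs; imdb_chart.py may target different pages.
MOVIEMETER_URL = "https://www.imdb.com/chart/moviemeter/"
TVMETER_URL = "https://www.imdb.com/chart/tvmeter/"


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the -o and -t parameters."""
    parser = argparse.ArgumentParser(description="Scrape an IMDb popularity chart.")
    parser.add_argument("-o", required=True, help="output CSV file")
    # store_true: -t takes no value; its presence switches to the TV chart.
    parser.add_argument("-t", action="store_true", help="scrape TVmeter instead of Moviemeter")
    return parser


def chart_url(tv: bool) -> str:
    """Pick the chart to scrape based on the -t flag."""
    return TVMETER_URL if tv else MOVIEMETER_URL
```

In a script, `args = build_parser().parse_args()` followed by `chart_url(args.t)` selects the page, and `args.o` holds the CSV path.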

How to run

Moviemeter: cd .\IMDb\;python imdb_chart.py -o "./output/moviemeter-example.csv"
TVmeter: cd .\IMDb\;python imdb_chart.py -o "./output/tvmeter-example.csv" -t

NOTE: These commands use PowerShell syntax.

moviemeter_cast

moviemeter_cast.py scrapes the cast of the films in the IMDb Moviemeter chart.

Parameters

-o: output JSON file to store results.
-l: optional. Limits the number of actors and actresses retrieved for each film (for example, -l 3).
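The -l limit can be sketched as a post-processing step, assuming the scraper builds a mapping from film title to cast list (in billing order) and trims it before writing the JSON file. The data shape and the names limit_cast and write_json are hypothetical.

```python
import json
from typing import Optional


def limit_cast(cast_by_film: dict[str, list[str]], limit: Optional[int]) -> dict[str, list[str]]:
    """Keep at most `limit` cast members per film; None keeps everyone."""
    if limit is None:
        return {film: list(cast) for film, cast in cast_by_film.items()}
    if limit < 0:
        raise ValueError("limit must be non-negative")
    # Slicing keeps the billing order and never fails on shorter lists.
    return {film: cast[:limit] for film, cast in cast_by_film.items()}


def write_json(path: str, data: dict[str, list[str]]) -> None:
    """Write the result as UTF-8 JSON, keeping non-ASCII names readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)
```

Trimming after scraping keeps the limit logic independent of how the cast pages are parsed.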

How to run

cd .\IMDb\;python .\moviemeter_cast.py -l 3 -o .\output\moviemeter_cast.json

Billboard

billboard_hot100

billboard_hot100.py scrapes the Billboard Hot 100 chart. For now, the script only fetches each song's title and artist.

Parameters

-o: output CSV file to store results.
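Since the script stores only the song title and artist, its CSV output can be sketched with csv.DictWriter, assuming the scraped entries arrive as (song, artist) pairs in chart order. The column names and the function write_chart are hypothetical and may differ from the script's actual output.

```python
import csv
from typing import Iterable, TextIO

FIELDNAMES = ["song", "artist"]  # assumed column names


def write_chart(rows: Iterable[tuple[str, str]], out: TextIO) -> int:
    """Write (song, artist) pairs as CSV with a header row; return the row count."""
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()
    count = 0
    for song, artist in rows:
        # strip() removes the whitespace that scraped HTML text often carries.
        writer.writerow({"song": song.strip(), "artist": artist.strip()})
        count += 1
    return count
```

When writing to a file, open it with `open(path, "w", newline="", encoding="utf-8")`; the csv module expects `newline=""` so it can control line endings itself.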

How to run

cd .\Billboard\;python .\billboard_hot100.py -o .\output\billboard_top100.csv

goodreads

goodreads_top100

goodreads_top100.py scrapes the Goodreads list "Top 100 - Highest Rated Books on Goodreads with at least 10,000 Ratings". It can also scrape any other list on the website if its URL is passed as an argument.

Parameters

-o: output CSV file to store results.
-m: optional flag. If specified, the script writes the scraped data to a MongoDB collection configured through environment variables.
-u: optional. URL of any list on the Goodreads website. If omitted, the script scrapes the Top 100 list.
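The -m option can be sketched as reading MongoDB settings from environment variables and inserting the scraped rows with the pymongo driver. The variable names MONGO_URI, MONGO_DB and MONGO_COLLECTION are hypothetical; the repository ships a .env.example file that may list the names the script actually uses. load_mongo_settings and save_to_mongo are illustrative helpers, not the script's own functions.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class MongoSettings:
    uri: str
    database: str
    collection: str


def load_mongo_settings(env: Mapping[str, str] = os.environ) -> MongoSettings:
    """Read hypothetical MONGO_* variables and fail early if any is missing."""
    names = ("MONGO_URI", "MONGO_DB", "MONGO_COLLECTION")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return MongoSettings(env["MONGO_URI"], env["MONGO_DB"], env["MONGO_COLLECTION"])


def save_to_mongo(rows: list[dict], settings: MongoSettings) -> None:
    """Insert scraped rows; pymongo is imported lazily so it is only needed with -m."""
    if not rows:
        return  # insert_many rejects an empty list, so skip the round trip
    from pymongo import MongoClient  # third-party: pip install pymongo

    client = MongoClient(settings.uri)
    try:
        client[settings.database][settings.collection].insert_many(rows)
    finally:
        client.close()
```

Validating the settings before scraping means a missing variable is reported immediately rather than after every page has been fetched.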

How to run

Default URL (Goodreads Top 100): cd .\goodreads\;python .\goodreads_top100.py -o .\output\goodreads_top100.csv [-m]

For any other list:

Best Books of the Decade 2020's: cd .\goodreads\;python .\goodreads_top100.py -o .\output\top_decade_2020.csv -u https://www.goodreads.com/list/show/143500.Best_Books_of_the_Decade_2020_s?ref=ls_fl_0_seeall
Best Books of the 20th Century: cd .\goodreads\;python .\goodreads_top100.py -o .\output\top_20th_century.csv -u https://www.goodreads.com/list/show/6
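Both example URLs follow the pattern https://www.goodreads.com/list/show/ followed by a numeric id, so the -u value could be validated with urllib.parse before any request is made. This is a sketch under that assumption; the helper list_id_from_url is hypothetical, and goodreads_top100.py may accept the URL without any validation.

```python
import re
from urllib.parse import urlparse

# Matches paths such as /list/show/6 or /list/show/143500.Best_Books_of_...
_LIST_PATH = re.compile(r"^/list/show/(\d+)(?:\.[^/]*)?/?$")


def list_id_from_url(url: str) -> int:
    """Return the numeric Goodreads list id, or raise ValueError for other URLs."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or parsed.netloc not in ("www.goodreads.com", "goodreads.com"):
        raise ValueError(f"not a Goodreads URL: {url}")
    match = _LIST_PATH.match(parsed.path)
    if match is None:
        raise ValueError(f"not a Goodreads list URL: {url}")
    # The query string (e.g. ?ref=ls_fl_0_seeall) is ignored.
    return int(match.group(1))
```

Rejecting non-list URLs early gives a clear error instead of an empty CSV when, for example, a book page URL is passed by mistake.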

BBC

bbc_news

bbc_news.py scrapes BBC News by iterating through its articles.

Parameters

-o: output JSON file to store results.
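Iterating through articles can be sketched as a loop that fetches each article URL, extracts a field, and collects one JSON record per article. In this sketch the fetch callable is injected so the logic runs without network access, and only the page's <title> is extracted with the standard library's html.parser; the real bbc_news.py may use a different HTML library and collect other fields. TitleParser, scrape_articles and write_json are hypothetical names.

```python
import json
from html.parser import HTMLParser
from typing import Callable, Iterable


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self._in_title = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def scrape_articles(urls: Iterable[str], fetch: Callable[[str], str]) -> list[dict]:
    """Fetch each article in turn and build one record per URL."""
    records = []
    for url in urls:
        parser = TitleParser()
        parser.feed(fetch(url))
        parser.close()
        records.append({"url": url, "title": parser.title.strip()})
    return records


def write_json(path: str, records: list[dict]) -> None:
    """Store all records as a single UTF-8 JSON array."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
```

In a real run, `fetch` could wrap an HTTP client call that returns the response body as text.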

How to run

cd .\BBC\;python .\bbc_news.py -o .\output\bbc-news-example.json

CoinMarketCap

trending-cryptocurrencies

trending-cryptocurrencies.py scrapes the hottest trending cryptocurrencies on CoinMarketCap.

Parameters

-o: output CSV file to store results.
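Turning a scraped table into CSV rows can be sketched with the standard library's html.parser, assuming the trending list is present as a plain <table> in the fetched HTML. That is an assumption: CoinMarketCap may render its data with JavaScript, in which case a different approach is needed. TableParser and table_to_csv are hypothetical names, not the script's own.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr>."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments split by nested tags and collapse whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_to_csv(html: str) -> str:
    """Return the table's rows (header row included) as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out).writerows(parser.rows)
    return out.getvalue()
```

Joining text fragments per cell keeps names such as a coin name followed by its ticker in one column even when they sit in separate nested tags.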

How to run

cd .\coinmarketcap\;python .\trending-cryptocurrencies.py -o .\output\trending-cryptocurrencies.csv

Docker setup

docker build -t scraping-examples-py .
docker compose up -d
docker exec -it scraping-examples-py bash
Run a scraper inside the container (bash syntax): cd BBC/ && python bbc_news.py -o ./output/docker-example-bbc.json

angelagonzalezp/scraping-examples-py

Folders and files

Latest commit

History

Repository files navigation

scraping-examples-py

Contents

Scrapers

IMDb

imdb_chart

moviemeter_cast

Billboard

billboard_hot100

goodreads

goodreads_top100

BBC

bbc_news

CoinMarketCap

trending-cryptocurrencies

Docker setup

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages