Web Scraping Jujutsu Kaisen Manga

A simple yet interesting web scraping project that downloads every chapter of the Jujutsu Kaisen manga released to date.

Tech Stack: Python, with the modules re, os, bs4 (Beautiful Soup), urllib, requests, zipfile and selenium (basic usage)
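
To give a feel for how a few of the standard-library modules listed above might work together, here is a minimal sketch that packs one downloaded chapter folder into a zip archive in page order. It is not taken from app.py: the function names, the folder layout and the page file names are all assumptions made for illustration.

```python
import os
import re
import zipfile


def natural_key(name):
    """Sort key that places 'page10.jpg' after 'page2.jpg'."""
    # re.split with a capturing group alternates text and digit runs,
    # so the digit runs can be compared as integers.
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def zip_chapter(chapter_dir, archive_path):
    """Pack every file in chapter_dir into archive_path, in page order.

    Returns the list of file names that were written.
    """
    pages = sorted(
        (name for name in os.listdir(chapter_dir)
         if os.path.isfile(os.path.join(chapter_dir, name))),
        key=natural_key,
    )
    # Images are already compressed, so the files are stored as-is.
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_STORED) as archive:
        for page in pages:
            archive.write(os.path.join(chapter_dir, page), arcname=page)
    return pages
```

The archive should be written outside the chapter folder, for example zip_chapter("chapter-1", "chapter-1.zip"), so that it does not end up listing itself.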


Description

Web scraping is an essential tool for gathering data, and plenty of examples focus on extracting tabular data from websites. This project goes a step further and extracts and downloads other resources, such as images and links. This tutorial covers the following topics in order:

Usage

Install the following packages:

pip install beautifulsoup4 selenium requests

Note that Selenium also requires a webdriver. This project uses the Chrome webdriver (download the ChromeDriver executable and note its absolute path, since the program needs it), but feel free to use any other webdriver you see fit.
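
As a rough illustration of where that absolute path goes, the sketch below starts Chrome through Selenium 4's Service class. The helper name make_chrome_driver and the headless default are assumptions for this example and may differ from what app.py actually does.

```python
from pathlib import Path


def make_chrome_driver(driver_path, headless=True):
    """Start Chrome using the ChromeDriver executable at driver_path."""
    executable = Path(driver_path)
    if not executable.is_file():
        raise FileNotFoundError(f"ChromeDriver not found at {executable}")

    # Imported here so that a wrong path is reported before Selenium loads.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    options = Options()
    if headless:
        # Run without opening a visible browser window.
        options.add_argument("--headless")
    service = Service(executable_path=str(executable))
    return webdriver.Chrome(service=service, options=options)
```

Call driver.quit() on the returned driver when scraping is finished so the browser process does not linger.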


Before running the program, please read the comment titled IMPORTANT in the scrape_data() function in app.py. Running the program as-is downloads ALL the chapters (there are currently 192, each with around 30 images), which takes about 6 hours. To limit the download or to choose exactly which chapters to fetch, see that function.
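
The figures above work out to roughly 192 × 30 = 5760 images in 6 hours, or about 3.75 seconds per image. The sketch below uses that number to estimate the time for a smaller selection. The internals of scrape_data() are not shown here, so select_chapters, estimate_hours and the example.com URLs are hypothetical stand-ins, not the project's own code.

```python
def select_chapters(chapter_urls, first=1, last=None):
    """Return chapters first..last (1-based, inclusive) from an ordered list."""
    if first < 1:
        raise ValueError("first must be 1 or greater")
    if last is not None and last < first:
        raise ValueError("last must not be smaller than first")
    return chapter_urls[first - 1:last]


def estimate_hours(num_chapters, images_per_chapter=30, seconds_per_image=3.75):
    """Rough download time in hours, based on the figures quoted above."""
    return num_chapters * images_per_chapter * seconds_per_image / 3600


# Hypothetical chapter list; the real one would come from the scraped index page.
all_chapters = [f"https://example.com/chapter-{n}" for n in range(1, 193)]

wanted = select_chapters(all_chapters, first=10, last=14)
minutes = round(estimate_hours(len(wanted)) * 60)
print(f"{len(wanted)} chapters, roughly {minutes} minutes")
```

Under these assumptions, five chapters take minutes rather than hours, which makes a small range a sensible first test run.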


Run the program:

python app.py

Made with 💙 by Vishvam

Pyodide-runnable

No — it uses Selenium and requests to scrape a live website and download images.