Web Scraping Jujutsu Kaisen Manga
A simple yet interesting web scraping project that downloads every chapter of the Jujutsu Kaisen manga released to date.
Tech stack: Python with the re, os, bs4 (BeautifulSoup), urllib, requests, zipfile and selenium (basic usage) modules.
Web scraping is an essential tool for gathering data, and there are plenty of examples that focus on extracting tabular data from websites. This project goes a step further and extracts and downloads other resources such as images and links. This tutorial covers the following topics, in this order:
1. Selenium
2. BeautifulSoup and requests
3. Regex
4. urllib and os modules
5. .cbz format

Install the following packages:
pip install beautifulsoup4 selenium requests
Note that Selenium also requires a webdriver. This project uses the Chrome webdriver (download the executable from here and note its absolute path, as it will be needed later), but feel free to use any other webdriver you see fit.
Before running the program, please read the comment titled IMPORTANT in the scrape_data() function in app.py. Running the program directly downloads ALL the chapters (there are currently 192 chapters, each with around 30 images), which takes about 6 hours. To limit the download or choose exactly which chapters to fetch, see that function.
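To get a feel for how long a partial run might take, the figures quoted here (192 chapters, roughly 6 hours) can be scaled down. The sketch below assumes download time grows linearly with the number of chapters, which is only an approximation. Both `estimate_minutes` and `pick_chapters` are hypothetical helpers written for illustration; they are not functions from app.py.

```python
TOTAL_CHAPTERS = 192  # chapter count quoted in this README
TOTAL_HOURS = 6       # approximate full-run time quoted in this README


def estimate_minutes(n_chapters):
    """Rough download time in minutes, assuming linear scaling (an assumption)."""
    minutes_per_chapter = TOTAL_HOURS * 60 / TOTAL_CHAPTERS
    return n_chapters * minutes_per_chapter


def pick_chapters(chapter_links, first, last):
    """Return chapters first..last (1-based, inclusive) from an ordered list."""
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    return chapter_links[first - 1:last]


if __name__ == "__main__":
    # Placeholder data standing in for the scraped chapter links.
    links = [f"chapter-{n}" for n in range(1, TOTAL_CHAPTERS + 1)]
    subset = pick_chapters(links, 1, 10)
    print(len(subset), "chapters, about", estimate_minutes(len(subset)), "minutes")
```

Under that linear assumption, one chapter costs just under two minutes, so a ten-chapter test run should finish in well under half an hour.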
Run the program:
python app.py
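Once a chapter's images are on disk, they can be bundled into a .cbz file, which is an ordinary ZIP archive of page images with a different extension. The following is a minimal sketch of that step using only the standard library; `make_cbz`, `natural_key` and the list of image extensions are assumptions made for illustration and may differ from what app.py actually does.

```python
import os
import re
import zipfile


def natural_key(name):
    """Sort key that orders 'page2.jpg' before 'page10.jpg'."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def make_cbz(image_dir, cbz_path):
    """Pack the image files in image_dir into a .cbz archive at cbz_path.

    Returns the page file names in the order they were written.
    """
    extensions = (".jpg", ".jpeg", ".png", ".webp")  # assumed image types
    pages = sorted(
        (f for f in os.listdir(image_dir) if f.lower().endswith(extensions)),
        key=natural_key,
    )
    # Images are already compressed, so store them without recompression.
    with zipfile.ZipFile(cbz_path, "w", zipfile.ZIP_STORED) as archive:
        for page in pages:
            archive.write(os.path.join(image_dir, page), arcname=page)
    return pages
```

For example, `make_cbz("chapter_001", "chapter_001.cbz")` (hypothetical folder and file names) would produce an archive that most comic readers can open directly. The natural sort keeps pages in reading order even when file names are not zero-padded.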
Made with 💙 by Vishvam