The Image/Link Web Crawler is a Python script that checks a given list of sites for broken images and links. The urls.py
file exists because I had several subdomains of a single domain that I needed to crawl with this project.
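
The contents of urls.py are not reproduced here; a minimal sketch, assuming it simply exposes the list of subdomain start URLs (the variable name and domains below are placeholders, not the real values), might look like this:

```python
# urls.py -- hypothetical layout: a plain list of subdomain start URLs.
# The variable name and the domains are placeholders.
URLS = [
    "https://example.com",
    "https://blog.example.com",
    "https://docs.example.com",
]
```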
- Set up your environment with `python3 -m venv venv`, then activate it with `. venv/bin/activate`.
- Install the dependencies with `pip install scrapy`.
- To run the script, `cd` into the `image-link-web-crawler` directory, then run `scrapy runspider script.py -o report.csv`. A rough sketch of what such a spider could look like follows this list.
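
The spider itself is in script.py and is not reproduced in this README. As a rough sketch of how a spider like this could work, assuming it visits each URL from urls.py, requests every link and image it finds, and reports the targets that come back as 404 (the class name, callback names, and output fields below are assumptions), it might look something like this:

```python
# A minimal sketch of a broken-link spider; not the actual script.py.
import scrapy

from urls import URLS  # assumed: urls.py exposes the start URLs as URLS


class BrokenLinkSpider(scrapy.Spider):
    name = "broken_links"
    start_urls = URLS
    # Let 404 responses reach the callback instead of being dropped.
    handle_httpstatus_list = [404]

    def parse(self, response):
        # Queue a request for every link and image found on the page.
        links = response.css("a::attr(href)").getall()
        images = response.css("img::attr(src)").getall()
        for url in links + images:
            target = response.urljoin(url)
            if target.startswith(("http://", "https://")):
                yield scrapy.Request(
                    target,
                    callback=self.check_status,
                    cb_kwargs={"source_page": response.url},
                )

    def check_status(self, response, source_page):
        # Only broken targets are written to the report.
        if response.status == 404:
            yield {
                "source_page": source_page,
                "broken_url": response.url,
                "status": response.status,
            }
```

The `-o report.csv` flag uses Scrapy's feed exports, so whatever fields the spider yields become the columns of the CSV report.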
The script produces a CSV report listing the pages whose external links and images returned a 404.
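
With the hypothetical field names from the sketch above, a row of the report would look something like this (the URLs are placeholders):

```csv
source_page,broken_url,status
https://example.com/about,https://example.com/img/logo.png,404
```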