This project fetches the DOM of multiple URLs using Puppeteer, saves the content as HTML files, and processes several URLs concurrently. You can configure how many URLs are processed at the same time by passing a concurrency parameter.
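Conceptually, the concurrency limit works like a small promise pool: a fixed number of workers pull URLs from a shared queue until it is empty. The helper below is a hedged sketch of that idea (the function name `runWithConcurrency` is illustrative, not the project's actual code):

```javascript
// Run async task functions with at most `limit` running at once.
// Sketch only -- dom-scraper's real implementation may differ.
async function runWithConcurrency(tasks, limit) {
  const results = [];
  let next = 0;

  async function worker() {
    // Each worker claims the next unclaimed task index and runs it.
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  }

  // Start `limit` workers (or fewer, if there are fewer tasks).
  const workers = Array.from({ length: Math.min(limit, tasks.length) }, worker);
  await Promise.all(workers);
  return results;
}
```

With this shape, processing five URLs at a time is just `runWithConcurrency(urls.map(u => () => fetchDom(u)), 5)`, where `fetchDom` stands in for the Puppeteer fetch step.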
- Clone the repository (or download the project files):

  ```bash
  git clone https://github.com/dudekm/dom-scraper.git
  ```

- Navigate to the project directory:

  ```bash
  cd dom-scraper
  ```

- Install the required Node.js packages:

  ```bash
  npm install
  ```

  This will install `puppeteer` and the other dependencies defined in `package.json`.
The script accepts three parameters from the command line:

- URL list file: a `.txt` file with one URL per line.
- Output directory: the directory where the HTML files will be saved.
- Concurrency: the number of URLs to process concurrently.
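A minimal sketch of how these three positional parameters might be read from `process.argv` (the actual `index.js` may parse them differently; `parseArgs` is a hypothetical helper name):

```javascript
// Parse the three positional CLI parameters.
// Sketch only -- the project's index.js may differ.
function parseArgs(argv) {
  // argv[0] is the node binary, argv[1] is the script path.
  const [urlFile, outputDir, concurrencyRaw] = argv.slice(2);
  if (!urlFile || !outputDir) {
    throw new Error('Usage: node index.js <urls.txt> <outputDir> <concurrency>');
  }
  const concurrency = Number.parseInt(concurrencyRaw ?? '1', 10);
  if (!Number.isInteger(concurrency) || concurrency < 1) {
    throw new Error('Concurrency must be a positive integer');
  }
  return { urlFile, outputDir, concurrency };
}
```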
An example `urls.txt`:

```text
https://example.com
https://another-example.com
https://example.org
https://another-site.com
```
Once you have installed the necessary dependencies and created your `urls.txt` file, you can run the project using Node.js.
To run the scraper, use the following command:

```bash
node index.js <urls.txt> <outputDir> <concurrency>
```

For example:

```bash
node index.js urls.txt ./dom_output 5
```
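Because each fetched page is written to the output directory as an HTML file, the filename has to be derived from the URL somehow. One common approach (an assumption for illustration; dom-scraper's actual naming scheme may differ) is to sanitize the URL's host and path:

```javascript
// Turn a URL into a filesystem-safe filename, e.g.
// "https://example.com/page" -> "example.com_page.html".
// Hypothetical helper -- not necessarily what dom-scraper does.
function urlToFilename(url) {
  const { hostname, pathname } = new URL(url);
  const slug = (hostname + pathname)
    .replace(/\/+$/, '')               // drop trailing slashes
    .replace(/[^a-zA-Z0-9.-]+/g, '_'); // replace unsafe characters
  return `${slug}.html`;
}
```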