A Node.js based web crawler that can scale on the go!
- Supports both static and dynamic page crawling
- Prerequisites: Linux (Ubuntu), Redis, Node.js
- Install Node.js by executing the following commands in the root directory of the project:

```shell
$ cd init-scripts/
$ sudo bash install-nodejs.sh
```
- Install Redis:

```shell
$ sudo bash install-redis.sh
```
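Redis is commonly used in crawlers like this one as a shared URL queue, so that multiple workers can scale out horizontally; whether this project uses it exactly that way is an assumption. A minimal sketch of the queue semantics, with a plain array and `Set` standing in for a live Redis list and set so the example runs without a server:

```javascript
// Sketch: a Redis-style URL frontier. A real deployment would use a Redis
// client (e.g. LPUSH/RPOP on a list plus a SET for deduplication); here an
// in-memory array and Set stand in so the example needs no server.
class UrlFrontier {
  constructor() {
    this.queue = [];        // stands in for a Redis LIST
    this.seen = new Set();  // stands in for a Redis SET (dedup)
  }
  push(url) {
    if (this.seen.has(url)) return false; // already enqueued or crawled
    this.seen.add(url);
    this.queue.push(url);                 // LPUSH equivalent
    return true;
  }
  pop() {
    return this.queue.shift();            // RPOP equivalent; undefined when empty
  }
}

const frontier = new UrlFrontier();
frontier.push("https://stacksapien.com");
frontier.push("https://stacksapien.com"); // duplicate, ignored
console.log(frontier.pop());              // "https://stacksapien.com"
```

Keeping the frontier in Redis rather than in process memory is what would let several crawler instances share one queue.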
- Install the project dependencies. In the root directory of the project, execute:

```shell
$ npm install
```
- Run the crawler, passing the start URL and the directory where results will be stored:

```shell
$ node index.js "<url>" "path-to-store-url"
$ node index.js "https://stacksapien.com" "./temp"
```
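The invocation above suggests that `index.js` reads two positional command-line arguments. A hedged sketch of how that argument handling might look (`parseCrawlerArgs` is an illustrative name, not the project's actual code):

```javascript
// Sketch: parse the two positional CLI arguments the crawler expects.
// process.argv is [node-binary, script-path, ...args], so the start URL
// is argv[2] and the output directory argv[3].
function parseCrawlerArgs(argv) {
  const [, , startUrl, outputDir] = argv;
  if (!startUrl || !outputDir) {
    throw new Error('Usage: node index.js "<url>" "path-to-store-url"');
  }
  new URL(startUrl); // throws if the start URL is malformed
  return { startUrl, outputDir };
}

console.log(parseCrawlerArgs(["node", "index.js", "https://stacksapien.com", "./temp"]));
```

In the real script this would be called with `process.argv` directly; accepting the array as a parameter just makes the sketch easy to exercise.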
- In the above example, the files `valid-urls.txt`, `external-urls.txt`, and `invalid-urls.txt` will be generated in the `temp` folder of your git project directory.
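The three output files suggest that each discovered link is classified relative to the start URL: same-host links, links to other hosts, and links that cannot be parsed at all. A minimal sketch of that likely classification logic (an illustration, not the project's actual implementation):

```javascript
// Sketch: classify a discovered link the way the output files suggest:
// same host as the start URL -> "valid", a different host -> "external",
// unparseable -> "invalid".
function classifyUrl(link, baseUrl) {
  const base = new URL(baseUrl); // the crawl's start URL; assumed well-formed
  try {
    const u = new URL(link, base); // resolves relative links against the base
    return u.host === base.host ? "valid" : "external";
  } catch {
    return "invalid"; // could not be parsed as a URL at all
  }
}

console.log(classifyUrl("/about", "https://stacksapien.com"));             // "valid"
console.log(classifyUrl("https://github.com", "https://stacksapien.com")); // "external"
```

Each category would then be appended to the corresponding `*.txt` file in the output directory.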