billcccheng/ptt-multithread-crawler

Ptt Crawler

This is a multithreaded crawler that crawls all the articles on the board you specify.

How to use

python ptt_crawler.py <target-board> <thread-number>

For example, to crawl the tech_job board with 100 threads:

python ptt_crawler.py tech_job 100
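As a rough illustration of how the two positional arguments above could be read, here is a minimal sketch. The function name `parse_args` and the error messages are hypothetical and are not taken from ptt_crawler.py; the script's actual argument handling may differ.

```python
import sys


def parse_args(argv):
    """Return (board, thread_count) from argv shaped like the usage line.

    Hypothetical helper for illustration only; not part of ptt_crawler.py.
    """
    if len(argv) != 3:
        raise SystemExit(
            "usage: python ptt_crawler.py <target-board> <thread-number>"
        )
    board = argv[1]
    try:
        thread_count = int(argv[2])
    except ValueError:
        raise SystemExit("<thread-number> must be an integer")
    if thread_count < 1:
        raise SystemExit("<thread-number> must be at least 1")
    return board, thread_count


if __name__ == "__main__":
    board, thread_count = parse_args(sys.argv)
    print(f"board={board} threads={thread_count}")
```

Validating the thread count up front means a typo in the second argument fails immediately instead of after the crawl has started.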

Each thread stores the data it crawls in its own file: thread 1 writes to data-1.json, thread 2 to data-2.json, and so on.
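The one-file-per-thread layout could be sketched roughly as follows. This is an assumption-laden illustration, not the code in ptt_crawler.py: `fetch_page` is a hypothetical stand-in for the real download and parse step, and the round-robin split of page numbers across threads, the `total_pages` parameter, and the `out_dir` parameter are all assumptions.

```python
import json
import os
import threading


def fetch_page(board, page):
    # Hypothetical placeholder: the real crawler would download and
    # parse an index page of the board here.
    return {"board": board, "page": page}


def worker(board, thread_id, pages, out_dir):
    # Each thread gathers its own records and writes its own file,
    # so threads never share a file handle and need no lock for output.
    records = [fetch_page(board, page) for page in pages]
    path = os.path.join(out_dir, f"data-{thread_id}.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False)


def crawl(board, thread_count, total_pages, out_dir="."):
    threads = []
    for i in range(thread_count):
        # Assumed split: thread i+1 takes pages i+1, i+1+thread_count, ...
        pages = range(i + 1, total_pages + 1, thread_count)
        t = threading.Thread(
            target=worker, args=(board, i + 1, pages, out_dir)
        )
        threads.append(t)
        t.start()
    # Wait for every thread so all data-N.json files are complete.
    for t in threads:
        t.join()
```

Calling `crawl("tech_job", 3, 10)` under these assumptions would produce data-1.json, data-2.json, and data-3.json in the current directory. Writing to separate files avoids contention between threads, at the cost of needing to merge the files afterwards if a single dataset is wanted.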

Special Thanks

Special thanks to wy36101299 for making the original PTT crawler:

ptt_crawler.py

I modified the original ptt_crawler.py to fit my needs.

About

This is a multithreaded crawler for ptt-search.
