
Crawls and extracts the text of a URL containing a Substack post


davidpoblador/substack-extractor


substack-extractor


Why

It is useful to be able to pipe Substack articles, as plain text, into your favorite LLM.

Usage

  • Clone the repository
  • Create a virtual environment
    python3 -m venv .venv
  • Activate the virtual environment
    source .venv/bin/activate
  • Install the required packages
    pip install -r requirements.txt
  • Install the Playwright browser binaries (required because the crawler has to execute JavaScript)
    playwright install
  • Crawl a post
    ./substack_extractor.py "https://newsletter.keepitboring.com/p/the-evolution-of-digital-interactions"
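To illustrate what a render-then-extract crawl can look like, here is a minimal sketch. It is not the code in `substack_extractor.py`. It assumes Playwright's synchronous Python API with Chromium for rendering and a generic rule that keeps the text of paragraph, heading, list-item and blockquote elements. The names `ParagraphExtractor`, `extract_text` and `fetch_rendered_html` are hypothetical, and the tag list is an assumption rather than Substack's documented markup.

```python
"""Hypothetical render-then-extract sketch; not the repository's implementation."""
import sys
from html.parser import HTMLParser

# Assumption: article text lives in these block-level tags.
TEXT_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6", "li", "blockquote"}
# Tags whose contents are never treated as article text.
SKIP_TAGS = {"script", "style", "noscript"}


class ParagraphExtractor(HTMLParser):
    """Collects the text of each outermost TEXT_TAGS element as one block."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True, so "&amp;" arrives as "&"
        self.blocks: list[str] = []
        self._depth = 0  # nesting depth inside TEXT_TAGS
        self._skip = 0   # nesting depth inside SKIP_TAGS
        self._buf: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag in TEXT_TAGS:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip:
            self._skip -= 1
        elif tag in TEXT_TAGS and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace and drop empty blocks.
                text = " ".join("".join(self._buf).split())
                if text:
                    self.blocks.append(text)
                self._buf = []

    def handle_data(self, data):
        if self._depth and not self._skip:
            self._buf.append(data)


def extract_text(html: str) -> str:
    """Return the extracted blocks separated by blank lines."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks)


def fetch_rendered_html(url: str) -> str:
    """Render the page in headless Chromium so JavaScript-built content is present."""
    # Imported lazily so extract_text() works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        html = page.content()  # serialized DOM after scripts have run
        browser.close()
    return html


if __name__ == "__main__":
    print(extract_text(fetch_rendered_html(sys.argv[1])))
```

Because the text goes to stdout, a script like this could be piped into another tool in the same way as the real extractor. Parsing `page.content()` rather than the raw HTTP response follows the README's note that JavaScript support is needed. It also means the parser sees a serialized DOM with explicit closing tags, which the simple depth counting relies on.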
