Release Browsertrix Crawler 0.11.0 · webrecorder/browsertrix-crawler

New Features

Store favicon URLs as favIconUrl in pages.jsonl
Support for filtering sitemap entries by date (from a specified date onward)
Link extraction optimizations
Behaviors now run only after the page is fully loaded and link extraction has finished; previously, autoplay/autofetch would start right away.
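
As a rough illustration of the new favIconUrl field, the sketch below reads pages.jsonl content and collects the favicon URL recorded for each page. It assumes each page record is a JSON object on its own line with a url key, and that the header line and pages without a favicon simply lack favIconUrl. The helper name favicon_urls and the sample records are hypothetical, not taken from the crawler.

```python
import json


def favicon_urls(jsonl_text):
    """Map page URL -> favicon URL for records that carry favIconUrl."""
    favicons = {}
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Assumption: the header line and pages without a favicon
        # do not have both keys, so they are skipped here.
        if "url" in record and "favIconUrl" in record:
            favicons[record["url"]] = record["favIconUrl"]
    return favicons


# Hypothetical sample data shaped like a pages.jsonl file.
sample = "\n".join([
    json.dumps({"format": "json-pages-1.0", "id": "pages", "title": "All Pages"}),
    json.dumps({
        "id": "1",
        "url": "https://example.com/",
        "title": "Example",
        "favIconUrl": "https://example.com/favicon.ico",
    }),
    json.dumps({"id": "2", "url": "https://example.com/about", "title": "About"}),
])

print(favicon_urls(sample))
```

In practice the text would come from the pages.jsonl file written by a crawl rather than an inline sample.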

link extraction optimization: for scopeType page, set depth == extraH… by @ikreymer in #364
improve exit features: individual instance exit + exit code for interrupt by @ikreymer in #366
feat: precommit by @Chickensoupwithrice in #363
Capture Favicon by @Chickensoupwithrice in #362
logging: resolve confusion with 'crawl done' not being written to log… by @ikreymer in #375
logging fixes: avoid duplicate logging for same error by @ikreymer in #377
Surface lastmod option for sitemap parser by @ghukill in #367
Add example of mounting custom behaviours by @Chickensoupwithrice in #369
various fixes regarding state restart: by @ikreymer in #370
status: fix typo setting status to log message by @ikreymer in #379
Add option to output stats file live, i.e. after each page crawled by @benoit74 in #374
behavior logging tweaks, add netIdle by @ikreymer in #381
Update tldextract cache for pywb during build by @vnznznz in #383
Enhance file stats test to detect file modification by @benoit74 in #382
optimize link extraction: (fixes #376) by @ikreymer in #380
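
To make the sitemap date filter (#367) concrete, here is a minimal Python sketch of the general idea: keep only sitemap entries whose lastmod is on or after a given date. This is not the crawler's implementation. The function name urls_modified_since, the sample sitemap, and the choice to keep entries that have no lastmod are all assumptions made for illustration.

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

# Standard sitemap protocol namespace.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_lastmod(value):
    """Parse a W3C datetime string, treating naive values as UTC."""
    parsed = datetime.fromisoformat(value.strip().replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def urls_modified_since(sitemap_xml, from_date):
    """Return <loc> values whose <lastmod> is on or after from_date."""
    root = ET.fromstring(sitemap_xml.strip())
    kept = []
    for url_el in root.findall("sm:url", SITEMAP_NS):
        loc = url_el.findtext("sm:loc", namespaces=SITEMAP_NS)
        lastmod = url_el.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        if not loc:
            continue
        # Assumption: entries without lastmod are kept, since their
        # age is unknown. The crawler may handle this case differently.
        if lastmod is None or parse_lastmod(lastmod) >= from_date:
            kept.append(loc.strip())
    return kept


# Hypothetical sitemap used only to exercise the sketch.
sample_sitemap = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/old</loc><lastmod>2023-01-15</lastmod></url>
  <url><loc>https://example.com/new</loc><lastmod>2023-08-20T10:00:00Z</lastmod></url>
  <url><loc>https://example.com/undated</loc></url>
</urlset>
"""

cutoff = datetime(2023, 6, 1, tzinfo=timezone.utc)
print(urls_modified_since(sample_sitemap, cutoff))
```

Comparing timezone-aware datetimes avoids mixing date-only and timestamped lastmod values, both of which the sitemap protocol allows.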

Full Changelog: v0.10.4...v0.11.0