Full Python 3 support (#106); all thanks go to @Preetwinder.
canonicalize_url method removed in favor of w3lib implementation.
The whole Request (including meta) is propagated to the DB Worker by means of the scoring log (fixes #131).
CRC32 is now generated from the hostname the same way on both platforms: Python 2 and Python 3.
HBaseQueue now supports delayed requests. A "crawl_at" field in meta, holding a timestamp, makes the request available to spiders only after that time has passed. This is an important feature for revisiting.
The Request object is now persisted in HBaseQueue, which allows scheduling requests with specific meta, headers, body, and cookies parameters.
MESSAGE_BUS_CODEC option, allowing a message bus codec other than the default to be chosen.
Strategy worker refactoring to simplify its customization from subclasses.
Fixed a bug with extracted links distribution over spider log partitions (#129).
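
As a rough illustration of the delayed-request note above, the sketch below shows how a "crawl_at" value might be placed in a request's meta and how a queue could decide whether the request is due. It is not the HBaseQueue implementation: the helper names `build_delayed_meta` and `is_due` are hypothetical, the meta is modelled as a plain dict, and the timestamp is assumed to be Unix time in seconds.

```python
import time


def build_delayed_meta(delay_seconds, now=None):
    """Return a meta dict whose 'crawl_at' lies delay_seconds in the future.

    Hypothetical helper; assumes 'crawl_at' is a Unix timestamp in seconds.
    """
    now = time.time() if now is None else now
    return {"crawl_at": int(now + delay_seconds)}


def is_due(meta, now=None):
    """Return True once the 'crawl_at' timestamp, if present, has passed.

    Hypothetical helper; requests without 'crawl_at' are treated as due.
    """
    now = time.time() if now is None else now
    crawl_at = meta.get("crawl_at")
    return crawl_at is None or crawl_at <= now


# Schedule a revisit one hour after a fixed reference time.
meta = build_delayed_meta(3600, now=1000000)
```

A request carrying this meta would be held back until the reference time plus one hour, while requests without the field would be handed out as usual.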