optimize link extraction: (fixes #376) #380

Merged 3 commits on Sep 15, 2023

Commits on Sep 15, 2023

  1. optimize link extraction: (fixes #376)

    - dedup urls in browser first
    - don't return entire list of URLs, process one-at-a-time via callback
    - add exposeFunction per page in setupPage, then register 'addLink' callback for each page's handler
    - optimize addqueue: atomically check if already at max urls and if url already seen in one redis call
    - better logging: log rejected promises for link extraction
    ikreymer committed Sep 15, 2023
    d458756
  2. add queue improvements:

    - change error state to differentiate limit hit vs dupe url
    - add QueueState enum to indicate success, limit hit, or dupe url
    ikreymer committed Sep 15, 2023
    57150c7
  3. 9d8ec2b
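
The flow described in the first two commits can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: it uses an in-memory set in place of redis, takes a plain array in place of links collected in the browser, and every name in it (`InMemoryQueue`, `extractLinks`, the `QueueState` member names and values, and the order of the two checks) is an assumption made for the example. It shows links being deduplicated first, then passed one at a time to a callback, with the queue reporting in a single step whether each URL was added, hit the limit, or was a duplicate.

```typescript
// Hypothetical member names and values; the PR only says the enum
// distinguishes success, limit hit, and dupe url.
enum QueueState {
  ADDED = 0,
  LIMIT_HIT = 1,
  DUPE_URL = 2,
}

// In-memory stand-in for the redis-backed queue (assumption for this sketch).
class InMemoryQueue {
  private seen = new Set<string>();
  private pending: string[] = [];
  private maxUrls: number;

  constructor(maxUrls: number) {
    this.maxUrls = maxUrls;
  }

  // One step that checks both the URL limit and the seen set,
  // mirroring the "one redis call" idea. A limit of 0 means no limit.
  add(url: string): QueueState {
    if (this.maxUrls > 0 && this.seen.size >= this.maxUrls) {
      return QueueState.LIMIT_HIT;
    }
    if (this.seen.has(url)) {
      return QueueState.DUPE_URL;
    }
    this.seen.add(url);
    this.pending.push(url);
    return QueueState.ADDED;
  }

  get size(): number {
    return this.pending.length;
  }
}

// Dedupe first, then hand each link to the callback one at a time
// instead of returning the whole list.
function extractLinks(hrefs: string[], addLink: (url: string) => void): void {
  const unique = new Set<string>(hrefs);
  unique.forEach((url) => addLink(url));
}

// Usage: wire the callback to the queue and collect the per-URL outcome.
const queue = new InMemoryQueue(2);
const outcomes: QueueState[] = [];
extractLinks(
  ["https://example.com/a", "https://example.com/a", "https://example.com/b", "https://example.com/c"],
  (url) => {
    outcomes.push(queue.add(url));
  }
);
```

Returning a distinct state for each outcome lets the caller stop extracting as soon as the limit is hit, while a duplicate can simply be skipped.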