Commit

fix shouldIncludeFrame() check: was actually erroring out and never accepting any iframes!

now used not only for link extraction but also to run() behaviors
Dockerfile: add commented out line to use local behaviors.js
ikreymer committed Sep 15, 2023
1 parent 97c840c commit d03e856
Showing 2 changed files with 7 additions and 4 deletions.
3 changes: 3 additions & 0 deletions Dockerfile
@@ -47,6 +47,9 @@ RUN ln -s /app/main.js /usr/bin/crawl; ln -s /app/create-login-profile.js /usr/b

WORKDIR /crawls

+# enable to test custom behaviors build (from browsertrix-behaviors)
+# COPY behaviors.js /app/node_modules/browsertrix-behaviors/dist/behaviors.js
+
ADD docker-entrypoint.sh /docker-entrypoint.sh
ENTRYPOINT ["/docker-entrypoint.sh"]

8 changes: 4 additions & 4 deletions crawler.js
@@ -607,11 +607,11 @@ self.__bx_behaviors.selectMainBehavior();

const frameUrl = frame.url();

-const frameElem = await frame.frameElement();
+// this is all designed to detect and skip PDFs, and other frames that are actually EMBEDs
+// if there's no tag or an iframe tag, then assume its a regular frame
+const tagName = await frame.evaluate("window.frameElement && window.frameElement.tagName");

-const tagName = await frame.evaluate(e => e.tagName, frameElem);
-
-if (tagName !== "IFRAME" && tagName !== "FRAME") {
+if (tagName && tagName !== "IFRAME" && tagName !== "FRAME") {
logger.debug("Skipping processing non-frame object", {tagName, frameUrl, ...logDetails}, "behavior");
return null;
}
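
The effect of the changed condition can be sketched in isolation. The two helpers below are hypothetical (the names `skipsFrameOld` and `skipsFrameNew` do not appear in the commit) and reproduce only the tag-name conditions from the diff, not the rest of `shouldIncludeFrame()`. The sketch assumes `tagName` is whatever the in-frame evaluation returned, which is `null` when `window.frameElement` is unavailable, for example in a top-level or cross-origin document. Per the commit message, the old code errored before this check could work at all, so the old helper only shows how the previous condition would have treated each value.

```javascript
// Hypothetical helpers, for illustration only: they mirror the boolean
// conditions from the diff and are not part of crawler.js.

// Old condition: anything other than IFRAME/FRAME is skipped, including null.
function skipsFrameOld(tagName) {
  return tagName !== "IFRAME" && tagName !== "FRAME";
}

// New condition: a missing tag name is assumed to be a regular frame,
// so only a known non-frame tag (e.g. EMBED, OBJECT) is skipped.
function skipsFrameNew(tagName) {
  return Boolean(tagName && tagName !== "IFRAME" && tagName !== "FRAME");
}

const samples = [null, "IFRAME", "FRAME", "EMBED", "OBJECT"];

// Evaluate both conditions for each sample tag name.
const results = samples.map((tagName) => ({
  tagName,
  oldSkips: skipsFrameOld(tagName),
  newSkips: skipsFrameNew(tagName),
}));

console.table(results);
```

The only row where the two conditions disagree is the `null` tag name: the old condition would skip such a frame, while the new one lets it through to link extraction and behaviors.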