puppeter-infinite-scroll

A small helper for scraping data from sites that use infinite scroll.

npm install puppeter-infinite-scroll

The problem

Most of the solutions I found scroll the page on a fixed timer and then evaluate what you need. If a request or the network slows down and takes longer than the time set in the code, the scraper simply fails.
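A minimal sketch of the alternative, assuming Puppeteer's `page.waitForResponse` and `page.evaluate`: scroll, then wait until a response from the content endpoint arrives instead of sleeping for a fixed time. The helper name `scrollAndWaitForContent` and its parameters are hypothetical, and this is not necessarily how the package is implemented internally.

```javascript
// Hypothetical helper, not this package's API: scroll to the bottom and
// wait for the content endpoint to answer instead of sleeping a fixed time.
async function scrollAndWaitForContent(page, endpoint, timeout = 30000) {
  const [response] = await Promise.all([
    // Register the wait first so a fast response is not missed.
    page.waitForResponse((res) => res.url().startsWith(endpoint), { timeout }),
    // Runs in the browser context and triggers the next batch of content.
    page.evaluate(() => window.scrollTo(0, document.body.scrollHeight)),
  ])
  return response
}
```

With this approach a slow network only delays the next scroll; the scraper fails only if no matching response arrives before `timeout` expires.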

See it working

npm run test

How to use?

const puppeteerInfiniteScroll = require('./src/puppeter-infinite-scroll')

;(async () => {
  try {
    const browser = new puppeteerInfiniteScroll()
    await browser.start()
    await browser.open({
      url: 'https://medium.com/search?q=python',
      endpoint: 'https://medium.com/search/posts?q',
      loadImages: false,
      onResponse: (res) => {
        //console.log(res)
      },
      onScroll: () => {
        console.log(`onScroll ${browser.scrollCount}`)
      }
    })
  } catch (e) {
    console.error(e)
  }
})()
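The `onResponse` callback can be used to gather what each scroll loads. A minimal sketch, assuming the `res` argument is a Puppeteer HTTPResponse (which exposes `url()` and `status()`); the README does not confirm the exact type, and `createResponseCollector` is a hypothetical helper name.

```javascript
// Hypothetical helper: builds an onResponse callback that records
// the URL and HTTP status of every response it receives.
// Assumption: `res` behaves like a Puppeteer HTTPResponse.
function createResponseCollector() {
  const collected = []
  const onResponse = (res) => {
    collected.push({ url: res.url(), status: res.status() })
  }
  return { collected, onResponse }
}
```

Pass the returned `onResponse` to `browser.open({ ... })` and read `collected` afterwards, or inside `onScroll` to track progress.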

async browser.start() = puppeteer.launch(opts)

    //params(opts)
    //default: { headless: false, devtools: true }
    await browser.start()
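Since the defaults open a visible browser with DevTools, it may be useful to pick different options per environment. A minimal sketch, assuming `start()` forwards its argument to `puppeteer.launch()` as described above; `launchOptionsFor` and the use of a `CI` environment variable are illustrative assumptions, not part of this package.

```javascript
// Hypothetical helper: choose launch options for browser.start().
// Assumption: a truthy CI environment variable means "no display available".
function launchOptionsFor(env = process.env) {
  const inCI = Boolean(env.CI)
  return {
    headless: inCI,  // no visible window on CI machines
    devtools: !inCI, // keep DevTools open for local debugging
  }
}
```

Usage would then look like `await browser.start(launchOptionsFor())`.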

async browser.open()

This method creates a new page, then calls setViewport({ width: 1280, height: 926 }) and setRequestInterception(true) on it.

    //params(opts)
    //default: { url, onResponse, onScroll, loadImages = true, endpoint }
    //url = 'https://medium.com/search?q=python' - URL of the page to load
    //endpoint = 'https://medium.com/search?q=python' - endpoint which loads content into the page
    //loadImages = true - set to false to prevent images from loading
    //onResponse = (response)=>{ } - use it if you need to do something with the response object
    //onScroll = ()=>{} - triggered after every scroll

    await browser.open({
      url: 'https://medium.com/search?q=python',
      endpoint: 'https://medium.com/search/posts?q',
      loadImages: false,
      onResponse: (res)=>{
        //console.log(res)
      },
      onScroll: ()=>{
        console.log(`onScroll ${browser.scrollCount}`)
      }
    })
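For context on how `loadImages: false` can work together with `setRequestInterception(true)`, here is a minimal sketch of a Puppeteer request handler that aborts image requests. It uses Puppeteer's `request.resourceType()`, `request.abort()` and `request.continue()`; the function name `createRequestHandler` is hypothetical and the package's real handler may differ.

```javascript
// Hypothetical sketch of a handler for page.on('request', handler).
// It only has an effect after page.setRequestInterception(true).
function createRequestHandler({ loadImages = true } = {}) {
  return (request) => {
    if (!loadImages && request.resourceType() === 'image') {
      return request.abort() // skip image downloads
    }
    return request.continue() // let every other request through
  }
}
```

Skipping images usually reduces bandwidth and speeds up each scroll, which matters when a page loads many batches of content.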

tawsbob/puppeteer-infinite-scroll
