Feature request: set browser accept language #37

Open
peterk opened this issue Dec 5, 2018 · 6 comments

Comments

@peterk
Contributor

peterk commented Dec 5, 2018

When Squidwarc runs on server hosts in other countries, websites will sometimes present their UI in the language associated with the IP address range of the server host. (E.g., when I archive Facebook pages from a server in Germany, the Facebook interface is presented in German.) If the Chrome Accept-Language setting could be set from the job JSON, the archiver would have more control over the language of captured pages.
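
For readers unfamiliar with the header being requested here: an Accept-Language value is a comma-separated list of language tags, optionally weighted with q-values. The sketch below shows one way such a value could be built from an ordered list of preferences. The helper name `buildAcceptLanguage` and the 0.1 step between weights are assumptions made for illustration and are not part of Squidwarc.

```javascript
// Hypothetical helper (not part of Squidwarc): builds an Accept-Language
// header value from language tags listed in order of preference.
function buildAcceptLanguage (languages) {
  return languages
    .map((lang, index) => {
      // the most preferred language carries the implicit weight q=1
      if (index === 0) return lang
      // later entries step down by 0.1 each, never going below 0.1
      const q = Math.max(0.1, 1 - index * 0.1)
      return `${lang};q=${q.toFixed(1)}`
    })
    .join(',')
}

console.log(buildAcceptLanguage(['sv-SE', 'sv', 'en']))
// prints "sv-SE,sv;q=0.9,en;q=0.8"
```

A value like this is what a job-level setting would presumably end up passing to the browser.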

@machawk1
Collaborator

machawk1 commented Dec 6, 2018

This is a good suggestion for an option, @peterk. http://ws-dl.blogspot.com/2018/03/2018-03-21-cookies-are-why-your.html provides some examples of weirdness in language detection via IA submission. It would be interesting to test this from different IPs and Accept-Language values to see if the effects are replicable.
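
A rough sketch of the header half of that experiment, assuming Node 18+ (for the global `fetch`): request the same URL with several Accept-Language values and record what comes back. The function name `probeLanguages` is hypothetical, and comparing the `Content-Language` response header is only one possible signal, since many sites do not send it. Varying the source IP is outside what this snippet can do.

```javascript
// Hypothetical probe: fetch one URL once per Accept-Language value and
// collect a few fields that may reveal which language the server chose.
async function probeLanguages (url, acceptLanguages) {
  const results = []
  for (const value of acceptLanguages) {
    // each request carries only the header under test; Node's fetch keeps
    // no cookie jar, so earlier responses do not influence later requests
    const response = await fetch(url, { headers: { 'Accept-Language': value } })
    const body = await response.text()
    results.push({
      acceptLanguage: value,
      status: response.status,
      contentLanguage: response.headers.get('content-language'),
      bodyLength: body.length
    })
  }
  return results
}

// Usage (the URL is a placeholder):
// probeLanguages('https://example.com/', ['de-DE,de;q=0.9', 'en-US,en;q=0.9'])
//   .then(results => console.table(results))
```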

@N0taN3rd
Owner

N0taN3rd commented Feb 5, 2019

This issue is up next on the big list of things to do.

@N0taN3rd
Owner

N0taN3rd commented Feb 10, 2019

I know this has been a semi-long time coming, but once the chrome-remote-interface-extra-intergration branch is merged, this and a whole lot more will be possible using Squidwarc.

PS: spread the word, you don't need puppeteer to simply use the CDP
https://github.com/N0taN3rd/chrome-remote-interface-extra ;)

@N0taN3rd
Owner

N0taN3rd commented Feb 24, 2019

Hey y'all I finally got node-warc and chrome-remote-interface-extra in a position to support this feature request.

I am thinking the API for this is as follows:

As with the user script that is run before WARC generation, you can supply a function that customizes the behavior of the browser. It is passed a single argument: the page object of chrome-remote-interface-extra or puppeteer, or the chrome-remote-interface client object.

Example when using chrome-remote-interface-extra (the type definitions for the arguments of pageOrClient.setGeolocation are not valid JS but are provided for your convenience):

module.exports = async function chromeCustomizer (pageOrClient) {
    // set the download path of files downloaded by the browser
    await pageOrClient.setDownloadBehavior('<path to new downloads folder>')

    // set the Accept-Language HTTP header
    await pageOrClient.setAcceptLanguage('<new language>')

    // set navigator.platform
    await pageOrClient.setNavigatorPlatform('<new platform>')

    // set new geolocation
    await pageOrClient.setGeolocation({longitude: number, latitude: number, accuracy: (number|undefined)})
}

For both chrome-remote-interface-extra and puppeteer, the connection to the browser tab is found on pageOrClient._client if you need more fine-tuned customization. As always, please consult the CDP documentation for details.
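
As an illustration of that lower-level route, here is a minimal sketch of a customizer that sends CDP commands directly. It assumes the connection object exposes `send(method, params)` returning a promise, as puppeteer's CDP session and the chrome-remote-interface client do. The header value and coordinates are placeholder values, and this is not necessarily how Squidwarc itself implements the helpers shown above.

```javascript
// Sketch only: a customizer that bypasses the page helpers and sends raw
// CDP commands. Header value and coordinates are placeholders.
async function chromeCustomizer (pageOrClient) {
  // page objects keep their connection on _client (see the note above);
  // a bare chrome-remote-interface client is assumed to be usable directly
  const client = pageOrClient._client || pageOrClient

  // make sure the Network domain is on before adding headers
  // (harmless if it is already enabled)
  await client.send('Network.enable')

  // send Accept-Language with every request made by this tab
  await client.send('Network.setExtraHTTPHeaders', {
    headers: { 'Accept-Language': 'en-US,en;q=0.9' }
  })

  // override the position reported by the Geolocation API
  await client.send('Emulation.setGeolocationOverride', {
    latitude: 59.33,
    longitude: 18.07,
    accuracy: 100
  })
}

module.exports = chromeCustomizer
```

`Network.setExtraHTTPHeaders` and `Emulation.setGeolocationOverride` are standard CDP methods; the CDP documentation lists their full parameter sets.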

Please let me know if there are any suggestions or concerns about how to make this as user friendly as possible.

@N0taN3rd
Owner

N0taN3rd commented Feb 25, 2019

Documentation on the upcoming chrome-remote-interface-extra integration https://n0tan3rd.github.io/chrome-remote-interface-extra/

@N0taN3rd
Owner

Hey y'all, if you want to start test-running things today, this feature lives in the chrome-remote-interface-extra-intergration branch.
The entry point for making changes like this is the chromeCustomizer.js file.

Puppeteer's CI is currently failing, and chrome-remote-interface-extra's CI is good except for a pesky net::ERR_NAME_NOT_RESOLVED vs net::ERR_NAME_RESOLUTION_FAILED error message that happens on Travis for some reason and when using Google Chrome Canary.
CI link: https://travis-ci.com/N0taN3rd/chrome-remote-interface-extra

Full documentation, including what this library can do beyond puppeteer, is found here: https://n0tan3rd.github.io/chrome-remote-interface-extra/

I'm going to add Redis frontier support and frontier customization functions before this feature gets merged into master (I'm tired of in-memory frontiers).


3 participants