Feature request: set browser accept language #37
This is a good suggestion for an option, @peterk. http://ws-dl.blogspot.com/2018/03/2018-03-21-cookies-are-why-your.html gives some examples of odd language detection when pages are submitted to the Internet Archive (IA). It would be interesting to test this from different IPs and
This issue is up next on the big list of things to do.
I know this has been a semi-long time coming, but once the chrome-remote-interface-extra-intergration branch is merged, this and a whole lot more will be possible with Squidwarc. PS: spread the word, you don't need puppeteer to simply use the CDP.
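As a rough illustration of using the CDP without puppeteer, here is a minimal sketch that overrides Accept-Language through a raw chrome-remote-interface client. It assumes `client` is already connected to a page target (e.g. from `await CDP()`), and the helper name `setAcceptLanguageViaCDP` is hypothetical, not part of Squidwarc. It relies on the CDP method `Network.setUserAgentOverride`, which takes an optional `acceptLanguage` alongside the required `userAgent`, and on `Runtime.evaluate` to read the current user agent.

```javascript
// Minimal sketch: override Accept-Language over the raw Chrome DevTools
// Protocol, no puppeteer involved. `client` is assumed to be a connected
// chrome-remote-interface client for a page target, e.g. `await CDP()`.
// The helper name is hypothetical, not part of Squidwarc.
async function setAcceptLanguageViaCDP (client, acceptLanguage) {
  await client.Network.enable()
  // Network.setUserAgentOverride requires a userAgent, so read the
  // browser's current one and pass it back unchanged.
  const { result } = await client.Runtime.evaluate({
    expression: 'navigator.userAgent',
    returnByValue: true
  })
  // The override applies to requests made after this call,
  // so run it before Page.navigate.
  await client.Network.setUserAgentOverride({
    userAgent: result.value,
    acceptLanguage
  })
}
```

Typical use would be `await setAcceptLanguageViaCDP(client, 'en-US,en;q=0.9')` followed by `await client.Page.navigate({ url })`. Reusing the current user agent keeps the change limited to the language.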
Hey y'all, I finally got node-warc and chrome-remote-interface-extra into a position to support this feature request. I am thinking the API for this is as follows: just as you can supply a user script that is run before WARC generation, you can supply a function that is passed, as its only argument, the page object of chrome-remote-interface-extra or puppeteer, or the chrome-remote-interface client object, in order to customize the behavior of the browser.
Example when using chrome-remote-interface-extra (the type annotations in the argument to pageOrClient.setGeolocation are not valid JS; they are provided for your convenience):
module.exports = async function chromeCustomizer (pageOrClient) {
  // set the download path of files downloaded by the browser
  await pageOrClient.setDownloadBehavior('<path to new downloads folder>')
  // set the Accept-Language HTTP header
  await pageOrClient.setAcceptLanguage('<new language>')
  // set navigator.platform
  await pageOrClient.setNavigatorPlatform('<new platform>')
  // set new geolocation
  await pageOrClient.setGeolocation({longitude: number, latitude: number, accuracy: (number|undefined)})
}
For both chrome-remote-interface-extra and puppeteer the connection to the browser tab is found on
Please let me know if you have any suggestions or concerns about how to make this as user-friendly as possible.
Documentation on the upcoming chrome-remote-interface-extra integration: https://n0tan3rd.github.io/chrome-remote-interface-extra/
Hey y'all, if you want to start test-running things today, this feature lives in the chrome-remote-interface-extra-intergration branch. Puppeteer CI is currently failing, and chrome-remote-interface-extra's CI is good except for a pesky net::ERR_NAME_NOT_RESOLVED vs net::ERR_NAME_RESOLUTION_FAILED error message mismatch that happens on Travis for some reason, and when using Google Chrome Canary. Full documentation of everything you can do with this library beyond puppeteer is at https://n0tan3rd.github.io/chrome-remote-interface-extra/. I'm going to add Redis frontier support and frontier customization functions before this feature gets merged into master (I'm tired of in-memory frontiers).
When running Squidwarc on server hosts in other countries, websites will sometimes present their UI in the language associated with the server's IP address range (e.g. when I archive Facebook pages from a server in Germany, Facebook presents its interface in German). If it were possible to set Chrome's Accept-Language from the job JSON, the archiver would have more control over this.
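As a sketch only, assuming a hypothetical `acceptLanguage` array field in the job JSON, such a setting could be turned into a header value in the same shape Chrome sends by default (e.g. `en-US,en;q=0.9`), lowering the quality value by 0.1 for each later language. The field name and the q-value scheme are assumptions, not existing Squidwarc behavior.

```javascript
// Hypothetical: "acceptLanguage" is an assumed job-config field, not an
// existing Squidwarc option. Turns a list of language tags into an
// Accept-Language header value. Returns null when the job does not ask
// for an override, so the browser default is left alone.
function acceptLanguageFromJob (job) {
  const tags = job.acceptLanguage
  if (!Array.isArray(tags) || tags.length === 0) return null
  return tags
    .map((tag, i) => {
      if (i === 0) return tag // first preference, implicit q=1
      // lower the quality value by 0.1 per position, never below 0.1
      const q = Math.max(0.1, 1 - i * 0.1)
      return `${tag};q=${q.toFixed(1)}`
    })
    .join(',')
}

const job = JSON.parse('{ "acceptLanguage": ["en-US", "en"] }')
console.log(acceptLanguageFromJob(job)) // en-US,en;q=0.9
```

A null result means no override was requested, so the browser's default Accept-Language stays in place.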