Some websites return invalid UTF-16? #666
Can you provide a small reproduction script? What OS are you on? What PHP version are you running? I failed to reproduce the issue with: <?php
declare(strict_types=1);
require_once('vendor/autoload.php');
$chromiumPath = '/usr/bin/chromium-browser';
$factory = new \HeadlessChromium\BrowserFactory($chromiumPath);
$browser = $factory->createBrowser();
$page = $browser->createPage();
$page->navigate('https://www.smithsfoodanddrug.com')->waitForNavigation();
var_dump($page->getHtml()); it produced
|
linux, php 8.1, |
@momala454 still unable to reproduce, i tried for($i=0;$i<100;++$i) {
$page->navigate('https://www.smithsfoodanddrug.com/signin?redirectUrl=/')->waitForNavigation();
var_dump($page->getHtml());
} and I think it ran about 30 times before I got IP banned, but even with ~30 iterations, I did not get the invalid UTF16 error you describe. Can you please write a script to reproduce the issue? |
the issue occurs with $page
->getSession()
->sendMessageSync(new \HeadlessChromium\Communication\Message('Fetch.enable', [
"handleAuthRequests" => true,
"patterns" => [
['urlPattern' => '*']
]
])); i will try to make a script |
I am able to reproduce it using this script: require_once('vendor/autoload.php');
$chromiumPath = 'google-chrome';
$factory = new \HeadlessChromium\BrowserFactory($chromiumPath);
$browser = $factory->createBrowser([
'customFlags' => [
'--excludeSwitches=["enable-automation"]',
'--lang=en',
'--no-sandbox',
'--disable-setuid-sandbox',
'--enable-features=NetworkService',
'--disable-namespace-sandbox',
'--enable-logging',
'--disable-features=site-per-process,ChromeWhatsNewUI,Translate,AcceptCHFrame,MediaRouter,OptimizationHints,ProcessPerSiteUpToMainFrameThreshold',
'--disable-blink-features=AutomationControlled',
'--enable-blink-features=WebBluetooth,WebBluetoothRemoteCharacteristicNewWriteValue',
'--disable-infobars',
'--start-maximized',
'--no-default-browser-check',
'--remote-allow-origins=*',
'--password-store=basic',
'--disable-save-password-bubble',
],
'headless' => false,
'envVariables' => ['DISPLAY' => ':99'],
'sendSyncDefaultTimeout' => 40000,
'debugLogger' => 'php://stdout',
]);
$page = $browser->createPage();
$page
->getSession()
->on('method:Fetch.requestPaused', function ($params) use ($page) {
echo date('r'). ' received Fetch.requestPaused '/*.print_r($params,true)*/."\n";
$page
->getSession()
->sendMessageSync(new \HeadlessChromium\Communication\Message('Fetch.continueRequest', [
'requestId' => $params["requestId"]
]));
});
$page
->getSession()
->sendMessageSync(new \HeadlessChromium\Communication\Message('Fetch.enable', [
"handleAuthRequests" => true,
"patterns" => [
['urlPattern' => '*']
]
]));
try {
$navigation = $page->navigate('https://www.smithsfoodanddrug.com/');
$navigation->waitForNavigation(\HeadlessChromium\Page::LOAD, 50000);
$navigation->waitForNavigation(\HeadlessChromium\Page::NETWORK_IDLE, 50000);
$navigation = $page->navigate('https://www.smithsfoodanddrug.com/signin?redirectUrl=/');
$navigation->waitForNavigation(\HeadlessChromium\Page::LOAD, 50000);
$navigation->waitForNavigation(\HeadlessChromium\Page::NETWORK_IDLE, 50000);
} catch (\Throwable $e) {
echo $e->__toString();
sleep(4);
throw $e;
}
sleep(2);
echo 'end';
var_dump($page->getHtml()); |
for me it crashed after 2 minutes in
with
which i suspect is valid, if you remove
and keep
does it still happen? I suspect some kind of localization issue where it sends people from your region different responses from people in my region (Norway), and your region response triggers the issue, but not my region's response 🤔 |
fwiw when I remove the
(because some javascript is constantly keeping my network un-idle)
|
without this |
nice, it took 5 minutes and multiple retries, but I managed to reproduce it!
Seems to me like there is a bug in the Chromium dev protocol where it does not properly encode JSON! Very interesting 🤔 Multiple JSON validators agree that it's not valid JSON. Now the question is how to easily reproduce it: it took 5 minutes and many retries for me to reproduce it just once. |
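If the invalid JSON turns out to be an unpaired `\uXXXX` surrogate escape (an assumption, consistent with JSON error 10 but not confirmed in this thread), one possible stop-gap is to replace such escapes with `\ufffd` before decoding. The helper name `replaceLoneSurrogateEscapes` is hypothetical and is not part of chrome-php; this is only a sketch of the idea, not a proposed patch.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of chrome-php): rewrite unpaired
 * \uD800-\uDFFF escapes in raw JSON text to \ufffd so json_decode() accepts it.
 */
function replaceLoneSurrogateEscapes(string $json): string
{
    // Walk over every backslash escape from left to right:
    //   group 1: a valid high+low surrogate pair (kept as-is)
    //   group 2: any other surrogate escape, i.e. an unpaired one (replaced)
    //   "."    : every other escape such as \\ or \n (kept as-is)
    $pattern = '/\\\\(?:(u[dD][89abAB][0-9a-fA-F]{2}\\\\u[dD][c-fC-F][0-9a-fA-F]{2})'
        . '|(u[dD][89a-fA-F][0-9a-fA-F]{2})|.)/s';

    $result = preg_replace_callback(
        $pattern,
        static function (array $m): string {
            // Group 2 is only non-empty for an unpaired surrogate escape.
            return (isset($m[2]) && $m[2] !== '') ? '\\ufffd' : $m[0];
        },
        $json
    );

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $result ?? $json;
}

// Made-up payload with a lone high surrogate escape.
$bad = '{"postData":"abc\ud83dxyz"}';
var_dump(json_decode($bad, true));                               // decoding fails
var_dump(json_decode(replaceLoneSurrogateEscapes($bad), true));  // decoding succeeds
```

This changes the payload (the broken code unit becomes U+FFFD), so it only suits cases where the affected field, such as tracking post data, does not need to be preserved exactly.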
Found a way to easily reproduce it: $factory = new \HeadlessChromium\BrowserFactory($chromiumPath);
$browser = $factory->createBrowser([
'customFlags' => [
'--excludeSwitches=["enable-automation"]',
'--lang=en',
'--no-sandbox',
'--disable-setuid-sandbox',
'--enable-features=NetworkService',
'--disable-namespace-sandbox',
'--enable-logging',
'--disable-features=site-per-process,ChromeWhatsNewUI,Translate,AcceptCHFrame,MediaRouter,OptimizationHints,ProcessPerSiteUpToMainFrameThreshold',
'--disable-blink-features=AutomationControlled',
'--enable-blink-features=WebBluetooth,WebBluetoothRemoteCharacteristicNewWriteValue',
'--disable-infobars',
'--start-maximized',
'--no-default-browser-check',
'--remote-allow-origins=*',
'--password-store=basic',
'--disable-save-password-bubble',
],
'headless' => false,
//'envVariables' => ['DISPLAY' => ':99'],
'sendSyncDefaultTimeout' => 40000,
'debugLogger' => 'php://stdout',
]);
$page = $browser->createPage();
$page
->getSession()
->on('method:Fetch.requestPaused', function ($params) use ($page) {
echo date('r') . ' received Fetch.requestPaused '/*.print_r($params,true)*/ . "\n";
$page
->getSession()
->sendMessageSync(new \HeadlessChromium\Communication\Message('Fetch.continueRequest', [
'requestId' => $params["requestId"]
]));
});
$page
->getSession()
->sendMessageSync(new \HeadlessChromium\Communication\Message('Fetch.enable', [
"handleAuthRequests" => true,
"patterns" => [
['urlPattern' => '*']
]
]));
$attempts = 0;
for (;;) {
++$attempts;
try {
$navigation = $page->navigate('https://www.smithsfoodanddrug.com/');
$navigation->waitForNavigation(\HeadlessChromium\Page::LOAD, 50000);
//$navigation->waitForNavigation(\HeadlessChromium\Page::NETWORK_IDLE, 50000);
$navigation = $page->navigate('https://www.smithsfoodanddrug.com/signin?redirectUrl=/');
$navigation->waitForNavigation(\HeadlessChromium\Page::LOAD, 50000);
//$navigation->waitForNavigation(\HeadlessChromium\Page::NETWORK_IDLE, 50000);
} catch (\Throwable $e) {
echo "Attempt $attempts reproduces the issue\n";
throw $e;
}
echo "Attempt $attempts completed\n";
sleep(10);
} will reproduce it every time, but it may take a few minutes. |
Warning: when trying to copy out just the invalid JSON, it seems VSCode actually converted it to valid JSON (maybe it stripped some invalid UTF-8?), which means the JSON in my post may be valid, and you'll need to check the full 5MB log to actually see the invalid JSON... which will not make it any easier to debug this 🤔 Sorry, this is too much for me to look into for free; I hope someone else can investigate it. |
thanks :) |
Hi
I'm receiving the following error
Uncaught exception 'HeadlessChromium\Exception\CommunicationException\CannotReadResponse' with message Response from chrome remote interface is not a valid json response. JSON error: 10
JSON error 10 means
The exception is thrown in vendor/chrome-php/chrome/src/Communication/Connection.php, method dispatchMessage.
The data is from a Fetch.requestPaused message, sent when the page is trying to send tracking data from https://www.smithsfoodanddrug.com.
The post data contains unreadable data like this, which prevents the library from running json_decode over it
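For context, error code 10 is PHP's `JSON_ERROR_UTF16` constant. A minimal sketch of how `json_decode` ends up with that code, using a made-up payload (not the actual data from the site) that contains an unpaired surrogate escape:

```php
<?php
declare(strict_types=1);

// Made-up payload: "\ud83d" is a high surrogate with no low surrogate after it.
// Single quotes keep the backslash sequence literal, so json_decode() sees the escape.
$payload = '{"postData":"tracking\ud83d"}';

$decoded = json_decode($payload, true);
$errorCode = json_last_error();

// $decoded is null and $errorCode equals JSON_ERROR_UTF16 (10).
echo $errorCode, ': ', json_last_error_msg(), "\n";
```

Whether the real Fetch.requestPaused message fails for exactly this reason is not established here; the snippet only shows what kind of input produces error 10.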