New headless chrome randomly hangs at the end #27264

Closed
maciejmrozinski opened this issue Jul 11, 2023 · 29 comments
Labels
stage: needs information Not enough info to reproduce the issue stale no activity on this issue for a long period type: bug

Comments

@maciejmrozinski

maciejmrozinski commented Jul 11, 2023

Current behavior

With Chrome's new headless mode, the run sometimes hangs after a test file finishes (also mentioned in this comment: #25972 (comment)).

Desired behavior

Consistent, correct behaviour just like with headless=old

Test code to reproduce

There is no simple repo to reproduce this. From my observation, it hangs mostly when the test flow includes a page with externally loaded (iframe) content (in my case, Google reCAPTCHA). CI runs are more prone to this issue, but I was able to reproduce it a few times locally inside a Docker container.

Simple test code that sometimes fails:

describe('test_1', () => {
    it('Test test', () => {
        cy.visit('/login');// visit page that has reCaptcha loaded
        cy.get('img.deskop__logo').click();// click on main logo, get back to homepage
    });
});

Cypress Version

12.17.0

Node version

18.16.0

Operating System

Docker image: cypress/browsers:node-18.16.0-chrome-113.0.5672.92-1-ff-113.0-edge-113.0.1774.35-1

Debug Logs

Last few lines of good run (exited correctly):
  cypress:launcher:browsers chrome stderr: [0711/114213.126568:ERROR:nacl_helper_linux.cc(355)] NaCl helper process running without a sandbox!
Most likely you need to configure your SUID sandbox correctly +831ms
  cypress:launcher:browsers chrome exited: { code: 0, signal: null } +1ms
  cypress:server:preprocessor removeFile /tests/files/test.ts +13s
  cypress:server:preprocessor base emitter plugin close event +0ms
  cypress:server:preprocessor base emitter native close event +2ms
  cypress:server:preprocessor base emitter native close event +0ms
  cypress:server:browsers:chrome closing remote interface client +416ms
  cypress:server:cypress about to exit with code 0 +19s
  cypress:webpack close /tests/files/test.ts +8s
  cypress:server:browsers browsers.kill called with no active instance +15s
  cypress:proxy:http:util:prerequests metrics: { browserPreRequestsReceived: 108, proxyRequestsReceived: 84, immediatelyMatchedRequests: 29, unmatchedRequests: 6, unmatchedPreRequests: 0 } +0ms
  cypress:cli child event fired { event: 'exit', code: 0, signal: null } +20s
  cypress:cli Stopping Xvfb +21s
  cypress:cli child event fired { event: 'close', code: 0, signal: null } +5ms

Last few lines of bad run (hangs):
  cypress:launcher:browsers chrome exited: { code: 0, signal: null } +1s
  cypress:server:preprocessor removeFile /tests/files/test.ts +18s
  cypress:server:preprocessor base emitter plugin close event +0ms
  cypress:server:preprocessor base emitter native close event +1ms
  cypress:server:preprocessor base emitter native close event +0ms
  cypress:server:browsers:chrome closing remote interface client +10ms
  cypress:webpack close /tests/files/test.ts +11s
  cypress:launcher:browsers chrome stderr: [0711/114246.895419:ERROR:nacl_helper_linux.cc(355)] NaCl helper process running without a sandbox!
Most likely you need to configure your SUID sandbox correctly +4ms
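The difference between the two excerpts above can be checked mechanically. The following is a minimal sketch, assuming the marker strings seen in these logs stay stable; `classifyRun` is a hypothetical helper, not part of Cypress, and its result only means something once the log output has stopped growing.

```javascript
// Hypothetical helper: classify the tail of a DEBUG=cypress:* log.
// The marker strings are copied from the two excerpts above. They are not a
// documented Cypress interface and may differ between versions.
function classifyRun(logText) {
  const lines = logText.split('\n');
  const has = (marker) => lines.some((line) => line.includes(marker));

  const chromeExited = has('cypress:launcher:browsers chrome exited:');
  const serverExiting = has('cypress:server:cypress about to exit with code');
  const cliSawExit = has("cypress:cli child event fired { event: 'exit'");

  // Good run: the server announces its exit and the CLI sees the child exit.
  if (serverExiting && cliSawExit) return 'clean-exit';
  // Bad run: Chrome exited, but the server never reached its exit path.
  if (chromeExited) return 'possible-hang';
  return 'unknown';
}
```

A CI wrapper could call this on the captured log after a period of inactivity to tell this hang apart from other failures.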

Other

No response

@jennifer-shehane
Member

jennifer-shehane commented Jul 12, 2023

@maciejmrozinski headless=new is what Cypress has run by default since 12.15.0, so you don't need to pass the workaround to get the new headless mode. Does turning off headless=new resolve your issue? Try the code below and let us know.

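setupNodeEvents: function setupNodeEvents(on, config) {
  on('before:browser:launch', (browser = {}, launchOptions) => {
    // Only touch headless Chrome; other browsers keep their default args.
    if (browser.name === 'chrome' && browser.isHeadless) {
      // Plain --headless is intended to select Chrome's old headless mode.
      launchOptions.args.push('--headless');
    }
    return launchOptions;
  });
}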

@maciejmrozinski
Author

Yes, I've already tested this, and using the old headless mode resolves the issue.

@jennifer-shehane
Member

@maciejmrozinski We'll need a test to run to reproduce this behavior fully. We haven't observed this behavior on our test suite, so there is something particular about your run that's causing the hang that we'd need to be able to run so we can track down the issue.

@mikejav

mikejav commented Jul 20, 2023

It's also applicable to my project.
Thx guys for mentioning --headless=old. It works like a charm.

@tiehfood

tiehfood commented Aug 1, 2023

We have the same issue, will there be a fix in the future?

@vire

vire commented Aug 11, 2023

I have this issue as well on CI (CircleCI) I get yarn exited with code 1 🤷

When I try cypress run --record false --browser chrome --headless=old
with config:

> yarn cypress --version
Cypress package version: 12.17.3
Cypress binary version: 12.17.3
Electron version: 21.0.0
Bundled Node version: 16.16.0

I get
image

@cheuk0324

Is there an update on this issue?

@MikeMcC399
Contributor

MikeMcC399 commented Sep 15, 2023

@vire

I have this issue as well on CI (CircleCI) I get yarn exited with code 1 🤷

When I try cypress run --record false --browser chrome --headless=old with config:

--headless=old is not to be used as a CLI parameter for Cypress. It is a command line flag for Google Chrome. The way to pass this argument is described in #27264 (comment). The CLI argument for Cypress is --headless. (See Cypress Guides > Command Line > Options).

See also Google Chrome Developer > Try out the new Headless

  • Be aware however that there is a new issue with Google Chrome 117 which stops this working. See Disable --headless=new for Chrome - fails with Chrome 117 cypress-documentation#5483 for details. The Chrome arguments --headless and --headless=old are supposed to be equivalent, and this worked from Google Chrome 112 through 116. In Chrome 117 there is a bug which crashes Chrome when these arguments are passed.
    Edit: Fixed in Google Chrome 117.0.5938.132.
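Since the flag belongs to Chrome rather than to the Cypress CLI, it has to be set on the browser launch arguments. Here is a minimal sketch of one way to do that without leaving two conflicting headless switches in the list. `forceHeadlessMode` is a hypothetical helper, and whether Cypress has already placed `--headless=new` in `launchOptions.args` is an assumption, so the helper handles both cases.

```javascript
// Sketch only: forceHeadlessMode is a hypothetical helper, not a Cypress API.
// It returns a new argument list with exactly one --headless=<mode> switch.
function forceHeadlessMode(args, mode = 'old') {
  // Drop any existing --headless or --headless=<value> switch so the list
  // never carries two conflicting values.
  const kept = args.filter(
    (arg) => arg !== '--headless' && !arg.startsWith('--headless=')
  );
  kept.push(`--headless=${mode}`);
  return kept;
}

// Intended use inside the Cypress config (shown as a comment, not executed):
//
// setupNodeEvents(on, config) {
//   on('before:browser:launch', (browser = {}, launchOptions) => {
//     if (browser.name === 'chrome' && browser.isHeadless) {
//       launchOptions.args = forceHeadlessMode(launchOptions.args);
//     }
//     return launchOptions;
//   });
// }
```

Because of the Chrome 117 crash described above, it may be worth applying this only on Chrome versions where the old mode is known to work.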

@bericp1

bericp1 commented Sep 15, 2023

We're also encountering this issue with our cypress runs in CI.

We were using the --headless=old workaround but as mentioned that's broken in latest chrome. Removing that argument from browser launch options brought back this hanging issue for us.

We use parallelization via DeploySentinel. You can see that one of the parallel runs in this example (#5) drops to 0% CPU and hangs, producing no output, and is eventually cancelled by CircleCI due to inactivity.

I'll enable verbose debug logging on the process handler and the Chrome browser launch to capture Chrome's stderr, and report back if that reveals anything.

@jennifer-shehane
Member

We need a reproducible example provided so that we can narrow down the cause of hanging for some users with the new headless behavior. Please can someone provide one.

@bericp1

bericp1 commented Sep 19, 2023

The problem is that it's not reproducible, it's seemingly random. We're not sure what triggers it.

@tiehfood

We also still have this issue. Currently no clue on how to reproduce it, as it occurs randomly

@cheuk0324

This issue is random. It seems to hang on one of the longest tests or on the last instance. However, it appears to have been resolved after we upgraded to the latest version.

@bericp1

bericp1 commented Sep 22, 2023

@jennifer-shehane we don't have a reproducible example but we did manage to capture debug logs for a run that this happened to. See attached file.

circleci.com_api_v1.1_project_github_nursefly_nursefly-web_482932_output_120_4_file=true&allocation-id=650cecab5769582ef4129e07-4-build%2FABCDEFGH.txt

This is with the following:

DEBUG=cypress:server:util:process_profiler,cypress:launcher:browsers

The logs that look promising there to me are shortly before the tests just hang we get:

  cypress:launcher:browsers chrome exited: { code: 0, signal: null } +3s
  cypress:launcher:browsers chrome stderr: [0922/014838.655632:ERROR:nacl_helper_linux.cc(354)] NaCl helper process running without a sandbox!

And then the chrome process is missing from the process list that immediately follows. It pops up again later but maybe cypress isn't reconnecting to the new chrome process and hence the hanging?

@jsotelo

jsotelo commented Sep 23, 2023

--headless=old also worked for us.

We did notice that peak memory usage was around 3.5 GB with new, whereas old uses about 2.4 GB. Our github actions runner has a 4GB resource limit. Perhaps chrome is running out of memory and is making cypress hang (pure guess).

We are using the following github actions config:

  test:
    runs-on: [self-hosted, prod]
    container:
      image: cypress/included:cypress-13.2.0-node-20.6.1-chrome-116.0.5845.187-1-ff-117.0-edge-116.0.1938.76-1
      options: --ipc=host

and the following cypress.config.ts:

  e2e: {
    setupNodeEvents(_on, _config) {
      _on('before:browser:launch', (browser, launchOptions) => {
        if (browser.name === 'chrome') {
          launchOptions.args.push('--disable-dev-shm-usage');
          launchOptions.args.push('--headless=old');
        }
        console.log(launchOptions.args);
        return launchOptions;
      });
    },

@PavanGurram-DevOps

Hi there, I'm also facing the same issue with Cypress version 12.7.4 and Chrome version 112 when trying to execute the tests in parallel. I have tried the '--headless=old' workaround above, but no luck.

Please can someone help? Thanks

@cheuk0324

Our issue resolved itself after upgrading to v13.

@PavanGurram-DevOps

Unfortunately, I can't upgrade to v13, so I need a workaround, please.

@jennifer-shehane
Member

We have observed a slowdown in performance for one project when using headless=new. We're still interested in having examples that show this behavior so that we can narrow down the issue. We suspect there's likely a bug in Chrome headless, but it's specific to some situation.

@joergschiller

joergschiller commented Dec 11, 2023

Not sure if it's really helpful but it could support the hypothesis that it's a bug in Chrome headless.

We're having the same issue: Chrome in the new headless mode just hangs randomly after running all tests. But with a completely different stack: we're on Ruby and using RSpec/Capybara.

@Javediqbal2

@jennifer-shehane I'm facing the same issue with the Electron browser. Cypress tests sometimes hang at the end and sometimes before starting. Cypress Cloud shows "This spec does not have any test results because it timed out". I've seen this from Cypress 12.13.0 to 12.17.0, and for some people it hangs in Firefox too. I'm mentioning this issue for reference:

cypress-io/github-action#620

@jennifer-shehane
Member

Is this still occurring for people? We haven't had comments for a couple of months.

@pirate

pirate commented Feb 28, 2024

Yes, myself and other users of my project are still seeing headless chrome randomly hang before exit, even when it's run directly via CLI outside of cypress. I'm almost positive it's an upstream chrome bug. Rebooting often fixes it, waiting an hour and trying again sometimes fixes it, force-reinstalling chrome also often fixes it, which puts it squarely into heisenbug territory.

Related:

@jennifer-shehane
Member

@pirate What version of Chrome are you using? Have you tried updating?

@pirate

pirate commented Feb 29, 2024

This issue has been present as far back as v60 but got much worse in v112 (when we switched to the new headless=new), and has persisted all the way up to v121.0.6167.57 and beyond with some versions worse than others.

It's intermittent and hard to verify sometimes, so many issues I've found about it on related projects have gotten closed as "cannot reproduce". I've just confirmed it's happening particularly consistently with v121 though, but I still can't figure out why or when, as sometimes weird things like rebooting make it go away. I can post back here as I collect more reports on the latest versions.

There are also widespread reports of similar issues with the two telltale symptoms:

  • chrome headless seems to hang indefinitely on exit sometimes
  • my [insert headless driver here] appears to have a memory leak (caused by chrome child processes hanging on exit and not releasing their memory)

Possibly related reports:

It's possible some of these issues ^ are unrelated, but it's also possible they all stem from the same underlying issue of child chromium processes not exiting correctly.

The problem is widespread enough that many of the tools that use chrome headless have implemented hacky workarounds like this: https://devforth.io/blog/how-to-simply-workaround-ram-leaking-libraries-like-puppeteer-universal-way-to-fix-ram-leaks-once-and-forever/ (spawning chrome under a child process then doing killasgroup -9 after every run)
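The general shape of that workaround can be sketched in Node as follows. This is an illustration under stated assumptions, not the linked article's code: `runWithGroupTimeout` is a hypothetical name, the approach is POSIX-only, and it relies on `detached: true` making the child a process-group leader so that a negative pid addresses the whole group.

```javascript
const { spawn } = require('node:child_process');

// Hypothetical helper: run a command in its own process group and SIGKILL
// the whole group if it has not exited within timeoutMs. POSIX only.
function runWithGroupTimeout(command, args, timeoutMs) {
  return new Promise((resolve, reject) => {
    // detached: true makes the child a process-group leader on POSIX, so its
    // pid doubles as the group id for any helper processes it forks.
    const child = spawn(command, args, { detached: true, stdio: 'ignore' });
    let timedOut = false;

    const timer = setTimeout(() => {
      timedOut = true;
      try {
        // A negative pid sends the signal to every process in the group.
        process.kill(-child.pid, 'SIGKILL');
      } catch (err) {
        // ESRCH means the group exited between the timeout and the kill.
        if (err.code !== 'ESRCH') reject(err);
      }
    }, timeoutMs);

    child.on('error', (err) => {
      clearTimeout(timer);
      reject(err);
    });
    child.on('exit', (code, signal) => {
      clearTimeout(timer);
      resolve({ code, signal, timedOut });
    });
  });
}
```

A caller would pass the browser binary and its flags as `command` and `args`, then treat `timedOut: true` as a hang rather than a test failure.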


I tried again just for fun and managed to reproduce this on the first try!

I didn't even add any of the extra args we usually use (--disable-gpu, --no-sandbox, --disable-features=dbus, etc.), it hung immediately on the first try with only --headless=new and --screenshot!

Screenshot 2024-02-29 at 5 30 24 PM

This dispelled the last of my doubts, I think this is 100% an upstream Chromium bug and has nothing to do with Cypress/Playwright/Puppeteer/ArchiveBox/any driver.

I just opened an upstream bug report on the Chromium bug tracker, follow over there for progress: https://issues.chromium.org/issues/327583144 👾

htop

profiling1profiling2

@jennifer-shehane
Member

@pirate Thanks for the detailed writeup and opening an issue with chromium. We'll take a look. It is extremely difficult to track down with all the variables involved as you explained. Is there a way to provide the project you're running where you got it to hang immediately?

@pirate

pirate commented Mar 2, 2024

It's not in any project, it's just raw chromium headless from the command line, no extra env vars, hidden CLI flags, or profile directory provided:

# using chromium downloaded via puppeteer
# recommended by: https://www.chromium.org/getting-involved/download-chromium/
$ npx @puppeteer/browsers install chrome@121.0.6167.57
$ ~/chrome/mac_arm-121.0.6167.57/chrome-mac-arm64/Google\ Chrome\ for\ Testing.app/Contents/MacOS/Google\ Chrome\ for\ Testing --headless=new --screenshot 'https://example.com'
[63086:259:0301/184829.306692:ERROR:policy_logger.cc(156)] :components/enterprise/browser/controller/chrome_browser_cloud_management_controller.cc(161) Cloud management controller initialization aborted as CBCM is not enabled. Please use the `--enable-chrome-browser-cloud-management` command line flag to enable it if you are not using the official Google Chrome build.
72602 bytes written to file screenshot.png
# ... hangs indefinitely ...
^C⏎
[138.101s]

# OR equivalent using playwright's chromium
$ pip install --upgrade playwright
$ playwright install --with-deps chromium
$ ~/Library/Caches/ms-playwright/chromium-1097/chrome-mac/Chromium.app/Contents/MacOS/Chromium --headless=new --screenshot 'https://example.com'
[63478:259:0301/185212.347544:ERROR:policy_logger.cc(156)] :components/enterprise/browser/controller/chrome_browser_cloud_management_controller.cc(161) Cloud management controller initialization aborted as CBCM is not enabled. Please use the `--enable-chrome-browser-cloud-management` command line flag to enable it if you are not using the official Google Chrome build.
72602 bytes written to file screenshot.png
# ... hangs indefinitely ...
^C⏎
[241.309s]

both_puppeteer_and_playwright_chrome_hanging

Hung on the first try for both methods. Other versions besides 121.0.6167.57 do it too, but this one appears to do it particularly consistently on my machine. I can also reproduce this on freshly installed Ubuntu 22.04, and on both x86 and arm64 machines with both macOS and Linux.

@cypress-app-bot
Collaborator

This issue has not had any activity in 180 days. Cypress evolves quickly and the reported behavior should be tested on the latest version of Cypress to verify the behavior is still occurring. It will be closed in 14 days if no updates are provided.

@cypress-app-bot cypress-app-bot added the stale no activity on this issue for a long period label Sep 15, 2024
@cypress-app-bot
Collaborator

This issue has been closed due to inactivity.

@cypress-app-bot cypress-app-bot closed this as not planned Won't fix, can't repro, duplicate, stale Sep 29, 2024