github.com/login flashes the tab to all-grey with some Xorg video drivers #20

Open
martinwguy opened this issue Nov 5, 2020 · 16 comments

@martinwguy
Contributor

Bizarre. On this 32-bit laptop (Acer Aspire 5100), opening github.com/login in Browser flashes the login box, turns the whole tab gray, redisplays the login box, and then goes all gray forever. The same happens on 32-bit Debian buster.

The same live image running on a different machine (which has a 64-bit CPU) works fine, so it's not a 32/64-bit issue.

Theory: It depends on the graphics chip and X driver in use.

The laptop has NVidia G72M graphics and Xorg is selecting the nouveau driver, while the other machine has Radeon HD graphics and is using the modeset driver.

Experiment: Remove xserver-xorg-video-nouveau, restart X and test.

Result: the tab flashes irregularly, roughly once a second, between the Sign In page and a gray background. Clicking the username box puts the cursor there, but anything you type is ignored except that it makes the flashing faster. Xorg is now using the modeset driver, which says in Xorg.0.log:

(II) modeset(0): [DRI2] DRI driver: nouveau
(II) modeset(0): [DRI2] VDPAU driver: nouveau

which is the same DRI2 driver that the nouveau Xorg driver uses. The other machine uses the modeset driver with the r600 DRI and VDPAU driver.
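To compare machines without reading the whole log by eye, the two `[DRI2]` lines can be pulled out programmatically. The following is only a sketch: the function name `dri2_drivers` is made up for illustration, and the regular expression assumes the line layout quoted above (an optional timestamp or `(II)` prefix, then `driver(N): [DRI2] ... driver: name`).

```python
import re

# Matches lines such as "(II) modeset(0): [DRI2] DRI driver: nouveau".
# The colon after "driver" is optional because logs are sometimes quoted without it.
DRI2_LINE = re.compile(
    r"(?P<ddx>\w+)\(\d+\):\s+\[DRI2\]\s+(?P<kind>DRI|VDPAU) driver:?\s+(?P<name>\S+)"
)

def dri2_drivers(log_text):
    """Return {(xorg_driver, kind): dri_driver_name} for each [DRI2] line found."""
    return {
        (m.group("ddx"), m.group("kind")): m.group("name")
        for m in DRI2_LINE.finditer(log_text)
    }

# The two lines quoted in this issue.
sample = """\
(II) modeset(0): [DRI2] DRI driver: nouveau
(II) modeset(0): [DRI2] VDPAU driver: nouveau
"""

for (ddx, kind), name in dri2_drivers(sample).items():
    print(f"{ddx}: {kind} driver is {name}")
```

On a real system the text would come from the Xorg log file, commonly /var/log/Xorg.0.log, though the location varies between distributions and setups.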

Theory: It depends on the nouveau DRI2 driver.

Experiment: Disable the modeset driver, restart X and test. Not so easy, because modeset is part of xserver-xorg-core but we can move /usr/lib/xorg/modules/drivers/modesetting_drv.so to /tmp to disable it.

Result: The github login page now works, and Xorg is selecting the fbdev driver without DRI2.

Conclusion: It's caused by using the nouveau DRI2 module, which exists even when xserver-xorg-video-nouveau is not installed.

Possible fix: remove all accelerated xorg-video drivers (or just nouveau) and disable the modeset driver as above to force everyone to use the fbdev driver. After all, Sugar doesn't benefit much from graphics acceleration as far as I know. Are there video devices that are supported by the other xorg-video drivers but not by the kernel's fbdev?

@martinwguy
Contributor Author

The same happens at the sugarizer Login/New User page. However, if you blindly type your username and password at github, it does let you in, so it's just a visualization issue.

@martinwguy
Contributor Author

Test images, to see if the bug affects your video card too, are under
https://sunjammer.sugarlabs.org/~martinwguy/test
Feedback is welcome to help isolate and fix the bug.

@ghost

ghost commented Nov 8, 2020 via email

@martinwguy
Contributor Author

martinwguy commented Nov 8, 2020 via email

@quozl
Contributor

quozl commented Nov 8, 2020

My guess is by mixing the now unmaintained combination of kernel and X drivers with 32-bit hardware, you've found a bitrot type of regression that unless someone reports or repairs may never be fixed. Rather than iterate through configuration settings, this kind of problem is better solved at its root, by downgrading kernel and X drivers until it comes good, then using git bisect on the kernel or X to find what change broke it. Then engage with the relevant project to try getting it fixed. It isn't likely to be a problem with Sugar or Sugar Live Build per se. Can be verified by running Epiphany inside an X session instead of Browse and Sugar. You might also consider the version of Metacity in use, as it has a compositor option in some versions. Can be verified by running another window manager.

@martinwguy
Contributor Author

Thanks. I've tried with some other WebKit browsers and other Window Managers:

      nouveau/xfce/Epiphany: Works.
      nouveau/xfce/Midori: Works.
      nouveau/xfce/Surf: Works.
      modeset/xfce/Epiphany: Works.
      modeset/xfce/Midori: Works.
      modeset/xfce/Surf: Works.

      nouveau/enlightenment/Epiphany: Works.
      nouveau/enlightenment/Midori: Doesn't load the login page. All white.
      nouveau/enlightenment/Surf: Works.
      modeset/enlightenment/Epiphany: Works.
      modeset/enlightenment/Midori: Flashing between login box and all white.
      modeset/enlightenment/Surf: Works.
      fbdev/enlightenment: The WM hangs as soon as you launch xterm.

      nouveau/sugar/Epiphany: Works.
      nouveau/sugar/Midori: Works.
      nouveau/sugar/Surf: Works.
      nouveau/sugar/Browser: Flashing to all gray.
      modeset/sugar/Browser: Flashing to all gray.
      fbdev/sugar/Browser: Works.

Which all seems fairly random, but the modeset/enlightenment/Midori failure shows that it's not a Sugar-only or Browser-only issue. That leaves the Xorg/kernel DRI2 nouveau driver, depending on how the WM or application uses it. Nasty.
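For anyone who wants to slice the matrix above, here is a minimal sketch that encodes the results as data and lists which drivers, window managers and browsers appear in a failing combination. The names `RESULTS`, `failures`, `failing_values` and `never_failing` are invented for this example. The fbdev/enlightenment row is left out because the WM hung before any browser could be tested.

```python
# (X driver, window manager, browser) -> True if the login page rendered correctly.
RESULTS = {
    ("nouveau", "xfce", "Epiphany"): True,
    ("nouveau", "xfce", "Midori"): True,
    ("nouveau", "xfce", "Surf"): True,
    ("modeset", "xfce", "Epiphany"): True,
    ("modeset", "xfce", "Midori"): True,
    ("modeset", "xfce", "Surf"): True,
    ("nouveau", "enlightenment", "Epiphany"): True,
    ("nouveau", "enlightenment", "Midori"): False,
    ("nouveau", "enlightenment", "Surf"): True,
    ("modeset", "enlightenment", "Epiphany"): True,
    ("modeset", "enlightenment", "Midori"): False,
    ("modeset", "enlightenment", "Surf"): True,
    ("nouveau", "sugar", "Epiphany"): True,
    ("nouveau", "sugar", "Midori"): True,
    ("nouveau", "sugar", "Surf"): True,
    ("nouveau", "sugar", "Browser"): False,
    ("modeset", "sugar", "Browser"): False,
    ("fbdev", "sugar", "Browser"): True,
}

def failures(results):
    """Sorted list of the combinations that did not render correctly."""
    return sorted(combo for combo, ok in results.items() if not ok)

def failing_values(results, position):
    """Distinct values at one position (0=driver, 1=WM, 2=browser) among failures."""
    return {combo[position] for combo in failures(results)}

def never_failing(results, position):
    """Values at one position that were tested but never appear in a failure."""
    tested = {combo[position] for combo in results}
    return tested - failing_values(results, position)

for combo in failures(RESULTS):
    print("fails:", "/".join(combo))
print("drivers never seen failing:", never_failing(RESULTS, 0))
```

Note that fbdev was only tested in one combination with a browser, so "never seen failing" is weak evidence on its own.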

> this kind of problem is better solved at its root, by downgrading kernel and X drivers
> until it comes good, then using git bisect on the kernel or X to find what change broke it.

Yes, but how long do you think that would take? I already have immediate "solutions" to this issue and to the metacity bug (enable only fbdev and build for i386 only), and am trying to assess possible negative impacts of these.
I've never seen a 64-bit box that wouldn't boot a 32-bit system but am told they exist. I'm guessing they are rarer than old 32-bit crates (better info is welcome!). I'm also assuming that the Linux frame buffer, being old, probably works on a wide range of cards, including old ones, and that Sugar can live without DRI2 graphics acceleration.

Getting Debian to update metacity in stable (request already sent; no reply yet) or making Live Build compile that package with the patch... and now - gak! - bisecting Linux back to who-knows-when and maybe also X to isolate a bug in the kernel or X's DRI2 drivers... Sorry, but that's too much!

My objective is to make a working Sugar system that's easy to use for most people on cheap computers, but if that objective recedes into the future faster than I manage to get closer to it, the only thing that makes sense is to give up.

@martinwguy
Contributor Author

OK, now that I've let off steam... ;-)

> You might also consider the version of Metacity in use, as it has a compositor option in some versions.

Can you point me where to look? If we can simply turn off use of DRI2 acceleration, that would be even simpler.

@martinwguy
Contributor Author

OK, I've found

gconftool -g /apps/metacity/general/compositing_manager
dconf read /org/gnome/metacity/compositing-manager
gsettings get org.gnome.metacity compositing-manager

of which the first is an obsolete gnome2 thing, dconf seems to be its replacement and gsettings an interface to dconf.

On a Debian buster installation, setting either of the second two to "false" and restarting X, it works OK.
(If you set either to "true" and then to "false", the bug remains until you log out of Sugar and log back in).
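To check all three settings in one go, the commands listed above can be wrapped in a small script. This is a sketch only: the function name and the `which`/`run` parameters are invented here so the logic can be exercised without the tools installed. It reports `None` for a tool that is not on the PATH, otherwise whatever the tool printed (an empty string if it printed nothing).

```python
import shutil
import subprocess

# The three read-only queries quoted in this comment, as argument lists.
QUERIES = {
    "gconftool": ["gconftool", "-g", "/apps/metacity/general/compositing_manager"],
    "dconf": ["dconf", "read", "/org/gnome/metacity/compositing-manager"],
    "gsettings": ["gsettings", "get", "org.gnome.metacity", "compositing-manager"],
}

def read_compositing_settings(which=shutil.which, run=subprocess.run):
    """Run each query whose tool is installed and collect its output."""
    results = {}
    for tool, argv in QUERIES.items():
        if which(argv[0]) is None:
            results[tool] = None  # tool not installed on this system
            continue
        proc = run(argv, capture_output=True, text=True)
        results[tool] = proc.stdout.strip()
    return results

if __name__ == "__main__":
    for tool, value in read_compositing_settings().items():
        print(f"{tool}: {value!r}")
```

The script only reads the settings. Changing them, and restarting X or logging out of Sugar afterwards, is still a manual step.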

On SLB with either metacity-3.30 or metacity-3.34, only gsettings is installed and it says that compositing-manager is false, but the github login box still flashes twice and disappears.
If I install packages dconf-cli and gconf2, dconf agrees with gsettings and gconftool says that compositing_manager is unset. If I set it explicitly to false and restart X, the github login box shows without flashing, but as soon as you click in the Username field, the tab goes gray and stays that way.

So now, between Debian buster's sucrose and an SLB image built from the same versions of the same packages with, apparently, the same drivers and settings, one can be made to work and the other not. I am out of theories.

But I'm still not happy about feeling the bar being raised each time I find a working solution. I'm not here to be pushed to learn, nor am I here to fix the whole open source world as far upstream as possible.
I'm here to obtain a concrete result that I think would help make Sugar more widely usable, and sugarlabs' support is useful to me only because if I should succeed, the work would benefit worldwide instead of just the few friends I hand out live CDs to.

If other sugarlabs members have other, grander objectives, that conflict with quickly making the Sugar system available to as many people as possible to try easily, then I wish them the best of luck in achieving them.

@quozl
Contributor

quozl commented Nov 10, 2020

Know how you feel. I faced similar escalation issues with AbiWord, GTK, Telepathy, Metacity and other things that Sugar depends on. You're right, it can take a while. Fixing the AbiWord problem for instance took me about three weeks full time, and even then the AbiWord project didn't like how I fixed it and so my effort was partially wasted; though I did get to show them why the flickering problem was happening. I only started trying to fix it because neither Fedora, Ubuntu, nor Debian were willing to work on it, and the bug had been open at AbiWord for months.

I'm not trying to raise the bar. Regarding objectives, I can't speak for other Sugar Labs members, and I'm off the oversight board once this election is called. You may need some more patience. There are so few active contributors. It can take a few weeks to get a response from some of them.

Sugar starts Metacity with an explicit disabling of the compositor. See src/jarabe/main.py.

https://github.com/sugarlabs/sugar/blob/master/src/jarabe/main.py#L187

There's no configuration feature in Sugar to change that; you'd have to change the source to test with the compositor enabled. I've not tested the compositing-manager configuration key, but it does appear in metacity:src/core/prefs.c ... though I don't know what priority it has over the command line option.

If changing the compositor, either way, does have an impact, then that points at a problem between GTK, X, and the video card drivers. Like a missing expose event and redraw. Or the recent changes to GTK default stylesheets.

@martinwguy
Contributor Author

martinwguy commented Nov 11, 2020 via email

@quozl
Contributor

quozl commented Nov 11, 2020

> Aha! So I guess Debian starts sugar in some different way, that leaves it possible to enable/disable it. I still don't have a clue though, as it was disabling it that made it work on pure Debian, so if it's always disabled in Live Sugar that should be the same, no? I dunno.

It could be the same. You can prove it by starting Terminal and using ps axfww to look for the metacity process and the command line options that were given to it.

> Thanks, I may look at that too, but don't understand the problem so I'll probably just make it use fbdev to cut the bull's head off, as they say here, so I can look at the few other, hopefully more tractable, problems. Cheers M

Sure, if you like. It sounds like the problem is specific to the Nvidia hardware. I don't have any of that, so I can't try to fix it further.
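The `ps axfww` check suggested above can be scripted so the two systems are compared the same way. This is a rough sketch: the helper names are invented, the sample `ps` output is hypothetical rather than captured from a real session, and `--composite` / `--no-composite` are assumed to be the metacity command-line switches that control the compositor.

```python
import os
import re

# Strips the tree-drawing prefix ("|   \_ ") that `ps f` puts before child commands.
TREE_PREFIX = re.compile(r"^[ |]*\\_ ")

def metacity_argv(ps_output):
    """Return the argument lists of metacity processes found in `ps axfww` output."""
    found = []
    for line in ps_output.splitlines():
        fields = line.split(None, 4)  # PID TTY STAT TIME COMMAND
        if len(fields) < 5:
            continue
        argv = TREE_PREFIX.sub("", fields[4]).split()
        if argv and os.path.basename(argv[0]) == "metacity":
            found.append(argv)
    return found

def compositor_flags(argv):
    """Return any compositor-related switches present on a metacity command line."""
    return [arg for arg in argv if arg in ("--composite", "--no-composite")]

# Hypothetical sample, for illustration only.
sample = r"""
  PID TTY      STAT   TIME COMMAND
  701 ?        Ssl    0:00 /usr/bin/python3 /usr/bin/sugar-session
  812 ?        Sl     0:01  \_ metacity --no-force-fullscreen --no-composite
"""

for argv in metacity_argv(sample):
    print(" ".join(argv), "->", compositor_flags(argv) or "no compositor switch given")
```

To run it on a live system, the output of `ps axfww` (for example via `subprocess.run(["ps", "axfww"], capture_output=True, text=True).stdout`) would replace `sample`.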

@martinwguy
Contributor Author

Trying to isolate the bug more closely...

Theory: It's caused by use of the DRI2 nouveau driver (as it's unlikely to be the VDPAU one, which speeds video decoding)

Experiment: Remove /usr/lib/i386-linux-gnu/dri/nouveau_dri.so and restart X

Result: Xorg.0.log says NOUVEAU(0): [DRI2] DRI driver nouveau but later AIGLX fails to dlopen that file and says "error: unable to load driver nouveau". No further errors are evident and the bug is gone.

Conclusion: The nouveau Xorg driver works and the bug is caused by using the nouveau DRI driver.
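For anyone repeating this experiment, it may be safer to move the module aside than to delete it, so it can be put back afterwards. A minimal sketch, with made-up helper names `set_aside` and `restore`; on a real system it would need root privileges and X would still have to be restarted by hand.

```python
import shutil
from pathlib import Path

def set_aside(module, backup_dir):
    """Move a driver module into backup_dir and return its new location."""
    module = Path(module)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / module.name
    if target.exists():
        raise FileExistsError(f"{target} already exists; restore it first")
    shutil.move(str(module), str(target))
    return target

def restore(backup, original_dir):
    """Move a set-aside module back into original_dir and return its location."""
    backup = Path(backup)
    target = Path(original_dir) / backup.name
    shutil.move(str(backup), str(target))
    return target
```

For the file named in this experiment the call would look like `set_aside("/usr/lib/i386-linux-gnu/dri/nouveau_dri.so", "/root/disabled-drivers")`, where the backup directory is an arbitrary choice for the example.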

As to whether this is a better solution than ripping out all Xorg drivers except fbdev:

  • Do accelerated chip-specific drivers really make a significant difference to Sugar or its Activities?
  • Are there any video cards that work with chip-specific drivers but not with fbdev?
  • It's known that the accelerated Radeon r600 and Intel i945 DRI drivers work OK, but might there be other chipsets with this problem?

@quozl
Contributor

quozl commented Nov 20, 2020

@martinwguy wrote:

  • Do accelerated chip-specific drivers really make a significant difference to Sugar or its Activities?

Thanks for the question.

Yes, very significant. But the change in performance might be invisible if it is an increase from 10 µs to 5000 µs for a typical drawing operation. Animation is the greatest cost, because it repeats. It is best to consider the question from the angle of GTK and other libraries. Sugar delegates all this to GTK. Activities mostly delegate this to GTK, to Pygame (SDL), to GStreamer, or to the WebKit browser library. If there's doubt, I suggest making performance measurements of Sugar and activity startup time, and animation frame rate.
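To put those example figures in perspective, a quick back-of-the-envelope calculation shows how many drawing operations fit in one animation frame. The 60 frames per second target is an assumption made for this sketch, not a figure from the thread, and `frame_rate` is just one simple way to turn recorded frame timestamps into the suggested frame-rate measurement.

```python
def ops_per_frame(op_time_us, fps=60):
    """How many drawing operations of op_time_us microseconds fit in one frame."""
    frame_budget_us = 1_000_000 / fps
    return int(frame_budget_us // op_time_us)

def frame_rate(timestamps):
    """Average frames per second from frame timestamps given in seconds."""
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps")
    return (len(timestamps) - 1) / (timestamps[-1] - timestamps[0])

for op_us in (10, 5000):
    print(f"{op_us} us per operation: {ops_per_frame(op_us)} operations per frame")
# prints 1666 operations per frame for 10 us, and 3 for 5000 us
```

A single slow redraw goes unnoticed either way, but an animation that needs more than a handful of operations per frame would drop frames at the slower figure.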

@quozl
Contributor

quozl commented Jan 19, 2021

@martinwguy, a recent mailing list thread suggests installing the firmware-linux-nonfree package. This package contains 279 firmware files for Nvidia products. There's also an xserver-xorg-video-nvidia package to try.

@martinwguy
Contributor Author

martinwguy commented Jan 21, 2021 via email

@quozl
Contributor

quozl commented Jan 21, 2021

As you've seen in the mailing list thread, I'm unwilling to use my membership of Sugar Labs to make these packages available for download without reviewing the licenses. If you're happy to do that, I won't stand in your way. It's up to you.
