
[FIRE 35011] Weird patterned extreme CPU usage when using more than 6gb vram on 10g card - WIP Test #66

Closed
wants to merge 5 commits into from

Conversation

minerjr
Contributor

@minerjr minerjr commented Jan 16, 2025

Hi, so I have been looking into this issue, [FIRE-35011](https://jira.firestormviewer.org/browse/FIRE-35011), which I think is related to other users' performance issues that have been submitted.

I created a couple of videos walking through what the issue is and my solution. This is a work in progress, but I wanted to get some feedback on the approach.

[FIRE 35011 Progress - Part 1](https://youtu.be/ulHrdk4Wc8A)
[FIRE 35011 Progress - Part 2](https://youtu.be/NCqOwPDU-9Q)

Showing the CPU not spiking when switching from high memory usage to lowered:
(Screenshot: FIRE-35011-Progress)

The issue stems from a low "VRAM" state, where the bias starts to increase. The system has an initial spike at a bias of 1.5, which is designed to purge the off-screen textures, and another at 2.0, where in
void LLViewerTextureList::updateImageDecodePriority(LLViewerFetchedTexture* imagep, bool flush_images)
the system then forces on-screen textures to downscale as well.

Also, the system uses the bias to determine how many objects to work on: the higher the value, the more textures it works on. Up to 80% of the entire texture pool can be selected for processing in one frame. This is what adds all the fetches when the bias is greater than 2; even textures in camera get downscaled.
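To make that concrete, here is a rough sketch of how a rising bias could translate into a larger slice of the texture pool being touched each frame. This is not the actual viewer code; the constants and the function name are made up for illustration, and gDiscardBias stands in for the viewer's discard bias value.

```cpp
#include <algorithm>
#include <cstddef>

// Illustrative only: a rising discard bias selects a larger fraction of the
// texture pool to be reprocessed each frame, capped at roughly 80%.
float gDiscardBias = 1.0f;  // 1.0 = no memory pressure

std::size_t texturesToProcessThisFrame(std::size_t totalTextures)
{
    float fraction = 0.05f;                          // baseline slice per frame
    if (gDiscardBias > 1.0f)
    {
        fraction += 0.375f * (gDiscardBias - 1.0f);  // grows as the bias climbs
    }
    fraction = std::min(fraction, 0.80f);            // up to ~80% of the whole pool
    return static_cast<std::size_t>(fraction * static_cast<float>(totalTextures));
}
```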

Then the cycle begins. Now that the textures have been deleted, the bias decreases and there is free memory again, and, as in
F32 LLViewerTextureList::updateImagesFetchTextures(F32 max_time),
it does another update with up to 40% of the textures, and they all at once request a higher texture memory size (mDesiredDiscard, where 0 is the maximum resolution and 5 (MAX_DISCARD) is the upper limit, although in OpenGL the numbers run in reverse).

They all suddenly start creating new textures and trying to load from the various caches, which causes an out-of-memory condition; once again the system then flushes them all back down, rinse and repeat.
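For reference, the discard-level convention works roughly like this; a standalone sketch of the convention described above, where each discard level halves each texture dimension (the constant name here is illustrative).

```cpp
#include <algorithm>

// Sketch of the discard-level convention described above: 0 is full
// resolution, 5 is the smallest level kept, and each step halves each
// texture dimension (OpenGL mip numbering runs the other way).
constexpr int MAX_DISCARD_LEVEL = 5;

int dimensionAtDiscard(int fullDim, int discard)
{
    discard = std::clamp(discard, 0, MAX_DISCARD_LEVEL);
    return std::max(fullDim >> discard, 1);
}
// e.g. a 1024x1024 texture requested at discard 2 decodes at 256x256.
```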

I have been trying a few different things, and I came up with using a memory pool for the deleted objects. The biggest performance killer is newing/deleting memory and re-setting up the objects and connections. Instead of deleting like the code does currently, I move the texture to a new mUUIDDeleteMap object, similar to the mUUIDMap already used by the LLViewerTextureList. I then updated the find methods so that any call which requests a texture searches the normal map first and, if the texture is not found there, checks the delete pool; if it is found there, it is re-added to the main texture pool. This in itself had a tremendous performance uplift for my viewer.
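Roughly, the lookup path now works like this. This is a simplified, standalone sketch with stand-in types; the real code uses LLUUID, LLViewerFetchedTexture, and the existing mUUIDMap/mImageList structures, and mUUIDDeleteMap is the new map this change adds.

```cpp
#include <cstdint>
#include <memory>
#include <unordered_map>

// Stand-ins for the viewer's LLUUID / LLViewerFetchedTexture types.
struct Texture { /* pixel data, discard level, callbacks, ... */ };
using TextureId  = std::uint64_t;
using TextureMap = std::unordered_map<TextureId, std::shared_ptr<Texture>>;

struct TextureList
{
    TextureMap mUUIDMap;        // live textures (as today)
    TextureMap mUUIDDeleteMap;  // "deleted" textures parked for possible reuse

    std::shared_ptr<Texture> findImage(TextureId id)
    {
        // 1. Normal path: the texture is still live.
        if (auto it = mUUIDMap.find(id); it != mUUIDMap.end())
        {
            return it->second;
        }
        // 2. New path: the texture was removed under memory pressure; restore
        //    it instead of re-creating it and re-fetching from cache/network.
        if (auto it = mUUIDDeleteMap.find(id); it != mUUIDDeleteMap.end())
        {
            auto tex = it->second;
            mUUIDDeleteMap.erase(it);
            mUUIDMap.emplace(id, tex);  // move back into the live pool
            return tex;
        }
        return nullptr;  // genuinely unknown texture
    }
};
```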

I also made it so that the increased number of checks only happens while the bias is increasing, not all the time.

I also changed it so that textures now have a mini state machine which tracks what happened to them in their lifetime. This can be further used to refine the behavior for deleted memory, handle edge cases, and tell whether a texture was deleted just from regular use or from a memory overage event. I also added a delay before a texture can be updated again, to try to spread out the load and quell the spikes.

When a texture is downscaled, or deleted and brought back, it is delayed in how quickly it will try to up-res, to help with the surge of requests.
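A minimal sketch of the state tracking and the up-res delay; the state names and the cool-down value below are placeholders, not the exact ones in the branch.

```cpp
#include <cstdint>

// Placeholder states for what happened to a texture over its lifetime.
enum class ETextureMemState : std::uint8_t
{
    ACTIVE,           // normal, resident texture
    DOWNSCALED,       // reduced under memory pressure
    DELETED_LOW_MEM,  // parked in the delete pool due to a memory overage
    RESTORED          // pulled back out of the delete pool
};

struct TextureMemRecord
{
    ETextureMemState mState = ETextureMemState::ACTIVE;
    double mLastChangeTime = 0.0;  // seconds, e.g. from the frame timer

    // Only let a texture ask for a higher resolution again after a cool-down,
    // so restored/downscaled textures do not all up-res in the same frame.
    bool canUpRes(double now, double coolDownSeconds = 5.0) const
    {
        if (mState == ETextureMemState::ACTIVE)
        {
            return true;
        }
        return (now - mLastChangeTime) >= coolDownSeconds;
    }
};
```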

I also did a test of faking the amount of RAM I had, doubling the reported available RAM, and it ran well.

- Created a shadow mUUIDMap called mUUIDDeleteMap which contains a key/image pair for any texture that is deleted. When a texture is deleted, it now moves over to this list and is removed from the normal mUUIDMap and mImageList.
- Currently the callbacks are still there, but they are not being called because the texture is no longer in the main lists.
- Added the reduction of calls as the bias falls, and added a flag to turn the new feature on/off.
- Fixed an issue of not disconnecting the textures on shutdown.
- Added a new handler for when memory runs low.
- Stopped deleting textures; instead, deleted textures are put on a separate map which can later be referenced and restored when the same texture request comes back again.
- Added the sFreeVRAMMegabytes = llmax(target - used, 0.001f); fix that LL implemented (see the sketch after this list).
- Added a state object for the LLViewerTexture memory (Fetched and LOD).
- Fixed up the comments to include the full JIRA issue.
- Added some comments on possible further investigations and tests if needed.
- Updated to have more checks around the new code for enabling/disabling the feature.
- Re-added checks on updateFetch to limit the deleted and scaled textures desired.
- Added commented-out code to automatically delete data in the delete UUIDMap, but I want to test out the current system as-is first.
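The free-VRAM clamp mentioned above, as a standalone sketch; the surrounding function and parameter names are stand-ins, and the viewer uses its own llmax macro rather than std::max.

```cpp
#include <algorithm>

static float sFreeVRAMMegabytes = 0.f;

// LL's fix: never let the reported free VRAM go to zero or negative,
// which would otherwise drive the discard bias straight up.
void updateFreeVRAM(float targetMB, float usedMB)
{
    sFreeVRAMMegabytes = std::max(targetMB - usedMB, 0.001f);
}
```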
@beqjanus
Contributor

beqjanus commented Jan 16, 2025

Thanks, I've pulled the change in to my local repo and will test it in the morning.
As mentioned in discussions, we need to think about how this works, as we've escalated this issue to LL, who will doubtless make their own changes (there is one change already in ForeverFPS). I think that having it logically separated and configurable gives us a good opportunity to test them side by side.

@beqjanus
Contributor

I'm not sure that it is working right for me, but it is hard to know, as I never really saw the issues before (not reliably, at least).
Limiting the texture memory never seems to work for me. I've not explored why, or whether that is just me and some rogue setting.

Legacy algorithm (screenshot: FIRE-35011)

This was at Warehouse 21, busy high texture load.

I TP'd home, and the texture memory never dropped, despite it being very low texture and mesh (my home is predominantly a 2008 build with a handful of my newer things around).
(Screenshot: back home - comparatively low load)

(Screenshot: after a restart)

On the current beta (so existing behaviour), the texture usage does revert back to that stable number for that region after a TP; the new version seems to hold on to things.

(Screenshot: current beta version at busy venue)

(Screenshot: current beta version after returning)

As you'd expect, there is more texture fetch activity with the old method/beta, so there is definite improvement in that respect. The memory profile, though, is concerning (or I am misreading it).

Some change, but not working well currently.
@Ansariel
Collaborator

LL is working in the same area in ForeverFPS right now, so it doesn't seem to make much sense to fiddle with this issue in the current master branch. I might even have to revert the entire change outright when merging master into the ForeverFPS merge repo.

@minerjr
Contributor Author

minerjr commented Jan 16, 2025

Yeah, I am going to pull the test; it seems to be not working quite right anyway. Thanks for the feedback.

@minerjr
Contributor Author

minerjr commented Jan 16, 2025

Closing this pull request as it is not performing how I was expecting and needs more work.

@minerjr minerjr closed this Jan 16, 2025