Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Engine: refactoring BitmapToVideoMem #2190

Merged

Conversation

ericoporto
Copy link
Member

@ericoporto ericoporto commented Oct 19, 2023

Hi, tried a bit of moving code around in BitmapToVideoMem. In my tests, this benchmark below that clears fullscreen bitmap and pushes to memory using an Overlay went from 22.6 FPS using what we currently have on master to 26.9 FPS after the modifications from this PR. It's nothing ground breaking, but still somewhat an improvement. Please verify this improvement.

Test Game: ClearBenchMark.zip
Project: ClearBenchmarkProject.zip

I am not sure about the many commits and how to organize them or if I should merge them in a single one, and also if the code that I got to in the end is acceptable.

Ah, I don't know any 16-bit game, so I didn't test the 16-bit path, it would be nice to get more testing from this branch.

I also noticed that if original source bitmap had a 1 pixel transparent margin, the linear interpolation would not require all checks it has.

@ivan-mogilko
Copy link
Contributor

ivan-mogilko commented Oct 19, 2023

I think it's best to start with turning these function into template, without the commit that splits into 3 functions, because otherwise this pr first does one change, and then completely different change overriding first.
Then apply any further refactor and changes into a single template function; or squash them into one commit with template, since these are changes to 1 function that does a straightforward thing.

EDIT: commit called "adjust videomemimpl pointer conventions" is wrong, because it edits the code to opposite of our convention (our code style convention is to have T *ptr).

Making "linear filtering" a template arg seems like a good idea.

EDIT: there's also another function nearby called BitmapToVideoMemOpaque (i think), its code may be turned into template too probably.

@ericoporto
Copy link
Member Author

ericoporto commented Oct 19, 2023

Thanks, I will adjust and also do templates on the opaque variant too. Did you get some speed up with in the test game too? I was more looking into speed up things but it's nice we get shorter code.

The lastPixelWasTransparent variable weird approach there, if you happen to know a game or test case that makes evident why that is used, I may look into some alternative way to it. (I think it's there to fight something like this but I am not sure). Edit: no alternative fix I tried worked better than what we have or was faster, so keeping what is there for now. :/

@ericoporto ericoporto force-pushed the perf-gfxdriverbase-refact branch 3 times, most recently from fb6ae63 to 7afc34a Compare October 19, 2023 23:01
@ericoporto
Copy link
Member Author

Adjusted the PR commits! (sorry for the few push-forces, but I think it's done now!)

there's also another function nearby called BitmapToVideoMemOpaque (i think), its code may be turned into template too probably.

I also included that too this time!

@ivan-mogilko
Copy link
Contributor

ivan-mogilko commented Oct 20, 2023

The "benchmark" test shows an increase of only about 1 fps for me, from 12.6 to 13.5 (it's rather stable), running with provided config (d3d9, linear filter).
Running with stdscale filter adds ~0.2-0.3 to that.
OpenGL has less fps for some reason, and less difference.

This seem practically negligible improvement, much less then the stats posted in this ticket (22 to 26 fps). This might depend on CPU and other factors maybe.


Ah, I don't know any 16-bit game, so I didn't test the 16-bit path, it would be nice to get more testing from this branch.

Majority of games made before 2010 were 16-bit, they may be found in AGS game database.


The lastPixelWasTransparent variable weird approach there, if you happen to know a game or test case that makes evident why that is used,

I have vague memories of seeing the problems caused by this, but I cannot remember where, and running random game without this variable does not show any immediate problems. Some investigation may be necessary. Maybe it's not just this variable, but whole get_pixel_if_not_transparent section works for this.
Also, it's strange that according to comments this is only required for "linear filtering", but this variable is used regardless of the filter arg.

In regards to optimizing this, if your aim is to avoid if operators where possible, there's this trick of replacing a conditional bool with a factor to use in math, either as a multiplier or a bitwise & factor. For example:

uint32_t lastPixelTrsFactor = 0;

// lastPixelWasTransparent = true; is replaced with -
lastPixelTrsFactor = 0xFFFFFFFF;

// applying this factor is replaced with -
idst[x - 1] = (idst[x - 1] & ~lastPixelTrsFactor) | ((idst[x] & 0x00FFFFFF) & lastPixelTrsFactor);

This means that if factor is set to 0, then the [x-1] pixel keeps full value from itself, and 0 value from [x], and if factor is set to 0xFFFFFFFF, then [x-1] gets 0 from itself and full value from [x].
I cannot predict how much faster this will be (if faster at all).

Unfortunately, it seems that this cannot be applied directly in this code, as you must to test that x > 0 (otherwise this may cause array underflow). Overall, checking for surface bounds result in most conditions there (for the linear filter case), so if that may be restructured somehow maybe it will result in higher speed. But idk if it's worth it.

@ericoporto
Copy link
Member Author

ericoporto commented Oct 20, 2023

The black border issue can be evidenced by making a game in some low resolution (say 320x180) and then having it set to use linear filtering, and run it at a bigger resolution. Then one needs to import assets that are 32-bit, with parts that are fully transparent and parts that aren't fully transparent with some color different than black. Say a color like 0xF000F0FF (pink) now have the fully transparent pixels be 0x00000000. What happens is the linear interpolation from Directx/OpenGL will make the interpolation between all parts (R,G,B,A), meaning R and B in this case will approximate to 0, as the alpha also approaches, resulting in a black border. I will make a project that showcases this issue.

(Ah, this could be fixed in the Editor at import time too Edit: actually it can’t because we use magic pink color for fully transparent pixels, I don't remember if this is replaced at runtime or editor import.)

I have a lot of difficulty comprehending these RGBA maths (or ARGB, I don't remember what video uses), but from what I read online premultiplied alpha offers some fix to this somehow, but I couldn't understand.

I found it weird you did not get as much improvement as I with what is there already. :/

I don't plan to add anything to this PR right now.

@ivan-mogilko
Copy link
Contributor

ivan-mogilko commented Oct 21, 2023

I tested 16-bit and 8-bit games, and things seem to work as before (in terms of visuals). So I will merge this, as this may have a benefit of having a single universal algorithm.

@ivan-mogilko ivan-mogilko merged commit 9159ea3 into adventuregamestudio:master Oct 21, 2023
20 checks passed
@ericoporto ericoporto deleted the perf-gfxdriverbase-refact branch October 21, 2023 14:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants