Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Made ScriptString store length in data header, use to speed up API functions somewhat #2187

Merged

Conversation

ivan-mogilko
Copy link
Contributor

@ivan-mogilko ivan-mogilko commented Oct 17, 2023

Fix #2177 .

Bring ScriptString to the form similar to DynamicArray and UserStruct, where string's meta-info is stored in the same allocated buffer, prepended before the actual text data. This is for general consistency and to be able to retrieve extra info from the received pointer in script functions.

ScriptString's header has two values:

  • string length in bytes;
  • string length in characters, unicode-compatible.

This is because some operations may require either one or another.

EDIT: later added 2 uint16 values which store last requested char index and corresponding char offset. This is done to speed sequential Chars[] iteration up. Because of utf-8 strings the direct access at index is no longer possible, so this seem to be the only possible solution in the absence of a string iterator (this is something left for future ags4 dev). Also see comments in code and in this thread for details.

In all the String's member functions assume that the first argument is the ScriptString, and has this info prepended. Therefore, don't call strlen on arg1, but retrieve precalculated values.
Unfortunately we cannot do the same with the second string arg in functions like String.Append, because we do not know where it came from (it could be a string literal, etc). This cannot be resolved unless we either overhaul how strings are handled in script vm, or provide overloaded methods which accept explicit String argument.


UPDATE

Used following script for a performance test (under spoiler).
bool test_appending;
bool test_iterating;
String test_string;

function room_Load()
{
	test_string = "";
	for (int i = 0; i < 1000; i++)
		test_string = test_string.AppendChar('A');
}

function on_key_press(eKeyCode k)
{
	if (k == eKey1)
		test_appending = !test_appending;
	if (k == eKey2)
		test_iterating = !test_iterating;
}

function noloopcheck room_RepExec()
{
	if (test_appending)
	{
		test_string = "";
		for (int i = 0; i < 1000; i++)
			test_string = test_string.AppendChar('A');
	}
	if (test_iterating)
	{
		for (int c = 0; c < test_string.Length; c++)
		{
			char ch = test_string.Chars[c];
		}
	}
}

After trying couple of variants, the latest fps results look like this:

Test 3.6.1 PR 2187
AppendChar 715 1440-1500*
Iterate 128 470

(*) AppendChar test fps jumps too much, perhaps due to lots of memory reallocations, but later settles at around 1500 fps.
In the end it seems that appending a character in this test became x2 faster and iterating over string x3.6 faster.


UPDATE 2

After adding a iteration speed up fix, the fps results look like:

Test 3.6.1 PR 2187
AppendChar 715 ~1440-1500
Iterate 128 ~1600

@ivan-mogilko ivan-mogilko added context: script api context: performance related to improving program speed or lowering system requirements labels Oct 17, 2023
@ivan-mogilko ivan-mogilko force-pushed the 361--scriptstringperf5 branch 2 times, most recently from d694ea5 to 71bb753 Compare October 18, 2023 00:03
@ivan-mogilko
Copy link
Contributor Author

Hmm, so the situation is ambiguous.

Used following script to test.
bool test_appending;
bool test_iterating;
String test_string;

function room_Load()
{
	test_string = "";
	for (int i = 0; i < 1000; i++)
		test_string = test_string.AppendChar('A');
}

function on_key_press(eKeyCode k)
{
	if (k == eKey1)
		test_appending = !test_appending;
	if (k == eKey2)
		test_iterating = !test_iterating;
}

function noloopcheck room_RepExec()
{
	if (test_appending)
	{
		test_string = "";
		for (int i = 0; i < 1000; i++)
			test_string = test_string.AppendChar('A');
	}
	if (test_iterating)
	{
		for (int c = 0; c < test_string.Length; c++)
		{
			char ch = test_string.Chars[c];
		}
	}
}

Results, in approximate fps:

Test 3.6.1 PR 2187
Append 715 448
Iterate 128 467

Which means that appending a string char by char became about x1.6 slower, but iterating over it x3.6 faster.
The slowness likely comes from precalculating unicode-compatible length, but this is also something that works for iterating things faster (.Length and .Chars[] properties).
I might look if it's feasible to reduce length recalculation, as we might expect that pre-existing String arg is already valid utf-8. Or may we?... got to think about this.

@ivan-mogilko ivan-mogilko force-pushed the 361--scriptstringperf5 branch from 1939a9e to 08aae0b Compare October 20, 2023 22:49
@ivan-mogilko
Copy link
Contributor Author

ivan-mogilko commented Oct 20, 2023

I might look if it's feasible to reduce length recalculation, as we might expect that pre-existing String arg is already valid utf-8. Or may we?... got to think about this.

Hmm, so, quick experiment shows Append test rise to ~1400 fps, x2 speed up compared to 3.6.1. This may be a way to go, only have to reorganize the code...

EDIT: also have some idea about what to do with invalid utf-8.

This potentially allows to access any metadata while passing string pointer around the script functions.

As a consequence: never wrap external buffer, always allocate your own.
Instead of giving out a pointer to ScriptString internals, have a dedicated ScriptString::Buffer struct that already has a buffer allocated according to ScriptString's rules (with meta header).
ScriptString provides a method for creating such buffer, and for creating a new ScriptString object from the Buffer.
This has an advantage of one less memory copy compared to passing a const char* into a factory method, which may make some difference when modifying very long strings.
@ivan-mogilko ivan-mogilko force-pushed the 361--scriptstringperf5 branch from 08aae0b to e2beb30 Compare October 24, 2023 23:39
@ivan-mogilko
Copy link
Contributor Author

ivan-mogilko commented Oct 24, 2023

New variant, which reuses known unicode-compatible lengths of substrings (in Append, AppendChar, etc).

Results, in approximate fps:

Test 3.6.1 PR 2187
AppendChar 715 1440-1500*
Iterate 128 470

(*) AppendChar test fps jumps too much, perhaps due to lots of memory reallocations, but later settles at around 1500 fps.

In the end it seems that appending a character in this test became x2 faster and iterating over string x3.6 faster.

I doubt this may get much faster while having immutable strings which have to be reallocated on Append etc.

In regards to getting a character by index, something that should be kept in mind, this became slower in Unicode mode, as utf-8 does not allow to jump to an index, it has to parse string char by char to find where char N resides. In script this is complicated by not being able to continue from where the last character was found (when iterating in a loop). This likely may be improved only by introducing iterators of some kind (?).

@ivan-mogilko ivan-mogilko marked this pull request as ready for review October 24, 2023 23:58
@ivan-mogilko
Copy link
Contributor Author

ivan-mogilko commented Oct 25, 2023

In regards to getting a character by index, something that should be kept in mind, this became slower in Unicode mode, as utf-8 does not allow to jump to an index, it has to parse string char by char to find where char N resides. In script this is complicated by not being able to continue from where the last character was found (when iterating in a loop). This likely may be improved only by introducing iterators of some kind (?).

Hmm, I did an experiment that saves last found char index and offset in a string's header, and then uses them if the requested index is equal or higher than the saved one.
This sped up Chars[] iteration test up to almost 1600 fps, which is x3.4 faster vs this PR (470 fps), and x12 faster vs original engine (128 fps).
:-/
this is too good to ignore, but at the same time it means another 8 extra bytes per string in memory.
Current PR adds 8 bytes (2 int32 values), so this potential fix would add 8 to a total 16 bytes, which is already comparable with a length of a short meaningful string.

@ivan-mogilko
Copy link
Contributor Author

ivan-mogilko commented Oct 25, 2023

Given the above, I think I might have done a mistake when transitioning to Unicode. It could have been beneficial to leave Chars[] as a ascii-only access for faster accessing, and add another property like UChars (or whatever) for accessing unicode chars.
Of course, normally one should have iterators for parsing strings sequentially char by char, but in the current situation with AGS where it has none above could have been a working solution to keep things faster.

I also wonder how long should the string be on average when this problem becomes noticeable.

I'd like to compare performance between ASCII and Unicode game modes, in both cases: before this PR and after this PR, to see roughly how gains by optimizing storing speed and fast access compare between each other.


UPDATE:

First of all, I found out that there's a mistake in the engine where it deals with some operations, it always uses "unicode" versions of functions, even in ascii mode, which makes operations slower than they could be. That's something that may (should?) be fixed separately.

Now, comparing "fast char access" in ascii mode and char access in unicode mode, the gain is following:

Engine ASCII chars Unicode chars
3.6.1 257 128
PR 2187 1600+ 470

This experiment shows that:

  • A repeating strlen call was a major bottleneck, as soon as it's removed by this PR - other faster operations cause much more effect.
  • Removing this bottleneck alone actually makes iterating utf-8 faster than ascii char access had been prior to unicode support (470 vs 250+ fps). This is a kind of a relief...
  • ASCII iteration result is practically equal to the result of my experiment above where I stored "last accessed" character offset in String's header. But of course that experimental solution is only applicable for sequential iteration, and not a random access. But that's the general problem with utf-8 strings, which cannot be solved, I guess.

@fernewelten
Copy link
Contributor

It could have been beneficial to leave Chars[] as a ascii-only access

Or, we might perhaps add a property RawChars[] that offers fast ascii-only access for those that need it for some reason, and keep Chars[] unicode-enabled as the default case for most situations.

@ivan-mogilko
Copy link
Contributor Author

ivan-mogilko commented Oct 25, 2023

Or, we might perhaps add a property RawChars[] that offers fast ascii-only access for those that need it for some reason, and keep Chars[] unicode-enabled as the default case for most situations.

That too, although I've been thinking about backwards compatibility. But then, all the pre-3.6.0 games are run in ASCII mode, so the API compatibility issue is mostly related to games started in 3.6.0+ and having Unicode mode.

On the other hand, having iterators is still a must, as there are cases when you need to parse a unicode string too (various speech modules, and others which work with "human text"). So this would be only a part of solution.

PS. Updated the previous comment with some test results.

@ivan-mogilko
Copy link
Contributor Author

ivan-mogilko commented Oct 25, 2023

So, to summarize:

  • This PR is already quite good, as it speeds things up.
  • This PR alone already speeds up character iteration rougly by x2 compared to what it's been prior to unicode support and roughly x4 compared to unicode strings in 3.6.0.
  • PR change requires 8 extra bytes per a script String object (length in bytes + precalculated length in unicode chars), compared to 4 bytes in the existing engine.
  • PR changes only affect runtime storage in program memory, game saves are not affected.

Extra:

  • There's an opportunity to speed unicode string sequential iteration by roughly x3.5-x4 more, at a cost of storing additional 8 bytes per String. That's not much on its own, but keep in mind that this cost is paid per every String object, regardless of whether String is being iterated or not at all in script. OTOH this fix would not impose any dependency, or change save format etc, and is easily reversible.
  • BTW, this may be reduced to 4 bytes if it would restrict "last char index & offset" info to uint16 values, that is work within 64k long strings, and get worse performance after 64k characters.
  • The alternative is to stop here and plan String iterator, likely in ags4.

@ericoporto
Copy link
Member

ericoporto commented Oct 25, 2023

at a cost of storing additional 8 bytes per String

When you write this, just to see if I understood, this is not "paid" by string in the game data that are like in a player.Say("this string"); or by something in the old string, this is strictly by String objects that are being hold at runtime, and not for save files containing such objects, right? Also, I now wonder if the FastString type we use in the AGS Editor C# has something like this.

Overall I have no idea when you put this if this is significant memory or not - it feels like it isn't, I don't remember seeing so many String objects being hold for a long period in memory, and as many small String objects as globals.

In any case I don't have a strong opinion, I went to check if my JSON parser could be sped up by this, but turns I don't have an use-case that really requires the speed from here.

The alternative is to stop here and plan String iterator, likely in ags4

That looks fine too, I think by this you mean something that would be more explicit about this sequential nature, I don't know what the API is but it's fine to get it in ags4.

Anyway, this looks ready for merge.

@ivan-mogilko
Copy link
Contributor Author

ivan-mogilko commented Oct 25, 2023

When you write this, just to see if I understood, this is not "paid" by string in the game data that are like in a player.Say("this string"); or by something in the old string, this is strictly by String objects that are being hold at runtime, and not for save files containing such objects, right?

This is correct, all this is relevant strictly to managed String objects, and only in runtime mem.


BTW, on a curious note, the old ags scummvm port (I've been checking it again lately) had every string in the script allocated and passed around as a ScriptString which wrapped a scummvm's own string in itself. This was done for everything including string literals in script mem. Right now it's impossible to do the same for AGS, because of plugin API being able to call script functions, as there's no way for plugins to know what kind of object they received, and how to unwrap it to get text data.
This is a general issue shared with dynamic arrays, a reason why I reverted plans for "wrapping" image buffer in managed array. I suppose this may be fixed as long as plugin API is amended with ways to handle such objects. But this is something to plan for future ags4 versions.

This speeds up utf-8 string iteration by remembering last requested char index and buffer offset. This seems to be the only solution in the absence of proper ScriptString iterators in script API.
Costs 4 extra bytes per managed String object.
I intentionally limit these 2 indexes to uint16_t (64k char/byte) to save bit of mem. On average Strings are not that long, in the worst case things may be sped up by splitting into substrings in the script.

This change does not impose any dependencies, does not change save format, and may be safely reverted anytime.
@ivan-mogilko ivan-mogilko force-pushed the 361--scriptstringperf5 branch from 2b42ffd to 1ae98d9 Compare October 26, 2023 14:28
@ivan-mogilko
Copy link
Contributor Author

ivan-mogilko commented Oct 26, 2023

So, I added a variant of Chars iteration speed up with 4 extra bytes per String (2 bytes for 2 uint16). This limits the speed up fix to 64k bytes, after which iteration becomes slower again.

Added comments in code, explaining why this is done and its limitation.

Updated fps test result:

Test 3.6.1 PR 2187
AppendChar 715 ~1440-1500*
Iterate 128 ~1600

Could someone help testing this PR to make sure that String functions work well?

@ericoporto
Copy link
Member

I can produce a test demo by Saturday using

  • my json script module
  • my tap script module

Essentially, run some unit tests for string correctness, run a performance comparison on my json parser (parse some json many times per frame and measure fps).

I don’t have an existing text heavy, regular AGS game right now.

@ivan-mogilko
Copy link
Contributor Author

ivan-mogilko commented Oct 26, 2023

Hm, I did not know there are unit tests, but been thinking that it would be good to have some "test scripts" repo in ags organization.

@ericoporto
Copy link
Member

ericoporto commented Oct 26, 2023

Oh, that would be great. Löve project has a Test Suite as a separate project, this would help us work separately from main AGS so it can start as a simple project - probably a single AGS game, without CI, something simple we will add things, and at some point it can mature to have automation or not, but right now just something we can explore as a smaller project.

Alternatively it can work also as just as a place to archive the test games we create for the PRs, maybe both.

ivan-mogilko added a commit that referenced this pull request Oct 28, 2023
The purpose is to at least keep string iteration relatively fast in ASCII game mode, for backwards compatibility.
The proper solution would be to use precalculated string length and use iterator, but that's not a task for 3.6.0 patches. See PR #2187 done for 3.6.1.

Also fixed String.Length for plugins in Unicode mode.
@ivan-mogilko ivan-mogilko merged commit c4a26ef into adventuregamestudio:master Oct 31, 2023
20 checks passed
@ivan-mogilko ivan-mogilko deleted the 361--scriptstringperf5 branch October 31, 2023 00:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
context: performance related to improving program speed or lowering system requirements context: script api
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve script's String performance by storing length along with data
3 participants