Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve XmlSerializationWriter.WriteTypedPrimitive #76436

Conversation

TrayanZapryanov
Copy link
Contributor

Fixes 76434

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Sep 30, 2022
@ghost
Copy link

ghost commented Sep 30, 2022

Tagging subscribers to this area: @dotnet/area-system-xml
See info in area-owners.md if you want to be subscribed.

Issue Details

Fixes 76434

Author: TrayanZapryanov
Assignees: -
Labels:

area-System.Xml

Milestone: -

@TrayanZapryanov
Copy link
Contributor Author

TrayanZapryanov commented Sep 30, 2022

I've created following benchmark to validate benefits:

[BenchmarkCategory(Categories.Libraries)]
public class Perf_XmlSerializationWriter
{
private const int Iterations = 1000;

private readonly MyXmlSerializationWriter _writer = new MyXmlSerializationWriter();

private static readonly DateTime Now = new DateTime(2022, 9, 30, 9, 4, 15, DateTimeKind.Utc);
private static readonly DateTimeOffset DtoNow = Now.AddDays(1);
private static readonly TimeSpan Ts = new TimeSpan(1, 2, 3, 4, 5);
private static readonly byte[] BArray = new byte[] { 33, 44, 55 };

[IterationSetup]
public void CleanWriter() => _writer.Clean();

[Benchmark(OperationsPerInvoke = Iterations)]
public int AddPrimitives()
{
	for (int i = 0; i < Iterations; i++)
	{
		_writer.Write('a');
		_writer.Write(123);
		_writer.Write(123.45m);
		_writer.Write(Now);
		_writer.Write(Ts);
		_writer.Write(DtoNow);
		_writer.Write((short)55);
		_writer.Write(2345324L);
		_writer.Write((sbyte)11);
		_writer.Write((ushort)34);
		_writer.Write((uint)4564);
		_writer.Write((ulong)456734767);
		_writer.Write((byte) 67);
		_writer.Write(BArray);
		_writer.Write(Guid.NewGuid());
	}

	return _writer.GetXmlLength();
}
}

internal class MyXmlSerializationWriter : XmlSerializationWriter
{
private readonly StringBuilder _builder = new StringBuilder();

public MyXmlSerializationWriter()
{
	Writer = new XmlTextWriter(new StringWriter(_builder));
}

protected override void InitCallbacks()
{
}

public void Write<T>(T value) => WriteTypedPrimitive(null, null, value, false);

public void Clean() => _builder.Clear();
public string GetXml() => _builder.ToString();
public int GetXmlLength() => _builder.Length;
}

And results from it:

Runtime = .NET 7.0.0 (7.0.22.42610), X64 RyuJIT AVX2; GC = Concurrent Workstation
Mean = 6.098 us, StdErr = 0.026 us (0.43%), N = 14, StdDev = 0.098 us
Min = 6.008 us, Q1 = 6.042 us, Median = 6.059 us, Q3 = 6.136 us, Max = 6.382 us
IQR = 0.093 us, LowerFence = 5.902 us, UpperFence = 6.276 us
ConfidenceInterval = [5.988 us; 6.209 us] (CI 99.9%), Margin = 0.111 us (1.81% of Mean)
Skewness = 1.62, Kurtosis = 5.18, MValue = 2
-------------------- Histogram --------------------
[5.991 us ; 6.220 us) | @@@@@@@@@@@@@
[6.220 us ; 6.435 us) | @

// * Summary *

BenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 10 (10.0.19043.2006/21H1/May2021Update)
11th Gen Intel Core i9-11900K 3.50GHz, 1 CPU, 16 logical and 8 physical cores
.NET SDK=7.0.100-rc.1.22431.12
[Host] : .NET 7.0.0 (7.0.22.42610), X64 RyuJIT AVX2
Job-ACUTDB : .NET 7.0.0 (7.0.22.42610), X64 RyuJIT AVX2

PowerPlanMode=00000000-0000-0000-0000-000000000000 InvocationCount=1 IterationTime=250.0000 ms
MaxIterationCount=20 MinIterationCount=15 UnrollFactor=1
WarmupCount=1

Before
| Method | Mean | Error | StdDev | Median | Min | Max | Allocated |
|-------------- |---------:|----------:|----------:|---------:|---------:|---------:|----------:|
| AddPrimitives | 6.424 us | 0.0916 us | 0.0812 us | 6.420 us | 6.312 us | 6.581 us | 987 B |

After
| Method | Mean | Error | StdDev | Median | Min | Max | Allocated |
|-------------- |---------:|---------:|---------:|---------:|---------:|---------:|----------:|
| AddPrimitives | 8.160 us | 3.973 us | 4.575 us | 11.67 us | 2.127 us | 12.29 us | 363 B |

Unfortunately I see small regression here :(

The problem looks more into benchmark - once I've increase iterations(to 10k) I can see different results:
Before

Method Mean Error StdDev Median Min Max Gen 0 Allocated
AddPrimitives 2.032 us 0.0286 us 0.0223 us 2.040 us 1.985 us 2.061 us 0.1000 984 B

After

Method Mean Error StdDev Median Min Max Allocated
AddPrimitives 1.960 us 0.0249 us 0.0194 us 1.954 us 1.935 us 2.007 us 360 B

@TrayanZapryanov
Copy link
Contributor Author

@eiriktsarpalis , @krwq Any thoughts here ?

@krwq
Copy link
Member

krwq commented Oct 24, 2022

@TrayanZapryanov really sorry for delay, I've been quite busy in the last weeks

@TrayanZapryanov
Copy link
Contributor Author

@krwq Can you check again?
Also I have one question about WriteRaw(char[]) or WriteChars().

@krwq
Copy link
Member

krwq commented Jan 19, 2023

cc: @HongGit


char[] buffer = _primitivesBuffer ??= new char[128];
switch (Type.GetTypeCode(t))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would ArrayPool<char>.Shared.Rent(128); and returning at the end be better? I think this class can be used by multiple threads simultaneously if you have more than 1 thread serializaing concurrently using the same serializer instance. This shared buffer will become corrupted in that case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mconnew You are right for synchronization problems and I followed your suggestion and used ArrayPool.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mconnew Unfortunately this renting decreased performance even worse than before. Here are samples using benchmark from here : dotnet/performance#2623

Before:

Method Mean Error StdDev Median Min Max Gen0 Allocated
AddPrimitives 1.878 us 0.0175 us 0.0163 us 1.876 us 1.857 us 1.908 us 0.0900 792 B

After:

Method Mean Error StdDev Median Min Max Gen0 Allocated
AddPrimitives 2.192 μs 0.0433 μs 0.0384 μs 2.177 μs 2.155 μs 2.275 μs 0.0400 360 B

There is definitely win in memory, but regression in execution.
I've looked at where XmlSerializationWriter is instantiated and found only here :

public void Serialize(XmlWriter xmlWriter, object? o, XmlSerializerNamespaces? namespaces, string? encodingStyle, string? id)
.
Looks like it always creates new instance and there is method "Init" which also implies that this class have state and cannot be used concurrently.
Do you remember where this class is used concurrently ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same benchmark, but with previous implementation

Method Mean Error StdDev Median Min Max Gen0 Allocated
AddPrimitives 1.906 us 0.0209 us 0.0185 us 1.908 us 1.855 us 1.936 us 0.0400 360 B

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When using a pre-generated serializer, it provides an instance derived from XmlSerializationWriter which I think is cached and reused with multiple calls to Serialize. There's also the reflection based serialization used when it's using SOAP mapping which might not use a new instance each time. The allocation is small, what's the cost of allocating it fresh each time? You could also implement it in such a way that it's optimal for single usage. Eg have a byte[] field and use Interlocked.Exchange replacing it with null to "rent" the buffer. If another thread is using it, you will get a null back from the exchange so you create a new array locally. When you are done with it, use Interlocked.Exchange to put it back. This avoids the overhead of renting, will have zero allocation in the common case, and degrades to something no worse than it was originally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mconnew I like your idea and pushed new commit with it. Could you check again ?
Also results now are better then before:

Method Mean Error StdDev Median Min Max Gen0 Allocated
AddPrimitives 1.969 us 0.0351 us 0.0293 us 1.962 us 1.942 us 2.047 us 0.0400 360 B

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mconnew One more question. I create char[128] which might be too big. Any suggestion here? When I execute tests from Performance repo I see increase of bytes allocated(maximum 8), but I really hoped to see some reduction :(.
Should I increase code complexity by starting with lower buffer like char[64] and then to inspect TryFormat methods and retry with bigger buffer, or we can leave this regression for small number of primitives, and have bigger wins when xml produces is much bigger ?
Here are results from XML performance comparison tests:

Statistics

Total: 68
Same: 75.00 %
Slower: 5.88 %
Faster: 10.29 %
Noise: 8.82 %
Unknown: 0.00 %

Statistics per Architecture

Architecture Same Slower Faster Noise Unknown
X64 75.00 % 5.88 % 10.29 % 8.82 % 0.00 %

Statistics per Operating System

Operating System Same Slower Faster Noise Unknown
Windows 10 75.00 % 5.88 % 10.29 % 8.82 % 0.00 %

Statistics per Namespace

Namespace Same Slower Faster Noise Unknown
MicroBenchmarks.Serializers 78.12 % 9.38 % 12.50 % 0.00 % 0.00 %
Microsoft.Extensions.Configuration.Xml 75.00 % 0.00 % 25.00 % 0.00 % 0.00 %
System.Xml.Linq 73.68 % 0.00 % 0.00 % 26.32 % 0.00 %
System.Xml.Tests 66.67 % 16.67 % 16.67 % 0.00 % 0.00 %
XmlDocumentTests.XmlDocumentTests 66.67 % 0.00 % 33.33 % 0.00 % 0.00 %
XmlDocumentTests.XmlNodeListTests 100.00 % 0.00 % 0.00 % 0.00 % 0.00 %
XmlDocumentTests.XmlNodeTests 50.00 % 0.00 % 0.00 % 50.00 % 0.00 %

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have thought of a way to eliminate the allocation. You could use stackalloc to allocate the memory on the stack. The only problem with this is that XmlWriter needs a char[] passed which means you can't assign the stack allocated memory to a Span<char> initially, we need it as an array. marking the method as unsafe. WriteTypedPrimitive is a protected method on an existing public type. I don't know if adding the unsafe keyword is a breaking change. This is easily solved by moving the implementation to an inner private method which is marked unsafe and calling that from this method. So something like this:

protected void WriteTypedPrimitive(string? name, string? ns, object o, bool xsiType)
{
    UnsafeWriteTypedPrimitive(name, ns, o, xsiType);
}

private void UnsafeWriteTypedPrimitive(string? name, string? ns, object o, bool xsiType)
{
    char* buffer = stackalloc char[128];
    Span<char> bufferSpan = buffer;
    // rest of implementation
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mconnew Unfortunately I am not so good with pointer math.
How can I convert char* to char[]?
I cannot see any method of XmlWriter which accepts char*.
In XmlUtf8RawTextWriter I can see that there is iteration using char*, but that's all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been thinking about this some more and realized multiple methods can't be called on different threads concurrently. The class wraps the XmlWriter (stored in the _w field) which means the thread usage must match that of XmlWriter. And as you can only have one thread at a time trying to write to an XmlWriter, only one thread at a time will be calling methods on this class. It's safe to have a single instance of the buffer which exists for the lifetime of this class.
I tried to work out a way to cast a char* to a char[] and there's no good clean way to do it.

@TrayanZapryanov
Copy link
Contributor Author

Code-wise looks good to me but I question if value of this optimization outweights risk of regression and added code complexity. I'll let owners decide if they want to take this. XML part looks correct

@mconnew, @krwq Next changes :

@@ -38,6 +38,9 @@ public abstract class XmlSerializationWriter : XmlSerializationGeneratedCode
private bool _soap12;
private bool _escapeName = true;

//char buffer for serializing primitive values
private char[]? _primitivesBuffer;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @mconnew mentioned, it should be safe to keep one buffer alongside the reference to a single instance of XmlWriter (aka, _w on line 27). Calling into here simultaneously would already be asking for trouble racing to the _w.Write*() calls at the end of the method.

Suggested change
private char[]? _primitivesBuffer;
private char[] _primitivesBuffer = new char[128];

And then use this buffer through the rest of WriteTypedPrimitive without the need for any interlocked exchange.

Copy link
Contributor Author

@TrayanZapryanov TrayanZapryanov Mar 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@StephenMolloy, @mconnew
I will remove it, but have two questions:

  • char[128] might be too big - Do you see any primitive type that cannot fit in char[64] for example?
  • If you check next changes, we will need same buffer in other classes that inherits it. Should we make it protected with this PR or later?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have following stopper if we cannot format some value in this buffer:
image

Maybe this is enough to experiment with 64, or if you prefer I can replace Debug.Assert with throw exception

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay - I set it to char[64]

@ghost ghost added needs-author-action An issue or pull request that requires more info or actions from the author. and removed needs-author-action An issue or pull request that requires more info or actions from the author. labels Mar 23, 2023
@StephenMolloy
Copy link
Member

Test failure appears to be unrelated. #64227

@StephenMolloy StephenMolloy merged commit b79adc5 into dotnet:main Apr 1, 2023
@TrayanZapryanov TrayanZapryanov deleted the improve_xmlserializatiowriter_writetypedprimitive branch April 1, 2023 15:44
@ghost ghost locked as resolved and limited conversation to collaborators May 1, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-Serialization community-contribution Indicates that the PR has been added by a community member
Projects
None yet
4 participants