Performance optimizations for EscapingUtilities#13426
Performance optimizations for EscapingUtilities#13426DustinCampbell wants to merge 28 commits intomainfrom
Conversation
- Convert Fact-based tests to Theory with InlineData - Replace xUnit assertions with Shouldly - Remove empty XML doc comments and #nullable disable - Use expression-bodied test methods
Cover UnescapeAll, Escape, EscapeWithCaching, ContainsEscapedWildcards, and round-trip scenarios with MemoryDiagnoser to establish a performance baseline before optimization.
Many thanks to @JeremyKuhne for providing this BufferScope<T> implementation.
- Make public-facing methods `public` (was `internal`) - Enable nullable reference types; add `[return: NotNullIfNotNull]` to `UnescapeAll` and `Escape` - Use file-scoped namespace - Merge `Escape` + `EscapeWithCaching` into `Escape(string? value, bool cache = false)` - Inline `AppendEscapedChar` and `AppendEscapedString` helpers into `Escape` - Rename field `s_unescapedToEscapedStrings` → `s_escapedStringCache` - Rename parameters to `value` consistently across all public methods - Rename local variables for clarity (`percentIndex`, `startIndex`, `endIndex`, `hi`/`lo`, `specialCharIndex`, etc.) - Use collection expression for `s_charsToEscape`; target-typed `new` for cache dictionary - Use expression bodies for simple members (`HexDigitChar`, `Escape`, etc.) - Use relational pattern matching in `TryDecodeHexDigit`; list patterns in `ContainsEscapedWildcards` - Use `do...while` in `UnescapeAll` since `percentIndex` is already known non-negative on entry - Replace `ch / 0x10` with `ch >> 4` in nibble extraction - Clean up XML doc comments throughout - Remove stale and redundant inline comments
- Address unnecessary allocation when a string contains a '%' but no valid escape sequence. - Add test for edge case when a string contains a '%' but no valid escape sequence and 'trim' is set to true. In that case, the trimmed string should still be returned.
- Add System.Buffers.SearchValues<char> field initialized from s_charsToEscape (#if NET) - Extract IndexOfAnyEscapeChar helper to abstract the platform difference, keeping Escape itself free of #if guards - Fall back to char[] / IndexOfAny on .NET Framework
- All chars in s_charsToEscape fall within the ASCII range ['$' (0x24) .. '@' (0x40)] - Encode membership as a 29-bit uint bitmask indexed by (c - '$') - Replace O(n×k) IndexOfAny array scan with an O(n) range check + bit test per char - Eliminates managed-to-native transition overhead on each IndexOfAnyEscapeChar call - No change on .NET (SearchValues path unchanged)
- Capture the result of the initial IndexOfAnyEscapeChar(value) fast-path check rather than discarding it and repeating the same scan at the start of the loop - Switch from while(true)/break to do...while since specialCharIndex >= 0 is established before entering the loop, eliminating a redundant branch per iteration
Collect all special char positions in a first pass using RefArrayBuilder<int>, compute the exact output length, then write directly into a freshly-allocated string with no intermediate buffer:
- On .NET: string.Create writes directly into the output string without an intermediate buffer
- On .NET Framework: new string('\0', length) + Buffer.MemoryCopy via unsafe fixed pointers for fast, native-speed chunk copies
- Extract cache operations into TryGetFromCache and AddToCache helpers
- Remove StringBuilderCache dependency from Escape entirely
- Compute trim bounds (startIndex/endIndex) before searching for '%' so the initial scan and inner-loop scan skip leading/trailing whitespace - Scope the inner-loop IndexOf to [percentIndex+1, endIndex) to match - Extract GetDefaultResult static local to deduplicate the no-escape- sequences return path (used in both the early-exit and sb-is-null cases)
- Add HexConverter with a 256-entry hex digit table (ReadOnlySpan<byte> on .NET; static byte[] on Framework — the Framework JIT can't elide bounds checks on ReadOnlySpan<byte> like the .NET Core JIT can) - TryDecodeHexDigit delegates to HexConverter.FromChar
e06a889 to
2ff888d
Compare
There was a problem hiding this comment.
Pull request overview
This pull request refactors Microsoft.Build.Shared.EscapingUtilities hot paths (Escape, UnescapeAll) and adds supporting low-allocation infrastructure (pooled buffer + ref-struct array builder) plus benchmarks and expanded unit tests to validate the new behavior and performance characteristics.
Changes:
- Rewrite
EscapingUtilities.EscapeandEscapingUtilities.UnescapeAllto reduce allocations and improve throughput (including new fast paths and cache-aware escape). - Add new low-allocation helper types (
BufferScope<T>,RefArrayBuilder<T>,HexConverter,TypeInfo<T>) plus polyfills needed for newer language/runtime features across TFMs. - Add/expand benchmarks and unit tests for escaping/unescaping and the new infrastructure.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| src/Framework/EscapingUtilities.cs | Rewrites escape/unescape implementations for fewer allocations and faster scanning/encoding. |
| src/Framework/Utilities/BufferScope.cs | Adds pooled-buffer ref struct to support temporary storage with minimal allocations. |
| src/Framework/Collections/RefArrayBuilder.cs | Adds ref-struct builder over BufferScope<T> for efficient index collection / array building. |
| src/Framework/Utilities/HexConverter.cs | Adds lookup-table hex decoding helper used by unescape fast path. |
| src/Framework/Utilities/TypeInfo.cs | Adds type inspection helper used to decide whether to clear pooled arrays on return. |
| src/Framework/Polyfills/UnscopedRefAttribute.cs | Adds polyfill / forwarder for UnscopedRefAttribute across target frameworks. |
| src/Framework/Polyfills/StringExtensions.cs | Adds string null/empty/whitespace extension helpers used by updated code paths. |
| src/MSBuild.Benchmarks/EscapingUtilitiesBenchmark.cs | Adds BenchmarkDotNet coverage for Escape, UnescapeAll, and related helpers. |
| src/Framework.UnitTests/EscapingUtilities_Tests.cs | Expands and modernizes escaping/unescaping tests (Shouldly + theories). |
| src/Framework.UnitTests/BufferScopeTests.cs | Adds test coverage for pooled buffer behavior, growth, pinning, and disposal. |
| src/Framework.UnitTests/RefArrayBuilder_Tests.cs | Adds extensive coverage for builder operations (add/insert/remove/range/etc.). |
| src/Framework.UnitTests/TypeInfoTests.cs | Adds coverage for TypeInfo<T>.IsReferenceOrContainsReferences() behavior and caching. |
| src/Utilities/TaskItem.cs | Updates call site to use Escape(..., cache: true) instead of EscapeWithCaching. |
| src/Shared/TaskParameter.cs | Updates call site to use Escape(..., cache: true) instead of EscapeWithCaching. |
| src/Build/BackEnd/TaskExecutionHost/TaskExecutionHost.cs | Updates call site to use Escape(..., cache: true); removes trailing whitespace. |
| src/Utilities.UnitTests/StringExtensions_Tests.cs | Disambiguates StringExtensions.Replace call site due to new Microsoft.Build.StringExtensions. |
|
Looks great. Trying to better understand the intent here. Did this path show up hot on a perf trace somewhere? |
I added some quick-and-dirty and found that these methods are called a lot. In my tests, I built a significant part of Roslyn and found 3,000,000+ invocations of these methods. I'm definitely intending to reduce the number of times these are called, but I decided to improve the performance of the methods themselves as a first step. |
Ensure that Insert(...) only takes the fast path if "_count < _scope.Length" to potentially shifting past the end of the scope. Added tests to cover this issue, which is more likely to appear with a stack-allocated buffer is used, since a rented array might actually be larger than the requested minimum length.
|
Sweet. Are these changes already noticeable end-to-end in the roslyn build (the eval part of it)? |
What metric did you mean? Obviously, this change is reducing milliseconds and lowering allocations. Many such changes in aggregate would likely show wall clock improvements. |
I missed adding string resources to SR.resx when cherry-picking RefArrayBuilder from another local branch.
|
/review |
|
✅ Expert Code Review (command) completed successfully! Correctness review complete. All edge cases traced — no concrete failing scenario found. Dimension is LGTM. |
Expert Code Review — 24-Dimension AnalysisExcellent performance work! The algorithmic improvements are sound — bitmask scan, two-pass direct allocation, lazy StringBuilder, and Dimension Summary
Key Findings
|
There was a problem hiding this comment.
Performance & Allocation Awareness
All other patterns are clean:
- Two-pass Escape (
string.Create/unsafe fixed) — exact-size allocation, no waste. ✅ - Lazy
StringBuilderinUnescapeAll— acquire/release pattern is correct;sbis only acquired on the first valid%XXhit, so purely-invalid or escape-free strings allocate nothing. ✅ IndexOfAnyEscapeChar—SearchValues<char>on .NET (SIMD), bitmask scan on .NET Framework. Both are tighter than the oldIndexOfAny(char[]). ✅HexConverter—ReadOnlySpan<byte>from PE data section on .NET (no allocation);static readonly byte[]on .NET Framework (single allocation, cached). ✅BufferScope.Dispose— clears only whenTypeInfo<T>.IsReferenceOrContainsReferences(), avoiding unnecessary zeroing for value types likeint. ✅
One issue raised inline: RefArrayBuilder(initialCapacity: 64) always rents from ArrayPool<int>.Shared even for the common case of 1–2 special chars. A stackalloc int[8] scratch buffer via the existing Span<T> constructor eliminates that overhead for the common case.
Generated by Expert Code Review (command) for issue #13426 · ● 26.6M
- Change TypeInfo<T> from public to internal - Remove unused "type" variable - Prefer null-coalescing assignment operator
Thanks for catching this! It's now fixed.
This is intentional. Because it is tricky to implement,
Fixed.
The initial capacity has been reduced from 64 to 16. However, passing |
|
@JanKrivanek: The expert code review feedback has been addressed. |
|
@JeremyKuhne: Here's the PR that includes |
Replace hand-written [DllImport] declarations and COM [ComImport]
interfaces with CsWin32 source-generated interop. This enables
Native AOT scenarios where built-in COM marshalling and
System.Runtime.InteropServices marshalling infrastructure do not
exist. The approach follows patterns established in WinForms, WPF,
and the dotnet/sdk repository.
CsWin32 is configured in Microsoft.Build.Framework with
allowMarshaling: false and useSafeHandles: false, producing raw
pointer signatures that are AOT-compatible. Other projects consume
generated types via InternalsVisibleTo.
Key changes:
Infrastructure:
- Add FEATURE_WINDOWSINTEROP compile-time gate in
Directory.BeforeCommon.targets, disabled for source-only builds
- Add CsWin32 (Microsoft.Windows.CsWin32) and PolySharp package
references to Framework.csproj
- Add NativeMethods.txt and NativeMethods.json for CsWin32
configuration
- Add .editorconfig suppression for CS3019 in Windows/ folders
- Add cswin32-interop skill document for agent guidance
New utility types:
- BufferScope<T>: ref struct for stack-allocated buffers with
ArrayPool fallback, replacing manual ArrayPool rent/return and
stackalloc patterns throughout interop code
- TypeInfo<T>: cached RuntimeHelpers.IsReferenceOrContainsReferences
polyfill for net472
- ComScope<T>: COM pointer lifetime management (using-disposable)
- ComClassFactory: AOT-compatible COM activation without
Activator.CreateInstance
- IID: generic IID lookup via IComIID interface
- VARIANT, BSTR, HRESULT, FILETIME extensions: CsWin32 partial
type augmentations for safe usage patterns
NativeMethods.cs cleanup:
- Delete ~130 hand-written constants, enums, structs, and
[DllImport] declarations replaced by CsWin32 typed equivalents
- Replace MemoryStatus class with MEMORYSTATUSEX struct via
TryGetMemoryStatus
- Replace GetCurrentDirectory/GetFullPath/GetShortPathName/
GetLongPathName with BufferScope-based implementations
- Replace StreamHandleType enum with bool useStandardError parameter
- Delete DirectoryExists/FileExists/FileOrDirectoryExists wrappers;
callers use PInvoke.GetFileAttributes directly
- Delete HResultSucceeded/HResultFailed; callers use HRESULT.Failed
- Add [UnsupportedOSPlatformGuard("windows")] to IsUnixLike
- Version all [SupportedOSPlatform] attributes to "windows6.1"
COM interop modernization:
- Replace [ComImport] WMI interfaces (IWbemLocator, IWbemServices,
IEnumWbemClassObject, IWbemClassObject) with struct-based COM
using delegate* unmanaged[Stdcall] vtables
- Add IDebugClient4-based command line retrieval as alternative to
WMI (CommandLineSource.DebugEngine)
- Move IFixedTypeInfo from NativeMethods.cs to its own file in Tasks
Platform guard pattern:
- Apply dual-guard pattern: #if FEATURE_WINDOWSINTEROP wraps runtime
IsWindows checks, eliminating dead code in source builds
- Use IsUnixLike (positive guard) instead of !IsWindows for CA1416
platform compatibility analysis
- Files excluded at project level via <Compile Remove> when
FeatureWindowsInterop is off (WindowsFileSystem, Windows/ folder,
WMI structs)
Behavioral changes:
- Unzip.cs: !IsWindows → IsUnixLike for correct CA1416 analysis
- ProcessExtensions: ArrayPool<T> replaced with BufferScope<T>
- AssemblyInformation/GlobalAssemblyCache: remove dead non-Windows
code paths in net472-only (#if !FEATURE_ASSEMBLYLOADCONTEXT)
blocks, which only run on Windows
Some related changes are also in dotnet#13426.
Replace hand-written [DllImport] declarations and COM [ComImport]
interfaces with CsWin32 source-generated interop. This enables
Native AOT scenarios where built-in COM marshalling and
System.Runtime.InteropServices marshalling infrastructure do not
exist. The approach follows patterns established in WinForms, WPF,
and the dotnet/sdk repository.
CsWin32 is configured in Microsoft.Build.Framework with
allowMarshaling: false and useSafeHandles: false, producing raw
pointer signatures that are AOT-compatible. Other projects consume
generated types via InternalsVisibleTo.
Key changes:
Infrastructure:
- Add FEATURE_WINDOWSINTEROP compile-time gate in
Directory.BeforeCommon.targets, disabled for source-only builds
- Add CsWin32 (Microsoft.Windows.CsWin32) and PolySharp package
references to Framework.csproj
- Add NativeMethods.txt and NativeMethods.json for CsWin32
configuration
- Add .editorconfig suppression for CS3019 in Windows/ folders
- Add cswin32-interop skill document for agent guidance
New utility types:
- BufferScope<T>: ref struct for stack-allocated buffers with
ArrayPool fallback, replacing manual ArrayPool rent/return and
stackalloc patterns throughout interop code
- TypeInfo<T>: cached RuntimeHelpers.IsReferenceOrContainsReferences
polyfill for net472
- ComScope<T>: COM pointer lifetime management (using-disposable)
- ComClassFactory: AOT-compatible COM activation without
Activator.CreateInstance
- IID: generic IID lookup via IComIID interface
- VARIANT, BSTR, HRESULT, FILETIME extensions: CsWin32 partial
type augmentations for safe usage patterns
NativeMethods.cs cleanup:
- Delete ~130 hand-written constants, enums, structs, and
[DllImport] declarations replaced by CsWin32 typed equivalents
- Replace MemoryStatus class with MEMORYSTATUSEX struct via
TryGetMemoryStatus
- Replace GetCurrentDirectory/GetFullPath/GetShortPathName/
GetLongPathName with BufferScope-based implementations
- Replace StreamHandleType enum with bool useStandardError parameter
- Delete DirectoryExists/FileExists/FileOrDirectoryExists wrappers;
callers use PInvoke.GetFileAttributes directly
- Delete HResultSucceeded/HResultFailed; callers use HRESULT.Failed
- Add [UnsupportedOSPlatformGuard("windows")] to IsUnixLike
- Version all [SupportedOSPlatform] attributes to "windows6.1"
COM interop modernization:
- Replace [ComImport] WMI interfaces (IWbemLocator, IWbemServices,
IEnumWbemClassObject, IWbemClassObject) with struct-based COM
using delegate* unmanaged[Stdcall] vtables
- Add IDebugClient4-based command line retrieval as alternative to
WMI (CommandLineSource.DebugEngine)
- Move IFixedTypeInfo from NativeMethods.cs to its own file in Tasks
Platform guard pattern:
- Apply dual-guard pattern: #if FEATURE_WINDOWSINTEROP wraps runtime
IsWindows checks, eliminating dead code in source builds
- Use IsUnixLike (positive guard) instead of !IsWindows for CA1416
platform compatibility analysis
- Files excluded at project level via <Compile Remove> when
FeatureWindowsInterop is off (WindowsFileSystem, Windows/ folder,
WMI structs)
Behavioral changes:
- Unzip.cs: !IsWindows → IsUnixLike for correct CA1416 analysis
- ProcessExtensions: ArrayPool<T> replaced with BufferScope<T>
- AssemblyInformation/GlobalAssemblyCache: remove dead non-Windows
code paths in net472-only (#if !FEATURE_ASSEMBLYLOADCONTEXT)
blocks, which only run on Windows
Some related changes are also in dotnet#13426.
Tip
I recommend reviewing this pull request commit-by-commit. I made sure that each commit is distinct, cohesive and has a detailed description.
This PR rewrites and optimizes
EscapingUtilities.EscapeandEscapingUtilities.UnescapeAll— two methods called heavily throughout MSBuild evaluation — to reduce allocations and improve throughput on both .NET and .NET Framework 4.7.2.The largest wins are in
Escape(especially the no-special-chars fast path) and strings with invalid escape sequences, with speed improvements of up to 3.3× on .NET 10.0 and 3.0× on .NET Framework 4.8.1, and theUnescapeAllallocation for strings with invalid escape sequences eliminated entirely on both runtimes.Highlights
🚀 Speed Improvements (.NET 10.0)
Escape_NoSpecialCharsEscape_FewSpecialCharsEscape_ManySpecialCharsUnescapeAll_InvalidEscapeSequencesEscapeWithCaching_FewSpecialCharsEscapeWithCaching_ManySpecialCharsRoundTrip_EscapeThenUnescape🧹 Allocation Reductions (.NET 10.0)
UnescapeAll_InvalidEscapeSequences📊 .NET Framework 4.8.1
UnescapeAll_InvalidEscapeSequencesEscape_NoSpecialCharsEscape_ManySpecialCharsEscape_FewSpecialCharsRoundTrip_EscapeThenUnescapeThe
UnescapeAll_InvalidEscapeSequencesallocation is also eliminated on .NET Framework (40 B → 0 B, 100% reduction).Detailed Summary of Changes
Infrastructure
BufferScope<T>: a new ref struct that manages a stack-allocated buffer withArrayPool<T>fallback for heap overflow, enabling low-allocation temporary storage.RefArrayBuilder<T>: a new ref struct for cheaply building arrays usingBufferScope<T>, used by the Escape two-pass algorithm.EscapingUtilities: BenchmarkDotNet benchmarks coveringEscapeandUnescapeAllacross typical MSBuild string patterns.EscapeoptimizationsSearchValues<char>on .NET: replaces theIndexOfAny(char[])scan with aSearchValues<char>-based scan, enabling vectorized SIMD search.IndexOfAnywith a bitmask scan on .NET Framework: all 9 escapable characters fall in the ASCII range['$'...'@'], so a 29-bit bitmask replaces the O(k) per-character array scan with a single range check + bit test.IndexOfAnyEscapeCharresult: avoids a redundant scan pass by feeding the initial hit directly into the collection loop.RefArrayBuilder<int>, then a second pass allocates a single exact-sized string (viastring.Createon .NET; unsafe fixed+Buffer.MemoryCopyon .NET Framework), eliminating theStringBuilderentirely.UnescapeAlloptimizationsStringBuilderlazily: theStringBuilderis only rented fromStringBuilderCachewhen the first decodable%XXsequence is actually found, so strings with%signs that aren't valid escape sequences return the original string with zero allocations.IndexOf('%')to the active trim window: when trim: true, the percent-sign search is bounded to[startIndex, endIndex)computed from the trimmed range, avoiding scanning whitespace that will be discarded.TryDecodeHexDigitarithmetic withHexConverterlookup table: delegates to the internalHexConverter.FromCharlookup table, replacing the multi-branch switch with a single table lookup.Test improvements
EscapingUtilities_Tests: migrated to[Theory]/[InlineData]withShouldlyassertions and removed redundant test helpers.UnescapeAll,UnescapeAll(trim: true),Escape, round-tripEscape↔Unescape, andContainsEscapedWildcards.EscapingUtilitiesin preparation for performance work: modernized XML docs, renamed parameters for clarity, and tightened nullability annotations ahead of the algorithmic changes.Benchmarks
Initial Results (baselines)
.NET 10.0
.NET Framework 4.8.1
Final Results
.NET 10.0
.NET Framework 4.8.1