CSHARP-5992: Add LINQ translation benchmark suite #2004

Open
adelinowona wants to merge 7 commits into mongodb:main from adelinowona:csharp5992

@adelinowona adelinowona commented May 19, 2026

Summary

Adds a 10-benchmark LinqBench suite that exercises the LINQ-to-aggregation translation layer in isolation (no DB I/O), plus a small LinqEndToEnd suite that compares LINQ vs hand-built BsonDocument query plans end-to-end. Wires LinqBench into the perf-job category filter and the composite-score path. Designed to give us a defensible signal when the translator (especially SerializerFinder, AstSimplifier, and the various method-call sub-translators) moves.

Cross-commit runs against pre-/post-SerializerFinder and against an in-flight optimization PR (#1961) show the suite catches the kinds of regressions and improvements we'd want it to catch — see Validation below.

Motivation

The driver has spec-driven benchmarks for I/O-heavy patterns (Find, BulkWrite, GridFS, BSON encode/decode) but nothing for the LINQ translator. As internal code paths shift — SerializerFinder overhauls, AstSimplifier changes, new visitor support — we currently have no signal on translation-cost movement. When CSHARP-5572 introduced SerializerFinder (#1700), we had no way to quantify what it cost us. This PR closes that gap.

What this adds

| File | Purpose |
|---|---|
| `benchmarks/MongoDB.Driver.Benchmarks/Linq/LinqTranslationBenchmark.cs` | 10-benchmark translation suite (filter / field / projection / update / IQueryable entry points) |
| `benchmarks/MongoDB.Driver.Benchmarks/Linq/LinqEndToEndBenchmark.cs` | 12 LINQ-vs-raw end-to-end benchmarks (6 patterns × {LINQ, Raw}) against a local mongod |
| `benchmarks/MongoDB.Driver.Benchmarks/Linq/README.md` | Suite-level docs: what each benchmark exercises, how to interpret, threshold story |
| `benchmarks/MongoDB.Driver.Benchmarks/DriverBenchmarkCategory.cs` | New LinqBench const; LinqBench and BulkWriteBench added to AllCategories so composite scores are emitted |
| `benchmarks/MongoDB.Driver.Benchmarks/BenchmarkResult.cs` + `Exporters/*.cs` | Composite-score labelling for LinqBench |
| `evergreen/run-perf-tests.sh` | Adds LinqBench to the `--anyCategories` filter |

Cedar/SPS auto-discovers new composite categories — no dashboard config required.

Design decisions

  • Translation-only as the primary suite. LinqTranslationBenchmark calls into translator entry points directly (LinqProviderAdapter.TranslateExpressionTo*, ExpressionToExecutableQueryTranslator.Translate). No DB execution. This isolates translator regressions from network and serialization noise. End-to-end coverage lives in LinqEndToEndBenchmark for visibility but is not what we'd alert on first.
  • Representative user queries, not a visitor matrix. An earlier proposal organized benchmarks by which SerializerFinderVisit*.cs file they exercised. Pushed back to "what users actually write" framing; matrix coverage remains an option for targeted gaps in the future.
  • LinqBench does not cross-tag with the spec categories (DriverBench, BsonBench, etc.). Keeps LINQ numbers out of the spec composite averages. LinqBench does land in AllCategories so its own composite is emitted.
  • BulkWriteBench composite addition bundled in this PR. Was previously excluded with a "not part of the benchmarking spec" comment. Team wanted its composite tracked too; doing it here avoids a tiny standalone PR.
  • [MemoryDiagnoser] on everything. Allocation regressions matter independently of time and surface earlier than time regressions on noisy hardware.

Benchmark inventory

Translation suite (10 benchmarks)

Filters (4): TranslateExpressionToFilter

| Benchmark | Pattern | Translator path |
|---|---|---|
| MultiFieldSearch | `Status == s && CustomerName.StartsWith(prefix) && ShippingAddress.City == city && CreatedAt > cutoff && !IsPaid` | And → Comparison, MethodCall (StartsWith), nested MemberAccess, Not + boolean MemberAccess |
| OrFilter | 4-way `==` OR with literal constants | Or → Comparison. No closures: fastest filter, most sensitive to small regressions |
| BatchLookup | `ids.Contains(x.Id)` | MethodCall → ContainsMethodToFilterTranslator → `$in` |
| ArrayElementQuery | `x.Items.Any(i => i.Price > t)` | MethodCall → AllOrAnyMethodToFilterTranslator → `$elemMatch`, `@<elem>` symbol |

Field (1): TranslateExpressionToField

| Benchmark | Pattern | Notes |
|---|---|---|
| FieldSelection | `x => x.Items[0].ProductId` | GetItemMethodToFilterFieldTranslator path. `List<T>[0]` is a MethodCallExpression in C#, not IndexExpression. |

Projections (2): TranslateExpressionToProjection

| Benchmark | Pattern | Notes |
|---|---|---|
| AggregationProjection | 4-field DTO with computed `Subtotal + Tax - Discount` and `Items.Select(i => i.ProductId)` | Full ExpressionToAggregationExpressionTranslator + AstSimplifier |
| ProjectionSentinel | `x => x` | Fast-path early return. Sentinel for breakage detection, not for translator perf |

Update (1): TranslateExpressionToSetStage

| Benchmark | Pattern | Notes |
|---|---|---|
| UpdatePipeline | Sets 3 fields including computed expression | ExpressionToSetStageTranslator, MemberInit pattern matching |

IQueryable (2): ExpressionToExecutableQueryTranslator.Translate

| Benchmark | Pattern | Notes |
|---|---|---|
| QueryablePipeline | Where → Select → OrderBy → Take | Filter + projection + sort + limit stages |
| GroupByAggregation | GroupBy → Select with Count + Sum | GroupByMethodToPipelineTranslator, `$group`, IGroupingSerializer, accumulators |

End-to-end suite (12 benchmarks)

Six patterns (MultiFieldSearch, OrFilter, GroupBy, Projection, InFilter, PagedQuery), each run twice: once written in LINQ, once as a hand-built BsonDocument / pipeline. Seeds 500 documents plus secondary indexes on Status, CreatedAt, and ShippingAddress.City into a local mongod in [GlobalSetup]. The LINQ and Raw versions of each pair render to byte-equivalent BSON (verified via LinqProviderAdapter), so LINQ−Raw delta isolates translator + provider overhead from query-shape differences. Useful for the "how much of the user-visible latency is translation?" question. Not part of the regression-alert path.

Validation

Five lines of evidence that the suite produces actionable signal.

1. Within-run noise

Across all perf-hw runs (n=10 each, multiple commits), BDN-reported within-run StdDev is <1% on every non-sentinel benchmark (typically 0.2-0.9%). The ProjectionSentinel and FieldSelection micro-benchmarks land at 0.3-1.4%. Each individual run is a well-converged measurement.

2. Selectivity — targeted regression injection

Thread.SpinWait(300) (~10 µs on M1) injected into four specific translator code paths in turn:

| Injection point | Expected affected benchmarks |
|---|---|
| SerializerFinder.FindSerializers() | All non-sentinel (universal path) |
| GetItemMethodToFilterFieldTranslator | FieldSelection only |
| NotExpressionToFilterTranslator | MultiFieldSearch (Not), OrFilter (chains Comparison dispatch) |
| GroupByMethodToPipelineTranslator | GroupByAggregation only |

Result: each injection moved only the benchmarks that should have moved. ProjectionSentinel stayed flat across all injections. The suite has clean per-path selectivity — when something regresses, the benchmarks that move tell you which translator moved.
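
The injection technique can be sketched outside the driver. This is a minimal illustration only, assuming a stand-in `Translate` function (hypothetical, not a driver API) in place of a real translator code path. `Thread.SpinWait` and `Stopwatch` are the standard BCL calls; absolute timings will differ by machine.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical stand-in for one translator code path; not a driver API.
static int Translate(int input, int injectedSpin)
{
    if (injectedSpin > 0)
    {
        Thread.SpinWait(injectedSpin); // the artificial, injected regression
    }
    return input * 2;
}

// Times `iterations` calls and returns the mean cost per call in nanoseconds.
static double MeanNanoseconds(int iterations, int injectedSpin)
{
    long checksum = 0;
    var stopwatch = Stopwatch.StartNew();
    for (var i = 0; i < iterations; i++)
    {
        checksum += Translate(i, injectedSpin);
    }
    stopwatch.Stop();
    GC.KeepAlive(checksum); // keep the accumulated result observable
    return stopwatch.Elapsed.TotalMilliseconds * 1_000_000.0 / iterations;
}

var baseline = MeanNanoseconds(10_000, injectedSpin: 0);
var injected = MeanNanoseconds(10_000, injectedSpin: 300);
Console.WriteLine($"baseline: {baseline:F1} ns/call, injected: {injected:F1} ns/call");
```

In the real experiment the spin would sit inside one translator class at a time, and the check is which benchmarks move, not the size of the delay.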

3. Cross-run drift on the actual perf-job hardware (n=10 on rhel90-dbx-perf-large)

Submitted via evergreen patch against test-csharp-spec-benchmarks with the runner looping 10× through --anyCategories "LinqBench". .NET 8.0, X64 RyuJit. All runs in a single perf-task invocation on the same host.

| Benchmark | Min | Median | Max | Range% | CV% | Within-run StdDev% |
|---|---|---|---|---|---|---|
| MultiFieldSearch | 494.0 µs | 497.2 µs | 507.8 µs | 2.8% | 1.03% | 0.23% |
| OrFilter | 13.3 µs | 13.5 µs | 13.7 µs | 3.1% | 1.01% | 0.23% |
| BatchLookup | 144.9 µs | 145.8 µs | 148.1 µs | 2.2% | 0.74% | 0.88% |
| ArrayElementQuery | 177.4 µs | 179.4 µs | 181.4 µs | 2.2% | 0.77% | 0.51% |
| FieldSelection | 5,610 ns | 5,795 ns | 6,025 ns | 7.1% | 2.30% | 0.32% |
| AggregationProjection | 406.5 µs | 412.6 µs | 420.1 µs | 3.3% | 1.07% | 0.47% |
| ProjectionSentinel | 36.18 ns | 37.22 ns | 38.17 ns | 5.3% | 1.47% | 0.54% |
| UpdatePipeline | 95.5 µs | 96.3 µs | 98.4 µs | 3.0% | 1.06% | 0.59% |
| QueryablePipeline | 874.5 µs | 890.7 µs | 913.6 µs | 4.4% | 1.38% | 0.52% |
| GroupByAggregation | 489.7 µs | 495.1 µs | 506.6 µs | 3.4% | 0.94% | 0.77% |

With the exception of FieldSelection and ProjectionSentinel, every benchmark lands in a 2-4.5% range with CV ≤1.5%. That's a ~5× compression of the M1 drift bands we'd characterized earlier (which sat at 12-38% range on the heavier benchmarks). FieldSelection (~6 µs, ~7% range) and ProjectionSentinel (~37 ns, ~5% range) drift wider because both are fast enough that small absolute drift looks proportionally large.
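
For readers who want to reproduce the drift columns from their own runs, here is a minimal sketch. It assumes Range% is (max − min) / median, which is consistent with the MultiFieldSearch row ((507.8 − 494.0) / 497.2 ≈ 2.8%), and that CV% is the sample standard deviation over the mean; the PR does not state either formula, so treat both as assumptions. The input values are made up for illustration and are not measurements from this PR.

```csharp
using System;
using System.Linq;

static double Median(double[] values)
{
    var sorted = values.OrderBy(v => v).ToArray();
    var mid = sorted.Length / 2;
    return sorted.Length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
}

// Assumed definition: spread between slowest and fastest run, relative to the median.
static double RangePercent(double[] values) =>
    (values.Max() - values.Min()) / Median(values) * 100.0;

// Assumed definition: sample standard deviation relative to the mean.
static double CvPercent(double[] values)
{
    var mean = values.Average();
    var variance = values.Sum(v => (v - mean) * (v - mean)) / (values.Length - 1);
    return Math.Sqrt(variance) / mean * 100.0;
}

// Illustrative per-run medians in µs (made-up values, n=10).
double[] runs = { 100.0, 101.0, 99.0, 100.5, 102.0, 99.5, 100.0, 101.5, 98.5, 100.0 };

var rangePercent = RangePercent(runs);
var cvPercent = CvPercent(runs);
Console.WriteLine($"Range%: {rangePercent:F2}, CV%: {cvPercent:F2}");
```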

Two consequences:

  • Thresholds can be 5-10× tighter on perf-hw than on M1 (see below).
  • Sub-10% deltas are now individually resolvable. That matters for Set B (in-flight optimization) where M1 noise swamped the signal.

For reference, M1 Max characterization on the same suite (n=7) showed ranges of 5-30%, with the heavier IQueryable benchmarks at CV 8-11%. M1 numbers are kept as a development-machine guide; perf-hw numbers are what we'd actually alert on.

4. Cross-commit reality check on perf-hardware — does the suite catch real translator changes?

The suite was transplanted onto pinned commits and run on rhel90-dbx-perf-large (n=10 per commit). Set A captures a known historical regression (the SerializerFinder introduction in #1700). Set B captures an in-flight optimization (PR #1961). Together these answer "does the suite detect what we want it to detect?"

Set A — SerializerFinder cost-of-introduction. Parent of #1700 (46640eac98) vs the #1700 merge (59c9d34180). Both commits are from 2026-02-09; the only meaningful diff between them is the SerializerFinder introduction.

| Benchmark | Pre-SF median | Post-SF median | Δ time | Pre-SF alloc | Post-SF alloc | Δ alloc |
|---|---|---|---|---|---|---|
| MultiFieldSearch | 458.8 µs | 497.8 µs | +8.5% | 24.7 KB | 27.1 KB | +10.0% |
| OrFilter | 5,245 ns | 13,100 ns | +150.7% | 4.2 KB | 5.5 KB | +31.3% |
| BatchLookup | 120.0 µs | 146.4 µs | +22.0% | 7.6 KB | 11.5 KB | +50.6% |
| ArrayElementQuery | 150.2 µs | 179.5 µs | +19.5% | 8.7 KB | 12.0 KB | +38.8% |
| FieldSelection | 1,989 ns | 5,644 ns | +183.8% | 1.5 KB | 2.3 KB | +49.7% |
| AggregationProjection | 141.5 µs | 408.1 µs | +188.3% | 27.1 KB | 67.0 KB | +147.5% |
| ProjectionSentinel | 37.5 ns | 40.3 ns | +7.5% | 32 B | 32 B | |
| UpdatePipeline | 90.6 µs | 658.0 µs | +626.1% | 9.1 KB | 91.2 KB | +897.7% |
| QueryablePipeline | 350.3 µs | 880.7 µs | +151.4% | 33.8 KB | 105.6 KB | +212.2% |
| GroupByAggregation | 158.3 µs | 489.3 µs | +209.1% | 28.7 KB | 75.1 KB | +161.8% |

Every non-sentinel benchmark moves above the 2-7% drift band. ProjectionSentinel correctly stays close to flat (the fast-path detection bypasses SerializerFinder entirely). The smallest non-sentinel delta is MultiFieldSearch at +8.5%/+10.0%, which still clears its 2.8% drift band cleanly. Most benchmarks cluster around 2-3× regressions, consistent with adding a per-translation serializer-discovery pass over the full expression tree.

UpdatePipeline is a structural outlier (+626% time, +898% allocation — 9 KB → 91 KB), and the suite revealed why. Reading the two commits side by side: TranslateExpressionToFilter, TranslateExpressionToProjection, and ExpressionToExecutableQueryTranslator.Translate all preprocess the lambda once at the top (LinqExpressionPreprocessor.Preprocess(expression)) and then let TranslationContext.Create run SerializerFinder on the canonicalized tree. TranslateExpressionToSetStage does it the other way around — it runs SerializerFinder on the raw lambda first and only preprocesses each field's value expression later, inside ExpressionToSetStageTranslator.TranslateNewWithOptionalMemberInitializers. Since SerializerFinder is a fixed-point visitor that loops until it stops making progress, running it over an un-preprocessed tree (partial-evaluated constants not collapsed, CLR-compat rewrites not applied) costs disproportionately more passes for expressions like our Total = x.Subtotal + x.Tax - x.Discount arithmetic chain. The same factor shows up on M1 (+735%) and perf-hw (+626%), ruling out machine-specific artifacts. This is a real driver asymmetry — follow-up ticket noted below. (Note that the design isn't a bug: SetStage's dispatch pattern-matches on body is NewExpression | MemberInitExpression, which PartialEvaluator would collapse if applied at the top. Any fix has to preserve that dispatch shape.)

Set B — In-flight optimization PR #1961 (SerializerFinderVisitMethodCall switch → MethodInfo-keyed lookup table, plus a follow-up Lookup performance optimization commit). Base (66780341e7 = current main) vs head (54973d039a). Same perf-task host, runs interleaved with Set A.

| Benchmark | B1 (main) median | B2 (PR head) median | Δ time | B1 alloc | B2 alloc | Δ alloc |
|---|---|---|---|---|---|---|
| MultiFieldSearch | 497.2 µs | 495.0 µs | | 27.1 KB | 27.1 KB | |
| OrFilter | 13.5 µs | 13.3 µs | -1.5% (noise) | 5.5 KB | 5.5 KB | |
| BatchLookup | 145.8 µs | 145.0 µs | -0.6% (noise) | 11.5 KB | 11.1 KB | -3.3% |
| ArrayElementQuery | 179.4 µs | 176.5 µs | -1.6% (noise) | 12.0 KB | 10.5 KB | -12.4% |
| FieldSelection | 5,795 ns | 5,612 ns | -3.2% (noise) | 2.3 KB | 2.3 KB | |
| AggregationProjection | 412.6 µs | 409.9 µs | -0.6% (noise) | 67.5 KB | 67.3 KB | |
| ProjectionSentinel | 37.22 ns | 36.72 ns | -1.3% (noise) | 32 B | 32 B | |
| UpdatePipeline | 96.3 µs | 96.7 µs | | 10.2 KB | 10.2 KB | |
| QueryablePipeline | 890.7 µs | 868.0 µs | -2.6% | 106.3 KB | 101.6 KB | -4.5% |
| GroupByAggregation | 495.1 µs | 483.0 µs | -2.4% | 75.4 KB | 72.9 KB | -3.3% |

Honest reading: at perf-hw resolution, the PR delivers measurable allocation wins on four benchmarks (ArrayElementQuery -12.4%, QueryablePipeline -4.5%, BatchLookup -3.3%, GroupByAggregation -3.3%), all of which walk method-call subtrees that SerializerFinderVisitMethodCall dispatches through. The two largest time deltas (QueryablePipeline -2.6%, GroupByAggregation -2.4%) are at the edge of their respective 2.9-3.4% drift bands: directionally consistent but not a slam dunk on a single n=10 sample. Everything else sits inside noise.

The takeaway for the suite isn't "PR #1961 is great" or "PR #1961 is marginal" — it's that the suite resolves what kind of improvement this is: a real allocation reduction in the method-call dispatch path, with smaller time effects. That distinction was invisible at M1 noise levels (where everything looked like noise or a 5% time improvement). At perf-hw, allocation deltas are the cleaner metric for changes this size.

5. End-to-end overhead — Atlas dev cluster, perf-hardware, n=10

To answer "how much of user-visible LINQ latency is translator cost?", the e2e suite runs six patterns twice each (LINQ-translated vs hand-built BsonDocument / pipeline) against a live Atlas dev cluster from the perf host. 500 documents seeded per run with secondary indexes on Status, CreatedAt, and ShippingAddress.City; dropped at cleanup. Each LINQ-Raw pair renders to byte-equivalent BSON — verified by running each LINQ expression through LinqProviderAdapter and diffing the resulting filter/pipeline — so LINQ−Raw delta reflects translator and provider overhead, not query-shape differences.

| Pattern | LINQ median | Raw median | LINQ−Raw | LINQ/Raw | Translator share | LINQ alloc | Raw alloc | Alloc ratio |
|---|---|---|---|---|---|---|---|---|
| MultiFieldSearch | 1,966 µs | 1,185 µs | +782 µs | 1.66× | 39.7% | 74.2 KB | 43.6 KB | 1.70× |
| OrFilter | 7,321 µs | 7,230 µs | +91 µs | 1.01× | 1.2% | 719.1 KB | 711.1 KB | 1.01× |
| GroupBy | 2,034 µs | 1,476 µs | +558 µs | 1.38× | 27.4% | 65.4 KB | 20.6 KB | 3.18× |
| Projection | 2,160 µs | 1,051 µs | +1,109 µs | 2.05× | 51.3% | 124.8 KB | 35.5 KB | 3.52× |
| InFilter | 1,100 µs | 784 µs | +315 µs | 1.40× | 28.7% | 41.5 KB | 28.8 KB | 1.44× |
| PagedQuery | 1,645 µs | 1,213 µs | +432 µs | 1.36× | 26.3% | 63.3 KB | 34.8 KB | 1.82× |

Translator share is the fraction of user-visible LINQ time that disappears if you write raw BsonDocument instead — an upper bound on translator cost (the LINQ delta also includes provider overhead like cursor construction and command serialization).

  • Projection-heavy patterns (Projection 51%, MultiFieldSearch 40%): translator is ~40-50% of user-visible latency on indexed selective queries. Projection is the upper-bound case — LINQ has to translate a Where filter and construct a serializer for the anonymous-type projection. A 10% translator regression here is a ~5% user-visible regression.
  • Filter / aggregation patterns (GroupBy 27%, InFilter 29%, PagedQuery 26%): translator is ~25-30%. Each does meaningful server-side work ($group, $in, sort-skip-limit) so server time partially offsets translator overhead.
  • Broad scans (OrFilter 1.2%): translator is ~1% because the 4-way OR matches a large fraction of documents (no index helps), and the result set serializes ~700 KB. A 10% translator regression here is invisible to users.
  • Allocation ratios highlight LINQ-side overhead independently of time: Projection allocates 3.5× more in LINQ (125 KB vs 36 KB), GroupBy 3.2× more (65 KB vs 21 KB). Projected-type-serializer and IGroupingSerializer construction is allocation-heavy; this is the metric that would catch translator-side allocation regressions even when network time masks the time delta.
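
The translator-share arithmetic above can be checked with a short sketch. The share formula, (LINQ − Raw) / LINQ, follows from the definition given before the list and reproduces the table rows. The helper names are hypothetical, and `UserVisibleRegression` treats the whole LINQ−Raw delta as translator time, so it is an upper bound for the same reason the share is.

```csharp
using System;

// Fraction of LINQ latency that disappears when the query is written as raw BsonDocument.
static double TranslatorShare(double linqMicros, double rawMicros) =>
    (linqMicros - rawMicros) / linqMicros;

// Upper bound on the user-visible slowdown caused by a given translator slowdown.
static double UserVisibleRegression(double share, double translatorRegression) =>
    share * translatorRegression;

// Medians taken from the Projection and OrFilter rows of the table above.
var projectionShare = TranslatorShare(2160, 1051);
var orFilterShare = TranslatorShare(7321, 7230);

var projectionImpact = UserVisibleRegression(projectionShare, 0.10);
var orFilterImpact = UserVisibleRegression(orFilterShare, 0.10);

Console.WriteLine($"Projection: share {projectionShare * 100:F1}%, 10% translator regression -> {projectionImpact * 100:F1}% user-visible");
Console.WriteLine($"OrFilter:   share {orFilterShare * 100:F1}%, 10% translator regression -> {orFilterImpact * 100:F2}% user-visible");
```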

Caveat: Atlas dev cluster across the internet from the perf host. Per-run range on individual benchmarks is wide — typically ±15-30%, up to ±50% on patterns where server time dominates (GroupBy Raw, Projection Raw). The translator-share ratios are more stable than the absolute times because LINQ and Raw runs on the same iteration see correlated network noise. Single-batch result, 500 docs — multi-batch cursor iteration may behave differently.

An earlier iteration of this suite (v1, 3 patterns, no secondary indexes, partial LINQ-Raw shape divergences) reported different shares for the overlapping patterns (OrFilter 4.5%, GroupBy 32.6%). The shifts trace to v1's biases — missing indexes inflated server time on selective queries (lowering apparent translator share) and shape inequivalences in MultiFieldSearch and GroupBy Raw added non-translator contributions to LINQ−Raw delta. The v2 numbers above are the defensible ones.

Regression-alert thresholds (perf-hardware-calibrated)

Calibrated to observed drift on rhel90-dbx-perf-large:

| Bucket | Threshold | Benchmarks | Rationale |
|---|---|---|---|
| Tight | 8% | MultiFieldSearch, BatchLookup, ArrayElementQuery, AggregationProjection, UpdatePipeline, QueryablePipeline, GroupByAggregation, OrFilter | Observed range 2.2–4.4%. 8% gives ~2× headroom over the worst-observed drift. |
| Wider | 12% | FieldSelection | ~6 µs benchmark; observed range 7.1%. |
| Sentinel | 15% | ProjectionSentinel | ~37 ns benchmark; observed range 5.3%. Meant to catch fast-path breakage, not measure translator perf. |

Allocation thresholds should be even tighter — observed allocation drift is 0-1.2% across all benchmarks. A 5% allocation threshold would catch real allocation regressions while ignoring observed noise.

These replace the M1-calibrated 15%/30%/100% bands used during development.
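
A minimal sketch of how the time thresholds could be applied to a baseline/current pair. The `ExceedsThreshold` helper is hypothetical and not part of the perf-job tooling, and it assumes only slowdowns (positive deltas) should alert. The sample medians come from the Set A and Set B tables.

```csharp
using System;
using System.Collections.Generic;

// Time thresholds from the table above, expressed as fractions.
var thresholds = new Dictionary<string, double>
{
    ["MultiFieldSearch"] = 0.08,
    ["OrFilter"] = 0.08,
    ["BatchLookup"] = 0.08,
    ["ArrayElementQuery"] = 0.08,
    ["AggregationProjection"] = 0.08,
    ["UpdatePipeline"] = 0.08,
    ["QueryablePipeline"] = 0.08,
    ["GroupByAggregation"] = 0.08,
    ["FieldSelection"] = 0.12,
    ["ProjectionSentinel"] = 0.15,
};

static double RelativeDelta(double baseline, double current) =>
    (current - baseline) / baseline;

// Hypothetical helper: true when the slowdown exceeds the benchmark's bucket threshold.
bool ExceedsThreshold(string benchmark, double baseline, double current) =>
    RelativeDelta(baseline, current) > thresholds[benchmark];

// Set A, MultiFieldSearch: 458.8 µs -> 497.8 µs (+8.5%), just above the 8% bucket.
var setAFlagged = ExceedsThreshold("MultiFieldSearch", 458.8, 497.8);

// Set B, QueryablePipeline: 890.7 µs -> 868.0 µs, an improvement, so no alert.
var setBFlagged = ExceedsThreshold("QueryablePipeline", 890.7, 868.0);

Console.WriteLine($"Set A MultiFieldSearch flagged: {setAFlagged}");
Console.WriteLine($"Set B QueryablePipeline flagged: {setBFlagged}");
```

The same shape would work for allocation with a single 5% threshold, since allocation drift is reported as 0-1.2% across all benchmarks.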

Follow-ups (not in this PR)

  • File a ticket to investigate the TranslateExpressionToSetStage preprocessing asymmetry surfaced by Set A. The current design (Preprocess each field's value individually, after dispatching on the un-preprocessed body shape) is intentional — PartialEvaluator would collapse MemberInitExpression if applied at the top — but it causes SerializerFinder to do disproportionate work on arithmetic update expressions. The fix is likely either a partial preprocess that preserves dispatch shape, or a SerializerFinder optimization on un-preprocessed trees. Not a one-line change.
  • Smaller-magnitude regression detection on real translator changes — current cross-commit experiments validate large deltas (Set A: 100%+) and modest allocation deltas (Set B: 3-12% alloc); sub-3% time-delta detection on real changes hasn't been measured end-to-end. The perf-hw drift bands suggest it's feasible but needs a controlled experiment to confirm.
  • Possible future additions for coverage gaps: Lookup/Join, Distinct, SelectMany, Cast, $expr fallback paths.

Test plan

  • dotnet run -c Release -- --filter "*LinqTranslation*" from benchmarks/MongoDB.Driver.Benchmarks/ — all 10 benchmarks run, no build errors, results land in BenchmarkDotNet.Artifacts/results/
  • dotnet run -c Release -- --filter "*LinqEndToEnd*" — requires a reachable mongod (local or remote), all 12 benchmarks run
  • dotnet run -c Release -- --driverBenchmarks --anyCategories "LinqBench" — full perf-task path: composite score is emitted for LinqBench
  • Perf-job dry run on Evergreen via evergreen patch against test-csharp-spec-benchmarks / driver-performance-tests — task succeeds, results uploadable
  • Cedar/SPS picks up LinqBench composite without dashboard changes (verified by team after merge)

Caveats reviewers should know

  • All quantitative numbers in this description are from rhel90-dbx-perf-large runs (n=10 per commit). Cross-run drift is 2-7% on time, 0-1.2% on allocation. Set A and Set B results survive that noise floor as described per-row.
  • Set A's per-commit comparison required running the benchmark suite at two Feb-2026 commits. The benchmark code was transplanted onto those commits with one minor patch — subPathRoot parameter was added to TranslateExpressionToField between Feb and April, so the FieldSelection benchmark drops that arg at A1/A2. This isolates the comparison to driver code differences, not benchmark code differences.
  • Set B's PR1961 head commit (54973d039a) is not on mongodb/mongo-csharp-driver main's history. Submitting via evergreen patch-file --base <sha> silently runs the project's mainline task instead — we hit this once. Re-submitted via evergreen patch --uncommitted from a worktree at 54973d039a so the PR1961 commits land inside the patch diff. Documented for anyone who needs to do this in the future.
  • LinqEndToEndBenchmark uses a real MongoClient. Without a reachable cluster it fails at [GlobalSetup]. The translation suite has no such dependency.

adelinowona added the maintenance label (Non-code maintenance: deps, docs, configs, etc.) May 19, 2026
adelinowona force-pushed the csharp5992 branch 3 times, most recently from ea1069d to 148d1de May 20, 2026 18:15
15 benchmarks covering filter, projection, and IQueryable composition
translation paths. New LinqBench category added to AllCategories and
the perf-test runner. BulkWriteBench also added to AllCategories.

Document benchmark inventory, code path coverage, interpretation
guidance, and provisional thresholds. Record targeted injection test
results validating selective sensitivity of each benchmark to its
target translator code path.

PartialEvaluator injection test showed OrChainFilter is still
affected (evaluator traverses all expressions). Reframed as a
sensitivity amplifier rather than diagnostic isolator.
…methods

LinqBench uses translations/second instead of MB/s since there is no
data throughput to measure. Add Unit and MetricName to BenchmarkResult
so exporters label scores correctly. Change all benchmark methods to
return their values to prevent JIT dead-code elimination.

The composite score loop was using default MB/s labels for all
categories including LinqBench. Now correctly labels LinqBench
composites as translations_per_second.
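
As an illustration of the unit change, a translations/second score can be derived from a mean time per operation. This is a sketch under the assumption that the score is simply the reciprocal of the mean time; the actual computation in BenchmarkResult is not shown in this conversation, and `TranslationsPerSecond` is a hypothetical name.

```csharp
using System;

// Assumed conversion: one translation per operation, so score = 1e9 / (mean ns per op).
static double TranslationsPerSecond(double meanNanoseconds) =>
    1_000_000_000.0 / meanNanoseconds;

// Medians from the perf-hw drift table, converted to nanoseconds.
var multiFieldSearch = TranslationsPerSecond(497.2 * 1_000); // 497.2 µs
var orFilter = TranslationsPerSecond(13.5 * 1_000);          // 13.5 µs

Console.WriteLine($"MultiFieldSearch: {multiFieldSearch:F0} translations/s");
Console.WriteLine($"OrFilter: {orFilter:F0} translations/s");
```

A higher score is better, which matches the direction of the existing MB/s scores.
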
…d-to-end benchmarks; update regression thresholds

- Redesign LinqTranslationBenchmark.cs: 15 individual feature benchmarks → 10
  representative user queries covering distinct translator code paths
  (MultiFieldSearch, OrStatusFilter, BatchLookup, ArrayElementQuery,
  FieldSelection, AggregationProjection, ProjectionSentinel, UpdatePipeline,
  QueryablePipeline, GroupByAggregation)
- Add LinqEndToEndBenchmark.cs: one-off characterization of translation overhead
  vs pre-built BsonDocument queries on a live collection; not wired into CI
- Update README regression thresholds based on 7-run M1 Max drift characterization:
  tight bucket (15%) for MultiFieldSearch/UpdatePipeline/BatchLookup/ArrayElementQuery;
  wider bucket (30%) for OrStatusFilter/FieldSelection/AggregationProjection/
  QueryablePipeline/GroupByAggregation

BorisDog left a comment

Looks very good overall.

// --- OrStatusFilter: simple OR filter ---

[Benchmark]
public List<OrderDocument> OrStatusFilterLinq()

Contributor

minor: I think we need more descriptive naming, not tied to a specific field name. Something like OrFilterLinq, or FindOrFilterLinq

{
var pipeline = PipelineDefinition<OrderDocument, BsonDocument>.Create(_groupByPipeline);
return _collection.Aggregate(pipeline).ToList();
}

Contributor

I would also add an additional case covering all of the following:

  • Select/Projection
  • something with arrays, like $in, $all, ...
  • OrderBy.ThenBy, Take

{
return _collection.Find(x =>
x.Status == _statusFilter &&
x.CustomerName.StartsWith(_prefix) &&

Contributor

Does StartsWith create the exact same regex as in the raw version?

Member

It might make sense to run a first translation somewhere in GlobalSetup and compare the produced MQL, so that if we change the translation in the future the benchmark will throw.

{
private const string DatabaseName = "linqbench";
private const string CollectionName = "orders";
private const int SeedCount = 500;

Contributor

Consider creating indexes in the database such that server time is minimized and translation time changes will be more apparent.

public List<BsonDocument> GroupByLinq()
{
return _collection.Aggregate()
.Group(x => x.Status, g => new { Status = g.Key, Count = g.Count(), TotalRevenue = g.Sum(x => x.Total) })

Contributor

There is a projection to an anonymous type here, followed by creation of the BSON document. The raw example does not project to an anonymous type.

…quivalence fixes; rename OrStatusFilter to OrFilter
adelinowona changed the title from "WIP: Add LINQ translation benchmark suite" to "CSHARP-5992: Add LINQ translation benchmark suite" May 28, 2026
adelinowona marked this pull request as ready for review May 28, 2026 15:59
adelinowona requested a review from a team as a code owner May 28, 2026 15:59
Copilot AI review requested due to automatic review settings May 28, 2026 15:59

Copilot AI left a comment

Pull request overview

Adds a new LINQ-focused benchmark suite to the driver benchmarks project, enabling perf-job tracking and composite scoring for LINQ translation performance (plus an optional end-to-end comparison suite).

Changes:

  • Introduces LinqTranslationBenchmark (translation-only, no query execution) and LinqEndToEndBenchmark (LINQ vs raw query plans with real DB execution).
  • Wires new LinqBench category into perf-job filtering and composite-score export (including score units/metric names).
  • Extends benchmark category constants and composite export output to include LinqBench (and now BulkWriteBench) in AllCategories.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| `evergreen/run-perf-tests.sh` | Adds LinqBench to Evergreen perf category filter. |
| `benchmarks/MongoDB.Driver.Benchmarks/Linq/README.md` | Documents the LINQ benchmark inventory and interpretation guidance. |
| `benchmarks/MongoDB.Driver.Benchmarks/Linq/LinqTranslationBenchmark.cs` | Adds translation-focused benchmarks across filter/field/projection/update/IQueryable entry points. |
| `benchmarks/MongoDB.Driver.Benchmarks/Linq/LinqEndToEndBenchmark.cs` | Adds end-to-end LINQ vs raw benchmarks that seed data and run against a live server. |
| `benchmarks/MongoDB.Driver.Benchmarks/Exporters/LocalExporter.cs` | Emits per-category and per-benchmark units (MB/s vs translations/s). |
| `benchmarks/MongoDB.Driver.Benchmarks/Exporters/EvergreenExporter.cs` | Emits per-category and per-benchmark metric names for Evergreen (MB/s vs translations/s). |
| `benchmarks/MongoDB.Driver.Benchmarks/DriverBenchmarkCategory.cs` | Adds LinqBench and includes it (and BulkWriteBench) in composite category list. |
| `benchmarks/MongoDB.Driver.Benchmarks/BenchmarkResult.cs` | Adds unit/metric metadata and computes translations/s scoring for LinqBench. |


| Benchmark | Expression | Translator path exercised |
|---|---|---|
| `MultiFieldSearch` | `x.Status == s && x.CustomerName.StartsWith(prefix) && x.ShippingAddress.City == city && x.CreatedAt > cutoff && !x.IsPaid` | And → Comparison, MethodCall (StartsWith), nested MemberAccess, Not + boolean MemberAccess |
| `OrStatusFilter` | 4-way `==` OR with literal constants | Or → Comparison. No closures — fastest filter, most sensitive to small regressions |
[GlobalSetup]
public void Setup()
{
_client = MongoConfiguration.CreateClient();
Comment on lines +211 to +220
private void SetupQueryableExpressions(string statusFilter)
{
var mongoUri = Environment.GetEnvironmentVariable("MONGODB_URI");
var settings = mongoUri != null ? MongoClientSettings.FromConnectionString(mongoUri) : new MongoClientSettings();
settings.ClusterSource = DisposingClusterSource.Instance;
_queryClient = new MongoClient(settings);

var collection = _queryClient.GetDatabase("linqbench").GetCollection<OrderDocument>("orders");
var queryable = collection.AsQueryable();


**Allocation changes** are often more actionable than time changes. A new allocation in a hot path is a real regression even if the time delta is within noise. The `[MemoryDiagnoser]` columns (`Gen0`, `Allocated`) make allocation regressions visible.

**`OrStatusFilter`** is the fastest filter (~7µs, ~5x faster than others) because it uses literal constants instead of captured variables, producing a simpler expression tree with less work at every stage. This makes it the most sensitive filter benchmark — small translator regressions that would be lost in the noise on slower benchmarks show up clearly here.
Comment on lines +93 to +99
| Bucket | Threshold | Benchmarks | Observed range |
|---|---|---|---|
| Tight | 15% | `MultiFieldSearch`, `UpdatePipeline`, `BatchLookup`, `ArrayElementQuery` | 9–15% |
| Wider | 30% | `OrStatusFilter`, `FieldSelection`, `AggregationProjection`, `QueryablePipeline`, `GroupByAggregation` | 20–29% |
| Sentinel | 100% | `ProjectionSentinel` | 5% |

`OrStatusFilter` and `FieldSelection` land in the wider bucket for different reasons: `OrStatusFilter` is ~7µs and proportional noise is large at that scale; `FieldSelection` is ~2µs with similar characteristics. The three complex benchmarks (`AggregationProjection`, `QueryablePipeline`, `GroupByAggregation`) show higher drift because they allocate more and exercise more GC pressure.
public void Cleanup()
{
_client.DropDatabase(DatabaseName);
_client.Dispose();

Member

In v3 MongoDB.Driver this will not unregister the cluster. If we really want to clean up we should follow the clean up guide.

{
private const string DatabaseName = "linqbench";
private const string CollectionName = "orders";
private const int SeedCount = 500;

Member

Do we really need 500 documents? As far as I understood the point is to measure the LINQ translation overhead, so I would say it should be enough to have even a single document in the collection to reduce the server/network/serialization overhead.
