CSHARP-5840: Proper XUnit shared test fixture for Atlas tests by ajcvickers · Pull Request #2007 · mongodb/mongo-csharp-driver

ajcvickers · 2026-05-22T14:34:47Z

Replaces the suite's dependence on pre-configured Atlas sample databases and search indexes. The new AtlasSearchFixture (collection-scoped) owns a single client, an EventCapturer, and a fixture-managed atlas_search_ database. Lazy seeders populate historical_documents, movies, embedded_movies, airbnb_listings, test_classes, binary_vector_items, auto_embed_movies, two synonym source collections, and the existing return-scope directors collection, and create the ten search/vector indexes required by the tests (waiting for READY). Dispose drops the whole database.

The seed data is baked in as C# literals in AtlasSearchFixtureSeedData.cs (generated by tests/Tools/AtlasSeedExtractor from Atlas Local sample data once) so the test suite has zero runtime dependency on sample databases — only on an Atlas-search-capable cluster being reachable at ATLAS_SEARCH_URI. Embeddings are preserved as real Voyage-AI float32 vectors via BinaryVectorFloat32.

AtlasSearchTests, VectorSearchTests, AutoEmbedVectorSearchTests, and AtlasSearchIndexManagementTests join via [Collection(AtlasSearchCollection.Name)] and pull their client / collections through the fixture. AtlasSearchTests loses its static __indexesCreated / __databaseUniquifier / __returnScopeTestCount machinery and the inline return-scope seed. AtlasSearchTestsUtils is obsoleted.

AutoEmbedVectorSearchTests now run for real (the previous CSHARP-5840 skip is gone): the auto-embed collection and index live in the fixture so the Voyage AI embedding work happens once per test session instead of three times, the corpus is reduced to 4 movies (2 pass the runtime+year filter, 2 don't), and the ctor's unbounded wait loop is replaced with a 15-minute deadline that surfaces a clear error mentioning VOYAGE_API_KEY. Gated on a new AUTO_EMBEDDING_TESTS_ENABLED env var so contributors without a Voyage AI key can still run the rest of the Atlas Search suite. When enabled, the Atlas Local container must have VOYAGE_API_KEY provisioned via the Atlas UI's Model API Keys page (the key must authenticate against ai.mongodb.com, not api.voyageai.com). docker-compose.atlas-local.yml is included as a convenience launcher.

Count- and ranking-dependent assertions are recalibrated for the lean corpus: phrase-match counts drop from 108 to 1, geo-result counts from 25 to membership checks, and vector-search top-K assertions switch from order-sensitive ShouldBeEquivalentTo to order-insensitive BeEquivalentTo. The Pinocchio→Oz swap in VectorSearch top-5 is documented in the test where it lives. The Rerank test guards on AtlasSearchFixture.IsRerankSupported, which probes the deployment on first use.

Validated against Atlas Local: 77/77 AtlasSearch* tests pass, 36/36 AtlasSearchIndexManagement tests pass, 3/3 AutoEmbedVectorSearchTests pass (with a valid VOYAGE_API_KEY), 74 tests skip cleanly when env vars are unset, and no atlas_search_* databases remain after the suite exits.

Copilot

Pull request overview

This PR refactors the Atlas Search integration test suite to use a collection-scoped xUnit fixture that provisions its own ephemeral database, seed data, and required Atlas Search/vector indexes, eliminating reliance on pre-populated sample databases.

Changes:

Add AtlasSearchFixture + AtlasSearchCollection to centralize client/event capture and lazily seed collections/indexes for Atlas Search tests.
Update Atlas Search, Vector Search, Auto-embed Vector Search, and index-management tests to consume seeded collections via the fixture and adjust assertions for the reduced deterministic corpus.
Add a one-shot AtlasSeedExtractor tool and a convenience docker-compose.atlas-local.yml for running Atlas Local with Voyage AI auto-embedding.

Reviewed changes

Copilot reviewed 11 out of 12 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
tests/Tools/AtlasSeedExtractor/README.md	Documents the seed-extraction tool used to generate embedded C# seed literals.
tests/Tools/AtlasSeedExtractor/Program.cs	Implements deterministic extraction + C# emission of curated sample documents/embeddings.
tests/Tools/AtlasSeedExtractor/AtlasSeedExtractor.csproj	Adds the new tool project targeting net10.0.
tests/MongoDB.Driver.Tests/Search/VectorSearchTests.cs	Switches tests to fixture-provided client/collections and updates ranking assertions for lean seed.
tests/MongoDB.Driver.Tests/Search/AutoEmbedVectorSearchTests.cs	Enables real auto-embed tests behind env var gating and uses fixture-owned corpus/index.
tests/MongoDB.Driver.Tests/Search/AtlasSearchTestsUtils.cs	Removes the old ad-hoc Atlas Search client helper.
tests/MongoDB.Driver.Tests/Search/AtlasSearchTests.cs	Migrates to fixture-backed seeded collections/indexes and recalibrates corpus-dependent assertions.
tests/MongoDB.Driver.Tests/Search/AtlasSearchIndexManagmentTests.cs	Migrates index-helper tests to fixture-owned client and updates auto-embed gating/expectations.
tests/MongoDB.Driver.Tests/Search/AtlasSearchFixture.cs	Introduces the shared fixture that seeds collections and waits for indexes to become READY.
tests/MongoDB.Driver.Tests/Search/AtlasSearchCollection.cs	Adds the xUnit collection definition binding the fixture to test classes.
docker-compose.atlas-local.yml	Adds an Atlas Local compose file configured for Voyage AI auto-embedding.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

sanych-sun · 2026-05-25T16:19:52Z

+            get
+            {
+                EnsureMoviesInitialized();
+                if (_isRerankSupported is bool cached)


I would suggest to use _isRerankSupported.HasValue instead to clearly show what exactly we are checking.

BTW, it looks like Lazy<> could be better here.

Lazy<> is not a good choice here because:

Exception handling is different. Lazy will keep throwing; current code does not.

EnsureAutoEmbedInitialized needs to run an AUTO_EMBEDDING_TESTS_ENABLED fail-fast check

IsRerankSupported depends on EnsureMoviesInitialized() running first.

Lazy purely-for-side-effects is an awkward idiom.

Independent per-Lazy locks vs. the shared _initLock makes the sharing logic easier to grok.

However, I did some refactoring to reduce some of the duplication.

sanych-sun · 2026-05-25T16:26:18Z

+                });
+
+                _historicalInitialized = true;
+                EventCapturer?.Clear();


Each EnsureXXXInitialized method ends with EventCapturer?.Clear();. Is it safe to clear the event capturer in such a way? User's code does not control when EnsureXXXInitialized method is being executed, so if the test case will looks like:

var collection1 = Fixture.Collection1; collection1.Aggregate(...) // do something useful with the collection - but these events could be lost because of the next line. var collection2 = Fixture.Collection2;

Should we have a single EnsureTestDataSet method to initialize all collections?

Good catch. This isn't an issue currently, since all tests are in a single collection, and XUnit doesn't run tests in the same collection in parallel. However, fixing this now means that if we move tests into separate collections and run in parallel (for perf), then things will not break.

sanych-sun · 2026-05-25T16:39:03Z

        {
-            (_mongoClient, _) = AtlasSearchTestsUtils.CreateAtlasSearchMongoClient();
+            RequireEnvironment.Check().EnvironmentVariable("ATLAS_SEARCH_TESTS_ENABLED");
+            RequireEnvironment.Check().EnvironmentVariable("ATLAS_SEARCH_URI");


It looks like we are always check ATLAS_SEARCH_TESTS_ENABLED and ATLAS_SEARCH_URI together. Do we really need ATLAS_SEARCH_TESTS_ENABLED then?

I don't think so, but I'm not sure why we have all the environment variables we have, so I have been cautious about consolidating. If we're okay with gating on the URI being set, then we can do this.

sanych-sun · 2026-05-28T19:56:57Z

+            get
+            {
+                EnsureMoviesInitialized();
+                if (_isRerankSupported is { } cached)


I would say the following code would be more self-descriptive:

if (_isRerankSupported.HasValue) { return _isRerankSupported.Value; }

I disagree. It's just older.

sanych-sun · 2026-05-28T19:58:59Z

+                if (_embeddedMoviesInitialized) return;
+
+                var collection = Database.GetCollection<BsonDocument>(EmbeddedMoviesName);
+                if (!collection.AsQueryable().Any())


Does it makes sense trying to preserve previously existed data. What if the collection is filled with something else, or what if we decide to change the test data?

Will take a look.

It's so that the database name can be pinned during debugging, and the test can be run multiple times without needing to re-seed every time, which slows down the inter-loop.

sanych-sun · 2026-05-28T20:01:34Z

+
+        // ---- Typed collection accessors ----
+
+        public IMongoCollection<T> GetHistoricalDocumentsCollection<T>()


Why this methods has type parameter? Do we expect different tests to use different models?

It's currently used with HistoricalDocument and HistoricalDocumentWithCommentsOnly.

sanych-sun · 2026-05-28T20:06:53Z

@@ -0,0 +1,18 @@
+<Project Sdk="Microsoft.NET.Sdk">


I do agree that this tool looks useful, but do we really need it in the Driver's repo? Are we expecting to use it frequently?

Should we have a separate repo for tools like this? Somewhere under https://github.com/mongodb-labs .

I'm not insisting, just asking.

I'll just remove it.

Replaces the suite's dependence on pre-configured Atlas sample databases and search indexes. The new AtlasSearchFixture (collection-scoped) owns a single client, an EventCapturer, and a fixture-managed atlas_search_<guid> database. Lazy seeders populate historical_documents, movies, embedded_movies, airbnb_listings, test_classes, binary_vector_items, auto_embed_movies, two synonym source collections, and the existing return-scope directors collection, and create the ten search/vector indexes required by the tests (waiting for READY). Dispose drops the whole database. The seed data is baked in as C# literals in AtlasSearchFixtureSeedData.cs (generated by tests/Tools/AtlasSeedExtractor from Atlas Local sample data once) so the test suite has zero runtime dependency on sample databases — only on an Atlas-search-capable cluster being reachable at ATLAS_SEARCH_URI. Embeddings are preserved as real Voyage-AI float32 vectors via BinaryVectorFloat32. AtlasSearchTests, VectorSearchTests, AutoEmbedVectorSearchTests, and AtlasSearchIndexManagementTests join via [Collection(AtlasSearchCollection.Name)] and pull their client / collections through the fixture. AtlasSearchTests loses its static __indexesCreated / __databaseUniquifier / __returnScopeTestCount machinery and the inline return-scope seed. AtlasSearchTestsUtils is obsoleted. AutoEmbedVectorSearchTests now run for real (the previous CSHARP-5840 skip is gone): the auto-embed collection and index live in the fixture so the Voyage AI embedding work happens once per test session instead of three times, the corpus is reduced to 4 movies (2 pass the runtime+year filter, 2 don't), and the ctor's unbounded wait loop is replaced with a 15-minute deadline that surfaces a clear error mentioning VOYAGE_API_KEY. Gated on a new AUTO_EMBEDDING_TESTS_ENABLED env var so contributors without a Voyage AI key can still run the rest of the Atlas Search suite. When enabled, the Atlas Local container must have VOYAGE_API_KEY provisioned via the Atlas UI's Model API Keys page (the key must authenticate against ai.mongodb.com, not api.voyageai.com). docker-compose.atlas-local.yml is included as a convenience launcher. Count- and ranking-dependent assertions are recalibrated for the lean corpus: phrase-match counts drop from 108 to 1, geo-result counts from 25 to membership checks, and vector-search top-K assertions switch from order-sensitive ShouldBeEquivalentTo to order-insensitive BeEquivalentTo. The Pinocchio→Oz swap in VectorSearch top-5 is documented in the test where it lives. The Rerank test guards on AtlasSearchFixture.IsRerankSupported, which probes the deployment on first use. Validated against Atlas Local: 77/77 AtlasSearch* tests pass, 36/36 AtlasSearchIndexManagement tests pass, 3/3 AutoEmbedVectorSearchTests pass (with a valid VOYAGE_API_KEY), 74 tests skip cleanly when env vars are unset, and no atlas_search_* databases remain after the suite exits. [review-iter 1] Address review findings [review-iter 2] Address review findings

The AtlasSeedExtractor project was a one-shot generator that produced the baked-in literals in AtlasSearchFixtureSeedData.cs. That seed data is now static and checked in, so the tool is no longer needed: delete the project and drop the generator references from the seed-data file header. Also collapse the eight near-identical double-checked-locking blocks in AtlasSearchFixture into a single EnsureInitialized(isInitialized, initialize) helper, and mark the per-collection init guards volatile so the lock-free fast-path read is correctly published. Behavior is unchanged: the shared _initLock still serializes seeding, and a throwing seeder still leaves its guard unset so the next access retries (unlike Lazy<T>, which would cache and re-throw the failure).

ajcvickers requested a review from Copilot May 22, 2026 14:35

ajcvickers added the chore Non–user-facing code changes (tests, build scripts, etc.). label May 22, 2026

Copilot started reviewing on behalf of ajcvickers May 22, 2026 14:35 View session

Copilot AI reviewed May 22, 2026

View reviewed changes

sanych-sun requested changes May 25, 2026

View reviewed changes

ajcvickers force-pushed the CSHARP-5840 branch 2 times, most recently from 0bbc0a1 to fc12213 Compare May 27, 2026 12:57

ajcvickers marked this pull request as ready for review May 28, 2026 12:42

ajcvickers requested a review from a team as a code owner May 28, 2026 12:42

ajcvickers requested review from papafe and sanych-sun May 28, 2026 12:42

sanych-sun reviewed May 28, 2026

View reviewed changes

ajcvickers added 5 commits May 29, 2026 10:45

Copilot feedback

69b977e

Review feedback

6bf8569

Remove atlas-search-index-helpers-test

cc6bc3e

ajcvickers force-pushed the CSHARP-5840 branch from 44fa6f3 to 5616cb9 Compare May 29, 2026 10:18

ajcvickers requested a review from sanych-sun May 29, 2026 10:22


		// ---- Typed collection accessors ----

		public IMongoCollection<T> GetHistoricalDocumentsCollection<T>()

Conversation

ajcvickers commented May 22, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants