Skip to content

CSHARP-5840: Proper XUnit shared test fixture for Atlas tests#2007

Open
ajcvickers wants to merge 5 commits into
mongodb:mainfrom
ajcvickers:CSHARP-5840
Open

CSHARP-5840: Proper XUnit shared test fixture for Atlas tests#2007
ajcvickers wants to merge 5 commits into
mongodb:mainfrom
ajcvickers:CSHARP-5840

Conversation

@ajcvickers
Copy link
Copy Markdown
Contributor

Replaces the suite's dependence on pre-configured Atlas sample databases and search indexes. The new AtlasSearchFixture (collection-scoped) owns a single client, an EventCapturer, and a fixture-managed atlas_search_ database. Lazy seeders populate historical_documents, movies, embedded_movies, airbnb_listings, test_classes, binary_vector_items, auto_embed_movies, two synonym source collections, and the existing return-scope directors collection, and create the ten search/vector indexes required by the tests (waiting for READY). Dispose drops the whole database.

The seed data is baked in as C# literals in AtlasSearchFixtureSeedData.cs (generated by tests/Tools/AtlasSeedExtractor from Atlas Local sample data once) so the test suite has zero runtime dependency on sample databases — only on an Atlas-search-capable cluster being reachable at ATLAS_SEARCH_URI. Embeddings are preserved as real Voyage-AI float32 vectors via BinaryVectorFloat32.

AtlasSearchTests, VectorSearchTests, AutoEmbedVectorSearchTests, and AtlasSearchIndexManagementTests join via [Collection(AtlasSearchCollection.Name)] and pull their client / collections through the fixture. AtlasSearchTests loses its static __indexesCreated / __databaseUniquifier / __returnScopeTestCount machinery and the inline return-scope seed. AtlasSearchTestsUtils is obsoleted.

AutoEmbedVectorSearchTests now run for real (the previous CSHARP-5840 skip is gone): the auto-embed collection and index live in the fixture so the Voyage AI embedding work happens once per test session instead of three times, the corpus is reduced to 4 movies (2 pass the runtime+year filter, 2 don't), and the ctor's unbounded wait loop is replaced with a 15-minute deadline that surfaces a clear error mentioning VOYAGE_API_KEY. Gated on a new AUTO_EMBEDDING_TESTS_ENABLED env var so contributors without a Voyage AI key can still run the rest of the Atlas Search suite. When enabled, the Atlas Local container must have VOYAGE_API_KEY provisioned via the Atlas UI's Model API Keys page (the key must authenticate against ai.mongodb.com, not api.voyageai.com). docker-compose.atlas-local.yml is included as a convenience launcher.

Count- and ranking-dependent assertions are recalibrated for the lean corpus: phrase-match counts drop from 108 to 1, geo-result counts from 25 to membership checks, and vector-search top-K assertions switch from order-sensitive ShouldBeEquivalentTo to order-insensitive BeEquivalentTo. The Pinocchio→Oz swap in VectorSearch top-5 is documented in the test where it lives. The Rerank test guards on AtlasSearchFixture.IsRerankSupported, which probes the deployment on first use.

Validated against Atlas Local: 77/77 AtlasSearch* tests pass, 36/36 AtlasSearchIndexManagement tests pass, 3/3 AutoEmbedVectorSearchTests pass (with a valid VOYAGE_API_KEY), 74 tests skip cleanly when env vars are unset, and no atlas_search_* databases remain after the suite exits.

@ajcvickers ajcvickers requested a review from Copilot May 22, 2026 14:35
@ajcvickers ajcvickers added the chore Non–user-facing code changes (tests, build scripts, etc.). label May 22, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the Atlas Search integration test suite to use a collection-scoped xUnit fixture that provisions its own ephemeral database, seed data, and required Atlas Search/vector indexes, eliminating reliance on pre-populated sample databases.

Changes:

  • Add AtlasSearchFixture + AtlasSearchCollection to centralize client/event capture and lazily seed collections/indexes for Atlas Search tests.
  • Update Atlas Search, Vector Search, Auto-embed Vector Search, and index-management tests to consume seeded collections via the fixture and adjust assertions for the reduced deterministic corpus.
  • Add a one-shot AtlasSeedExtractor tool and a convenience docker-compose.atlas-local.yml for running Atlas Local with Voyage AI auto-embedding.

Reviewed changes

Copilot reviewed 11 out of 12 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
tests/Tools/AtlasSeedExtractor/README.md Documents the seed-extraction tool used to generate embedded C# seed literals.
tests/Tools/AtlasSeedExtractor/Program.cs Implements deterministic extraction + C# emission of curated sample documents/embeddings.
tests/Tools/AtlasSeedExtractor/AtlasSeedExtractor.csproj Adds the new tool project targeting net10.0.
tests/MongoDB.Driver.Tests/Search/VectorSearchTests.cs Switches tests to fixture-provided client/collections and updates ranking assertions for lean seed.
tests/MongoDB.Driver.Tests/Search/AutoEmbedVectorSearchTests.cs Enables real auto-embed tests behind env var gating and uses fixture-owned corpus/index.
tests/MongoDB.Driver.Tests/Search/AtlasSearchTestsUtils.cs Removes the old ad-hoc Atlas Search client helper.
tests/MongoDB.Driver.Tests/Search/AtlasSearchTests.cs Migrates to fixture-backed seeded collections/indexes and recalibrates corpus-dependent assertions.
tests/MongoDB.Driver.Tests/Search/AtlasSearchIndexManagmentTests.cs Migrates index-helper tests to fixture-owned client and updates auto-embed gating/expectations.
tests/MongoDB.Driver.Tests/Search/AtlasSearchFixture.cs Introduces the shared fixture that seeds collections and waits for indexes to become READY.
tests/MongoDB.Driver.Tests/Search/AtlasSearchCollection.cs Adds the xUnit collection definition binding the fixture to test classes.
docker-compose.atlas-local.yml Adds an Atlas Local compose file configured for Voyage AI auto-embedding.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/MongoDB.Driver.Tests/Search/AtlasSearchFixture.cs
Comment thread tests/MongoDB.Driver.Tests/Search/AtlasSearchIndexManagmentTests.cs Outdated
Comment thread tests/Tools/AtlasSeedExtractor/Program.cs Outdated
Comment thread tests/Tools/AtlasSeedExtractor/Program.cs Outdated
Comment thread tests/Tools/AtlasSeedExtractor/README.md Outdated
Comment thread docker-compose.atlas-local.yml
Comment thread tests/MongoDB.Driver.Tests/Search/AtlasSearchFixture.cs Outdated
get
{
EnsureMoviesInitialized();
if (_isRerankSupported is bool cached)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to use _isRerankSupported.HasValue instead to clearly show what exactly we are checking.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, it looks like Lazy<> could be better here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lazy<> is not a good choice here because:

  • Exception handling is different. Lazy will keep throwing; current code does not.
  • EnsureAutoEmbedInitialized needs to run an AUTO_EMBEDDING_TESTS_ENABLED fail-fast check
  • IsRerankSupported depends on EnsureMoviesInitialized() running first.
  • Lazy purely-for-side-effects is an awkward idiom.
  • Independent per-Lazy locks vs. the shared _initLock makes the sharing logic easier to grok.

However, I did some refactoring to reduce some of the duplication.

});

_historicalInitialized = true;
EventCapturer?.Clear();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each EnsureXXXInitialized method ends with EventCapturer?.Clear();. Is it safe to clear the event capturer in such a way? User's code does not control when EnsureXXXInitialized method is being executed, so if the test case will looks like:

var collection1 = Fixture.Collection1;
collection1.Aggregate(...) // do something useful with the collection - but these events could be lost because of the next line.
var collection2 = Fixture.Collection2;

Should we have a single EnsureTestDataSet method to initialize all collections?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. This isn't an issue currently, since all tests are in a single collection, and XUnit doesn't run tests in the same collection in parallel. However, fixing this now means that if we move tests into separate collections and run in parallel (for perf), then things will not break.

{
(_mongoClient, _) = AtlasSearchTestsUtils.CreateAtlasSearchMongoClient();
RequireEnvironment.Check().EnvironmentVariable("ATLAS_SEARCH_TESTS_ENABLED");
RequireEnvironment.Check().EnvironmentVariable("ATLAS_SEARCH_URI");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we are always check ATLAS_SEARCH_TESTS_ENABLED and ATLAS_SEARCH_URI together. Do we really need ATLAS_SEARCH_TESTS_ENABLED then?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, but I'm not sure why we have all the environment variables we have, so I have been cautious about consolidating. If we're okay with gating on the URI being set, then we can do this.

Comment thread tests/MongoDB.Driver.Tests/Search/VectorSearchTests.cs Outdated
Comment thread tests/MongoDB.Driver.Tests/Search/VectorSearchTests.cs Outdated
@ajcvickers ajcvickers force-pushed the CSHARP-5840 branch 2 times, most recently from 0bbc0a1 to fc12213 Compare May 27, 2026 12:57
@ajcvickers ajcvickers marked this pull request as ready for review May 28, 2026 12:42
@ajcvickers ajcvickers requested a review from a team as a code owner May 28, 2026 12:42
@ajcvickers ajcvickers requested review from papafe and sanych-sun May 28, 2026 12:42
get
{
EnsureMoviesInitialized();
if (_isRerankSupported is { } cached)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say the following code would be more self-descriptive:

if (_isRerankSupported.HasValue)
{
    return _isRerankSupported.Value;
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree. It's just older.

if (_embeddedMoviesInitialized) return;

var collection = Database.GetCollection<BsonDocument>(EmbeddedMoviesName);
if (!collection.AsQueryable().Any())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it makes sense trying to preserve previously existed data. What if the collection is filled with something else, or what if we decide to change the test data?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will take a look.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's so that the database name can be pinned during debugging, and the test can be run multiple times without needing to re-seed every time, which slows down the inter-loop.


// ---- Typed collection accessors ----

public IMongoCollection<T> GetHistoricalDocumentsCollection<T>()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this methods has type parameter? Do we expect different tests to use different models?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's currently used with HistoricalDocument and HistoricalDocumentWithCommentsOnly.

@@ -0,0 +1,18 @@
<Project Sdk="Microsoft.NET.Sdk">
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do agree that this tool looks useful, but do we really need it in the Driver's repo? Are we expecting to use it frequently?

Should we have a separate repo for tools like this? Somewhere under https://github.com/mongodb-labs .

I'm not insisting, just asking.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll just remove it.

Replaces the suite's dependence on pre-configured Atlas sample databases
and search indexes. The new AtlasSearchFixture (collection-scoped) owns
a single client, an EventCapturer, and a fixture-managed
atlas_search_<guid> database. Lazy seeders populate historical_documents,
movies, embedded_movies, airbnb_listings, test_classes, binary_vector_items,
auto_embed_movies, two synonym source collections, and the existing
return-scope directors collection, and create the ten search/vector
indexes required by the tests (waiting for READY). Dispose drops the
whole database.

The seed data is baked in as C# literals in AtlasSearchFixtureSeedData.cs
(generated by tests/Tools/AtlasSeedExtractor from Atlas Local sample
data once) so the test suite has zero runtime dependency on sample
databases — only on an Atlas-search-capable cluster being reachable at
ATLAS_SEARCH_URI. Embeddings are preserved as real Voyage-AI float32
vectors via BinaryVectorFloat32.

AtlasSearchTests, VectorSearchTests, AutoEmbedVectorSearchTests, and
AtlasSearchIndexManagementTests join via [Collection(AtlasSearchCollection.Name)]
and pull their client / collections through the fixture. AtlasSearchTests
loses its static __indexesCreated / __databaseUniquifier /
__returnScopeTestCount machinery and the inline return-scope seed.
AtlasSearchTestsUtils is obsoleted.

AutoEmbedVectorSearchTests now run for real (the previous CSHARP-5840
skip is gone): the auto-embed collection and index live in the fixture
so the Voyage AI embedding work happens once per test session instead
of three times, the corpus is reduced to 4 movies (2 pass the
runtime+year filter, 2 don't), and the ctor's unbounded wait loop is
replaced with a 15-minute deadline that surfaces a clear error mentioning
VOYAGE_API_KEY. Gated on a new AUTO_EMBEDDING_TESTS_ENABLED env var so contributors
without a Voyage AI key can still run the rest of the Atlas Search
suite. When enabled, the Atlas Local container must have
VOYAGE_API_KEY provisioned via the Atlas UI's Model API Keys page (the
key must authenticate against ai.mongodb.com, not api.voyageai.com).
docker-compose.atlas-local.yml is included as a convenience launcher.

Count- and ranking-dependent assertions are recalibrated for the lean
corpus: phrase-match counts drop from 108 to 1, geo-result counts from
25 to membership checks, and vector-search top-K assertions switch from
order-sensitive ShouldBeEquivalentTo to order-insensitive BeEquivalentTo.
The Pinocchio→Oz swap in VectorSearch top-5 is documented in the test
where it lives. The Rerank test guards on AtlasSearchFixture.IsRerankSupported,
which probes the deployment on first use.

Validated against Atlas Local: 77/77 AtlasSearch* tests pass, 36/36
AtlasSearchIndexManagement tests pass, 3/3 AutoEmbedVectorSearchTests
pass (with a valid VOYAGE_API_KEY), 74 tests skip cleanly when env vars
are unset, and no atlas_search_* databases remain after the suite exits.

[review-iter 1] Address review findings

[review-iter 2] Address review findings
The AtlasSeedExtractor project was a one-shot generator that produced the
baked-in literals in AtlasSearchFixtureSeedData.cs. That seed data is now
static and checked in, so the tool is no longer needed: delete the project
and drop the generator references from the seed-data file header.

Also collapse the eight near-identical double-checked-locking blocks in
AtlasSearchFixture into a single EnsureInitialized(isInitialized, initialize)
helper, and mark the per-collection init guards volatile so the lock-free
fast-path read is correctly published. Behavior is unchanged: the shared
_initLock still serializes seeding, and a throwing seeder still leaves its
guard unset so the next access retries (unlike Lazy<T>, which would cache
and re-throw the failure).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

chore Non–user-facing code changes (tests, build scripts, etc.).

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants