SOLR-18194: fix nested docs detection false positive by kotman12 · Pull Request #4279 · apache/solr

kotman12 · 2026-04-12T02:17:53Z

Description

The upgrade core index endpoint doesn't currently support child docs and so it guards against running on an index with such documents. The logic to check for the existence of child docs contingent upon the _root_ field existing could be improved. If you just compare the doc count of _root_ vs the unique count (current behavior), it will falsely flag any segment with updated docs (where the update happened within one segment) as containing child docs when in actuality it may have no child docs. I actually ran into such a scenario devising my own version of this check.

Solution

Compare terms size of id field with the terms size of the _root_ field to ensure no documents share a root. We do this for each segment separately as was done for the previous check.

Tests

testNestedDocsDetection_nonNestedJustAdd — Plain docs, no updates. Verifies no false positive.
testNestedDocsDetection_nestedJustAdd — Actual nested docs. Verifies detection works.
testNestedDocsDetection_nonNestedWithWithinCommitUpdates — Non-nested docs where some are re-added (same id) before commit. The deleted+re-added docs share root values within the segment, which previously caused false positives.
testNestedDocsDetection_nestedWithWithinCommitUpdates — Same within-commit update pattern but with real nested docs present.
testNestedDocsDetection_nonNestedWithWithinCommitDeletesAndUpdates — Non-nested docs with deletes AND updates before commit. Deleted entries in the segment can also trigger false positives.
testNestedDocsDetection_nestedWithWithinCommitDeletesAndUpdates — Same delete+update pattern with real nested docs.

kotman12 · 2026-04-12T02:53:29Z

Btw @dsmiley I got Claude to factor this with an EmbeddedSolrServerTest here (as you suggested on the initial PR) but I am not sure what to think about it. Part of me thinks the collection-centric language, i.e. .newCollection() is a bit weird to put in a test of a feature that is explicitly not supported in cloud mode. That being said it is just a naming thing and the test does work. So its perhaps more of a problem with the solrTestRule. Perhaps I have some trouble grasping the finer details as I've only ever interacted with Solr in Cloud mode.

dsmiley

Part of me thinks the collection-centric language, i.e. .newCollection() is a bit weird to put in a test of a feature that is explicitly not supported in cloud mode. That being said it is just a naming thing and the test does work. So its perhaps more of a problem with the solrTestRule. Perhaps I have some trouble grasping the finer details as I've only ever interacted with Solr in Cloud mode.

The SolrTestRule is an abstraction that will grow to SolrCloud. So I think there should be one terminology instead of bifurcated. On the other hand, I could see adding an alias for non-cloud ones when we're sure the logic of the test is very fundamentally about cores.

dsmiley · 2026-04-12T04:15:17Z

+    }
+  }
+
+  // --- Child docs detection tests ---


[0] Wow lots of tests... and they seem only slightly tweaked amongst each. My observation is that Lucene/Solr old-timers (like me) prefer to write a minimal number of randomized test instead of writing many repeated tests that tweak something. I noticed LLMs prefer the latter. But it's fine, really!

These are indeed LLM generated but I was very intentional about what scenarios to test, i.e. hasChildDocs X hasUpdates X hasDeletes. These variables have different effects on the distinct count vs docCount of both id and _root_ and would like any hypothetical future refactoring to take into account each since the logic is a function of these variables. I am not convinced a randomized test would have been effective here without knowing what edge cases you are looking for a priori. I suppose in theory it could have caught other things I hadn't considered although seems unlikely given the relatively simple logic and few moving parts (famous last words). Thus, there is advantage to deterministically validating these known edge cases with each build. That being said, it feels like overkill to test these in UpgradeCoreIndex as this check is a rather small component of the overall operation. Perhaps this check should live in some utility function? Then we could also call it from, say, the LukeHandler and expose a hasNested/ChildDocs in the index info. The benefit of such thorough testing would be more obvious in a targeted utility. What do you think?

As an aside, is your reflexive urge to use randomized testing here motivated by aesthetic reasons? If so, I could consider parameterizing them. I think it should be possible.

Thanks for explaining why the tests are what they are. Seems perfect then :-)

No; not aesthetic reasons. Well okay slightly but whatever.

I love the idea of a utility function somewhere! SolrCore? I hate to suggest that place given SolrCore's size but it would at least have some logical sense to its placement there, and discoverability, which matters.

…reIndex.java Co-authored-by: David Smiley <[email protected]>

rahulgoswami · 2026-04-12T20:34:07Z

+  // in segments (since NoMergePolicy prevents segment merges from purging them).
+
+  @Test
+  public void testChildDocsDetection_noChildDocsJustAdd() throws Exception {


I understand the intent that we are trying to be explicit about the edge cases of the child docs check. In certain cases however, I'd argue that certain tests like this one are effectively testing the happy path which is already getting covered inherently in existing tests like testNoUpgradeNeededWhenAllSegmentsCurrent().

I appreciate the thorough tests, but I think we have an opportunity to keep the coverage tight here by relying on certain existing tests like above. Another reasoning is that every test is "just another test" by itself, but can gradually add to the build times.

rahulgoswami · 2026-04-12T20:39:01Z

+  }
+
+  @Test
+  public void testChildDocsDetection_withChildDocsJustAdd() throws Exception {


Same argument on duplicate coverage as above through existing test testUpgradeCoreIndexFailsWithNestedDocuments() (now renamed to testUpgradeCoreIndexFailsWithChildDocuments() in this PR)

rahulgoswami · 2026-04-12T21:00:01Z

+  }
+
+  @Test
+  public void testChildDocsDetection_withChildDocsWithWithinCommitUpdates() throws Exception {


I'd argue that all tests of the pattern testChildDocsDetection_withChildDocs* are getting covered through the existing test testUpgradeCoreIndexFailsWithChildDocuments() since the with vs within(CommitUpdates) condition is not a real differentiator in that case.

testChildDocsDetection_noChildDocsWithWithinCommitUpdates() and testChildDocsDetection_noChildDocsWithWithinCommitDeletesAndUpdates() is all we need?

In the current algorithm they are equivalent but things can change so this is more for regression prevention. I think it's possible to imagine some future code path that only gets exercised when unique(id) == idTerms.size() (which is the case only if all deletes are via update) or vice versa. We can also randomize the counts of docs added vs deleted vs updated if we are really worried about the incremental build time here. All that being said, it feels like overkill for the UpgradeCoreIndex tests because this isn't even the main point of this class.

That's why I proposed moving the child doc check to a central location (where they can also be exposed by some info endpoint as well). I know @dsmiley expressed interest in exposing the child doc check via some info endpoint in the original PR (luke or maybe in core/collection status?). If we were to go that direction then we'd probably want to move this check to some centralized utility.

kotman12 added 5 commits April 6, 2026 16:29

add false-positive nested doc check test

0f6dccc

suggested improvement

15f02bd

add changelog

adb2f88

nestedDocs -> childDocs

76cdb76

nestedDocs -> childDocs

f4d0f2a

kotman12 requested a review from rahulgoswami April 12, 2026 02:17

github-actions bot added the tests label Apr 12, 2026

kotman12 changed the title ~~Solr-18194: fix nested docs detection false positive~~ SOLR-18194: fix nested docs detection false positive Apr 12, 2026

dsmiley approved these changes Apr 12, 2026

View reviewed changes

Update solr/core/src/java/org/apache/solr/handler/admin/api/UpgradeCo…

38703b7

…reIndex.java Co-authored-by: David Smiley <[email protected]>

rahulgoswami reviewed Apr 12, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SOLR-18194: fix nested docs detection false positive#4279

SOLR-18194: fix nested docs detection false positive#4279
kotman12 wants to merge 6 commits intoapache:mainfrom
kotman12:SOLR-18194-fix-nested-docs-detection-false-positive

kotman12 commented Apr 12, 2026

Uh oh!

kotman12 commented Apr 12, 2026 •

edited

Loading

Uh oh!

dsmiley left a comment

Uh oh!

Uh oh!

dsmiley Apr 12, 2026

Uh oh!

kotman12 Apr 12, 2026

Uh oh!

dsmiley Apr 12, 2026

Uh oh!

rahulgoswami Apr 12, 2026 •

edited

Loading

Uh oh!

rahulgoswami Apr 12, 2026

Uh oh!

rahulgoswami Apr 12, 2026 •

edited

Loading

Uh oh!

kotman12 Apr 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

kotman12 commented Apr 12, 2026

Description

Solution

Tests

Uh oh!

kotman12 commented Apr 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dsmiley left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

dsmiley Apr 12, 2026

Choose a reason for hiding this comment

Uh oh!

kotman12 Apr 12, 2026

Choose a reason for hiding this comment

Uh oh!

dsmiley Apr 12, 2026

Choose a reason for hiding this comment

Uh oh!

rahulgoswami Apr 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rahulgoswami Apr 12, 2026

Choose a reason for hiding this comment

Uh oh!

rahulgoswami Apr 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kotman12 Apr 15, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

kotman12 commented Apr 12, 2026 •

edited

Loading

rahulgoswami Apr 12, 2026 •

edited

Loading

rahulgoswami Apr 12, 2026 •

edited

Loading