Skip to content

feat: Tolerances for inner lists and arrays#21

Merged
Marius Merkle (MariusMerkleQC) merged 7 commits intomainfrom
nested_list_comparison
Mar 27, 2026
Merged

feat: Tolerances for inner lists and arrays#21
Marius Merkle (MariusMerkleQC) merged 7 commits intomainfrom
nested_list_comparison

Conversation

@MariusMerkleQC
Copy link
Copy Markdown
Collaborator

@MariusMerkleQC Marius Merkle (MariusMerkleQC) commented Mar 27, 2026

Motivation

Follow-up to #19, this finally solves #8 to 100%. We so far defaulted to naive comparison for inner lists vs lists, so whenever they were nested within some other data structure (like an array of lists, a struct of struct of lists, etc.). Element-wise comparison accounting for tolerances is now applied instead: whenever two columns contain a list anywhere in their data type "tree", we compute the maximum length of the lists, where maximum is both over
(1) left and right data frame
(2) on any level in the data type tree

In list vs list comparisons, we then traverse all elements up to max_list_length and cover out-of-bounds by returning None. This doesn't yield false positive matches as we combine the element-wise check with a list-length check.

Changes

  • Adjusted _max_list_lengths_by_column to consider all data type levels
  • Adjusted expected outcome in test_condition_equal_columns_nested_list_array_with_tolerance
  • Added a new test test_condition_equal_columns_lists_only_inner where lists are not an outer but inner data type

@github-actions github-actions bot added the enhancement New feature or request label Mar 27, 2026
@codecov
Copy link
Copy Markdown

codecov bot commented Mar 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (2a3010b) to head (d2097c9).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##              main       #21   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           10        10           
  Lines          742       758   +16     
=========================================
+ Hits           742       758   +16     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Extends tolerance-based element-wise comparisons to apply to inner list levels nested inside other types (e.g., structs/arrays containing lists) by computing a per-column maximum list length across the full dtype “tree”.

Changes:

  • Update _max_list_lengths_by_column to detect lists at any nesting level and compute the maximum list length across both frames and all list levels.
  • Adjust list-vs-list sequence comparison to always unroll element-wise using the computed max_list_length (and propagate it into recursive comparisons).
  • Update/extend tests to cover nested inner-list tolerance behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
diffly/comparison.py Computes max list lengths for columns with nested lists anywhere in their dtype tree.
diffly/_conditions.py Removes nested-list fallback equality and requires max_list_length for List-vs-List unrolling; propagates length to recursive comparisons.
tests/test_conditions.py Updates expected results and adds a new test for inner-list-only nesting (struct containing list).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually didn't consider this previously but comparing all list elements independently will cause severe performance regressions when running the comparison on (large) lists with non-floats. Can we benchmark this and use all of this complicated logic only if there is a float somewhere in the "list hierarchy" (-> separate PR)?

@MariusMerkleQC Marius Merkle (MariusMerkleQC) merged commit cccfad4 into main Mar 27, 2026
17 checks passed
@MariusMerkleQC Marius Merkle (MariusMerkleQC) deleted the nested_list_comparison branch March 27, 2026 22:10
@MariusMerkleQC
Copy link
Copy Markdown
Collaborator Author

Can we benchmark this

-> #25

@MariusMerkleQC
Copy link
Copy Markdown
Collaborator Author

use all of this complicated logic only if there is a float somewhere in the "list hierarchy"

-> #26

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Properly perform floating point comparisons for structs and lists

3 participants