feat: Tolerances for inner lists and arrays by MariusMerkleQC · Pull Request #21 · Quantco/diffly

Marius Merkle (MariusMerkleQC) · 2026-03-27T08:13:40Z

Motivation

Follow-up to #19; this fully resolves #8. Until now, we fell back to naive comparison for inner list-vs-list comparisons, i.e. whenever the lists were nested within some other data structure (an array of lists, a struct of struct of lists, etc.). Element-wise comparison that accounts for tolerances is now applied instead: whenever two columns contain a list anywhere in their data type "tree", we compute the maximum list length, where the maximum is taken over both
(1) the left and right data frames
(2) every level of the data type tree

In list-vs-list comparisons, we then traverse all elements up to max_list_length; out-of-bounds accesses return None. This does not yield false-positive matches because the element-wise check is combined with a list-length check.
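To illustrate why the length check matters, here is a minimal pure-Python sketch of the idea. It is not the code in this PR: the helper names `get_or_none`, `elementwise_equal` and `lists_equal` and the `abs_tol` default are assumptions made for illustration, and the sketch works on plain Python lists rather than on data frame columns.

```python
import math


def get_or_none(values, index):
    """Return values[index], or None when the index is out of bounds."""
    return values[index] if index < len(values) else None


def elementwise_equal(left, right, max_list_length, abs_tol=1e-6):
    """Compare positions 0..max_list_length-1, padding short lists with None."""
    for i in range(max_list_length):
        l, r = get_or_none(left, i), get_or_none(right, i)
        if l is None or r is None:
            # A missing value only matches another missing value.
            if l is not r:
                return False
            continue
        if not math.isclose(l, r, rel_tol=0.0, abs_tol=abs_tol):
            return False
    return True


def lists_equal(left, right, max_list_length, abs_tol=1e-6):
    """Element-wise check combined with a list-length check."""
    return len(left) == len(right) and elementwise_equal(
        left, right, max_list_length, abs_tol
    )


# Padding alone cannot tell [1.0] apart from [1.0, None] ...
print(elementwise_equal([1.0], [1.0, None], max_list_length=2))
# ... but the additional length check can.
print(lists_equal([1.0], [1.0, None], max_list_length=2))
```

In this sketch the element-wise check on its own reports a match for `[1.0]` against `[1.0, None]`, because the out-of-bounds position and the genuine null both read as None; the length check is what rules that case out.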

Changes

Adjusted _max_list_lengths_by_column to consider all data type levels
Adjusted expected outcome in test_condition_equal_columns_nested_list_array_with_tolerance
Added a new test test_condition_equal_columns_lists_only_inner where lists are not an outer but inner data type
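As a rough illustration of "maximum over both frames and all levels", the following pure-Python sketch walks nested values instead of data frame columns. The function names are hypothetical and this is not the implementation of `_max_list_lengths_by_column`; Python lists stand in for list types and dicts stand in for structs.

```python
def max_list_length(value):
    """Longest list found at any nesting level of a single value."""
    if isinstance(value, list):
        # Consider this list's own length and any lists nested inside it.
        return max([len(value)] + [max_list_length(v) for v in value])
    if isinstance(value, dict):  # stands in for a struct
        return max([0] + [max_list_length(v) for v in value.values()])
    return 0  # scalars and None contain no lists


def max_list_length_for_column(left_rows, right_rows):
    """Maximum over (1) both sides and (2) every nesting level."""
    return max([0] + [max_list_length(v) for v in [*left_rows, *right_rows]])


left = [{"a": [[1.0], [2.0, 3.0]]}]        # outer length 2, inner lengths 1 and 2
right = [{"a": [[1.0, 2.0, 3.0, 4.0]]}]    # outer length 1, inner length 4
print(max_list_length_for_column(left, right))  # 4
```

Seeding each `max` with the list's own length or with 0 keeps the sketch well-defined for empty lists and empty columns.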

codecov · 2026-03-27T08:13:59Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (2a3010b) to head (d2097c9).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files

@@            Coverage Diff            @@
##              main       #21   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           10        10           
  Lines          742       758   +16     
=========================================
+ Hits           742       758   +16


tests/test_conditions.py

diffly/comparison.py

tests/test_conditions.py

Copilot

Pull request overview

Extends tolerance-based element-wise comparisons to apply to inner list levels nested inside other types (e.g., structs/arrays containing lists) by computing a per-column maximum list length across the full dtype “tree”.

Changes:

Update _max_list_lengths_by_column to detect lists at any nesting level and compute the maximum list length across both frames and all list levels.
Adjust list-vs-list sequence comparison to always unroll element-wise using the computed max_list_length (and propagate it into recursive comparisons).
Update/extend tests to cover nested inner-list tolerance behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `diffly/comparison.py` | Computes max list lengths for columns with nested lists anywhere in their dtype tree. |
| `diffly/_conditions.py` | Removes nested-list fallback equality and requires `max_list_length` for List-vs-List unrolling; propagates length to recursive comparisons. |
| `tests/test_conditions.py` | Updates expected results and adds a new test for inner-list-only nesting (struct containing list). |


diffly/_conditions.py

tests/test_conditions.py

Oliver Borchert (borchero)

I actually didn't consider this previously but comparing all list elements independently will cause severe performance regressions when running the comparison on (large) lists with non-floats. Can we benchmark this and use all of this complicated logic only if there is a float somewhere in the "list hierarchy" (-> separate PR)?
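A sketch of what such a gate could look like, assuming a simplified, hypothetical dtype model (`Primitive`, `ListType`, `StructType`) rather than the library's real data types; the follow-up PR may well do this differently:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Primitive:
    name: str  # e.g. "i64", "f64", "str"


@dataclass(frozen=True)
class ListType:
    inner: object  # dtype of the list elements


@dataclass(frozen=True)
class StructType:
    fields: tuple  # tuple of (field_name, dtype) pairs


FLOAT_NAMES = {"f32", "f64"}


def contains_float(dtype) -> bool:
    """True if a float appears anywhere in the dtype tree."""
    if isinstance(dtype, Primitive):
        return dtype.name in FLOAT_NAMES
    if isinstance(dtype, ListType):
        return contains_float(dtype.inner)
    if isinstance(dtype, StructType):
        return any(contains_float(t) for _, t in dtype.fields)
    return False


# Only a dtype with a float somewhere would need the element-wise path;
# everything else could keep using plain equality.
ints = ListType(ListType(Primitive("i64")))
mixed = ListType(StructType((("id", Primitive("i64")), ("x", ListType(Primitive("f64"))))))
print(contains_float(ints), contains_float(mixed))
```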

diffly/_conditions.py

Marius Merkle (MariusMerkleQC) · 2026-03-27T22:28:55Z

Can we benchmark this

-> #25

Marius Merkle (MariusMerkleQC) · 2026-03-27T22:45:02Z

use all of this complicated logic only if there is a float somewhere in the "list hierarchy"

-> #26

feat: Tolerances for inner lists and arrays

2ad8877

Marius Merkle (MariusMerkleQC) self-assigned this Mar 27, 2026

github-actions bot added the enhancement New feature or request label Mar 27, 2026

Marius Merkle (MariusMerkleQC) added 2 commits March 27, 2026 09:17

fix

ab746e9

remove _max_or_zero

abb8709

Marius Merkle (MariusMerkleQC) commented Mar 27, 2026

tests/test_conditions.py (resolved)

Marius Merkle (MariusMerkleQC) commented Mar 27, 2026

diffly/comparison.py (resolved)

Marius Merkle (MariusMerkleQC) commented Mar 27, 2026

tests/test_conditions.py (outdated, resolved)

Marius Merkle (MariusMerkleQC) mentioned this pull request Mar 27, 2026

test: Combine tests with _max_list_lenghts_by_column #23

Merged

Marius Merkle (MariusMerkleQC) requested a review from Copilot March 27, 2026 08:54

Copilot started reviewing on behalf of Marius Merkle (MariusMerkleQC) March 27, 2026 08:55

Copilot AI reviewed Mar 27, 2026

diffly/_conditions.py (outdated, resolved)

tests/test_conditions.py (outdated, resolved)

tests/test_conditions.py (resolved)

Marius Merkle (MariusMerkleQC) added 2 commits March 27, 2026 10:04

feedback copilot

8afddd2

fix test coverage

3d5467a

Marius Merkle (MariusMerkleQC) marked this pull request as ready for review March 27, 2026 09:10

Marius Merkle (MariusMerkleQC) requested review from EgeKaraismailogluQC and Oliver Borchert (borchero) as code owners March 27, 2026 09:10

Marius Merkle (MariusMerkleQC) linked an issue Mar 27, 2026 that may be closed by this pull request

Properly perform floating point comparisons for structs and lists #8

Closed

Oliver Borchert (borchero) approved these changes Mar 27, 2026

diffly/_conditions.py (outdated, resolved)

Marius Merkle (MariusMerkleQC) and others added 2 commits March 27, 2026 23:06

test: Combine tests with _max_list_lenghts_by_column (#23)

36fc349

feedback OB

d2097c9

Marius Merkle (MariusMerkleQC) merged commit cccfad4 into main Mar 27, 2026
17 checks passed

Marius Merkle (MariusMerkleQC) deleted the nested_list_comparison branch March 27, 2026 22:10

Marius Merkle (MariusMerkleQC) mentioned this pull request Mar 27, 2026

test: Benchmark slowdown of element-wise list comparison #25

Open

Marius Merkle (MariusMerkleQC) mentioned this pull request Mar 27, 2026

perf: Element-wise comparison only for tolerance-requiring data types #26

Open

Marius Merkle (MariusMerkleQC) merged 7 commits into main from nested_list_comparison


3 participants
