Fix ground truth for inheritance/MRO benchmarks (Liskov substitution) #14
Open
jaltmayerpizzorno wants to merge 21 commits into secure-software-engineering:main from
…meration; - made test more interesting by substituting <value1> with more than just "int";
…able definition. Corresponds change a40d4db in the templates;
The previous ground truth annotated each method with only its body's return type, ignoring that subclass overrides must have compatible return types per the Liskov substitution principle. When annotated as given, mypy --strict reports override errors on every affected benchmark. The corrected annotations widen parent method return types to include subclass override types, making all benchmarks pass mypy.

Affected benchmarks:
- classes/inheritance_overriding: MyClass.func str -> int|str
- mro/parents_same_superclass: A.func str -> int|str
- mro/self_assignment: B.func int -> int|str
- mro/two_parents: B.func str -> int|str
- mro/two_parents_method_defined: A.func float -> float|str, B.func int -> float|int|str
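For readers unfamiliar with the override rule, here is a minimal sketch of the kind of widening described above. Only the name `MyClass.func` comes from the benchmark list; the `Child` class, the method bodies, and the `call` helper are hypothetical and are not the actual benchmark source.

```python
from typing import Union


class MyClass:
    # Widened annotation: this body returns str, but the override in Child
    # returns int, so the parent signature has to admit both.
    # Annotating this as "-> str" would make mypy flag Child.func as an
    # incompatible override.
    def func(self) -> Union[int, str]:
        return "parent"


class Child(MyClass):  # hypothetical subclass, for illustration only
    # Narrowing the return type in an override is allowed.
    def func(self) -> int:
        return 1


def call(obj: MyClass) -> Union[int, str]:
    # Any MyClass may be passed here, including a Child, so callers must
    # be prepared for either return type.
    return obj.func()
```

The point of the widened annotation is that code written against `MyClass` stays correct no matter which subclass instance it receives.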
Pull request overview
Updates TypeEvalPy micro-benchmark ground-truth annotations for inheritance/MRO cases so method return types respect Liskov substitution (i.e., base method types are widened to accommodate override return types), aligning the benchmarks with what a type-safe program requires under strict type checking.
Changes:
- Widen base-class method return types in inheritance overriding to include subclass override return types.
- Adjust MRO/multiple-inheritance ground truth return types to avoid incompatible override relationships.
- Expand affected `main_gt.json` entries to represent unions of valid polymorphic return types.
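As a rough illustration of what expanding an entry to a union could look like, the sketch below merges override types into a ground-truth record. The record layout (the `function` and `type` keys, with `type` as a list of names) is an assumption made for this example and has not been checked against the real `main_gt.json` schema; `widen` is a hypothetical helper.

```python
import json

# Hypothetical ground-truth record; the field names are assumptions.
entry = {"function": "MyClass.func", "type": ["str"]}


def widen(record: dict, override_types: list) -> dict:
    """Return a copy of the record whose type list also covers the overrides."""
    merged = sorted(set(record["type"]) | set(override_types))
    return {**record, "type": merged}


# The subclass override returns int, so the parent entry becomes int|str.
widened_entry = widen(entry, ["int"])
print(json.dumps(widened_entry))
```

Sorting the merged names keeps the output stable, which makes diffs of regenerated ground-truth files easier to review.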
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| micro-benchmark/python_features/mro/two_parents_method_defined/main_gt.json | Widens A.func and B.func return types to unions to account for overrides in the hierarchy. |
| micro-benchmark/python_features/mro/two_parents/main_gt.json | Widens B.func return type to include the type introduced via MRO in subclass C. |
| micro-benchmark/python_features/mro/self_assignment/main_gt.json | Widens B.func return type to include the subclass override type. |
| micro-benchmark/python_features/mro/parents_same_superclass/main_gt.json | Widens A.func return type to include the overriding subclass return type. |
| micro-benchmark/python_features/classes/inheritance_overriding/main_gt.json | Widens MyClass.func return type to include the subclass override type. |
micro-benchmark/python_features/mro/two_parents_method_defined/main_gt.json
B.func should not include float in its return type: float comes from A.func, but A is not in B's class hierarchy (they are sibling co-parents of C). The LSP-widened type for B.func is int|str (B's own int plus C's override str), not float|int|str.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
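The co-parent argument above can be sketched as a small script that widens a method's type using only the class's own descendants. The class names and return types mirror the `two_parents_method_defined` description in this PR, but the method bodies, the `BODY_TYPES` table, and the `descendants` and `widened` helpers are hypothetical. The sketch also assumes every descendant overrides `func`.

```python
from typing import Dict, Set


class A:
    def func(self):
        return 1.5


class B:
    def func(self):
        return 1


class C(A, B):
    def func(self):
        return "c"


# Return type of each method body, as the original ground truth recorded it.
BODY_TYPES: Dict[type, str] = {A: "float", B: "int", C: "str"}


def descendants(cls: type) -> Set[type]:
    """Collect every direct and indirect subclass of cls."""
    found: Set[type] = set()
    for sub in cls.__subclasses__():
        found.add(sub)
        found |= descendants(sub)
    return found


def widened(cls: type) -> str:
    """Union of cls's own body type and the body types of its descendants."""
    names = {BODY_TYPES[cls]}
    names |= {BODY_TYPES[d] for d in descendants(cls) if d in BODY_TYPES}
    return "|".join(sorted(names))


for cls in (A, B, C):
    print(cls.__name__, widened(cls))
```

Because `A` is not a descendant of `B`, `float` never enters `B`'s union; the only class both parents share is `C`, which contributes `str` to each.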
Hi! Thanks again for creating and maintaining TypeEvalPy — it has been an invaluable resource for our work evaluating type inference tools.
While running the benchmarks, we noticed that 5 inheritance/MRO ground truth annotations use only each method body's return type, without accounting for the Liskov substitution principle. When annotated as given, `mypy --strict` reports incompatible override errors on all of them. Widening the parent method return types to include the subclass override types resolves this and makes the annotations consistent with what a type-safe program requires.

Affected benchmarks

| Benchmark | Method | Original | Corrected |
|---|---|---|---|
| classes/inheritance_overriding | MyClass.func | `str` | `int\|str` |
| mro/parents_same_superclass | A.func | `str` | `int\|str` |
| mro/self_assignment | B.func | `int` | `int\|str` |
| mro/two_parents | B.func | `str` | `int\|str` |
| mro/two_parents_method_defined | A.func | `float` | `float\|str` |
| mro/two_parents_method_defined | B.func | `int` | `int\|str` |

Note: `B.func` in `two_parents_method_defined` is widened to `int|str` (not `float|int|str`), because `float` comes from `A.func` and `A` is not in `B`'s class hierarchy; they are unrelated sibling co-parents of `C`. The LSP widening should only include overrides from `B`'s own subclass chain.

We verified with `mypy --strict` that the original annotations produce override errors and the corrected ones pass cleanly.

Thanks for considering this!