Fix ground truth for inheritance/MRO benchmarks (Liskov substitution) by jaltmayerpizzorno · Pull Request #14 · secure-software-engineering/TypeEvalPy

jaltmayerpizzorno · 2026-03-13T18:37:04Z

Hi! Thanks again for creating and maintaining TypeEvalPy — it has been an invaluable resource for our work evaluating type inference tools.

While running the benchmarks, we noticed that the ground truth annotations for 5 inheritance/MRO benchmarks use only each method body's return type, without accounting for the Liskov substitution principle. When annotated as given, mypy --strict reports incompatible override errors on all of them. Widening the parent methods' return types to include the return types of the subclass overrides resolves this and makes the annotations consistent with what a type-safe program requires.
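A minimal sketch of the issue, assuming a parent/child pair shaped like `classes/inheritance_overriding`. The `Child` class name, the `call` helper, and the method bodies are hypothetical; the benchmark source itself is not reproduced here.

```python
from typing import Union


class MyClass:
    # Annotated from the method body alone, this would be `-> str`.
    # Widening it to `Union[int, str]` makes the override below compatible.
    def func(self) -> Union[int, str]:
        return "parent"


class Child(MyClass):  # hypothetical subclass name
    # `int` is a subtype of `Union[int, str]`, so this override is accepted.
    # Against a parent annotated `-> str`, a strict checker would flag the
    # return type of this override as incompatible with the supertype.
    def func(self) -> int:
        return 1


def call(obj: MyClass) -> Union[int, str]:
    # Callers that only know about MyClass must be ready for either type.
    return obj.func()
```

The widened annotation describes what a caller holding a `MyClass` reference can actually receive, which is the substitutability argument the corrected ground truth relies on.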

Affected benchmarks

| Benchmark | Function | Before | After |
| --- | --- | --- | --- |
| `classes/inheritance_overriding` | `MyClass.func` | `str` | `int\|str` |
| `mro/parents_same_superclass` | `A.func` | `str` | `int\|str` |
| `mro/self_assignment` | `B.func` | `int` | `int\|str` |
| `mro/two_parents` | `B.func` | `str` | `int\|str` |
| `mro/two_parents_method_defined` | `A.func` | `float` | `float\|str` |
| `mro/two_parents_method_defined` | `B.func` | `int` | `int\|str` |

Note: B.func in two_parents_method_defined is widened to int|str (not float|int|str), because float comes from A.func and A is not in B's class hierarchy — they are unrelated sibling co-parents of C. The LSP widening should only include overrides from B's own subclass chain.
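The reasoning in the note can be sketched as follows, assuming a hierarchy shaped like `mro/two_parents_method_defined`. The base-class order of `C`, the method bodies, and the helpers `all_subclasses` and `widened_return_types` are hypothetical and only illustrate the rule; they are not part of TypeEvalPy.

```python
class A:
    def func(self) -> float:
        return 1.0


class B:
    def func(self) -> int:
        return 1


class C(A, B):  # base order is an assumption
    def func(self) -> str:
        return "c"


def all_subclasses(cls):
    """Collect every direct and transitive subclass of cls."""
    found = []
    stack = list(cls.__subclasses__())
    while stack:
        sub = stack.pop()
        if sub not in found:
            found.append(sub)
            stack.extend(sub.__subclasses__())
    return found


def widened_return_types(cls, method_name):
    """Return the sorted names of the return types declared for method_name
    on cls itself and on each subclass that overrides it."""
    names = set()
    for klass in [cls, *all_subclasses(cls)]:
        method = klass.__dict__.get(method_name)  # only methods defined on klass
        if method is not None:
            names.add(method.__annotations__["return"].__name__)
    return sorted(names)


print(widened_return_types(A, "func"))  # ['float', 'str']
print(widened_return_types(B, "func"))  # ['int', 'str']
```

Because `A` is neither a base nor a subclass of `B`, `float` never enters the set computed for `B.func`; only `B`'s own `int` and the `str` from `C`'s override do.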

We verified with mypy --strict that the original annotations produce override errors and the corrected ones pass cleanly.

Thanks for considering this!

The previous ground truth annotated each method with only its body's return type, ignoring that subclass overrides must have compatible return types per the Liskov substitution principle. When annotated as given, mypy --strict reports override errors on every affected benchmark. The corrected annotations widen parent method return types to include subclass override types, making all benchmarks pass mypy.

Affected benchmarks:
- classes/inheritance_overriding: MyClass.func str -> int|str
- mro/parents_same_superclass: A.func str -> int|str
- mro/self_assignment: B.func int -> int|str
- mro/two_parents: B.func str -> int|str
- mro/two_parents_method_defined: A.func float -> float|str, B.func int -> float|int|str

Copilot

Pull request overview

Updates TypeEvalPy micro-benchmark ground-truth annotations for inheritance/MRO cases so method return types respect Liskov substitution (i.e., base method types are widened to accommodate override return types), aligning the benchmarks with what a type-safe program requires under strict type checking.

Changes:

Widen base-class method return types in inheritance overriding to include subclass override return types.
Adjust MRO/multiple-inheritance ground truth return types to avoid incompatible override relationships.
Expand affected main_gt.json entries to represent unions of valid polymorphic return types.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| micro-benchmark/python_features/mro/two_parents_method_defined/main_gt.json | Widens `A.func` and `B.func` return types to unions to account for overrides in the hierarchy. |
| micro-benchmark/python_features/mro/two_parents/main_gt.json | Widens `B.func` return type to include the type introduced via MRO in subclass `C`. |
| micro-benchmark/python_features/mro/self_assignment/main_gt.json | Widens `B.func` return type to include the subclass override type. |
| micro-benchmark/python_features/mro/parents_same_superclass/main_gt.json | Widens `A.func` return type to include the overriding subclass return type. |
| micro-benchmark/python_features/classes/inheritance_overriding/main_gt.json | Widens `MyClass.func` return type to include the subclass override type. |

micro-benchmark/python_features/mro/two_parents_method_defined/main_gt.json

B.func should not include float in its return type: float comes from A.func, but A is not in B's class hierarchy (they are sibling co-parents of C). The LSP-widened type for B.func is int|str (B's own int plus C's override str), not float|int|str.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

jaltmayerpizzorno added 20 commits October 12, 2025 13:26

Merge branch '202510-righttyper-support'

934de26

Merge branch '202510-corrections'

f35ece8

Merge branch '202510-corrections'

0ce1258

- fixed coordinates;

35fa775

- fixed types not being parametrized;

e6da004

- fixed missing "function" context;

81f2ce8

- fixed coordinate;

09bdce5

- fixed incorrect function context;

03c842f

- fixed type not being parametrized;

4959d1c

- deleted entry from ground truth, as it describes a call, not a variable definition;

a40d4db

- fixed invalid dictionary keys in subscript expressions;

378745a

- fixed type, which wasn't parametrized;

3991208

- fixed index out of range error due to <value1> interfering with enumeration; - made test more interesting by substituting <value1> with more than just "int";

e010ba9

Merge branch 'secure-software-engineering:main' into main

1c03981

Merge branch 'secure-software-engineering:main' into 202510-corrections-2

348f225

Merge branch '202510-corrections-2'

f167d80

- deleted entry from ground truth, as it describes a call, not a variable definition. Corresponds change a40d4db in the templates;

00311c4

Merge branch '202510-corrections-2'

bdd2c78

Merge branch 'secure-software-engineering:main' into main

10ab41a

ashwinprasadme requested a review from Copilot March 24, 2026 14:24

Copilot started reviewing on behalf of ashwinprasadme March 24, 2026 14:25

Copilot AI reviewed Mar 24, 2026

jaltmayerpizzorno wants to merge 21 commits into secure-software-engineering:main from plasma-umass:fix-inheritance-ground-truth
