Validation result is sometimes incorrect when using group rules #38

At yesterday's PyData meetup in Zurich, a question made me realize that we handle the interaction between group rules and row-level rules incorrectly: if a row-level rule removes a row, and that removal causes a group rule to fail, we do not notice. For example:


```python
import dataframely as dy
import polars as pl

class DiagnosisSchema(dy.Schema):
    invoice_id = dy.String(primary_key=True)
    diagnosis = dy.String(primary_key=True, regex="^[A-Z]{3}$")
    is_main = dy.Bool(nullable=False)

    # Group rule: evaluated once per invoice rather than per row.
    @dy.rule(group_by=["invoice_id"])
    def exactly_one_main_diagnosis() -> pl.Expr:
        return pl.col("is_main").sum() == 1

df = pl.DataFrame(
    {
        "invoice_id": ["A", "A", "A"],
        "diagnosis": ["ABC", "ABD", "123"],
        "is_main": [False, False, True],
    }
)
good, _ = DiagnosisSchema.filter(df)
print(good)
```

results in

```
shape: (2, 3)
┌────────────┬───────────┬─────────┐
│ invoice_id ┆ diagnosis ┆ is_main │
│ ---        ┆ ---       ┆ ---     │
│ str        ┆ str       ┆ bool    │
╞════════════╪═══════════╪═════════╡
│ A          ┆ ABC       ┆ false   │
│ A          ┆ ABD       ┆ false   │
└────────────┴───────────┴─────────┘
```

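which clearly violates the schema: the row with diagnosis `"123"` fails the regex and is removed, but it was the only row with `is_main` set, so the remaining rows for invoice `A` have no main diagnosis.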
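To make the expected behavior concrete, here is a minimal sketch in plain Python (deliberately not using dataframely or polars, and not a description of how dataframely is implemented). It assumes a two-step order: apply row-level rules first, then evaluate the group rule only on the surviving rows. The helper names `row_rule`, `group_rule`, and `filter_rows` are hypothetical.

```python
import re
from collections import defaultdict

rows = [
    {"invoice_id": "A", "diagnosis": "ABC", "is_main": False},
    {"invoice_id": "A", "diagnosis": "ABD", "is_main": False},
    {"invoice_id": "A", "diagnosis": "123", "is_main": True},
]


def row_rule(row):
    # Row-level rule: the diagnosis must be exactly three uppercase letters.
    return re.fullmatch(r"[A-Z]{3}", row["diagnosis"]) is not None


def group_rule(group):
    # Group rule: exactly one main diagnosis per invoice.
    return sum(r["is_main"] for r in group) == 1


def filter_rows(rows):
    # Step 1: drop rows that fail the row-level rule.
    survivors = [r for r in rows if row_rule(r)]
    # Step 2: group the survivors by invoice and evaluate the group rule
    # on what is left, not on the original input.
    groups = defaultdict(list)
    for r in survivors:
        groups[r["invoice_id"]].append(r)
    return [r for r in survivors if group_rule(groups[r["invoice_id"]])]


good = filter_rows(rows)
print(good)  # prints []
```

Under this ordering, invoice `A` is rejected as a whole, because after the `"123"` row is dropped the group no longer contains a main diagnosis. With several group rules over different keys, a single pass may not be enough, since dropping one group can invalidate another; that case is not covered by this sketch.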
