Skip to content

Validate SQLModel on all Non-Database Sourced Data#1823

Open
ClayGendron wants to merge 2 commits intofastapi:mainfrom
ClayGendron:validate-table-models-on-construction
Open

Validate SQLModel on all Non-Database Sourced Data#1823
ClayGendron wants to merge 2 commits intofastapi:mainfrom
ClayGendron:validate-table-models-on-construction

Conversation

@ClayGendron
Copy link
Copy Markdown

As noted by other issues and pull requests, when setting table=True in a SQLModel, Pydantic validation does not run, and this breaks the contract that "a SQLModel model is also a Pydantic model". This PR builds on prior ones and also hopes to address concerns with changing the intentional validation bypass for table models.

First, for those new to this issue, here is an example of how SQLModels behave differently when they are a table:

from pydantic import BaseModel, ValidationError
from sqlmodel import SQLModel, Field


class HeroBase(BaseModel):
    id: int | None = Field(default=None, primary_key=True)
    name: str
    age: int


class HeroSQLBase(SQLModel):
    id: int | None = Field(default=None, primary_key=True)
    name: str
    age: int


class HeroSQLTable(SQLModel, table=True):
    id: int | None = Field(default=None, primary_key=True)
    name: str
    age: int


try:
    HeroBase(name="Deadpond", age="not an int")
    print("HeroBase: created with invalid data!")
except ValidationError:
    print("HeroBase: ValidationError raised")

try:
    HeroSQLBase(name="Deadpond", age="not an int")
    print("HeroSQLBase: created with invalid data!")
except ValidationError:
    print("HeroSQLBase: ValidationError raised")

try:
    HeroSQLTable(name="Deadpond", age="not an int")
    print("HeroSQLTable: created with invalid data!")
except ValidationError:
    print("HeroSQLTable: ValidationError raised")
HeroBase: ValidationError raised
HeroSQLBase: ValidationError raised
HeroSQLTable: created with invalid data!

The same is true for @field_validator and @model_validator. Both will be silently ignored when table=True.

Use Case

I am building a library that includes a base SQLModel class (without table=True) that has validators to normalize and populate data. Downstream developers using my library would then create their own table=True class that inherits from the base:

# ---- library code ----
import hashlib
import posixpath
import uuid

from pydantic import field_validator, model_validator
from sqlmodel import SQLModel, Field


class DocumentBase(SQLModel):
    id: str = Field(
        default_factory=lambda: str(uuid.uuid4()), max_length=256, primary_key=True
    )
    path: str
    content: str
    content_hash: str = ""

    @field_validator("path")
    @classmethod
    def normalize_path(cls, v: str) -> str:
        if not v.startswith("/"):
            v = "/" + v
        return posixpath.normpath(v)

    @model_validator(mode="after")
    def compute_content_hash(self) -> "DocumentBase":
        self.content_hash = hashlib.sha256(self.content.encode()).hexdigest()
        return self


# ---- downstream developer code ----
class Document(DocumentBase, table=True):
    project_field: str | None = None

The base class works correctly on its own, but it can't hold the downstream project_field:

>>> Document(path="not_normalized/../path", content="hello", project_field="important info!")
DocumentBase(
    path='/path',                   # normalized
    content='hello',
    content_hash='2cf24db...',      # computed
  	# project_field missing!
)

But when a downstream developer inherits with table=True, both validators are silently skipped:

>>> Document(path="not_normalized/../path", content="hello", project_field="important info!")
Document(
    path='not_normalized/../path',  # not normalized!
    content='hello',
    content_hash='',                # not computed!
    project_field='important info!',
)

My library must rely on validation from the custom defined inherited class, but it will not work out of the box. My issue could be resolved with the solution described in the multiple models doc, but that approach would mean I would be asking developers to create twice as many classes, one for validation and one for table mapping. Using SQLModel was chosen for this project as it promised to provide a unified model between Pydantic and SQLAlchemy, which in my case, means any initialized model derived from DocumentBase is valid, both in python and in the database.

The Change

The change is in sqlmodel_init() in _compat.py. Previously, table models called sqlmodel_table_construct() which skips validation entirely. Now, all models go through validate_python(), and table models do a post-validation step to re-trigger SQLAlchemy instrumentation via setattr:

def sqlmodel_init(*, self: "SQLModel", data: dict[str, Any]) -> None:
    old_dict = self.__dict__.copy()
    self.__pydantic_validator__.validate_python(
        data,
        self_instance=self,
    )
    if not is_table_model_class(self.__class__):
        object.__setattr__(
            self,
            "__dict__",
            {**old_dict, **self.__dict__},
        )
    else:
        fields_set = self.__pydantic_fields_set__.copy()
        for key, value in {**old_dict, **self.__dict__}.items():
            setattr(self, key, value)
        object.__setattr__(self, "__pydantic_fields_set__", fields_set)
        for key in self.__sqlmodel_relationships__:
            value = data.get(key, Undefined)
            if value is not Undefined:
                setattr(self, key, value)

This mirrors the existing pattern used by sqlmodel_validate() (the model_validate() path) which already validates table models successfully.

Addressing Prior Concerns

"SQLAlchemy needs to assign values after instantiation" (#52)

The concern raised in #52 was that relationships need to be assignable after construction, so validation can't run on __init__.

Relationships are not part of model_fields — they live in __sqlmodel_relationships__ and are handled separately, outside of Pydantic validation. validate_python() never sees or validates relationship attributes. Both sides of a bidirectional relationship can be created independently, exactly as before:

from sqlmodel import SQLModel, Field, Relationship, Session, create_engine


class Team(SQLModel, table=True):
    id: int | None = Field(default=None, primary_key=True)
    name: str
    heroes: list["Hero"] = Relationship(back_populates="team")


class Hero(SQLModel, table=True):
    id: int | None = Field(default=None, primary_key=True)
    name: str
    team_id: int | None = Field(default=None, foreign_key="team.id")
    team: Team | None = Relationship(back_populates="heroes")


# Create each side independently — no relationship passed
team = Team(name="Preventers")
hero = Hero(name="Deadpond")

# Assign relationship after construction
hero.team = team

engine = create_engine("sqlite:///:memory:")
SQLModel.metadata.create_all(engine)

with Session(engine) as session:
    session.add(hero)
    session.commit()
    session.refresh(hero)
    print(f"{hero.name}'s team: {hero.team.name}")
Deadpond's team: Preventers

Performance on ORM Reads

Validation does not run when loading from the database. SQLAlchemy does not call __init__ when hydrating instances from query results (SQLAlchemy docs: Constructors and Object Initialization). This is unchanged, as it is safe to assume that data loaded from the database is valid.

To verify, here is a test that writes invalid data directly to the database and confirms it loads without triggering validation:

from pydantic import field_validator
from sqlmodel import SQLModel, Field, Session, create_engine, select


class Hero(SQLModel, table=True):
    id: int | None = Field(default=None, primary_key=True)
    name: str

    @field_validator("name")
    @classmethod
    def name_must_be_short(cls, v: str) -> str:
        if len(v) > 5:
            raise ValueError("too long")
        return v


engine = create_engine("sqlite:///:memory:")
SQLModel.metadata.create_all(engine)

# Insert valid data through the model
with Session(engine) as session:
    session.add(Hero(name="short"))
    session.commit()

# Write invalid data directly to the database, bypassing the model
with engine.connect() as conn:
    conn.execute(
        Hero.__table__.update()
        .where(Hero.__table__.c.id == 1)
        .values(name="this is way too long")
    )
    conn.commit()

# Load from database — no validation runs, invalid data loads fine
with Session(engine) as session:
    loaded = session.exec(select(Hero)).first()
    print(f"Loaded from DB: {loaded.name!r}")
Loaded from DB: 'this is way too long'

Breaking Change

This could represent a behavior change for code that previously constructed table=True models with invalid data.

Related Issues and PRs

  • #52 — SQLModel doesn't raise ValidationError
  • #453 — Why does a SQLModel class with table=True not validate data?
  • #134 — Pydantic Validators does not raise ValueError if conditions are not met
  • #1041 — Ensure that type checks are executed when setting table=True
  • #227 — Class Initialisation Validation Kwarg

Thank you for reviewing!

Enable Pydantic validation on table=True model __init__, matching the
existing behavior of model_validate(). Validation does not run on ORM
loads from the database — SQLAlchemy does not call __init__ when
hydrating from query results.
Copy link
Copy Markdown

@mahdirajaee mahdirajaee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a significant behavioral change — table models now run Pydantic validation on __init__ instead of bypassing it via sqlmodel_table_construct. The key insight is that validate_python(data, self_instance=self) is called unconditionally for both table and non-table models now, which means field validators, model validators, and type coercion all fire during construction. The test suite additions are excellent and cover the critical scenarios: field_validator, model_validator (both before/after modes), BeforeValidator via Annotated, and crucially test_validation_does_not_run_on_orm_load which verifies that loading from the database still bypasses validation (important for not breaking existing data that might not pass newer validators).

The removal of sqlmodel_table_construct is the right call since it was essentially a copy of Pydantic's model_construct() that skipped validation — which was the original bug. One thing to watch: this is a breaking change for users who relied on being able to instantiate table models with missing required fields (the test_not_allow_instantiation_without_arguments test makes this explicit). The old test test_allow_instantiation_without_arguments previously passed with Item() where name: str had no default — that will now raise ValidationError. Projects doing item = Item(); item.name = "..." will need to migrate to Item(name="...").

The sqlmodel_init branching for table vs non-table models post-validation is sensible: non-table models merge dicts directly, while table models use setattr to trigger SQLAlchemy instrumentation and manually restore __pydantic_fields_set__. Worth confirming that the fields_set save/restore around the setattr loop doesn't cause issues if a setattr triggers a SQLAlchemy event that modifies __pydantic_fields_set__ internally, though that seems unlikely in practice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants