Skip to content

perf(core): use simpler checks for table existence#670

Merged
michael-johnston merged 12 commits intomainfrom
maj_skip_redundant_ddl_checks
Apr 2, 2026
Merged

perf(core): use simpler checks for table existence#670
michael-johnston merged 12 commits intomainfrom
maj_skip_redundant_ddl_checks

Conversation

@michael-johnston
Copy link
Copy Markdown
Member

Previously, SQLSampleStore.init called _create_source_table() unconditionally on every construction. _create_source_table() calls meta.create_all(checkfirst=True), which issues a separate table-existence query for each of the four backing tables even when the tables already exist — which they always do on read-only CLI commands. Locally this cost was ~1.45s. The changes reduce this to 0.24s.

The main fix is to probe for table existence instead of blinding issuing _create_source_tables(). This removes the 1.45 and adds the probe. We use a direct SQL probe (taking 240ms locally measured) instead of sqlalchemy.inspect() which took 800ms locally.

The PR also introduces a process-level _source_tables_verified that records tablename prefixes whose backing tables have already been confirmed to exist.
On the second and subsequent SQLSampleStore constructions for the same store within the same process the probe is skipped entirely (0 round-trips).
The main impact of this is in the tests.

Previously, SQLSampleStore.__init__ called _create_source_table()
unconditionally on every construction.  _create_source_table() calls
meta.create_all(checkfirst=True), which issues a separate table-existence
query for each of the four backing tables even when
the tables already exist — which they always do on read-only CLI commands.
Locally this cost was ~1.45s. The changes reduce this to 0.24s.

The main fix is to probe for table existence instead of blinding issuing _create_source_tables(). This removes the 1.45 and adds the probe. We use a direct SQL probe (taking 240ms locally measured) instead of sqlalchemy.inspect() which took 800ms locally.

 The PR also introduces a process-level _source_tables_verified that records tablename prefixes whose backing tables have already been confirmed to exist.
  On the second and subsequent SQLSampleStore constructions for the same store within the same process the probe is skipped entirely (0 round-trips).
  The main impact of this is in the tests.
Copy link
Copy Markdown
Member

@AlessandroPomponio AlessandroPomponio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be careful with this: the use of inspect is dialect independent and is the suggested way to check for table existence. Where do you see the 15 round trips? It should be a DESCRIBE of the 4 tables releated to the sqlsource

I'm all for caching the existence of the tables (like we added in SQLResourceStore) but I'd keep what we have in term of inspection.

Comment thread orchestrator/core/samplestore/sql.py Outdated
@michael-johnston
Copy link
Copy Markdown
Member Author

Where do you see the 15 round trips?

That was a typo. Fixed

I'd be careful with this: the use of inspect is dialect independent and is the suggested way to check for table existence.

This is the typical optimisation trade off. If we keep the generic method, inspect, we have to pay the ~1.2 secs for it.

@AlessandroPomponio
Copy link
Copy Markdown
Member

This is the typical optimisation trade off. If we keep the generic method, inspect, we have to pay the ~1.2 secs for it.

By not checking what the table structure is, though, we will not be able to perform migrations down the line

@michael-johnston
Copy link
Copy Markdown
Member Author

michael-johnston commented Mar 11, 2026

By not checking what the table structure is, though, we will not be able to perform migrations down the line

Could we not add it back if necessary?

@AlessandroPomponio
Copy link
Copy Markdown
Member

This should be updated now that #672 has been merged

@DRL-NextGen
Copy link
Copy Markdown
Member

DRL-NextGen commented Mar 18, 2026

No vulnerabilities found.

Comment thread orchestrator/core/samplestore/sql.py Outdated
@DRL-NextGen
Copy link
Copy Markdown
Member

DRL-NextGen commented Mar 24, 2026

No vulnerabilities found.

The _source_tables_verified cache used only the table name as its key,
so two SQLSampleStore instances with the same identifier but pointing to
different databases would incorrectly share a cache entry. The second
store would skip the table-existence check and attempt to query tables
that don't exist in its database.

Fix by keying the cache on (db_url, tablename), consistent with how
_tables_exist_cache is already scoped in SQLStore (sqlstore.py).
Comment thread orchestrator/core/samplestore/sql.py Outdated
Uses fast-path if db is sqlite or mysql or falls back to inspect/has_table if not.
Comment thread orchestrator/metastore/sql/statements.py Outdated
Comment thread orchestrator/metastore/sql/statements.py Outdated
Comment thread orchestrator/metastore/sql/statements.py Outdated
Comment thread orchestrator/metastore/sql/statements.py Outdated
@AlessandroPomponio AlessandroPomponio changed the title feat(perf): skip redundant DDL checks in SQLSampleStore.__init__ perf(core): use simpler checks for table existence Apr 1, 2026
Comment thread orchestrator/metastore/sql/statements.py Outdated
Comment thread orchestrator/metastore/sql/statements.py Outdated
Co-authored-by: Alessandro Pomponio <[email protected]>
Co-authored-by: Michael Johnston <[email protected]>
Signed-off-by: Michael Johnston <[email protected]>
Copy link
Copy Markdown
Member

@AlessandroPomponio AlessandroPomponio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just to make it uniform

Comment thread orchestrator/metastore/sql/statements.py Outdated
Co-authored-by: Alessandro Pomponio <[email protected]>
Signed-off-by: Michael Johnston <[email protected]>
Copy link
Copy Markdown
Member

@AlessandroPomponio AlessandroPomponio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks

@michael-johnston michael-johnston added this pull request to the merge queue Apr 2, 2026
Merged via the queue into main with commit 6113495 Apr 2, 2026
19 checks passed
@michael-johnston michael-johnston deleted the maj_skip_redundant_ddl_checks branch April 2, 2026 08:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants