perf(core): use simpler checks for table existence #670
michael-johnston merged 12 commits into main from
Conversation
Previously, SQLSampleStore.__init__ called _create_source_table() unconditionally on every construction. _create_source_table() calls meta.create_all(checkfirst=True), which issues a separate table-existence query for each of the four backing tables even when the tables already exist, which they always do on read-only CLI commands. Locally this cost was ~1.45s; the changes reduce it to 0.24s.

The main fix is to probe for table existence instead of blindly issuing _create_source_table(). This removes the ~1.45s and adds the probe. We use a direct SQL probe (240ms measured locally) instead of sqlalchemy.inspect(), which took 800ms locally.

The PR also introduces a process-level _source_tables_verified cache that records tablename prefixes whose backing tables have already been confirmed to exist. On the second and subsequent SQLSampleStore constructions for the same store within the same process, the probe is skipped entirely (0 round-trips). The main impact of this is in the tests.
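The direct-probe idea can be sketched as follows. This is a minimal illustration using the stdlib sqlite3 driver rather than the project's SQLAlchemy machinery, and table_exists is a hypothetical helper name, not an identifier from the PR:

```python
import sqlite3


def table_exists(conn: sqlite3.Connection, name: str) -> bool:
    """Single round-trip probe against SQLite's catalog table.

    Cheaper than reflecting full metadata: it asks one targeted
    question instead of describing every backing table.
    """
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type='table' AND name=?",
        (name,),
    ).fetchone()
    return row is not None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY)")
print(table_exists(conn, "samples"))  # True
print(table_exists(conn, "missing"))  # False
```

The point of the sketch is the shape of the optimisation: one cheap catalog query per store instead of per-table existence checks issued by create_all(checkfirst=True).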
AlessandroPomponio
left a comment
I'd be careful with this: the use of inspect is dialect-independent and is the suggested way to check for table existence. Where do you see the 15 round trips? It should be a DESCRIBE of the 4 tables related to the sqlsource.
I'm all for caching the existence of the tables (like we added in SQLResourceStore), but I'd keep what we have in terms of inspection.
Signed-off-by: Michael Johnston <[email protected]>
That was a typo. Fixed
This is the typical optimisation trade-off. If we keep the generic method, inspect, we have to pay the ~1.2 secs for it.
By not checking what the table structure is, though, we will not be able to perform migrations down the line.
Could we not add it back if necessary?
This should be updated now that #672 has been merged.
No vulnerabilities found.
The _source_tables_verified cache used only the table name as its key, so two SQLSampleStore instances with the same identifier but pointing to different databases would incorrectly share a cache entry. The second store would skip the table-existence check and attempt to query tables that don't exist in its database. Fix by keying the cache on (db_url, tablename), consistent with how _tables_exist_cache is already scoped in SQLStore (sqlstore.py).
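The corrected cache key could look like the sketch below; the names tables_verified and mark_verified are illustrative helpers, not the PR's actual identifiers:

```python
# Process-level cache of (db_url, tablename) pairs whose backing
# tables have been confirmed to exist. Keying on db_url as well as
# tablename prevents two stores with the same table name but
# different databases from sharing an entry.
_source_tables_verified: set = set()


def tables_verified(db_url: str, tablename: str) -> bool:
    return (db_url, tablename) in _source_tables_verified


def mark_verified(db_url: str, tablename: str) -> None:
    _source_tables_verified.add((db_url, tablename))


mark_verified("sqlite:///a.db", "samples")
print(tables_verified("sqlite:///a.db", "samples"))  # True
print(tables_verified("sqlite:///b.db", "samples"))  # False: other database
```

With the tablename-only key, the second print would incorrectly have returned True and the second store would have skipped its existence check.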
Uses a fast path when the db is SQLite or MySQL, and falls back to inspect/has_table otherwise.
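One way to structure that dispatch is sketched below. existence_query is a hypothetical helper: it returns a cheap catalog probe for dialects with a known fast path, and None to signal falling back to SQLAlchemy's generic inspect()/has_table():

```python
def existence_query(dialect: str, tablename: str):
    """Return (sql, params) for a cheap per-dialect catalog probe,
    or None to signal the generic inspect()/has_table() fallback."""
    if dialect == "sqlite":
        # SQLite keeps its schema in the sqlite_master catalog table.
        return ("SELECT 1 FROM sqlite_master WHERE type='table' AND name=?",
                (tablename,))
    if dialect == "mysql":
        # MySQL exposes table metadata through information_schema.
        return ("SELECT 1 FROM information_schema.tables "
                "WHERE table_name = %s LIMIT 1",
                (tablename,))
    return None  # e.g. postgresql: use the dialect-independent path


print(existence_query("sqlite", "samples") is not None)  # True
print(existence_query("postgresql", "samples"))          # None
```

Keeping the fallback preserves the dialect-independent behaviour the review asked for while letting the common backends take the fast path.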
Co-authored-by: Alessandro Pomponio <[email protected]> Co-authored-by: Michael Johnston <[email protected]> Signed-off-by: Michael Johnston <[email protected]>
AlessandroPomponio
left a comment
just to make it uniform
Co-authored-by: Alessandro Pomponio <[email protected]> Signed-off-by: Michael Johnston <[email protected]>