Misc minor optimizations to query optimizer performance#21128
Misc minor optimizations to query optimizer performance#21128AdamGS wants to merge 5 commits intoapache:mainfrom
Conversation
d71431d to
eec93a2
Compare
| true | ||
| } | ||
|
|
||
| #[expect(clippy::only_used_in_recursion)] |
There was a problem hiding this comment.
this is just wasteful, just a lint away.
There was a problem hiding this comment.
As in you plan to remove it as a follow on PR?
|
|
||
| fn analyze(&self, plan: LogicalPlan, config: &ConfigOptions) -> Result<LogicalPlan> { | ||
| let empty_schema = DFSchema::empty(); | ||
| static EMPTY_SCHEMA: LazyLock<DFSchema> = LazyLock::new(DFSchema::empty); |
There was a problem hiding this comment.
empty DFSchema isn't free, similar to #20534
Signed-off-by: Adam Gutglick <[email protected]>
Signed-off-by: Adam Gutglick <[email protected]>
|
run benchmark sql_planner |
|
🤖 Criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing adamg/optimizer-memory-optimizations (532b74e) to 2b986c8 (merge-base) diff File an issue against this benchmark runner |
| // The dummy column name is unused and doesn't matter as only | ||
| // expressions without column references can be evaluated | ||
| static DUMMY_COL_NAME: &str = "."; | ||
| let schema = Arc::new(Schema::new(vec![Field::new( |
There was a problem hiding this comment.
this is nice to save several callocations for each call to the Const evaluator 👍
| true | ||
| } | ||
|
|
||
| #[expect(clippy::only_used_in_recursion)] |
There was a problem hiding this comment.
As in you plan to remove it as a follow on PR?
| plan: LogicalPlan, | ||
| config: &dyn OptimizerConfig, | ||
| ) -> Result<Transformed<LogicalPlan>> { | ||
| let _ = config.options(); |
There was a problem hiding this comment.
to answer your question above, I think the lint has to stay here because this seems worse, and as far as I can tell in this rule the config is just passed along recursively
| fn extract_or_clauses_for_join<'a>( | ||
| filters: &'a [Expr], | ||
| schema: &'a DFSchema, | ||
| schema_cols: &'a HashSet<Column>, |
There was a problem hiding this comment.
Do we need to have owned Columns?
Maybe this could be something like &HashSet<&Column> and avoid copying strings too
There was a problem hiding this comment.
I don't think it'll work with &Column here, but I do think its possible to avoid the String allocation and everything here seems like internal functions, I'll try it out
There was a problem hiding this comment.
I pushed a change, it widen the scope by a bit but basically just introduces a type that holds two references and passes it around, everything is contained within the module.
b78496f to
c26da85
Compare
c26da85 to
9b9e4f5
Compare
|
I think the benchmarks results got lost? |
|
run benchmark sql_planner |
|
🤖 Criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing adamg/optimizer-memory-optimizations (9b9e4f5) to 2b986c8 (merge-base) diff File an issue against this benchmark runner |
|
🤖 Criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
|
run benchmark sql_planner |
|
🤖 Criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing adamg/optimizer-memory-optimizations (9b9e4f5) to 2b986c8 (merge-base) diff File an issue against this benchmark runner |
|
🤖 Criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
## Which issue does this PR close? - Closes #. ## Rationale for this change Similar to #21128, just trying to shave time off the optimizer. Locally, it improves some sql-planner benchmarks by up to 10% but they seem relatively noisy on my laptop. ## What changes are included in this PR? 1. Avoid allocation `plan.children()` in a loop in `sort_pushdown.rs`. 2. Try and avoid some expensive tree rewrites in `join_selection.rs` 3. Avoid deep clones of exec limit nodes in `limit_pushdown.rs`, and only mutate the plan if it was actually changed. 4. Use cheaper code path to change the limit on an `AggregateExec` in `limited_distinct_aggregation.rs`. 5. Use a read-only traversal in `sanity_checker.rs`. Its read only and `transform_up` is always more expensive. I've considered extending `TreeNode` but this seems to be basically the only place in the codebase that does something like this. There are a few places where we unconditionally return `Transformed::yes` which might unintended downstream consequences because it breaks pointer equality and I think it also just end up allocating more memory, but they are harder to untangle so I'll try and do them in followups. ## Are these changes tested? One new test for limits, otherwise the existing tests. ## Are there any user-facing changes? Removes the `LimitExec` type, I can't imagine why someone would use it, and its only used in one place. Happy to bring it back as a deprecated type. --------- Signed-off-by: Adam Gutglick <[email protected]>
|
🤔 some of the logical plan changes seem to get slower repeatably. I will rerun to see if I can reproduce |
|
run benchmark sql_planner |
|
run benchmark sql_planner |
|
🤖 Criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing adamg/optimizer-memory-optimizations (e53a07f) to 5ff80e4 (merge-base) diff File an issue against this benchmark runner |
|
🤖 Criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing adamg/optimizer-memory-optimizations (e53a07f) to 5ff80e4 (merge-base) diff File an issue against this benchmark runner |
|
🤖 Criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
|
🤖 Criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
Which issue does this PR close?
Rationale for this change
Inspired by @blaginin, trying to find more places that might drag the optimizer's performance. On my laptop , this improves many of the sql planner's benchmarks by a fairly consistent 2-5%.
What changes are included in this PR?
A slew of minor optimization in the logical planner, trying to avoid wasted work or repeated allocations
Are these changes tested?
Existing tests.
Are there any user-facing changes?
None