
perf: Implement physical execution of uncorrelated scalar subqueries#21240

Open
neilconway wants to merge 17 commits into apache:main from neilconway:neilc/scalar-subquery-expr

Conversation

@neilconway
Contributor

@neilconway neilconway commented Mar 29, 2026

Which issue does this PR close?

Rationale for this change

Previously, DataFusion evaluated uncorrelated scalar subqueries by transforming them into joins. This has two shortcomings:

  1. Correctness. Scalar subqueries that return more than one row were accepted and produced incorrect query results. Such queries should instead fail with a runtime error.
  2. Performance. Evaluating scalar subqueries as a join requires going through the join machinery. More importantly, it means that UDFs that have special cases for scalar inputs cannot use those code paths for scalar subqueries, which often results in significantly slower query execution.
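
The first shortcoming comes down to a cardinality check on the subquery output. A minimal sketch of that rule follows; it is not the code in this PR. `Scalar` and `scalar_subquery_value` are hypothetical names, and the zero-row case follows standard SQL semantics (NULL), which is an assumption about what the PR does.

```rust
/// Hypothetical stand-in for DataFusion's `ScalarValue`; only what this sketch needs.
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Null,
    Int64(i64),
}

/// Reduce the rows produced by a scalar subquery to a single value.
///
/// Zero rows yield NULL (standard SQL semantics, assumed here), one row yields
/// its value, and more than one row is a runtime error instead of a silently
/// wrong answer.
fn scalar_subquery_value(rows: &[Scalar]) -> Result<Scalar, String> {
    match rows {
        [] => Ok(Scalar::Null),
        [only] => Ok(only.clone()),
        _ => Err(format!(
            "scalar subquery returned {} rows, expected at most one",
            rows.len()
        )),
    }
}
```

A join-based rewrite has no natural place for this check, which is why extra rows could previously leak into the result.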

This PR introduces physical execution of uncorrelated scalar subqueries:

  • Uncorrelated subqueries are left in the plan by the optimizer, not rewritten into joins
  • The physical planner collects uncorrelated scalar subqueries and plans them recursively (supporting nested subqueries). We add a ScalarSubqueryExec plan node to the top of any physical plan with uncorrelated subqueries: it has N+1 children, N subqueries and its "main" input, which is the rest of the query plan. The subquery expression in the parent plan is replaced with a ScalarSubqueryExpr.
  • ScalarSubqueryExec manages the execution of the subqueries and stores the result in a shared "results container", which is an Arc<Vec<OnceLock<ScalarValue>>>. At present, subquery evaluation is done sequentially and not overlapped with evaluation of the parent query.
  • When ScalarSubqueryExpr is evaluated, it fetches the result of the subquery from the result container.
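
The shared results container described above can be illustrated with a small standalone sketch. It uses only the standard library and is not the PR's implementation: `Scalar`, `SubqueryResults`, `new_results`, `run_subqueries`, and `read_result` are hypothetical names, with `Scalar` standing in for `ScalarValue`.

```rust
use std::sync::{Arc, OnceLock};

/// Hypothetical stand-in for DataFusion's `ScalarValue`.
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Int64(i64),
}

/// One write-once slot per subquery, shared between the node that runs the
/// subqueries and the expressions that read their results.
type SubqueryResults = Arc<Vec<OnceLock<Scalar>>>;

/// Create a container with `n` empty slots.
fn new_results(n: usize) -> SubqueryResults {
    Arc::new((0..n).map(|_| OnceLock::new()).collect())
}

/// Producer side (the role `ScalarSubqueryExec` plays): evaluate each subquery
/// in order and publish its value. `evaluate` is a placeholder for actually
/// running the subquery plan for the given slot index.
fn run_subqueries(
    results: &SubqueryResults,
    evaluate: impl Fn(usize) -> Scalar,
) -> Result<(), String> {
    for (i, slot) in results.iter().enumerate() {
        slot.set(evaluate(i))
            .map_err(|_| format!("subquery {i} was already evaluated"))?;
    }
    Ok(())
}

/// Consumer side (the role `ScalarSubqueryExpr` plays): fetch a published value.
fn read_result(results: &SubqueryResults, index: usize) -> Result<Scalar, String> {
    results
        .get(index)
        .and_then(|slot| slot.get())
        .cloned()
        .ok_or_else(|| format!("subquery {index} has no result yet"))
}
```

Because each slot is a `OnceLock`, readers holding a clone of the `Arc` need no further locking once a value is published, and a second write to the same slot is reported as an error instead of overwriting the first.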

This architecture makes it easy to avoid the two shortcomings described above. Performance seems roughly unchanged (benchmarks added in this PR), but in situations like #18181, we can now leverage scalar fast-paths; in the case of #18181 specifically, this improves performance from ~800 ms to ~30 ms.

What changes are included in this PR?

  • Benchmarks
  • Modify subquery rewriter to not transform subqueries -> joins
  • Collect and plan uncorrelated scalar subqueries in the physical planner, and wire up ScalarSubqueryExpr
  • Support for subqueries in physical plan serialization/deserialization using PhysicalProtoConverterExtension to wire up ScalarSubqueryExpr correctly
  • Support for subqueries in logical plan serialization/deserialization
  • Add various SLT tests and update expected plan shapes for some tests

Are these changes tested?

Yes.

Are there any user-facing changes?

At the SQL level, scalar subqueries that return more than one row are now rejected with a runtime error instead of producing incorrect query results.

At the API level, this PR adds several new public APIs (e.g., ScalarSubqueryExpr, ScalarSubqueryExec) and makes breaking changes to several public APIs (e.g., parse_expr). It also introduces a new physical plan node (and allows Subquery to remain in logical plans); third-party query optimization code will encounter these nodes when it would not have before.

@github-actions github-actions bot added the logical-expr, physical-expr, optimizer, core, sqllogictest, proto, and physical-plan labels Mar 29, 2026
pub struct DefaultPhysicalProtoConverter;
#[derive(Default)]
pub struct DefaultPhysicalProtoConverter {
scalar_subquery_results: RefCell<Option<ScalarSubqueryResults>>,
Contributor Author

I don't know the serialization/deserialization code well; would love feedback on whether this is the right way to do this.

Comment on lines +73 to +77
/// TODO: Consider overlapping computation of the subqueries with evaluating the
/// main query.
///
/// TODO: Subqueries are evaluated sequentially. Consider parallel evaluation in
/// the future.
Contributor Author

@neilconway neilconway Mar 29, 2026

Happy to address these TODOs now or in a followup PR, if folks have opinions on the best way to do this.

Comment on lines +443 to +463
// Create the shared results container and register it (along with
// the index map) in ExecutionProps so that `create_physical_expr`
// can resolve `Expr::ScalarSubquery` into `ScalarSubqueryExpr`
// nodes. We clone the SessionState so these are available
// throughout physical planning without mutating the caller's state.
//
// Ideally, the subquery state would live in a dedicated planning
// context rather than on ExecutionProps (which is meant for
// session-level configuration). It's here because
// `create_physical_expr` only receives `&ExecutionProps`, and
// changing that signature would be a breaking public API change.
let results: Arc<Vec<OnceLock<ScalarValue>>> =
Arc::new((0..links.len()).map(|_| OnceLock::new()).collect());
let session_state = if links.is_empty() {
Cow::Borrowed(session_state)
} else {
let mut owned = session_state.clone();
owned.execution_props_mut().subquery_indexes = index_map;
owned.execution_props_mut().subquery_results = Arc::clone(&results);
Cow::Owned(owned)
};
Contributor Author

This seemed a bit kludgy but I couldn't think of a better way to do it; feedback/suggestions welcome.
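
For readers unfamiliar with the borrow-or-clone step in the snippet above, here is a standalone sketch of the same `Cow` pattern. `State` and `state_for_planning` are hypothetical stand-ins for `SessionState` and the planner logic; the real types carry far more than a counter.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for `SessionState`; the field is illustrative only.
#[derive(Clone, Debug, Default)]
struct State {
    subquery_count: usize,
}

/// Borrow the caller's state when there is nothing to register, and clone it
/// only when subquery bookkeeping has to be attached. The caller's state is
/// never mutated in either case.
fn state_for_planning(state: &State, subquery_count: usize) -> Cow<'_, State> {
    if subquery_count == 0 {
        Cow::Borrowed(state)
    } else {
        let mut owned = state.clone();
        owned.subquery_count = subquery_count;
        Cow::Owned(owned)
    }
}
```

The cost of the clone is paid only by plans that actually contain uncorrelated scalar subqueries; all other plans keep using the borrowed state.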

@github-actions github-actions bot added the development-process label Mar 30, 2026
@Dandandan
Contributor

run benchmarks

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4156823048-606-pw9cn 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing neilc/scalar-subquery-expr (b9bce91) to 0be5982 (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete


@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4156823048-607-zdt8z 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing neilc/scalar-subquery-expr (b9bce91) to 0be5982 (merge-base) diff using: tpcds
Results will be posted here when complete


@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4156823048-608-fgcr6 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing neilc/scalar-subquery-expr (b9bce91) to 0be5982 (merge-base) diff using: tpch
Results will be posted here when complete


@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Comparing HEAD and neilc_scalar-subquery-expr
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query     ┃                           HEAD ┃     neilc_scalar-subquery-expr ┃       Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1  │ 45.05 / 45.95 ±0.92 / 47.44 ms │ 45.56 / 45.98 ±0.75 / 47.47 ms │    no change │
│ QQuery 2  │ 21.19 / 21.38 ±0.21 / 21.66 ms │ 21.35 / 21.56 ±0.19 / 21.91 ms │    no change │
│ QQuery 3  │ 31.73 / 32.19 ±0.50 / 33.11 ms │ 31.92 / 32.43 ±0.31 / 32.80 ms │    no change │
│ QQuery 4  │ 20.46 / 21.29 ±0.60 / 22.11 ms │ 20.37 / 21.26 ±0.79 / 22.23 ms │    no change │
│ QQuery 5  │ 48.69 / 50.41 ±1.17 / 51.92 ms │ 48.36 / 49.77 ±1.68 / 52.96 ms │    no change │
│ QQuery 6  │ 17.02 / 17.19 ±0.14 / 17.45 ms │ 17.25 / 18.05 ±1.00 / 19.84 ms │    no change │
│ QQuery 7  │ 53.55 / 54.54 ±0.56 / 55.18 ms │ 54.03 / 54.80 ±0.71 / 55.93 ms │    no change │
│ QQuery 8  │ 47.88 / 48.53 ±0.50 / 49.43 ms │ 48.31 / 49.01 ±1.02 / 51.03 ms │    no change │
│ QQuery 9  │ 54.63 / 55.50 ±0.78 / 56.86 ms │ 54.33 / 55.42 ±0.91 / 56.60 ms │    no change │
│ QQuery 10 │ 71.18 / 71.61 ±0.39 / 72.33 ms │ 69.97 / 70.95 ±0.65 / 71.66 ms │    no change │
│ QQuery 11 │ 13.76 / 14.07 ±0.24 / 14.45 ms │ 34.60 / 35.26 ±0.51 / 36.02 ms │ 2.51x slower │
│ QQuery 12 │ 27.78 / 28.16 ±0.24 / 28.52 ms │ 28.04 / 28.71 ±1.10 / 30.90 ms │    no change │
│ QQuery 13 │ 38.02 / 38.83 ±0.59 / 39.63 ms │ 38.41 / 39.41 ±0.91 / 41.05 ms │    no change │
│ QQuery 14 │ 28.51 / 28.89 ±0.32 / 29.45 ms │ 28.51 / 28.71 ±0.15 / 28.96 ms │    no change │
│ QQuery 15 │ 33.38 / 33.64 ±0.23 / 34.01 ms │ 81.32 / 82.08 ±0.58 / 82.76 ms │ 2.44x slower │
│ QQuery 16 │ 15.85 / 16.08 ±0.20 / 16.44 ms │ 15.90 / 16.18 ±0.15 / 16.30 ms │    no change │
│ QQuery 17 │ 71.98 / 72.73 ±0.44 / 73.31 ms │ 73.16 / 73.69 ±0.33 / 74.06 ms │    no change │
│ QQuery 18 │ 76.62 / 78.05 ±1.00 / 79.49 ms │ 77.03 / 79.02 ±1.36 / 80.85 ms │    no change │
│ QQuery 19 │ 37.61 / 38.00 ±0.44 / 38.76 ms │ 37.96 / 38.14 ±0.20 / 38.43 ms │    no change │
│ QQuery 20 │ 40.10 / 40.87 ±0.74 / 42.16 ms │ 40.11 / 41.48 ±1.00 / 42.90 ms │    no change │
│ QQuery 21 │ 64.14 / 65.78 ±0.89 / 66.56 ms │ 64.51 / 65.90 ±0.71 / 66.44 ms │    no change │
│ QQuery 22 │ 17.71 / 18.20 ±0.33 / 18.70 ms │ 50.61 / 51.89 ±0.92 / 53.42 ms │ 2.85x slower │
└───────────┴────────────────────────────────┴────────────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary                         ┃          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (HEAD)                         │ 891.89ms │
│ Total Time (neilc_scalar-subquery-expr)   │ 999.71ms │
│ Average Time (HEAD)                       │  40.54ms │
│ Average Time (neilc_scalar-subquery-expr) │  45.44ms │
│ Queries Faster                            │        0 │
│ Queries Slower                            │        3 │
│ Queries with No Change                    │       19 │
│ Queries with Failure                      │        0 │
└───────────────────────────────────────────┴──────────┘

Resource Usage

tpch — base (merge-base)

Metric       Value
Wall time    4.7s
Peak memory  4.0 GiB
Avg memory   3.6 GiB
CPU user     33.0s
CPU sys      3.1s
Disk read    0 B
Disk write   136.0 KiB

tpch — branch

Metric       Value
Wall time    5.2s
Peak memory  4.0 GiB
Avg memory   3.6 GiB
CPU user     36.4s
CPU sys      3.2s
Disk read    0 B
Disk write   65.3 MiB

@Dandandan
Contributor

run benchmark tpch10

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4156947198-609-ngld5 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing neilc/scalar-subquery-expr (b9bce91) to 0be5982 (merge-base) diff using: tpch10
Results will be posted here when complete


@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Comparing HEAD and neilc_scalar-subquery-expr
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃            neilc_scalar-subquery-expr ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.32 / 4.53 ±6.33 / 17.19 ms │          1.31 / 4.51 ±6.30 / 17.12 ms │     no change │
│ QQuery 1  │        14.23 / 14.53 ±0.20 / 14.86 ms │        14.35 / 14.64 ±0.17 / 14.86 ms │     no change │
│ QQuery 2  │        44.31 / 44.58 ±0.28 / 45.02 ms │        44.40 / 44.75 ±0.31 / 45.19 ms │     no change │
│ QQuery 3  │        43.28 / 44.24 ±0.70 / 45.35 ms │        44.70 / 45.73 ±0.88 / 47.18 ms │     no change │
│ QQuery 4  │     286.08 / 290.96 ±3.34 / 294.94 ms │     290.52 / 300.63 ±6.22 / 307.69 ms │     no change │
│ QQuery 5  │     343.30 / 360.49 ±9.19 / 368.08 ms │     344.37 / 347.19 ±2.44 / 350.28 ms │     no change │
│ QQuery 6  │           5.46 / 5.92 ±0.40 / 6.44 ms │           5.40 / 5.97 ±0.29 / 6.20 ms │     no change │
│ QQuery 7  │        17.17 / 19.36 ±3.27 / 25.76 ms │        16.87 / 18.58 ±2.12 / 22.74 ms │     no change │
│ QQuery 8  │     432.14 / 441.42 ±9.03 / 452.58 ms │     433.86 / 443.34 ±8.18 / 453.05 ms │     no change │
│ QQuery 9  │     665.10 / 676.08 ±9.11 / 689.32 ms │     624.49 / 635.81 ±6.97 / 645.27 ms │ +1.06x faster │
│ QQuery 10 │        92.37 / 94.29 ±1.59 / 96.89 ms │        90.27 / 93.43 ±2.54 / 97.79 ms │     no change │
│ QQuery 11 │     104.22 / 105.64 ±1.45 / 107.54 ms │     103.32 / 105.88 ±1.55 / 107.78 ms │     no change │
│ QQuery 12 │     344.34 / 349.25 ±3.25 / 353.07 ms │     345.08 / 347.85 ±1.60 / 349.45 ms │     no change │
│ QQuery 13 │     463.79 / 472.59 ±7.95 / 485.80 ms │     457.64 / 463.97 ±6.15 / 472.80 ms │     no change │
│ QQuery 14 │     350.37 / 356.22 ±3.77 / 360.54 ms │     346.54 / 352.03 ±6.32 / 364.15 ms │     no change │
│ QQuery 15 │    360.40 / 374.90 ±17.68 / 406.65 ms │    375.51 / 394.04 ±32.95 / 459.82 ms │  1.05x slower │
│ QQuery 16 │    714.01 / 738.95 ±23.24 / 774.61 ms │    728.40 / 746.45 ±14.43 / 765.84 ms │     no change │
│ QQuery 17 │    714.60 / 731.23 ±12.85 / 746.56 ms │     715.64 / 721.12 ±5.66 / 731.77 ms │     no change │
│ QQuery 18 │ 1430.78 / 1488.33 ±40.84 / 1548.80 ms │ 1379.93 / 1479.71 ±51.92 / 1528.42 ms │     no change │
│ QQuery 19 │        35.90 / 37.02 ±1.18 / 39.14 ms │        35.40 / 37.33 ±1.81 / 40.76 ms │     no change │
│ QQuery 20 │    713.45 / 735.48 ±24.51 / 771.36 ms │    712.34 / 727.29 ±14.90 / 754.80 ms │     no change │
│ QQuery 21 │     754.02 / 765.34 ±6.85 / 774.44 ms │     761.37 / 764.62 ±2.67 / 768.81 ms │     no change │
│ QQuery 22 │  1123.65 / 1128.39 ±4.69 / 1137.31 ms │  1126.97 / 1131.73 ±7.10 / 1145.76 ms │     no change │
│ QQuery 23 │ 3041.09 / 3062.25 ±18.65 / 3096.08 ms │  3033.97 / 3043.12 ±7.01 / 3055.29 ms │     no change │
│ QQuery 24 │     101.54 / 103.59 ±1.75 / 106.55 ms │      98.71 / 100.39 ±1.13 / 101.92 ms │     no change │
│ QQuery 25 │     142.10 / 142.85 ±0.49 / 143.58 ms │     136.56 / 138.17 ±0.90 / 139.25 ms │     no change │
│ QQuery 26 │     100.19 / 102.93 ±2.31 / 107.10 ms │      98.00 / 100.86 ±2.33 / 103.12 ms │     no change │
│ QQuery 27 │     849.12 / 854.43 ±7.74 / 869.79 ms │     846.66 / 853.51 ±4.77 / 857.99 ms │     no change │
│ QQuery 28 │ 7705.51 / 7745.32 ±22.00 / 7770.71 ms │ 7697.89 / 7744.14 ±31.93 / 7780.46 ms │     no change │
│ QQuery 29 │        50.77 / 55.69 ±5.09 / 65.45 ms │        50.30 / 53.99 ±4.24 / 61.53 ms │     no change │
│ QQuery 30 │     363.99 / 370.45 ±4.29 / 377.11 ms │     356.81 / 365.83 ±6.34 / 375.05 ms │     no change │
│ QQuery 31 │    362.12 / 377.82 ±11.94 / 394.11 ms │     376.70 / 380.15 ±3.81 / 386.17 ms │     no change │
│ QQuery 32 │ 1200.38 / 1267.05 ±55.53 / 1326.36 ms │ 1265.70 / 1294.94 ±27.34 / 1344.67 ms │     no change │
│ QQuery 33 │ 1460.50 / 1499.33 ±45.94 / 1580.55 ms │ 1470.47 / 1563.53 ±46.86 / 1592.95 ms │     no change │
│ QQuery 34 │  1431.98 / 1445.24 ±8.97 / 1459.07 ms │  1442.78 / 1454.41 ±8.09 / 1463.45 ms │     no change │
│ QQuery 35 │     382.15 / 386.54 ±3.26 / 390.79 ms │     379.35 / 385.51 ±7.65 / 397.78 ms │     no change │
│ QQuery 36 │     120.63 / 123.11 ±2.38 / 127.06 ms │     112.43 / 120.57 ±5.93 / 129.56 ms │     no change │
│ QQuery 37 │        48.56 / 49.41 ±0.56 / 50.23 ms │        48.03 / 50.15 ±1.57 / 52.85 ms │     no change │
│ QQuery 38 │        76.82 / 77.82 ±1.61 / 81.02 ms │        73.92 / 76.14 ±1.74 / 78.88 ms │     no change │
│ QQuery 39 │     220.70 / 223.98 ±1.85 / 226.23 ms │     204.83 / 218.14 ±7.68 / 228.76 ms │     no change │
│ QQuery 40 │        20.76 / 23.38 ±1.78 / 25.13 ms │        23.89 / 25.58 ±1.20 / 27.12 ms │  1.09x slower │
│ QQuery 41 │        20.53 / 22.07 ±1.92 / 25.72 ms │        19.67 / 20.48 ±0.58 / 21.38 ms │ +1.08x faster │
│ QQuery 42 │        19.62 / 19.98 ±0.25 / 20.37 ms │        18.69 / 20.48 ±1.77 / 23.84 ms │     no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                         ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                         │ 27232.98ms │
│ Total Time (neilc_scalar-subquery-expr)   │ 27236.70ms │
│ Average Time (HEAD)                       │   633.33ms │
│ Average Time (neilc_scalar-subquery-expr) │   633.41ms │
│ Queries Faster                            │          2 │
│ Queries Slower                            │          2 │
│ Queries with No Change                    │         39 │
│ Queries with Failure                      │          0 │
└───────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

Metric       Value
Wall time    137.3s
Peak memory  43.4 GiB
Avg memory   30.6 GiB
CPU user     1280.1s
CPU sys      100.5s
Disk read    0 B
Disk write   3.7 GiB

clickbench_partitioned — branch

Metric Value
Wall time 137.3s
Peak memory 42.2 GiB
Avg memory 30.8 GiB
CPU user 1274.8s
CPU sys 105.0s
Disk read 0 B
Disk write 756.0 KiB


@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected
Details

Comparing HEAD and neilc_scalar-subquery-expr
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ Query     ┃                                     HEAD ┃               neilc_scalar-subquery-expr ┃         Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ QQuery 1  │           44.25 / 44.82 ±0.48 / 45.51 ms │           43.85 / 44.69 ±0.60 / 45.60 ms │      no change │
│ QQuery 2  │        146.98 / 147.70 ±0.56 / 148.52 ms │        146.78 / 148.18 ±0.91 / 149.54 ms │      no change │
│ QQuery 3  │        114.21 / 115.31 ±0.72 / 116.44 ms │        113.91 / 114.94 ±0.60 / 115.64 ms │      no change │
│ QQuery 4  │    1361.24 / 1382.43 ±11.51 / 1394.45 ms │    1283.54 / 1322.99 ±26.13 / 1357.02 ms │      no change │
│ QQuery 5  │        173.10 / 174.54 ±1.70 / 177.81 ms │        174.76 / 176.93 ±1.75 / 179.87 ms │      no change │
│ QQuery 6  │    1015.03 / 1042.42 ±19.66 / 1065.36 ms │        141.66 / 142.72 ±1.01 / 144.51 ms │  +7.30x faster │
│ QQuery 7  │        354.77 / 357.64 ±3.86 / 365.22 ms │        351.55 / 354.33 ±1.76 / 356.39 ms │      no change │
│ QQuery 8  │        117.07 / 118.20 ±0.78 / 119.39 ms │        117.21 / 118.52 ±0.82 / 119.33 ms │      no change │
│ QQuery 9  │        104.15 / 111.53 ±8.82 / 128.86 ms │ 11772.07 / 11804.97 ±43.46 / 11890.91 ms │ 105.84x slower │
│ QQuery 10 │        108.76 / 110.53 ±1.28 / 112.52 ms │        109.57 / 112.00 ±1.86 / 114.67 ms │      no change │
│ QQuery 11 │      940.91 / 992.20 ±37.97 / 1029.34 ms │       920.39 / 943.43 ±14.07 / 964.76 ms │      no change │
│ QQuery 12 │           47.32 / 48.44 ±0.88 / 49.94 ms │           45.77 / 47.28 ±1.70 / 50.42 ms │      no change │
│ QQuery 13 │        411.48 / 420.21 ±6.20 / 427.97 ms │        404.43 / 409.93 ±3.82 / 416.09 ms │      no change │
│ QQuery 14 │    1021.64 / 1033.76 ±13.70 / 1060.24 ms │    2459.59 / 2478.22 ±14.55 / 2495.08 ms │   2.40x slower │
│ QQuery 15 │           16.38 / 17.02 ±0.85 / 18.68 ms │           15.97 / 16.60 ±0.64 / 17.60 ms │      no change │
│ QQuery 16 │           40.86 / 42.57 ±1.14 / 43.89 ms │           40.26 / 41.55 ±0.94 / 43.03 ms │      no change │
│ QQuery 17 │        240.59 / 243.80 ±2.28 / 247.36 ms │        243.65 / 245.81 ±1.65 / 248.45 ms │      no change │
│ QQuery 18 │        129.46 / 130.41 ±0.51 / 130.96 ms │        129.49 / 131.28 ±1.57 / 133.31 ms │      no change │
│ QQuery 19 │        155.80 / 157.11 ±0.69 / 157.78 ms │        155.14 / 156.80 ±0.98 / 158.18 ms │      no change │
│ QQuery 20 │           14.20 / 14.68 ±0.53 / 15.60 ms │           13.56 / 13.83 ±0.18 / 14.11 ms │  +1.06x faster │
│ QQuery 21 │           19.81 / 20.02 ±0.19 / 20.37 ms │           19.30 / 19.90 ±0.61 / 21.05 ms │      no change │
│ QQuery 22 │        497.84 / 502.82 ±4.14 / 509.46 ms │        480.11 / 484.79 ±5.36 / 495.20 ms │      no change │
│ QQuery 23 │       895.90 / 921.72 ±21.03 / 953.14 ms │    3630.01 / 3663.42 ±50.84 / 3763.87 ms │   3.97x slower │
│ QQuery 24 │        415.61 / 419.65 ±2.54 / 422.31 ms │     2816.38 / 2831.36 ±9.63 / 2843.15 ms │   6.75x slower │
│ QQuery 25 │        352.77 / 356.03 ±1.80 / 358.05 ms │        359.52 / 364.18 ±3.56 / 370.31 ms │      no change │
│ QQuery 26 │           84.09 / 84.47 ±0.52 / 85.48 ms │           84.39 / 86.51 ±1.68 / 88.70 ms │      no change │
│ QQuery 27 │        347.45 / 352.98 ±3.79 / 357.59 ms │        348.79 / 353.41 ±2.35 / 355.01 ms │      no change │
│ QQuery 28 │        148.27 / 151.63 ±2.48 / 154.86 ms │        149.83 / 151.92 ±2.47 / 155.37 ms │      no change │
│ QQuery 29 │        300.29 / 302.56 ±2.35 / 306.73 ms │        301.35 / 306.56 ±2.80 / 309.75 ms │      no change │
│ QQuery 30 │           44.14 / 45.30 ±1.05 / 46.83 ms │           43.84 / 45.85 ±1.52 / 47.52 ms │      no change │
│ QQuery 31 │        175.31 / 176.57 ±0.79 / 177.69 ms │        173.71 / 174.43 ±0.76 / 175.76 ms │      no change │
│ QQuery 32 │           57.96 / 59.07 ±0.81 / 59.85 ms │           57.07 / 57.93 ±0.83 / 59.48 ms │      no change │
│ QQuery 33 │        141.83 / 144.07 ±1.94 / 147.55 ms │        141.53 / 144.42 ±1.52 / 145.77 ms │      no change │
│ QQuery 34 │        107.36 / 108.51 ±0.86 / 109.58 ms │        106.03 / 107.62 ±1.10 / 109.32 ms │      no change │
│ QQuery 35 │        107.79 / 110.28 ±1.59 / 112.49 ms │        106.64 / 109.53 ±1.74 / 111.98 ms │      no change │
│ QQuery 36 │        219.23 / 224.42 ±3.03 / 227.67 ms │        220.11 / 223.14 ±2.92 / 228.20 ms │      no change │
│ QQuery 37 │        177.45 / 181.60 ±3.05 / 186.12 ms │        180.63 / 183.65 ±3.04 / 189.42 ms │      no change │
│ QQuery 38 │           85.23 / 88.26 ±2.76 / 93.27 ms │           86.28 / 88.25 ±1.53 / 89.83 ms │      no change │
│ QQuery 39 │        125.33 / 129.66 ±2.86 / 133.52 ms │        127.17 / 130.50 ±1.84 / 132.33 ms │      no change │
│ QQuery 40 │        111.06 / 118.45 ±7.21 / 132.16 ms │        116.67 / 121.21 ±4.68 / 129.25 ms │      no change │
│ QQuery 41 │           15.14 / 16.20 ±0.71 / 17.29 ms │           14.31 / 15.43 ±0.75 / 16.38 ms │      no change │
│ QQuery 42 │        106.07 / 109.01 ±1.62 / 110.51 ms │        107.70 / 109.62 ±1.12 / 110.99 ms │      no change │
│ QQuery 43 │           83.46 / 84.93 ±0.90 / 85.96 ms │           84.31 / 84.85 ±0.30 / 85.22 ms │      no change │
│ QQuery 44 │           11.83 / 12.50 ±0.84 / 14.15 ms │           10.75 / 11.47 ±0.50 / 12.31 ms │  +1.09x faster │
│ QQuery 45 │           52.65 / 54.45 ±1.60 / 57.36 ms │           53.03 / 54.38 ±0.78 / 55.19 ms │      no change │
│ QQuery 46 │        229.89 / 233.87 ±2.16 / 235.83 ms │        230.75 / 233.61 ±1.85 / 236.06 ms │      no change │
│ QQuery 47 │        696.01 / 702.63 ±4.78 / 710.07 ms │        687.70 / 692.96 ±2.80 / 695.88 ms │      no change │
│ QQuery 48 │        287.73 / 296.85 ±7.22 / 305.39 ms │        289.01 / 295.03 ±4.71 / 301.18 ms │      no change │
│ QQuery 49 │        259.14 / 261.03 ±1.48 / 263.03 ms │        261.39 / 262.20 ±0.50 / 262.74 ms │      no change │
│ QQuery 50 │        223.24 / 233.90 ±6.07 / 240.45 ms │        226.60 / 237.72 ±6.25 / 243.65 ms │      no change │
│ QQuery 51 │        185.08 / 188.39 ±3.03 / 193.10 ms │        181.75 / 185.58 ±3.65 / 191.74 ms │      no change │
│ QQuery 52 │        109.13 / 111.62 ±2.08 / 115.41 ms │        107.03 / 108.80 ±1.54 / 111.15 ms │      no change │
│ QQuery 53 │        104.50 / 105.45 ±1.14 / 107.67 ms │        101.81 / 104.13 ±1.79 / 106.33 ms │      no change │
│ QQuery 54 │        149.30 / 151.24 ±1.19 / 152.77 ms │        156.52 / 158.75 ±2.77 / 164.04 ms │      no change │
│ QQuery 55 │        108.95 / 110.52 ±1.38 / 112.70 ms │        107.18 / 108.39 ±1.42 / 110.46 ms │      no change │
│ QQuery 56 │        144.02 / 145.23 ±1.29 / 147.63 ms │        142.76 / 144.52 ±1.61 / 147.40 ms │      no change │
│ QQuery 57 │        178.51 / 179.82 ±1.38 / 182.32 ms │        173.32 / 176.46 ±2.69 / 180.39 ms │      no change │
│ QQuery 58 │        301.08 / 308.07 ±5.89 / 318.34 ms │        284.32 / 286.13 ±1.06 / 287.64 ms │  +1.08x faster │
│ QQuery 59 │        198.65 / 200.43 ±1.82 / 203.79 ms │        197.96 / 200.55 ±2.03 / 202.79 ms │      no change │
│ QQuery 60 │        143.28 / 145.85 ±1.63 / 147.63 ms │        145.18 / 147.01 ±1.21 / 148.78 ms │      no change │
│ QQuery 61 │        170.53 / 172.66 ±1.39 / 174.85 ms │        171.46 / 173.55 ±1.50 / 176.01 ms │      no change │
│ QQuery 62 │       892.25 / 916.36 ±29.78 / 973.49 ms │       874.14 / 891.12 ±19.50 / 928.90 ms │      no change │
│ QQuery 63 │        103.36 / 105.12 ±1.54 / 107.87 ms │        103.00 / 104.65 ±1.50 / 107.40 ms │      no change │
│ QQuery 64 │        703.19 / 707.18 ±2.15 / 709.05 ms │        702.85 / 707.46 ±2.50 / 710.44 ms │      no change │
│ QQuery 65 │        247.33 / 251.97 ±3.58 / 255.09 ms │        251.45 / 254.93 ±3.10 / 259.73 ms │      no change │
│ QQuery 66 │        251.98 / 257.05 ±2.97 / 259.96 ms │       237.54 / 256.69 ±15.77 / 278.65 ms │      no change │
│ QQuery 67 │        303.71 / 312.70 ±8.73 / 327.82 ms │        311.04 / 319.95 ±6.40 / 329.08 ms │      no change │
│ QQuery 68 │        280.57 / 284.59 ±3.27 / 287.73 ms │        279.25 / 282.30 ±2.13 / 284.99 ms │      no change │
│ QQuery 69 │        103.56 / 104.90 ±0.80 / 105.85 ms │        103.49 / 105.13 ±1.06 / 106.56 ms │      no change │
│ QQuery 70 │       347.29 / 354.74 ±11.03 / 376.65 ms │       340.58 / 362.19 ±12.96 / 374.97 ms │      no change │
│ QQuery 71 │        136.05 / 136.62 ±0.55 / 137.65 ms │        135.11 / 136.11 ±0.90 / 137.53 ms │      no change │
│ QQuery 72 │        710.67 / 720.41 ±8.57 / 735.08 ms │        708.61 / 725.47 ±9.99 / 740.00 ms │      no change │
│ QQuery 73 │        103.25 / 105.05 ±1.74 / 108.23 ms │        102.97 / 105.53 ±1.76 / 108.32 ms │      no change │
│ QQuery 74 │        543.98 / 549.58 ±5.66 / 557.36 ms │        543.13 / 545.65 ±1.86 / 548.05 ms │      no change │
│ QQuery 75 │        279.15 / 280.80 ±0.86 / 281.64 ms │        277.68 / 280.61 ±2.83 / 284.56 ms │      no change │
│ QQuery 76 │        131.39 / 133.55 ±1.73 / 136.63 ms │        133.37 / 135.08 ±1.23 / 136.96 ms │      no change │
│ QQuery 77 │        187.96 / 189.74 ±1.31 / 191.81 ms │        189.48 / 191.30 ±1.46 / 193.10 ms │      no change │
│ QQuery 78 │        348.71 / 356.03 ±4.77 / 361.35 ms │        357.13 / 360.34 ±3.07 / 364.95 ms │      no change │
│ QQuery 79 │        232.70 / 236.90 ±5.40 / 247.39 ms │        232.93 / 235.40 ±1.95 / 237.34 ms │      no change │
│ QQuery 80 │        329.34 / 332.78 ±3.26 / 337.44 ms │        333.30 / 335.02 ±1.35 / 336.80 ms │      no change │
│ QQuery 81 │           25.65 / 27.36 ±1.24 / 28.58 ms │           26.80 / 28.09 ±0.84 / 29.10 ms │      no change │
│ QQuery 82 │        199.33 / 202.38 ±3.13 / 206.64 ms │        201.56 / 203.41 ±2.46 / 208.19 ms │      no change │
│ QQuery 83 │           39.88 / 40.93 ±1.15 / 43.11 ms │           39.27 / 40.14 ±1.12 / 42.31 ms │      no change │
│ QQuery 84 │           48.26 / 49.54 ±1.52 / 52.52 ms │           48.79 / 50.49 ±1.39 / 52.32 ms │      no change │
│ QQuery 85 │        150.38 / 151.76 ±0.98 / 153.32 ms │        148.47 / 150.97 ±1.83 / 153.95 ms │      no change │
│ QQuery 86 │           38.20 / 40.24 ±1.71 / 43.32 ms │           39.06 / 40.23 ±0.79 / 41.47 ms │      no change │
│ QQuery 87 │           85.56 / 88.67 ±3.21 / 94.61 ms │           87.50 / 89.60 ±2.23 / 93.58 ms │      no change │
│ QQuery 88 │         99.72 / 100.64 ±0.75 / 101.48 ms │        101.58 / 102.33 ±0.43 / 102.83 ms │      no change │
│ QQuery 89 │        119.65 / 120.68 ±0.81 / 122.08 ms │        116.81 / 118.73 ±1.32 / 120.33 ms │      no change │
│ QQuery 90 │           23.89 / 24.63 ±0.40 / 25.01 ms │           23.50 / 24.39 ±0.75 / 25.56 ms │      no change │
│ QQuery 91 │           63.91 / 65.34 ±1.09 / 67.03 ms │           62.96 / 65.22 ±1.28 / 66.95 ms │      no change │
│ QQuery 92 │           56.86 / 57.99 ±0.72 / 59.12 ms │           57.66 / 58.13 ±0.29 / 58.51 ms │      no change │
│ QQuery 93 │        190.02 / 192.75 ±1.95 / 195.89 ms │        193.35 / 195.59 ±1.22 / 196.95 ms │      no change │
│ QQuery 94 │           60.45 / 62.27 ±1.26 / 64.19 ms │           60.68 / 62.24 ±1.36 / 64.26 ms │      no change │
│ QQuery 95 │        135.00 / 136.46 ±1.24 / 138.14 ms │        134.78 / 136.74 ±1.53 / 138.68 ms │      no change │
│ QQuery 96 │           71.97 / 73.87 ±1.43 / 75.12 ms │           71.00 / 74.89 ±2.10 / 77.22 ms │      no change │
│ QQuery 97 │        127.60 / 129.22 ±1.43 / 131.78 ms │        131.31 / 132.42 ±0.77 / 133.47 ms │      no change │
│ QQuery 98 │        153.41 / 156.61 ±2.03 / 159.65 ms │        154.66 / 156.51 ±1.62 / 158.98 ms │      no change │
│ QQuery 99 │ 10715.50 / 10746.34 ±23.07 / 10778.99 ms │ 10685.30 / 10727.68 ±39.63 / 10788.50 ms │      no change │
└───────────┴──────────────────────────────────────────┴──────────────────────────────────────────┴────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                         ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                         │ 33831.79ms │
│ Total Time (neilc_scalar-subquery-expr)   │ 51057.44ms │
│ Average Time (HEAD)                       │   341.74ms │
│ Average Time (neilc_scalar-subquery-expr) │   515.73ms │
│ Queries Faster                            │          4 │
│ Queries Slower                            │          4 │
│ Queries with No Change                    │         91 │
│ Queries with Failure                      │          0 │
└───────────────────────────────────────────┴────────────┘
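As a rough cross-check of the table above, the Change column is consistent with a plain ratio of the mean times (the middle value in each cell). The sketch below assumes that is how the runner derives it; the function names `speedup` and `slowdown` are hypothetical and not part of the benchmark tooling.

```rust
/// Ratio reported as "N x faster": HEAD mean divided by branch mean.
/// Hypothetical helper; assumes the Change column is a simple ratio of means.
pub fn speedup(head_mean_ms: f64, branch_mean_ms: f64) -> f64 {
    head_mean_ms / branch_mean_ms
}

/// Ratio reported as "N x slower": branch mean divided by HEAD mean.
/// Hypothetical helper; same assumption as above.
pub fn slowdown(head_mean_ms: f64, branch_mean_ms: f64) -> f64 {
    branch_mean_ms / head_mean_ms
}
```

With the means from the table, `speedup(1042.42, 142.72)` is about 7.30 (Query 6), `slowdown(419.65, 2831.36)` is about 6.75 (Query 24), and the total time grows by roughly 1.51x (51057.44 ms against 33831.79 ms).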

Resource Usage

tpcds — base (merge-base)

Metric Value
Wall time 169.5s
Peak memory 5.3 GiB
Avg memory 4.4 GiB
CPU user 271.6s
CPU sys 18.5s
Disk read 0 B
Disk write 707.3 MiB

tpcds — branch

Metric Value
Wall time 255.6s
Peak memory 5.5 GiB
Avg memory 4.4 GiB
CPU user 340.5s
CPU sys 23.1s
Disk read 0 B
Disk write 792.0 KiB


@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and neilc_scalar-subquery-expr
--------------------
Benchmark tpch_sf10.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query     ┃                               HEAD ┃         neilc_scalar-subquery-expr ┃       Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1  │  369.19 / 370.73 ±2.51 / 375.73 ms │  368.96 / 370.36 ±1.31 / 371.96 ms │    no change │
│ QQuery 2  │  135.00 / 138.57 ±2.07 / 141.14 ms │  135.89 / 137.81 ±1.29 / 139.25 ms │    no change │
│ QQuery 3  │  289.53 / 291.57 ±1.23 / 293.22 ms │  290.53 / 297.20 ±5.11 / 304.04 ms │    no change │
│ QQuery 4  │  152.29 / 154.24 ±1.78 / 156.63 ms │  152.70 / 155.09 ±1.29 / 156.49 ms │    no change │
│ QQuery 5  │  419.37 / 427.51 ±7.24 / 440.72 ms │  428.18 / 434.72 ±4.87 / 443.03 ms │    no change │
│ QQuery 6  │  131.49 / 132.09 ±0.53 / 132.93 ms │  131.37 / 132.40 ±0.69 / 133.50 ms │    no change │
│ QQuery 7  │ 551.21 / 565.50 ±10.88 / 583.16 ms │  558.76 / 565.02 ±5.22 / 574.66 ms │    no change │
│ QQuery 8  │  469.80 / 474.21 ±3.26 / 479.52 ms │ 474.57 / 486.11 ±10.18 / 501.01 ms │    no change │
│ QQuery 9  │  670.21 / 679.03 ±7.92 / 688.56 ms │ 681.44 / 695.24 ±11.43 / 709.76 ms │    no change │
│ QQuery 10 │  325.05 / 335.91 ±5.88 / 341.92 ms │  331.25 / 340.91 ±6.86 / 350.17 ms │    no change │
│ QQuery 11 │  106.41 / 108.42 ±1.66 / 110.77 ms │  244.91 / 250.70 ±5.02 / 257.33 ms │ 2.31x slower │
│ QQuery 12 │  203.75 / 205.81 ±2.14 / 209.59 ms │  202.97 / 206.01 ±2.80 / 209.36 ms │    no change │
│ QQuery 13 │  311.48 / 323.44 ±7.21 / 334.11 ms │  302.37 / 310.21 ±7.64 / 324.47 ms │    no change │
│ QQuery 14 │  181.83 / 186.15 ±5.10 / 195.22 ms │  181.62 / 183.84 ±2.43 / 188.48 ms │    no change │
│ QQuery 15 │  329.37 / 331.97 ±1.95 / 335.24 ms │  765.13 / 766.18 ±0.83 / 767.36 ms │ 2.31x slower │
│ QQuery 16 │     84.52 / 86.28 ±2.09 / 90.23 ms │     85.76 / 89.54 ±2.04 / 91.55 ms │    no change │
│ QQuery 17 │  740.10 / 747.07 ±5.94 / 756.12 ms │  745.78 / 749.20 ±3.18 / 754.51 ms │    no change │
│ QQuery 18 │ 821.06 / 857.35 ±20.20 / 876.81 ms │ 833.04 / 853.12 ±12.21 / 868.09 ms │    no change │
│ QQuery 19 │  266.65 / 270.81 ±5.31 / 280.90 ms │  266.35 / 272.41 ±8.99 / 290.20 ms │    no change │
│ QQuery 20 │  313.00 / 321.96 ±6.63 / 333.12 ms │  320.59 / 330.50 ±6.63 / 340.98 ms │    no change │
│ QQuery 21 │  836.48 / 850.09 ±7.99 / 859.63 ms │  844.86 / 855.47 ±8.71 / 867.57 ms │    no change │
│ QQuery 22 │     79.43 / 82.63 ±2.92 / 87.31 ms │   95.60 / 101.03 ±3.22 / 104.46 ms │ 1.22x slower │
└───────────┴────────────────────────────────────┴────────────────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                         ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                         │ 7941.33ms │
│ Total Time (neilc_scalar-subquery-expr)   │ 8583.07ms │
│ Average Time (HEAD)                       │  360.97ms │
│ Average Time (neilc_scalar-subquery-expr) │  390.14ms │
│ Queries Faster                            │         0 │
│ Queries Slower                            │         3 │
│ Queries with No Change                    │        19 │
│ Queries with Failure                      │         0 │
└───────────────────────────────────────────┴───────────┘

Resource Usage

tpch10 — base (merge-base)

Metric Value
Wall time 40.1s
Peak memory 9.6 GiB
Avg memory 7.3 GiB
CPU user 418.3s
CPU sys 28.9s
Disk read 0 B
Disk write 3.0 GiB

tpch10 — branch

Metric Value
Wall time 43.3s
Peak memory 9.6 GiB
Avg memory 7.1 GiB
CPU user 450.3s
CPU sys 32.6s
Disk read 0 B
Disk write 84.0 KiB


@Dandandan
Contributor

Some slowdowns; due to missing statistics, I wonder?

@Dandandan
Contributor

run benchmark tpch tpch10 tpcds

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4157316411-610-2mvqv 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing neilc/scalar-subquery-expr (09f167a) to 0be5982 (merge-base) diff using: tpch
Results will be posted here when complete



@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4157316411-611-fkr9v 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing neilc/scalar-subquery-expr (09f167a) to 0be5982 (merge-base) diff using: tpch10
Results will be posted here when complete



@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4157316411-612-vrjfk 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing neilc/scalar-subquery-expr (09f167a) to 0be5982 (merge-base) diff using: tpcds
Results will be posted here when complete



}

fn partition_statistics(&self, partition: Option<usize>) -> Result<Arc<Statistics>> {
    self.input.partition_statistics(partition)
Contributor

@Dandandan Dandandan Mar 30, 2026

I think this can be improved to return 1 row if the subquery has a larger num_rows statistic (or no statistics at all).

Contributor

Same for the byte size estimate.

Contributor Author

@neilconway neilconway Mar 30, 2026

Sorry, I want to make sure I understand. partition_statistics is about the statistics for the output of the plan node. The output of ScalarSubqueryExec is identical to the output of its main input child node; the N other children are just subqueries whose result values get used somewhere by the main input plan. The statistics of the child subqueries don't directly influence the properties of the output of the ScalarSubqueryExec itself. Lmk if I'm misunderstanding you though!
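To make the point above concrete, here is a minimal sketch of a node with one main input and N subquery children whose output statistics come only from the main input. It is a toy model, not DataFusion code: `Stats`, `Plan`, `Leaf`, and `ScalarSubqueryNode` are hypothetical stand-ins for the real types.

```rust
/// Hypothetical, simplified stand-in for plan statistics.
#[derive(Debug, Clone, PartialEq)]
pub struct Stats {
    /// Estimated number of output rows; `None` means unknown.
    pub num_rows: Option<usize>,
}

/// Hypothetical, simplified stand-in for a physical plan node.
pub trait Plan {
    fn stats(&self) -> Stats;
}

/// Leaf node with a fixed estimate, for illustration only.
pub struct Leaf(pub Stats);

impl Plan for Leaf {
    fn stats(&self) -> Stats {
        self.0.clone()
    }
}

/// Toy analogue of the node under discussion: one main input plus
/// N subquery children whose results are consumed inside the main input.
pub struct ScalarSubqueryNode {
    pub input: Box<dyn Plan>,
    pub subqueries: Vec<Box<dyn Plan>>,
}

impl Plan for ScalarSubqueryNode {
    fn stats(&self) -> Stats {
        // The node emits exactly the rows of its main input, so the
        // subquery children do not contribute to the output estimate.
        self.input.stats()
    }
}
```

In this model, a node whose main input estimates 1000 rows reports 1000 rows regardless of what the subquery children estimate.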

Contributor

@Dandandan Dandandan Mar 30, 2026

My thought was: is this the physical node for the scalar subquery (which returns at most 1 row and returns an error if it exceeds 1 row)? If so, we could update the statistics (based on child stats) to be min(1, child).

Will try to look at the PR more in depth tomorrow!

Contributor

The idea is mostly that the subquery will have either unknown or high estimated stats; returning updated stats for a scalar subquery should generally be much more effective for planning the correct join order, as it can be loaded on the build side.
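
To make the two positions in this thread concrete, here is a minimal sketch. It is not the PR's code and does not use DataFusion's statistics types: `RowEstimate`, `ScalarSubqueryNode`, and `cap_at_one_row` are hypothetical stand-ins. It assumes the node's output estimate passes through from the main input, while each subquery's own estimate could be capped at one row; treating a missing estimate as "at most one row, inexact" is an assumption of the sketch.

```rust
/// Simplified stand-in for a row-count estimate.
/// Hypothetical: this is not DataFusion's actual statistics type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RowEstimate {
    Exact(usize),
    Inexact(usize),
    Absent,
}

/// Hypothetical model of a plan node with one main input and N subquery children.
struct ScalarSubqueryNode {
    main_input_rows: RowEstimate,
    subquery_rows: Vec<RowEstimate>,
}

impl ScalarSubqueryNode {
    /// The node's output is the main input's output, so the estimate passes through.
    fn output_rows(&self) -> RowEstimate {
        self.main_input_rows
    }

    /// Estimate for the i-th subquery result, capped at one row.
    /// Returns None when there is no subquery at that index.
    fn capped_subquery_rows(&self, i: usize) -> Option<RowEstimate> {
        self.subquery_rows.get(i).copied().map(cap_at_one_row)
    }
}

/// Apply min(1, child) to a row estimate.
/// Assumption: an absent estimate becomes an inexact estimate of one row,
/// because a scalar subquery yields at most one row or fails at runtime.
fn cap_at_one_row(child: RowEstimate) -> RowEstimate {
    match child {
        RowEstimate::Exact(n) => RowEstimate::Exact(n.min(1)),
        RowEstimate::Inexact(n) => RowEstimate::Inexact(n.min(1)),
        RowEstimate::Absent => RowEstimate::Inexact(1),
    }
}
```

Under these assumptions, a subquery child with an estimate of `Inexact(1_000_000)` would be reported as `Inexact(1)`, while the node's own output estimate stays whatever the main input reports.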

Contributor Author

Thanks for the comments! I will admit I still don't entirely follow what you mean, sorry to be slow 🙃

@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected
Details

Comparing HEAD and neilc_scalar-subquery-expr
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query     ┃                           HEAD ┃     neilc_scalar-subquery-expr ┃       Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1  │ 45.44 / 46.14 ±0.71 / 47.42 ms │ 45.84 / 46.49 ±0.88 / 48.19 ms │    no change │
│ QQuery 2  │ 21.53 / 21.81 ±0.22 / 22.08 ms │ 21.71 / 21.88 ±0.11 / 22.01 ms │    no change │
│ QQuery 3  │ 32.16 / 32.36 ±0.15 / 32.59 ms │ 32.23 / 33.10 ±0.90 / 34.81 ms │    no change │
│ QQuery 4  │ 21.27 / 22.46 ±0.93 / 23.94 ms │ 21.41 / 22.49 ±0.76 / 23.49 ms │    no change │
│ QQuery 5  │ 48.55 / 49.70 ±0.75 / 50.84 ms │ 48.45 / 51.16 ±1.41 / 52.28 ms │    no change │
│ QQuery 6  │ 17.33 / 18.05 ±0.62 / 19.12 ms │ 17.21 / 17.46 ±0.16 / 17.71 ms │    no change │
│ QQuery 7  │ 54.80 / 55.79 ±0.83 / 56.70 ms │ 56.24 / 57.57 ±1.31 / 59.97 ms │    no change │
│ QQuery 8  │ 48.32 / 48.94 ±0.38 / 49.37 ms │ 48.58 / 49.14 ±0.45 / 49.80 ms │    no change │
│ QQuery 9  │ 53.95 / 56.14 ±1.15 / 57.24 ms │ 54.38 / 55.10 ±0.62 / 56.24 ms │    no change │
│ QQuery 10 │ 72.16 / 74.73 ±2.22 / 77.59 ms │ 72.25 / 72.69 ±0.47 / 73.32 ms │    no change │
│ QQuery 11 │ 14.17 / 14.75 ±0.48 / 15.54 ms │ 35.13 / 36.16 ±0.97 / 37.83 ms │ 2.45x slower │
│ QQuery 12 │ 28.15 / 28.59 ±0.26 / 28.91 ms │ 28.31 / 29.23 ±1.09 / 31.28 ms │    no change │
│ QQuery 13 │ 38.73 / 40.41 ±1.24 / 42.39 ms │ 38.69 / 39.34 ±0.55 / 40.01 ms │    no change │
│ QQuery 14 │ 28.75 / 29.05 ±0.30 / 29.61 ms │ 28.77 / 29.08 ±0.44 / 29.95 ms │    no change │
│ QQuery 15 │ 33.93 / 34.60 ±0.54 / 35.32 ms │ 82.77 / 83.97 ±1.21 / 86.20 ms │ 2.43x slower │
│ QQuery 16 │ 16.40 / 16.65 ±0.20 / 16.99 ms │ 16.05 / 16.49 ±0.23 / 16.74 ms │    no change │
│ QQuery 17 │ 74.66 / 75.38 ±0.52 / 76.20 ms │ 74.79 / 75.44 ±0.64 / 76.33 ms │    no change │
│ QQuery 18 │ 78.97 / 80.07 ±1.35 / 82.68 ms │ 79.16 / 80.25 ±0.93 / 81.70 ms │    no change │
│ QQuery 19 │ 37.88 / 39.20 ±1.97 / 43.08 ms │ 38.02 / 38.51 ±0.44 / 39.12 ms │    no change │
│ QQuery 20 │ 40.78 / 41.35 ±0.48 / 42.04 ms │ 40.80 / 41.91 ±0.97 / 43.63 ms │    no change │
│ QQuery 21 │ 64.34 / 65.79 ±1.46 / 68.59 ms │ 65.35 / 66.47 ±1.10 / 68.12 ms │    no change │
│ QQuery 22 │ 17.98 / 18.29 ±0.27 / 18.78 ms │ 50.51 / 52.34 ±2.17 / 56.07 ms │ 2.86x slower │
└───────────┴────────────────────────────────┴────────────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                         ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                         │  910.24ms │
│ Total Time (neilc_scalar-subquery-expr)   │ 1016.25ms │
│ Average Time (HEAD)                       │   41.37ms │
│ Average Time (neilc_scalar-subquery-expr) │   46.19ms │
│ Queries Faster                            │         0 │
│ Queries Slower                            │         3 │
│ Queries with No Change                    │        19 │
│ Queries with Failure                      │         0 │
└───────────────────────────────────────────┴───────────┘
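
For reading the Change column: the values in the table above are consistent with a plain ratio of the mean times (the middle number in each cell), for example 36.16 / 14.75 ≈ 2.45 for Query 11. The sketch below reproduces that arithmetic. The function name `describe_change` and the 5% "no change" band are assumptions inferred from the tables, not taken from the benchmark runner's source.

```rust
/// Describe the change between a baseline mean and a branch mean (both in ms).
/// Assumption: ratios below 1.05 in either direction are reported as "no change".
fn describe_change(base_mean_ms: f64, branch_mean_ms: f64) -> String {
    const NO_CHANGE_BAND: f64 = 1.05;
    if branch_mean_ms >= base_mean_ms {
        // Branch is slower (or equal): ratio is branch over baseline.
        let ratio = branch_mean_ms / base_mean_ms;
        if ratio < NO_CHANGE_BAND {
            "no change".to_string()
        } else {
            format!("{:.2}x slower", ratio)
        }
    } else {
        // Branch is faster: ratio is baseline over branch.
        let ratio = base_mean_ms / branch_mean_ms;
        if ratio < NO_CHANGE_BAND {
            "no change".to_string()
        } else {
            format!("+{:.2}x faster", ratio)
        }
    }
}
```

With the means from the table above, `describe_change(14.75, 36.16)` gives the Query 11 entry and `describe_change(46.14, 46.49)` falls inside the assumed band.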

Resource Usage

tpch — base (merge-base)

Metric Value
Wall time 4.8s
Peak memory 4.1 GiB
Avg memory 3.6 GiB
CPU user 33.7s
CPU sys 3.0s
Disk read 0 B
Disk write 136.0 KiB

tpch — branch

Metric Value
Wall time 5.3s
Peak memory 3.9 GiB
Avg memory 3.6 GiB
CPU user 37.0s
CPU sys 3.2s
Disk read 0 B
Disk write 72.0 KiB

@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and neilc_scalar-subquery-expr
--------------------
Benchmark tpch_sf10.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query     ┃                               HEAD ┃         neilc_scalar-subquery-expr ┃       Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1  │  369.45 / 371.23 ±1.01 / 372.21 ms │  371.07 / 372.98 ±1.21 / 374.77 ms │    no change │
│ QQuery 2  │  136.11 / 138.90 ±1.52 / 140.76 ms │  136.90 / 139.86 ±1.71 / 141.67 ms │    no change │
│ QQuery 3  │  288.62 / 292.34 ±3.27 / 298.25 ms │  291.31 / 296.84 ±4.63 / 304.93 ms │    no change │
│ QQuery 4  │  153.67 / 157.31 ±2.31 / 160.92 ms │  152.93 / 155.98 ±2.42 / 159.69 ms │    no change │
│ QQuery 5  │  430.18 / 432.40 ±1.81 / 435.45 ms │  429.36 / 436.35 ±4.11 / 441.57 ms │    no change │
│ QQuery 6  │  130.31 / 131.64 ±1.10 / 133.28 ms │  131.80 / 133.06 ±1.30 / 135.57 ms │    no change │
│ QQuery 7  │  560.44 / 564.99 ±3.07 / 569.20 ms │  562.20 / 572.75 ±6.19 / 580.51 ms │    no change │
│ QQuery 8  │  468.56 / 476.12 ±5.84 / 484.44 ms │  479.45 / 481.57 ±2.24 / 485.71 ms │    no change │
│ QQuery 9  │  675.61 / 679.30 ±3.42 / 684.98 ms │  682.55 / 687.44 ±3.07 / 691.03 ms │    no change │
│ QQuery 10 │  325.36 / 330.80 ±5.52 / 341.23 ms │  325.07 / 333.94 ±5.66 / 341.20 ms │    no change │
│ QQuery 11 │  107.65 / 111.54 ±3.38 / 117.76 ms │  244.85 / 251.74 ±4.39 / 256.84 ms │ 2.26x slower │
│ QQuery 12 │  202.12 / 207.01 ±4.74 / 215.87 ms │  205.21 / 208.60 ±2.86 / 213.07 ms │    no change │
│ QQuery 13 │  308.20 / 315.71 ±4.91 / 321.67 ms │ 307.94 / 320.53 ±11.52 / 342.17 ms │    no change │
│ QQuery 14 │  184.67 / 188.66 ±3.44 / 193.87 ms │  182.94 / 185.26 ±1.80 / 187.72 ms │    no change │
│ QQuery 15 │  329.35 / 331.48 ±1.16 / 332.88 ms │  768.50 / 770.46 ±2.85 / 776.11 ms │ 2.32x slower │
│ QQuery 16 │     85.64 / 87.57 ±2.17 / 91.62 ms │     84.19 / 85.80 ±0.95 / 87.01 ms │    no change │
│ QQuery 17 │  750.46 / 752.23 ±1.17 / 753.97 ms │  746.45 / 750.21 ±2.99 / 754.34 ms │    no change │
│ QQuery 18 │ 848.14 / 866.19 ±15.06 / 886.56 ms │  857.26 / 866.04 ±8.91 / 882.93 ms │    no change │
│ QQuery 19 │  265.22 / 270.35 ±5.32 / 280.58 ms │  268.58 / 274.96 ±7.28 / 288.90 ms │    no change │
│ QQuery 20 │  314.37 / 325.18 ±6.22 / 332.54 ms │  322.72 / 326.81 ±2.71 / 331.13 ms │    no change │
│ QQuery 21 │  845.49 / 853.82 ±8.32 / 865.94 ms │ 844.81 / 855.82 ±10.28 / 873.36 ms │    no change │
│ QQuery 22 │     79.10 / 82.80 ±2.83 / 86.52 ms │   97.54 / 100.95 ±2.23 / 103.96 ms │ 1.22x slower │
└───────────┴────────────────────────────────────┴────────────────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                         ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                         │ 7967.58ms │
│ Total Time (neilc_scalar-subquery-expr)   │ 8607.95ms │
│ Average Time (HEAD)                       │  362.16ms │
│ Average Time (neilc_scalar-subquery-expr) │  391.27ms │
│ Queries Faster                            │         0 │
│ Queries Slower                            │         3 │
│ Queries with No Change                    │        19 │
│ Queries with Failure                      │         0 │
└───────────────────────────────────────────┴───────────┘

Resource Usage

tpch10 — base (merge-base)

Metric Value
Wall time 40.2s
Peak memory 9.9 GiB
Avg memory 7.6 GiB
CPU user 420.6s
CPU sys 28.6s
Disk read 0 B
Disk write 2.9 GiB

tpch10 — branch

Metric Value
Wall time 43.4s
Peak memory 9.9 GiB
Avg memory 7.7 GiB
CPU user 452.8s
CPU sys 31.8s
Disk read 0 B
Disk write 856.0 KiB

@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and neilc_scalar-subquery-expr
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ Query     ┃                                     HEAD ┃                neilc_scalar-subquery-expr ┃         Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ QQuery 1  │           43.22 / 43.94 ±0.80 / 45.41 ms │            44.33 / 45.69 ±1.06 / 47.55 ms │      no change │
│ QQuery 2  │        147.30 / 148.77 ±1.27 / 150.37 ms │         146.88 / 148.58 ±1.36 / 150.83 ms │      no change │
│ QQuery 3  │        114.50 / 114.86 ±0.36 / 115.49 ms │         115.07 / 117.29 ±1.77 / 119.44 ms │      no change │
│ QQuery 4  │    1399.18 / 1454.42 ±41.04 / 1523.59 ms │     1404.93 / 1452.99 ±33.73 / 1500.79 ms │      no change │
│ QQuery 5  │        174.01 / 178.70 ±4.04 / 184.26 ms │         174.82 / 176.67 ±2.05 / 180.34 ms │      no change │
│ QQuery 6  │    1008.48 / 1069.55 ±42.41 / 1137.25 ms │         170.15 / 175.57 ±3.52 / 180.51 ms │  +6.09x faster │
│ QQuery 7  │        368.11 / 372.54 ±2.71 / 376.56 ms │         356.68 / 361.72 ±4.85 / 370.23 ms │      no change │
│ QQuery 8  │        118.30 / 119.52 ±1.21 / 121.55 ms │         117.68 / 118.71 ±0.75 / 119.67 ms │      no change │
│ QQuery 9  │        102.88 / 110.09 ±4.78 / 117.76 ms │ 12439.23 / 12688.98 ±264.59 / 13181.08 ms │ 115.26x slower │
│ QQuery 10 │        111.15 / 112.57 ±1.23 / 114.54 ms │         115.79 / 119.08 ±3.47 / 123.72 ms │   1.06x slower │
│ QQuery 11 │    1068.41 / 1090.22 ±18.47 / 1117.90 ms │      1085.27 / 1094.33 ±8.28 / 1106.52 ms │      no change │
│ QQuery 12 │           46.63 / 50.10 ±2.45 / 53.28 ms │            49.68 / 52.24 ±1.65 / 54.70 ms │      no change │
│ QQuery 13 │        427.40 / 429.76 ±1.43 / 431.37 ms │         420.71 / 426.47 ±4.53 / 432.51 ms │      no change │
│ QQuery 14 │     1057.72 / 1063.91 ±8.82 / 1081.23 ms │     2536.99 / 2581.84 ±35.55 / 2624.52 ms │   2.43x slower │
│ QQuery 15 │           17.14 / 17.87 ±0.95 / 19.65 ms │            17.64 / 19.68 ±1.20 / 20.99 ms │   1.10x slower │
│ QQuery 16 │           41.97 / 44.20 ±1.51 / 46.01 ms │            42.80 / 43.50 ±0.59 / 44.47 ms │      no change │
│ QQuery 17 │        245.61 / 249.20 ±4.65 / 258.03 ms │         244.94 / 256.16 ±6.23 / 262.17 ms │      no change │
│ QQuery 18 │        131.44 / 135.11 ±3.44 / 139.42 ms │         132.98 / 135.89 ±1.72 / 138.29 ms │      no change │
│ QQuery 19 │        158.08 / 162.04 ±4.29 / 168.97 ms │         157.87 / 165.06 ±5.99 / 172.77 ms │      no change │
│ QQuery 20 │           14.95 / 15.38 ±0.24 / 15.59 ms │            13.99 / 14.46 ±0.51 / 15.18 ms │  +1.06x faster │
│ QQuery 21 │           20.79 / 21.04 ±0.15 / 21.28 ms │            19.36 / 19.80 ±0.36 / 20.37 ms │  +1.06x faster │
│ QQuery 22 │       497.03 / 513.27 ±17.60 / 538.81 ms │         500.72 / 506.12 ±8.30 / 522.44 ms │      no change │
│ QQuery 23 │        944.06 / 950.99 ±6.89 / 960.45 ms │     3961.96 / 4004.39 ±37.23 / 4054.62 ms │   4.21x slower │
│ QQuery 24 │       415.98 / 429.27 ±11.09 / 443.00 ms │     2900.51 / 2977.24 ±58.74 / 3073.15 ms │   6.94x slower │
│ QQuery 25 │        355.56 / 365.32 ±6.01 / 373.73 ms │         370.08 / 372.78 ±2.93 / 378.19 ms │      no change │
│ QQuery 26 │           80.72 / 83.21 ±1.42 / 84.95 ms │            85.76 / 91.80 ±3.96 / 97.92 ms │   1.10x slower │
│ QQuery 27 │        357.78 / 363.23 ±3.42 / 367.76 ms │         362.86 / 367.31 ±3.51 / 373.59 ms │      no change │
│ QQuery 28 │        149.84 / 154.40 ±4.19 / 159.83 ms │         153.79 / 155.29 ±1.35 / 157.26 ms │      no change │
│ QQuery 29 │        296.84 / 307.31 ±7.66 / 319.72 ms │         311.75 / 314.08 ±1.60 / 316.37 ms │      no change │
│ QQuery 30 │           44.76 / 47.21 ±1.66 / 48.64 ms │            45.92 / 48.47 ±2.42 / 52.89 ms │      no change │
│ QQuery 31 │        171.58 / 177.06 ±4.25 / 184.29 ms │         172.38 / 177.28 ±3.36 / 181.55 ms │      no change │
│ QQuery 32 │           56.61 / 57.53 ±0.60 / 58.34 ms │            58.58 / 60.42 ±2.56 / 65.48 ms │   1.05x slower │
│ QQuery 33 │        141.26 / 145.36 ±3.81 / 150.34 ms │         142.21 / 146.31 ±3.92 / 152.76 ms │      no change │
│ QQuery 34 │        106.52 / 107.09 ±0.47 / 107.62 ms │         108.28 / 112.56 ±2.42 / 115.77 ms │   1.05x slower │
│ QQuery 35 │        112.11 / 115.22 ±1.82 / 117.66 ms │         108.30 / 110.77 ±1.90 / 113.29 ms │      no change │
│ QQuery 36 │        217.66 / 222.79 ±4.22 / 228.49 ms │         226.31 / 230.72 ±3.23 / 235.02 ms │      no change │
│ QQuery 37 │        178.21 / 181.30 ±2.17 / 184.60 ms │         182.96 / 185.27 ±2.01 / 188.89 ms │      no change │
│ QQuery 38 │           84.20 / 87.44 ±2.06 / 90.20 ms │           88.39 / 94.00 ±4.32 / 101.48 ms │   1.08x slower │
│ QQuery 39 │        125.52 / 138.37 ±9.74 / 152.41 ms │         129.20 / 133.69 ±5.19 / 143.23 ms │      no change │
│ QQuery 40 │        112.70 / 122.15 ±6.54 / 132.27 ms │        111.33 / 121.82 ±11.42 / 140.73 ms │      no change │
│ QQuery 41 │           15.29 / 16.50 ±1.28 / 18.30 ms │            14.98 / 15.68 ±0.84 / 17.33 ms │      no change │
│ QQuery 42 │        107.08 / 109.87 ±2.07 / 113.18 ms │         111.16 / 113.08 ±1.26 / 114.61 ms │      no change │
│ QQuery 43 │           85.01 / 86.83 ±1.15 / 87.97 ms │            84.21 / 84.79 ±0.80 / 86.36 ms │      no change │
│ QQuery 44 │           11.42 / 12.04 ±0.56 / 12.90 ms │            11.29 / 11.92 ±0.48 / 12.69 ms │      no change │
│ QQuery 45 │           53.05 / 54.72 ±1.23 / 56.74 ms │            55.18 / 56.86 ±1.26 / 59.04 ms │      no change │
│ QQuery 46 │       229.92 / 242.84 ±10.58 / 256.40 ms │        231.80 / 247.91 ±10.66 / 258.68 ms │      no change │
│ QQuery 47 │       708.87 / 797.54 ±50.85 / 857.01 ms │        765.85 / 790.16 ±20.75 / 818.50 ms │      no change │
│ QQuery 48 │        292.31 / 298.85 ±6.08 / 308.93 ms │        281.26 / 299.03 ±10.98 / 310.80 ms │      no change │
│ QQuery 49 │        258.27 / 263.41 ±3.97 / 269.74 ms │         254.86 / 263.32 ±7.04 / 274.77 ms │      no change │
│ QQuery 50 │        240.33 / 243.72 ±5.64 / 254.96 ms │         230.75 / 243.32 ±8.97 / 253.89 ms │      no change │
│ QQuery 51 │        185.34 / 186.88 ±1.18 / 188.33 ms │         181.61 / 190.00 ±5.07 / 195.92 ms │      no change │
│ QQuery 52 │        110.05 / 111.80 ±1.92 / 114.36 ms │         108.37 / 111.77 ±2.75 / 116.48 ms │      no change │
│ QQuery 53 │        105.76 / 107.34 ±1.09 / 109.01 ms │         103.26 / 104.60 ±1.25 / 106.86 ms │      no change │
│ QQuery 54 │        154.10 / 156.76 ±2.52 / 161.53 ms │         158.33 / 163.32 ±3.72 / 168.31 ms │      no change │
│ QQuery 55 │        109.20 / 110.28 ±0.78 / 111.29 ms │         108.15 / 110.77 ±1.97 / 113.88 ms │      no change │
│ QQuery 56 │        146.98 / 147.99 ±0.70 / 148.68 ms │         142.04 / 146.30 ±2.42 / 148.62 ms │      no change │
│ QQuery 57 │        181.68 / 183.93 ±2.92 / 189.70 ms │         172.60 / 179.00 ±5.26 / 186.92 ms │      no change │
│ QQuery 58 │       301.95 / 318.16 ±13.30 / 338.46 ms │         282.00 / 288.08 ±4.09 / 292.64 ms │  +1.10x faster │
│ QQuery 59 │        206.12 / 208.37 ±1.61 / 210.41 ms │         200.17 / 206.46 ±4.89 / 212.15 ms │      no change │
│ QQuery 60 │        149.14 / 151.68 ±1.72 / 154.36 ms │         146.12 / 148.84 ±1.72 / 150.96 ms │      no change │
│ QQuery 61 │        180.39 / 182.46 ±1.09 / 183.48 ms │         174.17 / 179.12 ±4.54 / 186.10 ms │      no change │
│ QQuery 62 │      947.43 / 990.56 ±26.90 / 1032.01 ms │      951.31 / 1007.33 ±47.75 / 1070.11 ms │      no change │
│ QQuery 63 │        105.85 / 107.24 ±0.84 / 108.21 ms │         104.29 / 110.94 ±4.02 / 115.30 ms │      no change │
│ QQuery 64 │       735.67 / 752.58 ±10.15 / 761.83 ms │        718.16 / 733.07 ±10.08 / 746.01 ms │      no change │
│ QQuery 65 │       255.73 / 273.13 ±13.54 / 292.10 ms │        254.22 / 268.21 ±14.66 / 294.22 ms │      no change │
│ QQuery 66 │        259.37 / 268.47 ±7.07 / 278.77 ms │        251.65 / 276.79 ±20.00 / 310.32 ms │      no change │
│ QQuery 67 │       310.75 / 327.02 ±14.44 / 348.76 ms │        315.00 / 333.54 ±12.17 / 346.98 ms │      no change │
│ QQuery 68 │       288.51 / 302.47 ±11.76 / 317.66 ms │         300.27 / 304.78 ±4.07 / 312.23 ms │      no change │
│ QQuery 69 │        108.14 / 109.31 ±0.81 / 110.30 ms │         105.13 / 106.99 ±1.21 / 108.20 ms │      no change │
│ QQuery 70 │       350.27 / 361.28 ±20.28 / 401.77 ms │         348.12 / 357.67 ±7.40 / 370.66 ms │      no change │
│ QQuery 71 │        138.79 / 140.34 ±1.25 / 142.44 ms │         140.46 / 145.11 ±3.07 / 148.34 ms │      no change │
│ QQuery 72 │       734.45 / 746.28 ±10.08 / 764.38 ms │        750.45 / 762.30 ±15.40 / 791.76 ms │      no change │
│ QQuery 73 │        104.61 / 107.85 ±3.50 / 114.35 ms │         110.05 / 112.08 ±1.45 / 114.32 ms │      no change │
│ QQuery 74 │       591.48 / 636.63 ±32.99 / 667.18 ms │        676.14 / 690.25 ±12.43 / 710.43 ms │   1.08x slower │
│ QQuery 75 │        278.67 / 286.50 ±4.69 / 291.79 ms │         287.01 / 289.18 ±1.94 / 291.54 ms │      no change │
│ QQuery 76 │        134.92 / 137.73 ±1.93 / 140.86 ms │         139.29 / 141.77 ±1.81 / 144.26 ms │      no change │
│ QQuery 77 │        188.15 / 191.37 ±3.28 / 196.00 ms │         195.65 / 196.71 ±0.79 / 197.77 ms │      no change │
│ QQuery 78 │        358.72 / 368.34 ±9.36 / 383.27 ms │         373.83 / 376.16 ±3.21 / 382.43 ms │      no change │
│ QQuery 79 │        234.79 / 245.62 ±9.43 / 260.53 ms │         257.73 / 262.91 ±3.55 / 268.65 ms │   1.07x slower │
│ QQuery 80 │        327.17 / 336.63 ±6.04 / 344.33 ms │         338.45 / 341.07 ±1.62 / 342.96 ms │      no change │
│ QQuery 81 │           26.56 / 27.66 ±0.85 / 29.00 ms │            28.10 / 28.74 ±0.68 / 30.04 ms │      no change │
│ QQuery 82 │        199.85 / 204.26 ±5.64 / 215.19 ms │         200.61 / 208.06 ±4.89 / 213.48 ms │      no change │
│ QQuery 83 │           41.35 / 43.09 ±1.22 / 44.92 ms │            41.31 / 43.85 ±1.50 / 45.81 ms │      no change │
│ QQuery 84 │           50.66 / 51.30 ±0.49 / 52.08 ms │            51.11 / 53.09 ±1.59 / 54.55 ms │      no change │
│ QQuery 85 │        150.30 / 152.32 ±2.44 / 156.86 ms │         153.42 / 154.49 ±1.20 / 156.49 ms │      no change │
│ QQuery 86 │           41.67 / 42.25 ±0.52 / 43.04 ms │            40.70 / 41.92 ±1.17 / 44.06 ms │      no change │
│ QQuery 87 │           88.53 / 90.92 ±2.86 / 96.55 ms │            88.28 / 92.06 ±3.70 / 98.56 ms │      no change │
│ QQuery 88 │        102.70 / 105.02 ±1.17 / 105.72 ms │         102.49 / 104.43 ±1.58 / 106.55 ms │      no change │
│ QQuery 89 │        118.01 / 119.59 ±0.85 / 120.48 ms │         120.13 / 121.39 ±1.56 / 124.37 ms │      no change │
│ QQuery 90 │           24.29 / 24.93 ±0.53 / 25.79 ms │            24.04 / 25.06 ±0.63 / 25.86 ms │      no change │
│ QQuery 91 │           66.17 / 67.69 ±1.13 / 69.36 ms │            67.98 / 69.06 ±0.82 / 70.48 ms │      no change │
│ QQuery 92 │           58.69 / 59.48 ±0.65 / 60.21 ms │            58.89 / 60.00 ±1.19 / 62.06 ms │      no change │
│ QQuery 93 │        193.53 / 198.85 ±5.09 / 205.33 ms │         191.90 / 200.32 ±5.43 / 206.90 ms │      no change │
│ QQuery 94 │           61.36 / 62.26 ±0.74 / 63.26 ms │            62.33 / 63.67 ±0.78 / 64.74 ms │      no change │
│ QQuery 95 │        133.44 / 137.88 ±2.61 / 141.29 ms │         135.82 / 138.39 ±1.75 / 140.64 ms │      no change │
│ QQuery 96 │           73.50 / 74.80 ±0.82 / 76.03 ms │            72.76 / 74.69 ±1.10 / 76.11 ms │      no change │
│ QQuery 97 │        128.85 / 135.81 ±5.75 / 143.80 ms │         127.97 / 134.78 ±6.08 / 143.43 ms │      no change │
│ QQuery 98 │        157.79 / 162.41 ±4.37 / 169.63 ms │         156.73 / 162.41 ±3.77 / 166.70 ms │      no change │
│ QQuery 99 │ 10937.59 / 11014.63 ±60.82 / 11089.38 ms │  10932.38 / 11004.53 ±44.68 / 11072.84 ms │      no change │
└───────────┴──────────────────────────────────────────┴───────────────────────────────────────────┴────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                         ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                         │ 35028.79ms │
│ Total Time (neilc_scalar-subquery-expr)   │ 53977.19ms │
│ Average Time (HEAD)                       │   353.83ms │
│ Average Time (neilc_scalar-subquery-expr) │   545.22ms │
│ Queries Faster                            │          4 │
│ Queries Slower                            │         12 │
│ Queries with No Change                    │         83 │
│ Queries with Failure                      │          0 │
└───────────────────────────────────────────┴────────────┘

Resource Usage

tpcds — base (merge-base)

Metric Value
Wall time 175.5s
Peak memory 5.4 GiB
Avg memory 4.5 GiB
CPU user 281.1s
CPU sys 19.8s
Disk read 0 B
Disk write 638.9 MiB

tpcds — branch

Metric Value
Wall time 270.2s
Peak memory 5.1 GiB
Avg memory 4.5 GiB
CPU user 359.3s
CPU sys 25.0s
Disk read 0 B
Disk write 148.0 KiB

@neilconway
Contributor Author

From doing some digging on TPC-H query 11, I think the problem is that we're able to push down projections in the original plan but we weren't doing so for subqueries. That is fixed now; I will need to check whether there are any further performance regressions.

@neilconway
Contributor Author

run benchmark tpch tpch10 tpcds

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4158271774-616-t7nfd 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing neilc/scalar-subquery-expr (2c256e7) to f830ee3 (merge-base) diff using: tpch10
Results will be posted here when complete


File an issue against this benchmark runner

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4158271774-615-tmzkb 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux


Comparing neilc/scalar-subquery-expr (2c256e7) to f830ee3 (merge-base) diff using: tpch
Results will be posted here when complete



@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4158271774-617-swlqf 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux


Comparing neilc/scalar-subquery-expr (2c256e7) to f830ee3 (merge-base) diff using: tpcds
Results will be posted here when complete



@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)


Comparing HEAD and neilc_scalar-subquery-expr
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                           HEAD ┃     neilc_scalar-subquery-expr ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │ 45.57 / 47.39 ±1.30 / 48.58 ms │ 45.55 / 46.06 ±0.71 / 47.47 ms │     no change │
│ QQuery 2  │ 21.45 / 21.92 ±0.52 / 22.90 ms │ 21.51 / 21.82 ±0.24 / 22.23 ms │     no change │
│ QQuery 3  │ 32.25 / 33.17 ±1.37 / 35.90 ms │ 32.10 / 32.34 ±0.19 / 32.65 ms │     no change │
│ QQuery 4  │ 20.90 / 21.87 ±1.21 / 24.19 ms │ 20.55 / 21.26 ±0.72 / 22.26 ms │     no change │
│ QQuery 5  │ 50.84 / 52.93 ±1.50 / 55.30 ms │ 47.90 / 50.45 ±1.85 / 52.28 ms │     no change │
│ QQuery 6  │ 17.24 / 17.47 ±0.19 / 17.81 ms │ 17.23 / 17.36 ±0.15 / 17.64 ms │     no change │
│ QQuery 7  │ 54.77 / 55.33 ±0.32 / 55.67 ms │ 53.97 / 56.59 ±1.90 / 58.99 ms │     no change │
│ QQuery 8  │ 48.90 / 49.29 ±0.34 / 49.82 ms │ 48.24 / 50.35 ±2.61 / 55.09 ms │     no change │
│ QQuery 9  │ 55.34 / 56.02 ±0.74 / 56.97 ms │ 54.54 / 55.43 ±0.59 / 56.34 ms │     no change │
│ QQuery 10 │ 71.95 / 73.87 ±1.62 / 76.44 ms │ 70.66 / 71.36 ±0.74 / 72.75 ms │     no change │
│ QQuery 11 │ 14.21 / 14.39 ±0.21 / 14.76 ms │ 13.80 / 14.41 ±0.72 / 15.82 ms │     no change │
│ QQuery 12 │ 27.92 / 28.38 ±0.30 / 28.73 ms │ 28.18 / 29.69 ±2.06 / 33.73 ms │     no change │
│ QQuery 13 │ 38.38 / 39.19 ±0.73 / 40.51 ms │ 38.45 / 40.01 ±1.15 / 40.99 ms │     no change │
│ QQuery 14 │ 28.95 / 30.56 ±1.60 / 33.21 ms │ 28.47 / 28.80 ±0.39 / 29.57 ms │ +1.06x faster │
│ QQuery 15 │ 33.60 / 34.46 ±0.69 / 35.45 ms │ 33.34 / 34.18 ±0.98 / 36.04 ms │     no change │
│ QQuery 16 │ 16.01 / 16.31 ±0.27 / 16.68 ms │ 16.51 / 16.74 ±0.17 / 17.00 ms │     no change │
│ QQuery 17 │ 72.66 / 73.60 ±0.56 / 74.19 ms │ 73.60 / 74.79 ±1.22 / 77.08 ms │     no change │
│ QQuery 18 │ 77.06 / 79.34 ±1.98 / 82.51 ms │ 79.11 / 80.06 ±0.80 / 81.38 ms │     no change │
│ QQuery 19 │ 37.53 / 38.20 ±0.65 / 39.23 ms │ 38.40 / 39.03 ±0.69 / 40.26 ms │     no change │
│ QQuery 20 │ 41.28 / 42.18 ±0.82 / 43.45 ms │ 40.30 / 42.30 ±1.19 / 43.52 ms │     no change │
│ QQuery 21 │ 63.96 / 65.29 ±1.15 / 67.42 ms │ 64.46 / 66.35 ±1.39 / 68.09 ms │     no change │
│ QQuery 22 │ 17.63 / 18.14 ±0.64 / 19.40 ms │ 60.15 / 61.70 ±1.43 / 64.33 ms │  3.40x slower │
└───────────┴────────────────────────────────┴────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary                         ┃          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (HEAD)                         │ 909.30ms │
│ Total Time (neilc_scalar-subquery-expr)   │ 951.09ms │
│ Average Time (HEAD)                       │  41.33ms │
│ Average Time (neilc_scalar-subquery-expr) │  43.23ms │
│ Queries Faster                            │        1 │
│ Queries Slower                            │        1 │
│ Queries with No Change                    │       20 │
│ Queries with Failure                      │        0 │
└───────────────────────────────────────────┴──────────┘

Resource Usage

tpch — base (merge-base)

| Metric | Value |
|---|---|
| Wall time | 4.8s |
| Peak memory | 4.1 GiB |
| Avg memory | 3.6 GiB |
| CPU user | 33.7s |
| CPU sys | 3.0s |
| Disk read | 0 B |
| Disk write | 140.0 KiB |

tpch — branch

| Metric | Value |
|---|---|
| Wall time | 5.0s |
| Peak memory | 4.0 GiB |
| Avg memory | 3.6 GiB |
| CPU user | 33.9s |
| CPU sys | 2.8s |
| Disk read | 0 B |
| Disk write | 65.2 MiB |


@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)


Comparing HEAD and neilc_scalar-subquery-expr
--------------------
Benchmark tpch_sf10.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query     ┃                               HEAD ┃         neilc_scalar-subquery-expr ┃       Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1  │  374.96 / 376.79 ±1.87 / 379.50 ms │  375.11 / 376.09 ±0.63 / 377.03 ms │    no change │
│ QQuery 2  │  137.75 / 138.38 ±0.49 / 139.02 ms │  136.55 / 141.26 ±3.13 / 145.61 ms │    no change │
│ QQuery 3  │  295.06 / 301.98 ±5.01 / 308.08 ms │  299.01 / 303.93 ±4.52 / 309.78 ms │    no change │
│ QQuery 4  │  158.17 / 159.70 ±1.17 / 160.92 ms │  155.30 / 156.85 ±1.70 / 160.12 ms │    no change │
│ QQuery 5  │  442.69 / 445.64 ±2.57 / 449.71 ms │  430.11 / 435.46 ±3.49 / 439.79 ms │    no change │
│ QQuery 6  │  134.28 / 135.07 ±0.53 / 135.72 ms │  132.61 / 134.56 ±1.73 / 137.21 ms │    no change │
│ QQuery 7  │  566.59 / 579.56 ±7.48 / 589.98 ms │  568.50 / 576.23 ±7.21 / 586.75 ms │    no change │
│ QQuery 8  │  489.57 / 494.28 ±2.62 / 497.40 ms │  488.22 / 489.69 ±1.27 / 492.04 ms │    no change │
│ QQuery 9  │  683.57 / 701.09 ±9.03 / 707.53 ms │ 680.65 / 701.03 ±16.26 / 728.03 ms │    no change │
│ QQuery 10 │  341.87 / 345.57 ±2.36 / 348.86 ms │  329.10 / 339.15 ±6.32 / 345.99 ms │    no change │
│ QQuery 11 │  107.53 / 113.56 ±4.51 / 119.05 ms │  109.08 / 113.00 ±3.18 / 117.51 ms │    no change │
│ QQuery 12 │  207.55 / 209.75 ±2.26 / 213.46 ms │  209.99 / 212.50 ±2.75 / 217.85 ms │    no change │
│ QQuery 13 │  323.34 / 329.05 ±5.95 / 339.76 ms │  316.93 / 326.24 ±5.97 / 333.90 ms │    no change │
│ QQuery 14 │  187.37 / 190.49 ±2.46 / 193.39 ms │  185.98 / 189.97 ±3.28 / 194.53 ms │    no change │
│ QQuery 15 │  342.07 / 343.65 ±1.50 / 346.41 ms │  338.05 / 340.62 ±1.50 / 342.20 ms │    no change │
│ QQuery 16 │     90.73 / 92.38 ±1.78 / 95.66 ms │     86.83 / 92.27 ±4.12 / 98.93 ms │    no change │
│ QQuery 17 │  768.73 / 774.58 ±3.89 / 779.29 ms │  773.14 / 777.33 ±2.39 / 779.79 ms │    no change │
│ QQuery 18 │ 848.24 / 892.89 ±36.30 / 958.38 ms │ 837.53 / 866.50 ±18.73 / 891.74 ms │    no change │
│ QQuery 19 │  270.17 / 274.86 ±6.38 / 287.13 ms │  272.60 / 279.06 ±5.41 / 287.82 ms │    no change │
│ QQuery 20 │  330.26 / 332.66 ±2.34 / 336.80 ms │  322.96 / 332.44 ±4.87 / 335.92 ms │    no change │
│ QQuery 21 │ 867.71 / 880.25 ±11.51 / 895.36 ms │  859.27 / 867.85 ±8.73 / 881.37 ms │    no change │
│ QQuery 22 │     81.49 / 87.21 ±5.54 / 96.75 ms │  113.26 / 115.45 ±2.06 / 118.87 ms │ 1.32x slower │
└───────────┴────────────────────────────────────┴────────────────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                         ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                         │ 8199.37ms │
│ Total Time (neilc_scalar-subquery-expr)   │ 8167.48ms │
│ Average Time (HEAD)                       │  372.70ms │
│ Average Time (neilc_scalar-subquery-expr) │  371.25ms │
│ Queries Faster                            │         0 │
│ Queries Slower                            │         1 │
│ Queries with No Change                    │        21 │
│ Queries with Failure                      │         0 │
└───────────────────────────────────────────┴───────────┘

Resource Usage

tpch10 — base (merge-base)

| Metric | Value |
|---|---|
| Wall time | 41.3s |
| Peak memory | 9.8 GiB |
| Avg memory | 7.5 GiB |
| CPU user | 427.6s |
| CPU sys | 32.5s |
| Disk read | 0 B |
| Disk write | 2.9 GiB |

tpch10 — branch

| Metric | Value |
|---|---|
| Wall time | 41.2s |
| Peak memory | 10.0 GiB |
| Avg memory | 7.6 GiB |
| CPU user | 427.8s |
| CPU sys | 31.5s |
| Disk read | 0 B |
| Disk write | 840.0 KiB |


@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)


Comparing HEAD and neilc_scalar-subquery-expr
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ Query     ┃                                     HEAD ┃               neilc_scalar-subquery-expr ┃         Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ QQuery 1  │           42.99 / 43.84 ±0.84 / 45.45 ms │           43.86 / 44.46 ±0.51 / 45.28 ms │      no change │
│ QQuery 2  │        146.56 / 147.30 ±0.59 / 148.35 ms │        145.92 / 146.72 ±0.64 / 147.69 ms │      no change │
│ QQuery 3  │        114.34 / 115.01 ±0.62 / 116.17 ms │        114.13 / 114.80 ±0.64 / 115.91 ms │      no change │
│ QQuery 4  │    1251.84 / 1278.42 ±16.70 / 1297.20 ms │    1281.00 / 1306.17 ±19.04 / 1332.61 ms │      no change │
│ QQuery 5  │        175.53 / 176.71 ±0.93 / 177.74 ms │        172.68 / 175.41 ±1.89 / 178.11 ms │      no change │
│ QQuery 6  │     977.48 / 1003.36 ±19.90 / 1033.95 ms │        141.97 / 144.51 ±1.59 / 146.81 ms │  +6.94x faster │
│ QQuery 7  │        357.06 / 358.56 ±1.04 / 359.80 ms │        353.73 / 356.14 ±1.63 / 358.48 ms │      no change │
│ QQuery 8  │        116.56 / 117.44 ±0.62 / 118.26 ms │        116.58 / 117.24 ±0.60 / 118.19 ms │      no change │
│ QQuery 9  │        102.14 / 105.26 ±3.37 / 110.68 ms │ 11821.52 / 11847.94 ±22.27 / 11885.69 ms │ 112.56x slower │
│ QQuery 10 │        106.32 / 108.90 ±1.29 / 109.56 ms │        111.44 / 113.55 ±1.34 / 115.35 ms │      no change │
│ QQuery 11 │        855.08 / 872.05 ±9.80 / 883.03 ms │       868.38 / 888.02 ±14.68 / 905.52 ms │      no change │
│ QQuery 12 │           44.77 / 46.65 ±0.98 / 47.54 ms │           44.88 / 46.86 ±1.42 / 48.37 ms │      no change │
│ QQuery 13 │        404.62 / 406.77 ±2.56 / 411.51 ms │        402.72 / 405.45 ±1.67 / 407.69 ms │      no change │
│ QQuery 14 │     1025.71 / 1039.05 ±7.73 / 1048.55 ms │     1088.67 / 1091.15 ±3.14 / 1096.71 ms │   1.05x slower │
│ QQuery 15 │           15.82 / 16.88 ±1.09 / 18.93 ms │           16.42 / 17.13 ±0.92 / 18.88 ms │      no change │
│ QQuery 16 │           41.06 / 42.22 ±0.98 / 43.47 ms │           40.93 / 42.09 ±0.68 / 42.97 ms │      no change │
│ QQuery 17 │        242.01 / 243.52 ±1.84 / 246.82 ms │        239.69 / 243.26 ±3.16 / 247.97 ms │      no change │
│ QQuery 18 │        129.93 / 130.58 ±0.94 / 132.45 ms │        131.42 / 132.84 ±1.47 / 135.45 ms │      no change │
│ QQuery 19 │        155.49 / 158.03 ±1.46 / 159.63 ms │        154.42 / 155.76 ±1.09 / 157.53 ms │      no change │
│ QQuery 20 │           13.25 / 14.44 ±0.66 / 15.19 ms │           13.84 / 14.28 ±0.45 / 15.11 ms │      no change │
│ QQuery 21 │           20.37 / 20.87 ±0.39 / 21.38 ms │           19.65 / 19.88 ±0.24 / 20.28 ms │      no change │
│ QQuery 22 │        483.38 / 486.92 ±3.43 / 492.16 ms │        488.68 / 489.81 ±0.74 / 490.75 ms │      no change │
│ QQuery 23 │        878.96 / 887.45 ±6.87 / 899.02 ms │     1179.95 / 1188.81 ±4.45 / 1191.54 ms │   1.34x slower │
│ QQuery 24 │        419.86 / 421.35 ±2.66 / 426.67 ms │        792.14 / 798.65 ±6.21 / 809.36 ms │   1.90x slower │
│ QQuery 25 │        354.90 / 357.76 ±2.58 / 361.97 ms │        354.12 / 355.40 ±1.75 / 358.86 ms │      no change │
│ QQuery 26 │           80.84 / 83.18 ±1.26 / 84.21 ms │           84.33 / 86.19 ±1.11 / 87.77 ms │      no change │
│ QQuery 27 │        350.06 / 352.56 ±2.26 / 356.70 ms │        348.48 / 349.56 ±0.85 / 350.70 ms │      no change │
│ QQuery 28 │        149.24 / 151.79 ±2.16 / 155.14 ms │        148.61 / 149.63 ±0.82 / 150.77 ms │      no change │
│ QQuery 29 │        298.96 / 302.49 ±3.48 / 308.21 ms │        298.64 / 300.49 ±1.62 / 303.48 ms │      no change │
│ QQuery 30 │           43.86 / 45.68 ±1.01 / 46.79 ms │           43.20 / 44.85 ±1.27 / 46.49 ms │      no change │
│ QQuery 31 │        171.61 / 174.11 ±1.61 / 175.83 ms │        168.71 / 172.04 ±1.78 / 173.79 ms │      no change │
│ QQuery 32 │         55.84 / 65.86 ±17.58 / 100.99 ms │           56.68 / 58.24 ±1.02 / 59.59 ms │  +1.13x faster │
│ QQuery 33 │        142.45 / 143.47 ±1.16 / 145.71 ms │        140.84 / 141.93 ±0.85 / 142.98 ms │      no change │
│ QQuery 34 │        105.53 / 107.72 ±1.12 / 108.58 ms │        107.99 / 108.86 ±0.56 / 109.39 ms │      no change │
│ QQuery 35 │        108.17 / 109.44 ±1.04 / 110.78 ms │        109.32 / 111.06 ±1.09 / 112.50 ms │      no change │
│ QQuery 36 │        215.07 / 220.48 ±3.14 / 224.42 ms │        213.14 / 218.62 ±4.16 / 224.48 ms │      no change │
│ QQuery 37 │        176.70 / 181.28 ±2.78 / 184.96 ms │        173.27 / 178.32 ±2.73 / 181.37 ms │      no change │
│ QQuery 38 │           82.06 / 86.76 ±4.44 / 92.45 ms │           85.06 / 88.70 ±3.12 / 92.67 ms │      no change │
│ QQuery 39 │        124.61 / 127.93 ±2.66 / 132.17 ms │        125.17 / 128.52 ±2.18 / 131.40 ms │      no change │
│ QQuery 40 │        114.11 / 118.85 ±2.84 / 122.98 ms │        108.85 / 116.89 ±7.17 / 130.37 ms │      no change │
│ QQuery 41 │           14.09 / 15.17 ±1.01 / 16.92 ms │           14.60 / 15.54 ±0.73 / 16.80 ms │      no change │
│ QQuery 42 │        106.40 / 108.55 ±1.97 / 111.11 ms │        106.09 / 107.88 ±1.72 / 110.98 ms │      no change │
│ QQuery 43 │           83.33 / 84.33 ±1.39 / 87.08 ms │           83.79 / 84.85 ±0.83 / 86.22 ms │      no change │
│ QQuery 44 │           11.51 / 11.94 ±0.42 / 12.68 ms │           11.33 / 12.51 ±1.63 / 15.70 ms │      no change │
│ QQuery 45 │           52.19 / 53.55 ±0.94 / 54.66 ms │           53.05 / 53.82 ±0.88 / 55.52 ms │      no change │
│ QQuery 46 │        230.30 / 232.36 ±1.45 / 234.35 ms │        231.36 / 233.74 ±2.33 / 237.74 ms │      no change │
│ QQuery 47 │       697.41 / 712.73 ±12.46 / 735.13 ms │        687.05 / 692.11 ±4.07 / 699.36 ms │      no change │
│ QQuery 48 │        286.34 / 292.68 ±4.09 / 297.60 ms │        287.49 / 292.41 ±3.96 / 296.08 ms │      no change │
│ QQuery 49 │        256.40 / 258.18 ±1.43 / 260.69 ms │        256.09 / 258.26 ±1.90 / 260.58 ms │      no change │
│ QQuery 50 │        225.50 / 235.81 ±6.26 / 243.19 ms │        228.74 / 232.52 ±4.22 / 240.69 ms │      no change │
│ QQuery 51 │        183.79 / 185.50 ±1.96 / 189.27 ms │        182.34 / 185.51 ±2.93 / 190.77 ms │      no change │
│ QQuery 52 │        108.57 / 109.85 ±0.96 / 111.08 ms │        106.44 / 107.10 ±0.43 / 107.61 ms │      no change │
│ QQuery 53 │        102.31 / 103.60 ±0.80 / 104.48 ms │        102.15 / 104.06 ±1.54 / 106.20 ms │      no change │
│ QQuery 54 │        147.46 / 148.98 ±1.16 / 150.91 ms │        159.18 / 162.97 ±2.81 / 166.48 ms │   1.09x slower │
│ QQuery 55 │        107.01 / 107.74 ±0.81 / 109.19 ms │        106.45 / 107.81 ±1.16 / 109.33 ms │      no change │
│ QQuery 56 │        141.55 / 142.70 ±1.16 / 144.48 ms │        140.54 / 141.90 ±1.59 / 145.01 ms │      no change │
│ QQuery 57 │        178.79 / 181.06 ±1.44 / 182.98 ms │        175.17 / 175.79 ±0.67 / 177.01 ms │      no change │
│ QQuery 58 │        299.63 / 310.28 ±7.10 / 321.62 ms │        283.61 / 286.62 ±1.72 / 288.59 ms │  +1.08x faster │
│ QQuery 59 │        199.23 / 202.00 ±1.49 / 203.37 ms │        197.88 / 201.57 ±2.05 / 203.56 ms │      no change │
│ QQuery 60 │        145.95 / 148.31 ±2.02 / 151.58 ms │        144.08 / 144.93 ±0.83 / 146.10 ms │      no change │
│ QQuery 61 │        171.66 / 173.92 ±1.46 / 175.96 ms │        170.25 / 172.75 ±1.69 / 174.66 ms │      no change │
│ QQuery 62 │       871.63 / 906.48 ±28.30 / 956.97 ms │       867.36 / 896.52 ±20.46 / 928.90 ms │      no change │
│ QQuery 63 │        104.69 / 107.06 ±2.53 / 111.71 ms │        102.77 / 105.66 ±1.98 / 108.49 ms │      no change │
│ QQuery 64 │        705.26 / 711.72 ±4.34 / 716.88 ms │        703.90 / 706.02 ±2.52 / 710.68 ms │      no change │
│ QQuery 65 │        253.29 / 257.31 ±4.63 / 265.21 ms │        247.44 / 251.19 ±3.83 / 258.15 ms │      no change │
│ QQuery 66 │       247.02 / 261.75 ±11.28 / 274.95 ms │       243.08 / 254.50 ±12.66 / 277.23 ms │      no change │
│ QQuery 67 │        306.83 / 312.76 ±3.91 / 317.95 ms │        303.63 / 309.44 ±5.50 / 319.87 ms │      no change │
│ QQuery 68 │        278.66 / 285.33 ±3.95 / 290.97 ms │        279.77 / 283.42 ±2.87 / 287.24 ms │      no change │
│ QQuery 69 │        102.13 / 103.42 ±0.96 / 104.61 ms │        104.42 / 106.22 ±1.47 / 108.40 ms │      no change │
│ QQuery 70 │       321.49 / 354.45 ±22.18 / 385.44 ms │       327.83 / 347.00 ±14.59 / 370.70 ms │      no change │
│ QQuery 71 │        135.90 / 136.80 ±0.76 / 137.96 ms │        134.69 / 137.11 ±1.72 / 139.40 ms │      no change │
│ QQuery 72 │       697.32 / 717.13 ±10.21 / 725.43 ms │       703.33 / 723.44 ±12.51 / 737.81 ms │      no change │
│ QQuery 73 │        102.74 / 105.23 ±2.87 / 110.24 ms │        104.27 / 105.90 ±1.39 / 108.20 ms │      no change │
│ QQuery 74 │        546.09 / 554.74 ±6.47 / 562.75 ms │        537.01 / 549.80 ±6.64 / 555.91 ms │      no change │
│ QQuery 75 │        280.26 / 281.90 ±1.68 / 284.84 ms │        278.65 / 281.01 ±1.19 / 281.76 ms │      no change │
│ QQuery 76 │        135.12 / 136.71 ±1.34 / 139.14 ms │        131.82 / 134.62 ±1.47 / 136.00 ms │      no change │
│ QQuery 77 │        190.65 / 192.31 ±1.36 / 194.21 ms │        189.46 / 190.39 ±0.74 / 191.26 ms │      no change │
│ QQuery 78 │        355.27 / 359.83 ±2.91 / 363.80 ms │        354.92 / 357.50 ±1.85 / 359.70 ms │      no change │
│ QQuery 79 │        233.54 / 239.27 ±3.77 / 244.34 ms │        231.91 / 235.34 ±2.48 / 238.98 ms │      no change │
│ QQuery 80 │        334.27 / 338.16 ±4.48 / 346.71 ms │        330.98 / 334.58 ±2.71 / 338.43 ms │      no change │
│ QQuery 81 │           27.28 / 29.11 ±1.26 / 30.52 ms │           26.57 / 28.99 ±2.19 / 31.78 ms │      no change │
│ QQuery 82 │        204.89 / 206.60 ±2.56 / 211.57 ms │        197.92 / 200.64 ±2.12 / 204.02 ms │      no change │
│ QQuery 83 │           40.99 / 41.56 ±0.49 / 42.21 ms │           39.70 / 41.23 ±1.28 / 43.42 ms │      no change │
│ QQuery 84 │           49.95 / 50.39 ±0.28 / 50.75 ms │           49.41 / 50.48 ±1.32 / 53.09 ms │      no change │
│ QQuery 85 │        150.20 / 152.10 ±1.42 / 154.43 ms │        150.13 / 152.49 ±2.55 / 157.21 ms │      no change │
│ QQuery 86 │           39.93 / 40.61 ±0.38 / 40.92 ms │           38.94 / 39.77 ±0.65 / 40.85 ms │      no change │
│ QQuery 87 │           88.97 / 91.89 ±2.57 / 96.36 ms │           86.80 / 88.74 ±2.57 / 93.65 ms │      no change │
│ QQuery 88 │        102.04 / 103.20 ±0.67 / 103.88 ms │        101.02 / 101.64 ±0.46 / 102.38 ms │      no change │
│ QQuery 89 │        120.65 / 122.39 ±1.03 / 123.56 ms │        118.05 / 118.95 ±0.54 / 119.62 ms │      no change │
│ QQuery 90 │           24.54 / 25.41 ±0.87 / 27.07 ms │           24.12 / 24.57 ±0.27 / 24.93 ms │      no change │
│ QQuery 91 │           65.31 / 66.36 ±0.98 / 67.66 ms │           62.82 / 64.58 ±1.14 / 66.14 ms │      no change │
│ QQuery 92 │           59.76 / 60.30 ±0.62 / 61.46 ms │           56.99 / 58.34 ±1.02 / 59.82 ms │      no change │
│ QQuery 93 │        193.14 / 198.05 ±4.63 / 205.63 ms │        189.25 / 192.65 ±2.02 / 195.58 ms │      no change │
│ QQuery 94 │           63.62 / 64.38 ±0.58 / 65.06 ms │           60.95 / 61.50 ±0.33 / 61.92 ms │      no change │
│ QQuery 95 │        138.70 / 139.07 ±0.48 / 140.01 ms │        134.22 / 135.84 ±1.87 / 139.33 ms │      no change │
│ QQuery 96 │           74.68 / 76.48 ±1.43 / 78.74 ms │           74.21 / 74.55 ±0.20 / 74.79 ms │      no change │
│ QQuery 97 │        131.33 / 133.85 ±2.00 / 137.22 ms │        128.24 / 130.54 ±1.54 / 132.01 ms │      no change │
│ QQuery 98 │        156.75 / 157.50 ±0.48 / 158.03 ms │        150.30 / 153.07 ±1.88 / 155.06 ms │      no change │
│ QQuery 99 │ 10744.12 / 10789.25 ±31.79 / 10827.55 ms │ 10751.60 / 10791.55 ±34.92 / 10844.14 ms │      no change │
└───────────┴──────────────────────────────────────────┴──────────────────────────────────────────┴────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                         ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                         │ 33601.08ms │
│ Total Time (neilc_scalar-subquery-expr)   │ 45104.68ms │
│ Average Time (HEAD)                       │   339.40ms │
│ Average Time (neilc_scalar-subquery-expr) │   455.60ms │
│ Queries Faster                            │          3 │
│ Queries Slower                            │          5 │
│ Queries with No Change                    │         91 │
│ Queries with Failure                      │          0 │
└───────────────────────────────────────────┴────────────┘

Resource Usage

tpcds — base (merge-base)

Metric Value
Wall time 168.3s
Peak memory 5.6 GiB
Avg memory 4.6 GiB
CPU user 269.8s
CPU sys 18.6s
Disk read 0 B
Disk write 708.7 MiB

tpcds — branch

Metric Value
Wall time 225.8s
Peak memory 5.4 GiB
Avg memory 4.6 GiB
CPU user 306.8s
CPU sys 20.3s
Disk read 0 B
Disk write 792.0 KiB

Labels

- core: Core DataFusion crate
- development-process: Related to development process of DataFusion
- logical-expr: Logical plan and expressions
- optimizer: Optimizer rules
- physical-expr: Changes to the physical-expr crates
- physical-plan: Changes to the physical-plan crate
- proto: Related to proto crate
- sqllogictest: SQL Logic Tests (.slt)

Development

Successfully merging this pull request may close these issues.

- Improve performance of array_has
- Implement physical execution of uncorrelated scalar subqueries