Skip to content

Add binary size check workflow#1697

Draft
bghgary wants to merge 4 commits into
BabylonJS:masterfrom
bghgary:add-binary-size-check
Draft

Add binary size check workflow#1697
bghgary wants to merge 4 commits into
BabylonJS:masterfrom
bghgary:add-binary-size-check

Conversation

@bghgary
Copy link
Copy Markdown
Contributor

@bghgary bghgary commented May 15, 2026

Context

No automated guardrail today for binary size regressions in BN. #1695 added ~17.5 KB to Playground.exe and ~119 KB to spirv-cross-hlsl.lib — the change was small but the numbers were invisible until I measured locally. Future dep bumps (SPIRV-Cross, bgfx, glslang) have the same gap.

Approach

Two workflows in .github/workflows/:

size-check.yml (on: pull_request): checks out PR base + head, builds the Playground target in each with the same Win32 x64 D3D11 RelWithDebInfo flags as Win32_x64_D3D11 CI, enumerates .exe/.dll/.lib under each RelWithDebInfo output, and uploads a markdown report (Playground.exe delta + aggregate + top 10 other movers) as a size-check-report artifact. Runs unprivileged — no write tokens, no secrets in scope of PR-author code.

size-check-comment.yml (on: workflow_run): downloads the artifact and posts/updates a sticky PR comment with the markdown report. Runs from master with pull-requests: write.

The split follows the pattern in Keeping your GitHub Actions and workflows secure: Preventing pwn requests — untrusted code on pull_request (unprivileged) hands off via artifact to a workflow_run job (privileged, no access to PR code). The alternative — pull_request_target with explicit checkout of PR code — is the anti-pattern the article warns against.

Notes

  • Informational only. No threshold enforcement. Once we have data on natural per-PR variance, we can layer a fail-on-N% gate.
  • Win32 only for the first iteration. Easiest platform; covers the highest-traffic consumer. macOS/iOS/Android/Linux are natural follow-ups — each platform has a different SPIRV-Cross emitter linked in.
  • ~30 min wall clock for two sequential Win32 RelWithDebInfo builds on windows-latest. Runs in parallel with the rest of CI so it doesn't extend the bottleneck. concurrency: cancel-in-progress: true drops stale runs on new pushes.
  • Self-test caveat: workflow_run loads the workflow file from master, so the comment-post path can't run on this PR (comment workflow doesn't exist on master yet). First PR after merge will exercise it end-to-end.
  • Small (sub-1 KB) deltas on unchanged libs are COFF timestamp noise; the 256 B filter trims the worst of it.

[Created by Copilot on behalf of @bghgary]

bghgary and others added 4 commits May 15, 2026 09:50
Adds a new `size-check.yml` workflow that measures and surfaces binary
size impact on every PR to master.

## What it does

On every `pull_request` to master, the workflow:

1. Checks out the PR's base SHA and head SHA into separate trees.
2. Configures and builds the `Playground` target in each tree using the
   same Win32 x64 D3D11 RelWithDebInfo flags as the existing CI's
   `Win32_x64_D3D11` job (`BX_CONFIG_DEBUG=ON`, `BABYLON_DEBUG_TRACE=ON`,
   `BGFX_CONFIG_MAX_FRAME_BUFFERS=256`).
3. Enumerates every `.exe`/`.dll`/`.lib` under each build's
   `RelWithDebInfo` output, computes the per-artifact delta, and formats
   a markdown report with:
   - The final `Playground.exe` size delta (the canonical headline).
   - The aggregate size delta across all built artifacts.
   - The top 10 other artifacts that moved by more than 256 B.
4. Posts a sticky PR comment (updated in place on subsequent pushes)
   keyed by an HTML marker, and also writes the same report to the run
   summary so reviewers can find it from the run page when the comment
   can't be posted (e.g., fork PRs without write tokens).

## Why

PR BabylonJS#1695 (SPIRV-Cross bump) added ~17.5 KB to `Playground.exe` and
~119 KB to `spirv-cross-hlsl.lib`. The change was small but there was
no automated guardrail to surface either number on the PR. Future
dependency bumps (SPIRV-Cross, bgfx, glslang) and large feature
additions all have the same gap.

This workflow is **informational only** — no threshold enforcement.
The goal for the first iteration is to make size impact visible so
reviewers can react. A threshold (e.g., fail-on-N%) can be layered
on later once we have data on natural per-PR variance.

## Cost

Two sequential Win32 RelWithDebInfo builds on a single
`windows-latest` runner, ~30 min wall clock. Runs in parallel with the
rest of CI so it doesn't extend the bottleneck.

`concurrency: cancel-in-progress: true` cancels stale runs when a PR
gets a new push, so quick iteration doesn't pile up runner minutes.

## Trade-offs / follow-ups

- **Win32 only.** Easiest platform to add first; covers the highest-traffic
  consumer scenario. macOS/iOS/Android/Linux are natural follow-ups when
  the workflow proves stable. Each platform has its own SPIRV-Cross
  emitter linked in (HLSL on Win32/UWP, MSL on Apple, GLSL on Linux),
  so cross-platform coverage is meaningful.
- **No baseline caching.** Each run rebuilds `base` from scratch. Could
  cache `_deps` keyed on hashed CMakeLists later if the wall-clock cost
  becomes a problem.
- **No threshold enforcement.** Strictly informational. After a few weeks
  of data we can decide on an appropriate failure threshold.
- **Static lib deltas include COFF timestamp noise** (small, sub-1 KB
  movements on libs whose source didn't change). The 256 B filter
  excludes the smallest noise but a few bytes of timestamp can still
  show up. Acceptable for an MVP.

[Created by Copilot on behalf of @bghgary]

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The single-workflow approach failed at the comment-post step on fork PRs
(the default for BN contributors) because `pull_request` events from
forks get a read-only `GITHUB_TOKEN`, which doesn't have permission to
post comments.

Refactor to the two-workflow pattern recommended by the GitHub Security
Lab:

  https://securitylab.github.com/resources/github-actions-preventing-pwn-requests/

- `size-check.yml` (on: pull_request) builds + measures + uploads a
  report artifact. Runs in an unprivileged context (no write tokens, no
  secrets exposed to PR-author code). Same Win32 RelWithDebInfo build
  as before, ~30 min.

- `size-check-comment.yml` (on: workflow_run) triggers when the build
  workflow completes successfully, downloads the artifact, and posts
  the sticky PR comment. Runs from master with full
  `pull-requests: write` permission, so the comment works on every PR
  regardless of whether it's from a fork.

This PR can't fully exercise the comment-post path because
`size-check-comment.yml` doesn't yet exist on master (workflow_run
loads the workflow file from the default branch). After merge, the
first new PR will trigger end-to-end and post the first sticky comment.

[Created by Copilot on behalf of @bghgary]

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The inline ~120-line pwsh block in `size-check.yml` is moved to a standalone
script so it can be run locally for ad-hoc size measurements (build a master
worktree and a feature worktree with the same cmake flags, then point the
script at both). The workflow step becomes a single one-line invocation.

Lives under a new top-level `Scripts/` folder (PascalCase, matching the rest
of the repo). Existing `Apps/scripts/getNightly.js` is Apps-scoped; this new
script measures binary sizes across the whole build (Apps + Plugins + Core +
Polyfills + _deps) so it doesn't belong under Apps/.

[Created by Copilot on behalf of @bghgary]

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant