Add binary size check workflow#1697
Draft
bghgary wants to merge 4 commits into
Draft
Conversation
Adds a new `size-check.yml` workflow that measures and surfaces binary size impact on every PR to master. ## What it does On every `pull_request` to master, the workflow: 1. Checks out the PR's base SHA and head SHA into separate trees. 2. Configures and builds the `Playground` target in each tree using the same Win32 x64 D3D11 RelWithDebInfo flags as the existing CI's `Win32_x64_D3D11` job (`BX_CONFIG_DEBUG=ON`, `BABYLON_DEBUG_TRACE=ON`, `BGFX_CONFIG_MAX_FRAME_BUFFERS=256`). 3. Enumerates every `.exe`/`.dll`/`.lib` under each build's `RelWithDebInfo` output, computes the per-artifact delta, and formats a markdown report with: - The final `Playground.exe` size delta (the canonical headline). - The aggregate size delta across all built artifacts. - The top 10 other artifacts that moved by more than 256 B. 4. Posts a sticky PR comment (updated in place on subsequent pushes) keyed by an HTML marker, and also writes the same report to the run summary so reviewers can find it from the run page when the comment can't be posted (e.g., fork PRs without write tokens). ## Why PR BabylonJS#1695 (SPIRV-Cross bump) added ~17.5 KB to `Playground.exe` and ~119 KB to `spirv-cross-hlsl.lib`. The change was small but there was no automated guardrail to surface either number on the PR. Future dependency bumps (SPIRV-Cross, bgfx, glslang) and large feature additions all have the same gap. This workflow is **informational only** — no threshold enforcement. The goal for the first iteration is to make size impact visible so reviewers can react. A threshold (e.g., fail-on-N%) can be layered on later once we have data on natural per-PR variance. ## Cost Two sequential Win32 RelWithDebInfo builds on a single `windows-latest` runner, ~30 min wall clock. Runs in parallel with the rest of CI so it doesn't extend the bottleneck. `concurrency: cancel-in-progress: true` cancels stale runs when a PR gets a new push, so quick iteration doesn't pile up runner minutes. ## Trade-offs / follow-ups - **Win32 only.** Easiest platform to add first; covers the highest-traffic consumer scenario. macOS/iOS/Android/Linux are natural follow-ups when the workflow proves stable. Each platform has its own SPIRV-Cross emitter linked in (HLSL on Win32/UWP, MSL on Apple, GLSL on Linux), so cross-platform coverage is meaningful. - **No baseline caching.** Each run rebuilds `base` from scratch. Could cache `_deps` keyed on hashed CMakeLists later if the wall-clock cost becomes a problem. - **No threshold enforcement.** Strictly informational. After a few weeks of data we can decide on an appropriate failure threshold. - **Static lib deltas include COFF timestamp noise** (small, sub-1 KB movements on libs whose source didn't change). The 256 B filter excludes the smallest noise but a few bytes of timestamp can still show up. Acceptable for an MVP. [Created by Copilot on behalf of @bghgary] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The single-workflow approach failed at the comment-post step on fork PRs (the default for BN contributors) because `pull_request` events from forks get a read-only `GITHUB_TOKEN`, which doesn't have permission to post comments. Refactor to the two-workflow pattern recommended by the GitHub Security Lab: https://securitylab.github.com/resources/github-actions-preventing-pwn-requests/ - `size-check.yml` (on: pull_request) builds + measures + uploads a report artifact. Runs in an unprivileged context (no write tokens, no secrets exposed to PR-author code). Same Win32 RelWithDebInfo build as before, ~30 min. - `size-check-comment.yml` (on: workflow_run) triggers when the build workflow completes successfully, downloads the artifact, and posts the sticky PR comment. Runs from master with full `pull-requests: write` permission, so the comment works on every PR regardless of whether it's from a fork. This PR can't fully exercise the comment-post path because `size-check-comment.yml` doesn't yet exist on master (workflow_run loads the workflow file from the default branch). After merge, the first new PR will trigger end-to-end and post the first sticky comment. [Created by Copilot on behalf of @bghgary] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The inline ~120-line pwsh block in `size-check.yml` is moved to a standalone script so it can be run locally for ad-hoc size measurements (build a master worktree and a feature worktree with the same cmake flags, then point the script at both). The workflow step becomes a single one-line invocation. Lives under a new top-level `Scripts/` folder (PascalCase, matching the rest of the repo). Existing `Apps/scripts/getNightly.js` is Apps-scoped; this new script measures binary sizes across the whole build (Apps + Plugins + Core + Polyfills + _deps) so it doesn't belong under Apps/. [Created by Copilot on behalf of @bghgary] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Context
No automated guardrail today for binary size regressions in BN. #1695 added ~17.5 KB to
Playground.exeand ~119 KB tospirv-cross-hlsl.lib— the change was small but the numbers were invisible until I measured locally. Future dep bumps (SPIRV-Cross, bgfx, glslang) have the same gap.Approach
Two workflows in
.github/workflows/:size-check.yml(on: pull_request): checks out PR base + head, builds thePlaygroundtarget in each with the same Win32 x64 D3D11 RelWithDebInfo flags asWin32_x64_D3D11CI, enumerates.exe/.dll/.libunder eachRelWithDebInfooutput, and uploads a markdown report (Playground.exe delta + aggregate + top 10 other movers) as asize-check-reportartifact. Runs unprivileged — no write tokens, no secrets in scope of PR-author code.size-check-comment.yml(on: workflow_run): downloads the artifact and posts/updates a sticky PR comment with the markdown report. Runs from master withpull-requests: write.The split follows the pattern in Keeping your GitHub Actions and workflows secure: Preventing pwn requests — untrusted code on
pull_request(unprivileged) hands off via artifact to aworkflow_runjob (privileged, no access to PR code). The alternative —pull_request_targetwith explicit checkout of PR code — is the anti-pattern the article warns against.Notes
windows-latest. Runs in parallel with the rest of CI so it doesn't extend the bottleneck.concurrency: cancel-in-progress: truedrops stale runs on new pushes.workflow_runloads the workflow file from master, so the comment-post path can't run on this PR (comment workflow doesn't exist on master yet). First PR after merge will exercise it end-to-end.[Created by Copilot on behalf of @bghgary]