master into nix-next #1715

Merged
Ericson2314 merged 23 commits into nix-next from maser-into-nix-next on May 6, 2026
@Ericson2314
Member

No description provided.

Ericson2314 and others added 23 commits May 4, 2026 18:08
There is an upstream recursive-nix issue that causes this test to fail
intermittently in CI. Use `HARNESS-RETRY 5` so yath retries it rather
than failing the whole suite.
The queue runner now supports systemd socket activation for both its
REST and gRPC servers (`--rest-bind -` and `--grpc-bind -`). It uses
`LISTEN_FDNAMES` to map socket names (`rest`, `grpc`) to fd indices.
The HTTP and gRPC servers are refactored to accept pre-bound listeners,
and binding is done centrally in `main`.
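
As an illustration of the name-to-fd mapping described above, here is a minimal sketch using only the Rust standard library. It is not the queue runner's actual code: the function name `named_listen_fds` and its error strings are hypothetical. It relies on two facts of the systemd protocol: passed descriptors start at fd 3, and `LISTEN_FDNAMES` is a colon-separated list in the same order as the descriptors. A real implementation should also check `LISTEN_PID` before trusting these variables.

```rust
use std::collections::HashMap;

/// First file descriptor passed by systemd (SD_LISTEN_FDS_START).
const LISTEN_FDS_START: i32 = 3;

/// Hypothetical helper: map socket names to fd numbers, given the raw
/// values of the LISTEN_FDS and LISTEN_FDNAMES environment variables.
fn named_listen_fds(
    listen_fds: &str,
    listen_fdnames: &str,
) -> Result<HashMap<String, i32>, String> {
    let count: usize = listen_fds
        .trim()
        .parse()
        .map_err(|e| format!("invalid LISTEN_FDS {listen_fds:?}: {e}"))?;
    if count == 0 {
        return Ok(HashMap::new());
    }
    // Names are colon-separated and ordered like the descriptors.
    let names: Vec<&str> = listen_fdnames.split(':').collect();
    if names.len() != count {
        return Err(format!(
            "LISTEN_FDS is {count} but LISTEN_FDNAMES has {} names",
            names.len()
        ));
    }
    Ok(names
        .iter()
        .enumerate()
        .map(|(i, name)| (name.to_string(), LISTEN_FDS_START + i as i32))
        .collect())
}
```

With `LISTEN_FDS=2` and `LISTEN_FDNAMES=rest:grpc`, this yields `rest` at fd 3 and `grpc` at fd 4, which `main` could then wrap into pre-bound listeners.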

The NixOS module now unconditionally uses socket units, eliminating
port races and enabling the standard systemd socket activation
lifecycle.

The test harness (`QueueRunnerContext.pm`) binds sockets with port 0
and passes them to the queue runner via `LISTEN_FDS`, so tests get
OS-assigned ports with no race condition.
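
The port-0 technique the harness uses can be sketched in Rust as follows. The real harness is Perl (`QueueRunnerContext.pm`), so this is only an analogy, and `bind_ephemeral` is a hypothetical name. Binding to port 0 asks the OS for a free port, and keeping the listener open while handing it to the child means nothing else can claim the port in between.

```rust
use std::io;
use std::net::{SocketAddr, TcpListener};

/// Hypothetical helper: bind a loopback listener on an OS-assigned port
/// and report the address that was actually chosen.
fn bind_ephemeral() -> io::Result<(TcpListener, SocketAddr)> {
    // Port 0 tells the kernel to pick any free port.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    // The socket stays bound for as long as `listener` is alive.
    let addr = listener.local_addr()?;
    Ok((listener, addr))
}
```

The caller reads the port from the returned address and passes the still-open socket on, rather than closing it and hoping the port stays free.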
Allow `dyn-drv-non-trivial` test to retry up to 5 times
…ivation

Add systemd socket activation to queue runner
The `build_log` gRPC handler discarded the `io::Error` with `|_|`,
reporting only `"Failed to write log file."` regardless of whether the
cause was a disk issue, a broken stream from a crashed builder, or
something else. This made intermittent CI failures impossible to
diagnose.

Include the error in the gRPC status message so the builder logs show
the real cause (e.g. `"broken pipe"`, `"No space left on device"`).
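
A minimal sketch of the pattern, using a hypothetical `Status` struct in place of the real gRPC status type. The function names and the exact message format are assumptions, not the handler's actual code.

```rust
use std::io;

/// Hypothetical stand-in for a gRPC status; not the real library type.
#[derive(Debug, PartialEq)]
struct Status {
    message: String,
}

/// Build a failure status that carries the underlying I/O error text.
fn log_write_failed(err: &io::Error) -> Status {
    Status {
        message: format!("Failed to write log file: {err}"),
    }
}

/// Convert the result of a log write into a status-bearing result.
fn write_log(result: io::Result<()>) -> Result<(), Status> {
    // A closure of the form `|_| ...` would drop the cause here;
    // naming the error lets it reach the status message.
    result.map_err(|e| log_write_failed(&e))
}
```

The only change from the discarded-error version is that the closure binds the error and formats it into the message.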
… reporting

The Perl web controllers were falling back to reading
`/etc/nix/machines` (the old Nix remote systems file) when
`queue_runner_endpoint` wasn't configured. But this doesn't really make
sense because Hydra won't actually fall back to using those machines —
the Rust queue runner uses a completely different gRPC self-registration
mechanism.

`getMachines()` now always queries the queue runner's `/status/machines`
HTTP endpoint, and warns if `queue_runner_endpoint` is missing.
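
The "no fallback" behavior can be sketched like this. The real `getMachines()` is Perl, so this Rust version is only illustrative: `machines_url` and the warning text are hypothetical, and only the `/status/machines` path and the `queue_runner_endpoint` setting come from the commit message.

```rust
/// Hypothetical helper: return the machines-status URL, or warn and
/// return None when no endpoint is configured. There is deliberately
/// no fallback to a machines file.
fn machines_url(queue_runner_endpoint: Option<&str>) -> Option<String> {
    match queue_runner_endpoint {
        Some(endpoint) if !endpoint.is_empty() => Some(format!(
            "{}/status/machines",
            endpoint.trim_end_matches('/')
        )),
        _ => {
            eprintln!("warning: queue_runner_endpoint is not configured");
            None
        }
    }
}
```

A caller that receives `None` reports no machines instead of guessing from local files.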

Additionally, the `SystemStatus` DB table was being used as an
intermediary for status reporting: the queue runner would write a JSON
dump there, and `hydra-queue-runner --status` would read it back. This
is no longer needed since the queue runner has a REST API that serves
the same data directly. All consumers (`queue-runner-status` page,
`hydra-send-stats`) now query the HTTP endpoint instead.

Removed:

- Legacy `/etc/nix/machines` file parsing from `getMachines()`

- `SystemStatus` DB reads from `Root.pm` (the Perl controller)

- The `--status` CLI flag and `dump_status_loop` from the Rust queue
  runner (no more writing status to the database)

- `upsert_status`, `get_status`, `notify_dump_status`,
  `notify_status_dumped` from the DB crate

- The `POST /dump_status` HTTP endpoint

- The `buildMachinesFiles` NixOS option (with `mkRemovedOptionModule`
  deprecation message) and `NIX_REMOTE_SYSTEMS` env var

- The machines file parsing test and related test scaffolding

`hydra-send-stats` and the `queue-runner-status` page now query the
queue runner's REST API directly. The `SystemStatus` table is marked
for removal in a future migration. The `hydra-send-stats` test now
exercises both the no-endpoint and with-endpoint code paths.
queue-runner: include actual error in `build_log` failure messages
Remove legacy build machine discovery and `SystemStatus`-based status…
Bumps [nixpkgs](https://github.com/NixOS/nixpkgs) from `fcf5160` to `41a7bd5`.
- [Commits](NixOS/nixpkgs@fcf5160...41a7bd5)

---
updated-dependencies:
- dependency-name: nixpkgs
  dependency-version: 41a7bd591bc47320390c829b9f16bfc59b25843c
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Bumps [nix](https://github.com/NixOS/nix) from `ee1ce88` to `24a0724`.
- [Commits](NixOS/nix@ee1ce88...24a0724)

---
updated-dependencies:
- dependency-name: nix
  dependency-version: 24a072442bee6ad4c3a2da92abafff987881da26
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Instead of forking Starman to add socket activation support, carry a
`Net::Server::Systemd::PreFork` personality inside hydra's own Perl
libs and monkey-patch `@Starman::Server::ISA` in `Hydra::Script::Server`
when `LISTEN_FDS` is set. This avoids a forked dependency while the
upstream PR (miyagawa/Starman#156) is pending.

I am working on a forked foreman for optional local testing, but for
now the VM test checks that socket activation works correctly.
Add systemd socket activation for `hydra-server` (Starman)
build(deps): bump nix from `ee1ce88` to `24a0724`
build(deps): bump nixpkgs from `fcf5160` to `41a7bd5`
It is not needed by the test suite.
The two changes are:
- `build.job` appears to be a string, so `build.job.name` should just
  be `build.job`
- `git clone` doesn't copy over orphaned commits, so if the branch
  moves and a commit we're interested in is abandoned, we have to add a
  call to `git fetch` to get that specific commit
The web UI's machine status page needs this to fetch data from the
queue runner's REST API. Without it, `/machines` shows no machines.
Add a `Socketfile` that declares sockets for `hydra-server` and
`hydra-queue-runner`, so foreman can pass sockets to them just like
systemd socket activation would. This means that development with
foreman is once again exercising the same code paths that production
will use.

The foreman start scripts are updated accordingly:

- `hydra-server` switches from `hydra-dev-server --port` to
  `hydra-server -f` (Starman with fork mode), picking up the socket via
  the `Net::Server::Systemd::PreFork` personality.

- `hydra-queue-runner` gets `--rest-bind - --grpc-bind -` to use socket
  activation for both its REST and gRPC servers.

- All services now `wait_for_hydra_db` before starting, since with
  socket activation the listen port is open before the server is ready,
  so `wait_for_hydra_server` alone is no longer sufficient.

The foreman fork (Ericson2314/foreman@socketfile, PR
ddollar/foreman#816) adds the `Socketfile`
feature. Foreman is packaged from source in `packaging/foreman/` because
it was not clear how to override the Nixpkgs version which gets its
source from rubygems.

I would not use a forked project in production when I am far from sure
that upstream will accept the change, but foreman is only a developer
convenience in the dev shell, which to me makes this OK.
Use forked foreman to pass sockets like systemd in production
Update the "reproduce locally" script
Update harmonia to latest main, where `DrvOutput` uses a `StorePath`
instead of a derivation hash. This lets us construct `Realisation`
objects directly from data we already have (resolved drv path + output
name + concrete output path) without calling `static_output_hashes`
through the nix C++ FFI.

Remove the realisation query FFI (`query_raw_realisation`,
`InternalRealisation`, `realisation.rs`, `realisation.cpp`) and
`static_output_hashes` FFI entirely. The binary-cache crate's
`copy_realisation` (which queried the local store via FFI) is replaced
with `write_realisation` that accepts a pre-constructed `Realisation`.

In `succeed_step`, after a build step with CA floating outputs succeeds,
write a `Realisation` for each output to all S3 binary caches. The
realisations are signed with the cache's secret keys.

TODO: also write realisations to the local store's SQLite `Realisations`
table (for non-S3 stores) once nix is updated to 2.35, which uses
path-based `DrvOutput` matching the new harmonia types.
Write CA realisations to binary cache in pure Rust
@Ericson2314 Ericson2314 changed the title from "Maser into nix next" to "master into nix-next" May 6, 2026
@Ericson2314 Ericson2314 enabled auto-merge May 6, 2026 19:11
@Ericson2314 Ericson2314 added this pull request to the merge queue May 6, 2026
Merged via the queue into nix-next with commit 7c46d87 May 6, 2026
2 checks passed
@Ericson2314 Ericson2314 deleted the maser-into-nix-next branch May 6, 2026 19:42
2 participants