feat(installer): post-enroll tunnel verification + structured errors #1796

Closed
irvingouj@Devolutions (irvingoujAtDevolution) wants to merge 11 commits into
feat/agent-tunnel-identity-pr2 from
feat/agent-tunnel-identity-pr3


@irvingoujAtDevolution
Contributor

Summary

Third of three PRs implementing the agent-tunnel identity refactor
described in AGENT_TUNNEL_IDENTITY_DESIGN.md. This PR covers the agent
and the installer.

Before this PR the installer reported success as soon as agent.exe up
wrote a cert to disk, even if the tunnel was unreachable (wrong host
returned, UDP blocked, SAN mismatch, etc.). After this PR the installer
only reports success when a real QUIC handshake plus one route-advertise
round-trip completes — turning enrollment from "certs were written" into
"the tunnel is reachable and usable".

Scope (per spec PR 3):

  • New agent.exe verify-tunnel --timeout <secs> subcommand. Performs one
    QUIC handshake + one RouteAdvertise + Heartbeat/HeartbeatAck
    round-trip and exits 0 on success, 1 on any classified failure.
  • Full error catalog from spec section 6 (nine named kinds plus an
    unexpected_error catch-all that always carries correlation_id + log
    path). Emits a single-line JSON triple {kind, detail, next_step} on
    stderr as the last line before exit.
  • Windows Event Log writer (source DevolutionsAgent, named properties)
    so monitoring tools can parse failures structurally.
  • New CA.VerifyAgentTunnel MSI custom action: invokes
    agent.exe verify-tunnel --timeout 10 after EnrollAgentTunnel
    succeeds, parses the JSON triple, and surfaces kind/detail/next_step
    through session.Message(InstallMessage.Error, ...). The MSI rolls
    back on any non-zero exit. The 10s budget is hardcoded: no MSI
    property, no skip-verify escape hatch.
  • Drop the Gateway URL override field from the AgentTunnel dialog
    (textbox, label, hint, RowCount, localization strings, property
    declaration, deferred-CA wiring). With this design the JWT is the
    single source of truth for the agent-facing URL; overriding it
    server-side would defeat the host validation introduced in PR 1.
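The stderr contract in the scope list (one JSON line, last on stderr, exit code 0 or 1) can be sketched as follows. This is a minimal illustration, not the agent's implementation: `ErrorTriple`, `json_escape`, and `report` are hypothetical names, and the real code may well use a JSON library rather than hand-rolled escaping.

```rust
/// Hypothetical struct mirroring the `{kind, detail, next_step}` triple named in the PR.
pub struct ErrorTriple {
    pub kind: String,
    pub detail: String,
    pub next_step: String,
}

/// Escape a string so it can be embedded in a JSON string literal.
fn json_escape(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl ErrorTriple {
    /// Render the triple as a single line; newlines inside values are escaped.
    pub fn to_json_line(&self) -> String {
        format!(
            "{{\"kind\":\"{}\",\"detail\":\"{}\",\"next_step\":\"{}\"}}",
            json_escape(&self.kind),
            json_escape(&self.detail),
            json_escape(&self.next_step)
        )
    }
}

/// Map a verification outcome to the exit code: 0 on success, 1 on failure.
/// On failure the JSON line is written to stderr as the final line.
pub fn report(outcome: Result<(), ErrorTriple>) -> i32 {
    match outcome {
        Ok(()) => 0,
        Err(triple) => {
            eprintln!("{}", triple.to_json_line());
            1
        }
    }
}
```

Escaping newlines keeps the triple on one line, which is what lets a consumer read only the last line of stderr.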

Dependency

This PR depends on #1795 (agent derives endpoint from JWT host), which
itself depends on #1794 (gateway returns quic_port).

Spec

See AGENT_TUNNEL_IDENTITY_DESIGN.md (PR 3 section + section 6 error
catalog + section 7 Windows Event Log surface).

Test plan

  • cargo test -p devolutions-agent passes (45 lib + 7 bin tests)
  • dotnet build package/AgentWindowsManaged succeeds
  • Manual install with bad enrollment (DNS unresolvable): installer
    dialog shows the next_step text and MSI rolls back
  • Manual install with good enrollment: tunnel up, installer reports
    success, agent shows online in DVLS within 30 seconds

Adds an optional Agent Tunnel wizard step to the Devolutions Agent
installer so admins can enroll the agent in a Gateway QUIC tunnel as
part of MSI install (UI or unattended).

Surfaces three MSI public properties for unattended installs:
- AGENT_TUNNEL_ENROLLMENT_STRING (dgw-enroll:v1:<base64> from DVLS/Hub/Gateway)
- AGENT_TUNNEL_ADVERTISE_SUBNETS (CSV CIDR; empty = none)
- AGENT_TUNNEL_ADVERTISE_DOMAINS (CSV DNS suffixes; empty = auto-detect only)
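As an illustration of the "CSV CIDR; empty = none" convention for AGENT_TUNNEL_ADVERTISE_SUBNETS, here is a small sketch. `parse_subnets` is a hypothetical helper, and whether the installer validates CIDR syntax itself or leaves that to the agent is an assumption not stated in this PR.

```rust
use std::net::IpAddr;

/// Parse a CSV list of CIDR blocks such as "10.0.0.0/8, 192.168.1.0/24".
/// An empty or whitespace-only value yields an empty list (advertise nothing).
pub fn parse_subnets(csv: &str) -> Result<Vec<(IpAddr, u8)>, String> {
    let mut subnets = Vec::new();
    for entry in csv.split(',').map(str::trim).filter(|e| !e.is_empty()) {
        let (addr, prefix) = entry
            .split_once('/')
            .ok_or_else(|| format!("missing prefix length in '{}'", entry))?;
        let addr: IpAddr = addr
            .parse()
            .map_err(|_| format!("invalid address in '{}'", entry))?;
        let prefix: u8 = prefix
            .parse()
            .map_err(|_| format!("invalid prefix length in '{}'", entry))?;
        // IPv4 prefixes go up to 32 bits, IPv6 prefixes up to 128.
        let max = if addr.is_ipv4() { 32 } else { 128 };
        if prefix > max {
            return Err(format!("prefix length out of range in '{}'", entry));
        }
        subnets.push((addr, prefix));
    }
    Ok(subnets)
}
```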

Wires a new deferred elevated custom action (EnrollAgentTunnel) that
runs Before StartServices when AGENT_TUNNEL_FEATURE is being installed.
It base64-decodes the enrollment payload, shells out to
`devolutions-agent.exe enroll <url> <token> <name> [subnets]` with a 60s
timeout, and redacts the token in the session log. Advertise domains
are persisted by patching `Tunnel.AdvertiseDomains` in agent.json
post-enrollment, matching the agreed direction that domain config lives
in the file rather than as a CLI flag.
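The prefix handling and token redaction described above might look roughly like this. Both function names are hypothetical, the `<redacted>` placeholder is an assumption, and the base64 decoding step is left out to keep the sketch dependency-free; the actual custom action is part of the C# installer.

```rust
/// Strip the `dgw-enroll:v1:` prefix and return the base64 payload, if any.
pub fn enrollment_payload(enrollment_string: &str) -> Option<&str> {
    let payload = enrollment_string.trim().strip_prefix("dgw-enroll:v1:")?;
    if payload.is_empty() {
        None
    } else {
        Some(payload)
    }
}

/// Build a log-safe rendering of the enroll arguments with the token replaced.
pub fn redacted_command_line(args: &[&str], token: &str) -> String {
    args.iter()
        .map(|arg| {
            if !token.is_empty() && *arg == token {
                "<redacted>"
            } else {
                *arg
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

Redacting at the point where the log line is built, rather than filtering the log afterwards, means the token never reaches the session log in the first place.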

The Tunnel feature itself is opt-in (isEnabled:false, allowChange:true);
the dialog is skipped when the feature isn't selected. An empty
enrollment string also skips tunnel setup, allowing the installer to be
used without touching the tunnel.

WixSharp's runtime dialog loader threw at AgentTunnelDialog init (MSI 1603)
because the tableLayoutPanel had RowCount=8 but the new gateway URL controls
were placed at rows 8/9/10.

…add Agent name field

- Propagate AGENT_TUNNEL_* properties to deferred CA via Secure MSI Property
  declarations + explicit UsesProperties string. The deferred CA was
  previously seeing empty values because the wizard-set properties never
  crossed the UAC boundary.
- Treat empty enrollment string as install failure (was silent skip).
  EnrollAgentTunnel CA now returns ActionResult.Failure and surfaces
  session.Message(InstallMessage.Error, ...) on the empty case and on
  enrollment timeout, non-zero exit, and exception paths.
- Add optional Agent name field to AgentTunnelDialog. Resolution order at
  install time: dialog value > JWT jet_agent_name claim > computer name.
  Avoids "missing required --name" failures when the JWT lacks the claim.
- Update Wizard.ShouldSkip-gated dialog so blank enrollment is blocked at
  UI validation (previously the dialog let users click Next on empty).
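The agent-name resolution order from the list above (dialog value > JWT jet_agent_name claim > computer name) can be sketched as a simple fallback chain. `resolve_agent_name` is a hypothetical helper, and treating a blank value as absent is an assumption.

```rust
/// Treat missing or whitespace-only values as absent.
fn non_blank(value: Option<&str>) -> Option<&str> {
    value.map(str::trim).filter(|v| !v.is_empty())
}

/// Pick the agent name: dialog value first, then the JWT claim, then the computer name.
pub fn resolve_agent_name<'a>(
    dialog_value: Option<&'a str>,
    jwt_claim: Option<&'a str>,
    computer_name: &'a str,
) -> &'a str {
    non_blank(dialog_value)
        .or_else(|| non_blank(jwt_claim))
        .unwrap_or(computer_name)
}
```

Because the computer name is always available, the chain never produces an empty name, which is what avoids the "missing required --name" failure.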
Captures the root cause behind silent enrollment-success-but-no-tunnel
failures we hit during integration testing, the constraints we've
confirmed with the team, and the proposed redesign:

- Decouple Gateway's cryptographic identity (server cert SAN) from its
  network reachability (the host agents dial). Replace single conf.hostname
  with AgentTunnel.AdvertisedNames (multi-SAN, label-able).
- Agent derives its QUIC endpoint from the host it enrolled through
  (jet_gw_url) + a quic_port returned by the gateway, instead of accepting
  whatever hostname the gateway dictates.
- Gateway validates enrollment URL host against AdvertisedNames upfront,
  with a structured 400 response carrying error/message/help.
- New agent.exe verify-tunnel subcommand wired into the MSI CA so install
  success means the tunnel is actually up, not just that a cert was
  written. Errors expose a structured kind/detail/next_step triple.
- DVLS enrollment-string UI becomes a dropdown over AdvertisedNames
  (refreshed from gateway diagnostics) instead of a free-text URL box.
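To make the endpoint derivation and host validation concrete, here is a minimal sketch. All three function names are hypothetical, the URL handling is deliberately simplistic (a real implementation would use a URL parser), and case-insensitive matching against AdvertisedNames is an assumption.

```rust
/// Extract the host from a URL like "https://gw.example.com:7171/path".
/// Simplified: no userinfo handling; bracketed IPv6 literals keep their brackets.
pub fn url_host(url: &str) -> Option<&str> {
    let rest = url.split_once("://").map(|(_, r)| r).unwrap_or(url);
    let authority = rest.split(|c: char| matches!(c, '/' | '?' | '#')).next()?;
    let host = if authority.starts_with('[') {
        let end = authority.find(']')?;
        &authority[..=end]
    } else {
        authority.split(':').next()?
    };
    if host.is_empty() {
        None
    } else {
        Some(host)
    }
}

/// Gateway side: is the enrollment URL host one of the advertised names?
pub fn is_advertised(host: &str, advertised_names: &[&str]) -> bool {
    advertised_names
        .iter()
        .any(|name| name.eq_ignore_ascii_case(host))
}

/// Agent side: QUIC endpoint = host the agent enrolled through + gateway-provided port.
pub fn quic_endpoint(jet_gw_url: &str, quic_port: u16) -> Option<String> {
    url_host(jet_gw_url).map(|host| format!("{}:{}", host, quic_port))
}
```

The point of the design is visible in `quic_endpoint`: the host comes from the URL the agent already reached, and only the port comes from the gateway.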

Includes a 9-entry error catalog with operator-facing next-step text,
non-goals (single-use enforcement, gateway farms — deferred), migration
path, and a 5-PR implementation plan.

Includes Codex's review.

Adds `agent.exe verify-tunnel --timeout <secs>` which performs one QUIC
handshake plus one RouteAdvertise + Heartbeat/HeartbeatAck round-trip and
exits 0 on success or 1 on any classified failure. The last line of stderr
is a single-line JSON triple `{kind, detail, next_step}` consumed by the
installer custom action to surface actionable error dialogs.

Implements the 9+1-kind error catalog from the design doc (section 6):
enrollment_host_not_advertised, dns_resolution_failed, udp_unreachable,
tls_san_mismatch, tls_spki_pin_mismatch, quic_handshake_timeout,
route_advertise_timeout, enrollment_token_expired,
enrollment_token_signature_invalid, and the unexpected_error catch-all
which always carries a correlation_id and log path.
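The ten kind strings listed above map naturally onto an enum. The sketch below uses the wire names exactly as given in this commit message; the Rust type, variant, and method names are hypothetical and may not match the agent's code.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorKind {
    EnrollmentHostNotAdvertised,
    DnsResolutionFailed,
    UdpUnreachable,
    TlsSanMismatch,
    TlsSpkiPinMismatch,
    QuicHandshakeTimeout,
    RouteAdvertiseTimeout,
    EnrollmentTokenExpired,
    EnrollmentTokenSignatureInvalid,
    UnexpectedError,
}

impl ErrorKind {
    /// Nine named kinds plus the catch-all.
    pub const ALL: [ErrorKind; 10] = [
        ErrorKind::EnrollmentHostNotAdvertised,
        ErrorKind::DnsResolutionFailed,
        ErrorKind::UdpUnreachable,
        ErrorKind::TlsSanMismatch,
        ErrorKind::TlsSpkiPinMismatch,
        ErrorKind::QuicHandshakeTimeout,
        ErrorKind::RouteAdvertiseTimeout,
        ErrorKind::EnrollmentTokenExpired,
        ErrorKind::EnrollmentTokenSignatureInvalid,
        ErrorKind::UnexpectedError,
    ];

    /// Wire name used in the `kind` field of the JSON triple.
    pub fn as_str(self) -> &'static str {
        match self {
            ErrorKind::EnrollmentHostNotAdvertised => "enrollment_host_not_advertised",
            ErrorKind::DnsResolutionFailed => "dns_resolution_failed",
            ErrorKind::UdpUnreachable => "udp_unreachable",
            ErrorKind::TlsSanMismatch => "tls_san_mismatch",
            ErrorKind::TlsSpkiPinMismatch => "tls_spki_pin_mismatch",
            ErrorKind::QuicHandshakeTimeout => "quic_handshake_timeout",
            ErrorKind::RouteAdvertiseTimeout => "route_advertise_timeout",
            ErrorKind::EnrollmentTokenExpired => "enrollment_token_expired",
            ErrorKind::EnrollmentTokenSignatureInvalid => "enrollment_token_signature_invalid",
            ErrorKind::UnexpectedError => "unexpected_error",
        }
    }

    /// Look up a kind by its wire name; unknown names yield `None`.
    pub fn from_wire(name: &str) -> Option<ErrorKind> {
        ErrorKind::ALL.iter().copied().find(|k| k.as_str() == name)
    }
}
```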

On Windows the triple is also written to the Event Log under the
DevolutionsAgent source with kind/detail/next_step as named properties
so monitoring tools can parse failures without scraping text.

…y-url override

After `agent.exe up` succeeds, the agent-tunnel installation now invokes
`agent.exe verify-tunnel --timeout 10` and only reports success if a real
QUIC handshake + RouteAdvertise/Heartbeat round-trip completes. The CA
parses the structured JSON triple from agent stderr and surfaces
`{kind, detail, next_step}` via `session.Message(InstallMessage.Error, ...)`
so installer failure dialogs contain an actionable next step instead of
"setup failed".

The 10s timeout is hardcoded by design (no MSI property, no escape
hatch); a few extra seconds of wall-clock budget guard against a
misbehaving process. MSI rollbacks engage on any non-zero exit.
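The consumer side of the stderr contract, as described for the custom action above, could be sketched like this. The real custom action is C# inside the MSI; this Rust version only illustrates the logic. `Triple`, `string_field`, `parse_last_line`, and `dialog_message` are hypothetical names, the dialog wording is invented, and the field extractor is a deliberately minimal stand-in for a JSON parser (it does not handle `\uXXXX` escapes).

```rust
pub struct Triple {
    pub kind: String,
    pub detail: String,
    pub next_step: String,
}

/// Read the string value that follows `"key":` in a flat, single-line JSON object.
fn string_field(line: &str, key: &str) -> Option<String> {
    let marker = format!("\"{}\":", key);
    let after = line[line.find(&marker)? + marker.len()..].trim_start();
    let mut chars = after.strip_prefix('"')?.chars();
    let mut value = String::new();
    loop {
        match chars.next()? {
            '"' => return Some(value),
            '\\' => match chars.next()? {
                'n' => value.push('\n'),
                't' => value.push('\t'),
                'r' => value.push('\r'),
                other => value.push(other), // covers \" \\ and \/
            },
            c => value.push(c),
        }
    }
}

/// Take the last non-empty stderr line and try to read the triple from it.
pub fn parse_last_line(stderr: &str) -> Option<Triple> {
    let line = stderr.lines().rev().map(str::trim).find(|l| !l.is_empty())?;
    if !line.starts_with('{') {
        return None;
    }
    Some(Triple {
        kind: string_field(line, "kind")?,
        detail: string_field(line, "detail")?,
        next_step: string_field(line, "next_step")?,
    })
}

/// Text for the error dialog; `None` means verification succeeded.
pub fn dialog_message(exit_code: i32, stderr: &str) -> Option<String> {
    if exit_code == 0 {
        return None;
    }
    Some(match parse_last_line(stderr) {
        Some(t) => format!("{} ({})\nNext step: {}", t.detail, t.kind, t.next_step),
        None => format!("Tunnel verification failed with exit code {}", exit_code),
    })
}
```

Falling back to a generic message when the last line is not a parseable triple keeps the dialog useful even if the agent crashes before emitting its JSON line.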

Also drops the Gateway URL override field from the AgentTunnel dialog
(textbox, label, hint, RowCount/RowStyles, property declaration,
deferred-CA UsesProperties wiring, localization strings in en-us and
fr-fr). With the identity refactor the enrollment JWT is the single
source of truth for the agent-facing URL — overriding it server-side
would defeat the host validation against AdvertisedNames on the gateway.
@irvingoujAtDevolution
Contributor Author

Closing — not authorized; will be reopened after explicit owner approval.
