feat(aws-test-infra): add AWS test infrastructure provisioning action #139

Open
sowmyav27 wants to merge 1 commit into loft-sh:main from sowmyav27:engqa-aws-tool
sowmyav27 commented May 7, 2026

Summary

Fixes: ENGQA-702

A composite action that provisions and tears down AWS test infrastructure (VPC + subnet + IGW + route table + security group + EC2 instances) for e2e workflows. Built as a Go binary using aws-sdk-go-v2, unit-tested with hand-rolled mocks.

Replaces ~150 lines of duplicated Bash + aws-cli that previously lived inline in loft-sh/vcluster-pro's e2e-selinux-support-matrix.yaml and prerelease-vcluster.yaml workflows. A separate vcluster-pro PR migrates those workflows to use this action.

Why

  • De-duplicate ~150 lines of identical AWS provisioning + teardown across two consumer workflows.
  • Make it testable: this addresses DevOps's concern that hundreds of lines of inline Bash had no tests at all.
  • Replace Bash with a typed language — Go, since the consumers already have actions/setup-go in place.

Design

Build-from-source: the action runs go build from src/ on every invocation. Mirrors run-ginkgo's pattern — assumes the consumer has Go available (via actions/setup-go). No separate release artifact lifecycle, no SHA-256 dance, no two-step "merge then tag then PR" coordination. Tag aws-test-infra/v1 is usable immediately after merge.

Two subcommands:

  • provision: VPC/subnet/IGW/route table/SG/EC2/SSM-wait. The optional -ami-architecture and -ami-virtualization-type filters preserve the safety net the original Bash had (they default to x86_64 + hvm in action.yml).
  • cleanup: best-effort direct teardown by ID, plus a tag-based sweep that catches resources from runs that failed before exporting IDs. Sweep errors are logged + swallowed by default to match set +e semantics of the original Bash teardown; -strict-sweep opts back into hard failure.
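The sweep error semantics described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `sweepStep`, `runSweep`, `errDependency`, and the step names are hypothetical, and the steps are stubs that make no AWS calls. It only shows the idea that every step runs in dependency order, failures are logged, and the `strict` argument (standing in for -strict-sweep) decides whether the collected errors are returned or swallowed.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// errDependency is a stand-in error used by the stubbed steps below.
var errDependency = errors.New("dependency violation")

// sweepStep is a hypothetical unit of teardown work (one resource kind).
type sweepStep struct {
	name string
	run  func() error
}

// runSweep runs every step in the given order, even after a failure.
// Failures are always logged. With strict=false they are swallowed
// (similar in spirit to `set +e`); with strict=true they are joined
// and returned so the calling step fails.
func runSweep(steps []sweepStep, strict bool) error {
	var errs []error
	for _, s := range steps {
		if err := s.run(); err != nil {
			log.Printf("sweep: %s failed: %v", s.name, err)
			errs = append(errs, fmt.Errorf("%s: %w", s.name, err))
		}
	}
	if strict {
		return errors.Join(errs...) // nil when errs is empty
	}
	return nil
}

func main() {
	ok := func() error { return nil }
	// Steps are listed in dependency order: instances before the VPC,
	// disassociation before deleting the route table.
	steps := []sweepStep{
		{name: "terminate-instances", run: ok},
		{name: "disassociate-route-table", run: ok},
		{name: "delete-route-table", run: func() error { return errDependency }},
		{name: "delete-vpc", run: ok},
	}
	fmt.Println("lenient:", runSweep(steps, false))
	fmt.Println("strict: ", runSweep(steps, true))
}
```

Continuing past a failed step matters in both modes: a later step may still succeed, and strict mode then reports every failure at once instead of only the first.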

Outputs:

  • Standard: vpc-id, igw-id, subnet-id, route-table-id, route-assoc-id, security-group-id, ami-id, primary-public-ip, instance-ids (CSV).
  • Per-role named: primary-instance-id, worker1-instance-id, worker2-instance-id (covers the common 3-instance case).
  • instance-id-by-role (JSON map): for consumers using arbitrary role names or non-three-instance counts.
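As a rough sketch of how outputs like these could be written, the example below builds the instance-ids CSV and the instance-id-by-role JSON map, then appends them to the file named by GITHUB_OUTPUT. The helper names (`formatOutputs`, `buildOutputs`, `appendOutputs`), the role names, and the instance IDs are assumptions made for illustration; the PR's actual implementation may be structured differently. Values are assumed to be single-line, which holds for compact JSON.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"sort"
	"strings"
)

// formatOutputs renders outputs as name=value lines in sorted key order.
// Values are assumed to be single-line.
func formatOutputs(outputs map[string]string) string {
	keys := make([]string, 0, len(outputs))
	for k := range outputs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&b, "%s=%s\n", k, outputs[k])
	}
	return b.String()
}

// buildOutputs derives the CSV and JSON-map outputs from a role-to-ID map.
// order fixes the CSV ordering, since Go map iteration order is random.
func buildOutputs(idsByRole map[string]string, order []string) (map[string]string, error) {
	ids := make([]string, 0, len(order))
	for _, role := range order {
		id, ok := idsByRole[role]
		if !ok {
			return nil, fmt.Errorf("no instance for role %q", role)
		}
		ids = append(ids, id)
	}
	roleJSON, err := json.Marshal(idsByRole) // map keys are emitted sorted
	if err != nil {
		return nil, err
	}
	return map[string]string{
		"instance-ids":        strings.Join(ids, ","),
		"instance-id-by-role": string(roleJSON),
	}, nil
}

// appendOutputs opens path in append mode so outputs written earlier
// in the same step are preserved.
func appendOutputs(path string, outputs map[string]string) error {
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = f.WriteString(formatOutputs(outputs))
	return err
}

func main() {
	// Hypothetical roles and IDs, chosen to show a non-three-instance case.
	roles := map[string]string{"primary": "i-0aaa", "etcd": "i-0bbb"}
	out, err := buildOutputs(roles, []string{"primary", "etcd"})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	path := os.Getenv("GITHUB_OUTPUT")
	if path == "" {
		fmt.Print(formatOutputs(out)) // local run: print instead of writing
		return
	}
	if err := appendOutputs(path, out); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

A consumer with arbitrary role names could then read a single ID in a workflow expression such as `fromJSON(steps.<id>.outputs.instance-id-by-role).etcd`, where the step ID is whatever the consumer workflow assigns.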

Tests

29 top-level tests / 71 cases / 62.9% coverage. The tests cover:

  • API call ordering and tag application on every resource
  • The "return collected IDs on failure" contract that lets cleanup tear down failed provisions
  • Tag-based sweep correctness (filter scope, dependency ordering)
  • Strict-sweep error semantics for if: always() cleanup steps
  • Dependency-order strict checks (disassoc-before-delete, terminate-before-VPC-delete)
  • Ingress encoding round-trip
  • Flag validation
  • Output format wiring (GITHUB_OUTPUT append-mode, JSON map for arbitrary roles)
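To illustrate two of the items above (API call ordering and the "return collected IDs on failure" contract), here is a minimal sketch of a hand-rolled mock. It is not taken from this PR: `networkAPI` is a hypothetical, heavily narrowed interface standing in for whatever wrapper the action puts around the aws-sdk-go-v2 EC2 client, and the `ResourceIDs` fields are assumptions. The mock records each call and can be told to fail the subnet step.

```go
package main

import (
	"errors"
	"fmt"
)

// networkAPI is a hypothetical narrowed interface; a real implementation
// would delegate to the AWS SDK, while tests substitute the mock below.
type networkAPI interface {
	CreateVPC() (string, error)
	CreateSubnet(vpcID string) (string, error)
}

// ResourceIDs holds whatever was created so far (fields are illustrative).
type ResourceIDs struct {
	VPCID, SubnetID string
}

// provision returns the IDs collected so far even when it fails,
// so a later cleanup can still tear down the partial result.
func provision(api networkAPI) (ResourceIDs, error) {
	var ids ResourceIDs
	vpcID, err := api.CreateVPC()
	if err != nil {
		return ids, fmt.Errorf("create vpc: %w", err)
	}
	ids.VPCID = vpcID
	subnetID, err := api.CreateSubnet(vpcID)
	if err != nil {
		return ids, fmt.Errorf("create subnet: %w", err)
	}
	ids.SubnetID = subnetID
	return ids, nil
}

// mockAPI is a hand-rolled mock: it records call order and can inject
// a failure into the subnet step.
type mockAPI struct {
	calls      []string
	failSubnet bool
}

func (m *mockAPI) CreateVPC() (string, error) {
	m.calls = append(m.calls, "CreateVPC")
	return "vpc-1", nil
}

func (m *mockAPI) CreateSubnet(vpcID string) (string, error) {
	m.calls = append(m.calls, "CreateSubnet")
	if m.failSubnet {
		return "", errors.New("subnet quota exceeded")
	}
	return "subnet-1", nil
}

func main() {
	m := &mockAPI{failSubnet: true}
	ids, err := provision(m)
	// The VPC ID survives the failed subnet call, and the recorded
	// calls show the VPC was created before the subnet was attempted.
	fmt.Println(m.calls, ids.VPCID, err != nil)
}
```

In a real test file the same checks would live in a `TestXxx(t *testing.T)` function and compare `m.calls` against the expected sequence, which is also how a disassoc-before-delete ordering check could be expressed.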

Test plan

  • PR-time CI green (test-aws-test-infra.yaml: go test ./... + build verification)
  • After merge, push tag aws-test-infra/v1 so the consumer PR (vcluster-pro engqa-aws-mig) can reference it
  • Validate end-to-end via workflow_dispatch on e2e-selinux-support-matrix.yaml in vcluster-pro after the consumer PR merges (matrix runs on all 3 distros, cleanup leaves no orphans)
  • Validate via workflow_dispatch on prerelease-vcluster.yaml (Kind shared-HA + EC2 standalone paths both pass, teardown succeeds)

Composite action that provisions and tears down AWS test infra (VPC +
subnet + IGW + route table + security group + EC2 instances) for e2e
workflows. Built as a Go binary using aws-sdk-go-v2 with hand-rolled
mocks for unit testing.

Replaces ~150 lines of duplicated Bash + aws-cli inline in
loft-sh/vcluster-pro's e2e-selinux-support-matrix.yaml and
prerelease-vcluster.yaml workflows.

Build-from-source: action runs `go build` from src/ on every invocation.
Mirrors run-ginkgo's pattern — assumes the consumer has Go available
(via actions/setup-go). No separate release artifact lifecycle.

Two subcommands:
- provision: VPC/subnet/IGW/route table/SG/EC2/SSM-wait, with optional
  AMI architecture/virtualization-type filter for safety-net AMI lookup.
- cleanup: best-effort direct teardown by ID, plus tag-based sweep that
  catches resources from runs that failed before exporting IDs. Sweep
  errors are logged + swallowed by default to match `set +e` semantics
  of the original Bash teardown; -strict-sweep opts back into hard fail.

Outputs: vpc-id, igw-id, subnet-id, route-table-id, route-assoc-id,
security-group-id, ami-id, primary-public-ip, instance-ids (CSV),
named per-role instance IDs (primary/worker1/worker2), and a JSON map
output (instance-id-by-role) for consumers using arbitrary role names
or non-three-instance counts.

Tests (29 top-level / 71 cases / 62.9% coverage) cover API call
ordering, tag application on every resource, the partial-failure
ResourceIDs contract that lets cleanup tear down failed provisions,
tag-based sweep correctness, dependency-order strict checks for
disassoc-before-delete pairs, ingress encoding round-trip, flag
validation, and output format wiring.