8 changes: 7 additions & 1 deletion .github/workflows/build.yaml
@@ -17,6 +17,8 @@ jobs:
with:
node-version: 24
cache: npm
- run: node --version && npm --version
name: Log Node and npm versions
- run: npm ci
name: Install dependencies
- run: npm run bootstrap
@@ -40,6 +42,8 @@ jobs:
with:
node-version: 24
cache: npm
- run: node --version && npm --version
name: Log Node and npm versions
- run: npm ci
name: Install dependencies
- run: npm run bootstrap
@@ -64,7 +68,9 @@ jobs:
- uses: helm/kind-action@v1.12.0
with:
config: packages/k8s/tests/test-kind.yaml
- run: npm install
- run: node --version && npm --version
name: Log Node and npm versions
- run: npm ci
name: Install dependencies
- run: npm run bootstrap
name: Bootstrap the packages
56 changes: 56 additions & 0 deletions docs/adrs/0135-rwx-volume-strategy.md
@@ -0,0 +1,56 @@
# ADR 0135: RWX volume strategy and RWO affinity fallback

**Date:** 22 April 2026

**Status:** Accepted

## Context

The Kubernetes hook implementation for GitHub Actions runners requires access to the runner's working directory (`_work`) within the dynamically created job pods. This shared access is typically managed via Persistent Volume Claims (PVCs).

The choice of access mode significantly impacts pod scheduling: `ReadWriteOnce` (RWO) volumes require job pods to be co-located on the same node as the runner pod, while `ReadWriteMany` (RWX) volumes allow job pods to be scheduled freely across the cluster.

Depending on the storage provider and cluster configuration, operators may choose either access mode. RWX is preferred because it lets the Kubernetes scheduler place job pods on any available node, improving resource utilization and cluster flexibility; RWO pins every pod that uses the volume to a single node at a time.

## Decision

We have decided to establish `ReadWriteMany` (RWX) as the preferred storage strategy for the Kubernetes hook. RWX provides superior operational flexibility by enabling free scheduling of job pods across the cluster, as the shared volume is accessible from any node. This decoupling of job pods from the runner's node allows for better resource distribution and reduces the risk of node-level resource exhaustion.

For environments where RWX is unavailable or undesirable, we support a `ReadWriteOnce` (RWO) fallback strategy. This fallback is implemented using node affinity to ensure that job pods are scheduled onto the same node as the runner pod that holds the RWO volume.

### Operational Guidance

1. **Preferred Model (RWX):** Operators should configure the runner with a PVC supporting `ReadWriteMany`.
2. **Fallback Model (RWO):** If using `ReadWriteOnce`, set `ACTIONS_RUNNER_HOOK_RWO=true` to enforce same-node scheduling.
3. **Node Selection:** By default, the hook applies a `preferredDuringSchedulingIgnoredDuringExecution` node affinity targeting the runner's current node (`kubernetes.io/hostname`). With `ACTIONS_RUNNER_HOOK_RWO=true`, this becomes `requiredDuringSchedulingIgnoredDuringExecution` (see the sketch after this list).
4. **Implementation Details:**
- The hook determines the node name via `getCurrentNodeName()` and applies affinity in `packages/k8s/src/k8s/index.ts` (lines 101, 165).
- `ACTIONS_RUNNER_HOOK_RWO=true` enables required same-node affinity, as defined in `packages/k8s/src/k8s/utils.ts`.
- The PVC claim name defaults to `${ACTIONS_RUNNER_POD_NAME}-work` unless overridden by `ACTIONS_RUNNER_CLAIM_NAME` (`packages/k8s/src/hooks/constants.ts`, lines 27-33).
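
For illustration, a minimal sketch of the node affinity described above, using plain Kubernetes Pod spec fields (the exact structure generated by the hook may differ, and `<runner-node-name>` is a placeholder):

```yaml
# Default behavior: preferred same-node affinity (RWX-friendly)
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - <runner-node-name>
---
# With ACTIONS_RUNNER_HOOK_RWO=true: required same-node affinity
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - <runner-node-name>
```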

### Non-Recommendations

We explicitly do **not** recommend the use of `spec.nodeName` for operator-driven scheduling. The hook relies on scheduler-based affinity (`preferred` by default, `required` when `ACTIONS_RUNNER_HOOK_RWO=true`) to keep scheduling decisions scheduler-aware and avoid hard node pinning.

## Alternatives

- **nodeName Bypass:** Directly setting `nodeName` bypasses the scheduler entirely. This was rejected as a recommendation because it prevents the scheduler from accounting for taints, tolerations, and resource pressure.
- **Local Volumes:** Using local volumes tied to specific nodes. This is a subset of the RWO fallback and is supported via the affinity mechanism.

## Consequences

- **Flexibility:** RWX users benefit from the ability to schedule job pods on any node in the cluster, maximizing resource utilization.
- **Node Coupling:** RWO users remain coupled to the node where the runner pod is running when `ACTIONS_RUNNER_HOOK_RWO=true`. The hook ensures job pods are scheduled on the same node via required affinity to maintain workspace integrity.
- **Configuration:** The default behavior is preferred same-node affinity. Operators relying on strict RWO semantics should set `ACTIONS_RUNNER_HOOK_RWO=true` for required same-node affinity; RWX setups need no additional configuration for basic operation.

## Migration Guidance

Operators whose RWO setup relied on the legacy `nodeName` behavior should move to scheduler-based affinity:
1. Remove `ACTIONS_RUNNER_DISABLE_KUBE_SCHEDULER` if present.
2. Set `ACTIONS_RUNNER_HOOK_RWO=true` to enforce required same-node affinity.
3. Verify that the runner's ServiceAccount has the necessary permissions to list pods (to determine its own node); a minimal Role for this is sketched after this list.
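
As an illustrative sketch only (the resource name is a placeholder, and the full set of permissions the hook needs is documented in `packages/k8s/README.md`), a Role granting the pod-listing access mentioned in step 3 could look like:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: runner-hook-pod-reader   # placeholder name
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list"]
```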

## Non-Goals

- This ADR does not recommend `nodeName` as a primary or secondary configuration path for operators.
- This ADR does not dictate specific storage providers (e.g., EBS vs. EFS vs. Azure Files), but rather the access mode strategy.
4 changes: 3 additions & 1 deletion package.json
@@ -8,7 +8,9 @@
},
"scripts": {
"test": "npm run test --prefix packages/docker && npm run test --prefix packages/k8s",
"bootstrap": "npm install --prefix packages/hooklib && npm ci --prefix packages/k8s && npm ci --prefix packages/docker",
"test:docker": "npm run test --prefix packages/docker",
"test:k8s": "npm run test --prefix packages/k8s",
"bootstrap": "npm ci --prefix packages/hooklib && npm ci --prefix packages/k8s && npm ci --prefix packages/docker",
"format": "prettier --write '**/*.ts'",
"format-check": "prettier --check '**/*.ts'",
"lint": "eslint packages/**/*.ts",
22 changes: 22 additions & 0 deletions packages/k8s/README.md
@@ -30,6 +30,28 @@ rules:
- The `ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER` env should be set to true to prevent the runner from running any jobs outside of a container
- The runner pod should map a persistent volume claim into the `_work` directory
- The `ACTIONS_RUNNER_CLAIM_NAME` env should be set to the persistent volume claim that contains the runner's working directory, otherwise it defaults to `${ACTIONS_RUNNER_POD_NAME}-work`
- By default, the hook uses the Kubernetes scheduler and sets a *preferred* `nodeAffinity` to the runner node (`kubernetes.io/hostname`) without requiring it.
- Setting `ACTIONS_RUNNER_HOOK_RWO=true` switches this to a *required* node affinity, ensuring scheduling on the same node as the runner pod (recommended for `ReadWriteOnce` volumes).

## Storage Guidance
The k8s hooks require a volume shared between the runner pod and the job pods, which carries the workspace and other internal directories.

### RWX (Recommended)
The preferred way to configure storage is using a `ReadWriteMany` (RWX) Persistent Volume Claim. RWX allows the Kubernetes scheduler to place job pods on any node in the cluster, maximizing resource availability and flexibility.

To migrate from RWO to RWX:
1. Provision a new `ReadWriteMany` StorageClass if one is not available.
2. Update your PVC definition to use `accessModes: [ReadWriteMany]`.
3. No additional environment variables are needed; preferred same-node affinity is the default. An example PVC is sketched below.
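
A minimal PVC sketch, assuming an RWX-capable StorageClass is available (names and sizes are placeholders):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: runner-work              # reference this via ACTIONS_RUNNER_CLAIM_NAME if it differs from the default
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: my-rwx-class # placeholder: any RWX-capable class (e.g. NFS- or CSI-backed)
  resources:
    requests:
      storage: 10Gi
```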

### RWO Fallback (Affinity-based)
If `ReadWriteMany` storage is not available, you can use `ReadWriteOnce` (RWO) storage. In this mode, all job pods must be scheduled on the same node as the runner pod that owns the PVC.

To enable this safely:
1. Set `ACTIONS_RUNNER_HOOK_RWO=true`.
2. The hooks will add a required `nodeAffinity` to job pods, ensuring they are scheduled on the same node as the runner pod (`kubernetes.io/hostname` match), as sketched below.
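
For illustration, a fragment of a runner pod spec with the RWO fallback enabled (the image, claim name, and mount path are placeholders and depend on your runner setup):

```yaml
containers:
  - name: runner
    image: <runner-image>              # placeholder
    env:
      - name: ACTIONS_RUNNER_HOOK_RWO
        value: "true"
      - name: ACTIONS_RUNNER_CLAIM_NAME
        value: runner-work             # RWO PVC holding the _work directory
    volumeMounts:
      - name: work
        mountPath: /home/runner/_work  # assumed mount point for the work directory
volumes:
  - name: work
    persistentVolumeClaim:
      claimName: runner-work
```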

> **Note:** We do not recommend manually setting `nodeName` in the pod template, as the hooks handle node placement automatically via affinity.
- Some actions runner env's are expected to be set. These are set automatically by the runner.
- `RUNNER_WORKSPACE` is expected to be set to the workspace of the runner
- `GITHUB_WORKSPACE` is expected to be set to the workspace of the job
77 changes: 29 additions & 48 deletions packages/k8s/src/hooks/prepare-job.ts
@@ -1,4 +1,5 @@
import * as core from '@actions/core'
import * as io from '@actions/io'
import * as k8s from '@kubernetes/client-node'
import {
JobContainerInfo,
@@ -7,33 +8,26 @@ import {
writeToResponseFile,
ServiceContainerInfo
} from 'hooklib'
import path from 'path'
import {
containerPorts,
createJobPod,
createPod,
isPodContainerAlpine,
prunePods,
waitForPodPhases,
getPrepareJobTimeoutSeconds,
execCpToPod,
execPodStep
getPrepareJobTimeoutSeconds
} from '../k8s'
import {
CONTAINER_VOLUMES,
containerVolumes,
DEFAULT_CONTAINER_ENTRY_POINT,
DEFAULT_CONTAINER_ENTRY_POINT_ARGS,
generateContainerName,
mergeContainerWithOptions,
readExtensionFromFile,
PodPhase,
fixArgs,
prepareJobScript
fixArgs
} from '../k8s/utils'
import {
CONTAINER_EXTENSION_PREFIX,
getJobPodName,
JOB_CONTAINER_NAME
} from './constants'
import { dirname } from 'path'
import { CONTAINER_EXTENSION_PREFIX, JOB_CONTAINER_NAME } from './constants'

export async function prepareJob(
args: PrepareJobArgs,
@@ -46,6 +40,7 @@ export async function prepareJob(
await prunePods()

const extension = readExtensionFromFile()
await copyExternalsToRoot()

let container: k8s.V1Container | undefined = undefined
if (args.container?.image) {
@@ -75,8 +70,7 @@

let createdPod: k8s.V1Pod | undefined = undefined
try {
createdPod = await createJobPod(
getJobPodName(),
createdPod = await createPod(
container,
services,
args.container.registry,
@@ -96,13 +90,6 @@
`Job pod created, waiting for it to come online ${createdPod?.metadata?.name}`
)

const runnerWorkspace = dirname(process.env.RUNNER_WORKSPACE as string)

let prepareScript: { containerPath: string; runnerPath: string } | undefined
if (args.container?.userMountVolumes?.length) {
prepareScript = prepareJobScript(args.container.userMountVolumes || [])
}

try {
await waitForPodPhases(
createdPod.metadata.name,
@@ -115,28 +102,6 @@
throw new Error(`pod failed to come online with error: ${err}`)
}

await execCpToPod(createdPod.metadata.name, runnerWorkspace, '/__w')

if (prepareScript) {
await execPodStep(
['sh', '-e', prepareScript.containerPath],
createdPod.metadata.name,
JOB_CONTAINER_NAME
)

const promises: Promise<void>[] = []
for (const vol of args?.container?.userMountVolumes || []) {
promises.push(
execCpToPod(
createdPod.metadata.name,
vol.sourceVolumePath,
vol.targetVolumePath
)
)
}
await Promise.all(promises)
}

core.debug('Job pod is ready for traffic')

let isAlpine = false
@@ -180,8 +145,10 @@ function generateResponseFile(
const mainContainerContextPorts: ContextPorts = {}
if (mainContainer?.ports) {
for (const port of mainContainer.ports) {
mainContainerContextPorts[port.containerPort] =
mainContainerContextPorts.hostPort
if (port.containerPort && port.hostPort) {
mainContainerContextPorts[port.containerPort.toString()] =
port.hostPort.toString()
}
}
}

@@ -217,6 +184,17 @@ function generateResponseFile(
writeToResponseFile(responseFile, JSON.stringify(response))
}

async function copyExternalsToRoot(): Promise<void> {
const workspace = process.env['RUNNER_WORKSPACE']
if (workspace) {
await io.cp(
path.join(workspace, '../../externals'),
path.join(workspace, '../externals'),
{ force: true, recursive: true, copySourceDirectory: false }
)
}
}

export function createContainerSpec(
container: JobContainerInfo | ServiceContainerInfo,
name: string,
@@ -250,7 +228,7 @@ export function createContainerSpec(
container['environmentVariables'] || {}
)) {
if (value && key !== 'HOME') {
podContainer.env.push({ name: key, value })
podContainer.env.push({ name: key, value: value })
}
}

@@ -266,7 +244,10 @@ export function createContainerSpec(
})
}

podContainer.volumeMounts = CONTAINER_VOLUMES
podContainer.volumeMounts = containerVolumes(
container['userMountVolumes'],
jobContainer
)

if (!extension) {
return podContainer
Expand Down