feat: background workers = non-HTTP workers with shared state #2287

Open
nicolas-grekas wants to merge 1 commit into php:main from nicolas-grekas:sidekicks

@nicolas-grekas
Contributor

@nicolas-grekas nicolas-grekas commented Mar 16, 2026

Summary

Background workers are long-running PHP workers that run outside the HTTP cycle. They observe their environment (Redis, DB, filesystem, etc.) and publish configuration that HTTP workers read per-request - enabling real-time reconfiguration without restarts or polling.

A whiteboard for apps

PHP API

  • frankenphp_worker_set_vars(array $vars) - publishes config from a background worker (persistent memory, cross-thread). Skips all work when data is unchanged (=== check).
  • frankenphp_worker_get_vars(string|array $name, float $timeout = 30.0) - reads config from HTTP workers (blocks until first publish, generational cache)
  • frankenphp_worker_get_signaling_stream() - returns a pipe-based stream for stream_select() integration (cooperative shutdown)

Caddyfile configuration

php_server {
    # HTTP worker (unchanged)
    worker public/index.php { num 4 }

    # Named background worker (auto-started if num >= 1)
    worker bin/worker.php { background; name config-watcher }

    # Catch-all for lazy-started names
    worker bin/worker.php { background }
}
  • background marks a worker as non-HTTP
  • name specifies an exact worker name; workers without name are catch-all for lazy-started names
  • Not declaring a catch-all forbids lazy-started ones
  • max_threads on catch-all sets a safety cap for lazy-started instances (defaults to 16)
  • max_consecutive_failures defaults to 6 (same as HTTP workers)
  • max_execution_time automatically disabled for background workers
  • Each php_server block has its own isolated scope - workers in different blocks can use the same names without conflict
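As a rough sketch of the name-resolution rules above (exact names first, then the catch-all, with max_threads as a cap on lazy-started instances), assuming a hypothetical `scope` type that is not taken from the PR:

```go
package main

import (
	"errors"
	"fmt"
)

// scope is a hypothetical model of the background workers of one php_server block.
type scope struct {
	named      map[string]string // worker name -> entrypoint
	catchAll   string            // catch-all entrypoint, "" when none is declared
	maxThreads int               // cap on lazy-started instances
	lazy       map[string]bool   // names lazy-started so far
}

var (
	errNoCatchAll = errors.New("unknown worker name and no catch-all declared")
	errCapReached = errors.New("max_threads cap reached for lazy-started workers")
)

// resolve returns the entrypoint serving name: a named worker wins,
// otherwise the catch-all is used, subject to the max_threads cap.
func (s *scope) resolve(name string) (string, error) {
	if ep, ok := s.named[name]; ok {
		return ep, nil
	}
	if s.catchAll == "" {
		return "", errNoCatchAll
	}
	if !s.lazy[name] {
		if len(s.lazy) >= s.maxThreads {
			return "", errCapReached
		}
		s.lazy[name] = true
	}
	return s.catchAll, nil
}

func main() {
	s := &scope{
		named:      map[string]string{"config-watcher": "bin/worker.php"},
		catchAll:   "bin/worker.php",
		maxThreads: 16,
		lazy:       map[string]bool{},
	}
	for _, name := range []string{"config-watcher", "redis-watcher"} {
		ep, err := s.resolve(name)
		fmt.Println(name, ep, err)
	}
}
```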

Shutdown

Background workers are stopped cooperatively via the signaling stream: FrankenPHP writes "stop\n" which is picked up by stream_select(). Workers have a 5-second grace period.

After the grace period, a best-effort force-kill is attempted:

  • Linux ZTS: arms PHP's own max_execution_time timer cross-thread via timer_settime(EG(max_execution_timer_timer))
  • Windows: CancelSynchronousIo + QueueUserAPC interrupts blocking I/O and alertable waits
  • macOS: no per-thread mechanism available; stuck threads are abandoned

Architecture

  • Per-php_server scope isolation with internal registry (unexported types, minimal public API)
  • Dedicated backgroundWorkerThread handler implementing threadHandler interface - no isBackgroundWorker checks in HTTP worker code paths
  • drain() on the handler interface for clean shutdown signaling
  • Persistent memory (pemalloc) with RWMutex for safe cross-thread sharing
  • set_vars skip: uses PHP's === (zend_is_identical) to detect unchanged data - skips validation, persistent copy, write lock, and version bump. Cache updated to enable pointer equality on next call.
  • Generational cache: per-thread version check skips lock + copy when data hasn't changed; repeated get_vars calls return the same array instance (=== is O(1))
  • Opcache immutable array zero-copy fast path (IS_ARRAY_IMMUTABLE)
  • Interned string optimizations (ZSTR_IS_INTERNED) - skip copy/free for shared memory strings
  • Rich type support: null, scalars, arrays (nested), enums
  • Signaling stream: pipe-based fd for stream_select() - compatible with amphp/ReactPHP event loops
  • Crash recovery with exponential backoff and automatic restart
  • Background workers integrate with existing worker infrastructure (scaling, thread management)
  • $_SERVER['FRANKENPHP_WORKER_NAME'] set for all workers (HTTP and background)
  • $_SERVER['FRANKENPHP_WORKER_BACKGROUND'] set for all workers (true/false)
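The set_vars skip and the generational cache can be approximated in Go as follows. This is a sketch under stated assumptions, not the PR's code: `shared` and `threadCache` are hypothetical types, `reflect.DeepEqual` stands in for `zend_is_identical`, a single writer is assumed, and the `copies` counter exists only to make the fast path observable.

```go
package main

import (
	"fmt"
	"reflect"
	"sync"
	"sync/atomic"
)

// shared holds the snapshot published by one background worker.
type shared struct {
	mu      sync.RWMutex
	version atomic.Uint64
	vars    map[string]any
}

// set publishes vars unless they equal the current snapshot, in which case
// the write lock and the version bump are skipped. Single writer assumed.
func (s *shared) set(vars map[string]any) bool {
	s.mu.RLock()
	same := reflect.DeepEqual(s.vars, vars)
	s.mu.RUnlock()
	if same {
		return false
	}
	s.mu.Lock()
	s.vars = vars
	s.version.Add(1)
	s.mu.Unlock()
	return true
}

// threadCache is the per-thread copy kept by a reader.
type threadCache struct {
	version uint64
	vars    map[string]any
	copies  int // number of times the slow path ran
}

// get returns the cached map while the version is unchanged, and only
// takes the read lock and copies when a new generation was published.
func (c *threadCache) get(s *shared) map[string]any {
	if c.vars != nil && c.version == s.version.Load() {
		return c.vars
	}
	s.mu.RLock()
	defer s.mu.RUnlock()
	c.version = s.version.Load()
	c.vars = make(map[string]any, len(s.vars))
	for k, v := range s.vars {
		c.vars[k] = v
	}
	c.copies++
	return c.vars
}

func main() {
	s := &shared{}
	c := &threadCache{}
	fmt.Println(s.set(map[string]any{"maintenance": false}))
	c.get(s)
	c.get(s)
	fmt.Println(s.set(map[string]any{"maintenance": false}))
	fmt.Println(c.copies, s.version.Load())
}
```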

Example

// Background worker: polls Redis every 5s via signaling stream
$signalingStream = frankenphp_worker_get_signaling_stream();
$redis = new Redis();
$redis->connect('127.0.0.1');

frankenphp_worker_set_vars([
    'maintenance' => (bool) $redis->get('maintenance_mode'),
    'feature_flags' => json_decode($redis->get('features'), true),
]);

while (true) {
    $r = [$signalingStream]; $w = $e = [];
    if (false === @stream_select($r, $w, $e, 5)) { break; }
    if ($r && "stop\n" === fgets($signalingStream)) { break; }

    frankenphp_worker_set_vars([
        'maintenance' => (bool) $redis->get('maintenance_mode'),
        'feature_flags' => json_decode($redis->get('features'), true),
    ]);
}

// HTTP worker
$config = frankenphp_worker_get_vars('config-watcher');
if ($config['maintenance']) {
    return new Response('Down for maintenance', 503);
}

Test coverage

16 tests covering: basic vars, at-most-once start, validation, type support (enums, binary-safe strings), multiple background workers, multiple entrypoints, crash restart, signaling stream, worker restart lifecycle, non-background-worker error handling, identity detection, generational cache.

All tests pass on PHP 8.2, 8.3, 8.4, and 8.5 with -race. Zero memory leaks on PHP debug builds.

Documentation

Full docs at docs/background-workers.md.

@nicolas-grekas nicolas-grekas force-pushed the sidekicks branch 4 times, most recently from e1655ab to 867e9b3 Compare March 16, 2026 20:26
@AlliBalliBaba
Contributor

AlliBalliBaba commented Mar 16, 2026

Interesting approach to parallelism. What would be a concrete use case for letting information flow only one way, from the sidekick to the HTTP workers?

Usually the flow would be inverted: an HTTP worker offloads work to a pool of 'sidekick' workers and can optionally wait for a task to complete.

@nicolas-grekas nicolas-grekas force-pushed the sidekicks branch 2 times, most recently from da54ab8 to a06ba36 Compare March 16, 2026 21:45
@henderkes
Contributor

Thank you for the contribution. Interesting idea, but I'm thinking we should merge the approach with #1883. The kind of worker is the same, how they are started is but a detail.

@nicolas-grekas the Caddyfile setting should likely be per php_server, not a global setting.

@nicolas-grekas nicolas-grekas force-pushed the sidekicks branch 7 times, most recently from ad71bfe to 05e9702 Compare March 17, 2026 08:03
@nicolas-grekas
Contributor Author

nicolas-grekas commented Mar 17, 2026

@AlliBalliBaba The use case isn't task offloading (HTTP->worker), but out-of-band reconfigurability (environment->worker->HTTP). Sidekicks observe external systems (Redis Sentinel failover, secret rotation, feature flag changes, etc.) and publish updated configuration that HTTP workers pick up on their next request, with per-request consistency guaranteed via $_SERVER injection. No polling, no TTLs, no redeployment.

Task offloading (what you describe) is a valid and complementary pattern, but it solves a different problem. The non-HTTP worker foundation here could support both.

@henderkes Agreed that the underlying non-HTTP worker type overlaps with #1883. The foundation (skip HTTP startup/shutdown, immediate readiness, cooperative shutdown) is the same. The difference is the API layer and the DX goals:

  • Minimal FrankenPHP config: a single sidekick_entrypoint in php_server (thanks for the idea). No need to declare individual workers in the Caddyfile. The PHP app controls which sidekicks to start via frankenphp_sidekick_start(), keeping the infrastructure config simple.

  • Graceful degradability: apps should work correctly with or without FrankenPHP. The same codebase should work on FrankenPHP (with real-time reconfiguration) and on traditional setups (with static or always refreshed config).

  • Nice framework integration: the sidekick_entrypoint pointing to e.g. bin/console means sidekicks are regular framework commands, making them easy to develop.

Happy to follow up with your proposals now that this is hopefully clarified.
I'm going to continue on my own a bit also :)

@dunglas
Member

dunglas commented Mar 17, 2026

Great PR!

Couldn't we create a single API that covers both use cases?

We try to keep the number of public symbols and config options as small as possible!

@henderkes
Contributor

> @henderkes Agreed that the underlying non-HTTP worker type overlaps with #1883. The foundation (skip HTTP startup/shutdown, immediate readiness, cooperative shutdown) is the same. The difference is the API layer and the DX goals:

Yes, that's why I'd like to unify the two APIs and background implementations into one. Unfortunately the first task worker attempt didn't make it into main, but perhaps @AlliBalliBaba can use his experience with the previous PR to influence this one. I'd be more in favour of a general API than a specific sidecar one.

@nicolas-grekas
Contributor Author

The PHP-side API has been significantly reworked since the initial iteration: I replaced $_SERVER injection with an explicit get_vars/set_vars protocol.

The old design used frankenphp_set_server_var() to inject values into $_SERVER implicitly. The new design uses an explicit request/response model:

  • frankenphp_sidekick_set_vars(array $vars): called from the sidekick to publish a complete snapshot atomically
  • frankenphp_sidekick_get_vars(string|array $name, float $timeout = 30.0): array: called from HTTP workers to read the latest vars

Key improvements:

  • No race condition on startup: get_vars blocks until the sidekick has called set_vars. The old design had a race where HTTP requests could arrive before the sidekick had published its values.
  • Strict context enforcement: set_vars and should_stop throw RuntimeException if called from a non-sidekick context.
  • Atomic snapshots: set_vars replaces all vars at once. No partial state is possible.
  • Parallel start: get_vars(['redis-watcher', 'feature-flags']) starts all sidekicks concurrently, waits for all, returns vars keyed by name.
  • Works in both worker and non-worker mode: get_vars works from any PHP script served by php_server, not just from frankenphp_handle_request() workers.
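The "parallel start" behaviour of get_vars with an array of names can be pictured with this Go sketch. `getVarsAll` and the `fetch` callback are hypothetical: `fetch` stands in for "start the sidekick if needed and block until its first set_vars", which the sketch simulates with a short sleep.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// getVarsAll calls fetch for every name concurrently, waits for all of them,
// and returns the results keyed by worker name.
func getVarsAll(names []string, fetch func(name string) map[string]any) map[string]map[string]any {
	var (
		mu sync.Mutex
		wg sync.WaitGroup
	)
	out := make(map[string]map[string]any, len(names))
	for _, name := range names {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			vars := fetch(name)
			mu.Lock()
			out[name] = vars
			mu.Unlock()
		}(name)
	}
	wg.Wait()
	return out
}

func main() {
	fetch := func(name string) map[string]any {
		time.Sleep(50 * time.Millisecond) // simulated startup and first publish
		return map[string]any{"source": name}
	}
	start := time.Now()
	all := getVarsAll([]string{"redis-watcher", "feature-flags"}, fetch)
	fmt.Println(len(all), "workers ready after", time.Since(start).Round(10*time.Millisecond))
}
```

Because the waits overlap, the total latency tends toward that of the slowest sidekick rather than the sum of all of them.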

Other changes:

  • sidekick_entrypoint moved from global frankenphp block to per-php_server (as @henderkes suggested)
  • Removed the $argv parameter: the sidekick name is the command, passed as $_SERVER['argv'][1]
  • set_vars is restricted to sidekick context only (throws if called from HTTP workers)
  • get_vars accepts string|array: when given an array, all sidekicks start in parallel
  • Atomic snapshots: set_vars replaces all vars at once, no partial state
  • Binary-safe values (null bytes, UTF-8)

@nicolas-grekas nicolas-grekas force-pushed the sidekicks branch 3 times, most recently from cb65f46 to 4dda455 Compare March 17, 2026 10:46
@nicolas-grekas
Contributor Author

Thanks @dunglas and @henderkes for the feedback. I share the goal of keeping the API surface minimal.

Thinking about it more, the current API is actually quite small and already general:

  • 1 Caddyfile setting: sidekick_entrypoint (per php_server)
  • 3 PHP functions: get_vars, set_vars, should_stop

The name "sidekick" works as a generic concept: a helper running alongside. The current set_vars/get_vars protocol covers the config-publishing use case. For task offloading (HTTP->worker) later, the same sidekick infrastructure could support:

  • frankenphp_sidekick_send_task(string $name, mixed $payload): mixed
  • frankenphp_sidekick_receive_task(): mixed

Same worker type, same sidekick_entrypoint, same should_stop(). Just a different communication pattern added on top. No new config, no new worker type.

So the path would be:

  1. This PR: sidekicks with set_vars/get_vars (config publishing)
  2. Future PR: add send_task/receive_task (task offloading), reusing the same non-HTTP worker foundation

The foundation (non-HTTP threads, cooperative shutdown, crash recovery, per-php_server scoping) is shared. Only the communication primitives differ.

WDYT?

@nicolas-grekas nicolas-grekas force-pushed the sidekicks branch 4 times, most recently from b3734f5 to ed79f46 Compare March 17, 2026 11:48
@nicolas-grekas
Contributor Author

nicolas-grekas commented Mar 17, 2026

I think the failures are unrelated - a cache reset would be needed. Any help on this topic?

@alexandre-daubois
Member

alexandre-daubois commented Mar 17, 2026

Hmm, it seems they are on some versions, for example here: https://github.com/php/frankenphp/actions/runs/23192689128/job/67392820942?pr=2287#step:10:3614

For the cache, I'm not aware of a GitHub feature that allows clearing everything, unfortunately 🙁

@nicolas-grekas
Contributor Author

Split zones implemented, see last commit.

@AlliBalliBaba
Contributor

I think it would still be nice to declare the workers in the global scope so you could also have a process only running background scripts. Doesn't need to be in this PR though.

@henderkes
Contributor

> I think it would still be nice to declare the workers in the global scope so you could also have a process only running background scripts. Doesn't need to be in this PR though.

If it doesn't have to be connected to any specific application, https://github.com/Baldinof/caddy-supervisor should do the trick. Also still using it to run my messenger queues.

@nicolas-grekas
Contributor Author

We actually do support this already. Workers declared in the global frankenphp block get their own scope (empty string) and the lookup falls back to it when no php_server scope matches. So a FrankenPHP instance with only global background workers (no php_server) would work: get_vars resolves against the global scope.

What we don't do (by design) is let php_server-scoped workers access global-scope background workers. Each scope is isolated. If that cross-scope access is needed in the future, we could add a fallback chain (check own scope first, then global), but I'd rather keep it strict for now.
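One way to read the lookup rule described above, as a Go sketch: the `registry` type and `has` method are hypothetical, and the empty-string key models the global frankenphp block. A known php_server scope only sees its own workers; the global scope is consulted only when the requested scope does not exist.

```go
package main

import "fmt"

// registry maps a scope name to the set of background worker names declared in it.
type registry map[string]map[string]bool

// globalScope is the scope of workers declared in the global frankenphp block.
const globalScope = ""

// has reports whether name is visible from scope. There is no fallback chain:
// a php_server scope that exists never sees global-scope workers.
func (r registry) has(scope, name string) bool {
	workers, ok := r[scope]
	if !ok {
		workers = r[globalScope] // no matching php_server scope: use the global one
	}
	return workers[name]
}

func main() {
	r := registry{
		globalScope: {"metrics": true},
		"app-a":     {"config-watcher": true},
	}
	fmt.Println(r.has("app-a", "config-watcher")) // declared in its own scope
	fmt.Println(r.has("app-a", "metrics"))        // global worker, isolated from app-a
	fmt.Println(r.has(globalScope, "metrics"))    // resolved against the global scope
}
```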

Do you have something else in mind?

@henderkes
Contributor

> We actually do support this already. Workers declared in the global frankenphp block get their own scope (empty string) and the lookup falls back to it when no php_server scope matches. So a FrankenPHP instance with only global background workers (no php_server) would work: get_vars resolves against the global scope.

I think that's a great compromise.

> Do you have something else in mind?

I'm mostly on-board with the concept now, but there's still one thing I'd like to see solved a bit better: it's an issue that workers are implicitly marked as ready. I would much prefer that a background worker must explicitly reach some function for it to be considered in a ready state. This goes together with my intuition that streams are just a tiny bit too complicated for most people for this to be the best option.

I don't really want to circle back to a frankenphp_should_stop function, but it would solve both. Any other ideas?

@nicolas-grekas
Contributor Author

nicolas-grekas commented Mar 25, 2026

Isn't calling set_vars the ready-state we already have?

@nicolas-grekas
Contributor Author

Note that I would love to have workers available for CLI scripts. Something that's entirely skipped for now.
I'd leave this for a future PR.

@henderkes
Contributor

> Isn't calling set_vars the ready-state we already have?

You're absolutely right. The point about the stream logic being quite low level for users still stands, though.

> Note that I would love to have workers available for CLI scripts. Something that's entirely skipped for now.
> I'd leave this for a future PR.

php/php-src#21354

Should wait for that first. The current solution with an emulated cli through the embed sapi is a bit hacky and causes issues. I'd prefer not to add much to it until we get a proper cli sapi embedded.

frankenphp.c Outdated
Comment on lines +880 to +903
static bool bg_worker_validate_zval(zval *z) {
  switch (Z_TYPE_P(z)) {
  case IS_NULL:
  case IS_FALSE:
  case IS_TRUE:
  case IS_LONG:
  case IS_DOUBLE:
  case IS_STRING:
    return true;
  case IS_OBJECT:
    return (Z_OBJCE_P(z)->ce_flags & ZEND_ACC_ENUM) != 0;
  case IS_ARRAY: {
    zval *val;
    ZEND_HASH_FOREACH_VAL(Z_ARRVAL_P(z), val) {
      if (!bg_worker_validate_zval(val))
        return false;
    }
    ZEND_HASH_FOREACH_END();
    return true;
  }
  default:
    return false;
  }
}
Contributor

Theoretically, if validation fails, we could also fall back to serialization.

Contributor Author

@nicolas-grekas nicolas-grekas Mar 26, 2026

That'd break SRP, I explained this above. Practically, that'd mean set_vars() could throw unexpected exceptions. Better to tell people they should split codepaths (serialize on their own, or use Symfony's DeepCloner or similar).

@dbu
Contributor

dbu commented Mar 27, 2026

after discussion with @nicolas-grekas today at symfony live, i tried out if we could offer a polyfill for this functionality. it is still very rough, but it shows the basic PoC by writing to a filesystem: https://github.com/dbu/shared-state-polyfill

i'd be happy to evolve it a bit after getting input, though it's probably best to wait until this PR is merged or at least the contract is sure to be final.

(i don't want to hijack this thread, please open issues in my PoC repository to discuss the polyfill. @nicolas-grekas if you could review the basic thing before the frankenphp changes get merged, we can do a sanity check if we maybe should change something of the pattern to make it polyfill friendly. the notification stream is a bit funny for the polyfill as it would be unnecessary as each worker is its own process that receives the signals directly. and the name of the background worker being implicit makes polyfill a bit awkward, but is great for a smooth experience with frankenphp.
and last, do we really want to prefix the methods with frankenphp when the concept is applicable in other context as well?)

Comment on lines +148 to +153
struct itimerspec its;
its.it_value.tv_sec = 0;
its.it_value.tv_nsec = 1;
its.it_interval.tv_sec = 0;
its.it_interval.tv_nsec = 0;
timer_settime(thread_php_timers[idx], 0, &its, NULL);
Contributor

Not sure this is safe or a good idea. Would be better to store a reference to execution globals and do this instead:

zend_atomic_bool_store_ex(&EG(vm_interrupt), true);

There might also be a way to access EG directly via TSRM, haven't looked into it too much yet though.

Comment on lines +74 to +94
func (registry *backgroundWorkerRegistry) Entrypoint() string {
	return registry.entrypoint
}

func (registry *backgroundWorkerRegistry) Num() int {
	if registry.num <= 0 {
		return 0
	}
	return registry.num
}

func (registry *backgroundWorkerRegistry) MaxThreads() int {
	if registry.num > 0 {
		return registry.num
	}
	return 1
}

func (registry *backgroundWorkerRegistry) SetNum(num int) {
	registry.num = num
}
Contributor

I think some functions here are unused.

Comment on lines +60 to +62
num int // threads per background worker (0 = lazy-start with 1 thread)
maxWorkers int // max lazy-started instances (0 = unlimited)
autoStartNames []string // names to start at boot when num >= 1
Contributor

do you need these values here, aren't they already defined on the worker?
num 0 = lazy start
num 1 = autostart

preparedEnvNeedsReplacement bool
logger *slog.Logger
requestOptions []frankenphp.RequestOption
backgroundScope string
Contributor

@AlliBalliBaba AlliBalliBaba Mar 29, 2026

The scope can also just be an integer. It should probably be requested from and managed by the frankenphp package.

@AlliBalliBaba
Contributor

AlliBalliBaba commented Mar 29, 2026

Would be nice if someone with fresh eyes could look over this PR. Short summary from my side:

  • the PR is still very big, there's probably a lot of room for simplification
  • the API is very exotic. Not necessarily bad, but it would be nice if we could have more general abstractions or a similar implementation for orientation somewhere else.
  • Still feels a bit messy to allow starting background workers at runtime. Is the main use case that libraries can start workers without the user configuring them?
  • Doing shutdown via stream is very clunky
  • Would be nice to have clean sync shutdown with signals, not sure this is possible rn
  • no integration caddy_tests
  • copying across threads is still very limited in what you can copy (something to be aware of in the future)

@AlliBalliBaba
Contributor

Another reason I dislike streams is that the polling api from (this RFC) might already be in PHP 8.6.


6 participants