8 changes: 5 additions & 3 deletions .github/workflows/pull_request.yml
@@ -79,14 +79,16 @@ jobs:
name: codecov-unit-node-${{ matrix.node-version }}
fail_ci_if_error: false

-# Integration tests (v5): one job at a time (max-parallel: 1) to avoid 502/503 on shared Conductor.
-# Sharding (--shard i/N) splits the suite so each job runs ~1/N of tests — keeps per-job under timeout.
+# Integration tests (v5): lower max-parallel reduces 502/503 from the shared Conductor server
+# but makes CI slower without eliminating flakes entirely — feel free to experiment.
+# Sharding (--shard i/N) splits the suite so each job runs ~1/N of tests.
+# fetchWithRetry now retries 502/503/504, so higher parallelism is more viable than before.
integration-tests:
runs-on: ubuntu-latest
timeout-minutes: 25
strategy:
fail-fast: false
-max-parallel: 2
+max-parallel: 3
matrix:
node-version: [20, 22, 24]
shard: [1, 2, 3]
10 changes: 5 additions & 5 deletions AGENTS.md
@@ -32,7 +32,7 @@ src/sdk/ # Main SDK source
decorators/worker.ts # @worker decorator + dual-mode support
decorators/registry.ts # Global registry (register/get/clear)
context/TaskContext.ts # AsyncLocalStorage per-task context
-metrics/ # MetricsCollector, MetricsServer, PrometheusRegistry
+metrics/ # LegacyMetricsCollector, CanonicalMetricsCollector, metricsFactory, MetricsServer, PrometheusRegistry, CanonicalPrometheusRegistry, accumulators, httpObserver
schema/ # jsonSchema, schemaField decorators
generators/ # Legacy generators (pre-v3, still exported for compat)
src/open-api/ # OpenAPI layer
@@ -211,10 +211,10 @@ public async someMethod(args): Promise<T> {

### Metrics Documentation (METRICS.md)

-When adding, removing, or renaming metrics in `src/sdk/worker/metrics/MetricsCollector.ts`:
-1. Update `METRICS.md` to reflect the change (name, type, labels, description)
-2. Ensure both `MetricsCollector.toPrometheusText()` and `PrometheusRegistry.createMetrics()` are updated in sync — missing a summary/counter in either causes silent data loss
-3. Update the metric count in the METRICS.md overview section
+When adding, removing, or renaming metrics in `src/sdk/worker/metrics/`:
+1. Update both `LegacyMetricsCollector.ts` and `CanonicalMetricsCollector.ts` (or add a no-op stub in the collector that does not emit the metric)
+2. Ensure `toPrometheusText()` and the corresponding `PrometheusRegistry` / `CanonicalPrometheusRegistry` are updated in sync — missing a metric in either causes silent data loss
+3. Update `METRICS.md` to reflect the change in both the legacy and canonical catalog tables
4. Add or update the corresponding direct recording method documentation if applicable

### SDK_NEW_LANGUAGE_GUIDE.md
26 changes: 26 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,26 @@
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Added

- **Metrics harmonization** - canonical metric surface aligned with the cross-SDK catalog, opt-in via `WORKER_CANONICAL_METRICS=true`
- New `CanonicalMetricsCollector` and optional `CanonicalPrometheusRegistry` (prom-client adapter) emit the harmonized cross-SDK catalog: 12 counters (e.g. `task_poll_total`, `task_execution_started_total`, `task_paused_total`, `external_payload_used_total{entityName,operation,payloadType}`, `workflow_start_error_total{workflowType,exception}`), 4 time histograms (`task_poll_time_seconds`, `task_execute_time_seconds`, `task_update_time_seconds`, `http_api_client_request_seconds{method,uri,status}`) with buckets `0.001…10s`, 2 size histograms (`task_result_size_bytes`, `workflow_input_size_bytes{workflowType,version}`) with buckets `100…10_000_000` bytes, and `active_workers` gauge. Labels are camelCase; names are unprefixed.
- `createMetricsCollector()` factory selects `LegacyMetricsCollector` (default) or `CanonicalMetricsCollector` based on `WORKER_CANONICAL_METRICS` (truthy: `true`, `1`, `yes`, case-insensitive). `WORKER_LEGACY_METRICS` is also recognized; canonical wins when both are set.
- `HttpMetricsObserver` plus `fetchWithRetry` instrumentation records `http_api_client_request_seconds`; `WorkflowExecutor` records `workflow_input_size_bytes` and `workflow_start_error_total`.
- `Poller`, `TaskRunner`, and `EventDispatcher` emit a new `taskAckFailed` event and propagate output size, exception cause, and status.
- `fetchWithRetry` now retries on HTTP 502/503/504.
- Harness deployment manifest sets `WORKER_CANONICAL_METRICS=true`; `harness/main.ts` logs which collector is active.
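
The collector selection described in the entries above can be sketched roughly as follows. This is an illustration only, not the SDK's implementation: `selectCollectorKind` and `isTruthy` are hypothetical names. Only the `WORKER_CANONICAL_METRICS` variable, its truthy values (`true`, `1`, `yes`, case-insensitive), and the rule that canonical wins come from the entries above.

```typescript
// Hypothetical sketch of the env-var gate; names are illustrative, not the SDK's API.
type CollectorKind = "legacy" | "canonical";

// Truthy values listed in the changelog entry (compared case-insensitively).
const TRUTHY_VALUES = new Set(["true", "1", "yes"]);

function isTruthy(value: string | undefined): boolean {
  return value !== undefined && TRUTHY_VALUES.has(value.toLowerCase());
}

function selectCollectorKind(
  env: Record<string, string | undefined>,
): CollectorKind {
  // Canonical wins whenever WORKER_CANONICAL_METRICS is truthy, even if
  // WORKER_LEGACY_METRICS is also set; otherwise legacy remains the default.
  return isTruthy(env["WORKER_CANONICAL_METRICS"]) ? "canonical" : "legacy";
}
```

Under these assumptions, `selectCollectorKind({})` yields `"legacy"`, and setting `WORKER_CANONICAL_METRICS=YES` yields `"canonical"` regardless of `WORKER_LEGACY_METRICS`.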

### Changed

- **Metrics harmonization** - defaults preserved; legacy metrics emit unchanged when `WORKER_CANONICAL_METRICS` is unset
- `src/sdk/worker/metrics/MetricsCollector.ts` was renamed to `LegacyMetricsCollector.ts`. The public symbol is preserved via `export { LegacyMetricsCollector as MetricsCollector }` in `src/sdk/worker/metrics/index.ts`, so existing imports keep working.
- Default behavior is unchanged: with no env var set, the metric names, labels, and `conductor_worker_*` prefix from `v3.0.3` are preserved byte-for-byte.
- Rewrote `METRICS.md` with both surfaces, the env-var gate, side-by-side migration table with PromQL replacements, and troubleshooting.
- Updated `README.md`, `AGENTS.md`, `SDK_DEVELOPMENT.md`, `SDK_COMPARISON.md`, and `WORKER_ARCHITECTURE_COMPARISON.md` to reference `createMetricsCollector()` and the env var.
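
The 502/503/504 retry behavior listed under Added can be illustrated with a small sketch. The real `fetchWithRetry` signature, attempt count, and backoff policy are not shown in this diff, so `retryOnGatewayErrors`, `maxAttempts`, and `baseDelayMs` below are assumptions made for illustration. Only the set of retried status codes comes from the changelog.

```typescript
// Hypothetical sketch; not the SDK's fetchWithRetry.
const RETRYABLE_STATUSES = new Set([502, 503, 504]);

interface StatusLike {
  status: number;
}

async function retryOnGatewayErrors<T extends StatusLike>(
  doRequest: () => Promise<T>,
  maxAttempts = 3, // assumed default
  baseDelayMs = 100, // assumed default
): Promise<T> {
  let response = await doRequest();
  for (
    let attempt = 1;
    attempt < maxAttempts && RETRYABLE_STATUSES.has(response.status);
    attempt++
  ) {
    // Exponential backoff between attempts: base, 2x base, 4x base, ...
    const delayMs = baseDelayMs * 2 ** (attempt - 1);
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    response = await doRequest();
  }
  // Returns the last response, which may still be a 502/503/504
  // if every attempt failed.
  return response;
}
```

Because the helper only needs an object with a `status` field, it can wrap a `fetch` call or a test stub alike; other error statuses such as 500 are returned immediately without retrying.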