Kubernetes sidecar that monitors a target process's memory usage on Linux, detects anomalies (spikes and slow leaks), captures jemalloc heap dumps, and uploads them to S3.
The sidecar runs alongside the target process in a pod with a shared PID namespace. It reads the process's anonymous memory (`RssAnon` from `/proc/<pid>/status`) every `CHECK_INTERVAL` seconds and maintains a sliding window of samples.
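A minimal sketch of the memory read, assuming the value comes from the `RssAnon:` line of `/proc/<pid>/status`. The kernel reports this field in kB (1024-byte units), and it has been available since Linux 4.5. The names `parse_rss_anon` and `read_rss_anon` are hypothetical and are not taken from `process.rs`.

```rust
use std::fs;
use std::io;

/// Parse the `RssAnon:` line from the contents of /proc/<pid>/status and
/// return the value in bytes. The kernel reports the field in kB (KiB).
fn parse_rss_anon(status: &str) -> Option<u64> {
    status.lines().find_map(|line| {
        // Expected shape: "RssAnon:\t  153600 kB"
        let rest = line.strip_prefix("RssAnon:")?;
        let mut parts = rest.split_whitespace();
        let value: u64 = parts.next()?.parse().ok()?;
        match parts.next() {
            Some("kB") => Some(value * 1024),
            _ => None,
        }
    })
}

/// Read the anonymous RSS of `pid` in bytes (Linux only).
fn read_rss_anon(pid: u32) -> io::Result<u64> {
    let status = fs::read_to_string(format!("/proc/{pid}/status"))?;
    parse_rss_anon(&status).ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidData, "RssAnon not found in status")
    })
}

fn main() {
    let sample = "Name:\tmy-service\nVmRSS:\t  204800 kB\nRssAnon:\t  153600 kB\nRssFile:\t   51200 kB\n";
    assert_eq!(parse_rss_anon(sample), Some(153_600 * 1024));

    // Reading our own status succeeds on Linux; elsewhere it returns Err.
    if let Ok(bytes) = read_rss_anon(std::process::id()) {
        println!("own RssAnon: {bytes} bytes");
    }
}
```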
Once the history window is full, a baseline P50 (median) is established. Each tick, two checks run:
- Spike: the current sample > the window's P95 × `SPIKE_MULTIPLIER` (default 3). Catches sudden allocations.
- Slow leak: current P50 > baseline P50 + `MEMORY_CHANGE_THRESHOLD`. Catches gradual memory growth.
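The window and the two checks can be sketched as follows. This is illustrative only. `History`, `is_spike` and `is_slow_leak` are hypothetical names, and the nearest-rank percentile method is an assumption (`history.rs` may interpolate instead). The sketch also assumes a new sample is compared against the window's P95 before that sample is added.

```rust
use std::collections::VecDeque;

/// Fixed-capacity sliding window of memory samples, in bytes.
struct History {
    samples: VecDeque<u64>,
    capacity: usize,
}

impl History {
    fn new(capacity: usize) -> Self {
        Self { samples: VecDeque::with_capacity(capacity), capacity }
    }

    /// Append a sample, evicting the oldest one once the window is full.
    fn push(&mut self, value: u64) {
        if self.samples.len() == self.capacity {
            self.samples.pop_front();
        }
        self.samples.push_back(value);
    }

    fn is_full(&self) -> bool {
        self.samples.len() == self.capacity
    }

    /// Nearest-rank percentile for `p` in 0..=100; `None` if the window is empty.
    fn percentile(&self, p: f64) -> Option<u64> {
        if self.samples.is_empty() {
            return None;
        }
        let mut sorted: Vec<u64> = self.samples.iter().copied().collect();
        sorted.sort_unstable();
        let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
        Some(sorted[rank.clamp(1, sorted.len()) - 1])
    }
}

/// Spike: the current sample exceeds the window's P95 times the multiplier.
fn is_spike(current: u64, p95: u64, multiplier: f64) -> bool {
    (current as f64) > (p95 as f64) * multiplier
}

/// Slow leak: the window's current P50 exceeds the baseline P50 by more than the threshold.
fn is_slow_leak(current_p50: u64, baseline_p50: u64, threshold: u64) -> bool {
    current_p50 > baseline_p50.saturating_add(threshold)
}

fn main() {
    const MIB: u64 = 1 << 20;
    let mut history = History::new(5);
    for mib in [100, 102, 101, 103, 100] {
        history.push(mib * MIB);
    }
    assert!(history.is_full());

    // The baseline is taken once the window is full.
    let baseline_p50 = history.percentile(50.0).unwrap(); // 101 MiB
    let p95 = history.percentile(95.0).unwrap(); // 103 MiB

    // A sudden 400 MiB sample exceeds 3 x P95 (309 MiB).
    assert!(is_spike(400 * MIB, p95, 3.0));

    // Gradual growth shifts the median itself.
    for mib in [150, 151, 152] {
        history.push(mib * MIB);
    }
    let current_p50 = history.percentile(50.0).unwrap(); // 150 MiB
    assert!(is_slow_leak(current_p50, baseline_p50, 20 * MIB));
    // After a detection, the monitor would set baseline_p50 = current_p50.
}
```

Nearest-rank always returns a value that was actually observed. That keeps the baseline easy to reason about when reading alerts.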
Each mode has an independent cooldown timer. On detection, the monitor:
- Captures a heap dump by sending `dump\n` to a jemalloc profiling Unix socket (`/tmp/heap_dump_<binary>.sock`)
- Uploads the `.pprof` file to S3
- Sends a Slack notification (if configured)
- Updates the baseline P50
A baseline dump is always captured on startup (after the initial delay) for comparison.
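Here is a sketch of the dump request using the standard library's `UnixStream`. The socket path follows the convention described for the heap dump step. The reply handling is an assumption: the target is assumed to write the `.pprof` bytes back on the same connection and then close it. The real jemalloc profiling endpoint may behave differently, for example by writing the file to disk and replying with a path. `socket_path` and `request_heap_dump` are hypothetical names.

```rust
use std::io::{self, BufRead, BufReader, Read, Write};
use std::os::unix::net::{UnixListener, UnixStream};
use std::path::{Path, PathBuf};
use std::thread;
use std::time::Duration;

/// Socket path convention: /tmp/heap_dump_<binary>.sock
fn socket_path(binary: &str) -> PathBuf {
    PathBuf::from(format!("/tmp/heap_dump_{binary}.sock"))
}

/// Send `dump\n` and collect the reply until the peer closes the connection.
/// ASSUMPTION: the target streams the .pprof bytes back, then closes.
fn request_heap_dump(path: &Path, timeout: Duration) -> io::Result<Vec<u8>> {
    let mut stream = UnixStream::connect(path)?;
    stream.set_read_timeout(Some(timeout))?;
    stream.set_write_timeout(Some(timeout))?;
    stream.write_all(b"dump\n")?;
    let mut reply = Vec::new();
    stream.read_to_end(&mut reply)?;
    Ok(reply)
}

fn main() -> io::Result<()> {
    // Stand-in for the target process: a throwaway socket in the temp dir.
    let path = std::env::temp_dir().join(format!("heap_dump_demo_{}.sock", std::process::id()));
    let _ = std::fs::remove_file(&path);
    let listener = UnixListener::bind(&path)?;
    let server = thread::spawn(move || -> io::Result<()> {
        let (stream, _) = listener.accept()?;
        let mut reader = BufReader::new(stream);
        let mut command = String::new();
        reader.read_line(&mut command)?;
        if command == "dump\n" {
            reader.get_mut().write_all(b"fake-pprof-bytes")?;
        }
        Ok(()) // dropping the stream closes it, giving the client EOF
    });

    let reply = request_heap_dump(&path, Duration::from_secs(5))?;
    server.join().expect("server thread panicked")?;
    std::fs::remove_file(&path)?;
    assert_eq!(reply, b"fake-pprof-bytes");
    println!("the sidecar would connect to {}", socket_path("my-service").display());
    Ok(())
}
```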
Slack notifications are routed to a channel based on `ENVIRONMENT`:

| Environment | Channel |
|---|---|
| prod | alerts-prod |
| staging/shadow | alerts-barn |
| other | alerts-temp |
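The routing table reduces to a single `match`. Two details are assumptions: matching `ENVIRONMENT` exactly and case-sensitively, and treating an unset variable as "other". `slack_channel` is a hypothetical name.

```rust
/// Route an alert to a Slack channel based on ENVIRONMENT.
/// ASSUMPTION: exact, case-sensitive match; unset falls through to "other".
fn slack_channel(environment: Option<&str>) -> &'static str {
    match environment {
        Some("prod") => "alerts-prod",
        Some("staging" | "shadow") => "alerts-barn",
        _ => "alerts-temp",
    }
}

fn main() {
    for env in [Some("prod"), Some("shadow"), Some("dev"), None] {
        println!("{env:?} -> {}", slack_channel(env));
    }
}
```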
All configuration is via environment variables.

Required:

| Variable | Description |
|---|---|
| `BINARY_NAME` | Name of the target process (matched via `/proc/*/comm`) |
| `CHECK_INTERVAL` | Seconds between memory checks |
| `MEMORY_CHANGE_THRESHOLD` | Bytes above baseline P50 to trigger slow-leak detection |
| `INITIAL_DELAY` | Seconds to wait before capturing the baseline dump |
| `DUMP_COOLDOWN` | Minimum seconds between slow-leak dumps |
| `S3_BUCKET` | S3 bucket for dump uploads |
| `S3_PATH_PREFIX` | S3 key prefix for dump files |
| `POD_NAME` | Kubernetes pod name (used in filenames and alerts) |
Optional:

| Variable | Default | Description |
|---|---|---|
| `HISTORY_WINDOW_SIZE` | 60 | Number of samples in the sliding window |
| `SPIKE_MULTIPLIER` | 3 | Multiplier of P95 for spike detection |
| `SLACK_API_TOKEN` | | Slack Bot token for notifications |
| `ENVIRONMENT` | | Environment name (prod/staging/shadow) for Slack routing |
| `NETWORK` | | Network name included in Slack alerts |
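Here is a sketch of loading these variables with the documented defaults. It takes a lookup closure rather than calling `std::env::var` directly, so it can be exercised without touching the process environment. Several details are assumptions and do not come from the actual `config.rs`: the `Config` struct, its field types (second counts as `u64`, `SPIKE_MULTIPLIER` as `f64`), treating an empty `SLACK_API_TOKEN` as unset, and the non-zero validation.

```rust
use std::str::FromStr;
use std::time::Duration;

#[derive(Debug)]
struct Config {
    binary_name: String,
    check_interval: Duration,
    memory_change_threshold: u64,
    initial_delay: Duration,
    dump_cooldown: Duration,
    s3_bucket: String,
    s3_path_prefix: String,
    pod_name: String,
    history_window_size: usize,
    spike_multiplier: f64,
    slack_api_token: Option<String>,
    environment: Option<String>,
    network: Option<String>,
}

fn parse<T: FromStr>(key: &str, raw: String) -> Result<T, String> {
    raw.trim().parse().map_err(|_| format!("{key}: cannot parse {raw:?}"))
}

impl Config {
    /// `get` abstracts over `std::env::var(key).ok()`.
    fn from_lookup(get: impl Fn(&str) -> Option<String>) -> Result<Self, String> {
        let req = |key: &str| get(key).ok_or_else(|| format!("{key} is required"));
        let secs = |key: &str| -> Result<Duration, String> {
            Ok(Duration::from_secs(parse(key, req(key)?)?))
        };
        let cfg = Config {
            binary_name: req("BINARY_NAME")?,
            check_interval: secs("CHECK_INTERVAL")?,
            memory_change_threshold: parse("MEMORY_CHANGE_THRESHOLD", req("MEMORY_CHANGE_THRESHOLD")?)?,
            initial_delay: secs("INITIAL_DELAY")?,
            dump_cooldown: secs("DUMP_COOLDOWN")?,
            s3_bucket: req("S3_BUCKET")?,
            s3_path_prefix: req("S3_PATH_PREFIX")?,
            pod_name: req("POD_NAME")?,
            history_window_size: get("HISTORY_WINDOW_SIZE")
                .map_or(Ok(60), |v| parse("HISTORY_WINDOW_SIZE", v))?,
            spike_multiplier: get("SPIKE_MULTIPLIER")
                .map_or(Ok(3.0), |v| parse("SPIKE_MULTIPLIER", v))?,
            slack_api_token: get("SLACK_API_TOKEN").filter(|v| !v.is_empty()),
            environment: get("ENVIRONMENT"),
            network: get("NETWORK"),
        };
        if cfg.history_window_size == 0 || cfg.check_interval.is_zero() {
            return Err("HISTORY_WINDOW_SIZE and CHECK_INTERVAL must be non-zero".into());
        }
        Ok(cfg)
    }
}

fn main() {
    // Hypothetical values; the sidecar would pass |key| std::env::var(key).ok().
    let vars = [
        ("BINARY_NAME", "my-service"),
        ("CHECK_INTERVAL", "10"),
        ("MEMORY_CHANGE_THRESHOLD", "104857600"),
        ("INITIAL_DELAY", "60"),
        ("DUMP_COOLDOWN", "3600"),
        ("S3_BUCKET", "my-dump-bucket"),
        ("S3_PATH_PREFIX", "heap-dumps"),
        ("POD_NAME", "my-service-0"),
    ];
    let lookup = |key: &str| vars.iter().find(|(k, _)| *k == key).map(|(_, v)| v.to_string());
    match Config::from_lookup(lookup) {
        Ok(cfg) => println!("{cfg:#?}"),
        Err(e) => eprintln!("invalid config: {e}"),
    }
}
```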
Spike cooldown is computed automatically as HISTORY_WINDOW_SIZE × CHECK_INTERVAL (time for a full window refresh).
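Each detection mode can be modelled with its own timer that records when it last fired. The spike timer uses the window-refresh time and the slow-leak timer uses `DUMP_COOLDOWN`. The `Cooldown` type and `spike_cooldown` helper are hypothetical. With the default window of 60 samples and a hypothetical `CHECK_INTERVAL` of 10 s, the spike cooldown works out to 600 s.

```rust
use std::time::{Duration, Instant};

/// Independent cooldown timer; one instance per detection mode.
struct Cooldown {
    period: Duration,
    last_fired: Option<Instant>,
}

impl Cooldown {
    fn new(period: Duration) -> Self {
        Self { period, last_fired: None }
    }

    /// Returns true (and records `now`) if the cooldown has elapsed.
    fn try_fire(&mut self, now: Instant) -> bool {
        let ready = match self.last_fired {
            None => true,
            Some(t) => now.saturating_duration_since(t) >= self.period,
        };
        if ready {
            self.last_fired = Some(now);
        }
        ready
    }
}

/// Spike cooldown = HISTORY_WINDOW_SIZE x CHECK_INTERVAL (one full window refresh).
fn spike_cooldown(history_window_size: u32, check_interval: Duration) -> Duration {
    check_interval * history_window_size
}

fn main() {
    let spike_period = spike_cooldown(60, Duration::from_secs(10));
    assert_eq!(spike_period, Duration::from_secs(600));

    let mut spike = Cooldown::new(spike_period);
    let mut leak = Cooldown::new(Duration::from_secs(3600)); // hypothetical DUMP_COOLDOWN
    let t0 = Instant::now();

    assert!(spike.try_fire(t0)); // first spike: dump
    assert!(!spike.try_fire(t0 + Duration::from_secs(300))); // still cooling down
    assert!(leak.try_fire(t0 + Duration::from_secs(300))); // leak timer is independent
    assert!(spike.try_fire(t0 + Duration::from_secs(600))); // window has refreshed
}
```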
The sidecar is deployed as a container alongside the target service. Requirements:
- Shared PID namespace: `shareProcessNamespace: true` on the pod spec, so the sidecar can read the target process's `/proc` entries
- Shared `/tmp` volume: an `emptyDir` mounted at `/tmp` in both containers, for the jemalloc Unix socket
- IAM role: the pod must have an IAM role with S3 write permissions (credentials are picked up automatically via the AWS SDK credential chain)
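The shared PID namespace matters because the sidecar finds the target by matching `BINARY_NAME` against `/proc/*/comm`. Without `shareProcessNamespace`, the target's `/proc` entries are not visible from the sidecar container. Below is a sketch of that lookup. Note that the kernel truncates `comm` to 15 bytes, so a longer binary name can only match on its prefix; whether `process.rs` handles this is not stated here. `find_pid` and `comm_matches` are hypothetical names, and returning the lowest matching PID is an assumption.

```rust
use std::fs;
use std::path::Path;

/// The kernel keeps at most 15 bytes of the name in /proc/<pid>/comm
/// (TASK_COMM_LEN is 16, including the trailing NUL).
const COMM_MAX: usize = 15;

/// Compare the contents of a comm file with the configured binary name,
/// accounting for the kernel's truncation.
fn comm_matches(comm: &str, binary_name: &str) -> bool {
    let expected = &binary_name.as_bytes()[..binary_name.len().min(COMM_MAX)];
    comm.trim_end_matches('\n').as_bytes() == expected
}

/// Return the lowest PID under `proc_root` whose comm matches `binary_name`.
/// Only processes in the sidecar's PID namespace appear under /proc.
fn find_pid(proc_root: &Path, binary_name: &str) -> Option<u32> {
    let mut pids: Vec<u32> = fs::read_dir(proc_root)
        .ok()?
        .filter_map(|entry| entry.ok()?.file_name().to_str()?.parse::<u32>().ok())
        .collect();
    pids.sort_unstable();
    pids.into_iter().find(|pid| {
        fs::read_to_string(proc_root.join(pid.to_string()).join("comm"))
            .map(|comm| comm_matches(&comm, binary_name))
            .unwrap_or(false)
    })
}

fn main() {
    assert!(comm_matches("my-service\n", "my-service"));
    // Names longer than 15 bytes are truncated by the kernel.
    assert!(comm_matches("very-long-servi\n", "very-long-service-name"));

    match find_pid(Path::new("/proc"), "my-service") {
        Some(pid) => println!("found pid {pid}"),
        None => println!("my-service is not visible in this PID namespace"),
    }
}
```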
Image: `ghcr.io/cowprotocol/memory-monitor:<tag>`
Tags:

- `main`: latest build from the main branch (used for staging/shadow)
- `latest`: latest release (used for prod)
- `v*`: specific version tags (e.g., `v0.1.0`)
- `sha-*`: specific commit SHA
The sidecar is wired into Kubernetes deployments via the `createMemoryMonitorSidecar()` factory in the infrastructure repo. The weekly release script pins the `latest` tag to a specific version for prod deployments.
```sh
cargo build                  # Build debug
cargo build --release        # Build release
cargo test --locked          # Run all tests
cargo fmt --check            # Check formatting
cargo clippy -- -D warnings  # Lint (warnings are errors in CI)
```

```sh
docker build -t memory-monitor .
```

```
main.rs — Monitoring loop orchestration
├── config.rs — Environment variable loading and validation
├── process.rs — PID lookup and RssAnon reading via /proc
├── history.rs — Ring buffer with percentile calculation
├── detection.rs — Spike and slow-leak detection with cooldowns
├── heap_dump.rs — jemalloc Unix socket communication
├── s3.rs — S3 upload with retry
└── slack.rs — Slack notification with channel routing
```
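`s3.rs` uploads with retry, but the retry policy is not documented here. The following is a generic sketch that assumes exponential backoff with a fixed attempt cap. The `retry_with_backoff` helper and its parameters are hypothetical. The S3 call itself is replaced by a closure, because the AWS SDK has its own client and retry configuration that this sketch does not model.

```rust
use std::fmt::Display;
use std::thread::sleep;
use std::time::Duration;

/// Call `op` up to `max_attempts` times (at least once), doubling the delay
/// after each failure. `op` receives the 1-based attempt number.
fn retry_with_backoff<T, E: Display>(
    max_attempts: u32,
    initial_delay: Duration,
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    let mut delay = initial_delay;
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            Err(e) if attempt < max_attempts => {
                eprintln!("attempt {attempt}/{max_attempts} failed: {e}; retrying in {delay:?}");
                sleep(delay);
                delay = delay.saturating_mul(2);
                attempt += 1;
            }
            Err(e) => return Err(e),
        }
    }
}

fn main() {
    // Stand-in for an S3 upload that fails twice, then succeeds.
    let result: Result<&str, String> = retry_with_backoff(5, Duration::from_millis(10), |attempt| {
        if attempt < 3 {
            Err(format!("transient error #{attempt}"))
        } else {
            Ok("uploaded")
        }
    });
    assert_eq!(result, Ok("uploaded"));
}
```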