memory-monitor

Kubernetes sidecar that monitors a target process's memory usage on Linux, detects anomalies (spikes and slow leaks), captures jemalloc heap dumps, and uploads them to S3.

How it works

The sidecar runs alongside the target process in a shared-PID-namespace pod. It reads the process's anonymous memory (RssAnon from /proc/<pid>/status) on a configurable interval and maintains a sliding window of samples.
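Below is a minimal sketch of the sampling step. It assumes the monitor converts the kernel's `kB` figure (kibibytes, 1024 bytes) to bytes so the result can be compared directly with `MEMORY_CHANGE_THRESHOLD`. The names `parse_rss_anon` and `read_rss_anon` are illustrative and are not taken from `process.rs`.

```rust
use std::fs;
use std::io;

/// Extracts RssAnon from the contents of /proc/<pid>/status, in bytes.
/// The kernel reports the field as e.g. "RssAnon:\t    123456 kB", where kB = 1024 bytes.
fn parse_rss_anon(status: &str) -> Option<u64> {
    status.lines().find_map(|line| {
        let rest = line.strip_prefix("RssAnon:")?;
        let mut parts = rest.split_whitespace();
        let value: u64 = parts.next()?.parse().ok()?;
        // Only accept the unit we expect; anything else is treated as unparseable.
        match parts.next() {
            Some("kB") => Some(value * 1024),
            _ => None,
        }
    })
}

/// Reads the current anonymous RSS of `pid` in bytes (hypothetical helper).
fn read_rss_anon(pid: u32) -> io::Result<u64> {
    let status = fs::read_to_string(format!("/proc/{pid}/status"))?;
    parse_rss_anon(&status)
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "RssAnon not found"))
}
```

RssAnon counts only anonymous pages (heap, stack, anonymous mmaps) and excludes file-backed and shared-memory pages. Those anonymous pages are where a jemalloc heap lives.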

Detection modes

Once the history window is full, a baseline P50 (median) is established. Each tick, two checks run:

Spike: the latest sample > the window's P95 × SPIKE_MULTIPLIER (default 3). Catches sudden allocations.
Slow leak: the window's current P50 > baseline P50 + MEMORY_CHANGE_THRESHOLD. Catches gradual memory growth.
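Both checks can be sketched as a pure function of one tick's numbers. This sketch assumes P50 and P95 are computed over the full window and that the spike multiplier is a float. `evaluate` and `Findings` are hypothetical names. Cooldowns and baseline updates are left to the caller.

```rust
/// Which checks fired on this tick (hypothetical type).
#[derive(Debug, PartialEq)]
struct Findings {
    spike: bool,
    slow_leak: bool,
}

/// Evaluates both detection checks for one tick.
/// `current` is the latest sample; `p50`/`p95` come from the full window;
/// `baseline_p50` is the median recorded when the window first filled (or after the last dump).
fn evaluate(
    current: u64,
    p50: u64,
    p95: u64,
    baseline_p50: u64,
    spike_multiplier: f64,
    leak_threshold: u64,
) -> Findings {
    // Spike: the instantaneous sample exceeds P95 scaled by the multiplier.
    let spike = (current as f64) > (p95 as f64) * spike_multiplier;
    // Slow leak: the window median has drifted more than `leak_threshold` bytes above baseline.
    let slow_leak = p50 > baseline_p50.saturating_add(leak_threshold);
    Findings { spike, slow_leak }
}
```

Using the median for leak detection makes that check insensitive to a few outlier samples. Short bursts are handled separately by the spike check.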

Each mode has an independent cooldown timer. On detection, the monitor:

Captures a heap dump by sending `dump\n` to a jemalloc profiling Unix socket (`/tmp/heap_dump_<binary>.sock`)
Uploads the .pprof file to S3
Sends a Slack notification (if configured)
Updates the baseline P50
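Below is a sketch of the socket request using only the standard library. The socket path follows the naming pattern given for the heap dump step. Everything about the reply is an assumption: this version half-closes its write side after sending `dump\n` and then reads until the peer closes the connection, which may not match what the target process actually sends back. `socket_path` and `request_heap_dump` are hypothetical helpers.

```rust
use std::io::{self, Read, Write};
use std::os::unix::net::UnixStream;
use std::path::Path;
use std::time::Duration;

/// Builds the socket path for a target binary: /tmp/heap_dump_<binary>.sock
fn socket_path(binary: &str) -> String {
    format!("/tmp/heap_dump_{binary}.sock")
}

/// Sends the `dump\n` command and returns whatever the peer writes back
/// before closing the connection (assumed reply protocol).
fn request_heap_dump(socket: &Path, timeout: Duration) -> io::Result<Vec<u8>> {
    let mut stream = UnixStream::connect(socket)?;
    stream.set_read_timeout(Some(timeout))?;
    stream.write_all(b"dump\n")?;
    // Signal that no more commands follow, so the peer can respond and close.
    stream.shutdown(std::net::Shutdown::Write)?;
    let mut response = Vec::new();
    stream.read_to_end(&mut response)?;
    Ok(response)
}
```

The read timeout keeps a hung target from blocking the monitoring loop indefinitely.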

A baseline dump is always captured on startup (after the initial delay) for comparison.

Slack channel routing

Environment	Channel
`prod`	alerts-prod
`staging`/`shadow`	alerts-barn
other	alerts-temp
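The routing table reduces to a single `match`. This sketch assumes exact, case-sensitive matching and treats an unset `ENVIRONMENT` like any other value. `slack_channel` is an illustrative name.

```rust
/// Maps the ENVIRONMENT value to a Slack channel following the routing table.
fn slack_channel(environment: Option<&str>) -> &'static str {
    match environment {
        Some("prod") => "alerts-prod",
        Some("staging") | Some("shadow") => "alerts-barn",
        // Any other value, or no value at all, goes to the catch-all channel.
        _ => "alerts-temp",
    }
}
```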

Configuration

All configuration is via environment variables.

Required

Variable	Description
`BINARY_NAME`	Name of the target process (matched via `/proc/*/comm`)
`CHECK_INTERVAL`	Seconds between memory checks
`MEMORY_CHANGE_THRESHOLD`	Bytes above baseline P50 to trigger slow-leak detection
`INITIAL_DELAY`	Seconds to wait before capturing the baseline dump
`DUMP_COOLDOWN`	Minimum seconds between slow-leak dumps
`S3_BUCKET`	S3 bucket for dump uploads
`S3_PATH_PREFIX`	S3 key prefix for dump files
`POD_NAME`	Kubernetes pod name (used in filenames and alerts)
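`BINARY_NAME` is compared against each process's `comm`. The sketch below takes the proc root as a parameter so it can be tested against a fake directory tree. `find_pid` is a hypothetical name. The kernel truncates `comm` to 15 characters, so a longer binary name would need to be truncated before comparing. If several processes match, this version returns whichever one `read_dir` yields first.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Scans `proc_root` (normally "/proc") for a numeric directory whose `comm`
/// equals `binary_name`, returning the first PID found.
fn find_pid(proc_root: &Path, binary_name: &str) -> io::Result<Option<u32>> {
    for entry in fs::read_dir(proc_root)? {
        let entry = entry?;
        // Only numeric directory names correspond to processes.
        let name = entry.file_name();
        let Some(pid) = name.to_str().and_then(|s| s.parse::<u32>().ok()) else {
            continue;
        };
        // A process can exit between read_dir and this read; skip it if so.
        let Ok(comm) = fs::read_to_string(entry.path().join("comm")) else {
            continue;
        };
        if comm.trim_end_matches('\n') == binary_name {
            return Ok(Some(pid));
        }
    }
    Ok(None)
}
```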

Optional

Variable	Default	Description
`HISTORY_WINDOW_SIZE`	`60`	Number of samples in the sliding window
`SPIKE_MULTIPLIER`	`3`	Multiplier of P95 for spike detection
`SLACK_API_TOKEN`		Slack Bot token for notifications
`ENVIRONMENT`		Environment name (prod/staging/shadow) for Slack routing
`NETWORK`		Network name included in Slack alerts
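The following sketch shows one way to parse required and optional variables. It reads through a lookup closure rather than `std::env` directly so the logic stays testable. Only a subset of the variables is shown. `Config`, `load_config`, and the positivity check are illustrative assumptions and do not reflect the contents of `config.rs`. `SPIKE_MULTIPLIER` is parsed as a float here, which still accepts the default written as `3`.

```rust
use std::str::FromStr;

#[derive(Debug)]
struct Config {
    binary_name: String,
    check_interval_secs: u64,
    memory_change_threshold: u64,
    history_window_size: usize,
    spike_multiplier: f64,
    slack_api_token: Option<String>,
}

/// Fails if `key` is missing or unparseable.
fn required<T: FromStr>(get: &impl Fn(&str) -> Option<String>, key: &str) -> Result<T, String> {
    let raw = get(key).ok_or_else(|| format!("{key} is required"))?;
    raw.parse().map_err(|_| format!("{key}: cannot parse {raw:?}"))
}

/// Falls back to `default` if `key` is missing; fails only if it is present but unparseable.
fn optional<T: FromStr>(get: &impl Fn(&str) -> Option<String>, key: &str, default: T) -> Result<T, String> {
    match get(key) {
        None => Ok(default),
        Some(raw) => raw.parse().map_err(|_| format!("{key}: cannot parse {raw:?}")),
    }
}

fn load_config(get: impl Fn(&str) -> Option<String>) -> Result<Config, String> {
    let cfg = Config {
        binary_name: required(&get, "BINARY_NAME")?,
        check_interval_secs: required(&get, "CHECK_INTERVAL")?,
        memory_change_threshold: required(&get, "MEMORY_CHANGE_THRESHOLD")?,
        history_window_size: optional(&get, "HISTORY_WINDOW_SIZE", 60)?,
        spike_multiplier: optional(&get, "SPIKE_MULTIPLIER", 3.0)?,
        slack_api_token: get("SLACK_API_TOKEN"),
    };
    // Assumed validation: a zero interval or window would make the spike cooldown zero.
    if cfg.check_interval_secs == 0 || cfg.history_window_size == 0 {
        return Err("CHECK_INTERVAL and HISTORY_WINDOW_SIZE must be positive".into());
    }
    Ok(cfg)
}
```

In a real binary the call would be `load_config(|k| std::env::var(k).ok())`.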

Spike cooldown is computed automatically as HISTORY_WINDOW_SIZE × CHECK_INTERVAL (time for a full window refresh).
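Each detection mode can keep its own cooldown state. This sketch assumes wall-clock cooldowns measured with `Instant`; the real implementation may count ticks instead. `Cooldown` and `spike_cooldown` are hypothetical names. As a worked example, with the default window of 60 samples and an assumed `CHECK_INTERVAL` of 10 seconds, the spike cooldown is 60 × 10 s = 600 s.

```rust
use std::time::{Duration, Instant};

/// Tracks when one detection mode last produced a dump.
struct Cooldown {
    period: Duration,
    last_fired: Option<Instant>,
}

impl Cooldown {
    fn new(period: Duration) -> Self {
        Self { period, last_fired: None }
    }

    /// Returns true (and records `now`) if the mode may fire at `now`.
    fn try_fire(&mut self, now: Instant) -> bool {
        let ready = match self.last_fired {
            None => true,
            Some(last) => now.duration_since(last) >= self.period,
        };
        if ready {
            self.last_fired = Some(now);
        }
        ready
    }
}

/// Spike cooldown = HISTORY_WINDOW_SIZE × CHECK_INTERVAL (one full window refresh).
fn spike_cooldown(history_window_size: u32, check_interval: Duration) -> Duration {
    check_interval * history_window_size
}
```

Tying the spike cooldown to a full window refresh means that any samples inflated by the previous spike have left the window before the next spike check can fire.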

Deployment

The sidecar is deployed as a container alongside the target service. Requirements:

Shared PID namespace: shareProcessNamespace: true on the pod spec (so the sidecar can read /proc of the target process)
Shared /tmp volume: emptyDir mounted at /tmp in both containers (for the jemalloc Unix socket)
IAM role: Pod must have an IAM role with S3 write permissions (credentials are picked up automatically via the AWS SDK credential chain)

Docker image

ghcr.io/cowprotocol/memory-monitor:<tag>

Tags:

main — latest build from the main branch (used for staging/shadow)
latest — latest release (used for prod)
v* — specific version tags (e.g., v0.1.0)
sha-* — specific commit SHA

Infrastructure integration

The sidecar is wired into Kubernetes deployments via the createMemoryMonitorSidecar() factory in the infrastructure repo. The weekly release script pins the latest tag to a specific version for prod deployments.

Development

cargo build                  # Build debug
cargo build --release        # Build release
cargo test --locked          # Run all tests
cargo fmt --check            # Check formatting
cargo clippy -- -D warnings  # Lint (warnings are errors in CI)

Docker

docker build -t memory-monitor .

Architecture

main.rs          — Monitoring loop orchestration
├── config.rs    — Environment variable loading and validation
├── process.rs   — PID lookup and RssAnon reading via /proc
├── history.rs   — Ring buffer with percentile calculation
├── detection.rs — Spike and slow-leak detection with cooldowns
├── heap_dump.rs — jemalloc Unix socket communication
├── s3.rs        — S3 upload with retry
└── slack.rs     — Slack notification with channel routing
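The sketch below shows the kind of structure `history.rs` describes: a fixed-capacity `VecDeque` that evicts the oldest sample, plus a percentile query. The nearest-rank method is an assumption. The crate may interpolate instead, which gives slightly different P50/P95 values on small windows. `History` is a hypothetical name.

```rust
use std::collections::VecDeque;

/// Fixed-capacity sliding window of memory samples (bytes).
struct History {
    capacity: usize,
    samples: VecDeque<u64>,
}

impl History {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "window size must be positive");
        Self { capacity, samples: VecDeque::with_capacity(capacity) }
    }

    /// Appends a sample, evicting the oldest once the window is full.
    fn push(&mut self, value: u64) {
        if self.samples.len() == self.capacity {
            self.samples.pop_front();
        }
        self.samples.push_back(value);
    }

    /// Detection only starts once this returns true.
    fn is_full(&self) -> bool {
        self.samples.len() == self.capacity
    }

    /// Nearest-rank percentile (p in 0..=100); None while the window is empty.
    fn percentile(&self, p: f64) -> Option<u64> {
        if self.samples.is_empty() {
            return None;
        }
        let mut sorted: Vec<u64> = self.samples.iter().copied().collect();
        sorted.sort_unstable();
        // Nearest-rank: rank = ceil(p / 100 * n), clamped to [1, n].
        let n = sorted.len();
        let rank = ((p / 100.0) * n as f64).ceil() as usize;
        Some(sorted[rank.clamp(1, n) - 1])
    }
}
```

Sorting a copy on each query costs O(n log n). That is negligible for a window of around 60 samples checked every few seconds.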
