This repository was archived by the owner on Apr 13, 2026. It is now read-only.
Merged
3 changes: 3 additions & 0 deletions .gitignore
@@ -223,3 +223,6 @@ generated-days/
 # Node dependencies
 node_modules/
 .claude/settings.local.json
+
+# Superpowers (AI agent specs/plans - not for version control)
+docs/superpowers/
8 changes: 4 additions & 4 deletions docs/containers/file-uploader.md
@@ -1,6 +1,6 @@
 # File uploader

-The file uploader is a Flask application that streams CAN CSV logs into InfluxDB 3. It exposes a simple web UI for selecting the destination bucket and monitoring progress.
+The file uploader is a Flask application that streams CAN CSV logs into InfluxDB 3. It exposes a simple web UI for selecting the destination **season** (InfluxDB table within the configured database) and monitoring progress.

 ## Ports

@@ -10,8 +10,8 @@ The file uploader is a Flask application that streams CAN CSV logs into InfluxDB

 | Variable | Description | Default |
 | --- | --- | --- |
-| `INFLUXDB_URL` | API endpoint for bucket discovery and writes. | `http://influxdb3:8181` |
-| `INFLUXDB_TOKEN` | Token with write access to the target bucket. | `dev-influxdb-admin-token` |
+| `INFLUXDB_URL` | API endpoint for table discovery and writes. | `http://influxdb3:8181` |
+| `INFLUXDB_TOKEN` | Token with write access to the target database. | `dev-influxdb-admin-token` |
 | `FILE_UPLOADER_WEBHOOK_URL` | Optional webhook invoked when uploads finish. | empty |
 | `SLACK_WEBHOOK_URL` | Fallback webhook if the dedicated uploader value is unset. | empty |

@@ -25,6 +25,6 @@ The file uploader is a Flask application that streams CAN CSV logs into InfluxDB
 ## Usage

 1. Visit http://localhost:8084.
-2. Choose a target bucket from the drop-down (populated from the InfluxDB API).
+2. Choose a target season (table) from the drop-down (populated from the InfluxDB API).
 3. Upload one or more CSV files exported from the vehicle logger.
 4. Monitor progress via the live event stream; notifications are sent upon completion if a webhook is configured.
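
The environment table in this file says `SLACK_WEBHOOK_URL` is used only when `FILE_UPLOADER_WEBHOOK_URL` is unset. A minimal sketch of that fallback follows. `resolve_webhook_url` is a hypothetical helper, not a function taken from the uploader's source, and treating blank values as unset is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_webhook_url(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the webhook to notify, or None when neither variable is set."""
    # Prefer the uploader-specific variable, then fall back to the Slack one.
    # Assumption: a blank or whitespace-only value counts as "unset".
    for name in ("FILE_UPLOADER_WEBHOOK_URL", "SLACK_WEBHOOK_URL"):
        value = env.get(name, "").strip()
        if value:
            return value
    return None
```

Passing the environment as a mapping keeps the helper easy to exercise without touching the real process environment.
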
14 changes: 14 additions & 0 deletions installer/.env.example
@@ -3,6 +3,20 @@
 # ------------------------------------------------------------
 DBC_FILE_PATH=example.dbc

+# ------------------------------------------------------------
+# File uploader — team DBCs from GitHub (optional)
+# ------------------------------------------------------------
+# Fine-grained PAT or classic PAT with contents:read on Western-Formula-Racing/DBC
+GITHUB_DBC_TOKEN=
+# GITHUB_DBC_REPO=Western-Formula-Racing/DBC
+# GITHUB_DBC_BRANCH=main
+
+# Optional limits for .zip uploads (file-uploader); defaults are generous for team use
+# UPLOAD_ZIP_MAX_ARCHIVE_BYTES=2147483648
+# UPLOAD_ZIP_MAX_MEMBER_BYTES=4294967296
+# UPLOAD_ZIP_MAX_TOTAL_UNCOMPRESSED_BYTES=25769803776
+# UPLOAD_ZIP_MAX_CSV_IN_ZIP=5000
+
 # ------------------------------------------------------------
 # InfluxDB credentials
 # ------------------------------------------------------------
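
The four `UPLOAD_ZIP_*` values in this file correspond to 2 GiB per archive, 4 GiB per member, 24 GiB total uncompressed, and 5000 CSV files. The sketch below shows one way such limits could be checked before extraction. It is not the uploader's actual code: `ZipLimits`, `limits_from_env`, and `check_zip` are hypothetical names, and the choice to count only `.csv` members is an assumption.

```python
import io
import zipfile
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class ZipLimits:
    # Defaults copied from the commented-out values in .env.example.
    max_archive_bytes: int = 2147483648
    max_member_bytes: int = 4294967296
    max_total_uncompressed_bytes: int = 25769803776
    max_csv_in_zip: int = 5000


def limits_from_env(env: Mapping[str, str]) -> ZipLimits:
    """Read UPLOAD_ZIP_* overrides, falling back to the defaults above."""
    d = ZipLimits()
    return ZipLimits(
        max_archive_bytes=int(env.get("UPLOAD_ZIP_MAX_ARCHIVE_BYTES", d.max_archive_bytes)),
        max_member_bytes=int(env.get("UPLOAD_ZIP_MAX_MEMBER_BYTES", d.max_member_bytes)),
        max_total_uncompressed_bytes=int(
            env.get("UPLOAD_ZIP_MAX_TOTAL_UNCOMPRESSED_BYTES", d.max_total_uncompressed_bytes)
        ),
        max_csv_in_zip=int(env.get("UPLOAD_ZIP_MAX_CSV_IN_ZIP", d.max_csv_in_zip)),
    )


def check_zip(data: bytes, limits: ZipLimits) -> List[str]:
    """Return the CSV member names, or raise ValueError when a limit is exceeded."""
    if len(data) > limits.max_archive_bytes:
        raise ValueError("archive exceeds UPLOAD_ZIP_MAX_ARCHIVE_BYTES")
    csv_names: List[str] = []
    total = 0
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            # Assumption: only .csv members matter; everything else is ignored.
            if info.is_dir() or not info.filename.lower().endswith(".csv"):
                continue
            # file_size is the size declared in the archive header, so a real
            # implementation should also cap the bytes actually read.
            if info.file_size > limits.max_member_bytes:
                raise ValueError(f"{info.filename} exceeds UPLOAD_ZIP_MAX_MEMBER_BYTES")
            total += info.file_size
            if total > limits.max_total_uncompressed_bytes:
                raise ValueError("archive exceeds UPLOAD_ZIP_MAX_TOTAL_UNCOMPRESSED_BYTES")
            csv_names.append(info.filename)
            if len(csv_names) > limits.max_csv_in_zip:
                raise ValueError("archive exceeds UPLOAD_ZIP_MAX_CSV_IN_ZIP")
    return csv_names
```

Checking the declared sizes first rejects oversized archives cheaply, before any member is decompressed.
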
197 changes: 197 additions & 0 deletions installer/VPS_RECOVERY.md
@@ -0,0 +1,197 @@
# VPS Recovery Guide — OVH vps-1969c8c2

## What happened

The server ran out of memory (OOM). The Linux kernel's OOM killer started terminating processes to survive, including `containerd` (the Docker runtime), which took down all containers.

After the crash, these binaries were also missing from disk:
- `cloudflared`
- `tailscaled`
- `containerd-shim-runc-v2`

OVH detected the unresponsive server and rebooted it into **rescue mode**.

---

## Step 1 — Exit rescue mode

OVH boots into rescue mode automatically when the server crashes hard. You need to manually switch it back.

1. Go to [OVH control panel](https://www.ovh.com/manager/) → Bare Metal Cloud → VPS → `vps-1969c8c2.vps.ovh.ca`
2. Find the **Boot** field (shows `RESCUE`) — click the pencil/edit icon
3. Change to **Hard disk** (normal mode)
4. Click **Reboot**

> The server will come up clean. No containers will be running yet: every service uses `restart=no` or `restart=unless-stopped`, and none of them can start until the Docker daemon itself is running.

---

## Step 2 — Fix SSH known_hosts

The rescue OS has a different host key, so SSH will warn you. After rebooting to normal mode, clear the old key:

```bash
ssh-keygen -R 148.113.191.22
```

Then connect:
```bash
ssh ubuntu@148.113.191.22
# or via Tailscale:
ssh ubuntu@ovh-daq-server
```

---

## Step 3 — Restore missing binaries

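Binaries can be missing from disk after the crash. The OOM killer itself only terminates processes, so the exact cause of the missing files is unconfirmed. Check and fix each one: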

### cloudflared
```bash
sudo systemctl status cloudflared
# If "status=203/EXEC" — binary is missing

curl -L https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64 \
-o /tmp/cloudflared
sudo mv /tmp/cloudflared /usr/bin/cloudflared
sudo chmod +x /usr/bin/cloudflared
sudo systemctl restart cloudflared
sudo systemctl status cloudflared
```

### tailscale
```bash
sudo systemctl status tailscaled
# If "status=203/EXEC" — binary is missing

sudo apt-get install --reinstall tailscale -y
sudo systemctl restart tailscaled
tailscale status
```

### containerd (Docker runtime)
```bash
# If Docker containers fail to start with:
# "containerd-shim-runc-v2: file does not exist"

sudo apt-get install --reinstall containerd.io -y
sudo systemctl daemon-reload
sudo systemctl start docker
docker info | grep "Server Version"
```
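
The three checks above can be collapsed into one pass that reports which binaries are absent before any service is restarted. This is a sketch under assumptions: only `/usr/bin/cloudflared` appears in this guide, the other two paths are the usual package locations on Ubuntu and should be verified on the server, and `missing_binaries` is a hypothetical helper.

```python
import os
from typing import Iterable, List

# Assumed install locations. Only /usr/bin/cloudflared is stated in this guide;
# the tailscaled and containerd-shim paths are typical package defaults.
DEFAULT_BINARIES = (
    "/usr/bin/cloudflared",
    "/usr/sbin/tailscaled",
    "/usr/bin/containerd-shim-runc-v2",
)


def missing_binaries(paths: Iterable[str] = DEFAULT_BINARIES) -> List[str]:
    """Return the paths that are absent or not executable."""
    return [p for p in paths if not (os.path.isfile(p) and os.access(p, os.X_OK))]
```

An empty result means none of the listed binaries needs reinstalling; each path returned maps to one of the reinstall commands above.
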

---

## Step 4 — Start the Docker stack

```bash
cd /home/ubuntu/projects/daq-server-components/installer
docker compose up -d
```

Wait ~30 seconds for InfluxDB to become healthy, then verify:

```bash
docker ps --format 'table {{.Names}}\t{{.Status}}'
```

Expected running containers:
| Container | Notes |
|---|---|
| influxdb3 | Should show `(healthy)` |
| influxdb3-explorer | InfluxDB UI |
| grafana | Dashboard |
| grafana-bridge | Grafana API bridge |
| file-uploader | |
| data-downloader-api | Waits for influxdb3 healthy |
| data-downloader-scanner | |
| data-downloader-frontend | |
| health-monitor | |
| sandbox | |
| code-generator | |
| slackbot | Exits cleanly if `ENABLE_SLACK=false` |
| startup-data-loader | Runs once then exits — normal |

> `lap-detector` is intentionally disabled. To run it: `docker compose --profile disabled up lap-detector -d`
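
The table above can be turned into a quick check against `docker ps --format '{{.Names}}'` output. This is a sketch, assuming the container names match the table exactly (Compose may add a project prefix unless `container_name` is set). `missing_containers` is a hypothetical helper. `slackbot` and `startup-data-loader` are listed separately because the table says they may exit normally.

```python
from typing import Iterable, List

# Containers expected to stay running, taken from the table above.
REQUIRED = (
    "influxdb3",
    "influxdb3-explorer",
    "grafana",
    "grafana-bridge",
    "file-uploader",
    "data-downloader-api",
    "data-downloader-scanner",
    "data-downloader-frontend",
    "health-monitor",
    "sandbox",
    "code-generator",
)
# Containers that may legitimately be absent from `docker ps`.
MAY_EXIT = ("slackbot", "startup-data-loader")


def missing_containers(ps_names_output: str, required: Iterable[str] = REQUIRED) -> List[str]:
    """Return required container names not present in the `docker ps` name list."""
    running = {line.strip() for line in ps_names_output.splitlines() if line.strip()}
    return [name for name in required if name not in running]
```

The function only parses text, so it can be fed the captured output of the `docker ps` command rather than invoking Docker itself.
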

---

## Step 5 — Verify Cloudflare tunnel

```bash
sudo systemctl status cloudflared
# Should show "Registered tunnel connection" in logs
```

Check that https://grafana.westernformularacing.org loads.

---

## Investigating an OOM crash

If the server crashed again and you're in rescue mode:

```bash
# Mount original disk
mkdir -p /mnt/vps
mount /dev/sdb1 /mnt/vps

# Check what got OOM-killed and when
journalctl --directory=/mnt/vps/var/log/journal \
--since="2 hours ago" --no-pager \
| grep -iE "oom|killed|memory" | head -50

# Check disk usage
df -h /mnt/vps
du -sh /mnt/vps/var/lib/docker /mnt/vps/var/lib/containerd /mnt/vps/var/log/journal

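# Vacuum logs if journal is large (>500MB); point at the mounted disk, not the rescue system's own journal
journalctl --directory=/mnt/vps/var/log/journal --vacuum-size=200M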
```
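
The `grep` filter above returns every line mentioning memory. To list only the processes the kernel actually killed, the journal output can be parsed for the kernel's `Killed process <pid> (<name>)` message. This is a minimal sketch; `oom_victims` is a hypothetical helper, and the pattern assumes the usual kernel wording.

```python
import re
from typing import Iterable, List, Tuple

# Matches the kernel's OOM report, e.g. "... Killed process 1234 (containerd) ...".
_KILLED = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def oom_victims(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (pid, process name) for each OOM kill found in the log lines."""
    victims: List[Tuple[int, str]] = []
    for line in lines:
        match = _KILLED.search(line)
        if match:
            victims.append((int(match.group(1)), match.group(2)))
    return victims
```

It can be fed the `journalctl --directory=... --no-pager` output line by line, for example by iterating over `sys.stdin`.
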

---

## Preventing OOM crashes

Memory limits are now set in `docker-compose.yml`. Key limits:

| Service | Limit |
|---|---|
| influxdb3 | 4096M |
| file-uploader | 1536M |
| data-downloader-api | 1024M |
| sandbox | 1024M |
| grafana | 512M |
| others | 128–512M |

**Total ceiling: ~9GB** across all services (server has 8GB RAM + 8GB swap).
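
As a quick arithmetic check on the table, the five limits listed by name can be summed and compared with the server's memory. This sketch treats `M` as MiB and 8GB as 8192 MiB, which are assumptions. The "others" row is left out because its per-service values are not listed, so the ~9GB total is not recomputed here.

```python
from typing import Dict, Mapping

# Limits listed by name in the table above (the "others" row is omitted).
LISTED_LIMITS_MIB: Dict[str, int] = {
    "influxdb3": 4096,
    "file-uploader": 1536,
    "data-downloader-api": 1024,
    "sandbox": 1024,
    "grafana": 512,
}
RAM_MIB = 8 * 1024   # assumed: 8GB RAM
SWAP_MIB = 8 * 1024  # assumed: 8GB swap


def summarize(limits: Mapping[str, int], ram_mib: int = RAM_MIB, swap_mib: int = SWAP_MIB) -> Dict[str, object]:
    """Sum the limits and compare the total with RAM and RAM plus swap."""
    total = sum(limits.values())
    return {
        "total_mib": total,
        "over_ram_mib": max(0, total - ram_mib),
        "fits_with_swap": total <= ram_mib + swap_mib,
    }
```

The five named services alone account for 8192 MiB, which equals the assumed physical RAM. Anything the remaining services use beyond that has to come from swap if every container reaches its limit at once.
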

If OOM happens again, check which container hit its limit:
```bash
docker stats --no-stream
# or check logs:
sudo journalctl -u docker --since="1 hour ago" | grep -i "oom\|killed"
```

Daily restart is scheduled at 4 AM to clear any memory accumulation:
```bash
crontab -l # shows: 0 4 * * * docker compose restart
# Logs at: /var/log/docker-restart.log
```

---

## Quick reference

| Service | Port |
|---|---|
| InfluxDB | 9000 |
| InfluxDB Explorer | 8888 |
| Grafana | 8087 (also via Cloudflare tunnel) |
| Grafana Bridge | 3001 |
| File Uploader | 8084 |
| Data Downloader API | 8000 |
| Data Downloader Frontend | 3000 |
| Lap Detector (disabled) | 8050 |

Tailscale IP: `100.72.11.60` (hostname: `ovh-daq-server`)
9 changes: 5 additions & 4 deletions installer/data-downloader/backend/config.py
@@ -24,7 +24,7 @@ def _parse_seasons(raw: str | None) -> List[SeasonConfig]:
     """Parse SEASONS env var: "WFR25:2025:222 76 153,WFR26:2026:..."."""
     if not raw:
         # Default fallback if not set
-        return [SeasonConfig(name="WFR25", year=2025, database="WFR25", table="WFR25", color="#DE4C99")]
+        return [SeasonConfig(name="WFR25", year=2025, database=os.getenv("INFLUX_DATABASE", "WFR"), table="WFR25", color="#DE4C99")]

     seasons = []
     for part in raw.split(","):
@@ -46,13 +46,14 @@ def _parse_seasons(raw: str | None) -> List[SeasonConfig]:

             color = parts[2] if len(parts) > 2 else None

-            # DB and table name both match season name by convention (WFR25→WFR25, WFR26→WFR26)
-            seasons.append(SeasonConfig(name=name, year=year, database=name, table=name, color=color))
+            # All seasons share one database; table name matches season name (WFR25, WFR26, etc.)
+            shared_db = os.getenv("INFLUX_DATABASE", "WFR")
+            seasons.append(SeasonConfig(name=name, year=year, database=shared_db, table=name, color=color))
         except ValueError:
             continue

     if not seasons:
-        return [SeasonConfig(name="WFR25", year=2025, database="WFR25", table="WFR25")]
+        return [SeasonConfig(name="WFR25", year=2025, database=os.getenv("INFLUX_DATABASE", "WFR"), table="WFR25")]

     # Sort by year descending (newest first)
     seasons.sort(key=lambda s: s.year, reverse=True)
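
The effect of this change is that every season now points at one shared database (`INFLUX_DATABASE`, default `WFR`) and only the table name varies. The self-contained sketch below illustrates that behaviour. It is a simplified stand-in, not the file's actual code: the `SeasonConfig` fields are inferred from the keyword arguments in the diff, the `name:year[:color]` parsing fills in lines the diff does not show, and the environment is passed as a mapping for testability.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass
class SeasonConfig:
    # Field names inferred from the keyword arguments used in the diff.
    name: str
    year: int
    database: str
    table: str
    color: Optional[str] = None


def parse_seasons(raw: Optional[str], env: Mapping[str, str] = os.environ) -> List[SeasonConfig]:
    """Parse "WFR25:2025:222 76 153,WFR26:2026" into seasons sharing one database."""
    shared_db = env.get("INFLUX_DATABASE", "WFR")
    fallback = [SeasonConfig("WFR25", 2025, shared_db, "WFR25", "#DE4C99")]
    if not raw:
        return fallback
    seasons: List[SeasonConfig] = []
    for part in raw.split(","):
        parts = part.strip().split(":")
        if len(parts) < 2:
            continue
        try:
            year = int(parts[1])
        except ValueError:
            continue  # skip malformed entries, as the original does
        color = parts[2] if len(parts) > 2 else None
        # Table name matches the season name; the database is shared.
        seasons.append(SeasonConfig(parts[0], year, shared_db, parts[0], color))
    seasons.sort(key=lambda s: s.year, reverse=True)  # newest first
    return seasons or fallback
```

With this layout, adding a season only requires a new table in the existing database rather than a new database per year.
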
3 changes: 3 additions & 0 deletions installer/docker-compose.local.yml
@@ -77,6 +77,9 @@ services:
       - INFLUXDB_TOKEN=${INFLUXDB_ADMIN_TOKEN:-apiv3_dev-influxdb-admin-token}
       - INFLUXDB_URL=${INFLUXDB_URL:-http://influxdb3:8181}
       - DBC_FILE_PATH=/installer/example.dbc
+      - GITHUB_DBC_TOKEN=${GITHUB_DBC_TOKEN:-}
+      - GITHUB_DBC_REPO=${GITHUB_DBC_REPO:-Western-Formula-Racing/DBC}
+      - GITHUB_DBC_BRANCH=${GITHUB_DBC_BRANCH:-main}
     deploy:
       resources:
         limits: