Monitoring Stack
The monitoring stack is entirely centralised on attic-gremlin. Prometheus scrapes everything; Grafana visualises it. The architecture was chosen to keep complexity off smaug and to avoid running heavyweight agents on immichbox.
Software versions
| Component | Version |
|---|---|
| Prometheus | 3.10.0 |
| Grafana | 12.4.1 |
| Node Exporter | 1.10.2 |
| Blackbox Exporter | 0.28.0 |
| Graphite Exporter | 0.16.0 |
How metrics flow
attic-gremlin (node_exporter:9100) ─────────────────────┐
immichbox (node_exporter:9100) ─────────────────────────┤
immichbox (services via blackbox:9115) ─────────────────┼──► Prometheus:9090 ──► Grafana:3000
immichbox (matrix_synapse:9092) ────────────────────────┤
smaug (graphite push → graphite_exporter:2003/9108) ────┤
HAOS (blackbox with bearer token) ──────────────────────┤
Ollama (localhost:11434/metrics) ───────────────────────┘  ← not yet working
smaug is the exception to the scrape-based model. Because no software runs directly on smaug, it uses TrueNAS's built-in Graphite reporting exporter to push metrics to attic-gremlin's Graphite Exporter (port 2003). Prometheus then scrapes the Graphite Exporter at port 9108. Everything else is standard pull-based scraping.
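On the Prometheus side, the smaug pipeline only needs an ordinary scrape job pointed at the Graphite Exporter's web port. A minimal sketch (the job name is an assumption, not copied from the live config):

```yaml
# prometheus.yml fragment — scrape the Graphite Exporter's /metrics
# endpoint on 9108; TrueNAS pushes into port 2003 independently of this.
scrape_configs:
  - job_name: smaug            # job name assumed
    static_configs:
      - targets: ['localhost:9108']
```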
Prometheus config
Config lives at /etc/prometheus/prometheus.yml.
Reloading config — important: sudo systemctl reload prometheus does not work for this unit, and the /-/reload HTTP endpoint works only if --web.enable-lifecycle is added to the ExecStart flags (it isn't). The correct workflow is:
promtool check config /etc/prometheus/prometheus.yml # validate first
sudo systemctl restart prometheus
Always validate before restarting to avoid a silent outage.
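The two steps can be chained so the restart never runs on a broken config, since && short-circuits when promtool exits non-zero. A sketch (the final is-active check is an extra sanity step, not part of the documented workflow):

```shell
# Validate, then restart only on success; && stops at the first failure.
promtool check config /etc/prometheus/prometheus.yml \
  && sudo systemctl restart prometheus \
  && systemctl is-active prometheus
```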
Blackbox Exporter
The Blackbox Exporter runs on attic-gremlin and probes all services over HTTP. The config lives at /etc/blackbox_exporter/config.yml.
All standard services use the http_2xx module. Home Assistant is the exception — it uses a custom http_2xx_bearer module that injects the Authorization: Bearer TOKEN header, and it has its own separate Prometheus job (homeassistant) rather than being grouped under the standard service_health job.
YAML indentation gotcha: The Blackbox config is strict YAML. Each module must be indented exactly 2 spaces under modules:. A single-space difference causes the module lookup to silently fail with empty probe results and no error message.
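A sketch of what the two modules look like with the correct indentation (the timeouts and valid_status_codes line are illustrative defaults, and TOKEN is a placeholder, not the real config from attic-gremlin):

```yaml
# /etc/blackbox_exporter/config.yml — each module sits exactly
# 2 spaces under modules:, its options 2 more spaces in.
modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      valid_status_codes: []          # empty list = default 2xx range
  http_2xx_bearer:
    prober: http
    timeout: 5s
    http:
      headers:
        Authorization: "Bearer TOKEN" # placeholder token
```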
Full list of probed endpoints:
| Service | URL | Module |
|---|---|---|
| Jellyfin | http://192.168.4.30:8096/health | http_2xx |
| Prowlarr | http://192.168.4.30:9696/api/v1/health?apikey=KEY | http_2xx |
| Radarr | http://192.168.4.30:7878/api/v3/health?apikey=KEY | http_2xx |
| Sonarr | http://192.168.4.30:8989/api/v3/health?apikey=KEY | http_2xx |
| qBittorrent | http://192.168.4.30:8081/api/v2/app/version | http_2xx |
| Immich | http://192.168.4.50:30041/server-info/ping | http_2xx |
| Requestrr | http://192.168.4.30:4545/ | http_2xx |
| Home Assistant | http://192.168.4.40:8123/api/ | http_2xx_bearer |
Each target has a name label set in prometheus.yml (e.g. name: Jellyfin) so Grafana displays friendly names instead of raw URLs. The Grafana variable query uses label_values(probe_success{job="service_health"}, name).
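Putting the pieces together, the service_health job likely follows the standard blackbox relabeling pattern. The sketch below shows a single target (Jellyfin, from the table above) with its name label attached; the relabel steps are the conventional recipe, not copied from the live prometheus.yml:

```yaml
scrape_configs:
  - job_name: service_health
    metrics_path: /probe
    params:
      module: [http_2xx]              # blackbox module to run
    static_configs:
      - targets: ['http://192.168.4.30:8096/health']
        labels:
          name: Jellyfin              # friendly name shown in Grafana
    relabel_configs:
      # Pass the real URL to blackbox as the ?target= parameter…
      - source_labels: [__address__]
        target_label: __param_target
      # …keep it readable as the instance label…
      - source_labels: [__param_target]
        target_label: instance
      # …and point the scrape itself at the local blackbox exporter.
      - target_label: __address__
        replacement: localhost:9115
```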
Matrix Synapse metrics
Matrix Synapse runs on immichbox and exposes Prometheus metrics on port 9092. This required two changes to /etc/matrix-synapse/homeserver.yaml:
enable_metrics: true
metrics_port: 9092
metrics_bind_host: "0.0.0.0"
By default Synapse binds metrics to 127.0.0.1 only. metrics_bind_host: "0.0.0.0" is required for attic-gremlin to scrape it remotely. The UFW rule on immichbox allows port 9092 from attic-gremlin only.
After editing, restart Synapse: sudo systemctl restart matrix-synapse
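The firewall rule mentioned above might look like the following on immichbox (assuming attic-gremlin is 192.168.4.71, the Destination IP from the Graphite Exporter section — verify against the actual UFW ruleset):

```shell
# Allow Prometheus on attic-gremlin to reach Synapse's metrics port;
# everything else stays blocked by the default deny policy.
sudo ufw allow from 192.168.4.71 to any port 9092 proto tcp
```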
Graphite Exporter (smaug metrics)
TrueNAS is configured to push Graphite metrics to attic-gremlin. Settings in TrueNAS under System → Reporting → Reporting Exporters:
| Field | Value |
|---|---|
| Name | attic-gremlin |
| Type | GRAPHITE |
| Destination IP | 192.168.4.71 |
| Destination Port | 2003 |
| Prefix | smaug |
| Namespace | truenas |
| Update Every | 15s |
| Buffer On Failures | 20 |
| Send Names Instead Of IDs | Yes |
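By default the Graphite Exporter turns each dotted Graphite path into a flat Prometheus metric name; a mapping config can convert path components into labels instead. A hypothetical example (the metric path below is illustrative — check the actual paths smaug sends before writing mappings):

```yaml
# Passed via --graphite.mapping-config; turns
#   smaug.truenas.cpu.<core>.usage
# into truenas_cpu_usage{core="<core>"}.  (Illustrative path.)
mappings:
  - match: smaug.truenas.cpu.*.usage
    name: truenas_cpu_usage
    labels:
      core: "$1"
```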
Grafana dashboards
Three dashboards are in use:
Node Exporter Full (ID: 1860) — works out of the box. Use the Nodename dropdown to switch between attic-gremlin and immichbox.
Prometheus Blackbox Exporter (ID: 7587) — requires two manual fixes in Grafana 12:
- Settings → Variables → target: set Data source to prometheus, set the query to label_values(probe_success{job="service_health"}, name), click Run query (all service names should appear), then save.
- The row title is $target status, which now correctly shows friendly names (Jellyfin, Radarr, etc.) since the variable resolves to the name label.
The dashboard has been trimmed to show only: Global Probe Duration chart, Status, HTTP Status Code, HTTP Version, Average Probe Duration, and Average DNS Lookup per service. SSL and SSL Expiry panels are intentionally absent — blackbox probes internal http:// IPs directly, bypassing Caddy's TLS termination.
The homeassistant job is separate and not included in this dashboard by default.
Smaug TrueNAS (custom, UID: smaug-truenas-001) — a custom dashboard built for this installation. Import via Dashboards → New → Import → Upload JSON (smaug-grafana-dashboard.json). Covers: system overview stat cards, per-core CPU usage and temps, memory, ZFS ARC, disk I/O per drive by serial, network throughput, and load average.
Known non-working metrics
| Target | Status | Reason |
|---|---|---|
| Ollama (localhost:11434/metrics) | ❌ 404 | No Prometheus metrics endpoint exists at this path |
| Immich native metrics (192.168.4.50:30041/metrics) | ❌ No response | Metrics port not exposed through TrueNAS app isolation |
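A quick way to re-check both endpoints after an upgrade, in case either starts exposing metrics (curl's %{http_code} format prints 000 when there is no response at all):

```shell
# -s silences progress, -o discards the body, -w prints only the status.
curl -s -o /dev/null -w '%{http_code}\n' --max-time 5 http://localhost:11434/metrics
curl -s -o /dev/null -w '%{http_code}\n' --max-time 5 http://192.168.4.50:30041/metrics
```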