Skip to content

Monitoring Stack

The monitoring stack is entirely centralised on attic-gremlin. Prometheus scrapes everything; Grafana visualises it. The architecture was chosen to keep complexity off smaug and to avoid running heavyweight agents on immichbox.

Software versions

Component Version
Prometheus 3.10.0
Grafana 12.4.1
Node Exporter 1.10.2
Blackbox Exporter 0.28.0
Graphite Exporter 0.16.0

How metrics flow

attic-gremlin (node_exporter:9100) ──────────────────────┐
immichbox      (node_exporter:9100) ──────────────────────┤
immichbox      (services via blackbox:9115) ───────────────┤──► Prometheus:9090 ──► Grafana:3000
immichbox      (matrix_synapse:9092) ─────────────────────┤
smaug          (graphite push → graphite_exporter:2003/9108)┤
HAOS           (blackbox with bearer token) ───────────────┘
Ollama         (localhost:11434/metrics) ──────────────────┘ ← not yet working

smaug is the exception to the scrape-based model. Because no software runs directly on smaug, it uses TrueNAS's built-in Graphite reporting exporter to push metrics to attic-gremlin's Graphite Exporter (port 2003). Prometheus then scrapes the Graphite Exporter at port 9108. Everything else is standard pull-based scraping.

Prometheus config

Config lives at /etc/prometheus/prometheus.yml.

Reloading config — important: sudo systemctl reload prometheus does not work for this unit. Neither does the /-/reload HTTP endpoint unless --web.enable-lifecycle is added to the ExecStart flags (it isn't). The correct workflow is:

promtool check config /etc/prometheus/prometheus.yml   # validate first
sudo systemctl restart prometheus

Always validate before restarting to avoid a silent outage.

Blackbox Exporter

The Blackbox Exporter runs on attic-gremlin and probes all services over HTTP. The config lives at /etc/blackbox_exporter/config.yml.

All standard services use the http_2xx module. Home Assistant is the exception — it uses a custom http_2xx_bearer module that injects the Authorization: Bearer TOKEN header, and it has its own separate Prometheus job (homeassistant) rather than being grouped under the standard service_health job.

YAML indentation gotcha: The Blackbox config is strict YAML. Each module must be indented exactly 2 spaces under modules:. A single-space difference causes the module lookup to silently fail with empty probe results and no error message.

Full list of probed endpoints:

Service URL Module
Jellyfin http://192.168.4.30:8096/health http_2xx
Prowlarr http://192.168.4.30:9696/api/v1/health?apikey=KEY http_2xx
Radarr http://192.168.4.30:7878/api/v3/health?apikey=KEY http_2xx
Sonarr http://192.168.4.30:8989/api/v3/health?apikey=KEY http_2xx
qBittorrent http://192.168.4.30:8081/api/v2/app/version http_2xx
Immich http://192.168.4.50:30041/server-info/ping http_2xx
Requestrr http://192.168.4.30:4545/ http_2xx
Home Assistant http://192.168.4.40:8123/api/ http_2xx_bearer

Each target has a name label set in prometheus.yml (e.g. name: Jellyfin) so Grafana displays friendly names instead of raw URLs. The Grafana variable query uses label_values(probe_success{job="service_health"}, name).

Matrix Synapse metrics

Matrix Synapse runs on immichbox and exposes Prometheus metrics on port 9092. This required two changes to /etc/matrix-synapse/homeserver.yaml:

enable_metrics: true
metrics_port: 9092
metrics_bind_host: "0.0.0.0"

By default Synapse binds metrics to 127.0.0.1 only. metrics_bind_host: "0.0.0.0" is required for attic-gremlin to scrape it remotely. The UFW rule on immichbox allows port 9092 from attic-gremlin only.

After editing, restart Synapse: sudo systemctl restart matrix-synapse

Graphite Exporter (smaug metrics)

TrueNAS is configured to push Graphite metrics to attic-gremlin. Settings in TrueNAS under System → Reporting → Reporting Exporters:

Field Value
Name attic-gremlin
Type GRAPHITE
Destination IP 192.168.4.71
Destination Port 2003
Prefix smaug
Namespace truenas
Update Every 15s
Buffer On Failures 20
Send Names Instead Of IDs Yes

Grafana dashboards

Three dashboards are in use:

Node Exporter Full (ID: 1860) — works out of the box. Use the Nodename dropdown to switch between attic-gremlin and immichbox.

Prometheus Blackbox Exporter (ID: 7587) — requires two manual fixes in Grafana 12:

  1. Settings → Variables → target: set Data source to prometheus, set query to label_values(probe_success{job="service_health"}, name), click Run query — all service names should appear, then save.
  2. Row title is $target status which now correctly shows friendly names (Jellyfin, Radarr, etc.) since the variable resolves to the name label.

The dashboard has been trimmed to show only: Global Probe Duration chart, Status, HTTP Status Code, HTTP Version, Average Probe Duration, and Average DNS Lookup per service. SSL and SSL Expiry panels are intentionally absent — blackbox probes internal http:// IPs directly, bypassing Caddy's TLS termination.

The homeassistant job is separate and not included in this dashboard by default.

Smaug TrueNAS (custom, UID: smaug-truenas-001) — a custom dashboard built for this installation. Import via Dashboards → New → Import → Upload JSON (smaug-grafana-dashboard.json). Covers: system overview stat cards, per-core CPU usage and temps, memory, ZFS ARC, disk I/O per drive by serial, network throughput, and load average.

Known non-working metrics

Target Status Reason
Ollama (localhost:11434/metrics) ❌ 404 No Prometheus metrics endpoint exists at this path
Immich native metrics (192.168.4.50:30041/metrics) ❌ No response Metrics port not exposed through TrueNAS app isolation