Logs tell you what happened. Metrics tell you what's happening right now and what's about to happen. If Articles 2 and 3 were about building your homelab's memory, this one is about giving it a nervous system.
By the end of this article you'll have Prometheus scraping metrics from every host in the lab, dashboards showing CPU, memory, disk, and network for each machine, Pi-hole DNS statistics, and full Proxmox cluster visibility including VM and LXC status. We're also setting up ZFS pool monitoring, because a degraded RAID array you don't know about is just a data loss event that hasn't introduced itself yet.
If you haven't set up your ZFS pool yet, I covered that in a separate guide here.
Let's get into it.
Adding Prometheus to the Stack
Prometheus runs on Nexus alongside Loki and Grafana. Add it to your docker-compose.yml:
```yaml
  # Add under the existing services: key in docker-compose.yml
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - "127.0.0.1:9090:9090"   # reachable from Nexus only
    volumes:
      - ./config/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - /media/disk1/logStorage/prometheus:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=90d'   # keep 90 days of metrics
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
    restart: unless-stopped
    deploy:
      resources:
        limits:
          memory: 512M
          cpus: "1.0"
        reservations:
          memory: 128M
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
```

Prometheus is bound to 127.0.0.1:9090, so it's local only. Unlike Loki, which has to accept connections from remote Fluent Bit agents, Prometheus does the scraping itself: it connects out to each exporter, so it never needs to be reachable from other hosts.
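Create the storage directory:

```bash
sudo mkdir -p /media/disk1/logStorage/prometheus
sudo chmod 777 /media/disk1/logStorage/prometheus
```

Same AppArmor situation as Loki and Grafana: if Prometheus fails with a permission error about queries.active, chmod 777 on the storage directory fixes it. The official image runs as the nobody user (UID 65534), so if you'd rather not leave the directory world-writable, chowning it to 65534:65534 is a tighter alternative.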
Node Exporter: Host Metrics for Everyone
Node Exporter is a small Prometheus exporter that collects host-level metrics: CPU, memory, disk, network, temperature sensors, and more. We install it directly on the host rather than running it in Docker. The reason is practical: a container sees its own isolated view of the system, which skews disk metrics, hides the host's network interfaces, and blocks access to hardware sensors unless you bind-mount the host's root filesystem and share its network and PID namespaces. Just install it on the metal.
Installing on Ubuntu (Nexus and Vault):
```bash
wget https://github.com/prometheus/node_exporter/releases/download/v1.11.1/node_exporter-1.11.1.linux-amd64.tar.gz
tar xvf node_exporter-1.11.1.linux-amd64.tar.gz
sudo cp node_exporter-1.11.1.linux-amd64/node_exporter /usr/local/bin/
sudo useradd -rs /bin/false node_exporter
```

Create the systemd service:
```bash
sudo tee /etc/systemd/system/node_exporter.service << EOF
[Unit]
Description=Node Exporter
After=network.target

[Service]
User=node_exporter
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter
```

Verify it's running:
```bash
sudo systemctl status node_exporter
curl -s http://localhost:9100/metrics | head -5
```

A handful of Prometheus metric lines means it's working. The -s flag keeps curl quiet; without it you may also see a write error (curl error 23) at the end, which is just head closing the pipe early, not a real problem.
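Since the same download, copy, user, and service steps repeat on every host, here's a minimal sketch that bundles them into one shell function. It assumes it runs as root (the default on Proxmox, so no sudo), on an x86_64 host with wget, tar, and systemd available. The script name install-node-exporter.sh and both function names are hypothetical.

```bash
#!/bin/sh
# install-node-exporter.sh (hypothetical name): the manual node_exporter steps
# above as one function. Assumes root, x86_64 by default, wget, tar, systemd.

# Build the GitHub release tarball URL for a version and CPU architecture.
node_exporter_url() {
  printf 'https://github.com/prometheus/node_exporter/releases/download/v%s/node_exporter-%s.linux-%s.tar.gz\n' \
    "$1" "$1" "$2"
}

install_node_exporter() {
  ne_version="$1"
  ne_arch="${2:-amd64}"
  ne_dir="node_exporter-${ne_version}.linux-${ne_arch}"

  # Download to a fixed path so a re-run doesn't leave .1, .2 copies behind.
  wget -q -O "/tmp/${ne_dir}.tar.gz" "$(node_exporter_url "$ne_version" "$ne_arch")" || return 1
  tar xzf "/tmp/${ne_dir}.tar.gz" -C /tmp || return 1

  # Stop a running copy first; overwriting a busy binary fails.
  systemctl stop node_exporter 2>/dev/null
  cp "/tmp/${ne_dir}/node_exporter" /usr/local/bin/ || return 1

  # System user with no login shell (skipped if it already exists).
  id node_exporter >/dev/null 2>&1 || useradd -rs /bin/false node_exporter

  cat > /etc/systemd/system/node_exporter.service <<'EOF'
[Unit]
Description=Node Exporter
After=network.target

[Service]
User=node_exporter
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target
EOF

  systemctl daemon-reload
  systemctl enable --now node_exporter
}

# Usage from your workstation, assuming root SSH access to each node:
#   for h in titan hermes callisto; do
#     { cat install-node-exporter.sh; echo 'install_node_exporter 1.11.1'; } | ssh "root@$h" sh -s
#   done
```

Piping the script plus a single call into sh -s over SSH means nothing has to be copied to the node first.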
Installing on Proxmox nodes:
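Same process. Proxmox runs Debian under the hood, so the commands are identical. One wrinkle: a stock Proxmox install doesn't ship sudo, so if you're logged in as root, just drop the sudo prefix. SSH into each node and run the same install: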
```bash
ssh titan
# run the same install commands
ssh hermes
# run the same install commands
ssh callisto
# run the same install commands
```

The Proxmox Firewall Problem
Here's the one that will ruin your day if you don't know about it upfront.
After installing node_exporter on the Proxmox nodes, you add an iptables rule to allow port 9100. Everything works. Then the power goes out. When the nodes come back online, Prometheus can't reach them. The iptables rules are gone.
Proxmox manages its own firewall, and when Proxmox's firewall service starts on boot it rebuilds iptables from its own configuration — overwriting any rules you added manually. Even with netfilter-persistent installed, Proxmox wins this fight every time.
The correct fix is to add the rule through Proxmox's own interface so it persists in the cluster config rather than the OS:
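- Log into the Proxmox web UI at https://192.168.1.X:8006
- Go to Datacenter → Firewall
- Click Add and create a rule:
  - Direction: in
  - Action: ACCEPT
  - Protocol: tcp
  - Dest. port: 9100
  - Comment: node_exporter
  - Enable: checked
- Make sure the datacenter firewall itself is enabled (Datacenter → Firewall → Options → Firewall: Yes). If you're switching it on for the first time, note that the default inbound policy is DROP: the web UI (8006) and SSH (22) stay reachable from your local network, but any other service the nodes expose needs its own rule.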
One rule at the datacenter level covers all nodes in the cluster automatically. It lives in Proxmox's cluster config and survives every reboot and power outage. This is the way. I learned it twice before it stuck.
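After a reboot or power cut, the quickest way to confirm the rule held is to try the port from Nexus. Here's a minimal sketch, assuming bash is installed (its /dev/tcp pseudo-device does the connecting) along with coreutils' timeout. port_open and check_hosts are hypothetical helper names, and 192.168.1.X stays a placeholder for your node IPs.

```bash
#!/bin/sh
# Reachability check for node_exporter after a reboot. port_open and
# check_hosts are hypothetical helpers; bash's /dev/tcp does the connecting.

# Succeeds if a TCP connection to host $1, port $2 opens within 3 seconds.
port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Usage: check_hosts PORT HOST...  Prints one status line per host.
check_hosts() {
  port="$1"
  shift
  for host in "$@"; do
    if port_open "$host" "$port"; then
      echo "$host:$port open"
    else
      echo "$host:$port unreachable"
    fi
  done
}

# Example from Nexus (replace the placeholders with your node IPs):
#   check_hosts 9100 192.168.1.X 192.168.1.X 192.168.1.X
```

An unreachable port can also just mean node_exporter isn't running, so check systemctl status node_exporter on that node before blaming the firewall.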
Prometheus Config
Create config/prometheus.yml on Nexus:
```yaml
global:
  scrape_interval: 30s
  evaluation_interval: 30s

scrape_configs:
  # Host metrics from node_exporter on every machine
  - job_name: node
    static_configs:
      - targets: ['192.168.1.2:9100']
        labels:
          instance: nexus
      - targets: ['192.168.1.3:9100']
        labels:
          instance: vault
      # Proxmox nodes: replace each 192.168.1.X with that node's IP
      - targets: ['192.168.1.X:9100']
        labels:
          instance: titan
      - targets: ['192.168.1.X:9100']
        labels:
          instance: hermes
      - targets: ['192.168.1.X:9100']
        labels:
          instance: callisto

  # Pi-hole exporters, reached by container name on the Compose network.
  # Both listen on 9666 inside the network (9667 is only the host-side mapping).
  - job_name: pihole
    static_configs:
      - targets: ['pihole-exporter:9666']
        labels:
          instance: pihole-primary
      - targets: ['pihole-exporter-2:9666']
        labels:
          instance: pihole-backup

  # Prometheus scrapes pve-exporter, which queries the Proxmox API at "target"
  - job_name: pve
    metrics_path: /pve
    params:
      module: [default]
      target: [192.168.1.X]
    static_configs:
      - targets: ['pve-exporter:9221']
        labels:
          instance: proxmox-cluster
```

Bring everything up:
```bash
docker compose up -d
```

Verify all targets are being scraped:

```bash
curl -s http://localhost:9090/api/v1/targets | python3 -m json.tool | grep -E "health|instance"
```

Everything should show "health": "up". The pihole and pve jobs will stay down until you add their exporters in the next two sections. If a Proxmox node shows "health": "down", it's the firewall issue above.
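If you'd rather get one line per target than scan grep output, here's a small sketch that parses the same /api/v1/targets response. It assumes python3 is available (the command above already uses it) and reads only the activeTargets list and its labels, health, and lastError fields; target_health is a hypothetical helper name.

```bash
#!/bin/sh
# Summarize Prometheus target health. Reads the JSON returned by
# /api/v1/targets on stdin, prints "job/instance: health" per active target
# (plus the last scrape error, if any), and exits 1 if any target isn't up.
target_health() {
  python3 -c '
import json, sys
targets = json.load(sys.stdin)["data"]["activeTargets"]
down = 0
for t in targets:
    labels = t["labels"]
    line = "%s/%s: %s" % (labels.get("job", "?"), labels.get("instance", "?"), t["health"])
    if t.get("lastError"):
        line += " (%s)" % t["lastError"]
    print(line)
    if t["health"] != "up":
        down += 1
sys.exit(1 if down else 0)
'
}

# Usage on Nexus:
#   curl -s http://localhost:9090/api/v1/targets | target_health
```

The non-zero exit status makes it easy to drop into a cron job or a post-reboot check.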
Pi-hole Metrics with pihole6-exporter
Pi-hole's Grafana dashboard isn't built on logs; it's built on metrics from the Pi-hole API. Pi-hole v6 replaced the old API, so we need the amonacoos/pihole6_exporter image, which speaks v6's new API format.
Add to docker-compose.yml on Nexus:
```yaml
  pihole-exporter:
    image: amonacoos/pihole6_exporter:latest
    container_name: pihole-exporter
    environment:
      - PIHOLE_HOST=192.168.1.2
      - PIHOLE_PORT=9080
      - PIHOLE_API_KEY=${PIHOLE_PASSWORD}
      - PIHOLE_SCHEME=http
    ports:
      - "127.0.0.1:9666:9666"
    restart: unless-stopped
    depends_on:
      - prometheus
    deploy:
      resources:
        limits:
          memory: 128M
          cpus: "0.5"
    logging:
      driver: "json-file"
      options:
        max-size: "5m"
        max-file: "2"

  pihole-exporter-2:
    image: amonacoos/pihole6_exporter:latest
    container_name: pihole-exporter-2
    environment:
      - PIHOLE_HOST=192.168.1.X
      - PIHOLE_PORT=80
      - PIHOLE_API_KEY=${PIHOLE2_PASSWORD}
      - PIHOLE_SCHEME=http
    ports:
      - "127.0.0.1:9667:9666"
    restart: unless-stopped
    depends_on:
      - prometheus
    deploy:
      resources:
        limits:
          memory: 128M
          cpus: "0.5"
    logging:
      driver: "json-file"
      options:
        max-size: "5m"
        max-file: "2"
```

Add to .env:
```text
PIHOLE_PASSWORD=your_pihole_password
PIHOLE2_PASSWORD=your_second_pihole_password
```

Note the port difference: the primary Pi-hole runs on port 9080 (remapped to avoid conflicts), while the backup on Sentinel runs on the default port 80.
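Docker Compose only prints a warning (and substitutes a blank string) when a variable the compose file references isn't set, which is easy to miss in the scroll of docker compose up. Here's a minimal pre-flight sketch, assuming plain KEY=value lines in .env; missing_env_vars is a hypothetical helper, not part of Compose.

```bash
#!/bin/sh
# Prints each required key that is missing or empty in an env file and
# returns 1 if any were found. Only understands plain KEY=value lines.
missing_env_vars() {
  env_file="$1"
  shift
  status=0
  for key in "$@"; do
    if ! grep -Eq "^${key}=.+" "$env_file"; then
      echo "$key"
      status=1
    fi
  done
  return "$status"
}

# Usage, from the directory holding docker-compose.yml and .env:
#   missing_env_vars .env PIHOLE_PASSWORD PIHOLE2_PASSWORD || echo "fill these in first"
```

Add PVE_PASSWORD to the list once you've set up the Proxmox exporter.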
Proxmox Cluster Metrics with PVE Exporter
The PVE exporter connects to the Proxmox API and exposes metrics about the whole cluster — node CPU and memory, VM and LXC status, storage pool usage, cluster health. One exporter, whole cluster. Elegant.
First, create a read-only API user on your primary Proxmox node:
```bash
ssh titan
sudo pveum user add pve_exporter@pve
sudo pveum passwd pve_exporter@pve
sudo pveum aclmod / -user pve_exporter@pve -role PVEAuditor
```

PVEAuditor gives read-only access to everything we need without any ability to make changes. Always use least privilege, especially in your homelab, because the person most likely to accidentally break something is you.
Add the exporter to docker-compose.yml on Nexus:
```yaml
  pve-exporter:
    image: prompve/prometheus-pve-exporter:latest
    container_name: pve-exporter
    environment:
      - PVE_USER=pve_exporter@pve
      - PVE_PASSWORD=${PVE_PASSWORD}
      - PVE_VERIFY_SSL=false
    ports:
      - "127.0.0.1:9221:9221"
    restart: unless-stopped
    deploy:
      resources:
        limits:
          memory: 128M
          cpus: "0.5"
    logging:
      driver: "json-file"
      options:
        max-size: "5m"
        max-file: "2"
```

Add to .env:
```text
PVE_PASSWORD=your_pve_exporter_password
```

Adding Prometheus as a Grafana Data Source
In Grafana:
- Go to Connections → Data Sources → Add data source
- Choose Prometheus
- Set URL to http://prometheus:9090
- Click Save & Test
Green checkmark and you're done.
Importing the Dashboards
Three imports, instant visibility.
Node Exporter Full — Dashboard ID: 1860
One of the most downloaded Grafana dashboards in existence. Per-host CPU, memory, disk I/O, filesystem usage, network traffic, and system load. Go to Dashboards → New → Import, enter 1860, select Prometheus, import.
After importing, the instance dropdown may be empty. Go to Dashboard Settings → Variables and update the query for each variable to use Label Values mode with node_uname_info as the metric. Once they're populated, you can switch between nexus, vault, titan, hermes, and callisto from a single dropdown.
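Before editing dashboard variables, it's worth confirming the labels exist on the Prometheus side. A Label Values variable boils down to a query like label_values(node_uname_info, instance), and Prometheus exposes the same information at /api/v1/label/instance/values. Here's a minimal sketch, assuming curl and python3 on Nexus; print_label_values is a hypothetical helper.

```bash
#!/bin/sh
# Prints the values of a label as Prometheus reports them, one per line.
# Reads the JSON from /api/v1/label/<name>/values on stdin.
print_label_values() {
  python3 -c '
import json, sys
for value in json.load(sys.stdin)["data"]:
    print(value)
'
}

# Usage on Nexus: which instance names carry node_uname_info?
# (-G with --data-urlencode sends match[] as a query parameter.)
#   curl -s -G http://localhost:9090/api/v1/label/instance/values \
#     --data-urlencode 'match[]=node_uname_info' | print_label_values
```

If this prints all five host names, the data is there and the problem is only the dashboard variable.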
Pi-hole v6 Stats — Dashboard ID: 21043
Import 21043, select Prometheus. Built specifically for Pi-hole v6's new API. Query volume, block rates, top blocked domains, client activity, upstream resolver performance. With two Pi-hole instances, you get an instance selector at the top to switch between primary and backup.
Proxmox Cluster Overview — Dashboard ID: 10347
Import 10347 for a full cluster view — all three nodes, resource usage, and status of every VM and LXC running across the cluster. This is the one to have open on a secondary monitor during maintenance.
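For a terminal view of the same status, you can query pve-exporter directly with the module and target parameters that prometheus.yml passes. This sketch assumes the exporter's pve_up metric carries an id label such as node/titan, qemu/100, or lxc/101, with 0 meaning not running; pve_not_up is a hypothetical helper.

```bash
#!/bin/sh
# Lists nodes and guests that pve-exporter reports as not up. Reads the
# exporter's metrics text on stdin and prints the id label of every pve_up
# series whose value is 0 (the exporter writes values like "0.0").
pve_not_up() {
  awk 'index($0, "pve_up{") == 1 && $NF == 0 {
    # Match id="..." only as a whole label name (preceded by { or ,).
    match($0, /[{,]id="[^"]*"/)
    print substr($0, RSTART + 5, RLENGTH - 6)
  }'
}

# Usage on Nexus (192.168.1.X is the same placeholder as in prometheus.yml):
#   curl -s 'http://127.0.0.1:9221/pve?module=default&target=192.168.1.X' | pve_not_up
```

A guest you shut down on purpose also reports 0, so read the list as "not running" rather than "broken".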
ZFS Pool Monitoring
If you're running a ZFS pool (and if you have a media server with multiple drives, you should be; I wrote about setting one up here), node_exporter includes a ZFS collector that's enabled by default.
On the host that holds the pool, verify it's collecting data:

```bash
curl -s http://localhost:9100/metrics | grep zfs | head -10
```

You should see metrics like node_zfs_zpool_state and filesystem metrics for your pool. The Node Exporter Full dashboard includes ZFS panels that populate automatically.
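The key metric for alerting, which we cover in Article 5, is node_zfs_zpool_state. Node Exporter exports it once per pool for each possible state (online, degraded, faulted, and so on), with a value of 1 on whichever state the pool is currently in and 0 on the rest. A healthy pool has state="online" at 1. If any other state reads 1, go look at it right now.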
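node_exporter's ZFS collector on Linux publishes node_zfs_zpool_state as one series per pool per state label, with 1 on the state the pool is currently in. Here's a minimal sketch that scans the metrics output by hand for any pool whose current state isn't online; zpool_problems is a hypothetical helper.

```bash
#!/bin/sh
# Reads node_exporter metrics text on stdin and prints "pool: state" for every
# pool whose current (value 1) state isn't online. Exits 1 if it found any.
zpool_problems() {
  awk '
    index($0, "node_zfs_zpool_state{") == 1 && $NF == 1 && $0 !~ /state="online"/ {
      match($0, /zpool="[^"]*"/); pool = substr($0, RSTART + 7, RLENGTH - 8)
      match($0, /state="[^"]*"/); st = substr($0, RSTART + 7, RLENGTH - 8)
      print pool ": " st
      bad = 1
    }
    END { exit bad }
  '
}

# Usage on the host that holds the pool:
#   curl -s http://localhost:9100/metrics | zpool_problems || echo "go look at zpool status"
```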
You can check pool status manually anytime:

```bash
sudo zpool status
sudo zpool list
```

Healthy output looks like this:

```text
NAME        SIZE  ALLOC   FREE  FRAG  CAP  HEALTH
MediaPool  43.6T  21.3T  22.3T    0%  48%  ONLINE
```

If HEALTH says anything other than ONLINE, stop what you're doing and deal with it.
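If you want the same check without going through Prometheus at all, zpool list has a scripted mode: -H drops the header and separates columns with tabs, and -o picks which columns to show. Here's a minimal sketch built on that; unhealthy_pools is a hypothetical helper.

```bash
#!/bin/sh
# Reads `zpool list -H -o name,health` output (tab-separated, no header) on
# stdin, prints every pool whose HEALTH isn't ONLINE, and exits 1 if any.
unhealthy_pools() {
  awk -F '\t' '$2 != "ONLINE" { print $1 ": " $2; bad = 1 } END { exit bad }'
}

# Usage, e.g. from a root cron job on the pool host:
#   zpool list -H -o name,health | unhealthy_pools || echo "run zpool status now"
```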
Where We Are
- ✅ Prometheus collecting metrics from all five hosts
- ✅ Node Exporter running bare-metal on every machine
- ✅ Pi-hole metrics for both instances
- ✅ Proxmox cluster metrics including VM and LXC status
- ✅ ZFS pool monitoring
- ✅ Three Grafana dashboards imported and working
- ✅ Proxmox firewall rules that actually survive reboots
The Series
- Introduction & Architecture –– Stop Flying Blind, Series Introduction
- Setting Up the Core Stack — Loki, Grafana, and Fluent Bit on your main host
- Shipping Logs from Multiple Hosts — expanding Fluent Bit across your network
- Metrics with Prometheus — node_exporter, Pi-hole metrics, and Proxmox monitoring
- Alerting — getting notified when things actually break
- Lessons Learned — everything that went wrong and how we fixed it
In Article 5 we put all of this to work with alerting — host down detection, disk space thresholds, ZFS pool health, error rate spikes — delivered to Discord and email. We'll also cover how to tune alerts so they're actually useful instead of just loud.
Go make another coffee. Article 5 is where it all pays off.