Metrics with Prometheus: Monitoring Every Host in Your Homelab

[Part 4] Because knowing your disk is 94% full after it fills up is a special kind of pain

Chris R. Miller May 1, 2026 7 min read

Logs tell you what happened. Metrics tell you what's happening right now and what's about to happen. If Articles 2 and 3 were about building your homelab's memory, this one is about giving it a nervous system.

By the end of this article you'll have Prometheus scraping metrics from every host in the lab, dashboards showing CPU, memory, disk, and network for each machine, Pi-hole DNS statistics, and full Proxmox cluster visibility including VM and LXC status. We're also setting up ZFS pool monitoring, because a degraded RAID array you don't know about is just a data loss event that hasn't introduced itself yet.

If you haven't set up your ZFS pool yet, I covered that in a separate guide here.

Let's get into it.

Adding Prometheus to the Stack

Prometheus runs on Nexus alongside Loki and Grafana. Add it to your docker-compose.yml:

  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - "127.0.0.1:9090:9090"
    volumes:
      - ./config/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - /media/disk1/logStorage/prometheus:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=90d'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
    restart: unless-stopped
    deploy:
      resources:
        limits:
          memory: 512M
          cpus: "1.0"
        reservations:
          memory: 128M
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

Prometheus is bound to 127.0.0.1:9090, so the published port is reachable only from Nexus itself. Unlike Loki, which needs to accept connections from remote Fluent Bit agents, Prometheus does the scraping itself, so it doesn't need to be reachable from other hosts.
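
If you're wondering what 90 days of retention costs on disk, the Prometheus storage documentation gives a rough sizing formula: retention time in seconds × ingested samples per second × bytes per sample, at roughly 1 to 2 bytes per sample after compression. Here's a minimal sketch of that arithmetic. The series count is a made-up illustrative figure, not a measurement from this lab, and the function name is my own.

```python
def estimate_tsdb_bytes(retention_days, active_series, scrape_interval_s, bytes_per_sample=2.0):
    """Rough on-disk size: retention_seconds * samples_per_second * bytes_per_sample."""
    retention_seconds = retention_days * 86_400
    # Each active series produces one sample per scrape interval.
    return retention_seconds * active_series * bytes_per_sample / scrape_interval_s


# Hypothetical load: 10,000 active series across all hosts and exporters,
# scraped every 30 seconds and kept for 90 days.
size = estimate_tsdb_bytes(retention_days=90, active_series=10_000, scrape_interval_s=30)
print(f"{size / 1e9:.1f} GB")  # prints "5.2 GB"
```

Treat the result as an upper-end ballpark. Once everything is running, Prometheus reports its real series count, so you can plug in an actual number.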

Create the storage directory:

sudo mkdir -p /media/disk1/logStorage/prometheus
sudo chmod 777 /media/disk1/logStorage/prometheus

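Same AppArmor situation as Loki and Grafana. If Prometheus fails with a permission error about queries.active, the container can't write to the storage directory, and chmod 777 is the blunt fix. The prom/prometheus image runs as the nobody user (UID 65534), so sudo chown 65534:65534 on the directory is a tighter alternative if you'd rather not leave it world-writable.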

Node Exporter: Host Metrics for Everyone

Node Exporter is a small Prometheus exporter that collects host-level metrics — CPU, memory, disk, network, temperature sensors, and more. We install it directly on the host rather than running it in Docker. The reason is practical: running it in a container introduces abstraction that skews disk metrics, hides network interfaces, and blocks access to hardware sensors. Just install it on the metal.

Installing on Ubuntu (Nexus and Vault):

wget https://github.com/prometheus/node_exporter/releases/download/v1.11.1/node_exporter-1.11.1.linux-amd64.tar.gz
tar xvf node_exporter-1.11.1.linux-amd64.tar.gz
sudo cp node_exporter-1.11.1.linux-amd64/node_exporter /usr/local/bin/
sudo useradd -rs /bin/false node_exporter

Create the systemd service:

sudo tee /etc/systemd/system/node_exporter.service << EOF
[Unit]
Description=Node Exporter
After=network.target

[Service]
User=node_exporter
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter

Verify it's running:

sudo systemctl status node_exporter
curl http://localhost:9100/metrics | head -5

A handful of Prometheus metric lines means it's working. If curl prints a write error at the end, that's just head closing the pipe early, not a real problem.
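
If you'd rather check actual values than eyeball the output, the text format is easy to pick apart. The sketch below is a simplified parser, not a complete implementation of the exposition format. It computes filesystem usage from node_filesystem_size_bytes and node_filesystem_avail_bytes, two metrics node_exporter exposes. The function names and the sample text are illustrative.

```python
import re

# One sample per line: metric name, optional {labels}, then a numeric value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Yield (name, labels, value) for each sample line, skipping comments."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match:
            name, raw_labels, value = match.groups()
            yield name, dict(LABEL_RE.findall(raw_labels or "")), float(value)


def filesystem_used_percent(text):
    """Return {mountpoint: percent used} from node_filesystem_* metrics."""
    size, avail = {}, {}
    for name, labels, value in parse_samples(text):
        mount = labels.get("mountpoint")
        if name == "node_filesystem_size_bytes":
            size[mount] = value
        elif name == "node_filesystem_avail_bytes":
            avail[mount] = value
    return {
        m: round(100 * (1 - avail[m] / size[m]), 1)
        for m in size
        if m in avail and size[m] > 0
    }


# Illustrative input; real text comes from http://localhost:9100/metrics
SAMPLE = """\
# TYPE node_filesystem_avail_bytes gauge
node_filesystem_avail_bytes{device="/dev/sda1",fstype="ext4",mountpoint="/"} 6e+09
node_filesystem_size_bytes{device="/dev/sda1",fstype="ext4",mountpoint="/"} 1e+11
"""

print(filesystem_used_percent(SAMPLE))  # {'/': 94.0}
```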

Installing on Proxmox nodes:

Same process. Proxmox runs Debian under the hood so the commands are identical. SSH into each node and run the same install:

ssh titan
# run the same install commands
ssh hermes
# run the same install commands
ssh callisto
# run the same install commands

The Proxmox Firewall Problem

Here's the one that will ruin your day if you don't know about it upfront.

After installing node_exporter on the Proxmox nodes, you add an iptables rule to allow port 9100. Everything works. Then the power goes out. When the nodes come back online, Prometheus can't reach them. The iptables rules are gone.

Proxmox manages its own firewall, and when Proxmox's firewall service starts on boot it rebuilds iptables from its own configuration — overwriting any rules you added manually. Even with netfilter-persistent installed, Proxmox wins this fight every time.

The correct fix is to add the rule through Proxmox's own interface so it persists in the cluster config rather than the OS:

1. Log into the Proxmox web UI at https://192.168.1.X:8006
2. Go to Datacenter → Firewall
3. Click Add and create a rule:
   - Direction: in
   - Action: ACCEPT
   - Protocol: tcp
   - Dest. port: 9100
   - Comment: node_exporter
4. Make sure the datacenter firewall is enabled

One rule at the datacenter level covers all nodes in the cluster automatically. It lives in Proxmox's cluster config and survives every reboot and power outage. This is the way. I learned it twice before it stuck.
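
After the next reboot, it's worth confirming the port really is still open before Prometheus tells you otherwise. Here's a minimal sketch of a TCP reachability check using only the standard library. The function names are my own, and the hostnames in the usage comment assume titan, hermes, and callisto resolve from wherever you run it.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_hosts(hosts, port=9100):
    """Map each host to whether node_exporter's port is reachable."""
    return {host: port_open(host, port) for host in hosts}


# Usage (run from Nexus; swap in IP addresses if the names don't resolve):
#   check_hosts(["titan", "hermes", "callisto"])
```

This only proves the port accepts connections. It doesn't confirm that what answers is node_exporter, so it complements the curl check rather than replacing it.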

Prometheus Config

Create config/prometheus.yml on Nexus:

global:
  scrape_interval: 30s
  evaluation_interval: 30s

scrape_configs:
  - job_name: node
    static_configs:
      - targets: ['192.168.1.2:9100']
        labels:
          instance: nexus
      - targets: ['192.168.1.3:9100']
        labels:
          instance: vault
      - targets: ['192.168.1.X:9100']
        labels:
          instance: titan
      - targets: ['192.168.1.X:9100']
        labels:
          instance: hermes
      - targets: ['192.168.1.X:9100']
        labels:
          instance: callisto

  - job_name: pihole
    static_configs:
      - targets: ['pihole-exporter:9666']
        labels:
          instance: pihole-primary
      - targets: ['pihole-exporter-2:9666']
        labels:
          instance: pihole-backup

  - job_name: pve
    metrics_path: /pve
    params:
      module: [default]
      target: [192.168.1.X]
    static_configs:
      - targets: ['pve-exporter:9221']
        labels:
          instance: proxmox-cluster
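
The pve job looks odd next to the others, so it helps to see what Prometheus actually requests. It connects to the address in targets, uses metrics_path instead of the default /metrics, and appends params as a query string. The target parameter is how the exporter learns which Proxmox node to query. The sketch below rebuilds those URLs for illustration only; scrape_url is a hypothetical helper, not part of Prometheus.

```python
from urllib.parse import urlencode


def scrape_url(target, metrics_path="/metrics", params=None, scheme="http"):
    """Build the URL Prometheus requests for one target of a scrape job."""
    query = urlencode(params or {}, doseq=True)
    return f"{scheme}://{target}{metrics_path}" + (f"?{query}" if query else "")


# The node job uses the defaults.
print(scrape_url("192.168.1.2:9100"))
# http://192.168.1.2:9100/metrics

# The pve job: static target plus metrics_path and params from the config.
print(scrape_url("pve-exporter:9221", "/pve",
                 {"module": ["default"], "target": ["192.168.1.X"]}))
# http://pve-exporter:9221/pve?module=default&target=192.168.1.X
```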

Bring everything up:

docker compose up -d

Verify all targets are being scraped:

curl http://localhost:9090/api/v1/targets | python3 -m json.tool | grep -E "health|instance"

Everything should show "health": "up". If a Proxmox node shows "health": "down", the firewall issue above is the most likely cause.
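
The grep works, but it separates each health value from the reason it's down. As a rough alternative, here's a sketch that reads the same API response and lists only the unhealthy targets along with their last scrape error. The function names and the sample payload are illustrative, and the payload is trimmed to the fields the code uses.

```python
import json
from urllib.request import urlopen


def fetch_targets(base_url="http://localhost:9090"):
    """Fetch the targets API response from Prometheus as a dict."""
    with urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


def unhealthy_targets(payload):
    """Return (job, instance, lastError) for every active target that is not 'up'."""
    return [
        (t["labels"].get("job"), t["labels"].get("instance"), t.get("lastError", ""))
        for t in payload["data"]["activeTargets"]
        if t.get("health") != "up"
    ]


# Trimmed, illustrative response; a real one carries more fields per target.
SAMPLE = {
    "status": "success",
    "data": {"activeTargets": [
        {"labels": {"job": "node", "instance": "nexus"}, "health": "up", "lastError": ""},
        {"labels": {"job": "node", "instance": "titan"}, "health": "down",
         "lastError": "context deadline exceeded"},
    ]},
}

print(unhealthy_targets(SAMPLE))
# [('node', 'titan', 'context deadline exceeded')]
```

On Nexus you'd call unhealthy_targets(fetch_targets()) instead of passing the sample. An empty list means every target is up.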

Pi-hole Metrics with pihole6-exporter

Pi-hole's Grafana dashboard isn't built on logs — it's built on metrics from the Pi-hole API. For Pi-hole v6 specifically we need the amonacoos/pihole6_exporter image which speaks to v6's new API format.

Add to docker-compose.yml on Nexus:

  pihole-exporter:
    image: amonacoos/pihole6_exporter:latest
    container_name: pihole-exporter
    environment:
      - PIHOLE_HOST=192.168.1.2
      - PIHOLE_PORT=9080
      - PIHOLE_API_KEY=${PIHOLE_PASSWORD}
      - PIHOLE_SCHEME=http
    ports:
      - "127.0.0.1:9666:9666"
    restart: unless-stopped
    depends_on:
      - prometheus
    deploy:
      resources:
        limits:
          memory: 128M
          cpus: "0.5"
    logging:
      driver: "json-file"
      options:
        max-size: "5m"
        max-file: "2"

  pihole-exporter-2:
    image: amonacoos/pihole6_exporter:latest
    container_name: pihole-exporter-2
    environment:
      - PIHOLE_HOST=192.168.1.X
      - PIHOLE_PORT=80
      - PIHOLE_API_KEY=${PIHOLE2_PASSWORD}
      - PIHOLE_SCHEME=http
    ports:
      - "127.0.0.1:9667:9666"
    restart: unless-stopped
    depends_on:
      - prometheus
    deploy:
      resources:
        limits:
          memory: 128M
          cpus: "0.5"
    logging:
      driver: "json-file"
      options:
        max-size: "5m"
        max-file: "2"

Add to .env:

PIHOLE_PASSWORD=your_pihole_password
PIHOLE2_PASSWORD=your_second_pihole_password

Note the port difference: the primary Pi-hole runs on port 9080 (remapped to avoid conflicts), the backup on Sentinel runs on the default port 80.

Proxmox Cluster Metrics with PVE Exporter

The PVE exporter connects to the Proxmox API and exposes metrics about the whole cluster — node CPU and memory, VM and LXC status, storage pool usage, cluster health. One exporter, whole cluster. Elegant.

First, create a read-only API user on your primary Proxmox node:

ssh titan
sudo pveum user add pve_exporter@pve
sudo pveum passwd pve_exporter@pve
sudo pveum aclmod / -user pve_exporter@pve -role PVEAuditor

PVEAuditor gives read-only access to everything we need without any ability to make changes. Always use least privilege — especially in your homelab, because the person most likely to accidentally break something is you.

Add the exporter to docker-compose.yml on Nexus:

  pve-exporter:
    image: prompve/prometheus-pve-exporter:latest
    container_name: pve-exporter
    environment:
      - PVE_USER=pve_exporter@pve
      - PVE_PASSWORD=${PVE_PASSWORD}
      - PVE_VERIFY_SSL=false
    ports:
      - "127.0.0.1:9221:9221"
    restart: unless-stopped
    deploy:
      resources:
        limits:
          memory: 128M
          cpus: "0.5"
    logging:
      driver: "json-file"
      options:
        max-size: "5m"
        max-file: "2"

Add to .env:

PVE_PASSWORD=your_pve_exporter_password

Adding Prometheus as a Grafana Data Source

In Grafana:

1. Go to Connections → Data Sources → Add data source
2. Choose Prometheus
3. Set URL to http://prometheus:9090
4. Click Save & Test

Green checkmark and you're done.

Importing the Dashboards

Three imports, instant visibility.

Node Exporter Full — Dashboard ID: 1860

One of the most downloaded Grafana dashboards in existence. Per-host CPU, memory, disk I/O, filesystem usage, network traffic, and system load. Go to Dashboards → New → Import, enter 1860, select Prometheus, import.

After importing, the instance dropdown may be empty. If it is, go to Dashboard Settings → Variables and update the query for each variable to use Label Values mode with node_uname_info as the metric. Once populated, you can switch between nexus, vault, titan, hermes, and callisto from a single dropdown.

Pi-hole v6 Stats — Dashboard ID: 21043

Import 21043, select Prometheus. Built specifically for Pi-hole v6's new API. Query volume, block rates, top blocked domains, client activity, upstream resolver performance. With two Pi-hole instances you get an instance selector at the top to switch between primary and backup.

Proxmox Cluster Overview — Dashboard ID: 10347

Import 10347 for a full cluster view — all three nodes, resource usage, and status of every VM and LXC running across the cluster. This is the one to have open on a secondary monitor during maintenance.

ZFS Pool Monitoring

If you're running a ZFS pool — and if you have a media server with multiple drives, you should be, I wrote about setting one up here — node_exporter includes a ZFS collector that's enabled by default.

Verify it's collecting data:

curl http://localhost:9100/metrics | grep zfs | head -10

You should see metrics like node_zfs_zpool_state and filesystem metrics for your pool. The Node Exporter Full dashboard includes ZFS panels that populate automatically.

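The key metric for alerting, which we cover in Article 5, is node_zfs_zpool_state. It's exported as one series per possible state (online, degraded, faulted, and so on), each labeled with the pool name. The series matching the pool's current state has the value 1 and the rest are 0. A healthy pool shows node_zfs_zpool_state{state="online"} at 1. If that series reads 0, go look at it right now.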
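
To make the metric's shape concrete, here's a minimal sketch that reads node_exporter's text output and reports any pool that isn't online. It assumes the layout current node_exporter releases use on Linux: one node_zfs_zpool_state series per possible state, with the active one set to 1. The function names are my own, and the "backup" pool in the sample is hypothetical.

```python
import re

STATE_RE = re.compile(r'^node_zfs_zpool_state\{(.*)\}\s+(\S+)')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def pool_states(metrics_text):
    """Return {pool: state}, taking the series whose value is 1 as the active state."""
    states = {}
    for line in metrics_text.splitlines():
        match = STATE_RE.match(line.strip())
        if match and float(match.group(2)) == 1:
            labels = dict(LABEL_RE.findall(match.group(1)))
            states[labels["zpool"]] = labels["state"]
    return states


def unhealthy_pools(metrics_text):
    """Return only the pools whose active state is not 'online'."""
    return {p: s for p, s in pool_states(metrics_text).items() if s != "online"}


# Illustrative input: MediaPool is healthy, the hypothetical "backup" pool is degraded.
SAMPLE = """\
node_zfs_zpool_state{state="degraded",zpool="MediaPool"} 0
node_zfs_zpool_state{state="online",zpool="MediaPool"} 1
node_zfs_zpool_state{state="degraded",zpool="backup"} 1
node_zfs_zpool_state{state="online",zpool="backup"} 0
"""

print(unhealthy_pools(SAMPLE))  # {'backup': 'degraded'}
```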

You can check pool status manually anytime:

sudo zpool status
sudo zpool list

Healthy zpool list output looks like this (columns trimmed for width):

NAME        SIZE  ALLOC   FREE  FRAG   CAP  HEALTH
MediaPool  43.6T  21.3T  22.3T    0%   48%  ONLINE

If HEALTH says anything other than ONLINE — stop what you're doing and deal with it.

Where We Are

✅ Prometheus collecting metrics from all five hosts
✅ Node Exporter running bare-metal on every machine
✅ Pi-hole metrics for both instances
✅ Proxmox cluster metrics including VM and LXC status
✅ ZFS pool monitoring
✅ Three Grafana dashboards imported and working
✅ Proxmox firewall rules that actually survive reboots

The Series

1. Introduction & Architecture: Stop Flying Blind, series introduction
2. Setting Up the Core Stack: Loki, Grafana, and Fluent Bit on your main host
3. Shipping Logs from Multiple Hosts: expanding Fluent Bit across your network
4. Metrics with Prometheus: node_exporter, Pi-hole metrics, and Proxmox monitoring
5. Alerting: getting notified when things actually break
6. Lessons Learned: everything that went wrong and how we fixed it

In Article 5 we put all of this to work with alerting — host down detection, disk space thresholds, ZFS pool health, error rate spikes — delivered to Discord and email. We'll also cover how to tune alerts so they're actually useful instead of just loud.

Go make another coffee. Article 5 is where it all pays off.
