At my house, I run a pretty involved homelab. Two Ubuntu servers, a three-node Proxmox cluster, a handful of Pi-hole instances, and more Docker containers than I care to admit. For a long time, my approach to monitoring was the same as most people's — SSH in, grep through logs, hope for the best.
That stops working around the time you have more than two machines and more than a dozen services. At that point "just SSH in and check" becomes "SSH into which one, and check what exactly?" By the time I'd tracked down the source of a problem, I'd usually aged five years and my coffee was cold.
So I built a proper centralized logging and monitoring stack. Then I documented every step — including the parts that went sideways — so you don't have to suffer through the same process.
What We're Building
This is a six-part series. By the end, you'll have:
- Centralized logs from every Docker container, every host, and your Pi-hole DNS queries — all searchable from one place
- Metrics dashboards showing CPU, memory, disk, and network across every machine in your lab
- Alerting that tells you when something actually needs attention — not every time nginx gets a 404 from a bot somewhere in Eastern Europe
- ZFS pool monitoring, because a degraded drive array that you don't know about is just a data loss event that hasn't introduced itself yet
The whole thing is open source, self-hosted, and costs nothing in licensing. No Splunk subscription required.
Why This Stack?
I went looking for something similar to Humio — fast, searchable, not requiring a computer science degree to write a query. After poking around the options, I landed on this combination:
- Loki for log storage — efficient, indexes metadata rather than full content, works seamlessly with Grafana
- Fluent Bit for log collection — lightweight, fast, handles Docker container logs without breaking a sweat
- Prometheus for metrics — the industry standard, and for good reason
- Grafana for everything you actually look at — dashboards, log exploration, and alerting all in one interface
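"Indexes metadata rather than full content" is the key design choice behind Loki, so here is a deliberately tiny Python model of the idea. It is not Loki's implementation. Log lines are grouped into streams keyed by their label set, and only those label sets act as the index. A query first picks streams by label and then brute-force scans the raw lines with a substring filter, roughly the way a LogQL query like `{host="vault"} |= "error"` behaves. The class name and the `host`/`container` labels are hypothetical, chosen for illustration. Your real labels will depend on how Fluent Bit is configured.

```python
from collections import defaultdict

# Hypothetical toy model of label-based indexing. This is NOT how Loki is
# implemented internally; it only illustrates the idea.
LabelSet = tuple[tuple[str, str], ...]


class ToyLabelIndexedStore:
    def __init__(self) -> None:
        # Label set -> raw log lines. Only the keys (labels) act as the index;
        # line contents are stored as-is and scanned at query time.
        self.streams: dict[LabelSet, list[str]] = defaultdict(list)

    def push(self, labels: dict[str, str], line: str) -> None:
        # Sort so label order doesn't matter when identifying a stream.
        key = tuple(sorted(labels.items()))
        self.streams[key].append(line)

    def query(self, selector: dict[str, str], contains: str = "") -> list[str]:
        results: list[str] = []
        for key, lines in self.streams.items():
            labels = dict(key)
            # Step 1: match on labels only (the small, indexed part).
            if all(labels.get(name) == value for name, value in selector.items()):
                # Step 2: substring scan, but only inside matching streams.
                results.extend(line for line in lines if contains in line)
        return results


store = ToyLabelIndexedStore()
store.push({"host": "vault", "container": "plex"}, "GET /library 200")
store.push({"host": "vault", "container": "nextcloud"}, "error: db timeout")
store.push({"host": "nexus", "container": "grafana"}, "error: datasource down")

# Roughly analogous to the LogQL query {host="vault"} |= "error"
print(store.query({"host": "vault"}, contains="error"))  # ['error: db timeout']
```

The trade-off this shows: the index stays small because it only holds label sets. The cost is that text searches are scans, which is why choosing a few meaningful labels matters so much.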
One thing worth knowing before you start: if you're running Pi-hole and want proper DNS metrics — query counts, block rates, top blocked domains — Loki alone won't give you that. Pi-hole's Grafana integration is built around Prometheus metrics, not log parsing. We'll set up both: logs via Fluent Bit and metrics via the Pi-hole v6 exporter. You'll get the full picture.
The Lab
Here's what I'm working with. Your setup will be different, but the architecture scales in both directions.
- Nexus (192.168.1.2) — Ubuntu 25, AMD chip, dedicated 512GB M.2 for log storage. This is the main logging server — Loki, Grafana, Prometheus, and the primary Fluent Bit instance all run here.
- Vault (192.168.1.3) — Ubuntu 24, Intel chip with QuickSync. Runs Plex, Jellyfin, Nextcloud, and a collection of other self-hosted services. This is the heavy lifter.
- Proxmox Cluster — Three nodes (Titan, Hermes, Callisto) running VMs and LXC containers, including two Pi-hole instances.
- Sentinel — An LXC on Hermes running Pi-hole as the backup DNS server.
Got one server and a Pi-hole? This still works. Got fifteen machines? Also works. The architecture is the same either way.
How It All Connects
[Docker Containers] ──► Fluent Bit ─────────┐
[Host Syslogs]      ──► Fluent Bit ─────────┼──► Loki ────────► Grafana
[Pi-hole Logs]      ──► Fluent Bit ─────────┘

[Proxmox Nodes]     ──► node_exporter ──────┐
[All Hosts]         ──► node_exporter ──────┤
[Pi-hole API]       ──► pihole-exporter ────┼──► Prometheus ──► Grafana
[Proxmox API]       ──► pve-exporter ───────┘
Grafana sits at the top and talks to both Loki (logs) and Prometheus (metrics). Everything else feeds into one of those two. Clean and straightforward.
What You'll Need
- At least one Linux server running Docker and Docker Compose V2
- A dedicated drive or partition for log storage — I used a 512GB M.2, and 90 days of logs from a busy homelab fits comfortably
- Basic comfort with Docker Compose and the command line
- Coffee. More than you think.
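The 90-days-on-512GB figure is easy to adapt to your own hardware with back-of-the-envelope math. Divide usable capacity by the retention period to get the average amount of stored log data per day the drive can hold. This is a minimal sketch. It assumes the whole drive is available to Loki unless you reserve some headroom, and the `headroom` parameter is a hypothetical addition for illustration. The result is a ceiling, not a measurement of what my lab actually writes.

```python
def daily_budget_gb(capacity_gb: float, retention_days: int, headroom: float = 0.0) -> float:
    """Average GB/day of stored (compressed) logs that fits on the drive.

    headroom is the fraction of capacity kept free (0.2 keeps 20% spare).
    This is simple division, not a model of Loki's on-disk behavior.
    """
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")
    usable_gb = capacity_gb * (1.0 - headroom)
    return usable_gb / retention_days


print(f"{daily_budget_gb(512, 90):.2f} GB/day")                # 5.69 GB/day
print(f"{daily_budget_gb(512, 90, headroom=0.2):.2f} GB/day")  # 4.55 GB/day
```

If your measured daily growth sits well below that number, you have room to extend retention. If it sits close to it, shorten retention or get a bigger drive.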
One thing I'll tell you now that I wish someone had told me: pin your image versions. Don't use :latest for any of these containers. I spent an embarrassing amount of time debugging behavior that made no sense before realizing I was running a completely different version than I thought I was. The image reference I'd written didn't point at the tag I intended, I'd quietly ended up on :latest, and here we are. We'll cover all the gotchas like this as we go.
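A quick check for this takes only a few lines. Below is a minimal Python sketch that flags image references that would end up on :latest. It treats an explicit `:latest` tag as unpinned, and also a reference with no tag at all, since Docker defaults an untagged reference to :latest. A digest reference (`name@sha256:...`) counts as pinned. The function name and the sample image list are hypothetical, and the tags shown are placeholders, not version recommendations. The sketch only parses strings and never contacts a registry, so it can't tell you whether a tag actually exists.

```python
def is_pinned(image: str) -> bool:
    """Return True if an image reference names a specific tag or digest.

    Untagged references (e.g. "grafana/grafana") default to :latest in Docker,
    so they count as unpinned, as does an explicit ":latest".
    """
    if "@" in image:
        # Digest-pinned, e.g. "fluent/fluent-bit@sha256:<digest>"
        return True
    # Only the last path segment can carry a tag; an earlier colon may be a
    # registry port, as in "localhost:5000/fluent-bit".
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return False
    tag = last_segment.rsplit(":", 1)[1]
    return tag not in ("", "latest")


# Hypothetical image list; tags are placeholders, not recommendations.
images = [
    "grafana/loki:3.0.0",
    "grafana/grafana",
    "prom/prometheus:latest",
    "localhost:5000/fluent-bit",
    "fluent/fluent-bit@sha256:<digest>",
]
for image in images:
    status = "ok      " if is_pinned(image) else "UNPINNED"
    print(f"{status}  {image}")
```

To run this against a real stack, you'd feed it the `image:` values from your Compose file. Recent Compose V2 releases can list them with `docker compose config --images`, but confirm that flag exists in your version first.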
The Series
- Introduction & Architecture ← you are here
- Setting Up the Core Stack — Loki, Grafana, and Fluent Bit on your main host
- Shipping Logs from Multiple Hosts — expanding Fluent Bit across your network
- Metrics with Prometheus — node_exporter, Pi-hole metrics, and Proxmox monitoring
- Alerting — getting notified when things actually break
- Lessons Learned — everything that went wrong and how we fixed it
Up Next
In Article 2 we get into the actual setup — Docker Compose, Loki config, Fluent Bit, and the AppArmor permissions issue that will absolutely bite you on Ubuntu 24/25 if you don't see it coming.
Go make a coffee first. Article 2 has config files in it.