Setting Up the Core Stack: Loki, Grafana, and Fluent Bit

[Part 2] Wherein we learn that :latest is a lie and AppArmor has opinions

Let's get into it. This article covers the actual setup of the core stack on Nexus — Loki for log storage, Grafana for visualization, and Fluent Bit to collect and ship container logs. By the end you'll have all three running, Grafana connected to Loki, and logs flowing from your Docker containers.

If you haven't read Article 1, go do that first. It covers the architecture and what we're building. If you're the type who skips straight to the recipe without reading the headnote — respect the hustle, but you'll have questions.


Before Anything Else: Pin Your Versions

I'm putting this first because it cost me the most time for the least interesting reason.

I specified fluent/fluent-bit:3.3.3 in my compose file. Confident. Specific. Pinned, like you're supposed to do. The problem: that version didn't exist. Docker pulled :latest silently, gave no error, and I spent an afternoon debugging behavior that made no sense — because I was running a completely different version than I thought.

Before you use any version number you find in a blog post — including this one — verify it exists:

docker pull grafana/loki:3.4.1

If it fails immediately, the tag doesn't exist. Here's what we're using and what I've confirmed works:

  • grafana/loki:3.4.1
  • grafana/grafana:11.6.0
  • fluent/fluent-bit:3.2.10
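
A small guard against floating tags, as a sketch: the functions below scan a compose file for image: lines and flag anything with no explicit tag or with :latest. The function names are my own, not part of Docker, and the parsing assumes the plain one-line "image: repo/name:tag" form used in this article (no quotes, digests, or registries with a port number).

```shell
#!/bin/sh
# Print every image reference found in a compose file, one per line.
# Assumes the plain "image: repo/name:tag" form, no quotes or digests.
list_images() {
    sed -n 's/^[[:space:]]*image:[[:space:]]*//p' "$1"
}

# Succeed only if the reference carries an explicit tag other than "latest".
is_pinned() {
    case "$1" in
        *:latest) return 1 ;;
        *:*)      return 0 ;;
        *)        return 1 ;;
    esac
}

# Report unpinned images; the exit status is non-zero if any were found.
check_pins() {
    bad=0
    for img in $(list_images "$1"); do
        if ! is_pinned "$img"; then
            echo "unpinned: $img"
            bad=1
        fi
    done
    return "$bad"
}

# Example (the file name is an assumption, use whatever yours is called):
#   check_pins docker-compose.yml
```

This only tells you a tag is written down, not that it exists. The docker pull check above is still the step that proves the tag is real.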

Directory Structure

On Nexus, create the project directory:

mkdir -p /home/youruser/docker-projects/logStack/config/grafana/provisioning
cd /home/youruser/docker-projects/logStack

Create your log storage directories on the dedicated drive. I mounted mine at /media/disk1/logStorage:

sudo mkdir -p /media/disk1/logStorage/{loki,grafana,prometheus}
sudo chmod -R 777 /media/disk1/logStorage

Yes, 777. I'll explain why in a minute.


The Docker Compose File

No version: field at the top — that was deprecated in Docker Compose V2 and generates a warning on every single command if you include it. Just leave it out.

services:

  loki:
    image: grafana/loki:3.4.1
    container_name: loki
    user: "0"
    ports:
      - "192.168.1.2:3100:3100"
    volumes:
      - ./config/loki-config.yaml:/etc/loki/local-config.yaml:ro
      - /media/disk1/logStorage/loki:/loki
    command: -config.file=/etc/loki/local-config.yaml
    restart: unless-stopped
    deploy:
      resources:
        limits:
          memory: 2G
          cpus: "2.0"
        reservations:
          memory: 512M
    healthcheck:
      test: ["CMD-SHELL", "wget -q --tries=1 -O- http://localhost:3100/ready || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 30s
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

  grafana:
    image: grafana/grafana:11.6.0
    container_name: grafana
    user: "0"
    ports:
      - "3001:3000"
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}
      - GF_USERS_ALLOW_SIGN_UP=false
      - GF_AUTH_ANONYMOUS_ENABLED=false
      - GF_SECURITY_DISABLE_GRAVATAR=true
      - GF_SERVER_ENABLE_GZIP=true
      - GF_ANALYTICS_REPORTING_ENABLED=false
      - GF_ANALYTICS_CHECK_FOR_UPDATES=false
      - GF_FEATURE_TOGGLES_ENABLE=lokiQuerySplittingConfig
      - GF_SMTP_ENABLED=${SMTP_ENABLED:-false}
      - GF_SMTP_HOST=${SMTP_HOST:-}
      - GF_SMTP_USER=${SMTP_USER:-}
      - GF_SMTP_PASSWORD=${SMTP_PASSWORD:-}
      - GF_SMTP_FROM_ADDRESS=${SMTP_FROM:-}
    volumes:
      - /media/disk1/logStorage/grafana:/var/lib/grafana
      - ./config/grafana/provisioning:/etc/grafana/provisioning:ro
    depends_on:
      loki:
        condition: service_healthy
    restart: unless-stopped
    deploy:
      resources:
        limits:
          memory: 512M
          cpus: "1.0"
        reservations:
          memory: 128M
    healthcheck:
      test: ["CMD-SHELL", "wget -q --tries=1 -O- http://localhost:3000/api/health || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 3
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

  fluent-bit:
    image: fluent/fluent-bit:3.2.10
    container_name: fluent-bit
    volumes:
      - ./config/fluent-bit.conf:/fluent-bit/etc/fluent-bit.conf:ro
      - ./config/fluent-bit-parsers.conf:/fluent-bit/etc/parsers_custom.conf:ro
      - ./config/container_name.lua:/fluent-bit/etc/container_name.lua:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /var/log:/var/log:ro
      - /run/log/journal:/run/log/journal:ro
    restart: unless-stopped
    deploy:
      resources:
        limits:
          memory: 256M
          cpus: "0.5"
        reservations:
          memory: 64M
    logging:
      driver: "json-file"
      options:
        max-size: "5m"
        max-file: "2"

Create your .env file. Don't commit this to git:

cat > .env << 'EOF'
GRAFANA_PASSWORD=use_something_strong_here
SMTP_ENABLED=false
SMTP_HOST=
SMTP_USER=
SMTP_PASSWORD=
SMTP_FROM=
EOF
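
To make the "don't commit this" part stick, here is a minimal sketch that adds .env to .gitignore if it isn't listed yet. ensure_ignored is a hypothetical helper name, and it assumes the project directory is (or will become) a git repository.

```shell
#!/bin/sh
# Append a pattern to a .gitignore file unless an identical line already exists.
# Usage: ensure_ignored <pattern> [gitignore-path]
ensure_ignored() {
    pattern=$1
    file=${2:-.gitignore}
    touch "$file"
    # If the file is non-empty and lacks a trailing newline, add one first
    # so the new pattern does not get glued onto the last existing line.
    if [ -s "$file" ] && [ "$(tail -c 1 "$file" | wc -l)" -eq 0 ]; then
        echo >> "$file"
    fi
    if ! grep -qxF -- "$pattern" "$file"; then
        printf '%s\n' "$pattern" >> "$file"
    fi
}

# Example, run from the project directory:
#   ensure_ignored .env
#   chmod 600 .env
```

Running it twice is harmless, since the pattern is only appended when no identical line is present.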

The Loki Config

A few things worth calling out before you just copy-paste this:

  • chunk_encoding: lz4 — same compression algorithm we used on the ZFS pool. Good balance of speed and compression ratio on a storage-constrained drive.
  • retention_period: 2160h — 90 days. Adjust based on your storage situation.
  • embedded_cache — if you find a config online that uses fifocache, it was written for Loki 2.x. That syntax was removed in 3.x. embedded_cache is the replacement.
  • reject_old_samples_max_age: 168h — Loki rejects log entries older than this window. Matters when you're backfilling historical logs from existing files.
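
Loki's durations are written in hours, which gets hard to eyeball past a few days. A tiny sketch for sanity-checking the numbers above; hours_to_days is a made-up helper and only understands the plain "<integer>h" form.

```shell
#!/bin/sh
# Convert an hour duration such as "2160h" into whole days (rounded down).
# Only "<integer>h" (or a bare integer) is accepted; anything else is rejected.
hours_to_days() {
    h=${1%h}
    case "$h" in
        ''|*[!0-9]*)
            echo "expected <integer>h, got: $1" >&2
            return 1
            ;;
    esac
    echo $(( h / 24 ))
}

hours_to_days 2160h   # retention_period: prints 90
hours_to_days 168h    # reject_old_samples_max_age: prints 7
```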

Create config/loki-config.yaml:

auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096
  log_level: warn

common:
  instance_addr: 127.0.0.1
  path_prefix: /loki
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: "2024-01-01"
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: loki_index_
        period: 24h

storage_config:
  tsdb_shipper:
    active_index_directory: /loki/index
    cache_location: /loki/index_cache
    cache_ttl: 24h
  filesystem:
    directory: /loki/chunks

ingester:
  wal:
    enabled: true
    dir: /loki/wal
    checkpoint_duration: 5m
    flush_on_shutdown: true
  chunk_encoding: lz4
  chunk_target_size: 1572864
  chunk_idle_period: 30m
  max_chunk_age: 2h
  chunk_retain_period: 0s

compactor:
  working_directory: /loki/compactor
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 10
  delete_request_store: filesystem

limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  retention_period: 2160h
  retention_stream:
    - selector: '{level="debug"}'
      priority: 1
      period: 24h
  ingestion_rate_mb: 16
  ingestion_burst_size_mb: 32
  max_query_lookback: 2160h
  max_query_length: 721h
  max_entries_limit_per_query: 50000
  split_queries_by_interval: 24h

query_range:
  cache_results: true
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 256
        ttl: 24h

chunk_store_config:
  chunk_cache_config:
    embedded_cache:
      enabled: true
      max_size_mb: 256
      ttl: 1h

ruler:
  storage:
    type: local
    local:
      directory: /loki/rules
  rule_path: /loki/rules-temp
  alertmanager_url: http://localhost:9093
  enable_api: true
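
If you adapted a config from somewhere else instead of using the one above, a quick scan for leftover 2.x cache settings can save a confusing startup failure. This is a rough sketch, not a validator: check_legacy_cache is a hypothetical name and all it does is look for the string fifocache.

```shell
#!/bin/sh
# List lines in a Loki config that still mention the 2.x "fifocache" setting.
# Prints "line-number:text" for each hit and succeeds only when the file is clean.
check_legacy_cache() {
    if grep -n 'fifocache' "$1"; then
        echo "found Loki 2.x cache settings in $1" >&2
        return 1
    fi
    return 0
}

# Example:
#   check_legacy_cache config/loki-config.yaml
```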

The Fluent Bit Config

Fluent Bit collects logs from your Docker containers and ships them to Loki. The interesting part is the Lua script that resolves container IDs to human-readable names — more on that in the next section.

Create config/fluent-bit.conf:

[SERVICE]
    Flush         5
    Daemon        Off
    Log_Level     warn
    Parsers_File  /fluent-bit/etc/parsers.conf
    Parsers_File  /fluent-bit/etc/parsers_custom.conf
    HTTP_Server   On
    HTTP_Listen   0.0.0.0
    HTTP_Port     2020
    storage.type  filesystem
    storage.path  /tmp/flb-storage/
    storage.sync  normal
    storage.checksum off
    storage.max_chunks_up 128

# ─── INPUTS ───────────────────────────────────────────────────────────────────

[INPUT]
    Name              tail
    Tag               docker.*
    Path              /var/lib/docker/containers/*/*.log
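    # Log paths contain container IDs, not names, so this pattern only
    # matches when "fluent-bit" appears in the path itself.
    Exclude_Path      *fluent-bit*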
    Parser            docker
    DB                /tmp/flb_docker.db
    Mem_Buf_Limit     32MB
    Skip_Long_Lines   On
    Refresh_Interval  10

[INPUT]
    Name              tail
    Tag               syslog
    Path              /var/log/syslog
    Parser            syslog-rfc3164
    DB                /tmp/flb_syslog.db
    Mem_Buf_Limit     8MB

[INPUT]
    Name              systemd
    Tag               systemd
    Systemd_Filter    _SYSTEMD_UNIT=docker.service
    Systemd_Filter    _SYSTEMD_UNIT=ssh.service
    Read_From_Tail    On
    Strip_Underscores On

# ─── FILTERS ──────────────────────────────────────────────────────────────────

[FILTER]
    Name    lua
    Match   docker.*
    script  /fluent-bit/etc/container_name.lua
    call    extract_container_id

[FILTER]
    Name          record_modifier
    Match         *
    Record        host nexus

[FILTER]
    Name    grep
    Match   docker.*
    Exclude log .*health.*check.*
    Exclude log .*GET /ping.*
    Exclude log .*GET /health.*

# ─── OUTPUTS ──────────────────────────────────────────────────────────────────

[OUTPUT]
    Name              loki
    Match             docker.*
    Host              loki
    Port              3100
    Labels            job=docker, host=nexus
    Label_Keys        $container_name
    Line_Format       json
    Retry_Limit       10

[OUTPUT]
    Name          loki
    Match         syslog
    Host          loki
    Port          3100
    Labels        job=syslog, host=nexus
    Line_Format   key_value
    Retry_Limit   10

[OUTPUT]
    Name          loki
    Match         systemd
    Host          loki
    Port          3100
    Labels        job=systemd, host=nexus
    Line_Format   key_value
    Retry_Limit   10

Create config/fluent-bit-parsers.conf:

[PARSER]
    Name        json_log
    Format      json
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L

The Container Name Problem

Docker stores container logs at paths like this:

/var/lib/docker/containers/7f314efc44b80a5fe925cf484db816c52d7109f6c9805510a481a099df7be607/7f314efc44b80a5fe925cf484db816c52d7109f6c9805510a481a099df7be607-json.log

That 64-character hex string is the container ID. If Fluent Bit uses that as the label, your Grafana label browser looks like a cryptocurrency wallet convention. Not ideal.

The fix is a Lua script that reads each container's config.v2.json to extract the human-readable name. Docker stores this file alongside the logs for every container — it contains the container name, image, environment variables, and everything else Docker knows about it.

Create config/container_name.lua:

function extract_container_id(tag, timestamp, record)
    local container_id = nil

    for segment in tag:gmatch("[^%.]+") do
        local id = segment:match("^([a-f0-9]+)-json$")
        if id then
            container_id = id
            break
        end
    end

    if container_id then
        local config_path = "/var/lib/docker/containers/" .. container_id .. "/config.v2.json"
        local f = io.open(config_path, "r")
        if f then
            local content = f:read("*all")
            f:close()
            local name = content:match('"Name":"%/?([^"]+)"')
            record["container_name"] = name or "no_name_in_json"
        else
            record["container_name"] = "file_not_found"
        end
    else
        record["container_name"] = "no_id_extracted"
    end

    return 1, timestamp, record
end

This splits the Fluent Bit tag on dots, finds the segment ending in -json, extracts the container ID, reads the corresponding config.v2.json, and pulls out the Name field. The full debugging saga behind getting this script working is in the Lessons Learned article if you enjoy reading about other people's pain.
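
When the label comes out wrong, it helps to replay the same steps by hand outside Fluent Bit. The sketch below mirrors the Lua logic in shell, purely for debugging. It assumes the tail input expands the * in "Tag docker.*" to the file path with slashes turned into dots, and the three function names are mine.

```shell
#!/bin/sh
# Build the tag the tail input is assumed to generate for "Tag docker.*":
# the "*" becomes the file path with slashes replaced by dots.
path_to_tag() {
    printf 'docker.%s\n' "$(printf '%s' "${1#/}" | tr '/' '.')"
}

# Pull the container ID out of such a tag: the first dot-separated segment
# made of hex characters followed by "-json".
tag_to_id() {
    printf '%s\n' "$1" | tr '.' '\n' |
        sed -n 's/^\([a-f0-9][a-f0-9]*\)-json$/\1/p' | head -n 1
}

# Read config.v2.json content on stdin and print the first non-empty
# "Name" value, without the leading slash Docker stores in front of it.
name_from_config() {
    grep -o '"Name":"[^"][^"]*"' | head -n 1 |
        sed 's/^"Name":"//; s/"$//; s/^\///'
}

# Example, on the Docker host (<id> stands for a real container ID):
#   sudo cat /var/lib/docker/containers/<id>/config.v2.json | name_from_config
```

If the name printed here matches what docker inspect --format '{{.Name}}' reports for the same container, the file and the pattern are fine and the problem is more likely the mount or the tag.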


About That user: "0"

You'll notice both Loki and Grafana have user: "0" in the compose file. Here's why.

Ubuntu 24.04 and the 25.x releases ship with AppArmor enabled. AppArmor is a mandatory access control system that enforces security policies at the kernel level, and those policies apply on top of filesystem permissions: a bind-mounted directory can be world-writable and a confined container process can still be denied access to it.

The symptom: you chown the directory to the correct UID. Nothing. You chmod 777. Still nothing. The container still can't write to it.

grafana | mkdir: can't create directory '/var/lib/grafana/plugins': Permission denied

The fix is running the affected containers as root. It feels wrong. For a homelab logging stack on a dedicated internal drive, it's fine. The containers aren't internet-facing in any meaningful way, and the alternative is a weekend spent configuring AppArmor profiles.

Both Loki and Grafana need this. If Loki throws a permission error about /loki/rules on startup, same fix.
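
Before settling on root, it can be worth confirming that AppArmor is actually what's blocking the write. A minimal sketch, assuming denials land in the kernel log (which is where they go when auditd isn't installed); the function names are hypothetical.

```shell
#!/bin/sh
# Filter kernel log lines on stdin down to AppArmor denials.
# AppArmor audit records carry the literal field apparmor="DENIED".
apparmor_denials() {
    grep 'apparmor="DENIED"'
}

# Print the distinct profile names that appear in those denials.
denied_profiles() {
    apparmor_denials | sed -n 's/.*profile="\([^"]*\)".*/\1/p' | sort -u
}

# Example, run right after a container fails to write:
#   sudo dmesg | apparmor_denials
#   sudo journalctl -k | denied_profiles
```

No output means AppArmor logged no denial, and the permission error is coming from somewhere else.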


Bring It Up

docker compose up -d

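Verify Loki is ready. The compose file publishes Loki's port on 192.168.1.2 only, so use that address rather than localhost:

curl http://192.168.1.2:3100/ready

Should return ready. Check that Fluent Bit is shipping logs (give it 60 seconds after startup):

curl http://192.168.1.2:3100/loki/api/v1/labels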

You should start seeing label names appear. If you get {"status":"success","data":[]} after a minute, check Fluent Bit's logs:

docker compose logs fluent-bit
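
Loki takes a little while to report ready, so instead of re-running curl by hand you can wrap the check in a retry loop. A sketch; retry is a generic helper I'm defining here, and LOKI_URL is a placeholder for whatever address Loki's port is published on.

```shell
#!/bin/sh
# Run a command until it succeeds, up to <attempts> times, sleeping <delay>
# seconds between tries. Returns 0 on success, 1 after the last failure.
# Usage: retry <attempts> <delay> <command> [args...]
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    return 0
}

# Example: poll the ready endpoint for up to a minute.
#   retry 12 5 curl -fsS "$LOKI_URL/ready"
```

The -f flag makes curl exit non-zero on an HTTP error status, which is what lets the loop tell "not ready yet" apart from "ready".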

One More Gotcha: Port 3000

If something is already running on port 3000 (Gitea, AdGuard Home's setup wizard, and plenty of Node.js apps default to it), Grafana will fail to start with a port binding error. Check first:

sudo ss -tlnp | grep 3000

If something's there, remap Grafana:

ports:
  - "3001:3000"

That's what we're doing here. The container still thinks it's on 3000. The outside world uses 3001. Everyone's happy.
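
A plain grep 3000 also matches ports like 30000 or 13000. If you want an exact answer, here is a sketch that reads ss -tln output and compares only the port part of the local address. port_listening is a hypothetical helper, and it assumes the default ss output with a header line.

```shell
#!/bin/sh
# Read `ss -tln` output on stdin and succeed if something listens on <port>.
# The port is compared exactly, so 3000 does not match 30000 or 13000.
port_listening() {
    awk -v port="$1" '
        NR > 1 {
            n = split($4, parts, ":")   # column 4 is the local address:port
            if (parts[n] == port) found = 1
        }
        END { exit !found }
    '
}

# Example: check both the default and the remapped port.
#   ss -tln | port_listening 3000 && echo "port 3000 is taken"
#   ss -tln | port_listening 3001 && echo "port 3001 is taken"
```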


Adding Loki as a Data Source

Open Grafana at http://your-server-ip:3001, log in, then:

  1. Go to Connections → Data Sources → Add data source
  2. Choose Loki
  3. Set URL to http://loki:3100 — use the container name, not localhost, since Grafana talks to Loki over the Docker network
  4. Click Save & Test

Green checkmark means you're good. Head to Explore, select Loki, and run:

{job="docker"}

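You should see logs from your containers labeled with actual container names. If the container_name label is missing entirely, the Lua filter isn't running: double-check the volume mount for container_name.lua in your compose file. If the label shows file_not_found or no_id_extracted instead of a name, the script ran but couldn't read config.v2.json or couldn't find a container ID in the tag.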
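
The compose file already mounts ./config/grafana/provisioning into the container, so the same data source can also be defined as a file instead of clicked together in the UI. A minimal sketch under that assumption: write_loki_datasource and the loki.yaml file name are my choices, while the YAML keys follow Grafana's data source provisioning format.

```shell
#!/bin/sh
# Write a Grafana data source provisioning file for Loki into <dir>.
# <dir> should be the "datasources" folder under the provisioning mount.
write_loki_datasource() {
    dir=$1
    mkdir -p "$dir"
    cat > "$dir/loki.yaml" << 'EOF'
apiVersion: 1

datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    isDefault: true
EOF
}

# Example, run from the project directory:
#   write_loki_datasource config/grafana/provisioning/datasources
#   docker compose restart grafana
```

Grafana reads provisioning files at startup, hence the restart. The upside is that the data source survives wiping the Grafana data directory.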


Where We Are

  • ✅ Loki running and storing logs on the dedicated drive
  • ✅ Grafana connected to Loki
  • ✅ Fluent Bit collecting logs from all Docker containers
  • ✅ Human-readable container names in the label browser
  • ✅ 90 days retention with LZ4 compression

The Series

  1. Introduction & Architecture: Stop Flying Blind, Series Introduction
  2. Setting Up the Core Stack: Loki, Grafana, and Fluent Bit on your main host
  3. Shipping Logs from Multiple Hosts: expanding Fluent Bit across your network
  4. Metrics with Prometheus: node_exporter, Pi-hole metrics, and Proxmox monitoring
  5. Alerting: getting notified when things actually break
  6. Lessons Learned: everything that went wrong and how we fixed it

In Article 3 we expand Fluent Bit to the other hosts — Vault, the Pi-hole instances — and deal with applications that log to files instead of stdout. We'll also find out what happens when you discover your containers have been shipping logs to a server that hasn't existed for two years.

Go make another coffee.

Chris R. Miller

Austin, TX
I like computers.