running loki and grafana on docker swarm

Loki is a log aggregator from the Grafana team, designed to be very cost-effective to run. As an Elastic stack feels a bit oversized for my side projects, I tested Loki on docker swarm.

log collector: docker-driver vs promtail

Loki comes with a log shipper called promtail. But as the docs state, promtail is not that easy to run with docker (one workaround is to send the docker logs to a syslog server and point promtail at syslog), so they built a docker logging plugin.

https://github.com/grafana/loki/tree/master/cmd/docker-driver

docker plugin install grafana/loki-docker-driver:latest --alias loki --grant-all-permissions

But then the first error occurs:

Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused "rootfs_linux.go:58: mounting \"\" to rootfs ...

A quick look at the GitHub issues gave me the correct fix: https://github.com/grafana/loki/issues/2147#issuecomment-637570920

docker plugin set loki:latest data.source=/tmp
docker plugin enable loki:latest

All right, at least the docker logging driver is up and running :)

running as swarm stack

The first attempt to put everything together was pretty straightforward. I created an overlay network for Loki, which every container that wants to log to Loki would join.

version: '3.2'

services:
  loki:
    image: grafana/loki:1.5.0
    networks: 
      - loki-net
    volumes:
      - loki-data:/loki
      
  grafana:
    image: grafana/grafana:7.0.1
    networks: 
      - traefik-overlay
      - loki-net
    volumes:
      - grafana-data:/var/lib/grafana
    deploy:
      labels:
        - traefik.labels=stripped
    logging:
      driver: loki
      options:
        loki-url: "http://loki:3100/loki/api/v1/push"
        loki-retries: "5"
        loki-batch-size: "400"

networks:
  loki-net:
    external: true
  traefik-overlay:
    external: true

volumes:
  loki-data:
    external: true
  grafana-data:
    external: true

I logged into Grafana and added Loki as a datasource (http://loki:3100): everything was working fine… but no logs were received. After a longer Google search I learned that docker plugins run at the docker daemon level, so we have to check the daemon logs.

journalctl -u docker.service
# Shift + G to jump to the end
msg=\"error sending batch, will retry\" status=-1 error=\"Post http://loki:3100/loki/api/v1/push: dial tcp: lookup lokiservice on xxx.xxx.xxx.xxx:53: no such host\"" 

It seems the plugin cannot resolve Loki's docker DNS name, and the docker plugin docs say: “currently supported network types: bridge, host, none”. So there is no chance to connect the logging driver plugin to the overlay network.

The only solution I found was to expose the port to the host, so I changed the compose file a bit:

version: '3.2'

services:
  loki:
    image: grafana/loki:1.5.0
    ports:
      - "127.0.0.1:3100:3100"
    networks: 
      - loki-net
    volumes:
      - loki-data:/loki
      
  grafana:
    image: grafana/grafana:7.0.1
    networks: 
      - traefik-overlay
      - loki-net
    volumes:
      - grafana-data:/var/lib/grafana
    deploy:
      labels:
        - traefik.labels=stripped
    logging:
      driver: loki
      options:
        loki-url: "http://127.0.0.1:3100/loki/api/v1/push"
        loki-retries: "5"
        loki-batch-size: "400"

networks:
  loki-net:
  traefik-overlay:
    external: true

volumes:
  loki-data:
    external: true
  grafana-data:
    external: true

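The clever plan was to expose the port only to localhost, but I was wrong :( In swarm mode the host IP in the port mapping is ignored, and the port ends up published on all interfaces: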

https://github.com/moby/moby/issues/32299

Tested with nc -zvw3 drailing.net 3100 - port was open -.-
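The same check can be scripted, for example to rerun it from a machine outside the swarm after every deploy. A minimal sketch using only the Python standard library; the helper name `port_is_open` is made up for this post and the host is just the one from the nc call above:

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # Same idea as `nc -zvw3 host port`: connect, send nothing, close.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out or unresolvable host: all count as "not open".
        return False


# Usage, run from OUTSIDE the swarm host:
#   port_is_open("drailing.net", 3100)
```

If this returns True from an outside machine, Loki is reachable from the internet.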

fix it with iptables

First of all, we need the name of our network interface

ifconfig -s -a

Let’s assume it is eth0

After that, we can insert (-I) a new rule for the interface (-i) eth0 and the tcp protocol (-p) on --destination-port 3100, which jumps (-j) to REJECT

iptables -I DOCKER-USER -i eth0 -p tcp --destination-port 3100 -j REJECT

https://docs.docker.com/network/iptables/

With this iptables rule in place, all connections to port 3100 from the outside are rejected and Loki should be safe again.
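To check that Loki is still reachable from the host itself, one option is to push a test line by hand to the same URL the logging driver uses. A rough sketch with the Python standard library; the function names and the `job` label are invented for illustration, and the body follows Loki's JSON push format (streams, each with a label set and pairs of nanosecond timestamp and log line):

```python
import json
import time
import urllib.request

# Same URL as in the compose file above.
LOKI_PUSH_URL = "http://127.0.0.1:3100/loki/api/v1/push"


def build_push_payload(line, labels, ts_ns=None):
    """Build the JSON body for a single log line."""
    if ts_ns is None:
        ts_ns = time.time_ns()
    return {
        "streams": [
            {
                "stream": dict(labels),
                # The timestamp goes in as a string of nanoseconds.
                "values": [[str(ts_ns), line]],
            }
        ]
    }


def push(line, labels, url=LOKI_PUSH_URL):
    """POST one line to Loki and return the HTTP status code."""
    body = json.dumps(build_push_payload(line, labels)).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status  # expected to be 204 (No Content) on success


# Usage on the swarm host (hypothetical label):
#   push("hello from the host", {"job": "manual-test"})
```

Afterwards the line should show up in Grafana under the label used for the push.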

adding log labels

Whew, that was a hard way to get everything up and running… now we need to add some custom log labels, as the default labels from Loki are helpful but not enough to create some nice dashboards.

At first it was hard to understand how to parse the logs and add labels (see the promtail pipelines docs), but seeing the stages as a real pipeline helped, and things got much clearer.

The docker stage is applied automatically. As I am logging JSON data, I explicitly told the logging driver which fields to parse in a json stage, and piped them into the labels stage, where I tell Loki which of them to use as labels. In the example below, we parse 4 extra fields in the json stage and add all of them as labels:

    logging:
      driver: loki
      options:
        loki-url: "http://127.0.0.1:3100/loki/api/v1/push"
        loki-retries: "5"
        loki-batch-size: "400"
        loki-pipeline-stages: |
          - json:
              expressions:
                level: level
                path: path
                method: method
                msg: msg
          - labels:
              msg: 
              level:
              path:
              method:
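The pipeline idea can be mimicked in a few lines of plain Python: each stage is a function that reads the log line and passes a map of extracted values along. This is only a mental model, not how the driver is implemented. The function names and the sample log line are made up, and it only handles top-level JSON keys:

```python
import json


def json_stage(line, expressions, extracted):
    """Parse the line as JSON and copy the selected fields into `extracted`."""
    try:
        doc = json.loads(line)
    except json.JSONDecodeError:
        return extracted  # not JSON: pass the data through unchanged
    if not isinstance(doc, dict):
        return extracted
    for name, key in expressions.items():
        if key in doc:
            extracted[name] = doc[key]
    return extracted


def labels_stage(label_names, extracted, labels):
    """Promote extracted values to labels; an empty source means "same name"."""
    for label, source in label_names.items():
        value = extracted.get(source or label)
        if value is not None:
            labels[label] = str(value)
    return labels


# Hypothetical log line with the four fields from the config above.
line = '{"level": "info", "path": "/health", "method": "GET", "msg": "ok"}'
extracted = json_stage(
    line, {"level": "level", "path": "path", "method": "method", "msg": "msg"}, {}
)
labels = labels_stage(
    {"msg": None, "level": None, "path": None, "method": None}, extracted, {}
)
print(labels)
```

One thing to keep an eye on: every distinct combination of label values becomes its own stream in Loki, so a free-text field like msg can produce a lot of streams.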

first dashboard / logQL

A simple first chart is a pie chart, showing each stack's share of the stderr output over 5-minute windows

sum( rate({source="stderr"}[5m]) ) by (swarm_stack)

For me, the same pattern works fine for a lot of other graphs, for example

sum( count_over_time({swarm_stack="yourSwarmStack"}[5m]) ) by (level)
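To make clear what such a query computes, here is a rough Python model of `sum(count_over_time(...[5m])) by (label)` at a single point in time: count the log lines that fall into the last five minutes, then add up the counts per label value. The sample data and the function name are invented, and Loki's real evaluation (steps, many streams, exact window boundaries) is more involved:

```python
from collections import Counter

WINDOW_SECONDS = 5 * 60  # the [5m] range


def sum_count_over_time_by(entries, label, now):
    """entries: iterable of (unix_timestamp, labels_dict) tuples."""
    counts = Counter()
    for ts, labels in entries:
        # Only lines inside the 5 minute window before `now` are counted.
        if now - WINDOW_SECONDS < ts <= now:
            counts[labels.get(label, "")] += 1
    return dict(counts)


now = 1_000_000
entries = [
    (now - 10, {"swarm_stack": "blog", "level": "info"}),
    (now - 20, {"swarm_stack": "blog", "level": "error"}),
    (now - 200, {"swarm_stack": "blog", "level": "info"}),
    (now - 400, {"swarm_stack": "blog", "level": "info"}),  # older than 5m
]
print(sum_count_over_time_by(entries, "level", now))
```

rate() works the same way, except that the count is divided by the length of the window in seconds.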

summed up

A nice log aggregator, not that easy to set up in my environment, but now Loki and Grafana do the job pretty well.

And everything (Grafana and Loki) runs in ~40 MB of RAM :)
