running loki and grafana on docker swarm
Loki is a log aggregator from the Grafana team, designed to run very cost-effectively. As an Elastic stack feels a bit oversized for my side projects, I just tested Loki on Docker Swarm.
log collector: docker-driver vs promtail
Loki comes with a log shipper called Promtail. But as the docs state, Promtail is not that easy to run with Docker (for example, sending Docker logs to a syslog server and pointing Promtail at syslog), so they built a Docker logging plugin.
https://github.com/grafana/loki/tree/master/cmd/docker-driver
docker plugin install grafana/loki-docker-driver:latest --alias loki --grant-all-permissions
But then the first error occurs:
Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused "rootfs_linux.go:58: mounting \"\" to rootfs ...
A quick look at the GitHub issues gave me the correct fix: https://github.com/grafana/loki/issues/2147#issuecomment-637570920
docker plugin set loki:latest data.source=/tmp
docker plugin enable loki:latest
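A quick check that the plugin is really installed and enabled (the ENABLED column should show true):
docker plugin ls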
Alright, at least the Docker logging driver is up and running :)
running as swarm stack
The first attempt to put everything together was pretty straightforward. I created an overlay network for Loki that every container wanting to log to Loki would join.
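Since the Loki network and the volumes are declared as external in the stack file below, they have to exist before the stack is deployed; roughly like this (names as used in the compose file, the traefik-overlay network already exists in my setup):
docker network create --driver overlay loki-net
docker volume create loki-data
docker volume create grafana-data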
version: '3.2'
services:
  loki:
    image: grafana/loki:1.5.0
    networks:
      - loki-net
    volumes:
      - loki-data:/loki
  grafana:
    image: grafana/grafana:7.0.1
    networks:
      - traefik-overlay
      - loki-net
    volumes:
      - grafana-data:/var/lib/grafana
    deploy:
      labels:
        - traefik.labels=stripped
    logging:
      driver: loki
      options:
        loki-url: "http://loki:3100/loki/api/v1/push"
        loki-retries: "5"
        loki-batch-size: "400"
networks:
  loki-net:
    external: true
  traefik-overlay:
    external: true
volumes:
  loki-data:
    external: true
  grafana-data:
    external: true
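Deploying the stack is then a single command (assuming the file is saved as loki-stack.yml and the stack should be called loki):
docker stack deploy -c loki-stack.yml loki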
I logged into Grafana and added Loki as a datasource (http://loki:3100): everything was working fine… but no logs were received. After a longer Google search it turned out that Docker plugins run on the Docker daemon level, so we have to check the daemon logs.
journalctl -u docker.service
# press Shift+G to jump to the end
msg=\"error sending batch, will retry\" status=-1 error=\"Post http://loki:3100/loki/api/v1/push: dial tcp: lookup lokiservice on xxx.xxx.xxx.xxx:53: no such host\""
Seems like the plugin cannot resolve the Docker DNS name of the Loki service, and the Docker plugin docs say: “currently supported network types: bridge, host, none”. So there is no way to connect the logging driver plugin to the overlay network.
The only solution I found was to expose the port to the host, so I changed the compose file a bit:
version: '3.2'
services:
  loki:
    image: grafana/loki:1.5.0
    ports:
      - "127.0.0.1:3100:3100"
    networks:
      - loki-net
    volumes:
      - loki-data:/loki
  grafana:
    image: grafana/grafana:7.0.1
    networks:
      - traefik-overlay
      - loki-net
    volumes:
      - grafana-data:/var/lib/grafana
    deploy:
      labels:
        - traefik.labels=stripped
    logging:
      driver: loki
      options:
        loki-url: "http://127.0.0.1:3100/loki/api/v1/push"
        loki-retries: "5"
        loki-batch-size: "400"
networks:
  loki-net:
  traefik-overlay:
    external: true
volumes:
  loki-data:
    external: true
  grafana-data:
    external: true
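After redeploying, a quick way to verify that the logging driver can now reach Loki on the host is Loki's readiness endpoint:
curl http://127.0.0.1:3100/ready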
The clever plan was to expose the port only to localhost, but I was wrong :( In Swarm mode the published port goes through the ingress routing mesh, which ignores the host IP binding:
https://github.com/moby/moby/issues/32299
Tested with nc -zvw3 drailing.net 3100
The port was open -.-
fix it with iptables
First of all, we need the name of our network interface
ifconfig -s -a
Let’s assume it is eth0
After that, we can -Insert a new rule for -interface eth0 and the tcp -protocol on --destination-port 3100, which -jumps to REJECT:
iptables -I DOCKER-USER -i eth0 -p tcp --destination-port 3100 -j REJECT
https://docs.docker.com/network/iptables/
With this iptables rule in place, all connections to port 3100 from the outside are rejected and Loki should be safe again.
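To double-check the rule (and to remove it again by its line number if it is ever in the way):
iptables -nvL DOCKER-USER --line-numbers
# remove it again, e.g. if it is rule number 1:
# iptables -D DOCKER-USER 1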
adding log labels
Whew, that was a hard road to get everything up and running… now we need to add some custom log labels, as the default labels from Loki are helpful but not enough to create some nice dashboards.
At first it was hard to understand how to parse the logs and add labels (link: promtail pipelines), but seeing the stages as a real pipeline helped and it became much clearer.
The docker stage is applied automatically, but as I am logging JSON data, I just told the logging driver which fields should be parsed explicitly and piped them into the labels stage, where I can tell Loki which labels should be used. In the example below, we parse four extra fields in the json stage and add all of them as labels:
logging:
  driver: loki
  options:
    loki-url: "http://127.0.0.1:3100/loki/api/v1/push"
    loki-retries: "5"
    loki-batch-size: "400"
    loki-pipeline-stages: |
      - json:
          expressions:
            level: level
            path: path
            method: method
            msg: msg
      - labels:
          msg:
          level:
          path:
          method:
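For illustration, a container in this stack that writes a JSON line like the following to stdout (field names matching the expressions above, values made up):
echo '{"level":"info","method":"GET","path":"/health","msg":"request finished"}'
The json stage extracts level, path, method and msg from it, and the labels stage attaches them as Loki labels to the log entry.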
first dashboard / logQL
A simple first chart is a pie chart showing the share of log output per stack in 5-minute buckets:
sum( rate({source="stderr"}[5m]) ) by (swarm_stack)
For me, the same pattern works fine for a lot of other graphs, for example:
sum( count_over_time({swarm_stack="yourSwarmStack"}[5m]) ) by (logLevel)
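Building on the labels from the pipeline above, the same idea can, for example, break error logs down by path (assuming the level and path labels configured earlier):
sum( count_over_time({swarm_stack="yourSwarmStack", level="error"}[5m]) ) by (path)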
summed up
A nice log aggregator, not that easy to set up in my environment, but now it does the job pretty well.
And everything (Grafana and Loki) for ~40 MB of RAM :)