containerd Kubernetes Syslog Forwarding
Move from Logspout to Filebeat to support the containerd logging architecture.
You might have heard that starting with version 1.20, Kubernetes deprecated Docker as a container runtime. Although this change didn't affect the core functionality of Kubernetes, or how pods work in a cluster, some users relied on resources provided by the Docker engine. A short sentence in the announcement blog post calls out that a critical component would be affected: logging.
Docker was not the only container runtime at the time of this change. Most managed Kubernetes providers (GKE, EKS, and AKS) handled the upgrade by defaulting new clusters to the containerd runtime, and their native tooling for exporting logs to their own logging services was migrated accordingly. If you deployed a new cluster on version 1.20, you wouldn't notice that anything had changed: behind the scenes, the monitoring agents were upgraded along with the clusters to start using containerd as the source for logs. No outages, no missing information.
But for users relying on a third-party logging solution, changing to containerd broke the integration. Loggly, Papertrail, and other Syslog destinations fed by DaemonSet workloads like Logspout were all impacted, because they relied on the Docker runtime to grab logs and send them to a syslog server.
One solid option I tried is rkubelog. It is a single-deployment component that fetches logs through the Kubernetes API (the pods and pods/log resources) via a cluster role:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: rkubelog-reader
rules:
- apiGroups: [""] # "" indicates the core API group
  resources: ["pods", "pods/log"]
  verbs: ["get", "watch", "list"]
I found, though, that rkubelog doesn't fit every need. Here is a short list of scenarios in which it might not be the right fit for your cluster:
- You're not allowed to create ClusterRoleBindings in the cluster you're working on.
- You're more comfortable distributing the log traffic across a cluster-wide DaemonSet instead of having rkubelog fetch and send everything from a single pod. In my particular case, we generate around 1 TB of logs a month due to the size of the cluster, and the resource allocation needed for a single pod to handle that volume was significant.
- You need more control over the tags and source labels when sending logs to a Syslog server. rkubelog offers no filtering or transformation of data; the logs that are retrieved are the ones sent.
- You need to keep the same format of host/system labels you had before the migration. rkubelog has little room for customization, although you can fork the project and change the format of the logs being sent.
- You need some level of filtering before sending the logs to a destination.
So if you find yourself looking for an alternative, the following approach relies on Filebeat (as a DaemonSet) and Logstash to do the job. For this article, I will use Papertrail as the Syslog destination.
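End to end, the flow looks like this: Filebeat (DaemonSet, one pod per node, tailing /var/log/containers) → Logstash (Deployment plus Service, parsing and mutating) → Papertrail (TCP syslog endpoint).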
Filebeat Installation
Basically, Filebeat will grab the logs from the /var/log folder on every node and push that data to Logstash. While doing that, it will populate some fields in the objects being sent so that Logstash can identify which pod the logs come from. This is a summary of the pipeline used in Filebeat:
- It excludes some logs based on their filenames. Some exclusions are mandatory for the setup to work, like filebeat.* and logstash.*, since those could cause recursive logging. The rest are up to you: you can ignore logs from the kube-system namespace or from pods that don't produce relevant data.
- Then it drops the 'host' field (if it exists), so it can be repopulated from the filename in the next step.
- The next step generates some fields based on the filename of the logs. The format used in this example is /var/log/containers/%{name}_%{host}_%{uuid}.log (see the worked example after this list).
- Then a dissect processor changes the format of every log line, because each line written by containerd carries its own header (timestamp and stream) before the actual message. I found this tokenizer generic enough for most cases, but if your cluster produces a different output, this is the place to tweak.
- From that previous step, the %{parsed} field holds the actual message. The next two steps drop the current value of the "message" field and replace it with the newly parsed one.
- At the very end, you just need to add the Logstash destination address. Commented out, there is a line to print Filebeat's output to stdout, which is useful if the pipeline doesn't parse the message correctly in your scenario.
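To make the two dissect steps concrete, here is what they produce for a hypothetical containerd log file (the pod name, namespace, and container ID below are made up):
# Hypothetical file created by containerd under /var/log/containers:
#   /var/log/containers/api-7d9cbd7b4-x2lzq_production_api-0f3a.log
# The filename tokenizer yields:
#   name: api-7d9cbd7b4-x2lzq   host: production   uuid: api-0f3a
# (note that the pod's namespace ends up in the 'host' field)
#
# A line inside that file follows containerd's CRI logging format:
# <timestamp> <stream> <F|P flag> <message>
#   2021-03-30T14:21:14.123456789Z stdout F GET /healthz 200
# The "%{header} F %{parsed}" tokenizer then yields:
#   header: 2021-03-30T14:21:14.123456789Z stdout
#   parsed: GET /healthz 200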
apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-config
  namespace: monitoring
  labels:
    k8s-app: filebeat
data:
  filebeat.yml: |-
    filebeat.inputs:
    - type: log
      enabled: true
      symlinks: true
      exclude_files: ['filebeat.*',
                      'logstash.*',
                      'azure.*',
                      'kube.*',
                      'ignite.*',
                      'influx.*',
                      'prometheus.*',
                      'rkubelog.*',
                      'node-exporter.*']
      paths:
        - /var/log/containers/*.log
    processors:
      - drop_fields:
          fields: ["host"]
          ignore_missing: true
      - dissect:
          tokenizer: "/var/log/containers/%{name}_%{host}_%{uuid}.log"
          field: "log.file.path"
          target_prefix: ""
          overwrite_keys: true
      - dissect:
          tokenizer: "%{header} F %{parsed}"
          field: "message"
          target_prefix: ""
          overwrite_keys: true
      - drop_fields:
          fields: ["message"]
          ignore_missing: true
      - rename:
          fields:
            - from: "parsed"
              to: "message"
          ignore_missing: true
          fail_on_error: false
    #output.console:
    #  pretty: true
    output.logstash:
      hosts: ["${LOGSTASH_HOST}:${LOGSTASH_PORT}"]
Once the configuration is in place, you just need to create a DaemonSet for Filebeat. Make sure you add the proper tolerations so the DaemonSet is also scheduled on tainted nodes.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: filebeat
  namespace: monitoring
  labels:
    k8s-app: filebeat
spec:
  selector:
    matchLabels:
      k8s-app: filebeat
  template:
    metadata:
      labels:
        k8s-app: filebeat
    spec:
      terminationGracePeriodSeconds: 30
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
      - name: filebeat
        image: docker.elastic.co/beats/filebeat:7.4.0
        args: [
          "-c", "/etc/filebeat.yml",
          "-e",
        ]
        env:
        - name: LOGSTASH_HOST
          value: "logstash"
        - name: LOGSTASH_PORT
          value: "5100"
        securityContext:
          runAsUser: 0
          # If using Red Hat OpenShift uncomment this:
          #privileged: true
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 100Mi
        volumeMounts:
        - name: config
          mountPath: /etc/filebeat.yml
          readOnly: true
          subPath: filebeat.yml
        - name: data
          mountPath: /usr/share/filebeat/data
        - name: varlog
          mountPath: /var/log
          readOnly: true
      volumes:
      - name: config
        configMap:
          defaultMode: 0600
          name: filebeat-config
      - name: varlog
        hostPath:
          path: /var/log
      # The data folder stores a registry of read status for all files,
      # so we don't send everything again on a Filebeat pod restart.
      - name: data
        hostPath:
          path: /var/lib/filebeat-data
          type: DirectoryOrCreate
      tolerations:
      - key: taintedLabel
        operator: Equal
        value: specialNode
        effect: NoSchedule
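With the DaemonSet applied, a quick way to confirm that Filebeat landed on every node and is shipping logs is a pair of standard kubectl checks (these commands are not part of the manifests above):
# One Filebeat pod should appear per node, including tainted ones:
kubectl -n monitoring get pods -l k8s-app=filebeat -o wide
# Tail the agent's own output to spot harvester or connection errors:
kubectl -n monitoring logs daemonset/filebeat --tail=20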
Logstash Installation
Once Filebeat is running, you need a Logstash deployment pointing to the Syslog server (Papertrail in this example).
I will rely on the output/tcp plugin for this connection. You can also use the output/syslog plugin, but I found the TCP one a little more flexible when combined with mutation pipelines. The configuration file for Logstash looks like this:
apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-config
  namespace: monitoring
data:
  logstash.conf: |-
    input {
      beats {
        port => 5100
        host => "0.0.0.0"
        type => "log"
      }
    }
    filter {
      mutate {
        replace => { "message" => "%{name} %{message}" }
      }
    }
    output {
      tcp {
        codec => "line"
        host => "logs.papertrailapp.com"
        port => 59999
      }
    }
Basically, this opens a Beats input on port 5100 and mutates the message field to prepend the name of the pod (coming from the Filebeat parsing step). I do this because, in Papertrail, the first word of an incoming log line determines the `program`, which grouping and filtering can be based on. This is what I meant by customization: before you send out the logs, you can add as many mutation steps as you need.
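For comparison, if you would rather use the output/syslog plugin mentioned earlier, the output block might look like the sketch below. Treat it as an assumption to verify: the option names come from the logstash-output-syslog plugin, and mapping appname to %{name} is my guess at replicating the program behavior without the mutate filter.
output {
  syslog {
    # Sketch only; verify these options against your plugin version.
    host => "logs.papertrailapp.com"
    port => 59999
    protocol => "tcp"
    appname => "%{name}"   # pod name as the syslog program
  }
}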
The last block is the TCP connection to a Papertrail endpoint. The Logstash deployment looks like this; don't forget that the Logstash pod requires a Service so it can be reached from Filebeat.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: logstash
  namespace: monitoring
  labels:
    k8s.service: logstash
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      k8s.service: logstash
  template:
    metadata:
      labels:
        k8s.service: logstash
    spec:
      containers:
      - image: docker.elastic.co/logstash/logstash:7.4.0
        imagePullPolicy: "Always"
        name: logstash
        ports:
        - containerPort: 5100
        resources:
          limits:
            memory: 1024Mi
          requests:
            memory: 1024Mi
        volumeMounts:
        - mountPath: /usr/share/logstash/pipeline/logstash.conf
          subPath: logstash.conf
          name: logstash-config
      hostname: logstash
      restartPolicy: Always
      volumes:
      - name: logstash-config
        configMap:
          name: logstash-config
---
apiVersion: v1
kind: Service
metadata:
  namespace: monitoring
  labels:
    k8s.service: logstash
  name: logstash
spec:
  ports:
  - port: 5100
    targetPort: 5100
    protocol: TCP
    name: logstash
  selector:
    k8s.service: logstash
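Once everything is applied, a simple end-to-end smoke test is to launch a throwaway pod that prints a line and then search Papertrail for a program matching its name (the pod name below is arbitrary):
# Emit a single recognizable log line from a short-lived pod:
kubectl run log-smoke-test --image=busybox --restart=Never \
  -- sh -c 'echo "hello from containerd logging"'
# Search for the "log-smoke-test" program in Papertrail, then clean up:
kubectl delete pod log-smoke-test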
More documentation and this example can be found here: https://github.com/miguelcallejasp/logging-filebeat-containerd.