Background
Edge clusters (based on Raspberry Pi + K3S) need to implement basic alarm functions.
Edge Cluster Limitations
- CPU, memory, and storage are tight and cannot support a complete Prometheus-based monitoring stack, which needs at least 2 GB of memory and a large amount of storage. Even Prometheus Agent mode is too heavy, so additional storage and compute consumption must be avoided;
- Network conditions cannot support such a monitoring system, because it generally needs to transmit data every minute (or continuously), and the volume of data is not small;
- The uplink is a metered 5G network: destination addresses must be authorized before they can be reached, traffic is billed by volume, and transmission capacity is limited and unstable (the cluster may be offline for periods of time).
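To make the traffic concern concrete, here is a rough back-of-the-envelope sketch in Python. The payload sizes and send rates are hypothetical inputs chosen only to show the order of magnitude; they are not measurements from the cluster described here.

```python
def monthly_traffic_mb(payload_kb: float, sends_per_day: float, days: int = 30) -> float:
    """Approximate monthly upload volume in MB for a fixed payload size and send rate."""
    return payload_kb * sends_per_day * days / 1024


# Hypothetical inputs (assumptions, not measured values).
metrics_mb = monthly_traffic_mb(payload_kb=100, sends_per_day=24 * 60)  # one push per minute
events_mb = monthly_traffic_mb(payload_kb=2, sends_per_day=20)          # a handful of alarms per day

print(f"metrics push: {metrics_mb:.0f} MB/month")
print(f"event alarms: {events_mb:.1f} MB/month")
```

Under these assumed numbers, a per-minute metrics push costs several gigabytes a month, while occasional alarm messages stay in the low megabytes, which is the gap that matters on a link billed by traffic.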
Key requirements
In summary, the key requirements are as follows:
- Alarms: edge cluster abnormalities must be reported promptly, so we need to know about abnormal conditions as they occur in the edge clusters;
- Network: conditions are poor, so traffic must stay small, only a few destination addresses can be opened, and network instability (being offline for a period of time) must be tolerated;
- Resources: additional storage and compute consumption must be avoided as much as possible.
Plan
Based on the requirements above, the following scheme is adopted:
Alarm notification based on Kubernetes Events
Architecture diagram
Technical solution planning
Collect Events from various Kubernetes resources, such as:
- Pod
- Node
- kubelet
- CRD
- ...
- Use the kubernetes-event-exporter component to collect Kubernetes Events;
- Filter so that only Warning-level Events trigger alarm notifications (the conditions can be refined later);
- Send the alarms through a messaging tool such as a Feishu webhook (more sending channels can be added later).
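The filtering rule above can be illustrated with a small Python sketch. It is not how kubernetes-event-exporter is implemented; it only mimics the "Warning only" decision on plain dictionaries, and the sample events are made up for illustration.

```python
from typing import Dict, Iterable, List


def warning_events(events: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only Warning-level events; everything else (i.e. Normal) is dropped."""
    return [event for event in events if event.get("type") == "Warning"]


# Hypothetical sample events, shaped like a subset of a Kubernetes Event object.
sample = [
    {"type": "Normal", "reason": "Scheduled", "message": "Successfully assigned pod"},
    {"type": "Warning", "reason": "BackOff", "message": "Back-off restarting failed container"},
]

for event in warning_events(sample):
    print(event["reason"], "-", event["message"])
```

Because Kubernetes Events carry only the types Normal and Warning, keeping Warning events and dropping Normal events amount to the same filter.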
Implementation steps
Manual Deployment
On the edge cluster, perform the following operations:
1. Create the namespace, ServiceAccount, and roles
As follows:
cat << _EOF_ | kubectl apply -f -
---
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: event-exporter-extra
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - get
      - list
      - watch
---
apiVersion: v1
kind: ServiceAccount
metadata:
  namespace: monitoring
  name: event-exporter
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: event-exporter
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
subjects:
  - kind: ServiceAccount
    namespace: monitoring
    name: event-exporter
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: event-exporter-extra
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: event-exporter-extra
subjects:
  - kind: ServiceAccount
    # Must match the namespace of the ServiceAccount created above
    namespace: monitoring
    name: event-exporter
_EOF_
2. Create the kubernetes-event-exporter config
As follows:
cat << _EOF_ | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: event-exporter-cfg
  namespace: monitoring
data:
  config.yaml: |
    logLevel: error
    logFormat: json
    route:
      routes:
        - match:
            - receiver: "dump"
        - drop:
            - type: "Normal"
          match:
            - receiver: "feishu"
    receivers:
      - name: "dump"
        stdout: {}
      - name: "feishu"
        webhook:
          endpoint: "https://open.feishu.cn/open-apis/bot/v2/hook/..."
          headers:
            Content-Type: application/json
          layout:
            msg_type: interactive
            card:
              config:
                wide_screen_mode: true
                enable_forward: true
              header:
                title:
                  tag: plain_text
                  content: XXX IoT K3S cluster alarm
                template: red
              elements:
                - tag: div
                  text:
                    tag: lark_md
                    content: "**EventType:** {{ .Type }}\n**EventKind:** {{ .InvolvedObject.Kind }}\n**EventReason:** {{ .Reason }}\n**EventTime:** {{ .LastTimestamp }}\n**EventMessage:** {{ .Message }}"
_EOF_
🐾 Note:
- endpoint: "https://open.feishu.cn/open-apis/bot/v2/hook/...": replace it with your own webhook endpoint. ❌ Never disclose this URL publicly!
- content: XXX IoT K3S cluster alarm: change it to a name that makes the cluster quick and easy to identify, such as "Test K3S cluster alarm at home".
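To see what the layout in the ConfigMap produces, the following Python sketch builds an equivalent Feishu card payload from a single event. It is an illustration only, not the exporter's own code: `build_card` and `post_card` are hypothetical helper names, the sample event is invented, and the dictionary keys assume the usual JSON field names of a Kubernetes Event.

```python
import json
import urllib.request
from typing import Any, Dict


def build_card(event: Dict[str, Any], title: str) -> Dict[str, Any]:
    """Build a Feishu interactive-card payload that mirrors the ConfigMap layout."""
    content = (
        f"**EventType:** {event['type']}\n"
        f"**EventKind:** {event['involvedObject']['kind']}\n"
        f"**EventReason:** {event['reason']}\n"
        f"**EventTime:** {event['lastTimestamp']}\n"
        f"**EventMessage:** {event['message']}"
    )
    return {
        "msg_type": "interactive",
        "card": {
            "config": {"wide_screen_mode": True, "enable_forward": True},
            "header": {
                "title": {"tag": "plain_text", "content": title},
                "template": "red",
            },
            "elements": [
                {"tag": "div", "text": {"tag": "lark_md", "content": content}}
            ],
        },
    }


def post_card(endpoint: str, card: Dict[str, Any]) -> int:
    """POST the card to a webhook endpoint and return the HTTP status code."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(card).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Hypothetical event used only to preview the payload; nothing is sent here.
sample_event = {
    "type": "Warning",
    "involvedObject": {"kind": "Pod"},
    "reason": "BackOff",
    "lastTimestamp": "2022-01-01T08:00:00+08:00",
    "message": "Back-off restarting failed container",
}
card = build_card(sample_event, "Test K3S cluster alarm at home")
print(json.dumps(card, indent=2))
```

Calling `post_card` with your own webhook endpoint is one way to check the card's appearance before deploying the exporter; keep the endpoint out of version control.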
3. Create a Deployment
cat << _EOF_ | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: event-exporter
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: event-exporter
      version: v1
  template:
    metadata:
      labels:
        app: event-exporter
        version: v1
    spec:
      volumes:
        - name: cfg
          configMap:
            name: event-exporter-cfg
            defaultMode: 420
        - name: localtime
          hostPath:
            path: /etc/localtime
            type: ''
        - name: zoneinfo
          hostPath:
            path: /usr/share/zoneinfo
            type: ''
      containers:
        - name: event-exporter
          image: ghcr.io/opsgenie/kubernetes-event-exporter:v0.11
          args:
            - '-conf=/data/config.yaml'
          env:
            - name: TZ
              value: Asia/Shanghai
          volumeMounts:
            - name: cfg
              mountPath: /data
            - name: localtime
              readOnly: true
              mountPath: /etc/localtime
            - name: zoneinfo
              readOnly: true
              mountPath: /usr/share/zoneinfo
          imagePullPolicy: IfNotPresent
      serviceAccount: event-exporter
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/controlplane
                    operator: In
                    values:
                      - 'true'
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/control-plane
                    operator: In
                    values:
                      - 'true'
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/master
                    operator: In
                    values:
                      - 'true'
      tolerations:
        - key: node-role.kubernetes.io/controlplane
          value: 'true'
          effect: NoSchedule
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
_EOF_
📝 Explanation:
- The event-exporter-cfg volume loads the configuration file stored in the ConfigMap;
- The localtime, zoneinfo, and TZ settings change the pod's time zone to Asia/Shanghai, so that the timestamps shown in notifications are in CST (China Standard Time);
- The affinity and tolerations settings make the scheduler prefer the master node whenever possible; adjust them as needed. In an edge cluster the master often doubles as the gateway, has the best hardware, and stays online the longest.
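As a simplified illustration of how the preferred node affinity works, the Python sketch below adds up the weights of the preferred terms that a node's labels satisfy. It is a toy model under stated assumptions: the real scheduler combines this sum with other scoring plugins, and the node names and labels here are hypothetical.

```python
from typing import Dict, List, Tuple

# (label key, required value, weight), mirroring the three preferred terms in the Deployment.
PREFERRED_TERMS: List[Tuple[str, str, int]] = [
    ("node-role.kubernetes.io/controlplane", "true", 100),
    ("node-role.kubernetes.io/control-plane", "true", 100),
    ("node-role.kubernetes.io/master", "true", 100),
]


def affinity_score(node_labels: Dict[str, str]) -> int:
    """Sum the weights of all preferred terms whose label matches on this node."""
    return sum(
        weight
        for key, value, weight in PREFERRED_TERMS
        if node_labels.get(key) == value
    )


# Hypothetical nodes: one server (master) node and one plain agent node.
nodes = {
    "server": {
        "node-role.kubernetes.io/control-plane": "true",
        "node-role.kubernetes.io/master": "true",
    },
    "agent": {},
}

best = max(nodes, key=lambda name: affinity_score(nodes[name]))
print(best, affinity_score(nodes[best]))
```

Because the rule is "preferred" rather than "required", a node with a score of zero is still eligible, so the pod can still run on an agent node if the master is unavailable.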
Automated Deployment
Effect: the exporter is deployed automatically when K3S is installed.
On the node where the K3S server runs, create event-exporter.yaml in the /var/lib/rancher/k3s/server/manifests/ directory (create the directory first if it does not exist) with the following content:
---
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: event-exporter-extra
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - get
      - list
      - watch
---
apiVersion: v1
kind: ServiceAccount
metadata:
  namespace: monitoring
  name: event-exporter
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: event-exporter
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
subjects:
  - kind: ServiceAccount
    namespace: monitoring
    name: event-exporter
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: event-exporter-extra
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: event-exporter-extra
subjects:
  - kind: ServiceAccount
    # Must match the namespace of the ServiceAccount created above
    namespace: monitoring
    name: event-exporter
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: event-exporter-cfg
  namespace: monitoring
data:
  config.yaml: |
    logLevel: error
    logFormat: json
    route:
      routes:
        - match:
            - receiver: "dump"
        - drop:
            - type: "Normal"
          match:
            - receiver: "feishu"
    receivers:
      - name: "dump"
        stdout: {}
      - name: "feishu"
        webhook:
          endpoint: "https://open.feishu.cn/open-apis/bot/v2/hook/..."
          headers:
            Content-Type: application/json
          layout:
            msg_type: interactive
            card:
              config:
                wide_screen_mode: true
                enable_forward: true
              header:
                title:
                  tag: plain_text
                  content: XXX K3S cluster alarm
                template: red
              elements:
                - tag: div
                  text:
                    tag: lark_md
                    content: "**EventType:** {{ .Type }}\n**EventKind:** {{ .InvolvedObject.Kind }}\n**EventReason:** {{ .Reason }}\n**EventTime:** {{ .LastTimestamp }}\n**EventMessage:** {{ .Message }}"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: event-exporter
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: event-exporter
      version: v1
  template:
    metadata:
      labels:
        app: event-exporter
        version: v1
    spec:
      volumes:
        - name: cfg
          configMap:
            name: event-exporter-cfg
            defaultMode: 420
        - name: localtime
          hostPath:
            path: /etc/localtime
            type: ''
        - name: zoneinfo
          hostPath:
            path: /usr/share/zoneinfo
            type: ''
      containers:
        - name: event-exporter
          image: ghcr.io/opsgenie/kubernetes-event-exporter:v0.11
          args:
            - '-conf=/data/config.yaml'
          env:
            - name: TZ
              value: Asia/Shanghai
          volumeMounts:
            - name: cfg
              mountPath: /data
            - name: localtime
              readOnly: true
              mountPath: /etc/localtime
            - name: zoneinfo
              readOnly: true
              mountPath: /usr/share/zoneinfo
          imagePullPolicy: IfNotPresent
      serviceAccount: event-exporter
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/controlplane
                    operator: In
                    values:
                      - 'true'
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/control-plane
                    operator: In
                    values:
                      - 'true'
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/master
                    operator: In
                    values:
                      - 'true'
      tolerations:
        - key: node-role.kubernetes.io/controlplane
          value: 'true'
          effect: NoSchedule
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
Then start K3S and it will be deployed automatically.
📚️Reference:
Automated Deployment manifests and Helm charts | Rancher Documentation
Final effect
As shown below:
📚️Reference Documentation
- opsgenie/kubernetes-event-exporter: Export Kubernetes events to multiple destinations with routing and filtering (github.com)
- AliyunContainerService/kube-eventer: kube-eventer emit kubernetes events to sinks (github.com)
- kubesphere/kube-events: K8s Event Exporting, Filtering and Alerting in Multi-Tenant Environment (github.com)
- kubesphere/notification-manager: K8s native notification management with multi-tenancy support (github.com)
When three walk together, one of them can surely be my teacher; knowledge is to be shared, and the world belongs to all. This article was written by Dongfeng Weiming Technology Blog (EWhisper.cn).