Implementation of alarm notification based on Kubernetes Events for IoT edge clusters

background

Edge clusters (based on Raspberry Pi + K3S) need to implement basic alarm functions.

Edge Cluster Limitations

  1. CPU/memory/storage resources are tight and cannot support a complete Prometheus-based monitoring system solution that requires at least 2GB of memory and a large amount of storage (even if it is based on Prometheus Agent, it cannot be supported) (need to avoid additional storage and computing resource consumption)
  2. Network conditions cannot support the monitoring system, because the monitoring system generally needs to transmit data every 1 minute (or every moment), and the amount of data is not small;

    1. There is a 5G charging network, and the destination address of the access needs to be authorized, and the fee is charged according to the traffic, and because of the 5G network conditions, the network transmission capacity is limited and unstable (may be offline for a period of time);

key needs

In summary, the key requirements are as follows:

  1. To realize timely alarms for edge cluster abnormalities, it is necessary to know the abnormal conditions that are occurring in edge clusters;
  2. Network: The network conditions are poor, the network traffic is small, only a few destination addresses can be opened, and the situation of network instability (offline for a period of time) can be tolerated;
  3. Resources: It is necessary to avoid additional storage and computing resource consumption as much as possible

plan

In summary, the following schemes are adopted to achieve:

Alarm notification based on Kubernetes Events

architecture diagram

Technical solution planning

  1. Collect Events from various resources of Kubernetes, such as:

    1. pod
    2. node
    3. kubelet
    4. crd
    5. ...
  2. pass kubernetes-event-exporter Components to realize the collection of Kubernetes Events;
  3. Only filter Warning level Events for alarm notification (follow-up, conditions can be further defined)
  4. The alarm is sent through communication tools such as Feishu webhook (later, the sending channel can be added)

Implementation steps

Manually:

On the edge cluster, perform the following operations:

1. Create roles

as follows:

cat << _EOF_ | kubectl apply -f -
---
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: event-exporter-extra
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - get
      - list
      - watch
---
apiVersion: v1
kind: ServiceAccount
metadata:
  namespace: monitoring
  name: event-exporter
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: event-exporter
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
subjects:
  - kind: ServiceAccount
    namespace: monitoring
    name: event-exporter
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: event-exporter-extra
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: event-exporter-extra
subjects:
  - kind: ServiceAccount
    namespace: kube-event-export
    name: event-exporter
_EOF_

2. Create kubernetes-event-exporter config

as follows:

cat << _EOF_ | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: event-exporter-cfg
  namespace: monitoring
data:
  config.yaml: |
    logLevel: error
    logFormat: json
    route:
      routes:
        - match:
            - receiver: "dump"      
        - drop:
            - type: "Normal"
          match:
            - receiver: "feishu"                     
    receivers:
      - name: "dump"
        stdout: {}
      - name: "feishu"
        webhook:
          endpoint: "https://open.feishu.cn/open-apis/bot/v2/hook/..."
          headers:
            Content-Type: application/json
          layout:
            msg_type: interactive
            card:
              config:
                wide_screen_mode: true
                enable_forward: true
              header:
                title:
                  tag: plain_text
                  content: XXX IoT K3S cluster alarm
                template: red
              elements:
                - tag: div
                  text: 
                    tag: lark_md
                    content: "**EventType:**  {{ .Type }}\n**EventKind:**  {{ .InvolvedObject.Kind }}\n**EventReason:**  {{ .Reason }}\n**EventTime:**  {{ .LastTimestamp }}\n**EventMessage:**  {{ .Message }}"
      
_EOF_

🐾 Note:

  • endpoint: "https://open.feishu.cn/open-apis/bot/v2/hook/..." Modify it to the corresponding webhook endpoint as needed, ❌Remember not to announce it to the public! !!
  • content: XXX IoT K3S cluster alarm: Adjust to a name that is convenient and quick to identify as needed, such as: "Test K3S cluster alarm at home"

3. Create a Deployment

cat << _EOF_ | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: event-exporter
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: event-exporter
      version: v1
  template:
    metadata:
      labels:
        app: event-exporter
        version: v1
    spec:
      volumes:
        - name: cfg
          configMap:
            name: event-exporter-cfg
            defaultMode: 420
        - name: localtime
          hostPath:
            path: /etc/localtime
            type: ''
        - name: zoneinfo
          hostPath:
            path: /usr/share/zoneinfo
            type: ''
      containers:
        - name: event-exporter
          image: ghcr.io/opsgenie/kubernetes-event-exporter:v0.11
          args:
            - '-conf=/data/config.yaml'
          env:
            - name: TZ
              value: Asia/Shanghai
          volumeMounts:
            - name: cfg
              mountPath: /data
            - name: localtime
              readOnly: true
              mountPath: /etc/localtime
            - name: zoneinfo
              readOnly: true
              mountPath: /usr/share/zoneinfo
          imagePullPolicy: IfNotPresent
      serviceAccount: event-exporter
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/controlplane
                    operator: In
                    values:
                      - 'true'
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/control-plane
                    operator: In
                    values:
                      - 'true'
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/master
                    operator: In
                    values:
                      - 'true'    
      tolerations:
        - key: node-role.kubernetes.io/controlplane
          value: 'true'
          effect: NoSchedule
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule      
_EOF_

📝 Instructions:

  1. The related configuration of event-exporter-cfg is used to load the configuration file saved in the form of ConfigMap;
  2. The localtime zoneinfo TZ related configuration is used to modify the time zone of the pod to Asia/Shanghai, so that the final displayed notification effect is the CST time zone;
  3. The configuration related to affinity tolerations is to ensure that: in any case, it is prioritized to be dispatched to the master node and adjusted as needed. This is because the master often exists as a gateway in the edge cluster, with high configuration and long online time;

Automated Deployment

Effect: Automatic deployment when K3S is installed

On the node where the K3S server is located, create event-exporter.yaml under the /var/lib/rancher/k3s/server/manifests/ directory (if there is no such directory, create it first)

---
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: event-exporter-extra
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - get
      - list
      - watch
---
apiVersion: v1
kind: ServiceAccount
metadata:
  namespace: monitoring
  name: event-exporter
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: event-exporter
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
subjects:
  - kind: ServiceAccount
    namespace: monitoring
    name: event-exporter
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: event-exporter-extra
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: event-exporter-extra
subjects:
  - kind: ServiceAccount
    namespace: kube-event-export
    name: event-exporter
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: event-exporter-cfg
  namespace: monitoring
data:
  config.yaml: |
    logLevel: error
    logFormat: json
    route:
      routes:
        - match:
            - receiver: "dump"      
        - drop:
            - type: "Normal"
          match:
            - receiver: "feishu"                     
    receivers:
      - name: "dump"
        stdout: {}
      - name: "feishu"
        webhook:
          endpoint: "https://open.feishu.cn/open-apis/bot/v2/hook/dc4fd384-996b-4d20-87cf-45b3518869ec"
          headers:
            Content-Type: application/json
          layout:
            msg_type: interactive
            card:
              config:
                wide_screen_mode: true
                enable_forward: true
              header:
                title:
                  tag: plain_text
                  content: xxxK3S cluster alarm
                template: red
              elements:
                - tag: div
                  text: 
                    tag: lark_md
                    content: "**EventType:**  {{ .Type }}\n**EventKind:**  {{ .InvolvedObject.Kind }}\n**EventReason:**  {{ .Reason }}\n**EventTime:**  {{ .LastTimestamp }}\n**EventMessage:**  {{ .Message }}"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: event-exporter
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: event-exporter
      version: v1
  template:
    metadata:
      labels:
        app: event-exporter
        version: v1
    spec:
      volumes:
        - name: cfg
          configMap:
            name: event-exporter-cfg
            defaultMode: 420
        - name: localtime
          hostPath:
            path: /etc/localtime
            type: ''
        - name: zoneinfo
          hostPath:
            path: /usr/share/zoneinfo
            type: ''
      containers:
        - name: event-exporter
          image: ghcr.io/opsgenie/kubernetes-event-exporter:v0.11
          args:
            - '-conf=/data/config.yaml'
          env:
            - name: TZ
              value: Asia/Shanghai
          volumeMounts:
            - name: cfg
              mountPath: /data
            - name: localtime
              readOnly: true
              mountPath: /etc/localtime
            - name: zoneinfo
              readOnly: true
              mountPath: /usr/share/zoneinfo
          imagePullPolicy: IfNotPresent
      serviceAccount: event-exporter
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/controlplane
                    operator: In
                    values:
                      - 'true'
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/control-plane
                    operator: In
                    values:
                      - 'true'
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/master
                    operator: In
                    values:
                      - 'true'    
      tolerations:
        - key: node-role.kubernetes.io/controlplane
          value: 'true'
          effect: NoSchedule
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule  

Then start K3S and it will be deployed automatically.

📚️Reference:
Automated Deployment manifests and Helm charts | Rancher Documentation

final effect

As shown below:

📚️Reference Documentation

Three people walk together, there must be my teacher; knowledge sharing, the world is public. This article is sponsored by Dongfeng Weiming Technology Blog EWhisper.cn write.

Tags: Operation & Maintenance Kubernetes Container DevOps

Posted by altis88 on Thu, 16 Feb 2023 08:01:06 +0530