Minimalist Prometheus monitoring in practice

1. Introduction and Table of Contents

Monitoring is a key part of observability in the cloud-native era. Compared with earlier eras, much has changed: microservices and containerization are now mainstream, systems evolve extremely quickly, the volume of monitoring data has grown enormously, and real-time requirements are far stricter. Prometheus was created to cope with these changes. Its feature set, its excellent fit with cloud-native infrastructure, and the ease with which it integrates third-party open-source components make it one of the brightest stars in the monitoring space.

This article shows how to build a monitoring system with Prometheus, covering probes (exporters), metric configuration, visualization, alerting, container monitoring, and more. It is an entry-level tutorial and does not cover topics such as gateways or Kubernetes clusters. For the basic concepts behind Prometheus, consult the official documentation; this article focuses on the hands-on process.

Table of contents:

  2. Deploy Prometheus Server
  3. Deploy monitoring probes
  4. Deploy Grafana
  5. Deploy AlertManager
  6. Deploy PrometheusAlert

2. Deploy Prometheus Server

This section covers deploying Prometheus Server with Docker and mapping the relevant configuration files and directories into the container.

2.1 Configure the environment

  1. Create the directories and grant permissions
sudo mkdir -pv /data/docker/prometheus/{data,alert_rules,job}
sudo chown -R myusername:myusername /data/docker/prometheus/

where:

  • The data folder stores the data generated by Prometheus
  • The alert_rules folder stores the Prometheus alerting rule files
  • The job folder stores the JSON files describing the monitoring targets
  • Replace myusername with your actual username
  2. Run this command to avoid "permission denied" errors (the official prom/prometheus image runs as user nobody, UID 65534)
sudo chown 65534:65534 -R /data/docker/prometheus/data
  3. Create the configuration file. Save the following as /data/docker/prometheus/prometheus.yml. Pay attention to every occurrence of "$ip": when you configure later components such as AlertManager, remember to come back and update those entries.
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    # - targets: ["$ip:9093"]
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
  - /etc/prometheus/alert_rules/*.rules

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    file_sd_configs:
    - files:
      - /etc/prometheus/job/prometheus.json
      refresh_interval: 1m  # re-read the target file every minute

  # Node host group
  - job_name: 'host'
    #basic_auth:
    #  username: prometheus
    #  password: prometheus
    file_sd_configs:
    - files:
      - /etc/prometheus/job/host.json
      refresh_interval: 1m

  # cadvisor container group
  - job_name: 'cadvisor'
    file_sd_configs:
    - files:
      - /etc/prometheus/job/cadvisor.json
      refresh_interval: 1m

  # mysql exporter group
  - job_name: 'mysqld-exporter'
    file_sd_configs:
    - files:
      - /etc/prometheus/job/mysqld-exporter.json
      refresh_interval: 1m

  # blackbox ping group
  - job_name: 'blackbox_ping'
    scrape_interval: 5s
    scrape_timeout: 2s
    metrics_path: /probe
    params:
      module: [ping]
    file_sd_configs:
    - files:
      - /etc/prometheus/job/blackbox/ping/*.json
      refresh_interval: 1m
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: $ip:9115

  # blackbox http get 2xx group
  - job_name: 'blackbox_http_2xx'
    scrape_interval: 5s
    metrics_path: /probe
    params:
      module: [http_2xx]
    file_sd_configs:
    - files:
      - /etc/prometheus/job/blackbox/http_2xx/*.json
      refresh_interval: 1m
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: $ip:9115

  - job_name: "blackbox_tcp"
    metrics_path: /probe
    params:
      module: [tcp_connect]
    file_sd_configs:
    - files:
      - /etc/prometheus/job/blackbox/tcp/*.json
      refresh_interval: 1m
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: $ip:9115

  - job_name: 'blackbox_ssh_banner'
    metrics_path: /probe
    params:
      module: [ssh_banner]
    file_sd_configs:
    - files:
      - /etc/prometheus/job/blackbox/ssh_banner/*.json
      refresh_interval: 1m
    relabel_configs:
      # Ensure port is 22, pass as URL parameter
      - source_labels: [__address__]
        regex: (.*?)(:.*)?
        replacement: ${1}:22
        target_label: __param_target
      # Make instance label the target
      - source_labels: [__param_target]
        target_label: instance
      # Actually talk to the blackbox exporter though
      - target_label: __address__
        replacement: $ip:9115

  - job_name: "blackbox_dns"
    metrics_path: /probe
    params:
      module: [dns_udp]
    file_sd_configs:
    - files:
      - /etc/prometheus/job/blackbox/dns/*.json
      refresh_interval: 1m
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: $ip:9115
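
Before starting the server, you can sanity-check the configuration with promtool, which ships inside the prom/prometheus image. A quick sketch (run it after the config and any rule files are in place; file_sd target files that do not exist yet are tolerated):

docker run --rm \
  -v /data/docker/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro \
  -v /data/docker/prometheus/alert_rules:/etc/prometheus/alert_rules:ro \
  --entrypoint promtool \
  prom/prometheus:v2.28.1 \
  check config /etc/prometheus/prometheus.yml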

2.2 Start the server

docker run -itd  \
  -p 9090:9090 \
  -v /data/docker/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro \
  -v /data/docker/prometheus/alert_rules:/etc/prometheus/alert_rules \
  -v /data/docker/prometheus/job:/etc/prometheus/job \
  -v /data/docker/prometheus/data:/data/prometheus/ \
  -v /etc/timezone:/etc/timezone:ro \
  -v /etc/localtime:/etc/localtime:ro \
  --name prometheus \
  --restart=always \
  prom/prometheus:v2.28.1 \
  --config.file=/etc/prometheus/prometheus.yml  \
  --storage.tsdb.path=/data/prometheus/ \
  --storage.tsdb.retention.time=30d \
  --web.read-timeout=5m \
  --web.max-connections=10 \
  --query.max-concurrency=20 \
  --query.timeout=2m \
  --web.enable-lifecycle

After startup succeeds, open http://$ip:9090 in a browser to see the web UI.
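
Because the server was started with --web.enable-lifecycle, you can also reload the configuration later without restarting the container:

curl -X POST http://$ip:9090/-/reload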

If the system firewall is enabled, you may need to open the following ports. Taking CentOS 7 as an example:

sudo firewall-cmd --zone=public --add-port=9090/tcp --permanent
sudo firewall-cmd --zone=public --add-port=9100/tcp --permanent
sudo firewall-cmd --zone=public --add-port=3000/tcp --permanent
sudo firewall-cmd --reload

2.3 References for deploying Prometheus Server

https://prometheus.io/docs/prometheus/latest/configuration/configuration/

3. Deploy monitoring probes

Prometheus differs from Zabbix: Prometheus mainly pulls data actively, reading monitoring data through interfaces provided by exporters. An exporter is responsible for collecting data; think of it as a probe that exposes the collected data over HTTP for the server to scrape. For the meaning of fields returned by each exporter that this article does not describe, consult the exporter's own documentation.
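
For example, every exporter serves plain-text metrics over HTTP, so you can always inspect what Prometheus will scrape with curl (shown here against the node_exporter from section 3.1, assuming it runs locally on its default port 9100):

curl -s http://localhost:9100/metrics | head -n 5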

3.1 Deploy node_exporter

node_exporter monitors the host's CPU, memory, disk, I/O, and other system-level metrics. Its focus is data collection about the host itself.

  1. Download node_exporter and extract it

Log in to the host to be monitored and download node_exporter from the GitHub releases page,

or run curl -LO https://github.com/prometheus/node_exporter/releases/download/v1.2.0/node_exporter-1.2.0.linux-amd64.tar.gz

After the download completes, run the following commands to unpack and install the binaries

tar xvfz node_exporter-1.2.0.linux-amd64.tar.gz
sudo mkdir -p /data/node_exporter/
sudo mv node_exporter-1.2.0.linux-amd64/* /data/node_exporter/
  2. Create a prometheus user
sudo groupadd prometheus
sudo useradd -g prometheus -m -d /var/lib/prometheus -s /sbin/nologin prometheus
sudo chown prometheus:prometheus -R /data/node_exporter/
  3. Create a systemd service

Create and edit the unit file

sudo nano /etc/systemd/system/node_exporter.service

with the following content

[Unit]
Description=node_exporter
Documentation=https://prometheus.io/
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=/data/node_exporter/node_exporter
Restart=on-failure
[Install]
WantedBy=multi-user.target
  4. Start node_exporter with systemctl
    Reload systemd, start the service, and check that it is running
sudo systemctl daemon-reload
sudo systemctl start node_exporter
sudo systemctl status node_exporter

This should return output similar to the following

● node_exporter.service - node_exporter
   Loaded: loaded (/etc/systemd/system/node_exporter.service; disabled; vendor preset: disabled)
   Active: active (running) since Wed 2019-06-05 09:18:56 GMT; 3s ago
 Main PID: 11050 (node_exporter)
   CGroup: /system.slice/node_exporter.service
           └─11050 /data/node_exporter/node_exporter

Enable the service at boot: sudo systemctl enable node_exporter

  5. Open the firewall
    Run curl localhost:9100; if a page is returned, node_exporter has started successfully.
    Run curl http://$ip:9100/ from another machine on the same network segment; you should see the same page.

If you cannot reach the page, check whether the firewall port is open:

sudo firewall-cmd --zone=public --add-port=9100/tcp --permanent
sudo firewall-cmd --reload
  6. Configure Prometheus
    Log in to the Prometheus server and edit the file
    nano /data/docker/prometheus/job/host.json with the following content, replacing the IP addresses with your actual hosts (a verification sketch follows this list)
[
  {
    "targets": [ "192.168.1.100:9100"],
    "labels": {
      "subject": "node_exporter",
      "hostname": "server1"
    }
  },
  {
    "targets": [ "192.168.1.101:9100"],
    "labels": {
      "subject": "node_exporter",
      "hostname": "server2"
    }
  }
]
  7. References for deploying node_exporter
    https://github.com/prometheus/node_exporter
    https://prometheus.io/docs/guides/node-exporter/
    https://www.jianshu.com/p/7bec152d1a1f
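
Once Prometheus re-reads host.json (within the 1m refresh_interval), you can verify the new targets through the HTTP API; a quick check, with $ip standing for the Prometheus server address as elsewhere in this article:

curl -sG 'http://$ip:9090/api/v1/query' --data-urlencode 'query=up{job="host"}'

Each healthy target should report the value 1.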

3.2 Deploy mysqld-exporter

mysqld-exporter monitors MySQL database performance and related metrics.

  1. Log in to the host running the MySQL database and start the exporter with Docker
docker run -d \
  -p 9104:9104 \
  --link mysql  \
  --name mysqld-exporter \
  --restart on-failure:5 \
  -e DATA_SOURCE_NAME="root:pwdpwdpwdpwdpwd@(mysql:3306)/" \
  prom/mysqld-exporter:v0.13.0

After startup, visit http://127.0.0.1:9104/metrics to see the monitoring data; the endpoint should also be reachable from the Prometheus server. Note that the example connects as root; a sketch of a dedicated, least-privileged account follows the references below.

  2. References for deploying mysqld-exporter
    https://github.com/prometheus/mysqld_exporter
    https://registry.hub.docker.com/r/prom/mysqld-exporter/
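
Instead of connecting as root as in the example above, the mysqld_exporter documentation recommends a dedicated account with only the required privileges. A sketch, where exporter and CHANGE_ME are placeholder credentials:

mysql -u root -p -e "CREATE USER 'exporter'@'%' IDENTIFIED BY 'CHANGE_ME' WITH MAX_USER_CONNECTIONS 3; GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'%';"

Then set DATA_SOURCE_NAME="exporter:CHANGE_ME@(mysql:3306)/" in the docker run command.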

3.3 Deploy cadvisor

cAdvisor monitors the resource usage and state of running containers.

  1. Log in to the Docker host and start cAdvisor with the following command
docker run \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:ro \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --volume=/dev/disk/:/dev/disk:ro \
  --publish=9101:8080 \
  --detach=true \
  --name=cadvisor \
  --restart on-failure:5 \
  --privileged \
  --device=/dev/kmsg \
  gcr.io/cadvisor/cadvisor:v0.38.6

You may find two cAdvisor images: gcr.io/cadvisor/cadvisor and google/cadvisor. The former is the actively maintained one and is recommended.

  2. Configure the Prometheus server
    Log in to the host where the Prometheus server runs and edit the file nano /data/docker/prometheus/job/cadvisor.json with the following content (example queries follow this list):
[
  {
    "targets": [ "192.168.1.100:9101"],
    "labels": {
      "subject": "cadvisor",
      "hostname": "server1"
    }
  },
  {
    "targets": [ "192.168.1.101:9101"],
    "labels": {
      "subject": "cadvisor",
      "hostname": "server2"
    }
  }
]
  3. If the Docker host has a firewall enabled, remember to open the port
sudo firewall-cmd --zone=public --add-port=9101/tcp --permanent
sudo firewall-cmd --reload
  4. References for deploying cAdvisor
    https://github.com/google/cadvisor
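
With the targets in place, you can chart container metrics in the Prometheus UI at http://$ip:9090/graph. Two standard cAdvisor series to try (the subject label comes from cadvisor.json above):

rate(container_cpu_usage_seconds_total{subject="cadvisor"}[5m])
container_memory_usage_bytes{subject="cadvisor"}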

3.4 Deploy blackbox_exporter

blackbox_exporter probes targets from the outside (black-box monitoring) over protocols such as ICMP, TCP, HTTP, and DNS.

  1. Create the configuration file
    Log in to the Prometheus server host and run the following commands
sudo mkdir -p /data/docker/blackbox/conf
sudo chown -R myusername:myusername /data/docker/blackbox

then create and edit the file

nano /data/docker/blackbox/conf/blackbox.yml

A sample blackbox.yml follows:

modules:
  ping:
    prober: icmp
    timeout: 5s
    icmp:
      preferred_ip_protocol: "ip4"
  http_2xx:
    prober: http
    timeout: 5s
    http:
      method: GET
      preferred_ip_protocol: "ip4" # defaults to "ip4"
      ip_protocol_fallback: false  # no fallback to "ip6"
  http_post_2xx:
    prober: http
    timeout: 5s
    http:
      method: POST
      preferred_ip_protocol: "ip4"
  http_post_2xx_json:
    prober: http
    timeout: 30s
    http:
      preferred_ip_protocol: "ip4"
      method: POST
      headers:
        Content-Type: application/json
      body: '{"key1":""vlaue1,"params":{"param2":"vlaue2"}}'
  http_basic_auth:
    prober: http
    timeout: 60s
    http:
      method: POST
      headers:
        Host: "login.example.com"
      basic_auth:
        username: "username"
        password: "mysecret"

  tls_connect:
    prober: tcp
    timeout: 5s
    tcp:
      tls: true
  tcp_connect:
    prober: tcp
    timeout: 5s

  pop3s_banner:
    prober: tcp
    tcp:
      query_response:
      - expect: "^+OK"
      tls: true

  ssh_banner:
    prober: tcp
    tcp:
      query_response:
      - expect: "^SSH-2.0-"
      - send: SSH-2.0-blackbox-ssh-check

  irc_banner:
    prober: tcp
    tcp:
      query_response:
      - send: "NICK prober"
      - send: "USER prober prober prober :prober"
      - expect: "PING :([^ ]+)"
        send: "PONG ${1}"
      - expect: "^:[^ ]+ 001"

  dns_udp:
    prober: dns
    timeout: 10s
    dns:
      transport_protocol: udp
      preferred_ip_protocol: ip4
      query_name: "www.example.cn"
      query_type: "A"
  2. Configure Prometheus
    Still on the Prometheus server host, run the following commands
sudo mkdir -p /data/docker/prometheus/job/blackbox/

sudo mkdir -pv /data/docker/prometheus/job/blackbox/{dns,http_2xx,ping,ssh_banner,tcp}
sudo chown -R myusername:myusername /data/docker/prometheus/job/blackbox/

Next, create JSON files in the corresponding folders under /data/docker/prometheus/job/blackbox/, following the samples below (any file name matching *.json works).

In the dns folder, create dns.json; a sample follows

[
  {
    "targets": [ "192.168.1.1"],
    "labels": {
      "subject": "blackbox_dns",
      "app": "my_dns"
    }
  }
]

In the http_2xx folder, create search-site.json; a sample follows

[
  {
    "targets": [ "https://www.google.cn/?HealthCheck"],
    "labels": {
      "app": "google",
      "subject": "blackbox_http_2xx",
      "hostname": "server-01"
    }
  },
  {
    "targets": [ "https://cn.bing.com/?HealthCheck"],
    "labels": {
      "app": "bing",
      "subject": "blackbox_http_2xx",
      "hostname": "server-02"
    }
  }
]

In the ping folder, create search-site.json; a sample follows

[
  {
    "targets": [ "www.google.cn"],
    "labels": {
      "app": "google",
      "subject": "blackbox_ping",
      "hostname": "server-01"
    }
  },
  {
    "targets": [ "cn.bing.com"],
    "labels": {
      "app": "bing",
      "subject": "blackbox_ping",
      "hostname": "server-02"
    }
  }
]

In the ssh_banner folder, create ssh-banner.json; a sample follows

[
  {
    "targets": [ "192.168.1.100:22"],
    "labels": {
      "subject": "blackbox_ssh_banner",
      "hostname": "server-01"
    }
  },
  {
    "targets": [ "192.168.1.101:22"],
    "labels": {
      "subject": "blackbox_ssh_banner",
      "hostname": "server-02"
    }
  }
]

In the tcp folder, create tcp.json; a sample follows

[
  {
    "targets": [ "$ip:3306"],
    "labels": {
      "app": "mysql.example.cn",
      "subject": "blackbox_tcp",
      "hostname": "mysql"
    }
  }
]
  3. Run blackbox_exporter

On the Prometheus server host, run the following command to start blackbox_exporter in a container

docker run -d \
  --restart on-failure:5 \
  -p 9115:9115 \
  -v /data/docker/blackbox/conf/blackbox.yml:/config/blackbox.yml:ro \
  --name blackbox_exporter \
  prom/blackbox-exporter:v0.19.0 \
  --config.file=/config/blackbox.yml

After startup succeeds, visit http://$ip:9090/targets; you should see every probe configured so far, each with State UP (a manual probe test follows the references below).

  4. References for deploying blackbox_exporter
    https://github.com/prometheus/blackbox_exporter
    https://yunlzheng.gitbook.io/prometheus-book/part-ii-prometheus-jin-jie/exporter/commonly-eporter-usage/install_blackbox_exporter
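
You can also exercise a module by hand against the exporter's /probe endpoint, for example with the http_2xx module and one of the targets configured above ($ip is the blackbox_exporter host):

curl -s 'http://$ip:9115/probe?module=http_2xx&target=https://www.google.cn/' | grep probe_success

probe_success 1 means the probe passed.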

4. Deploy Grafana

Next, deploy the visualization tool Grafana. Grafana integrates with Prometheus quickly and can turn the collected data into graphical dashboards, either through manual configuration or with ready-made templates.

4.1 Startup

Run the following commands to prepare for startup

sudo mkdir -p /data/docker/grafana
sudo chown 472:472 /data/docker/grafana -R

then run Grafana via Docker

docker run -d \
  -p 3000:3000 \
  -v /data/docker/grafana:/var/lib/grafana \
  -v /etc/localtime:/etc/localtime:ro \
  --restart=always \
  --name grafana \
  grafana/grafana:8.0.6

After startup succeeds, open http://$ip:3000; the default credentials are admin / admin.

4.2 Configuration

  1. Configure the data source
    Click "Configuration -> Data sources" (http://$ip:3000/datasources), add a Prometheus data source, and set its URL to http://$ip:9090 (a scripted alternative is sketched below).

  2. Configure Dashboards

Click "Dashboards -> Manage -> import" to enter http://$ip:3000/dashboard/import, import the Grafana Dashboards template, in Import via grafana.com, fill in the template id you want to import, the commonly used template id is as follows:

  • node_exporter ID: 8919
  • cAdvisor ID: 14282
  • mysqld-exporter ID: 7362

You can also search for dashboard templates at https://grafana.com/grafana/dashboards, or build dashboard panels yourself.
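
If you prefer scripting over clicking, the data source can also be created through Grafana's HTTP API. A minimal sketch, assuming the default admin/admin credentials and Prometheus reachable at http://$ip:9090:

curl -s -u admin:admin -H 'Content-Type: application/json' \
  -X POST http://$ip:3000/api/datasources \
  -d '{"name":"Prometheus","type":"prometheus","url":"http://$ip:9090","access":"proxy","isDefault":true}'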

4.3 References for deploying Grafana

https://grafana.com/docs/grafana/latest/installation/docker/

5. Deploy AlertManager

So far we have deployed Prometheus Server, the exporters, and Grafana for visualization. We still need alerting: when a fault occurs, the monitoring system should notify the right people promptly so they can respond in time. Prometheus itself does not ship with a notification tool. Instead, it evaluates pre-configured rules and sends alerts to AlertManager, which handles them centrally and notifies recipients via email, SMS, WeChat, DingTalk, and so on. Like Grafana, AlertManager is not tied to Prometheus and can also process alerts from other programs.
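
The prometheus.yml from section 2 already loads every file matching /etc/prometheus/alert_rules/*.rules, so alerting rules only need to be dropped into /data/docker/prometheus/alert_rules/. A minimal example, saved as node.rules (the alert name and the 1m threshold are illustrative):

groups:
- name: node_alerts
  rules:
  - alert: InstanceDown
    expr: up == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Instance {{ $labels.instance }} is down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been unreachable for more than 1 minute."

After adding or changing rule files, reload Prometheus (curl -X POST http://$ip:9090/-/reload).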

5.1 Preparations

Run the following commands

sudo mkdir -pv /data/docker/alertmanager
sudo chown -R myusername:myusername /data/docker/alertmanager/
cd /data/docker/alertmanager

In the /data/docker/alertmanager folder, create the files alertmanager.yml and email.tmpl.

A sample alertmanager.yml follows; remember to fill in the SMTP settings and the ddurl parameter of the webhook URL:

global:
  resolve_timeout: 5m
  # Mail SMTP configuration
  smtp_smarthost: 'smtp.gmail.com:465'
  smtp_from: 'example@gmail.com'
  smtp_auth_username: 'example@gmail.com'
  smtp_auth_password: 'xxxxx'
  smtp_require_tls: false
# Customize notification templates
templates:
  - '/etc/alertmanager/email.tmpl'
# route sets how alert notifications are distributed
route:
  # Which labels to group alerts by
  group_by: ['alertname']
  # How long to wait after the first alert of a group, so that alerts in the same group are sent together
  group_wait: 10s
  # How long to wait before notifying about new alerts added to an already-notified group
  group_interval: 10s
  # How long to wait before re-sending a still-firing alert, to reduce duplicate emails
  repeat_interval: 1h
  # Default receiver
  receiver: 'myreceiver'
  routes:   # sub-routes can direct particular alerts to particular receivers
  - receiver: 'myreceiver'
    continue: true
    group_wait: 10s
receivers:
- name: 'myreceiver'
#send_resolved: true
  email_configs:
  # - to: 'example@gmail.com, example2@gmail.com'
  - to: 'example@gmail.com'
    html: '{{ template "email.to.html" . }}'
    headers: { Subject: "Prometheus [Warning] alarm mail" }
  # DingTalk configuration
  webhook_configs:
  - url: 'http://$ip:18080/prometheusalert?type=dd&tpl=prometheus-dd&ddurl=https://oapi.dingtalk.com/robot/send?access_token=xxxxxx'

A sample email.tmpl follows. Note the literal "2006-01-02 15:04:05": that string is Go's reference time layout and must not be changed, or the displayed times will be wrong. The ".Add 28800e9" adds 28,800 seconds (8 hours, in nanoseconds) to shift UTC to CST (UTC+8):

{{ define "email.to.html" }}
{{- if gt (len .Alerts.Firing) 0 -}}
{{ range .Alerts }}
=========start==========<br>
alert procedure: prometheus_alert <br>
Alarm level: {{ .Labels.severity }} <br>
Alert Type: {{ .Labels.alertname }} <br>
Alarm application: {{ .Labels.app }} <br>
Alarm host: {{ .Labels.instance }} <br>
Alert topic: {{ .Annotations.summary }}  <br>
Alarm details: {{ .Annotations.description }} <br>
Trigger time: {{ (.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }} <br>
=========end==========<br>
{{ end }}{{ end -}}

{{- if gt (len .Alerts.Resolved) 0 -}}
{{ range .Alerts }}
=========start==========<br>
alert procedure: prometheus_alert <br>
Alarm level: {{ .Labels.severity }} <br>
Alert Type: {{ .Labels.alertname }} <br>
Alarm application: {{ .Labels.app }} <br>
Alarm host: {{ .Labels.instance }} <br>
Alert topic: {{ .Annotations.summary }} <br>
Alarm details: {{ .Annotations.description }} <br>
Trigger time: {{ (.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }} <br>
Recovery Time: {{ (.EndsAt.Add 28800e9).Format "2006-01-02 15:04:05" }} <br>
=========end==========<br>
{{ end }}{{ end -}}

{{- end }}

Both configuration files can be adapted by referring to https://prometheus.io/docs/alerting/latest/configuration/.

5.2 Start AlertManager

Run the following command

docker run -d -p 9093:9093 \
  -v /data/docker/alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro \
  -v /data/docker/alertmanager/email.tmpl:/etc/alertmanager/email.tmpl:ro \
  --name alertmanager \
  --restart=always \
  prom/alertmanager:v0.22.2

5.3 Access

After startup succeeds, AlertManager can be accessed at http://$ip:9093
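
To verify routing and templates before any real rule fires, you can push a hand-crafted alert straight into AlertManager's v2 API; the label values below are arbitrary test data:

curl -X POST http://$ip:9093/api/v2/alerts \
  -H 'Content-Type: application/json' \
  -d '[{"labels":{"alertname":"TestAlert","severity":"warning","instance":"manual-test"},"annotations":{"summary":"Test alert","description":"Sent by hand to verify notification routing"}}]'

Within roughly group_wait you should receive the corresponding email or DingTalk message.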

6. Deploy PrometheusAlert

Alerting with Prometheus involves several cooperating parts. In the previous section we deployed AlertManager, which processes alert notifications and delivers them. In this section we deploy PrometheusAlert, an alert forwarding center that receives webhooks from AlertManager (the webhook_configs entry in alertmanager.yml above) and relays them to channels such as DingTalk.

6.1 Preparations

Run the following commands

sudo mkdir -p /data/docker/prometheus-alert/conf
sudo chown -R myusername:myusername /data/docker/prometheus-alert/

Download the sample configuration from https://raw.githubusercontent.com/feiyu563/PrometheusAlert/master/conf/app-example.conf, save it as /data/docker/prometheus-alert/conf/app.conf, and edit it as needed (nano /data/docker/prometheus-alert/conf/app.conf).
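
For example, with curl:

curl -L -o /data/docker/prometheus-alert/conf/app.conf \
  https://raw.githubusercontent.com/feiyu563/PrometheusAlert/master/conf/app-example.conf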

6.2 Startup

Run the following command to start prometheus-alert

docker run -d --publish=18080:8080 \
  -v /data/docker/prometheus-alert/conf/:/app/conf:ro \
  -v /data/docker/prometheus-alert/db/:/app/db \
  -v /data/docker/prometheus-alert/log/:/app/logs \
  --name prometheusalert-center \
  feiyu563/prometheus-alert:v-4.5.0

After startup succeeds, open the PrometheusAlert interface at http://$ip:18080. The login username and password are set in app.conf.

If the system firewall is enabled, remember to open the port

sudo firewall-cmd --zone=public --add-port=18080/tcp --permanent
sudo firewall-cmd --reload

6.3 Configuration

  1. Configure an alert template

Click AlertTemplate (http://$ip:18080/template) to see templates for the various third-party systems that can be integrated.
Using the DingTalk alert template as an example, replace its content with the following; the main changes fix the timestamps being displayed 8 hours behind and add some extra fields

{{ $var := .externalURL}}{{ range $k,$v:=.alerts }}
{{if eq $v.status "resolved"}}
## [Prometheus recovery information]({{$v.generatorURL}})
#### [{{$v.labels.alertname}}]({{$var}})
###### Alarm level: {{$v.labels.level}}
###### Start time: {{GetCSTtime $v.startsAt}}
###### End time: {{GetCSTtime $v.endsAt}}
###### Failure hostname: {{$v.labels.hostname}}
###### Faulty host IP: {{$v.labels.instance}}
###### Faulting app: {{$v.labels.app}}
###### Failure host object: {{$v.labels.subject}}
##### {{$v.annotations.description}}
![Prometheus](https://raw.githubusercontent.com/feiyu563/PrometheusAlert/master/doc/alert-center.png)
{{else}}
## [Prometheus warning message]({{$v.generatorURL}})
#### [{{$v.labels.alertname}}]({{$var}})
###### Alarm level: {{$v.labels.level}}
###### Start time: {{GetCSTtime $v.startsAt}}
###### Failure hostname: {{$v.labels.hostname}}
###### Faulty host IP: {{$v.labels.instance}}
###### Faulting app: {{$v.labels.app}}
###### Failure host object: {{$v.labels.subject}}
##### {{$v.annotations.description}}
![Prometheus](https://raw.githubusercontent.com/feiyu563/PrometheusAlert/master/doc/alert-center.png)
{{end}}
{{ end }}
  2. Set up the DingTalk robot
    In DingTalk, create a new group, then click "Group Settings -> Smart Group Assistant -> Add Robot -> Customize -> Security Settings", add the IP address of the server that will send the messages, and copy the generated webhook URL. See https://blog.csdn.net/knight_zhou/article/details/105583741 for reference.

6.4 References for deploying PrometheusAlert

https://github.com/feiyu563/PrometheusAlert/blob/master/doc/readme/install.md
