Prometheus Process Monitoring: process_exporter

GitHub: https://github.com/ncabatoff/process-exporter

Alternatively, download the latest release (v0.7.5 at the time of writing):

$ wget https://github.com/ncabatoff/process-exporter/releases/download/v0.7.5/process-exporter-0.7.5.linux-amd64.tar.gz

$ tar zvxf process-exporter-0.7.5.linux-amd64.tar.gz
$ cp process-exporter-0.7.5.linux-amd64/process-exporter /usr/bin/
  • Write the configuration file (the service file further down reads it from /etc/exporters/process-exporter.yaml):
process_names:

  - name: "{{.Matches}}"
    cmdline:
      - '/usr/bin/yig'

  - name: "{{.Matches}}"
    cmdline:
      - '/usr/bin/caddy'

  - name: "{{.Matches}}"
    cmdline:
      - '/usr/bin/meepo'

  - name: "{{.Matches}}"
    cmdline:
      - 'kafka.Kafka'

  - name: "{{.Matches}}"
    cmdline:
      - 'org.apache.zookeeper.server.quorum.QuorumPeerMain'
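To make the matching behaviour concrete, here is a minimal Python sketch of how the cmdline entries select processes. It is not the exporter's actual code. It assumes that each cmdline item is a regular expression applied to the process command line (taken here as argv joined with spaces), that every regexp in an entry must match, and that the first matching entry wins. The helper name first_matching_entry and the sample command lines are hypothetical.

```python
import re

# Patterns copied from the config above; each inner list is one process_names entry.
CMDLINE_PATTERNS = [
    ["/usr/bin/yig"],
    ["/usr/bin/caddy"],
    ["/usr/bin/meepo"],
    ["kafka.Kafka"],
    ["org.apache.zookeeper.server.quorum.QuorumPeerMain"],
]


def first_matching_entry(cmdline, entries=CMDLINE_PATTERNS):
    """Return the index of the first entry whose regexps all match, else None.

    Assumption: this mirrors the exporter's documented behaviour only loosely.
    """
    for index, patterns in enumerate(entries):
        if all(re.search(pattern, cmdline) for pattern in patterns):
            return index
    return None


# Hypothetical command lines, for illustration only.
for cmdline in (
    "/usr/bin/caddy run",
    "java -Xmx1G kafka.Kafka config/server.properties",
    "/usr/sbin/sshd -D",
):
    print(cmdline, "->", first_matching_entry(cmdline))
```

Because the items are regular expressions, the dot in 'kafka.Kafka' matches any character. Writing 'kafka\.Kafka' would make the pattern stricter.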
  • Write the systemd service file:
$ vim /usr/lib/systemd/system/process_exporter.service

[Unit]
Description=Prometheus exporter for process metrics
Documentation=https://github.com/ncabatoff/process-exporter
After=network.target

[Service]
Type=simple
User=yig
ExecStart=/usr/bin/process-exporter -config.path=/etc/exporters/process-exporter.yaml
Restart=on-failure

[Install]
WantedBy=multi-user.target
  • Reload the systemd daemon, then start and enable the service:
systemctl daemon-reload
systemctl start process_exporter
systemctl enable process_exporter
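Once the service is running, it is worth checking that the exporter actually serves process metrics before wiring it into Prometheus. The sketch below is one way to do that in Python, under a few assumptions: the exporter listens on port 9256 (the port used in the scrape target later in this post), the URL http://localhost:9256/metrics is reachable from where the script runs, and the metric names start with namedprocess_. The function names parse_samples and fetch_metrics are hypothetical helpers, not part of any library.

```python
import re
import urllib.request

# One exposition-format sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text, prefix="namedprocess_"):
    """Return (name, labels, value) for every sample whose name starts with prefix."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match and match.group(1).startswith(prefix):
            labels = dict(LABEL_RE.findall(match.group(2) or ""))
            samples.append((match.group(1), labels, float(match.group(3))))
    return samples


def fetch_metrics(url="http://localhost:9256/metrics", timeout=5):
    """Download the metrics page as text. The default URL is an assumption."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling parse_samples(fetch_metrics()) on the exporter host should return a non-empty list if the configuration matched at least one process.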
  • Add a scrape job to the Prometheus configuration (under scrape_configs):
- job_name: 'process_exporter'
  static_configs:
    - targets:
        - 10.0.42.61:9256
      labels:
        region: oss-test-1

Grafana

Grafana -> Create -> Import

Enter dashboard ID 249.

After importing, point the dashboard at your Prometheus data source.


Alert

  • Create process.rules.yml:
groups:
  - name: alert.rules
    rules:
      - alert: process_is_down  # the process is down, so fire an alert
        expr: sum (namedprocess_namegroup_states) by (groupname, instance, region) == 0
        for: 30s
        labels:
          region: '{{ $labels.region }}'
          env: 'test'
          level: emergency
        annotations:
          description: 'process: {{ $labels.groupname }}, instance: {{ $labels.instance }}'
          value: '{{ $value }}'
          summary: process is down
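The rule's expression adds up the per-state gauges of each group and fires when the total is zero. As a rough illustration of that aggregation (not of how Prometheus evaluates PromQL internally), the Python sketch below sums hypothetical samples by (groupname, instance, region) and reports the groups whose total is 0. The function name groups_down and all sample values are made up for the example.

```python
from collections import defaultdict


def groups_down(samples, by=("groupname", "instance", "region")):
    """Sum sample values per label tuple and return the tuples whose sum is 0.

    Each sample is (labels, value). A missing label is treated as an empty string.
    """
    totals = defaultdict(float)
    for labels, value in samples:
        key = tuple(labels.get(name, "") for name in by)
        totals[key] += value
    return sorted(key for key, total in totals.items() if total == 0)


# Hypothetical namedprocess_namegroup_states samples: one per (group, state).
base = {"instance": "10.0.42.61:9256", "region": "oss-test-1"}
samples = [
    ({**base, "groupname": "caddy", "state": "Running"}, 0),
    ({**base, "groupname": "caddy", "state": "Sleeping"}, 3),
    ({**base, "groupname": "meepo", "state": "Running"}, 0),
    ({**base, "groupname": "meepo", "state": "Sleeping"}, 0),
]

print(groups_down(samples))
```

In this made-up data only the meepo group sums to zero, so it is the only one the rule would flag after the 30s "for" window.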
  • Stop the meepo service and, sure enough, the alert arrives...
