Prometheus Monitoring: Blackbox Probes

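Prometheus probe monitoring checks an application from the outside, for example whether the HTTP service on a machine is responding correctly. This post shows how to do that with the Prometheus Blackbox Exporter.

The Blackbox Exporter can probe endpoints over HTTP, HTTPS, TCP, and ICMP, among other protocols.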

NOTE: Installation and startup are not covered here; search online or see the GitHub repository.

GitHub: https://github.com/prometheus/blackbox_exporter

Configuring Blackbox

    modules:
      http_oss_probe:
        prober: http
        timeout: 10s
        http:
          preferred_ip_protocol: "ip4"
          method: GET
          # Treat both 200 and 403 as a successful probe
          valid_status_codes: [200, 403]

Start blackbox_exporter. By default it listens on port 9115.
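
Once the exporter is running, one quick way to check the module is to request its /probe endpoint by hand. The sketch below only builds that URL. It assumes the exporter address 10.0.42.61:9115 used in the Prometheus job in this post, and `build_probe_url` is a made-up helper name, not part of any library.

```python
from urllib.parse import urlencode


def build_probe_url(exporter: str, module: str, target: str) -> str:
    """Return the URL that asks blackbox_exporter to probe `target` with `module`."""
    # urlencode percent-escapes the target URL so it survives as a query value
    query = urlencode({"module": module, "target": target})
    return f"http://{exporter}/probe?{query}"


# Assumed exporter address; adjust to wherever blackbox_exporter runs
url = build_probe_url("10.0.42.61:9115", "http_oss_probe", "https://s3.test.com")
print(url)
```

Opening the printed URL in a browser or with curl should return the probe metrics, including probe_success.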

Configuring Prometheus

  • Create a file named http_oss_probes.json that defines the targets and their labels:
    [
      {
        "targets": [
          "http://s3.test.com",
          "https://s3.test.com"
        ],
        "labels": {"region": "oss-test-1", "cluster": "test1"}
      }
    ]
  • Add a job to prometheus.yml that loads the JSON file just created:
    - job_name: 'oss_probe'
      scrape_interval: 10s
      metrics_path: /probe
      params:
        module: [http_oss_probe]
      file_sd_configs:
        - files:
            - 'http_oss_probes.json'
      relabel_configs:
        # Pass the original target URL to the exporter as ?target=...
        - source_labels: [__address__]
          target_label: __param_target
        # Keep the probed URL as the instance label
        - source_labels: [__param_target]
          target_label: instance
        # Scrape the Blackbox Exporter itself instead of the target
        - target_label: __address__
          replacement: 10.0.42.61:9115
  • Restart Prometheus.
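
The relabel_configs part of the job is the least obvious step, so here is a rough Python sketch of what the three rules do to a single target from the JSON file. It is a simplified model, not Prometheus code: the `relabel` function is hypothetical, and it assumes each rule uses the default replace action with the default regex.

```python
def relabel(target: str, exporter: str = "10.0.42.61:9115") -> dict:
    """Mimic the three relabel_configs rules for one file_sd target."""
    # file_sd puts each entry of "targets" into __address__
    labels = {"__address__": target}
    # Rule 1: copy __address__ into __param_target (sent as ?target=...)
    labels["__param_target"] = labels["__address__"]
    # Rule 2: copy __param_target into instance, so the probed URL labels the series
    labels["instance"] = labels["__param_target"]
    # Rule 3: overwrite __address__ so the scrape goes to the exporter
    labels["__address__"] = exporter
    return labels


result = relabel("https://s3.test.com")
for name, value in result.items():
    print(f"{name} = {value}")
```

In this model the scrape request ends up going to the exporter at 10.0.42.61:9115 with the original URL as the target parameter, while the stored series keep the probed URL in the instance label.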

Alert

  • Add an alert rule:
    - alert: OSS_domain_probe_failed
      expr: probe_success{job="oss_probe"} == 0
      for: 10s
      labels:
        env: test
        level: emergency
      annotations:
        description: 'probe failed. cluster: {{ $labels.cluster }}, instance: {{ $labels.instance }}'
        value: '{{ $value }}'
        summary: OSS domain probe failed.
  • Stop the service, and sure enough the alert arrives.
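
To see the value the alert expression reacts to, the exporter output can be checked for probe_success directly. The following is a minimal sketch, assuming the plain-text metrics format returned by the /probe endpoint; `probe_failed` is a hypothetical helper and the sample text is hand-written for illustration, not captured from a real run.

```python
def probe_failed(metrics_text: str) -> bool:
    """Return True when the metrics text reports probe_success as 0."""
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition(" ")
        if name == "probe_success":
            return float(value) == 0.0
    raise ValueError("probe_success not found in metrics text")


# Hand-written sample of what a failed probe might return
sample = """\
# TYPE probe_success gauge
probe_success 0
"""
print(probe_failed(sample))
```

This only inspects a single probe result. The rule itself additionally requires the condition to hold for 10s before the alert fires.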