Prometheus Black-Box Monitoring and SSL Certificate Monitoring

I. Environment

Hostname      IP address       OS          Notes
localhost     192.168.224.11   CentOS 7.6  Prometheus installed with Docker
server2.com   192.168.224.12   CentOS 7.6  blackbox_exporter 0.23.0

1. Environment setup

Install Docker

Install docker-compose

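II. Black-box monitoring

1. White-box vs. black-box monitoring

"White-box monitoring" requires installing the matching exporter on the monitored host, so that it can collect data about the host's resources and their state.

In some cases, for technical or other reasons, an exporter cannot be deployed on every monitored host. The classic example is monitoring network quality across the whole country: the usual approach is to run ICMP tests (ping) against selected nodes, and you cannot deploy an exporter inside someone else's environment. For scenarios like this the Prometheus community provides a black-box solution. Blackbox Exporter does not need to be installed on the monitored target; you only install it somewhere that can reach both Prometheus and the targets, and it probes them over HTTP, HTTPS, DNS, TCP, ICMP, and so on. It can also check when an SSL certificate expires.

blackbox_exporter

  • One of the official Prometheus exporters; it collects monitoring data by probing over HTTP, DNS, TCP, and ICMP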

2. Binary installation (choose one of the two methods)

https://prometheus.io/download/

wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.23.0/blackbox_exporter-0.23.0.linux-amd64.tar.gz

tar zxvf blackbox_exporter-0.23.0.linux-amd64.tar.gz

mkdir /usr/local/Prometheus -p


mv blackbox_exporter-0.23.0.linux-amd64 /usr/local/Prometheus/blackbox_exporter

Create a user

useradd -M -s /usr/sbin/nologin prometheus

Change the ownership of the exporter directory

chown prometheus:prometheus -R /usr/local/Prometheus

Create the systemd unit

cat <<"EOF" >/etc/systemd/system/blackbox_exporter.service
[Unit]
Description=blackbox_exporter
After=network.target

[Service]
Type=simple
User=prometheus
Group=prometheus
ExecStart=/usr/local/Prometheus/blackbox_exporter/blackbox_exporter \
--config.file "/usr/local/Prometheus/blackbox_exporter/blackbox.yml" \
--web.listen-address ":9115"
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

Start the service

systemctl daemon-reload

systemctl start blackbox_exporter

Enable start on boot

systemctl enable blackbox_exporter

Check the status

systemctl status blackbox_exporter

If it fails to start, check the logs

journalctl -u blackbox_exporter -f
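
Besides systemctl status, it can help to confirm that the exporter really answers HTTP requests on port 9115. The sketch below polls a URL with curl until it responds. The function name wait_for_exporter and the retry defaults are made up for illustration and are not part of blackbox_exporter.

```shell
#!/bin/sh
# wait_for_exporter URL [ATTEMPTS] [DELAY_SECONDS]
# Polls URL with curl until it answers with a successful HTTP status.
# Returns 0 once the endpoint responds, 1 if every attempt fails.
wait_for_exporter() {
    url=$1
    attempts=${2:-5}
    delay=${3:-2}
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -f: fail on HTTP errors, -s: silent, -o /dev/null: discard the body
        if curl -fs -o /dev/null "$url"; then
            echo "exporter is up: $url"
            return 0
        fi
        # wait before the next attempt, but not after the last one
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    echo "exporter did not respond: $url" >&2
    return 1
}
```

On the exporter host this could be called as wait_for_exporter http://localhost:9115/metrics, which is the metrics address used later in this article.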

2. Docker installation (choose one of the two methods)

Create the configuration file

mkdir -p /data/blackbox_exporter/

cat >/data/blackbox_exporter/config.yml<<"EOF"
modules:
  http_2xx:
    prober: http
  http_post_2xx:
    prober: http
    http:
      method: POST
  tcp_connect:
    prober: tcp
  pop3s_banner:
    prober: tcp
    tcp:
      query_response:
      - expect: "^+OK"
      tls: true
      tls_config:
        insecure_skip_verify: false
  ssh_banner:
    prober: tcp
    tcp:
      query_response:
      - expect: "^SSH-2.0-"
      - send: "SSH-2.0-blackbox-ssh-check"
  irc_banner:
    prober: tcp
    tcp:
      query_response:
      - send: "NICK prober"
      - send: "USER prober prober prober :prober"
      - expect: "PING :([^ ]+)"
        send: "PONG ${1}"
      - expect: "^:[^ ]+ 001"
  icmp:
    prober: icmp
  icmp_ttl5:
    prober: icmp
EOF

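Non-200 status codes behind the Cloudflare proxy

Example from the official documentation

http_2xx:
  prober: http
  timeout: 5s
  http:
    method: GET
    preferred_ip_protocol: "ip4"

Note: if a target sits behind Cloudflare and the probe returns a non-200 status code, setting preferred_ip_protocol: "ip4" makes the probe use IPv4 instead of the default IPv6, which lets such targets be checked.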

Run directly with docker

sudo docker run -d --restart=always --name blackbox-exporter -p 9115:9115 -v /data/blackbox_exporter:/etc/blackbox_exporter prom/blackbox-exporter:v0.19.0 --config.file=/etc/blackbox_exporter/config.yml

Using docker-compose

cd /data/blackbox_exporter/

cat >docker-compose.yaml <<"EOF"
version: '3.3'
services:
  blackbox_exporter:
    image: prom/blackbox-exporter
    container_name: blackbox_exporter
    restart: always
    volumes:
      - /data/blackbox_exporter:/etc/blackbox_exporter
    ports:
      - 9115:9115
EOF

Start

docker-compose up -d

Check

docker ps
# or:
docker logs -f blackbox_exporter

3. Parameters

--config.file     # path to the configuration file

4. Metrics endpoint

Name                Address                              Notes
blackbox_exporter   http://192.168.224.12:9115/metrics

5. Prometheus configuration

Configure Prometheus to scrape (pull) the monitoring samples from blackbox_exporter

cd /data/docker-prometheus

# Appends to the end of prometheus.yml; this assumes scrape_configs is the last
# section of that file and that its jobs are indented by two spaces.
cat >> prometheus/prometheus.yml <<"EOF"

  # HTTP probes
  - job_name: "blackbox_http"
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
        - https://www.baidu.com
        - https://www.jd.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.224.12:9115

  # TCP checks
  - job_name: "blackbox_tcp"
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets:
        - 192.168.224.11:22
        - 192.168.224.11:9090
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.224.12:9115

  # ICMP checks (ping)
  - job_name: "blackbox_icmp"
    metrics_path: /probe
    params:
      module: [icmp]
    static_configs:
      - targets:
        - 192.168.224.11
        - 192.168.224.12
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.224.12:9115
EOF
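
The three relabel_configs steps are easier to follow as a plain transformation: the listed target is copied into the target URL parameter, that same value is kept as the instance label, and the address Prometheus actually scrapes is swapped for the exporter. Here is a rough shell sketch of the request this produces. The function name probe_url is made up for illustration, and the URL encoding that Prometheus applies to the parameters is left out.

```shell
#!/bin/sh
# probe_url EXPORTER MODULE TARGET
# Mimics the effect of the relabel_configs above:
#   __address__    -> __param_target  (becomes ?target=...)
#   __param_target -> instance        (label only, not part of the URL)
#   __address__    := EXPORTER        (Prometheus scrapes the exporter instead)
probe_url() {
    exporter=$1
    module=$2
    target=$3
    printf 'http://%s/probe?module=%s&target=%s\n' "$exporter" "$module" "$target"
}

probe_url 192.168.224.12:9115 http_2xx https://www.baidu.com
probe_url 192.168.224.12:9115 tcp_connect 192.168.224.11:22
```

The first call prints a URL equivalent to the manual check address used in this article, just with the two query parameters in the other order.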

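Reload the configuration (the /-/reload endpoint only works if Prometheus was started with --web.enable-lifecycle)

curl -X POST http://localhost:9090/-/reload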

Check

http://192.168.224.12:9115/probe?target=https://www.baidu.com&module=http_2xx

http://192.168.224.12:9115

http://192.168.224.11:9090/targets?search=

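Metrics

# Metric names all start with probe_

probe_success                    # whether the probe succeeded (1 = success, 0 = failure)
probe_duration_seconds           # how long the probe took

# DNS
probe_dns_lookup_time_seconds    # time spent on DNS resolution
probe_ip_protocol                # IP protocol version, 4 or 6
probe_ip_addr_hash               # hash of the IP address, useful for detecting IP changes

# HTTP
probe_http_status_code           # HTTP response status code; after redirects, that of the final response
probe_http_content_length        # length of the HTTP response body, in bytes
probe_http_version               # HTTP protocol version of the response, e.g. 1.1
probe_http_ssl                   # whether the HTTP response used SSL (1 or 0)
probe_ssl_earliest_cert_expiry   # expiry time of the SSL certificate (earliest in the chain), as a Unix timestamp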
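
To look at a single metric without going through Prometheus, the probe output can be filtered on the command line. This is a minimal sketch: the function name metric_value is hypothetical, the sample values are invented, and it only handles samples without labels (lines of the form "name value").

```shell
#!/bin/sh
# metric_value NAME
# Reads Prometheus text-format metrics on stdin and prints the value of the
# first unlabelled sample whose metric name is exactly NAME.
metric_value() {
    awk -v name="$1" '
        /^#/ { next }                    # skip HELP/TYPE comment lines
        $1 == name { print $2; exit }    # "name value" -> print the value
    '
}

# Shortened sample of probe output; the numbers are invented for illustration.
sample='# TYPE probe_success gauge
probe_success 1
probe_http_status_code 200
probe_duration_seconds 0.25'

printf '%s\n' "$sample" | metric_value probe_http_status_code
printf '%s\n' "$sample" | metric_value probe_success
```

Against a live exporter, the output of curl -s on the /probe URL can be piped into metric_value in the same way.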

Trigger (alerting rule) configuration

Prometheus configuration

# Alerting (trigger) configuration
rule_files:
  - "alert.yml"
  - "rules/*.yml"

Add the blackbox_exporter triggers (alerting rules)

cd /data/docker-prometheus

Create the file with cat

cat > prometheus/rules/blackbox_exporter.yml <<"EOF"
groups:
- name: Blackbox
  rules:
  - alert: BlackboxProbeFailed
    expr: probe_success == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Blackbox probe failed: {{ $labels.instance }}"
      description: "Blackbox probe failed, current value: {{ $value }}"
  - alert: BlackboxSlowProbe
    expr: avg_over_time(probe_duration_seconds[1m]) > 1
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Slow request: {{ $labels.instance }}"
      description: "Request took longer than 1 second, value: {{ $value }}"
  - alert: BlackboxHttpStatusCodeFailure
    expr: probe_http_status_code <= 199 OR probe_http_status_code >= 400
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "HTTP status code check failed: {{ $labels.instance }}"
      description: "HTTP status code is not in 200-399, current status code: {{ $value }}"
  - alert: BlackboxSslCertificateWillExpireSoon
    expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 30
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Certificate expiring soon: {{ $labels.instance }}"
      description: "SSL certificate expires in less than 30 days, seconds remaining: {{ $value }}"

  - alert: BlackboxSslCertificateWillExpireSoon
    expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 3
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Certificate expiring soon: {{ $labels.instance }}"
      description: "SSL certificate expires in less than 3 days, seconds remaining: {{ $value }}"

  - alert: BlackboxSslCertificateExpired
    expr: probe_ssl_earliest_cert_expiry - time() <= 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Certificate expired: {{ $labels.instance }}"
      description: "SSL certificate has already expired; please confirm whether it is still in use"
EOF
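
The three certificate rules all compare the same quantity, probe_ssl_earliest_cert_expiry - time(), against different thresholds. The sketch below reproduces that arithmetic in shell so the boundaries can be checked by hand. The function name cert_severity is hypothetical, and it expects the expiry as integer Unix seconds. Unlike Prometheus, where several of these rules can fire at once, it reports only the most severe match.

```shell
#!/bin/sh
# cert_severity EXPIRY_TS [NOW_TS]
# Applies the thresholds of the SSL rules above to an expiry timestamp
# (Unix seconds) and prints one of: expired, critical, warning, ok.
cert_severity() {
    expiry=$1
    now=${2:-$(date +%s)}
    remaining=$((expiry - now))    # probe_ssl_earliest_cert_expiry - time()
    if [ "$remaining" -le 0 ]; then
        echo expired               # <= 0
    elif [ "$remaining" -lt $((86400 * 3)) ]; then
        echo critical              # < 3 days
    elif [ "$remaining" -lt $((86400 * 30)) ]; then
        echo warning               # < 30 days
    else
        echo ok
    fi
}

now=1700000000                                # fixed "current time" for a repeatable demo
cert_severity $((now + 86400 * 45)) "$now"    # 45 days left
cert_severity $((now + 86400 * 10)) "$now"    # 10 days left
cert_severity $((now + 86400 * 2)) "$now"     # 2 days left
cert_severity $((now - 3600)) "$now"          # expired an hour ago
```

The four calls cover each branch in turn: ok, warning, critical, and expired.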

Check the file:

vim prometheus/rules/blackbox_exporter.yml

Validate the configuration

docker exec -it prometheus promtool check config /etc/prometheus/prometheus.yml

Reload the configuration

curl -X POST http://localhost:9090/-/reload

Check

http://192.168.224.11:9090/alerts?search=
or:

http://192.168.224.11:9090/rules

Dashboard

Add a dashboard in Grafana to visualize the black-box monitoring data.

https://grafana.com/grafana/dashboards/9965

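Problem

The legend names in the 检测总耗时 (total probe duration) panel are displayed incorrectly, as shown below:

p9tvyk9.png

Solution:

Click Edit on the 检测总耗时 panel, find Options, and change the value in Legend from _ to the one shown in the figure below:

p9tvvnS.png

The other panels can be fixed the same way.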
