I. Environment
| Hostname | IP address | OS | Notes |
| localhost | 192.168.224.11 | CentOS 7.6 | Prometheus installed via Docker |
| server2.com | 192.168.224.12 | CentOS 7.6 | blackbox_exporter 0.23.0 |
1. Environment setup
Docker installation
Omitted.
docker-compose installation
Omitted.
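II. Black-box monitoring
1. White-box and black-box monitoring
"White-box monitoring" requires installing the corresponding exporter on the monitored host, where it collects data on the host's resources and their state.
For technical or other reasons, however, an exporter cannot always be deployed into the monitored environment. The classic example is monitoring network quality across the country: the usual approach is to ping a set of chosen nodes over ICMP, and deploying an exporter inside someone else's environment is not an option. For such cases the Prometheus community provides a black-box solution. Blackbox Exporter does not need to be installed on the monitored target; it only needs to run somewhere that can reach both Prometheus and the targets. It probes them over HTTP, HTTPS, DNS, TCP, and ICMP, and can also report when an SSL certificate expires.
blackbox_exporter
- One of the official Prometheus exporters; it collects monitoring data over HTTP, DNS, TCP, and ICMP.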
2. Binary installation (choose one of the two methods)
https://prometheus.io/download/
```
wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.23.0/blackbox_exporter-0.23.0.linux-amd64.tar.gz

tar zxvf blackbox_exporter-0.23.0.linux-amd64.tar.gz

mkdir /usr/local/Prometheus -p

mv blackbox_exporter-0.23.0.linux-amd64 /usr/local/Prometheus/blackbox_exporter
```
Create the user
```
useradd -M -s /usr/sbin/nologin prometheus
```
Change the ownership of the exporter directory
```
chown prometheus:prometheus -R /usr/local/Prometheus
```
Create the systemd unit
```
cat <<"EOF" >/etc/systemd/system/blackbox_exporter.service
[Unit]
Description=blackbox_exporter
After=network.target

[Service]
Type=simple
User=prometheus
Group=prometheus
ExecStart=/usr/local/Prometheus/blackbox_exporter/blackbox_exporter \
  --config.file "/usr/local/Prometheus/blackbox_exporter/blackbox.yml" \
  --web.listen-address ":9115"
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
```
Start
```
systemctl daemon-reload

systemctl start blackbox_exporter
```
Enable at boot
```
systemctl enable blackbox_exporter
```
Check
```
systemctl status blackbox_exporter
```
If it fails to start, check the logs
```
journalctl -u blackbox_exporter -f
```
2. Docker installation (choose one of the two methods)
Create the configuration file
```
mkdir -p /data/blackbox_exporter/
cat >/data/blackbox_exporter/config.yml<<"EOF"
modules:
  http_2xx:
    prober: http
  http_post_2xx:
    prober: http
    http:
      method: POST
  tcp_connect:
    prober: tcp
  pop3s_banner:
    prober: tcp
    tcp:
      query_response:
      - expect: "^+OK"
      tls: true
      tls_config:
        insecure_skip_verify: false
  ssh_banner:
    prober: tcp
    tcp:
      query_response:
      - expect: "^SSH-2.0-"
      - send: "SSH-2.0-blackbox-ssh-check"
  irc_banner:
    prober: tcp
    tcp:
      query_response:
      - send: "NICK prober"
      - send: "USER prober prober prober :prober"
      - expect: "PING :([^ ]+)"
        send: "PONG ${1}"
      - expect: "^:[^ ]+ 001"
  icmp:
    prober: icmp
  icmp_ttl5:
    prober: icmp
EOF
```
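Non-200 status codes behind a Cloudflare proxy
Example from the official documentation
```
http_2xx:
  prober: http
  timeout: 5s
  http:
    method: GET
    preferred_ip_protocol: "ip4"
```
Note: when a target sits behind Cloudflare and the probe returns a non-200 status code, setting preferred_ip_protocol: "ip4" makes the probe use IPv4 and lets it check such targets.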
Run directly with docker
```
sudo docker run -d --restart=always --name blackbox-exporter \
  -p 9115:9115 \
  -v /data/blackbox_exporter:/etc/blackbox_exporter \
  prom/blackbox-exporter:v0.19.0 \
  --config.file=/etc/blackbox_exporter/config.yml
```
Using docker-compose
```
cd /data/blackbox_exporter/

cat >docker-compose.yaml <<"EOF"
version: '3.3'
services:
  blackbox_exporter:
    image: prom/blackbox-exporter
    container_name: blackbox_exporter
    restart: always
    volumes:
      - /data/blackbox_exporter:/etc/blackbox_exporter
    ports:
      - 9115:9115
EOF
```
Start
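No command appears under this heading in the source. A minimal sketch of what it presumably was, assuming docker-compose is on PATH and the compose file lives in /data/blackbox_exporter; compose_up is a hypothetical helper name, not something from the article:

```shell
# Hypothetical helper (an assumption, not from the original article):
# start the services defined in docker-compose.yaml in the background.
compose_up() {
  # Run in a subshell so the caller's working directory stays unchanged.
  ( cd "${1:-/data/blackbox_exporter}" && docker-compose up -d )
}
```

Calling compose_up with no argument uses /data/blackbox_exporter; a different directory can be passed as the first argument.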
Check
```
docker ps
# or:
docker logs -f blackbox_exporter
```
3. Parameter explanation
4. Metrics address
| Name | Address | Notes |
| blackbox_exporter | http://192.168.224.12:9115/metrics | |
5. Prometheus configuration
Configure Prometheus to scrape (pull) the monitoring samples from blackbox_exporter. The block is appended to the end of the file, so it assumes scrape_configs is the last section there.
```
cd /data/docker-prometheus

cat >> prometheus/prometheus.yml <<"EOF"

  # HTTP probes
  - job_name: "blackbox_http"
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
        - https://www.baidu.com
        - https://www.jd.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.224.12:9115

  # TCP checks
  - job_name: "blackbox_tcp"
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets:
        - 192.168.224.11:22
        - 192.168.224.11:9090
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.224.12:9115

  # ICMP checks (ping)
  - job_name: "blackbox_icmp"
    metrics_path: /probe
    params:
      module: [icmp]
    static_configs:
      - targets:
        - 192.168.224.11
        - 192.168.224.12
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.224.12:9115
EOF
```
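The three relabel rules are the least obvious part of this configuration: the listed target becomes the target query parameter, is copied into the instance label, and the address that Prometheus actually scrapes is replaced by the exporter's. A rough sketch of the resulting request, assuming nothing beyond the values in the jobs above; effective_probe_url is a hypothetical helper for illustration only, and real Prometheus URL-encodes the parameters:

```shell
# Illustration only: mimic what the relabel rules do to a single target.
# effective_probe_url is a hypothetical name, not part of Prometheus.
effective_probe_url() {
  target="$1"     # original __address__, e.g. https://www.baidu.com
  module="$2"     # value of params.module, e.g. http_2xx
  exporter="$3"   # replacement for __address__, e.g. 192.168.224.12:9115
  # Rule 1: __address__ is copied to __param_target (the ?target= parameter).
  # Rule 2: __param_target is copied to the instance label.
  # Rule 3: __address__ is overwritten with the exporter address.
  printf 'http://%s/probe?module=%s&target=%s\n' "$exporter" "$module" "$target"
}

effective_probe_url "https://www.baidu.com" http_2xx 192.168.224.12:9115
# prints http://192.168.224.12:9115/probe?module=http_2xx&target=https://www.baidu.com
```

So every job scrapes the same exporter address, while the instance label still shows the probed target.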
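Reload the configuration (this endpoint only works when Prometheus is started with --web.enable-lifecycle)
```
curl -X POST http://localhost:9090/-/reload
```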
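The plain curl call prints nothing on success, so a failed reload is easy to miss. A small sketch that reports the HTTP status instead, assuming curl is installed; reload_prometheus is a hypothetical helper name and the default URL is the one used in this article:

```shell
# Hypothetical helper: trigger a reload and report whether it succeeded.
reload_prometheus() {
  url="${1:-http://localhost:9090}/-/reload"
  # -w '%{http_code}' prints only the status code; the body is discarded.
  code=$(curl -s -o /dev/null -w '%{http_code}' -X POST "$url")
  if [ "$code" = "200" ]; then
    echo "reload ok"
  else
    echo "reload failed: HTTP $code" >&2
    return 1
  fi
}
```

A non-200 result usually points to a disabled lifecycle API or to a configuration file that Prometheus rejected; the Prometheus logs show which.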
Check
```
http://192.168.224.12:9115/probe?target=https://www.baidu.com&module=http_2xx

http://192.168.224.12:9115

http://192.168.224.11:9090/targets?search=
```
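The first URL returns the raw probe metrics as text, and the line that matters most is probe_success. A minimal sketch for pulling that value out on the command line, assuming awk is available; probe_success_of is a hypothetical helper name:

```shell
# Hypothetical helper: read blackbox_exporter probe output on stdin and
# print the value of probe_success (1 = success, 0 = failure).
# Exits non-zero if the metric is not present in the input.
probe_success_of() {
  awk '$1 == "probe_success" { print $2; found = 1 } END { exit !found }'
}
```

It can be fed from curl, for example: curl -s 'http://192.168.224.12:9115/probe?target=https://www.baidu.com&module=http_2xx' | probe_success_of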
Metrics
```
probe_

probe_success                   # whether the probe succeeded (1 = success, 0 = failure)
probe_duration_seconds          # how long the probe took

# DNS
probe_dns_lookup_time_seconds   # time taken by DNS resolution
probe_ip_protocol               # IP protocol version, 4 or 6
probe_ip_addr_hash              # hash of the IP address, used to tell whether the IP changed

# HTTP
probe_http_status_code          # HTTP response status code; after redirects, that of the last response
probe_http_content_length       # length of the HTTP response body, in bytes
probe_http_version              # HTTP protocol version of the response, e.g. 1.1
probe_http_ssl                  # whether the HTTP response used SSL (1 or 0)
probe_ssl_earliest_cert_expiry  # SSL certificate expiry time, as a Unix timestamp
```
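Since probe_ssl_earliest_cert_expiry is a Unix timestamp, the remaining lifetime of a certificate is the difference from the current time. A worked sketch of that arithmetic, assuming awk is available; days_until_expiry is a hypothetical helper, and awk is used because the exporter may print the value in scientific notation (such as 1.700864e+09), which shell arithmetic cannot parse:

```shell
# Hypothetical helper: whole days between "now" and a certificate expiry.
# $1 = expiry as a Unix timestamp, $2 = current time (defaults to date +%s).
days_until_expiry() {
  now="${2:-$(date +%s)}"
  # awk accepts both plain integers and scientific notation.
  awk -v e="$1" -v n="$now" 'BEGIN { printf "%d\n", (e - n) / 86400 }'
}

days_until_expiry 1700864000 1700000000   # prints 10
```

A negative result means the certificate has already expired.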
Trigger configuration
Prometheus configuration
```
# Alerting (trigger) configuration
rule_files:
  - "alert.yml"
  - "rules/*.yml"
```
Add the blackbox_exporter triggers (alert rules)
```
cd /data/docker-prometheus
```
Create the file with cat
```
cat > prometheus/rules/blackbox_exporter.yml <<"EOF"
groups:
- name: Blackbox
  rules:
  - alert: Blackbox probe failed
    expr: probe_success == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Blackbox probe failed: {{ $labels.instance }}"
      description: "Blackbox probe failed, current value: {{ $value }}"
  - alert: Slow request
    expr: avg_over_time(probe_duration_seconds[1m]) > 1
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Slow request: {{ $labels.instance }}"
      description: "Request took longer than 1 second, value: {{ $value }}"
  - alert: HTTP status code check failed
    expr: probe_http_status_code <= 199 or probe_http_status_code >= 400
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "HTTP status code check failed: {{ $labels.instance }}"
      description: "HTTP status code is outside 200-399, current status code: {{ $value }}"
  - alert: SSL certificate expiring soon
    expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 30
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Certificate expiring soon: {{ $labels.instance }}"
      description: "SSL certificate expires in less than 30 days, value: {{ $value }}"

  - alert: SSL certificate expiring soon
    expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 3
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Certificate expiring soon: {{ $labels.instance }}"
      description: "SSL certificate expires in less than 3 days, value: {{ $value }}"

  - alert: SSL certificate expired
    expr: probe_ssl_earliest_cert_expiry - time() <= 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Certificate expired: {{ $labels.instance }}"
      description: "SSL certificate has expired; please confirm whether it is still in use"
EOF
```
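The three SSL rules compare the same quantity, probe_ssl_earliest_cert_expiry - time(), against different thresholds. A small sketch of how those thresholds partition the remaining time, assuming the input is an integer number of seconds; ssl_alert_level is a hypothetical helper that prints only the most severe matching rule:

```shell
# Hypothetical helper mirroring the thresholds of the SSL rules:
# $1 = integer seconds left until the certificate expires.
ssl_alert_level() {
  remaining="$1"
  if [ "$remaining" -le 0 ]; then
    echo "expired"        # matches: <= 0
  elif [ "$remaining" -lt $((86400 * 3)) ]; then
    echo "critical"       # matches: < 3 days
  elif [ "$remaining" -lt $((86400 * 30)) ]; then
    echo "warning"        # matches: < 30 days
  else
    echo "ok"
  fi
}
```

In Prometheus the rules are evaluated independently, so a certificate with two days left fires both the 30-day warning and the 3-day critical alert, and an expired one fires all three.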
Check the file:
```
vim prometheus/rules/blackbox_exporter.yml
```
Check the configuration
```
docker exec -it prometheus promtool check config /etc/prometheus/prometheus.yml
```
Reload the configuration
```
curl -X POST http://localhost:9090/-/reload
```
Check
```
http://192.168.224.11:9090/alerts?search=
# or:
http://192.168.224.11:9090/rules
```
Dashboard
Add a dashboard in Grafana whose panels display the black-box monitoring data.
```
https://grafana.com/grafana/dashboards/9965
```
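Problem
The "检测总耗时" (total probe duration) panel shows the series names incorrectly, as in the screenshot below:
Solution:
Edit the "检测总耗时" panel and find Options,
then change the Legend value, as in the screenshot below. The old value appears here only as "_" and the new value is blank, most likely because the label templates were lost; {{instance}}, the label set by the relabel rules above, is a plausible replacement.
The other panels can be fixed the same way.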