Prometheus监控nginx,redis,rabbitmq,mongodb,docker等

Prometheus监控nginx,redis,rabbitmq,mongodb,docker等。

一、环境介绍

主机名 ip地址 系统 说明
localhost 192.168.224.11 centos7.8 docker方式安装Prometheus。
server2.com 192.168.224.12 centos7.8 mongo版本4.2.5,rabbitmq版本3.7.15,redis版本5,nginx版本1.21.6,docker版本23.01

1、环境搭建

安装docker和docker-compose

a、创建nginx目录

1
2
    mkdir /data/nginx/conf.d -p
cd /data/nginx/conf.d

在/data/nginx/conf.d目录里面新增加nginx的配置文件

1
2
3
4
5
6
7
8
9
10
11
12
13
14
cat >>server.conf<< "EOF"
server {
listen 80;
server_name localhost;
location / {
root /usr/share/nginx/html;
index index.html index.htm;
}
error_page 500 502 503 504 /50x.html;
location = /50x.html {
root /usr/share/nginx/html;
}
}
EOF

检查

1
cat server.conf

b、docker-compose安装rabbitmq,nginx,mongo,redis

1
2
mkdir /data/docker-compose -p
cd /data/docker-compose

通过cat 创建docker-compose.yaml⽂件

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
cat > docker-compose.yaml <<"EOF"
version: '3'
services:
redis:
image: redis:5
container_name: redis
command: redis-server --requirepass 123456 --maxmemory 512mb
restart: always
volumes:
- /data/redis/data:/data
ports:
- 6379:6379
nginx:
image: nginx:1.21.6
container_name: nginx
restart: always
volumes:
- /data/nginx/conf.d:/etc/nginx/conf.d
- /data/nginx/html:/usr/share/nginx/html
- /data/nginx/log:/var/log/nginx
ports:
- 80:80
rabbitmq:
image: rabbitmq:3.7.15-management
container_name: rabbitmq
restart: always
volumes:
- /data/rabbitmq/data:/var/lib/rabbitmq
- /data/rabbitmq/log:/var/log/rabbitmq
ports:
- 5672:5672
- 15672:15672
mongo:
image: mongo:4.2.5
container_name: mongo
restart: always
volumes:
- /data/mongo/db:/data/db
ports:
- 27017:27017
command: [--auth]
environment:
MONGO_INITDB_ROOT_USERNAME: root
MONGO_INITDB_ROOT_PASSWORD: 123456
EOF

运行

1
docker-compose up -d

检查

1
docker ps -a

STATUS列全部为up为正常。

二、监控nginx

nginx开启stu_status

监控nginx需要with-http_stub_status_module

检查是否安装有with-http_stub_status_module模块

1
sdocker exec -it nginx nginx -V 2>&1 | grep -o with-http_stub_status_module

nginx开启stub_status配置

1
2
3
4
5
6
7
8
9
10
11
12
13
14
cd /data/nginx/conf.d
vim server.conf

server {
....
location /stub_status {
stub_status on;
access_log off;
#allow nginx_export的ip;
allow 0.0.0.0/0;
deny all;
}
....
}

检查配置文件

1
docker exec -it nginx nginx -t

重新加载配置⽂件

1
docker exec -it nginx nginx -s reload

查看

1
curl http://192.168.224.12/stub_status

参数解释:

Active connections – 活动连接数

accepts – 接收请求数

handled – 成功处理请求数

requests – 总请求数

reding – 正在进⾏读操作的请求数

writing – 正在进⾏写操作的请求数

waiting – 正在等待的请求数

1、二进制安装nginx_exporter

1
nginx_exporter下载地址: https://github.com/nginxinc/nginx-prometheus-exporter/releases

下载⼆进制包解压并放⼊**/usr/local/Prometheus**⽬录

1
2
3
4
5
6
wget https://github.com/nginxinc/nginx-prometheus-exporter/releases/download/v0.11.0/nginx-prometheus-exporter_0.11.0_linux_amd64.tar.gz

mkdir /usr/local/Prometheus/nginx_exporter -p
tar xvf nginx-prometheus-exporter_0.11.0_linux_amd64.tar.gz -C /usr/local/Prometheus/nginx_exporter

ls /usr/local/Prometheus/nginx_exporter/

创建⽤户

1
useradd -M -s /usr/sbin/nologin prometheus

更改exporter文件夹权限

1
chown prometheus:prometheus -R /usr/local/Prometheus

创建 systemd 服务

nginx_exporter.service

1
2
3
4
5
6
7
8
9
10
11
12
13
cat > /etc/systemd/system/nginx_exporter.service <<"EOF"
[Unit]
Description=nginx-prometheus-exporter
After=network.target
[Service]
Type=simple
User=prometheus
Group=prometheus
Restart=always
ExecStart=/usr/local/Prometheus/nginx_exporter/nginx-prometheus-exporter -nginx.scrape-uri=http://192.168.224.12/stub_status
[Install]
WantedBy=multi-user.target
EOF

启动 nginx_exporter

1
2
3
4
systemctl daemon-reload
systemctl start nginx_exporter.service
systemctl enable nginx_exporter.service
systemctl status nginx_exporter.service

启动不了检查⽇志

1
journalctl -u nginx_exporter.service -f

2、docker安装nginx_exporter

docker-compose方式

1
cd /data/nginx/

通过cat创建⽂件

1
2
3
4
5
6
7
8
9
10
11
12
13
cat >docker-compose.yaml <<EOF
version: '3.3'
services:
nginx_exporter:
image: nginx/nginx-prometheus-exporter:0.11
container_name: nginx_exporter
hostname: nginx_exporter
command:
- '-nginx.scrape-uri=http://192.168.224.12/stub_status'
restart: always
ports:
- "9113:9113"
EOF

启动

1
docker-compose up -d

检查

1
2
3
docker ps -a
或:
docker logs -f nginx_exporter

3、参数解释

Environment variable 命令⾏参数 description
-nginx.scrape-uri nginx stub_status 复制

4metrics地址

注:安装好Exporter后会暴露⼀个 http://ip:端⼝/metrics 的HTTP服务

名称 地址
nginx_exporter http://192.168.224.12:9113/metrics

5Prometheus配置

配置prometheus去采集(拉取)nginx_exporter的监控样本数据

1
2
3
4
5
6
7
8
9
10
cd /data/docker-prometheus
#在scrape_configs(搜刮配置):下⾯增加如下配置:

cat >> prometheus/prometheus.yml << "EOF"
- job_name: 'nginx_exporter'
static_configs:
- targets: ['192.168.224.12:9113']
labels:
instance: server2.com服务器
EOF

重新加载配置

1
curl -X POST http://localhost:9090/-/reload

检查

p91Ykw9.png

6、常⽤的监控指标

1
2
3
4
5
6
7
nginx_connections_accepted 接收请求数
nginx_connections_active 活动连接数
nginx_connections_handled 成功处理请求数
nginx_connections_reding 正在进⾏读操作的请求数
nginx_connections_waiting 正在等待的请求数
nginx_connections_writing 正在进⾏写操作的请求数
nginx_connections_requests 总请求数

p91Y1wd.png

7、添加触发器

1
cd /data/docker-prometheus

追加以下信息

1
2
3
4
5
6
7
8
9
10
11
12
13
cat >>prometheus/alert.yml <<"EOF"
- name: nginx
rules:
# 对任何实例超过30秒无法联系的情况发出警报
- alert: NginxDown
expr: nginx_up == 0
for: 30s
labels:
severity: critical
annotations:
summary: "nginx异常,实例:{{ $labels.instance }}"
description: "{{ $labels.job }} nginx已关闭"
EOF

检查

1
vim prometheus/alert.yml

检查配置

1
docker exec -it prometheus promtool check config /etc/prometheus/prometheus.yml

重新加载配置

1
curl -X POST http://localhost:9090/-/reload

检查

1
2
3
http://192.168.224.11:9090/alerts?search=
或者
http://192.168.224.11:9090/rules

8dashboard

grafana展示prometheus从nginx_exporter收集到的的数据

1
https://grafana.com/grafana/dashboards/12708

直接导入模板即可。后面的效果

p91YWXF.png

三、监控redis

redis已经在上面通过docker-compose部署好了

1、二进制安装redis_exporter

redis_exporter下载地址

1
https://github.com/oliver006/redis_exporter/releases

下载⼆进制包解压并放⼊**/usr/local/Prometheus**⽬录

1
2
3
4
5
wget https://github.com/oliver006/redis_exporter/releases/download/v1.48.0/redis_exporter-v1.48.0.linux-amd64.tar.gz

tar xvf redis_exporter-v1.48.0.linux-amd64.tar.gz

mv redis_exporter-v1.48.0.linux-amd64 /usr/local/Prometheus/redis_exporter

创建⽤户(前面已经创建了就不用创建了)

1
useradd -M -s /usr/sbin/nologin prometheus

更改exporter⽂件夹权限

1
chown prometheus:prometheus -R /usr/local/Prometheus/

创建 systemd 服务

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
cat > /etc/systemd/system/redis_exporter.service <<"EOF"
[Unit]
Description=Prometheus Redis Exporter
After=network.target
[Service]
Type=simple
User=prometheus
Group=prometheus
Restart=always
ExecStart=/usr/local/Prometheus/redis_exporter/redis_exporter \
-redis.addr localhost:6379 \
-redis.password 123456
[Install]
WantedBy=multi-user.target
EOF

启动 redis_exporter

1
2
3
4
systemctl daemon-reload
systemctl start redis_exporter
systemctl enable redis_exporter
systemctl status nredis_exporter

启动不了检查⽇志

1
journalctl -u redis_exporter -f

2、docker安装redis_exporter

docker直接运行

1
docker run -d --restart=always --name redis_exporter -p 9121:9121 oliver006/redis_exporter --redis.addr redis://192.168.224.12:6379 --redis.password '123456'

docker-compose⽅式

1
cd /data/redis/
1
2
3
4
5
6
7
8
9
10
11
12
13
cat >docker-compose.yaml <<EOF
version: '3.3'
services:
redis_exporter:
image: oliver006/redis_exporter
container_name: redis_exporter
restart: always
environment:
REDIS_ADDR: "192.168.224.12:6379"
REDIS_PASSWORD: 123456
ports:
- "9121:9121"
EOF

启动

1
docker-compose up -d

检查

1
2
3
docker ps -a
或:
docker logs -f redis_exporter

3、参数解释

Environment variable description
REDIS_ADDR 192.168.224.12:6379 redis服务器地址,如:ip:6379
REDIS_PASSWORD 123456 redis服务器管理密码

4metrics地址

注:安装好Exporter后会暴露⼀个 http://ip:端⼝/metrics 的HTTP服务

名称 地址
redis_exporter http://192.168.224.12:9121/metrics

5Prometheus配置

配置prometheus去采集(拉取)redis_exporter的监控样本数据

1
2
3
4
5
6
7
8
9
10
cd /data/docker-prometheus
#在scrape_configs(搜刮配置):下⾯增加如下配置:

cat >> prometheus/prometheus.yml << "EOF"
- job_name: 'redis_exporter'
static_configs:
- targets: ['192.168.224.12:9121']
labels:
instance: server2.com服务器
EOF

重新加载配置

1
curl -X POST http://localhost:9090/-/reload

6、常⽤的监控指标

1
2
3
4
5
6
7
8
9
10
11
12
redis_up # 服务器是否在线
redis_uptime_in_seconds # 运⾏时⻓,单位 s
rate(redis_cpu_sys_seconds_total[1m]) +
rate(redis_cpu_user_seconds_total[1m]) # 占⽤ CPU 核数
redis_memory_used_bytes # 占⽤内存量
redis_memory_max_bytes # 限制的最⼤内存,如果没限制则为 0
delta(redis_net_input_bytes_total[1m]) # ⽹络接收的 bytes
delta(redis_net_output_bytes_total[1m]) # ⽹络发送的 bytes
redis_connected_clients # 客户端连接数
redis_connected_clients / redis_config_maxclients # 连接数使⽤率
redis_rejected_connections_total # 拒绝的客户端连接数
redis_connected_slaves # slave 连接数

7、触发器配置

Prometheus配置

1
2
3
4
5
6
vim prometheus/prometheus.yml

# 报警(触发器)配置
rule_files:
- "alert.yml"
- "rules/*.yml"

创建新目录

1
mkdir prometheus/rules

redis触发器(告警规则)

因为是单机,所以未配置集群的触发器

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
cat >> prometheus/rules/redis.yml <<"EOF"
groups:
- name: redis
rules:
- alert: RedisDown
expr: redis_up == 0
for: 0m
labels:
severity: critical
annotations:
summary: 'Redis Down,实例:{{ $labels.instance }}'
description: "Redis实例 is down"
- alert: RedisMissingBackup
expr: time() - redis_rdb_last_save_timestamp_seconds > 60 * 60 * 24
for: 0m
labels:
severity: critical
annotations:
summary: "Redis备份丢失,实例:{{ $labels.instance }}"
description: "Redis 24⼩时未备份"
- alert: RedisOutOfConfiguredMaxmemory
expr: redis_memory_used_bytes / redis_memory_max_bytes * 100 > 90
for: 2m
labels:
severity: warning
annotations:
summary: "Redis超出配置的最⼤内存,实例:{{ $labels.instance }}"
description: "Redis内存使⽤超过配置最⼤内存的90%"
- alert: RedisTooManyConnections
expr: redis_connected_clients > 100
for: 2m
labels:
severity: warning
annotations:
summary: "Redis连接数过多,实例:{{ $labels.instance }}"
description: "Redis当前连接数为: {{ $value }}"
- alert: RedisNotEnoughConnections
expr: redis_connected_clients < 1
for: 2m
labels:
severity: warning
annotations:
summary: "Redis没有⾜够的连接,实例:{{ $labels.instance }}"
description: "Redis当前连接数为: {{ $value }}"
- alert: RedisRejectedConnections
expr: increase(redis_rejected_connections_total[1m]) > 0
for: 0m
labels:
severity: critical
annotations:
summary: "Redis有拒绝连接,实例:{{ $labels.instance }}"
description: "与Redis 的某些连接被拒绝{{ $value }}"
EOF

检查配置

1
docker exec -it prometheus promtool check config /etc/prometheus/prometheus.yml

重新加载配置

1
curl -X POST http://localhost:9090/-/reload

检查

1
http://192.168.224.11:9090/alerts?search=

8dashboard

grafana展示prometheus从redis_exporter收集到的的数据

1
2
3
4
Redis:
https://grafana.com/grafana/dashboards/11835
或者
https://grafana.com/grafana/dashboards/17507

图⾏展示问题

redis dashbord 内存使⽤图⾏展示有问题,如下图

p91gVk6.png

http://192.168.224.11:9090/graph

查询到 redis_memory_max_bytes 为0,因为redis配置参数 maxmemory 未设置,默认 0 。

解决

1
redis增加配置 maxmemory 的值

四、监控rabbitmq

rabbitmq已经在安装Nginx时,通过docker-compose部署好了

1、二进制安装rabbitmq_exporter

rabbit_exporter下载地址:

1
https://github.com/kbudde/rabbitmq_exporter/releases

下载⼆进制包解压并放⼊**/usr/local/Prometheus**⽬录

1
2
3
4
wget https://github.com/kbudde/rabbitmq_exporter/releases/download/v1.0.0-RC19/rabbitmq_exporter_1.0.0-RC19_linux_amd64.tar.gz

mkdir /usr/local/Prometheus/rabbitmq_exporter
tar xzf rabbitmq_exporter_1.0.0-RC19_linux_amd64.tar.gz -C /usr/local/Prometheus/rabbitmq_exporter/

更改权限

1
chown prometheus.prometheus -R /usr/local/Prometheus/

创建systemd服务

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
cat > /etc/systemd/system/rabbitmq_exporter.service <<"EOF"
[Unit]
Description=Prometheus rabbitmq Exporter
After=network.target
[Service]
Environment=RABBIT_USER=guest
Environment=RABBIT_PASSWORD=guest
Environment=RABBIT_URL=http://localhost:15672
OUTPUT_FORMAT=JSON
Type=simple
User=prometheus
Group=prometheus
Restart=always
ExecStart=/usr/local/Prometheus/rabbitmq_exporter/rabbitmq_exporter
[Install]
WantedBy=multi-user.target
EOF

启动

1
2
3
4
systemctl daemon-reload
systemctl start rabbitmq_exporter
systemctl enable rabbitmq_exporter
systemctl status rabbitmq_exporter

启动不了检查日志

1
journalctl -u rabbitmq_exporter -f

2、docker安装rabbitmq_exporter

docker直接运行

1
docker run -d -p 9419:9419 --name rabbitmq_exporter -e RABBIT_URL=http://192.168.224.12:15672  -e RABBIT_USER=guest -e RABBIT_PASSWORD=guest kbudde/rabbitmq-exporter

docker-compose方式

1
cd /data/rabbitmq/
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
cat >docker-compose.yaml <<EOF
version: '3.3'
services:
rabbitmq_exporter:
image: kbudde/rabbitmq-exporter
container_name: rabbitmq_exporter
restart: always
environment:
RABBIT_URL: "http://192.168.224.12:15672"
RABBIT_USER: "guest"
RABBIT_PASSWORD: "guest"
PUBLISH_PORT: "9419"
OUTPUT_FORMAT: "JSON"
ports:
- "9419:9419"
EOF

启动

1
docker-compose up -d

检查

1
ocker ps -a

3、参数解释

Environment variable default description
RABBIT_URL http://127.0.0.1:15672 rabbitMQ管理插件的url(必须以http(s)开头)
RABBIT_USER guest rabbitMQ管理插件的用户名
RABBIT_PASSWORD guest rabbitMQ管理插件的密码
OUTPUT_FORMAT JOSN 输出格式
PUBLISH_PORT 9419 运行端口(监听端口)

4、metrics地址

注:安装好Exporter后会暴露⼀个 http://ip:端⼝/metrics 的HTTP服务

名称 地址
rabbitmq_exporter http://192.168.224.12:9419/metrics

5、Prometheus配置

配置prometheus去采集(拉取)rabbitmq_exporter的监控样本数据

1
2
3
4
5
6
7
8
9
10
cd /data/docker-prometheus
#在scrape_configs(搜刮配置):下⾯增加如下配置:

cat >> prometheus/prometheus.yml << "EOF"
- job_name: 'rabbitmq_exporter'
static_configs:
- targets: ['192.168.224.12:9419']
labels:
instance: server2.com服务器
EOF

重新加载配置

1
curl -X POST http://localhost:9090/-/reload

6、常⽤的监控指标

1
2
3
4
5
6
7
8
9
10
11
12
rabbitmq_queue_messages_unacknowledged_global 队列中有未确认的消息总数(未被消费的消息)
rabbitmq_node_disk_free_limit 使用磁盘大小
rabbitmq_node_disk_free 磁盘总大小

rabbit_node_mem_used 使用内存大小
rabbit_node_mem_limit 内存总大小

rabbitmq_sockets_used 使用sockets数量
rabbitmq_sockets_available 可用的sockets总数

rabbitmq_fd_used 使用文件描述符的数量
rabbitmq_fd_available 可用的文件描述符总数

7、触发器配置

Rabbitmq触发器

也是单节点,未配置集群触发器

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
cat >> prometheus/rules/rabbitmq.yml <<"EOF"
groups:
- name: Rabbitmq
rules:
- alert: RabbitMQDown
expr: rabbitmq_up != 1
labels:
severity: High
annotations:
summary: "Rabbitmq Down,实例:{{ $labels.instance }}"
description: "Rabbitmq_exporter连不上RabbitMQ! ! !"
- alert: RabbitMQ有未确认消息
expr: rabbitmq_queue_messages_unacknowledged_global > 0
for: 1m
labels:
severity: critical
annotations:
summary: "RabbitMQ有未确认消息,实例:{{ $labels.instance }}"
description: 'RabbitMQ未确认消息>0,当前值为:{{ $value }}'
- alert: RabbitMQ可用磁盘空间不足告警
expr: rabbitmq_node_disk_free_alarm != 0
#expr: rabbitmq_node_disk_free_limit / rabbitmq_node_disk_free *100 > 90
for: 0m
labels:
severity: critical
annotations:
summary: "RabbitMQ可用磁盘空间不足,实例:{{ $labels.instance }}"
description: "RabbitMQ可用磁盘空间不足,请检查"
- alert: RabbitMQ可用内存不足告警
expr: rabbitmq_node_mem_alarm != 0
#expr: rabbitmq_node_mem_used / rabbitmq_node_mem_limit * 100 > 90
for: 0m
labels:
severity: critical
annotations:
summary: "RabbitMQ可用内存不足,实例:{{ $labels.instance }}"
description: "RabbitMQ可用内存不足,请检查"
- alert: RabbitMQ_socket连接数使用过高告警
expr: rabbitmq_sockets_used / rabbitmq_sockets_available * 100 > 60
for: 0m
labels:
severity: critical
annotations:
summary: "RabbitMQ_socket连接数使用过高,实例:{{ $labels.instance }}"
description: 'RabbitMQ_sockets使用>60%,当前值为:{{ $value }}'
- alert: RabbitMQ文件描述符使用过高告警
expr: rabbitmq_fd_used / rabbitmq_fd_available * 100 > 60
for: 0m
labels:
severity: critical
annotations:
summary: "RabbitMQ文件描述符使用过高,实例:{{ $labels.instance }}"
description: 'RabbitMQ文件描述符使用>60%,当前值为:{{ $value }}'

EOF

检查配置

1
docker exec -it prometheus promtool check config /etc/prometheus/prometheus.yml

重新加载配置

1
curl -X POST http://localhost:9090/-/reload

检查

1
http://192.168.224.11:9090/alerts?search=

8dashboard

grafana展示prometheus从rabbitmq_exporter收集到的的数据

id: 4279

1
https://grafana.com/grafana/dashboards/4279-rabbitmq-monitoring/

p914bCt.png

五、监控mongodb

1、创建监控⽤户

登陆mongodb创建监控⽤户,权限为“readAnyDatabase”,如果是cluster环境,需要有权限“clusterMonitor”

登录mongodb(docker安装的mongo)

1
docker exec -it mongo mongo admin

登录mongodb(yum和apt安装的mongo)

1
mongo admin

创建监控用户

1
2
3
4
5
6
7
8
> db.auth('root','123456')
1
> db.createUser({ user:'exporter',pwd:'password',roles:[ { role:'readAnyDatabase', db: 'admin'},{ role: "clusterMonitor", db: "admin" }]});
#测试 使⽤上⾯创建的⽤户信息进⾏连接。
> db.auth('exporter', 'password')
1
#表示成功
> exit

2、二进制安装mongodb_expoter

mongodb_exporter地址

1
2
3
https://github.com/percona/mongodb_exporter/releases
或者
https://github.com/prometheus/mysqld_exporter/releases

下载⼆进制包解压并放⼊**/usr/local/Prometheus**⽬录

1
2
3
4
5
wget https://github.com/percona/mongodb_exporter/releases/download/v0.37.0/mongodb_exporter-0.37.0.linux-amd64.tar.gz

tar xzf mongodb_exporter-0.37.0.linux-amd64.tar.gz

mv mongodb_exporter-0.37.0.linux-amd64 /usr/local/Prometheus/mongodb_exporter

更改exporter⽂件夹权限

1
chown prometheus.prometheus -R /usr/local/Prometheus/

创建 systemd 服务

mongodb_exporter.service

1
2
3
4
5
6
7
8
9
10
11
12
13
14
cat <<EOF >/usr/lib/systemd/system/mongodb_exporter.service
[Unit]
Description=mongodb_exporter
Documentation=https://github.com/percona/mongodb_exporter
After=network.target
[Service]
Type=simple
User=prometheus
Environment="MONGODB_URI=mongodb://exporter:password@localhost:27017/admin"
ExecStart=/usr/local/Prometheus/mongodb_exporter/mongodb_exporter --log.level=error --collect-all --compatible-mode
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF

启动 mongodb_exporter

1
2
3
4
systemctl daemon-reload
systemctl start mongodb_exporter.service
systemctl enable mongodb_exporter.service
systemctl status mongodb_exporter.service

启动不了检查⽇志

1
journalctl -u mongodb_exporter.service -f

3、docker安装mongodb_exporter

docker直接运⾏

1
docker run -d --restart=always -p 9216:9216 -p 17001:17001 --restart=always --name=mongodb-exporter bitnami/mongodb-exporter:latest --collect-all --compatible-mode --mongodb.uri=mongodb://exporter:password@192.168.224.12:27017/admin?ssl=false

docker-compose⽅式

1
cd /data/mongo/
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
cat >docker-compose.yaml <<EOF
version: '3.3'
services:
mongodb_exporter:
image: bitnami/mongodb-exporter:latest
container_name: mongodb_exporter
restart: always
environment:
MONGODB_URI: "mongodb://exporter:password@192.168.224.12:27017/admin?ssl=false"
command:
- '--collect-all'
- '--compatible-mode'
ports:
- "9216:9216"
EOF

启动

1
docker-compose up -d

检查

1
docker ps -a

4、参数解释

Flag 含义 案例
-h, –help 显示上下⽂相关的帮助
–[no-]compatible-mode 启⽤旧的 mongodb exporter 兼容指标
–[no-]discovering-mode 启⽤⾃动发现集合
–mongodb.collstats-colls 逗号分隔的databases.collections列表以获取 $collStats –mongodb.collstats-colls=db1,db2.col2
–mongodb.indexstats-colls 逗号分隔的databases.collections列表以获取$indexStats –mongodb.indexstats-colls=db1.col1,db2.col2
–[no-]mongodb.direct-connect 是否应该进⾏直接连接。如果指定了多个主机或使⽤了 SRVURI,则直接连接⽆效
–[no-]mongodb.global-connpool 使⽤全局连接池⽽不是为每个http请求创建新池
–mongodb.uri MongoDB 连接 URI($MONGODB_URI) –smongodb.uri=mongodb://user:pass@127.0.0.1:27017/admin?
–web.listen-address ⽤于侦听 Web 界⾯和遥测的地址 –web.listen-address=”:9216”
–web.telemetry-path 指标公开路径 –web.telemetry-path=”/metrics”
–web.config 具有⽤于基本身份验证的 Prometheus TLS配置的⽂件的路径 –web.config=STRING
–log.level 仅记录具有给定严重性或更⾼严重性的消息。有效级别:[调试、信息、警告、错误、致命] –log.level=”erro
–collector.diagnosticdata 启⽤从getDiagnosticData收集指标
–collector.replicasetstatus 启⽤从replSetGetStatus 收集指标
–collector.dbstats 启⽤从 dbStats 收集指标
–collector.topmetrics 启⽤从 top admincommand 收集指标
–collector.indexstats 启⽤从 $indexStats收集指标
–collector.collstats 启⽤从 $collStats 收集指标
–collect-all 启⽤所有收集器。与指定所有 –collector.相同
–collector.collstats-limit=0 如果有超过 个集合,请禁⽤ collstats、dbstats、topmetrics和 indexstats 收集器。0=⽆限制
--metrics.overridedescendingindex 启⽤降序索引名称覆盖以将 -1 替换为_DESC
–version 显示版本并退出

5metrics地址

注:安装好Exporter后会暴露⼀个 http://ip:端⼝/metrics 的HTTP服务

名称 地址
rabbitmq_exporter http://192.168.224.12:9216/metrics

6Prometheus配置

配置prometheus去采集(拉取)mongodb_exporter的监控样本数据

1
2
3
4
5
6
7
8
9
10
cd /data/docker-prometheus

#在scrape_configs(搜刮配置):下⾯增加如下配置:
cat >> prometheus/prometheus.yml << "EOF"
- job_name: 'mongodb_exporter'
static_configs:
- targets: ['192.168.224.12:9216']
labels:
instance: server2.com服务器
EOF

重新加载配置

1
curl -X POST http://localhost:9090/-/reload

检查

7、常⽤的监控指标

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
mongodb_ss_connections{conn_type="available"} 可⽤的连接总数
mongodb_ss_mem_virtual
mongodb_ss_mem_resident

# 关于 server status
mongodb_up # 服务器是否在线
mongodb_ss_ok{cl_id="", cl_role="mongod", rs_state="0"} # 服务器是否正常运⾏,取值为 1、0 。标签中记录了 Cluster、ReplicaSet 的信息
mongodb_ss_uptime # 服务器的运⾏时⻓,单位为秒
mongodb_ss_connections{conn_type="current"} # 客户端连接数

# 关于主机
mongodb_sys_cpu_num_cpus # 主机的 CPU 核数

# 关于 collection
mongodb_collstats_storageStats_count{database="xx", collection="xx"} # collection 全部⽂档的数量
mongodb_collstats_storageStats_size # collection 全部⽂档的体积,单位 bytes
mongodb_collstats_storageStats_storageSize # collection 全部⽂档占⽤的磁盘空间,默认会压缩
delta(mongodb_collstats_latencyStats_reads_ops[1m]) # collection 读操作的数量(每分钟)
delta(mongodb_collstats_latencyStats_reads_latency[1m]) # collection 读操作的延迟(每分钟),单位为微秒
mongodb_collstats_latencyStats_write_ops
mongodb_collstats_latencyStats_write_latency

# 关于 index
mongodb_collstats_storageStats_nindexes # collection 的 index 数量
mongodb_collstats_storageStats_totalIndexSize # collection 的 index 占⽤的磁盘空间
delta(mongodb_indexstats_accesses_ops[1m]) # index 被访问次数

# 关于操作
delta(mongodb_ss_opcounters[1m]) # 执⾏各种操作的数量
delta(mongodb_ss_opLatencies_latency[1m]) # 执⾏各种操作的延迟,单位为微秒
delta(mongodb_ss_metrics_document[1m]) # 各种⽂档的变化数量

# 关于锁
delta(mongodb_ss_locks_acquireCount{lock_mode="w"}[1m]) # 新加锁的数量。R 表示共享锁,W 表示独占锁,r表示意向共享锁,w 表示意向独占锁
mongodb_ss_globalLock_currentQueue{count_type="total"} # 被锁阻塞的操作数

8、触发器配置

mongodb触发器(告警规则)

因mongo单点,所以未配置复制触发器

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
cat >> prometheus/rules/mongodb.yml <<"EOF"
groups:
- name: PerconaMongodbExporter
rules:
- alert: MongodbDown
expr: 'mongodb_up == 0'
for: 0m
labels:
severity: critical
annotations:
summary: "MongoDB Down 容器: $labels.instance"
description: "MongoDB 容器 is down, 当前值:{{ $value }}"

- alert: MongodbNumberCursorsOpen
expr: 'mongodb_ss_metrics_cursor_open{csr_type="total"} > 10 * 1000'
for: 2m
labels:
severity: warning
annotations:
summary: "MongoDB 数字有标打开告警 容器: $labels.instance"
description: "MongoDB 为客户端打开的游标过多 > 10k, 当前值:{{ $value }}"

- alert: MongodbCursorsTimeouts
expr: 'increase(mongodb_ss_metrics_cursor_timedOut[1m]) > 100'
for: 2m
labels:
severity: warning
annotations:
summary: "MongoDB 游标超时 容器: $labels.instance"
description: "太多游标超时, 当前值:{{ $value }}"

- alert: MongodbTooManyConnections
expr: 'avg by(instance) (rate(mongodb_ss_connections{conn_type="current"}[1m])) / avg by(instance) (sum (mongodb_ss_connections) by (instance)) * 100 > 80'
for: 2m
labels:
severity: warning
annotations:
summary: "MongoDB 太多连接 容器: $labels.instance"
description: "MongoDB 连接数 > 80%, 当前值:{{ $value }}"
- alert: MongodbVirtualMemoryUsage
expr: '(sum(mongodb_ss_mem_virtual) BY (instance) / sum(mongodb_ss_mem_resident) BY (instance)) > 3'
for: 2m
labels:
severity: warning
annotations:
summary: "MongoDB虚拟内存使用告警 容器: $labels.instance"
description: "虚拟内存使用过高, 当前值:{{ $value }}"
EOF

检查配置

1
docker exec -it prometheus promtool check config /etc/prometheus/prometheus.yml

重新加载配置

1
curl -X POST http://localhost:9090/-/reload

检查

1
http://192.168.224.11:9090/alerts?search=

8dashboard

grafana展示prometheus从mongodb_exporter收集到的的数据

1
2
3
4
https://github.com/percona/grafana-dashboards/tree/main/dashboards/MongoDB

下载这个MongoDB_Instances_Overview.json文件
导入到grafana

p91viGD.png

看到有报缺少插件,polystat,可以下载插件

到设置里面的Plugins里面搜索polystat,然后安装

六、监控docker

为了能够获取到Docker容器的运⾏状态,⽤户可以通过Docker的stats命令获取到当前主机上运⾏容器的

统计信息,可以查看容器的CPU利⽤率、内存使用量、网络IO总量以及磁盘IO总量等信息。

1
docker stats

除了使⽤命令以外,⽤户还可以通过Docker提供的HTTP API查看容器详细的监控统计信息。

1、使⽤CAdvisor

CAdvisor是Google开源的⼀款⽤于展示和分析容器运⾏状态的可视化⼯具。通过在主机上运⾏CAdvisor

⽤户可以轻松的获取到当前主机上容器的运⾏统计信息,并以图表的形式向⽤户展示。

docker命令安装

1
2
3
4
5
6
7
8
9
docker run -d \
--restart=always \
--volume=/:/rootfs:ro \
--volume=/var/run:/var/run:rw \
--volume=/sys:/sys:ro \
--volume=/var/lib/docker/:/var/lib/docker:ro \
--publish=8080:8080 \
--name=cadvisor \
google/cadvisor:latest

Docker-compose安装

1
2
mkdir /data/cadvisor
cd /data/cadvisors

通过cat新建docker-compose.yaml⽂件

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
cat > docker-compose.yaml <<"EOF"
version: '3.3'
services:
cadvisor:
image: google/cadvisor:latest
container_name: cadvisor
restart: always
volumes:
- /:/rootfs:ro
- /var/run:/var/run:rw
- /sys:/sys:ro
- /var/lib/docker/:/var/lib/docker:ro
ports:
- 8080:8080
EOF

启动

1
2
3
4
docker-compose up -d


docker ps

通过访问http://192.168.224.12:8080可以查看,当前主机上容器的运⾏状态,如下所示

访问http://192.168.224.12:8080/metrics即可获取到标准的Prometheus监控样本输出

2、Prometheus配置

配置prometheus去采集(拉取)cAdvisor的监控样本数据

1
2
3
4
5
6
7
8
9
10
cd /data/docker-prometheus
#在scrape_configs(搜刮配置):下⾯增加如下配置:
cat >> prometheus/prometheus.yml << "EOF"
- job_name: 'cadvisor'
scrape_interval: 15s
static_configs:
- targets: ['192.168.224.12:8080']
labels:
instance: server2.com服务器
EOF

重新加载配置

1
curl -X POST http://localhost:9090/-/reload

检查

启动完成后,可以在Prometheus UI中查看到当前所有的Target状态:

1
http://192.168.224.11:9090/targets?search=

3、常用监控指标

下⾯表格中列举了⼀些CAdvisor中获取到的典型监控指标:

1
container_
指标名称 类型 含义
container_cpu_load_average_10s gauge 过去10秒容器CPU的平均负载
container_cpu_usage_seconds_total counter 容器在每个CPU内核上的累积占⽤时间(单位:秒)
container_cpu_system_seconds_total counter System CPU累积占⽤时间(单位:秒)
container_cpu_user_seconds_total counter User CPU累积占⽤时间(单位:秒)
container_fs_usage_bytes gauge 容器中⽂件系统的使⽤量(单位:字节)
container_fs_limit_bytes gauge 容器可以使⽤的⽂件系统总量(单位:字节)
container_fs_reads_bytes_total counter 容器累积读取数据的总量(单位:字节)
container_fs_writes_bytes_total counter 容器累积写⼊数据的总量(单位:字节)
container_memory_max_usage_bytes gauge 容器的最⼤内存使⽤量(单位:字节)
container_memory_usage_bytes gauge 容器当前的内存使⽤量(单位:字节
container_spec_memory_limit_bytes gauge 容器的内存使⽤量限制
machine_memory_bytes gauge 当前主机的内存总量
container_network_receive_bytes_total counter 容器⽹络累积接收数据总量(单位:字节)
container_network_transmit_bytes_total counter 容器⽹络累积传输数据总量(单位:字

4、触发器配置

Prometheus配置

添加docker触发器(告警规则)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
cat >> prometheus/rules/docker.yml <<"EOF"
groups:
- name: DockerContainers
rules:
- alert: ContainerKilled
expr: time() - container_last_seen > 60
for: 0m
labels:
severity: warning
annotations:
isummary: "Docker容器被杀死 容器: $labels.instance"
description: "{{ $value }}个容器消失了"
# This rule can be very noisy in dynamic infra with legitimate container start/stop/deployment.
- alert: ContainerAbsent
expr: absent(container_last_seen)
for: 5m
labels:
severity: warning
annotations:
summary: "无容器 容器: $labels.instance"
description: "5分钟检查容器不存在,值为:{{ $value }}"
- alert: ContainerCpuUsage
expr: (sum(rate(container_cpu_usage_seconds_total{name!=""}[3m])) BY (instance, name) * 100) > 300
for: 2m
labels:
severity: warning
annotations:
summary: "容器cpu使用率告警 容器: $labels.instance"
description: "容器cpu使用率超过300%,当前值为:{{ $value }}"
- alert: ContainerMemoryUsage
expr: (sum(container_memory_working_set_bytes{name!=""}) BY (instance, name) / sum(container_spec_memory_limit_bytes > 0) BY (instance, name) * 100) > 80
for: 2m
labels:
severity: warning
annotations:
summary: "容器内存使用率告警 容器: $labels.instance"
description: "容器内存使用率超过80%,当前值为:{{ $value }}"
- alert: ContainerVolumeIoUsage
expr: (sum(container_fs_io_current{name!=""}) BY (instance, name) * 100) > 80
for: 2m
labels:
severity: warning
annotations:
summary: "容器存储io使用率告警 容器: $labels.instance"
description: "容器存储io使用率超过 80%,当前值为:{{ $value }}"
- alert: ContainerHighThrottleRate
expr: rate(container_cpu_cfs_throttled_seconds_total[3m]) > 1
for: 2m
labels:
severity: warning
annotations:
summary: "容器限制告警 容器: $labels.instance"
description: "容器被限制,当前值为:{{ $value }}"
EOF

重新加载配置

1
curl -X POST http://localhost:9090/-/reload

5、dashboard

grafana展示prometheus收集到的cadvisor的数据

1
2
3
https://grafana.com/grafana/dashboards/11600-docker-container/

id 11600

p9Yjrfx.png

七、监控mysql

使⽤docker-compose安装mysql,当然也可以⾃⾏安装

docker-compose安装mysql

1
2
mkdir /data/mysql -p
cd /data/mysql

通过cat创建docker-compose.yaml文件

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
cat > docker-compose.yaml <<"EOF"
version: '3.1'
services:
db:
image: mysql
restart: always
container_name: mysql
environment:
TZ: Asia/Shanghai
LANG: en_US.UTF-8
MYSQL_ROOT_PASSWORD: 123456
command:
--default-authentication-plugin=mysql_native_password
--character-set-server=utf8mb4
--collation-server=utf8mb4_general_ci
--lower_case_table_names=1
--performance_schema=1
--sql-mode=""
--skip-log-bin
volumes:
#- /data/mysql/conf:/etc/mysql/conf.d #数据文件挂载
- /data/mysql/data:/var/lib/mysql #数据文件挂载
ports:
- 3306:3306
EOF

运⾏

1
docker-compose up -d

检查

1
docker ps -a

STATUS列全部为up 为正常

1、监控mysql

mysqld_exporter

1、创建⽤户

登录mysql

1
2
3
docker exec -it mysql mysql -uroot -p

输⼊密码:12345

创建

1
2
mysql> CREATE USER 'exporter'@'%' IDENTIFIED BY 'password' WITH MAX_USER_CONNECTIONS 3;
mysql> GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'%';

验证

1
docker exec -it mysql mysql -uexporter -p

2、⼆进制安装(⼆选⼀)

1
下载地址:https://github.com/prometheus/mysqld_exporter/releases

下载解压后移动**/usr/local/Prometheus**⽬录

1
2
3
4
5
wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.14.0/mysqld_exporter-0.14.0.freebsd-amd64.tar.gz

tar zxf mysqld_exporter-0.14.0.freebsd-amd64.tar.gz

mv mysqld_exporter-0.14.0.freebsd-amd64 /usr/local/Prometheus/mysqld_exporter

更改exporter⽂件夹权限

1
chown prometheus.prometheus -R /usr/local/Prometheus/

创建连接数据库的⽂件

1
2
3
4
5
6
7
cat >/usr/local/Prometheus/mysqld_exporter/.mysqld_exporter.cnf << "EOF"
[client]
user=exporter
password=password
host=192.168.224.12
port=3306
EOF

创建systemd服务

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
cat > /etc/systemd/system/mysqld_exporter.service << "EOF"
[Unit]
Description=Prometheus MySQL Exporter
After=network.target

[Service]
Type=simple
User=prometheus
Group=prometheus
Restart=always
ExecStart=/usr/local/Prometheus/mysqld_exporter/mysqld_exporter \
--config.my-cnf=/usr/local/Prometheus/mysqld_exporter/.mysqld_exporter.cnf \
--collect.global_status \
--collect.auto_increment.columns \
--collect.info_schema.processlist \
--collect.binlog_size \
--collect.info_schema.tablestats \
--collect.global_variables \
--collect.info_schema.innodb_metrics \
--collect.info_schema.query_response_time \
--collect.info_schema.userstats \
--collect.info_schema.tables \
--collect.perf_schema.tablelocks \
--collect.perf_schema.file_events \
--collect.perf_schema.eventswaits \
--collect.perf_schema.indexiowaits \
--collect.perf_schema.tableiowaits \
--collect.slave_status \
--web.listen-address=0.0.0.0:9104

[Install]
WantedBy=multi-user.target
EOF

启动 mysqld_exporter

1
2
3
4
5
6
7
systemctl daemon-reload
systemctl start mysqld_exporter
systemctl enable mysqld_exporter
systemctl status mysqld_exporter

启动不了检查⽇志
journalctl -u mysqld_exporter -f

3、docker安装

docker-compose运⾏

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
mkdir /data/mysqld_exporter -p
cd /data/mysqld_exporter

cat >docker-compose.yaml<<"EOF"
version: '3.3'
services:
mysqld-exporter:
image: prom/mysqld-exporter
container_name: mysqld-exporter
restart: always
command:
- '--collect.info_schema.processlist'
- '--collect.info_schema.innodb_metrics'
- '--collect.info_schema.tablestats'
- '--collect.info_schema.tables'
- '--collect.info_schema.userstats'
- '--collect.engine_innodb_status'
environment:
- DATA_SOURCE_NAME=exporter:password@(192.168.224.12:3306)/
ports:
- 9104:9104
EOF

启动

1
2
3
4
docker-compose up -d

检查
docker ps -a

4、参数解释

Name MySQL Version Description
collect.auto_increment.columns 5.1 从 information_schema 收集 auto_increment 列和最⼤值.
collect.binlog_size 5.1 收集所有注册的binlog⽂件的当前⼤⼩
collect.engine_innodb_status 5.1 收集SHOW ENGINE INNODB STATUS
collect.engine_tokudb_status 5.6 收集SHOW ENGINE TOKUDB STATUS .
collect.global_status 5.1 收集SHOW GLOBAL STATUS(默认启⽤)
collect.global_variables 5.1 收集SHOW GLOBAL VARIABLES(默认启⽤)
collect.info_schema.clientstats 5.5 如果以 userstat=1 运⾏,设置为 true 以收集客户端统计信息
collect.info_schema.innodb_metrics 5.6 从 information_schema.innodb_metrics收集指标
collect.info_schema.innodb_tablespaces 5.7 从information_schema.innodb_sys_tablespaces收集指标
collect.info_schema.innodb_cmp 5.5 从information_schema.innodb_cmp收集 InnoDB 压缩表指标。
collect.info_schema.innodb_cmpmem 5.5 从information_schema.innodb_cmpmem缓冲池压缩指标。
collect.info_schema.processlist 5.1 从 information_schema.processlist 收集线程状态计数
collect.info_schema.processlist.min_time 5.1 线程必须处于要计算的每个状态的最短时间。 (默认值:0)
collect.info_schema.query_response_time 5.5 如果 query_response_time_stats 为 ON,则收集查询响应时间分布。
collect.info_schema.replica_host 5.6 从 information_schema.replica_host_status 收集指标。
collect.info_schema.tables 5.1 从information_schema.tables收集指标。
collect.info_schema.tables.databases 5.1 要为其收集表统计信息的数据库列表,或为所有
collect.info_schema.tablestats 5.1 如果以 userstat=1 运⾏,设置为 true 以收集表统计信息。
collect.info_schema.schemastats 5.1 如果以 userstat=1 运⾏,设置为 true 以收集架构统计信息
collect.info_schema.userstats 5.1 如果以 userstat=1 运⾏,设置为 true 以收集⽤户统计信息。
collect.mysql.user 5.5 从 mysql.user 表中收集数据
collect.perf_schema.eventsstatements 5.6 从 performance_schema.events_statements_summary_by_digest 收集指标。
collect.perf_schema.eventsstatements.digest_text_limit 5.6 规范化语句⽂本的最⼤⻓度。 (默认值:120)
collect.perf_schema.eventsstatements.limit 5.6 按响应时间限制事件语句摘要的数量。 (默认值:250)
collect.perf_schema.eventsstatements.timelimit 5.6 以秒为单位限制“last_seen”事件语句的存在时间。 (默认值:86400)
collect.perf_schema.eventsstatementssum 5.7 从 performance_schema.events_statements_summary_by_digest 汇总收集指标。
collect.perf_schema.eventswaits 5.5 从performance_schema.events_waits_summary_global_by_event_name收集指标
collect.perf_schema.file_events 5.6 从 performance_schema.file_summary_by_event_name 收集指标
collect.perf_schema.file_instances 5.5 从 performance_schema.file_summary_by_instance 收集指标。
collect.perf_schema.file_instances.remove_prefix 5.5 删除 performance_schema.file_summary_by_instance 中的路径前缀。
collect.perf_schema.indexiowaits 5.6 从 performance_schema.table_io_waits_summary_by_index_usage收集指标。
collect.perf_schema.memory_events 5.7 从 performance_schema.memory_summary_global_by_event_name收集指标。
collect.perf_schema.memory_events.remove_prefix 5.7 删除performance_schema.memory_summary_global_by_event_name 中的仪器前缀
collect.perf_schema.tableiowaits 5.6 从 performance_schema.table_io_waits_summary_by_table 收集指标
collect.perf_schema.tablelocks 5.6 从 performance_schema.table_lock_waits_summary_by_table 收集指标
collect.perf_schema.replication_group_members 5.7 从 performance_schema.replication_group_members 收集指标
collect.perf_schema.replication_group_member_stats 5.7 从 performance_schema.replication_group_member_stats 收集指标
collect.perf_schema.replication_applier_status_by_worker 5.7 从 performance_schema.replication_applier_status_by_worker 收集指标
collect.slave_status 5.1 从 SHOW SLAVE STATUS 收集(默认启⽤)
collect.slave_hosts 5.1 从 SHOW SLAVE HOSTS 收集
collect.heartbeat 5.1 从⼼跳收集
collect.heartbeat.database 5.1 从哪⾥收集⼼跳数据的数据库。 (默认:⼼跳)
collect.heartbeat.table 5.1 从哪⾥收集⼼跳数据的表。 (默认:⼼跳)
collect.heartbeat.utc 5.1 使⽤ UTC 作为当前服务器的时间戳(使⽤ –utc 调⽤ pt-heartbeat )。 (默认值:假)

5、Prometheus配置

配置prometheus去采集(拉取)mysql_exporter的监控样本数据

1
2
3
4
5
6
7
8
9
10
cd /data/docker-prometheus
#在scrape_configs(搜刮配置):下⾯增加如下配置:

cat >> prometheus/prometheus.yml << "EOF"
- job_name: 'mysqld_exporter'
static_configs:
- targets: ['192.168.224.12:9104']
labels:
instance: server2.com服务器
EOF

重新加载配置

1
curl -X POST http://localhost:9090/-/reload

6、常用监控指标

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
mysql_up                                              # 服务器是否在线
mysql_global_status_uptime # 运行时长,单位 s
delta(mysql_global_status_bytes_received[1m]) # 网络接收的 bytes
delta(mysql_global_status_bytes_sent[1m]) # 网络发送的 bytes


mysql_global_status_threads_connected # 当前的客户端连接数
mysql_global_variables_max_connections # 允许的最大连接数
mysql_global_status_threads_running # 正在执行命令的客户端连接数,即非 sleep 状态
delta(mysql_global_status_aborted_connects[1m]) # 客户端建立连接失败的连接数,比如登录失败
delta(mysql_global_status_aborted_clients[1m]) # 客户端连接之后,未正常关闭的连接数


delta(mysql_global_status_commands_total{command="xx"}[1m]) > 0 # 每分钟各种命令的次数
delta(mysql_global_status_handlers_total{handler="xx"}[1m]) > 0 # 每分钟各种操作的次数
delta(mysql_global_status_handlers_total{handler="commit"}[1m]) > 0 # 每分钟 commit 的次数
delta(mysql_global_status_table_locks_immediate[1m]) # 请求获取锁,且立即获得的请求数
delta(mysql_global_status_table_locks_waited[1m]) # 请求获取锁,但需要等待的请求数。该值越少越好


delta(mysql_global_status_queries[1m]) # 每分钟的查询数
delta(mysql_global_status_slow_queries[1m]) # 慢查询数。如果未启用慢查询日志,则为 0
mysql_global_status_innodb_page_size # innodb 数据页的大小,单位 bytes
mysql_global_variables_innodb_buffer_pool_size # innodb_buffer_pool 的限制体积
mysql_global_status_buffer_pool_pages{state="data"} # 包含数据的数据页数,包括洁页、脏页
mysql_global_status_buffer_pool_dirty_pages # 脏页数

7、触发器配置

添加mysql触发器(告警规则)

1
cd /data/docker-prometheus

使⽤cat创建⽂件

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
cat >> prometheus/rules/mysqld.yml <<"EOF"
groups:
- name: MySQL
rules:
- alert: MysqlDown
expr: mysql_up == 0
for: 30s
labels:
severity: critical
annotations:
summary: "MySQL Down,实例:{{ $labels.instance }}"
description: "MySQL_exporter连不上MySQL了,当前状态为:{{ $value }}"
- alert: MysqlTooManyConnections
expr: max_over_time(mysql_global_status_threads_connected[1m]) / mysql_global_variables_max_connections * 100 > 80
for: 2m
labels:
severity: warning
annotations:
summary: "Mysql连接数过多告警,实例:{{ $labels.instance }}"
description: "MySQL连接数>80%,当前值:{{ $value }}"
- alert: MysqlHighThreadsRunning
expr: max_over_time(mysql_global_status_threads_running[1m]) > 20
for: 2m
labels:
severity: warning
annotations:
summary: "Mysql运行的线程过多,实例:{{ $labels.instance }}"
description: "Mysql运行的线程 > 20,当前运行的线程:{{ $value }}"
- alert: MysqlSlowQueries
expr: increase(mysql_global_status_slow_queries[2m]) > 0
for: 2m
labels:
severity: warning
annotations:
summary: "Mysql慢日志告警,实例:{{ $labels.instance }}"
description: "MySQL在过去2分钟有新的{{ $value }}条慢查询"
#MySQL innodb 日志写入停滞
- alert: MysqlInnodbLogWaits
expr: rate(mysql_global_status_innodb_log_waits[15m]) > 10
for: 0m
labels:
severity: warning
annotations:
summary: "MySQL innodb日志等待,实例:{{ $labels.instance }}"
description: "MySQL innodb日志写入停滞,当前值: {{ $value }}"
- alert: MysqlRestarted
expr: mysql_global_status_uptime < 60
for: 0m
labels:
severity: info
annotations:
summary: "MySQL 重启,实例:{{ $labels.instance }}"
description: "不到一分钟前,MySQL重启过"
EOF

配置检查

1
docker exec -it prometheus promtool check config /etc/prometheus/prometheus.yml

重新加载配置

1
curl -X POST http://localhost:9090/-/reload

检查

1
http://192.168.224.11:9090/alerts?search=

8、dashboard

grafana展示prometheus从mysql_exporter收集到的的数据

1
2
3
https://grafana.com/grafana/dashboards/7362

https://github.com/percona/grafana-dashboards/tree/main/dashboards/MySQL

p9txouT.png

Top Process States和 Process States图形修改为如下:

mysql_info_schema_threads 替换成:

mysql_info_schema_processlist_threads

p9txqUJ.png

数据库表监控

1
https://grafana.com/grafana/dashboards/9625

注意:2个图表没有数据,是因为只⽀持percona server 和 mariadb

p9txvgx.png

评论


:D 一言句子获取中...

加载中,最新评论有1分钟缓存...