Prometheus+Node_exporter+Grafana+Alertmanager Monitoring Deployment (Part 1)

Popularity: 42   Published: 2023-12-12 18:21:49.0

I. Installing and Configuring Prometheus

1. Download and extract the installation package

cd /usr/local/src/
export VER="2.13.1"
wget https://github.com/prometheus/prometheus/releases/download/v${VER}/prometheus-${VER}.linux-amd64.tar.gz

mkdir -p /data0/prometheus
groupadd prometheus
useradd -g prometheus prometheus -d /data0/prometheus

tar -xvf prometheus-${VER}.linux-amd64.tar.gz
cd /usr/local/src/
mv prometheus-${VER}.linux-amd64 /data0/prometheus/prometheus_server

cd /data0/prometheus/prometheus_server/
mkdir -p {data,config,logs,bin}
mv prometheus promtool bin/
mv prometheus.yml config/

chown -R prometheus:prometheus /data0/prometheus
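
Before moving on, it can help to confirm that the directory layout created above is complete. The following is a minimal sketch; `check_layout` is a hypothetical helper name that does not appear in the original steps, and the list of expected entries is taken from the `mkdir` and `mv` commands above.

```shell
# check_layout ROOT: report which expected files and directories exist
# under ROOT. Returns non-zero if anything is missing.
# (Hypothetical helper, not part of Prometheus.)
check_layout() {
  root="$1"
  missing=0
  for entry in data config logs bin bin/prometheus bin/promtool config/prometheus.yml; do
    if [ -e "$root/$entry" ]; then
      echo "ok       $entry"
    else
      echo "missing  $entry"
      missing=1
    fi
  done
  return "$missing"
}

# Example call using the install path from this article.
check_layout /data0/prometheus/prometheus_server || echo "layout incomplete"
```

If any entry is reported as missing, re-run the corresponding `mkdir` or `mv` step before continuing.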

2. Set the environment variable

vim /etc/profile
# Add the following line to the end of the file:
PATH=/data0/prometheus/prometheus_server/bin:$PATH:$HOME/bin

source /etc/profile
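
If you prefer not to edit the file by hand, the same change can be scripted so that running it twice does not add a duplicate line. This is only a sketch: `add_path_line` is a hypothetical helper, and the demo writes to a temporary file rather than to `/etc/profile`.

```shell
# add_path_line FILE DIR: append a PATH entry for DIR to FILE unless an
# identical line is already present. (Hypothetical helper.)
add_path_line() {
  file="$1"
  dir="$2"
  # \$PATH and \$HOME stay literal so they are expanded at login time.
  line="PATH=$dir:\$PATH:\$HOME/bin"
  grep -qxF "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demo against a scratch file; calling the helper twice adds only one line.
demo_profile="$(mktemp)"
add_path_line "$demo_profile" /data0/prometheus/prometheus_server/bin
add_path_line "$demo_profile" /data0/prometheus/prometheus_server/bin
cat "$demo_profile"
```

To apply it for real, pass `/etc/profile` as the first argument (as root) and then run `source /etc/profile`.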

3. Check the configuration file

promtool check config /data0/prometheus/prometheus_server/config/prometheus.yml
Checking /data0/prometheus/prometheus_server/config/prometheus.yml
  SUCCESS: 0 rule files found
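
Because `promtool check config` exits with a non-zero status when the file is invalid, the check can gate later steps such as a service restart. A small sketch, assuming `promtool` is on `PATH` as set up in step 2; `check_config` is a hypothetical wrapper name.

```shell
# check_config FILE: run promtool against FILE and pass its exit status
# through. Returns 127 if promtool cannot be found. (Hypothetical wrapper.)
check_config() {
  cfg="$1"
  if ! command -v promtool >/dev/null 2>&1; then
    echo "promtool not found in PATH" >&2
    return 127
  fi
  promtool check config "$cfg"
}

# Example call using the config path from this article.
check_config /data0/prometheus/prometheus_server/config/prometheus.yml \
  || echo "config check did not pass"
```

A restart command can then be chained with `&&` so that it only runs after a successful check.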

4. Create the systemd unit file for prometheus.service

  • 4.1 Regular systemd service

sudo tee /etc/systemd/system/prometheus.service <<-'EOF'
[Unit]
Description=Prometheus
Documentation=https://prometheus.io/
After=network.target

[Service]
Type=simple
User=prometheus
ExecStart=/data0/prometheus/prometheus_server/bin/prometheus --config.file=/data0/prometheus/prometheus_server/config/prometheus.yml --storage.tsdb.path=/data0/prometheus/prometheus_server/data --storage.tsdb.retention=60d
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable prometheus.service
systemctl stop prometheus.service
systemctl restart prometheus.service
systemctl status prometheus.service
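
`systemctl status` shows that the process is alive, but not that the HTTP server is answering. Prometheus exposes a `/-/healthy` endpoint that can be polled after a restart. The sketch below assumes `curl` is installed and that Prometheus listens on its default port 9090 (the unit file above does not override it); `wait_healthy` is a hypothetical helper name.

```shell
# wait_healthy URL [TRIES] [DELAY]: poll URL until it returns HTTP success
# or TRIES attempts have been made, sleeping DELAY seconds between attempts.
# (Hypothetical helper.)
wait_healthy() {
  url="$1"
  tries="${2:-10}"
  delay="${3:-1}"
  i=1
  while [ "$i" -le "$tries" ]; do
    # -f makes curl return non-zero on HTTP errors; -sS keeps output quiet
    # but still prints real errors.
    if curl -fsS -o /dev/null "$url"; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "not healthy after $tries attempt(s)" >&2
  return 1
}

# Example (uncomment on the Prometheus host):
# wait_healthy http://localhost:9090/-/healthy 10 1
```
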
  • 4.2 Manage prometheus_server with supervisor

# supervisor is provided by EPEL, so enable that repository first
yum install -y epel-release
yum install -y supervisor

sudo tee /etc/supervisord.d/prometheus.ini<<-"EOF"
[program:prometheus]
# Command used to start the program
command = /data0/prometheus/prometheus_server/bin/prometheus --config.file=/data0/prometheus/prometheus_server/config/prometheus.yml --storage.tsdb.path=/data0/prometheus/prometheus_server/data --storage.tsdb.retention=60d
# Start automatically when supervisord starts
autostart = true
# Restart automatically if the program exits unexpectedly
autorestart = true
# Treat the program as started if it stays up for 5 seconds
startsecs = 5
# Number of retries when startup fails (default: 3)
startretries = 3
# User that runs the program
user = prometheus
# Redirect stderr to stdout (default: false)
redirect_stderr = true
# Standard output log file
stdout_logfile=/data0/prometheus/prometheus_server/logs/out-prometheus.log
# Standard error log file
stderr_logfile=/data0/prometheus/prometheus_server/logs/err-prometheus.log
# Maximum size of the stdout log file (default: 50MB)
stdout_logfile_maxbytes = 20MB
# Number of stdout log backups to keep
stdout_logfile_backups = 20
EOF

systemctl daemon-reload
systemctl enable supervisord
systemctl stop supervisord
systemctl restart supervisord
supervisorctl restart prometheus
supervisorctl status
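
The output of `supervisorctl status` can also be checked from a script, for example in a deployment job. A rough sketch, assuming the usual output format in which the second column is the process state; `is_running` is a hypothetical helper name.

```shell
# is_running NAME: succeed only if supervisorctl reports NAME as RUNNING.
# (Hypothetical helper; relies on the state being the second column.)
is_running() {
  supervisorctl status "$1" 2>/dev/null | awk '{print $2}' | grep -qx RUNNING
}

if is_running prometheus; then
  echo "prometheus is RUNNING"
else
  echo "prometheus is not RUNNING"
fi
```

Other states such as `STARTING`, `BACKOFF`, or `FATAL` make the check fail, which is usually what a deployment script wants.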

5. The prometheus.yml configuration file

# Create the alerting rule files for Alertmanager
mkdir -p /data0/prometheus/prometheus_server/rules/
touch /data0/prometheus/prometheus_server/rules/node_down.yml
touch /data0/prometheus/prometheus_server/rules/memory_over.yml
touch /data0/prometheus/prometheus_server/rules/disk_over.yml
touch /data0/prometheus/prometheus_server/rules/cpu_over.yml

# Prometheus configuration file
cat > /data0/prometheus/prometheus_server/config/prometheus.yml << \EOF
# my global config
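global:
  scrape_interval: 15s # Scrape (pull) interval; the default is 1m
  evaluation_interval: 15s # Rule evaluation interval; the default is 1m
  # scrape_timeout is set to the global default (10s).

# Alerting configuration (default configuration)
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 192.168.56.11:9093 # Change this to the address of your Alertmanager

# Load rules and evaluate them periodically at the configured interval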
rule_files:
# - "first_rules.yml"
# - "second_rules.yml"