
Implementing DingTalk Alerts with Alertmanager

Published: 2023-09-30 11:29:32

Open the desktop version of DingTalk and create a robot

 

WebHook receiver


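Above we configured Alertmanager's built-in email alert template. As mentioned, Alertmanager supports many kinds of alert receivers, such as Slack and WeChat. The most flexible option is a webhook: we define a webhook to receive the alert data and process it there, sending whatever notification we want. The JSON below is the data that Alertmanager POSTs to the webhook: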

{
  "receiver": "webhook",
  "status": "firing",
  "alerts": [
    {
      "status": "firing",
      "labels": {
        "alertname": "NodeMemoryUsage",
        "beta_kubernetes_io_arch": "amd64",
        "beta_kubernetes_io_os": "linux",
        "instance": "node1",
        "job": "nodes",
        "kubernetes_io_arch": "amd64",
        "kubernetes_io_hostname": "node1",
        "kubernetes_io_os": "linux",
        "team": "node"
      },
      "annotations": {
        "description": "node1: Memory usage is above 30% (current value is: 42.097619438581596)",
        "summary": "node1: High Memory usage detected"
      },
      "startsAt": "2022-03-02T02:13:19.69Z",
      "endsAt": "0001-01-01T00:00:00Z",
      "generatorURL": "http://prometheus-649968556c-8p4tj:9090/graph?g0.expr=%28node_memory_MemTotal_bytes+-+%28node_memory_MemFree_bytes+%2B+node_memory_Buffers_bytes+%2B+node_memory_Cached_bytes%29%29+%2F+node_memory_MemTotal_bytes+%2A+100+%3E+30\u0026g0.tab=1",
      "fingerprint": "8cc4749f998d64dd"
    }
  ],
  "groupLabels": {
    "instance": "node1"
  },
  "commonLabels": {
    "alertname": "NodeMemoryUsage",
    "beta_kubernetes_io_arch": "amd64",
    "beta_kubernetes_io_os": "linux",
    "instance": "node1",
    "job": "nodes",
    "kubernetes_io_arch": "amd64",
    "kubernetes_io_hostname": "node1",
    "kubernetes_io_os": "linux",
    "team": "node"
  },
  "commonAnnotations": {
    "description": "node1: Memory usage is above 30% (current value is: 42.097619438581596)",
    "summary": "node1: High Memory usage detected"
  },
  "externalURL": "http://alertmanager-5774d6f5f4-prdgr:9093",
  "version": "4",
  "groupKey": "{}/{team=\"node\"}:{instance=\"node1\"}",
  "truncatedAlerts": 0
}
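
To make the payload structure concrete, here is a minimal Python sketch of what a webhook handler might do with the POSTed body: parse the JSON and turn it into a short Markdown summary. The function name `summarize_alerts` and the trimmed sample payload are illustrative assumptions; this is not how the promoter program itself is implemented.

```python
import json


def summarize_alerts(payload: dict) -> str:
    """Build a short Markdown summary from an Alertmanager webhook payload."""
    alerts = payload.get("alerts", [])
    lines = [f"### [{payload['status'].upper()}] {len(alerts)} alert(s)"]
    for alert in alerts:
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        lines.append(
            f"- **{labels.get('alertname', 'unknown')}** "
            f"on {labels.get('instance', 'unknown')}: "
            f"{annotations.get('summary', '')}"
        )
    return "\n".join(lines)


# Trimmed copy of the payload shown above (only the fields used here).
sample_body = """
{
  "receiver": "webhook",
  "status": "firing",
  "alerts": [
    {
      "status": "firing",
      "labels": {"alertname": "NodeMemoryUsage", "instance": "node1", "team": "node"},
      "annotations": {"summary": "node1: High Memory usage detected"}
    }
  ]
}
"""

sample = json.loads(sample_body)
print(summarize_alerts(sample))
```

A real handler would read the body from the HTTP request instead of a string literal, and would probably also use `startsAt`, `generatorURL`, and the per-alert `status` to distinguish firing from resolved alerts.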

I implemented a simple webhook program for this; the code repository is at https://github.com/cnych/promoter. The program can display alert charts in the notification message.

First, create a custom robot in a DingTalk group:


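Here we choose the additional secret (signing) method to verify the robot; the other two methods can be ignored. Note down the secret value, because it is used below: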


Once the robot is created, DingTalk provides a webhook address that carries an access_token parameter. This parameter is also used below:
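
As a rough illustration of how the secret and the access_token fit together, the sketch below computes a signed webhook URL in Python, following DingTalk's documented signing scheme for custom robots (HMAC-SHA256 over the timestamp, a newline, and the secret, then Base64 and URL encoding). The function name `sign_dingtalk_url` and the placeholder token and secret are assumptions for illustration; promoter handles this internally in its own way.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse


def sign_dingtalk_url(access_token, secret, timestamp_ms=None):
    """Return the robot webhook URL with timestamp and sign parameters added."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)  # milliseconds since the epoch
    string_to_sign = f"{timestamp_ms}\n{secret}"
    digest = hmac.new(
        secret.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    sign = urllib.parse.quote_plus(base64.b64encode(digest))
    return (
        "https://oapi.dingtalk.com/robot/send"
        f"?access_token={access_token}&timestamp={timestamp_ms}&sign={sign}"
    )


# Placeholder values only; substitute the robot's real token and secret.
print(sign_dingtalk_url("<token>", "<secret>"))
```

The signature depends on the current time, so a fresh URL has to be computed for each request rather than cached.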


Next, deploy the webhook service to the cluster. The resource manifest is as follows:

# promoter.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: promoter-conf
  namespace: kube-mon
data:
  config.yaml: |
    global:
      prometheus_url: http://192.168.31.31:30104
      wechat_api_secret: <secret>  # WeCom (enterprise WeChat) secret
      wechat_api_corp_id: <corp_id>  # WeCom (enterprise WeChat) corp_id
      dingtalk_api_token: <token>  # DingTalk robot token
      dingtalk_api_secret: <secret>  # DingTalk robot secret

    s3:
      access_key: <ak>
      secret_key: <sk>
      endpoint: oss-cn-beijing.aliyuncs.com
      region: cn-beijing
      bucket: my-oss-testing

    receivers:
      - name: test1
        wechat_configs:
          - agent_id: <agent_id>
            to_user: "@all"
            message_type: markdown
        dingtalk_configs:
          - message_type: markdown
            at:
              isAtAll: true
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: promoter
  namespace: kube-mon
  labels:
    app: promoter
spec:
  selector:
    matchLabels:
      app: promoter
  template:
    metadata:
      labels:
        app: promoter
    spec:
      volumes:
        - name: promotercfg
          configMap:
            name: promoter-conf
      containers:
        - name: promoter
          image: cnych/promoter:main
          imagePullPolicy: IfNotPresent
          args:
            - "--config.file=/etc/promoter/config.yaml"
          ports:
            - containerPort: 8080
          volumeMounts:
            - mountPath: "/etc/promoter"
              name: promotercfg
---
apiVersion: v1
kind: Service
metadata:
  name: promoter
  namespace: kube-mon
  labels:
    app: promoter
spec:
  selector:
    app: promoter
  ports:
    - port: 8080

Once configured, create the resource objects above directly:

$ kubectl apply -f promoter.yaml
$ kubectl get pods -n kube-mon -l app=promoter
NAME                        READY   STATUS    RESTARTS      AGE
promoter-67c5795c4c-7mlvq   1/1     Running   3 (34m ago)   3d16h

After a successful deployment, we can configure a webhook for Alertmanager by adding a route and a receiver to the configuration above.

  routes:
  - receiver: webhook
    group_wait: 10s
    group_by: ['instance']
    match:
      team: node
receivers:
- name: 'webhook'
  webhook_configs:
  - url: 'http://promoter:8080/test1/send'
    send_resolved: true

Here we configured a receiver named webhook with the address http://promoter:8080/test1/send, which is the Service address of the DingTalk webhook receiver program deployed above.
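
The way a `match` block selects a receiver can be pictured with a small Python model. This is a deliberately simplified sketch, not Alertmanager's actual routing code: it only handles exact label equality and ignores `match_re`, `continue`, and nested routes. The names `route_matches`, `pick_receiver`, and the `default` fallback receiver are assumptions made for illustration.

```python
def route_matches(match: dict, labels: dict) -> bool:
    """A route matches when every key in `match` equals the alert's label value."""
    return all(labels.get(key) == value for key, value in match.items())


def pick_receiver(routes: list, labels: dict, default: str) -> str:
    """Return the receiver of the first matching route, else the default receiver."""
    for route in routes:
        if route_matches(route.get("match", {}), labels):
            return route["receiver"]
    return default


# Mirrors the route configured above: alerts labelled team=node go to "webhook".
routes = [{"receiver": "webhook", "match": {"team": "node"}}]

node_alert = {"alertname": "NodeMemoryUsage", "instance": "node1", "team": "node"}
other_alert = {"alertname": "SomethingElse", "team": "db"}

print(pick_receiver(routes, node_alert, default="default"))   # webhook
print(pick_receiver(routes, other_alert, default="default"))  # default
```

Under this model an alert that lacks the `team=node` label never reaches the DingTalk webhook, which is worth checking first when no message arrives.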

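Then update the Alertmanager and Prometheus ConfigMap objects. After the update, wait a moment and run a reload so that the changes take effect. If an alert fires, the alert for this node's filesystem will be triggered shortly afterwards. Because the alert carries the label team=node, it is routed to the webhook receiver, that is, the custom webhook defined above. Once it fires, check the logs of this Pod: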

$ kubectl logs -f promoter-5dbd47798c-bnjqm -n kube-mon
ts=2022-03-07T01:38:08.001Z caller=main.go:58 level=info msg="Staring Promoter" version="(version=0.2.3, branch=HEAD, revision=0a9cf8fc9bd55d1d2d47d181867135914927c2fc)"
ts=2022-03-07T01:38:08.001Z caller=main.go:59 level=info build_context="(go=go1.17.8, user=root@91adc4eacff7, date=20220305-05:40:54)"
ts=2022-03-07T01:38:08.001Z caller=main.go:127 level=info component=configuration msg="Loading configuration file" file=/etc/promoter/config.yaml
ts=2022-03-07T01:38:08.002Z caller=main.go:138 level=info component=configuration msg="Completed loading of configuration file" file=/etc/promoter/config.yaml
ts=2022-03-07T01:38:08.003Z caller=main.go:88 level=info msg=Listening address=:808

We can see that the POST request succeeded, and at this point a DingTalk message should normally arrive:


1. Create the DingTalk robot


Open the desktop version of DingTalk, create a group, and create a custom robot by following the steps below.

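How do I add a custom robot? (DingTalk Help Center) The group owner or a group member can add a custom robot from the desktop client via: DingTalk desktop > Group chat > Group settings > Smart group assistant > Add more > Add robot > Custom > Add, then edit the robot name and select the group to add it to. Complete the required security settings (select at least one ...). https://www.dingtalk.com/qidian/help-detail-20781541.html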

The robot I created is as follows:

Group settings --> Smart group assistant --> Add robot --> Custom --> Add


Robot name: kubernetes

Receiving group: 钉钉报警测试 (DingTalk alert test)

Security settings:

Custom keyword: cluster1


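After configuring the above, click Finish. This creates an alert robot named kubernetes. To view the webhook address after the robot is created, proceed as follows: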

https://oapi.dingtalk.com/robot/send?access_token=b2d16db54d6fbd69230f080867d41345e6a883cdb8b929505642483216434f41

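Click Smart group assistant to see the kubernetes robot just created, then click kubernetes to open the robot's settings page.

The following is displayed:

Robot name: kubernetes

Receiving group: 钉钉报警测试 (DingTalk alert test)

Message push: enabled

webhook: https://oapi.dingtalk.com/robot/send?access_token=b2d16db54d6fbd69230f080867d41345e6a883cdb8b929505642483216434f41

Security settings:

Custom keyword: cluster1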


All of the steps above are done in DingTalk.
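
With the custom keyword security setting, DingTalk only accepts messages that contain the keyword. The Python sketch below builds a Markdown message body that always includes `cluster1` and shows how it could be POSTed to the robot webhook with the standard library. The helper names `build_markdown_message` and `send_to_robot` are hypothetical, and appending the keyword in brackets is just one possible way to satisfy the check.

```python
import json
import urllib.request

KEYWORD = "cluster1"  # the custom keyword configured on the robot


def build_markdown_message(title: str, text: str, keyword: str = KEYWORD) -> dict:
    """Build a DingTalk markdown message, appending the keyword if it is missing."""
    if keyword not in text:
        text = f"{text}\n\n[{keyword}]"
    return {"msgtype": "markdown", "markdown": {"title": title, "text": text}}


def send_to_robot(webhook_url: str, message: dict) -> bytes:
    """POST the message as JSON to the robot webhook and return the raw response."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.read()


message = build_markdown_message("Test alert", "node1: High Memory usage detected")
print(json.dumps(message, ensure_ascii=False, indent=2))
# To actually send it, call send_to_robot() with the webhook URL from the robot settings page.
```

Sending a hand-built message like this is a quick way to confirm that the token and keyword are correct before wiring up Alertmanager.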

2. Install the DingTalk webhook plugin (run on the k8s master1 node)



tar zxvf prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz

The prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz archive is available from Baidu Netdisk:

Link: https://pan.baidu.com/s/1_HtVZsItq2KsYvOlkIP9DQ

Extraction code: d59o

 cd prometheus-webhook-dingtalk-0.3.0.linux-amd64

Start the DingTalk alert plugin (do not forget the keyword; it was already created above: profile="cluster1")

nohup ./prometheus-webhook-dingtalk --web.listen-address="0.0.0.0:8060" --ding.profile="cluster1=https://oapi.dingtalk.com/robot/send?access_token=b2d16db54d6fbd69230f080867d41345e6a883cdb8b929505642483216434f41" &
[root@master prometheus-webhook-dingtalk-0.3.0.linux-amd64]# tail -f nohup.out 
level=info ts=2021-11-23T11:47:41.638047928Z caller=main.go:37 msg="Starting prometheus-webhook-dingtalk" version="(version=0.3.0, branch=HEAD, revision=4a7dee0be14073aba1ea2eed80acbb515564f664)"
level=info ts=2021-11-23T11:47:41.63812183Z caller=main.go:57 msg="Using default template"
level=info ts=2021-11-23T11:47:41.638150328Z caller=main.go:62 msg="Using following dingtalk profiles: map[cluster1:https://oapi.dingtalk.com/robot/send?access_token=b2d16db54d6fbd69230f080867d41345e6a883cdb8b929505642483216434f41]"
level=info ts=2021-11-23T11:47:41.638256519Z caller=main.go:83 msg="Listening on address" address=0.0.0.0:8060

Back up the original file:
cp alertmanager-cm.yaml alertmanager-cm.yaml.bak

3. Connect Alertmanager to the webhook plugin



[root@master ~]# kubectl get svc -n monitor
NAME                          TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
prometheus-webhook-dingtalk   NodePort    10.233.6.147    <none>        8060:30481/TCP      3m57s

[root@master prometheus]# vim alertmanager-dingding-configmap.yaml
receivers:
- name: default-receiver
  email_configs:
  - to: "xxx@163.com"
- name: dingding
  webhook_configs:
  - url: http://prometheus-webhook-dingtalk.monitor:8060/dingtalk/dingding_ops/send
    send_resolved: true

[root@master prometheus]# kubectl exec -it dns-test sh
/ # nslookup prometheus-webhook-dingtalk.monitor
Server:    169.254.25.10
Address 1: 169.254.25.10

Name:      prometheus-webhook-dingtalk.monitor
Address 1: 10.233.6.147 prometheus-webhook-dingtalk.monitor.svc.cluster.local