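The previous two posts gave a brief introduction to Prometheus. This post was originally going to cover setting up node_exporter and cAdvisor, but there are already plenty of tutorials for that online, so it walks through the deployment of the whole environment instead.
I. Architecture
Our original system has two virtual machines on Azure serving as application hosts, with a Load Balancer (LB) in front of them. Users access the LB, and the LB forwards traffic to the internal VMs, VM1 and VM2, according to its rules. VM1 and VM2 form a private network that cannot be reached directly from outside; the only way in is through the LB.
The best way to deploy Prometheus is to place the Prometheus node inside the same private network as VM1/VM2 and expose a single port to the outside, or to add a NAT rule on the LB that connects straight to Prometheus. The advantages are:
1. VM1, VM2 and Prometheus form a private network that is not reachable from outside;
2. node_exporter and cAdvisor speak plain HTTP by default, so this keeps the data collected on VM1/VM2 from travelling over the public network.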
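For various reasons, however, our Prometheus could only be deployed outside that network, so the overall architecture is as shown in the figure below: node_exporter and cAdvisor on VM1/VM2 are exposed on ports 9101 and 8008 (both configurable), and inbound NAT rules on the LB map those ports onto the LB. Prometheus then scrapes monitoring data from the four LB ports listed in section III.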
II. Component versions
Prometheus 2.9
Grafana 6.1.6
cAdvisor 0.17
node_exporter 0.17
stunnel 5.44
nginx 1.14
certbot 0.23
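III. Prometheus-related ports on the Load Balancer (inbound NAT rules)
Port 19101 forwards to port 9101 on VM1
Port 18008 forwards to port 8008 on VM1
Port 29101 forwards to port 9101 on VM2
Port 28008 forwards to port 8008 on VM2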
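The four rules above follow a simple convention: the LB port is the VM number followed by the backend port. The sketch below just prints that mapping so it can be checked against the NAT rules; the function name `lb_port` is made up for this illustration and is not part of any Azure tooling.

```shell
# Hypothetical helper: derive the LB front-end port from the VM number
# and the backend port by prefixing the VM number (1 + 9101 -> 19101).
lb_port() {
  printf '%s%s\n' "$1" "$2"
}

# Print the full mapping used in this post.
for vm in 1 2; do
  for port in 9101 8008; do
    printf 'LB:%s -> VM%s:%s\n' "$(lb_port "$vm" "$port")" "$vm" "$port"
  done
done
```

If a third VM were added under the same convention, its ports would presumably become 39101 and 38008.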
IV. Deployment
1. Create the Prometheus VM on Azure and give it a static IP address and a domain name
2. Install Docker on the Prometheus server
The install script is as follows:

sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get update
sudo apt-get install -y docker-ce
sudo systemctl restart docker
sudo docker images
sudo groupadd docker
sudo usermod -aG docker $USER
3. Install Prometheus on the Prometheus server
Download:

sudo mkdir -p /usr/local/share/prometheus/
cd /usr/local/share/prometheus/
sudo wget https://github.com/prometheus/prometheus/releases/download/v2.9.1/prometheus-2.9.1.linux-amd64.tar.gz

Extract it, then put Prometheus under systemd management:

# Unpack directly into the current directory so the binary matches the ExecStart path below
sudo tar -xzf prometheus-2.9.1.linux-amd64.tar.gz --strip-components=1
sudo adduser prometheus
sudo chown -R prometheus:prometheus /usr/local/share/prometheus/
sudo vim /etc/systemd/system/prometheus.service

[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network.target

[Service]
Restart=on-failure
WorkingDirectory=/usr/local/share/prometheus/
ExecStart=/usr/local/share/prometheus/prometheus \
  --config.file=/usr/local/share/prometheus/prometheus.yml \
  --web.external-url=https://<vm-domain-name> \
  --storage.tsdb.retention.time=30d

[Install]
WantedBy=multi-user.target
Start Prometheus (unit names are case-sensitive, so use lowercase prometheus):

sudo systemctl daemon-reload
sudo systemctl start prometheus
sudo systemctl enable prometheus   # start on boot
sudo systemctl status prometheus
4. Install nginx and certbot on the Prometheus server

sudo apt -y install nginx

certbot is a free tool for obtaining TLS certificates from Let's Encrypt. Install script:

sudo apt-get update
sudo apt-get install software-properties-common
sudo add-apt-repository universe
sudo add-apt-repository ppa:certbot/certbot
sudo apt-get update
sudo apt-get install certbot python-certbot-nginx
sudo certbot --nginx
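Follow the prompts to enter your email address, host name and so on. [The host name must be the real one for this machine; do not make one up.]
The certificates are generated under /etc/letsencrypt/live/<host name>/
Certificates obtained through certbot are valid for 90 days. They can be renewed with the official certbot renew command, which should then be added to a periodic scheduled task (cron). The command below is a dry run that only checks that renewal would succeed:
sudo certbot renew --dry-run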
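One way to schedule the renewal is a cron.d entry. The sketch below only prints a candidate entry; the schedule (03:17 daily), the file name /etc/cron.d/certbot-renew and the helper name `certbot_cron_line` are assumptions chosen for illustration. Some certbot packages may already install their own renewal timer, in which case this is not needed.

```shell
# Hypothetical helper: print a cron.d line that attempts renewal once a day
# and reloads nginx only when a certificate was actually renewed.
certbot_cron_line() {
  printf '%s\n' '17 3 * * * root certbot renew --quiet --deploy-hook "systemctl reload nginx"'
}

# Show the entry for review.
certbot_cron_line

# To install it (requires root), one could run:
#   certbot_cron_line | sudo tee /etc/cron.d/certbot-renew
```

certbot renew only replaces certificates that are close to expiry, so running it daily is harmless.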
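Edit the nginx configuration file:

sudo vim /etc/nginx/nginx.conf

# modify the server block
server {
    listen 443 ssl;
    server_name lkprometheusemu.southeastasia.cloudapp.azure.com;
    # certificates live under /etc/letsencrypt/live/<host name>/
    ssl_certificate /etc/letsencrypt/live/lkprometheusemu.southeastasia.cloudapp.azure.com/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/lkprometheusemu.southeastasia.cloudapp.azure.com/privkey.pem; # managed by Certbot
    location / {
        root /var/www/html;
        index index.html;
        proxy_pass http://127.0.0.1:3000;
    }
}

Start nginx: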
sudo nginx
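5. Install node_exporter on VM1/VM2

cd /usr/local/share
sudo wget https://github.com/prometheus/node_exporter/releases/download/v0.17.0/node_exporter-0.17.0.linux-amd64.tar.gz
sudo tar -xf node_exporter-0.17.0.linux-amd64.tar.gz
# Rename the extracted directory so the binary matches the ExecStart path below
sudo mv node_exporter-0.17.0.linux-amd64 node_exporter

Create the systemd unit:

sudo vim /etc/systemd/system/node_exporter.service

[Unit]
Description=Node Exporter
After=network.target

[Service]
ExecStart=/usr/local/share/node_exporter/node_exporter --web.listen-address=127.0.0.1:9100

[Install]
WantedBy=multi-user.target

Start it and enable it on boot:

sudo systemctl daemon-reload
sudo systemctl enable node_exporter.service
sudo systemctl start node_exporter.service
sudo systemctl status node_exporter.service

node_exporter itself listens only on 127.0.0.1:9100, which is the address stunnel connects to in step 9. Port 9101 is the one exposed externally, and it is opened by stunnel, so node_exporter must not bind to it.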
6. Install cAdvisor on VM1/VM2
Switch to the root user and run:

docker run -d \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:rw \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  -p 8088:8080 \
  --restart=always \
  --name=cadvisor \
  google/cadvisor

This publishes cAdvisor on host port 8088.
7. Install stunnel on VM1/VM2
sudo apt install stunnel
Enable stunnel:
Edit the defaults file with sudo vim /etc/default/stunnel4 and set ENABLED to 1
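The same change can be made without opening an editor. This is a minimal sketch that assumes the defaults file contains a line beginning with `ENABLED=` and that GNU sed is available; the function name `enable_stunnel` is made up for this example. It is demonstrated on a scratch file so nothing under /etc is touched.

```shell
# Hypothetical helper: set ENABLED=1 in a stunnel defaults file.
# $1 = path to the file (defaults to /etc/default/stunnel4, which needs root).
enable_stunnel() {
  local file="${1:-/etc/default/stunnel4}"
  sed -i 's/^ENABLED=.*/ENABLED=1/' "$file"
}

# Demonstrate on a scratch copy rather than the real file.
tmp=$(mktemp)
printf 'ENABLED=0\nFILES="/etc/stunnel/*.conf"\n' > "$tmp"
enable_stunnel "$tmp"
cat "$tmp"
rm -f "$tmp"
```

On the real hosts this would be run as root against the default path.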
8. Create a self-signed certificate on any Linux host

sudo mkdir /etc/stunnel/tls
cd /etc/stunnel/tls
sudo openssl genrsa -out key.pem 2048   # create a 2048-bit private key
sudo openssl req -new -x509 -key key.pem -out cert.pem -days 3650 -subj "/C=US/ST=Denial/L=Springfield/O=Dis/CN=<host name of this machine>"
sudo chmod 640 key.pem cert.pem

Copy the generated key and certificate to the chosen location on VM1/VM2; I put them in /etc/stunnel/tls.
9. Edit the stunnel configuration
The configuration below is for VM1. On VM2, change node_exporter1 and cAdvisor1 to node_exporter2 and cAdvisor2.

sudo vim /etc/stunnel/stunnel.conf

pid = /var/run/stunnel4/stunnel.pid
output = /var/log/stunnel4/stunnel.log

[node_exporter1]
accept = 9101
connect = 127.0.0.1:9100
cert = /etc/stunnel/tls/cert.pem
key = /etc/stunnel/tls/key.pem

[cAdvisor1]
accept = 8008
connect = 127.0.0.1:8088
cert = /etc/stunnel/tls/cert.pem
key = /etc/stunnel/tls/key.pem

Restart the stunnel service:
sudo systemctl restart stunnel4
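Before moving on to the Prometheus side, it can help to confirm that both TLS endpoints answer. The sketch below assumes curl is installed and that both exporters serve their metrics under /metrics; the helper names `metrics_url` and `probe` are made up for this example.

```shell
# Hypothetical helper: build the HTTPS metrics URL for a stunnel-wrapped exporter.
# $1 = host, $2 = port
metrics_url() {
  printf 'https://%s:%s/metrics\n' "$1" "$2"
}

# Hypothetical helper: fetch the first few metric lines.
# -k skips certificate verification because the certificate is self-signed.
probe() {
  curl -ksf --max-time 5 "$(metrics_url "$1" "$2")" | head -n 5
}

# On VM1 or VM2 one could then run:
#   probe 127.0.0.1 9101   # node_exporter through stunnel
#   probe 127.0.0.1 8008   # cAdvisor through stunnel
```

The same check can be repeated from the Prometheus server against the LB address and ports 19101, 18008, 29101 and 28008.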
10. Configure Prometheus on the Prometheus server

sudo vim /usr/local/share/prometheus/prometheus.yml

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    static_configs:
    - targets: ['127.0.0.1:9090']

  - job_name: 'node_exporter1'
    static_configs:
    - targets: ['LBIP:19101']
    scheme: https
    tls_config:
      insecure_skip_verify: true

  - job_name: 'node_exporter2'
    static_configs:
    - targets: ['LBIP:29101']
    scheme: https
    tls_config:
      insecure_skip_verify: true

  - job_name: 'cadvisor1'
    static_configs:
    - targets: ['LBIP:18008']
    scheme: https
    tls_config:
      insecure_skip_verify: true

  - job_name: 'cadvisor2'
    static_configs:
    - targets: ['LBIP:28008']
    scheme: https
    tls_config:
      insecure_skip_verify: true

Check that the Prometheus configuration is valid. Switch to the Prometheus install directory and run the command below (check config validates prometheus.yml; check rules is only for rule files):
./promtool check config prometheus.yml
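The four exporter jobs in prometheus.yml differ only in their name and port, so they can be generated rather than typed by hand. This is a minimal sketch, not how the file in this post was produced; the function name `scrape_job` and the `LB` variable are made up, and `LBIP` stays a placeholder for the real LB address.

```shell
# Hypothetical helper: print one HTTPS scrape job in prometheus.yml syntax.
# $1 = job name, $2 = target as host:port
scrape_job() {
  cat <<EOF
  - job_name: '$1'
    scheme: https
    tls_config:
      insecure_skip_verify: true
    static_configs:
    - targets: ['$2']
EOF
}

# LBIP is a placeholder; set LB to the real load balancer address.
LB="${LB:-LBIP}"
for vm in 1 2; do
  scrape_job "node_exporter${vm}" "${LB}:${vm}9101"
  scrape_job "cadvisor${vm}" "${LB}:${vm}8008"
done
```

The output is indented to sit directly under `scrape_configs:`. insecure_skip_verify is kept because the stunnel certificates are self-signed.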
11. Install Grafana on the Prometheus server

wget https://dl.grafana.com/oss/release/grafana_6.1.6_amd64.deb
sudo dpkg -i grafana_6.1.6_amd64.deb
sudo /bin/systemctl daemon-reload
sudo systemctl start grafana-server.service
sudo systemctl enable grafana-server.service

Configure Grafana's email settings:
sudo vim /etc/grafana/grafana.ini
Restart Grafana so the change takes effect:
sudo systemctl restart grafana-server.service
Log in to Grafana and add Prometheus as a data source.
The default user name is admin
The default password is admin
At this point, check that Prometheus is connected properly.
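One quick way to check this from the Prometheus server is to ask its HTTP API for the state of every scrape target. The sketch below assumes Prometheus listens on 127.0.0.1:9090 as configured above and that the API returns compact JSON; the function name `count_health` is made up for this example, and it is fed a hand-written sample here so it runs without a server.

```shell
# Hypothetical helper: read the JSON from /api/v1/targets on stdin and
# count how many targets report each health state (up, down, unknown).
count_health() {
  grep -o '"health":"[a-z]*"' | sort | uniq -c
}

# Hand-written sample input, only to show the output format.
sample='{"status":"success","data":{"activeTargets":[{"health":"up"},{"health":"up"},{"health":"down"}]}}'
printf '%s' "$sample" | count_health

# Against the live server one could run:
#   curl -s http://127.0.0.1:9090/api/v1/targets | count_health
```

With the configuration in this post, five targets in state up would mean Prometheus itself and all four exporters are being scraped.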
The remaining steps are covered in the next post.