
Log Chain Tracing in an OpenStack Architecture

What problems are we trying to solve?

1. Locating the source of a problem through the log chain

When a request is issued from the upper-layer platform, the user cannot see how data is passed between the links of the chain, yet still wants to determine quickly where the problem lies: in the cloud management platform, in OpenStack, or at the operating-system level. Structured log data helps us pinpoint the source of a problem quickly.

2. Mapping cloud management platform resources to OpenStack resources

Users generally request resource scheduling from the OpenStack API through a cloud management platform or by calling the API directly. When the cloud management platform sends an HTTP request to OpenStack, OpenStack adds a request ID to its response; with this request ID we can trace OpenStack's internal service scheduling in the logs.
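A quick way to see this ID is to call the Nova API directly and inspect the response headers. A minimal sketch with curl (the controller endpoint and the token variable are placeholder assumptions):

# Nova echoes the request ID back in the X-Compute-Request-Id /
# X-Openstack-Request-Id response headers
curl -si -H "X-Auth-Token: $TOKEN" http://controller:8774/v2.1/servers | grep -i request-id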


3. Comparing logs on a timeline to identify the source of a problem

OpenStack is a complex system, and a failed API call can stem from several places. For example, when provisioning a virtual machine from the cloud management side fails, error logs may appear all over (nova-api, nova-compute, cinder-volume, neutron-vswitch, etc.), but what we need to know is which module was chiefly responsible for the failure. This is where a timeline helps: by comparing the error logs of each module over the same time window, we can judge where the problem originated.


Architecture

filebeat -> kafka -> logstash -> elasticsearch -> kibana -> LogChainAnalysis

  1. filebeat: runs as a container on the controller and compute nodes and collects the log data.
  2. kafka: as the environment scales out, log volume keeps growing and more product lines plug into the logging service; at traffic peaks, write performance to ES drops, CPU saturates, and the cluster is at constant risk of going down. Putting a message queue in front smooths out these peaks.
  3. Logstash: parses the unstructured data into structured data.
  4. Elasticsearch: stores the log data and exposes indices for Kibana and LogChainAnalysis to analyze.
  5. Kibana: dashboard for browsing logs.
  6. LogChainAnalysis: presentation of the structured data.

Collecting the cloud management platform logs

The cloud platform logs are the first we collect and process; starting from them, I will build up the whole chain step by step.

A cloud management platform typically runs a large number of services, and these services produce a large volume of logs, for example from the resource, identity, and gateway services. Among them, only the resource service calls the OpenStack APIs to operate on virtual resources, so the resource logs are the ones we post-process. When the cloud platform sends a request to OpenStack, OpenStack, once authentication passes, returns the x-openstack-request-id and x-compute-request-id response headers. We can use this request ID to look up the detailed log entries in each of the underlying services' logs.

Furthermore, the processed structured log also contains the cloud platform's resource ID (which I label UUID here), so the UUID and the request ID form a mapping. This mapping is a core building block of LogChainAnalysis.
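As a sketch of what that lookup looks like against the index built later in this post (the index and field names follow resource-logstash.conf below; the UUID value is a placeholder, and this assumes the resource service writes the returned x-openstack-request-id into its message body):

# search the structured resource logs for a given cloud-platform resource ID
curl -s 'http://10.192.31.160:9200/resource-log-*/_search?pretty' \
  -H 'Content-Type: application/json' \
  -d '{"query": {"match": {"uuid": "<cloud-resource-uuid>"}}}'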

Collecting the OpenStack logs

OpenStack produces a large volume of logs, and finding the source of a problem across them is tedious. Every OpenStack service returns a request-ID header with its HTTP responses, and this value is useful for tracking a problem through the logs. However, tracing becomes difficult once an operation crosses a service boundary, because each service generates a new ID for every inbound request; Nova's request ID does not help the user find the debug information of the other services Nova called while completing the request. This becomes especially problematic when many requests are in flight at once.

The request ID is generated when request processing starts, and it is the ID the user sees when the request returns. When Nova calls another OpenStack service (such as Glance), that service sends back its own request ID in a response header. By logging the mapping between the two request IDs (nova -> glance), the user can easily look up the Glance-returned request ID in the nova-compute log; with the Glance request ID in hand, the user can then check the Glance logs for the debug information corresponding to the Nova request. Forming such a log message takes two request IDs: one generated by Nova and one contained in the response from the other service. The request ID Nova generates lives in the context passed to the Python client wrappers. This is the idea my later development of LogChainAnalysis is built on.
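Without tooling, this chase is manual: take the request ID returned to the caller and grep for it across the service logs on each node. A minimal sketch (the request ID is the example used later in this post; the log paths match the Filebeat inputs below):

REQ_ID="req-d9e461b1-860e-4b50-9d5a-55b66371032a"
grep -h "$REQ_ID" /var/log/nova/*.log /var/log/cinder/*.log \
  /var/log/neutron/*.log /var/log/glance/*.log | sort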

Deployment

Deploying Elasticsearch

1. Environment preparation

Prepare three Linux machines; this tutorial uses CentOS 7 with the following IP addresses:

10.192.31.160
10.192.31.161
10.192.31.162

2. Download Elasticsearch

curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.5.1-linux-x86_64.tar.gz

3. System settings

Elasticsearch cannot be started as root, so create a regular user on each of the three machines:

# create the elastic user
useradd elastic
# set its password
passwd elastic
# switch to the elastic user
su elastic

On each of the three machines, create an elasticsearch folder under /home/elastic/, and inside it create data and logs folders:

cd /home/elastic/
mkdir -p elasticsearch/data
mkdir -p elasticsearch/logs

In production we store the index data that Elasticsearch generates under directories of our own choosing:

data: stores the Elasticsearch index files

logs: stores the log files

4. Configure Elasticsearch

First upload the elasticsearch-7.5.1-linux-x86_64.tar.gz package to the /home/elastic/elasticsearch/ directory on each of the three machines; the order in which the machines are set up does not matter.

Extract elasticsearch-7.5.1-linux-x86_64.tar.gz:

tar -xvf elasticsearch-7.5.1-linux-x86_64.tar.gz

5. Edit elasticsearch.yml

Open the elasticsearch.yml configuration file:

vi elasticsearch-7.5.1/config/elasticsearch.yml

The main settings to change are listed below (these are the master node's values; on the other two nodes, only the corresponding network.host needs to differ). A sample file follows the list.

  1. http.port: the port Elasticsearch listens on; the default `9200` is usually fine, but you can change it.
  2. network.host: the IP Elasticsearch binds to, through which the outside world reaches this node; usually the machine's own IP, or `0.0.0.0` (reachable from any address).
  3. discovery.seed_hosts: the IP addresses of all Elasticsearch nodes.
  4. cluster.initial_master_nodes: which nodes are eligible to be elected master.
  5. xpack.monitoring.collection.enabled: whether to collect monitoring data; defaults to false (no collection).
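Putting these together, a minimal sketch of elasticsearch.yml for the 10.192.31.160 node (cluster.name and the node names are placeholders of mine; on the other two nodes change node.name and network.host accordingly):

cluster.name: log-chain-es
node.name: node-160
path.data: /home/elastic/elasticsearch/data
path.logs: /home/elastic/elasticsearch/logs
network.host: 10.192.31.160
http.port: 9200
discovery.seed_hosts: ["10.192.31.160", "10.192.31.161", "10.192.31.162"]
cluster.initial_master_nodes: ["node-160", "node-161", "node-162"]
xpack.monitoring.collection.enabled: true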

6. Start Elasticsearch

Elasticsearch can be started in the background: ./bin/elasticsearch -d

Start Elasticsearch on each of the three machines; it is best to wait for one node to come up successfully before starting the next.

7. Check the cluster

The three-node cluster is now set up and running.

Next, check that the cluster has actually formed by sending an HTTP request to any of the three servers: [http://10.192.31.160:9200/_cat/health?v](http://10.192.31.160:9200/_cat/health?v)
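For example:

curl 'http://10.192.31.160:9200/_cat/health?v'
# a healthy three-node cluster reports status "green" and node.total 3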

Deploying Kibana

Containerized deployment

docker pull kibana:7.10.1
# note: 7.x images read ELASTICSEARCH_HOSTS (ELASTICSEARCH_URL was the 6.x variable)
docker run --name kibana -e ELASTICSEARCH_HOSTS=http://10.192.31.160:9200 -p 5601:5601 -d kibana:7.10.1

Deploying Kafka

1. Environment preparation

Install Java: on CentOS 7, `yum install java -y` installs Java 8 by default.

Download Kafka from: https://mirrors.tuna.tsinghua.edu.cn/apache/kafka/

2. Configure and start Kafka

Start ZooKeeper first, then Kafka:

./zookeeper-server-start.sh -daemon ../config/zookeeper.properties
./kafka-server-start.sh ../config/server.properties

3. Configure systemd services

vim /etc/systemd/system/zookeeper.service

[Unit]
Description=Apache Zookeeper server
Documentation=http://zookeeper.apache.org
Requires=network.target remote-fs.target
After=network.target remote-fs.target

[Service]
Type=simple
ExecStart=/root/kafka/bin/zookeeper-server-start.sh /root/kafka/config/zookeeper.properties
ExecStop=/root/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal
User=root
Group=root

[Install]
WantedBy=multi-user.target     

vim /etc/systemd/system/kafka.service

[Unit]
Description=Apache Kafka Server
Documentation=http://kafka.apache.org/documentation.html
Requires=zookeeper.service

[Service]
Type=simple
ExecStart=/root/kafka/bin/kafka-server-start.sh /root/kafka/config/server.properties
ExecStop=/root/kafka/bin/kafka-server-stop.sh
Restart=on-abnormal


[Install]
WantedBy=multi-user.target

4. Start the services

systemctl start zookeeper     # start zookeeper
systemctl enable zookeeper    # start at boot
systemctl start kafka         # start kafka
systemctl enable kafka        # start at boot

5. Verify that Kafka started successfully

Check whether a topic is receiving data:

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic nova-api-log
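If the topic does not show up, list which topics exist first (on Kafka releases older than 2.2 the tool takes --zookeeper localhost:2181 instead of --bootstrap-server):

bin/kafka-topics.sh --bootstrap-server localhost:9092 --list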

Deploying Logstash

1. Install with yum

sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch

Add the following in your /etc/yum.repos.d/ directory in a file with a .repo suffix, for example logstash.repo:

[logstash-7.x]
name=Elastic repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md

Then install: sudo yum install logstash

2. Create the pipeline configuration files

cd /usr/share/logstash/
vim /etc/logstash/conf.d/logstash.conf

openstack-logstash.conf

input {
  kafka {
    bootstrap_servers => "10.192.31.163:9092"
    topics => ["nova-compute-log", "nova-api-log","nova-scheduler-log", "nova-conductor-log", "cinder-volume-log", "cinder-api-log","cinder-scheduler-log" , "keystone-log", "neutron-server-log", "openvswitch-agent-log",  "glance-api-log", "glance-registry-log"]
    group_id => "LogChainAnalysis"
    decorate_events => true
    auto_offset_reset => "latest"
    consumer_threads => 5
    codec => "json"
  }
}

filter{
  if [@metadata][kafka][topic] == "nova-compute-log" {
     grok {
       match => { "message" => "(?m)^(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}%{SPACE}%{TIME})%{SPACE}%{NUMBER:pid}?%{SPACE}?%{LOGLEVEL:level} \[?\b%{NOTSPACE:module}\b\]?%{SPACE}\[?\b(?<request_id>req-%{UUID:uuid})%{SPACE}(?<user_id>[a-z0-9]{32}|-)%{SPACE}(?<project_id>[a-z0-9]{32}|-)%{SPACE}-%{SPACE}-%{SPACE}-\]?%{SPACE}?%{GREEDYDATA:logmessage}?"}
     }
     mutate {
         add_field => {"[@metadata][index]" => "nova-compute-log-%{+YYYY.MM.dd}"}
         add_field => {"vip" => "%{[fields][vip]}"}
     }
  }
  
  if [@metadata][kafka][topic] == "nova-api-log" {
     grok {
       match => { "message" => "(?m)^(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}%{SPACE}%{TIME})%{SPACE}%{NUMBER:pid}?%{SPACE}?%{LOGLEVEL:level} \[?\b%{NOTSPACE:module}\b\]?%{SPACE}\[?\b(?<request_id>req-%{UUID:uuid})%{SPACE}(?<user_id>[a-z0-9]{32}|-)%{SPACE}(?<project_id>[a-z0-9]{32}|-)%{SPACE}-%{SPACE}-%{SPACE}-\]?%{SPACE}?%{GREEDYDATA:logmessage}?"}
     }
     mutate {
         add_field => {"[@metadata][index]" => "nova-api-log-%{+YYYY.MM.dd}"}
         add_field => {"vip" => "%{[fields][vip]}"}
     }
  }
  
  if [@metadata][kafka][topic] == "nova-scheduler-log" {
     grok {
       match => { "message" => "(?m)^(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}%{SPACE}%{TIME})%{SPACE}%{NUMBER:pid}?%{SPACE}?%{LOGLEVEL:level} \[?\b%{NOTSPACE:module}\b\]?%{SPACE}\[?\b(?<request_id>req-%{UUID:uuid})%{SPACE}(?<user_id>[a-z0-9]{32}|-)%{SPACE}(?<project_id>[a-z0-9]{32}|-)%{SPACE}-%{SPACE}-%{SPACE}-\]?%{SPACE}?%{GREEDYDATA:logmessage}?"}
     }
     mutate {
         add_field => {"[@metadata][index]" => "nova-scheduler-log-%{+YYYY.MM.dd}"}
         add_field => {"vip" => "%{[fields][vip]}"}
     }
  }
  
  if [@metadata][kafka][topic] == "nova-conductor-log" {
     grok {
       match => { "message" => "(?m)^(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}%{SPACE}%{TIME})%{SPACE}%{NUMBER:pid}?%{SPACE}?%{LOGLEVEL:level} \[?\b%{NOTSPACE:module}\b\]?%{SPACE}\[?\b(?<request_id>req-%{UUID:uuid})%{SPACE}(?<user_id>[a-z0-9]{32}|-)%{SPACE}(?<project_id>[a-z0-9]{32}|-)%{SPACE}-%{SPACE}-%{SPACE}-\]?%{SPACE}?%{GREEDYDATA:logmessage}?"}
     }
     mutate {
         add_field => {"[@metadata][index]" => "nova-conductor-log-%{+YYYY.MM.dd}"}
         add_field => {"vip" => "%{[fields][vip]}"}
     }
  }
  if [@metadata][kafka][topic] == "cinder-volume-log" {
     grok {
       match => { "message" => "(?m)^(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}%{SPACE}%{TIME})%{SPACE}%{NUMBER:pid}?%{SPACE}?%{LOGLEVEL:level} \[?\b%{NOTSPACE:module}\b\]?%{SPACE}\[?\b(?<request_id>req-%{UUID:uuid})%{SPACE}(?<user_id>[a-z0-9]{32}|-)%{SPACE}(?<project_id>[a-z0-9]{32}|-)%{SPACE}-%{SPACE}-%{SPACE}-\]?%{SPACE}?%{GREEDYDATA:logmessage}?"}
     }
     mutate {
         add_field => {"[@metadata][index]" => "cinder-volume-log-%{+YYYY.MM.dd}"}
         add_field => {"vip" => "%{[fields][vip]}"}
     }
  }
  
  if [@metadata][kafka][topic] == "cinder-api-log" {
     grok {
       match => { "message" => "(?m)^(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}%{SPACE}%{TIME})%{SPACE}%{NUMBER:pid}?%{SPACE}?%{LOGLEVEL:level} \[?\b%{NOTSPACE:module}\b\]?%{SPACE}\[?\b(?<request_id>req-%{UUID:uuid})%{SPACE}(?<user_id>[a-z0-9]{32}|-)%{SPACE}(?<project_id>[a-z0-9]{32}|-)%{SPACE}-%{SPACE}-%{SPACE}-\]?%{SPACE}?%{GREEDYDATA:logmessage}?"}
     }

     mutate {
         add_field => {"[@metadata][index]" => "cinder-api-log-%{+YYYY.MM.dd}"}
         add_field => {"vip" => "%{[fields][vip]}"}
     }
  }
  
  if [@metadata][kafka][topic] == "cinder-scheduler-log" {
     grok {
       match => { "message" => "(?m)^(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}%{SPACE}%{TIME})%{SPACE}%{NUMBER:pid}?%{SPACE}?%{LOGLEVEL:level} \[?\b%{NOTSPACE:module}\b\]?%{SPACE}\[?\b(?<request_id>req-%{UUID:uuid})%{SPACE}(?<user_id>[a-z0-9]{32}|-)%{SPACE}(?<project_id>[a-z0-9]{32}|-)%{SPACE}-%{SPACE}-%{SPACE}-\]?%{SPACE}?%{GREEDYDATA:logmessage}?"}
     }

     mutate {
         add_field => {"[@metadata][index]" => "cinder-scheduler-log-%{+YYYY.MM.dd}"}
         add_field => {"vip" => "%{[fields][vip]}"}
     }
  }

  if [@metadata][kafka][topic] == "keystone-log" {
     grok {
       match => { "message" => "(?m)^(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}%{SPACE}%{TIME})%{SPACE}%{NUMBER:pid}?%{SPACE}?%{LOGLEVEL:level} \[?\b%{NOTSPACE:module}\b\]?%{SPACE}\[?\b(?<request_id>req-%{UUID:uuid})%{SPACE}(?<user_id>[a-z0-9]{32}|-)%{SPACE}(?<project_id>[a-z0-9]{32}|-)%{SPACE}-%{SPACE}-%{SPACE}-\]?%{SPACE}?%{GREEDYDATA:logmessage}?"}
     }

     mutate {
         add_field => {"[@metadata][index]" => "keystone-log-%{+YYYY.MM.dd}"}
         add_field => {"vip" => "%{[fields][vip]}"}
     }
  }
  
  if [@metadata][kafka][topic] == "neutron-server-log" {
     grok {
       match => { "message" => "(?m)^(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}%{SPACE}%{TIME})%{SPACE}%{NUMBER:pid}?%{SPACE}?%{LOGLEVEL:level} \[?\b%{NOTSPACE:module}\b\]?%{SPACE}\[?\b(?<request_id>req-%{UUID:uuid})%{SPACE}(?<user_id>[a-z0-9]{32}|-)%{SPACE}(?<project_id>[a-z0-9]{32}|-)%{SPACE}-%{SPACE}-%{SPACE}-\]?%{SPACE}?%{GREEDYDATA:logmessage}?"}
     }

     mutate {
         add_field => {"[@metadata][index]" => "neutron-server-log-%{+YYYY.MM.dd}"}
         add_field => {"vip" => "%{[fields][vip]}"}
     }
  }
  
  if [@metadata][kafka][topic] == "openvswitch-agent-log" {
     grok {
       match => { "message" => "(?m)^(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}%{SPACE}%{TIME})%{SPACE}%{NUMBER:pid}?%{SPACE}?%{LOGLEVEL:level} \[?\b%{NOTSPACE:module}\b\]?%{SPACE}\[?\b(?<request_id>req-%{UUID:uuid})%{SPACE}(?<user_id>[a-z0-9]{32}|-)%{SPACE}(?<project_id>[a-z0-9]{32}|-)%{SPACE}-%{SPACE}-%{SPACE}-\]?%{SPACE}?%{GREEDYDATA:logmessage}?"}
     }

     mutate {
         add_field => {"[@metadata][index]" => "openvswitch-agent-log-%{+YYYY.MM.dd}"}
         add_field => {"vip" => "%{[fields][vip]}"}
     }
  }
 
  if [@metadata][kafka][topic] == "glance-api-log" {
     grok {
       match => { "message" => "(?m)^(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}%{SPACE}%{TIME})%{SPACE}%{NUMBER:pid}?%{SPACE}?%{LOGLEVEL:level} \[?\b%{NOTSPACE:module}\b\]?%{SPACE}\[?\b(?<request_id>req-%{UUID:uuid})%{SPACE}(?<user_id>[a-z0-9]{32}|-)%{SPACE}(?<project_id>[a-z0-9]{32}|-)%{SPACE}-%{SPACE}-%{SPACE}-\]?%{SPACE}?%{GREEDYDATA:logmessage}?"}
     }

     mutate {
         add_field => {"[@metadata][index]" => "glance-api-log-%{+YYYY.MM.dd}"}
         add_field => {"vip" => "%{[fields][vip]}"}
     }
  }
  

  if [@metadata][kafka][topic] == "glance-registry-log" {
     grok {
       match => { "message" => "(?m)^(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}%{SPACE}%{TIME})%{SPACE}%{NUMBER:pid}?%{SPACE}?%{LOGLEVEL:level} \[?\b%{NOTSPACE:module}\b\]?%{SPACE}\[?\b(?<request_id>req-%{UUID:uuid})%{SPACE}(?<user_id>[a-z0-9]{32}|-)%{SPACE}(?<project_id>[a-z0-9]{32}|-)%{SPACE}-%{SPACE}-%{SPACE}-\]?%{SPACE}?%{GREEDYDATA:logmessage}?"}
     }

     mutate {
         add_field => {"[@metadata][index]" => "glance-registry-log-%{+YYYY.MM.dd}"}
         add_field => {"vip" => "%{[fields][vip]}"}
     }
  }
  
  if ![request_id] { drop {} }
  mutate {
      remove_field => ["kafka"]
      remove_field => ["message"]
  }
}

output {
   stdout { }
   elasticsearch {
     hosts => ["http://10.192.31.160:9200", "http://10.192.31.161:9200", "http://10.192.31.162:9200"]
     index => "%{[@metadata][index]}"
     timeout => 300
   }
}
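Before starting the service, the pipeline can be syntax-checked first (the binary path is where the yum package installs it):

/usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/logstash.conf --config.test_and_exit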

resource-logstash.conf

input {
  kafka {
    bootstrap_servers => "127.0.0.1:9092"
    topics => ["resource-log"]
    group_id => "LogChainAnalysis"
    decorate_events => true
    consumer_threads => 5
    auto_offset_reset => "latest"
    enable_auto_commit => true
    codec => "json"
  }
}

filter{
  if [@metadata][kafka][topic] == "resource-log" {
     if [message] =~ "\tat" {
       grok {
         match => ["message", "^(\tat)"]
         add_tag => ["stacktrace"]
       }
     }
     grok {
       match => [ "message",
                  "(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}T%{TIME})%{SPACE}*%{LOGLEVEL:level}%{SPACE}*%{NOTSPACE:module}\[%{UUID:uuid}\]-+(?<logmessage>.*)"
                ]
     }
     mutate {
         add_field => {"vip" => "%{[fields][vip]}"}
         add_field => {"[@metadata][index]" => "resource-log-%{+YYYY.MM.dd}"}
     }
     
  }
  
  mutate {
      remove_field => ["kafka"]
      remove_field => ["message"]
  }
}

output {
   stdout { }
   elasticsearch {
     hosts => ["http://10.192.31.160:9200", "http://10.192.31.161:9200", "http://10.192.31.162:9200"]
     index => "%{[@metadata][index]}"
     timeout => 300
   }
}

3. Start Logstash

systemctl enable logstash
systemctl start logstash

Deploying Filebeat

1. Pull the Filebeat image: docker pull elastic/filebeat:7.10.2 (see https://www.elastic.co/guide/en/beats/filebeat/current/running-on-docker.html)

2. Filebeat configuration: create a config directory with mkdir -p /data/filebeat, then create the Filebeat config file at /data/filebeat/filebeat.docker.yml:

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /log/haihe/resource/resource.log
  multiline:
    pattern: '^\['
    negate:  true
    match:   after
  fields:
    log_topic: resource-log
    vip: 172.118.32.30
- type: log
  enabled: true
  paths:
    - /log/nova/nova-api.log
  exclude_files: ['.gz$']
  multiline:
    pattern: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}'
    negate:  true
    match:   after
  fields:
    log_topic: nova-api-log
    vip: 172.118.32.30
- type: log
  enabled: true
  paths:
    - /log/nova/nova-compute.log
  exclude_files: ['.gz$']
  multiline:
    pattern: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}'
    negate:  true
    match:   after
  fields:
    log_topic: nova-compute-log
    vip: 172.118.32.30
- type: log
  enabled: true
  paths:
    - /log/nova/nova-scheduler.log
  exclude_files: ['.gz$']
  multiline:
    pattern: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}'
    negate:  true
    match:   after
  fields:
    log_topic: nova-scheduler-log
    vip: 172.118.32.30
- type: log
  enabled: true
  paths:
    - /log/nova/nova-conductor.log
  exclude_files: ['.gz$']
  multiline:
    pattern: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}'
    negate:  true
    match:   after
  fields:
    log_topic: nova-conductor-log
    vip: 172.118.32.30
- type: log
  enabled: true
  paths:
  - /log/cinder/api.log
  exclude_files: ['.gz$']
  multiline:
    pattern: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}'
    negate:  true
    match:   after
  fields:
    log_topic: cinder-api-log
    vip: 172.118.32.30
- type: log
  enabled: true
  paths:
  - /log/cinder/volume.log
  exclude_files: ['.gz$']
  multiline:
    pattern: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}'
    negate:  true
    match:   after
  fields:
    log_topic: cinder-volume-log
    vip: 172.118.32.30
- type: log
  enabled: true
  paths:
  - /log/cinder/scheduler.log
  exclude_files: ['.gz$']
  multiline:
    pattern: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}'
    negate:  true
    match:   after
  fields:
    log_topic: cinder-scheduler-log
    vip: 172.118.32.30
- type: log
  enabled: true
  paths:
  - /log/neutron/server.log
  multiline:
    pattern: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}'
    negate:  true
    match:   after
  fields:
    log_topic: neutron-server-log
    vip: 172.118.32.30
- type: log
  enabled: true
  paths:
  - /log/neutron/openvswitch-agent.log
  multiline:
    pattern: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}'
    negate:  true
    match:   after
  fields:
    log_topic: openvswitch-agent-log
    vip: 172.118.32.30
- type: log
  enabled: true
  paths:
  - /log/glance/api.log
  multiline:
    pattern: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}'
    negate:  true
    match:   after
  fields:
    log_topic: glance-api-log
    vip: 172.118.32.30
- type: log
  enabled: true
  paths:
  - /log/glance/registry.log
  multiline:
    pattern: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}'
    negate:  true
    match:   after
  fields:
    log_topic: glance-registry-log
    vip: 172.118.32.30
- type: log
  enabled: true
  paths:
  - /log/keystone/keystone.log
  multiline:
    pattern: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}'
    negate:  true
    match:   after
  fields:
    log_topic: keystone-log
    vip: 172.118.32.30
output.kafka:
  enabled: true
  hosts: ['10.192.31.163:9092']
  topic: '%{[fields.log_topic]}'
  codec.json:
    pretty: false
  partition.round_robin:
    reachable_only: false
  required_acks: 1
  compression: gzip

Parameter notes:

1. multiline: merges multi-line log entries into a single event
2. log_topic: the Kafka topic the event is published to
3. vip: the cluster VIP, which we will use later
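Before launching the collector for real, the file can be validated with Filebeat's built-in check; a sketch reusing the same image and mount:

docker run --rm \
  --volume="/data/filebeat/filebeat.docker.yml:/usr/share/filebeat/filebeat.yml:ro" \
  elastic/filebeat:7.10.2 test config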

3. Start the container

docker run -d \
  --restart=always \
  --log-driver json-file \
  --name=filebeat \
  --user=root \
  --volume="/data/filebeat/filebeat.docker.yml:/usr/share/filebeat/filebeat.yml:ro" \
  --volume="/var/log/:/log/" \
  --volume="/var/lib/docker/containers:/var/lib/docker/containers:ro" \
  --volume="/var/run/docker.sock:/var/run/docker.sock:ro" \
  elastic/filebeat:7.10.2 filebeat -e --strict.perms=false

Deploying LogChainAnalysis

Clone the project from GitHub and run it:

git clone https://github.com/zelat/LogChainAnalysis
cd LogChainAnalysis
nohup python3 manage.py >/dev/null 2>&1 &

Parsing Spring Boot logs with Logstash

We first look at the Spring Boot (logback) log output. The following grok pattern matches the logback print format:

(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}T%{TIME})%{SPACE}*%{LOGLEVEL:level}%{SPACE}*%{NOTSPACE:module}\[%{UUID:uuid}\]-+(?<logmessage>.*)

In logback, %-5level means the log level is printed left-aligned with a minimum width of 5, which pads short levels with spaces.

That is why the pattern uses %{SPACE}* between fields: it matches an indeterminate number of spaces.

We need to cut each log line into the following fields:

| Name | Description |
| --- | --- |
| vip | cluster VIP address |
| @timestamp | timestamp |
| level | log level (typically INFO, DEBUG, ERROR, etc.) |
| module | module name |
| uuid | upper-layer (cloud platform) resource ID |
| logmessage | the message body |

Parsing OpenStack logs with Logstash

Split the OpenStack logs to extract vip, @timestamp, level, request_id, project_id, user_id, module, and logmessage:

| Name | Description |
| --- | --- |
| vip | cluster VIP address |
| @timestamp | timestamp |
| level | log level (typically INFO, DEBUG, ERROR, etc.) |
| request_id | OpenStack request ID |
| project_id | OpenStack project ID |
| user_id | OpenStack user ID |
| module | module name |
| logmessage | the log message body |

Feeding the data into LogChainAnalysis

1. The LogChainAnalysis system

vip: enter the cluster VIP address

UUID: the cloud management platform resource ID

2. Obtaining the log chain

To explain what this JSON output means: the cloud-side UUID maps to the underlying request ID req-d9e461b1-860e-4b50-9d5a-55b66371032a, which appears in the nova-api, nova-compute, nova-conductor, and nova-scheduler logs at the same time. nova-compute in turn called services of other components, such as cinder-api, neutron-server, and cinder-volume, and those components return their own request IDs to nova-compute.

3. Viewing a specific log

Click a log name to see the detailed log entries.

Demo


Problems encountered

1. Filebeat keeps printing [publisher] pipeline/retry.go:219 retryer: send unwait signal to consumer. Cause: Filebeat probably cannot connect to Kafka. Edit Kafka's server.properties and set advertised.listeners to the internal IP of the Kafka host: advertised.listeners=PLAINTEXT://192.168.1.142:9092

2. Log volume is too large. Add the following to the Logstash config to drop any event without a request_id: if ![request_id] { drop {} }

3. Returning the host.ip of the Filebeat machine along with each event. Add the following to the Filebeat config file:

processors:
  - add_host_metadata: ~
  - drop_fields:
      fields: ["host.architecture", "host.containerized", "host.id", "host.os.name", "host.os.family", "host.os.version", "host.os.kernel"]

4. The Elasticsearch cluster fails to start with failed to send join request to master: https://blog.csdn.net/diyiday/article/details/83926488

5. Fixing master_not_discovered_exception after the ES cluster starts: https://yanglinwei.blog.csdn.net/article/details/105274464

  12. Monitoring Java applications with Elastic: Multiservice traces and correlated logs