
SkyWalking APM: The Definitive Guide

Published: 2023-12-16 11:34:46


 

[Figure: SkyWalking 6.x architecture diagram]

 

 

 

 

Server side

Requirements

  1. Linux (CentOS 7.x)
  2. JDK 1.8

 

 

Configuration files

 

alarm-settings.yml

 

rules:
  # The rule name must be unique and must end with `_rule`.
  endpoint_percent_rule:
    # The metrics value must be a long, double or int.
    metrics-name: endpoint_percent
    threshold: 75
    op: <
    # The length of time over which the metrics are evaluated
    period: 10
    # How many times the metrics must match the condition before an alarm is triggered
    count: 3
    # How many checks the alarm stays silent for after it has been triggered; defaults to the period.
    silence-period: 10

  service_percent_rule:
    metrics-name: service_percent
    # [Optional] By default, all services in this metrics are matched
    include-names:
      - service_a
      - service_b
    threshold: 85
    op: <
    period: 10
    count: 4

 

The default alarm-settings.yml shipped with the distribution includes the following rules:

  1. Service average response time over 1s in the last 3 minutes.
  2. Service success rate lower than 80% in the last 2 minutes.
  3. Service 90% response time over 1s in the last 3 minutes.
  4. Service instance average response time over 1s in the last 2 minutes.
  5. Endpoint average response time over 1s in the last 2 minutes.
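
To make the interplay of threshold, op, period, count and silence-period in alarm-settings.yml more concrete, here is a minimal Python sketch of how such a rule could be evaluated. It is a simplified model for illustration, not SkyWalking's implementation: the class name AlarmRule, the method check, and the assumption of one metric value per minute are all hypothetical.

```python
import operator
from collections import deque

# Comparison operators that a rule's `op` field may name (assumed set).
OPS = {">": operator.gt, "<": operator.lt, "=": operator.eq}


class AlarmRule:
    """Simplified model of one alarm rule (hypothetical, for illustration)."""

    def __init__(self, threshold, op, period, count, silence_period=None):
        self.threshold = threshold
        self.matches = OPS[op]
        self.count = count
        # silence-period defaults to the same value as period.
        self.silence_period = period if silence_period is None else silence_period
        # Sliding window holding the last `period` metric values.
        self.window = deque(maxlen=period)
        self.silence_left = 0

    def check(self, value):
        """Feed one metric value; return True if the alarm fires on this check."""
        self.window.append(value)
        if self.silence_left > 0:
            # Still inside the silence period: suppress any alarm.
            self.silence_left -= 1
            return False
        hits = sum(1 for v in self.window if self.matches(v, self.threshold))
        if hits >= self.count:
            self.silence_left = self.silence_period
            return True
        return False


# Values mirror endpoint_percent_rule: alarm when the value is below 75
# at least 3 times within a 10-check window.
rule = AlarmRule(threshold=75, op="<", period=10, count=3, silence_period=10)
fired = [rule.check(v) for v in [80, 70, 60, 50, 40]]
print(fired)
```

In this model the alarm fires on the fourth value (the third one below 75) and the fifth value is suppressed by the silence period.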

application.yml

 

receiver-register:
  default:
receiver-trace:
  default:
    bufferPath: ../trace-buffer/  # Path to trace buffer files, suggest to use absolute path
    bufferOffsetMaxFileSize: 100 # Unit is MB
    bufferDataMaxFileSize: 500 # Unit is MB
    bufferFileCleanWhenRestart: false
    sampleRate: ${SW_TRACE_SAMPLE_RATE:1000} # The sample rate precision is 1/10000. 10000 means 100% sample in default.
receiver-jvm:
  default:
service-mesh:
  default:
    bufferPath: ../mesh-buffer/  # Path to trace buffer files, suggest to use absolute path
    bufferOffsetMaxFileSize: 100 # Unit is MB
    bufferDataMaxFileSize: 500 # Unit is MB
    bufferFileCleanWhenRestart: false
istio-telemetry:
  default:
envoy-metric:
  default:
receiver_zipkin:
  default:
    host: 0.0.0.0
    port: 9411
    contextPath: /
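
The values in application.yml use the form ${NAME:default}: the environment variable NAME is used when it is set, otherwise the default after the colon applies. The Python sketch below illustrates that lookup. The function name resolve and the regular expression are assumptions made for this example, and edge cases (for instance an environment variable set to an empty string) may be handled differently by SkyWalking itself.

```python
import os
import re

# Matches ${NAME:default}; the default may itself contain colons
# (for example "localhost:9200").
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*):([^}]*)\}")


def resolve(value, env=None):
    """Replace every ${NAME:default} in `value` with env[NAME] or the default."""
    env = os.environ if env is None else env
    return PLACEHOLDER.sub(lambda m: env.get(m.group(1), m.group(2)), value)


# No override set: the default after the colon is used.
print(resolve("${SW_TRACE_SAMPLE_RATE:1000}", env={}))
# Override supplied through the (simulated) environment.
print(resolve("${SW_TRACE_SAMPLE_RATE:1000}", env={"SW_TRACE_SAMPLE_RATE": "5000"}))
```

Passing env explicitly keeps the example deterministic; calling resolve without it reads the real process environment.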

 

TTL

Besides metadata, SkyWalking has two types of observability data.

  1. Record: includes traces and alarms, and possibly logs in the future.
  2. Metric: includes p99/p95/p90/p75/p50, heatmap, success rate, cpm (rpm) and so on. Metrics are split into minute/hour/day/month dimensions in storage, each in its own index or table.

 

 

# Set a timeout on metrics data. After the timeout has expired, the metrics data is deleted automatically.
    enableDataKeeperExecutor: ${SW_CORE_ENABLE_DATA_KEEPER_EXECUTOR:true} # Set to false to turn off automatic deletion of metrics data.
    dataKeeperExecutePeriod: ${SW_CORE_DATA_KEEPER_EXECUTE_PERIOD:5} # How often the data keeper executor runs, unit is minute
    recordDataTTL: ${SW_CORE_RECORD_DATA_TTL:90} # Unit is minute
    minuteMetricsDataTTL: ${SW_CORE_MINUTE_METRIC_DATA_TTL:90} # Unit is minute
    hourMetricsDataTTL: ${SW_CORE_HOUR_METRIC_DATA_TTL:36} # Unit is hour
    dayMetricsDataTTL: ${SW_CORE_DAY_METRIC_DATA_TTL:45} # Unit is day
    monthMetricsDataTTL: ${SW_CORE_MONTH_METRIC_DATA_TTL:18} # Unit is month
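
Each TTL above uses a different unit, which is easy to misread. The following Python sketch converts a TTL value and its unit into the timestamp before which data would be considered expired. The function name cutoff is hypothetical, and the exact boundary handling (in particular rounding month TTLs to the first day of a month) is an assumption of this sketch rather than documented SkyWalking behavior.

```python
from datetime import datetime, timedelta


def cutoff(now, ttl, unit):
    """Return the point in time before which data of this TTL is expired.

    `unit` is one of "minute", "hour", "day" or "month".
    """
    if unit == "month":
        # timedelta has no month unit, so count whole months by hand and
        # snap to the first day of the resulting month (an assumption).
        total = now.year * 12 + (now.month - 1) - ttl
        return datetime(total // 12, total % 12 + 1, 1)
    # timedelta accepts minutes=, hours= and days= keyword arguments.
    return now - timedelta(**{unit + "s": ttl})


now = datetime(2020, 1, 15, 12, 0)
print(cutoff(now, 90, "minute"))  # recordDataTTL / minuteMetricsDataTTL
print(cutoff(now, 36, "hour"))    # hourMetricsDataTTL
print(cutoff(now, 45, "day"))     # dayMetricsDataTTL
print(cutoff(now, 18, "month"))   # monthMetricsDataTTL
```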

 

 

ElasticSearch 6 storage TTL

The Elasticsearch storage module has the following settings:

    # Those data TTL settings will override the same settings in core module.
    recordDataTTL: ${SW_STORAGE_ES_RECORD_DATA_TTL:7} # Unit is day
    otherMetricsDataTTL: ${SW_STORAGE_ES_OTHER_METRIC_DATA_TTL:45} # Unit is day
    monthMetricsDataTTL: ${SW_STORAGE_ES_MONTH_METRIC_DATA_TTL:18} # Unit is month

  • recordDataTTL affects Record data.
  • otherMetricsDataTTL affects the minute/hour/day dimensions of metrics. minuteMetricsDataTTL, hourMetricsDataTTL and dayMetricsDataTTL still exist, but their unit also changes to day. To set them individually, remove otherMetricsDataTTL.
  • monthMetricsDataTTL affects the month dimension of metrics.
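
The precedence described in the list above can be sketched in a few lines of Python. This is only a reading of the rule stated here (otherMetricsDataTTL covers the minute/hour/day dimensions unless it is removed); the function name effective_es_ttl_days and the dictionary-based settings are hypothetical.

```python
def effective_es_ttl_days(settings):
    """Return the TTL in days for the minute/hour/day metric dimensions.

    `settings` is a plain dict of Elasticsearch storage options.
    """
    other = settings.get("otherMetricsDataTTL")
    if other is not None:
        # One value covers all three dimensions.
        return {"minute": other, "hour": other, "day": other}
    # otherMetricsDataTTL removed: fall back to the individual settings,
    # whose unit is day in the Elasticsearch storage.
    return {
        "minute": settings["minuteMetricsDataTTL"],
        "hour": settings["hourMetricsDataTTL"],
        "day": settings["dayMetricsDataTTL"],
    }


print(effective_es_ttl_days({"otherMetricsDataTTL": 45}))
print(effective_es_ttl_days(
    {"minuteMetricsDataTTL": 2, "hourMetricsDataTTL": 7, "dayMetricsDataTTL": 45}
))
```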

Storage

Natively supported storage

  • H2
  • ElasticSearch 6
  • MySQL
  • TiDB

ElasticSearch 6

 

storage:
  elasticsearch:
    # nameSpace: ${SW_NAMESPACE:""}
    # user: ${SW_ES_USER:""} # User needs to be set when Http Basic authentication is enabled
    # password: ${SW_ES_PASSWORD:""} # Password to be set when Http Basic authentication is enabled
    #trustStorePath: ${SW_SW_STORAGE_ES_SSL_JKS_PATH:""}
    #trustStorePass: ${SW_SW_STORAGE_ES_SSL_JKS_PASS:""}
    clusterNodes: ${SW_STORAGE_ES_CLUSTER_NODES:localhost:9200}
    protocol: ${SW_STORAGE_ES_HTTP_PROTOCOL:"http"}
    indexShardsNumber: ${SW_STORAGE_ES_INDEX_SHARDS_NUMBER:2}
    indexReplicasNumber: ${SW_STORAGE_ES_INDEX_REPLICAS_NUMBER:0}
    # Those data TTL settings will override the same settings in core module.
    recordDataTTL: ${SW_STORAGE_ES_RECORD_DATA_TTL:7} # Unit is day
    otherMetricsDataTTL: ${SW_STORAGE_ES_OTHER_METRIC_DATA_TTL:45} # Unit is day
    monthMetricsDataTTL: ${SW_STORAGE_ES_MONTH_METRIC_DATA_TTL:18} # Unit is month
    # Batch process setting, refer to https://www.elastic.co/guide/en/elasticsearch/client/java-api/5.5/java-docs-bulk-processor.html
    bulkActions: ${SW_STORAGE_ES_BULK_ACTIONS:2000} # Execute the bulk every 2000 requests
    bulkSize: ${SW_STORAGE_ES_BULK_SIZE:20} # flush the bulk every 20mb
    flushInterval: ${SW_STORAGE_ES_FLUSH_INTERVAL:10} # flush the bulk every 10 seconds whatever the number of requests
    concurrentRequests: ${SW_STORAGE_ES_CONCURRENT_REQUESTS:2} # the number of concurrent requests
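
The three batch settings bulkActions, bulkSize and flushInterval act as alternative triggers: a bulk request is sent as soon as any one of them is reached. The Python sketch below models that decision. It is a simplified illustration, not the Elasticsearch client code: the function should_flush and its parameter names are hypothetical, and the time-based trigger is approximated as "seconds since the last flush".

```python
def should_flush(pending_actions, pending_bytes, seconds_since_flush,
                 bulk_actions=2000, bulk_size_mb=20, flush_interval=10):
    """Return True when any of the three bulk thresholds has been reached."""
    return (
        pending_actions >= bulk_actions                 # bulkActions
        or pending_bytes >= bulk_size_mb * 1024 * 1024  # bulkSize, in MB
        or seconds_since_flush >= flush_interval        # flushInterval, in seconds
    )


# 500 small requests after 3 seconds: nothing triggers yet.
print(should_flush(500, 1_000_000, 3))
# Same buffer after 10 seconds: the interval forces a flush.
print(should_flush(500, 1_000_000, 10))
```

Raising bulkActions or bulkSize makes each bulk request larger and less frequent, while flushInterval bounds how long buffered data can wait under light load.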

 

sampleRate

receiver-trace:
  default:
    bufferPath: ../trace-buffer/  # Path to trace buffer files, suggest to use absolute path
    bufferOffsetMaxFileSize: 100 # Unit is MB
    bufferDataMaxFileSize: 500 # Unit is MB
    bufferFileCleanWhenRestart: false
    sampleRate: ${SW_TRACE_SAMPLE_RATE:1000} # The sample rate precision is 1/10000. 10000 means 100% sample in default.
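
Because the sample rate precision is 1/10000, the configured value is easier to reason about as a percentage. A small Python sketch of the conversion follows; the function name sample_percentage is hypothetical.

```python
def sample_percentage(sample_rate):
    """Convert a sampleRate value (precision 1/10000) to a percentage."""
    if not 0 <= sample_rate <= 10000:
        raise ValueError("sampleRate must be between 0 and 10000")
    return sample_rate * 100 / 10000


print(sample_percentage(1000))   # the value configured above
print(sample_percentage(10000))  # 10000 means 100% sampling
```

With the value 1000 used in this configuration, 10% of traces are sampled.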

 

 

Client side

Requirements

  1. The agent is available for JDK 6 - 12.
  2. Find the agent folder in the SkyWalking release package.
  3. Set agent.service_name in config/agent.config. It can be any string in English.
  4. Set collector.backend_service in config/agent.config. The default points to 127.0.0.1:11800, which only works for a local backend.
  5. Add -javaagent:/path/to/skywalking-package/agent/skywalking-agent.jar to the JVM arguments, and make sure it comes before the -jar argument.
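
The ordering requirement in step 5 is the part most often gotten wrong, so here is a minimal Python sketch that assembles the launch command with -javaagent placed before -jar. The helper build_java_command and the application jar name app.jar are hypothetical; only the -javaagent path comes from the steps above.

```python
def build_java_command(agent_jar, app_jar, jvm_args=()):
    """Build a java command line with the agent option ahead of -jar."""
    return ["java", f"-javaagent:{agent_jar}", *jvm_args, "-jar", app_jar]


cmd = build_java_command(
    "/path/to/skywalking-package/agent/skywalking-agent.jar",
    "app.jar",  # hypothetical application jar
    jvm_args=["-Xmx512m"],
)
print(" ".join(cmd))
```

Anything placed after -jar app.jar is passed to the application as a program argument rather than to the JVM, which is why the agent option must come first.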

 

Configuration file

agent/config/agent.config

 

 

Property keys in agent.config:

  • agent.namespace
  • agent.service_name
  • agent.sample_n_per_3_secs
  • agent.authentication
  • agent.span_limit_per_segment
  • agent.ignore_suffix
  • agent.is_open_debugging_class
  • agent.active_v2_header
  • agent.instance_uuid
  • agent.instance_properties[key]=value
  • agent.cause_exception_depth
  • agent.active_v1_header
  • agent.cool_down_threshold
  • agent.force_reconnection_period
  • agent.operation_name_threshold
  • collector.grpc_channel_check_interval
  • collector.app_and_service_register_check_interval
  • collector.backend_service
  • collector.grpc_upstream_timeout
  • logging.level
  • logging.file_name
  • logging.output
  • logging.dir
  • logging.pattern
  • logging.max_file_size
  • logging.max_history_files
  • jvm.buffer_size
  • buffer.channel_size
  • buffer.buffer_size
  • dictionary.service_code_buffer_size
  • dictionary.endpoint_name_buffer_size
  • plugin.peer_max_length
  • plugin.mongodb.trace_param
  • plugin.mongodb.filter_length_limit
  • plugin.elasticsearch.trace_dsl
  • plugin.springmvc.use_qualified_name_as_endpoint_name
  • plugin.toolit.use_qualified_name_as_operation_name
  • plugin.mysql.trace_sql_parameters
  • plugin.mysql.sql_parameters_max_length
  • plugin.postgresql.sql_parameters_max_length
  • plugin.solrj.trace_statement
  • plugin.solrj.trace_ops_params
  • plugin.light4j.trace_handler_chain
  • plugin.opgroup.*

 

 

 

 

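Success story

Intelligent log management platform https://developer.qiniu.com/insight

The Pandora intelligent log management platform is a one-stop log data management platform. It offers unified log storage, real-time search, query and analysis, and monitoring and alerting. It also provides compute engines (stream and batch processing) for further analysis of the data, and supports machine-learning features such as anomaly detection and prediction. It helps users improve operations and business efficiency and quickly find and locate problems, and is widely used for online service monitoring, operations troubleshooting, security auditing, and user behavior analysis.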

References

  1. https://blog.csdn.net/gzy11/article/details/86679473#1322_mysql_175
  2. https://blog.csdn.net/gzy11/article/details/86679585#_4
  3. https://developer.qiniu.com/insight/manual/5435/skywalking-tracking-tomcat-services