SkyWalking APM: The Definitive Guide
SkyWalking 6.x architecture diagram
Server
Requirements
- Linux (CentOS 7.x)
- JDK 1.8
Configuration files
alarm-settings.yml
rules:
  # Rule unique name, must be ended with `_rule`.
  endpoint_percent_rule:
    # Metrics value need to be long, double or int
    metrics-name: endpoint_percent
    threshold: 75
    op: <
    # The length of time to evaluate the metrics
    period: 10
    # How many times after the metrics match the condition, will trigger alarm
    count: 3
    # How many times of checks, the alarm keeps silence after alarm triggered, default as same as period.
    silence-period: 10
  service_percent_rule:
    metrics-name: service_percent
    # [Optional] Default, match all services in this metrics
    include-names:
      - service_a
      - service_b
    threshold: 85
    op: <
    period: 10
    count: 4
The default alarm-settings.yml shipped with the distribution includes the following rules:
- Service average response time over 1s in the last 3 minutes.
- Service success rate lower than 80% in the last 2 minutes.
- Service 90% response time over 1s in the last 3 minutes.
- Service instance average response time over 1s in the last 2 minutes.
- Endpoint average response time over 1s in the last 2 minutes.
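One way to read the period, count and silence-period fields of a rule is as a sliding-window check over per-minute metric values. The sketch below is a simplified model written for illustration, not SkyWalking's actual alarm implementation. The AlarmRule class and its check method are hypothetical names, and the model assumes one metric value arrives per minute.

```python
from collections import deque
from dataclasses import dataclass, field
import operator

# Only the two comparison operators needed for this sketch.
OPS = {"<": operator.lt, ">": operator.gt}


@dataclass
class AlarmRule:
    """Hypothetical, simplified model of one entry under `rules:`."""
    threshold: float
    op: str
    period: int            # window length, in minutes
    count: int             # matches needed inside the window
    silence_period: int    # checks to stay quiet after firing
    _window: deque = field(default_factory=deque, repr=False)
    _silence_left: int = 0

    def check(self, value: float) -> bool:
        """Feed one per-minute value; return True if an alarm fires."""
        self._window.append(OPS[self.op](value, self.threshold))
        if len(self._window) > self.period:
            self._window.popleft()      # keep only the last `period` checks
        if self._silence_left > 0:
            self._silence_left -= 1     # still silenced after an earlier alarm
            return False
        if sum(self._window) >= self.count:
            self._silence_left = self.silence_period
            return True
        return False


# Values taken from endpoint_percent_rule: alarm when the value is < 75.
rule = AlarmRule(threshold=75, op="<", period=10, count=3, silence_period=10)
fired = [rule.check(v) for v in [90, 70, 60, 50, 40]]
print(fired)
```

In this model the third value below 75 fires the alarm, and the next check is suppressed by the silence period.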
application.yml
receiver-register:
  default:
receiver-trace:
  default:
    bufferPath: ../trace-buffer/ # Path to trace buffer files, suggest to use absolute path
    bufferOffsetMaxFileSize: 100 # Unit is MB
    bufferDataMaxFileSize: 500 # Unit is MB
    bufferFileCleanWhenRestart: false
    sampleRate: ${SW_TRACE_SAMPLE_RATE:1000} # The sample rate precision is 1/10000. 10000 means 100% sample in default.
receiver-jvm:
  default:
service-mesh:
  default:
    bufferPath: ../mesh-buffer/ # Path to trace buffer files, suggest to use absolute path
    bufferOffsetMaxFileSize: 100 # Unit is MB
    bufferDataMaxFileSize: 500 # Unit is MB
    bufferFileCleanWhenRestart: false
istio-telemetry:
  default:
envoy-metric:
  default:
receiver_zipkin:
  default:
    host: 0.0.0.0
    port: 9411
    contextPath: /
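Several values in these files use the form ${NAME:default}: the value comes from the environment variable NAME when it is set and falls back to the default otherwise. The following is a minimal sketch of that convention, not SkyWalking's own configuration loader. The resolve function is a hypothetical helper written for this example.

```python
import os
import re

# Matches ${NAME:default}; the default may be empty and may itself contain ':'.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*):([^}]*)\}")


def resolve(value: str, env=None) -> str:
    """Replace each ${NAME:default} with env[NAME] if set, else the default."""
    env = os.environ if env is None else env
    return _PLACEHOLDER.sub(lambda m: env.get(m.group(1), m.group(2)), value)


# No override in the environment: the default after the colon is used.
print(resolve("${SW_TRACE_SAMPLE_RATE:1000}", env={}))
# Override present: the environment value wins.
print(resolve("${SW_TRACE_SAMPLE_RATE:1000}", env={"SW_TRACE_SAMPLE_RATE": "10000"}))
```

This is why a setting can be changed per deployment by exporting the variable before starting the backend, without editing application.yml.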
TTL
Besides metadata, SkyWalking stores two types of observability data:
- Record: traces and alarms. Logs may be added in the future.
- Metric: values such as p99/p95/p90/p75/p50, heatmap, success rate and cpm (rpm). Metrics are stored separately per minute/hour/day/month dimension, each in its own index or table.
# Set a timeout on metrics data. After the timeout has expired, the metrics data will automatically be deleted.
enableDataKeeperExecutor: ${SW_CORE_ENABLE_DATA_KEEPER_EXECUTOR:true} # Set to false to turn off automatic deletion of metrics data.
dataKeeperExecutePeriod: ${SW_CORE_DATA_KEEPER_EXECUTE_PERIOD:5} # How often the data keeper executor runs, unit is minute
recordDataTTL: ${SW_CORE_RECORD_DATA_TTL:90} # Unit is minute
minuteMetricsDataTTL: ${SW_CORE_MINUTE_METRIC_DATA_TTL:90} # Unit is minute
hourMetricsDataTTL: ${SW_CORE_HOUR_METRIC_DATA_TTL:36} # Unit is hour
dayMetricsDataTTL: ${SW_CORE_DAY_METRIC_DATA_TTL:45} # Unit is day
monthMetricsDataTTL: ${SW_CORE_MONTH_METRIC_DATA_TTL:18} # Unit is month
ElasticSearch 6 storage TTL
The Elasticsearch storage module has its own TTL settings:
# Those data TTL settings will override the same settings in core module.
recordDataTTL: ${SW_STORAGE_ES_RECORD_DATA_TTL:7} # Unit is day
otherMetricsDataTTL: ${SW_STORAGE_ES_OTHER_METRIC_DATA_TTL:45} # Unit is day
monthMetricsDataTTL: ${SW_STORAGE_ES_MONTH_METRIC_DATA_TTL:18} # Unit is month
- recordDataTTL affects Record data.
- otherMetricsDataTTL affects the minute/hour/day dimensions of metrics. minuteMetricsDataTTL, hourMetricsDataTTL and dayMetricsDataTTL still exist, but their unit changes to day as well. To set them individually, remove otherMetricsDataTTL.
- monthMetricsDataTTL affects month dimension of metrics.
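The three rules above can be sketched as a small lookup. This is an illustration of the rules as listed here, not code from SkyWalking. The function name effective_es_ttl and the plain dict of settings are assumptions made for the example.

```python
def effective_es_ttl(settings: dict) -> dict:
    """Return the TTL applied to each kind of data under the rules listed above.

    Records and minute/hour/day metrics are in days; month metrics are in months.
    """
    other = settings.get("otherMetricsDataTTL")
    result = {
        "record": settings["recordDataTTL"],
        "month": settings["monthMetricsDataTTL"],
    }
    for dim in ("minute", "hour", "day"):
        # otherMetricsDataTTL, when present, covers all three dimensions;
        # otherwise each dimension uses its own setting (unit: day).
        result[dim] = other if other is not None else settings[f"{dim}MetricsDataTTL"]
    return result


# Default values from the Elasticsearch storage settings shown above.
defaults = {"recordDataTTL": 7, "otherMetricsDataTTL": 45, "monthMetricsDataTTL": 18}
print(effective_es_ttl(defaults))
```

With the defaults, records are kept for 7 days, minute/hour/day metrics for 45 days, and month metrics for 18 months.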
Storage
Natively supported storage
- H2
- ElasticSearch 6
- MySQL
- TiDB
ElasticSearch 6
storage:
  elasticsearch:
    # nameSpace: ${SW_NAMESPACE:""}
    # user: ${SW_ES_USER:""} # User needs to be set when Http Basic authentication is enabled
    # password: ${SW_ES_PASSWORD:""} # Password to be set when Http Basic authentication is enabled
    #trustStorePath: ${SW_SW_STORAGE_ES_SSL_JKS_PATH:""}
    #trustStorePass: ${SW_SW_STORAGE_ES_SSL_JKS_PASS:""}
    clusterNodes: ${SW_STORAGE_ES_CLUSTER_NODES:localhost:9200}
    protocol: ${SW_STORAGE_ES_HTTP_PROTOCOL:"http"}
    indexShardsNumber: ${SW_STORAGE_ES_INDEX_SHARDS_NUMBER:2}
    indexReplicasNumber: ${SW_STORAGE_ES_INDEX_REPLICAS_NUMBER:0}
    # Those data TTL settings will override the same settings in core module.
    recordDataTTL: ${SW_STORAGE_ES_RECORD_DATA_TTL:7} # Unit is day
    otherMetricsDataTTL: ${SW_STORAGE_ES_OTHER_METRIC_DATA_TTL:45} # Unit is day
    monthMetricsDataTTL: ${SW_STORAGE_ES_MONTH_METRIC_DATA_TTL:18} # Unit is month
    # Batch process setting, refer to https://www.elastic.co/guide/en/elasticsearch/client/java-api/5.5/java-docs-bulk-processor.html
    bulkActions: ${SW_STORAGE_ES_BULK_ACTIONS:2000} # Execute the bulk every 2000 requests
    bulkSize: ${SW_STORAGE_ES_BULK_SIZE:20} # flush the bulk every 20mb
    flushInterval: ${SW_STORAGE_ES_FLUSH_INTERVAL:10} # flush the bulk every 10 seconds whatever the number of requests
    concurrentRequests: ${SW_STORAGE_ES_CONCURRENT_REQUESTS:2} # the number of concurrent requests
sampleRate
receiver-trace:
  default:
    bufferPath: ../trace-buffer/ # Path to trace buffer files, suggest to use absolute path
    bufferOffsetMaxFileSize: 100 # Unit is MB
    bufferDataMaxFileSize: 500 # Unit is MB
    bufferFileCleanWhenRestart: false
    sampleRate: ${SW_TRACE_SAMPLE_RATE:1000} # The sample rate precision is 1/10000. 10000 means 100% sample in default.
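Because the precision is 1/10000, the configured number has to be divided by 10000 to get the share of traces that is kept. The helper below is a hypothetical function written only to make that arithmetic explicit. How closely the backend matches this fraction in practice depends on its sampling implementation.

```python
def sampled_fraction(sample_rate: int) -> float:
    """Convert a sampleRate setting (precision 1/10000) to a fraction of traces."""
    if not 0 <= sample_rate <= 10000:
        raise ValueError("sampleRate is expected to be between 0 and 10000")
    return sample_rate / 10000


print(sampled_fraction(1000))   # 0.1, i.e. 10% of traces
print(sampled_fraction(10000))  # 1.0, i.e. 100% of traces
```

The value 1000 shown in the configuration above therefore corresponds to sampling about 10% of traces unless SW_TRACE_SAMPLE_RATE overrides it.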
Client (agent)
Requirements
- The agent is available for JDK 6 to 12.
- Find the agent folder in the SkyWalking release package.
- Set agent.service_name in config/agent.config. It can be any string of English characters.
- Set collector.backend_service in config/agent.config. It defaults to 127.0.0.1:11800, which only works for a backend running on the same host.
- Add -javaagent:/path/to/skywalking-package/agent/skywalking-agent.jar to the JVM arguments, and make sure it comes before the -jar argument.
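The ordering requirement in the last step is easy to get wrong in launch scripts. As a small sketch, the hypothetical helper below assembles the argument list so that -javaagent always precedes -jar. The jar name app.jar is a placeholder, and the agent path is the one from the step above.

```python
def java_command(agent_jar: str, app_jar: str, *app_args: str) -> list:
    """Build a java command line with -javaagent placed before -jar."""
    return ["java", f"-javaagent:{agent_jar}", "-jar", app_jar, *app_args]


cmd = java_command(
    "/path/to/skywalking-package/agent/skywalking-agent.jar",
    "app.jar",  # hypothetical application jar
)
print(" ".join(cmd))
```

Anything placed after the jar name is passed to the application as a program argument, which is why the agent option must come first.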
Configuration file
agent/config/agent.config
| property key | Description | Default |
| agent.namespace | | |
| agent.service_name | | |
| agent.sample_n_per_3_secs | | |
| agent.authentication | | |
| agent.span_limit_per_segment | | |
| agent.ignore_suffix | | |
| agent.is_open_debugging_class | | |
| agent.active_v2_header | | |
| agent.instance_uuid | | |
| agent.instance_properties[key]=value | | |
| agent.cause_exception_depth | | |
| agent.active_v1_header | | |
| agent.cool_down_threshold | | |
| agent.force_reconnection_period | | |
| agent.operation_name_threshold | | |
| collector.grpc_channel_check_interval | | |
| collector.app_and_service_register_check_interval | | |
| collector.backend_service | | |
| collector.grpc_upstream_timeout | | |
| logging.level | | |
| logging.file_name | | |
| logging.output | | |
| logging.dir | | |
| logging.pattern | | |
| logging.max_file_size | | |
| logging.max_history_files | | |
| jvm.buffer_size | | |
| buffer.channel_size | | |
| buffer.buffer_size | | |
| dictionary.service_code_buffer_size | | |
| dictionary.endpoint_name_buffer_size | | |
| plugin.peer_max_length | | |
| plugin.mongodb.trace_param | | |
| plugin.mongodb.filter_length_limit | | |
| plugin.elasticsearch.trace_dsl | | |
| plugin.springmvc.use_qualified_name_as_endpoint_name | | |
| plugin.toolit.use_qualified_name_as_operation_name | | |
| plugin.mysql.trace_sql_parameters | | |
| plugin.mysql.sql_parameters_max_length | | |
| plugin.postgresql.sql_parameters_max_length | | |
| plugin.solrj.trace_statement | | |
| plugin.solrj.trace_ops_params | | |
| plugin.light4j.trace_handler_chain | | |
| plugin.opgroup.* | | |
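Success story
Intelligent log management platform: https://developer.qiniu.com/insight
The Pandora intelligent log management platform is a one-stop platform for managing log data. It offers unified log storage, real-time search, query and analysis, and monitoring and alerting. It also provides compute engines (stream and batch) for further analysis of the data and supports machine-learning features such as anomaly detection and prediction. It helps users improve the efficiency of operations and business work and find and locate problems quickly, and it is widely used for online business monitoring, operations troubleshooting, security auditing and user business analysis.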
References
- https://blog.csdn.net/gzy11/article/details/86679473#1322_mysql_175
- https://blog.csdn.net/gzy11/article/details/86679585#_4
- https://developer.qiniu.com/insight/manual/5435/skywalking-tracking-tomcat-services