In the previous chapter we introduced the data-generation side of the OpenTelemetry client. That data must ultimately be sent to a server for unified collection and aggregation, so that we can see complete traces, metrics, and other signals. In this chapter we therefore focus on the collection capabilities of the server side.
Client data reporting
The client generates traces, metrics, logs and other information according to certain rules, and then pushes them to a remote server. Generally speaking, OpenTelemetry standardizes client-server data transfer on the HTTP and gRPC protocols, and implementations of both should be included in each language's SDK as well as on the server side.
As you would expect, the volume of trace data is huge, so the client side offers Batch-style options. Such an option merges multiple Spans and sends them together, reducing network overhead.
We collectively refer to this client-side data reporting as export, and the components that implement it are collectively called exporters. Exporters support different data protocols and formats; the default format is OTLP.
OTLP stands for OpenTelemetry Protocol, the OpenTelemetry data protocol. The OTLP specification defines the encoding, transport, and delivery of telemetry data between the client and the collection service.
OTLP is divided into OTLP/gRPC and OTLP/HTTP in terms of implementation.
OTLP/HTTP supports two payload encodings: binary and JSON.
Binary payloads use the proto3 encoding standard and must be sent with the request header Content-Type: application/x-protobuf.
JSON payloads use the JSON Mapping defined by the proto3 specification to handle the mapping between Protobuf and JSON.
Normal requests: once the connection between the client and the server is established, the client can send requests continuously and the server responds to each one in turn.
Concurrent requests: the client may send the next request before the server has responded to the previous one, increasing concurrency.
Introduction to Collector
OpenTelemetry provides an open source Collector to receive, collect, process, and export client data. The otel collector is a "universal" collector that supports multiple protocols and data sources; it can be said to directly support just about any data source you can think of.
The otel collector is implemented in Go, and an rc of version 1.0.0 had been released at the time of writing. The Collector is split into two projects: opentelemetry-collector and opentelemetry-collector-contrib. opentelemetry-collector is the core project, implementing the collector's basic mechanisms and a set of basic components, while opentelemetry-collector-contrib holds a large number of additional components that, for various reasons, are not convenient to integrate directly into the core collector, so a separate project was created for them. Our subsequent introduction and verification of collector features will be based on opentelemetry-collector-contrib.
The composition of the otel collector is very clear, divided into: Receiver, Processor, Exporter, Extension, and Service.
A sample of the entire configuration file is as follows:
```yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  jaeger:
    endpoint: localhost:14250
    tls:
      insecure: true
  logging:
    loglevel: debug

processors:
  batch:

extensions:
  health_check:
  pprof:
  zpages:

service:
  extensions: [pprof, zpages, health_check]
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [jaeger, logging]
      processors: [batch]
```
This is a configuration I used for local testing. It is very simple: it receives OTLP data over HTTP/gRPC, batches it, and then exports it to the console log and to Jaeger. Once the individual data sources and plug-ins have been configured, the ones actually used are wired together in the pipeline.
Receiver refers to the receiving side, i.e. how the collector receives incoming data. A Receiver can support multiple data sources as well as both pull and push modes.
```yaml
receivers:
  # Data sources: logs
  fluentforward:
    endpoint: 0.0.0.0:8006

  # Data sources: metrics
  hostmetrics:
    scrapers:
      cpu:
      disk:
      filesystem:
      load:
      memory:
      network:
      process:
      processes:
      swap:

  # Data sources: traces
  jaeger:
    protocols:
      grpc:
      thrift_binary:
      thrift_compact:
      thrift_http:

  # Data sources: traces
  kafka:
    protocol_version: 2.0.0

  # Data sources: traces, metrics
  opencensus:

  # Data sources: traces, metrics, logs
  otlp:
    protocols:
      grpc:
      http:

  # Data sources: metrics
  prometheus:
    config:
      scrape_configs:
        - job_name: "otel-collector"
          scrape_interval: 5s
          static_configs:
            - targets: ["localhost:8888"]

  # Data sources: traces
  zipkin:
```
The receiver example above shows configurations for a variety of data sources.
A Processor is a plug-in that sits between the Receiver and the Exporter and processes the data. Multiple Processors can be configured; they are executed sequentially in the order they appear in the pipeline configuration.
The following are some sample Processor configurations:
```yaml
processors:
  # Data sources: traces
  attributes:
    actions:
      - key: environment
        value: production
        action: insert
      - key: db.statement
        action: delete
      - key: email
        action: hash

  # Data sources: traces, metrics, logs
  batch:

  # Data sources: metrics
  filter:
    metrics:
      include:
        match_type: regexp
        metric_names:
          - prefix/.*
          - prefix_.*

  # Data sources: traces, metrics, logs
  memory_limiter:
    check_interval: 5s
    limit_mib: 4000
    spike_limit_mib: 500

  # Data sources: traces
  resource:
    attributes:
      - key: cloud.zone
        value: "zone-1"
        action: upsert
      - key: k8s.cluster.name
        from_attribute: k8s-cluster
        action: insert
      - key: redundant-attribute
        action: delete

  # Data sources: traces
  probabilistic_sampler:
    hash_seed: 22
    sampling_percentage: 15

  # Data sources: traces
  span:
    name:
      to_attributes:
        rules:
          - ^\/api\/v1\/document\/(?P<documentId>.*)\/update$
      from_attributes: ["db.svc", "operation"]
      separator: "::"
```
Exporter refers to the exporting side, i.e. how the collector outputs data. An Exporter can support multiple destinations as well as both pull and push modes.
The following are some Exporter examples:
```yaml
exporters:
  # Data sources: traces, metrics, logs
  file:
    path: ./filename.json

  # Data sources: traces
  jaeger:
    endpoint: "jaeger-all-in-one:14250"
    tls:
      cert_file: cert.pem
      key_file: cert-key.pem

  # Data sources: traces
  kafka:
    protocol_version: 2.0.0

  # Data sources: traces, metrics, logs
  logging:
    loglevel: debug

  # Data sources: traces, metrics
  opencensus:
    endpoint: "otelcol2:55678"

  # Data sources: traces, metrics, logs
  otlp:
    endpoint: otelcol2:4317
    tls:
      cert_file: cert.pem
      key_file: cert-key.pem

  # Data sources: traces, metrics
  otlphttp:
    endpoint: https://example.com:4318/v1/traces

  # Data sources: metrics
  prometheus:
    endpoint: "prometheus:8889"
    namespace: "default"

  # Data sources: metrics
  prometheusremotewrite:
    endpoint: "http://some.url:9411/api/prom/push"
    # For official Prometheus (e.g. running via Docker)
    # endpoint: 'http://prometheus:9090/api/v1/write'
    # tls:
    #   insecure: true

  # Data sources: traces
  zipkin:
    endpoint: "http://localhost:9411/api/v2/spans"
```
Extension is the collector's extension mechanism. Note that Extensions do not process otel data; they provide capabilities outside of telemetry itself, such as health checks, service discovery, compression algorithms, and so on.
Some Extension samples:
```yaml
extensions:
  health_check:
  pprof:
  zpages:
  memory_ballast:
    size_mib: 512
```
The configurations above declare the specific data sources and the plug-ins' own settings, but whether they actually take effect, and in what order, is configured in the Service section. It mainly includes the following items:
Extensions are configured as an array; their order does not matter:
```yaml
service:
  extensions: [health_check, pprof, zpages]
```
The pipelines configuration distinguishes traces, metrics, and logs; each can be configured with its own receivers, processors, and exporters, all as arrays. The processors array must be listed in the desired execution order, while the order of the others does not matter.
```yaml
service:
  pipelines:
    metrics:
      receivers: [opencensus, prometheus]
      exporters: [opencensus, prometheus]
    traces:
      receivers: [opencensus, jaeger]
      processors: [batch]
      exporters: [opencensus, zipkin]
```
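Because processors run in array order, their ordering matters in practice. For instance, the collector documentation generally recommends running memory_limiter before batch, so that memory is protected before data is buffered. A sketch, reusing the memory_limiter and batch processors from the earlier examples:

```yaml
service:
  pipelines:
    traces:
      receivers: [otlp]
      # processors run in array order: limit memory first, then batch
      processors: [memory_limiter, batch]
      exporters: [jaeger]
```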
The telemetry configuration is the configuration of the collector itself, mainly its own logs and metrics. The following configures the collector's own log level and the address on which its metrics are exposed:
```yaml
service:
  telemetry:
    logs:
      level: debug
      initial_fields:
        service: my-instance
    metrics:
      level: detailed
      address: 0.0.0.0:8888
```
If you want a customized Collector containing exactly the Receivers, Exporters, and so on that you need, the brute-force solution is to download the source code, set up a golang environment, modify the code to your needs, and compile it. This gives perfect customization, but it is cumbersome, especially for non-golang developers, for whom setting up a golang environment is a real hassle.
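As an alternative to editing the source directly, the community also provides the OpenTelemetry Collector Builder (ocb), which generates and compiles a custom collector from a manifest listing the components you want. A sketch of such a manifest (the distribution name and module versions below are illustrative, not prescriptive):

```yaml
# builder-config.yaml for the OpenTelemetry Collector Builder (ocb);
# component versions are illustrative
dist:
  name: my-otelcol
  output_path: ./dist

receivers:
  - gomod: go.opentelemetry.io/collector/receiver/otlpreceiver v0.50.0

processors:
  - gomod: go.opentelemetry.io/collector/processor/batchprocessor v0.50.0

exporters:
  - gomod: go.opentelemetry.io/collector/exporter/loggingexporter v0.50.0
```

This still requires a golang toolchain to compile the result, but it avoids maintaining a fork of the collector source.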
The collector is an important part of the whole tracing pipeline: all client data needs a unified collector to receive it and to perform cleaning and forwarding. The OpenTelemetry Collector has already done a great deal of work on compatibility and performance; we look forward to the official release of OpenTelemetry Collector version 1.0.0 as soon as possible.