OpenTelemetry Series | Mysterious Collector - OpenTelemetry Collector

Foreword

In the last chapter we looked at how the OpenTelemetry client generates data, but that data must ultimately be sent to a server to be collected and integrated, so that we can see complete call chains, metrics, and other information. In this chapter we therefore focus on the collection capabilities of the server side.

Client data reporting

The client generates call chains, metrics, logs, and other telemetry according to certain rules, and then pushes them to a remote server. In general, OpenTelemetry standardizes client-to-server data transfer over the HTTP and gRPC protocols, and each language's SDK, as well as the server-side implementation, is expected to support both.

Since call-chain data tends to be voluminous, clients usually offer a Batch option that bundles multiple spans into a single request to reduce network overhead.

We refer to this client-side reporting collectively as export, and the components that implement it are collectively called exporters. Exporters support different protocols and formats; the default format is OTLP.
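How an exporter is pointed at the server is SDK-specific, but most SDKs honor the standard OTLP environment variables defined by the specification. A minimal sketch (the endpoint value is illustrative):

```shell
# Where the OTLP exporter sends data (4317 is the conventional OTLP/gRPC port)
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"
# Transport to use, e.g. grpc or http/protobuf
export OTEL_EXPORTER_OTLP_PROTOCOL="grpc"
```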

OTLP

OTLP stands for OpenTelemetry Protocol, the OpenTelemetry data protocol. The OTLP specification defines the encoding, transport, and delivery of telemetry data between the client and the collecting server.

In terms of implementation, OTLP is divided into OTLP/gRPC and OTLP/HTTP.

OTLP/HTTP

OTLP/HTTP supports two payload encodings: binary and JSON.

The binary encoding uses the proto3 wire format, and requests must carry the header Content-Type: application/x-protobuf.

The JSON encoding uses the JSON Mapping defined by the proto3 standard to handle the mapping between Protobuf and JSON.

OTLP/gRPC

OTLP/gRPC supports two request patterns:

  • Normal request: once the connection between client and server is established, the client sends requests continuously and the server responds to them one by one.
  • Concurrent requests: the client may send the next request before the server has responded to the previous one, increasing throughput.
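On the collector side, the corresponding OTLP receiver can expose both transports at once; the endpoints below are the conventional defaults (4317 for gRPC, 4318 for HTTP):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
```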

Collector

Introduction to Collector

OpenTelemetry provides an open source Collector to receive, process, and export client data. The otel collector is a "universal" collector that supports multiple protocols and data sources; it covers most of the data sources you are likely to need.

The otel collector is implemented in Go, and an rc of version 1.0.0 had been released by the time this article was written. The Collector is split into two projects: opentelemetry-collector and opentelemetry-collector-contrib. opentelemetry-collector is the core project, implementing the collector's basic mechanisms and a set of basic components, while opentelemetry-collector-contrib hosts a large number of additional components that, for various reasons, are inconvenient to integrate directly into the core collector, so a separate project was created to house them. The feature walkthroughs and verification below are based on opentelemetry-collector-contrib.

Using the Collector

The structure of the otel collector is very clear, divided into:

  • Receiver
  • Processor
  • Exporter
  • Extension
  • Service

A sample of the entire configuration file is as follows:

receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  jaeger:
    endpoint: localhost:14250
    tls:
      insecure: true
  logging:
    loglevel: debug

processors:
  batch:

extensions:
  health_check:
  pprof:
  zpages:

service:
  extensions: [pprof, zpages, health_check]
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [jaeger, logging]
      processors: [batch]

This is a simple configuration I used in a local test: it receives OTLP data over HTTP and gRPC, batches it, and then outputs it to both the console log and Jaeger. After defining the various data sources and plug-ins, we wire the ones we actually want to use into the pipeline.

Receiver

A Receiver defines how the collector receives data. A Receiver can support multiple data sources, and both pull and push modes.

receivers:
  # Data sources: logs
  fluentforward:
    endpoint: 0.0.0.0:8006

  # Data sources: metrics
  hostmetrics:
    scrapers:
      cpu:
      disk:
      filesystem:
      load:
      memory:
      network:
      process:
      processes:
      swap:

  # Data sources: traces
  jaeger:
    protocols:
      grpc:
      thrift_binary:
      thrift_compact:
      thrift_http:

  # Data sources: traces
  kafka:
    protocol_version: 2.0.0

  # Data sources: traces, metrics
  opencensus:

  # Data sources: traces, metrics, logs
  otlp:
    protocols:
      grpc:
      http:

  # Data sources: metrics
  prometheus:
    config:
      scrape_configs:
        - job_name: "otel-collector"
          scrape_interval: 5s
          static_configs:
            - targets: ["localhost:8888"]

  # Data sources: traces
  zipkin:

The above example shows a variety of receiver configurations for different data sources.
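Defining a receiver does not activate it by itself; it only takes effect once it is referenced in a pipeline under service (covered in the Service section). A minimal sketch wiring the hostmetrics receiver above into a metrics pipeline, assuming a logging exporter is defined elsewhere in the config:

```yaml
service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      exporters: [logging]
```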

Processor

A Processor is a plug-in that transforms data between the Receiver and the Exporter. Multiple Processors can be configured, and they execute sequentially in the order given in the pipeline configuration.
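Order matters here. For example, the commonly recommended arrangement puts memory_limiter before batch, so that memory protection is applied before data is buffered for batching (the receiver and exporter names below are from the earlier sample):

```yaml
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]  # executed left to right
      exporters: [logging]
```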

The following are some sample Processor configurations:

processors:
  # Data sources: traces
  attributes:
    actions:
      - key: environment
        value: production
        action: insert
      - key: db.statement
        action: delete
      - key: email
        action: hash

  # Data sources: traces, metrics, logs
  batch:

  # Data sources: metrics
  filter:
    metrics:
      include:
        match_type: regexp
        metric_names:
          - prefix/.*
          - prefix_.*

  # Data sources: traces, metrics, logs
  memory_limiter:
    check_interval: 5s
    limit_mib: 4000
    spike_limit_mib: 500

  # Data sources: traces
  resource:
    attributes:
      - key: cloud.zone
        value: "zone-1"
        action: upsert
      - key: k8s.cluster.name
        from_attribute: k8s-cluster
        action: insert
      - key: redundant-attribute
        action: delete

  # Data sources: traces
  probabilistic_sampler:
    hash_seed: 22
    sampling_percentage: 15

  # Data sources: traces
  span:
    name:
      to_attributes:
        rules:
          - ^\/api\/v1\/document\/(?P<documentId>.*)\/update$
      from_attributes: ["db.svc", "operation"]
      separator: "::"

Exporter

An Exporter defines how the collector outputs data. An Exporter can support multiple destinations, and both pull and push modes.

The following are some Exporter samples:

exporters:
  # Data sources: traces, metrics, logs
  file:
    path: ./filename.json

  # Data sources: traces
  jaeger:
    endpoint: "jaeger-all-in-one:14250"
    tls:
      cert_file: cert.pem
      key_file: cert-key.pem

  # Data sources: traces
  kafka:
    protocol_version: 2.0.0

  # Data sources: traces, metrics, logs
  logging:
    loglevel: debug

  # Data sources: traces, metrics
  opencensus:
    endpoint: "otelcol2:55678"

  # Data sources: traces, metrics, logs
  otlp:
    endpoint: otelcol2:4317
    tls:
      cert_file: cert.pem
      key_file: cert-key.pem

  # Data sources: traces, metrics
  otlphttp:
    endpoint: https://example.com:4318/v1/traces

  # Data sources: metrics
  prometheus:
    endpoint: "prometheus:8889"
    namespace: "default"

  # Data sources: metrics
  prometheusremotewrite:
    endpoint: "http://some.url:9411/api/prom/push"
    # For official Prometheus (e.g. running via Docker)
    # endpoint: 'http://prometheus:9090/api/v1/write'
    # tls:
    #   insecure: true

  # Data sources: traces
  zipkin:
    endpoint: "http://localhost:9411/api/v2/spans"

Extension

An Extension extends the collector. Note that an Extension does not process otel data; it provides capabilities outside the telemetry pipeline, such as health checks, service discovery, compression algorithms, and so on.
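For example, the health_check extension exposes an HTTP endpoint that load balancers or Kubernetes probes can poll; 13133 is its conventional default port:

```yaml
extensions:
  health_check:
    endpoint: 0.0.0.0:13133
```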

Some Extension samples:

extensions:
  health_check:
  pprof:
  zpages:
  memory_ballast:
    size_mib: 512

Service

The sections above configure individual data sources and plug-ins, but whether they actually take effect, and in what order they run, is determined by the Service configuration. It mainly contains the following items:

  • extensions
  • pipelines
  • telemetry

Extensions are configured as an array, and their order does not matter:

service:
  extensions: [health_check, pprof, zpages]

The pipelines configuration distinguishes traces, metrics, and logs; each can be configured with its own receivers, processors, and exporters, all as arrays. The processors array must be listed in the desired execution order, while the order of the others does not matter.

service:
  pipelines:
    metrics:
      receivers: [opencensus, prometheus]
      exporters: [opencensus, prometheus]
    traces:
      receivers: [opencensus, jaeger]
      processors: [batch]
      exporters: [opencensus, zipkin]

The telemetry configuration applies to the collector itself, mainly its logs and metrics. The following configures the collector's own log level and the address where it exposes its metrics:

service:
  telemetry:
    logs:
      level: debug
      initial_fields:
        service: my-instance
    metrics:
      level: detailed
      address: 0.0.0.0:8888

Building a Custom Collector

If you want a customized Collector containing exactly the Receivers, Exporters, and so on that you need, the brute-force solution is to download the source code, set up a Go environment, modify the code to your needs, and compile it yourself. This allows perfect customization, but it is cumbersome, especially for non-Go developers, for whom setting up a Go toolchain is a real hassle.

OpenTelemetry therefore provides ocb (OpenTelemetry Collector Builder) to make building a custom Collector easier. Interested readers can refer to its documentation for usage details.
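With ocb, instead of editing source code you write a builder manifest listing exactly the components you want compiled in, and ocb generates and builds the distribution. A rough sketch (the module paths follow the real projects, but the versions and names here are illustrative):

```yaml
dist:
  name: my-otelcol
  description: A custom collector with only the components we need
  output_path: ./my-otelcol

receivers:
  - gomod: go.opentelemetry.io/collector/receiver/otlpreceiver v0.68.0

processors:
  - gomod: go.opentelemetry.io/collector/processor/batchprocessor v0.68.0

exporters:
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/jaegerexporter v0.68.0
```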

Summary

The collector is an important part of the call chain: all client data flows into a unified collector, which receives it and performs cleaning and forwarding. The OpenTelemetry Collector has already done a great deal of work on compatibility and performance; here's hoping the official 1.0.0 release of the OpenTelemetry Collector lands soon.


Posted by christh on Sun, 18 Dec 2022 13:12:38 +0530