OpenTelemetry

The OpenTelemetry collector is the center of Lumie's telemetry pipeline. It runs as a DaemonSet and handles three jobs at once:

  • receives OTLP telemetry from instrumented services
  • scrapes metrics targets discovered from ServiceMonitor and PodMonitor resources
  • tails container logs from the node filesystem

Source paths

  • lumie-infra/observability/opentelemetry/argocd.yaml
  • lumie-infra/observability/opentelemetry/kustomization.yaml
  • lumie-infra/observability/opentelemetry/manifests/collector.yaml
  • lumie-infra/observability/opentelemetry/manifests/targetallocator-rbac.yaml

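Runtime flow

  • instrumented services -> OTLP on host ports 4317/4318 -> collector
  • ServiceMonitor and PodMonitor targets -> Target Allocator -> collector scrape
  • container log files on the node -> filelog receiver -> collector
  • collector -> otlphttp/prometheus (metrics), otlphttp/loki (logs), otlp/tempo (traces)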

Collector contract

  • mode: daemonset
  • host ports:
    • 4317 for OTLP gRPC
    • 4318 for OTLP HTTP
  • hostPath mounts:
    • /var/log/pods
    • /var/lib/docker/containers
  • exporters:
    • otlphttp/prometheus
    • otlphttp/loki
    • otlp/tempo
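
One way to spot-check the host-port part of this contract against the running DaemonSet is sketched below. It assumes the DaemonSet is named otel-collector-collector in the opentelemetry namespace (the name used under Verification); the helper name missing_host_ports is hypothetical and not part of the Lumie repo.

```shell
#!/bin/sh
# Sketch: report which contract host ports are absent from a list of observed ports.
# missing_host_ports is a hypothetical helper used only for illustration.
missing_host_ports() {
  observed=" $1 "
  missing=""
  for port in 4317 4318; do
    case "$observed" in
      *" $port "*) ;;                  # port is present
      *) missing="$missing $port" ;;   # port is missing
    esac
  done
  # Trim the leading space before printing.
  printf '%s\n' "${missing# }"
}

# Usage against a live cluster (not executed here):
#   ports=$(kubectl get daemonset otel-collector-collector -n opentelemetry \
#     -o jsonpath='{.spec.template.spec.containers[*].ports[*].hostPort}')
#   missing_host_ports "$ports"
#
# The hostPath mounts can be listed the same way:
#   kubectl get daemonset otel-collector-collector -n opentelemetry \
#     -o jsonpath='{.spec.template.spec.volumes[*].hostPath.path}'
```

An empty result means both 4317 and 4318 are declared as host ports. It does not prove that they are reachable from other nodes or pods.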

Why this matters

Lumie's scrape model is collector-first:

  • The prometheus receiver's target_allocator setting points the collector at the Target Allocator, which reads ServiceMonitor and PodMonitor definitions.
  • Prometheus itself is configured to select only objects labeled scrape-by: prometheus-only.
  • Most workload metrics therefore travel target -> collector -> Prometheus OTLP receiver, not target -> Prometheus directly.
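
To see which monitors fall on each side of that split, a label-selector query is usually enough. The sketch below assumes the scrape-by label is set on the ServiceMonitor and PodMonitor objects themselves, which the text above does not confirm. The function names and the KUBECTL override are hypothetical.

```shell
#!/bin/sh
# KUBECTL can be set to "echo" for a dry run that only prints the command.
KUBECTL="${KUBECTL:-kubectl}"
MONITORS="servicemonitors.monitoring.coreos.com,podmonitors.monitoring.coreos.com"

# Monitors that Prometheus would scrape directly (assumed label placement).
list_prometheus_only_monitors() {
  "$KUBECTL" get "$MONITORS" -A -l 'scrape-by=prometheus-only'
}

# Everything else, which would presumably take the collector path.
# The '!=' selector also matches objects that lack the label entirely.
list_collector_monitors() {
  "$KUBECTL" get "$MONITORS" -A -l 'scrape-by!=prometheus-only'
}
```

If the first list is short and the second is long, that is consistent with the collector-first model described here.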

Log pipeline

The filelog receiver distinguishes Docker-style JSON logs from containerd logs, extracts namespace, pod, container, and pod UID from the file path, and attaches label hints for Loki. As a result, the labels available in Loki queries depend on the collector's parsing rather than on a separate Promtail deployment.
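
The two parsing steps described above can be illustrated outside the collector. The sketch below re-expresses them in plain shell. It assumes the standard kubelet layout under /var/log/pods and is not the collector's actual operator configuration. Both function names are hypothetical.

```shell
#!/bin/sh
# Illustration only: the real parsing happens in the collector's filelog receiver.

# Kubelet writes pod logs to:
#   /var/log/pods/<namespace>_<pod>_<pod-uid>/<container>/<restart-count>.log
parse_pod_log_path() {
  parsed=$(printf '%s\n' "$1" | sed -n -E \
    's#^/var/log/pods/([^_/]+)_([^_/]+)_([^_/]+)/([^/]+)/[0-9]+\.log.*$#namespace=\1 pod=\2 uid=\3 container=\4#p')
  # Fail when the path does not follow the expected layout.
  [ -n "$parsed" ] || return 1
  printf '%s\n' "$parsed"
}

# Docker's json-file driver writes one JSON object per line; containerd (CRI)
# writes "<timestamp> <stdout|stderr> <P|F> <message>".
detect_log_format() {
  case "$1" in
    '{'*) echo docker; return 0 ;;
  esac
  if printf '%s\n' "$1" | grep -Eq '^[0-9]{4}-[0-9]{2}-[0-9]{2}T[^ ]+ (stdout|stderr) [PF]( |$)'; then
    echo containerd
  else
    echo unknown
  fi
}
```

For example, parse_pod_log_path /var/log/pods/shop_cart-7d9f_1234-abcd/app/0.log prints namespace=shop pod=cart-7d9f uid=1234-abcd container=app. A path that does not match this layout yields nothing, which mirrors how a changed mount or path convention would leave log records without their Kubernetes labels.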

Failure modes

  • If the collector DaemonSet is unhealthy, metrics, logs, and traces can all degrade together.
  • If Target Allocator RBAC breaks, the collector keeps running but stops discovering scrape targets.
  • If node log mounts change, the filelog receiver silently loses container logs.
  • If host ports 4317 or 4318 are blocked, app instrumentation appears healthy locally but nothing arrives downstream.

Verification

kubectl get applications.argoproj.io -n argocd opentelemetry
kubectl get opentelemetrycollectors.opentelemetry.io -n opentelemetry
kubectl get pods -n opentelemetry
kubectl logs -n opentelemetry daemonset/otel-collector-collector --tail=200
kubectl get clusterrole,clusterrolebinding | rg 'otel-collector|otel-targetallocator'