Tempo

Tempo is Lumie's trace store. Like Loki, it is deliberately optimized for short-term operational visibility rather than durable long-term retention.

Source paths

  • lumie-infra/observability/tempo/argocd.yaml
  • lumie-infra/observability/tempo/helm-values.yaml
  • lumie-infra/observability/opentelemetry/manifests/collector.yaml

Runtime contract

  • chart: grafana/tempo
  • replicas: 1
  • storage backend: local filesystem
  • persistence: disabled
  • retention: 72h
  • OTLP receivers exposed on 4317 and 4318
  • metrics generator: disabled
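
The contract above could map onto Helm values roughly as follows. This is a minimal sketch, not a copy of lumie-infra/observability/tempo/helm-values.yaml: the key paths assume the value layout of the grafana/tempo chart, and the traces and wal subdirectories under /var/tempo are assumptions. The real values file remains authoritative.

```yaml
# Illustrative values for the grafana/tempo chart (key paths assumed).
replicas: 1

persistence:
  enabled: false                  # no PVC; data does not survive pod replacement

tempo:
  retention: 72h
  storage:
    trace:
      backend: local              # local filesystem backend
      local:
        path: /var/tempo/traces   # assumed subdirectory under /var/tempo
      wal:
        path: /var/tempo/wal      # assumed subdirectory under /var/tempo
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317  # OTLP over gRPC
        http:
          endpoint: 0.0.0.0:4318  # OTLP over HTTP
  metricsGenerator:
    enabled: false                # intentionally off
```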

Why metrics generation is disabled

metricsGenerator.enabled: false is intentional. The values file notes that failed remote writes previously led to Tempo WAL growth and OOM conditions, so Lumie keeps trace-derived metrics off until the wider ingestion path changes.

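Runtime flow

Instrumented services send spans to the OpenTelemetry collector, which exports them to Tempo at tempo.tempo.svc.cluster.local:4317. Grafana then reads from Tempo to display traces.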

Operational boundaries

  • Tempo is internal-only; Grafana is the normal human interface.
  • Data lives under /var/tempo on emptyDir, so pod replacement loses trace history.
  • GOMEMLIMIT is pinned in the Helm values so the Go runtime's garbage collector works to keep memory use under the pod memory ceiling.
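
As an illustration of the GOMEMLIMIT pin, a values fragment might look like the one below. Both numbers are hypothetical placeholders, not Lumie's actual settings, and the tempo.resources and tempo.extraEnv key paths assume the grafana/tempo chart layout. The general idea is to set GOMEMLIMIT, which is a soft limit, somewhat below the container memory limit so the garbage collector reacts before the kernel OOM killer does.

```yaml
# Illustrative only: values are hypothetical, key paths assumed.
tempo:
  resources:
    limits:
      memory: 1Gi          # hypothetical pod memory ceiling
  extraEnv:
    - name: GOMEMLIMIT
      value: "900MiB"      # hypothetical soft limit, below the ceiling above
```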

Failure modes

  • Pod restart or node evacuation clears local trace history.
  • If the collector exporter to tempo.tempo.svc.cluster.local:4317 breaks, services remain instrumented but no traces appear in Grafana.
  • Trace-derived metrics never appear, even if operators expect them, because the metrics generator is explicitly off.
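
For the collector failure mode, the relevant part of the collector configuration is the trace exporter. The fragment below is a sketch of what that section of lumie-infra/observability/opentelemetry/manifests/collector.yaml might contain. The exporter name otlp/tempo, the plaintext TLS setting, and the pipeline wiring are assumptions; only the endpoint comes from this page.

```yaml
# Illustrative OpenTelemetry Collector config fragment (names assumed).
receivers:
  otlp:
    protocols:
      grpc: {}
      http: {}

exporters:
  otlp/tempo:                                       # hypothetical exporter name
    endpoint: tempo.tempo.svc.cluster.local:4317    # Tempo OTLP gRPC receiver
    tls:
      insecure: true                                # assumes in-cluster plaintext

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp/tempo]
```

If this exporter fails, its errors should surface in the collector logs, which is why the last verification command tails them.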

Verification

kubectl get applications.argoproj.io -n argocd tempo
kubectl get pods -n tempo
kubectl describe pod -n tempo tempo-0
kubectl logs -n opentelemetry daemonset/otel-collector-collector --tail=200