Thanos
Thanos is present in Lumie, but only part of the usual Thanos stack is active. Today it is a query layer in front of Prometheus sidecars, not a long-term metrics archive.
Source paths
- lumie-infra/observability/thanos/argocd.yaml
- lumie-infra/observability/thanos/helm-values.yaml
- lumie-infra/observability/thanos/common-values.yaml
- lumie-infra/observability/prometheus/helm-values.yaml
Active components
query.enabled: true
queryFrontend.enabled: false
storegateway.enabled: false
compactor.enabled: false
receive.enabled: false
ruler.enabled: false
What it really does today
- Grafana uses Thanos as its default Prometheus-compatible datasource.
- Query uses DNS discovery against prometheus-kube-prometheus-thanos-discovery in the prometheus namespace.
- Deduplication is enabled through --query.replica-label=prometheus_replica, even though Lumie currently runs a single Prometheus replica.
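To confirm the discovery and deduplication settings on a live cluster, it can help to filter the Query container arguments down to the relevant flags. The helper below is a minimal sketch: the function name summarize_query_flags is hypothetical, and it assumes discovery is passed as --endpoint= or the older --store= flag, which depends on the Thanos version the chart renders.

```shell
# Sketch: keep only the discovery and dedup flags from a list of Thanos Query args.
# Input: one argument per line on stdin.
# ASSUMPTION: discovery is configured via --endpoint= or --store=; the exact
# flag depends on the Thanos version in use.
summarize_query_flags() {
  grep -E '^--(endpoint|store|query\.replica-label)=' \
    || echo "no discovery or dedup flags found"
}
```

One way to feed it, assuming the Query container is the first container in the thanos-query deployment:

kubectl get deploy -n thanos thanos-query -o jsonpath='{range .spec.template.spec.containers[0].args[*]}{@}{"\n"}{end}' | summarize_query_flags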
Current limitation
existingObjstoreSecret: thanos-objstore-secret is still rendered, but Prometheus has objectStorageConfig removed and Thanos Store Gateway and Compactor are off. In practice that means:
- no long-term block upload from Prometheus
- no historical object-store reads from Store Gateway
- no compaction or downsampling jobs
Treat the deployment as query-only unless the repo re-enables those pieces.
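A quick way to check that the cluster still matches this query-only picture is to look for pods belonging to the disabled components. The sketch below is an illustration, not part of the repo: the function name check_query_only is hypothetical, and it assumes pod names contain the component name (for example thanos-storegateway-0), which depends on how the chart names its workloads.

```shell
# Sketch: read pod names (one per line) on stdin and report whether any
# component beyond Thanos Query appears to be running.
# ASSUMPTION: pod names embed the component name, e.g. "thanos-storegateway-0".
check_query_only() {
  # grep exits non-zero when nothing matches; "|| true" keeps that from
  # aborting scripts that run with "set -e".
  extra=$(grep -E 'storegateway|compactor|receive|ruler' || true)
  if [ -z "$extra" ]; then
    echo "query-only"
  else
    echo "not query-only:"
    echo "$extra"
  fi
}
```

Usage, assuming kubectl access to the thanos namespace:

kubectl get pods -n thanos -o name | check_query_only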
Failure modes
- Teams may assume Thanos implies long retention; the current configuration provides only what Prometheus itself retains.
- If Prometheus or its sidecar service is unavailable, Thanos Query immediately returns empty or partial results because there is no Store Gateway fallback.
- Because Grafana defaults to Thanos, datasource errors there can look like broad metrics outages even when Prometheus itself is healthy.
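When triaging the last two cases, comparing the readiness of Thanos Query and Prometheus separately helps tell a query-layer problem from a real metrics outage. The helper below is a rough sketch of that decision; the function name classify_metrics_outage is hypothetical, and it assumes you have already collected the HTTP status of each component's /-/ready endpoint.

```shell
# Sketch: turn two readiness probe results into a first-guess diagnosis.
# $1 = HTTP status from Thanos Query /-/ready
# $2 = HTTP status from Prometheus /-/ready
classify_metrics_outage() {
  thanos_status=$1
  prom_status=$2
  if [ "$thanos_status" = "200" ] && [ "$prom_status" = "200" ]; then
    echo "both ready: check the Grafana datasource or the sidecar discovery service"
  elif [ "$prom_status" = "200" ]; then
    echo "thanos query not ready, prometheus ready: query-layer problem"
  elif [ "$thanos_status" = "200" ]; then
    echo "prometheus not ready: thanos query has nothing to fan out to"
  else
    echo "neither ready: broad metrics outage"
  fi
}
```

Each status can be gathered through a port-forward with something like curl -s -o /dev/null -w '%{http_code}' http://localhost:PORT/-/ready, where PORT is whichever local port you forwarded; the service names and ports are not specified here and need to be taken from the cluster.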
Verification
kubectl get applications.argoproj.io -n argocd thanos
kubectl get pods -n thanos
kubectl describe deploy -n thanos thanos-query
kubectl get svc -n prometheus | rg thanos