Thanos

Thanos is present in Lumie, but only part of the usual Thanos stack is active. Today it is a query layer in front of Prometheus sidecars, not a long-term metrics archive.

Source paths

lumie-infra/observability/thanos/argocd.yaml
lumie-infra/observability/thanos/helm-values.yaml
lumie-infra/observability/thanos/common-values.yaml
lumie-infra/observability/prometheus/helm-values.yaml

Active components

query.enabled: true
queryFrontend.enabled: false
storegateway.enabled: false
compactor.enabled: false
receive.enabled: false
ruler.enabled: false

What it really does today

Grafana uses Thanos as its default Prometheus-compatible datasource.
Thanos Query finds the Prometheus sidecar through DNS discovery against the prometheus-kube-prometheus-thanos-discovery service in the prometheus namespace.
Deduplication is enabled through --query.replica-label=prometheus_replica, even though Lumie currently runs a single Prometheus replica.
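
As a rough illustration of what the replica label does, the sketch below merges series that differ only in prometheus_replica. It is a simplified model, not Thanos's actual merge logic, and the function name and data shapes are hypothetical. It shows why the setting is harmless with a single replica: the label is dropped and the samples pass through unchanged.

```python
from typing import Dict, List, Tuple

Labels = Dict[str, str]
Sample = Tuple[int, float]  # (timestamp in ms, value)
Series = Tuple[Labels, List[Sample]]


def dedup_by_replica_label(
    series: List[Series], replica_label: str = "prometheus_replica"
) -> List[Series]:
    """Merge series whose label sets are identical once replica_label is removed.

    Simplified model (assumption): for a duplicated timestamp the first
    replica seen wins. Real Thanos deduplication is more involved.
    """
    merged: Dict[frozenset, Dict[int, float]] = {}
    for labels, samples in series:
        # Identity of a series is its label set without the replica label.
        key = frozenset((k, v) for k, v in labels.items() if k != replica_label)
        bucket = merged.setdefault(key, {})
        for ts, value in samples:
            bucket.setdefault(ts, value)  # keep the first value per timestamp
    return [(dict(sorted(key)), sorted(bucket.items())) for key, bucket in merged.items()]


single = [
    ({"__name__": "up", "job": "api", "prometheus_replica": "prometheus-0"},
     [(1000, 1.0), (2000, 1.0)]),
]
print(dedup_by_replica_label(single))
```

With a second replica carrying the same labels, both inputs would collapse into one output series, which is the behaviour the flag prepares for.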

Current limitation

existingObjstoreSecret: thanos-objstore-secret is still rendered in the Thanos values, but the Prometheus configuration no longer sets objectStorageConfig, and the Thanos Store Gateway and Compactor are disabled. In practice that means:

no long-term block upload from Prometheus
no historical object-store reads from Store Gateway
no compaction or downsampling jobs

Treat the deployment as query-only unless the repo re-enables those pieces.
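
The reasoning above can be written down as a small check. This is a minimal sketch, assuming the Helm values are loaded as a nested dict with the storegateway.enabled and compactor.enabled keys listed under Active components; the function name and the prometheus_has_objstore flag are hypothetical. Note that the presence of existingObjstoreSecret is deliberately ignored, since a rendered secret does not mean anything reads or writes the bucket.

```python
from typing import Any, Dict, List


def missing_long_term_pieces(
    thanos_values: Dict[str, Any], prometheus_has_objstore: bool
) -> List[str]:
    """Return the long-term storage capabilities that are NOT active.

    thanos_values: Thanos Helm values as a nested dict (assumed shape).
    prometheus_has_objstore: whether Prometheus sets objectStorageConfig.
    """
    gaps: List[str] = []
    if not prometheus_has_objstore:
        gaps.append("block upload from Prometheus")
    if not thanos_values.get("storegateway", {}).get("enabled", False):
        gaps.append("historical reads via Store Gateway")
    if not thanos_values.get("compactor", {}).get("enabled", False):
        gaps.append("compaction and downsampling")
    return gaps


# Values mirroring the component flags documented on this page.
current = {
    "existingObjstoreSecret": "thanos-objstore-secret",
    "query": {"enabled": True},
    "storegateway": {"enabled": False},
    "compactor": {"enabled": False},
}
gaps = missing_long_term_pieces(current, prometheus_has_objstore=False)
print("query-only" if gaps else "long-term storage active", gaps)
```

An empty result would mean all three pieces are back on and the query-only caveat no longer applies.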

Failure modes

Teams may assume Thanos implies long retention; the current configuration does not provide it.
If Prometheus or its sidecar service is unavailable, Thanos Query immediately has no data to return, because there is no Store Gateway to fall back on.
Because Grafana defaults to Thanos, datasource errors there can look like broad metrics outages even when Prometheus itself is healthy.
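
To tell these cases apart during an incident, it can help to probe both layers separately. The sketch below is one possible approach, assuming both Thanos Query and Prometheus answer on the /-/healthy HTTP path; the base URLs are placeholders (for example, port-forwarded addresses) and the function names are hypothetical. A healthy response only shows that the process is up, so it does not prove the sidecar is reachable.

```python
import urllib.request


def is_healthy(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if GET <base_url>/-/healthy answers with HTTP 200."""
    try:
        url = base_url.rstrip("/") + "/-/healthy"
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError, refused connections, timeouts
        return False


def triage(thanos_query_ok: bool, prometheus_ok: bool) -> str:
    """Map the two health results onto the failure modes listed above."""
    if thanos_query_ok and prometheus_ok:
        return "both healthy: check the Grafana datasource or the query itself"
    if prometheus_ok:
        return "query-layer problem: Prometheus is healthy and still collecting"
    if thanos_query_ok:
        return "Prometheus unavailable: Thanos Query has no Store Gateway fallback"
    return "both unhealthy: broad metrics outage"


if __name__ == "__main__":
    # Placeholder URLs (assumption): replace with reachable addresses.
    thanos_ok = is_healthy("http://localhost:10902")
    prom_ok = is_healthy("http://localhost:9090")
    print(triage(thanos_ok, prom_ok))
```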

Verification

kubectl get applications.argoproj.io -n argocd thanos
kubectl get pods -n thanos
kubectl describe deploy -n thanos thanos-query
kubectl get svc -n prometheus | rg thanos
