CloudNativePG

CloudNativePG is the shared PostgreSQL control plane for Lumie. The operator lives in lumie-infra/storage/cnpg/**; the actual product and platform database clusters are declared in other repo areas and reconciled through CNPG CRDs.

Source paths

  • lumie-infra/storage/cnpg/argocd.yaml
  • lumie-infra/storage/cnpg/helm-values.yaml
  • lumie-infra/storage/cnpg/common-values.yaml
  • lumie-infra/charts/cnpg/templates/cnpg-cluster.yaml
  • lumie-infra/charts/cnpg/templates/cnpg-scheduled-backup.yaml
  • lumie-infra/charts/cnpg/templates/cnpg-dump-backup.yaml

What the operator owns

  • Installs the CNPG CRDs and controller in the cnpg namespace.
  • Exposes monitoring through the operator PodMonitor.
  • Reconciles these runtime resources:
    • Cluster
    • ScheduledBackup
    • Pooler
  • Inherits labels and annotations across managed resources through config.data.INHERITED_* in lumie-infra/storage/cnpg/helm-values.yaml.
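
As a rough illustration of the last two bullets, the operator values might contain something like the fragment below. This is a minimal sketch, not the actual contents of lumie-infra/storage/cnpg/helm-values.yaml: the label and annotation names are hypothetical placeholders, and only the config.data.INHERITED_* and monitoring.podMonitorEnabled keys come from this page.

```yaml
# Hypothetical excerpt of the operator Helm values.
# The label and annotation names below are illustrative assumptions.
config:
  data:
    # Comma-separated label keys copied from each Cluster onto the resources it manages.
    INHERITED_LABELS: app.kubernetes.io/part-of, environment
    # Comma-separated annotation keys propagated the same way.
    INHERITED_ANNOTATIONS: example.com/owner
monitoring:
  # Creates the PodMonitor for the operator itself.
  podMonitorEnabled: true
```

Keeping the inheritance list in the operator config means individual cluster manifests only need to set the labels once on the Cluster resource.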

Runtime flow

The operator itself is shared, but cluster intent is split:

  • Product databases come from lumie-infra/applications/lumie/**.
  • The shared infra database comes from lumie-infra/storage/infra-db/**.

Important contract

This template excerpt is the main contract between Lumie chart values and CNPG-managed clusters:

monitoring:
  enablePodMonitor: true
backup:
  barmanObjectStore:
    destinationPath: ...
    endpointURL: ...

Source path: lumie-infra/charts/cnpg/templates/cnpg-cluster.yaml

In other words, a cluster gets Barman backups only when its own values enable .common.database.backup, while every rendered cluster exposes a PodMonitor unconditionally.
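
The gating described above could be expressed in a Helm template roughly as follows. This is a hypothetical sketch, not the real cnpg-cluster.yaml: the resource name, instance count, storage size, the exact values path (.Values.common.database.backup), and the shape of the backup values are all assumptions made for illustration.

```yaml
# Hypothetical template sketch; names, sizes, and values paths are assumptions.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: {{ .Release.Name }}-db
spec:
  instances: 1
  storage:
    size: 10Gi
  monitoring:
    # Always rendered, so every cluster gets a PodMonitor.
    enablePodMonitor: true
  # Rendered only when the chart's own values define a backup block.
  # Assumes .Values.common.database exists; otherwise the lookup fails at render time.
  {{- with .Values.common.database.backup }}
  backup:
    barmanObjectStore:
      destinationPath: {{ .destinationPath | quote }}
      endpointURL: {{ .endpointURL | quote }}
  {{- end }}
```

With this shape, a chart that leaves the backup values unset still renders a valid Cluster, just without WAL archiving or base backups.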

Dependencies

  • ArgoCD for reconciliation
  • local-path-retain or another referenced StorageClass on each cluster CR
  • Vault-backed secrets for bootstrap credentials or backup credentials
  • Prometheus for CNPG metrics scraping

Operational boundaries

  • The operator does not define application-specific databases or roles by itself. Those are expressed in the owning cluster manifest or bootstrap SQL.
  • The operator manages the CNPG-native poolers, but the current pooler CRs live under applications/lumie/**, not under storage/pgbouncer/**.
  • cnpg-dump-backup.yaml still supports pg_dump-to-MinIO CronJobs for charts that enable dumpBackup, but the main Lumie database and infra-db are configured for Barman object-store backups instead.
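
For orientation, a CNPG-native pooler is a Pooler resource that points at an existing Cluster. The manifest below is a minimal sketch with hypothetical names and sizing; it is not copied from applications/lumie/**.

```yaml
# Hypothetical Pooler manifest; the names, namespace, and sizing are placeholders.
apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
  name: example-db-pooler-rw
  namespace: example
spec:
  cluster:
    # Must match the name of a Cluster in the same namespace.
    name: example-db
  instances: 2
  # "rw" routes to the primary; "ro" would route to replicas.
  type: rw
  pgbouncer:
    poolMode: transaction
```

Because the operator reconciles this resource, the PgBouncer deployment follows the cluster it references rather than being managed as a separate stack.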

Failure modes

  • Webhook or CRD drift can block all Cluster and Pooler updates. The ignore rules in lumie-infra/storage/cnpg/argocd.yaml already suppress noisy CRD annotation differences, but webhook failures still stop reconciliation.
  • With missing backup credentials, a cluster can stay healthy enough to serve traffic while WAL archiving or base backups fail silently.
  • Local-path volumes are bound to a single node, so after node loss an instance cannot be recovered automatically onto an arbitrary node.
  • If the operator is down, existing databases can continue serving traffic, but failover, backup, and pooler reconciliation stop.
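
The CRD annotation handling mentioned in the first bullet is typically done with an ignoreDifferences rule on the ArgoCD Application. The fragment below is a sketch of what such a rule can look like, not the actual storage/cnpg/argocd.yaml. The Application name and namespace are taken from the verification commands on this page, the jq path is an assumption, and required fields such as project, source, and destination are omitted.

```yaml
# Hypothetical fragment of an ArgoCD Application; required fields are omitted.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cnpg
  namespace: argocd
spec:
  ignoreDifferences:
    # Stop annotation-only changes on CRDs from marking the app OutOfSync.
    - group: apiextensions.k8s.io
      kind: CustomResourceDefinition
      jqPathExpressions:
        - .metadata.annotations
```

A rule like this only quiets diff noise. It does not help when the operator's admission webhook is unreachable, which is why webhook failures remain a separate failure mode.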

Verification

kubectl get applications.argoproj.io -n argocd cnpg
kubectl get pods -n cnpg
kubectl get clusters.postgresql.cnpg.io -A
kubectl get scheduledbackups.postgresql.cnpg.io -A
kubectl get poolers.postgresql.cnpg.io -A
kubectl logs -n cnpg deploy/cnpg-cloudnative-pg --tail=200

Observability

  • Operator metrics are enabled through monitoring.podMonitorEnabled: true.
  • Every rendered cluster enables monitoring.enablePodMonitor: true.
  • Grafana keeps the CNPG dashboard JSON in lumie-infra/observability/grafana/dashboards/cloudnativepg.json.