Keycloak
Keycloak provides infrastructure SSO for internal operator tools. In the current repo state it primarily hosts the infra realm, which serves applications such as Coder, Vault, Zot, Gitea, and RabbitMQ.
Responsibility
- Run Keycloak in the keycloak namespace.
- Serve OIDC endpoints directly at auth.lumie-edu.com for browser-based flows.
- Store realm state in the shared infra-db PostgreSQL cluster.
- Converge declarative realm configuration through a post-sync job instead of kc.sh import.
Source paths
| Path | Role |
|---|---|
| lumie-infra/security/keycloak/argocd.yaml | ArgoCD Application with chart values, common chart values, and extra manifests |
| lumie-infra/security/keycloak/helm-values.yaml | Server startup, DB, metrics, and health settings |
| lumie-infra/security/keycloak/common-values.yaml | External ingress, realm import ConfigMap, and Vault secret rendering |
| lumie-infra/security/keycloak/manifests/realm-sync-job.yaml | PostSync realm convergence job using keycloak-config-cli |
| lumie-infra/charts/common/templates/additional-ingresses.yaml | Template that renders auth.lumie-edu.com |
| lumie-infra/charts/common/templates/vault-static-secrets.yaml | Template used by the local Vault secret declarations |
| lumie-infra/security/teleport/agent/helm-values.yaml | Separate Teleport app entry for the Keycloak UI |
Public surface and contracts
| Surface | Contract |
|---|---|
| Direct OIDC ingress | https://auth.lumie-edu.com |
| In-cluster HTTP service | keycloak-keycloakx-http.keycloak.svc.cluster.local |
| Database | Shared infra-db, database keycloak, user keycloak |
| Admin bootstrap secret | keycloak-secrets |
| Realm sync secret | keycloak-oauth-secrets |
| Teleport app | keycloak |
The direct ingress matters because browser-based OIDC flows for clients such as Coder and Vault redirect the user's browser to auth.lumie-edu.com, and those redirects cannot complete through the Teleport app proxy alone.
Runtime flow
Declared realm contract
The checked-in infra-realm.json defines:
- realm infra;
- realm roles admin and developer;
- client scopes rabbitmq-admin and groups;
- OIDC clients coder, vault, zot, gitea, and rabbitmq;
- user bluemayne with realm roles admin and developer.
Client secrets are not hardcoded in Git. The keycloak-realm-sync job reads them from keycloak-oauth-secrets, substitutes them into the realm JSON, and applies the result through the live admin REST API.
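The substitution step can be sketched as follows. This is a minimal illustration, not the job's actual code: it assumes placeholders written in keycloak-config-cli's $(env:NAME) variable-substitution style and hypothetical variable names such as CODER_CLIENT_SECRET, and the checked-in infra-realm.json may use a different convention. The useful property is failing fast: an unresolved placeholder stops the sync instead of being pushed to Keycloak as a literal client secret.

```python
import json
import re

# Assumed placeholder syntax: $(env:NAME), as in keycloak-config-cli's variable
# substitution. The real infra-realm.json may use another convention.
PLACEHOLDER = re.compile(r"\$\(env:([A-Za-z_][A-Za-z0-9_]*)\)")


def render_realm(template: str, secrets: dict[str, str]) -> dict:
    """Replace every placeholder with its secret value and parse the result.

    Raises KeyError naming all unresolved placeholders, so a missing OAuth
    secret aborts the sync rather than reaching Keycloak as a literal string.
    """
    missing = sorted({m.group(1) for m in PLACEHOLDER.finditer(template)} - set(secrets))
    if missing:
        raise KeyError(f"unresolved placeholders: {', '.join(missing)}")
    # json.dumps(...)[1:-1] escapes quotes and backslashes so the value stays valid JSON.
    rendered = PLACEHOLDER.sub(lambda m: json.dumps(secrets[m.group(1)])[1:-1], template)
    return json.loads(rendered)


# Hypothetical realm template fragment: only two of the five clients are shown.
TEMPLATE = """
{
  "realm": "infra",
  "clients": [
    {"clientId": "coder", "secret": "$(env:CODER_CLIENT_SECRET)"},
    {"clientId": "vault", "secret": "$(env:VAULT_CLIENT_SECRET)"}
  ]
}
"""

if __name__ == "__main__":
    realm = render_realm(TEMPLATE, {"CODER_CLIENT_SECRET": "s3cr3t-a", "VAULT_CLIENT_SECRET": "s3cr3t-b"})
    print([client["clientId"] for client in realm["clients"]])
```

In the real job the values come from keycloak-oauth-secrets; how they are wired into the container (environment variables, mounted files) is not shown here and is left as an assumption.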
Why realm sync is a Job
The repo intentionally avoids kc.sh import --override for realm management. The post-sync job in realm-sync-job.yaml exists because:
- REST-based creation preserves Keycloak's standard client-scope initialization;
- secret placeholders are substituted client-side before the realm JSON is written to the database;
- reruns are idempotent and safe after partial failure.
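Why reruns are safe can be shown with a small planning function. This is a hypothetical sketch of the create-or-update pattern, not keycloak-config-cli's implementation: each declared client is matched by clientId and either updated or created, so a second run after a partial failure never creates a duplicate and simply finishes the remaining work.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Action:
    verb: str       # "create" or "update"
    client_id: str


def plan_client_sync(desired: list[str], existing: set[str]) -> list[Action]:
    """Hypothetical upsert plan keyed by clientId.

    A declared client is updated if it already exists and created otherwise,
    so applying the plan twice never creates a duplicate. Removing clients
    that are not declared is out of scope for this sketch.
    """
    return [Action("update" if cid in existing else "create", cid) for cid in desired]


def simulate_apply(plan: list[Action], existing: set[str], fail_after: Optional[int] = None) -> set[str]:
    """Simulate applying a plan; stop after `fail_after` actions to mimic a crashed job."""
    state = set(existing)
    for i, action in enumerate(plan):
        if fail_after is not None and i >= fail_after:
            break
        state.add(action.client_id)
    return state


CLIENTS = ["coder", "vault", "zot", "gitea", "rabbitmq"]

if __name__ == "__main__":
    # First run dies after two clients; the rerun updates those two and creates the rest.
    partial = simulate_apply(plan_client_sync(CLIENTS, set()), set(), fail_after=2)
    for action in plan_client_sync(CLIENTS, partial):
        print(action.verb, action.client_id)
```

The sketch models only the create-or-update decision for clients; the real job converges roles, client scopes, and users as well.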
Failure behavior and operational risks
- Missing OAuth secrets produce invalid_client failures during login even when the Keycloak pod itself is healthy.
- If the PostSync job fails midway, the realm can be partially updated; the documented recovery path is to rerun the job, not to roll back the database manually.
- Database bootstrap or admin-password secret failures block startup before realm sync runs.
- Because direct ingress is separate from the Teleport app route, ingress or certificate failures can break OIDC while the Teleport-proxied UI still works.
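These failure modes can be told apart by which components still respond. The helper below is a hypothetical triage table built only from the list above; the Signals fields are illustrative names, not the output of any real tool, and the check order is one reasonable choice rather than a documented runbook.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    # Illustrative observations an operator might collect by hand.
    pod_ready: bool              # Keycloak pod passed its startup probe
    public_discovery_ok: bool    # auth.lumie-edu.com discovery endpoint returned 200
    teleport_ui_ok: bool         # Keycloak UI reachable through the Teleport app
    login_error: str = ""        # error code a client saw at login, if any
    sync_job_failed: bool = False


def triage(s: Signals) -> str:
    """Map observations to the most likely failure mode from the list above."""
    if not s.pod_ready:
        return "startup: check infra-db bootstrap and the keycloak-secrets admin secret"
    if not s.public_discovery_ok and s.teleport_ui_ok:
        return "direct ingress or certificate for auth.lumie-edu.com"
    if s.sync_job_failed:
        return "partial realm sync: rerun keycloak-realm-sync"
    if s.login_error == "invalid_client":
        return "missing client secret in keycloak-oauth-secrets"
    return "no known failure mode matched"
```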
Contract drift
There is a real checked-in mismatch to be aware of:
- security/keycloak/helm-values.yaml comments describe realms infra, lumie, and lumie-dev.
- security/keycloak/manifests/realm-sync-job.yaml comments also say the ConfigMap mounts infra-realm.json, lumie-realm.json, and lumie-dev-realm.json.
- The current security/keycloak/common-values.yaml only renders infra-realm.json, and the current Vault secret template only exposes the infra client secrets used by that file.
Treat the current repo state as infra-realm-only unless the additional realm JSON files are added back.
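One way to catch this drift mechanically is to compare the realms the comments promise with the keys actually rendered into the realm ConfigMap. This sketch assumes the <realm>-realm.json naming seen above; the ConfigMap name in the usage comment is a placeholder, since this page does not state it.

```python
import json
import sys

# Realms named in the helm-values.yaml and realm-sync-job.yaml comments.
DOCUMENTED_REALMS = ["infra", "lumie", "lumie-dev"]


def missing_realm_files(documented: list[str], configmap_keys: set[str]) -> list[str]:
    """Return <realm>-realm.json files the comments mention but the ConfigMap lacks."""
    expected = [f"{realm}-realm.json" for realm in documented]
    return [name for name in expected if name not in configmap_keys]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (REALM_CONFIGMAP is a placeholder for the real ConfigMap name):
    #   kubectl get configmap -n keycloak REALM_CONFIGMAP -o json > cm.json
    #   python3 realm_drift.py cm.json
    with open(sys.argv[1]) as f:
        keys = set(json.load(f).get("data", {}))
    for name in missing_realm_files(DOCUMENTED_REALMS, keys):
        print(f"documented but not rendered: {name}")
```

Against the current repo state this would report lumie-realm.json and lumie-dev-realm.json, which is exactly the mismatch described above.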
Observability
- serviceMonitor.enabled: true exposes Keycloak metrics to Prometheus.
- health.enabled: true and the startup probe against /health cover boot readiness.
- The keycloak-realm-sync job logs are the source of truth for realm convergence failures.
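Because the job logs are the source of truth, a small filter for the error signatures this page mentions can shorten triage. The patterns are assumptions: invalid_client comes from this page, while the substitution pattern guesses at the wording of an unresolved-variable error and should be adjusted to what keycloak-config-cli actually prints in this cluster.

```python
import re
import sys

# invalid_client is named on this page; the substitution pattern is an assumed wording.
ERROR_PATTERNS = {
    "invalid_client": re.compile(r"invalid_client"),
    "substitution": re.compile(r"cannot resolve variable|unresolved placeholder", re.IGNORECASE),
}


def scan_sync_logs(lines) -> dict[str, list[str]]:
    """Group log lines under the first error pattern each one matches."""
    hits: dict[str, list[str]] = {name: [] for name in ERROR_PATTERNS}
    for line in lines:
        for name, pattern in ERROR_PATTERNS.items():
            if pattern.search(line):
                hits[name].append(line.rstrip("\n"))
                break
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: kubectl logs -n keycloak job/keycloak-realm-sync > sync.log
    #        python3 scan_sync_logs.py sync.log
    with open(sys.argv[1]) as f:
        for name, matched in scan_sync_logs(f).items():
            print(f"{name}: {len(matched)} line(s)")
```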
Verification
```shell
kubectl get applications.argoproj.io -n argocd keycloak
kubectl get statefulset,pods,svc,ingress,secrets -n keycloak
kubectl get jobs -n keycloak -l app.kubernetes.io/component=realm-sync
kubectl logs -n keycloak job/keycloak-realm-sync

# Port-forward in the background so the health check can run in the same shell.
kubectl port-forward -n keycloak svc/keycloak-keycloakx-http 8080:80 &
PF_PID=$!
sleep 2
curl -sS -o /dev/null -w "%{http_code}\n" http://127.0.0.1:8080/health
kill "$PF_PID"

# Public OIDC discovery for the infra realm.
curl -sS -o /dev/null -w "%{http_code}\n" \
  https://auth.lumie-edu.com/realms/infra/.well-known/openid-configuration
```
Success signals:
- The keycloak Argo CD application is Healthy and Synced.
- StatefulSet keycloak is ready, and the direct-ingress Ingress for auth.lumie-edu.com exists.
- Job keycloak-realm-sync completes successfully, and its logs contain no invalid_client or import-substitution errors.
- GET /health on the in-cluster service returns HTTP 200.
- The public OIDC discovery endpoint for realm infra returns HTTP 200.
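The last signal can be tightened from "returns 200" to "returns the right issuer". OpenID Connect Discovery requires the issuer in the document to match the URL it was fetched from, so a discovery document that names an in-cluster hostname would break browser clients even with a 200. A minimal standard-library sketch: it assumes the issuer should be exactly https://auth.lumie-edu.com/realms/infra and that the listed endpoints sit under it, which holds for a default Keycloak hostname setup. The network fetch runs only with --live.

```python
import json
import sys
import urllib.request

# Assumed public issuer for the infra realm, derived from the discovery URL above.
ISSUER = "https://auth.lumie-edu.com/realms/infra"
DISCOVERY_URL = ISSUER + "/.well-known/openid-configuration"
REQUIRED = ("authorization_endpoint", "token_endpoint", "jwks_uri")


def check_discovery(doc: dict, issuer: str = ISSUER) -> list[str]:
    """Return problems found in an OIDC discovery document (empty list means OK)."""
    problems = []
    # The issuer must equal the URL prefix the document was fetched from.
    if doc.get("issuer") != issuer:
        problems.append(f"issuer is {doc.get('issuer')!r}, expected {issuer!r}")
    for field in REQUIRED:
        value = str(doc.get(field) or "")
        if not value.startswith(issuer + "/"):
            problems.append(f"{field} is not under the public issuer: {value!r}")
    return problems


if __name__ == "__main__" and "--live" in sys.argv:
    with urllib.request.urlopen(DISCOVERY_URL, timeout=10) as resp:
        issues = check_discovery(json.load(resp))
    print("\n".join(issues) or "discovery OK")
```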