Grafana
Namespace: grafana | URL: https://grafana.astaup.de | Manifests: infrastructure/monitoring/grafana/
Deployment
Deployed via the Grafana Operator (grafana.integreatly.org/v1beta1). The operator manages the Grafana Deployment and reconciles GrafanaDatasource, GrafanaDashboard, GrafanaAlertRuleGroup, and GrafanaContactPoint CRDs into a running Grafana instance.
The dashboards: grafana label on the Grafana CR is the selector that all CRDs use to target this instance (instanceSelector.matchLabels). This allows multiple Grafana instances to coexist in future.
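A minimal sketch of how the label and the selector pair up. Only the dashboards: grafana label, the API group, and instanceSelector.matchLabels are taken from the description above; the resource names and the dashboard body are hypothetical placeholders.

```yaml
# Grafana instance: carries the label that other resources select on.
apiVersion: grafana.integreatly.org/v1beta1
kind: Grafana
metadata:
  name: grafana              # assumed name
  namespace: grafana
  labels:
    dashboards: grafana
spec: {}                     # instance configuration omitted in this sketch
---
# Any operator-managed resource targets the instance through instanceSelector.
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
metadata:
  name: example-dashboard    # hypothetical
  namespace: grafana
spec:
  instanceSelector:
    matchLabels:
      dashboards: grafana    # must match the label on the Grafana CR
  json: |
    {"title": "Example", "panels": []}
```

A second Grafana instance would carry a different label value, and resources would opt into it by changing matchLabels.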
Datasources
Both datasources use access: proxy — Grafana fetches on behalf of the browser, so the browser never needs direct cluster access.
| Name | Type | URL | Notes |
|---|---|---|---|
| Mimir | Prometheus | mimir-gateway.mimir/prometheus | Default datasource; prometheusType: Mimir enables Mimir-specific query hints; httpMethod: POST for large queries |
| Loki | Loki | loki-gateway.loki | |
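As an illustration, the Mimir row could be expressed as a GrafanaDatasource roughly like this. The resource name and the http:// scheme are assumptions (the table gives the host and path only); the remaining values mirror the table.

```yaml
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDatasource
metadata:
  name: mimir                # hypothetical resource name
  namespace: grafana
spec:
  instanceSelector:
    matchLabels:
      dashboards: grafana
  datasource:
    name: Mimir
    type: prometheus
    access: proxy            # Grafana's backend performs the queries
    url: http://mimir-gateway.mimir/prometheus   # scheme is an assumption
    isDefault: true
    jsonData:
      prometheusType: Mimir  # enables Mimir-specific query hints
      httpMethod: POST       # avoids URL length limits on large queries
```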
OIDC (Keycloak)
Auth via Keycloak at idp.astaup.de/realms/astaup.de. Client secret is stored SOPS-encrypted in secrets/monitoring/grafana-oidc-secret.enc.yaml, decrypted to a grafana-oauth secret in the grafana namespace, and injected as AUTH_CLIENT_SECRET env var.
- use_pkce: "true" is required because Keycloak enforces PKCE (RFC 7636, recommended by RFC 9700) by default. Without it, Keycloak rejects the auth request with "Missing parameter: code_challenge_method".
- allow_sign_up: "true" means accounts are created on first login; no pre-provisioning is needed.
- disable_login_form: "false" keeps the local admin login available as a fallback.
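The settings above might sit in the Grafana CR roughly as follows. This is a sketch, not the committed manifest: the client ID, display name, scopes, and the key inside the grafana-oauth secret are hypothetical, and the endpoint URLs are derived from Keycloak's standard OpenID Connect paths for the realm.

```yaml
apiVersion: grafana.integreatly.org/v1beta1
kind: Grafana
metadata:
  name: grafana                      # assumed name
  namespace: grafana
  labels:
    dashboards: grafana
spec:
  config:
    auth:
      disable_login_form: "false"    # keep the local admin login as a fallback
    auth.generic_oauth:
      enabled: "true"
      name: Keycloak                 # assumed display name
      client_id: grafana             # hypothetical client ID
      client_secret: "${AUTH_CLIENT_SECRET}"   # expanded from the env var below
      scopes: openid profile email   # assumed scopes
      auth_url: https://idp.astaup.de/realms/astaup.de/protocol/openid-connect/auth
      token_url: https://idp.astaup.de/realms/astaup.de/protocol/openid-connect/token
      api_url: https://idp.astaup.de/realms/astaup.de/protocol/openid-connect/userinfo
      use_pkce: "true"
      allow_sign_up: "true"
  deployment:
    spec:
      template:
        spec:
          containers:
            - name: grafana
              env:
                - name: AUTH_CLIENT_SECRET
                  valueFrom:
                    secretKeyRef:
                      name: grafana-oauth
                      key: client-secret   # hypothetical key name
```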
Role mapping from Keycloak groups:
- /Mitarbeitende/IT-Administration → Admin
- /Mitarbeitende (any other) → Viewer
- Everyone else → Viewer
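One way to express this mapping is a JMESPath expression in role_attribute_path. The sketch below assumes the Keycloak client emits a groups claim containing full group paths, which is not stated above. Because both non-admin cases resolve to Viewer, a single fallback covers them.

```yaml
# Fragment of the Grafana CR (spec.config only)
spec:
  config:
    auth.generic_oauth:
      # Members of the IT administration group become Admin; all others Viewer.
      role_attribute_path: >-
        contains(groups[*], '/Mitarbeitende/IT-Administration') && 'Admin' || 'Viewer'
```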
Alerting
All alerting config is managed as Grafana Operator CRDs, committed to Git. Grafana's built-in alerting engine evaluates rules and routes via its internal Alertmanager — no external Alertmanager is used.
Notification routing (GrafanaNotificationPolicy):
- All alerts → Slack #it-alerts
- Grouped by alertname + namespace
- group_wait: 30s, group_interval: 5m, repeat_interval: 4h
- Webhook URL from grafana-slack-webhook secret
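The routing described above could look roughly like the following pair of resources. The resource names, the contact point name slack, and the secret key url are hypothetical; the timings, grouping labels, channel, and secret name come from the list above.

```yaml
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaContactPoint
metadata:
  name: slack                    # hypothetical
  namespace: grafana
spec:
  instanceSelector:
    matchLabels:
      dashboards: grafana
  name: slack                    # receiver name referenced by the policy
  type: slack
  settings:
    recipient: "#it-alerts"
  valuesFrom:
    - targetPath: url            # Slack webhook URL is injected here
      valueFrom:
        secretKeyRef:
          name: grafana-slack-webhook
          key: url               # hypothetical key name
---
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaNotificationPolicy
metadata:
  name: default-policy           # hypothetical
  namespace: grafana
spec:
  instanceSelector:
    matchLabels:
      dashboards: grafana
  route:
    receiver: slack              # all alerts go to the single contact point
    group_by:
      - alertname
      - namespace
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 4h
```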
Alert rules (GrafanaAlertRuleGroup, all query Mimir):
| Rule | Condition | Severity |
|---|---|---|
| Pod CrashLoopBackOff | >2 restarts in 15m and pod not ready | critical |
| PVC disk usage | configurable threshold | — |
| ztunnel HBONE | ztunnel connectivity issues | — |
| Zammad | application-specific checks | — |
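To make the first row concrete, here is a trimmed sketch of what a CrashLoopBackOff rule might look like. It is not the committed rule: the PromQL assumes kube-state-metrics series are available in Mimir, and the UIDs, folder reference, evaluation interval, and for duration are hypothetical. A rule exported from Grafana usually carries additional model fields.

```yaml
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaAlertRuleGroup
metadata:
  name: pod-crashloop                # hypothetical
  namespace: grafana
spec:
  instanceSelector:
    matchLabels:
      dashboards: grafana
  folderRef: alerts                  # hypothetical GrafanaFolder name
  interval: 1m                       # assumed evaluation interval
  rules:
    - uid: pod-crashloopbackoff      # hypothetical UID
      title: Pod CrashLoopBackOff
      condition: B
      for: 1m                        # assumed
      noDataState: OK
      execErrState: Error
      labels:
        severity: critical
      data:
        - refId: A
          datasourceUid: mimir       # hypothetical datasource UID
          relativeTimeRange:
            from: 900                # 15 minutes, in seconds
            to: 0
          model:
            refId: A
            instant: true
            # More than 2 restarts in 15m AND the pod is not ready.
            expr: |
              (increase(kube_pod_container_status_restarts_total[15m]) > 2)
              and on (namespace, pod)
              (kube_pod_status_ready{condition="true"} == 0)
        - refId: B
          datasourceUid: __expr__    # Grafana's built-in expression engine
          relativeTimeRange:
            from: 900
            to: 0
          model:
            refId: B
            type: threshold
            expression: A
            conditions:
              - evaluator:
                  type: gt
                  params: [0]        # fire for any series returned by A
```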
Ingress
Istio Gateway (grafana in grafana namespace) with dedicated load-balancer IPs assigned via mikrolb annotation. HTTP redirects to HTTPS (301). TLS cert from cert-manager.
Traffic path: Internet → mikrolb → Istio Gateway → grafana-service:3000
Istio AuthorizationPolicies:
- allow-intra-namespace: intra-namespace traffic allowed
- grafana-istio-allow: allows HTTP/HTTPS to the Gateway pod
- grafana-allow: only the Istio gateway's service account and the Grafana Operator's service account can reach the Grafana pod on port 3000
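As an illustration of the last policy, grafana-allow might be written roughly as below. The pod label, both service account names, the operator's namespace, and the cluster.local trust domain are assumptions; only the policy name, the namespace, and port 3000 come from the description above.

```yaml
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: grafana-allow
  namespace: grafana
spec:
  selector:
    matchLabels:
      app: grafana                   # assumed label on the Grafana pod
  action: ALLOW
  rules:
    - from:
        - source:
            principals:
              # Hypothetical service accounts for the gateway and the operator.
              - cluster.local/ns/grafana/sa/grafana-istio
              - cluster.local/ns/grafana-operator/sa/grafana-operator
      to:
        - operation:
            ports: ["3000"]
```

Because an ALLOW policy that selects a workload denies everything it does not match, no separate deny rule is needed for other callers.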