description: "{{`{{ $labels.pod_name }}`}} container is being throttled and has probably hit its CPU limit. Investigate the root cause and increase the limit and/or number of replicas if necessary."
       description: "{{`{{ $labels.kubernetes_pod_name }}`}} container is spending too much time in garbage collection pauses. Investigate the root cause and increase heap size and/or number of replicas if necessary."
       summary: "{{`{{ $labels.kubernetes_pod_name }}`}} is spending too much time in GC pauses"
   - alert: "[LCM] there is more than 100 jobs on cluster={{ .Values.clusterId }}"
     expr: count(kube_job_info{namespace="lcm"}) > 100
     labels:
-      severity: critical
-      team: lcm # switch to msf in production
+      severity: warning
+      team: lcm
       cluster_id: {{ .Values.clusterId }}
     annotations:
       description: "There are more than 100 jobs in the LCM namespace. They are likely not being deleted."
       summary: "There are more than 100 jobs in the LCM namespace."
+  - alert: "[LCM] Resource quotas hit CPU limit on cluster={{ .Values.clusterId }}"
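The diff is cut off after the name of the new alert, so its expression is not visible here. For orientation only, a rule of this kind could look roughly like the sketch below. It is not the rule from this commit: the `kube_resourcequota` metric (as exposed by kube-state-metrics), the `limits.cpu` resource, the 90% threshold, the `for` duration, and the annotation texts are all assumptions made for illustration.

```yaml
# Hypothetical sketch, NOT the expression from this commit (the diff is truncated).
# Assumes kube-state-metrics exports kube_resourcequota with `resource` and `type` labels.
- alert: "[LCM] Resource quotas hit CPU limit on cluster={{ .Values.clusterId }}"
  # Ratio of used to hard CPU limit quota; `ignoring(type)` lets the two series match.
  expr: |
    kube_resourcequota{namespace="lcm", resource="limits.cpu", type="used"}
      / ignoring(type)
    kube_resourcequota{namespace="lcm", resource="limits.cpu", type="hard"}
      > 0.9
  for: 10m  # assumed; avoids firing on short spikes
  labels:
    severity: warning
    team: lcm
    cluster_id: {{ .Values.clusterId }}
  annotations:
    description: "CPU limit quota usage in the LCM namespace is above 90% of the hard limit."
    summary: "LCM namespace is close to its CPU resource quota."
```

Reusing the `severity`, `team`, and `cluster_id` labels from the rule above keeps the new alert on the same route as the existing LCM alerts, assuming routing is keyed on those labels.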