[dev-v2.9] Rebase federator (#4176)

Co-authored-by: joshmeranda <joshua.meranda@gmail.com>
Josh Meranda 2024-07-03 16:01:18 -04:00 committed by GitHub
parent e874635dfc
commit d787a62a4c
99 changed files with 10105 additions and 24 deletions


@@ -0,0 +1,21 @@
annotations:
catalog.cattle.io/certified: rancher
catalog.cattle.io/display-name: Prometheus Federator
catalog.cattle.io/kube-version: '>= 1.26.0-0 < 1.31.0-0'
catalog.cattle.io/namespace: cattle-monitoring-system
catalog.cattle.io/os: linux,windows
catalog.cattle.io/permits-os: linux,windows
catalog.cattle.io/provides-gvr: helm.cattle.io.projecthelmchart/v1alpha1
catalog.cattle.io/rancher-version: '>= 2.9.0-0 < 2.10.0-0'
catalog.cattle.io/release-name: prometheus-federator
apiVersion: v2
appVersion: 0.3.5
dependencies:
- condition: helmProjectOperator.enabled
name: helmProjectOperator
repository: file://./charts/helmProjectOperator
version: 0.2.1
description: Prometheus Federator
icon: https://raw.githubusercontent.com/rancher/prometheus-federator/main/assets/logos/prometheus-federator.svg
name: prometheus-federator
version: 104.0.0-rc1+up0.4.2


@@ -0,0 +1,120 @@
# Prometheus Federator
This chart deploys a Helm Project Operator (based on [rancher/helm-project-operator](https://github.com/rancher/helm-project-operator)), an operator that manages the deployment of Helm charts that each contain a Project Monitoring Stack, where each stack contains:
- [Prometheus](https://prometheus.io/) (managed externally by [Prometheus Operator](https://github.com/prometheus-operator/prometheus-operator))
- [Alertmanager](https://prometheus.io/docs/alerting/latest/alertmanager/) (managed externally by [Prometheus Operator](https://github.com/prometheus-operator/prometheus-operator))
- [Grafana](https://github.com/helm/charts/tree/master/stable/grafana) (deployed via an embedded Helm chart)
- Default PrometheusRules and Grafana dashboards based on the collection of community-curated resources from [kube-prometheus](https://github.com/prometheus-operator/kube-prometheus/)
- Default ServiceMonitors that watch the deployed resources
> **Important Note: Prometheus Federator is designed to be deployed alongside an existing Prometheus Operator deployment in a cluster that has already installed the Prometheus Operator CRDs.**
By default, the chart is configured and intended to be deployed alongside [rancher-monitoring](https://rancher.com/docs/rancher/v2.6/en/monitoring-alerting/), which deploys Prometheus Operator alongside a Cluster Prometheus that each Project Monitoring Stack is configured to federate namespace-scoped metrics from by default.
## Pre-Installation: Using Prometheus Federator with Rancher and rancher-monitoring
If you are running your cluster on [Rancher](https://rancher.com/) and already have [rancher-monitoring](https://rancher.com/docs/rancher/v2.6/en/monitoring-alerting/) deployed onto your cluster, Prometheus Federator's default configuration should already work with your existing Cluster Monitoring Stack; however, here are some notes on how we recommend you configure rancher-monitoring to optimize the security and usability of Prometheus Federator in your cluster:
### Ensure the cattle-monitoring-system namespace is placed into the System Project (or a similarly locked down Project that has access to other Projects in the cluster)
Prometheus Operator's security model expects that the namespace it is deployed into (`cattle-monitoring-system`) has limited access for anyone except Cluster Admins, to avoid privilege escalation via execing into Pods (such as the Jobs executing Helm operations). In addition, deploying Prometheus Federator and all Project Prometheus stacks into the System Project ensures that each Project Prometheus is able to reach out and scrape workloads across all Projects (even if Network Policies are defined via Project Network Isolation), while limiting the ability of Project Owners, Project Members, and other users to access data they shouldn't have access to (e.g. by execing into pods or setting up the ability to scrape namespaces outside of a given Project).
### Configure rancher-monitoring to only watch for resources created by the Helm chart itself
Since each Project Monitoring Stack already watches its own project's namespaces and collects any additional custom workload metrics or dashboards there, it's recommended to configure the following settings on all selectors to ensure that the Cluster Prometheus Stack only monitors resources created by the Helm chart itself:
```yaml
matchLabels:
  release: "rancher-monitoring"
```
The following selector fields are recommended to have this value:
- `.Values.alertmanager.alertmanagerSpec.alertmanagerConfigSelector`
- `.Values.prometheus.prometheusSpec.serviceMonitorSelector`
- `.Values.prometheus.prometheusSpec.podMonitorSelector`
- `.Values.prometheus.prometheusSpec.ruleSelector`
- `.Values.prometheus.prometheusSpec.probeSelector`
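Expressed as a `values.yaml` override for the rancher-monitoring chart, these recommendations might look like the following sketch (the key paths mirror the list above):
```yaml
alertmanager:
  alertmanagerSpec:
    alertmanagerConfigSelector:
      matchLabels:
        release: "rancher-monitoring"
prometheus:
  prometheusSpec:
    serviceMonitorSelector:
      matchLabels:
        release: "rancher-monitoring"
    podMonitorSelector:
      matchLabels:
        release: "rancher-monitoring"
    ruleSelector:
      matchLabels:
        release: "rancher-monitoring"
    probeSelector:
      matchLabels:
        release: "rancher-monitoring"
```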
Once this setting is turned on, you can always create ServiceMonitors or PodMonitors that are picked up by the Cluster Prometheus by adding the label `release: "rancher-monitoring"` to them (in which case they will be ignored by Project Monitoring Stacks automatically by default, even if the namespaces in which those ServiceMonitors or PodMonitors reside are not system namespaces).
> Note: If you don't want to allow users to be able to create ServiceMonitors and PodMonitors that aggregate into the Cluster Prometheus in Project namespaces, you can additionally set the namespaceSelectors on the chart to only target system namespaces (which must contain `cattle-monitoring-system` and `cattle-dashboards`, where rancher-monitoring deploys its resources by default; you will also need to monitor the `default` namespace to get apiserver metrics, or create a custom ServiceMonitor to scrape apiserver metrics from the Service residing in the `default` namespace) to prevent your Cluster Prometheus from picking up other Prometheus Operator CRs; in that case, it is recommended to set `.Values.prometheus.prometheusSpec.ignoreNamespaceSelectors=true` to allow you to define ServiceMonitors that can monitor non-system namespaces from within a system namespace.
In addition, if you modified the default `.Values.grafana.sidecar.*.searchNamespace` values on the Grafana Helm subchart for Monitoring V2, it is also recommended to remove the overrides or ensure that your defaults are scoped to only system namespaces for the following values:
- `.Values.grafana.sidecar.dashboards.searchNamespace` (default `cattle-dashboards`)
- `.Values.grafana.sidecar.datasources.searchNamespace` (default `null`, which means it uses the release namespace `cattle-monitoring-system`)
- `.Values.grafana.sidecar.plugins.searchNamespace` (default `null`, which means it uses the release namespace `cattle-monitoring-system`)
- `.Values.grafana.sidecar.notifiers.searchNamespace` (default `null`, which means it uses the release namespace `cattle-monitoring-system`)
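As a rough sketch, keeping these sidecar values scoped to system namespaces in a rancher-monitoring values override might look like this (the defaults listed above are shown explicitly):
```yaml
grafana:
  sidecar:
    dashboards:
      searchNamespace: cattle-dashboards  # default
    datasources:
      searchNamespace: null  # null means the release namespace (cattle-monitoring-system)
    plugins:
      searchNamespace: null
    notifiers:
      searchNamespace: null
```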
### Increase the CPU / memory limits of the Cluster Prometheus
Depending on a cluster's setup, it's generally recommended to give a large amount of dedicated memory to the Cluster Prometheus to avoid restarts due to out-of-memory errors (OOMKilled), usually caused by churn created in the cluster that causes a large number of high cardinality metrics to be generated and ingested by Prometheus within one block of time; this is one of the reasons why the default Rancher Monitoring stack expects around 4GB of RAM to be able to operate in a normal-sized cluster. However, when introducing Project Monitoring Stacks that are all sending `/federate` requests to the same Cluster Prometheus and are reliant on the Cluster Prometheus being "up" to federate that system data on their namespaces, it's even more important that the Cluster Prometheus has an ample amount of CPU / memory assigned to it to prevent an outage that can cause data gaps across all Project Prometheis in the cluster.
> Note: There are no specific recommendations on how much memory the Cluster Prometheus should be configured with since it depends entirely on the user's setup (namely the likelihood of encountering a high churn rate and the scale of metrics that could be generated at that time); it generally varies per setup.
## How does the operator work?
1. On deploying this chart, users can create ProjectHelmChart CRs with `spec.helmApiVersion` set to `monitoring.cattle.io/v1alpha1` (also known as "Project Monitors" in the Rancher UI) in a **Project Registration Namespace (`cattle-project-<id>`)**; a minimal example is sketched after this list.
2. On seeing each ProjectHelmChart CR, the operator will automatically deploy a Project Prometheus stack on the Project Owner's behalf in the **Project Release Namespace (`cattle-project-<id>-monitoring`)** based on a HelmChart CR and a HelmRelease CR automatically created by the ProjectHelmChart controller in the **Operator / System Namespace**.
3. RBAC will automatically be assigned in the Project Release Namespace to allow users to view the Prometheus, Alertmanager, and Grafana UIs of the Project Monitoring Stack deployed; this will be based on RBAC defined on the Project Registration Namespace against the [default Kubernetes user-facing roles](https://kubernetes.io/docs/reference/access-authn-authz/rbac/#user-facing-roles) (see below for more information about configuring RBAC).
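The sketch below shows what such a ProjectHelmChart might look like; the project ID `p-example` and the resource name are hypothetical placeholders:
```yaml
apiVersion: helm.cattle.io/v1alpha1
kind: ProjectHelmChart
metadata:
  name: project-monitoring              # illustrative name
  namespace: cattle-project-p-example   # Project Registration Namespace for the hypothetical project p-example
spec:
  helmApiVersion: monitoring.cattle.io/v1alpha1
  values: {}                            # optional values.yaml overrides for the underlying rancher-project-monitoring chart
```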
### What is a Project?
In Prometheus Federator, a Project is a group of namespaces that can be identified by a `metav1.LabelSelector`; by default, the label used to identify projects is `field.cattle.io/projectId`, the label used to identify namespaces that are contained within a given [Rancher](https://rancher.com/) Project.
### Configuring the Helm release created by a ProjectHelmChart
The `spec.values` of the ProjectHelmChart resource corresponds to the `values.yaml` override to be supplied to the underlying Helm chart deployed by the operator on the user's behalf; to see the underlying chart's `values.yaml` spec, either:
- View the chart's definition located at [`rancher/prometheus-federator` under `charts/rancher-project-monitoring`](https://github.com/rancher/prometheus-federator/blob/main/charts/rancher-project-monitoring) (where the chart version will be tied to the version of this operator)
- Look for the ConfigMap named `monitoring.cattle.io.v1alpha1` that is automatically created in each Project Registration Namespace, which will contain both the `values.yaml` and `questions.yaml` that were used to configure the chart (both of which are embedded directly into the `prometheus-federator` binary).
### Namespaces
As a Project Operator based on [rancher/helm-project-operator](https://github.com/rancher/helm-project-operator), Prometheus Federator has three different classifications of namespaces that the operator looks out for:
1. **Operator / System Namespace**: this is the namespace that the operator is deployed into (e.g. `cattle-monitoring-system`). This namespace will contain all HelmCharts and HelmReleases for all ProjectHelmCharts watched by this operator. **Only Cluster Admins should have access to this namespace.**
2. **Project Registration Namespace (`cattle-project-<id>`)**: this is the set of namespaces that the operator watches for ProjectHelmCharts within. The RoleBindings and ClusterRoleBindings that apply to this namespace will also be the source of truth for the auto-assigned RBAC created in the Project Release Namespace (see more details below). **Project Owners (admin), Project Members (edit), and Read-Only Members (view) should have access to this namespace**.
> Note: Project Registration Namespaces will be auto-generated by the operator and imported into the Project it is tied to if `.Values.global.cattle.projectLabel` is provided (which is set to `field.cattle.io/projectId` by default); this indicates that a Project Registration Namespace should be created by the operator if at least one namespace is observed with that label. The operator will not let these namespaces be deleted unless either all namespaces with that label are gone (e.g. this is the last namespace in that project, in which case the namespace will be marked with the label `"helm.cattle.io/helm-project-operator-orphaned": "true"`, which signals that it can be deleted) or it is no longer watching that project (because the project ID was provided under `.Values.helmProjectOperator.otherSystemProjectLabelValues`, which serves as a denylist for Projects). These namespaces will also never be auto-deleted to avoid destroying user data; it is recommended that users clean up these namespaces manually, if desired, when creating or deleting a project.
> Note: if `.Values.global.cattle.projectLabel` is not provided, the Operator / System Namespace will also be the Project Registration Namespace
3. **Project Release Namespace (`cattle-project-<id>-monitoring`)**: this is the set of namespaces that the operator deploys Project Monitoring Stacks within on behalf of a ProjectHelmChart; the operator will also automatically assign RBAC to Roles created in this namespace by the Project Monitoring Stack based on bindings found in the Project Registration Namespace. **Only Cluster Admins should have access to this namespace; Project Owners (admin), Project Members (edit), and Read-Only Members (view) will be assigned limited access to this namespace by the deployed Helm Chart and Prometheus Federator.**
> Note: Project Release Namespaces are automatically deployed and imported into the project whose ID is specified under `.Values.helmProjectOperator.projectReleaseNamespaces.labelValue` (which defaults to the value of `.Values.global.cattle.systemProjectId` if not specified) whenever a ProjectHelmChart is specified in a Project Registration Namespace
> Note: Project Release Namespaces follow the same orphaning conventions as Project Registration Namespaces (see note above)
> Note: if `.Values.helmProjectOperator.projectReleaseNamespaces.enabled` is false, the Project Release Namespace will be the same as the Project Registration Namespace
### Helm Resources (HelmChart, HelmRelease)
On deploying a ProjectHelmChart, the Prometheus Federator will automatically create and manage two child custom resources that manage the underlying Helm resources in turn:
- A HelmChart CR (managed via an embedded [k3s-io/helm-controller](https://github.com/k3s-io/helm-controller) in the operator): this custom resource automatically creates a Job in the same namespace that triggers a `helm install`, `helm upgrade`, or `helm uninstall` depending on the change applied to the HelmChart CR; this CR is automatically updated on changes to the ProjectHelmChart (e.g. modifying the values.yaml) or changes to the underlying Project definition (e.g. adding or removing namespaces from a project).
> **Important Note: If a ProjectHelmChart is not deploying or updating the underlying Project Monitoring Stack for some reason, the Job created by this resource in the Operator / System namespace should be the first place you check to see if there's something wrong with the Helm operation; however, this is generally only accessible by a Cluster Admin.**
- A HelmRelease CR (managed via an embedded [rancher/helm-locker](https://github.com/rancher/helm-locker) in the operator): this custom resource automatically locks a deployed Helm release in place and automatically overwrites updates to underlying resources unless the change happens via a Helm operation (`helm install`, `helm upgrade`, or `helm uninstall` performed by the HelmChart CR).
> Note: HelmRelease CRs emit Kubernetes Events when an underlying Helm release is detected as being modified and locked back into place; to view these events, you can use `kubectl describe helmrelease <helm-release-name> -n <operator/system-namespace>`; you can also view the logs of this operator to see when changes are detected and which resources were attempted to be modified
Both of these resources are created for all Helm charts in the Operator / System namespaces to avoid escalation of privileges to underprivileged users.
### RBAC
As described in the section on namespaces above, Prometheus Federator expects that Project Owners, Project Members, and other users in the cluster with Project-level permissions (e.g. permissions in a certain set of namespaces identified by a single label selector) have minimal permissions in any namespaces except the Project Registration Namespace (which is imported into the project by default) and those that already comprise their projects. Therefore, in order to allow Project Owners to assign specific chart permissions to other users in their Project namespaces, the Helm Project Operator will automatically watch the following bindings:
- ClusterRoleBindings
- RoleBindings in the Project Release Namespace
On observing a change to one of those types of bindings, the Helm Project Operator will check whether the `roleRef` that the binding points to matches a ClusterRole with the name provided under `helmProjectOperator.releaseRoleBindings.clusterRoleRefs.admin`, `helmProjectOperator.releaseRoleBindings.clusterRoleRefs.edit`, or `helmProjectOperator.releaseRoleBindings.clusterRoleRefs.view`; by default, these roleRefs will correspond to `admin`, `edit`, and `view` respectively, which are the [default Kubernetes user-facing roles](https://kubernetes.io/docs/reference/access-authn-authz/rbac/#user-facing-roles).
> Note: for Rancher RBAC users, these [default Kubernetes user-facing roles](https://kubernetes.io/docs/reference/access-authn-authz/rbac/#user-facing-roles) directly correlate to the `Project Owner`, `Project Member`, and `Read-Only` default Project Role Templates.
If the `roleRef` matches, the Helm Project Operator will filter the `subjects` of the binding for all Users and Groups and use that to automatically construct a RoleBinding for each Role in the Project Release Namespace with the same name as the role and the following labels:
- `helm.cattle.io/project-helm-chart-role: {{ .Release.Name }}`
- `helm.cattle.io/project-helm-chart-role-aggregate-from: <admin|edit|view>`
By default, `rancher-project-monitoring` (the underlying chart deployed by Prometheus Federator) creates three default Roles per Project Release Namespace that provide `admin`, `edit`, and `view` users with permissions to view the Prometheus, Alertmanager, and Grafana UIs of the Project Monitoring Stack, following least privilege; however, if a Cluster Admin would like to assign additional permissions to certain users, they can either directly assign RoleBindings in the Project Release Namespace to certain users or create Roles with the above two labels on them to allow Project Owners to control assigning those RBAC roles to users in their Project Registration namespaces.
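For example, a Cluster Admin could create a Role like the following sketch in a Project Release Namespace; the Role name, namespace, release name, and rules here are hypothetical and only illustrate the two labels described above:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: project-monitoring-extra-edit              # hypothetical Role name
  namespace: cattle-project-p-example-monitoring   # hypothetical Project Release Namespace
  labels:
    helm.cattle.io/project-helm-chart-role: <release-name>        # the Helm release name ({{ .Release.Name }}) of the Project Monitoring Stack
    helm.cattle.io/project-helm-chart-role-aggregate-from: edit   # one of admin | edit | view
rules:
- apiGroups: [""]
  resources: ["configmaps"]                        # hypothetical extra permissions to grant
  verbs: ["get", "list", "watch"]
```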
### Advanced Helm Project Operator Configuration
|Value|Configuration|
|---|---------------------------|
|`helmProjectOperator.valuesOverride`| Allows an Operator to override values that are set on each ProjectHelmChart deployment on an operator-level; user-provided options (specified on the `spec.values` of the ProjectHelmChart) are automatically overridden if operator-level values are provided. For an example, see how the default value overrides `federate.targets` (note: when overriding list values like `federate.targets`, user-provided list values will **not** be concatenated) |
|`helmProjectOperator.projectReleaseNamespaces.labelValue`| The value of the Project that all Project Release Namespaces should be auto-imported into (via label and annotation). Not recommended to be overridden on a Rancher setup. |
|`helmProjectOperator.otherSystemProjectLabelValues`| Other namespaces that the operator should treat as a system namespace that should not be monitored. By default, all namespaces that match `global.cattle.systemProjectId` will not be matched. `cattle-monitoring-system`, `cattle-dashboards`, and `kube-system` are explicitly marked as system namespaces as well, regardless of label or annotation. |
|`helmProjectOperator.releaseRoleBindings.aggregate`| Whether to automatically create RBAC resources in Project Release namespaces |
|`helmProjectOperator.releaseRoleBindings.clusterRoleRefs.<admin\|edit\|view>`| ClusterRoles to reference to discover subjects to create RoleBindings for in the Project Release Namespace for all corresponding Project Release Roles. See RBAC above for more information |
|`helmProjectOperator.hardenedNamespaces.enabled`| Whether to automatically patch the default ServiceAccount with `automountServiceAccountToken: false` and create a default NetworkPolicy in all managed namespaces in the cluster; the default values ensure that the creation of the namespace does not break a CIS 1.16 hardened scan |
|`helmProjectOperator.hardenedNamespaces.configuration`| The configuration to be supplied to the default ServiceAccount or auto-generated NetworkPolicy on managing a namespace |
|`helmProjectOperator.helmController.enabled`| Whether to enable an embedded k3s-io/helm-controller instance within the Helm Project Operator. Should be disabled for RKE2/K3s clusters before v1.23.14 / v1.24.8 / v1.25.4 since RKE2/K3s clusters already run Helm Controller at a cluster-wide level to manage internal Kubernetes components |
|`helmProjectOperator.helmLocker.enabled`| Whether to enable an embedded rancher/helm-locker instance within the Helm Project Operator. |
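Putting some of these options together, a hedged sketch of a `values.yaml` override for this chart (using only keys from the table above) might look like:
```yaml
helmProjectOperator:
  enabled: true
  helmController:
    enabled: true       # set to false on RKE2/K3s versions that already run a cluster-wide Helm Controller
  helmLocker:
    enabled: true
  hardenedNamespaces:
    enabled: true
  releaseRoleBindings:
    aggregate: true
    clusterRoleRefs:
      admin: admin
      edit: edit
      view: view
  otherSystemProjectLabelValues: []     # additional project IDs to treat as system projects that should not be monitored
```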


@@ -0,0 +1,27 @@
# Prometheus Federator
This chart deploys an operator that manages Project Monitoring Stacks composed of the following set of resources that are scoped to project namespaces:
- [Prometheus](https://prometheus.io/) (managed externally by [Prometheus Operator](https://github.com/prometheus-operator/prometheus-operator))
- [Alertmanager](https://prometheus.io/docs/alerting/latest/alertmanager/) (managed externally by [Prometheus Operator](https://github.com/prometheus-operator/prometheus-operator))
- [Grafana](https://github.com/helm/charts/tree/master/stable/grafana) (deployed via an embedded Helm chart)
- Default PrometheusRules and Grafana dashboards based on the collection of community-curated resources from [kube-prometheus](https://github.com/prometheus-operator/kube-prometheus/)
- Default ServiceMonitors that watch the deployed Prometheus, Grafana, and Alertmanager
Since this Project Monitoring Stack deploys Prometheus Operator CRs, an existing Prometheus Operator instance must already be deployed in the cluster for Prometheus Federator to be able to deploy Project Monitoring Stacks successfully. It is recommended to use [`rancher-monitoring`](https://rancher.com/docs/rancher/v2.6/en/monitoring-alerting/) for this. For more information on how the chart works or advanced configurations, please read the `README.md`.
## Upgrading to Kubernetes v1.25+
Starting in Kubernetes v1.25, [Pod Security Policies](https://kubernetes.io/docs/concepts/security/pod-security-policy/) have been removed from the Kubernetes API.
As a result, **before upgrading to Kubernetes v1.25** (or on a fresh install in a Kubernetes v1.25+ cluster), users are expected to perform an in-place upgrade of this chart with `global.cattle.psp.enabled` set to `false` if it has been previously set to `true`.
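As a sketch, the in-place upgrade described above amounts to supplying a values override like the following (for example via `helm upgrade ... -f psp-values.yaml`, where the file name is only illustrative):
```yaml
global:
  cattle:
    psp:
      enabled: false
```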
> **Note:**
> In this chart release, all previous fields associated with PSP resources have been removed in favor of a single global field: `global.cattle.psp.enabled`.
> **Note:**
> If you upgrade your cluster to Kubernetes v1.25+ before removing PSPs via a `helm upgrade` (even if you manually clean up resources), **it will leave the Helm release in a broken state within the cluster such that further Helm operations will not work (`helm uninstall`, `helm upgrade`, etc.).**
>
> If your charts get stuck in this state, please consult the Rancher docs on how to clean up your Helm release secrets.
Upon setting `global.cattle.psp.enabled` to false, the chart will remove any PSP resources deployed on its behalf from the cluster. This is the default setting for this chart.
As a replacement for PSPs, [Pod Security Admission](https://kubernetes.io/docs/concepts/security/pod-security-admission/) should be used. Please consult the Rancher docs for more details on how to configure your chart release namespaces to work with the new Pod Security Admission and apply Pod Security Standards.
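For reference, Pod Security Admission is configured by labeling namespaces with Pod Security Standard levels; a minimal sketch is shown below (the `privileged` level is only an example, consult the Rancher docs for the levels recommended for your release namespaces):
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: cattle-monitoring-system
  labels:
    pod-security.kubernetes.io/enforce: privileged   # example level; pick the appropriate Pod Security Standard
```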


@@ -0,0 +1,15 @@
annotations:
catalog.cattle.io/certified: rancher
catalog.cattle.io/display-name: Helm Project Operator
catalog.cattle.io/kube-version: '>=1.16.0-0'
catalog.cattle.io/namespace: cattle-helm-system
catalog.cattle.io/os: linux,windows
catalog.cattle.io/permits-os: linux,windows
catalog.cattle.io/provides-gvr: helm.cattle.io.projecthelmchart/v1alpha1
catalog.cattle.io/rancher-version: '>= 2.6.0-0'
catalog.cattle.io/release-name: helm-project-operator
apiVersion: v2
appVersion: 0.2.1
description: Helm Project Operator
name: helmProjectOperator
version: 0.2.1


@@ -0,0 +1,77 @@
# Helm Project Operator
## How does the operator work?
1. On deploying a Helm Project Operator, users can create ProjectHelmChart CRs with `spec.helmApiVersion` set to `dummy.cattle.io/v1alpha1` in a **Project Registration Namespace (`cattle-project-<id>`)**; a minimal example is sketched after this list.
2. On seeing each ProjectHelmChart CR, the operator will automatically deploy the embedded Helm chart on the Project Owner's behalf in the **Project Release Namespace (`cattle-project-<id>-dummy`)** based on a HelmChart CR and a HelmRelease CR automatically created by the ProjectHelmChart controller in the **Operator / System Namespace**.
3. RBAC will automatically be assigned in the Project Release Namespace to allow users access based on Roles created in the Project Release Namespace with a given set of labels; this will be based on RBAC defined on the Project Registration Namespace against the [default Kubernetes user-facing roles](https://kubernetes.io/docs/reference/access-authn-authz/rbac/#user-facing-roles) (see below for more information about configuring RBAC).
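The sketch below shows what such a ProjectHelmChart might look like; the project ID `p-example` and the resource name are hypothetical placeholders:
```yaml
apiVersion: helm.cattle.io/v1alpha1
kind: ProjectHelmChart
metadata:
  name: example-chart                   # illustrative name
  namespace: cattle-project-p-example   # Project Registration Namespace for the hypothetical project p-example
spec:
  helmApiVersion: dummy.cattle.io/v1alpha1
  values: {}                            # optional values.yaml overrides for the embedded example-chart
```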
### What is a Project?
In Helm Project Operator, a Project is a group of namespaces that can be identified by a `metav1.LabelSelector`; by default, the label used to identify projects is `field.cattle.io/projectId`, the label used to identify namespaces that are contained within a given [Rancher](https://rancher.com/) Project.
### What is a ProjectHelmChart?
A ProjectHelmChart is an instance of a (project-scoped) Helm chart deployed on behalf of a user who has permissions to create ProjectHelmChart resources in a Project Registration namespace.
Generally, the best way to think about the ProjectHelmChart model is by comparing it to two other models:
1. Managed Kubernetes providers (EKS, GKE, AKS, etc.): in this model, a user has the ability to say "I want a Kubernetes cluster" but the underlying cloud provider is responsible for provisioning the infrastructure and offering **limited view and access** of the underlying resources created on their behalf; similarly, Helm Project Operator allows a Project Owner to say "I want this Helm chart deployed", but the underlying Operator is responsible for "provisioning" (deploying) the Helm chart and offering **limited view and access** of the underlying Kubernetes resources created on their behalf (based on configuring "least-privilege" Kubernetes RBAC for the Project Owners / Members in the newly created Project Release Namespace).
2. Dynamically-provisioned Persistent Volumes: in this model, a single resource (PersistentVolume) exists that allows you to specify a Storage Class that actually implements provisioning the underlying storage via a Storage Class Provisioner (e.g. Longhorn). Similarly, a ProjectHelmChart allows you to specify a `spec.helmApiVersion` ("storage class") that actually implements deploying the underlying Helm chart via a Helm Project Operator (e.g. [`rancher/prometheus-federator`](https://github.com/rancher/prometheus-federator)).
### Configuring the Helm release created by a ProjectHelmChart
The `spec.values` of the ProjectHelmChart resource corresponds to the `values.yaml` override to be supplied to the underlying Helm chart deployed by the operator on the user's behalf; to see the underlying chart's `values.yaml` spec, either:
- View the chart's definition located at [`rancher/helm-project-operator` under `charts/example-chart`](https://github.com/rancher/helm-project-operator/blob/main/charts/example-chart) (where the chart version will be tied to the version of this operator)
- Look for the ConfigMap named `dummy.cattle.io.v1alpha1` that is automatically created in each Project Registration Namespace, which will contain both the `values.yaml` and `questions.yaml` that were used to configure the chart (both of which are embedded directly into the `helm-project-operator` binary).
### Namespaces
All Helm Project Operators have three different classifications of namespaces that the operator looks out for:
1. **Operator / System Namespace**: this is the namespace that the operator is deployed into (e.g. `cattle-helm-system`). This namespace will contain all HelmCharts and HelmReleases for all ProjectHelmCharts watched by this operator. **Only Cluster Admins should have access to this namespace.**
2. **Project Registration Namespace (`cattle-project-<id>`)**: this is the set of namespaces that the operator watches for ProjectHelmCharts within. The RoleBindings and ClusterRoleBindings that apply to this namespace will also be the source of truth for the auto-assigned RBAC created in the Project Release Namespace (see more details below). **Project Owners (admin), Project Members (edit), and Read-Only Members (view) should have access to this namespace**.
> Note: Project Registration Namespaces will be auto-generated by the operator and imported into the Project it is tied to if `.Values.global.cattle.projectLabel` is provided (which is set to `field.cattle.io/projectId` by default); this indicates that a Project Registration Namespace should be created by the operator if at least one namespace is observed with that label. The operator will not let these namespaces be deleted unless either all namespaces with that label are gone (e.g. this is the last namespace in that project, in which case the namespace will be marked with the label `"helm.cattle.io/helm-project-operator-orphaned": "true"`, which signals that it can be deleted) or it is no longer watching that project (because the project ID was provided under `.Values.otherSystemProjectLabelValues`, which serves as a denylist for Projects). These namespaces will also never be auto-deleted to avoid destroying user data; it is recommended that users clean up these namespaces manually, if desired, when creating or deleting a project.
> Note: if `.Values.global.cattle.projectLabel` is not provided, the Operator / System Namespace will also be the Project Registration Namespace
3. **Project Release Namespace (`cattle-project-<id>-dummy`)**: this is the set of namespaces that the operator deploys Helm charts within on behalf of a ProjectHelmChart; the operator will also automatically assign RBAC to Roles created in this namespace by the Helm charts based on bindings found in the Project Registration Namespace. **Only Cluster Admins should have access to this namespace; Project Owners (admin), Project Members (edit), and Read-Only Members (view) will be assigned limited access to this namespace by the deployed Helm Chart and Helm Project Operator.**
> Note: Project Release Namespaces are automatically deployed and imported into the project whose ID is specified under `.Values.projectReleaseNamespaces.labelValue` (which defaults to the value of `.Values.global.cattle.systemProjectId` if not specified) whenever a ProjectHelmChart is specified in a Project Registration Namespace
> Note: Project Release Namespaces follow the same orphaning conventions as Project Registration Namespaces (see note above)
> Note: if `.Values.projectReleaseNamespaces.enabled` is false, the Project Release Namespace will be the same as the Project Registration Namespace
### Helm Resources (HelmChart, HelmRelease)
On deploying a ProjectHelmChart, the Helm Project Operator will automatically create and manage two child custom resources that manage the underlying Helm resources in turn:
- A HelmChart CR (managed via an embedded [k3s-io/helm-controller](https://github.com/k3s-io/helm-controller) in the operator): this custom resource automatically creates a Job in the same namespace that triggers a `helm install`, `helm upgrade`, or `helm uninstall` depending on the change applied to the HelmChart CR; this CR is automatically updated on changes to the ProjectHelmChart (e.g. modifying the values.yaml) or changes to the underlying Project definition (e.g. adding or removing namespaces from a project).
> **Important Note: If a ProjectHelmChart is not deploying or updating the underlying Project Monitoring Stack for some reason, the Job created by this resource in the Operator / System namespace should be the first place you check to see if there's something wrong with the Helm operation; however, this is generally only accessible by a Cluster Admin.**
- A HelmRelease CR (managed via an embedded [rancher/helm-locker](https://github.com/rancher/helm-locker) in the operator): this custom resource automatically locks a deployed Helm release in place and automatically overwrites updates to underlying resources unless the change happens via a Helm operation (`helm install`, `helm upgrade`, or `helm uninstall` performed by the HelmChart CR).
> Note: HelmRelease CRs emit Kubernetes Events when an underlying Helm release is detected as being modified and locked back into place; to view these events, you can use `kubectl describe helmrelease <helm-release-name> -n <operator/system-namespace>`; you can also view the logs of this operator to see when changes are detected and which resources were attempted to be modified
Both of these resources are created for all Helm charts in the Operator / System namespaces to avoid escalation of privileges to underprivileged users.
### RBAC
As described in the section on namespaces above, Helm Project Operator expects that Project Owners, Project Members, and other users in the cluster with Project-level permissions (e.g. permissions in a certain set of namespaces identified by a single label selector) have minimal permissions in any namespaces except the Project Registration Namespace (which is imported into the project by default) and those that already comprise their projects. Therefore, in order to allow Project Owners to assign specific chart permissions to other users in their Project namespaces, the Helm Project Operator will automatically watch the following bindings:
- ClusterRoleBindings
- RoleBindings in the Project Release Namespace
On observing a change to one of those types of bindings, the Helm Project Operator will check whether the `roleRef` that the binding points to matches a ClusterRole with the name provided under `releaseRoleBindings.clusterRoleRefs.admin`, `releaseRoleBindings.clusterRoleRefs.edit`, or `releaseRoleBindings.clusterRoleRefs.view`; by default, these roleRefs will correspond to `admin`, `edit`, and `view` respectively, which are the [default Kubernetes user-facing roles](https://kubernetes.io/docs/reference/access-authn-authz/rbac/#user-facing-roles).
> Note: for Rancher RBAC users, these [default Kubernetes user-facing roles](https://kubernetes.io/docs/reference/access-authn-authz/rbac/#user-facing-roles) directly correlate to the `Project Owner`, `Project Member`, and `Read-Only` default Project Role Templates.
If the `roleRef` matches, the Helm Project Operator will filter the `subjects` of the binding for all Users and Groups and use that to automatically construct a RoleBinding for each Role in the Project Release Namespace with the same name as the role and the following labels:
- `helm.cattle.io/project-helm-chart-role: {{ .Release.Name }}`
- `helm.cattle.io/project-helm-chart-role-aggregate-from: <admin|edit|view>`
By default, the `example-chart` (the underlying chart deployed by Helm Project Operator) does not create any default roles; however, if a Cluster Admin would like to assign additional permissions to certain users, they can either directly assign RoleBindings in the Project Release Namespace to certain users or create Roles with the above two labels on them to allow Project Owners to control assigning those RBAC roles to users in their Project Registration namespaces.
### Advanced Helm Project Operator Configuration
|Value|Configuration|
|---|---------------------------|
|`valuesOverride`| Allows an Operator to override values that are set on each ProjectHelmChart deployment on an operator-level; user-provided options (specified on the `spec.values` of the ProjectHelmChart) are automatically overridden if operator-level values are provided. For an example, see how the default value overrides `federate.targets` (note: when overriding list values like `federate.targets`, user-provided list values will **not** be concatenated) |
|`projectReleaseNamespaces.labelValue`| The value of the Project that all Project Release Namespaces should be auto-imported into (via label and annotation). Not recommended to be overridden on a Rancher setup. |
|`otherSystemProjectLabelValues`| Other namespaces that the operator should treat as a system namespace that should not be monitored. By default, all namespaces that match `global.cattle.systemProjectId` will not be matched. `kube-system` is explicitly marked as a system namespace as well, regardless of label or annotation. |
|`releaseRoleBindings.aggregate`| Whether to automatically create RBAC resources in Project Release namespaces |
|`releaseRoleBindings.clusterRoleRefs.<admin\|edit\|view>`| ClusterRoles to reference to discover subjects to create RoleBindings for in the Project Release Namespace for all corresponding Project Release Roles. See RBAC above for more information |
|`hardenedNamespaces.enabled`| Whether to automatically patch the default ServiceAccount with `automountServiceAccountToken: false` and create a default NetworkPolicy in all managed namespaces in the cluster; the default values ensure that the creation of the namespace does not break a CIS 1.16 hardened scan |
|`hardenedNamespaces.configuration`| The configuration to be supplied to the default ServiceAccount or auto-generated NetworkPolicy on managing a namespace |
|`helmController.enabled`| Whether to enable an embedded k3s-io/helm-controller instance within the Helm Project Operator. Should be disabled for RKE2 clusters since RKE2 clusters already run Helm Controller to manage internal Kubernetes components |
|`helmLocker.enabled`| Whether to enable an embedded rancher/helm-locker instance within the Helm Project Operator. |
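As a sketch of how an operator-level override might be declared for this chart (the keys under `valuesOverride` depend entirely on the embedded chart and are hypothetical here):
```yaml
valuesOverride:
  image:
    registry: registry.example.com   # hypothetical value forced onto every ProjectHelmChart release
releaseRoleBindings:
  aggregate: true
  clusterRoleRefs:
    admin: admin
    edit: edit
    view: view
```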


@@ -0,0 +1,20 @@
# Helm Project Operator
This chart installs the example [Helm Project Operator](https://github.com/rancher/helm-project-operator) onto your cluster.
## Upgrading to Kubernetes v1.25+
Starting in Kubernetes v1.25, [Pod Security Policies](https://kubernetes.io/docs/concepts/security/pod-security-policy/) have been removed from the Kubernetes API.
As a result, **before upgrading to Kubernetes v1.25** (or on a fresh install in a Kubernetes v1.25+ cluster), users are expected to perform an in-place upgrade of this chart with `global.cattle.psp.enabled` set to `false` if it has been previously set to `true`.
> **Note:**
> In this chart release, all previous fields associated with PSP resources have been removed in favor of a single global field: `global.cattle.psp.enabled`.
> **Note:**
> If you upgrade your cluster to Kubernetes v1.25+ before removing PSPs via a `helm upgrade` (even if you manually clean up resources), **it will leave the Helm release in a broken state within the cluster such that further Helm operations will not work (`helm uninstall`, `helm upgrade`, etc.).**
>
> If your charts get stuck in this state, please consult the Rancher docs on how to clean up your Helm release secrets.
Upon setting `global.cattle.psp.enabled` to false, the chart will remove any PSP resources deployed on its behalf from the cluster. This is the default setting for this chart.
As a replacement for PSPs, [Pod Security Admission](https://kubernetes.io/docs/concepts/security/pod-security-admission/) should be used. Please consult the Rancher docs for more details on how to configure your chart release namespaces to work with the new Pod Security Admission and apply Pod Security Standards.


@@ -0,0 +1,43 @@
questions:
- variable: global.cattle.psp.enabled
default: "false"
description: "Flag to enable or disable the installation of PodSecurityPolicies by this chart in the target cluster. If the cluster is running Kubernetes 1.25+, you must update this value to false."
label: "Enable PodSecurityPolicies"
type: boolean
group: "Security Settings"
- variable: helmController.enabled
label: Enable Embedded Helm Controller
description: 'Note: If you are running this chart in an RKE2 cluster, this should be disabled.'
type: boolean
group: Helm Controller
- variable: helmLocker.enabled
label: Enable Embedded Helm Locker
type: boolean
group: Helm Locker
- variable: projectReleaseNamespaces.labelValue
label: Project Release Namespace Project ID
description: By default, the System Project is selected. This can be overridden to a different Project (e.g. p-xxxxx)
type: string
required: false
group: Namespaces
- variable: releaseRoleBindings.clusterRoleRefs.admin
label: Admin ClusterRole
description: By default, admin selects Project Owners. This can be overridden to a different ClusterRole (e.g. rt-xxxxx)
type: string
default: admin
required: false
group: RBAC
- variable: releaseRoleBindings.clusterRoleRefs.edit
label: Edit ClusterRole
description: By default, edit selects Project Members. This can be overridden to a different ClusterRole (e.g. rt-xxxxx)
type: string
default: edit
required: false
group: RBAC
- variable: releaseRoleBindings.clusterRoleRefs.view
label: View ClusterRole
description: By default, view selects Read-Only users. This can be overridden to a different ClusterRole (e.g. rt-xxxxx)
type: string
default: view
required: false
group: RBAC


@@ -0,0 +1,2 @@
{{ $.Chart.Name }} has been installed. Check its status by running:
kubectl --namespace {{ template "helm-project-operator.namespace" . }} get pods -l "release={{ $.Release.Name }}"


@@ -0,0 +1,66 @@
# Rancher
{{- define "system_default_registry" -}}
{{- if .Values.global.cattle.systemDefaultRegistry -}}
{{- printf "%s/" .Values.global.cattle.systemDefaultRegistry -}}
{{- end -}}
{{- end -}}
# Windows Support
{{/*
Windows cluster will add default taint for linux nodes,
add below linux tolerations to workloads could be scheduled to those linux nodes
*/}}
{{- define "linux-node-tolerations" -}}
- key: "cattle.io/os"
value: "linux"
effect: "NoSchedule"
operator: "Equal"
{{- end -}}
{{- define "linux-node-selector" -}}
{{- if semverCompare "<1.14-0" .Capabilities.KubeVersion.GitVersion -}}
beta.kubernetes.io/os: linux
{{- else -}}
kubernetes.io/os: linux
{{- end -}}
{{- end -}}
# Helm Project Operator
{{/* vim: set filetype=mustache: */}}
{{/* Expand the name of the chart. This is suffixed with -alertmanager, which means subtract 13 from longest 63 available */}}
{{- define "helm-project-operator.name" -}}
{{- default .Chart.Name .Values.nameOverride | trunc 50 | trimSuffix "-" -}}
{{- end }}
{{/*
Allow the release namespace to be overridden for multi-namespace deployments in combined charts
*/}}
{{- define "helm-project-operator.namespace" -}}
{{- if .Values.namespaceOverride -}}
{{- .Values.namespaceOverride -}}
{{- else -}}
{{- .Release.Namespace -}}
{{- end -}}
{{- end -}}
{{/* Create chart name and version as used by the chart label. */}}
{{- define "helm-project-operator.chartref" -}}
{{- replace "+" "_" .Chart.Version | printf "%s-%s" .Chart.Name -}}
{{- end }}
{{/* Generate basic labels */}}
{{- define "helm-project-operator.labels" -}}
app.kubernetes.io/managed-by: {{ .Release.Service }}
app.kubernetes.io/instance: {{ .Release.Name }}
app.kubernetes.io/version: "{{ replace "+" "_" .Chart.Version }}"
app.kubernetes.io/part-of: {{ template "helm-project-operator.name" . }}
chart: {{ template "helm-project-operator.chartref" . }}
release: {{ $.Release.Name | quote }}
heritage: {{ $.Release.Service | quote }}
{{- if .Values.commonLabels}}
{{ toYaml .Values.commonLabels }}
{{- end }}
{{- end -}}


@@ -0,0 +1,82 @@
apiVersion: batch/v1
kind: Job
metadata:
name: {{ template "helm-project-operator.name" . }}-cleanup
namespace: {{ template "helm-project-operator.namespace" . }}
labels: {{ include "helm-project-operator.labels" . | nindent 4 }}
app: {{ template "helm-project-operator.name" . }}
annotations:
"helm.sh/hook": pre-delete
"helm.sh/hook-delete-policy": before-hook-creation, hook-succeeded, hook-failed
spec:
template:
metadata:
name: {{ template "helm-project-operator.name" . }}-cleanup
labels: {{ include "helm-project-operator.labels" . | nindent 8 }}
app: {{ template "helm-project-operator.name" . }}
spec:
serviceAccountName: {{ template "helm-project-operator.name" . }}
{{- if .Values.cleanup.securityContext }}
securityContext: {{ toYaml .Values.cleanup.securityContext | nindent 8 }}
{{- end }}
initContainers:
- name: add-cleanup-annotations
image: {{ template "system_default_registry" . }}{{ .Values.cleanup.image.repository }}:{{ .Values.cleanup.image.tag }}
imagePullPolicy: "{{ .Values.image.pullPolicy }}"
command:
- /bin/sh
- -c
- >
echo "Labeling all ProjectHelmCharts with helm.cattle.io/helm-project-operator-cleanup=true";
EXPECTED_HELM_API_VERSION={{ .Values.helmApiVersion }};
IFS=$'\n';
for namespace in $(kubectl get namespaces -l helm.cattle.io/helm-project-operated=true --no-headers -o=custom-columns=NAME:.metadata.name); do
for projectHelmChartAndHelmApiVersion in $(kubectl get projecthelmcharts -n ${namespace} --no-headers -o=custom-columns=NAME:.metadata.name,HELMAPIVERSION:.spec.helmApiVersion); do
projectHelmChartAndHelmApiVersion=$(echo ${projectHelmChartAndHelmApiVersion} | xargs);
projectHelmChart=$(echo ${projectHelmChartAndHelmApiVersion} | cut -d' ' -f1);
helmApiVersion=$(echo ${projectHelmChartAndHelmApiVersion} | cut -d' ' -f2);
if [[ ${helmApiVersion} != ${EXPECTED_HELM_API_VERSION} ]]; then
echo "Skipping marking ${namespace}/${projectHelmChart} with cleanup annotation since spec.helmApiVersion: ${helmApiVersion} is not ${EXPECTED_HELM_API_VERSION}";
continue;
fi;
kubectl label projecthelmcharts -n ${namespace} ${projectHelmChart} helm.cattle.io/helm-project-operator-cleanup=true --overwrite;
done;
done;
{{- if .Values.cleanup.resources }}
resources: {{ toYaml .Values.cleanup.resources | nindent 12 }}
{{- end }}
{{- if .Values.cleanup.containerSecurityContext }}
securityContext: {{ toYaml .Values.cleanup.containerSecurityContext | nindent 12 }}
{{- end }}
containers:
- name: ensure-subresources-deleted
image: {{ template "system_default_registry" . }}{{ .Values.cleanup.image.repository }}:{{ .Values.cleanup.image.tag }}
imagePullPolicy: IfNotPresent
command:
- /bin/sh
- -c
- >
SYSTEM_NAMESPACE={{ .Release.Namespace }}
EXPECTED_HELM_API_VERSION={{ .Values.helmApiVersion }};
HELM_API_VERSION_TRUNCATED=$(echo ${EXPECTED_HELM_API_VERSION} | cut -d'/' -f1);
echo "Ensuring HelmCharts and HelmReleases are deleted from ${SYSTEM_NAMESPACE}...";
while [[ "$(kubectl get helmcharts,helmreleases -l helm.cattle.io/helm-api-version=${HELM_API_VERSION_TRUNCATED} -n ${SYSTEM_NAMESPACE} 2>&1)" != "No resources found in ${SYSTEM_NAMESPACE} namespace." ]]; do
echo "waiting for HelmCharts and HelmReleases to be deleted from ${SYSTEM_NAMESPACE}... sleeping 3 seconds";
sleep 3;
done;
echo "Successfully deleted all HelmCharts and HelmReleases in ${SYSTEM_NAMESPACE}!";
{{- if .Values.cleanup.resources }}
resources: {{ toYaml .Values.cleanup.resources | nindent 12 }}
{{- end }}
{{- if .Values.cleanup.containerSecurityContext }}
securityContext: {{ toYaml .Values.cleanup.containerSecurityContext | nindent 12 }}
{{- end }}
restartPolicy: OnFailure
nodeSelector: {{ include "linux-node-selector" . | nindent 8 }}
{{- if .Values.cleanup.nodeSelector }}
{{- toYaml .Values.cleanup.nodeSelector | nindent 8 }}
{{- end }}
tolerations: {{ include "linux-node-tolerations" . | nindent 8 }}
{{- if .Values.cleanup.tolerations }}
{{- toYaml .Values.cleanup.tolerations | nindent 8 }}
{{- end }}


@@ -0,0 +1,57 @@
{{- if and .Values.global.rbac.create .Values.global.rbac.userRoles.create }}
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: {{ template "helm-project-operator.name" . }}-admin
labels: {{ include "helm-project-operator.labels" . | nindent 4 }}
{{- if .Values.global.rbac.userRoles.aggregateToDefaultRoles }}
rbac.authorization.k8s.io/aggregate-to-admin: "true"
{{- end }}
rules:
- apiGroups:
- helm.cattle.io
resources:
- projecthelmcharts
- projecthelmcharts/finalizers
- projecthelmcharts/status
verbs:
- '*'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: {{ template "helm-project-operator.name" . }}-edit
labels: {{ include "helm-project-operator.labels" . | nindent 4 }}
{{- if .Values.global.rbac.userRoles.aggregateToDefaultRoles }}
rbac.authorization.k8s.io/aggregate-to-edit: "true"
{{- end }}
rules:
- apiGroups:
- helm.cattle.io
resources:
- projecthelmcharts
- projecthelmcharts/status
verbs:
- 'get'
- 'list'
- 'watch'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: {{ template "helm-project-operator.name" . }}-view
labels: {{ include "helm-project-operator.labels" . | nindent 4 }}
{{- if .Values.global.rbac.userRoles.aggregateToDefaultRoles }}
rbac.authorization.k8s.io/aggregate-to-view: "true"
{{- end }}
rules:
- apiGroups:
- helm.cattle.io
resources:
- projecthelmcharts
- projecthelmcharts/status
verbs:
- 'get'
- 'list'
- 'watch'
{{- end }}


@@ -0,0 +1,14 @@
## Note: If you add another entry to this ConfigMap, make sure a corresponding env var is set
## in the deployment of the operator to ensure that a Helm upgrade will force the operator
## to reload the values in the ConfigMap and redeploy
apiVersion: v1
kind: ConfigMap
metadata:
name: {{ template "helm-project-operator.name" . }}-config
namespace: {{ template "helm-project-operator.namespace" . }}
labels: {{ include "helm-project-operator.labels" . | nindent 4 }}
data:
hardened.yaml: |-
{{ .Values.hardenedNamespaces.configuration | toYaml | indent 4 }}
values.yaml: |-
{{ .Values.valuesOverride | toYaml | indent 4 }}


@@ -0,0 +1,126 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ template "helm-project-operator.name" . }}
namespace: {{ template "helm-project-operator.namespace" . }}
labels: {{ include "helm-project-operator.labels" . | nindent 4 }}
app: {{ template "helm-project-operator.name" . }}
spec:
{{- if .Values.replicas }}
replicas: {{ .Values.replicas }}
{{- end }}
selector:
matchLabels:
app: {{ template "helm-project-operator.name" . }}
release: {{ $.Release.Name | quote }}
template:
metadata:
labels: {{ include "helm-project-operator.labels" . | nindent 8 }}
app: {{ template "helm-project-operator.name" . }}
spec:
containers:
- name: {{ template "helm-project-operator.name" . }}
image: "{{ template "system_default_registry" . }}{{ .Values.image.repository }}:{{ .Values.image.tag }}"
imagePullPolicy: "{{ .Values.image.pullPolicy }}"
args:
- {{ template "helm-project-operator.name" . }}
- --namespace={{ template "helm-project-operator.namespace" . }}
- --controller-name={{ template "helm-project-operator.name" . }}
- --values-override-file=/etc/helmprojectoperator/config/values.yaml
{{- if .Values.global.cattle.systemDefaultRegistry }}
- --system-default-registry={{ .Values.global.cattle.systemDefaultRegistry }}
{{- end }}
{{- if .Values.global.cattle.url }}
- --cattle-url={{ .Values.global.cattle.url }}
{{- end }}
{{- if .Values.global.cattle.projectLabel }}
- --project-label={{ .Values.global.cattle.projectLabel }}
{{- end }}
{{- if not .Values.projectReleaseNamespaces.enabled }}
- --system-project-label-values={{ join "," (append .Values.otherSystemProjectLabelValues .Values.global.cattle.systemProjectId) }}
{{- else if and (ne (len .Values.global.cattle.systemProjectId) 0) (ne (len .Values.projectReleaseNamespaces.labelValue) 0) (ne .Values.projectReleaseNamespaces.labelValue .Values.global.cattle.systemProjectId) }}
- --system-project-label-values={{ join "," (append .Values.otherSystemProjectLabelValues .Values.global.cattle.systemProjectId) }}
{{- else if len .Values.otherSystemProjectLabelValues }}
- --system-project-label-values={{ join "," .Values.otherSystemProjectLabelValues }}
{{- end }}
{{- if .Values.projectReleaseNamespaces.enabled }}
{{- if .Values.projectReleaseNamespaces.labelValue }}
- --project-release-label-value={{ .Values.projectReleaseNamespaces.labelValue }}
{{- else if .Values.global.cattle.systemProjectId }}
- --project-release-label-value={{ .Values.global.cattle.systemProjectId }}
{{- end }}
{{- end }}
{{- if .Values.global.cattle.clusterId }}
- --cluster-id={{ .Values.global.cattle.clusterId }}
{{- end }}
{{- if .Values.releaseRoleBindings.aggregate }}
{{- if .Values.releaseRoleBindings.clusterRoleRefs }}
{{- if .Values.releaseRoleBindings.clusterRoleRefs.admin }}
- --admin-cluster-role={{ .Values.releaseRoleBindings.clusterRoleRefs.admin }}
{{- end }}
{{- if .Values.releaseRoleBindings.clusterRoleRefs.edit }}
- --edit-cluster-role={{ .Values.releaseRoleBindings.clusterRoleRefs.edit }}
{{- end }}
{{- if .Values.releaseRoleBindings.clusterRoleRefs.view }}
- --view-cluster-role={{ .Values.releaseRoleBindings.clusterRoleRefs.view }}
{{- end }}
{{- end }}
{{- end }}
{{- if .Values.hardenedNamespaces.enabled }}
- --hardening-options-file=/etc/helmprojectoperator/config/hardening.yaml
{{- else }}
- --disable-hardening
{{- end }}
{{- if .Values.debug }}
- --debug
- --debug-level={{ .Values.debugLevel }}
{{- end }}
{{- if not .Values.helmController.enabled }}
- --disable-embedded-helm-controller
{{- else }}
- --helm-job-image={{ template "system_default_registry" . }}{{ .Values.helmController.job.image.repository }}:{{ .Values.helmController.job.image.tag }}
{{- end }}
{{- if not .Values.helmLocker.enabled }}
- --disable-embedded-helm-locker
{{- end }}
{{- if .Values.additionalArgs }}
{{- toYaml .Values.additionalArgs | nindent 10 }}
{{- end }}
env:
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
## Note: The below two values only exist to force Helm to upgrade the deployment on
## a change to the contents of the ConfigMap during an upgrade. Neither serve
## any practical purpose and can be removed and replaced with a configmap reloader
## in a future change if dynamic updates are required.
- name: HARDENING_OPTIONS_SHA_256_HASH
value: {{ .Values.hardenedNamespaces.configuration | toYaml | sha256sum }}
- name: VALUES_OVERRIDE_SHA_256_HASH
value: {{ .Values.valuesOverride | toYaml | sha256sum }}
{{- if .Values.resources }}
resources: {{ toYaml .Values.resources | nindent 12 }}
{{- end }}
{{- if .Values.containerSecurityContext }}
securityContext: {{ toYaml .Values.containerSecurityContext | nindent 12 }}
{{- end }}
volumeMounts:
- name: config
mountPath: "/etc/helmprojectoperator/config"
serviceAccountName: {{ template "helm-project-operator.name" . }}
{{- if .Values.securityContext }}
securityContext: {{ toYaml .Values.securityContext | nindent 8 }}
{{- end }}
nodeSelector: {{ include "linux-node-selector" . | nindent 8 }}
{{- if .Values.nodeSelector }}
{{- toYaml .Values.nodeSelector | nindent 8 }}
{{- end }}
tolerations: {{ include "linux-node-tolerations" . | nindent 8 }}
{{- if .Values.tolerations }}
{{- toYaml .Values.tolerations | nindent 8 }}
{{- end }}
volumes:
- name: config
configMap:
name: {{ template "helm-project-operator.name" . }}-config


@@ -0,0 +1,68 @@
{{- if .Values.global.cattle.psp.enabled }}
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
name: {{ template "helm-project-operator.name" . }}-psp
namespace: {{ template "helm-project-operator.namespace" . }}
labels: {{ include "helm-project-operator.labels" . | nindent 4 }}
app: {{ template "helm-project-operator.name" . }}
{{- if .Values.global.rbac.pspAnnotations }}
annotations: {{ toYaml .Values.global.rbac.pspAnnotations | nindent 4 }}
{{- end }}
spec:
privileged: false
hostNetwork: false
hostIPC: false
hostPID: false
runAsUser:
# Permits the container to run with root privileges as well.
rule: 'RunAsAny'
seLinux:
# This policy assumes the nodes are using AppArmor rather than SELinux.
rule: 'RunAsAny'
supplementalGroups:
rule: 'MustRunAs'
ranges:
# Forbid adding the root group.
- min: 0
max: 65535
fsGroup:
rule: 'MustRunAs'
ranges:
# Forbid adding the root group.
- min: 0
max: 65535
readOnlyRootFilesystem: false
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: {{ template "helm-project-operator.name" . }}-psp
labels: {{ include "helm-project-operator.labels" . | nindent 4 }}
app: {{ template "helm-project-operator.name" . }}
rules:
{{- if semverCompare "> 1.15.0-0" .Capabilities.KubeVersion.GitVersion }}
- apiGroups: ['policy']
{{- else }}
- apiGroups: ['extensions']
{{- end }}
resources: ['podsecuritypolicies']
verbs: ['use']
resourceNames:
- {{ template "helm-project-operator.name" . }}-psp
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: {{ template "helm-project-operator.name" . }}-psp
labels: {{ include "helm-project-operator.labels" . | nindent 4 }}
app: {{ template "helm-project-operator.name" . }}
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: {{ template "helm-project-operator.name" . }}-psp
subjects:
- kind: ServiceAccount
name: {{ template "helm-project-operator.name" . }}
namespace: {{ template "helm-project-operator.namespace" . }}
{{- end }}

@@ -0,0 +1,32 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: {{ template "helm-project-operator.name" . }}
labels: {{ include "helm-project-operator.labels" . | nindent 4 }}
app: {{ template "helm-project-operator.name" . }}
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: "cluster-admin" # see note below
subjects:
- kind: ServiceAccount
name: {{ template "helm-project-operator.name" . }}
namespace: {{ template "helm-project-operator.namespace" . }}
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: {{ template "helm-project-operator.name" . }}
namespace: {{ template "helm-project-operator.namespace" . }}
labels: {{ include "helm-project-operator.labels" . | nindent 4 }}
app: {{ template "helm-project-operator.name" . }}
{{- if .Values.global.imagePullSecrets }}
imagePullSecrets: {{ toYaml .Values.global.imagePullSecrets | nindent 2 }}
{{- end }}
# ---
# NOTE:
# Currently, because the k3s-io/helm-controller can only deploy Jobs that are cluster-bound to the cluster-admin
# ClusterRole, the only way for this operator to perform that binding is to be bound to the cluster-admin ClusterRole itself.
#
# As a result, this ClusterRoleBinding is left as a work-in-progress until changes are made in k3s-io/helm-controller that allow us to grant
# only scoped-down permissions to the Job that is deployed.

@@ -0,0 +1,62 @@
{{- if .Values.systemNamespacesConfigMap.create }}
apiVersion: v1
kind: ConfigMap
metadata:
name: {{ template "helm-project-operator.name" . }}-system-namespaces
namespace: {{ template "helm-project-operator.namespace" . }}
labels: {{ include "helm-project-operator.labels" . | nindent 4 }}
data:
system-namespaces.json: |-
{
{{- if .Values.projectReleaseNamespaces.enabled }}
{{- if .Values.projectReleaseNamespaces.labelValue }}
"projectReleaseLabelValue": {{ .Values.projectReleaseNamespaces.labelValue | quote }},
{{- else if .Values.global.cattle.systemProjectId }}
"projectReleaseLabelValue": {{ .Values.global.cattle.systemProjectId | quote }},
{{- else }}
"projectReleaseLabelValue": "",
{{- end }}
{{- else }}
"projectReleaseLabelValue": "",
{{- end }}
{{- if not .Values.projectReleaseNamespaces.enabled }}
"systemProjectLabelValues": {{ append .Values.otherSystemProjectLabelValues .Values.global.cattle.systemProjectId | toJson }}
{{- else if and (ne (len .Values.global.cattle.systemProjectId) 0) (ne (len .Values.projectReleaseNamespaces.labelValue) 0) (ne .Values.projectReleaseNamespaces.labelValue .Values.global.cattle.systemProjectId) }}
"systemProjectLabelValues": {{ append .Values.otherSystemProjectLabelValues .Values.global.cattle.systemProjectId | toJson }}
{{- else if len .Values.otherSystemProjectLabelValues }}
"systemProjectLabelValues": {{ .Values.otherSystemProjectLabelValues | toJson }}
{{- else }}
"systemProjectLabelValues": []
{{- end }}
}
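{{- /*
Example rendering (a hypothetical sketch): with projectReleaseNamespaces.enabled=true,
projectReleaseNamespaces.labelValue left empty, and global.cattle.systemProjectId set to
a placeholder ID such as "p-12345", the logic above emits roughly:
  {"projectReleaseLabelValue": "p-12345", "systemProjectLabelValues": []}
*/}}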
---
{{- if (and .Values.systemNamespacesConfigMap.rbac.enabled .Values.systemNamespacesConfigMap.rbac.subjects) }}
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: {{ template "helm-project-operator.name" . }}-system-namespaces
namespace: {{ template "helm-project-operator.namespace" . }}
labels: {{ include "helm-project-operator.labels" . | nindent 4 }}
rules:
- apiGroups:
- ""
resources:
- configmaps
resourceNames:
- "{{ template "helm-project-operator.name" . }}-system-namespaces"
verbs:
- 'get'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: {{ template "helm-project-operator.name" . }}-system-namespaces
namespace: {{ template "helm-project-operator.namespace" . }}
labels: {{ include "helm-project-operator.labels" . | nindent 4 }}
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: {{ template "helm-project-operator.name" . }}-system-namespaces
subjects: {{ .Values.systemNamespacesConfigMap.rbac.subjects | toYaml | nindent 2 }}
{{- end }}
{{- end }}

@@ -0,0 +1,7 @@
#{{- if gt (len (lookup "rbac.authorization.k8s.io/v1" "ClusterRole" "" "")) 0 -}}
#{{- if .Values.global.cattle.psp.enabled }}
#{{- if not (.Capabilities.APIVersions.Has "policy/v1beta1/PodSecurityPolicy") }}
#{{- fail "The target cluster does not have the PodSecurityPolicy API resource. Please disable PSPs in this chart before proceeding." -}}
#{{- end }}
#{{- end }}
#{{- end }}

@@ -0,0 +1,228 @@
# Default values for helm-project-operator.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.
# Helm Project Operator Configuration
global:
cattle:
clusterId: ""
psp:
enabled: false
projectLabel: field.cattle.io/projectId
systemDefaultRegistry: ""
systemProjectId: ""
url: ""
rbac:
## Create RBAC resources for ServiceAccounts and users
##
create: true
userRoles:
## Create default user ClusterRoles to allow users to interact with ProjectHelmCharts
create: true
## Aggregate default user ClusterRoles into default k8s ClusterRoles
aggregateToDefaultRoles: true
pspAnnotations: {}
## Specify pod annotations
## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#apparmor
## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#seccomp
## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#sysctl
##
# seccomp.security.alpha.kubernetes.io/allowedProfileNames: '*'
# seccomp.security.alpha.kubernetes.io/defaultProfileName: 'docker/default'
# apparmor.security.beta.kubernetes.io/defaultProfileName: 'runtime/default'
## Reference to one or more secrets to be used when pulling images
## ref: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
##
imagePullSecrets: []
# - name: "image-pull-secret"
helmApiVersion: dummy.cattle.io/v1alpha1
## valuesOverride overrides values that are set on each ProjectHelmChart deployment at the operator level
## Any user-provided values for these keys will be overwritten by the values provided here; see the commented sketch below
valuesOverride: {}
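## For example, a minimal sketch of an operator-level override (the keys below are
## hypothetical placeholders; valid keys depend entirely on the underlying Helm chart
## that this operator deploys on behalf of each ProjectHelmChart):
# valuesOverride:
#   image:
#     pullPolicy: IfNotPresent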
## projectReleaseNamespaces are auto-generated namespaces that are created to host Helm Releases
## managed by this operator on behalf of a ProjectHelmChart
projectReleaseNamespaces:
## Enabled determines whether Project Release Namespaces should be created. If false, the underlying
## Helm release will be deployed in the Project Registration Namespace
enabled: true
## labelValue is the value of the Project label for the Project that project release namespaces should be created within
## If empty, this will be set to the value of global.cattle.systemProjectId
## If global.cattle.systemProjectId is also empty, project release namespaces will be disabled
labelValue: ""
## otherSystemProjectLabelValues are additional Project label values that identify namespaces which should be treated as system projects,
## i.e. they will be entirely ignored by the operator
## By default, the global.cattle.systemProjectId will be in this list
otherSystemProjectLabelValues: []
## releaseRoleBindings configures RoleBindings automatically created by the Helm Project Operator
## in Project Release Namespaces where underlying Helm charts are deployed
releaseRoleBindings:
## aggregate enables creating these RoleBindings by aggregating the RoleBindings in the
## Project Registration Namespace or the ClusterRoleBindings that bind users to the ClusterRoles
## specified under clusterRoleRefs
aggregate: true
## clusterRoleRefs are the ClusterRoles whose RoleBindings or ClusterRoleBindings should determine
## the RoleBindings created in the Project Release Namespace
##
## By default, these are set to create RoleBindings based on the RoleBindings / ClusterRoleBindings
## attached to the default K8s user-facing ClusterRoles of admin, edit, and view.
## ref: https://kubernetes.io/docs/reference/access-authn-authz/rbac/#user-facing-roles
##
clusterRoleRefs:
admin: admin
edit: edit
view: view
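## For example, to base the generated RoleBindings on custom Rancher RoleTemplates instead,
## a sketch could look like the following (the rt-* names are hypothetical placeholders):
# clusterRoleRefs:
#   admin: rt-abc12
#   edit: rt-def34
#   view: rt-ghi56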
hardenedNamespaces:
# Whether to automatically manage the configuration of the default ServiceAccount and
# auto-create a NetworkPolicy for each namespace created by this operator
enabled: true
configuration:
# Values to be applied to each default ServiceAccount created in a managed namespace
serviceAccountSpec:
secrets: []
imagePullSecrets: []
automountServiceAccountToken: false
# Values to be applied to each default generated NetworkPolicy created in a managed namespace
networkPolicySpec:
podSelector: {}
egress: []
ingress: []
policyTypes: ["Ingress", "Egress"]
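## For example, a minimal sketch of loosening the generated NetworkPolicy to allow DNS
## egress from hardened namespaces (hypothetical values; adjust to your cluster's DNS setup):
# networkPolicySpec:
#   podSelector: {}
#   policyTypes: ["Ingress", "Egress"]
#   egress:
#     - ports:
#         - protocol: UDP
#           port: 53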
## systemNamespacesConfigMap is a ConfigMap created to allow users to see valid entries
## for registering a ProjectHelmChart for a given Project on the Rancher Dashboard UI.
## It does not need to be enabled for a non-Rancher use case.
systemNamespacesConfigMap:
## Create indicates whether the system namespaces configmap should be created
## This is a required value for integration with Rancher Dashboard
create: true
## rbac provides options around the RBAC resources created to allow users to view
## the systemNamespacesConfigMap; if not specified, only users with the ability to
## view ConfigMaps in the namespace where this chart is deployed will be able to
## properly view the system namespaces on the Rancher Dashboard UI
rbac:
## enabled indicates that we should deploy a RoleBinding and Role to view this ConfigMap
enabled: true
## subjects are the subjects that should be bound to this default RoleBinding
## By default, we allow anyone who is authenticated to the system to be able to view
## this ConfigMap in the deployment namespace
subjects:
- kind: Group
name: system:authenticated
nameOverride: ""
namespaceOverride: ""
replicas: 1
image:
repository: rancher/helm-project-operator
tag: v0.2.1
pullPolicy: IfNotPresent
helmController:
# Note: should be disabled for RKE2 clusters since they already run Helm Controller to manage internal Kubernetes components
enabled: true
job:
image:
repository: rancher/klipper-helm
tag: v0.7.0-build20220315
helmLocker:
enabled: true
# Additional arguments to be passed into the Helm Project Operator image
additionalArgs: []
## Define which Nodes the Pods are scheduled on.
## ref: https://kubernetes.io/docs/user-guide/node-selection/
##
nodeSelector: {}
## Tolerations for use with node taints
## ref: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
##
tolerations: []
# - key: "key"
# operator: "Equal"
# value: "value"
# effect: "NoSchedule"
resources: {}
# limits:
# memory: 500Mi
# cpu: 1000m
# requests:
# memory: 100Mi
# cpu: 100m
containerSecurityContext: {}
# allowPrivilegeEscalation: false
# capabilities:
# drop:
# - ALL
# privileged: false
# readOnlyRootFilesystem: true
securityContext: {}
# runAsGroup: 1000
# runAsUser: 1000
# supplementalGroups:
# - 1000
debug: false
debugLevel: 0
cleanup:
image:
repository: rancher/shell
tag: v0.1.19
pullPolicy: IfNotPresent
## Define which Nodes the Pods are scheduled on.
## ref: https://kubernetes.io/docs/user-guide/node-selection/
##
nodeSelector: {}
## Tolerations for use with node taints
## ref: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
##
tolerations: []
# - key: "key"
# operator: "Equal"
# value: "value"
# effect: "NoSchedule"
containerSecurityContext: {}
# allowPrivilegeEscalation: false
# capabilities:
# drop:
# - ALL
# privileged: false
# readOnlyRootFilesystem: true
securityContext:
runAsNonRoot: false
runAsUser: 0
resources: {}
# limits:
# memory: 500Mi
# cpu: 1000m
# requests:
# memory: 100Mi
# cpu: 100m

@@ -0,0 +1,43 @@
questions:
- variable: global.cattle.psp.enabled
default: "false"
description: "Flag to enable or disable the installation of PodSecurityPolicies by this chart in the target cluster. If the cluster is running Kubernetes 1.25+, you must update this value to false."
label: "Enable PodSecurityPolicies"
type: boolean
group: "Security Settings"
- variable: helmProjectOperator.helmController.enabled
label: Enable Embedded Helm Controller
description: 'Note: If you are running Prometheus Federator in an RKE2 / K3s cluster before v1.23.14 / v1.24.8 / v1.25.4, this should be disabled.'
type: boolean
group: Helm Controller
- variable: helmProjectOperator.helmLocker.enabled
label: Enable Embedded Helm Locker
type: boolean
group: Helm Locker
- variable: helmProjectOperator.projectReleaseNamespaces.labelValue
label: Project Release Namespace Project ID
description: By default, the System Project is selected. This can be overridden to a different Project (e.g. p-xxxxx)
type: string
required: false
group: Namespaces
- variable: helmProjectOperator.releaseRoleBindings.clusterRoleRefs.admin
label: Admin ClusterRole
description: By default, admin selects Project Owners. This can be overridden to a different ClusterRole (e.g. rt-xxxxx)
type: string
default: admin
required: false
group: RBAC
- variable: helmProjectOperator.releaseRoleBindings.clusterRoleRefs.edit
label: Edit ClusterRole
description: By default, edit selects Project Members. This can be overridden to a different ClusterRole (e.g. rt-xxxxx)
type: string
default: edit
required: false
group: RBAC
- variable: helmProjectOperator.releaseRoleBindings.clusterRoleRefs.view
label: View ClusterRole
description: By default, view selects Read-Only users. This can be overridden to a different ClusterRole (e.g. rt-xxxxx)
type: string
default: view
required: false
group: RBAC

@@ -0,0 +1,3 @@
{{ $.Chart.Name }} has been installed. Check its status by running:
kubectl --namespace {{ template "prometheus-federator.namespace" . }} get pods -l "release={{ $.Release.Name }}"

@@ -0,0 +1,66 @@
# Rancher
{{- define "system_default_registry" -}}
{{- if .Values.global.cattle.systemDefaultRegistry -}}
{{- printf "%s/" .Values.global.cattle.systemDefaultRegistry -}}
{{- end -}}
{{- end -}}
# Windows Support
{{/*
Windows clusters add a default taint to Linux nodes;
add the Linux tolerations below to workloads so they can be scheduled onto those Linux nodes
*/}}
{{- define "linux-node-tolerations" -}}
- key: "cattle.io/os"
value: "linux"
effect: "NoSchedule"
operator: "Equal"
{{- end -}}
{{- define "linux-node-selector" -}}
{{- if semverCompare "<1.14-0" .Capabilities.KubeVersion.GitVersion -}}
beta.kubernetes.io/os: linux
{{- else -}}
kubernetes.io/os: linux
{{- end -}}
{{- end -}}
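{{- /*
Example usage in a workload template (a hypothetical sketch; the helm-project-operator
Deployment consumes equivalent helpers in the same way):
  nodeSelector: {{ include "linux-node-selector" . | nindent 8 }}
  tolerations: {{ include "linux-node-tolerations" . | nindent 8 }}
*/}}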
# Helm Project Operator
{{/* vim: set filetype=mustache: */}}
{{/* Expand the name of the chart. This is suffixed with -alertmanager, which means subtract 13 from the 63 characters available */}}
{{- define "prometheus-federator.name" -}}
{{- default .Chart.Name .Values.nameOverride | trunc 50 | trimSuffix "-" -}}
{{- end }}
{{/*
Allow the release namespace to be overridden for multi-namespace deployments in combined charts
*/}}
{{- define "prometheus-federator.namespace" -}}
{{- if .Values.namespaceOverride -}}
{{- .Values.namespaceOverride -}}
{{- else -}}
{{- .Release.Namespace -}}
{{- end -}}
{{- end -}}
{{/* Create chart name and version as used by the chart label. */}}
{{- define "prometheus-federator.chartref" -}}
{{- replace "+" "_" .Chart.Version | printf "%s-%s" .Chart.Name -}}
{{- end }}
{{/* Generate basic labels */}}
{{- define "prometheus-federator.labels" }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
app.kubernetes.io/instance: {{ .Release.Name }}
app.kubernetes.io/version: "{{ replace "+" "_" .Chart.Version }}"
app.kubernetes.io/part-of: {{ template "prometheus-federator.name" . }}
chart: {{ template "prometheus-federator.chartref" . }}
release: {{ $.Release.Name | quote }}
heritage: {{ $.Release.Service | quote }}
{{- if .Values.commonLabels}}
{{ toYaml .Values.commonLabels }}
{{- end }}
{{- end }}

@@ -0,0 +1,94 @@
# Default values for prometheus-federator.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.
# Prometheus Federator Configuration
global:
cattle:
psp:
enabled: false
systemDefaultRegistry: ""
projectLabel: field.cattle.io/projectId
clusterId: ""
systemProjectId: ""
url: ""
rbac:
pspEnabled: true
pspAnnotations: {}
## Specify pod annotations
## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#apparmor
## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#seccomp
## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#sysctl
##
# seccomp.security.alpha.kubernetes.io/allowedProfileNames: '*'
# seccomp.security.alpha.kubernetes.io/defaultProfileName: 'docker/default'
# apparmor.security.beta.kubernetes.io/defaultProfileName: 'runtime/default'
## Reference to one or more secrets to be used when pulling images
## ref: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
##
imagePullSecrets: []
# - name: "image-pull-secret"
helmProjectOperator:
enabled: true
# ensures that all resources created by the subchart show up as prometheus-federator
helmApiVersion: monitoring.cattle.io/v1alpha1
nameOverride: prometheus-federator
helmController:
# Note: should be disabled for RKE2 clusters since they already run Helm Controller to manage internal Kubernetes components
enabled: true
helmLocker:
enabled: true
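## For example, on an RKE2/K3s cluster that already runs its own Helm Controller, a
## hypothetical install could disable the embedded controller like so (chart reference,
## release name, and namespace are placeholders):
# helm upgrade --install prometheus-federator <chart-ref> \
#   --namespace cattle-monitoring-system \
#   --set helmProjectOperator.helmController.enabled=false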
## valuesOverride overrides values that are set on each Project Prometheus Stack Helm Chart deployment at the operator level;
## all values provided here will automatically override any user-provided values
valuesOverride:
federate:
# Change this to point at all Prometheuses you want all your Project Prometheus Stacks to federate from
# By default, this matches the default deployment of Rancher Monitoring
targets:
- rancher-monitoring-prometheus.cattle-monitoring-system.svc:9090
image:
repository: rancher/prometheus-federator
tag: v0.3.5
pullPolicy: IfNotPresent
# Additional arguments to be passed into the Prometheus Federator image
additionalArgs: []
## Define which Nodes the Pods are scheduled on.
## ref: https://kubernetes.io/docs/user-guide/node-selection/
##
nodeSelector: {}
## Tolerations for use with node taints
## ref: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
##
tolerations: []
# - key: "key"
# operator: "Equal"
# value: "value"
# effect: "NoSchedule"
resources: {}
# limits:
# memory: 500Mi
# cpu: 1000m
# requests:
# memory: 100Mi
# cpu: 100m
securityContext: {}
# allowPrivilegeEscalation: false
# readOnlyRootFilesystem: true
debug: false
debugLevel: 0

@@ -13,14 +13,3 @@
catalog.cattle.io/release-name: prometheus-federator
apiVersion: v2
appVersion: 0.3.5
@@ -12,8 +14,9 @@
- condition: helmProjectOperator.enabled
name: helmProjectOperator
repository: file://./charts/helmProjectOperator
- version: 0.2.1
+ version: '*'
description: Prometheus Federator
icon: https://raw.githubusercontent.com/rancher/prometheus-federator/main/assets/logos/prometheus-federator.svg
+kubeVersion: '>=1.26.0-0'
name: prometheus-federator
version: 0.4.1

@@ -1,4 +1,4 @@
url: https://github.com/rancher/prometheus-federator.git
subdirectory: charts/prometheus-federator/0.4.0
commit: 35bebb0b19b69584cfee819511532a7198e56df4
subdirectory: charts/prometheus-federator/0.4.2
commit: 734fcff1d3e46f2eb2deb9fa4a388fa11905af12
version: 104.0.0-rc1

@@ -1,4 +1,4 @@
workingDir: ""
url: https://github.com/rancher/prometheus-federator.git
subdirectory: charts/rancher-project-monitoring/0.4.1/charts/grafana
subdirectory: charts/rancher-project-monitoring/0.4.2/charts/grafana
commit: cff814f807f6e86229ad940e77f3d14768cc1b86

@@ -0,0 +1,5 @@
root = true
[files/dashboards/*.json]
indent_size = 2
indent_style = space

@@ -0,0 +1,171 @@
{{/*
Generate config map data
*/}}
{{- define "grafana.configData" -}}
{{ include "grafana.assertNoLeakedSecrets" . }}
{{- $files := .Files }}
{{- $root := . -}}
{{- with .Values.plugins }}
plugins: {{ join "," . }}
{{- end }}
grafana.ini: |
{{- range $elem, $elemVal := index .Values "grafana.ini" }}
{{- if not (kindIs "map" $elemVal) }}
{{- if kindIs "invalid" $elemVal }}
{{ $elem }} =
{{- else if kindIs "string" $elemVal }}
{{ $elem }} = {{ tpl $elemVal $ }}
{{- else }}
{{ $elem }} = {{ $elemVal }}
{{- end }}
{{- end }}
{{- end }}
{{- range $key, $value := index .Values "grafana.ini" }}
{{- if kindIs "map" $value }}
[{{ $key }}]
{{- range $elem, $elemVal := $value }}
{{- if kindIs "invalid" $elemVal }}
{{ $elem }} =
{{- else if kindIs "string" $elemVal }}
{{ $elem }} = {{ tpl $elemVal $ }}
{{- else }}
{{ $elem }} = {{ $elemVal }}
{{- end }}
{{- end }}
{{- end }}
{{- end }}
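{{- /*
A hypothetical example of what the two ranges above produce: given values such as
  grafana.ini:
    paths:
      data: /var/lib/grafana/
the rendered grafana.ini entry would be roughly
  grafana.ini: |
    [paths]
    data = /var/lib/grafana/
*/}}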
{{- range $key, $value := .Values.datasources }}
{{- if not (hasKey $value "secret") }}
{{ $key }}: |
{{- tpl (toYaml $value | nindent 2) $root }}
{{- end }}
{{- end }}
{{- range $key, $value := .Values.notifiers }}
{{- if not (hasKey $value "secret") }}
{{ $key }}: |
{{- toYaml $value | nindent 2 }}
{{- end }}
{{- end }}
{{- range $key, $value := .Values.alerting }}
{{- if (hasKey $value "file") }}
{{ $key }}:
{{- toYaml ( $files.Get $value.file ) | nindent 2 }}
{{- else if (or (hasKey $value "secret") (hasKey $value "secretFile"))}}
{{/* will be stored inside secret generated by "configSecret.yaml"*/}}
{{- else }}
{{ $key }}: |
{{- tpl (toYaml $value | nindent 2) $root }}
{{- end }}
{{- end }}
{{- range $key, $value := .Values.dashboardProviders }}
{{ $key }}: |
{{- toYaml $value | nindent 2 }}
{{- end }}
{{- if .Values.dashboards }}
download_dashboards.sh: |
#!/usr/bin/env sh
set -euf
{{- if .Values.dashboardProviders }}
{{- range $key, $value := .Values.dashboardProviders }}
{{- range $value.providers }}
mkdir -p {{ .options.path }}
{{- end }}
{{- end }}
{{- end }}
{{ $dashboardProviders := .Values.dashboardProviders }}
{{- range $provider, $dashboards := .Values.dashboards }}
{{- range $key, $value := $dashboards }}
{{- if (or (hasKey $value "gnetId") (hasKey $value "url")) }}
curl -skf \
--connect-timeout 60 \
--max-time 60 \
{{- if not $value.b64content }}
{{- if not $value.acceptHeader }}
-H "Accept: application/json" \
{{- else }}
-H "Accept: {{ $value.acceptHeader }}" \
{{- end }}
{{- if $value.token }}
-H "Authorization: token {{ $value.token }}" \
{{- end }}
{{- if $value.bearerToken }}
-H "Authorization: Bearer {{ $value.bearerToken }}" \
{{- end }}
{{- if $value.basic }}
-H "Authorization: Basic {{ $value.basic }}" \
{{- end }}
{{- if $value.gitlabToken }}
-H "PRIVATE-TOKEN: {{ $value.gitlabToken }}" \
{{- end }}
-H "Content-Type: application/json;charset=UTF-8" \
{{- end }}
{{- $dpPath := "" -}}
{{- range $kd := (index $dashboardProviders "dashboardproviders.yaml").providers }}
{{- if eq $kd.name $provider }}
{{- $dpPath = $kd.options.path }}
{{- end }}
{{- end }}
{{- if $value.url }}
"{{ $value.url }}" \
{{- else }}
"https://grafana.com/api/dashboards/{{ $value.gnetId }}/revisions/{{- if $value.revision -}}{{ $value.revision }}{{- else -}}1{{- end -}}/download" \
{{- end }}
{{- if $value.datasource }}
{{- if kindIs "string" $value.datasource }}
| sed '/-- .* --/! s/"datasource":.*,/"datasource": "{{ $value.datasource }}",/g' \
{{- end }}
{{- if kindIs "slice" $value.datasource }}
{{- range $value.datasource }}
| sed '/-- .* --/! s/${{"{"}}{{ .name }}}/{{ .value }}/g' \
{{- end }}
{{- end }}
{{- end }}
{{- if $value.b64content }}
| base64 -d \
{{- end }}
> "{{- if $dpPath -}}{{ $dpPath }}{{- else -}}/var/lib/grafana/dashboards/{{ $provider }}{{- end -}}/{{ $key }}.json"
{{ end }}
{{- end }}
{{- end }}
{{- end }}
{{- end -}}
{{/*
Generate dashboard json config map data
*/}}
{{- define "grafana.configDashboardProviderData" -}}
provider.yaml: |-
apiVersion: 1
providers:
- name: '{{ .Values.sidecar.dashboards.provider.name }}'
orgId: {{ .Values.sidecar.dashboards.provider.orgid }}
{{- if not .Values.sidecar.dashboards.provider.foldersFromFilesStructure }}
folder: '{{ .Values.sidecar.dashboards.provider.folder }}'
{{- end }}
type: {{ .Values.sidecar.dashboards.provider.type }}
disableDeletion: {{ .Values.sidecar.dashboards.provider.disableDelete }}
allowUiUpdates: {{ .Values.sidecar.dashboards.provider.allowUiUpdates }}
updateIntervalSeconds: {{ .Values.sidecar.dashboards.provider.updateIntervalSeconds | default 30 }}
options:
foldersFromFilesStructure: {{ .Values.sidecar.dashboards.provider.foldersFromFilesStructure }}
path: {{ .Values.sidecar.dashboards.folder }}{{- with .Values.sidecar.dashboards.defaultFolderName }}/{{ . }}{{- end }}
{{- end -}}
{{- define "grafana.secretsData" -}}
{{- if and (not .Values.env.GF_SECURITY_DISABLE_INITIAL_ADMIN_CREATION) (not .Values.admin.existingSecret) (not .Values.env.GF_SECURITY_ADMIN_PASSWORD__FILE) (not .Values.env.GF_SECURITY_ADMIN_PASSWORD) }}
admin-user: {{ .Values.adminUser | b64enc | quote }}
{{- if .Values.adminPassword }}
admin-password: {{ .Values.adminPassword | b64enc | quote }}
{{- else }}
admin-password: {{ include "grafana.password" . }}
{{- end }}
{{- end }}
{{- if not .Values.ldap.existingSecret }}
ldap-toml: {{ tpl .Values.ldap.config $ | b64enc | quote }}
{{- end }}
{{- end -}}

@@ -0,0 +1,43 @@
{{- $createConfigSecret := eq (include "grafana.shouldCreateConfigSecret" .) "true" -}}
{{- if and .Values.createConfigmap $createConfigSecret }}
{{- $files := .Files }}
{{- $root := . -}}
apiVersion: v1
kind: Secret
metadata:
name: "{{ include "grafana.fullname" . }}-config-secret"
namespace: {{ include "grafana.namespace" . }}
labels:
{{- include "grafana.labels" . | nindent 4 }}
{{- with .Values.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
data:
{{- range $key, $value := .Values.alerting }}
{{- if (hasKey $value "secretFile") }}
{{- $key | nindent 2 }}:
{{- toYaml ( $files.Get $value.secretFile ) | b64enc | nindent 4}}
{{/* as of https://helm.sh/docs/chart_template_guide/accessing_files/ this will only work if you fork this chart and add files to it*/}}
{{- end }}
{{- end }}
stringData:
{{- range $key, $value := .Values.datasources }}
{{- if (hasKey $value "secret") }}
{{- $key | nindent 2 }}: |
{{- tpl (toYaml $value.secret | nindent 4) $root }}
{{- end }}
{{- end }}
{{- range $key, $value := .Values.notifiers }}
{{- if (hasKey $value "secret") }}
{{- $key | nindent 2 }}: |
{{- tpl (toYaml $value.secret | nindent 4) $root }}
{{- end }}
{{- end }}
{{- range $key, $value := .Values.alerting }}
{{ if (hasKey $value "secret") }}
{{- $key | nindent 2 }}: |
{{- tpl (toYaml $value.secret | nindent 4) $root }}
{{- end }}
{{- end }}
{{- end }}
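{{- /*
A hypothetical values sketch that would land in this Secret rather than the ConfigMap
(any datasource, notifier, or alerting entry carrying a "secret" key is routed here;
the key name and datasource details below are placeholders):
  datasources:
    secret-datasources.yaml:
      secret:
        apiVersion: 1
        datasources:
          - name: Prometheus
            type: prometheus
            url: http://prometheus.example.svc:9090
*/}}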

@@ -0,0 +1,246 @@
{
"description": "Bundle",
"graphTooltip": 1,
"panels": [
{
"datasource": {
"type": "datasource",
"uid": "-- Mixed --"
},
"fieldConfig": {
"defaults": {
"decimals": 0,
"unit": "percentunit"
}
},
"gridPos": {
"h": 5,
"w": 7,
"x": 0,
"y": 0
},
"id": 1,
"pluginVersion": "v11.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_bundle_ready{exported_namespace=\"$namespace\",name=~\"$name\"}) / sum(fleet_bundle_desired_ready{exported_namespace=\"$namespace\",name=~\"$name\"})"
}
],
"title": "Ready Bundles",
"type": "stat"
},
{
"datasource": {
"type": "datasource",
"uid": "-- Mixed --"
},
"fieldConfig": {
"defaults": {
"decimals": 0,
"unit": null
}
},
"gridPos": {
"h": 5,
"w": 17,
"x": 7,
"y": 0
},
"id": 2,
"pluginVersion": "v11.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_bundle_desired_ready{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Desired Ready"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_bundle_ready{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Ready"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_bundle_not_ready{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Not Ready"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_bundle_out_of_sync{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Out of Sync"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_bundle_err_applied{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Err Applied"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_bundle_modified{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Modified"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_bundle_pending{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Pending"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_bundle_wait_applied{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Wait Applied"
}
],
"title": "Bundles",
"type": "stat"
},
{
"datasource": {
"type": "datasource",
"uid": "-- Mixed --"
},
"fieldConfig": {
"defaults": {
"decimals": 0,
"unit": null
}
},
"gridPos": {
"h": 8,
"w": 24,
"x": 0,
"y": 8
},
"id": 3,
"pluginVersion": "v11.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_bundle_desired_ready{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Desired Ready"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_bundle_ready{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Ready"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_bundle_not_ready{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Not Ready"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_bundle_out_of_sync{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Out of Sync"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_bundle_err_applied{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Err Applied"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_bundle_modified{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Modified"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_bundle_pending{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Pending"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_bundle_wait_applied{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Wait Applied"
}
],
"title": "Bundles",
"type": "timeseries"
}
],
"schemaVersion": 39,
"templating": {
"list": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"name": "namespace",
"query": "label_values(fleet_bundle_desired_ready, exported_namespace)",
"refresh": 2,
"type": "query"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"includeAll": true,
"name": "name",
"query": "label_values(fleet_bundle_desired_ready{exported_namespace=~\"$namespace\"}, name)",
"refresh": 2,
"type": "query"
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timezone": "utc",
"title": "Fleet / Bundle",
"uid": "fleet-bundle"
}

@@ -0,0 +1,219 @@
{
"description": "BundleDeployment",
"graphTooltip": 1,
"panels": [
{
"datasource": {
"type": "datasource",
"uid": "-- Mixed --"
},
"fieldConfig": {
"defaults": {
"decimals": 0,
"unit": "percentunit"
}
},
"gridPos": {
"h": 5,
"w": 7,
"x": 0,
"y": 0
},
"id": 1,
"pluginVersion": "v11.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_bundledeployment_state{cluster_namespace=~\"$namespace\",state=\"Ready\"}) / sum(fleet_bundledeployment_state{cluster_namespace=~\"$namespace\"})"
}
],
"title": "Ready BundleDeployments",
"type": "stat"
},
{
"datasource": {
"type": "datasource",
"uid": "-- Mixed --"
},
"fieldConfig": {
"defaults": {
"decimals": 0,
"unit": null
}
},
"gridPos": {
"h": 5,
"w": 17,
"x": 7,
"y": 0
},
"id": 2,
"pluginVersion": "v11.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_bundledeployment_state{cluster_namespace=~\"$namespace\",state=\"Ready\"})",
"legendFormat": "Ready"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_bundledeployment_state{cluster_namespace=~\"$namespace\",state=\"NotReady\"})",
"legendFormat": "Not Ready"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_bundledeployment_state{cluster_namespace=~\"$namespace\",state=\"WaitApplied\"})",
"legendFormat": "Wait Applied"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_bundledeployment_state{cluster_namespace=~\"$namespace\",state=\"ErrApplied\"})",
"legendFormat": "Err Applied"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_bundledeployment_state{cluster_namespace=~\"$namespace\",state=\"OutOfSync\"})",
"legendFormat": "OutOfSync"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_bundledeployment_state{cluster_namespace=~\"$namespace\",state=\"Pending\"})",
"legendFormat": "Pending"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_bundledeployment_state{cluster_namespace=~\"$namespace\",state=\"Modified\"})",
"legendFormat": "Modified"
}
],
"title": "BundleDeployments",
"type": "stat"
},
{
"datasource": {
"type": "datasource",
"uid": "-- Mixed --"
},
"fieldConfig": {
"defaults": {
"decimals": 0,
"unit": null
}
},
"gridPos": {
"h": 8,
"w": 24,
"x": 0,
"y": 8
},
"id": 3,
"pluginVersion": "v11.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_bundledeployment_state{cluster_namespace=~\"$namespace\",state=\"Ready\"})",
"legendFormat": "Ready"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_bundledeployment_state{cluster_namespace=~\"$namespace\",state=\"NotReady\"})",
"legendFormat": "Not Ready"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_bundledeployment_state{cluster_namespace=~\"$namespace\",state=\"WaitApplied\"})",
"legendFormat": "Wait Applied"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_bundledeployment_state{cluster_namespace=~\"$namespace\",state=\"ErrApplied\"})",
"legendFormat": "Err Applied"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_bundledeployment_state{cluster_namespace=~\"$namespace\",state=\"OutOfSync\"})",
"legendFormat": "OutOfSync"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_bundledeployment_state{cluster_namespace=~\"$namespace\",state=\"Pending\"})",
"legendFormat": "Pending"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_bundledeployment_state{cluster_namespace=~\"$namespace\",state=\"Modified\"})",
"legendFormat": "Modified"
}
],
"title": "BundleDeployments",
"type": "timeseries"
}
],
"schemaVersion": 39,
"templating": {
"list": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"name": "namespace",
"query": "label_values(fleet_bundledeployment_state, cluster_namespace)",
"refresh": 2,
"type": "query"
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timezone": "utc",
"title": "Fleet / BundleDeployment",
"uid": "fleet-bundledeployment"
}

@@ -0,0 +1,484 @@
{
"description": "Cluster",
"graphTooltip": 1,
"panels": [
{
"datasource": {
"type": "datasource",
"uid": "-- Mixed --"
},
"fieldConfig": {
"defaults": {
"decimals": 0,
"unit": "percentunit"
}
},
"gridPos": {
"h": 5,
"w": 7,
"x": 0,
"y": 0
},
"id": 1,
"pluginVersion": "v11.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_ready_git_repos{exported_namespace=\"$namespace\",name=~\"$name\"}) / sum(fleet_cluster_desired_ready_git_repos{exported_namespace=\"$namespace\",name=~\"$name\"})"
}
],
"title": "Ready Git Repos",
"type": "stat"
},
{
"datasource": {
"type": "datasource",
"uid": "-- Mixed --"
},
"fieldConfig": {
"defaults": {
"decimals": 0,
"unit": null
}
},
"gridPos": {
"h": 5,
"w": 17,
"x": 7,
"y": 0
},
"id": 2,
"pluginVersion": "v11.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_desired_ready_git_repos{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Desired Ready"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_ready_git_repos{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Ready"
}
],
"title": "Git Repos",
"type": "stat"
},
{
"datasource": {
"type": "datasource",
"uid": "-- Mixed --"
},
"fieldConfig": {
"defaults": {
"decimals": 0,
"unit": null
}
},
"gridPos": {
"h": 8,
"w": 24,
"x": 0,
"y": 8
},
"id": 3,
"pluginVersion": "v11.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_desired_ready_git_repos{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Desired Ready"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_ready_git_repos{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Ready"
}
],
"title": "Git Repos",
"type": "timeseries"
},
{
"datasource": {
"type": "datasource",
"uid": "-- Mixed --"
},
"fieldConfig": {
"defaults": {
"decimals": 0,
"unit": "percentunit"
}
},
"gridPos": {
"h": 5,
"w": 7,
"x": 0,
"y": 13
},
"id": 4,
"pluginVersion": "v11.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_resources_count_ready{exported_namespace=\"$namespace\",name=~\"$name\"}) / sum(fleet_cluster_resources_count_desiredready{exported_namespace=\"$namespace\",name=~\"$name\"})"
}
],
"title": "Ready Resources",
"type": "stat"
},
{
"datasource": {
"type": "datasource",
"uid": "-- Mixed --"
},
"fieldConfig": {
"defaults": {
"decimals": 0,
"unit": null
}
},
"gridPos": {
"h": 5,
"w": 17,
"x": 7,
"y": 13
},
"id": 5,
"pluginVersion": "v11.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_resources_count_desiredready{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Desired Ready"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_resources_count_ready{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Ready"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_resources_count_notready{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Not Ready"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_resources_count_missing{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Missing"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_resources_count_modified{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Modified"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_resources_count_unknown{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Unknown"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_resources_count_orphaned{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Orphaned"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_resources_count_waitapplied{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Wait Applied"
}
],
"title": "Resources",
"type": "stat"
},
{
"datasource": {
"type": "datasource",
"uid": "-- Mixed --"
},
"fieldConfig": {
"defaults": {
"decimals": 0,
"unit": null
}
},
"gridPos": {
"h": 8,
"w": 24,
"x": 0,
"y": 21
},
"id": 6,
"pluginVersion": "v11.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_resources_count_desiredready{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Desired Ready"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_resources_count_ready{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Ready"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_resources_count_notready{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Not Ready"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_resources_count_missing{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Missing"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_resources_count_modified{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Modified"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_resources_count_unknown{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Unknown"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_resources_count_orphaned{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Orphaned"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_resources_count_waitapplied{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Wait Applied"
}
],
"title": "Resources",
"type": "timeseries"
},
{
"datasource": {
"type": "datasource",
"uid": "-- Mixed --"
},
"fieldConfig": {
"defaults": {
"decimals": 0,
"unit": "percentunit"
}
},
"gridPos": {
"h": 5,
"w": 7,
"x": 0,
"y": 26
},
"id": 7,
"pluginVersion": "v11.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_state{exported_namespace=\"$namespace\",name=~\"$name\",state=\"Ready\"}) / sum(fleet_cluster_state{exported_namespace=\"$namespace\",name=~\"$name\"})"
}
],
"title": "Ready Clusters",
"type": "stat"
},
{
"datasource": {
"type": "datasource",
"uid": "-- Mixed --"
},
"fieldConfig": {
"defaults": {
"decimals": 0,
"unit": null
}
},
"gridPos": {
"h": 5,
"w": 17,
"x": 7,
"y": 26
},
"id": 8,
"pluginVersion": "v11.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_state{exported_namespace=\"$namespace\",name=~\"$name\",state=\"Ready\"})",
"legendFormat": "Ready"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_state{exported_namespace=\"$namespace\",name=~\"$name\",state=\"NotReady\"})",
"legendFormat": "Not Ready"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_state{exported_namespace=\"$namespace\",name=~\"$name\",state=\"WaitCheckIn\"})",
"legendFormat": "Wait Check In"
}
],
"title": "Clusters",
"type": "stat"
},
{
"datasource": {
"type": "datasource",
"uid": "-- Mixed --"
},
"fieldConfig": {
"defaults": {
"decimals": 0,
"unit": null
}
},
"gridPos": {
"h": 8,
"w": 24,
"x": 0,
"y": 34
},
"id": 9,
"pluginVersion": "v11.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_state{exported_namespace=\"$namespace\",name=~\"$name\",state=\"Ready\"})",
"legendFormat": "Ready"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_state{exported_namespace=\"$namespace\",name=~\"$name\",state=\"NotReady\"})",
"legendFormat": "Not Ready"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_state{exported_namespace=\"$namespace\",name=~\"$name\",state=\"WaitCheckIn\"})",
"legendFormat": "Wait Check In"
}
],
"title": "Clusters",
"type": "timeseries"
}
],
"schemaVersion": 39,
"templating": {
"list": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"name": "namespace",
"query": "label_values(fleet_cluster_desired_ready_git_repos, exported_namespace)",
"refresh": 2,
"type": "query"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"includeAll": true,
"name": "name",
"query": "label_values(fleet_cluster_desired_ready_git_repos{exported_namespace=~\"$namespace\"}, name)",
"refresh": 2,
"type": "query"
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timezone": "utc",
"title": "Fleet / Cluster",
"uid": "fleet-cluster"
}

@@ -0,0 +1,468 @@
{
"description": "ClusterGroup",
"graphTooltip": 1,
"panels": [
{
"datasource": {
"type": "datasource",
"uid": "-- Mixed --"
},
"fieldConfig": {
"defaults": {
"decimals": 0,
"unit": "percentunit"
}
},
"gridPos": {
"h": 5,
"w": 7,
"x": 0,
"y": 0
},
"id": 1,
"pluginVersion": "v11.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_group_bundle_ready{exported_namespace=\"$namespace\",name=~\"$name\"}) / sum(fleet_cluster_group_bundle_desired_ready{exported_namespace=\"$namespace\",name=~\"$name\"})"
}
],
"title": "Ready Bundles",
"type": "stat"
},
{
"datasource": {
"type": "datasource",
"uid": "-- Mixed --"
},
"fieldConfig": {
"defaults": {
"decimals": 0,
"unit": null
}
},
"gridPos": {
"h": 5,
"w": 17,
"x": 7,
"y": 0
},
"id": 2,
"pluginVersion": "v11.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_group_bundle_desired_ready{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Desired Ready"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_group_bundle_ready{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Ready"
}
],
"title": "Bundles",
"type": "stat"
},
{
"datasource": {
"type": "datasource",
"uid": "-- Mixed --"
},
"fieldConfig": {
"defaults": {
"decimals": 0,
"unit": null
}
},
"gridPos": {
"h": 8,
"w": 24,
"x": 0,
"y": 8
},
"id": 3,
"pluginVersion": "v11.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_group_bundle_desired_ready{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Desired Ready"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_group_bundle_ready{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Ready"
}
],
"title": "Bundles",
"type": "timeseries"
},
{
"datasource": {
"type": "datasource",
"uid": "-- Mixed --"
},
"fieldConfig": {
"defaults": {
"decimals": 0,
"unit": "percentunit"
}
},
"gridPos": {
"h": 5,
"w": 7,
"x": 0,
"y": 13
},
"id": 4,
"pluginVersion": "v11.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "(sum(fleet_cluster_group_cluster_count{exported_namespace=\"$namespace\",name=~\"$name\"}) - sum(fleet_cluster_group_non_ready_cluster_count{exported_namespace=\"$namespace\",name=~\"$name\"})) / sum(fleet_cluster_group_cluster_count{exported_namespace=\"$namespace\",name=~\"$name\"})"
}
],
"title": "Ready Clusters",
"type": "stat"
},
{
"datasource": {
"type": "datasource",
"uid": "-- Mixed --"
},
"fieldConfig": {
"defaults": {
"decimals": 0,
"unit": null
}
},
"gridPos": {
"h": 5,
"w": 17,
"x": 7,
"y": 13
},
"id": 5,
"pluginVersion": "v11.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_group_cluster_count{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Total"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_group_non_ready_cluster_count{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Non Ready"
}
],
"title": "Clusters",
"type": "stat"
},
{
"datasource": {
"type": "datasource",
"uid": "-- Mixed --"
},
"fieldConfig": {
"defaults": {
"decimals": 0,
"unit": null
}
},
"gridPos": {
"h": 8,
"w": 24,
"x": 0,
"y": 21
},
"id": 6,
"pluginVersion": "v11.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_group_cluster_count{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Total"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_group_non_ready_cluster_count{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Non Ready"
}
],
"title": "Clusters",
"type": "timeseries"
},
{
"datasource": {
"type": "datasource",
"uid": "-- Mixed --"
},
"fieldConfig": {
"defaults": {
"decimals": 0,
"unit": "percentunit"
}
},
"gridPos": {
"h": 5,
"w": 7,
"x": 0,
"y": 26
},
"id": 7,
"pluginVersion": "v11.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_group_resource_count_ready{exported_namespace=\"$namespace\",name=~\"$name\"}) / sum(fleet_cluster_group_resource_count_desired_ready{exported_namespace=\"$namespace\",name=~\"$name\"})"
}
],
"title": "Ready Resources",
"type": "stat"
},
{
"datasource": {
"type": "datasource",
"uid": "-- Mixed --"
},
"fieldConfig": {
"defaults": {
"decimals": 0,
"unit": null
}
},
"gridPos": {
"h": 5,
"w": 17,
"x": 7,
"y": 26
},
"id": 8,
"pluginVersion": "v11.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_group_resource_count_desired_ready{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Desired Ready"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_group_resource_count_ready{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Ready"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_group_resource_count_notready{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Not Ready"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_group_resource_count_missing{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Missing"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_group_resource_count_modified{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Modified"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_group_resource_count_orphaned{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Orphaned"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_group_resource_count_unknown{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Unknown"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_group_resource_count_waitapplied{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Wait Applied"
}
],
"title": "Resources",
"type": "stat"
},
{
"datasource": {
"type": "datasource",
"uid": "-- Mixed --"
},
"fieldConfig": {
"defaults": {
"decimals": 0,
"unit": null
}
},
"gridPos": {
"h": 8,
"w": 24,
"x": 0,
"y": 34
},
"id": 9,
"pluginVersion": "v11.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_group_resource_count_desired_ready{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Desired Ready"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_group_resource_count_ready{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Ready"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_group_resource_count_notready{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Not Ready"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_group_resource_count_missing{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Missing"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_group_resource_count_modified{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Modified"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_group_resource_count_orphaned{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Orphaned"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_group_resource_count_unknown{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Unknown"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_cluster_group_resource_count_waitapplied{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Wait Applied"
}
],
"title": "Resources",
"type": "timeseries"
}
],
"schemaVersion": 39,
"templating": {
"list": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"name": "namespace",
"query": "label_values(fleet_cluster_group_bundle_desired_ready, exported_namespace)",
"refresh": 2,
"type": "query"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"includeAll": true,
"name": "name",
"query": "label_values(fleet_cluster_group_bundle_desired_ready{exported_namespace=~\"$namespace\"}, name)",
"refresh": 2,
"type": "query"
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timezone": "utc",
"title": "Fleet / ClusterGroup",
"uid": "fleet-cluster-group"
}

@@ -0,0 +1,454 @@
{
"description": "Controller Runtime",
"graphTooltip": 1,
"panels": [
{
"datasource": {
"type": "datasource",
"uid": "-- Mixed --"
},
"fieldConfig": {
"defaults": {
"decimals": 0,
"unit": null
}
},
"gridPos": {
"h": 8,
"w": 24,
"x": 0,
"y": 0
},
"id": 1,
"pluginVersion": "v11.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "controller_runtime_active_workers{job=\"$job\", namespace=\"$namespace\"}",
"legendFormat": "{{controller}} {{instance}}"
}
],
"title": "Number of Workers in Use",
"type": "stat"
},
{
"datasource": {
"type": "datasource",
"uid": "-- Mixed --"
},
"fieldConfig": {
"defaults": {
"decimals": null,
"unit": null
}
},
"gridPos": {
"h": 8,
"w": 24,
"x": 0,
"y": 8
},
"id": 2,
"pluginVersion": "v11.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(rate(controller_runtime_reconcile_errors_total{job=\"$job\", namespace=\"$namespace\"}[5m])) by (instance, pod)",
"legendFormat": "{{instance}} {{pod}}"
}
],
"title": "Reconciliation Error Count per Controller",
"type": "timeseries"
},
{
"datasource": {
"type": "datasource",
"uid": "-- Mixed --"
},
"fieldConfig": {
"defaults": {
"decimals": null,
"unit": null
}
},
"gridPos": {
"h": 8,
"w": 24,
"x": 0,
"y": 16
},
"id": 3,
"pluginVersion": "v11.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(rate(controller_runtime_reconcile_total{job=\"$job\", namespace=\"$namespace\"}[5m])) by (instance, pod)",
"legendFormat": "{{instance}} {{pod}}"
}
],
"title": "Total Reconciliation Count per Controller",
"type": "timeseries"
},
{
"datasource": {
"type": "datasource",
"uid": "-- Mixed --"
},
"fieldConfig": {
"defaults": {
"decimals": 0,
"unit": null
}
},
"gridPos": {
"h": 8,
"w": 24,
"x": 0,
"y": 24
},
"id": 4,
"pluginVersion": "v11.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "workqueue_depth{job=\"$job\", namespace=\"$namespace\"}",
"legendFormat": "{{instance}} {{pod}}"
}
],
"title": "WorkQueue Depth",
"type": "stat"
},
{
"datasource": {
"type": "datasource",
"uid": "-- Mixed --"
},
"fieldConfig": {
"defaults": {
"decimals": null,
"unit": null
}
},
"gridPos": {
"h": 8,
"w": 24,
"x": 0,
"y": 32
},
"id": 5,
"pluginVersion": "v11.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "histogram_quantile(0.50, sum(rate(workqueue_queue_duration_seconds_bucket{job=\"$job\", namespace=\"$namespace\"}[5m])) by (instance, name, le))",
"legendFormat": "P50 {{name}}"
}
],
"title": "Seconds for Items Stay in Queue (before being requested) P50",
"type": "timeseries"
},
{
"datasource": {
"type": "datasource",
"uid": "-- Mixed --"
},
"fieldConfig": {
"defaults": {
"decimals": null,
"unit": null
}
},
"gridPos": {
"h": 8,
"w": 24,
"x": 0,
"y": 40
},
"id": 6,
"pluginVersion": "v11.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "histogram_quantile(0.90, sum(rate(workqueue_queue_duration_seconds_bucket{job=\"$job\", namespace=\"$namespace\"}[5m])) by (instance, name, le))",
"legendFormat": "P90 {{name}}"
}
],
"title": "Seconds for Items Stay in Queue (before being requested) P90",
"type": "timeseries"
},
{
"datasource": {
"type": "datasource",
"uid": "-- Mixed --"
},
"fieldConfig": {
"defaults": {
"decimals": null,
"unit": null
}
},
"gridPos": {
"h": 8,
"w": 24,
"x": 0,
"y": 48
},
"id": 7,
"pluginVersion": "v11.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "histogram_quantile(0.99, sum(rate(workqueue_queue_duration_seconds_bucket{job=\"$job\", namespace=\"$namespace\"}[5m])) by (instance, name, le))",
"legendFormat": "P99 {{name}}"
}
],
"title": "Seconds for Items Stay in Queue (before being requested) P99",
"type": "timeseries"
},
{
"datasource": {
"type": "datasource",
"uid": "-- Mixed --"
},
"fieldConfig": {
"defaults": {
"decimals": null,
"unit": null
}
},
"gridPos": {
"h": 8,
"w": 24,
"x": 0,
"y": 56
},
"id": 8,
"pluginVersion": "v11.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(rate(workqueue_adds_total{job=\"$job\", namespace=\"$namespace\"}[2m])) by (instance, name)",
"legendFormat": "{{name}} {{instance}}"
}
],
"title": "Work Queue Add Rate",
"type": "timeseries"
},
{
"datasource": {
"type": "datasource",
"uid": "-- Mixed --"
},
"fieldConfig": {
"defaults": {
"decimals": null,
"unit": null
}
},
"gridPos": {
"h": 8,
"w": 24,
"x": 0,
"y": 64
},
"id": 9,
"pluginVersion": "v11.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "rate(workqueue_unfinished_work_seconds{job=\"$job\", namespace=\"$namespace\"}[5m])",
"legendFormat": "{{name}} {{instance}}"
}
],
"title": "Unfinished Seconds",
"type": "stat"
},
{
"datasource": {
"type": "datasource",
"uid": "-- Mixed --"
},
"fieldConfig": {
"defaults": {
"decimals": null,
"unit": null
}
},
"gridPos": {
"h": 8,
"w": 24,
"x": 0,
"y": 72
},
"id": 10,
"pluginVersion": "v11.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "histogram_quantile(0.50, sum(rate(workqueue_work_duration_seconds_bucket{job=\"$job\", namespace=\"$namespace\"}[5m])) by (instance, name, le))",
"legendFormat": "P50 {{name}}"
}
],
"title": "Seconds Processing Items from WorkQueue - 50th Percentile",
"type": "timeseries"
},
{
"datasource": {
"type": "datasource",
"uid": "-- Mixed --"
},
"fieldConfig": {
"defaults": {
"decimals": null,
"unit": null
}
},
"gridPos": {
"h": 8,
"w": 24,
"x": 0,
"y": 80
},
"id": 11,
"pluginVersion": "v11.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "histogram_quantile(0.90, sum(rate(workqueue_work_duration_seconds_bucket{job=\"$job\", namespace=\"$namespace\"}[5m])) by (instance, name, le))",
"legendFormat": "P90 {{name}}"
}
],
"title": "Seconds Processing Items from WorkQueue - 90th Percentile",
"type": "timeseries"
},
{
"datasource": {
"type": "datasource",
"uid": "-- Mixed --"
},
"fieldConfig": {
"defaults": {
"decimals": null,
"unit": null
}
},
"gridPos": {
"h": 8,
"w": 24,
"x": 0,
"y": 88
},
"id": 12,
"pluginVersion": "v11.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "histogram_quantile(0.99, sum(rate(workqueue_work_duration_seconds_bucket{job=\"$job\", namespace=\"$namespace\"}[5m])) by (instance, name, le))",
"legendFormat": "P99 {{name}}"
}
],
"title": "Seconds Processing Items from WorkQueue - 99th Percentile",
"type": "timeseries"
},
{
"datasource": {
"type": "datasource",
"uid": "-- Mixed --"
},
"fieldConfig": {
"defaults": {
"decimals": null,
"unit": null
}
},
"gridPos": {
"h": 8,
"w": 24,
"x": 0,
"y": 96
},
"id": 13,
"pluginVersion": "v11.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(rate(workqueue_retries_total{job=\"$job\", namespace=\"$namespace\"}[5m])) by (instance, name)",
"legendFormat": "{{name}} {{instance}}"
}
],
"title": "Work Queue Retries Rate",
"type": "timeseries"
}
],
"schemaVersion": 39,
"templating": {
"list": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"name": "namespace",
"query": "label_values(controller_runtime_reconcile_total, namespace)",
"refresh": 2,
"type": "query"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"name": "job",
"query": "label_values(controller_runtime_reconcile_total{namespace=~\"$namespace\"}, job)",
"refresh": 2,
"type": "query"
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timezone": "utc",
"title": "Fleet / Controller-Runtime",
"uid": "fleet-controller-runtime"
}


@ -0,0 +1,325 @@
{
"description": "GitRepo",
"graphTooltip": 1,
"panels": [
{
"datasource": {
"type": "datasource",
"uid": "-- Mixed --"
},
"fieldConfig": {
"defaults": {
"decimals": 0,
"unit": "percentunit"
}
},
"gridPos": {
"h": 5,
"w": 7,
"x": 0,
"y": 0
},
"id": 1,
"pluginVersion": "v11.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_gitrepo_ready_clusters{exported_namespace=\"$namespace\",name=~\"$name\"}) / sum(fleet_gitrepo_desired_ready_clusters{exported_namespace=\"$namespace\",name=~\"$name\"})"
}
],
"title": "Ready Clusters",
"type": "stat"
},
{
"datasource": {
"type": "datasource",
"uid": "-- Mixed --"
},
"fieldConfig": {
"defaults": {
"decimals": 0,
"unit": null
}
},
"gridPos": {
"h": 5,
"w": 17,
"x": 7,
"y": 0
},
"id": 2,
"pluginVersion": "v11.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_gitrepo_desired_ready_clusters{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Desired Ready"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_gitrepo_ready_clusters{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Ready"
}
],
"title": "Clusters",
"type": "stat"
},
{
"datasource": {
"type": "datasource",
"uid": "-- Mixed --"
},
"fieldConfig": {
"defaults": {
"decimals": 0,
"unit": null
}
},
"gridPos": {
"h": 8,
"w": 24,
"x": 0,
"y": 8
},
"id": 3,
"pluginVersion": "v11.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_gitrepo_desired_ready_clusters{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Desired Ready"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_gitrepo_ready_clusters{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Ready"
}
],
"title": "Clusters",
"type": "timeseries"
},
{
"datasource": {
"type": "datasource",
"uid": "-- Mixed --"
},
"fieldConfig": {
"defaults": {
"decimals": 0,
"unit": "percentunit"
}
},
"gridPos": {
"h": 5,
"w": 7,
"x": 0,
"y": 13
},
"id": 4,
"pluginVersion": "v11.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_gitrepo_resources_ready{exported_namespace=\"$namespace\",name=~\"$name\"}) / sum(fleet_gitrepo_resources_desired_ready{exported_namespace=\"$namespace\",name=~\"$name\"})"
}
],
"title": "Ready Resources",
"type": "stat"
},
{
"datasource": {
"type": "datasource",
"uid": "-- Mixed --"
},
"fieldConfig": {
"defaults": {
"decimals": 0,
"unit": null
}
},
"gridPos": {
"h": 5,
"w": 17,
"x": 7,
"y": 13
},
"id": 5,
"pluginVersion": "v11.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_gitrepo_resources_desired_ready{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Desired Ready"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_gitrepo_resources_ready{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Ready"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_gitrepo_resources_not_ready{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Not Ready"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_gitrepo_resources_missing{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Missing"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_gitrepo_resources_modified{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Modified"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_gitrepo_resources_unknown{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Unknown"
}
],
"title": "Resources",
"type": "stat"
},
{
"datasource": {
"type": "datasource",
"uid": "-- Mixed --"
},
"fieldConfig": {
"defaults": {
"decimals": 0,
"unit": null
}
},
"gridPos": {
"h": 8,
"w": 24,
"x": 0,
"y": 21
},
"id": 6,
"pluginVersion": "v11.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_gitrepo_resources_desired_ready{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Desired Ready"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_gitrepo_resources_ready{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Ready"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_gitrepo_resources_not_ready{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Not Ready"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_gitrepo_resources_missing{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Missing"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_gitrepo_resources_modified{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Modified"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"expr": "sum(fleet_gitrepo_resources_unknown{exported_namespace=\"$namespace\",name=~\"$name\"})",
"legendFormat": "Unknown"
}
],
"title": "Resources",
"type": "timeseries"
}
],
"schemaVersion": 39,
"templating": {
"list": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"name": "namespace",
"query": "label_values(fleet_gitrepo_desired_ready_clusters, exported_namespace)",
"refresh": 2,
"type": "query"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"includeAll": true,
"name": "name",
"query": "label_values(fleet_gitrepo_desired_ready_clusters{exported_namespace=~\"$namespace\"}, name)",
"refresh": 2,
"type": "query"
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timezone": "utc",
"title": "Fleet / GitRepo",
"uid": "fleet-gitrepo"
}


@ -0,0 +1,4 @@
{{ range .Values.extraManifests }}
---
{{ tpl (toYaml .) $ }}
{{ end }}
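
For orientation, each entry under `extraManifests` is passed through `tpl`, so template expressions inside the values are expanded before the object is emitted. A minimal hypothetical values sketch (names are illustrative):

```yaml
extraManifests:
  - apiVersion: v1
    kind: ConfigMap
    metadata:
      name: extra-settings                    # illustrative name
      namespace: "{{ .Release.Namespace }}"   # tpl expands this at render time
    data:
      note: rendered by the extraManifests loop
```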


@ -0,0 +1,15 @@
{{- if and .Values.prometheus.enabled .Values.prometheus.prometheusSpec.thanos .Values.prometheus.prometheusSpec.thanos.objectStorageConfig}}
{{- if and .Values.prometheus.prometheusSpec.thanos.objectStorageConfig.secret (not .Values.prometheus.prometheusSpec.thanos.objectStorageConfig.existingSecret) }}
apiVersion: v1
kind: Secret
metadata:
name: {{ template "project-prometheus-stack.fullname" . }}-prometheus
namespace: {{ template "project-prometheus-stack.namespace" . }}
labels:
app: {{ template "project-prometheus-stack.name" . }}-prometheus
app.kubernetes.io/component: prometheus
{{ include "project-prometheus-stack.labels" . | indent 4 }}
data:
object-storage-configs.yaml: {{ toYaml .Values.prometheus.prometheusSpec.thanos.objectStorageConfig.secret | b64enc | quote }}
{{- end }}
{{- end }}
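
This Secret is rendered only when an inline Thanos object-storage config is supplied and no `existingSecret` is referenced. A hypothetical values sketch, with the bucket details purely illustrative:

```yaml
prometheus:
  enabled: true
  prometheusSpec:
    thanos:
      objectStorageConfig:
        # The inline secret below is base64-encoded into the generated
        # Secret's object-storage-configs.yaml key; set existingSecret
        # instead to reference a Secret you manage yourself.
        secret:
          type: S3
          config:
            bucket: thanos-metrics              # illustrative
            endpoint: s3.example.internal:9000  # illustrative
            access_key: "<redacted>"
            secret_key: "<redacted>"
```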


@ -0,0 +1,7 @@
--- charts-original/.helmignore
+++ charts/.helmignore
@@ -26,3 +26,4 @@
kube-prometheus-*.tgz
unittests/
+files/dashboards/


@ -1,6 +1,6 @@
--- charts-original/Chart.yaml
+++ charts/Chart.yaml
@@ -10,11 +10,11 @@
@@ -10,39 +10,42 @@
catalog.cattle.io/certified: rancher
catalog.cattle.io/deploys-on-os: windows
catalog.cattle.io/display-name: Monitoring
@ -14,11 +14,49 @@
catalog.cattle.io/release-name: rancher-monitoring
catalog.cattle.io/requests-cpu: 4500m
catalog.cattle.io/requests-memory: 4000Mi
@@ -37,6 +37,7 @@
catalog.cattle.io/type: cluster-tool
catalog.cattle.io/ui-component: monitoring
- catalog.cattle.io/upstream-version: 45.31.1
+ catalog.cattle.io/upstream-version: 57.0.3
apiVersion: v2
-appVersion: v0.65.1
+appVersion: v0.72.0
dependencies:
- condition: grafana.enabled
name: grafana
repository: file://./charts/grafana
- version: 6.59.0
-description: Collects several related Helm charts, Grafana dashboards, and Prometheus
- rules combined with documentation and scripts to provide easy to operate end-to-end
- Kubernetes cluster monitoring with Prometheus. Depends on the existence of a Cluster
- Prometheus deployed via Prometheus Operator
+description: kube-prometheus-stack collects Kubernetes manifests, Grafana dashboards,
+ and Prometheus rules combined with documentation and scripts to provide easy to
+ operate end-to-end Kubernetes cluster monitoring with Prometheus using the Prometheus
+ Operator.
home: https://github.com/prometheus-operator/kube-prometheus
-icon: https://raw.githubusercontent.com/prometheus/prometheus.github.io/master/assets/prometheus_logo-cb55bb5c346.png
+icon: file://assets/logos/rancher-monitoring.png
keywords:
+- operator
- prometheus
- monitoring
+kubeVersion: '>=1.26.0-0'
-- monitoring
+- kube-prometheus
+kubeVersion: '>=1.19.0-0'
maintainers:
- email: arvind.iyengar@suse.com
name: Arvind
-- email: arvind.iyengar@suse.com
- name: Arvind
-- email: amangeet.samra@suse.com
- name: Geet
- url: https://github.com/geethub97
+- email: alexandre.lamarre@suse.com
+ name: Alexandre
+- email: joshua.meranda@suse.com
+ name: Joshua
name: rancher-project-monitoring
+sources:
+- https://github.com/prometheus-community/helm-charts
+- https://github.com/prometheus-operator/kube-prometheus
type: application
-version: 0.4.1
+version: 0.4.2


@ -1,6 +1,12 @@
--- charts-original/charts/grafana/Chart.yaml
+++ charts/charts/grafana/Chart.yaml
@@ -6,7 +6,7 @@
@@ -1,25 +1,25 @@
annotations:
- artifacthub.io/license: AGPL-3.0-only
+ artifacthub.io/license: Apache-2.0
artifacthub.io/links: |
- name: Chart Source
url: https://github.com/grafana/helm-charts
- name: Upstream Project
url: https://github.com/grafana/grafana
catalog.cattle.io/hidden: "true"
@ -9,7 +15,15 @@
catalog.cattle.io/os: linux
catalog.rancher.io/certified: rancher
catalog.rancher.io/namespace: cattle-monitoring-system
@@ -19,7 +19,7 @@
catalog.rancher.io/release-name: rancher-grafana
apiVersion: v2
-appVersion: 10.1.0
+appVersion: 10.4.1
description: The leading tool for querying and visualizing time series and metrics.
-home: https://grafana.net
-icon: https://raw.githubusercontent.com/grafana/grafana/master/public/img/logo_transparent_400x.png
+home: https://grafana.com
+icon: https://artifacthub.io/image/b4fed1a7-6c8f-4945-b99d-096efa3e4116
keywords:
- monitoring
- metric
@ -18,3 +32,9 @@
maintainers:
- email: zanhsieh@gmail.com
name: zanhsieh
@@ -36,4 +36,4 @@
- https://github.com/grafana/grafana
- https://github.com/grafana/helm-charts
type: application
-version: 6.59.0
+version: 7.3.11


@ -0,0 +1,270 @@
--- charts-original/charts/grafana/README.md
+++ charts/charts/grafana/README.md
@@ -46,6 +46,13 @@
This version requires Helm >= 3.1.0.
+### To 7.0.0
+
+For consistency with other Helm charts, the `global.image.registry` parameter was renamed
+to `global.imageRegistry`. If you were not previously setting `global.image.registry`, no action
+is required on upgrade. If you were previously setting `global.image.registry`, you will
+need to instead set `global.imageRegistry`.
+
## Configuration
| Parameter | Description | Default |
@@ -59,6 +66,7 @@
| `readinessProbe` | Readiness Probe settings | `{ "httpGet": { "path": "/api/health", "port": 3000 } }`|
| `securityContext` | Deployment securityContext | `{"runAsUser": 472, "runAsGroup": 472, "fsGroup": 472}` |
| `priorityClassName` | Name of Priority Class to assign pods | `nil` |
+| `image.registry` | Image registry | `docker.io` |
| `image.repository` | Image repository | `grafana/grafana` |
| `image.tag` | Overrides the Grafana image tag whose default is the chart appVersion (`Must be >= 5.0.0`) | `` |
| `image.sha` | Image sha (optional) | `` |
@@ -77,6 +85,7 @@
| `service.loadBalancerIP` | IP address to assign to load balancer (if supported) | `nil` |
| `service.loadBalancerSourceRanges` | list of IP CIDRs allowed access to lb (if supported) | `[]` |
| `service.externalIPs` | service external IP addresses | `[]` |
+| `service.externalTrafficPolicy` | change the default externalTrafficPolicy | `nil` |
| `headlessService` | Create a headless service | `false` |
| `extraExposePorts` | Additional service ports for sidecar containers| `[]` |
| `hostAliases` | adds rules to the pod's /etc/hosts | `[]` |
@@ -86,7 +95,7 @@
| `ingress.path` | Ingress accepted path | `/` |
| `ingress.pathType` | Ingress type of path | `Prefix` |
| `ingress.hosts` | Ingress accepted hostnames | `["chart-example.local"]` |
-| `ingress.extraPaths` | Ingress extra paths to prepend to every host configuration. Useful when configuring [custom actions with AWS ALB Ingress Controller](https://kubernetes-sigs.github.io/aws-alb-ingress-controller/guide/ingress/annotation/#actions). Requires `ingress.hosts` to have one or more host entries. | `[]` |
+| `ingress.extraPaths` | Ingress extra paths to prepend to every host configuration. Useful when configuring [custom actions with AWS ALB Ingress Controller](https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.6/guide/ingress/annotations/#actions). Requires `ingress.hosts` to have one or more host entries. | `[]` |
| `ingress.tls` | Ingress TLS configuration | `[]` |
| `ingress.ingressClassName` | Ingress Class Name. MAY be required for Kubernetes versions >= 1.18 | `""` |
| `resources` | CPU/Memory resource requests/limits | `{}` |
@@ -111,6 +120,7 @@
| `persistence.inMemory.enabled` | If persistence is not enabled, whether to mount the local storage in-memory to improve performance | `false` |
| `persistence.inMemory.sizeLimit` | SizeLimit for the in-memory local storage | `nil` |
| `initChownData.enabled` | If false, don't reset data ownership at startup | true |
+| `initChownData.image.registry` | init-chown-data container image registry | `docker.io` |
| `initChownData.image.repository` | init-chown-data container image repository | `busybox` |
| `initChownData.image.tag` | init-chown-data container image tag | `1.31.1` |
| `initChownData.image.sha` | init-chown-data container image sha (optional)| `""` |
@@ -126,6 +136,8 @@
| `enableServiceLinks` | Inject Kubernetes services as environment variables. | `true` |
| `extraSecretMounts` | Additional grafana server secret mounts | `[]` |
| `extraVolumeMounts` | Additional grafana server volume mounts | `[]` |
+| `extraVolumes` | Additional Grafana server volumes | `[]` |
+| `automountServiceAccountToken` | Mount the service account token on the grafana pod. Mandatory if sidecars are enabled | `true` |
| `createConfigmap` | Enable creating the grafana configmap | `true` |
| `extraConfigmapMounts` | Additional grafana server configMap volume mounts (values are templated) | `[]` |
| `extraEmptyDirMounts` | Additional grafana server emptyDir volume mounts | `[]` |
@@ -137,6 +149,7 @@
| `dashboards` | Dashboards to import | `{}` |
| `dashboardsConfigMaps` | ConfigMaps reference that contains dashboards | `{}` |
| `grafana.ini` | Grafana's primary configuration | `{}` |
+| `global.imageRegistry` | Global image pull registry for all images. | `null` |
| `global.imagePullSecrets` | Global image pull secrets (can be templated). Allows either an array of {name: pullSecret} maps (k8s-style), or an array of strings (more common helm-style). | `[]` |
| `ldap.enabled` | Enable LDAP authentication | `false` |
| `ldap.existingSecret` | The name of an existing secret containing the `ldap.toml` file, this must have the key `ldap-toml`. | `""` |
@@ -147,8 +160,9 @@
| `podLabels` | Pod labels | `{}` |
| `podPortName` | Name of the grafana port on the pod | `grafana` |
| `lifecycleHooks` | Lifecycle hooks for podStart and preStop [Example](https://kubernetes.io/docs/tasks/configure-pod-container/attach-handler-lifecycle-event/#define-poststart-and-prestop-handlers) | `{}` |
-| `sidecar.image.repository` | Sidecar image repository | `quay.io/kiwigrid/k8s-sidecar` |
-| `sidecar.image.tag` | Sidecar image tag | `1.24.6` |
+| `sidecar.image.registry` | Sidecar image registry | `quay.io` |
+| `sidecar.image.repository` | Sidecar image repository | `kiwigrid/k8s-sidecar` |
+| `sidecar.image.tag` | Sidecar image tag | `1.26.0` |
| `sidecar.image.sha` | Sidecar image sha (optional) | `""` |
| `sidecar.imagePullPolicy` | Sidecar image pull policy | `IfNotPresent` |
| `sidecar.resources` | Sidecar resources | `{}` |
@@ -162,7 +176,7 @@
| `sidecar.alerts.resource` | Should the sidecar looks into secrets, configmaps or both. | `both` |
| `sidecar.alerts.reloadURL` | Full url of datasource configuration reload API endpoint, to invoke after a config-map change | `"http://localhost:3000/api/admin/provisioning/alerting/reload"` |
| `sidecar.alerts.skipReload` | Enabling this omits defining the REQ_URL and REQ_METHOD environment variables | `false` |
-| `sidecar.alerts.initDatasources` | Set to true to deploy the datasource sidecar as an initContainer in addition to a container. This is needed if skipReload is true, to load any alerts defined at startup time. | `false` |
+| `sidecar.alerts.initAlerts` | Set to true to deploy the alerts sidecar as an initContainer. This is needed if skipReload is true, to load any alerts defined at startup time. | `false` |
| `sidecar.alerts.extraMounts` | Additional alerts sidecar volume mounts. | `[]` |
| `sidecar.dashboards.enabled` | Enables the cluster wide search for dashboards and adds/updates/deletes them in grafana | `false` |
| `sidecar.dashboards.SCProvider` | Enables creation of sidecar provider | `true` |
@@ -210,7 +224,7 @@
| `admin.existingSecret` | The name of an existing secret containing the admin credentials (can be templated). | `""` |
| `admin.userKey` | The key in the existing admin secret containing the username. | `"admin-user"` |
| `admin.passwordKey` | The key in the existing admin secret containing the password. | `"admin-password"` |
-| `serviceAccount.autoMount` | Automount the service account token in the pod| `true` |
+| `serviceAccount.automountServiceAccountToken` | Automount the service account token on all pods where this service account is used | `false` |
| `serviceAccount.annotations` | ServiceAccount annotations | |
| `serviceAccount.create` | Create service account | `true` |
| `serviceAccount.labels` | ServiceAccount labels | `{}` |
@@ -226,14 +240,16 @@
| `command` | Define command to be executed by grafana container at startup | `nil` |
| `args` | Define additional args if command is used | `nil` |
| `testFramework.enabled` | Whether to create test-related resources | `true` |
-| `testFramework.image` | `test-framework` image repository. | `bats/bats` |
-| `testFramework.tag` | `test-framework` image tag. | `v1.4.1` |
+| `testFramework.image.registry` | `test-framework` image registry. | `docker.io` |
+| `testFramework.image.repository` | `test-framework` image repository. | `bats/bats` |
+| `testFramework.image.tag` | `test-framework` image tag. | `v1.4.1` |
| `testFramework.imagePullPolicy` | `test-framework` image pull policy. | `IfNotPresent` |
| `testFramework.securityContext` | `test-framework` securityContext | `{}` |
| `downloadDashboards.env` | Environment variables to be passed to the `download-dashboards` container | `{}` |
| `downloadDashboards.envFromSecret` | Name of a Kubernetes secret (must be manually created in the same namespace) containing values to be added to the environment. Can be templated | `""` |
| `downloadDashboards.resources` | Resources of `download-dashboards` container | `{}` |
-| `downloadDashboardsImage.repository` | Curl docker image repo | `curlimages/curl` |
+| `downloadDashboardsImage.registry` | Curl docker image registry | `docker.io` |
+| `downloadDashboardsImage.repository` | Curl docker image repository | `curlimages/curl` |
| `downloadDashboardsImage.tag` | Curl docker image tag | `7.73.0` |
| `downloadDashboardsImage.sha` | Curl docker image sha (optional) | `""` |
| `downloadDashboardsImage.pullPolicy` | Curl docker image pull policy | `IfNotPresent` |
@@ -250,6 +266,7 @@
| `serviceMonitor.metricRelabelings` | MetricRelabelConfigs to apply to samples before ingestion. | `[]` |
| `revisionHistoryLimit` | Number of old ReplicaSets to retain | `10` |
| `imageRenderer.enabled` | Enable the image-renderer deployment & service | `false` |
+| `imageRenderer.image.registry` | image-renderer Image registry | `docker.io` |
| `imageRenderer.image.repository` | image-renderer Image repository | `grafana/grafana-image-renderer` |
| `imageRenderer.image.tag` | image-renderer Image tag | `latest` |
| `imageRenderer.image.sha` | image-renderer Image sha (optional) | `""` |
@@ -258,6 +275,7 @@
| `imageRenderer.envValueFrom` | Environment variables for image-renderer from alternate sources. See the API docs on [EnvVarSource](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.17/#envvarsource-v1-core) for format details. Can be templated | `{}` |
| `imageRenderer.serviceAccountName` | image-renderer deployment serviceAccountName | `""` |
| `imageRenderer.securityContext` | image-renderer deployment securityContext | `{}` |
+| `imageRenderer.podAnnotations` | image-renderer pod annotations | `{}` |
| `imageRenderer.hostAliases` | image-renderer deployment Host Aliases | `[]` |
| `imageRenderer.priorityClassName` | image-renderer deployment priority class | `''` |
| `imageRenderer.service.enabled` | Enable the image-renderer service | `true` |
@@ -299,24 +317,35 @@
path: "/grafana"
```
-### Example of extraVolumeMounts
+### Example of extraVolumeMounts and extraVolumes
+
+Configure additional volumes with `extraVolumes` and volume mounts with `extraVolumeMounts`.
-Volume can be type persistentVolumeClaim or hostPath but not both at same time.
-If neither existingClaim or hostPath argument is given then type is emptyDir.
+Example for `extraVolumeMounts` and corresponding `extraVolumes`:
```yaml
-- extraVolumeMounts:
+extraVolumeMounts:
- name: plugins
mountPath: /var/lib/grafana/plugins
subPath: configs/grafana/plugins
- existingClaim: existing-grafana-claim
readOnly: false
- name: dashboards
mountPath: /var/lib/grafana/dashboards
hostPath: /usr/shared/grafana/dashboards
readOnly: false
+
+extraVolumes:
+ - name: plugins
+ existingClaim: existing-grafana-claim
+ - name: dashboards
+ hostPath: /usr/shared/grafana/dashboards
```
+Volumes default to `emptyDir`. Set to `persistentVolumeClaim`,
+`hostPath`, `csi`, or `configMap` for other types. For a
+`persistentVolumeClaim`, specify an existing claim name with
+`existingClaim`.
+
## Import dashboards
There are a few methods to import dashboards to Grafana. Below are some examples and explanations as to how to use each method:
@@ -345,6 +374,14 @@
gnetId: 2
revision: 2
datasource: Prometheus
+ loki-dashboard-quick-search:
+ gnetId: 12019
+ revision: 2
+ datasource:
+ - name: DS_PROMETHEUS
+ value: Prometheus
+ - name: DS_LOKI
+ value: Loki
local-dashboard:
url: https://raw.githubusercontent.com/user/repository/master/dashboards/dashboard.json
```
@@ -520,9 +557,61 @@
# default org_id: 1
```
-## Provision alert rules, contact points, notification policies and notification templates
+## Sidecar for alerting resources
+
+If the parameter `sidecar.alerts.enabled` is set, a sidecar container is deployed in the grafana
+pod. This container watches all configmaps (or secrets) in the cluster (namespace defined by `sidecar.alerts.searchNamespace`) and filters out the ones with
+a label as defined in `sidecar.alerts.label` (default is `grafana_alert`). The files defined in those configmaps are written
+to a folder and accessed by grafana. Changes to the configmaps are monitored, and the imported alerting resources are updated; deletions, however, are a little more complicated (see below).
+
+This sidecar can be used to provision alert rules, contact points, notification policies, notification templates and mute timings as shown in [Grafana Documentation](https://grafana.com/docs/grafana/next/alerting/set-up/provision-alerting-resources/file-provisioning/).
+
+To fetch the alert config which will be provisioned, use the alert provisioning API ([Grafana Documentation](https://grafana.com/docs/grafana/next/developers/http_api/alerting_provisioning/)).
+You can use either JSON or YAML format.
+
+Example config for an alert rule:
+
+```yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+ name: sample-grafana-alert
+ labels:
+ grafana_alert: "1"
+data:
+ k8s-alert.yml: |-
+ apiVersion: 1
+ groups:
+ - orgId: 1
+ name: k8s-alert
+ [...]
+```
+
+Deleting provisioned alert rules is a two-step process: first delete the configmap which defined the alert rule,
+and then create a configuration which deletes the alert rule.
+
+Example deletion configuration:
+```yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+ name: delete-sample-grafana-alert
+ namespace: monitoring
+ labels:
+ grafana_alert: "1"
+data:
+ delete-k8s-alert.yml: |-
+ apiVersion: 1
+ deleteRules:
+ - orgId: 1
+ uid: 16624780-6564-45dc-825c-8bded4ad92d3
+```
+
+## Statically provision alerting resources
+If you don't need to change alerting resources (alert rules, contact points, notification policies and notification templates) regularly, you can use the `alerting` config option instead of the sidecar option above.
+This takes the alerting configuration from values and applies it statically when the Helm chart is rendered.
-There are two methods to provision alerting configuration in Grafana. Below are some examples and explanations as to how to use each method:
+There are two methods to statically provision alerting configuration in Grafana. Below are some examples and explanations as to how to use each method:
```yaml
alerting:
@@ -552,13 +641,14 @@
title: '{{ `{{ template "default.title" . }}` }}'
```
-There are two possibilities:
+The two possibilities for static alerting resource provisioning are:
-* Inlining the file contents as described in the example `values.yaml` and the official [Grafana documentation](https://grafana.com/docs/grafana/next/alerting/set-up/provision-alerting-resources/file-provisioning/).
-* Importing a file using a relative path starting from the chart root directory.
+* Inlining the file contents as shown for contact points in the above example.
+* Importing a file using a relative path starting from the chart root directory as shown for the alert rules in the above example.
### Important notes on file provisioning
+* The format of the files is defined in the [Grafana documentation](https://grafana.com/docs/grafana/next/alerting/set-up/provision-alerting-resources/file-provisioning/) on file provisioning.
* The chart supports importing YAML and JSON files.
* The filename must be unique, otherwise one volume mount will overwrite the other.
* In case of inlining, double curly braces that arise from the Grafana configuration format and are not intended as templates for the chart must be escaped.


@ -0,0 +1,100 @@
--- charts-original/charts/grafana/templates/_helpers.tpl
+++ charts/charts/grafana/templates/_helpers.tpl
@@ -174,13 +174,11 @@
Return the appropriate apiVersion for Horizontal Pod Autoscaler.
*/}}
{{- define "grafana.hpa.apiVersion" -}}
-{{- if $.Capabilities.APIVersions.Has "autoscaling/v2/HorizontalPodAutoscaler" }}
-{{- print "autoscaling/v2" }}
-{{- else if $.Capabilities.APIVersions.Has "autoscaling/v2beta2/HorizontalPodAutoscaler" }}
-{{- print "autoscaling/v2beta2" }}
-{{- else }}
-{{- print "autoscaling/v2beta1" }}
-{{- end }}
+{{- if .Capabilities.APIVersions.Has "autoscaling/v2" }}
+{{- print "autoscaling/v2" }}
+{{- else }}
+{{- print "autoscaling/v2beta2" }}
+{{- end }}
{{- end }}
{{/*
@@ -230,3 +228,78 @@
{{- end }}
{{- end }}
{{- end }}
+
+
+{{/*
+ Checks whether or not the configSecret secret has to be created
+ */}}
+{{- define "grafana.shouldCreateConfigSecret" -}}
+{{- $secretFound := false -}}
+{{- range $key, $value := .Values.datasources }}
+ {{- if hasKey $value "secret" }}
+ {{- $secretFound = true}}
+ {{- end }}
+{{- end }}
+{{- range $key, $value := .Values.notifiers }}
+ {{- if hasKey $value "secret" }}
+ {{- $secretFound = true}}
+ {{- end }}
+{{- end }}
+{{- range $key, $value := .Values.alerting }}
+ {{- if (or (hasKey $value "secret") (hasKey $value "secretFile")) }}
+ {{- $secretFound = true}}
+ {{- end }}
+{{- end }}
+{{- $secretFound}}
+{{- end -}}
+
+{{/*
+ Checks whether the user is attempting to store secrets in plaintext
+ in the grafana.ini configmap
+*/}}
+{{/* grafana.assertNoLeakedSecrets checks for sensitive keys in values */}}
+{{- define "grafana.assertNoLeakedSecrets" -}}
+ {{- $sensitiveKeysYaml := `
+sensitiveKeys:
+- path: ["database", "password"]
+- path: ["smtp", "password"]
+- path: ["security", "secret_key"]
+- path: ["security", "admin_password"]
+- path: ["auth.basic", "password"]
+- path: ["auth.ldap", "bind_password"]
+- path: ["auth.google", "client_secret"]
+- path: ["auth.github", "client_secret"]
+- path: ["auth.gitlab", "client_secret"]
+- path: ["auth.generic_oauth", "client_secret"]
+- path: ["auth.okta", "client_secret"]
+- path: ["auth.azuread", "client_secret"]
+- path: ["auth.grafana_com", "client_secret"]
+- path: ["auth.grafananet", "client_secret"]
+- path: ["azure", "user_identity_client_secret"]
+- path: ["unified_alerting", "ha_redis_password"]
+- path: ["metrics", "basic_auth_password"]
+- path: ["external_image_storage.s3", "secret_key"]
+- path: ["external_image_storage.webdav", "password"]
+- path: ["external_image_storage.azure_blob", "account_key"]
+` | fromYaml -}}
+ {{- if $.Values.assertNoLeakedSecrets -}}
+ {{- $grafanaIni := index .Values "grafana.ini" -}}
+ {{- range $_, $secret := $sensitiveKeysYaml.sensitiveKeys -}}
+ {{- $currentMap := $grafanaIni -}}
+ {{- $shouldContinue := true -}}
+ {{- range $index, $elem := $secret.path -}}
+ {{- if and $shouldContinue (hasKey $currentMap $elem) -}}
+ {{- if eq (len $secret.path) (add1 $index) -}}
+ {{- if not (regexMatch "\\$(?:__(?:env|file|vault))?{[^}]+}" (index $currentMap $elem)) -}}
+ {{- fail (printf "Sensitive key '%s' should not be defined explicitly in values. Use variable expansion instead. You can disable this client-side validation by changing the value of assertNoLeakedSecrets." (join "." $secret.path)) -}}
+ {{- end -}}
+ {{- else -}}
+ {{- $currentMap = index $currentMap $elem -}}
+ {{- end -}}
+ {{- else -}}
+ {{- $shouldContinue = false -}}
+ {{- end -}}
+ {{- end -}}
+ {{- end -}}
+ {{- end -}}
+{{- end -}}
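
As a usage sketch for the `grafana.assertNoLeakedSecrets` helper above: rendering fails when a sensitive `grafana.ini` key carries a literal value, while Grafana's variable-expansion syntax passes the check. Hypothetical values, assuming the check is enabled by default:

```yaml
assertNoLeakedSecrets: true   # assumed default; set to false to skip the check
grafana.ini:
  smtp:
    enabled: true
    # A literal password here would fail chart rendering; expanding the
    # value at runtime satisfies the check instead.
    password: "$__env{GF_SMTP_PASSWORD}"
```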


@ -0,0 +1,391 @@
--- charts-original/charts/grafana/templates/_pod.tpl
+++ charts/charts/grafana/templates/_pod.tpl
@@ -5,7 +5,7 @@
schedulerName: "{{ . }}"
{{- end }}
serviceAccountName: {{ include "grafana.serviceAccountName" . }}
-automountServiceAccountToken: {{ .Values.serviceAccount.autoMount }}
+automountServiceAccountToken: {{ .Values.automountServiceAccountToken }}
{{- with .Values.securityContext }}
securityContext:
{{- toYaml . | nindent 2 }}
@@ -14,18 +14,26 @@
hostAliases:
{{- toYaml . | nindent 2 }}
{{- end }}
+{{- if .Values.dnsPolicy }}
+dnsPolicy: {{ .Values.dnsPolicy }}
+{{- end }}
+{{- with .Values.dnsConfig }}
+dnsConfig:
+ {{- toYaml . | nindent 2 }}
+{{- end }}
{{- with .Values.priorityClassName }}
priorityClassName: {{ . }}
{{- end }}
-{{- if ( or .Values.persistence.enabled .Values.dashboards .Values.extraInitContainers (and .Values.sidecar.datasources.enabled .Values.sidecar.datasources.initDatasources) (and .Values.sidecar.notifiers.enabled .Values.sidecar.notifiers.initNotifiers)) }}
+{{- if ( or .Values.persistence.enabled .Values.dashboards .Values.extraInitContainers (and .Values.sidecar.alerts.enabled .Values.sidecar.alerts.initAlerts) (and .Values.sidecar.datasources.enabled .Values.sidecar.datasources.initDatasources) (and .Values.sidecar.notifiers.enabled .Values.sidecar.notifiers.initNotifiers)) }}
initContainers:
{{- end }}
{{- if ( and .Values.persistence.enabled .Values.initChownData.enabled ) }}
- name: init-chown-data
+ {{- $registry := include "system_default_registry" . | default .Values.initChownData.image.registry -}}
{{- if .Values.initChownData.image.sha }}
- image: "{{ template "system_default_registry" . }}{{ .Values.initChownData.image.repository }}:{{ .Values.initChownData.image.tag }}@sha256:{{ .Values.initChownData.image.sha }}"
+ image: "{{ $registry }}{{ .Values.initChownData.image.repository }}:{{ .Values.initChownData.image.tag }}@sha256:{{ .Values.initChownData.image.sha }}"
{{- else }}
- image: "{{ template "system_default_registry" . }}{{ .Values.initChownData.image.repository }}:{{ .Values.initChownData.image.tag }}"
+ image: "{{ $registry }}{{ .Values.initChownData.image.repository }}:{{ .Values.initChownData.image.tag }}"
{{- end }}
imagePullPolicy: {{ .Values.initChownData.image.pullPolicy }}
{{- with .Values.initChownData.securityContext }}
@@ -50,10 +58,11 @@
{{- end }}
{{- if .Values.dashboards }}
- name: download-dashboards
+ {{- $registry := include "system_default_registry" . | default .Values.downloadDashboardsImage.registry -}}
{{- if .Values.downloadDashboardsImage.sha }}
- image: "{{ template "system_default_registry" . }}{{ .Values.downloadDashboardsImage.repository }}:{{ .Values.downloadDashboardsImage.tag }}@sha256:{{ .Values.downloadDashboardsImage.sha }}"
+ image: "{{ $registry }}{{ .Values.downloadDashboardsImage.repository }}:{{ .Values.downloadDashboardsImage.tag }}@sha256:{{ .Values.downloadDashboardsImage.sha }}"
{{- else }}
- image: "{{ template "system_default_registry" . }}{{ .Values.downloadDashboardsImage.repository }}:{{ .Values.downloadDashboardsImage.tag }}"
+ image: "{{ $registry }}{{ .Values.downloadDashboardsImage.repository }}:{{ .Values.downloadDashboardsImage.tag }}"
{{- end }}
imagePullPolicy: {{ .Values.downloadDashboardsImage.pullPolicy }}
command: ["/bin/sh"]
@@ -96,12 +105,86 @@
readOnly: {{ .readOnly }}
{{- end }}
{{- end }}
+{{- if and .Values.sidecar.alerts.enabled .Values.sidecar.alerts.initAlerts }}
+ - name: {{ include "grafana.name" . }}-init-sc-alerts
+ {{- $registry := include "system_default_registry" . | default .Values.sidecar.image.registry -}}
+ {{- if .Values.sidecar.image.sha }}
+ image: "{{ $registry }}{{ .Values.sidecar.image.repository }}:{{ .Values.sidecar.image.tag }}@sha256:{{ .Values.sidecar.image.sha }}"
+ {{- else }}
+ image: "{{ $registry }}{{ .Values.sidecar.image.repository }}:{{ .Values.sidecar.image.tag }}"
+ {{- end }}
+ imagePullPolicy: {{ .Values.sidecar.imagePullPolicy }}
+ env:
+ {{- range $key, $value := .Values.sidecar.alerts.env }}
+ - name: "{{ $key }}"
+ value: "{{ $value }}"
+ {{- end }}
+ {{- if .Values.sidecar.alerts.ignoreAlreadyProcessed }}
+ - name: IGNORE_ALREADY_PROCESSED
+ value: "true"
+ {{- end }}
+ - name: METHOD
+ value: "LIST"
+ - name: LABEL
+ value: "{{ .Values.sidecar.alerts.label }}"
+ {{- with .Values.sidecar.alerts.labelValue }}
+ - name: LABEL_VALUE
+ value: {{ quote . }}
+ {{- end }}
+ {{- if or .Values.sidecar.logLevel .Values.sidecar.alerts.logLevel }}
+ - name: LOG_LEVEL
+ value: {{ default .Values.sidecar.logLevel .Values.sidecar.alerts.logLevel }}
+ {{- end }}
+ - name: FOLDER
+ value: "/etc/grafana/provisioning/alerting"
+ - name: RESOURCE
+ value: {{ quote .Values.sidecar.alerts.resource }}
+ {{- with .Values.sidecar.enableUniqueFilenames }}
+ - name: UNIQUE_FILENAMES
+ value: "{{ . }}"
+ {{- end }}
+ {{- with .Values.sidecar.alerts.searchNamespace }}
+ - name: NAMESPACE
+ value: {{ . | join "," | quote }}
+ {{- end }}
+ {{- with .Values.sidecar.alerts.skipTlsVerify }}
+ - name: SKIP_TLS_VERIFY
+ value: {{ quote . }}
+ {{- end }}
+ {{- with .Values.sidecar.alerts.script }}
+ - name: SCRIPT
+ value: {{ quote . }}
+ {{- end }}
+ {{- with .Values.sidecar.livenessProbe }}
+ livenessProbe:
+ {{- toYaml . | nindent 6 }}
+ {{- end }}
+ {{- with .Values.sidecar.readinessProbe }}
+ readinessProbe:
+ {{- toYaml . | nindent 6 }}
+ {{- end }}
+ {{- with .Values.sidecar.resources }}
+ resources:
+ {{- toYaml . | nindent 6 }}
+ {{- end }}
+ {{- with .Values.sidecar.securityContext }}
+ securityContext:
+ {{- toYaml . | nindent 6 }}
+ {{- end }}
+ volumeMounts:
+ - name: sc-alerts-volume
+ mountPath: "/etc/grafana/provisioning/alerting"
+ {{- with .Values.sidecar.alerts.extraMounts }}
+ {{- toYaml . | trim | nindent 6 }}
+ {{- end }}
+{{- end }}
{{- if and .Values.sidecar.datasources.enabled .Values.sidecar.datasources.initDatasources }}
- name: {{ include "grafana.name" . }}-init-sc-datasources
+ {{- $registry := include "system_default_registry" . | default .Values.sidecar.image.registry -}}
{{- if .Values.sidecar.image.sha }}
- image: "{{ template "system_default_registry" . }}{{ .Values.sidecar.image.repository }}:{{ .Values.sidecar.image.tag }}@sha256:{{ .Values.sidecar.image.sha }}"
+ image: "{{ $registry }}{{ .Values.sidecar.image.repository }}:{{ .Values.sidecar.image.tag }}@sha256:{{ .Values.sidecar.image.sha }}"
{{- else }}
- image: "{{ template "system_default_registry" . }}{{ .Values.sidecar.image.repository }}:{{ .Values.sidecar.image.tag }}"
+ image: "{{ $registry }}{{ .Values.sidecar.image.repository }}:{{ .Values.sidecar.image.tag }}"
{{- end }}
imagePullPolicy: {{ .Values.sidecar.imagePullPolicy }}
env:
@@ -155,10 +238,11 @@
{{- end }}
{{- if and .Values.sidecar.notifiers.enabled .Values.sidecar.notifiers.initNotifiers }}
- name: {{ include "grafana.name" . }}-init-sc-notifiers
+ {{- $registry := include "system_default_registry" . | default .Values.sidecar.image.registry -}}
{{- if .Values.sidecar.image.sha }}
- image: "{{ template "system_default_registry" . }}{{ .Values.sidecar.image.repository }}:{{ .Values.sidecar.image.tag }}@sha256:{{ .Values.sidecar.image.sha }}"
+ image: "{{ $registry }}{{ .Values.sidecar.image.repository }}:{{ .Values.sidecar.image.tag }}@sha256:{{ .Values.sidecar.image.sha }}"
{{- else }}
- image: "{{ template "system_default_registry" . }}{{ .Values.sidecar.image.repository }}:{{ .Values.sidecar.image.tag }}"
+ image: "{{ $registry }}{{ .Values.sidecar.image.repository }}:{{ .Values.sidecar.image.tag }}"
{{- end }}
imagePullPolicy: {{ .Values.sidecar.imagePullPolicy }}
env:
@@ -229,12 +313,13 @@
enableServiceLinks: {{ .Values.enableServiceLinks }}
{{- end }}
containers:
-{{- if .Values.sidecar.alerts.enabled }}
+{{- if and .Values.sidecar.alerts.enabled (not .Values.sidecar.alerts.initAlerts) }}
- name: {{ include "grafana.name" . }}-sc-alerts
+ {{- $registry := include "system_default_registry" . | default .Values.sidecar.image.registry -}}
{{- if .Values.sidecar.image.sha }}
- image: "{{ .Values.sidecar.image.repository }}:{{ .Values.sidecar.image.tag }}@sha256:{{ .Values.sidecar.image.sha }}"
+ image: "{{ $registry }}{{ .Values.sidecar.image.repository }}:{{ .Values.sidecar.image.tag }}@sha256:{{ .Values.sidecar.image.sha }}"
{{- else }}
- image: "{{ .Values.sidecar.image.repository }}:{{ .Values.sidecar.image.tag }}"
+ image: "{{ $registry }}{{ .Values.sidecar.image.repository }}:{{ .Values.sidecar.image.tag }}"
{{- end }}
imagePullPolicy: {{ .Values.sidecar.imagePullPolicy }}
env:
@@ -333,14 +418,15 @@
mountPath: "/etc/grafana/provisioning/alerting"
{{- with .Values.sidecar.alerts.extraMounts }}
{{- toYaml . | trim | nindent 6 }}
- {{- end }}
+ {{- end }}
{{- end}}
{{- if .Values.sidecar.dashboards.enabled }}
- name: {{ include "grafana.name" . }}-sc-dashboard
+ {{- $registry := include "system_default_registry" . | default .Values.sidecar.image.registry -}}
{{- if .Values.sidecar.image.sha }}
- image: "{{ template "system_default_registry" . }}{{ .Values.sidecar.image.repository }}:{{ .Values.sidecar.image.tag }}@sha256:{{ .Values.sidecar.image.sha }}"
+ image: "{{ $registry }}{{ .Values.sidecar.image.repository }}:{{ .Values.sidecar.image.tag }}@sha256:{{ .Values.sidecar.image.sha }}"
{{- else }}
- image: "{{ template "system_default_registry" . }}{{ .Values.sidecar.image.repository }}:{{ .Values.sidecar.image.tag }}"
+ image: "{{ $registry }}{{ .Values.sidecar.image.repository }}:{{ .Values.sidecar.image.tag }}"
{{- end }}
imagePullPolicy: {{ .Values.sidecar.imagePullPolicy }}
env:
@@ -348,6 +434,11 @@
- name: "{{ $key }}"
value: "{{ $value }}"
{{- end }}
+ {{- range $key, $value := .Values.sidecar.datasources.envValueFrom }}
+ - name: {{ $key | quote }}
+ valueFrom:
+ {{- tpl (toYaml $value) $ | nindent 10 }}
+ {{- end }}
{{- if .Values.sidecar.dashboards.ignoreAlreadyProcessed }}
- name: IGNORE_ALREADY_PROCESSED
value: "true"
@@ -443,12 +534,13 @@
{{- toYaml . | trim | nindent 6 }}
{{- end }}
{{- end}}
-{{- if .Values.sidecar.datasources.enabled }}
+{{- if and .Values.sidecar.datasources.enabled (not .Values.sidecar.datasources.initDatasources) }}
- name: {{ include "grafana.name" . }}-sc-datasources
+ {{- $registry := include "system_default_registry" . | default .Values.sidecar.image.registry -}}
{{- if .Values.sidecar.image.sha }}
- image: "{{ template "system_default_registry" . }}{{ .Values.sidecar.image.repository }}:{{ .Values.sidecar.image.tag }}@sha256:{{ .Values.sidecar.image.sha }}"
+ image: "{{ $registry }}{{ .Values.sidecar.image.repository }}:{{ .Values.sidecar.image.tag }}@sha256:{{ .Values.sidecar.image.sha }}"
{{- else }}
- image: "{{ template "system_default_registry" . }}{{ .Values.sidecar.image.repository }}:{{ .Values.sidecar.image.tag }}"
+ image: "{{ $registry }}{{ .Values.sidecar.image.repository }}:{{ .Values.sidecar.image.tag }}"
{{- end }}
imagePullPolicy: {{ .Values.sidecar.imagePullPolicy }}
env:
@@ -546,10 +638,11 @@
{{- end}}
{{- if .Values.sidecar.notifiers.enabled }}
- name: {{ include "grafana.name" . }}-sc-notifiers
+ {{- $registry := include "system_default_registry" . | default .Values.sidecar.image.registry -}}
{{- if .Values.sidecar.image.sha }}
- image: "{{ .Values.sidecar.image.repository }}:{{ .Values.sidecar.image.tag }}@sha256:{{ .Values.sidecar.image.sha }}"
+ image: "{{ $registry }}{{ .Values.sidecar.image.repository }}:{{ .Values.sidecar.image.tag }}@sha256:{{ .Values.sidecar.image.sha }}"
{{- else }}
- image: "{{ .Values.sidecar.image.repository }}:{{ .Values.sidecar.image.tag }}"
+ image: "{{ $registry }}{{ .Values.sidecar.image.repository }}:{{ .Values.sidecar.image.tag }}"
{{- end }}
imagePullPolicy: {{ .Values.sidecar.imagePullPolicy }}
env:
@@ -649,10 +742,11 @@
{{- end}}
{{- if .Values.sidecar.plugins.enabled }}
- name: {{ include "grafana.name" . }}-sc-plugins
+ {{- $registry := include "system_default_registry" . | default .Values.sidecar.image.registry -}}
{{- if .Values.sidecar.image.sha }}
- image: "{{ template "system_default_registry" . }}{{ .Values.sidecar.image.repository }}:{{ .Values.sidecar.image.tag }}@sha256:{{ .Values.sidecar.image.sha }}"
+ image: "{{ $registry }}{{ .Values.sidecar.image.repository }}:{{ .Values.sidecar.image.tag }}@sha256:{{ .Values.sidecar.image.sha }}"
{{- else }}
- image: "{{ template "system_default_registry" . }}{{ .Values.sidecar.image.repository }}:{{ .Values.sidecar.image.tag }}"
+ image: "{{ $registry }}{{ .Values.sidecar.image.repository }}:{{ .Values.sidecar.image.tag }}"
{{- end }}
imagePullPolicy: {{ .Values.sidecar.imagePullPolicy }}
env:
@@ -749,10 +843,11 @@
mountPath: "/etc/grafana/provisioning/plugins"
{{- end}}
- name: {{ .Chart.Name }}
+ {{- $registry := include "system_default_registry" . | default .Values.image.registry -}}
{{- if .Values.image.sha }}
- image: "{{ template "system_default_registry" . }}{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}@sha256:{{ .Values.image.sha }}"
+ image: "{{ $registry }}{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}@sha256:{{ .Values.image.sha }}"
{{- else }}
- image: "{{ template "system_default_registry" . }}{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
+ image: "{{ $registry }}{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
{{- end }}
imagePullPolicy: {{ .Values.image.pullPolicy }}
{{- if .Values.command }}
@@ -809,26 +904,47 @@
{{- end }}
{{- end }}
{{- with .Values.datasources }}
+ {{- $datasources := . }}
{{- range (keys . | sortAlpha) }}
+ {{- if (or (hasKey (index $datasources .) "secret")) }} {{/*check if current datasource should be handeled as secret */}}
+ - name: config-secret
+ mountPath: "/etc/grafana/provisioning/datasources/{{ . }}"
+ subPath: {{ . | quote }}
+ {{- else }}
- name: config
mountPath: "/etc/grafana/provisioning/datasources/{{ . }}"
subPath: {{ . | quote }}
{{- end }}
{{- end }}
+ {{- end }}
{{- with .Values.notifiers }}
+ {{- $notifiers := . }}
{{- range (keys . | sortAlpha) }}
+ {{- if (or (hasKey (index $notifiers .) "secret")) }} {{/*check if current notifier should be handeled as secret */}}
+ - name: config-secret
+ mountPath: "/etc/grafana/provisioning/notifiers/{{ . }}"
+ subPath: {{ . | quote }}
+ {{- else }}
- name: config
mountPath: "/etc/grafana/provisioning/notifiers/{{ . }}"
subPath: {{ . | quote }}
{{- end }}
{{- end }}
+ {{- end }}
{{- with .Values.alerting }}
+ {{- $alertingmap := .}}
{{- range (keys . | sortAlpha) }}
+ {{- if (or (hasKey (index $.Values.alerting .) "secret") (hasKey (index $.Values.alerting .) "secretFile")) }} {{/*check if current alerting entry should be handeled as secret */}}
+ - name: config-secret
+ mountPath: "/etc/grafana/provisioning/alerting/{{ . }}"
+ subPath: {{ . | quote }}
+ {{- else }}
- name: config
mountPath: "/etc/grafana/provisioning/alerting/{{ . }}"
subPath: {{ . | quote }}
{{- end }}
{{- end }}
+ {{- end }}
{{- with .Values.dashboardProviders }}
{{- range (keys . | sortAlpha) }}
- name: config
@@ -962,11 +1078,17 @@
- secretRef:
name: {{ tpl .name $ }}
optional: {{ .optional | default false }}
+ {{- if .prefix }}
+ prefix: {{ tpl .prefix $ }}
+ {{- end }}
{{- end }}
{{- range .Values.envFromConfigMaps }}
- configMapRef:
name: {{ tpl .name $ }}
optional: {{ .optional | default false }}
+ {{- if .prefix }}
+ prefix: {{ tpl .prefix $ }}
+ {{- end }}
{{- end }}
{{- end }}
{{- with .Values.livenessProbe }}
@@ -989,8 +1111,8 @@
{{- tpl . $ | nindent 2 }}
{{- end }}
nodeSelector: {{ include "linux-node-selector" . | nindent 2 }}
-{{- if .Values.nodeSelector }}
-{{ toYaml .Values.nodeSelector | indent 2 }}
+{{- with .Values.nodeSelector }}
+ {{- toYaml . | nindent 2 }}
{{- end }}
{{- with .Values.affinity }}
affinity:
@@ -1001,13 +1123,19 @@
{{- toYaml . | nindent 2 }}
{{- end }}
tolerations: {{ include "linux-node-tolerations" . | nindent 2 }}
-{{- if .Values.tolerations }}
-{{ toYaml .Values.tolerations | indent 2 }}
+{{- with .Values.tolerations }}
+ {{- toYaml . | nindent 2 }}
{{- end }}
volumes:
- name: config
configMap:
name: {{ include "grafana.fullname" . }}
+ {{- $createConfigSecret := eq (include "grafana.shouldCreateConfigSecret" .) "true" -}}
+ {{- if and .Values.createConfigmap $createConfigSecret }}
+ - name: config-secret
+ secret:
+ secretName: {{ include "grafana.fullname" . }}-config-secret
+ {{- end }}
{{- range .Values.extraConfigmapMounts }}
- name: {{ tpl .name $root }}
configMap:
@@ -1131,17 +1259,23 @@
{{- toYaml .csi | nindent 6 }}
{{- end }}
{{- end }}
- {{- range .Values.extraVolumeMounts }}
+ {{- range .Values.extraVolumes }}
- name: {{ .name }}
{{- if .existingClaim }}
persistentVolumeClaim:
claimName: {{ .existingClaim }}
{{- else if .hostPath }}
hostPath:
- path: {{ .hostPath }}
+ {{ toYaml .hostPath | nindent 6 }}
{{- else if .csi }}
csi:
- {{- toYaml .data | nindent 6 }}
+ {{- toYaml .csi | nindent 6 }}
+ {{- else if .configMap }}
+ configMap:
+ {{- toYaml .configMap | nindent 6 }}
+ {{- else if .emptyDir }}
+ emptyDir:
+ {{- toYaml .emptyDir | nindent 6 }}
{{- else }}
emptyDir: {}
{{- end }}
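
The patched `extraVolumes` loop above now also accepts `configMap` and `emptyDir` entries alongside `existingClaim`, `hostPath`, and `csi`. A minimal hypothetical values sketch (names are illustrative):

```yaml
extraVolumes:
  - name: plugins
    existingClaim: existing-grafana-claim   # rendered as a persistentVolumeClaim
  - name: extra-config
    configMap:
      name: my-extra-config                 # hypothetical ConfigMap name
  - name: scratch
    emptyDir:
      medium: Memory
  - name: fallback-scratch                  # no type given, so it falls back to emptyDir: {}
```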


@ -0,0 +1,21 @@
--- charts-original/charts/grafana/templates/clusterrole.yaml
+++ charts/charts/grafana/templates/clusterrole.yaml
@@ -1,14 +1,14 @@
-{{- if and .Values.rbac.create (or (not .Values.rbac.namespaced) .Values.rbac.extraClusterRoleRules) (not .Values.rbac.useExistingRole) }}
+{{- if and .Values.rbac.create (or (not .Values.rbac.namespaced) .Values.rbac.extraClusterRoleRules) (not .Values.rbac.useExistingClusterRole) }}
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
labels:
{{- include "grafana.labels" . | nindent 4 }}
-{{- with .Values.annotations }}
+ {{- with .Values.annotations }}
annotations:
-{{ toYaml . | indent 4 }}
+ {{- toYaml . | nindent 4 }}
+ {{- end }}
name: {{ include "grafana.fullname" . }}-clusterrole
-{{- end}}
{{- if or .Values.sidecar.dashboards.enabled .Values.rbac.extraClusterRoleRules .Values.sidecar.datasources.enabled .Values.sidecar.plugins.enabled .Values.sidecar.alerts.enabled }}
rules:
{{- if or .Values.sidecar.dashboards.enabled .Values.sidecar.datasources.enabled .Values.sidecar.plugins.enabled .Values.sidecar.alerts.enabled }}


@ -0,0 +1,13 @@
--- charts-original/charts/grafana/templates/clusterrolebinding.yaml
+++ charts/charts/grafana/templates/clusterrolebinding.yaml
@@ -15,8 +15,8 @@
namespace: {{ include "grafana.namespace" . }}
roleRef:
kind: ClusterRole
- {{- if .Values.rbac.useExistingRole }}
- name: {{ .Values.rbac.useExistingRole }}
+ {{- if .Values.rbac.useExistingClusterRole }}
+ name: {{ .Values.rbac.useExistingClusterRole }}
{{- else }}
name: {{ include "grafana.fullname" . }}-clusterrole
{{- end }}


@ -0,0 +1,23 @@
--- charts-original/charts/grafana/templates/configmap-dashboard-provider.yaml
+++ charts/charts/grafana/templates/configmap-dashboard-provider.yaml
@@ -11,19 +11,5 @@
name: {{ include "grafana.fullname" . }}-config-dashboards
namespace: {{ include "grafana.namespace" . }}
data:
- provider.yaml: |-
- apiVersion: 1
- providers:
- - name: '{{ .Values.sidecar.dashboards.provider.name }}'
- orgId: {{ .Values.sidecar.dashboards.provider.orgid }}
- {{- if not .Values.sidecar.dashboards.provider.foldersFromFilesStructure }}
- folder: '{{ .Values.sidecar.dashboards.provider.folder }}'
- {{- end }}
- type: {{ .Values.sidecar.dashboards.provider.type }}
- disableDeletion: {{ .Values.sidecar.dashboards.provider.disableDelete }}
- allowUiUpdates: {{ .Values.sidecar.dashboards.provider.allowUiUpdates }}
- updateIntervalSeconds: {{ .Values.sidecar.dashboards.provider.updateIntervalSeconds | default 30 }}
- options:
- foldersFromFilesStructure: {{ .Values.sidecar.dashboards.provider.foldersFromFilesStructure }}
- path: {{ .Values.sidecar.dashboards.folder }}{{- with .Values.sidecar.dashboards.defaultFolderName }}/{{ . }}{{- end }}
+ {{- include "grafana.configDashboardProviderData" . | nindent 2 }}
{{- end }}


@ -0,0 +1,137 @@
--- charts-original/charts/grafana/templates/configmap.yaml
+++ charts/charts/grafana/templates/configmap.yaml
@@ -1,6 +1,4 @@
{{- if .Values.createConfigmap }}
-{{- $files := .Files }}
-{{- $root := . -}}
apiVersion: v1
kind: ConfigMap
metadata:
@@ -13,126 +11,5 @@
{{- toYaml . | nindent 4 }}
{{- end }}
data:
- {{- with .Values.plugins }}
- plugins: {{ join "," . }}
- {{- end }}
- grafana.ini: |
- {{- range $elem, $elemVal := index .Values "grafana.ini" }}
- {{- if not (kindIs "map" $elemVal) }}
- {{- if kindIs "invalid" $elemVal }}
- {{ $elem }} =
- {{- else if kindIs "string" $elemVal }}
- {{ $elem }} = {{ tpl $elemVal $ }}
- {{- else }}
- {{ $elem }} = {{ $elemVal }}
- {{- end }}
- {{- end }}
- {{- end }}
- {{- range $key, $value := index .Values "grafana.ini" }}
- {{- if kindIs "map" $value }}
- [{{ $key }}]
- {{- range $elem, $elemVal := $value }}
- {{- if kindIs "invalid" $elemVal }}
- {{ $elem }} =
- {{- else if kindIs "string" $elemVal }}
- {{ $elem }} = {{ tpl $elemVal $ }}
- {{- else }}
- {{ $elem }} = {{ $elemVal }}
- {{- end }}
- {{- end }}
- {{- end }}
- {{- end }}
-
- {{- range $key, $value := .Values.datasources }}
- {{- $key | nindent 2 }}: |
- {{- tpl (toYaml $value | nindent 4) $root }}
- {{- end }}
-
- {{- range $key, $value := .Values.notifiers }}
- {{- $key | nindent 2 }}: |
- {{- toYaml $value | nindent 4 }}
- {{- end }}
-
- {{- range $key, $value := .Values.alerting }}
- {{- if (hasKey $value "file") }}
- {{- $key | nindent 2 }}:
- {{- toYaml ( $files.Get $value.file ) | nindent 4}}
- {{- else }}
- {{- $key | nindent 2 }}: |
- {{- tpl (toYaml $value | nindent 4) $root }}
- {{- end }}
- {{- end }}
-
- {{- range $key, $value := .Values.dashboardProviders }}
- {{- $key | nindent 2 }}: |
- {{- toYaml $value | nindent 4 }}
- {{- end }}
-
-{{- if .Values.dashboards }}
- download_dashboards.sh: |
- #!/usr/bin/env sh
- set -euf
- {{- if .Values.dashboardProviders }}
- {{- range $key, $value := .Values.dashboardProviders }}
- {{- range $value.providers }}
- mkdir -p {{ .options.path }}
- {{- end }}
- {{- end }}
- {{- end }}
- {{ $dashboardProviders := .Values.dashboardProviders }}
- {{- range $provider, $dashboards := .Values.dashboards }}
- {{- range $key, $value := $dashboards }}
- {{- if (or (hasKey $value "gnetId") (hasKey $value "url")) }}
- curl -skf \
- --connect-timeout 60 \
- --max-time 60 \
- {{- if not $value.b64content }}
- {{- if not $value.acceptHeader }}
- -H "Accept: application/json" \
- {{- else }}
- -H "Accept: {{ $value.acceptHeader }}" \
- {{- end }}
- {{- if $value.token }}
- -H "Authorization: token {{ $value.token }}" \
- {{- end }}
- {{- if $value.bearerToken }}
- -H "Authorization: Bearer {{ $value.bearerToken }}" \
- {{- end }}
- {{- if $value.basic }}
- -H "Authorization: Basic {{ $value.basic }}" \
- {{- end }}
- {{- if $value.gitlabToken }}
- -H "PRIVATE-TOKEN: {{ $value.gitlabToken }}" \
- {{- end }}
- -H "Content-Type: application/json;charset=UTF-8" \
- {{- end }}
- {{- $dpPath := "" -}}
- {{- range $kd := (index $dashboardProviders "dashboardproviders.yaml").providers }}
- {{- if eq $kd.name $provider }}
- {{- $dpPath = $kd.options.path }}
- {{- end }}
- {{- end }}
- {{- if $value.url }}
- "{{ $value.url }}" \
- {{- else }}
- "https://grafana.com/api/dashboards/{{ $value.gnetId }}/revisions/{{- if $value.revision -}}{{ $value.revision }}{{- else -}}1{{- end -}}/download" \
- {{- end }}
- {{- if $value.datasource }}
- {{- if kindIs "string" $value.datasource }}
- | sed '/-- .* --/! s/"datasource":.*,/"datasource": "{{ $value.datasource }}",/g' \
- {{- end }}
- {{- if kindIs "slice" $value.datasource }}
- {{- range $value.datasource }}
- | sed '/-- .* --/! s/${{"{"}}{{ .name }}}/{{ .value }}/g' \
- {{- end }}
- {{- end }}
- {{- end }}
- {{- if $value.b64content }}
- | base64 -d \
- {{- end }}
- > "{{- if $dpPath -}}{{ $dpPath }}{{- else -}}/var/lib/grafana/dashboards/{{ $provider }}{{- end -}}/{{ $key }}.json"
- {{ end }}
- {{- end }}
- {{- end }}
-{{- end }}
+ {{- include "grafana.configData" . | nindent 2 }}
{{- end }}

View File

@ -0,0 +1,12 @@
--- charts-original/charts/grafana/templates/dashboards-json-configmap.yaml
+++ charts/charts/grafana/templates/dashboards-json-configmap.yaml
@@ -9,6 +9,9 @@
labels:
{{- include "grafana.labels" $ | nindent 4 }}
dashboard-provider: {{ $provider }}
+ {{- if $.Values.sidecar.dashboards.enabled }}
+ {{ $.Values.sidecar.dashboards.label }}: {{ $.Values.sidecar.dashboards.labelValue | quote }}
+ {{- end }}
{{- if $dashboards }}
data:
{{- $dashboardFound := false }}
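
Because the sidecar label is now stamped onto these ConfigMaps as well, dashboards declared under `.Values.dashboards` can be picked up by the dashboards sidecar when it is enabled. A hedged sketch at the Grafana sub-chart level (provider, dashboard name and JSON body are illustrative):

sidecar:
  dashboards:
    enabled: true
    label: grafana_dashboard
    labelValue: "1"
dashboards:
  default:                  # provider name (illustrative)
    custom-overview:        # dashboard name (illustrative)
      json: |
        {
          "title": "Custom overview",
          "panels": []
        }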

View File

@ -0,0 +1,23 @@
--- charts-original/charts/grafana/templates/deployment.yaml
+++ charts/charts/grafana/templates/deployment.yaml
@@ -33,14 +33,16 @@
{{- toYaml . | nindent 8 }}
{{- end }}
annotations:
- checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
+ checksum/config: {{ include "grafana.configData" . | sha256sum }}
+ {{- if .Values.dashboards }}
checksum/dashboards-json-config: {{ include (print $.Template.BasePath "/dashboards-json-configmap.yaml") . | sha256sum }}
- checksum/sc-dashboard-provider-config: {{ include (print $.Template.BasePath "/configmap-dashboard-provider.yaml") . | sha256sum }}
+ {{- end }}
+ checksum/sc-dashboard-provider-config: {{ include "grafana.configDashboardProviderData" . | sha256sum }}
{{- if and (or (and (not .Values.admin.existingSecret) (not .Values.env.GF_SECURITY_ADMIN_PASSWORD__FILE) (not .Values.env.GF_SECURITY_ADMIN_PASSWORD)) (and .Values.ldap.enabled (not .Values.ldap.existingSecret))) (not .Values.env.GF_SECURITY_DISABLE_INITIAL_ADMIN_CREATION) }}
- checksum/secret: {{ include (print $.Template.BasePath "/secret.yaml") . | sha256sum }}
+ checksum/secret: {{ include "grafana.secretsData" . | sha256sum }}
{{- end }}
{{- if .Values.envRenderSecret }}
- checksum/secret-env: {{ include (print $.Template.BasePath "/secret-env.yaml") . | sha256sum }}
+ checksum/secret-env: {{ tpl (toYaml .Values.envRenderSecret) . | sha256sum }}
{{- end }}
kubectl.kubernetes.io/default-container: {{ .Chart.Name }}
{{- with .Values.podAnnotations }}

View File

@ -0,0 +1,16 @@
--- charts-original/charts/grafana/templates/image-renderer-deployment.yaml
+++ charts/charts/grafana/templates/image-renderer-deployment.yaml
@@ -65,10 +65,11 @@
{{- end }}
containers:
- name: {{ .Chart.Name }}-image-renderer
+ {{- $registry := include "system_default_registry" | default .Values.imageRenderer.image.registry -}}
+ {{- $registry := include "system_default_registry" . | default .Values.imageRenderer.image.registry -}}
{{- if .Values.imageRenderer.image.sha }}
- image: "{{ template "system_default_registry" . }}{{ .Values.imageRenderer.image.repository }}:{{ .Values.imageRenderer.image.tag }}@sha256:{{ .Values.imageRenderer.image.sha }}"
+ image: "{{ $registry }}{{ .Values.imageRenderer.image.repository }}:{{ .Values.imageRenderer.image.tag }}@sha256:{{ .Values.imageRenderer.image.sha }}"
{{- else }}
- image: "{{ template "system_default_registry" . }}{{ .Values.imageRenderer.image.repository }}:{{ .Values.imageRenderer.image.tag }}"
+ image: "{{ $registry }}{{ .Values.imageRenderer.image.repository }}:{{ .Values.imageRenderer.image.tag }}"
{{- end }}
imagePullPolicy: {{ .Values.imageRenderer.image.pullPolicy }}
{{- if .Values.imageRenderer.command }}

View File

@ -0,0 +1,11 @@
--- charts-original/charts/grafana/templates/ingress.yaml
+++ charts/charts/grafana/templates/ingress.yaml
@@ -34,7 +34,7 @@
rules:
{{- if .Values.ingress.hosts }}
{{- range .Values.ingress.hosts }}
- - host: {{ tpl . $ }}
+ - host: {{ tpl . $ | quote }}
http:
paths:
{{- with $extraPaths }}

View File

@ -0,0 +1,20 @@
--- charts-original/charts/grafana/templates/networkpolicy.yaml
+++ charts/charts/grafana/templates/networkpolicy.yaml
@@ -27,8 +27,17 @@
{{- if .Values.networkPolicy.egress.enabled }}
egress:
+ {{- if not .Values.networkPolicy.egress.blockDNSResolution }}
+ - ports:
+ - port: 53
+ protocol: UDP
+ {{- end }}
- ports:
{{ .Values.networkPolicy.egress.ports | toJson }}
+ {{- with .Values.networkPolicy.egress.to }}
+ to:
+ {{- toYaml . | nindent 12 }}
+ {{- end }}
{{- end }}
{{- if .Values.networkPolicy.ingress }}
ingress:
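
The two new egress knobs are `blockDNSResolution` (an implicit UDP/53 allow rule is rendered unless this is set to true) and `to` (destination selectors appended to the ports rule). A values sketch, with an illustrative port and selector:

networkPolicy:
  enabled: true
  egress:
    enabled: true
    blockDNSResolution: false   # keep the implicit DNS (UDP/53) allow rule
    ports:
      - port: 443
    to:
      - namespaceSelector:
          matchExpressions:
            - {key: role, operator: In, values: [grafana]}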

View File

@ -0,0 +1,12 @@
--- charts-original/charts/grafana/templates/pvc.yaml
+++ charts/charts/grafana/templates/pvc.yaml
@@ -27,6 +27,9 @@
resources:
requests:
storage: {{ .Values.persistence.size | quote }}
+ {{- if (lookup "v1" "PersistentVolumeClaim" (include "grafana.namespace" .) (include "grafana.fullname" .)) }}
+ volumeName: {{ (lookup "v1" "PersistentVolumeClaim" (include "grafana.namespace" .) (include "grafana.fullname" .)).spec.volumeName }}
+ {{- end }}
{{- with .Values.persistence.storageClassName }}
storageClassName: {{ . }}
{{- end }}
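
The `lookup` call pins an already-bound claim to its PersistentVolume, so re-rendering the PVC on `helm upgrade` cannot cause it to rebind. Against a live cluster the rendered claim would look roughly like this (volume name and storage class are illustrative):

spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  volumeName: pvc-0a1b2c3d-1111-2222-3333-444455556666   # copied from the existing PVC by lookup
  storageClassName: longhorn

Note that `lookup` returns nothing during `helm template` or a dry run, so the `volumeName` field only appears when rendering against a live cluster that already has the claim.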

View File

@ -0,0 +1,9 @@
--- charts-original/charts/grafana/templates/role.yaml
+++ charts/charts/grafana/templates/role.yaml
@@ -1,5 +1,5 @@
{{- if and .Values.rbac.create (not .Values.rbac.useExistingRole) -}}
-apiVersion: {{ include "grafana.rbac.apiVersion" . }}
+apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: {{ include "grafana.fullname" . }}

View File

@ -0,0 +1,9 @@
--- charts-original/charts/grafana/templates/rolebinding.yaml
+++ charts/charts/grafana/templates/rolebinding.yaml
@@ -1,5 +1,5 @@
{{- if .Values.rbac.create }}
-apiVersion: {{ include "grafana.rbac.apiVersion" . }}
+apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: {{ include "grafana.fullname" . }}

View File

@ -0,0 +1,19 @@
--- charts-original/charts/grafana/templates/secret.yaml
+++ charts/charts/grafana/templates/secret.yaml
@@ -12,15 +12,5 @@
{{- end }}
type: Opaque
data:
- {{- if and (not .Values.env.GF_SECURITY_DISABLE_INITIAL_ADMIN_CREATION) (not .Values.admin.existingSecret) (not .Values.env.GF_SECURITY_ADMIN_PASSWORD__FILE) (not .Values.env.GF_SECURITY_ADMIN_PASSWORD) }}
- admin-user: {{ .Values.adminUser | b64enc | quote }}
- {{- if .Values.adminPassword }}
- admin-password: {{ .Values.adminPassword | b64enc | quote }}
- {{- else }}
- admin-password: {{ include "grafana.password" . }}
- {{- end }}
- {{- end }}
- {{- if not .Values.ldap.existingSecret }}
- ldap-toml: {{ tpl .Values.ldap.config $ | b64enc | quote }}
- {{- end }}
+ {{- include "grafana.secretsData" . | nindent 2 }}
{{- end }}

View File

@ -0,0 +1,27 @@
--- charts-original/charts/grafana/templates/service.yaml
+++ charts/charts/grafana/templates/service.yaml
@@ -21,10 +21,13 @@
clusterIP: {{ . }}
{{- end }}
{{- else if eq .Values.service.type "LoadBalancer" }}
- type: {{ .Values.service.type }}
+ type: LoadBalancer
{{- with .Values.service.loadBalancerIP }}
loadBalancerIP: {{ . }}
{{- end }}
+ {{- with .Values.service.loadBalancerClass }}
+ loadBalancerClass: {{ . }}
+ {{- end }}
{{- with .Values.service.loadBalancerSourceRanges }}
loadBalancerSourceRanges:
{{- toYaml . | nindent 4 }}
@@ -36,6 +39,9 @@
externalIPs:
{{- toYaml . | nindent 4 }}
{{- end }}
+ {{- with .Values.service.externalTrafficPolicy }}
+ externalTrafficPolicy: {{ . }}
+ {{- end }}
ports:
- name: {{ .Values.service.portName }}
port: {{ .Values.service.port }}
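
The service template now also honours `loadBalancerClass` (LoadBalancer type only) and `externalTrafficPolicy`. A values sketch with example settings:

service:
  enabled: true
  type: LoadBalancer
  port: 80
  loadBalancerClass: service.k8s.aws/nlb   # example class; only rendered for type LoadBalancer
  loadBalancerSourceRanges:
    - 10.0.0.0/8
  externalTrafficPolicy: Local             # preserve client source IPs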

View File

@ -0,0 +1,20 @@
--- charts-original/charts/grafana/templates/serviceaccount.yaml
+++ charts/charts/grafana/templates/serviceaccount.yaml
@@ -1,7 +1,7 @@
{{- if .Values.serviceAccount.create }}
-{{- $root := . -}}
apiVersion: v1
kind: ServiceAccount
+automountServiceAccountToken: {{ .Values.serviceAccount.autoMount | default .Values.serviceAccount.automountServiceAccountToken }}
metadata:
labels:
{{- include "grafana.labels" . | nindent 4 }}
@@ -10,7 +10,7 @@
{{- end }}
{{- with .Values.serviceAccount.annotations }}
annotations:
- {{- tpl (toYaml . | nindent 4) $root }}
+ {{- tpl (toYaml . | nindent 4) $ }}
{{- end }}
name: {{ include "grafana.serviceAccountName" . }}
namespace: {{ include "grafana.namespace" . }}
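
`serviceAccount.autoMount` is kept only for backwards compatibility (see the values.yaml hunk below), and annotations are now passed through `tpl`, so they may contain template expressions. A sketch with a placeholder IAM role:

serviceAccount:
  create: true
  automountServiceAccountToken: true
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::123456789000:role/{{ .Release.Name }}-grafana"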

View File

@ -0,0 +1,11 @@
--- charts-original/charts/grafana/templates/servicemonitor.yaml
+++ charts/charts/grafana/templates/servicemonitor.yaml
@@ -12,7 +12,7 @@
labels:
{{- include "grafana.labels" . | nindent 4 }}
{{- with .Values.serviceMonitor.labels }}
- {{- toYaml . | nindent 4 }}
+ {{- tpl (toYaml . | nindent 4) $ }}
{{- end }}
spec:
endpoints:
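
Passing `serviceMonitor.labels` through `tpl` means label values may now be templated. For example:

serviceMonitor:
  enabled: true
  interval: 30s
  labels:
    release: "{{ .Release.Name }}"   # rendered by tpl at install time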

View File

@ -0,0 +1,21 @@
--- charts-original/charts/grafana/templates/tests/test.yaml
+++ charts/charts/grafana/templates/tests/test.yaml
@@ -34,13 +34,17 @@
{{- end }}
containers:
- name: {{ .Release.Name }}-test
- image: "{{ template "system_default_registry" . }}{{ .Values.testFramework.image}}:{{ .Values.testFramework.tag }}"
+ image: "{{ template "system_default_registry" . | default .Values.testFramework.image.registry }}/{{ .Values.testFramework.image.repository }}:{{ .Values.testFramework.image.tag }}"
imagePullPolicy: "{{ .Values.testFramework.imagePullPolicy}}"
command: ["/opt/bats/bin/bats", "-t", "/tests/run.sh"]
volumeMounts:
- mountPath: /tests
name: tests
readOnly: true
+ {{- with .Values.testFramework.resources }}
+ resources:
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
volumes:
- name: tests
configMap:

View File

@ -0,0 +1,280 @@
--- charts-original/charts/grafana/values.yaml
+++ charts/charts/grafana/values.yaml
@@ -8,7 +8,8 @@
rbac:
create: true
## Use an existing ClusterRole/Role (depending on rbac.namespaced false/true)
- # useExistingRole: name-of-some-(cluster)role
+ # useExistingRole: name-of-some-role
+ # useExistingClusterRole: name-of-some-clusterRole
pspEnabled: false
pspUseAppArmor: false
namespaced: false
@@ -26,16 +27,22 @@
nameTest:
## ServiceAccount labels.
labels: {}
-## Service account annotations. Can be templated.
-# annotations:
-# eks.amazonaws.com/role-arn: arn:aws:iam::123456789000:role/iam-role-name-here
- autoMount: true
+ ## Service account annotations. Can be templated.
+ # annotations:
+ # eks.amazonaws.com/role-arn: arn:aws:iam::123456789000:role/iam-role-name-here
+
+ ## autoMount is deprecated in favor of automountServiceAccountToken
+ # autoMount: false
+ automountServiceAccountToken: true
replicas: 1
## Create a headless service for the deployment
headlessService: false
+## Should the service account be auto mounted on the pod
+automountServiceAccountToken: true
+
## Create HorizontalPodAutoscaler object for deployment type
#
autoscaling:
@@ -85,7 +92,7 @@
image:
repository: rancher/mirrored-grafana-grafana
# Overrides the Grafana image tag whose default is the chart appVersion
- tag: 9.1.5
+ tag: 10.3.3
sha: ""
pullPolicy: IfNotPresent
@@ -168,6 +175,9 @@
service:
enabled: true
type: ClusterIP
+ loadBalancerIP: ""
+ loadBalancerClass: ""
+ loadBalancerSourceRanges: []
port: 80
targetPort: 3000
# targetPort: 4181 To be used with a proxy extraContainer
@@ -186,7 +196,7 @@
path: /metrics
# namespace: monitoring (defaults to use the namespace this chart is deployed to)
labels: {}
- interval: 1m
+ interval: 30s
scheme: http
tlsConfig: {}
scrapeTimeout: 30s
@@ -198,7 +208,6 @@
# - name: keycloak
# port: 8080
# targetPort: 8080
- # type: ClusterIP
# overrides pod.spec.hostAliases in the grafana deployment's pods
hostAliases: []
@@ -218,7 +227,7 @@
labels: {}
path: /
- # pathType is only for k8s >= 1.18
+ # pathType is only for k8s >= 1.1=
pathType: Prefix
hosts:
@@ -439,6 +448,7 @@
## Name is templated.
envFromSecrets: []
## - name: secret-name
+## prefix: prefix
## optional: true
## The names of conifgmaps in the same kubernetes namespace which contain values to be added to the environment
@@ -447,6 +457,7 @@
## ref: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.23/#configmapenvsource-v1-core
envFromConfigMaps: []
## - name: configmap-name
+## prefix: prefix
## optional: true
# Inject Kubernetes services as environment variables.
@@ -492,15 +503,22 @@
# - name: extra-volume-0
# mountPath: /mnt/volume0
# readOnly: true
- # existingClaim: volume-claim
# - name: extra-volume-1
# mountPath: /mnt/volume1
# readOnly: true
- # hostPath: /usr/shared/
# - name: grafana-secrets
# mountPath: /mnt/volume2
- # csi: true
- # data:
+
+## Additional Grafana server volumes
+extraVolumes: []
+ # - name: extra-volume-0
+ # existingClaim: volume-claim
+ # - name: extra-volume-1
+ # hostPath:
+ # path: /usr/shared/
+ # type: ""
+ # - name: grafana-secrets
+ # csi:
# driver: secrets-store.csi.k8s.io
# readOnly: true
# volumeAttributes:
@@ -593,21 +611,22 @@
# labels:
# team: sre_team_1
# contactpoints.yaml:
- # apiVersion: 1
- # contactPoints:
- # - orgId: 1
- # name: cp_1
- # receivers:
- # - uid: first_uid
- # type: pagerduty
- # settings:
- # integrationKey: XXX
- # severity: critical
- # class: ping failure
- # component: Grafana
- # group: app-stack
- # summary: |
- # {{ `{{ include "default.message" . }}` }}
+ # secret:
+ # apiVersion: 1
+ # contactPoints:
+ # - orgId: 1
+ # name: cp_1
+ # receivers:
+ # - uid: first_uid
+ # type: pagerduty
+ # settings:
+ # integrationKey: XXX
+ # severity: critical
+ # class: ping failure
+ # component: Grafana
+ # group: app-stack
+ # summary: |
+ # {{ `{{ include "default.message" . }}` }}
## Configure notifiers
## ref: http://docs.grafana.org/administration/provisioning/#alert-notification-channels
@@ -770,7 +789,7 @@
sidecar:
image:
repository: rancher/mirrored-kiwigrid-k8s-sidecar
- tag: 1.24.6
+ tag: 1.26.1
sha: ""
imagePullPolicy: IfNotPresent
resources: {}
@@ -823,7 +842,9 @@
# Absolute path to shell script to execute after a alert got reloaded
script: null
skipReload: false
- # Deploy the alert sidecar as an initContainer in addition to a container.
+ # This is needed if skipReload is true, to load any alerts defined at startup time.
+ # Deploy the alert sidecar as an initContainer.
+ initAlerts: false
# Additional alert sidecar volume mounts
extraMounts: []
# Sets the size limit of the alert sidecar emptyDir volume
@@ -895,6 +916,7 @@
enabled: false
# Additional environment variables for the datasourcessidecar
env: {}
+ envValueFrom: {}
# Do not reprocess already processed unchanged resources on k8s API reconnect.
# ignoreAlreadyProcessed: true
# label that the configmaps with datasources are marked with
@@ -926,8 +948,8 @@
# Absolute path to shell script to execute after a datasource got reloaded
script: null
skipReload: true
- # Deploy the datasource sidecar as an initContainer in addition to a container.
# This is needed if skipReload is true, to load any datasources defined at startup time.
+ # Deploy the datasources sidecar as an initContainer.
initDatasources: true
# Sets the size limit of the datasource sidecar emptyDir volume
sizeLimit: {}
@@ -1022,14 +1044,22 @@
## Add a seperate remote image renderer deployment/service
imageRenderer:
+ deploymentStrategy: {}
# Enable the image-renderer deployment & service
enabled: false
replicas: 1
+ autoscaling:
+ enabled: false
+ minReplicas: 1
+ maxReplicas: 5
+ targetCPU: "60"
+ targetMemory: ""
+ behavior: {}
image:
# image-renderer Image repository
repository: rancher/mirrored-grafana-grafana-image-renderer
# image-renderer Image tag
- tag: 3.8.0
+ tag: 3.10.1
# image-renderer Image sha (optional)
sha: ""
# image-renderer ImagePullPolicy
@@ -1067,6 +1097,8 @@
drop: ['ALL']
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
+ ## image-renderer pod annotation
+ podAnnotations: {}
# image-renderer deployment Host Aliases
hostAliases: []
# image-renderer deployment priority class
@@ -1180,14 +1212,25 @@
## created allowing grafana to connect to external data sources from kubernetes cluster.
enabled: false
##
+ ## @param networkPolicy.egress.blockDNSResolution When enabled, DNS resolution will be blocked
+ ## for all pods in the grafana namespace.
+ blockDNSResolution: false
+ ##
## @param networkPolicy.egress.ports Add individual ports to be allowed by the egress
ports: []
## Add ports to the egress by specifying - port: <port number>
## E.X.
- ## ports:
- ## - port: 80
- ## - port: 443
- ##
+ ## - port: 80
+ ## - port: 443
+ ##
+ ## @param networkPolicy.egress.to Allow egress traffic to specific destinations
+ to: []
+ ## Add destinations to the egress by specifying - ipBlock: <CIDR>
+ ## E.X.
+ ## to:
+ ## - namespaceSelector:
+ ## matchExpressions:
+ ## - {key: role, operator: In, values: [grafana]}
##
##
##
@@ -1208,3 +1251,13 @@
# data:
# - key: grafana-admin-password
# name: adminPassword
+
+# assertNoLeakedSecrets is a helper function defined in _helpers.tpl that checks if secret
+# values are not exposed in the rendered grafana.ini configmap. It is enabled by default.
+#
+# To pass values into grafana.ini without exposing them in a configmap, use variable expansion:
+# https://grafana.com/docs/grafana/latest/setup-grafana/configure-grafana/#variable-expansion
+#
+# Alternatively, if you wish to allow secret values to be exposed in the rendered grafana.ini configmap,
+# you can disable this check by setting assertNoLeakedSecrets to false.
+assertNoLeakedSecrets: true
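
A sketch of the pattern the comment above points at: supplying a sensitive value through variable expansion so it never appears in the rendered grafana.ini ConfigMap (the Secret name and SMTP settings are hypothetical):

assertNoLeakedSecrets: true
envFromSecrets:
  - name: grafana-smtp-credentials    # hypothetical Secret exposing GF_SMTP_PASSWORD
grafana.ini:
  smtp:
    enabled: true
    user: grafana@example.com
    password: $__env{GF_SMTP_PASSWORD}  # expanded by Grafana at runtime, not by Helm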

View File

@ -0,0 +1,33 @@
--- charts-original/templates/_helpers.tpl
+++ charts/templates/_helpers.tpl
@@ -5,6 +5,15 @@
{{- end -}}
{{- end -}}
+{{- define "monitoring_registry" -}}
+ {{- $temp_registry := (include "system_default_registry" .) -}}
+ {{- if $temp_registry -}}
+ {{- trimSuffix "/" $temp_registry -}}
+ {{- else -}}
+ {{- .Values.global.imageRegistry -}}
+ {{- end -}}
+{{- end -}}
+
{{/*
https://github.com/helm/helm/issues/4535#issuecomment-477778391
Usage: {{ include "call-nested" (list . "SUBCHART_NAME" "TEMPLATE") }}
@@ -436,3 +445,14 @@
kubernetes.io/metadata.name: {{ $ns }}
{{- end }}
{{- end -}}
+
+{{- define "project-prometheus-stack.operator.admission-webhook.dnsNames" }}
+{{- $fullname := include "project-prometheus-stack.operator.fullname" . }}
+{{- $namespace := include "project-prometheus-stack.namespace" . }}
+{{- $fullname }}
+{{ $fullname }}.{{ $namespace }}.svc
+{{- if .Values.prometheusOperator.admissionWebhooks.deployment.enabled }}
+{{ $fullname }}-webhook
+{{ $fullname }}-webhook.{{ $namespace }}.svc
+{{- end }}
+{{- end }}
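
The `monitoring_registry` helper prefers the Rancher system default registry and falls back to `global.imageRegistry`; the alertmanager and prometheus templates further down consume it roughly like this (tag-only branch shown as a sketch):

{{- $registry := include "monitoring_registry" . | default .Values.alertmanager.alertmanagerSpec.image.registry }}
image: "{{ $registry }}/{{ .Values.alertmanager.alertmanagerSpec.image.repository }}:{{ .Values.alertmanager.alertmanagerSpec.image.tag }}"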

View File

@ -0,0 +1,45 @@
--- charts-original/templates/alertmanager/alertmanager.yaml
+++ charts/templates/alertmanager/alertmanager.yaml
@@ -13,7 +13,7 @@
{{- end }}
spec:
{{- if .Values.alertmanager.alertmanagerSpec.image }}
- {{- $registry := .Values.global.imageRegistry | default .Values.alertmanager.alertmanagerSpec.image.registry -}}
+ {{- $registry := include "monitoring_registry" . | default .Values.alertmanager.alertmanagerSpec.image.registry }}
{{- if and .Values.alertmanager.alertmanagerSpec.image.tag .Values.alertmanager.alertmanagerSpec.image.sha }}
image: "{{ $registry }}/{{ .Values.alertmanager.alertmanagerSpec.image.repository }}:{{ .Values.alertmanager.alertmanagerSpec.image.tag }}@sha256:{{ .Values.alertmanager.alertmanagerSpec.image.sha }}"
{{- else if .Values.alertmanager.alertmanagerSpec.image.sha }}
@@ -31,6 +31,7 @@
replicas: {{ .Values.alertmanager.alertmanagerSpec.replicas }}
listenLocal: {{ .Values.alertmanager.alertmanagerSpec.listenLocal }}
serviceAccountName: {{ template "project-prometheus-stack.alertmanager.serviceAccountName" . }}
+ automountServiceAccountToken: {{ .Values.alertmanager.alertmanagerSpec.automountServiceAccountToken }}
{{- if .Values.alertmanager.alertmanagerSpec.externalUrl }}
externalUrl: "{{ tpl .Values.alertmanager.alertmanagerSpec.externalUrl . }}"
{{- else if and .Values.alertmanager.ingress.enabled .Values.alertmanager.ingress.hosts }}
@@ -161,10 +162,25 @@
{{- if .Values.alertmanager.alertmanagerSpec.clusterAdvertiseAddress }}
clusterAdvertiseAddress: {{ .Values.alertmanager.alertmanagerSpec.clusterAdvertiseAddress }}
{{- end }}
+{{- if .Values.alertmanager.alertmanagerSpec.clusterGossipInterval }}
+ clusterGossipInterval: {{ .Values.alertmanager.alertmanagerSpec.clusterGossipInterval }}
+{{- end }}
+{{- if .Values.alertmanager.alertmanagerSpec.clusterPeerTimeout }}
+ clusterPeerTimeout: {{ .Values.alertmanager.alertmanagerSpec.clusterPeerTimeout }}
+{{- end }}
+{{- if .Values.alertmanager.alertmanagerSpec.clusterPushpullInterval }}
+ clusterPushpullInterval: {{ .Values.alertmanager.alertmanagerSpec.clusterPushpullInterval }}
+{{- end }}
{{- if .Values.alertmanager.alertmanagerSpec.forceEnableClusterMode }}
forceEnableClusterMode: {{ .Values.alertmanager.alertmanagerSpec.forceEnableClusterMode }}
{{- end }}
{{- if .Values.alertmanager.alertmanagerSpec.minReadySeconds }}
minReadySeconds: {{ .Values.alertmanager.alertmanagerSpec.minReadySeconds }}
{{- end }}
+{{- with .Values.alertmanager.alertmanagerSpec.additionalConfig }}
+ {{- tpl (toYaml .) $ | nindent 2 }}
+{{- end }}
+{{- with .Values.alertmanager.alertmanagerSpec.additionalConfigString }}
+ {{- tpl . $ | nindent 2 }}
+{{- end }}
{{- end }}
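
Beyond the new cluster timing fields, `additionalConfig` / `additionalConfigString` let arbitrary Alertmanager CR fields be appended to the spec. An illustrative values fragment (the extra CR field is an example, not something the chart otherwise templates):

alertmanager:
  alertmanagerSpec:
    automountServiceAccountToken: true
    clusterGossipInterval: 200ms
    clusterPushpullInterval: 60s
    additionalConfig:
      alertmanagerConfigMatcherStrategy:
        type: None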

View File

@ -0,0 +1,11 @@
--- charts-original/templates/alertmanager/ingress.yaml
+++ charts/templates/alertmanager/ingress.yaml
@@ -31,7 +31,7 @@
rules:
{{- if .Values.alertmanager.ingress.hosts }}
{{- range $host := .Values.alertmanager.ingress.hosts }}
- - host: {{ tpl $host $ }}
+ - host: {{ tpl $host $ | quote }}
http:
paths:
{{- range $p := $paths }}

View File

@ -0,0 +1,31 @@
--- charts-original/templates/alertmanager/service.yaml
+++ charts/templates/alertmanager/service.yaml
@@ -1,3 +1,4 @@
+{{- $kubeTargetVersion := default .Capabilities.KubeVersion.GitVersion .Values.kubeTargetVersionOverride }}
{{- if .Values.alertmanager.enabled }}
apiVersion: v1
kind: Service
@@ -43,6 +44,12 @@
port: {{ .Values.alertmanager.service.port }}
targetPort: {{ .Values.alertmanager.service.targetPort }}
protocol: TCP
+ - name: reloader-web
+ {{- if semverCompare ">=1.20.0-0" $kubeTargetVersion }}
+ appProtocol: http
+ {{- end }}
+ port: 8080
+ targetPort: reloader-web
{{- if .Values.alertmanager.service.additionalPorts }}
{{ toYaml .Values.alertmanager.service.additionalPorts | indent 2 }}
{{- end }}
@@ -52,5 +59,10 @@
{{- if .Values.alertmanager.service.sessionAffinity }}
sessionAffinity: {{ .Values.alertmanager.service.sessionAffinity }}
{{- end }}
+{{- if eq .Values.alertmanager.service.sessionAffinity "ClientIP" }}
+ sessionAffinityConfig:
+ clientIP:
+ timeoutSeconds: {{ .Values.alertmanager.service.sessionAffinityConfig.clientIP.timeoutSeconds }}
+{{- end }}
type: "{{ .Values.alertmanager.service.type }}"
{{- end }}
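
With the added block, a `ClientIP` session affinity can now carry an explicit timeout. A values sketch (the timeout is an example):

alertmanager:
  service:
    port: 9093
    targetPort: 9093
    sessionAffinity: ClientIP
    sessionAffinityConfig:
      clientIP:
        timeoutSeconds: 10800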

View File

@ -0,0 +1,74 @@
--- charts-original/templates/alertmanager/servicemonitor.yaml
+++ charts/templates/alertmanager/servicemonitor.yaml
@@ -27,7 +27,7 @@
interval: {{ .Values.alertmanager.serviceMonitor.interval }}
{{- end }}
{{- if .Values.alertmanager.serviceMonitor.proxyUrl }}
- proxyUrl: {{ .Values.alertmanager.serviceMonitor.proxyUrl}}
+ proxyUrl: {{ .Values.alertmanager.serviceMonitor.proxyUrl }}
{{- end }}
{{- if .Values.alertmanager.serviceMonitor.scheme }}
scheme: {{ .Values.alertmanager.serviceMonitor.scheme }}
@@ -36,25 +36,49 @@
bearerTokenFile: {{ .Values.alertmanager.serviceMonitor.bearerTokenFile }}
{{- end }}
{{- if .Values.alertmanager.serviceMonitor.tlsConfig }}
- tlsConfig: {{ toYaml .Values.alertmanager.serviceMonitor.tlsConfig | nindent 6 }}
+ tlsConfig: {{- toYaml .Values.alertmanager.serviceMonitor.tlsConfig | nindent 6 }}
{{- end }}
path: "{{ trimSuffix "/" .Values.alertmanager.alertmanagerSpec.routePrefix }}/metrics"
metricRelabelings:
{{- if .Values.alertmanager.serviceMonitor.metricRelabelings }}
- {{ tpl (toYaml .Values.alertmanager.serviceMonitor.metricRelabelings | indent 6) . }}
+ {{- tpl (toYaml .Values.alertmanager.serviceMonitor.metricRelabelings | nindent 6) . }}
{{- end }}
{{ if .Values.global.cattle.clusterId }}
- - sourceLabels: [__address__]
- targetLabel: cluster_id
- replacement: {{ .Values.global.cattle.clusterId }}
+ - sourceLabels: [__address__]
+ targetLabel: cluster_id
+ replacement: {{ .Values.global.cattle.clusterId }}
{{- end }}
{{ if .Values.global.cattle.clusterName }}
- - sourceLabels: [__address__]
- targetLabel: cluster_name
- replacement: {{ .Values.global.cattle.clusterName }}
- {{- end }}
-{{- if .Values.alertmanager.serviceMonitor.relabelings }}
- relabelings:
-{{ toYaml .Values.alertmanager.serviceMonitor.relabelings | indent 6 }}
-{{- end }}
+ - sourceLabels: [__address__]
+ targetLabel: cluster_name
+ replacement: {{ .Values.global.cattle.clusterName }}
+ {{- end }}
+ {{- if .Values.alertmanager.serviceMonitor.relabelings }}
+ relabelings: {{- toYaml .Values.alertmanager.serviceMonitor.relabelings | nindent 6 }}
+ {{- end }}
+ {{- range .Values.alertmanager.serviceMonitor.additionalEndpoints }}
+ - port: {{ .port }}
+ {{- if or $.Values.alertmanager.serviceMonitor.interval .interval }}
+ interval: {{ default $.Values.alertmanager.serviceMonitor.interval .interval }}
+ {{- end }}
+ {{- if or $.Values.alertmanager.serviceMonitor.proxyUrl .proxyUrl }}
+ proxyUrl: {{ default $.Values.alertmanager.serviceMonitor.proxyUrl .proxyUrl }}
+ {{- end }}
+ {{- if or $.Values.alertmanager.serviceMonitor.scheme .scheme }}
+ scheme: {{ default $.Values.alertmanager.serviceMonitor.scheme .scheme }}
+ {{- end }}
+ {{- if or $.Values.alertmanager.serviceMonitor.bearerTokenFile .bearerTokenFile }}
+ bearerTokenFile: {{ default $.Values.alertmanager.serviceMonitor.bearerTokenFile .bearerTokenFile }}
+ {{- end }}
+ {{- if or $.Values.alertmanager.serviceMonitor.tlsConfig .tlsConfig }}
+ tlsConfig: {{- default $.Values.alertmanager.serviceMonitor.tlsConfig .tlsConfig | toYaml | nindent 6 }}
+ {{- end }}
+ path: {{ .path }}
+ {{- if or $.Values.alertmanager.serviceMonitor.metricRelabelings .metricRelabelings }}
+ metricRelabelings: {{- tpl (default $.Values.alertmanager.serviceMonitor.metricRelabelings .metricRelabelings | toYaml | nindent 6) . }}
+ {{- end }}
+ {{- if or $.Values.alertmanager.serviceMonitor.relabelings .relabelings }}
+ relabelings: {{- default $.Values.alertmanager.serviceMonitor.relabelings .relabelings | toYaml | nindent 6 }}
+ {{- end }}
+ {{- end }}
{{- end }}
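
`additionalEndpoints` makes it possible to scrape extra ports on the Alertmanager service, for instance the `reloader-web` port added to the Service earlier in this patch. A sketch:

alertmanager:
  serviceMonitor:
    interval: 30s
    additionalEndpoints:
      - port: reloader-web
        path: /metrics
        interval: 1m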

View File

@ -0,0 +1,30 @@
--- charts-original/templates/grafana/configmaps-datasources.yaml
+++ charts/templates/grafana/configmaps-datasources.yaml
@@ -55,11 +55,25 @@
timeInterval: {{ $scrapeInterval }}
{{- if $.Values.grafana.sidecar.datasources.exemplarTraceIdDestinations }}
exemplarTraceIdDestinations:
- - datasourceUid: {{ .Values.grafana.sidecar.datasources.exemplarTraceIdDestinations.datasourceUid }}
- name: {{ .Values.grafana.sidecar.datasources.exemplarTraceIdDestinations.traceIdLabelName }}
+ - datasourceUid: {{ $.Values.grafana.sidecar.datasources.exemplarTraceIdDestinations.datasourceUid }}
+ name: {{ $.Values.grafana.sidecar.datasources.exemplarTraceIdDestinations.traceIdLabelName }}
{{- end }}
{{- end }}
{{- end }}
+{{- if .Values.grafana.sidecar.datasources.alertmanager.enabled }}
+ - name: Alertmanager
+ type: alertmanager
+ uid: {{ .Values.grafana.sidecar.datasources.alertmanager.uid }}
+ {{- if .Values.grafana.sidecar.datasources.alertmanager.url }}
+ url: {{ .Values.grafana.sidecar.datasources.alertmanager.url }}
+ {{- else }}
+ url: http://{{ template "project-prometheus-stack.fullname" . }}-alertmanager.{{ template "project-prometheus-stack.namespace" . }}:{{ .Values.alertmanager.service.port }}/{{ trimPrefix "/" .Values.alertmanager.alertmanagerSpec.routePrefix }}
+ {{- end }}
+ access: proxy
+ jsonData:
+ handleGrafanaManagedAlerts: {{ .Values.grafana.sidecar.datasources.alertmanager.handleGrafanaManagedAlerts }}
+ implementation: {{ .Values.grafana.sidecar.datasources.alertmanager.implementation }}
+{{- end }}
{{- end }}
{{- if .Values.grafana.additionalDataSources }}
{{ tpl (toYaml .Values.grafana.additionalDataSources | indent 4) . }}
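
The new block provisions an Alertmanager datasource next to the Prometheus one when enabled. An illustrative override (the URL is optional and defaults to the chart's own Alertmanager service; the commented URL and namespace are hypothetical):

grafana:
  sidecar:
    datasources:
      alertmanager:
        enabled: true
        uid: alertmanager
        handleGrafanaManagedAlerts: false
        implementation: prometheus
        # url: http://my-alertmanager.cattle-project-p-example.svc:9093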

View File

@ -0,0 +1,9 @@
--- charts-original/templates/grafana/dashboards-1.14/alertmanager-overview.yaml
+++ charts/templates/grafana/dashboards-1.14/alertmanager-overview.yaml
@@ -1,5 +1,5 @@
{{- /*
-Generated from 'alertmanager-overview' from https://raw.githubusercontent.com/prometheus-operator/kube-prometheus/main/manifests/grafana-dashboardDefinitions.yaml
+Generated from 'alertmanager-overview' from https://raw.githubusercontent.com/prometheus-operator/kube-prometheus/a8ba97a150c75be42010c75d10b720c55e182f1a/manifests/grafana-dashboardDefinitions.yaml
Do not change in-place! In order to change this file first read following link:
https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack/hack
*/ -}}

View File

@ -0,0 +1,9 @@
--- charts-original/templates/grafana/dashboards-1.14/cluster-total.yaml
+++ charts/templates/grafana/dashboards-1.14/cluster-total.yaml
@@ -1,5 +1,5 @@
{{- /*
-Generated from 'cluster-total' from https://raw.githubusercontent.com/prometheus-operator/kube-prometheus/main/manifests/grafana-dashboardDefinitions.yaml
+Generated from 'cluster-total' from https://raw.githubusercontent.com/prometheus-operator/kube-prometheus/a8ba97a150c75be42010c75d10b720c55e182f1a/manifests/grafana-dashboardDefinitions.yaml
Do not change in-place! In order to change this file first read following link:
https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack/hack
*/ -}}

View File

@ -0,0 +1,9 @@
--- charts-original/templates/grafana/dashboards-1.14/grafana-overview.yaml
+++ charts/templates/grafana/dashboards-1.14/grafana-overview.yaml
@@ -1,5 +1,5 @@
{{- /*
-Generated from 'grafana-overview' from https://raw.githubusercontent.com/prometheus-operator/kube-prometheus/main/manifests/grafana-dashboardDefinitions.yaml
+Generated from 'grafana-overview' from https://raw.githubusercontent.com/prometheus-operator/kube-prometheus/a8ba97a150c75be42010c75d10b720c55e182f1a/manifests/grafana-dashboardDefinitions.yaml
Do not change in-place! In order to change this file first read following link:
https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack/hack
*/ -}}

View File

@ -0,0 +1,9 @@
--- charts-original/templates/grafana/dashboards-1.14/k8s-resources-namespace.yaml
+++ charts/templates/grafana/dashboards-1.14/k8s-resources-namespace.yaml
@@ -1,5 +1,5 @@
{{- /*
-Generated from 'k8s-resources-namespace' from https://raw.githubusercontent.com/prometheus-operator/kube-prometheus/main/manifests/grafana-dashboardDefinitions.yaml
+Generated from 'k8s-resources-namespace' from https://raw.githubusercontent.com/prometheus-operator/kube-prometheus/a8ba97a150c75be42010c75d10b720c55e182f1a/manifests/grafana-dashboardDefinitions.yaml
Do not change in-place! In order to change this file first read following link:
https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack/hack
*/ -}}

View File

@ -0,0 +1,9 @@
--- charts-original/templates/grafana/dashboards-1.14/k8s-resources-node.yaml
+++ charts/templates/grafana/dashboards-1.14/k8s-resources-node.yaml
@@ -1,5 +1,5 @@
{{- /*
-Generated from 'k8s-resources-node' from https://raw.githubusercontent.com/prometheus-operator/kube-prometheus/main/manifests/grafana-dashboardDefinitions.yaml
+Generated from 'k8s-resources-node' from https://raw.githubusercontent.com/prometheus-operator/kube-prometheus/a8ba97a150c75be42010c75d10b720c55e182f1a/manifests/grafana-dashboardDefinitions.yaml
Do not change in-place! In order to change this file first read following link:
https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack/hack
*/ -}}

View File

@ -0,0 +1,9 @@
--- charts-original/templates/grafana/dashboards-1.14/k8s-resources-pod.yaml
+++ charts/templates/grafana/dashboards-1.14/k8s-resources-pod.yaml
@@ -1,5 +1,5 @@
{{- /*
-Generated from 'k8s-resources-pod' from https://raw.githubusercontent.com/prometheus-operator/kube-prometheus/main/manifests/grafana-dashboardDefinitions.yaml
+Generated from 'k8s-resources-pod' from https://raw.githubusercontent.com/prometheus-operator/kube-prometheus/a8ba97a150c75be42010c75d10b720c55e182f1a/manifests/grafana-dashboardDefinitions.yaml
Do not change in-place! In order to change this file first read following link:
https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack/hack
*/ -}}

View File

@ -0,0 +1,9 @@
--- charts-original/templates/grafana/dashboards-1.14/k8s-resources-workload.yaml
+++ charts/templates/grafana/dashboards-1.14/k8s-resources-workload.yaml
@@ -1,5 +1,5 @@
{{- /*
-Generated from 'k8s-resources-workload' from https://raw.githubusercontent.com/prometheus-operator/kube-prometheus/main/manifests/grafana-dashboardDefinitions.yaml
+Generated from 'k8s-resources-workload' from https://raw.githubusercontent.com/prometheus-operator/kube-prometheus/a8ba97a150c75be42010c75d10b720c55e182f1a/manifests/grafana-dashboardDefinitions.yaml
Do not change in-place! In order to change this file first read following link:
https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack/hack
*/ -}}

View File

@ -0,0 +1,9 @@
--- charts-original/templates/grafana/dashboards-1.14/k8s-resources-workloads-namespace.yaml
+++ charts/templates/grafana/dashboards-1.14/k8s-resources-workloads-namespace.yaml
@@ -1,5 +1,5 @@
{{- /*
-Generated from 'k8s-resources-workloads-namespace' from https://raw.githubusercontent.com/prometheus-operator/kube-prometheus/main/manifests/grafana-dashboardDefinitions.yaml
+Generated from 'k8s-resources-workloads-namespace' from https://raw.githubusercontent.com/prometheus-operator/kube-prometheus/a8ba97a150c75be42010c75d10b720c55e182f1a/manifests/grafana-dashboardDefinitions.yaml
Do not change in-place! In order to change this file first read following link:
https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack/hack
*/ -}}

View File

@ -0,0 +1,9 @@
--- charts-original/templates/grafana/dashboards-1.14/namespace-by-pod.yaml
+++ charts/templates/grafana/dashboards-1.14/namespace-by-pod.yaml
@@ -1,5 +1,5 @@
{{- /*
-Generated from 'namespace-by-pod' from https://raw.githubusercontent.com/prometheus-operator/kube-prometheus/main/manifests/grafana-dashboardDefinitions.yaml
+Generated from 'namespace-by-pod' from https://raw.githubusercontent.com/prometheus-operator/kube-prometheus/a8ba97a150c75be42010c75d10b720c55e182f1a/manifests/grafana-dashboardDefinitions.yaml
Do not change in-place! In order to change this file first read following link:
https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack/hack
*/ -}}

View File

@ -0,0 +1,9 @@
--- charts-original/templates/grafana/dashboards-1.14/namespace-by-workload.yaml
+++ charts/templates/grafana/dashboards-1.14/namespace-by-workload.yaml
@@ -1,5 +1,5 @@
{{- /*
-Generated from 'namespace-by-workload' from https://raw.githubusercontent.com/prometheus-operator/kube-prometheus/main/manifests/grafana-dashboardDefinitions.yaml
+Generated from 'namespace-by-workload' from https://raw.githubusercontent.com/prometheus-operator/kube-prometheus/a8ba97a150c75be42010c75d10b720c55e182f1a/manifests/grafana-dashboardDefinitions.yaml
Do not change in-place! In order to change this file first read following link:
https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack/hack
*/ -}}

View File

@ -0,0 +1,9 @@
--- charts-original/templates/grafana/dashboards-1.14/persistentvolumesusage.yaml
+++ charts/templates/grafana/dashboards-1.14/persistentvolumesusage.yaml
@@ -1,5 +1,5 @@
{{- /*
-Generated from 'persistentvolumesusage' from https://raw.githubusercontent.com/prometheus-operator/kube-prometheus/main/manifests/grafana-dashboardDefinitions.yaml
+Generated from 'persistentvolumesusage' from https://raw.githubusercontent.com/prometheus-operator/kube-prometheus/a8ba97a150c75be42010c75d10b720c55e182f1a/manifests/grafana-dashboardDefinitions.yaml
Do not change in-place! In order to change this file first read following link:
https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack/hack
*/ -}}

View File

@ -0,0 +1,9 @@
--- charts-original/templates/grafana/dashboards-1.14/pod-total.yaml
+++ charts/templates/grafana/dashboards-1.14/pod-total.yaml
@@ -1,5 +1,5 @@
{{- /*
-Generated from 'pod-total' from https://raw.githubusercontent.com/prometheus-operator/kube-prometheus/main/manifests/grafana-dashboardDefinitions.yaml
+Generated from 'pod-total' from https://raw.githubusercontent.com/prometheus-operator/kube-prometheus/a8ba97a150c75be42010c75d10b720c55e182f1a/manifests/grafana-dashboardDefinitions.yaml
Do not change in-place! In order to change this file first read following link:
https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack/hack
*/ -}}

View File

@ -0,0 +1,9 @@
--- charts-original/templates/grafana/dashboards-1.14/prometheus-remote-write.yaml
+++ charts/templates/grafana/dashboards-1.14/prometheus-remote-write.yaml
@@ -1,5 +1,5 @@
{{- /*
-Generated from 'prometheus-remote-write' from https://raw.githubusercontent.com/prometheus-operator/kube-prometheus/main/manifests/grafana-dashboardDefinitions.yaml
+Generated from 'prometheus-remote-write' from https://raw.githubusercontent.com/prometheus-operator/kube-prometheus/a8ba97a150c75be42010c75d10b720c55e182f1a/manifests/grafana-dashboardDefinitions.yaml
Do not change in-place! In order to change this file first read following link:
https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack/hack
*/ -}}

View File

@ -0,0 +1,9 @@
--- charts-original/templates/grafana/dashboards-1.14/prometheus.yaml
+++ charts/templates/grafana/dashboards-1.14/prometheus.yaml
@@ -1,5 +1,5 @@
{{- /*
-Generated from 'prometheus' from https://raw.githubusercontent.com/prometheus-operator/kube-prometheus/main/manifests/grafana-dashboardDefinitions.yaml
+Generated from 'prometheus' from https://raw.githubusercontent.com/prometheus-operator/kube-prometheus/a8ba97a150c75be42010c75d10b720c55e182f1a/manifests/grafana-dashboardDefinitions.yaml
Do not change in-place! In order to change this file first read following link:
https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack/hack
*/ -}}

View File

@ -0,0 +1,9 @@
--- charts-original/templates/grafana/dashboards-1.14/workload-total.yaml
+++ charts/templates/grafana/dashboards-1.14/workload-total.yaml
@@ -1,5 +1,5 @@
{{- /*
-Generated from 'workload-total' from https://raw.githubusercontent.com/prometheus-operator/kube-prometheus/main/manifests/grafana-dashboardDefinitions.yaml
+Generated from 'workload-total' from https://raw.githubusercontent.com/prometheus-operator/kube-prometheus/a8ba97a150c75be42010c75d10b720c55e182f1a/manifests/grafana-dashboardDefinitions.yaml
Do not change in-place! In order to change this file first read following link:
https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack/hack
*/ -}}

View File

@ -0,0 +1,14 @@
--- charts-original/templates/prometheus/podDisruptionBudget.yaml
+++ charts/templates/prometheus/podDisruptionBudget.yaml
@@ -16,6 +16,10 @@
{{- end }}
selector:
matchLabels:
+ {{- if .Values.prometheus.agentMode }}
+ app.kubernetes.io/name: prometheus-agent
+ {{- else }}
app.kubernetes.io/name: prometheus
- prometheus: {{ template "project-prometheus-stack.prometheus.crname" . }}
+ {{- end }}
+ operator.prometheus.io/name: {{ template "project-prometheus-stack.prometheus.crname" . }}
{{- end }}

View File

@ -0,0 +1,322 @@
--- charts-original/templates/prometheus/prometheus.yaml
+++ charts/templates/prometheus/prometheus.yaml
@@ -1,6 +1,11 @@
{{- if .Values.prometheus.enabled }}
+{{- if .Values.prometheus.agentMode }}
+apiVersion: monitoring.coreos.com/v1alpha1
+kind: PrometheusAgent
+{{- else }}
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
+{{- end }}
metadata:
name: {{ template "project-prometheus-stack.prometheus.crname" . }}
namespace: {{ template "project-prometheus-stack.namespace" . }}
@@ -12,7 +17,7 @@
{{ toYaml .Values.prometheus.annotations | indent 4 }}
{{- end }}
spec:
-{{- if or .Values.prometheus.prometheusSpec.alertingEndpoints .Values.alertmanager.enabled }}
+{{- if and (not .Values.prometheus.agentMode) (or .Values.prometheus.prometheusSpec.alertingEndpoints .Values.alertmanager.enabled) }}
alerting:
alertmanagers:
{{- if .Values.prometheus.prometheusSpec.alertingEndpoints }}
@@ -24,6 +29,13 @@
{{- if .Values.alertmanager.alertmanagerSpec.routePrefix }}
pathPrefix: "{{ .Values.alertmanager.alertmanagerSpec.routePrefix }}"
{{- end }}
+ {{- if .Values.alertmanager.alertmanagerSpec.scheme }}
+ scheme: {{ .Values.alertmanager.alertmanagerSpec.scheme }}
+ {{- end }}
+ {{- if .Values.alertmanager.alertmanagerSpec.tlsConfig }}
+ tlsConfig:
+{{ toYaml .Values.alertmanager.alertmanagerSpec.tlsConfig | indent 10 }}
+ {{- end }}
apiVersion: {{ .Values.alertmanager.apiVersion }}
{{- end }}
{{- end }}
@@ -32,7 +44,7 @@
{{ toYaml .Values.prometheus.prometheusSpec.apiserverConfig | indent 4}}
{{- end }}
{{- if .Values.prometheus.prometheusSpec.image }}
- {{- $registry := .Values.global.imageRegistry | default .Values.prometheus.prometheusSpec.image.registry -}}
+ {{- $registry := include "monitoring_registry" . | default .Values.prometheus.prometheusSpec.image.registry -}}
{{- if and .Values.prometheus.prometheusSpec.image.tag .Values.prometheus.prometheusSpec.image.sha }}
image: "{{ $registry }}/{{ .Values.prometheus.prometheusSpec.image.repository }}:{{ .Values.prometheus.prometheusSpec.image.tag }}@sha256:{{ .Values.prometheus.prometheusSpec.image.sha }}"
{{- else if .Values.prometheus.prometheusSpec.image.sha }}
@@ -84,12 +96,14 @@
logLevel: {{ .Values.prometheus.prometheusSpec.logLevel }}
logFormat: {{ .Values.prometheus.prometheusSpec.logFormat }}
listenLocal: {{ .Values.prometheus.prometheusSpec.listenLocal }}
+{{- if not .Values.prometheus.agentMode }}
enableAdminAPI: {{ .Values.prometheus.prometheusSpec.enableAdminAPI }}
+{{- end }}
{{- if .Values.prometheus.prometheusSpec.web }}
web:
{{ toYaml .Values.prometheus.prometheusSpec.web | indent 4 }}
{{- end }}
-{{- if .Values.prometheus.prometheusSpec.exemplars }}
+{{- if and (not .Values.prometheus.agentMode) .Values.prometheus.prometheusSpec.exemplars }}
exemplars:
{{ toYaml .Values.prometheus.prometheusSpec.exemplars | indent 4 }}
{{- end }}
@@ -105,13 +119,14 @@
{{- if .Values.prometheus.prometheusSpec.scrapeTimeout }}
scrapeTimeout: {{ .Values.prometheus.prometheusSpec.scrapeTimeout }}
{{- end }}
-{{- if .Values.prometheus.prometheusSpec.evaluationInterval }}
+{{- if and (not .Values.prometheus.agentMode) .Values.prometheus.prometheusSpec.evaluationInterval }}
evaluationInterval: {{ .Values.prometheus.prometheusSpec.evaluationInterval }}
{{- end }}
{{- if .Values.prometheus.prometheusSpec.resources }}
resources:
{{ toYaml .Values.prometheus.prometheusSpec.resources | indent 4 }}
{{- end }}
+{{- if not .Values.prometheus.agentMode }}
retention: {{ .Values.prometheus.prometheusSpec.retention | quote }}
{{- if .Values.prometheus.prometheusSpec.retentionSize }}
retentionSize: {{ .Values.prometheus.prometheusSpec.retentionSize | quote }}
@@ -122,6 +137,7 @@
outOfOrderTimeWindow: {{ .Values.prometheus.prometheusSpec.tsdb.outOfOrderTimeWindow }}
{{- end }}
{{- end }}
+{{- end }}
{{- if eq .Values.prometheus.prometheusSpec.walCompression false }}
walCompression: false
{{ else }}
@@ -151,13 +167,13 @@
{{- end }}
{{- if .Values.prometheus.prometheusSpec.serviceMonitorNamespaceSelector }}
serviceMonitorNamespaceSelector:
-{{ toYaml .Values.prometheus.prometheusSpec.serviceMonitorNamespaceSelector | indent 4 }}
+{{ tpl (toYaml .Values.prometheus.prometheusSpec.serviceMonitorNamespaceSelector | indent 4) . }}
{{ else }}
serviceMonitorNamespaceSelector: {}
{{- end }}
{{- if .Values.prometheus.prometheusSpec.podMonitorSelector }}
podMonitorSelector:
-{{ toYaml .Values.prometheus.prometheusSpec.podMonitorSelector | indent 4 }}
+{{ tpl (toYaml .Values.prometheus.prometheusSpec.podMonitorSelector | indent 4) . }}
{{ else if .Values.prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues }}
podMonitorSelector:
matchLabels:
@@ -176,7 +192,8 @@
{{ else }}
probeSelector: {}
{{- end }}
-{{- if (or .Values.prometheus.prometheusSpec.remoteRead .Values.prometheus.prometheusSpec.additionalRemoteRead) }}
+ probeNamespaceSelector: {{ .Values.global.cattle.projectNamespaceSelector | toYaml | nindent 4 }}
+{{- if and (not .Values.prometheus.agentMode) (or .Values.prometheus.prometheusSpec.remoteRead .Values.prometheus.prometheusSpec.additionalRemoteRead) }}
remoteRead:
{{- if .Values.prometheus.prometheusSpec.remoteRead }}
{{ tpl (toYaml .Values.prometheus.prometheusSpec.remoteRead | indent 4) . }}
@@ -194,16 +211,15 @@
{{ toYaml .Values.prometheus.prometheusSpec.additionalRemoteWrite | indent 4 }}
{{- end }}
{{- end }}
- probeNamespaceSelector: {{ .Values.global.cattle.projectNamespaceSelector | toYaml | nindent 4 }}
{{- if .Values.prometheus.prometheusSpec.securityContext }}
securityContext:
{{ toYaml .Values.prometheus.prometheusSpec.securityContext | indent 4 }}
{{- end }}
ruleNamespaceSelector: {{ .Values.global.cattle.projectNamespaceSelector | toYaml | nindent 4 }}
-{{- if not (has "agent" .Values.prometheus.prometheusSpec.enableFeatures) }}
+{{- if not .Values.prometheus.agentMode }}
{{- if .Values.prometheus.prometheusSpec.ruleSelector }}
ruleSelector:
-{{ toYaml .Values.prometheus.prometheusSpec.ruleSelector | indent 4}}
+{{ tpl (toYaml .Values.prometheus.prometheusSpec.ruleSelector | indent 4) . }}
{{- else if .Values.prometheus.prometheusSpec.ruleSelectorNilUsesHelmValues }}
ruleSelector:
matchLabels:
@@ -212,6 +228,22 @@
ruleSelector: {}
{{- end }}
{{- end }}
+{{- if .Values.prometheus.prometheusSpec.scrapeConfigSelector }}
+ scrapeConfigSelector:
+{{ tpl (toYaml .Values.prometheus.prometheusSpec.scrapeConfigSelector | indent 4) . }}
+{{ else if .Values.prometheus.prometheusSpec.scrapeConfigSelectorNilUsesHelmValues }}
+ scrapeConfigSelector:
+ matchLabels:
+ release: {{ $.Release.Name | quote }}
+{{ else }}
+ scrapeConfigSelector: {}
+{{- end }}
+{{- if .Values.prometheus.prometheusSpec.scrapeConfigNamespaceSelector }}
+ scrapeConfigNamespaceSelector:
+{{ tpl (toYaml .Values.prometheus.prometheusSpec.scrapeConfigNamespaceSelector | indent 4) . }}
+{{ else }}
+ scrapeConfigNamespaceSelector: {}
+{{- end }}
{{- if .Values.prometheus.prometheusSpec.storageSpec }}
storage:
{{ tpl (toYaml .Values.prometheus.prometheusSpec.storageSpec | indent 4) . }}
@@ -220,7 +252,7 @@
podMetadata:
{{ tpl (toYaml .Values.prometheus.prometheusSpec.podMetadata | indent 4) . }}
{{- end }}
-{{- if .Values.prometheus.prometheusSpec.query }}
+{{- if and (not .Values.prometheus.agentMode) .Values.prometheus.prometheusSpec.query }}
query:
{{ toYaml .Values.prometheus.prometheusSpec.query | indent 4}}
{{- end }}
@@ -236,7 +268,7 @@
labelSelector:
matchExpressions:
- {key: app.kubernetes.io/name, operator: In, values: [prometheus]}
- - {key: prometheus, operator: In, values: [{{ template "project-prometheus-stack.prometheus.crname" . }}]}
+ - {key: prometheus, operator: In, values: [{{ template "kube-prometheus-stack.prometheus.crname" . }}]}
{{- else if eq .Values.prometheus.prometheusSpec.podAntiAffinity "soft" }}
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
@@ -246,7 +278,7 @@
labelSelector:
matchExpressions:
- {key: app.kubernetes.io/name, operator: In, values: [prometheus]}
- - {key: prometheus, operator: In, values: [{{ template "project-prometheus-stack.prometheus.crname" . }}]}
+ - {key: prometheus, operator: In, values: [{{ template "kube-prometheus-stack.prometheus.crname" . }}]}
{{- end }}
{{- end }}
tolerations: {{ include "linux-node-tolerations" . | nindent 4 }}
@@ -259,11 +291,11 @@
{{- end }}
{{- if .Values.global.imagePullSecrets }}
imagePullSecrets:
-{{ include "project-prometheus-stack.imagePullSecrets" . | trim | indent 4 }}
+{{ include "kube-prometheus-stack.imagePullSecrets" . | trim | indent 4 }}
{{- end }}
{{- if .Values.prometheus.prometheusSpec.additionalScrapeConfigs }}
additionalScrapeConfigs:
- name: {{ template "project-prometheus-stack.fullname" . }}-prometheus-scrape-confg
+ name: {{ template "kube-prometheus-stack.fullname" . }}-prometheus-scrape-confg
key: additional-scrape-configs.yaml
{{- end }}
{{- if .Values.prometheus.prometheusSpec.additionalScrapeConfigsSecret.enabled }}
@@ -271,10 +303,11 @@
name: {{ .Values.prometheus.prometheusSpec.additionalScrapeConfigsSecret.name }}
key: {{ .Values.prometheus.prometheusSpec.additionalScrapeConfigsSecret.key }}
{{- end }}
+{{- if not .Values.prometheus.agentMode }}
{{- if or .Values.prometheus.prometheusSpec.additionalAlertManagerConfigs .Values.prometheus.prometheusSpec.additionalAlertManagerConfigsSecret }}
additionalAlertManagerConfigs:
{{- if .Values.prometheus.prometheusSpec.additionalAlertManagerConfigs }}
- name: {{ template "project-prometheus-stack.fullname" . }}-prometheus-am-confg
+ name: {{ template "kube-prometheus-stack.fullname" . }}-prometheus-am-confg
key: additional-alertmanager-configs.yaml
{{- end }}
{{- if .Values.prometheus.prometheusSpec.additionalAlertManagerConfigsSecret }}
@@ -287,7 +320,7 @@
{{- end }}
{{- if .Values.prometheus.prometheusSpec.additionalAlertRelabelConfigs }}
additionalAlertRelabelConfigs:
- name: {{ template "project-prometheus-stack.fullname" . }}-prometheus-am-relabel-confg
+ name: {{ template "kube-prometheus-stack.fullname" . }}-prometheus-am-relabel-confg
key: additional-alert-relabel-configs.yaml
{{- end }}
{{- if .Values.prometheus.prometheusSpec.additionalAlertRelabelConfigsSecret }}
@@ -295,6 +328,7 @@
name: {{ .Values.prometheus.prometheusSpec.additionalAlertRelabelConfigsSecret.name }}
key: {{ .Values.prometheus.prometheusSpec.additionalAlertRelabelConfigsSecret.key }}
{{- end }}
+{{- end }}
{{- if .Values.prometheus.prometheusSpec.containers }}
containers:
{{ tpl .Values.prometheus.prometheusSpec.containers $ | indent 4 }}
@@ -306,13 +340,26 @@
{{- if .Values.prometheus.prometheusSpec.priorityClassName }}
priorityClassName: {{ .Values.prometheus.prometheusSpec.priorityClassName }}
{{- end }}
+{{- if not .Values.prometheus.agentMode }}
{{- if .Values.prometheus.prometheusSpec.thanos }}
thanos:
-{{ toYaml .Values.prometheus.prometheusSpec.thanos | indent 4 }}
+{{- with (omit .Values.prometheus.prometheusSpec.thanos "objectStorageConfig")}}
+{{ toYaml . | indent 4 }}
+{{- end }}
+{{- if ((.Values.prometheus.prometheusSpec.thanos.objectStorageConfig).existingSecret) }}
+ objectStorageConfig:
+ key: "{{.Values.prometheus.prometheusSpec.thanos.objectStorageConfig.existingSecret.key }}"
+ name: "{{.Values.prometheus.prometheusSpec.thanos.objectStorageConfig.existingSecret.name }}"
+{{- else if ((.Values.prometheus.prometheusSpec.thanos.objectStorageConfig).secret) }}
+ objectStorageConfig:
+ key: object-storage-configs.yaml
+ name: {{ template "project-prometheus-stack.fullname" . }}-prometheus
+{{- end }}
{{- end }}
{{- if .Values.prometheus.prometheusSpec.disableCompaction }}
disableCompaction: {{ .Values.prometheus.prometheusSpec.disableCompaction }}
{{- end }}
+{{- end }}
portName: {{ .Values.prometheus.prometheusSpec.portName }}
{{- if .Values.prometheus.prometheusSpec.volumes }}
volumes:
@@ -336,6 +383,7 @@
{{- if .Values.prometheus.prometheusSpec.enforcedNamespaceLabel }}
enforcedNamespaceLabel: {{ .Values.prometheus.prometheusSpec.enforcedNamespaceLabel }}
{{- $prometheusDefaultRulesExcludedFromEnforce := (include "rules.names" .) | fromYaml }}
+{{- if not .Values.prometheus.agentMode }}
prometheusRulesExcludedFromEnforce:
{{- range $prometheusDefaultRulesExcludedFromEnforce.rules }}
- ruleNamespace: "{{ template "project-prometheus-stack.namespace" $ }}"
@@ -344,20 +392,27 @@
{{- if .Values.prometheus.prometheusSpec.prometheusRulesExcludedFromEnforce }}
{{ toYaml .Values.prometheus.prometheusSpec.prometheusRulesExcludedFromEnforce | indent 4 }}
{{- end }}
+{{- end }}
excludedFromEnforcement:
{{- range $prometheusDefaultRulesExcludedFromEnforce.rules }}
- group: monitoring.coreos.com
resource: prometheusrules
- namespace: "{{ template "project-prometheus-stack.namespace" $ }}"
- name: "{{ printf "%s-%s" (include "project-prometheus-stack.fullname" $) . | trunc 63 | trimSuffix "-" }}"
+ namespace: "{{ template "kube-prometheus-stack.namespace" $ }}"
+ name: "{{ printf "%s-%s" (include "kube-prometheus-stack.fullname" $) . | trunc 63 | trimSuffix "-" }}"
{{- end }}
{{- if .Values.prometheus.prometheusSpec.excludedFromEnforcement }}
{{ tpl (toYaml .Values.prometheus.prometheusSpec.excludedFromEnforcement | indent 4) . }}
{{- end }}
{{- end }}
-{{- if .Values.prometheus.prometheusSpec.queryLogFile }}
+{{- if and (not .Values.prometheus.agentMode) .Values.prometheus.prometheusSpec.queryLogFile }}
queryLogFile: {{ .Values.prometheus.prometheusSpec.queryLogFile }}
{{- end }}
+{{- if .Values.prometheus.prometheusSpec.sampleLimit }}
+ sampleLimit: {{ .Values.prometheus.prometheusSpec.sampleLimit }}
+{{- end }}
+{{- if .Values.prometheus.prometheusSpec.enforcedKeepDroppedTargets }}
+ enforcedKeepDroppedTargets: {{ .Values.prometheus.prometheusSpec.enforcedKeepDroppedTargets }}
+{{- end }}
{{- if .Values.prometheus.prometheusSpec.enforcedSampleLimit }}
enforcedSampleLimit: {{ .Values.prometheus.prometheusSpec.enforcedSampleLimit }}
{{- end }}
@@ -373,15 +428,28 @@
{{- if .Values.prometheus.prometheusSpec.enforcedLabelValueLengthLimit}}
enforcedLabelValueLengthLimit: {{ .Values.prometheus.prometheusSpec.enforcedLabelValueLengthLimit }}
{{- end }}
-{{- if .Values.prometheus.prometheusSpec.allowOverlappingBlocks }}
+{{- if and (not .Values.prometheus.agentMode) .Values.prometheus.prometheusSpec.allowOverlappingBlocks }}
allowOverlappingBlocks: {{ .Values.prometheus.prometheusSpec.allowOverlappingBlocks }}
{{- end }}
{{- if .Values.prometheus.prometheusSpec.minReadySeconds }}
minReadySeconds: {{ .Values.prometheus.prometheusSpec.minReadySeconds }}
{{- end }}
+{{- if .Values.prometheus.prometheusSpec.maximumStartupDurationSeconds }}
+ maximumStartupDurationSeconds: {{ .Values.prometheus.prometheusSpec.maximumStartupDurationSeconds }}
+{{- end }}
hostNetwork: {{ .Values.prometheus.prometheusSpec.hostNetwork }}
{{- if .Values.prometheus.prometheusSpec.hostAliases }}
hostAliases:
{{ toYaml .Values.prometheus.prometheusSpec.hostAliases | indent 4 }}
{{- end }}
+{{- if .Values.prometheus.prometheusSpec.tracingConfig }}
+ tracingConfig:
+{{ toYaml .Values.prometheus.prometheusSpec.tracingConfig | indent 4 }}
+{{- end }}
+{{- with .Values.prometheus.prometheusSpec.additionalConfig }}
+ {{- tpl (toYaml .) $ | nindent 2 }}
+{{- end }}
+{{- with .Values.prometheus.prometheusSpec.additionalConfigString }}
+ {{- tpl . $ | nindent 2 }}
+{{- end }}
{{- end }}
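
The main additions here are agent mode (rendering a PrometheusAgent CR and skipping rule, query, retention and alerting related fields) and the ScrapeConfig selectors. An illustrative values fragment:

prometheus:
  agentMode: false                # set true to deploy a PrometheusAgent CR instead of Prometheus
  prometheusSpec:
    maximumStartupDurationSeconds: 600
    scrapeConfigSelectorNilUsesHelmValues: true   # select ScrapeConfigs labelled release=<release name>
    scrapeConfigNamespaceSelector: {}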

View File

@ -0,0 +1,24 @@
--- charts-original/templates/prometheus/psp-clusterrole.yaml
+++ charts/templates/prometheus/psp-clusterrole.yaml
@@ -3,10 +3,10 @@
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
- name: {{ template "project-prometheus-stack.fullname" . }}-prometheus-psp
+ name: {{ template "kube-prometheus-stack.fullname" . }}-prometheus-psp
labels:
- app: {{ template "project-prometheus-stack.name" . }}-prometheus
-{{ include "project-prometheus-stack.labels" . | indent 4 }}
+ app: {{ template "kube-prometheus-stack.name" . }}-prometheus
+{{ include "kube-prometheus-stack.labels" . | indent 4 }}
rules:
{{- $kubeTargetVersion := default .Capabilities.KubeVersion.GitVersion .Values.kubeTargetVersionOverride }}
{{- if semverCompare "> 1.15.0-0" $kubeTargetVersion }}
@@ -17,6 +17,6 @@
resources: ['podsecuritypolicies']
verbs: ['use']
resourceNames:
- - {{ template "project-prometheus-stack.fullname" . }}-prometheus
+ - {{ template "kube-prometheus-stack.fullname" . }}-prometheus
{{- end }}
{{- end }}
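The PSP rules above are gated on the Kubernetes version detected from `.Capabilities.KubeVersion.GitVersion`, with `.Values.kubeTargetVersionOverride` as an escape hatch. A minimal sketch of overriding the detection, e.g. when templating offline (the version string is only an example):

```yaml
# equivalent to: helm template ... --set kubeTargetVersionOverride=v1.24.17
kubeTargetVersionOverride: "v1.24.17"
```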

View File

@@ -0,0 +1,314 @@
--- charts-original/templates/prometheus/rules-1.14/alertmanager.rules.yaml
+++ charts/templates/prometheus/rules-1.14/alertmanager.rules.yaml
@@ -33,6 +33,9 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.alertmanager }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.alertmanager | indent 8 }}
+{{- end }}
description: Configuration has failed to load for {{`{{`}} $labels.namespace {{`}}`}}/{{`{{`}} $labels.pod{{`}}`}}.
runbook_url: {{ .Values.defaultRules.runbookUrl }}/alertmanager/alertmanagerfailedreload
summary: Reloading an Alertmanager configuration has failed.
@@ -40,11 +43,19 @@
# Without max_over_time, failed scrapes could create false negatives, see
# https://www.robustperception.io/alerting-on-gauges-in-prometheus-2-0 for details.
max_over_time(alertmanager_config_last_reload_successful{job="{{ $alertmanagerJob }}",namespace="{{ $namespace }}"}[5m]) == 0
- for: 10m
+ for: {{ dig "AlertmanagerFailedReload" "for" "10m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: critical
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "AlertmanagerFailedReload" "severity" "critical" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.alertmanager }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.alertmanager }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.AlertmanagerMembersInconsistent | default false) }}
@@ -53,6 +64,9 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.alertmanager }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.alertmanager | indent 8 }}
+{{- end }}
description: Alertmanager {{`{{`}} $labels.namespace {{`}}`}}/{{`{{`}} $labels.pod{{`}}`}} has only found {{`{{`}} $value {{`}}`}} members of the {{`{{`}}$labels.job{{`}}`}} cluster.
runbook_url: {{ .Values.defaultRules.runbookUrl }}/alertmanager/alertmanagermembersinconsistent
summary: A member of an Alertmanager cluster has not found all other cluster members.
@@ -60,13 +74,21 @@
# Without max_over_time, failed scrapes could create false negatives, see
# https://www.robustperception.io/alerting-on-gauges-in-prometheus-2-0 for details.
max_over_time(alertmanager_cluster_members{job="{{ $alertmanagerJob }}",namespace="{{ $namespace }}"}[5m])
- < on (namespace,service) group_left
- count by (namespace,service) (max_over_time(alertmanager_cluster_members{job="{{ $alertmanagerJob }}",namespace="{{ $namespace }}"}[5m]))
- for: 15m
- labels:
- severity: critical
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ < on ({{ range $.Values.defaultRules.additionalAggregationLabels }}{{ . }},{{ end }}namespace,service,cluster) group_left
+ count by ({{ range $.Values.defaultRules.additionalAggregationLabels }}{{ . }},{{ end }}namespace,service,cluster) (max_over_time(alertmanager_cluster_members{job="{{ $alertmanagerJob }}",namespace="{{ $namespace }}"}[5m]))
+ for: {{ dig "AlertmanagerMembersInconsistent" "for" "15m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
+ labels:
+ severity: {{ dig "AlertmanagerMembersInconsistent" "severity" "critical" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.alertmanager }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.alertmanager }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.AlertmanagerFailedToSendAlerts | default false) }}
@@ -75,6 +97,9 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.alertmanager }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.alertmanager | indent 8 }}
+{{- end }}
description: Alertmanager {{`{{`}} $labels.namespace {{`}}`}}/{{`{{`}} $labels.pod{{`}}`}} failed to send {{`{{`}} $value | humanizePercentage {{`}}`}} of notifications to {{`{{`}} $labels.integration {{`}}`}}.
runbook_url: {{ .Values.defaultRules.runbookUrl }}/alertmanager/alertmanagerfailedtosendalerts
summary: An Alertmanager instance failed to send notifications.
@@ -82,14 +107,22 @@
(
rate(alertmanager_notifications_failed_total{job="{{ $alertmanagerJob }}",namespace="{{ $namespace }}"}[5m])
/
- rate(alertmanager_notifications_total{job="{{ $alertmanagerJob }}",namespace="{{ $namespace }}"}[5m])
+ ignoring (reason) group_left rate(alertmanager_notifications_total{job="{{ $alertmanagerJob }}",namespace="{{ $namespace }}"}[5m])
)
> 0.01
- for: 5m
+ for: {{ dig "AlertmanagerFailedToSendAlerts" "for" "5m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: warning
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "AlertmanagerFailedToSendAlerts" "severity" "warning" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.alertmanager }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.alertmanager }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.AlertmanagerClusterFailedToSendAlerts | default false) }}
@@ -98,21 +131,32 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.alertmanager }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.alertmanager | indent 8 }}
+{{- end }}
description: The minimum notification failure rate to {{`{{`}} $labels.integration {{`}}`}} sent from any instance in the {{`{{`}}$labels.job{{`}}`}} cluster is {{`{{`}} $value | humanizePercentage {{`}}`}}.
runbook_url: {{ .Values.defaultRules.runbookUrl }}/alertmanager/alertmanagerclusterfailedtosendalerts
summary: All Alertmanager instances in a cluster failed to send notifications to a critical integration.
expr: |-
- min by (namespace,service, integration) (
+ min by ({{ range $.Values.defaultRules.additionalAggregationLabels }}{{ . }},{{ end }}namespace,service, integration) (
rate(alertmanager_notifications_failed_total{job="{{ $alertmanagerJob }}",namespace="{{ $namespace }}", integration=~`.*`}[5m])
/
- rate(alertmanager_notifications_total{job="{{ $alertmanagerJob }}",namespace="{{ $namespace }}", integration=~`.*`}[5m])
+ ignoring (reason) group_left rate(alertmanager_notifications_total{job="{{ $alertmanagerJob }}",namespace="{{ $namespace }}", integration=~`.*`}[5m])
)
> 0.01
- for: 5m
+ for: {{ dig "AlertmanagerClusterFailedToSendAlerts" "for" "5m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: critical
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "AlertmanagerClusterFailedToSendAlerts" "severity" "critical" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.alertmanager }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.alertmanager }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.AlertmanagerClusterFailedToSendAlerts | default false) }}
@@ -121,21 +165,32 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.alertmanager }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.alertmanager | indent 8 }}
+{{- end }}
description: The minimum notification failure rate to {{`{{`}} $labels.integration {{`}}`}} sent from any instance in the {{`{{`}}$labels.job{{`}}`}} cluster is {{`{{`}} $value | humanizePercentage {{`}}`}}.
runbook_url: {{ .Values.defaultRules.runbookUrl }}/alertmanager/alertmanagerclusterfailedtosendalerts
summary: All Alertmanager instances in a cluster failed to send notifications to a non-critical integration.
expr: |-
- min by (namespace,service, integration) (
+ min by ({{ range $.Values.defaultRules.additionalAggregationLabels }}{{ . }},{{ end }}namespace,service, integration) (
rate(alertmanager_notifications_failed_total{job="{{ $alertmanagerJob }}",namespace="{{ $namespace }}", integration!~`.*`}[5m])
/
- rate(alertmanager_notifications_total{job="{{ $alertmanagerJob }}",namespace="{{ $namespace }}", integration!~`.*`}[5m])
+ ignoring (reason) group_left rate(alertmanager_notifications_total{job="{{ $alertmanagerJob }}",namespace="{{ $namespace }}", integration!~`.*`}[5m])
)
> 0.01
- for: 5m
+ for: {{ dig "AlertmanagerClusterFailedToSendAlerts" "for" "5m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: warning
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "AlertmanagerClusterFailedToSendAlerts" "severity" "warning" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.alertmanager }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.alertmanager }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.AlertmanagerConfigInconsistent | default false) }}
@@ -144,19 +199,30 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.alertmanager }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.alertmanager | indent 8 }}
+{{- end }}
description: Alertmanager instances within the {{`{{`}}$labels.job{{`}}`}} cluster have different configurations.
runbook_url: {{ .Values.defaultRules.runbookUrl }}/alertmanager/alertmanagerconfiginconsistent
summary: Alertmanager instances within the same cluster have different configurations.
expr: |-
- count by (namespace,service) (
- count_values by (namespace,service) ("config_hash", alertmanager_config_hash{job="{{ $alertmanagerJob }}",namespace="{{ $namespace }}"})
+ count by ({{ range $.Values.defaultRules.additionalAggregationLabels }}{{ . }},{{ end }}namespace,service) (
+ count_values by ({{ range $.Values.defaultRules.additionalAggregationLabels }}{{ . }},{{ end }}namespace,service) ("config_hash", alertmanager_config_hash{job="{{ $alertmanagerJob }}",namespace="{{ $namespace }}"})
)
!= 1
- for: 20m
+ for: {{ dig "AlertmanagerConfigInconsistent" "for" "20m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: critical
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "AlertmanagerConfigInconsistent" "severity" "critical" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.alertmanager }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.alertmanager }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.AlertmanagerClusterDown | default false) }}
@@ -165,25 +231,36 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.alertmanager }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.alertmanager | indent 8 }}
+{{- end }}
description: '{{`{{`}} $value | humanizePercentage {{`}}`}} of Alertmanager instances within the {{`{{`}}$labels.job{{`}}`}} cluster have been up for less than half of the last 5m.'
runbook_url: {{ .Values.defaultRules.runbookUrl }}/alertmanager/alertmanagerclusterdown
summary: Half or more of the Alertmanager instances within the same cluster are down.
expr: |-
(
- count by (namespace,service) (
+ count by ({{ range $.Values.defaultRules.additionalAggregationLabels }}{{ . }},{{ end }}namespace,service,cluster) (
avg_over_time(up{job="{{ $alertmanagerJob }}",namespace="{{ $namespace }}"}[5m]) < 0.5
)
/
- count by (namespace,service) (
+ count by ({{ range $.Values.defaultRules.additionalAggregationLabels }}{{ . }},{{ end }}namespace,service,cluster) (
up{job="{{ $alertmanagerJob }}",namespace="{{ $namespace }}"}
)
)
>= 0.5
- for: 5m
+ for: {{ dig "AlertmanagerClusterDown" "for" "5m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: critical
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "AlertmanagerClusterDown" "severity" "critical" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.alertmanager }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.alertmanager }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.AlertmanagerClusterCrashlooping | default false) }}
@@ -192,25 +269,36 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.alertmanager }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.alertmanager | indent 8 }}
+{{- end }}
description: '{{`{{`}} $value | humanizePercentage {{`}}`}} of Alertmanager instances within the {{`{{`}}$labels.job{{`}}`}} cluster have restarted at least 5 times in the last 10m.'
runbook_url: {{ .Values.defaultRules.runbookUrl }}/alertmanager/alertmanagerclustercrashlooping
summary: Half or more of the Alertmanager instances within the same cluster are crashlooping.
expr: |-
(
- count by (namespace,service) (
+ count by ({{ range $.Values.defaultRules.additionalAggregationLabels }}{{ . }},{{ end }}namespace,service,cluster) (
changes(process_start_time_seconds{job="{{ $alertmanagerJob }}",namespace="{{ $namespace }}"}[10m]) > 4
)
/
- count by (namespace,service) (
+ count by ({{ range $.Values.defaultRules.additionalAggregationLabels }}{{ . }},{{ end }}namespace,service,cluster) (
up{job="{{ $alertmanagerJob }}",namespace="{{ $namespace }}"}
)
)
>= 0.5
- for: 5m
+ for: {{ dig "AlertmanagerClusterCrashlooping" "for" "5m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: critical
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "AlertmanagerClusterCrashlooping" "severity" "critical" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.alertmanager }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.alertmanager }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- end }}
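Throughout this file the hard-coded `for:` and `severity:` fields now fall back to `dig` lookups against `.Values.customRules`, `keep_firing_for` is driven by `defaultRules.keepFiringFor`, and group-scoped labels and annotations come from `defaultRules.additionalRuleGroupLabels.alertmanager` and `defaultRules.additionalRuleGroupAnnotations.alertmanager`. A sketch of the values shape these lookups expect (alert names are from this file, the override values are arbitrary examples):

```yaml
customRules:
  AlertmanagerFailedReload:
    for: 5m                # read by: dig "AlertmanagerFailedReload" "for" "10m" .Values.customRules
    severity: warning
  AlertmanagerClusterDown:
    for: 10m
defaultRules:
  keepFiringFor: 10m       # emitted as keep_firing_for on every rule that templates it
  additionalRuleGroupLabels:
    alertmanager:
      team: platform       # illustrative label
  additionalRuleGroupAnnotations:
    alertmanager: {}
```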

View File

@@ -0,0 +1,94 @@
--- charts-original/templates/prometheus/rules-1.14/general.rules.yaml
+++ charts/templates/prometheus/rules-1.14/general.rules.yaml
@@ -30,15 +30,26 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.general }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.general | indent 8 }}
+{{- end }}
description: '{{`{{`}} printf "%.4g" $value {{`}}`}}% of the {{`{{`}} $labels.job {{`}}`}}/{{`{{`}} $labels.service {{`}}`}} targets in {{`{{`}} $labels.namespace {{`}}`}} namespace are down.'
runbook_url: {{ .Values.defaultRules.runbookUrl }}/general/targetdown
summary: One or more targets are unreachable.
- expr: 100 * (count(up == 0) BY (job, namespace, service) / count(up) BY (job, namespace, service)) > 10
- for: 10m
+ expr: 100 * (count(up == 0) BY (cluster, job, namespace, service) / count(up) BY (cluster, job, namespace, service)) > 10
+ for: {{ dig "TargetDown" "for" "10m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: warning
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "TargetDown" "severity" "warning" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.general }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.general }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.Watchdog | default false) }}
@@ -47,6 +58,9 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.general }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.general | indent 8 }}
+{{- end }}
description: 'This is an alert meant to ensure that the entire alerting pipeline is functional.
This alert is always firing, therefore it should always be firing in Alertmanager
@@ -62,9 +76,14 @@
summary: An alert that should always be firing to certify that Alertmanager is working properly.
expr: vector(1)
labels:
- severity: none
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "Watchdog" "severity" "none" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.general }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.general }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.InfoInhibitor | default false) }}
@@ -73,6 +92,9 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.general }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.general | indent 8 }}
+{{- end }}
description: 'This is an alert that is used to inhibit info alerts.
By themselves, the info-level alerts are sometimes very noisy, but they are relevant when combined with
@@ -88,11 +110,16 @@
'
runbook_url: {{ .Values.defaultRules.runbookUrl }}/general/infoinhibitor
summary: Info-level alert inhibition.
- expr: ALERTS{severity = "info"} == 1 unless on(namespace) ALERTS{alertname != "InfoInhibitor", severity =~ "warning|critical", alertstate="firing"} == 1
+ expr: ALERTS{severity = "info"} == 1 unless on ({{ range $.Values.defaultRules.additionalAggregationLabels }}{{ . }},{{ end }}namespace) ALERTS{alertname != "InfoInhibitor", severity =~ "warning|critical", alertstate="firing"} == 1
labels:
- severity: none
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "InfoInhibitor" "severity" "none" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.general }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.general }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- end }}
\ No newline at end of file
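The same pattern is applied to the general rules, with the addition of `defaultRules.additionalAggregationLabels`, which is spliced into the `by (...)` and `on (...)` clauses above. A small illustrative values fragment (the `region` label is hypothetical; labels such as `cluster` that some expressions already aggregate on should not be repeated here):

```yaml
defaultRules:
  additionalAggregationLabels:
    - region               # renders as: ... unless on (region,namespace) ... in InfoInhibitor, etc.
  additionalRuleGroupLabels:
    general:
      team: platform
  additionalRuleGroupAnnotations:
    general:
      wiki: "https://example.com/runbooks/general"   # placeholder URL
```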

View File

@@ -0,0 +1,632 @@
--- charts-original/templates/prometheus/rules-1.14/kubernetes-apps.yaml
+++ charts/templates/prometheus/rules-1.14/kubernetes-apps.yaml
@@ -31,15 +31,26 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.kubernetesApps }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.kubernetesApps | indent 8 }}
+{{- end }}
description: 'Pod {{`{{`}} $labels.namespace {{`}}`}}/{{`{{`}} $labels.pod {{`}}`}} ({{`{{`}} $labels.container {{`}}`}}) is in waiting state (reason: "CrashLoopBackOff").'
runbook_url: {{ .Values.defaultRules.runbookUrl }}/kubernetes/kubepodcrashlooping
summary: Pod is crash looping.
- expr: max_over_time(kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff", job="kube-state-metrics", namespace=~"{{ $targetNamespace }}"}[5m]) >= 1
- for: 15m
+ expr: max_over_time(kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff", namespace=~"{{ $targetNamespace }}"}[5m]) >= 1
+ for: {{ dig "KubePodCrashLooping" "for" "15m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: warning
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "KubePodCrashLooping" "severity" "warning" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.kubernetesApps }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.kubernetesApps }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.KubePodNotReady | default false) }}
@@ -48,22 +59,33 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.kubernetesApps }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.kubernetesApps | indent 8 }}
+{{- end }}
description: Pod {{`{{`}} $labels.namespace {{`}}`}}/{{`{{`}} $labels.pod {{`}}`}} has been in a non-ready state for longer than 15 minutes.
runbook_url: {{ .Values.defaultRules.runbookUrl }}/kubernetes/kubepodnotready
summary: Pod has been in a non-ready state for more than 15 minutes.
expr: |-
- sum by (namespace, pod, cluster) (
- max by(namespace, pod, cluster) (
- kube_pod_status_phase{job="kube-state-metrics", namespace=~"{{ $targetNamespace }}", phase=~"Pending|Unknown|Failed"}
- ) * on(namespace, pod, cluster) group_left(owner_kind) topk by(namespace, pod, cluster) (
- 1, max by(namespace, pod, owner_kind, cluster) (kube_pod_owner{owner_kind!="Job"})
+ sum by ({{ range $.Values.defaultRules.additionalAggregationLabels }}{{ . }},{{ end }}namespace, pod, cluster) (
+ max by ({{ range $.Values.defaultRules.additionalAggregationLabels }}{{ . }},{{ end }}namespace, pod, cluster) (
+ kube_pod_status_phase{namespace=~"{{ $targetNamespace }}", phase=~"Pending|Unknown|Failed"}
+ ) * on ({{ range $.Values.defaultRules.additionalAggregationLabels }}{{ . }},{{ end }}namespace, pod, cluster) group_left(owner_kind) topk by ({{ range $.Values.defaultRules.additionalAggregationLabels }}{{ . }},{{ end }}namespace, pod, cluster) (
+ 1, max by ({{ range $.Values.defaultRules.additionalAggregationLabels }}{{ . }},{{ end }}namespace, pod, owner_kind, cluster) (kube_pod_owner{owner_kind!="Job"})
)
) > 0
- for: 15m
+ for: {{ dig "KubePodNotReady" "for" "15m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: warning
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "KubePodNotReady" "severity" "warning" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.kubernetesApps }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.kubernetesApps }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.KubeDeploymentGenerationMismatch | default false) }}
@@ -72,18 +94,29 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.kubernetesApps }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.kubernetesApps | indent 8 }}
+{{- end }}
description: Deployment generation for {{`{{`}} $labels.namespace {{`}}`}}/{{`{{`}} $labels.deployment {{`}}`}} does not match, this indicates that the Deployment has failed but has not been rolled back.
runbook_url: {{ .Values.defaultRules.runbookUrl }}/kubernetes/kubedeploymentgenerationmismatch
summary: Deployment generation mismatch due to possible roll-back
expr: |-
- kube_deployment_status_observed_generation{job="kube-state-metrics", namespace=~"{{ $targetNamespace }}"}
+ kube_deployment_status_observed_generation{namespace=~"{{ $targetNamespace }}"}
!=
- kube_deployment_metadata_generation{job="kube-state-metrics", namespace=~"{{ $targetNamespace }}"}
- for: 15m
+ kube_deployment_metadata_generation{namespace=~"{{ $targetNamespace }}"}
+ for: {{ dig "KubeDeploymentGenerationMismatch" "for" "15m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: warning
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "KubeDeploymentGenerationMismatch" "severity" "warning" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.kubernetesApps }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.kubernetesApps }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.KubeDeploymentReplicasMismatch | default false) }}
@@ -92,24 +125,65 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.kubernetesApps }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.kubernetesApps | indent 8 }}
+{{- end }}
description: Deployment {{`{{`}} $labels.namespace {{`}}`}}/{{`{{`}} $labels.deployment {{`}}`}} has not matched the expected number of replicas for longer than 15 minutes.
runbook_url: {{ .Values.defaultRules.runbookUrl }}/kubernetes/kubedeploymentreplicasmismatch
summary: Deployment has not matched the expected number of replicas.
expr: |-
(
- kube_deployment_spec_replicas{job="kube-state-metrics", namespace=~"{{ $targetNamespace }}"}
+ kube_deployment_spec_replicas{namespace=~"{{ $targetNamespace }}"}
>
- kube_deployment_status_replicas_available{job="kube-state-metrics", namespace=~"{{ $targetNamespace }}"}
+ kube_deployment_status_replicas_available{namespace=~"{{ $targetNamespace }}"}
) and (
- changes(kube_deployment_status_replicas_updated{job="kube-state-metrics", namespace=~"{{ $targetNamespace }}"}[10m])
+ changes(kube_deployment_status_replicas_updated{namespace=~"{{ $targetNamespace }}"}[10m])
==
0
)
- for: 15m
+ for: {{ dig "KubeDeploymentReplicasMismatch" "for" "15m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
+ labels:
+ severity: {{ dig "KubeDeploymentReplicasMismatch" "severity" "warning" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.kubernetesApps }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.kubernetesApps }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- end }}
+{{- end }}
+{{- if not (.Values.defaultRules.disabled.KubeDeploymentRolloutStuck | default false) }}
+ - alert: KubeDeploymentRolloutStuck
+ annotations:
+{{- if .Values.defaultRules.additionalRuleAnnotations }}
+{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
+{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.kubernetesApps }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.kubernetesApps | indent 8 }}
+{{- end }}
+ description: Rollout of deployment {{`{{`}} $labels.namespace {{`}}`}}/{{`{{`}} $labels.deployment {{`}}`}} is not progressing for longer than 15 minutes.
+ runbook_url: {{ .Values.defaultRules.runbookUrl }}/kubernetes/kubedeploymentrolloutstuck
+ summary: Deployment rollout is not progressing.
+ expr: |-
+ kube_deployment_status_condition{condition="Progressing", status="false",namespace=~"{{ $targetNamespace }}"}
+ != 0
+ for: {{ dig "KubeDeploymentRolloutStuck" "for" "15m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: warning
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "KubeDeploymentRolloutStuck" "severity" "warning" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.kubernetesApps }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.kubernetesApps }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.KubeStatefulSetReplicasMismatch | default false) }}
@@ -118,24 +192,35 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.kubernetesApps }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.kubernetesApps | indent 8 }}
+{{- end }}
description: StatefulSet {{`{{`}} $labels.namespace {{`}}`}}/{{`{{`}} $labels.statefulset {{`}}`}} has not matched the expected number of replicas for longer than 15 minutes.
runbook_url: {{ .Values.defaultRules.runbookUrl }}/kubernetes/kubestatefulsetreplicasmismatch
- summary: Deployment has not matched the expected number of replicas.
+ summary: StatefulSet has not matched the expected number of replicas.
expr: |-
(
- kube_statefulset_status_replicas_ready{job="kube-state-metrics", namespace=~"{{ $targetNamespace }}"}
+ kube_statefulset_status_replicas_ready{namespace=~"{{ $targetNamespace }}"}
!=
- kube_statefulset_status_replicas{job="kube-state-metrics", namespace=~"{{ $targetNamespace }}"}
+ kube_statefulset_status_replicas{namespace=~"{{ $targetNamespace }}"}
) and (
- changes(kube_statefulset_status_replicas_updated{job="kube-state-metrics", namespace=~"{{ $targetNamespace }}"}[10m])
+ changes(kube_statefulset_status_replicas_updated{namespace=~"{{ $targetNamespace }}"}[10m])
==
0
)
- for: 15m
+ for: {{ dig "KubeStatefulSetReplicasMismatch" "for" "15m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: warning
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "KubeStatefulSetReplicasMismatch" "severity" "warning" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.kubernetesApps }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.kubernetesApps }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.KubeStatefulSetGenerationMismatch | default false) }}
@@ -144,18 +229,29 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.kubernetesApps }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.kubernetesApps | indent 8 }}
+{{- end }}
description: StatefulSet generation for {{`{{`}} $labels.namespace {{`}}`}}/{{`{{`}} $labels.statefulset {{`}}`}} does not match, this indicates that the StatefulSet has failed but has not been rolled back.
runbook_url: {{ .Values.defaultRules.runbookUrl }}/kubernetes/kubestatefulsetgenerationmismatch
summary: StatefulSet generation mismatch due to possible roll-back
expr: |-
- kube_statefulset_status_observed_generation{job="kube-state-metrics", namespace=~"{{ $targetNamespace }}"}
+ kube_statefulset_status_observed_generation{namespace=~"{{ $targetNamespace }}"}
!=
- kube_statefulset_metadata_generation{job="kube-state-metrics", namespace=~"{{ $targetNamespace }}"}
- for: 15m
+ kube_statefulset_metadata_generation{namespace=~"{{ $targetNamespace }}"}
+ for: {{ dig "KubeStatefulSetGenerationMismatch" "for" "15m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: warning
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "KubeStatefulSetGenerationMismatch" "severity" "warning" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.kubernetesApps }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.kubernetesApps }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.KubeStatefulSetUpdateNotRolledOut | default false) }}
@@ -164,32 +260,43 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.kubernetesApps }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.kubernetesApps | indent 8 }}
+{{- end }}
description: StatefulSet {{`{{`}} $labels.namespace {{`}}`}}/{{`{{`}} $labels.statefulset {{`}}`}} update has not been rolled out.
runbook_url: {{ .Values.defaultRules.runbookUrl }}/kubernetes/kubestatefulsetupdatenotrolledout
summary: StatefulSet update has not been rolled out.
expr: |-
(
max without (revision) (
- kube_statefulset_status_current_revision{job="kube-state-metrics", namespace=~"{{ $targetNamespace }}"}
+ kube_statefulset_status_current_revision{namespace=~"{{ $targetNamespace }}"}
unless
- kube_statefulset_status_update_revision{job="kube-state-metrics", namespace=~"{{ $targetNamespace }}"}
+ kube_statefulset_status_update_revision{namespace=~"{{ $targetNamespace }}"}
)
*
(
- kube_statefulset_replicas{job="kube-state-metrics", namespace=~"{{ $targetNamespace }}"}
+ kube_statefulset_replicas{namespace=~"{{ $targetNamespace }}"}
!=
- kube_statefulset_status_replicas_updated{job="kube-state-metrics", namespace=~"{{ $targetNamespace }}"}
+ kube_statefulset_status_replicas_updated{namespace=~"{{ $targetNamespace }}"}
)
) and (
- changes(kube_statefulset_status_replicas_updated{job="kube-state-metrics", namespace=~"{{ $targetNamespace }}"}[5m])
+ changes(kube_statefulset_status_replicas_updated{namespace=~"{{ $targetNamespace }}"}[5m])
==
0
)
- for: 15m
+ for: {{ dig "KubeStatefulSetUpdateNotRolledOut" "for" "15m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: warning
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "KubeStatefulSetUpdateNotRolledOut" "severity" "warning" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.kubernetesApps }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.kubernetesApps }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.KubeDaemonSetRolloutStuck | default false) }}
@@ -198,38 +305,49 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.kubernetesApps }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.kubernetesApps | indent 8 }}
+{{- end }}
description: DaemonSet {{`{{`}} $labels.namespace {{`}}`}}/{{`{{`}} $labels.daemonset {{`}}`}} has not finished or progressed for at least 15 minutes.
runbook_url: {{ .Values.defaultRules.runbookUrl }}/kubernetes/kubedaemonsetrolloutstuck
summary: DaemonSet rollout is stuck.
expr: |-
(
(
- kube_daemonset_status_current_number_scheduled{job="kube-state-metrics", namespace=~"{{ $targetNamespace }}"}
+ kube_daemonset_status_current_number_scheduled{namespace=~"{{ $targetNamespace }}"}
!=
- kube_daemonset_status_desired_number_scheduled{job="kube-state-metrics", namespace=~"{{ $targetNamespace }}"}
+ kube_daemonset_status_desired_number_scheduled{namespace=~"{{ $targetNamespace }}"}
) or (
- kube_daemonset_status_number_misscheduled{job="kube-state-metrics", namespace=~"{{ $targetNamespace }}"}
+ kube_daemonset_status_number_misscheduled{namespace=~"{{ $targetNamespace }}"}
!=
0
) or (
- kube_daemonset_status_updated_number_scheduled{job="kube-state-metrics", namespace=~"{{ $targetNamespace }}"}
+ kube_daemonset_status_updated_number_scheduled{namespace=~"{{ $targetNamespace }}"}
!=
- kube_daemonset_status_desired_number_scheduled{job="kube-state-metrics", namespace=~"{{ $targetNamespace }}"}
+ kube_daemonset_status_desired_number_scheduled{namespace=~"{{ $targetNamespace }}"}
) or (
- kube_daemonset_status_number_available{job="kube-state-metrics", namespace=~"{{ $targetNamespace }}"}
+ kube_daemonset_status_number_available{namespace=~"{{ $targetNamespace }}"}
!=
- kube_daemonset_status_desired_number_scheduled{job="kube-state-metrics", namespace=~"{{ $targetNamespace }}"}
+ kube_daemonset_status_desired_number_scheduled{namespace=~"{{ $targetNamespace }}"}
)
) and (
- changes(kube_daemonset_status_updated_number_scheduled{job="kube-state-metrics", namespace=~"{{ $targetNamespace }}"}[5m])
+ changes(kube_daemonset_status_updated_number_scheduled{namespace=~"{{ $targetNamespace }}"}[5m])
==
0
)
- for: 15m
+ for: {{ dig "KubeDaemonSetRolloutStuck" "for" "15m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: warning
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "KubeDaemonSetRolloutStuck" "severity" "warning" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.kubernetesApps }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.kubernetesApps }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.KubeContainerWaiting | default false) }}
@@ -238,15 +356,26 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.kubernetesApps }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.kubernetesApps | indent 8 }}
+{{- end }}
description: pod/{{`{{`}} $labels.pod {{`}}`}} in namespace {{`{{`}} $labels.namespace {{`}}`}} on container {{`{{`}} $labels.container{{`}}`}} has been in waiting state for longer than 1 hour.
runbook_url: {{ .Values.defaultRules.runbookUrl }}/kubernetes/kubecontainerwaiting
summary: Pod container waiting longer than 1 hour
- expr: sum by (namespace, pod, container, cluster) (kube_pod_container_status_waiting_reason{job="kube-state-metrics", namespace=~"{{ $targetNamespace }}"}) > 0
- for: 1h
+ expr: sum by ({{ range $.Values.defaultRules.additionalAggregationLabels }}{{ . }},{{ end }}namespace, pod, container, cluster) (kube_pod_container_status_waiting_reason{namespace=~"{{ $targetNamespace }}"}) > 0
+ for: {{ dig "KubeContainerWaiting" "for" "1h" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: warning
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "KubeContainerWaiting" "severity" "warning" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.kubernetesApps }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.kubernetesApps }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.KubeDaemonSetNotScheduled | default false) }}
@@ -255,18 +384,29 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.kubernetesApps }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.kubernetesApps | indent 8 }}
+{{- end }}
description: '{{`{{`}} $value {{`}}`}} Pods of DaemonSet {{`{{`}} $labels.namespace {{`}}`}}/{{`{{`}} $labels.daemonset {{`}}`}} are not scheduled.'
runbook_url: {{ .Values.defaultRules.runbookUrl }}/kubernetes/kubedaemonsetnotscheduled
summary: DaemonSet pods are not scheduled.
expr: |-
- kube_daemonset_status_desired_number_scheduled{job="kube-state-metrics", namespace=~"{{ $targetNamespace }}"}
+ kube_daemonset_status_desired_number_scheduled{namespace=~"{{ $targetNamespace }}"}
-
- kube_daemonset_status_current_number_scheduled{job="kube-state-metrics", namespace=~"{{ $targetNamespace }}"} > 0
- for: 10m
+ kube_daemonset_status_current_number_scheduled{namespace=~"{{ $targetNamespace }}"} > 0
+ for: {{ dig "KubeDaemonSetNotScheduled" "for" "10m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: warning
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "KubeDaemonSetNotScheduled" "severity" "warning" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.kubernetesApps }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.kubernetesApps }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.KubeDaemonSetMisScheduled | default false) }}
@@ -275,15 +415,26 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.kubernetesApps }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.kubernetesApps | indent 8 }}
+{{- end }}
description: '{{`{{`}} $value {{`}}`}} Pods of DaemonSet {{`{{`}} $labels.namespace {{`}}`}}/{{`{{`}} $labels.daemonset {{`}}`}} are running where they are not supposed to run.'
runbook_url: {{ .Values.defaultRules.runbookUrl }}/kubernetes/kubedaemonsetmisscheduled
summary: DaemonSet pods are misscheduled.
- expr: kube_daemonset_status_number_misscheduled{job="kube-state-metrics", namespace=~"{{ $targetNamespace }}"} > 0
- for: 15m
+ expr: kube_daemonset_status_number_misscheduled{namespace=~"{{ $targetNamespace }}"} > 0
+ for: {{ dig "KubeDaemonSetMisScheduled" "for" "15m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: warning
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "KubeDaemonSetMisScheduled" "severity" "warning" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.kubernetesApps }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.kubernetesApps }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.KubeJobNotCompleted | default false) }}
@@ -292,17 +443,25 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.kubernetesApps }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.kubernetesApps | indent 8 }}
+{{- end }}
description: Job {{`{{`}} $labels.namespace {{`}}`}}/{{`{{`}} $labels.job_name {{`}}`}} is taking more than {{`{{`}} "43200" | humanizeDuration {{`}}`}} to complete.
runbook_url: {{ .Values.defaultRules.runbookUrl }}/kubernetes/kubejobnotcompleted
summary: Job did not complete in time
expr: |-
- time() - max by(namespace, job_name, cluster) (kube_job_status_start_time{job="kube-state-metrics", namespace=~"{{ $targetNamespace }}"}
+ time() - max by ({{ range $.Values.defaultRules.additionalAggregationLabels }}{{ . }},{{ end }}namespace, job_name, cluster) (kube_job_status_start_time{namespace=~"{{ $targetNamespace }}"}
and
- kube_job_status_active{job="kube-state-metrics", namespace=~"{{ $targetNamespace }}"} > 0) > 43200
+ kube_job_status_active{namespace=~"{{ $targetNamespace }}"} > 0) > 43200
labels:
- severity: warning
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "KubeJobNotCompleted" "severity" "warning" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.kubernetesApps }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.kubernetesApps }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.KubeJobFailed | default false) }}
@@ -311,15 +470,26 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.kubernetesApps }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.kubernetesApps | indent 8 }}
+{{- end }}
description: Job {{`{{`}} $labels.namespace {{`}}`}}/{{`{{`}} $labels.job_name {{`}}`}} failed to complete. Removing failed job after investigation should clear this alert.
runbook_url: {{ .Values.defaultRules.runbookUrl }}/kubernetes/kubejobfailed
summary: Job failed to complete.
- expr: kube_job_failed{job="kube-state-metrics", namespace=~"{{ $targetNamespace }}"} > 0
- for: 15m
+ expr: kube_job_failed{namespace=~"{{ $targetNamespace }}"} > 0
+ for: {{ dig "KubeJobFailed" "for" "15m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: warning
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "KubeJobFailed" "severity" "warning" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.kubernetesApps }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.kubernetesApps }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.KubeHpaReplicasMismatch | default false) }}
@@ -328,28 +498,39 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.kubernetesApps }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.kubernetesApps | indent 8 }}
+{{- end }}
description: HPA {{`{{`}} $labels.namespace {{`}}`}}/{{`{{`}} $labels.horizontalpodautoscaler {{`}}`}} has not matched the desired number of replicas for longer than 15 minutes.
runbook_url: {{ .Values.defaultRules.runbookUrl }}/kubernetes/kubehpareplicasmismatch
summary: HPA has not matched desired number of replicas.
expr: |-
- (kube_horizontalpodautoscaler_status_desired_replicas{job="kube-state-metrics", namespace=~"{{ $targetNamespace }}"}
+ (kube_horizontalpodautoscaler_status_desired_replicas{namespace=~"{{ $targetNamespace }}"}
!=
- kube_horizontalpodautoscaler_status_current_replicas{job="kube-state-metrics", namespace=~"{{ $targetNamespace }}"})
+ kube_horizontalpodautoscaler_status_current_replicas{namespace=~"{{ $targetNamespace }}"})
and
- (kube_horizontalpodautoscaler_status_current_replicas{job="kube-state-metrics", namespace=~"{{ $targetNamespace }}"}
+ (kube_horizontalpodautoscaler_status_current_replicas{namespace=~"{{ $targetNamespace }}"}
>
- kube_horizontalpodautoscaler_spec_min_replicas{job="kube-state-metrics", namespace=~"{{ $targetNamespace }}"})
+ kube_horizontalpodautoscaler_spec_min_replicas{namespace=~"{{ $targetNamespace }}"})
and
- (kube_horizontalpodautoscaler_status_current_replicas{job="kube-state-metrics", namespace=~"{{ $targetNamespace }}"}
+ (kube_horizontalpodautoscaler_status_current_replicas{namespace=~"{{ $targetNamespace }}"}
<
- kube_horizontalpodautoscaler_spec_max_replicas{job="kube-state-metrics", namespace=~"{{ $targetNamespace }}"})
+ kube_horizontalpodautoscaler_spec_max_replicas{namespace=~"{{ $targetNamespace }}"})
and
- changes(kube_horizontalpodautoscaler_status_current_replicas{job="kube-state-metrics", namespace=~"{{ $targetNamespace }}"}[15m]) == 0
- for: 15m
+ changes(kube_horizontalpodautoscaler_status_current_replicas{namespace=~"{{ $targetNamespace }}"}[15m]) == 0
+ for: {{ dig "KubeHpaReplicasMismatch" "for" "15m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: warning
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "KubeHpaReplicasMismatch" "severity" "warning" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.kubernetesApps }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.kubernetesApps }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.KubeHpaMaxedOut | default false) }}
@@ -358,18 +539,29 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.kubernetesApps }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.kubernetesApps | indent 8 }}
+{{- end }}
description: HPA {{`{{`}} $labels.namespace {{`}}`}}/{{`{{`}} $labels.horizontalpodautoscaler {{`}}`}} has been running at max replicas for longer than 15 minutes.
runbook_url: {{ .Values.defaultRules.runbookUrl }}/kubernetes/kubehpamaxedout
summary: HPA is running at max replicas
expr: |-
- kube_horizontalpodautoscaler_status_current_replicas{job="kube-state-metrics", namespace=~"{{ $targetNamespace }}"}
+ kube_horizontalpodautoscaler_status_current_replicas{namespace=~"{{ $targetNamespace }}"}
==
- kube_horizontalpodautoscaler_spec_max_replicas{job="kube-state-metrics", namespace=~"{{ $targetNamespace }}"}
- for: 15m
+ kube_horizontalpodautoscaler_spec_max_replicas{namespace=~"{{ $targetNamespace }}"}
+ for: {{ dig "KubeHpaMaxedOut" "for" "15m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: warning
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "KubeHpaMaxedOut" "severity" "warning" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.kubernetesApps }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.kubernetesApps }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- end }}
\ No newline at end of file
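Besides dropping the `job="kube-state-metrics"` selectors, this file introduces the `KubeDeploymentRolloutStuck` alert and keeps every rule behind a `defaultRules.disabled.<AlertName>` switch. An illustrative values fragment for disabling or tuning individual rules (alert names are from this file, the values are examples only):

```yaml
defaultRules:
  disabled:
    KubeDeploymentRolloutStuck: true   # skip rendering this alert entirely
customRules:
  KubeHpaMaxedOut:
    for: 30m
    severity: info
```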

View File

@@ -0,0 +1,153 @@
--- charts-original/templates/prometheus/rules-1.14/kubernetes-storage.yaml
+++ charts/templates/prometheus/rules-1.14/kubernetes-storage.yaml
@@ -31,6 +31,9 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.kubernetesStorage }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.kubernetesStorage | indent 8 }}
+{{- end }}
description: The PersistentVolume claimed by {{`{{`}} $labels.persistentvolumeclaim {{`}}`}} in Namespace {{`{{`}} $labels.namespace {{`}}`}} is only {{`{{`}} $value | humanizePercentage {{`}}`}} free.
runbook_url: {{ .Values.defaultRules.runbookUrl }}/kubernetes/kubepersistentvolumefillingup
summary: PersistentVolume is filling up.
@@ -43,13 +46,21 @@
kubelet_volume_stats_used_bytes{namespace=~"{{ $targetNamespace }}", metrics_path="/metrics"} > 0
unless on(namespace, persistentvolumeclaim)
kube_persistentvolumeclaim_access_mode{ access_mode="ReadOnlyMany"} == 1
- unless on(namespace, persistentvolumeclaim)
+ unless on ({{ range $.Values.defaultRules.additionalAggregationLabels }}{{ . }},{{ end }}cluster, namespace, persistentvolumeclaim)
kube_persistentvolumeclaim_labels{label_excluded_from_alerts="true"} == 1
- for: 1m
+ for: {{ dig "KubePersistentVolumeFillingUp" "for" "1m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: critical
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "KubePersistentVolumeFillingUp" "severity" "critical" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.kubernetesStorage }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.kubernetesStorage }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.KubePersistentVolumeFillingUp | default false) }}
@@ -73,13 +84,21 @@
predict_linear(kubelet_volume_stats_available_bytes{namespace=~"{{ $targetNamespace }}", metrics_path="/metrics"}[6h], 4 * 24 * 3600) < 0
unless on(namespace, persistentvolumeclaim)
kube_persistentvolumeclaim_access_mode{ access_mode="ReadOnlyMany"} == 1
- unless on(namespace, persistentvolumeclaim)
+ unless on ({{ range $.Values.defaultRules.additionalAggregationLabels }}{{ . }},{{ end }}cluster, namespace, persistentvolumeclaim)
kube_persistentvolumeclaim_labels{label_excluded_from_alerts="true"} == 1
- for: 1h
+ for: {{ dig "KubePersistentVolumeFillingUp" "for" "1h" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: warning
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "KubePersistentVolumeFillingUp" "severity" "warning" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.kubernetesStorage }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.kubernetesStorage }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.KubePersistentVolumeInodesFillingUp | default false) }}
@@ -101,13 +120,21 @@
kubelet_volume_stats_inodes_used{namespace=~"{{ $targetNamespace }}", metrics_path="/metrics"} > 0
unless on(namespace, persistentvolumeclaim)
kube_persistentvolumeclaim_access_mode{ access_mode="ReadOnlyMany"} == 1
- unless on(namespace, persistentvolumeclaim)
+ unless on ({{ range $.Values.defaultRules.additionalAggregationLabels }}{{ . }},{{ end }}cluster, namespace, persistentvolumeclaim)
kube_persistentvolumeclaim_labels{label_excluded_from_alerts="true"} == 1
- for: 1m
+ for: {{ dig "KubePersistentVolumeInodesFillingUp" "for" "1m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: critical
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "KubePersistentVolumeInodesFillingUp" "severity" "critical" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.kubernetesStorage }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.kubernetesStorage }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.KubePersistentVolumeInodesFillingUp | default false) }}
@@ -131,13 +158,21 @@
predict_linear(kubelet_volume_stats_inodes_free{namespace=~"{{ $targetNamespace }}", metrics_path="/metrics"}[6h], 4 * 24 * 3600) < 0
unless on(namespace, persistentvolumeclaim)
kube_persistentvolumeclaim_access_mode{ access_mode="ReadOnlyMany"} == 1
- unless on(namespace, persistentvolumeclaim)
+ unless on ({{ range $.Values.defaultRules.additionalAggregationLabels }}{{ . }},{{ end }}cluster, namespace, persistentvolumeclaim)
kube_persistentvolumeclaim_labels{label_excluded_from_alerts="true"} == 1
- for: 1h
+ for: {{ dig "KubePersistentVolumeInodesFillingUp" "for" "1h" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: warning
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "KubePersistentVolumeInodesFillingUp" "severity" "warning" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.kubernetesStorage }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.kubernetesStorage }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.KubePersistentVolumeErrors | default false) }}
@@ -146,15 +181,26 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.kubernetesStorage }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.kubernetesStorage | indent 8 }}
+{{- end }}
description: The persistent volume {{`{{`}} $labels.persistentvolume {{`}}`}} has status {{`{{`}} $labels.phase {{`}}`}}.
runbook_url: {{ .Values.defaultRules.runbookUrl }}/kubernetes/kubepersistentvolumeerrors
summary: PersistentVolume is having issues with provisioning.
- expr: kube_persistentvolume_status_phase{phase=~"Failed|Pending",job="kube-state-metrics"} > 0
- for: 5m
+ expr: kube_persistentvolume_status_phase{phase=~"Failed|Pending"} > 0
+ for: {{ dig "KubePersistentVolumeErrors" "for" "5m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: critical
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "KubePersistentVolumeErrors" "severity" "critical" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.kubernetesStorage }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.kubernetesStorage }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- end }}
\ No newline at end of file
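The storage rules above use the same pattern plus group-scoped labels, annotations, and per-rule disable flags. These hooks could be driven from values along these lines (the key names come from the template; the label, annotation, and toggle values are only examples):

```yaml
defaultRules:
  additionalRuleGroupLabels:
    kubernetesStorage:
      team: storage                    # example label added to every alert in this group
  additionalRuleGroupAnnotations:
    kubernetesStorage:
      dashboard: pvc-usage             # example annotation added to every alert in this group
  disabled:
    KubePersistentVolumeErrors: true   # switches off that single rule
```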

@@ -0,0 +1,716 @@
--- charts-original/templates/prometheus/rules-1.14/prometheus.yaml
+++ charts/templates/prometheus/rules-1.14/prometheus.yaml
@@ -32,6 +32,9 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.prometheus }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.prometheus | indent 8 }}
+{{- end }}
description: Prometheus {{`{{`}}$labels.namespace{{`}}`}}/{{`{{`}}$labels.pod{{`}}`}} has failed to reload its configuration.
runbook_url: {{ .Values.defaultRules.runbookUrl }}/prometheus/prometheusbadconfig
summary: Failed Prometheus configuration reload.
@@ -39,11 +42,47 @@
# Without max_over_time, failed scrapes could create false negatives, see
# https://www.robustperception.io/alerting-on-gauges-in-prometheus-2-0 for details.
max_over_time(prometheus_config_last_reload_successful{job="{{ $prometheusJob }}",namespace="{{ $namespace }}"}[5m]) == 0
- for: 10m
+ for: {{ dig "PrometheusBadConfig" "for" "10m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: critical
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "PrometheusBadConfig" "severity" "critical" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- end }}
+{{- end }}
+{{- if not (.Values.defaultRules.disabled.PrometheusSDRefreshFailure | default false) }}
+ - alert: PrometheusSDRefreshFailure
+ annotations:
+{{- if .Values.defaultRules.additionalRuleAnnotations }}
+{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
+{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.prometheus }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.prometheus | indent 8 }}
+{{- end }}
+ description: Prometheus {{`{{`}}$labels.namespace{{`}}`}}/{{`{{`}}$labels.pod{{`}}`}} has failed to refresh SD with mechanism {{`{{`}}$labels.mechanism{{`}}`}}.
+ runbook_url: {{ .Values.defaultRules.runbookUrl }}/prometheus/prometheussdrefreshfailure
+ summary: Failed Prometheus SD refresh.
+ expr: increase(prometheus_sd_refresh_failures_total{job="{{ $prometheusJob }}",namespace="{{ $namespace }}"}[10m]) > 0
+ for: {{ dig "PrometheusSDRefreshFailure" "for" "20m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
+ labels:
+ severity: {{ dig "PrometheusSDRefreshFailure" "severity" "warning" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.PrometheusNotificationQueueRunningFull | default false) }}
@@ -52,6 +91,9 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.prometheus }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.prometheus | indent 8 }}
+{{- end }}
description: Alert notification queue of Prometheus {{`{{`}}$labels.namespace{{`}}`}}/{{`{{`}}$labels.pod{{`}}`}} is running full.
runbook_url: {{ .Values.defaultRules.runbookUrl }}/prometheus/prometheusnotificationqueuerunningfull
summary: Prometheus alert notification queue predicted to run full in less than 30m.
@@ -63,11 +105,19 @@
>
min_over_time(prometheus_notifications_queue_capacity{job="{{ $prometheusJob }}",namespace="{{ $namespace }}"}[5m])
)
- for: 15m
+ for: {{ dig "PrometheusNotificationQueueRunningFull" "for" "15m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: warning
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "PrometheusNotificationQueueRunningFull" "severity" "warning" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.PrometheusErrorSendingAlertsToSomeAlertmanagers | default false) }}
@@ -76,6 +126,9 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.prometheus }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.prometheus | indent 8 }}
+{{- end }}
description: '{{`{{`}} printf "%.1f" $value {{`}}`}}% errors while sending alerts from Prometheus {{`{{`}}$labels.namespace{{`}}`}}/{{`{{`}}$labels.pod{{`}}`}} to Alertmanager {{`{{`}}$labels.alertmanager{{`}}`}}.'
runbook_url: {{ .Values.defaultRules.runbookUrl }}/prometheus/prometheuserrorsendingalertstosomealertmanagers
summary: Prometheus has encountered more than 1% errors sending alerts to a specific Alertmanager.
@@ -87,11 +140,19 @@
)
* 100
> 1
- for: 15m
+ for: {{ dig "PrometheusErrorSendingAlertsToSomeAlertmanagers" "for" "15m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: warning
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "PrometheusErrorSendingAlertsToSomeAlertmanagers" "severity" "warning" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.PrometheusNotConnectedToAlertmanagers | default false) }}
@@ -100,6 +161,9 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.prometheus }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.prometheus | indent 8 }}
+{{- end }}
description: Prometheus {{`{{`}}$labels.namespace{{`}}`}}/{{`{{`}}$labels.pod{{`}}`}} is not connected to any Alertmanagers.
runbook_url: {{ .Values.defaultRules.runbookUrl }}/prometheus/prometheusnotconnectedtoalertmanagers
summary: Prometheus is not connected to any Alertmanagers.
@@ -107,11 +171,19 @@
# Without max_over_time, failed scrapes could create false negatives, see
# https://www.robustperception.io/alerting-on-gauges-in-prometheus-2-0 for details.
max_over_time(prometheus_notifications_alertmanagers_discovered{job="{{ $prometheusJob }}",namespace="{{ $namespace }}"}[5m]) < 1
- for: 10m
+ for: {{ dig "PrometheusNotConnectedToAlertmanagers" "for" "10m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: warning
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "PrometheusNotConnectedToAlertmanagers" "severity" "warning" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.PrometheusTSDBReloadsFailing | default false) }}
@@ -120,15 +192,26 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.prometheus }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.prometheus | indent 8 }}
+{{- end }}
description: Prometheus {{`{{`}}$labels.namespace{{`}}`}}/{{`{{`}}$labels.pod{{`}}`}} has detected {{`{{`}}$value | humanize{{`}}`}} reload failures over the last 3h.
runbook_url: {{ .Values.defaultRules.runbookUrl }}/prometheus/prometheustsdbreloadsfailing
summary: Prometheus has issues reloading blocks from disk.
expr: increase(prometheus_tsdb_reloads_failures_total{job="{{ $prometheusJob }}",namespace="{{ $namespace }}"}[3h]) > 0
- for: 4h
+ for: {{ dig "PrometheusTSDBReloadsFailing" "for" "4h" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: warning
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "PrometheusTSDBReloadsFailing" "severity" "warning" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.PrometheusTSDBCompactionsFailing | default false) }}
@@ -137,15 +220,26 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.prometheus }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.prometheus | indent 8 }}
+{{- end }}
description: Prometheus {{`{{`}}$labels.namespace{{`}}`}}/{{`{{`}}$labels.pod{{`}}`}} has detected {{`{{`}}$value | humanize{{`}}`}} compaction failures over the last 3h.
runbook_url: {{ .Values.defaultRules.runbookUrl }}/prometheus/prometheustsdbcompactionsfailing
summary: Prometheus has issues compacting blocks.
expr: increase(prometheus_tsdb_compactions_failed_total{job="{{ $prometheusJob }}",namespace="{{ $namespace }}"}[3h]) > 0
- for: 4h
+ for: {{ dig "PrometheusTSDBCompactionsFailing" "for" "4h" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: warning
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "PrometheusTSDBCompactionsFailing" "severity" "warning" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.PrometheusNotIngestingSamples | default false) }}
@@ -154,12 +248,15 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.prometheus }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.prometheus | indent 8 }}
+{{- end }}
description: Prometheus {{`{{`}}$labels.namespace{{`}}`}}/{{`{{`}}$labels.pod{{`}}`}} is not ingesting samples.
runbook_url: {{ .Values.defaultRules.runbookUrl }}/prometheus/prometheusnotingestingsamples
summary: Prometheus is not ingesting samples.
expr: |-
(
- rate(prometheus_tsdb_head_samples_appended_total{job="{{ $prometheusJob }}",namespace="{{ $namespace }}"}[5m]) <= 0
+ sum without(type) (rate(prometheus_tsdb_head_samples_appended_total{job="{{ $prometheusJob }}",namespace="{{ $namespace }}"}[5m])) <= 0
and
(
sum without(scrape_job) (prometheus_target_metadata_cache_entries{job="{{ $prometheusJob }}",namespace="{{ $namespace }}"}) > 0
@@ -167,11 +264,19 @@
sum without(rule_group) (prometheus_rule_group_rules{job="{{ $prometheusJob }}",namespace="{{ $namespace }}"}) > 0
)
)
- for: 10m
+ for: {{ dig "PrometheusNotIngestingSamples" "for" "10m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: warning
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "PrometheusNotIngestingSamples" "severity" "warning" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.PrometheusDuplicateTimestamps | default false) }}
@@ -180,15 +285,26 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.prometheus }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.prometheus | indent 8 }}
+{{- end }}
description: Prometheus {{`{{`}}$labels.namespace{{`}}`}}/{{`{{`}}$labels.pod{{`}}`}} is dropping {{`{{`}} printf "%.4g" $value {{`}}`}} samples/s with different values but duplicated timestamp.
runbook_url: {{ .Values.defaultRules.runbookUrl }}/prometheus/prometheusduplicatetimestamps
summary: Prometheus is dropping samples with duplicate timestamps.
expr: rate(prometheus_target_scrapes_sample_duplicate_timestamp_total{job="{{ $prometheusJob }}",namespace="{{ $namespace }}"}[5m]) > 0
- for: 10m
+ for: {{ dig "PrometheusDuplicateTimestamps" "for" "10m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: warning
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "PrometheusDuplicateTimestamps" "severity" "warning" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.PrometheusOutOfOrderTimestamps | default false) }}
@@ -197,15 +313,26 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.prometheus }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.prometheus | indent 8 }}
+{{- end }}
description: Prometheus {{`{{`}}$labels.namespace{{`}}`}}/{{`{{`}}$labels.pod{{`}}`}} is dropping {{`{{`}} printf "%.4g" $value {{`}}`}} samples/s with timestamps arriving out of order.
runbook_url: {{ .Values.defaultRules.runbookUrl }}/prometheus/prometheusoutofordertimestamps
summary: Prometheus drops samples with out-of-order timestamps.
expr: rate(prometheus_target_scrapes_sample_out_of_order_total{job="{{ $prometheusJob }}",namespace="{{ $namespace }}"}[5m]) > 0
- for: 10m
+ for: {{ dig "PrometheusOutOfOrderTimestamps" "for" "10m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: warning
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "PrometheusOutOfOrderTimestamps" "severity" "warning" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.PrometheusRemoteStorageFailures | default false) }}
@@ -214,6 +341,9 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.prometheus }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.prometheus | indent 8 }}
+{{- end }}
description: Prometheus {{`{{`}}$labels.namespace{{`}}`}}/{{`{{`}}$labels.pod{{`}}`}} failed to send {{`{{`}} printf "%.1f" $value {{`}}`}}% of the samples to {{`{{`}} $labels.remote_name{{`}}`}}:{{`{{`}} $labels.url {{`}}`}}
runbook_url: {{ .Values.defaultRules.runbookUrl }}/prometheus/prometheusremotestoragefailures
summary: Prometheus fails to send samples to remote storage.
@@ -229,11 +359,19 @@
)
* 100
> 1
- for: 15m
+ for: {{ dig "PrometheusRemoteStorageFailures" "for" "15m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: critical
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "PrometheusRemoteStorageFailures" "severity" "critical" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.PrometheusRemoteWriteBehind | default false) }}
@@ -242,6 +380,9 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.prometheus }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.prometheus | indent 8 }}
+{{- end }}
description: Prometheus {{`{{`}}$labels.namespace{{`}}`}}/{{`{{`}}$labels.pod{{`}}`}} remote write is {{`{{`}} printf "%.1f" $value {{`}}`}}s behind for {{`{{`}} $labels.remote_name{{`}}`}}:{{`{{`}} $labels.url {{`}}`}}.
runbook_url: {{ .Values.defaultRules.runbookUrl }}/prometheus/prometheusremotewritebehind
summary: Prometheus remote write is behind.
@@ -254,11 +395,19 @@
max_over_time(prometheus_remote_storage_queue_highest_sent_timestamp_seconds{job="{{ $prometheusJob }}",namespace="{{ $namespace }}"}[5m])
)
> 120
- for: 15m
+ for: {{ dig "PrometheusRemoteWriteBehind" "for" "15m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: critical
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "PrometheusRemoteWriteBehind" "severity" "critical" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.PrometheusRemoteWriteDesiredShards | default false) }}
@@ -267,6 +416,9 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.prometheus }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.prometheus | indent 8 }}
+{{- end }}
description: Prometheus {{`{{`}}$labels.namespace{{`}}`}}/{{`{{`}}$labels.pod{{`}}`}} remote write desired shards calculation wants to run {{`{{`}} $value {{`}}`}} shards for queue {{`{{`}} $labels.remote_name{{`}}`}}:{{`{{`}} $labels.url {{`}}`}}, which is more than the max of {{`{{`}} printf `prometheus_remote_storage_shards_max{instance="%s",job="{{ $prometheusJob }}",namespace="{{ $namespace }}"}` $labels.instance | query | first | value {{`}}`}}.
runbook_url: {{ .Values.defaultRules.runbookUrl }}/prometheus/prometheusremotewritedesiredshards
summary: Prometheus remote write desired shards calculation wants to run more than configured max shards.
@@ -278,11 +430,19 @@
>
max_over_time(prometheus_remote_storage_shards_max{job="{{ $prometheusJob }}",namespace="{{ $namespace }}"}[5m])
)
- for: 15m
+ for: {{ dig "PrometheusRemoteWriteDesiredShards" "for" "15m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: warning
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "PrometheusRemoteWriteDesiredShards" "severity" "warning" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.PrometheusRuleFailures | default false) }}
@@ -291,15 +451,26 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.prometheus }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.prometheus | indent 8 }}
+{{- end }}
description: Prometheus {{`{{`}}$labels.namespace{{`}}`}}/{{`{{`}}$labels.pod{{`}}`}} has failed to evaluate {{`{{`}} printf "%.0f" $value {{`}}`}} rules in the last 5m.
runbook_url: {{ .Values.defaultRules.runbookUrl }}/prometheus/prometheusrulefailures
summary: Prometheus is failing rule evaluations.
expr: increase(prometheus_rule_evaluation_failures_total{job="{{ $prometheusJob }}",namespace="{{ $namespace }}"}[5m]) > 0
- for: 15m
+ for: {{ dig "PrometheusRuleFailures" "for" "15m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: critical
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "PrometheusRuleFailures" "severity" "critical" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.PrometheusMissingRuleEvaluations | default false) }}
@@ -308,15 +479,26 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.prometheus }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.prometheus | indent 8 }}
+{{- end }}
description: Prometheus {{`{{`}}$labels.namespace{{`}}`}}/{{`{{`}}$labels.pod{{`}}`}} has missed {{`{{`}} printf "%.0f" $value {{`}}`}} rule group evaluations in the last 5m.
runbook_url: {{ .Values.defaultRules.runbookUrl }}/prometheus/prometheusmissingruleevaluations
summary: Prometheus is missing rule evaluations due to slow rule group evaluation.
expr: increase(prometheus_rule_group_iterations_missed_total{job="{{ $prometheusJob }}",namespace="{{ $namespace }}"}[5m]) > 0
- for: 15m
+ for: {{ dig "PrometheusMissingRuleEvaluations" "for" "15m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: warning
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "PrometheusMissingRuleEvaluations" "severity" "warning" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.PrometheusTargetLimitHit | default false) }}
@@ -325,15 +507,26 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.prometheus }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.prometheus | indent 8 }}
+{{- end }}
description: Prometheus {{`{{`}}$labels.namespace{{`}}`}}/{{`{{`}}$labels.pod{{`}}`}} has dropped {{`{{`}} printf "%.0f" $value {{`}}`}} targets because the number of targets exceeded the configured target_limit.
runbook_url: {{ .Values.defaultRules.runbookUrl }}/prometheus/prometheustargetlimithit
summary: Prometheus has dropped targets because some scrape configs have exceeded the targets limit.
expr: increase(prometheus_target_scrape_pool_exceeded_target_limit_total{job="{{ $prometheusJob }}",namespace="{{ $namespace }}"}[5m]) > 0
- for: 15m
+ for: {{ dig "PrometheusTargetLimitHit" "for" "15m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: warning
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "PrometheusTargetLimitHit" "severity" "warning" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.PrometheusLabelLimitHit | default false) }}
@@ -342,15 +535,26 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.prometheus }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.prometheus | indent 8 }}
+{{- end }}
description: Prometheus {{`{{`}}$labels.namespace{{`}}`}}/{{`{{`}}$labels.pod{{`}}`}} has dropped {{`{{`}} printf "%.0f" $value {{`}}`}} targets because some samples exceeded the configured label_limit, label_name_length_limit or label_value_length_limit.
runbook_url: {{ .Values.defaultRules.runbookUrl }}/prometheus/prometheuslabellimithit
summary: Prometheus has dropped targets because some scrape configs have exceeded the labels limit.
expr: increase(prometheus_target_scrape_pool_exceeded_label_limits_total{job="{{ $prometheusJob }}",namespace="{{ $namespace }}"}[5m]) > 0
- for: 15m
+ for: {{ dig "PrometheusLabelLimitHit" "for" "15m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: warning
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "PrometheusLabelLimitHit" "severity" "warning" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.PrometheusScrapeBodySizeLimitHit | default false) }}
@@ -359,15 +563,26 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.prometheus }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.prometheus | indent 8 }}
+{{- end }}
description: Prometheus {{`{{`}}$labels.namespace{{`}}`}}/{{`{{`}}$labels.pod{{`}}`}} has failed {{`{{`}} printf "%.0f" $value {{`}}`}} scrapes in the last 5m because some targets exceeded the configured body_size_limit.
runbook_url: {{ .Values.defaultRules.runbookUrl }}/prometheus/prometheusscrapebodysizelimithit
summary: Prometheus has dropped some targets that exceeded body size limit.
expr: increase(prometheus_target_scrapes_exceeded_body_size_limit_total{job="{{ $prometheusJob }}",namespace="{{ $namespace }}"}[5m]) > 0
- for: 15m
+ for: {{ dig "PrometheusScrapeBodySizeLimitHit" "for" "15m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: warning
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "PrometheusScrapeBodySizeLimitHit" "severity" "warning" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.PrometheusScrapeSampleLimitHit | default false) }}
@@ -376,15 +591,26 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.prometheus }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.prometheus | indent 8 }}
+{{- end }}
description: Prometheus {{`{{`}}$labels.namespace{{`}}`}}/{{`{{`}}$labels.pod{{`}}`}} has failed {{`{{`}} printf "%.0f" $value {{`}}`}} scrapes in the last 5m because some targets exceeded the configured sample_limit.
runbook_url: {{ .Values.defaultRules.runbookUrl }}/prometheus/prometheusscrapesamplelimithit
summary: Prometheus has failed scrapes that have exceeded the configured sample limit.
expr: increase(prometheus_target_scrapes_exceeded_sample_limit_total{job="{{ $prometheusJob }}",namespace="{{ $namespace }}"}[5m]) > 0
- for: 15m
+ for: {{ dig "PrometheusScrapeSampleLimitHit" "for" "15m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: warning
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "PrometheusScrapeSampleLimitHit" "severity" "warning" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.PrometheusTargetSyncFailure | default false) }}
@@ -393,15 +619,26 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.prometheus }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.prometheus | indent 8 }}
+{{- end }}
description: '{{`{{`}} printf "%.0f" $value {{`}}`}} targets in Prometheus {{`{{`}}$labels.namespace{{`}}`}}/{{`{{`}}$labels.pod{{`}}`}} have failed to sync because invalid configuration was supplied.'
runbook_url: {{ .Values.defaultRules.runbookUrl }}/prometheus/prometheustargetsyncfailure
summary: Prometheus has failed to sync targets.
expr: increase(prometheus_target_sync_failed_total{job="{{ $prometheusJob }}",namespace="{{ $namespace }}"}[30m]) > 0
- for: 5m
+ for: {{ dig "PrometheusTargetSyncFailure" "for" "5m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: critical
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "PrometheusTargetSyncFailure" "severity" "critical" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.PrometheusHighQueryLoad | default false) }}
@@ -410,15 +647,26 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.prometheus }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.prometheus | indent 8 }}
+{{- end }}
description: Prometheus {{`{{`}}$labels.namespace{{`}}`}}/{{`{{`}}$labels.pod{{`}}`}} query API has less than 20% available capacity in its query engine for the last 15 minutes.
runbook_url: {{ .Values.defaultRules.runbookUrl }}/prometheus/prometheushighqueryload
summary: Prometheus is reaching its maximum capacity serving concurrent requests.
expr: avg_over_time(prometheus_engine_queries{job="{{ $prometheusJob }}",namespace="{{ $namespace }}"}[5m]) / max_over_time(prometheus_engine_queries_concurrent_max{job="{{ $prometheusJob }}",namespace="{{ $namespace }}"}[5m]) > 0.8
- for: 15m
+ for: {{ dig "PrometheusHighQueryLoad" "for" "15m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: warning
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "PrometheusHighQueryLoad" "severity" "warning" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- if not (.Values.defaultRules.disabled.PrometheusErrorSendingAlertsToAnyAlertmanager | default false) }}
@@ -427,6 +675,9 @@
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
+{{- if .Values.defaultRules.additionalRuleGroupAnnotations.prometheus }}
+{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.prometheus | indent 8 }}
+{{- end }}
description: '{{`{{`}} printf "%.1f" $value {{`}}`}}% minimum errors while sending alerts from Prometheus {{`{{`}}$labels.namespace{{`}}`}}/{{`{{`}}$labels.pod{{`}}`}} to any Alertmanager.'
runbook_url: {{ .Values.defaultRules.runbookUrl }}/prometheus/prometheuserrorsendingalertstoanyalertmanager
summary: Prometheus encounters more than 3% errors sending alerts to any Alertmanager.
@@ -438,11 +689,19 @@
)
* 100
> 3
- for: 15m
+ for: {{ dig "PrometheusErrorSendingAlertsToAnyAlertmanager" "for" "15m" .Values.customRules }}
+ {{- with .Values.defaultRules.keepFiringFor }}
+ keep_firing_for: "{{ . }}"
+ {{- end }}
labels:
- severity: critical
- {{- if .Values.defaultRules.additionalRuleLabels }}
- {{ toYaml .Values.defaultRules.additionalRuleLabels | nindent 8 }}
+ severity: {{ dig "PrometheusErrorSendingAlertsToAnyAlertmanager" "severity" "critical" .Values.customRules }}
+ {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- with .Values.defaultRules.additionalRuleLabels }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
+ {{- with .Values.defaultRules.additionalRuleGroupLabels.prometheus }}
+ {{- toYaml . | nindent 8 }}
+ {{- end }}
{{- end }}
{{- end }}
{{- end }}
\ No newline at end of file
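The prometheus rule group gains the same override hooks plus a new PrometheusSDRefreshFailure alert. A hedged sketch of values that exercise them (the durations and toggles are illustrative, not defaults):

```yaml
defaultRules:
  keepFiringFor: 10m                  # rendered as keep_firing_for on every rule
  disabled:
    PrometheusSDRefreshFailure: true  # opts out of the newly added alert
customRules:
  PrometheusDuplicateTimestamps:
    severity: info                    # example per-alert severity override
```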

@@ -0,0 +1,39 @@
--- charts-original/templates/prometheus/service.yaml
+++ charts/templates/prometheus/service.yaml
@@ -1,3 +1,4 @@
+{{- $kubeTargetVersion := default .Capabilities.KubeVersion.GitVersion .Values.kubeTargetVersionOverride }}
{{- if .Values.prometheus.enabled }}
apiVersion: v1
kind: Service
@@ -42,15 +43,30 @@
{{- end }}
port: {{ .Values.prometheus.service.port }}
targetPort: {{ .Values.prometheus.service.targetPort }}
+ - name: reloader-web
+ {{- if semverCompare "> 1.20.0-0" $kubeTargetVersion }}
+ appProtocol: http
+ {{- end }}
+ port: {{ .Values.prometheus.service.reloaderWebPort }}
+ targetPort: reloader-web
{{- if .Values.prometheus.service.additionalPorts }}
{{ toYaml .Values.prometheus.service.additionalPorts | indent 2 }}
{{- end }}
publishNotReadyAddresses: {{ .Values.prometheus.service.publishNotReadyAddresses }}
selector:
+ {{- if .Values.prometheus.agentMode }}
+ app.kubernetes.io/name: prometheus-agent
+ {{- else }}
app.kubernetes.io/name: prometheus
- prometheus: {{ template "project-prometheus-stack.prometheus.crname" . }}
+ {{- end }}
+ operator.prometheus.io/name: {{ template "project-prometheus-stack.prometheus.crname" . }}
{{- if .Values.prometheus.service.sessionAffinity }}
sessionAffinity: {{ .Values.prometheus.service.sessionAffinity }}
{{- end }}
+{{- if eq .Values.prometheus.service.sessionAffinity "ClientIP" }}
+ sessionAffinityConfig:
+ clientIP:
+ timeoutSeconds: {{ .Values.prometheus.service.sessionAffinityConfig.clientIP.timeoutSeconds }}
+{{- end }}
type: "{{ .Values.prometheus.service.type }}"
{{- end }}
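The service template now exposes the config-reloader port and supports agent mode and ClientIP session affinity. The values it reads might look like the following sketch (the port number and timeout are assumptions, not defaults confirmed by this diff):

```yaml
prometheus:
  agentMode: false              # when true, the selector targets prometheus-agent pods instead
  service:
    reloaderWebPort: 8080       # assumed value for the new reloader-web port
    sessionAffinity: ClientIP
    sessionAffinityConfig:
      clientIP:
        timeoutSeconds: 10800   # only rendered when sessionAffinity is ClientIP
```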

@@ -0,0 +1,75 @@
--- charts-original/templates/prometheus/servicemonitor.yaml
+++ charts/templates/prometheus/servicemonitor.yaml
@@ -29,15 +29,15 @@
scheme: {{ .Values.prometheus.serviceMonitor.scheme }}
{{- end }}
{{- if .Values.prometheus.serviceMonitor.tlsConfig }}
- tlsConfig: {{ toYaml .Values.prometheus.serviceMonitor.tlsConfig | nindent 6 }}
+ tlsConfig: {{- toYaml .Values.prometheus.serviceMonitor.tlsConfig | nindent 6 }}
{{- end }}
{{- if .Values.prometheus.serviceMonitor.bearerTokenFile }}
bearerTokenFile: {{ .Values.prometheus.serviceMonitor.bearerTokenFile }}
{{- end }}
path: "{{ trimSuffix "/" .Values.prometheus.prometheusSpec.routePrefix }}/metrics"
- metricRelabelings:
+ metricRelabelings:
{{- if .Values.prometheus.serviceMonitor.metricRelabelings }}
- {{ tpl (toYaml .Values.prometheus.serviceMonitor.metricRelabelings | indent 6) . }}
+ {{- tpl (toYaml .Values.prometheus.serviceMonitor.metricRelabelings | nindent 6) . }}
{{- end }}
{{ if .Values.global.cattle.clusterId }}
- sourceLabels: [__address__]
@@ -49,8 +49,49 @@
targetLabel: cluster_name
replacement: {{ .Values.global.cattle.clusterName }}
{{- end }}
-{{- if .Values.prometheus.serviceMonitor.relabelings }}
- relabelings:
-{{ toYaml .Values.prometheus.serviceMonitor.relabelings | indent 6 }}
-{{- end }}
+ {{- if .Values.prometheus.serviceMonitor.relabelings }}
+ relabelings: {{- toYaml .Values.prometheus.serviceMonitor.relabelings | nindent 6 }}
+ {{- end }}
+ - port: reloader-web
+ {{- if .Values.prometheus.serviceMonitor.interval }}
+ interval: {{ .Values.prometheus.serviceMonitor.interval }}
+ {{- end }}
+ {{- if .Values.prometheus.serviceMonitor.scheme }}
+ scheme: {{ .Values.prometheus.serviceMonitor.scheme }}
+ {{- end }}
+ {{- if .Values.prometheus.serviceMonitor.tlsConfig }}
+ tlsConfig: {{- toYaml .Values.prometheus.serviceMonitor.tlsConfig | nindent 6 }}
+ {{- end }}
+ path: "/metrics"
+ {{- if .Values.prometheus.serviceMonitor.metricRelabelings }}
+ metricRelabelings: {{- tpl (toYaml .Values.prometheus.serviceMonitor.metricRelabelings | nindent 6) . }}
+ {{- end }}
+ {{- if .Values.prometheus.serviceMonitor.relabelings }}
+ relabelings: {{- toYaml .Values.prometheus.serviceMonitor.relabelings | nindent 6 }}
+ {{- end }}
+ {{- range .Values.prometheus.serviceMonitor.additionalEndpoints }}
+ - port: {{ .port }}
+ {{- if or $.Values.prometheus.serviceMonitor.interval .interval }}
+ interval: {{ default $.Values.prometheus.serviceMonitor.interval .interval }}
+ {{- end }}
+ {{- if or $.Values.prometheus.serviceMonitor.proxyUrl .proxyUrl }}
+ proxyUrl: {{ default $.Values.prometheus.serviceMonitor.proxyUrl .proxyUrl }}
+ {{- end }}
+ {{- if or $.Values.prometheus.serviceMonitor.scheme .scheme }}
+ scheme: {{ default $.Values.prometheus.serviceMonitor.scheme .scheme }}
+ {{- end }}
+ {{- if or $.Values.prometheus.serviceMonitor.bearerTokenFile .bearerTokenFile }}
+ bearerTokenFile: {{ default $.Values.prometheus.serviceMonitor.bearerTokenFile .bearerTokenFile }}
+ {{- end }}
+ {{- if or $.Values.prometheus.serviceMonitor.tlsConfig .tlsConfig }}
+ tlsConfig: {{- default $.Values.prometheus.serviceMonitor.tlsConfig .tlsConfig | toYaml | nindent 6 }}
+ {{- end }}
+ path: {{ .path }}
+ {{- if or $.Values.prometheus.serviceMonitor.metricRelabelings .metricRelabelings }}
+ metricRelabelings: {{- tpl (default $.Values.prometheus.serviceMonitor.metricRelabelings .metricRelabelings | toYaml | nindent 6) . }}
+ {{- end }}
+ {{- if or $.Values.prometheus.serviceMonitor.relabelings .relabelings }}
+ relabelings: {{- default $.Values.prometheus.serviceMonitor.relabelings .relabelings | toYaml | nindent 6 }}
+ {{- end }}
+ {{- end }}
{{- end }}
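The new `additionalEndpoints` loop lets the ServiceMonitor scrape extra named ports on the Prometheus service. An illustrative entry (the port name assumes an `oauth-metrics` port like the one in the Alertmanager `additionalPorts` example further down; the template always renders `port` and `path`, while the remaining fields fall back to the top-level serviceMonitor settings):

```yaml
prometheus:
  serviceMonitor:
    additionalEndpoints:
      - port: oauth-metrics   # must match a named port on the service
        path: /metrics
        interval: 30s         # optional; falls back to serviceMonitor.interval
```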

@@ -0,0 +1,13 @@
--- charts-original/templates/validate-install-crd.yaml
+++ charts/templates/validate-install-crd.yaml
@@ -4,8 +4,10 @@
# {{- set $found "monitoring.coreos.com/v1/Alertmanager" false -}}
# {{- set $found "monitoring.coreos.com/v1/PodMonitor" false -}}
# {{- set $found "monitoring.coreos.com/v1/Probe" false -}}
+# {{- set $found "monitoring.coreos.com/v1alpha1/PrometheusAgent" false -}}
# {{- set $found "monitoring.coreos.com/v1/Prometheus" false -}}
# {{- set $found "monitoring.coreos.com/v1/PrometheusRule" false -}}
+# {{- set $found "monitoring.coreos.com/v1alpha1/ScrapeConfig" false -}}
# {{- set $found "monitoring.coreos.com/v1/ServiceMonitor" false -}}
# {{- set $found "monitoring.coreos.com/v1/ThanosRuler" false -}}
# {{- range .Capabilities.APIVersions -}}

@@ -0,0 +1,556 @@
--- charts-original/values.yaml
+++ charts/values.yaml
@@ -49,6 +49,10 @@
# scmhash: abc123
# myLabel: aakkmd
+## custom Rules to override "for" and "severity" in defaultRules
+##
+customRules: {}
+
## Create default rules for monitoring the cluster
##
defaultRules:
@@ -74,6 +78,76 @@
## Additional annotations for PrometheusRule alerts
additionalRuleAnnotations: {}
+ ## Additional labels for specific PrometheusRule alert groups
+ additionalRuleGroupLabels:
+ alertmanager: {}
+ etcd: {}
+ configReloaders: {}
+ general: {}
+ k8sContainerCpuUsageSecondsTotal: {}
+ k8sContainerMemoryCache: {}
+ k8sContainerMemoryRss: {}
+ k8sContainerMemorySwap: {}
+ k8sContainerResource: {}
+ k8sPodOwner: {}
+ kubeApiserverAvailability: {}
+ kubeApiserverBurnrate: {}
+ kubeApiserverHistogram: {}
+ kubeApiserverSlos: {}
+ kubeControllerManager: {}
+ kubelet: {}
+ kubeProxy: {}
+ kubePrometheusGeneral: {}
+ kubePrometheusNodeRecording: {}
+ kubernetesApps: {}
+ kubernetesResources: {}
+ kubernetesStorage: {}
+ kubernetesSystem: {}
+ kubeSchedulerAlerting: {}
+ kubeSchedulerRecording: {}
+ kubeStateMetrics: {}
+ network: {}
+ node: {}
+ nodeExporterAlerting: {}
+ nodeExporterRecording: {}
+ prometheus: {}
+ prometheusOperator: {}
+
+ ## Additional annotations for specific PrometheusRule alerts groups
+ additionalRuleGroupAnnotations:
+ alertmanager: {}
+ etcd: {}
+ configReloaders: {}
+ general: {}
+ k8sContainerCpuUsageSecondsTotal: {}
+ k8sContainerMemoryCache: {}
+ k8sContainerMemoryRss: {}
+ k8sContainerMemorySwap: {}
+ k8sContainerResource: {}
+ k8sPodOwner: {}
+ kubeApiserverAvailability: {}
+ kubeApiserverBurnrate: {}
+ kubeApiserverHistogram: {}
+ kubeApiserverSlos: {}
+ kubeControllerManager: {}
+ kubelet: {}
+ kubeProxy: {}
+ kubePrometheusGeneral: {}
+ kubePrometheusNodeRecording: {}
+ kubernetesApps: {}
+ kubernetesResources: {}
+ kubernetesStorage: {}
+ kubernetesSystem: {}
+ kubeSchedulerAlerting: {}
+ kubeSchedulerRecording: {}
+ kubeStateMetrics: {}
+ network: {}
+ node: {}
+ nodeExporterAlerting: {}
+ nodeExporterRecording: {}
+ prometheus: {}
+ prometheusOperator: {}
+
## Prefix for runbook URLs. Use this to override the first part of the runbookURLs that is common to all rules.
runbookUrl: "https://runbooks.prometheus-operator.dev/runbooks"
@@ -112,7 +186,7 @@
create: true
userRoles:
- ## Create default user Roles that the Helm Project Operator will automatically create RoleBindings for
+ ## Create default user ClusterRoles to allow users to interact with Prometheus CRs, ConfigMaps, and Secrets
##
## How does this work?
##
@@ -126,7 +200,7 @@
## can be configured to use a different set of ClusterRoles as the source of truth for admin, edit, and view permissions.
##
create: true
- ## Add labels to Roles
+ ## Aggregate default user ClusterRoles into default k8s ClusterRoles
aggregateToDefaultRoles: true
pspAnnotations: {}
@@ -147,6 +221,10 @@
# or
# - "image-pull-secret"
+windowsMonitoring:
+ ## Deploys the windows-exporter and Windows-specific dashboards and rules (job name must be 'windows-exporter')
+ enabled: false
+
federate:
## enabled indicates whether to add federation to any Project Prometheus Stacks by default
## If not enabled, no federation will be turned on
@@ -184,6 +262,7 @@
create: true
name: ""
annotations: {}
+ automountServiceAccountToken: true
## Configure pod disruption budgets for Alertmanager
## ref: https://kubernetes.io/docs/tasks/run-application/configure-pdb/#specifying-a-poddisruptionbudget
@@ -223,6 +302,8 @@
- 'severity = info'
equal:
- 'namespace'
+ - target_matchers:
+ - 'alertname = InfoInhibitor'
route:
group_by: ['namespace']
group_wait: 30s
@@ -339,8 +420,11 @@
labels: {}
- ## Redirect ingress to an additional defined port on the service
+ ## Override ingress to a different defined port on the service
# servicePort: 8081
+ ## Override ingress to a different service than the default; this is useful if you need to
+ ## point to a specific instance of the alertmanager (eg kube-prometheus-stack-alertmanager-0)
+ # serviceName: kube-prometheus-stack-alertmanager-0
## Hosts must be provided if Ingress is enabled.
##
@@ -391,11 +475,14 @@
##
## Additional ports to open for Alertmanager service
+ ##
additionalPorts: []
- # additionalPorts:
- # - name: authenticated
+ # - name: oauth-proxy
# port: 8081
# targetPort: 8081
+ # - name: oauth-metrics
+ # port: 8082
+ # targetPort: 8082
externalIPs: []
loadBalancerIP: ""
@@ -462,7 +549,7 @@
##
image:
repository: rancher/mirrored-prometheus-alertmanager
- tag: v0.25.0
+ tag: v0.27.0
sha: ""
## If true then the user will be responsible to provide a secret with alertmanager configuration
@@ -475,6 +562,10 @@
##
secrets: []
+ ## If false then the user will opt out of automounting API credentials.
+ ##
+ automountServiceAccountToken: true
+
## ConfigMaps is a list of ConfigMaps in the same namespace as the Alertmanager object, which shall be mounted into the Alertmanager Pods.
## The ConfigMaps are mounted into /etc/alertmanager/configmaps/.
##
@@ -506,6 +597,13 @@
# alertmanagerConfiguration:
# name: global-alertmanager-Configuration
+ ## Defines the strategy used by AlertmanagerConfig objects to match alerts. eg:
+ ##
+ alertmanagerConfigMatcherStrategy: {}
+ ## Example using the OnNamespace strategy
+ # alertmanagerConfigMatcherStrategy:
+ # type: OnNamespace
+
## Define Log Format
# Use logfmt (default) or json logging
logFormat: logfmt
@@ -546,6 +644,13 @@
##
routePrefix: /
+ ## scheme: HTTP scheme to use. Can be used with `tlsConfig` for example if using istio mTLS.
+ scheme: ""
+
+ ## tlsConfig: TLS configuration to use when connecting to the endpoint, for example if using istio mTLS.
+ ## Of type: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#tlsconfig
+ tlsConfig: {}
+
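As a rough sketch only, an istio-mTLS style setup under alertmanager.alertmanagerSpec might combine the two fields like this; the certificate paths are placeholder assumptions, not values defined by this chart:

    # hypothetical example: reach Alertmanager over mTLS using certs mounted by an istio sidecar
    scheme: https
    tlsConfig:
      caFile: /etc/istio-certs/root-cert.pem    # placeholder path
      certFile: /etc/istio-certs/cert-chain.pem # placeholder path
      keyFile: /etc/istio-certs/key.pem         # placeholder path
      insecureSkipVerify: false
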
## If set to true all actions on the underlying managed objects are not going to be performed, except for delete actions.
##
paused: false
@@ -621,6 +726,8 @@
runAsNonRoot: true
runAsUser: 1000
fsGroup: 2000
+ seccompProfile:
+ type: RuntimeDefault
## ListenLocal makes the Alertmanager server listen on loopback, so that it does not bind against the Pod IP.
## Note this is only for the Alertmanager UI, not the gossip communication.
@@ -632,15 +739,19 @@
containers: []
# containers:
# - name: oauth-proxy
- # image: quay.io/oauth2-proxy/oauth2-proxy:v7.3.0
+ # image: quay.io/oauth2-proxy/oauth2-proxy:v7.5.1
# args:
# - --upstream=http://127.0.0.1:9093
# - --http-address=0.0.0.0:8081
+ # - --metrics-address=0.0.0.0:8082
# - ...
# ports:
# - containerPort: 8081
# name: oauth-proxy
# protocol: TCP
+ # - containerPort: 8082
+ # name: oauth-metrics
+ # protocol: TCP
# resources: {}
# Additional volumes on the output StatefulSet definition.
@@ -669,6 +780,18 @@
##
clusterAdvertiseAddress: false
+ ## clusterGossipInterval determines the interval between gossip attempts.
+ ## Needs to be specified as GoDuration, a time duration that can be parsed by Go's time.ParseDuration() (e.g. 45ms, 30s, 1m, 1h20m15s)
+ clusterGossipInterval: ""
+
+ ## clusterPeerTimeout determines the timeout for cluster peering.
+ ## Needs to be specified as GoDuration, a time duration that can be parsed by Go's time.ParseDuration() (e.g. 45ms, 30s, 1m, 1h20m15s)
+ clusterPeerTimeout: ""
+
+ ## clusterPushpullInterval determines the interval between pushpull attempts.
+ ## Needs to be specified as GoDuration, a time duration that can be parsed by Go's time.ParseDuration() (e.g. 45ms, 30s, 1m, 1h20m15s)
+ clusterPushpullInterval: ""
+
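For illustration only (these values are examples of the GoDuration format, not recommendations from this chart), the three settings under alertmanager.alertmanagerSpec take Go duration strings such as:

    # hypothetical tuning values shown purely to demonstrate the GoDuration format
    clusterGossipInterval: "200ms"
    clusterPeerTimeout: "15s"
    clusterPushpullInterval: "1m"
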
## ForceEnableClusterMode ensures Alertmanager does not deactivate the cluster mode when running with a single replica.
## Use case is e.g. spanning an Alertmanager cluster across Kubernetes clusters with a single replica in each.
forceEnableClusterMode: false
@@ -677,6 +800,14 @@
## be considered available. Defaults to 0 (pod will be considered available as soon as it is ready).
minReadySeconds: 0
+ ## Additional configuration which is not covered by the properties above. (passed through tpl)
+ additionalConfig: {}
+
+ ## Additional configuration which is not covered by the properties above.
+ ## Useful if you need advanced templating inside alertmanagerSpec.
+ ## Otherwise, use alertmanager.alertmanagerSpec.additionalConfig (passed through tpl)
+ additionalConfigString: ""
+
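A hypothetical sketch of the string form being passed through tpl; both the web.timeout field and the .Values.alertmanagerWebTimeout value are assumptions made for illustration, not settings defined by this diff:

    # assumed example: template a value into the Alertmanager spec at render time
    additionalConfigString: |-
      web:
        timeout: {{ .Values.alertmanagerWebTimeout }}
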
## ExtraSecret can be used to store various data in an extra secret
## (use it for example to store hashed basic auth credentials)
extraSecret:
@@ -708,6 +839,9 @@
org_role: Viewer
auth.basic:
enabled: false
+ dashboards:
+ # Modify this value to change the default dashboard shown on the main Grafana page
+ default_home_dashboard_path: /tmp/dashboards/rancher-default-home.json
security:
# Required to embed dashboards in Rancher Cluster Overview Dashboard on Cluster Explorer
allow_embedding: true
@@ -744,6 +878,10 @@
##
defaultDashboardsTimezone: utc
+ ## Editable flag for the default dashboards
+ ##
+ defaultDashboardsEditable: true
+
adminPassword: prom-operator
ingress:
@@ -784,23 +922,46 @@
# hosts:
# - grafana.example.com
+ # # To make Grafana persistent (using a StatefulSet)
+ # #
+ # persistence:
+ # enabled: true
+ # type: sts
+ # storageClassName: "storageClassName"
+ # accessModes:
+ # - ReadWriteOnce
+ # size: 20Gi
+ # finalizers:
+ # - kubernetes.io/pvc-protection
+
+ serviceAccount:
+ create: true
+ autoMount: true
+
sidecar:
dashboards:
enabled: true
label: grafana_dashboard
+ searchNamespace: cattle-dashboards
labelValue: "1"
+ # Support for new table panels; when enabled, Grafana automatically migrates old table panels to the newer table panel format
+ enableNewTablePanelSyntax: false
+
## Annotations for Grafana dashboard configmaps
##
annotations: {}
multicluster:
global:
enabled: false
+ etcd:
+ enabled: false
provider:
allowUiUpdates: false
datasources:
enabled: true
defaultDatasourceEnabled: true
+ isDefaultDatasource: true
uid: prometheus
@@ -808,6 +969,9 @@
##
# url: http://prometheus-stack-prometheus:9090/
+ ## Prometheus request timeout in seconds
+ # timeout: 30
+
# If not defined, will use prometheus.prometheusSpec.scrapeInterval or its default
# defaultDatasourceScrapeInterval: 15s
@@ -815,6 +979,14 @@
##
annotations: {}
+ ## HTTP method Grafana uses to send queries to the datasource
+ httpMethod: POST
+
+ ## Create datasource for each Pod of Prometheus StatefulSet;
+ ## this uses headless service `prometheus-operated` which is
+ ## created by Prometheus Operator
+ ## ref: https://github.com/prometheus-operator/prometheus-operator/blob/0fee93e12dc7c2ea1218f19ae25ec6b893460590/pkg/prometheus/statefulset.go#L255-L286
+ createPrometheusReplicasDatasources: false
label: grafana_datasource
labelValue: "1"
@@ -823,6 +995,11 @@
exemplarTraceIdDestinations: {}
# datasourceUid: Jaeger
# traceIdLabelName: trace_id
+ alertmanager:
+ enabled: true
+ uid: alertmanager
+ handleGrafanaManagedAlerts: false
+ implementation: prometheus
extraConfigmapMounts: []
# - name: certs-configmap
@@ -932,14 +1109,6 @@
tlsConfig: {}
scrapeTimeout: 30s
- ## MetricRelabelConfigs to apply to samples after scraping, but before ingestion.
- ## ref: https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#relabelconfig
- ##
- metricRelabelings: []
- # - action: keep
- # regex: 'kube_(daemonset|deployment|pod|namespace|node|statefulset).+'
- # sourceLabels: [__name__]
-
## RelabelConfigs to apply to samples before scraping
## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#relabelconfig
##
@@ -994,6 +1163,10 @@
## To be used with a proxy extraContainer port
targetPort: 8081
+ ## Port for Prometheus Reloader to listen on
+ ##
+ reloaderWebPort: 8080
+
## List of IP addresses at which the Prometheus server service is available
## Ref: https://kubernetes.io/docs/user-guide/services/#external-ips
##
@@ -1017,12 +1190,16 @@
##
type: ClusterIP
- ## Additional port to define in the Service
+ ## Additional ports to open for Prometheus service
+ ##
additionalPorts: []
# additionalPorts:
- # - name: authenticated
+ # - name: oauth-proxy
# port: 8081
# targetPort: 8081
+ # - name: oauth-metrics
+ # port: 8082
+ # targetPort: 8082
## Consider that all endpoints are considered "ready" even if the Pods themselves are not
## Ref: https://kubernetes.io/docs/reference/kubernetes-api/service-resources/service-v1/#ServiceSpec
@@ -1104,7 +1281,7 @@
scheme: ""
## tlsConfig: TLS configuration to use when scraping the endpoint. For example if using istio mTLS.
- ## Of type: https://github.com/coreos/prometheus-operator/blob/main/Documentation/api.md#tlsconfig
+ ## Of type: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#tlsconfig
tlsConfig: {}
bearerTokenFile:
@@ -1183,7 +1360,7 @@
##
image:
repository: rancher/mirrored-prometheus-prometheus
- tag: v2.42.0
+ tag: v2.50.1
sha: ""
## Tolerations for use with node taints
@@ -1583,6 +1760,8 @@
runAsNonRoot: true
runAsUser: 1000
fsGroup: 2000
+ seccompProfile:
+ type: RuntimeDefault
## Priority class assigned to the Pods
##
@@ -1600,7 +1779,23 @@
# secrets: |
# - resourceName: "projects/$PROJECT_ID/secrets/testsecret/versions/latest"
# fileName: "objstore.yaml"
- # objectStorageConfigFile: /var/secrets/object-store.yaml
+ ## ObjectStorageConfig configures object storage in Thanos.
+ # objectStorageConfig:
+ # # use existing secret, if configured, objectStorageConfig.secret will not be used
+ # existingSecret: {}
+ # # name: ""
+ # # key: ""
+ # # will render the objectStorageConfig secret data and configure the Thanos custom resource to use it;
+ # # ignored when prometheusSpec.thanos.objectStorageConfig.existingSecret is set
+ # # https://thanos.io/tip/thanos/storage.md/#s3
+ # secret: {}
+ # # type: S3
+ # # config:
+ # # bucket: ""
+ # # endpoint: ""
+ # # region: ""
+ # # access_key: ""
+ # # secret_key: ""
proxy:
image:
@@ -1650,8 +1845,10 @@
## OverrideHonorTimestamps allows to globally enforce honoring timestamps in all scrape configs.
overrideHonorTimestamps: false
- ## IgnoreNamespaceSelectors if set to true will ignore NamespaceSelector settings from the podmonitor and servicemonitor
- ## configs, and they will only discover endpoints within their current namespace. Defaults to false.
+ ## When ignoreNamespaceSelectors is set to true, the namespaceSelector from all PodMonitor, ServiceMonitor and Probe objects will be ignored;
+ ## they will only discover targets within the namespace of the PodMonitor, ServiceMonitor or Probe object,
+ ## and ServiceMonitors will be installed in the default service namespace.
+ ## Defaults to false.
ignoreNamespaceSelectors: false
## EnforcedNamespaceLabel enforces adding a namespace label of origin for each alert and metric that is user created.
@@ -1687,7 +1884,6 @@
## if either value is zero, in which case the non-zero value will be used. If both values are zero, no limit is enforced.
enforcedTargetLimit: false
-
## Per-scrape limit on number of labels that will be accepted for a sample. If more than this number of labels are present
## post metric-relabeling, the entire scrape will be treated as failed. 0 means no limit. Only valid in Prometheus versions
## 2.27.0 and newer.
@@ -1820,6 +2016,25 @@
##
# serverName: ""
+ ## MetricRelabelConfigs to apply to samples after scraping, but before ingestion.
+ ## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#relabelconfig
+ ##
+ # metricRelabelings: []
+ # - action: keep
+ # regex: 'kube_(daemonset|deployment|pod|namespace|node|statefulset).+'
+ # sourceLabels: [__name__]
+
+ ## RelabelConfigs to apply to samples before scraping
+ ## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#relabelconfig
+ ##
+ # relabelings: []
+ # - sourceLabels: [__meta_kubernetes_pod_node_name]
+ # separator: ;
+ # regex: ^(.*)$
+ # targetLabel: nodename
+ # replacement: $1
+ # action: replace
+
additionalPodMonitors: []
## Name of the PodMonitor to create
##
@@ -1963,13 +2178,16 @@
##
type: ClusterIP
- ## If true, create a serviceMonitor for thanosRuler
+ ## Configuration for creating a ServiceMonitor for the ThanosRuler service
##
serviceMonitor:
+ ## If true, create a serviceMonitor for thanosRuler
+ ##
+ selfMonitor: true
+
## Scrape interval. If not set, the Prometheus default scrape interval is used.
##
interval: ""
- selfMonitor: true
## Additional labels
##
@@ -2040,8 +2258,7 @@
##
image:
repository: rancher/mirrored-thanos-thanos
- tag: v0.30.2
- sha: ""
+ tag: v0.34.1
## Namespaces to be selected for PrometheusRules discovery.
## If nil, select own namespace. Namespaces to be selected for ServiceMonitor discovery.


@@ -1,5 +1,5 @@
url: https://github.com/rancher/prometheus-federator.git
-subdirectory: charts/rancher-project-monitoring/0.4.1
-commit: cff814f807f6e86229ad940e77f3d14768cc1b86
+subdirectory: charts/rancher-project-monitoring/0.4.2
+commit: 734fcff1d3e46f2eb2deb9fa4a388fa11905af12
version: 104.0.0
doNotRelease: true


@@ -97,6 +97,7 @@ neuvector-monitor:
prometheus-federator:
- 103.0.1+up0.4.1
- 104.0.0-rc1+up0.4.0
+- 104.0.0-rc1+up0.4.2
- 3.0.1+up0.3.3
- 103.0.1+up0.4.0
- 103.0.2+up0.4.0