Commit 8e22f7e

feat: Add Prometheus labels and annotations to role-group services (#26)
* feat: Add Prometheus labels and annotations and add an integration test with Prometheus
* doc: Document monitoring with Prometheus
* chore: Update changelog
* chore: Fix typo in code comment
* chore: Fix rustdoc warning
* doc: Mention CA rotation in the monitoring docs
1 parent 0f4e3b7 commit 8e22f7e

18 files changed: +676 -25 lines changed

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -19,7 +19,9 @@ All notable changes to this project will be documented in this file.
 - Add Listener support ([#17]).
 - Make the environment variables `OPENSEARCH_HOME` and `OPENSEARCH_PATH_CONF` overridable, so that
   images can be used which have a different directory structure than the Stackable image ([#18]).
+- Add Prometheus labels and annotations to role-group services ([#26]).
 
 [#10]: https://github.com/stackabletech/opensearch-operator/pull/10
 [#17]: https://github.com/stackabletech/opensearch-operator/pull/17
 [#18]: https://github.com/stackabletech/opensearch-operator/pull/18
+[#26]: https://github.com/stackabletech/opensearch-operator/pull/26

Lines changed: 151 additions & 0 deletions
@@ -0,0 +1,151 @@
+= Monitoring
+:description: Use Prometheus to monitor OpenSearch
+
+OpenSearch clusters can be monitored with Prometheus, see also the general xref:operators:monitoring.adoc[] page.
+The Prometheus metrics are exposed on the HTTP port 9200 at the path `/_prometheus/metrics`.
+
+The role group services contain the corresponding labels and annotations:
+
+[source,yaml]
+----
+---
+apiVersion: v1
+kind: Service
+metadata:
+  name: opensearch-nodes-default-headless
+  labels:
+    prometheus.io/scrape: "true"
+  annotations:
+    prometheus.io/path: /_prometheus/metrics
+    prometheus.io/port: "9200"
+    prometheus.io/scheme: https
+    prometheus.io/scrape: "true"
+----
+
+If authentication is enabled in the OpenSearch security plugin, then the metrics endpoint is also secured.
+To make the metrics accessible for all users, especially Prometheus, anonymous authentication can be enabled and access to the monitoring statistics can be allowed for the role of the anonymous user:
+
+[source,yaml]
+----
+---
+apiVersion: v1
+kind: Secret
+metadata:
+  name: opensearch-security-config
+stringData:
+  config.yml: |
+    ---
+    _meta:
+      type: config
+      config_version: 2
+    config:
+      dynamic:
+        authc:
+          basic_internal_auth_domain:
+            description: Authenticate via HTTP Basic against internal users database
+            http_enabled: true
+            transport_enabled: true
+            order: 1
+            http_authenticator:
+              type: basic
+              challenge: false # <1>
+            authentication_backend:
+              type: intern
+        authz: {}
+        http:
+          anonymous_auth_enabled: true # <2>
+  roles.yml: |
+    ---
+    _meta:
+      type: roles
+      config_version: 2
+    monitoring: # <3>
+      reserved: true
+      cluster_permissions:
+        - cluster:monitor/health
+        - cluster:monitor/nodes/info
+        - cluster:monitor/nodes/stats
+        - cluster:monitor/prometheus/metrics
+        - cluster:monitor/state
+      index_permissions:
+        - index_patterns:
+            - "*"
+          allowed_actions:
+            - indices:monitor/health
+            - indices:monitor/stats
+  roles_mapping.yml: |
+    ---
+    _meta:
+      type: rolesmapping
+      config_version: 2
+    monitoring: # <4>
+      backend_roles:
+        - opendistro_security_anonymous_backendrole
+----
+<1> If anonymous authentication is enabled, then all defined HTTP authenticators are non-challenging.
+<2> Enable https://docs.opensearch.org/latest/security/access-control/anonymous-authentication/[anonymous authentication]
+<3> Create a role "monitoring" with the required permissions for the Prometheus endpoint
+<4> Map the role "monitoring" to the backend role "opendistro_security_anonymous_backendrole" that is assigned to the anonymous user
+
+If you use the https://prometheus-operator.dev/[Prometheus Operator] to install Prometheus, then you can define a https://prometheus-operator.dev/docs/api-reference/api/#monitoring.coreos.com/v1.ServiceMonitor[ServiceMonitor] to collect the metrics:
+
+[source,yaml]
+----
+---
+apiVersion: monitoring.coreos.com/v1
+kind: ServiceMonitor
+metadata:
+  name: stackable-opensearch
+  labels:
+    release: prometheus-stack # <1>
+spec:
+  selector:
+    matchLabels: # <2>
+      prometheus.io/scrape: "true"
+  endpoints:
+    - relabelings:
+        - sourceLabels: # <3>
+            - __meta_kubernetes_service_annotation_prometheus_io_scheme
+          action: replace
+          targetLabel: __scheme__
+          regex: (https?)
+        - sourceLabels: # <4>
+            - __meta_kubernetes_service_annotation_prometheus_io_path
+          action: replace
+          targetLabel: __metrics_path__
+          regex: (.+)
+        - sourceLabels: # <5>
+            - __meta_kubernetes_pod_name
+            - __meta_kubernetes_service_name
+            - __meta_kubernetes_namespace
+            - __meta_kubernetes_service_annotation_prometheus_io_port
+          action: replace
+          targetLabel: __address__
+          regex: (.+);(.+);(.+);(\d+)
+          replacement: $1.$2.$3.svc.cluster.local:$4
+      tlsConfig: # <6>
+        ca:
+          configMap:
+            name: truststore
+            key: ca.crt
+---
+apiVersion: secrets.stackable.tech/v1alpha1
+kind: TrustStore
+metadata:
+  name: truststore
+spec:
+  secretClassName: tls
+  format: tls-pem
+----
+<1> The `release` label must match the Helm release name.
+This Helm release was installed with `helm install prometheus-stack oci://ghcr.io/prometheus-community/charts/kube-prometheus-stack ...`.
+<2> Label selector to select the Kubernetes `Endpoints` objects to scrape metrics from.
+The Endpoints inherit the labels from their Service.
+<3> Use the scheme (`http` or `https`) from the Service annotation `prometheus.io/scheme`.
+<4> Use the path (`/_prometheus/metrics`) from the Service annotation `prometheus.io/path`.
+These values could also be hard-coded in the ServiceMonitor, but it is better to use the ones provided by the operator in case they change in the future.
+<5> Use the FQDN instead of the IP address because the IP address is not contained in the certificate.
+The FQDN is constructed from the pod name, service name, namespace and the HTTP port provided in the Service annotation `prometheus.io/port`, e.g. `opensearch-nodes-default-0.opensearch-nodes-default-headless.my-namespace.svc.cluster.local:9200`.
+<6> If TLS is used and the CA is not already provided to Prometheus in another way, then it can be taken from a xref:secret-operator:truststore.adoc[] ConfigMap.
+The TrustStore ConfigMap is updated whenever the CA is rotated.
+Prometheus then picks up the new certificate.
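
Reviewer note on callout <5> of the ServiceMonitor: the address relabeling may be easier to follow as code. The following is only a minimal sketch of its effect, not Prometheus code. Prometheus joins the source label values with `;`, matches the result against the regex, and substitutes the capture groups into the replacement. The function name `build_scrape_address` is hypothetical, and the sketch assumes that none of the label values contains a `;`, so a plain split can stand in for the regex `(.+);(.+);(.+);(\d+)`.

```rust
/// Hypothetical helper that mimics the effect of relabeling <5>.
/// Assumption: no label value contains ';', so splitting on ';' behaves
/// like the regex `(.+);(.+);(.+);(\d+)` from the ServiceMonitor.
fn build_scrape_address(pod: &str, service: &str, namespace: &str, port: &str) -> Option<String> {
    // Prometheus joins the source label values with ';' before matching.
    let joined = [pod, service, namespace, port].join(";");

    // Expect exactly four fields; anything else counts as "no match".
    let parts: Vec<&str> = joined.split(';').collect();
    let [pod, service, namespace, port] = parts.as_slice() else {
        return None;
    };

    // `(.+)` needs at least one character, `(\d+)` at least one digit.
    let fields_non_empty = [pod, service, namespace].iter().all(|s| !s.is_empty());
    let port_is_numeric = !port.is_empty() && port.chars().all(|c| c.is_ascii_digit());
    if !(fields_non_empty && port_is_numeric) {
        return None;
    }

    // Equivalent of the replacement `$1.$2.$3.svc.cluster.local:$4`.
    Some(format!("{pod}.{service}.{namespace}.svc.cluster.local:{port}"))
}
```

Called with the values from the documented example (`opensearch-nodes-default-0`, `opensearch-nodes-default-headless`, `my-namespace`, `9200`), the sketch yields the FQDN shown in callout <5>. A `None` result corresponds to the regex not matching, in which case the `replace` action leaves `__address__` unchanged.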

docs/modules/opensearch/partials/nav.adoc

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@
 ** xref:opensearch:usage-guide/listenerclass.adoc[]
 ** xref:opensearch:usage-guide/storage-resource-configuration.adoc[]
 ** xref:opensearch:usage-guide/configuration-environment-overrides.adoc[]
+** xref:opensearch:usage-guide/monitoring.adoc[]
 ** xref:opensearch:usage-guide/operations/index.adoc[]
 *** xref:opensearch:usage-guide/operations/cluster-operations.adoc[]
 *** xref:opensearch:usage-guide/operations/pod-placement.adoc[]

rust/operator-binary/src/controller/build/node_config.rs

Lines changed: 63 additions & 2 deletions
@@ -1,3 +1,5 @@
+use std::str::FromStr;
+
 use serde_json::{Value, json};
 use stackable_operator::builder::pod::container::FieldPathEnvVar;
 
@@ -88,7 +90,12 @@ impl NodeConfig {
     }
 
     /// static for the cluster
-    pub fn static_opensearch_config(&self) -> String {
+    pub fn static_opensearch_config_file(&self) -> String {
+        Self::to_yaml(self.static_opensearch_config())
+    }
+
+    /// static for the cluster
+    pub fn static_opensearch_config(&self) -> serde_json::Map<String, Value> {
         let mut config = serde_json::Map::new();
 
         config.insert(
@@ -124,7 +131,24 @@ impl NodeConfig {
         // Ensure a deterministic result
         config.sort_keys();
 
-        Self::to_yaml(config)
+        config
+    }
+
+    pub fn tls_on_http_port_enabled(&self) -> bool {
+        self.static_opensearch_config()
+            .get("plugins.security.ssl.http.enabled")
+            .and_then(Self::value_as_bool)
+            == Some(true)
+    }
+
+    pub fn value_as_bool(value: &Value) -> Option<bool> {
+        value.as_bool().or(
+            // OpenSearch parses the strings "true" and "false" as boolean, see
+            // https://github.com/opensearch-project/OpenSearch/blob/3.1.0/libs/common/src/main/java/org/opensearch/common/Booleans.java#L45-L84
+            value
+                .as_str()
+                .and_then(|value| FromStr::from_str(value).ok()),
+        )
     }
 
     /// different for every node
@@ -262,6 +286,43 @@ mod tests {
         framework::{ClusterName, ProductVersion, role_utils::GenericProductSpecificCommonConfig},
     };
 
+    #[test]
+    pub fn test_value_as_bool() {
+        // boolean
+        assert_eq!(Some(true), NodeConfig::value_as_bool(&Value::Bool(true)));
+        assert_eq!(Some(false), NodeConfig::value_as_bool(&Value::Bool(false)));
+
+        // valid strings
+        assert_eq!(
+            Some(true),
+            NodeConfig::value_as_bool(&Value::String("true".to_owned()))
+        );
+        assert_eq!(
+            Some(false),
+            NodeConfig::value_as_bool(&Value::String("false".to_owned()))
+        );
+
+        // invalid strings
+        assert_eq!(
+            None,
+            NodeConfig::value_as_bool(&Value::String("True".to_owned()))
+        );
+
+        // invalid types
+        assert_eq!(None, NodeConfig::value_as_bool(&Value::Null));
+        assert_eq!(
+            None,
+            NodeConfig::value_as_bool(&Value::Number(
+                serde_json::Number::from_i128(1).expect("should be a valid number")
+            ))
+        );
+        assert_eq!(None, NodeConfig::value_as_bool(&Value::Array(vec![])));
+        assert_eq!(
+            None,
+            NodeConfig::value_as_bool(&Value::Object(serde_json::Map::new()))
+        );
+    }
+
     #[test]
     pub fn test_environment_variables() {
         let image: ProductImage = serde_json::from_str(r#"{"productVersion": "3.0.0"}"#)
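
Reviewer note on `value_as_bool` and `tls_on_http_port_enabled` in the hunks above: the lenient-but-strict boolean handling can be illustrated without `serde_json`. The following is a minimal sketch using only the standard library. The `ConfigValue` enum is a hypothetical stand-in for `serde_json::Value`, reduced to three variants, and the free functions only mirror the shape of the methods in this commit. The point it shows is that `bool::from_str` accepts exactly the strings `true` and `false`, which is why `"True"` yields `None` in the test above.

```rust
use std::str::FromStr;

/// Hypothetical stand-in for `serde_json::Value`, reduced to what the sketch needs.
enum ConfigValue {
    Bool(bool),
    Text(String),
    Null,
}

/// Returns the boolean for real booleans and for the exact strings "true" and "false".
fn value_as_bool(value: &ConfigValue) -> Option<bool> {
    match value {
        ConfigValue::Bool(b) => Some(*b),
        // `bool::from_str` rejects everything except "true" and "false",
        // including differently cased spellings such as "True".
        ConfigValue::Text(s) => bool::from_str(s).ok(),
        ConfigValue::Null => None,
    }
}

/// A missing or unparsable setting counts as "TLS not enabled".
fn tls_on_http_port_enabled(setting: Option<&ConfigValue>) -> bool {
    setting.and_then(value_as_bool) == Some(true)
}
```

Comparing against `Some(true)` rather than unwrapping keeps the fallback explicit: an absent key, a `null`, or a string such as `"True"` all evaluate to `false` without a panic.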

rust/operator-binary/src/controller/build/role_builder.rs

Lines changed: 1 addition & 1 deletion
@@ -51,7 +51,7 @@ impl<'a> RoleBuilder<'a> {
 
     // TODO Only one builder function which calls the other ones?
 
-    pub fn role_group_builders(&self) -> Vec<RoleGroupBuilder> {
+    pub fn role_group_builders(&self) -> Vec<RoleGroupBuilder<'_>> {
         self.cluster
             .role_group_configs
             .iter()
