pages/clustering/high-availability/best-practices.mdx
+20 -20 (20 additions & 20 deletions)
@@ -175,7 +175,6 @@ have precedence over command line arguments**, and any set environment variable
 ## Coordinator settings

 ### enabled reads on main
-
 There is a configuration option for specifying whether reads from the main are enabled. The configuration value is by
 default false but can be changed in run-time using the following query:

@@ -184,22 +183,28 @@ SET COORDINATOR SETTING 'enabled_reads_on_main' TO 'true'/'false' ;
 ```

 ### sync failover only
-
-Users can also choose whether failover to the async replica is allowed by using the following query:
+Users can also choose whether failover to the ASYNC REPLICA is allowed by using the following query:

 ```
 SET COORDINATOR SETTING 'sync_failover_only' TO 'true'/'false' ;
 ```

-### max failover replica lag
+By default, the value is `true`, which means that only SYNC REPLICAs are candidates in the election. When the value is set to
+`false`, the ASYNC REPLICA is also considered, but there is an additional risk of experiencing data loss.
+
+In extreme cases, failover to an ASYNC REPLICA may be necessary when other SYNC REPLICAs are down and you want to
+manually perform a failover.

-Users can control the maximum transaction lag allowed during failover through configuration. If a replica is behind the main instance by more than the configured threshold,
-that replica becomes ineligible for failover. This prevents data loss beyond the user's acceptable limits.
+### max failover replica lag
+Users can control the maximum transaction lag allowed during failover through configuration. If a REPLICA is behind the MAIN
+instance by more than the configured threshold, that REPLICA becomes ineligible for failover. This prevents data loss
+beyond the user's acceptable limits.

-To implement this functionality, we employ a caching mechanism on the cluster leader that tracks replicas' lag. The cache gets updated with each StateCheckRpc response from
-replicas. During the brief failover window on the coordinators' side, the new cluster leader may not have the current lag information for all data instances and in that case,
-any replica can become main. This trade-off is intentional and it avoids flooding Raft logs with frequently-changing lag data while maintaining failover safety guarantees
-in the large majority of situations.
+To implement this functionality, we employ a caching mechanism on the cluster leader coordinator that tracks replicas' lag. The cache gets
+updated with each `StateCheckRpc` response from REPLICAs. During the brief failover window on the coordinators' side, the new
+cluster leader may not have the current lag information for all data instances and in that case, any REPLICA can become MAIN.
+This trade-off is intentional and it avoids flooding Raft logs with frequently-changing lag data while maintaining failover safety
+guarantees in the large majority of situations.


 The configuration value can be controlled using the query:
@@ -208,20 +213,15 @@ The configuration value can be controlled using the query:
 SET COORDINATOR SETTING 'max_failover_replica_lag' TO '10' ;
 ```
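
The eligibility rule described for `max_failover_replica_lag` can be sketched in a few lines of Python. This is an illustrative model only, not Memgraph's implementation: the names `failover_candidates` and `lag_cache`, and the use of `None` for a missing cache entry, are assumptions made for the example. A replica with unknown lag stays eligible, which mirrors the trade-off described for a freshly elected cluster leader.

```python
from typing import Dict, List, Optional


def failover_candidates(
    replicas: List[str],
    lag_cache: Dict[str, Optional[int]],
    max_failover_replica_lag: int,
) -> List[str]:
    """Return the replicas that may be promoted to MAIN (hypothetical model)."""
    candidates = []
    for name in replicas:
        lag = lag_cache.get(name)  # None: the leader has no lag data for this replica yet
        # Unknown lag keeps the replica eligible; known lag must not exceed the threshold.
        if lag is None or lag <= max_failover_replica_lag:
            candidates.append(name)
    return candidates


# Hypothetical cache contents, as if built from StateCheckRpc responses.
lag_cache = {"replica_1": 3, "replica_2": 25}
print(failover_candidates(["replica_1", "replica_2", "replica_3"], lag_cache, 10))
```

With a threshold of 10, `replica_2` is filtered out because its cached lag is 25, while `replica_3` remains a candidate because the cache holds no entry for it.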

-By default, the value is `true`, which means that only sync replicas are candidates in the election. When the value is set to `false`, the async replica is also considered, but
-there is an additional risk of experiencing data loss. However, failover to an async replica may be necessary when other sync replicas are down and you want to
-manually perform a failover.
-
-### max_replica_read_lag_ ???
-
-
-Users can control the maximum allowed replica lag to maintain read consistency. When a replica falls behind the current main by more than `max_replica_read_lag_` transactions, the
-bolt+routing protocol will exclude that replica from read query routing to ensure data freshness.
+### max_replica_read_lag
+Users can control the maximum allowed REPLICA lag to maintain read consistency. When a REPLICA falls behind the current MAIN by
+more than `max_replica_read_lag` transactions, the bolt+routing protocol will exclude that REPLICA from read query routing to
+ensure data freshness.
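
The read-routing rule for `max_replica_read_lag` can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not the actual bolt+routing logic: lag is taken to be the difference between hypothetical committed-transaction counters on MAIN and on each REPLICA, and the function name `read_routing_targets` is invented for the example.

```python
from typing import Dict, List


def read_routing_targets(
    main_txn_count: int,
    replica_txn_counts: Dict[str, int],
    max_replica_read_lag: int,
) -> List[str]:
    """Return the replicas fresh enough to serve reads (hypothetical model)."""
    return [
        name
        for name, txn_count in replica_txn_counts.items()
        # A replica is excluded only when it is behind by MORE than the limit.
        if main_txn_count - txn_count <= max_replica_read_lag
    ]


# Hypothetical counters: MAIN is at 100 transactions.
print(read_routing_targets(100, {"replica_1": 95, "replica_2": 80}, 10))
```

Here `replica_1` is 5 transactions behind and stays in the routing set, while `replica_2` is 20 behind and is dropped until it catches up.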

 The configuration value can be controlled using the query:

 ```
-SET COORDINATOR SETTING 'max_replica_read_lag_' TO '10' ;
+SET COORDINATOR SETTING 'max_replica_read_lag' TO '10' ;
pages/clustering/high-availability/ha-commands-reference.mdx
+15 -8 (15 additions & 8 deletions)
@@ -33,7 +33,7 @@ ADD COORDINATOR coordinatorId WITH CONFIG {
 **Parameters:**
 - `coordinatorId` (int) - unique integer for each coordinator. You can set a different incrementing integer for each coordinator as you
 register them.
-- `boltServer` (string) - Network address in format `"IP_ADDRESS|DNS_NAME:PORT_NUMBER"`. Port is usually set to 7687 as
+- `boltServer` (string) - Network address in format `"IP_ADDRESS|DNS_NAME:PORT_NUMBER"`, used for querying the coordinator. Port is usually set to 7687 as
 that is representative for Bolt protocol. If IPs are ephemeral, it's best to use the DNS name/FQDN. The server IP needs
 to be exposed to the external network, if there are any external applications connected to it.
 - `coordinatorServer` (string) - Network address in format `"COORDINATOR_HOSTNAME|COORDINATOR_PORT"`. Coordinator hostname and port
@@ -94,11 +94,15 @@ REGISTER INSTANCE instanceName ( AS ASYNC | AS STRICT_SYNC ) ? WITH CONFIG {
 - `instanceName` (symbolic name) - unique name of the data instance
 - `AS ASYNC` (optional parameter) - register the instance in `ASYNC` replication mode
 - `AS STRICT_SYNC` (optional parameter) - register the instance in `STRICT_SYNC` replication mode
-- `boltServer` (string) - Network address in format "IP_ADDRESS|DNS_NAME:PORT_NUMBER". Port is usually set to 7687 as
-that is representative for Bolt protocol. If IPs are ephemeral, it's best to use the DNS name/FQDN. The server IP needs
-to be exposed to the external network, if there are any external applications connected to it.
-- `managementServer` (string) - ???
-- `replicationServer` (string) - ???
+- `boltServer` (string) - Server endpoint used for executing queries against the instance. The endpoint needs to be
+in format `"IP_ADDRESS|DNS_NAME:PORT_NUMBER"`. Port is usually set to 7687 as that is representative for Bolt protocol.
+If IPs are ephemeral, it's best to use the DNS name/FQDN. The server IP needs to be exposed to the external network,
+if there are any external applications connected to it.
+- `managementServer` (string) - Server endpoint used for communication between coordinator and data instance. The endpoint
+needs to be in format `"IP_ADDRESS|DNS_NAME:PORT_NUMBER"`. The management port needs to be the same, as provided in the command line
+arguments for that data instance.
+- `replicationServer` (string) - Server endpoint used for replicating data between data instances. The endpoint needs to be in
+format `"IP_ADDRESS|DNS_NAME:PORT_NUMBER"`. Usual port that is assigned for replication server is 20000.

 **Behaviour:**
 - The coordinator instance will connect to the data instance on the `management_server` network address.
@@ -115,12 +119,15 @@ Constructs `( AS ASYNC | AS STRICT_SYNC )` serve to specify a different replicat
 You can only have `STRICT_SYNC` and `ASYNC` or `SYNC` and `ASYNC` replicas together in the cluster. Combining `STRICT_SYNC`
 and `SYNC` replicas together doesn't have proper semantic meaning so it is forbidden.

+For local development, hostname of the data instance for management server and replication server is `localhost`. For Helm charts,
+check the name of the service (e.g. `memgraph-coordinator-1.default.svc.cluster.local`).
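
The `"IP_ADDRESS|DNS_NAME:PORT_NUMBER"` endpoint format used by `boltServer`, `managementServer`, and `replicationServer` can be checked before a registration query is sent. The helper below is a minimal sketch and not part of Memgraph: `parse_endpoint` is a hypothetical name, bracketed IPv6 addresses are not handled, and the management port `10000` in the usage example is an arbitrary illustrative value.

```python
from typing import Tuple


def parse_endpoint(endpoint: str) -> Tuple[str, int]:
    """Split "HOST:PORT" into (host, port); raise ValueError if malformed."""
    # Split on the last colon so the host part is kept whole.
    host, sep, port_text = endpoint.rpartition(":")
    if not sep or not host:
        raise ValueError(f"expected HOST:PORT, got {endpoint!r}")
    if not port_text.isdigit():
        raise ValueError(f"port must be a number, got {port_text!r}")
    port = int(port_text)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return host, port


# Local-development style endpoints; the management port is a made-up example value.
for endpoint in ("localhost:7687", "localhost:10000", "localhost:20000"):
    print(parse_endpoint(endpoint))
```

Validating the three endpoints up front makes a typo such as a missing port visible on the client side instead of surfacing later as a failed connection from the coordinator.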