
Commit 9a7e7de

Update concerns
1 parent 6ca63f3 commit 9a7e7de

3 files changed: +39 -29 lines

pages/clustering/high-availability/best-practices.mdx

Lines changed: 20 additions & 20 deletions
@@ -175,7 +175,6 @@ have precedence over command line arguments**, and any set environment variable
 ## Coordinator settings
 
 ### enabled reads on main
-
 There is a configuration option for specifying whether reads from the main are enabled. The configuration value is by
 default false but can be changed in run-time using the following query:
 
@@ -184,22 +183,28 @@ SET COORDINATOR SETTING 'enabled_reads_on_main' TO 'true'/'false' ;
 ```
 
 ### sync failover only
-
-Users can also choose whether failover to the async replica is allowed by using the following query:
+Users can also choose whether failover to the ASYNC REPLICA is allowed by using the following query:
 
 ```
 SET COORDINATOR SETTING 'sync_failover_only' TO 'true'/'false' ;
 ```
 
-### max failover replica lag
+By default, the value is `true`, which means that only SYNC REPLICAs are candidates in the election. When the value is set to
+`false`, the ASYNC REPLICA is also considered, but there is an additional risk of experiencing data loss.
+
+In extreme cases, failover to an ASYNC REPLICA may be necessary when other SYNC REPLICAs are down and you want to
+manually perform a failover.
 
-Users can control the maximum transaction lag allowed during failover through configuration. If a replica is behind the main instance by more than the configured threshold,
-that replica becomes ineligible for failover. This prevents data loss beyond the user's acceptable limits.
+### max failover replica lag
+Users can control the maximum transaction lag allowed during failover through configuration. If a REPLICA is behind the MAIN
+instance by more than the configured threshold, that REPLICA becomes ineligible for failover. This prevents data loss
+beyond the user's acceptable limits.
 
-To implement this functionality, we employ a caching mechanism on the cluster leader that tracks replicas' lag. The cache gets updated with each StateCheckRpc response from
-replicas. During the brief failover window on the cooordinators' side, the new cluster leader may not have the current lag information for all data instances and in that case,
-any replica can become main. This trade-off is intentional and it avoids flooding Raft logs with frequently-changing lag data while maintaining failover safety guarantees
-in the large majority of situations.
+To implement this functionality, we employ a caching mechanism on the cluster leader coordinator that tracks replicas' lag. The cache gets
+updated with each `StateCheckRpc` response from REPLICAs. During the brief failover window on the coordinators' side, the new
+cluster leader may not have the current lag information for all data instances and in that case, any REPLICA can become MAIN.
+This trade-off is intentional and avoids flooding Raft logs with frequently-changing lag data while maintaining failover safety
+guarantees in the large majority of situations.
 
 
 The configuration value can be controlled using the query:
@@ -208,20 +213,15 @@ The configuration value can be controlled using the query:
 SET COORDINATOR SETTING 'max_failover_replica_lag' TO '10' ;
 ```
 
-By default, the value is `true`, which means that only sync replicas are candidates in the election. When the value is set to `false`, the async replica is also considered, but
-there is an additional risk of experiencing data loss. However, failover to an async replica may be necessary when other sync replicas are down and you want to
-manually perform a failover.
-
-### max_replica_read_lag_ ???
-
-
-Users can control the maximum allowed replica lag to maintain read consistency. When a replica falls behind the current main by more than `max_replica_read_lag_` transactions, the
-bolt+routing protocol will exclude that replica from read query routing to ensure data freshness.
+### max_replica_read_lag
+Users can control the maximum allowed REPLICA lag to maintain read consistency. When a REPLICA falls behind the current MAIN by
+more than `max_replica_read_lag` transactions, the bolt+routing protocol will exclude that REPLICA from read query routing to
+ensure data freshness.
 
 The configuration value can be controlled using the query:
 
 ```
-SET COORDINATOR SETTING 'max_replica_read_lag_' TO '10' ;
+SET COORDINATOR SETTING 'max_replica_read_lag' TO '10' ;
 ```
 
 ## Observability

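The settings touched in this file (`best-practices.mdx`) are all applied through the same `SET COORDINATOR SETTING` query, issued against the coordinator's Bolt endpoint. As a quick reference, a minimal sketch that puts them together (the thresholds of `'10'` are taken from the examples above and are purely illustrative, not recommendations):

```cypher
// Allow read queries to be served by the current MAIN instance.
SET COORDINATOR SETTING 'enabled_reads_on_main' TO 'true';

// Keep failover restricted to SYNC replicas (the default behaviour).
SET COORDINATOR SETTING 'sync_failover_only' TO 'true';

// A replica more than 10 transactions behind MAIN is not a failover candidate.
SET COORDINATOR SETTING 'max_failover_replica_lag' TO '10';

// A replica more than 10 transactions behind MAIN is excluded from bolt+routing read routing.
SET COORDINATOR SETTING 'max_replica_read_lag' TO '10';
```
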
pages/clustering/high-availability/ha-commands-reference.mdx

Lines changed: 15 additions & 8 deletions
@@ -33,7 +33,7 @@ ADD COORDINATOR coordinatorId WITH CONFIG {
 **Parameters:**
 - `coordinatorId` (int) - unique integer for each coordinator. You can set a different incrementing integer for each coordinator as you
 register them.
-- `boltServer` (string) - Network address in format `"IP_ADDRESS|DNS_NAME:PORT_NUMBER"`. Port is usually set to 7687 as
+- `boltServer` (string) - Network address in format `"IP_ADDRESS|DNS_NAME:PORT_NUMBER"`, used for querying the coordinator. Port is usually set to 7687 as
 that is representative for Bolt protocol. If IPs are ephemeral, it's best to use the DNS name/FQDN. The server IP needs
 to be exposed to the external network, if there are any external applications connected to it.
 - `coordinatorServer` (string) - Network address in format `"COORDINATOR_HOSTNAME|COORDINATOR_PORT"`. Coordinator hostname and port
@@ -94,11 +94,15 @@ REGISTER INSTANCE instanceName ( AS ASYNC | AS STRICT_SYNC ) ? WITH CONFIG {
 - `instanceName` (symbolic name) - unique name of the data instance
 - `AS ASYNC` (optional parameter) - register the instance in `ASYNC` replication mode
 - `AS STRICT_SYNC` (optional parameter) - register the instance in `STRICT_SYNC` replication mode
-- `boltServer` (string) - Network address in format "IP_ADDRESS|DNS_NAME:PORT_NUMBER". Port is usually set to 7687 as
-that is representative for Bolt protocol. If IPs are ephemeral, it's best to use the DNS name/FQDN. The server IP needs
-to be exposed to the external network, if there are any external applications connected to it.
-- `managementServer` (string) - ???
-- `replicationServer` (string) - ???
+- `boltServer` (string) - Server endpoint used for executing queries against the instance. The endpoint needs to be
+in the format `"IP_ADDRESS|DNS_NAME:PORT_NUMBER"`. Port is usually set to 7687 as that is representative for Bolt protocol.
+If IPs are ephemeral, it's best to use the DNS name/FQDN. The server IP needs to be exposed to the external network,
+if there are any external applications connected to it.
+- `managementServer` (string) - Server endpoint used for communication between the coordinator and the data instance. The endpoint
+needs to be in the format `"IP_ADDRESS|DNS_NAME:PORT_NUMBER"`. The management port needs to be the same as the one provided in the command-line
+arguments for that data instance.
+- `replicationServer` (string) - Server endpoint used for replicating data between data instances. The endpoint needs to be in
+the format `"IP_ADDRESS|DNS_NAME:PORT_NUMBER"`. The usual port assigned to the replication server is 20000.
 
 **Behaviour:**
 - The coordinator instance will connect to the data instance on the `management_server` network address.
@@ -115,12 +119,15 @@ Constructs `( AS ASYNC | AS STRICT_SYNC )` serve to specify a different replicat
 You can only have `STRICT_SYNC` and `ASYNC` or `SYNC` and `ASYNC` replicas together in the cluster. Combining `STRICT_SYNC`
 and `SYNC` replicas together doesn't have proper semantic meaning so it is forbidden.
 
+For local development, the hostname of the data instance for the management server and replication server is `localhost`. For Helm charts,
+check the name of the service (e.g. `memgraph-coordinator-1.default.svc.cluster.local`).
+
 **Example:**
 ```cypher
 REGISTER INSTANCE instance1 WITH CONFIG {
 "bolt_server": "my_outside_instance1_IP:7687",
-"management_server": "???:10000",
-"replication_server": "???:20000"
+"management_server": "memgraph-data-1.default.svc.cluster.local:10000",
+"replication_server": "memgraph-data-1.default.svc.cluster.local:20000"
 };
 ```

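For orientation, the three endpoints described in this file map directly onto the keys in the `REGISTER INSTANCE` config. A sketch of registering two data instances and promoting one, with illustrative hostnames and ports (the `SET INSTANCE ... TO MAIN` and `SHOW INSTANCES` commands are assumed from the wider HA command reference, not from this diff):

```cypher
// Each data instance exposes a Bolt server (client queries), a management server
// (coordinator <-> data instance communication) and a replication server (data replication).
REGISTER INSTANCE instance1 WITH CONFIG {
  "bolt_server": "localhost:7687",
  "management_server": "localhost:10011",
  "replication_server": "localhost:20011"
};

REGISTER INSTANCE instance2 AS ASYNC WITH CONFIG {
  "bolt_server": "localhost:7688",
  "management_server": "localhost:10012",
  "replication_server": "localhost:20012"
};

// Promote one of the registered instances and check the resulting cluster state.
SET INSTANCE instance1 TO MAIN;
SHOW INSTANCES;
```
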
pages/clustering/high-availability/setup-ha-cluster-docker-compose.mdx

Lines changed: 4 additions & 1 deletion
@@ -288,6 +288,9 @@ coordinators in the cluster:
 <Callout>
 For localhost development:
 Since the host can't resolve the IP for coordinators and data instances, Bolt
-servers in Docker Compose setup require `bolt_server` set to `localhost:<port>`. ???
+servers in the Docker Compose setup require `bolt_server` set to `localhost:<port>` instead of `127.0.0.1`.
+
+This is because, in some Docker setups on certain machines, `localhost` is intercepted and mapped to the
+host network automatically, while `127.0.0.1` stays within the container.
 </Callout>
 </Steps>

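In practice this means that, for a local Docker Compose cluster, only `bolt_server` points at `localhost`; the management and replication endpoints can keep using the Compose service names, since those are resolved inside the Docker network rather than on the host. A hedged sketch (the service name `instance1` and the ports are illustrative and not taken from this diff):

```cypher
// bolt_server uses localhost:<published port> so clients on the host can reach the instance;
// management_server and replication_server stay on the Compose service name.
REGISTER INSTANCE instance1 WITH CONFIG {
  "bolt_server": "localhost:7687",
  "management_server": "instance1:10011",
  "replication_server": "instance1:20011"
};
```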