Skip to content

Conversation

@nrwahl2
Copy link
Contributor

@nrwahl2 nrwahl2 commented Dec 24, 2025

No description provided.

@nrwahl2 nrwahl2 force-pushed the nrwahl2-based branch 3 times, most recently from deaa1a7 to bc1dd5d Compare December 31, 2025 07:31
@nrwahl2 nrwahl2 force-pushed the nrwahl2-based branch 3 times, most recently from 852e402 to 5241abc Compare January 3, 2026 06:14
@nrwahl2 nrwahl2 mentioned this pull request Jan 3, 2026
@nrwahl2 nrwahl2 force-pushed the nrwahl2-based branch 10 times, most recently from 3416148 to 2dd98b1 Compare January 8, 2026 20:01
* reference to the client's \c GSource has been removed. There is likely
* only one reference when we call this function, and thus the client is
* likely to be freed before we return. The current GLib code looks as if
* this is always the case. However, that is a GLib implementation detail.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

g_main_context_dispatch() dispatches all pending sources in a batch before moving to the next main loop iteration. If we're running this function, then we're running it as a result of a particular source being dispatched. So I think there may be another reference to the source we're removing if it's also being dispatched as part of this batch.

So I'll weaken this assumption. But I still don't think we need to wait for remote clients to be destroyed by the callback before dropping clients. (See a couple of commits ahead of here.)

nrwahl2 added 12 commits January 8, 2026 16:13
Avoid using the public API prefix.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
...to based_common_callback_worker(), to avoid using public API prefix.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Avoid using the public API prefix.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
This will probably get renamed and/or split up further when we implement
IPC/message-related best practices in pacemaker-based. However, for now,
simply make the header name match the .c file name.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Make based_process_request() a bit more legible.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
And change/add some variable names.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
There are still plenty of IPC-specific pieces in based_callbacks.c.
However, this pulls out all of the libqb IPC callbacks, which can be
easily isolated. Pulling out the rest of the IPC-specific code will
require more refactoring.

This is based on commit 24d4455 and similar ones for other daemons.

Ref T566

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Also drop the unused size argument from dispatch_common().

This is based on commit 4f8e790 and similar ones for other daemons.

Ref T566

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
nrwahl2 added 29 commits January 8, 2026 16:14
It doesn't make sense to have this handful of unrelated initializations
in a separate function, when most of main() is for initializing things.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Place the call right before the first thing that needs it
(based_activate_cib()). The rest of the initialization helpers are
called in this region, so let's collect them. We can't make them all
contiguous though -- based_remote_init() requires that the CIB be read
and activated first.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
To mirror attrd and fenced.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
pcmk__corosync_connect() doesn't use the caches directly.
pcmk__get_node() initializes the caches when it gets called.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Using include-what-you-use.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
It shouldn't matter whether this is a Corosync cluster or not.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
To mirror attrd and fenced.

Notes:
* Previously, we weren't freeing the cluster object on exit in
  based_terminate(). Now we are.
* pcmk_cluster_free() calls pcmk__cluster_destroy_node_caches(), which
  is why we drop the call to that function.
* I'm fairly certain that the reason the pcmk_cluster_disconnect() call
  previously occurred before the done section, is that prior to a recent
  commit, we weren't NULL-checking the cluster argument before
  disconnect. We should be able to call based_cluster_disconnect()
  regardless of how based_terminate() wherever we want to free the
  cluster object.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
And NULL-check the argument argument.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Save some lines.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
pcmk_cluster_disconnect() now returns EINVAL if passed a NULL argument.
Previously, if given a NULL argument in a Corosync cluster,
pcmk_cluster_disconnect() would call down to pcmk__cpg_disconnect(),
which would dereference the NULL pointer.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
To mirror attrd and the fencer.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
And make it static. It seems logical to have it in the same file as the
main() function.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
This gets rid of the last use of based_cluster outside of
based_corosync.c.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
And rename to cluster. We don't typically care much about making
variable names distinct, although we try to make function names
distinct.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
To replace cib_shutdown_flag.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
To replace global stand_alone variable.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
To drop a global.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
There were four allowed values: shared-mem, socket, posix, and sysv.

Libqb removed support for posix and sysv (message queues) in 2012 via
commit 70a9623a. It simply returns an error if you try to set up those
types of IPC.

I spoke with the libqb maintainers last night, and they said that socket
IPC has never worked correctly and that they are considering removing
it. Sockets were mainly intended for systems where shared memory was not
available. That is believed to be no longer applicable -- shared memory
should be available on all the platforms that Pacemaker and libqb
target for support.

That leaves only shared memory as a working option.

We are dropping this option rather than deprecating it. As discussed
above, two of the values (posix and sysv) will fail immediately during
IPC setup, and socket is very buggy. So user-facing behavior should not
be affected. If the option is configured, it will simply be ignored.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
QB_IPC_NATIVE tells libqb to use shared memory (QB_IPC_SHM) unless it's
shared-memory IPC is specifically disabled at build time because it's
unavailable on the system. In that case, libqb will use sockets
(QB_IPC_SOCKET).

I spoke with the libqb maintainers last night, and they said that socket
IPC has never worked correctly and that they are considering removing
it. They told me never to use libqb socket IPC.

Libqb socket IPC was mainly intended for systems where shared memory was
not available. That is believed to be no longer applicable -- shared
memory should be available on all the platforms that Pacemaker and libqb
target for support.

The maintainers weakly suggested using QB_IPC_NATIVE (rather than
QB_IPC_SHM). I suppose this was for future-proofing ("let libqb decide
what to use"). Nonetheless, since we have a couple of use cases where we
MUST avoid blocking, and QB_IPC_SHM is non-blocking, we'll just use
QB_IPC_SHM everywhere until we have a reason not to. Besides, this makes
things explicit.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
...argument.

QB_IPC_NATIVE tells libqb to use shared memory (QB_IPC_SHM) unless
shared-memory IPC is specifically disabled at build time because it's
unavailable on the system. In that case, libqb will use sockets
(QB_IPC_SOCKET).

I spoke with the libqb maintainers last night, and they said that socket
IPC has never worked correctly and that they are considering removing
it. They told me never to use libqb socket IPC.

Libqb socket IPC was mainly intended for systems where shared memory was
not available. That is believed to be no longer applicable -- shared
memory should be available on all the platforms that Pacemaker and libqb
target for support.

The maintainers weakly suggested using QB_IPC_NATIVE (rather than
QB_IPC_SHM). I suppose this was for future-proofing ("let libqb decide
what to use"). Nonetheless, since we have a couple of use cases where we
MUST avoid blocking, and QB_IPC_SHM is non-blocking, we'll just use
QB_IPC_SHM everywhere until we have a reason not to.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
And make the mainloop variable static. No other code changes.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
There is still more work to do to consolidate/reorganize the based
exit/shutdown-related code. I'm keeping the pieces small to try to make
it easier to reason about. based_terminate() seems too complicated.

Note that based_ipc_cleanup() calls pcmk__client_cleanup(), which is why
we drop the call from main.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
There are five call sites.
* based_cpg_destroy: This is called only via the main loop. See
  pcmk_cluster_set_destroy_fn() and pcmk__cpg_connect().

* based_peer_change_cb: We create and run the main loop immediately
  after connecting to the cluster (and thus registering this callback).
  We disconnect from the cluster immediately after returning from the
  main loop, and then we exit after freeing data structures.

* based_process_shutdown: We can only reach this function through the
  main loop, when we receive a message from a client or a cluster peer.

* based_shutdown: We can only reach this through the main loop. It's set
  up as a main loop signal handler via mainloop_add_signal(). The true
  signal handler is mainloop_signal_handler(), which sets a main loop
  trigger to call based_shutdown().

* based_force_exit: This is called only by a main loop timer created by
  based_shutdown().

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Now that we assume the main loop is running, -1 behaves exactly the same
as CRM_EX_OK. We call based_cluster_disconnect() almost immediately
after returning from the main loop.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant