Make most remote reads asynchronous
Closed (Released)Public
Actions

Assigned To

Authored By

	kgaillot
	Jul 30 2024, 1:57 PM

Description

Currently, pcmk__read_remote_message() reads a single message from a remote connection synchronously with a timeout (typically 60 seconds). That means if the other side stops sending data while a message is being read, the remote daemon can block, making it unable to accept new connections until the timeout expires.

The reproducer would be bringing the network interface down on the connection host. The connection host should be fenced and the connection should restart on the other cluster node.

The read should be made asynchronous. If data isn't immediately available, return to the mainloop and try again in some small amount of time. New connections should cancel any partial reads. The timeout in pcmk__remote_ready() must be considered as well.

This task covers remote CIB and most Pacemaker Remote reads (everything not covered by T901).

Related Objects
Search...

		Status	Assigned	Task
		Released	clumens	T855 Make most remote reads asynchronous
		Open	None	T901 Use asynchronous communication for liblrmd commands with replies

Event Timeline

kgaillot triaged this task as High priority.Jul 30 2024, 1:57 PM

kgaillot created this task.

kgaillot created this object with edit policy "Restricted Project (Project)".

kgaillot renamed this task from Make remote reads asynchronous to Make remote handshakes and reads asynchronous.Jul 30 2024, 2:14 PM

kgaillot renamed this task from Make remote handshakes and reads asynchronous to Make remote reads asynchronous.

kgaillot updated the task description. (Show Details)

kgaillot mentioned this in T824: Make GnuTLS handshakes asynchronous.Jul 30 2024, 2:16 PM

kgaillot added a project: Restricted Project.

kgaillot assigned this task to clumens.Jul 30 2024, 5:48 PM

kgaillot added projects: Restricted Project, Pacemaker (3.0.0).

My first thought here is to start by converting pcmk__remote_ready to be an async function that checks the remote to see if it's ready once and sends a result up to the mainloop. However, one caller of this function is lrmd_poll, which we do not use anywhere but I assume is public API. Do we need to continue keeping it (and lrmd_dispatch) around? If so, we might need sync and async versions of this function.

In T855#13286, @clumens wrote:

My first thought here is to start by converting pcmk__remote_ready to be an async function that checks the remote to see if it's ready once and sends a result up to the mainloop. However, one caller of this function is lrmd_poll, which we do not use anywhere but I assume is public API. Do we need to continue keeping it (and lrmd_dispatch) around? If so, we might need sync and async versions of this function.

Yeah that's public API, for someone who wants a non-mainloop connection. We could've deprecated it for 2.1.8, but oh well. Note that it calls pcmk__remote_ready() with a timeout of 0, which means return immediately without blocking, so it would be fine with the check-once approach.

Currently lrmd_remote_client_msg() calls pcmk__read_remote_message(), which loops over read_available_remote_data() up to the timeout.

I think the basic idea will be for lrmd_remote_client_msg() to do something like read_available_remote_data() directly, process the message if it has been received in its entirety, and otherwise return to the main loop. The main loop will call it again when more data is available. I don't think any timeout is necessary since it's simply dispatching data from the socket and has nothing else to do.

cib_remote_msg() (server side) and cib_remote_callback_dispatch()/cib_remote_command_dispatch() (client side) are equivalent for remote CIB administration and should be able to follow the same pattern.

cib_remote_perform_op() calls pcmk__read_remote_message() only for sync replies, so it could keep the current approach.

cib_tls_signon() calls pcmk__read_remote_message() to get an authentication reply. That's probably fine to do sync since it's for the client, so it can keep the current approach too.

lrmd_tls_dispatch() calls pcmk__read_remote_message() and is used in three different contexts: by lrmd_dispatch() for non-mainloop socket dispatch (another public API not used internally, and since there's no mainloop it should keep the current approach); by add_tls_to_mainloop() for mainloop socket dispatch (which should follow the new approach); and as a trigger callback (which is for saved messages and doesn't really matter for this purpose, but can stay with the mainloop socket dispatch).

read_remote_reply() calls pcmk__read_remote_message() and is called by lrmd_send_xml() -> lrmd_tls_send_recv() which is sync, so the current approach is fine.

kgaillot moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Aug 21 2024, 4:37 PM

clumens changed the task status from Open to WIP.Aug 28 2024, 2:43 PM

clumens mentioned this in rPebd0a552e873: Feature: libcib: Perform server-side remote CIB reads asynchronously..Oct 7 2024, 3:03 PM

clumens mentioned this in rP0f8018be30a7: Feature: libcib: Perform client-side remote CIB reads asynchronously..

clumens mentioned this in rPff92ca2c1051: Feature: libcib: Perform server-side remote CIB reads asynchronously..Oct 8 2024, 3:51 PM

clumens mentioned this in rPeef8a8daa00d: Feature: libcib: Perform client-side remote CIB reads asynchronously..

clumens mentioned this in rPf44c95494260: Feature: daemons: execd should read remote client messages async..Oct 29 2024, 3:45 PM

Following up on the state of this issue so I don't forget...

Everything except lrmd_tls_dispatch_async has been figured out and committed. I have a patch to introduce lrmd_tls_dispatch and it is very simple, but it causes problems that most often occur when running the RemoteBasic CTSlab test. You can also reproduce this on a test cluster by following along with what RemoteBasic does - add two real nodes and a VM to run pacemaker_remote, create the remote resource, and then create the Dummy resource. Stop and then start the cluster, and you'll see all sorts of XML IPC error messages.

These message seem to be caused by the following:

lrmd_tls_dispatch_async gets EAGAIN and reads part of a message, puts the bytes into the buffer, and returns to the mainloop.
lrmd_send_command gets called somewhere to send a message. This uses lrmd_send_command, which synchronously and immediately wants to read a reply.
There's bytes in the buffer (the first part of the message that was read), so read_remote_reply immediately calls pcmk__remote_message_xml in the beginning of that for loop.
Errors occur because the bytes in the buffer aren't a complete, valid XML document. These errors cascade causing probes to fail and nodes to be fenced.
lrmd_tls_dispatch_async never gets a chance to read the rest of the message.

Basically, mixing async and sync is causing problems and we need to convert lrmd_send_command to async before the last part of this can be done.

clumens added a parent task: T901: Use asynchronous communication for liblrmd commands with replies.Oct 31 2024, 4:53 PM

clumens removed a parent task: T901: Use asynchronous communication for liblrmd commands with replies.

clumens added a parent task: T901: Use asynchronous communication for liblrmd commands with replies.

clumens removed a parent task: T901: Use asynchronous communication for liblrmd commands with replies.

clumens added a subtask: T901: Use asynchronous communication for liblrmd commands with replies.Oct 31 2024, 4:55 PM

kgaillot updated the task description. (Show Details)Nov 4 2024, 11:00 AM

kgaillot renamed this task from Make remote reads asynchronous to Make most remote reads asynchronous.Nov 4 2024, 11:20 AM

kgaillot closed this task as Merged.

kgaillot changed the status of subtask T901: Use asynchronous communication for liblrmd commands with replies from Open to WIP.

kgaillot updated the task description. (Show Details)

kgaillot changed the status of subtask T901: Use asynchronous communication for liblrmd commands with replies from WIP to Open.Dec 4 2024, 12:23 PM