Currently, pcmk__read_remote_message() reads a single message from a remote connection synchronously, with a timeout (typically 60 seconds). If the other side stops sending data partway through a message, the remote daemon can block in this read, leaving it unable to accept new connections until the timeout expires.
To reproduce, bring down the network interface on the connection host. Expected behavior: the connection host is fenced, and the connection restarts on the other cluster node.
The read should be made asynchronous: if data isn't immediately available, return to the mainloop and try again after a short delay. A new connection should cancel any partial read in progress. The timeout in pcmk__remote_ready() must be taken into account as well.
See also: