Performing cluster maintenance with JBoss-WildFly

Today I found an interesting question on StackOverflow asking how to temporarily remove a server node from the cluster in order to perform maintenance on it. Later on, the node must be able to rejoin the cluster, and all of this has to happen without restarting the server.

This is actually a common scenario for a cluster of standalone servers. Each cluster member can have its own configuration, and you may want to apply some customizations during a kind of "Pit Stop", that is, without shutting down the node and manually editing its configuration. You can easily achieve this by setting a different multicast address for the JGroups stack you are using (udp by default), so that your server node temporarily leaves the cluster.

First, check with your System Administrator for an available alternative multicast address. The following reference will be of help:

Supposing you have decided to use a new address instead of the default one, you can issue the following CLI command on your server node:


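For reference, here is a minimal sketch of how the jgroups-udp binding in the standard-sockets group of standalone-ha.xml might look once the new address is in place. The address 230.0.0.5 is a hypothetical placeholder, not the one used in the original setup; the port values are the jgroups-udp defaults that also appear in the ha-sockets example of the Domain section.

```xml
<socket-binding-group name="standard-sockets" default-interface="public" port-offset="${jboss.socket.binding.port-offset:0}">
    <!-- other socket bindings omitted -->
    <!-- Hypothetical maintenance address: replace it with the one assigned by your System Administrator.
         A node bound to a different multicast address no longer sees the other members. -->
    <socket-binding name="jgroups-udp" port="55200" multicast-address="230.0.0.5" multicast-port="45688"/>
</socket-binding-group>
```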
Now reload your server configuration and check the new cluster view in your logs:

22:32:16,032 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (ServerService Thread Pool -- 55) ISPN000094: Received new cluster view: [nodeB/web, nodeA/web]

In our example, nodeC has just left the cluster. At a later time, you can bring it back into the cluster with:


Side note: for about a minute, you might see the following warning in your server logs:

22:27:16,094 WARN  [org.jgroups.protocols.TP$ProtocolAdapter] (INT-1,shared=udp)  JGRP000031: nodeA/web: dropping unicast message to wrong destination nodeC/web

This warning is caused by the UNICAST3 protocol, which still tries to send messages to the cluster node that left. It disappears after about one minute; however, you can tune this timeout by setting the conn_close_timeout property in your UNICAST3 configuration:

<subsystem xmlns="urn:jboss:domain:jgroups:2.0" default-stack="udp">
    <stack name="udp">
        . . . .
        <protocol type="UNICAST3">
            <property name="conn_close_timeout">5</property>
        </protocol>
        . . . .
    </stack>
</subsystem>


Domain configuration

In a Domain configuration, you can achieve the same effect by managing the socket-binding-group of a Server Group or of individual servers.

First, create a new socket binding group that uses a custom address for jgroups-udp (or jgroups-tcp if you are using a TCP cluster):

<socket-binding-group name="ha-sockets.maintenance" default-interface="public">
    <socket-binding name="ajp" port="${jboss.ajp.port:8009}"/>
    <socket-binding name="http" port="${jboss.http.port:8080}"/>
    <socket-binding name="https" port="${jboss.https.port:8443}"/>
    <socket-binding name="jgroups-mping" port="0" multicast-address="${jboss.default.multicast.address:}" multicast-port="45700"/>
    <socket-binding name="jgroups-tcp" port="7600"/>
    <socket-binding name="jgroups-tcp-fd" port="57600"/>
    <socket-binding name="jgroups-udp" port="55200" multicast-address="${jboss.default.multicast.address:}" multicast-port="45688"/>
    <socket-binding name="jgroups-udp-fd" port="54200"/>
    <socket-binding name="modcluster" port="0" multicast-address="" multicast-port="23364"/>
    <socket-binding name="txn-recovery-environment" port="4712"/>
    <socket-binding name="txn-status-manager" port="4713"/>
    <outbound-socket-binding name="mail-smtp">
        <remote-destination host="localhost" port="25"/>
    </outbound-socket-binding>
</socket-binding-group>

Now, when you need to temporarily remove a Server Group from the cluster, just issue the following command from the CLI (you can also use the Admin Console):


You will also need to reload the host after the above operation:

reload --host=master

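In domain.xml, switching a Server Group to the maintenance bindings boils down to changing the socket-binding-group it references. A minimal sketch, assuming a hypothetical Server Group named main-server-group running the ha profile (both names are illustrative, not taken from the original setup):

```xml
<server-groups>
    <!-- Hypothetical group and profile names -->
    <server-group name="main-server-group" profile="ha">
        <!-- Referencing the maintenance group moves every server of this group
             to the custom jgroups address, so they leave the cluster -->
        <socket-binding-group ref="ha-sockets.maintenance"/>
    </server-group>
</server-groups>
```

Setting ref back to ha-sockets is what lets the group rejoin the cluster afterwards.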
The servers of that Server Group will now leave the cluster, so you can perform maintenance on them. Later on, you can let your Server Group rejoin the cluster by setting back the standard ha-sockets binding group:


As a side note, you can even set the socket binding group at the server level with:


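At the server level, the override lives in host.xml. A minimal sketch, assuming hypothetical server and group names (server-one, main-server-group):

```xml
<servers>
    <!-- Hypothetical server and group names -->
    <server name="server-one" group="main-server-group" auto-start="true">
        <!-- Overrides the socket-binding-group inherited from the Server Group,
             for this server only -->
        <socket-bindings socket-binding-group="ha-sockets.maintenance"/>
    </server>
</servers>
```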
I don't recommend it, though, as the servers belonging to the same Server Group would end up with unsynchronized configurations.

Here is the original thread from StackOverflow. Feel free to add a "thumbs up" if you found this solution useful :-)

