WebSphere: Web Server Plug-In Retry Interval
When stopping and restarting application servers, consider how their timing relates to the Web server plug-in retry interval. This is especially important in an environment with only a few application servers. You need to understand the relationship between the rate at which you stop and start application servers in a cluster and the retry interval. Otherwise, stopping application servers in a cluster can cause HTTP request failures.
The retry interval tells the Web server plug-in how long to wait before retrying an HTTP request against a specific application server. If you stop and start application servers in a cluster faster than the retry interval, the plug-in can assume that two or more application servers are down when in fact some of them are already back up.
Here is an example of this situation. Assume the following:
- The retry interval is set to 60 seconds (which is the default)
- There are two application servers in the cluster
- We are running a servlet application
- We alternate taking the two servers down, stopping and restarting one server every 45 seconds.
The Web server plug-in sends a request to a specific application server. If that server is unavailable, the plug-in marks it as down.
The request is then sent to the next application server in the cluster. The plug-in does not retry the marked-down server until the retry interval has passed.
In our example, the retry interval (60 seconds) is longer than the recycle interval (45 seconds). As a result, there are time slots in which the retry interval of one server has not yet expired. The plug-in therefore does not retry that server, even though it might already be back up, while the other server is actually down. During such a time slot, the Web server plug-in considers both servers to be down.
The figure illustrates this example.
In an environment with two application servers per cluster, the plug-in cannot forward the request to any application server and therefore returns an HTTP request failure. If there are more than two application servers in the cluster, the plug-in routes the request to the remaining active servers. However, depending on the difference between the retry interval and the application server recycle interval, more than two application servers might be perceived as unavailable at the same time.
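The mark-down behavior described above can be reproduced with a small simulation. This is a simplified sketch and not the plug-in's actual algorithm. It makes several assumptions for illustration: one request arrives per second, servers are tried in a fixed order instead of being load-balanced, and each recycled server is unavailable for a hypothetical 30 seconds (`down_seconds`). The names `server_is_up` and `simulate` are also illustrative only.

```python
def server_is_up(server, t, recycle_gap, down_seconds, servers):
    """True unless `server` is inside one of its (hypothetical) restart windows.

    Servers are recycled one at a time, alternating: server s is stopped at
    k * recycle_gap for every k with k % servers == s, and stays down for
    down_seconds.
    """
    k = server
    while k * recycle_gap <= t:
        start = k * recycle_gap
        if start <= t < start + down_seconds:
            return False
        k += servers
    return True


def simulate(retry_interval, recycle_gap, down_seconds=30, servers=2, duration=600):
    """Return the seconds at which a request found no server to route to."""
    marked_until = [0] * servers  # the plug-in skips server s while t < marked_until[s]
    failures = []
    for t in range(duration):  # one request per second
        served = False
        for s in range(servers):
            if t < marked_until[s]:
                continue  # still inside its retry interval: not tried at all
            if server_is_up(s, t, recycle_gap, down_seconds, servers):
                served = True
                break
            marked_until[s] = t + retry_interval  # mark down until the retry interval passes
        if not served:
            failures.append(t)  # every server is marked down: HTTP request failure
    return failures


if __name__ == "__main__":
    print(simulate(retry_interval=60, recycle_gap=45)[:15])  # seconds 45 through 59
    print(simulate(retry_interval=60, recycle_gap=60))       # []
```

With a 45-second recycle interval, the first failures in this model occur from second 45 to second 59. Server 0 restarted at second 30 but is still inside the retry interval that began when it was marked down at second 0, and server 1 has just been stopped. With a 60-second recycle interval, no request fails in this model.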
To avoid this problem, increase the recycle interval so that it is at least as long as the retry interval. If your retry interval is 60 seconds, wait at least 60 seconds between starting one application server and stopping another application server.
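One way to apply this rule is to compute the stop times for a rolling restart up front. The sketch below only plans time offsets and does not stop or start anything. It assumes a hypothetical, fixed restart duration (`restart_seconds`), and the server names are hypothetical as well.

```python
from typing import List, Tuple


def rolling_restart_schedule(servers: List[str], retry_interval: int,
                             restart_seconds: int) -> List[Tuple[str, int, int]]:
    """Return (server, stop_at, started_at) offsets in seconds.

    Each server is stopped no sooner than `retry_interval` seconds after the
    previous server has been started again.
    """
    schedule = []
    t = 0
    for name in servers:
        stop_at = t
        started_at = stop_at + restart_seconds  # hypothetical restart duration
        schedule.append((name, stop_at, started_at))
        t = started_at + retry_interval  # wait a full retry interval before the next stop
    return schedule


print(rolling_restart_schedule(["server1", "server2"], retry_interval=60, restart_seconds=30))
# [('server1', 0, 30), ('server2', 90, 120)]
```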
You can verify the retry interval setting either in the Administrative Console (Servers → Web servers → WebServer_Name → Plug-in properties → Request Routing) or by looking at the plugin-cfg.xml file.
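To check the value without the console, a script can read the generated configuration file. This sketch assumes that the retry interval appears as a `RetryInterval` attribute on each `ServerCluster` element of plugin-cfg.xml. When the attribute is absent, it falls back to the 60-second default mentioned above. The embedded sample and the cluster and server names are hypothetical. To read a real file, pass `ET.parse(path).getroot()` instead of the parsed sample.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed plugin-cfg.xml fragment for illustration.
SAMPLE = """<Config>
  <ServerCluster Name="MyCluster" LoadBalance="Round Robin" RetryInterval="60">
    <Server Name="node1_server1"/>
    <Server Name="node2_server2"/>
  </ServerCluster>
</Config>"""


def retry_intervals(root: ET.Element) -> dict:
    """Map each ServerCluster name to its RetryInterval in seconds (default 60)."""
    return {
        cluster.get("Name"): int(cluster.get("RetryInterval", "60"))
        for cluster in root.iter("ServerCluster")
    }


print(retry_intervals(ET.fromstring(SAMPLE)))  # {'MyCluster': 60}
```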
If it is not possible to add an appropriate delay between recycling servers, make sure the remaining application servers have enough capacity to handle the application's requests while two (or more) application servers are unavailable at the same time. There are therefore two maxims to remember:
- Do not recycle application servers at intervals shorter than the retry interval.
- If you must recycle application servers at intervals shorter than the retry interval, ensure that the remaining active servers can handle the load with two application servers unavailable for a period equal to the retry interval.
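The second maxim amounts to a simple capacity check. The sketch below assumes that every server handles the same known request rate. The parameter names `per_server_capacity` and `peak_load` are hypothetical, and so are the numbers in the example.

```python
def survives_outage(servers: int, per_server_capacity: float,
                    peak_load: float, unavailable: int = 2) -> bool:
    """True if the servers left after `unavailable` are marked down can carry peak_load."""
    remaining = max(servers - unavailable, 0)
    return remaining * per_server_capacity >= peak_load


print(survives_outage(servers=4, per_server_capacity=100, peak_load=180))  # True
print(survives_outage(servers=2, per_server_capacity=100, peak_load=50))   # False
```

With only two servers in the cluster, no capacity is left once both are marked down. This matches the HTTP request failure described for the two-server example.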