Grid false alert: "Agent to OMS communication is broken"
If you are receiving tonnes of false "Agent to OMS communication is broken" alerts from Grid, here is a simple solution.
Increase the max_inactive_time value in the table sysman.mgmt_emd_ping. The default timeout is 120 seconds (2 minutes).
How does this parameter work with Grid & Agent?
Each Management Agent sends a periodic signal to an Oracle Management Service (OMS) indicating that the Management Agent is available.
If the Management Service does not receive a signal from a Management Agent within a specified time interval (default of 120 seconds), then the Management Service performs a reverse ping. A reverse ping is when the Management Service attempts to contact the Management Agent using the Management Agent URL. If the reverse ping succeeds, then the Oracle Management Service knows that the Management Agent and host are both available.
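To see why raising the timeout cuts down on false alerts, here is a minimal Python sketch of the heartbeat check described above. It is a model of the logic, not Oracle's implementation: the function name `agent_state`, the `reverse_ping` callable and the use of seconds are assumptions for illustration (this post does not say which unit max_inactive_time is stored in).

```python
from typing import Callable

# Default heartbeat window described in the text: 120 seconds.
DEFAULT_MAX_INACTIVE_SECS = 120


def agent_state(last_heartbeat: float, now: float,
                reverse_ping: Callable[[], bool],
                max_inactive_secs: float = DEFAULT_MAX_INACTIVE_SECS) -> str:
    """Model of the OMS heartbeat check (hypothetical sketch).

    last_heartbeat and now are timestamps in seconds.
    reverse_ping is a hypothetical callable that contacts the agent URL
    and returns True on success.
    """
    if now - last_heartbeat <= max_inactive_secs:
        # Heartbeat arrived within the window: nothing else to check.
        return "UP"
    # Heartbeat is overdue, so the OMS tries to contact the agent itself.
    if reverse_ping():
        return "UP"
    # Reverse ping failed: all targets on this agent become "Agent Unreachable".
    return "AGENT_UNREACHABLE"


# Heartbeat 150 s late and the reverse ping fails (e.g. a transient network blip).
print(agent_state(0, 150, lambda: False))       # AGENT_UNREACHABLE
# Same situation with a 300-second window: no reverse ping, no alert.
print(agent_state(0, 150, lambda: False, 300))  # UP
```

With the default window, a heartbeat that is 150 seconds late triggers a reverse ping, and if that ping also fails the agent is reported unreachable. With a 300-second window the same late heartbeat never triggers the reverse ping at all. The trade-off is that a genuinely dead agent is also reported later.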
If the Management Service reverse ping fails, then all targets monitored by the Management Agent are considered to be in the “Agent Unreachable” state, and the Oracle Management Service attempts a TCP ping of the host on which the Management Agent resides. Based on the results of the TCP ping, one of two messages will be returned:
- If the Management Service’s TCP ping to the host succeeds, then the Management Service determines that the Management Agent is down, but the host is still up. The notification alert message will indicate this state. The message is as follows:
Agent is Unreachable (REASON = Connection refused) but the host is UP.
The REASON will be filled with the error that was received while performing the reverse ping.
- If the Management Service’s TCP ping to the host fails, then one of the following is likely:
- There are network problems between the Management Service and Management Agent hosts
- The host itself is down
- The host cannot be reached using a ping because a firewall exists between the Management Service and the Management Agent hosts that prevents ICMP traffic from passing, or the Management Agent host does not support ICMP packets
The notification alert message will indicate the problem. If the Management Service’s host ping fails, the message is:
Agent is Unreachable (REASON = Connection refused). Host is unreachable (REASON = unknown host)
The REASON for Agent Unreachable will be filled with the error received while performing the reverse ping, and the REASON for host unreachability will be filled with the error received while performing the host ping.
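The two alert formats quoted above can be reproduced with a short Python sketch of what happens after the reverse ping fails. The function name and parameters are hypothetical; only the message text comes from the alerts quoted in this post, and the real OMS may assemble them differently.

```python
from typing import Optional


def unreachable_message(reverse_ping_error: str,
                        host_ping_error: Optional[str]) -> str:
    """Build the alert text once the reverse ping has failed (hypothetical sketch).

    reverse_ping_error: error received while reverse-pinging the agent URL.
    host_ping_error: None if the ping of the host succeeded,
    otherwise the error received while pinging the host.
    """
    agent_part = f"Agent is Unreachable (REASON = {reverse_ping_error})"
    if host_ping_error is None:
        # Host answers the ping, so the agent itself is down.
        return f"{agent_part} but the host is UP."
    # Host ping failed too: network problem, host down, or ICMP blocked.
    return f"{agent_part}. Host is unreachable (REASON = {host_ping_error})"


print(unreachable_message("Connection refused", None))
print(unreachable_message("Connection refused", "unknown host"))
```

The first call prints the "host is UP" message and the second prints the "Host is unreachable" message, each with its REASON filled from the corresponding ping error.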