Websphere First Failure Data Capture (FFDC) Logs
Often the websphere default systmout,systemerror logs doesn’t provide detailed information on error. In such cases have a look at directory logs under /opt/IBM/WebSphere/AppServer/profiles/Profile01/dmgr/logs/ffdc & /opt/IBM/WebSphere/AppServer/profiles/Profile01/Node/logs/ffdc
There are three property files which control the behavior of the ffdc filter. The files which are used are based upon the state of the server:
1. ffdcStart.properties: used during start of the server
2. ffdcRun.properties: used after the server is ready
3. ffdcStop.properties: used while the server is in the process of stopping
[was61@IBM]$ du -a | grep ffdcRun.properties
# Enable FFDC processing
# FFDC=true [default]
# Level of processing to perform
# 0 – none
# 1 – monitor exception path
# 2 – dump the call stack, with no advanced processing
# 3 – 2, plus object interspecting the current object
# 4 – 2, plus use DM to process the current object
# 5 – 4, plus process the top part of the call stack with DMs
# 6 – perform advanced processing the entire call stack
# ExceptionFileMaximumAge, number of days to purge the file
# ExceptionFileMaximumAge=<any positive number of days>
# Default is 7 days.
The only file that you should modify is the ffdcRun.properties file. You can change the value of ExceptionFileMaximumAge property. This property specifies the number of days that an FFDC log remains in the <profileroot>/logs/ffdc directory before being deleted.
There are two artifacts which are produced by FFDC, the information can be located in the <Install Root>/logs/FFDC directory:
- * Exception Logs:<ServerName>_Exception.log here mgr_exception.log
- * Incident Stream:<ServerName>_<threadid>_<timeStamp>_<SequenceNumber>.txt
exa Exception Logs [was61@IBM]$ du -a | grep exception.log
exa Incident Stream /opt/IBM/WebSphere/AppServer/profiles/Profile01/dmgr/logs
[logs]$ cd ffdc/
[ffdc]$ ls -lrt
-rw-r–r– 1 was61 was61 6607 Aug 14 07:07 dmgr_0000000a_11.08.14_07.07.09_0.txt
-rw-r–r– 1 was61 was61 1082 Aug 14 07:07 dmgr_exception.log
-rw-r–r– 1 was61 was61 5916 Aug 14 07:07 dmgr_00000011_11.08.14_07.07.12_0.txt
We can relate the incident file with the exception.log file by taking the probeid from the incident file and searching for it in the exception.log file. You will notice that timestamps also match.
——Start of DE processing—— = [14/08/11 07:07:12:086 BST] , key = java.io.IOException com.ibm.ws.management.discovery.DiscoveryService.sendQuery 165
Exception = java.io.IOException
Source = com.ibm.ws.management.discovery.DiscoveryService.sendQuery
probeid = 165
Index Count Time of last Occurrence Exception SourceId ProbeId
1 1 14/08/11 07:07:05:200 BST org.omg.CORBA.BAD_OPERATION com.ibm.ws.naming.jndicos.CNContextImpl.isLocal 3510
+ 2 1 14/08/11 07:07:08:047 BST com.ibm.websphere.security.EntryNotFoundException com.ibm.ws.security.auth.ContextManagerImpl.runAs 4162
+ 3 1 14/08/11 07:07:08:050 BST com.ibm.websphere.wim.exception.EntityNotFoundException com.ibm.websphere.security.EntryNotFoundException 170
+ 4 1 14/08/11 07:07:08:064 BST com.ibm.websphere.security.EntryNotFoundException com.ibm.ws.security.role.RoleBasedConfiguratorImpl.fillMissingAccessIds 542
+ 5 1 14/08/11 07:07:09:325 BST com.ibm.wkplc.extensionregistry.util.XmlUtilException class com.ibm.wkplc.extensionregistry.RegistryLoader.restore 1
+ 6 1 14/08/11 07:07:12:086 BST java.io.IOException com.ibm.ws.management.discovery.DiscoveryService.sendQuery 165
1. Exception Log: Row elements
The exception logs contains all of the exception paths which have been encountered since the server has started. Due to optimizations in the data collection, the table was created to give an over view of the exceptions which have been encountered in the server. A entry in the table look like this :
Index Occur Time of last Occurence Exception SourceId ProbeId
18 1 11/24/10 15:29:59:893 GMT com.ibm.websphere.security.auth.WSLoginFailedException com.ibm.ws.security.auth.JaasLoginHelper.jaas_login 487
19 8 11/24/10 15:29:23:819 GMT javax.net.ssl.SSLHandshakeException com.ibm.ws.security.orbssl.WSSSLClientSocketFactoryImpl.createSSLSocket 540
20 1 11/24/10 15:29:59:838 GMT com.ibm.websphere.security.PasswordCheckFailedException com.ibm.ws.security.auth.ContextManagerImpl.runAs 4101
21 2 11/24/10 15:29:23:979 GMT com.ibm.websphere.management.exception.ConnectorException com.ibm.ws.management.RoutingTable.Accessor.getConnector 583
The first element in the row is a simply index, this is simply used to determine the number of rows in the table. In some entries, a ‘+’ may appear in the first column, this indicates that the row has been added to the table since the last time the entire table was dunmped.
The second element is the number of occurences. This is useful to see if there is an unusual number of exceptions which are occurring.
The third element in the row, is a time stamp for the last occurence of the exeception. This is useful in looking at exceptions which have occurred at about the same time.
The last element in the row is a combination of values. This consists of the exception name, a source Id and the probe Id. This information is useful to locate information in the incident steam about the specific failure.
File content : The make up of the file can be a little confusing when first viewed. The file is a accumulation of all of the dumps which have occurred over the life of the server. This means that much of the informaion in the file is out of data, and does not apply to the current server. The most relevent information is the last (tail) of the file.
It is quite easy to locate the last dump of the exception table. The dump will be deliminated by ‘——————-…’. Entries which begin with a ‘+’ appear outside the delimination of the table, and indicate that they are additions to the table since the last time the table was dumped. (Again due to performance concerns, the table is dump only periodically, and when the server is stopping).
The information in the above file is displayed in the unordered form as the hash table. A more viewable form of the file would be to actually sort the output based upon the time stamp. (This is done by using mks commands, hopefully there are available on your system).
Sorted output of only the last dump of the exception table for Server1_Exception.log. This is done by the following command :
tail -n<n> <servername>_exception.log | sort -k4n
where n is the number exceptions in the exception table plus 1 (use the index value to determine this value). <servername> is the name of the server.
Note: The sort key needs a little work for servers which have rolled the data.
2. Incident Stream
The incident stream contains more details about exceptions which have been encountered during the running of the server. Depending on the configuration of the property files, the content of the incident streams will vary.
The default settings of the property files, the incident stream will not contain exception information for exceptions which were encountered during the start of the server (due to the Level=1 in the ffdcStart.properties). But where the server does to ready, and new exeception which is encountered will be processed.
The incident stream files should be used in conjunction of the exception log. The values which are contained in the exception log, in most instances will have a corresponding entry in the incident stream. The relationship between the exception log and the incident stream is the hash code which is made up of the exception type, the source Id, and the probe Id. The simpliest way to look at this information is to use the grep command. The information is not all contained on the same line, if you need to know the exact file containing the value, you can use a compound grep command.
File content : The file contains information on exception which have been encountered. Each exception will contain information which corresponds to the information (exception name, source Id and the probe Id) contained in the exception table (documented above). If the catch of the exception is a non-static method, the content of the this pointer. In some instances, if there is a diagnostic module which corresponds to the current execution, the DM will write the information about the state of the object to the incident stream.
The call stack will also be written to the incident stream.
In some instances, there may be an exception which was encountered while the server is running which will not produce a call stack. This is because the exception was encountered during the start of the server, and since the server started, the exception is considered to be a normal path exception. All of the exception can be seen by either looking at all of the runtime exceptions, or looking at all of the exceptions.