Oracle 11g Grid | How to add a custom shell script to raise a user-defined alert/notification

Posted By Sagar Patil

Although I can use Grid to run my RMAN backups, I am not entirely convinced about its transparency. As a DBA I like to have more control, and I trust the custom scripts I have used for years more than anything else. Here is a small process I added to raise an alert for failed RMAN backups.

I wrote two scripts, though they could probably be combined into a single script. Please feel free to make changes.

rman_full.sh: level 0 RMAN backup script
check_rman_log.sh: shell script that checks the log for keywords and raises errors

#!/bin/ksh
# rman_full.sh: declare your ORACLE environment variables

export ORACLE_SID=GRID
export ORACLE_BASE=/opt/app/oracle
export ORACLE_HOME=/opt/app/oracle/product/11.2/db_1
export PATH=$PATH:${ORACLE_HOME}/bin

# Level 0 backup; RMAN writes its output to the msglog file checked later
$ORACLE_HOME/bin/rman target / msglog=/mnt/data/backups/rman/rman_${ORACLE_SID}.log <<eof
run {
  allocate channel d1 type disk;
  backup incremental level 0 cumulative
    skip inaccessible
    tag Full_Online_Backup
    format '/mnt/data/backups/rman/OMS_data_t%t_s%s_p%p'
    database;
  copy current controlfile to '/mnt/data/backups/rman/snap_ctl.ctl';
  sql 'alter system archive log current';
  backup
    format '/mnt/data/backups/rman/OMS_archive_t%t_s%s_p%p'
    archivelog all
    delete input;
  DELETE NOPROMPT OBSOLETE;
  DELETE NOPROMPT EXPIRED BACKUP;
  release channel d1;
}
eof
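
The commented-out lines at the end of check_rman_log.sh aim to keep 30 days of log history. With msglog= and no APPEND option, each run should replace the previous log, so one option is to rotate the log in rman_full.sh just before rman starts; today's log then stays in place for check_rman_log.sh to read. This is a minimal sketch assuming GNU date and find. rotate_rman_log and RETENTION_DAYS are hypothetical names. The helper is plain POSIX sh, so it can be pasted into the ksh script.

```shell
#!/bin/sh
# Hypothetical helper for rman_full.sh: keep a dated copy of the previous RMAN
# log and delete copies older than RETENTION_DAYS (assumed: 30, matching the
# history period mentioned in check_rman_log.sh).
RETENTION_DAYS=30

rotate_rman_log() {
    log=$1
    if [ -f "$log" ]; then
        # Suffix taken from the log's own modification time (GNU date -r),
        # so the copy is named after the run that wrote it
        stamp=$(date -r "$log" "+%Y%m%d_%H%M%S")
        mv "$log" "${log}_${stamp}"
    fi
    # Prune rotated copies last modified more than RETENTION_DAYS days ago
    find "$(dirname "$log")" -type f -name "$(basename "$log")_*" \
        -mtime +"$RETENTION_DAYS" -exec rm -f {} +
}
```

In rman_full.sh it would be called as rotate_rman_log /mnt/data/backups/rman/rman_${ORACLE_SID}.log right before the rman command.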

The following shell script (check_rman_log.sh) looks for error codes and messages in the RMAN log file and returns the string “Backup Completed Successfully”, or otherwise “Backup Failed”, to Grid Control.

#!/bin/bash
# check_rman_log.sh
# author: Sagar PATIL

# Result strings reported to Grid Control
STATE_OK="em_result=Backup Completed Successfully"
STATE_CRITICAL="em_result=Backup Failed"

# The log file is passed as a parameter, for example:
# logfile=/mnt/data/backups/rman/rman_GRID01DB.log

# Minimum size (bytes) the log file should have
minimumLogSize=1000

# Current date in the DD-MON-YY form RMAN prints in its log
curDate=$(date "+%d-%b-%y" | tr 'a-z' 'A-Z')

# Debug output (1 = ON)
DEBUG=0

# Keywords that must be found in the log
#keywordsOK[0]="Finished backup at $curDate"
keywordsOK[1]="Finished Control File and SPFILE Autobackup at $curDate"

# Keywords that must not be found in the log; if any of them is found,
# the script reports $STATE_CRITICAL
keywordsBad[0]="ORA-"
keywordsBad[1]="ERR-"
keywordsBad[2]="err-"
keywordsBad[3]="Ora-"
keywordsBad[4]="user interrupt received"

# Prints the failure result for Grid Control and stops any further checks.
# The outcome is carried by the em_result line, so the exit status stays 0.
fail() {
    echo "$STATE_CRITICAL | $*"
    exit 0
}

# Checks the date of the log file. If it was not written today, the
# script reports $STATE_CRITICAL.
checkCreationDate() {
    # Last modification date of the log (GNU date -r). The access time is
    # not used because reading the log can update it.
    fileDate=$(date -r "$logfile" "+%Y-%m-%d")
    currentDate=$(date "+%Y-%m-%d")
    if [[ "$fileDate" != "$currentDate" ]]; then
        # The dates don't match: report the error and stop
        fail "Error checking date: today is $currentDate and the log file date is $fileDate"
    elif [ $DEBUG -eq 1 ]; then
        echo "Date checked. All OK"
    fi
}

# First looks for the keywords that shouldn't be in the log file
# (keywordsBad); if one is found the script reports $STATE_CRITICAL.
# Then it checks that every keyword in keywordsOK is present; if one
# is missing the script also reports $STATE_CRITICAL.
checkKeywords() {
    # Loop through the undesirable keywords. -F matches them literally;
    # -w is not used because "ORA-" followed by digits is not a whole word.
    for i in "${keywordsBad[@]}"; do
        if grep -q -i -F -- "$i" "$logfile"; then
            fail "Errors in the log ($i)"
        fi
    done

    # No bad keywords were found, so check for the ones that should be there
    for i in "${keywordsOK[@]}"; do
        if ! grep -q -i -F -- "$i" "$logfile"; then
            fail "Couldn't find the good keywords in the file ($i)"
        fi
    done

    if [ $DEBUG -eq 1 ]; then
        echo "The 'good' keywords were found :)"
    fi
}

# Checks the log size: if it is greater than $minimumLogSize bytes the
# log is considered OK; otherwise the script reports $STATE_CRITICAL.
checkFileSize() {
    fileSize=$(ls -l "$logfile" | awk '{print $5}')
    if [[ $fileSize -gt $minimumLogSize ]]; then
        if [ $DEBUG -eq 1 ]; then
            echo "Log file size is OK ($fileSize)"
        fi
    else
        fail "Log file size is not OK ($fileSize)"
    fi
}

# Each script parameter is a log file path including the file name
# (e.g. a log under /u07/backup/RMAN/); the checks run on each one.
if [ $# -eq 0 ]; then
    fail "No log file given"
fi

while [ $# -ne 0 ]; do
    logfile="$1"
    if [ $DEBUG -eq 1 ]; then
        echo "-------------------------------------"
        echo "Checking the log file: $logfile"
    fi
    # Check whether the file exists
    if [ -e "$logfile" ]; then
        checkCreationDate   # date of the log file
        checkFileSize       # size, compared with $minimumLogSize
        checkKeywords       # bad and good keywords
    else
        fail "The file '$logfile' doesn't exist"
    fi
    shift
done

# At the end of the program, move the log file to preserve 30 days of history.
# Note: once the log is moved, later checks on the same day find no file.
#mv "$logfile" "${logfile}_${curDate}"
#find /u07/backup/RMAN/ -name 'rman_*.log_*' -mtime +30 -exec rm {} \;

# Every check stops through fail() on error, so reaching this point means all is OK
echo "$STATE_OK"
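
The bad-keyword test is the part most likely to misfire. With grep -w, a match must be followed by a non-word character, so the keyword ORA- never matches a real message such as ORA-19502, where digits follow the hyphen. The sketch below isolates the scan as a standalone function using literal, case-insensitive matching. find_bad_keyword is a hypothetical name, and the keywords are passed as arguments so the function also runs under plain sh.

```shell
#!/bin/sh
# Hypothetical helper: print the first "bad" keyword found in a log file.
# Usage: find_bad_keyword LOGFILE KEYWORD...
# Returns 0 (and prints the keyword) on a hit, 1 if none is found.
find_bad_keyword() {
    log=$1
    shift
    for kw in "$@"; do
        # -q: no output, -i: case-insensitive, -F: keyword taken literally,
        # "--" protects keywords that start with "-"
        if grep -q -i -F -- "$kw" "$log"; then
            printf '%s\n' "$kw"
            return 0
        fi
    done
    return 1
}
```

From check_rman_log.sh it could be called as find_bad_keyword "$logfile" "${keywordsBad[@]}". Because -i already ignores case, the separate "Ora-" and "err-" entries become redundant with this approach.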

To raise the alert, register check_rman_log.sh as a User-Defined Metric in Grid Control:

  • Click Targets in the top menu and select the required Host.
  • Scroll down to the “User-Defined Metrics” link and, on the next screen, click Create.

  • Enter details such as Metric Name, Metric Type (the script returns a string), Command Line (the full path to check_rman_log.sh followed by the RMAN log file), Operating System Credentials and Thresholds.

  • Select the required Schedule and click OK.

  • If you selected the “Start Immediately after creation” radio button, an alert should appear within minutes if there is a failed backup.

  • Click the message for details.
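
Before creating the metric, it helps to run the command line by hand and confirm it prints a single em_result= line, since that line is how check_rman_log.sh hands its result to Grid Control. This is a minimal sketch. validate_udm_output is a hypothetical helper, not part of Grid Control, and it only checks the output format, not how Grid evaluates thresholds.

```shell
#!/bin/sh
# Hypothetical pre-flight check: run the metric command and make sure it
# prints exactly one em_result= line, then show the value after "em_result=".
validate_udm_output() {
    out=$("$@")
    count=$(printf '%s\n' "$out" | grep -c '^em_result=')
    if [ "$count" -ne 1 ]; then
        echo "expected 1 em_result line, got $count" >&2
        return 1
    fi
    # Print only the value part of the em_result line
    printf '%s\n' "$out" | sed -n 's/^em_result=//p'
}
```

For example: validate_udm_output /home/oracle/scripts/check_rman_log.sh /mnt/data/backups/rman/rman_GRID.log (the script location is an assumption).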
