
Thursday, March 6, 2014

How to Delete From or Add Resource to OCR in Oracle Clusterware

This document explains the steps needed to remove a resource from, or add a resource to, the OCR when the normal srvctl command fails.

A common scenario is a resource stuck in "UNKNOWN" status that cannot be stopped or deleted.

A. Terms/Variables used in this note

RESOURCE_TYPE can be database, instance, service, listener, etc.

RESOURCE_NAME is the name from the crs_stat command for pre-11.2, or the name from "crsctl stat res" for 11gR2, e.g. ora.racdb.db

RESOURCE_HOME refers to the ORACLE_HOME that the resource runs from, e.g. RDBMS_HOME for database, instance and service resources.

RESOURCE_OWNER refers to the OS user that owns the RESOURCE_HOME, e.g. the grid user for GRID_HOME.

B. Resource must be managed by srvctl from RESOURCE_HOME/bin

A resource must be managed with the srvctl from the $RESOURCE_HOME it runs from: for example, a VIP from $GRID_HOME or $CRS_HOME, an 11.2 .db resource from the 11.2 RDBMS_HOME, and an 11.1 .db resource from the 11.1 RDBMS_HOME.

If the wrong srvctl is used, one of the following errors will be reported:
PRCD-1245 : Addition of database version 11.2.0.3.0 is not allowed using srvctl version 10.2.0.0.0

OR

PRCD-1027 : Failed to retrieve database racdb
PRKP-1088 : Failed to retrieve configuration of cluster database racdb
PRKR-1078 : Database racdb of version 10.2.0.0.0 cannot be administered using current version of srvctl. Instead run srvctl from /dbhome/10.2

OR

PRCD-1027 : Failed to retrieve database racdb
PRCD-1229 : An attempt to access configuration of database racdb was rejected because its version 11.2.0.2.0 differs from the program version 11.2.0.3.0. Instead run the program from /dbhome/11.2.0.2
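
As a rough aid, the PRKR-1078 and PRCD-1229 messages above already name the home to run srvctl from. The sketch below pulls that path out of captured error text. The function name suggested_home is made up for illustration, and the pattern assumes only the message wording quoted above.

```shell
# Hypothetical helper: read srvctl error text on stdin and print the home
# named after "Instead run srvctl from" or "Instead run the program from".
# Prints nothing when the text contains no such hint.
suggested_home() {
  sed -nE 's/.*Instead run (srvctl|the program) from ([^ ]+).*/\2/p'
}

# Example using the PRKR-1078 text quoted in this note.
echo "PRKR-1078 : Database racdb of version 10.2.0.0.0 cannot be administered using current version of srvctl. Instead run srvctl from /dbhome/10.2" | suggested_home
```

The printed path can then be used as RESOURCE_HOME for the commands in the following sections.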

C. To remove a resource from OCR

A resource should be in OFFLINE state before it can be removed, and srvctl from RESOURCE_HOME should be executed as RESOURCE_OWNER:

1. To stop:

Try the following sequentially until the resource is stopped successfully:
$ $RESOURCE_HOME/bin/srvctl stop <RESOURCE_TYPE> <options>
$ $RESOURCE_HOME/bin/srvctl stop <RESOURCE_TYPE> <options> -f

For srvctl syntax, refer to the Server Control Utility Reference in the Oracle Real Application Clusters Administration and Deployment Guide.

2. To remove 

Once the resource is stopped, try the following sequentially until the resource is removed successfully:
$ $RESOURCE_HOME/bin/srvctl remove <RESOURCE_TYPE> <options>
$ $RESOURCE_HOME/bin/srvctl remove <RESOURCE_TYPE> <options> -f
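
The "try without -f, then with -f" sequence for stop and remove can be sketched as a small wrapper. This is only an illustration: the function names try_then_force and stop_and_remove and the SRVCTL variable are hypothetical, and the sketch assumes the same options are valid for both the stop and the remove command, which is not true for every resource type.

```shell
# Hypothetical wrapper, to be run as RESOURCE_OWNER.
# SRVCTL defaults to the srvctl under RESOURCE_HOME.
SRVCTL="${SRVCTL:-$RESOURCE_HOME/bin/srvctl}"

# Run "<verb> <RESOURCE_TYPE> <options>" first without -f; if that fails,
# retry the same command with -f appended.
try_then_force() {
  verb=$1; shift
  "$SRVCTL" "$verb" "$@" && return 0
  echo "plain $verb failed, retrying with -f" >&2
  "$SRVCTL" "$verb" "$@" -f
}

# Stop first; attempt the remove only once the stop has succeeded.
stop_and_remove() {
  try_then_force stop "$@" && try_then_force remove "$@"
}
```

For example, "stop_and_remove database -d racdb" would try "srvctl stop database -d racdb", fall back to the -f form if needed, and then do the same for remove.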


D. To add a resource to OCR


3. To add:
$ $RESOURCE_HOME/bin/srvctl add <RESOURCE_TYPE> <options>

E. To troubleshoot

If srvctl reports an error, SRVM tracing can be turned on before executing the srvctl command:
$ script /tmp/out.1
$ SRVM_TRACE=true
$ export SRVM_TRACE
$ $RESOURCE_HOME/bin/srvctl <command> <RESOURCE_TYPE> <options>
$ exit

Screen output will be saved in /tmp/out.1
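
Where an interactive script session is inconvenient, the same idea can be sketched non-interactively: set SRVM_TRACE for a single command and redirect its output to a file. The function name trace_cmd and the output path are hypothetical. Unlike the steps above, the variable is set only for that one command and is not exported into the current shell.

```shell
# Hypothetical helper: run one command with SRVM_TRACE=true and save its
# stdout and stderr to the given file. The exit status of the traced
# command is passed back to the caller.
trace_cmd() {
  out=$1; shift
  env SRVM_TRACE=true "$@" >"$out" 2>&1
}

# Usage (placeholders as in the steps above):
#   trace_cmd /tmp/out.1 "$RESOURCE_HOME/bin/srvctl" <command> <RESOURCE_TYPE> <options>
```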

To get help on srvctl syntax:
$ $RESOURCE_HOME/bin/srvctl <command> <RESOURCE_TYPE> -h

E.1. Engage Oracle Support

If the issue cannot be solved, engage Oracle Support.

F. Misc

F1. VIP or network resource (in 11.2.0.2 or above)

Addition, removal or modification of a VIP or network resource (in 11.2.0.2 or above, ora.net1.network) must be done by the root user.

F2. Pre-11.2 database in Oracle Restart

Pre-11.2 single-instance databases cannot be managed by 11gR2 Grid Infrastructure Standalone (also known as Oracle Restart). One of the following errors will be reported when trying to register one:
PRCD-1245 : Addition of database version n.n.0.n.0 is not allowed using srvctl version n.n.n.n.n

OR

srvctl[nnnn]: /bin/java:  not found

F3. Listener can only be managed by netca in 10gR2

F4. DB_UNIQUE_NAME

When managing a database with "srvctl <command> database -d <dbname> <options>", if dbname differs from db_unique_name, db_unique_name must be used to avoid errors PRCD-1120 and PRCR-1001.




REFERENCES

NOTE:1050908.1 - Troubleshoot Grid Infrastructure Startup Issues
NOTE:1068835.1 - What to Do if 11gR2 Grid Infrastructure is Unhealthy

NOTE:948456.1 - Pre 11.2 Database Issues in 11gR2 Grid Infrastructure Environment

Wednesday, March 5, 2014

Deinstall Grid Infrastructure Oracle 11g

First, collect all of the information below:

$GRID_HOME/bin/crsctl stat res -t
$GRID_HOME/bin/crsctl stat res -p
$GRID_HOME/bin/crsctl query css votedisk
$GRID_HOME/bin/ocrcheck
$GRID_HOME/bin/oifcfg getif
$GRID_HOME/bin/srvctl config nodeapps -a
$GRID_HOME/bin/srvctl config scan
$GRID_HOME/bin/srvctl config asm -a
$GRID_HOME/bin/srvctl config listener -l <listener-name> -a
$DB_HOME/bin/srvctl config database -d <dbname> -a
$DB_HOME/bin/srvctl config service -d <dbname> -s <service-name> -v
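
Since this information is gone once the deconfiguration has run, it may help to save each command's output to a file. A minimal sketch follows; the function name save_output, the OUT_DIR variable and its default path are all hypothetical.

```shell
# Hypothetical helper: run a command and store its output, preceded by a
# header line naming the command, in $OUT_DIR/<name>.txt.
OUT_DIR="${OUT_DIR:-/tmp/gi_deinstall_info}"

save_output() {
  name=$1; shift
  mkdir -p "$OUT_DIR" || return 1
  { echo "### $*"; "$@"; } >"$OUT_DIR/$name.txt" 2>&1
}

# Usage, one call per command in the list above, for example:
#   save_output crsctl_stat_res_t "$GRID_HOME/bin/crsctl" stat res -t
#   save_output ocrcheck          "$GRID_HOME/bin/ocrcheck"
```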

On all remote nodes, as root execute:
# <$GRID_HOME>/crs/install/rootcrs.pl -deconfig -force -verbose

Once the above command finishes on all remote nodes, on local node, as root execute:
# <$GRID_HOME>/crs/install/rootcrs.pl -deconfig -force -verbose -keepdg -lastnode
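
The ordering matters: every remote node is deconfigured first, and only the last (local) node gets -keepdg -lastnode. The sketch below only prints that plan for a list of nodes so it can be reviewed before anything is run as root. The function name deconfig_plan and the node names in the usage line are hypothetical.

```shell
# Hypothetical dry-run helper: print, in order, the rootcrs.pl command for
# each remote node and then the command for the local (last) node.
# Arguments: <grid_home> <local_node> <all nodes...>
deconfig_plan() {
  grid_home=$1; local_node=$2; shift 2
  for node in "$@"; do
    if [ "$node" != "$local_node" ]; then
      echo "$node: $grid_home/crs/install/rootcrs.pl -deconfig -force -verbose"
    fi
  done
  echo "$local_node: $grid_home/crs/install/rootcrs.pl -deconfig -force -verbose -keepdg -lastnode"
}

# Usage with made-up node names:
#   deconfig_plan "$GRID_HOME" rac1 rac1 rac2 rac3
```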


As grid user, execute:
$ <$GRID_HOME>/deinstall/deinstall

Troubleshooting Oracle Clusterware

Objectives

  • Locate Oracle Clusterware log files
  • Gather all log files using diagcollection.pl
  • Enable resource debugging
  • Enable component-level debugging
  • Enable tracing for Java-based tools
  • Troubleshoot the Oracle Cluster Registry (OCR) file

Topics


Refer to the links below:

Troubleshooting Oracle Clusterware-1




OCR-Related Tools for Debugging



  • OCR tools:
    • ocrdump
    • ocrconfig
    • ocrcheck
    • srvctl
  • Logs are generated in the following directory: <Grid_Home>/log/<hostname>/client/
  • Debugging is controlled through the following file: <Grid_Home>/srvm/admin/ocrlog.ini

# ocrcheck

ocrdump



The ocrdump utility can be used to view the OCR content for troubleshooting. The ocrdump utility enables you to view logical information by writing the contents to a file or displaying the contents to stdout in a readable format. 

If the ocrdump command is issued without any options, the output is written to a file with the default name OCRDUMPFILE in the current directory, provided that the directory is writable. The information contained within the OCR is organized by keys that are associated with privileges. Therefore, the root user will not see the same results as the clusterware owner.


  •      To dump the OCR contents into a text file for reading:

      $ ocrdump file.txt
      # ocrdump file.txt

  •      To dump the OCR contents for a specific key:

      # ocrdump -keyname SYSTEM.language

  •      To dump the OCR contents to stdout in XML format:

      # ocrdump -stdout -xml

  •      To dump the contents of an OCR backup file:

      # ocrdump -backupfile file.ocr
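
Because visible keys depend on privileges, one way to see the difference is to take a text dump as root and another as the clusterware owner, then compare the key names. The sketch below assumes that key names appear in the text dump on lines starting with "[" (for example [SYSTEM.language]); that assumption and the function names dump_keys and keys_only_in_first are not from the original note.

```shell
# Hypothetical helpers for comparing two ocrdump text files.

# Print the sorted, unique key-name lines (assumed to start with "[").
dump_keys() {
  grep '^\[' "$1" | sort -u
}

# Print the keys present in the first dump but missing from the second.
keys_only_in_first() {
  a=$(mktemp); b=$(mktemp)
  dump_keys "$1" >"$a"
  dump_keys "$2" >"$b"
  comm -23 "$a" "$b"
  rm -f "$a" "$b"
}

# Usage, with made-up file names for a dump taken as root and one taken
# as the clusterware owner:
#   keys_only_in_first root_dump.txt owner_dump.txt
```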

Process Roles for Node Reboots



The following processes can evict nodes from the cluster or cause a node reboot:

  • oclskd: Is used by CSS to reboot a node based on requests from other nodes in the cluster
  • cssdagent and cssdmonitor: Monitor node hangs and vendor clusterware
  • ocssd: Monitors internode health status


Determining Which Process Caused Reboot

Log File Locations for Processes Causing Reboots.

  •    oclskd
    •     <Grid_Home>/log/<hostname>/client/oclskd.log
  •    ocssd
    •     /var/log/messages
    •     <Grid_Home>/log/<hostname>/cssd/ocssd.log
  •    cssdagent
    •     <Grid_Home>/log/<hostname>/agent/ohasd/oracssdagent_root
  •    cssdmonitor
    •     <Grid_Home>/log/<hostname>/agent/ohasd/oracssdmonitor_root
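
To check these locations quickly after a reboot, the paths listed above can be generated for a given Grid home and hostname. This is a small sketch; the function name reboot_log_paths is hypothetical, and the paths are simply the ones listed above.

```shell
# Hypothetical helper: print the log locations listed above for the given
# Grid home and hostname, one path per line.
reboot_log_paths() {
  base="$1/log/$2"
  printf '%s\n' \
    "$base/client/oclskd.log" \
    "/var/log/messages" \
    "$base/cssd/ocssd.log" \
    "$base/agent/ohasd/oracssdagent_root" \
    "$base/agent/ohasd/oracssdmonitor_root"
}

# Usage: show which of the locations exist on this node.
#   reboot_log_paths "$GRID_HOME" "$(hostname -s)" | while read -r p; do ls -ld "$p"; done
```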