Oracle 11g RAC VIP OFFLINE: an operation and maintenance case

Background

On one project, the VIP service on one node of an Oracle 11g RAC went offline, which effectively reduced the cluster from two nodes to one.

Troubleshooting

  • Use the crsctl command to view the cluster status
$ su - grid
$ crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ARCH.dg
               ONLINE  ONLINE       dbprd1                                       
               ONLINE  ONLINE       dbprd2                                       
ora.CRS.dg
               ONLINE  ONLINE       dbprd1                                       
               ONLINE  ONLINE       dbprd2                                       
ora.DATA.dg
               ONLINE  ONLINE       dbprd1                                       
               ONLINE  ONLINE       dbprd2                                       
ora.LISTENER.lsnr
               ONLINE  OFFLINE      dbprd1                                       
               ONLINE  ONLINE       dbprd2                                       
ora.asm
               ONLINE  ONLINE       dbprd1                   Started             
               ONLINE  ONLINE       dbprd2                   Started             
ora.gsd
               OFFLINE OFFLINE      dbprd1                                       
               OFFLINE OFFLINE      dbprd2                                       
ora.net1.network
               ONLINE  ONLINE       dbprd1                                       
               ONLINE  ONLINE       dbprd2                                       
ora.ons
               ONLINE  ONLINE       dbprd1                                       
               ONLINE  ONLINE       dbprd2                                       
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       dbprd2                                       
ora.asdfprdb.db
      1        ONLINE  ONLINE       dbprd1                   Open                
      2        ONLINE  ONLINE       dbprd2                   Open                
ora.cvu
      1        ONLINE  ONLINE       dbprd2                                       
ora.dbprd1.vip
      1        ONLINE  OFFLINE                                                   
ora.dbprd2.vip
      1        ONLINE  ONLINE       dbprd2                                       
ora.oc4j
      1        ONLINE  ONLINE       dbprd1                                       
ora.scan1.vip
      1        ONLINE  ONLINE       dbprd2         

As you can see, ora.dbprd1.vip is OFFLINE, and ora.LISTENER.lsnr on dbprd1 is OFFLINE as well. The listener is most likely down because of the VIP, so it can be ignored for now.
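On a larger cluster, scanning this table by eye gets tedious. The following is a minimal sketch, not part of the original procedure, that filters the `crsctl stat res -t` output for resources whose TARGET is ONLINE but whose STATE is OFFLINE. The function name `list_offline` is hypothetical, and the parsing assumes the 11.2 tabular layout shown above.

```shell
# list_offline: read `crsctl stat res -t` output on stdin and print
# "<resource> <server>" for every entry with TARGET=ONLINE and STATE=OFFLINE.
# Entries with TARGET=OFFLINE (such as ora.gsd above) are skipped on purpose.
list_offline() {
    awk '
        /^ora\./ { name = $1; next }          # resource name line
        /^[ \t]/ {                            # indented state line
            i = ($1 ~ /^[0-9]+$/) ? 2 : 1     # cluster resources lead with an instance number
            if ($i == "ONLINE" && $(i + 1) == "OFFLINE") {
                server = ($(i + 2) != "") ? $(i + 2) : "-"
                print name, server
            }
        }
    '
}
```

It would be used as `crsctl stat res -t | list_offline`. A resource that is not running on any server is printed with `-` in the server column.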

  • Check cluster health status
[grid@dbprd1 ~]$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

Other services are in normal status.

  • View the alert log

The location of the alert log can be queried with the following SQL:

[grid@dbprd1 ~]$ sqlplus / as sysdba

SQL> select * from v$diag_info where name ='Diag Alert';

   INST_ID NAME
---------- ----------------------------------------------------------------
VALUE
--------------------------------------------------------------------------------
         1 Diag Alert
/u01/app/grid_base/diag/asm/+asm/+ASM1/alert

There are no error messages in the alert log (the path above is that of the ASM instance +ASM1), which indicates that the instance itself has no errors.

  • View system log

The system log is /var/log/messages, and reading it requires root permission. The messages log is rotated regularly, so you need to find the archived file that covers the time when the error occurred:

-rw-------  1 root   root   188K Jul  7 00:01 messages
-rw-------  1 root   root   686K Jun 15 03:07 messages-20200615
-rw-------  1 root   root   525K Jun 21 03:42 messages-20200621
-rw-------  1 root   root   694K Jun 29 03:30 messages-20200629
-rw-------  1 root   root   552K Jul  5 03:30 messages-20200705

There are no obvious errors in the messages log
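The search across the current and rotated files can be scripted. The sketch below is only an illustration: the function name `search_messages`, its arguments, and the default keyword `error` are assumptions, and it relies on the traditional syslog timestamp prefix (for example `Jun 29`, with single-digit days padded as `Jul  7`).

```shell
# search_messages DIR DAY [PATTERN]
# Print lines from DIR/messages* that were logged on DAY (syslog prefix such as
# "Jun 29") and match PATTERN case-insensitively (default: "error").
search_messages() {
    log_dir=$1
    day=$2
    pattern=${3:-error}
    grep -h "^$day " "$log_dir"/messages* 2>/dev/null | grep -i -- "$pattern"
}
```

Run as root, for example `search_messages /var/log "Jun 29" "eth0"`. No output means no matching line was logged on that day.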

  • View crsd process log

The VIP is managed by the crsd process, so check the log of the crsd root agent. The file is located at ({hostname} is the node name, dbprd1 in this case):

/u01/app/11.2.0/grid/log/{hostname}/agent/crsd/orarootagent_root/orarootagent_root.log

The following errors were found in the log

CRS-5005: IP Address: 172.16.200.191 is already in use in the network
. For details refer to "(:CLSN00107:)" in "/u01/app/11.2.0/grid/log/dbprd1/agent/crsd/orarootagent_root//orarootagent_root.log".

2020-06-29 13:02:08.366: [ora.dbprd1.vip][2503161600]{1:57860:43811} [start] (:CLSN00107:) clsn_agent::start }
2020-06-29 13:02:08.366: [    AGFW][2503161600]{1:57860:43811} Command: start for resource: ora.dbprd1.vip 1 1 completed with status: FAIL
2020-06-29 13:02:08.367: [    AGFW][2501060352]{1:57860:43811} Agent sending reply for: RESOURCE_START[ora.dbprd1.vip 1 1] ID 4098:3899996
2020-06-29 13:02:08.367: [    AGFW][2501060352]{1:57860:43811} Agent sending reply for: RESOURCE_START[ora.dbprd1.vip 1 1] ID 4098:3899996
2020-06-29 13:02:08.867: [ora.dbprd1.vip][2503161600]{1:57860:43811} [check] Failed to check 172.16.200.191 on eth0
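The agent log is large, so it helps to pull out just the conflicting addresses. This is a minimal sketch that assumes the CRS-5005 message has the wording shown above; the function name `conflicting_ips` is hypothetical.

```shell
# conflicting_ips LOGFILE
# Print each distinct IP address that a CRS-5005 message in LOGFILE
# reports as "already in use in the network".
conflicting_ips() {
    grep 'CRS-5005' "$1" |
        sed -n 's/.*IP Address: \([0-9.]*\) is already in use.*/\1/p' |
        sort -u
}
```

Pass it the orarootagent_root.log path given above. For the excerpt shown here it would print 172.16.200.191.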

172.16.200.191 is the IP used as the VIP of dbprd1 in the RAC. The log shows that this IP is already in use by another host on the same network. At this point the VIP service had stopped, yet the IP still answered ping, which confirms that some other host really was using it. The issue was reported to the person in charge, and an investigation found a Windows device configured with this IP. After the Windows device's IP was changed, the VIP was restarted and the service returned to normal. The restart command is as follows:
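The ping check used in this diagnosis can be wrapped in a small helper. This is a sketch under stated assumptions: the function names are hypothetical, the `PING` override exists only so the logic can be exercised without a network, and `-c` and `-W` are the Linux iputils ping options. It is only meaningful while the VIP is stopped, otherwise the VIP itself answers.

```shell
# ip_in_use IP: return 0 if IP answers an ICMP echo, non-zero otherwise.
# PING may be overridden (assumption, for testing); defaults to the system ping.
ip_in_use() {
    "${PING:-ping}" -c 3 -W 1 "$1" >/dev/null 2>&1
}

# report_vip_conflict IP: print a one-line verdict for a VIP that is currently stopped.
report_vip_conflict() {
    if ip_in_use "$1"; then
        echo "$1 still answers ping: another host is probably using it"
    else
        echo "$1 does not answer ping: no conflict detected by this check"
    fi
}
```

For example, `report_vip_conflict 172.16.200.191`. A host whose firewall drops ICMP will not answer, so a silent ping does not prove the address is free. In that case an ARP-level probe such as iputils `arping -D -I eth0 -c 3 172.16.200.191` may be more reliable.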

$ su - grid
$ srvctl start vip -n dbprd1

Here dbprd1 is the node name.

Tags: Operation & Maintenance Oracle

Posted by martinchristov on Wed, 01 Jun 2022 22:35:41 +0530