Background
On one node of a project's Oracle 11g RAC cluster, the VIP resource went OFFLINE, so the cluster was effectively reduced from two nodes to a single node.
Troubleshooting
- Use the crsctl command to view the cluster status
$ su - grid
$ crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ARCH.dg
               ONLINE  ONLINE       dbprd1
               ONLINE  ONLINE       dbprd2
ora.CRS.dg
               ONLINE  ONLINE       dbprd1
               ONLINE  ONLINE       dbprd2
ora.DATA.dg
               ONLINE  ONLINE       dbprd1
               ONLINE  ONLINE       dbprd2
ora.LISTENER.lsnr
               ONLINE  OFFLINE      dbprd1
               ONLINE  ONLINE       dbprd2
ora.asm
               ONLINE  ONLINE       dbprd1                   Started
               ONLINE  ONLINE       dbprd2                   Started
ora.gsd
               OFFLINE OFFLINE      dbprd1
               OFFLINE OFFLINE      dbprd2
ora.net1.network
               ONLINE  ONLINE       dbprd1
               ONLINE  ONLINE       dbprd2
ora.ons
               ONLINE  ONLINE       dbprd1
               ONLINE  ONLINE       dbprd2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       dbprd2
ora.asdfprdb.db
      1        ONLINE  ONLINE       dbprd1                   Open
      2        ONLINE  ONLINE       dbprd2                   Open
ora.cvu
      1        ONLINE  ONLINE       dbprd2
ora.dbprd1.vip
      1        ONLINE  OFFLINE
ora.dbprd2.vip
      1        ONLINE  ONLINE       dbprd2
ora.oc4j
      1        ONLINE  ONLINE       dbprd1
ora.scan1.vip
      1        ONLINE  ONLINE       dbprd2
As you can see, ora.dbprd1.vip is OFFLINE, and ora.LISTENER.lsnr on dbprd1 is OFFLINE as well. The listener is most likely down because it depends on the VIP, so it can be ignored for now.
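On a larger cluster the table is tedious to scan by eye. The following is a minimal sketch that filters the output for resources whose TARGET is ONLINE but whose STATE is OFFLINE. The function name find_offline_resources is hypothetical, and the parsing assumes the 11g table layout shown above (a resource name line followed by indented state lines).

```shell
#!/bin/sh
# Hypothetical helper, not part of Oracle Grid Infrastructure.
# Reads `crsctl stat res -t` output on stdin and prints "<resource> <server>"
# for every entry whose TARGET is ONLINE while its STATE is OFFLINE.
find_offline_resources() {
  awk '
    /^ora\./ { name = $1; next }          # remember the current resource name
    name != "" {
      # Cluster resources carry a leading instance number; skip over it.
      i = ($1 ~ /^[0-9]+$/) ? 2 : 1
      if ($i == "ONLINE" && $(i + 1) == "OFFLINE") {
        # The server column is empty when the resource runs nowhere.
        server = ($(i + 2) != "") ? $(i + 2) : "-"
        print name, server
      }
    }
  '
}

# Usage on a cluster node, as the grid user:
#   crsctl stat res -t | find_offline_resources
```

Resources such as ora.gsd, whose TARGET is OFFLINE on purpose, are not reported. If an OFFLINE entry has state details but no server, the first word of the details would be printed in the server position, so treat that column as a hint only.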
- Check cluster health status
[grid@dbprd1 ~]$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
All four Clusterware services are online, so the cluster stack itself is healthy.
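If this check is run repeatedly, for example from a monitoring script, it can be reduced to an exit code. This is only a sketch: crs_stack_healthy is a hypothetical name, and it assumes that every healthy line starts with "CRS-" and ends with "is online", as in the output above.

```shell
#!/bin/sh
# Hypothetical helper. Reads `crsctl check crs` output on stdin, prints every
# CRS- line that does not end in "is online", and returns non-zero if such a
# line exists or if no CRS- line was seen at all.
crs_stack_healthy() {
  awk '
    /^CRS-/ {
      seen++
      if ($0 !~ /is online[[:space:]]*$/) { print; bad++ }
    }
    END { exit (seen == 0 || bad > 0) }
  '
}

# Usage on a cluster node:
#   crsctl check crs | crs_stack_healthy && echo "stack OK"
```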
- View the alert log
The location of the alert log can be queried with the following SQL:
[grid@dbprd1 ~]$ sqlplus / as sysdba
SQL> select * from v$diag_info where name ='Diag Alert';

   INST_ID NAME
---------- ----------------------------------------------------------------
VALUE
--------------------------------------------------------------------------------
         1 Diag Alert
/u01/app/grid_base/diag/asm/+asm/+ASM1/alert
There are no error messages in the alert log, so the instance itself is not reporting a problem.
- View the system log
The system log is /var/log/messages, and reading it requires root permission. The file is rotated regularly, so you need to find the rotated file whose date range covers the time of the failure.
-rw------- 1 root root 188K Jul  7 00:01 messages
-rw------- 1 root root 686K Jun 15 03:07 messages-20200615
-rw------- 1 root root 525K Jun 21 03:42 messages-20200621
-rw------- 1 root root 694K Jun 29 03:30 messages-20200629
-rw------- 1 root root 552K Jul  5 03:30 messages-20200705
There are no obvious errors in the messages log
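Because a rotated file is named after the day it was rotated, entries for a given day can sit in either of two files. Rather than guessing from the file names, one option is to let grep report which files actually contain entries for that day. This is a minimal sketch, assuming uncompressed rotated files and the traditional syslog timestamp at the start of each line; messages_files_for_day is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper. Prints the messages files that contain at least one
# entry for the given day.
#   $1  day prefix in syslog format, e.g. "Jun 29" (single-digit days are
#       padded with a space, e.g. "Jul  5")
#   $2  log directory, defaults to /var/log
messages_files_for_day() {
  day=$1
  dir=${2:-/var/log}
  grep -l "^$day " "$dir"/messages* 2>/dev/null
}

# Usage, as root:
#   messages_files_for_day "Jun 29"
```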
- View the crsd process log
The VIP is managed by the crsd process, so the next place to look is the log of its root agent. The directory after log/ is the node name (dbprd1 in this case). The file is located at
/u01/app/11.2.0/grid/log/{node name}/agent/crsd/orarootagent_root/orarootagent_root.log
The following errors were found in the log:
CRS-5005: IP Address: 172.16.200.191 is already in use in the network . For details refer to "(:CLSN00107:)" in "/u01/app/11.2.0/grid/log/dbprd1/agent/crsd/orarootagent_root//orarootagent_root.log".
2020-06-29 13:02:08.366: [ora.dbprd1.vip][2503161600]{1:57860:43811} [start] (:CLSN00107:) clsn_agent::start }
2020-06-29 13:02:08.366: [ AGFW][2503161600]{1:57860:43811} Command: start for resource: ora.dbprd1.vip 1 1 completed with status: FAIL
2020-06-29 13:02:08.367: [ AGFW][2501060352]{1:57860:43811} Agent sending reply for: RESOURCE_START[ora.dbprd1.vip 1 1] ID 4098:3899996
2020-06-29 13:02:08.367: [ AGFW][2501060352]{1:57860:43811} Agent sending reply for: RESOURCE_START[ora.dbprd1.vip 1 1] ID 4098:3899996
2020-06-29 13:02:08.867: [ora.dbprd1.vip][2503161600]{1:57860:43811} [check] Failed to check 172.16.200.191 on eth0
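The agent log is large, so it can help to pull out only the addresses that CRS-5005 complains about. The sketch below assumes the message wording shown above; crs5005_addresses is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper. Reads an agent log on stdin and prints each distinct
# IP address that a CRS-5005 message reports as already in use.
crs5005_addresses() {
  sed -n 's/.*CRS-5005: IP Address: \([0-9.]*\) is already in use.*/\1/p' | sort -u
}

# Usage on the affected node:
#   crs5005_addresses < /u01/app/11.2.0/grid/log/dbprd1/agent/crsd/orarootagent_root/orarootagent_root.log
```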
172.16.200.191 is the address used as the VIP of dbprd1 in this RAC. The log shows that the address is already in use by another host on the same network. At this point the VIP resource was stopped, yet the address still answered ping, which confirms that some other host really was using it. This was reported to the person in charge, and the investigation found that a Windows device had been configured with the same IP. After the IP of the Windows device was changed, the VIP was restarted and the service returned to normal. The restart command is as follows:
[grid] $ srvctl start vip -n dbprd1
Here dbprd1 is the node name.
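Before restarting the VIP, it may be worth confirming that the address no longer answers while the VIP resource is stopped. The following is a rough sketch of that check, assuming a Linux ping that accepts -c and -W. The function name vip_address_status and the PROBE variable are hypothetical; PROBE exists only so the logic can be tried without touching the network.

```shell
#!/bin/sh
# Hypothetical helper. Reports whether an address answers a probe. With the
# VIP resource stopped, an answer suggests another host holds the address.
#   $1     IP address to probe
#   PROBE  optional replacement probe command (assumption, not an Oracle
#          setting); defaults to a single ping with a one-second timeout
vip_address_status() {
  ip=$1
  # Left unquoted on purpose so the default command splits into words.
  if ${PROBE:-ping -c 1 -W 1} "$ip" >/dev/null 2>&1; then
    echo "in use"
  else
    echo "no answer"
  fi
}

# Usage on the affected node, with ora.dbprd1.vip stopped:
#   vip_address_status 172.16.200.191
```

"no answer" is not proof that the address is free, since a host that drops ICMP would not reply to ping. An ARP-level check on the same subnet is likely to be more reliable in that case.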
References
- https://docs.oracle.com/database/121/RACAD/GUID-B3AF3FC7-2EC1-4A8B-A4D9-28CF0C239AF6.htm#RACAD7848
- https://support.oracle.com/knowledge/Oracle%20Database%20Products/1470361_1.html