By default, the Data Guard broker starts real-time apply as soon as the standby database is up and running. But sometimes you will see the apply lag keep growing for a long time while the apply process (MRP0) stays at WAIT_FOR_LOG, which means real-time apply is waiting on something and not actually working.
In my experience, the broker can become partially dysfunctional when the network condition is poor. At that point, it seems the broker loses control over the apply service, which ends up waiting on something unknown.
In our case, real-time apply had already been enabled by the broker with DelayMins at its default of zero, yet the apply lag had grown to almost two hours. Something must be wrong. In the end, one simple move made the apply service work again.
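Before diving in, a quick way to see both lags from the standby itself is V$DATAGUARD_STATS, and the broker can report the DelayMins property directly. This is just a sketch, assuming the broker database name standb used throughout this post.
SQL> select name, value from v$dataguard_stats where name in ('transport lag', 'apply lag');
DGMGRL> show database standb 'DelayMins';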
Check LNS on Primary Database: Writing
Let's check the status of the redo transport processes on the primary database, which is a RAC in our case.
SQL> select inst_id, process, status, thread#, sequence#, block#, blocks from gv$managed_standby where process in ('RFS','LNS','MRP0');
INST_ID PROCESS STATUS THREAD# SEQUENCE# BLOCK# BLOCKS
---------- --------- ------------ ---------- ---------- ---------- ----------
2 LNS WRITING 2 74 17773 1
1 LNS WRITING 1 79 25355 1
LNS is transporting the redo data to the standby side without problems.
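If you want to double-check the transport side, you can also look at the archive destination on the primary. This is only a sketch and assumes the standby destination is LOG_ARCHIVE_DEST_2, which may be different in your environment.
SQL> select dest_id, status, error from v$archive_dest where dest_id = 2;
A STATUS of VALID with an empty ERROR column means redo transport itself is healthy.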
Check MRP0 on Standby Database: WAIT_FOR_LOG
Next, let's check the standby database and see the status of the receiving (RFS) and applying (MRP0) processes.
SQL> select inst_id, process, status, thread#, sequence#, block#, blocks from gv$managed_standby where process in ('RFS','LNS','MRP0');
INST_ID PROCESS STATUS THREAD# SEQUENCE# BLOCK# BLOCKS
---------- --------- ------------ ---------- ---------- ---------- ----------
2 RFS IDLE 0 0 0 0
2 RFS IDLE 0 0 0 0
2 RFS IDLE 0 0 0 0
2 RFS IDLE 1 79 25361 1
2 RFS IDLE 0 0 0 0
2 RFS IDLE 0 0 0 0
2 MRP0 WAIT_FOR_LOG 2 74 0 0
2 RFS IDLE 2 74 17779 1
2 RFS IDLE 0 0 0 0
2 RFS IDLE 0 0 0 0
10 rows selected.
RFS is continuously receiving redo data from the primary, but MRP0 is stuck at WAIT_FOR_LOG, as if it were trapped waiting for something indefinitely. No real-time apply is happening at this moment.
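One thing worth ruling out before blaming the broker is a genuine redo gap on the standby; as a hedged check, V$ARCHIVE_GAP should return no rows if nothing is missing.
SQL> select thread#, low_sequence#, high_sequence# from v$archive_gap;
If this returns no rows, the standby is not missing any archived log, which points the suspicion back at the apply service itself.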
Check Broker Status
Check the apply lag reported by the broker.
[oracle@primary01 ~]$ dgmgrl sys/password@primdb1
...
DGMGRL> show database standb;
Database - standb
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: 0 seconds
Apply Lag: 1 hour(s) 55 minutes 47 seconds
Real Time Query: OFF
Instance(s):
standb1
standb2 (apply instance)
Database Status:
SUCCESS
Everything looks fine except "Apply Lag". Normally, the apply lag should be close to 0 seconds when real-time apply is working, but here the lag is almost 2 hours. Did the broker fall asleep?
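It may also be worth glancing at the configuration as a whole, since a broker-level warning would show up there. This is just a general sanity check, not something specific to this case.
DGMGRL> show configuration;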
Re-enable Standby Database
In such a situation, you can re-enable the standby database to make it work again.
DGMGRL> enable database standb;
Enabled.
The broker woke up; I suspect it had been stuck in some kind of endless wait.
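For the record, if re-enabling the database had not been enough, another broker-side move worth trying (a sketch only, not something I needed here) is to toggle the apply state off and back on, which keeps everything under the broker's control instead of restarting managed recovery manually in SQL*Plus.
DGMGRL> edit database standb set state='APPLY-OFF';
DGMGRL> edit database standb set state='APPLY-ON';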
Check the apply lag again.
DGMGRL> show database standb;
Database - standb
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: 0 seconds
Apply Lag: 0 seconds
Real Time Query: OFF
Instance(s):
standb1
standb2 (apply instance)
Database Status:
SUCCESS
There's no apply lag now.
Check the status of the apply service (MRP0).
SQL> select inst_id, process, status, thread#, sequence#, block#, blocks from gv$managed_standby where process in ('RFS','LNS','MRP0');
INST_ID PROCESS STATUS THREAD# SEQUENCE# BLOCK# BLOCKS
---------- --------- ------------ ---------- ---------- ---------- ----------
1 RFS RECEIVING 0 0 0 0
2 RFS IDLE 0 0 0 0
2 RFS IDLE 0 0 0 0
2 RFS IDLE 0 0 0 0
2 RFS IDLE 1 80 58 1
2 RFS IDLE 0 0 0 0
2 RFS IDLE 0 0 0 0
2 MRP0 APPLYING_LOG 1 79 25581 25582
2 RFS IDLE 2 75 43 1
2 RFS IDLE 0 0 0 0
2 RFS IDLE 0 0 0 0
INST_ID PROCESS STATUS THREAD# SEQUENCE# BLOCK# BLOCKS
---------- --------- ------------ ---------- ---------- ---------- ----------
2 RFS RECEIVING 0 0 0 0
12 rows selected.
Now it's back to normal. APPLYING_LOG is the expected status when the real-time apply service is working.
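As a final hedged check, the local archive destination on the standby should report managed real-time apply; this sketch assumes dest_id 1 is the standby's local destination.
SQL> select dest_id, recovery_mode from v$archive_dest_status where dest_id = 1;
The expected value of RECOVERY_MODE here is MANAGED REAL TIME APPLY.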
Problem not solved, I am getting the following error:
DGM-17016: failed to retrieve status for database
Thanks for your input. I think there might be other errors accompanying this one in your stack trace that need to be investigated.