OGG-15163
When we try to stop a GoldenGate process, we may get OGG-15163, which tells us that the process did not respond to the stop request. Here is a case we ran into.
Lagged Extract
The EXTRACT seemed hung, but its status was still "RUNNING". My first thought was to restart the process, but it could not be stopped normally. In the end, we force-stopped the EXTRACT by killing the process. Here is the case:
GGSCI (product1) 5> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING E_FIAPP1 00:51:52 19:13:54
EXTRACT RUNNING P_FIAPP1 00:00:00 00:00:00
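As you can see, "Time Since Chkpt" for E_FIAPP1 is more than 19 hours, which means the EXTRACT has not written a checkpoint for that long. It has evidently stopped working, even though its status still says "RUNNING". Something's wrong.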
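The check described above can be automated. The following is a minimal sketch, assuming the text of "info all" has already been captured into a string; the function names and the one-hour threshold are illustrative choices, not part of GoldenGate.

```python
from datetime import timedelta


def parse_hms(text):
    """Convert an HH:MM:SS string (hours may exceed 24) to a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def stalled_groups(info_all_output, threshold=timedelta(hours=1)):
    """Return (group, time_since_chkpt) for RUNNING groups past the threshold.

    The threshold is an assumption for illustration; pick one that fits
    your own checkpoint interval.
    """
    stalled = []
    for line in info_all_output.splitlines():
        fields = line.split()
        # Process rows have five fields:
        # program, status, group, lag at checkpoint, time since checkpoint.
        # The header and the MANAGER row do not match and are skipped.
        if len(fields) != 5 or fields[0] not in ("EXTRACT", "REPLICAT"):
            continue
        _program, status, group, _lag, since = fields
        if status == "RUNNING" and parse_hms(since) > threshold:
            stalled.append((group, parse_hms(since)))
    return stalled


sample = """\
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING E_FIAPP1 00:51:52 19:13:54
EXTRACT RUNNING P_FIAPP1 00:00:00 00:00:00
"""

for group, since in stalled_groups(sample):
    print(group, since)
```

With the output from this case, only E_FIAPP1 is reported, because it claims to be RUNNING while its last checkpoint is far older than the threshold.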
Check EXTRACT Log (Report)
GGSCI (product1) 6> view report E_FIAPP1
...
2018-01-02 13:36:10 ERROR OGG-01028 Timeout in waiting for threads to respond to message:9.
Is it just waiting for something temporarily? No, it has been stuck for about 19 hours. If we don't intervene, it could stay hung indefinitely. Therefore, we should try to restart the EXTRACT group and see how it responds.
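To put a number on "how long has it been stuck", the report line can be parsed and compared with the time of the stop attempt. This is a rough sketch that assumes report messages follow the "timestamp severity OGG-code message" layout seen in this report; the helper names are hypothetical.

```python
import re
from datetime import datetime

# Layout assumed from the report line shown above:
# "<date> <time> <SEVERITY> <OGG-code> <message>"
REPORT_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<severity>[A-Z]+)\s+"
    r"(?P<code>OGG-\d+)\s+"
    r"(?P<message>.*)$"
)


def parse_report_line(line):
    """Return a dict for a message line, or None for any other line."""
    match = REPORT_LINE.match(line.strip())
    if match is None:
        return None
    return {
        "timestamp": datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S"),
        "severity": match.group("severity"),
        "code": match.group("code"),
        "message": match.group("message"),
    }


def first_error(report_text):
    """Return the first ERROR entry found in the report text, if any."""
    for line in report_text.splitlines():
        entry = parse_report_line(line)
        if entry is not None and entry["severity"] == "ERROR":
            return entry
    return None


report = (
    "...\n"
    "2018-01-02 13:36:10 ERROR OGG-01028 Timeout in waiting for threads "
    "to respond to message:9.\n"
)
error = first_error(report)
# Time of the stop attempt in this case (2018-01-03 08:38:26).
stop_attempt = datetime(2018, 1, 3, 8, 38, 26)
print(error["code"], "stuck for", stop_attempt - error["timestamp"])
```

For this case the difference comes to a little over 19 hours, which matches the "Time Since Chkpt" value in "info all".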
Stop EXTRACT
Let's see the response.
GGSCI (product1) 7> stop E_FIAPP1
Sending STOP request to EXTRACT E_FIAPP1 ...
2018-01-03 08:38:26 ERROR OGG-15163 There was a problem sending a message to EXTRACT E_FIAPP1 (Timeout waiting for message).
The process does not respond to the STOP request.
Newer releases may print some additional hints:
GGSCI (product1) 7> stop E_FIAPP1
Sending STOP request to EXTRACT E_FIAPP1 ...
STOP request pending. There are open, long-running transactions.
Before you stop Extract, make the archives containing data for those transactions available for when Extract restarts.
To force Extract to stop, use the SEND EXTRACT E_FIAPP1, FORCESTOP command.
Recovery information:
Oldest redo log files necessary to restart Extract are:
Redo Thread 1, Redo Log Sequence Number 6200, SCN 4.3500183118 (20980052302), RBA 842407184
Redo Thread 2, Redo Log Sequence Number 6433, SCN 4.3440914449 (20920783633), RBA 239712528
In this case, OGG-15163 means that GGSCI sent a STOP message to the process, but the process did not respond before the timeout. We need a more forceful way to stop it.
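The recovery information printed by newer releases tells us which redo logs must still be available when the EXTRACT restarts. As a small sketch, assuming the message format shown above, the thread and sequence numbers can be pulled out so they can be checked against the archived logs on disk; the function name is hypothetical.

```python
import re

# Pattern assumed from the "Recovery information" lines shown above.
REDO_LINE = re.compile(
    r"Redo Thread (?P<thread>\d+), "
    r"Redo Log Sequence Number (?P<sequence>\d+), "
    r"SCN (?P<scn>\S+) \((?P<scn_decimal>\d+)\), "
    r"RBA (?P<rba>\d+)"
)


def oldest_required_logs(stop_output):
    """Map each redo thread to the oldest log sequence needed on restart."""
    required = {}
    for match in REDO_LINE.finditer(stop_output):
        required[int(match.group("thread"))] = int(match.group("sequence"))
    return required


stop_output = """\
Oldest redo log files necessary to restart Extract are:
Redo Thread 1, Redo Log Sequence Number 6200, SCN 4.3500183118 (20980052302), RBA 842407184
Redo Thread 2, Redo Log Sequence Number 6433, SCN 4.3440914449 (20920783633), RBA 239712528
"""

for thread, sequence in sorted(oldest_required_logs(stop_output).items()):
    print(f"thread {thread}: keep archives from sequence {sequence} onward")
```

Before killing the process, it is worth confirming that the archives from these sequences onward have not been purged.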
KILL EXTRACT
OGG kill Command
You can use KILL EXTRACT to kill an Extract process running in regular or PASSIVE mode. Use this command only if a process cannot be stopped gracefully with the STOP EXTRACT command.
GGSCI (product1) 8> kill extract E_FIAPP1
Sending KILL request to MANAGER ...
Killed process (27604) for EXTRACT E_FIAPP1
KILL REPLICAT
Similarly, we can kill a Replicat process by issuing KILL REPLICAT in GGSCI.
OS kill Command
Killing the process at the OS level has essentially the same effect.
[oracle@product1 ogghome]$ ps -ef | grep E_FIAPP1
...
oracle 79138 37711 86 Jan02 ? 20:19:20 /ogghome/extract PARAMFILE /ogghome/dirprm/e_fiapp1.prm REPORTFILE /ogghome/dirrpt/E_FIAPP1.rpt PROCESSID E_FIAPP1 USESUBDIRS
[oracle@product1 ogghome]$ kill -9 79138
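When picking the PID from "ps -ef", it is easy to grab the grep process itself or another group with a similar name. The sketch below assumes the Linux "ps -ef" column layout shown above and matches on the PROCESSID argument of the command line; the function name and the grep row in the sample are made up for illustration.

```python
def find_ogg_pids(ps_output, group):
    """Return PIDs whose command line contains 'PROCESSID <group>'."""
    pids = []
    for line in ps_output.splitlines():
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            continue
        args = fields[7].split()
        # Require the exact PROCESSID argument so that "grep E_FIAPP1"
        # and groups with similar names are not matched.
        for index, token in enumerate(args[:-1]):
            if token == "PROCESSID" and args[index + 1] == group:
                pids.append(int(fields[1]))
                break
    return pids


ps_sample = """\
oracle 79138 37711 86 Jan02 ? 20:19:20 /ogghome/extract PARAMFILE /ogghome/dirprm/e_fiapp1.prm REPORTFILE /ogghome/dirrpt/E_FIAPP1.rpt PROCESSID E_FIAPP1 USESUBDIRS
oracle 80001 80000 0 08:40 pts/0 00:00:00 grep E_FIAPP1
"""

print(find_ogg_pids(ps_sample, "E_FIAPP1"))
```

The second row of the sample is a hypothetical grep process; it is skipped because its command line has no PROCESSID argument.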
By the way, the Manager process will not attempt to restart a killed Extract process.
Start EXTRACT
[oracle@product1 ogghome]$ ./ggsci
Oracle GoldenGate Command Interpreter for Oracle
Version 12.1.2.0.0 17185003 OGGCORE_12.1.2.0.0_PLATFORMS_130924.1316_FBO
Linux, x64, 64bit (optimized), Oracle 12c on Sep 25 2013 02:33:54
Operating system character set identified as UTF-8.
Copyright (C) 1995, 2013, Oracle and/or its affiliates. All rights reserved.
GGSCI (product1) 1> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT ABENDED E_FIAPP1 00:51:52 19:23:10
EXTRACT RUNNING P_FIAPP1 00:00:00 00:00:05
GGSCI (product1) 2> start E_FIAPP1
Sending START request to MANAGER ...
EXTRACT E_FIAPP1 starting
GGSCI (product1) 3> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING E_FIAPP1 20:15:12 00:00:04
EXTRACT RUNNING P_FIAPP1 00:00:00 00:00:00
"Time Since Chkpt" is back to a few seconds, so the EXTRACT is working again and catching up on the backlog.
Send FORCESTOP
SEND EXTRACT with the FORCESTOP option may achieve the same result:
SEND EXTRACT <extract_group_name>, FORCESTOP
You can test it yourself.
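To summarize the options covered in this post, the sketch below lists the GGSCI commands for an EXTRACT group from gentlest to most forceful. The ordering is my own reading of the steps above, not an official procedure, and the function name is hypothetical. It only builds the command strings; it does not run GGSCI.

```python
def stop_escalation(group):
    """Return GGSCI commands for an EXTRACT group, gentlest first."""
    return [
        f"STOP EXTRACT {group}",             # normal, graceful stop
        f"SEND EXTRACT {group}, FORCESTOP",  # forced stop through GGSCI
        f"KILL EXTRACT {group}",             # last resort when STOP fails
    ]


for command in stop_escalation("E_FIAPP1"):
    print(command)
```

Try each command in turn and move to the next one only if the previous one times out.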