Therefore, it could be a bad idea to enable fast-start failover under an unreliable network environment. In case we hit a false failover, we should disable the fast-start failover, instead, we can design a manual failover procedure to reduce database down time if MTTR is more tolerant than normal.
In this type of failover, we can analysis and distinguish all symptoms of database environment to avoid a false failover, moreover, we have more options to choose during a failover process.
Now, let's see a demonstration of a manual failover, and then rebuild Data Guard envrionment.
- Check current standby database status.
- Shutdown abort the current primary database.
- Check applying status of current standby database.
- Check Data Guard Broker status before failover.
- Do a complete failover.
- Check current primary database (former standby database).
- Check Data Guard Broker status after failover.
- Reinstate the current standby database. Reinstating process will flashback database to the latest synchronized status, thus, it will discard any data changes that have not been synchronized with current primary database before failover. After reinstating, the former primary database will play a standby role.
- Recreate a standby database.
- Restart current standby database to nomount state.
- Duplicate a physical standby database.
- Restart current standby database to mount state
- Check current standby database status
- Enable current standby database by Data Guard Broker.
- Check current standby database status again.
[oracle@standby01 ~]$ sqlplus / as sysdba
SQL> SELECT inst_id, open_mode, database_role, switchover_status FROM gv$database;
INST_ID OPEN_MODE DATABASE_ROLE SWITCHOVER_STATUS
---------- -------------------- ---------------- --------------------
1 MOUNTED PHYSICAL STANDBY NOT ALLOWED
2 MOUNTED PHYSICAL STANDBY NOT ALLOWED
It's a normal standby database.
[oracle@primary01 ~]$ srvctl stop database -d compdb -o abort
This step is to simulate a blackout of the primary database.
SQL> SELECT inst_id, process, status, thread#, sequence#, block# FROM gv$managed_standby WHERE process IN ('RFS','LNS','MRP0');
INST_ID PROCESS STATUS THREAD# SEQUENCE# BLOCK#
---------- --------- ------------ ---------- ---------- ----------
1 MRP0 WAIT_FOR_LOG 1 364 00
Only "WAIT_FOR_LOG" is left on MRP0.
[oracle@standby01 ~]$ dgmgrl
...
DGMGRL> connect sys@standb1
Password:
Connected.
DGMGRL> show configuration
Configuration - COMPDR
Protection Mode: MaxPerformance
Databases:
primdb - Primary database
standb - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
ORA-16625: cannot reach database "primdb"
DGM-17017: unable to determine configuration status
DGMGRL> show database primdb
Database - primdb
Role: PRIMARY
Intended State: TRANSPORT-ON
Instance(s):
primdb1
primdb2
Database Status:
DGM-17016: failed to retrieve status for database "primdb"
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
ORA-16625: cannot reach database "primdb"
DGMGRL> show database standb
Database - standb
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: 0 seconds
Apply Lag: 0 seconds
Real Time Query: OFF
Instance(s):
standb1 (apply instance)
standb2
Database Status:
SUCCESS
You can see the database primdb cannot be connected.
DGMGRL> failover to standb;
Performing failover NOW, please wait...
Failover succeeded, new primary is "standb"
If the time is very demanding or a complete failover cannot be done, you need a immediate failover.
DGMGRL> failover to standb immediate;
[oracle@standby01 ~]$ sqlplus / as sysdba
SQL> SELECT inst_id, open_mode, database_role, switchover_status FROM gv$database;
INST_ID OPEN_MODE DATABASE_ROLE SWITCHOVER_STATUS
---------- -------------------- ---------------- --------------------
1 READ WRITE PRIMARY NOT ALLOWED
2 READ WRITE PRIMARY NOT ALLOWED
Former standby database is now primary, it shows the failover is successful.
DGMGRL> show configuration
Configuration - COMPDR
Protection Mode: MaxPerformance
Databases:
standb - Primary database
primdb - Physical standby database (disabled)
ORA-16661: the standby database needs to be reinstated
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS
DGMGRL> show database primdb
Database - primdb
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: (unknown)
Apply Lag: (unknown)
Real Time Query: OFF
Instance(s):
primdb1
primdb2
Database Status:
ORA-16661: the standby database needs to be reinstated
DGMGRL> show database standb
Database - standb
Role: PRIMARY
Intended State: TRANSPORT-ON
Instance(s):
standb1
standb2
Database Status:
SUCCESS
The current standby database (former primary database) is currently disabled, and the Data Guard environment must be back to an operable status, so we needs to reinstate or recreate the current standby database.
[oracle@primary01 ~]$ srvctl start database -d compdb -o mount
...
DGMGRL> reinstate database primdb;
Reinstating database "primdb", please wait...
Operation requires shutdown of instance "primdb2" on database "primdb"
Shutting down instance "primdb2"...
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.
Continuing to reinstate database "primdb" ...
Error: ORA-16653: failed to reinstate database
Failed.
Reinstatement of database "primdb" failed
Since we had never enabled flashback database, the reinstating failed with ORA-16653. It's time to recreate a standby database.
[oracle@primary01 ~]$ srvctl stop database -d compdb
[oracle@primary01 ~]$ srvctl start database -d compdb -o nomount
[oracle@primary01 ~]$ rman target sys@standb1 auxiliary sys@primdb1
...
target database Password:
connected to target database: COMPDB (DBID=841830157)
auxiliary database Password:
connected to auxiliary database: COMPDB (not mounted)
RMAN> RUN
{
allocate channel c1 device type disk;
allocate auxiliary channel a1 device type DISK;
DUPLICATE TARGET DATABASE
FOR STANDBY FROM ACTIVE DATABASE
NOFILENAMECHECK;
}
2> 3> 4> 5> 6> 7> 8>
using target database control file instead of recovery catalog
allocated channel: c1
channel c1: SID=118 instance=standb1 device type=DISK
allocated channel: a1
channel a1: SID=34 instance=primdb1 device type=DISK
Starting Duplicate Db at 06-AUG-13
contents of Memory Script:
{
backup as copy reuse
targetfile '/u01/app/oracle/product/11.2.0/db_1/dbs/orapwstandb1' auxiliary format
'/u01/app/oracle/product/11.2.0/db_1/dbs/orapwprimdb1' ;
}
executing Memory Script
Starting backup at 06-AUG-13
Finished backup at 06-AUG-13
contents of Memory Script:
{
sql clone "alter system set control_files =
''+DATA/compdb/controlfile/current.260.761506189'' comment=
''Set by RMAN'' scope=spfile";
backup as copy current controlfile for standby auxiliary format '+DATA/compdb/controlfile/current.260.761506189';
sql clone "alter system set control_files =
''+DATA/compdb/controlfile/current.260.761506189'' comment=
''Set by RMAN'' scope=spfile";
shutdown clone immediate;
startup clone nomount;
}
executing Memory Script
sql statement: alter system set control_files = ''+DATA/compdb/controlfile/current.260.761506189'' comment= ''Set by RMAN'' scope=spfile
Starting backup at 06-AUG-13
channel c1: starting datafile copy
copying standby control file
output file name=/u01/app/oracle/product/11.2.0/db_1/dbs/snapcf_standb1.f tag=TAG20130806T193214 RECID=9 STAMP=797943439
channel c1: datafile copy complete, elapsed time: 00:00:15
Finished backup at 06-AUG-13
sql statement: alter system set control_files = ''+DATA/compdb/controlfile/current.260.761506189'' comment= ''Set by RMAN'' scope=spfile
Oracle instance shut down
connected to auxiliary database (not started)
Oracle instance started
Total System Global Area 839282688 bytes
Fixed Size 2217992 bytes
Variable Size 792725496 bytes
Database Buffers 41943040 bytes
Redo Buffers 2396160 bytes
allocated channel: a1
channel a1: SID=33 instance=primdb1 device type=DISK
contents of Memory Script:
{
sql clone 'alter database mount standby database';
}
executing Memory Script
sql statement: alter database mount standby database
RMAN-05529: WARNING: DB_FILE_NAME_CONVERT resulted in invalid ASM names; names changed to disk group only.
contents of Memory Script:
{
set newname for tempfile 1 to
"+data";
switch clone tempfile all;
set newname for datafile 1 to
"+data";
set newname for datafile 2 to
"+data";
set newname for datafile 3 to
"+data";
set newname for datafile 4 to
"+data";
set newname for datafile 5 to
"+data";
set newname for datafile 6 to
"+data";
backup as copy reuse
datafile 1 auxiliary format
"+data" datafile
2 auxiliary format
"+data" datafile
3 auxiliary format
"+data" datafile
4 auxiliary format
"+data" datafile
5 auxiliary format
"+data" datafile
6 auxiliary format
"+data" ;
sql 'alter system archive log current';
}
executing Memory Script
executing command: SET NEWNAME
renamed tempfile 1 to +data in control file
executing command: SET NEWNAME
executing command: SET NEWNAME
executing command: SET NEWNAME
executing command: SET NEWNAME
executing command: SET NEWNAME
executing command: SET NEWNAME
Starting backup at 06-AUG-13
channel c1: starting datafile copy
input datafile file number=00001 name=+DATA/compdb/datafile/system.268.795098881
output file name=+DATA/primdb/datafile/system.897.797943475 tag=TAG20130806T193254
channel c1: datafile copy complete, elapsed time: 00:01:35
channel c1: starting datafile copy
input datafile file number=00002 name=+DATA/compdb/datafile/sysaux.269.795098973
output file name=+DATA/primdb/datafile/sysaux.898.797943569 tag=TAG20130806T193254
channel c1: datafile copy complete, elapsed time: 00:01:55
channel c1: starting datafile copy
input datafile file number=00003 name=+DATA/compdb/datafile/undotbs1.270.795099057
output file name=+DATA/primdb/datafile/undotbs1.899.797943687 tag=TAG20130806T193254
channel c1: datafile copy complete, elapsed time: 00:00:25
channel c1: starting datafile copy
input datafile file number=00005 name=+DATA/compdb/datafile/example.271.795099083
output file name=+DATA/primdb/datafile/example.900.797943711 tag=TAG20130806T193254
channel c1: datafile copy complete, elapsed time: 00:00:26
channel c1: starting datafile copy
input datafile file number=00006 name=+DATA/compdb/datafile/undotbs2.272.795099097
output file name=+DATA/primdb/datafile/undotbs2.901.797943737 tag=TAG20130806T193254
channel c1: datafile copy complete, elapsed time: 00:00:15
channel c1: starting datafile copy
input datafile file number=00004 name=+DATA/compdb/datafile/users.273.795099105
output file name=+DATA/primdb/datafile/users.902.797943753 tag=TAG20130806T193254
channel c1: datafile copy complete, elapsed time: 00:00:07
Finished backup at 06-AUG-13
sql statement: alter system archive log current
contents of Memory Script:
{
switch clone datafile all;
}
executing Memory Script
datafile 1 switched to datafile copy
input datafile copy RECID=9 STAMP=797943765 file name=+DATA/primdb/datafile/system.897.797943475
datafile 2 switched to datafile copy
input datafile copy RECID=10 STAMP=797943766 file name=+DATA/primdb/datafile/sysaux.898.797943569
datafile 3 switched to datafile copy
input datafile copy RECID=11 STAMP=797943766 file name=+DATA/primdb/datafile/undotbs1.899.797943687
datafile 4 switched to datafile copy
input datafile copy RECID=12 STAMP=797943766 file name=+DATA/primdb/datafile/users.902.797943753
datafile 5 switched to datafile copy
input datafile copy RECID=13 STAMP=797943766 file name=+DATA/primdb/datafile/example.900.797943711
datafile 6 switched to datafile copy
input datafile copy RECID=14 STAMP=797943766 file name=+DATA/primdb/datafile/undotbs2.901.797943737
Finished Duplicate Db at 06-AUG-13
released channel: c1
released channel: a1
RMAN> exit;
[oracle@primary01 ~]$ srvctl stop database -d compdb
[oracle@primary01 ~]$ srvctl start database -d compdb -o mount
[oracle@primary01 ~]$ sqlplus / as sysdba
...
SQL> select inst_id, open_mode, database_role, switchover_status from gv$database;
INST_ID OPEN_MODE DATABASE_ROLE SWITCHOVER_STATUS
---------- -------------------- ---------------- --------------------
1 MOUNTED PHYSICAL STANDBY SESSIONS ACTIVE
2 MOUNTED PHYSICAL STANDBY TO PRIMARY
The current standby database is rebuilt and startup to mount state successfully, which also means the controlfile is built for standby currently, no bother to get and apply another copy of standby controlfile.
DGMGRL> show database primdb
Database - primdb
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: (unknown)
Apply Lag: (unknown)
Real Time Query: OFF
Instance(s):
primdb1
primdb2
Database Status:
ORA-16795: the standby database needs to be re-created
DGMGRL> enable database primdb
Enabled.
DGMGRL> show database primdb
Database - primdb
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: 12 minutes 14 seconds
Apply Lag: 12 minutes 14 seconds
Real Time Query: OFF
Instance(s):
primdb1
primdb2 (apply instance)
Database Status:
SUCCESS
The result shows Data Guard Broker enables database primdb successfully.
[oracle@primary01 ~]$ sqlplus / as sysdba
...
SQL> select inst_id, open_mode, database_role, switchover_status from gv$database;
INST_ID OPEN_MODE DATABASE_ROLE SWITCHOVER_STATUS
---------- -------------------- ---------------- --------------------
1 MOUNTED PHYSICAL STANDBY NOT ALLOWED
2 MOUNTED PHYSICAL STANDBY NOT ALLOWED
DGMGRL> connect sys@primdb1
Password:
Connected.
DGMGRL> switchover to primdb
Performing switchover NOW, please wait...
Operation requires a connection to instance "primdb2" on database "primdb"
Connecting to instance "primdb2"...
Connected.
Operation requires shutdown of instance "standb2" on database "standb"
Shutting down instance "standb2"...
Database closed.
Database dismounted.
ORACLE instance shut down.
New primary database "primdb" is opening...
Operation requires shutdown of instance "standb1" on database "standb"
Shutting down instance "standb1"...
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.
Operation requires startup of instance "standb1" on database "standb"
Starting instance "standb1"...
ORACLE instance started.
Database mounted.
Database opened.
Switchover succeeded, new primary is "primdb"
DGMGRL>
Now, we are back to the original environment. Everyone is pleased and can sleep well at night.
Since the primary and the standby sites could be in different buildings with a considerable distance, using RMAN to duplicate a standby database online could be unrealistic when the network resource is limited. DBA should design and compare various recreation plans in advance to reduce down time as much as possible.