Wednesday, July 31, 2019

How to check if your physical standby database is applying logs or not



For Oracle versions 12.1 and lowe, use V$MANAGED_STANDBY:


SELECT CLIENT_PROCESS, PROCESS, THREAD#, SEQUENCE#, STATUS
FROM V$MANAGED_STANDBY
WHERE CLIENT_PROCESS='LGWR' OR PROCESS='MRP0'
;

CLIENT_PROCESS PROCESS THREAD# SEQUENCE# STATUS
LGWR RFS 1 116713 RECEIVING
N/A MRP0 1 114039 APPLYING_LOG

Important columns are

* PROCESS, which is the type of process whose information is being reported
MRP0 means the detached recovery server process. In this case, it is applying logs.

* CLIENT_PROCESS, which identifies the corresponding primary database process.
LGWR indicates the background log writer process

If there is a gap to be covered, you'll see the processes as being idle:

CLIENT_PROCESS PROCESS THREAD# SEQUENCE# STATUS
LGWR RFS 1 116727 IDLE
N/A MRP0 1 114115 WAIT_FOR_LOG

This is a good time to check where the gap is:
SELECT * FROM v$archive_gap;

THREAD# LOW_SEQUENCE# HIGH_SEQUENCE# CON_ID
1 114120 114121 1

This matches the alert log of the standby database:
FAL[client]: Failed to request gap sequence
 GAP - thread 1 sequence 114120-114121

For Oracle 12.2 and beyond, Oracle encourage DBAs to use V$DATAGUARD_PROCESS (although V$MANAGED_STANDBY is still there for backward compability).
This view "displays one row for each Oracle Data Guard process that is currently running."

According to Oracle's documentation, there should be information about a process mapped toa role called "recovery logmerger" which in the ACTION column should state "APPLYING_LOG". I have repeatedly tried to find this information without luck.

What I do find in V$DATAGUARD_PROCESS is a role called "managed recovery".
If I run these queries against my database, they seem to yield two different results, the first telling me that manged recovery is idle, the second telling me that it is applying logs:
set lines 200
select name "process name",pid "standby server PID",role,action,client_role,client_pid "primary server PID" ,task_done
from v$dataguard_process
where role like 'managed recovery'
order by role desc, action;

SELECT CLIENT_PROCESS, PROCESS, THREAD#, SEQUENCE#, STATUS
FROM V$MANAGED_STANDBY
WHERE CLIENT_PROCESS='LGWR' OR PROCESS='MRP0';
exit



Output:
proce standby server PID       ROLE                    ACTION       CLIENT_ROLE      primary server PID T
----- ------------------------ ----------------------- ------------ ---------------- ------------------ -
MRP0  10714                    managed recovery        IDLE         none                              0 N


CLIENT_P PROCESS      THREAD#  SEQUENCE# STATUS
-------- --------- ---------- ---------- ------------
LGWR     RFS                1     116940 RECEIVING
N/A      MRP0               1     116940 APPLYING_LOG

Comments about this seemingly contradictory output would be much welcome.

Tuesday, July 30, 2019

How do identify a transport error in a Data Guard setup using the broker


Applicable to Oracle 12.2

If the broker has been setup, check the configuration basics:
dgmgrl /
show configuration;
Configuration - DGConfig1

  Protection Mode: MaxPerformance
  Members:
  proddb01       - Primary database
    Error: ORA-16778: redo transport error for one or more members

    proddb01_stby1 - Physical standby database
      Warning: ORA-16809: multiple warnings detected for the member

    proddb01_stby2 - Physical standby database
      Warning: ORA-16809: multiple warnings detected for the member

In the configuration above, I have a primary + two physical standby databases.

Check the database:
DGMGRL> show database proddb01;

Database - proddb01

  Role:               PRIMARY
  Intended State:     TRANSPORT-ON
  Instance(s):
    proddb01
      Error: ORA-16737: the redo transport service for member "proddb01_stby1" has an error

Use the 'LogXptStatus' monitorable property to check for transporation errors:
DGMGRL> show database proddb01 'LogXptStatus'
LOG TRANSPORT STATUS
PRIMARY_INSTANCE_NAME STANDBY_DATABASE_NAME     STATUS                ERROR
            proddb01       proddb01_stby1      ERROR ORA-16191: Primary log shipping client not logged on standby
            proddb01       proddb01_stby2      VALID

The error ORA-16191 was in my case related to password files not being identical on the standby and the primary database.

For the standby databases, you can use the monitorable property 'RecvQEntries', which returns a table indicating all log files that were received by the standby database but have not yet been applied:
DGMGRL>  show database proddb01_stby1 'RecvQEntries'
STANDBY_RECEIVE_QUEUE
              STATUS     RESETLOGS_ID           THREAD              LOG_SEQ       TIME_GENERATED       TIME_COMPLETED        FIRST_CHANGE#         NEXT_CHANGE#       SIZE (KBs)
         NOT_APPLIED        894183587                1               110406  07/19/2019 10:28:10  07/19/2019 10:29:19          73008920461          73009040943           424661
         NOT_APPLIED        894183587                1               110408  07/19/2019 10:30:35  07/19/2019 10:31:50          73009169966          73009301192           442727
         NOT_APPLIED        894183587                1               110409  07/19/2019 10:31:50  07/19/2019 10:33:14          73009301192          73009443216           426723
         NOT_APPLIED        894183587                1               110410  07/19/2019 10:33:14  07/19/2019 10:34:32          73009443216  

The same information can be obtained by checking the view v$archived_log, like this:
select sequence#
from v$archived_log
where registrar = 'RFS'
and   applied = 'NO';

The actual lag that these non-applied logs causes, can be found with the query:
set lines 200
col source_db_unique_name format a20
col value format a20
SELECT source_db_unique_name,name,value,unit,time_computed,datum_time
FROM v$dataguard_stats
WHERE name = 'apply lag';

The RecvQEntries property is documented here

The LogXptStatus property is documentet here

How to find the last archivelog received and applied in a standby database




The last archive log received:
select  max(sequence#) "Last Standby Seq Received" 
from v$archived_log val, v$database vdb 
where val.resetlogs_change# = vdb.resetlogs_change# 
/
Last Standby Seq Received
116227

The last archive log applied:
select thread#, max(sequence#) "Last Standby Seq Applied" 
from v$archived_log val, v$database vdb 
where val.resetlogs_change# = vdb.resetlogs_change# 
and val.applied in ('YES','IN-MEMORY') 
group by thread# order by 1; 
THREAD# Last Standby Seq Applied
1 112772

As can be seen from the above queries, the standby database is applying older logs that constitute a gap, since the last log applied has a lower sequence number than the last one to be received from the primary database.

The queries were supplied by Oracle Support.

How to find out if your standby database lags behind the primary database


Applicable to Oracle 12.2.

Use the view V$DATAGUARD_STATS, which will only return results when executed on a standby database:
set lines 200
col source_db_unique_name format a20
col value format a20
SELECT source_db_unique_name,name,value,unit,time_computed,datum_time
 FROM v$dataguard_stats
 WHERE name like '%lag'
 ;

The columns explained, as stated in the documentation:
* APPLY LAG - how much does the data in a standby database lag behind the data in the primary database, due to delays in propagating and applying redo to the standby database.
* TRANSPORT LAG - how much the transport of redo to the standby database lag behind the generation of redo on the primary database.
Note what will happen to the value in this column if there are more than one redo log gap: the transport lag is calculated "as if no redo has been received after the beginning of the earliest redo gap".
Both of these columns are of type INTERVAL DAY TO SECOND, so in my case I am facing a 7 days, 11 hours and 7 minutes lag.

* TIME_COMPUTED - Local time at the standby database when the metric was computed
* DATUM_TIME - Local time at the standby database when the datum used to compute the metric was received. Note that the APPLY LAG and TRANSPORT LAG metrics are computed based on data that is periodically received from the primary database. An unchanging value in this column across multiple queries indicates that the standby database is not receiving data from the primary database.

SOURCE_DB_UNIQUE_NAME NAME VALUE UNIT TIME_COMPUTED DATUM_TIME
proddb01 transport lag +07 11:07:37 day(2) to second(0) interval 07/30/2019 08:39:01 07/30/2019 08:32:55
proddb01 apply lag +07 11:07:37 day(2) to second(0) interval 07/30/2019 08:39:01 07/30/2019 08:32:55