Minimalistic Oracle

Wednesday, July 31, 2019

How to check if your physical standby database is applying logs or not

For Oracle versions 12.1 and lowe, use V$MANAGED_STANDBY:

SELECT CLIENT_PROCESS, PROCESS, THREAD#, SEQUENCE#, STATUS
FROM V$MANAGED_STANDBY
WHERE CLIENT_PROCESS='LGWR' OR PROCESS='MRP0'
;

CLIENT_PROCESS	PROCESS	THREAD#	SEQUENCE#	STATUS
LGWR	RFS	1	116713	RECEIVING
N/A	MRP0	1	114039	APPLYING_LOG

Important columns are

* PROCESS, which is the type of process whose information is being reported
MRP0 means the detached recovery server process. In this case, it is applying logs.

* CLIENT_PROCESS, which identifies the corresponding primary database process.
LGWR indicates the background log writer process

If there is a gap to be covered, you'll see the processes as being idle:

CLIENT_PROCESS	PROCESS	THREAD#	SEQUENCE#	STATUS
LGWR	RFS	1	116727	IDLE
N/A	MRP0	1	114115	WAIT_FOR_LOG

This is a good time to check where the gap is:

SELECT * FROM v$archive_gap;

THREAD#	LOW_SEQUENCE#	HIGH_SEQUENCE#	CON_ID
1	114120	114121	1

This matches the alert log of the standby database:

FAL[client]: Failed to request gap sequence
 GAP - thread 1 sequence 114120-114121

For Oracle 12.2 and beyond, Oracle encourage DBAs to use V$DATAGUARD_PROCESS (although V$MANAGED_STANDBY is still there for backward compability).
This view "displays one row for each Oracle Data Guard process that is currently running."

According to Oracle's documentation, there should be information about a process mapped toa role called "recovery logmerger" which in the ACTION column should state "APPLYING_LOG". I have repeatedly tried to find this information without luck.

What I do find in V$DATAGUARD_PROCESS is a role called "managed recovery".
If I run these queries against my database, they seem to yield two different results, the first telling me that manged recovery is idle, the second telling me that it is applying logs:

set lines 200
select name "process name",pid "standby server PID",role,action,client_role,client_pid "primary server PID" ,task_done
from v$dataguard_process
where role like 'managed recovery'
order by role desc, action;

SELECT CLIENT_PROCESS, PROCESS, THREAD#, SEQUENCE#, STATUS
FROM V$MANAGED_STANDBY
WHERE CLIENT_PROCESS='LGWR' OR PROCESS='MRP0';
exit

Output:

proce standby server PID       ROLE                    ACTION       CLIENT_ROLE      primary server PID T
----- ------------------------ ----------------------- ------------ ---------------- ------------------ -
MRP0  10714                    managed recovery        IDLE         none                              0 N


CLIENT_P PROCESS      THREAD#  SEQUENCE# STATUS
-------- --------- ---------- ---------- ------------
LGWR     RFS                1     116940 RECEIVING
N/A      MRP0               1     116940 APPLYING_LOG

Comments about this seemingly contradictory output would be much welcome.

Tuesday, July 30, 2019

How do identify a transport error in a Data Guard setup using the broker

Applicable to Oracle 12.2

If the broker has been setup, check the configuration basics:

dgmgrl /
show configuration;
Configuration - DGConfig1

  Protection Mode: MaxPerformance
  Members:
  proddb01       - Primary database
    Error: ORA-16778: redo transport error for one or more members

    proddb01_stby1 - Physical standby database
      Warning: ORA-16809: multiple warnings detected for the member

    proddb01_stby2 - Physical standby database
      Warning: ORA-16809: multiple warnings detected for the member

In the configuration above, I have a primary + two physical standby databases.

Check the database:

DGMGRL> show database proddb01;

Database - proddb01

  Role:               PRIMARY
  Intended State:     TRANSPORT-ON
  Instance(s):
    proddb01
      Error: ORA-16737: the redo transport service for member "proddb01_stby1" has an error

Use the 'LogXptStatus' monitorable property to check for transporation errors:

DGMGRL> show database proddb01 'LogXptStatus'
LOG TRANSPORT STATUS
PRIMARY_INSTANCE_NAME STANDBY_DATABASE_NAME     STATUS                ERROR
            proddb01       proddb01_stby1      ERROR ORA-16191: Primary log shipping client not logged on standby
            proddb01       proddb01_stby2      VALID

The error ORA-16191 was in my case related to password files not being identical on the standby and the primary database.

For the standby databases, you can use the monitorable property 'RecvQEntries', which returns a table indicating all log files that were received by the standby database but have not yet been applied:

DGMGRL>  show database proddb01_stby1 'RecvQEntries'
STANDBY_RECEIVE_QUEUE
              STATUS     RESETLOGS_ID           THREAD              LOG_SEQ       TIME_GENERATED       TIME_COMPLETED        FIRST_CHANGE#         NEXT_CHANGE#       SIZE (KBs)
         NOT_APPLIED        894183587                1               110406  07/19/2019 10:28:10  07/19/2019 10:29:19          73008920461          73009040943           424661
         NOT_APPLIED        894183587                1               110408  07/19/2019 10:30:35  07/19/2019 10:31:50          73009169966          73009301192           442727
         NOT_APPLIED        894183587                1               110409  07/19/2019 10:31:50  07/19/2019 10:33:14          73009301192          73009443216           426723
         NOT_APPLIED        894183587                1               110410  07/19/2019 10:33:14  07/19/2019 10:34:32          73009443216

The same information can be obtained by checking the view v$archived_log, like this:

select sequence#
from v$archived_log
where registrar = 'RFS'
and   applied = 'NO';

The actual lag that these non-applied logs causes, can be found with the query:

set lines 200
col source_db_unique_name format a20
col value format a20
SELECT source_db_unique_name,name,value,unit,time_computed,datum_time
FROM v$dataguard_stats
WHERE name = 'apply lag';

The RecvQEntries property is documented here

The LogXptStatus property is documentet here

How to find the last archivelog received and applied in a standby database

The last archive log received:

select  max(sequence#) "Last Standby Seq Received" 
from v$archived_log val, v$database vdb 
where val.resetlogs_change# = vdb.resetlogs_change# 
/

Last Standby Seq Received
116227

The last archive log applied:

select thread#, max(sequence#) "Last Standby Seq Applied" 
from v$archived_log val, v$database vdb 
where val.resetlogs_change# = vdb.resetlogs_change# 
and val.applied in ('YES','IN-MEMORY') 
group by thread# order by 1;

THREAD#	Last Standby Seq Applied
1	112772

As can be seen from the above queries, the standby database is applying older logs that constitute a gap, since the last log applied has a lower sequence number than the last one to be received from the primary database.

The queries were supplied by Oracle Support.

How to find out if your standby database lags behind the primary database

Applicable to Oracle 12.2.

Use the view V$DATAGUARD_STATS, which will only return results when executed on a standby database:

set lines 200
col source_db_unique_name format a20
col value format a20
SELECT source_db_unique_name,name,value,unit,time_computed,datum_time
 FROM v$dataguard_stats
 WHERE name like '%lag'
 ;

The columns explained, as stated in the documentation:
* APPLY LAG - how much does the data in a standby database lag behind the data in the primary database, due to delays in propagating and applying redo to the standby database.
* TRANSPORT LAG - how much the transport of redo to the standby database lag behind the generation of redo on the primary database.
Note what will happen to the value in this column if there are more than one redo log gap: the transport lag is calculated "as if no redo has been received after the beginning of the earliest redo gap".
Both of these columns are of type INTERVAL DAY TO SECOND, so in my case I am facing a 7 days, 11 hours and 7 minutes lag.

* TIME_COMPUTED - Local time at the standby database when the metric was computed
* DATUM_TIME - Local time at the standby database when the datum used to compute the metric was received. Note that the APPLY LAG and TRANSPORT LAG metrics are computed based on data that is periodically received from the primary database. An unchanging value in this column across multiple queries indicates that the standby database is not receiving data from the primary database.

SOURCE_DB_UNIQUE_NAME	NAME	VALUE	UNIT	TIME_COMPUTED	DATUM_TIME
proddb01	transport lag	+07 11:07:37	day(2) to second(0) interval	07/30/2019 08:39:01	07/30/2019 08:32:55
proddb01	apply lag	+07 11:07:37	day(2) to second(0) interval	07/30/2019 08:39:01	07/30/2019 08:32:55

Wednesday, June 19, 2019

What to remember when adding temporary tablespaces on primary databases

Are temporary tablespaces automatically replicated to the standby databases?

No, they are not.
This is documentet in Oracle support note 834174.1 "Temporary Datafile created in Primary is missing in the Standby Database"

The setting of STANDBY_FILE_MANAGEMENT has no impact on temporary tablespaces and temporary files, because no redo is generated.

Unfortunately, this fact is not mentioned in the matrix listed in the documentation under the headline "Primary Database Changes That Require Manual Intervention at a Physical Standby"

From my own experience, here is what I did at a time when the primary database ran short on temporary space, and a new tablespace had to be created immediately:

Add temporary tablespace to the primary database:

CREATE  BIGFILE  TEMPORARY  TABLESPACE TMP 
TEMPFILE '/u02/oradata/proddb01/tmp.dbf' SIZE 1024M AUTOEXTEND ON NEXT 128M MAXSIZE 1T 
EXTENT MANAGEMENT LOCAL UNIFORM SIZE 1M;

Check the situation on the standby database:

SQL> SELECT TABLESPACE_NAME , bigfile,status FROM DBA_TABLESPACES WHERE CONTENTS = 'TEMPORARY';

TABLESPACE_NAME                BIG STATUS
------------------------------ --- ---------
TEMP                           NO  ONLINE
TMP                            YES ONLINE

The temporary tablespace name is registred in the standby data dictionary, but there is no tempfile attached.

In order to add a tempfile to the standby database, it must be opened in READ ONLY mode.
Since my standby database is an active standby, it is already opened in READ ONLY mode:

SQL> select open_mode from v$database;

OPEN_MODE
--------------------
READ ONLY WITH APPLY

Add the tempfile:

alter tablespace tmp add tempfile '/u02/oradata/proddb01/tmp.dbf' size 1024M autoextend on next 128M maxsize 1T;

Tablespace altered.

Recheck the situation in the standby database:

SQL>  select ts#,name,status,enabled from v$tempfile;

       TS# NAME                                               STATUS  ENABLED
---------- -------------------------------------------------- ------- ----------
         3 /u02/oradata/proddb01/temp01.dbf                     ONLINE  READ WRITE
        31 /u02/oradata/proddb01/tmp.dbf                        ONLINE  READ WRITE

Wednesday, June 12, 2019

How to use dbms_stats.report_col_usage

set long 10000
select dbms_stats.report_col_usage('SALES', 'ORDERS') from dual;
exit

Output could look something like this:

DBMS_STATS.REPORT_COL_USAGE('SALES','ORDERS')
--------------------------------------------------------------------------------
LEGEND:
.......

EQ         : Used in single table EQuality predicate
RANGE      : Used in single table RANGE predicate
LIKE       : Used in single table LIKE predicate
NULL       : Used in single table is (not) NULL predicate
EQ_JOIN    : Used in EQuality JOIN predicate
NONEQ_JOIN : Used in NON EQuality JOIN predicate
FILTER     : Used in single table FILTER predicate
JOIN       : Used in JOIN predicate

DBMS_STATS.REPORT_COL_USAGE('SALES','ORDERS')
--------------------------------------------------------------------------------
GROUP_BY   : Used in GROUP BY expression
...............................................................................

###############################################################################

COLUMN USAGE REPORT FOR SALES.ORDERS
.......................................................

1. ORDER_DATE                          : EQ RANGE
2. ORDER_ID                            : EQ
3. ITEM_NAME                           : EQ
###############################################################################

This makes it easy to see which columns could benefit from an index.

Wednesday, June 5, 2019

How to solve "FAILED: file csrrsreg.sql on worker " when applying RPC5 to an EBS 12.1.3 installation

During a patching of EBS 12.1.3 to RPC5, I encountered an unexpected problem.

A while into the patching, the patching procedure failed with an error message like this:

Deferred: file csrrsreg.sql on worker  1 for product csr username CSR.               (Deferment number 1 for this job)
 Assigned: file csrrsreg.sql on worker  1 for product csr username CSR.
   FAILED: file csrrsreg.sql on worker  1 for product csr username CSR.
 Deferred: file csrrsreg.sql on worker  1 for product csr username CSR.               (Deferment number 2 for this job)
 Assigned: file csrrsreg.sql on worker  1 for product csr username CSR.
   FAILED: file csrrsreg.sql on worker  1 for product csr username CSR.

ATTENTION: All workers either have failed or are waiting:

           FAILED: file csrrsreg.sql on worker  1.

ATTENTION: Please fix the above failed worker(s) so the manager can continue.

Open another ssh session to the server.

Check the logfile for the worker:

cd $APPL_TOP/admin/$ORACLE_SID/log
ls -la adwork*.log

-rw-r--r-- 1 ebsadm dba 123777231 Jun  5 12:58 adwork001.log
-rw-r--r-- 1 ebsadm dba 115207185 Jun  5 12:58 adwork002.log
-rw-r--r-- 1 ebsadm dba 119484153 Jun  5 12:58 adwork003.log
-rw-r--r-- 1 ebsadm dba 119051681 Jun  5 12:57 adwork004.log

Check the logfile for worker 1:

vi adwork001.log

The content could look something like this:

Creating the XML Schema Directory
CSR Product Top is 

PL/SQL procedure successfully completed.

Registing Rules XML Schema Document
ERROR:
ORA-06502: PL/SQL: numeric or value error: host bind array too small
ORA-06512: at line 1

Your workers are now waiting.
You can see their status by using the utility "adctrl".
Open another ssh session.
Execute adctrl, and answer the questions along the way.
At the end, you'll see a menu like this:

AD Controller Menu
     ---------------------------------------------------

     1.    Show worker status

     2.    Tell worker to restart a failed job

     3.    Tell worker to quit

     4.    Tell manager that a worker failed its job

     5.    Tell manager that a worker acknowledges quit

     6.    Restart a worker on the current machine

     7.    Exit

Enter your choice [1] : 1

        Control
Worker  Code      Context            Filename                    Status
------  --------  -----------------  --------------------------  --------------
     1  Run       AutoPatch R120     csrrsreg.sql                FAILED
     2  Run       AutoPatch R120                                 Wait
     3  Run       AutoPatch R120                                 Wait
     4  Run       AutoPatch R120                                 Wait

The solution to this particular error can be found in the note 1263895.1 "Advanced Scheduler Error:' ORA-06502: PL/SQL: numeric or value error: host bind array too small' In CSRRSREG.sql When Upgrading To 12.1.3".
You need to pull up the file csrrsreg.sql in a text editor and change

dbms_output.put_line('Registering Scheduler Rules XML Schema failed with an error ' || SQLERRM);

dbms_output.put_line(SUBSTR(('Registering Scheduler Rules XML Schema failed with an error ' || SQLERRM), 1, 250));

The file is placed in the patch stage area 22644544/csr/patch/115/sql, but it is copied to $APPL_TOP/csr/12.0.0/patch/115/sql during installation.
So I changed the file in both places, in case of a rerun.

Then, according to note 1263895.1, run the $APPL_TOP/csr/12.0.0/patch/115/sql/csrrsreg.sql manually.
Open the file first, and read the instructions. You need the passowrds for APPS and CSR, as well as the value for $CSR_TOP.
Run it and supply the parameters as they are requested by the script:
$1 = 'APPS'
$2 = password for APPS
$3 = 'CSR'
$4 = password for CSR
$5 = value for $CSR_TOP

sqlplus /nolog @/u01/app/apps/apps_st/appl/csr/12.0.0/patch/115/sql/csrrsreg.sql

Go back to your session where you are running adctrl.
Restart the worker process:

AD Controller Menu
     ---------------------------------------------------

     1.    Show worker status

     2.    Tell worker to restart a failed job

     3.    Tell worker to quit

     4.    Tell manager that a worker failed its job

     5.    Tell manager that a worker acknowledges quit

     6.    Restart a worker on the current machine

     7.    Exit

Enter your choice [1] : 2

Enter the worker number(s)/range(s) or 'all' for all workers,
or press [Return] to go back to the menu : 1

Status changed to 'Fixed, restart' for worker 1.

You should now check the worker status again:

Review the messages above, then press [Return] to continue.

                    AD Controller Menu
     ---------------------------------------------------

     1.    Show worker status

     2.    Tell worker to restart a failed job

     3.    Tell worker to quit

     4.    Tell manager that a worker failed its job

     5.    Tell manager that a worker acknowledges quit

     6.    Restart a worker on the current machine

     7.    Exit

Enter your choice [1] : 1

        Control
Worker  Code      Context            Filename                    Status
------  --------  -----------------  --------------------------  --------------
     1  Run       AutoPatch R120     pa_inv_exception_rpt_tbl_t  Running
     2  Run       AutoPatch R120     pa_perid_profile_tbl_typ.x  Assigned
     3  Run       AutoPatch R120                                 Wait
     4  Run       AutoPatch R120                                 Wait

Review the messages above, then press [Return] to continue.

Enter your choice [1] : 1


        Control
Worker  Code      Context            Filename                    Status
------  --------  -----------------  --------------------------  --------------
     1  Run       AutoPatch R120     BEN_REP_ACTIONS_TAB.xdf     Running
     2  Run       AutoPatch R120     BEN_REP_ARCHIVE_TAB.xdf     Running
     3  Run       AutoPatch R120     BEN_DPNT_SICVRD_GT_TAB.xdf  Running
     4  Run       AutoPatch R120     BEN_ACA_CBR_TAB.xdf         Running

The workers are proceeding as before.
If you check your initial ssh session where you run the patching, things should move along.