Minimalistic Oracle: August 2014

Friday, August 29, 2014

Login storm against database caused exhausted library cache

One of our databases experienced massive contention in the shared pool, in form of wait events alerted as "library cache locks".

The database was very small indeed, so my natural instinct was to throw some more memory at the virtual host, and rearrange the memory parameters.

This turned out to be a misconception; the resources were sufficient for the instance to work properly.

The problem was caused by an incorrect password configuration on the application server.

What we could observe was:

A totally exhausted shared pool, caused by "library cache lock"

The SQL that seemed to be repeatedly executed was

SELECT /*+ connect_by_filtering */
          privilege#, LEVEL
      FROM sysauth$
CONNECT BY grantee# = PRIOR privilege# AND privilege# > 0
START WITH grantee# = :1 AND privilege# > 0;


SELECT privilege#
  FROM sysauth$
WHERE (grantee# = :1 OR grantee# = 1) AND privilege# > 0;

The V$EVENT_NAME view showed that the wait event was accompanied by the additional information found in the columns parameter1 through parameter3, which turned out to be helpful further on:

select  name, wait_class,parameter1,parameter2,parameter3
from v$event_name
where wait_class = 'Concurrency'
and name = 'library cache lock';

NAME	WAIT_CLASS	PARAMETER1	PARAMETER2	PARAMETER3
library cache lock	Concurrency	handle address	lock address	100*mode+namespace

Further research showed that the problem was due to a built-in delay between failed login attempts in Oracle 11g:

"The 'library cache lock' wait is seen due to the fact that the account status gets updated due to incorrect login.
To prevent password guessing attack, there's a sleep() in the code when incorrect login attempts exceed count of 3.
And because of this sleep() you see a wait on library cache, as the process is yet to release the lock."

In release 11.1.0.7, patch 7715339 was released to remove this delay.

In release 11.2.X, the DBA must set an event to remove the delay, as follows:

alter system set events '28401 trace name context forever, level 1';

According to Oracle, the purpose of the built-sleep is to make it harder to succeed in a "password guessing attack", particularly in cases where FAILED_LOGIN_ATTEMPTS is set to UNLIMITED. Oracle Development is pointing out that disabling the sleep-function is not recommended. A better solution is to set the FAILED_LOGIN_ATTEMPTS to a reasonable value.
When the number of failed login attempts for a session hits the limit, the account will be locked. Subsequent logon attempts with incorrect password will then be rejected immediately without any contention in the library cache.

See Bug 15882590 : 'LIBRARY CACHE LOCK' DURING WRONG PASSWORD LOGON ATTEMPTS on My Oracle Support (MOS) for further information.

Thursday, August 28, 2014

How to use the case statement in a shell script, with case-insensitive input

Use the pipe (|) character for each possible case:

echo "TABLESPACE REPORT FOR $ORACLE_SID"

stty echo
   echo "All tablespaces or specific tablespace? [A|S]"
   read answer
stty echo

case $answer in
A|a) echo "All tablespaces selected...please wait...";
    getFiles A;;
S|s) echo "Which tablespace?";
    read wt;
    getFiles $wt;;
*) echo "Only A or S are supported parameters.";
    exit 1;;
esac

Tuesday, August 26, 2014

How to find the SID, serial# and their operating system process ID from the database

A simple statement to find the SID, serial# and their operating system process ID from the database:

SET LINES 300 PAGES 300
SET TRIMSPOOL ON
SPOOL SQL
COL "Os pid" FORMAT A10
COL MACHINE  FORMAT A30
COL "SQL Text" FORMAT A100 WRA
COL PROGRAM  FORMAT A20
COL USERNAME FORMAT A12

SELECT
  S.SID,
  S.SERIAL#,
  S.USERNAME,
  S.STATUS,
  P.SPID "Os pid",
  S.MACHINE,
  CONCAT(SUBSTR(A.SQL_TEXT,1,60), '...(output truncated)')  "SQL text"
FROM V$SESSION S,
          V$SQLAREA A,
          V$PROCESS P
WHERE A.ADDRESS = S.SQL_ADDRESS
AND S.PADDR=P.ADDR
ORDER BY 1
/

Example output:

SID          SERIAL# USERNAME     Os pid     MACHINE          SQL text
---------- ---------- ------------ ---------- -------------  ------------------------------------------------------------------------------------
         4          3              6357052    myserver       insert into obj$(owner#,name,namespace,obj#,type#,ctime,mtim...(output truncated)
        18       7035 SYS          21496004   myserver       call DBMS_AQADM_SYS.REGISTER_DRIVER (  )...(output truncated)
       107      62143 SYS          23527460   myserver       SELECT   S.SID,   S.SERIAL#,   S.USERNAME,   P.SPID "Os pid"...(output truncated)
       303      27191 SYS          16777372   myserver       analyze table scott.man_to_stage validate structure cascade o...(output truncated)
       393          1              8716332    myserver       insert into obj$(owner#,name,namespace,obj#,type#,ctime,mtim...(output truncated)
       402       3125 BATCHUSR     18612294   myserver       call mypackage.start_dorg (  )...(output truncated)
       490          1              8257548    myserver       insert into obj$(owner#,name,namespace,obj#,type#,ctime,mtim...(output truncated)
       499      25461 SYS          13762730   myserver       call DBMS_AQADM_SYS.REGISTER_DRIVER (  )...(output truncated)
       503       2635 SYS          11272234   myserver       call DBMS_AQADM_SYS.REGISTER_DRIVER (  )...(output truncated)
       593      12091 BATCHUSR     19726590   myserver       call myprocedure.read_queue (  )...(output truncated)
       600       9193 SYS          21102724   COMP\PC1       select * from dba_locks...(output truncated)
       694      44601 BATCHUSR     20840656   myserver       call mypackage.check_job_consistency(  )...(output truncated)
       696        819 SYS          15269968   myserver       call DBMS_AQADM_SYS.REGISTER_DRIVER (  )...(output truncated)
       700       1135 BATCHUSR     21364890   myserver       DECLARE job BINARY_INTEGER := :job;  next_date TIMESTAMP WIT...(output truncated)

What is the meaning of the "10G 11G" value in DBA_USERS.PASSWORD_VERSIONS?

When the value of DBA_USERS.PASSWORD_VERSIONS is shown as "10G 11G", it means that both old and new-style hash values are available for the user.

Note that instead of storing the hashed password values directly in the DBA_USERS table, they are from 11gR1 and onwards both stored in the table USER$, column PASSWORD for the 10G style hash value, and column SPARE4 for the 11G SHA-1 style hash value.

A NULL value indicates that the password has not been changed since the migration and the user still has the old case insensitive password.

In my query below, USER1 through USER4 have been migrated and will now take advantage of 11g password case-sensitivity, if enabled.

USER5 is still using 10G style case insensitive passwords.

USER6 is created after migration, and will also take advantage of the 11g password case-sensitivity, if available.

SELECT U.NAME "Name",
            NVL(REGEXP_REPLACE(U.SPARE4,'^.+', 'Password changed since migration',1,0), 'Password unchanged since migration') "Action undertaken",
            DU.PASSWORD_VERSIONS "Password Version"
FROM USER$ U LEFT OUTER JOIN DBA_USERS DU
ON U.NAME = DU.USERNAME
WHERE DU.USERNAME NOT IN (SELECT ROLE FROM DBA_ROLES)
ORDER BY NAME ASC;

Name	Action undertaken	Password Version
USER1	Password changed since migration	10G 11G
USER2	Password changed since migration	10G 11G
USER3	Password changed since migration	10G 11G
USER4	Password changed since migration	10G 11G
USER5	Password unchanged since migration	10g
USER6	Password changed since migration	11G

Users that are imported from an earlier release into an 11g database, will remain case-insensitive until the password is changed.

To generate "alter user identified by values" statements, use the following:

SELECT 'alter user ' || NAME || ' identified by values ' || '''' || SPARE4 ||';' || PASSWORD ||''';' 
FROM USER$ 
WHERE  NAME IN ('USER1','USER2');

Sources: Oracle Documentation

Monday, August 25, 2014

How to use trace event 10046

ALTER SESSION SET max_dump_file_size = unlimited;
ALTER SESSION SET tracefile_identifier = 'normal_run_with_trace_10046';
ALTER SESSION SET statistics_level = ALL;
ALTER SESSION SET events '10046 trace name context forever, level 12';

Your SQL statement(s) here

ALTER SYSTEM SET EVENTS '10046 trace name context off';
EXIT

Alternatively, use the oradebug utility:

select s.sid,s.serial#,p.spid
from v$session s, v$process p 
where sid = 29
and s.paddr = p.addr

Output:

SID	SERIAL#	SPID
29	9	24510662

Check that the process is indeed there:

prodserver1>ps -ef | grep 24510662
  ora11g 24510662        1   0 17:24:42      -  0:00 oraclemagr (LOCAL=NO)
prodserver1>proctree 24510662
  24510662    oraclemagr (LOCAL=NO)

Enable tracing on the operating system process, inside sqlplus:

SQL> oradebug setospid 24051998
Oracle pid: 420, Unix process pid: 24051998, image: oracle@ystu032ma
SQL> oradebug unlimit
SQL> oradebug event 10046 trace name context forever, level 12

When done, disable tracing:

SYS@magr SQL> oradebug setospid 24051998
Oracle pid: 420, Unix process pid: 24051998, image: oracle@ystu032ma

SQL> oradebug unlimit
SQL> oradebug event 10046 trace name context forever, level 12

You can now parse the trace file produced through tkprof:

tkprof mytracefile.trc mytracefile.out sys=yes waits=yes sort=exemis

For an extensive list of commands to be used with tkprof, simply type tkprof at the prompt.

An excellent note on 10046 trace event is called "How To Collect 10046 Trace (SQL_TRACE) Diagnostics for Performance Issues (Doc ID 376442.1)".

Another great note on tracing in general is "Tracing Enhancements Using DBMS_MONITOR (In 10g, 11g and Above) (Doc ID 293661.1)"

- both available on Oracle Supports site.

Friday, August 1, 2014

Why are there multiple rows in V$SESSION_LONGOPS when running an RMAN backup?

Answer:
When performing an RMAN backup, V$SESSION_LONGOPS will be populated with two types of rows:

Detailed rows: Captures the progress of individual job steps.

Aggregated rows: Captures and summarizes the progress for the individual step of the backup job, as they complete. They are updated only after each detailed step has finished.

Example:
Below I am querying the V$SESSION_LONGOPS after an incremental backup level 0 has started, using 6 channels. Notice that there are both individual rows and an aggregated row present:

SELECT SID,SERIAL#,OPNAME,TARGET_DESC,TOTALWORK,SOFAR,UNITS,START_TIME,TIME_REMAINING,ELAPSED_SECONDS,MESSAGE
FROM V$SESSION_LONGOPS 
WHERE  TOTALWORK <> 0
AND SOFAR <> TOTALWORK
ORDER BY SERIAL# ASC;

SID	SERIAL#	OPNAME	TARGET_DESC	TOTALWORK	SOFAR	UNITS	START_TIME	TIME_REMAINING	ELAPSED_SECONDS	MESSAGE
203	7409	RMAN: incremental datafile backup	Set Count	4096000	1004926	Blocks	01.08.2014 12:28:28	557	181	RMAN: incremental datafile backup: Set Count 137085: 1004926 out of 4096000 Blocks done
398	15079	RMAN: incremental datafile backup	Set Count	3018368	1122302	Blocks	01.08.2014 12:28:28	306	181	RMAN: incremental datafile backup: Set Count 137088: 1122302 out of 3018368 Blocks done
439	17451	RMAN: incremental datafile backup	Set Count	2560000	1127038	Blocks	01.08.2014 12:28:29	229	180	RMAN: incremental datafile backup: Set Count 137089: 1127038 out of 2560000 Blocks done
357	31627	RMAN: incremental datafile backup	Set Count	3044608	1115518	Blocks	01.08.2014 12:28:28	313	181	RMAN: incremental datafile backup: Set Count 137087: 1115518 out of 3044608 Blocks done
476	55467	RMAN: incremental datafile backup	Set Count	2557440	940542	Blocks	01.08.2014 12:28:29	309	180	RMAN: incremental datafile backup: Set Count 137090: 940542 out of 2557440 Blocks done
512	55527	RMAN: aggregate input	backup	76514816	4360436	Blocks	01.08.2014 12:28:28	2085	126	RMAN: aggregate input: backup 33: 4360436 out of 76514816 Blocks done
281	56085	RMAN: incremental datafile backup	Set Count	3251200	969214	Blocks	01.08.2014 12:28:28	426	181	RMAN: incremental datafile backup: Set Count 137086: 969214 out of 3251200 Blocks done

Let's join V$SESSION_LONGOPS with V$SESSION and (if desirable) V$PROCESS, to view only the detailed rows, and estimate the progress in percent. This query is very useful to quickly get an overview of your backup, whether or not it is progressing according to expected speed etc:

SELECT S.CLIENT_INFO "Client Info", SL.OPNAME "Operation" ,SL.MESSAGE, SL.SID, SL.SERIAL#, P.SPID "OS Process ID", SL.SOFAR "So Far", SL.TOTALWORK "Totalwork", ROUND(SL.SOFAR/SL.TOTALWORK*100,2) "% complete"
FROM V$SESSION_LONGOPS SL INNER JOIN V$SESSION S ON SL.SID = S.SID 
                          INNER JOIN V$PROCESS P ON P.ADDR = S.PADDR
AND OPNAME LIKE 'RMAN%'
AND OPNAME NOT LIKE '%aggregate%'
AND TOTALWORK != 0
AND SOFAR <> TOTALWORK
/

Client Info	Operation	MESSAGE	SID	SERIAL#	OS Process ID	So Far	Totalwork	% complete
rman channel=tsm_channel_0	RMAN: incremental datafile backup	RMAN: incremental datafile backup: Set Count 137096: 1173118 out of 2252800 Blocks done	203	7409	42926864	1173118	2252800	52,07
rman channel=tsm_channel_1	RMAN: incremental datafile backup	RMAN: incremental datafile backup: Set Count 137095: 2029054 out of 2393600 Blocks done	281	56085	52101750	2029054	2393600	84,77
rman channel=tsm_channel_2	RMAN: incremental datafile backup	RMAN: incremental datafile backup: Set Count 137097: 293246 out of 2096640 Blocks done	357	31627	56361104	293246	2096640	13,99
rman channel=tsm_channel_3	RMAN: incremental datafile backup	RMAN: incremental datafile backup: Set Count 137099: 174206 out of 2048000 Blocks done	398	15079	52756692	174206	2048000	8,51
rman channel=tsm_channel_4	RMAN: incremental datafile backup	RMAN: incremental datafile backup: Set Count 137098: 460286 out of 2048000 Blocks done	439	17451	29032454	460286	2048000	22,47
rman channel=tsm_channel_5	RMAN: incremental datafile backup	RMAN: incremental datafile backup: Set Count 137100: 196094 out of 2048000 Blocks done	476	55467	9700724	196094	2048000	9,57

Notice how the V$RMAN_STATUS only hold information on aggregated level (2 rows, one for each operation although the first spawns the other):

SELECT SID,OPERATION,STATUS,MBYTES_PROCESSED, START_TIME, END_TIME, OBJECT_TYPE, OUTPUT_DEVICE_TYPE 
FROM V$RMAN_STATUS WHERE STATUS = 'RUNNING';

SID	OPERATION	STATUS	MBYTES_PROCESSED	START_TIME	OBJECT_TYPE	OUTPUT_DEVICE_TYPE
512	BACKUP	RUNNING	226641,953125	01.08.2014 12:28:28	DB INCR	SBT_TAPE
512	RMAN	RUNNING	0	01.08.2014 12:28:28