Friday, August 29, 2014

Login storm against database caused exhausted library cache

One of our databases experienced massive contention in the shared pool, in form of wait events alerted as "library cache locks".

The database was very small indeed, so my natural instinct was to throw some more memory at the virtual host, and rearrange the memory parameters.

This turned out to be a misconception; the resources were sufficient for the instance to work properly.

The problem was caused by an incorrect password configuration on the application server.

What we could observe was:

  • A totally exhausted shared pool, caused by "library cache lock"
  • The SQL that seemed to be repeatedly executed was
    SELECT /*+ connect_by_filtering */
              privilege#, LEVEL
          FROM sysauth$
    CONNECT BY grantee# = PRIOR privilege# AND privilege# > 0
    START WITH grantee# = :1 AND privilege# > 0;
    
    
    SELECT privilege#
      FROM sysauth$
    WHERE (grantee# = :1 OR grantee# = 1) AND privilege# > 0;
    
  • The V$EVENT_NAME view showed that the wait event was accompanied by the additional information found in the columns parameter1 through parameter3, which turned out to be helpful further on:
    select  name, wait_class,parameter1,parameter2,parameter3
    from v$event_name
    where wait_class = 'Concurrency'
    and name = 'library cache lock';
    

    NAME WAIT_CLASS PARAMETER1 PARAMETER2 PARAMETER3
    library cache lock Concurrency handle address lock address 100*mode+namespace

    Further research showed that the problem was due to a built-in delay between failed login attempts in Oracle 11g:

    "The 'library cache lock' wait is seen due to the fact that the account status gets updated due to incorrect login.
    To prevent password guessing attack, there's a sleep() in the code when incorrect login attempts exceed count of 3.
    And because of this sleep() you see a wait on library cache, as the process is yet to release the lock."


  • In release 11.1.0.7, patch 7715339 was released to remove this delay.
  • In release 11.2.X, the DBA must set an event to remove the delay, as follows:

    alter system set events '28401 trace name context forever, level 1'; 
    

    According to Oracle, the purpose of the built-sleep is to make it harder to succeed in a "password guessing attack", particularly in cases where FAILED_LOGIN_ATTEMPTS is set to UNLIMITED. Oracle Development is pointing out that disabling the sleep-function is not recommended. A better solution is to set the FAILED_LOGIN_ATTEMPTS to a reasonable value.
    When the number of failed login attempts for a session hits the limit, the account will be locked. Subsequent logon attempts with incorrect password will then be rejected immediately without any contention in the library cache.

    See Bug 15882590 : 'LIBRARY CACHE LOCK' DURING WRONG PASSWORD LOGON ATTEMPTS on My Oracle Support (MOS) for further information.


  • Thursday, August 28, 2014

    How to use the case statement in a shell script, with case-insensitive input

    Use the pipe (|) character for each possible case:

    echo "TABLESPACE REPORT FOR $ORACLE_SID"
    
    stty echo
       echo "All tablespaces or specific tablespace? [A|S]"
       read answer
    stty echo
    
    case $answer in
    A|a) echo "All tablespaces selected...please wait...";
        getFiles A;;
    S|s) echo "Which tablespace?";
        read wt;
        getFiles $wt;;
    *) echo "Only A or S are supported parameters.";
        exit 1;;
    esac
    

    Tuesday, August 26, 2014

    How to find the SID, serial# and their operating system process ID from the database

    A simple statement to find the SID, serial# and their operating system process ID from the database:

    SET LINES 300 PAGES 300
    SET TRIMSPOOL ON
    SPOOL SQL
    COL "Os pid" FORMAT A10
    COL MACHINE  FORMAT A30
    COL "SQL Text" FORMAT A100 WRA
    COL PROGRAM  FORMAT A20
    COL USERNAME FORMAT A12
    
    SELECT
      S.SID,
      S.SERIAL#,
      S.USERNAME,
      S.STATUS,
      P.SPID "Os pid",
      S.MACHINE,
      CONCAT(SUBSTR(A.SQL_TEXT,1,60), '...(output truncated)')  "SQL text"
    FROM V$SESSION S,
              V$SQLAREA A,
              V$PROCESS P
    WHERE A.ADDRESS = S.SQL_ADDRESS
    AND S.PADDR=P.ADDR
    ORDER BY 1
    /
    
    Example output:
    SID          SERIAL# USERNAME     Os pid     MACHINE          SQL text
    ---------- ---------- ------------ ---------- -------------  ------------------------------------------------------------------------------------
             4          3              6357052    myserver       insert into obj$(owner#,name,namespace,obj#,type#,ctime,mtim...(output truncated)
            18       7035 SYS          21496004   myserver       call DBMS_AQADM_SYS.REGISTER_DRIVER (  )...(output truncated)
           107      62143 SYS          23527460   myserver       SELECT   S.SID,   S.SERIAL#,   S.USERNAME,   P.SPID "Os pid"...(output truncated)
           303      27191 SYS          16777372   myserver       analyze table scott.man_to_stage validate structure cascade o...(output truncated)
           393          1              8716332    myserver       insert into obj$(owner#,name,namespace,obj#,type#,ctime,mtim...(output truncated)
           402       3125 BATCHUSR     18612294   myserver       call mypackage.start_dorg (  )...(output truncated)
           490          1              8257548    myserver       insert into obj$(owner#,name,namespace,obj#,type#,ctime,mtim...(output truncated)
           499      25461 SYS          13762730   myserver       call DBMS_AQADM_SYS.REGISTER_DRIVER (  )...(output truncated)
           503       2635 SYS          11272234   myserver       call DBMS_AQADM_SYS.REGISTER_DRIVER (  )...(output truncated)
           593      12091 BATCHUSR     19726590   myserver       call myprocedure.read_queue (  )...(output truncated)
           600       9193 SYS          21102724   COMP\PC1       select * from dba_locks...(output truncated)
           694      44601 BATCHUSR     20840656   myserver       call mypackage.check_job_consistency(  )...(output truncated)
           696        819 SYS          15269968   myserver       call DBMS_AQADM_SYS.REGISTER_DRIVER (  )...(output truncated)
           700       1135 BATCHUSR     21364890   myserver       DECLARE job BINARY_INTEGER := :job;  next_date TIMESTAMP WIT...(output truncated)
    


    What is the meaning of the "10G 11G" value in DBA_USERS.PASSWORD_VERSIONS?

    When the value of DBA_USERS.PASSWORD_VERSIONS is shown as "10G 11G", it means that both old and new-style hash values are available for the user.

    Note that instead of storing the hashed password values directly in the DBA_USERS table, they are from 11gR1 and onwards both stored in the table USER$, column PASSWORD for the 10G style hash value, and column SPARE4 for the 11G SHA-1 style hash value.

    A NULL value indicates that the password has not been changed since the migration and the user still has the old case insensitive password.

    In my query below, USER1 through USER4 have been migrated and will now take advantage of 11g password case-sensitivity, if enabled.

    USER5 is still using 10G style case insensitive passwords.

    USER6 is created after migration, and will also take advantage of the 11g password case-sensitivity, if available.

    SELECT U.NAME "Name",
                NVL(REGEXP_REPLACE(U.SPARE4,'^.+', 'Password changed since migration',1,0), 'Password unchanged since migration') "Action undertaken",
                DU.PASSWORD_VERSIONS "Password Version"
    FROM USER$ U LEFT OUTER JOIN DBA_USERS DU
    ON U.NAME = DU.USERNAME
    WHERE DU.USERNAME NOT IN (SELECT ROLE FROM DBA_ROLES)
    ORDER BY NAME ASC;
    


    Name Action undertaken Password Version
    USER1 Password changed since migration 10G 11G
    USER2 Password changed since migration 10G 11G
    USER3 Password changed since migration 10G 11G
    USER4 Password changed since migration 10G 11G
    USER5 Password unchanged since migration 10g
    USER6 Password changed since migration 11G

    Users that are imported from an earlier release into an 11g database, will remain case-insensitive until the password is changed.


    To generate "alter user identified by values" statements, use the following:

    SELECT 'alter user ' || NAME || ' identified by values ' || '''' || SPARE4 ||';' || PASSWORD ||''';' 
    FROM USER$ 
    WHERE  NAME IN ('USER1','USER2');
    

    Sources: Oracle Documentation

    Monday, August 25, 2014

    How to use trace event 10046


    ALTER SESSION SET max_dump_file_size = unlimited;
    ALTER SESSION SET tracefile_identifier = 'normal_run_with_trace_10046';
    ALTER SESSION SET statistics_level = ALL;
    ALTER SESSION SET events '10046 trace name context forever, level 12';
    
    Your SQL statement(s) here
    
    ALTER SYSTEM SET EVENTS '10046 trace name context off';
    EXIT
    

    Alternatively, use the oradebug utility:

    select s.sid,s.serial#,p.spid
    from v$session s, v$process p 
    where sid = 29
    and s.paddr = p.addr
    

    Output:
    SID SERIAL# SPID
    29
    9
    24510662
    Check that the process is indeed there:
    prodserver1>ps -ef | grep 24510662
      ora11g 24510662        1   0 17:24:42      -  0:00 oraclemagr (LOCAL=NO)
    prodserver1>proctree 24510662
      24510662    oraclemagr (LOCAL=NO)
    

    Enable tracing on the operating system process, inside sqlplus:
    SQL> oradebug setospid 24051998
    Oracle pid: 420, Unix process pid: 24051998, image: oracle@ystu032ma
    SQL> oradebug unlimit
    SQL> oradebug event 10046 trace name context forever, level 12
    

    When done, disable tracing:
    SYS@magr SQL> oradebug setospid 24051998
    Oracle pid: 420, Unix process pid: 24051998, image: oracle@ystu032ma
    
    SQL> oradebug unlimit
    SQL> oradebug event 10046 trace name context forever, level 12
    

    You can now parse the trace file produced through tkprof:
    tkprof mytracefile.trc mytracefile.out sys=yes waits=yes sort=exemis
    

    For an extensive list of commands to be used with tkprof, simply type tkprof at the prompt.

    An excellent note on 10046 trace event is called "How To Collect 10046 Trace (SQL_TRACE) Diagnostics for Performance Issues (Doc ID 376442.1)".

    Another great note on tracing in general is "Tracing Enhancements Using DBMS_MONITOR (In 10g, 11g and Above) (Doc ID 293661.1)"

    - both available on Oracle Supports site.

    Friday, August 1, 2014

    Why are there multiple rows in V$SESSION_LONGOPS when running an RMAN backup?

    Answer:
    When performing an RMAN backup, V$SESSION_LONGOPS will be populated with two types of rows:

    Detailed rows: Captures the progress of individual job steps.

    Aggregated rows: Captures and summarizes the progress for the individual step of the backup job, as they complete. They are updated only after each detailed step has finished.

    Example:
    Below I am querying the V$SESSION_LONGOPS after an incremental backup level 0 has started, using 6 channels. Notice that there are both individual rows and an aggregated row present:
    SELECT SID,SERIAL#,OPNAME,TARGET_DESC,TOTALWORK,SOFAR,UNITS,START_TIME,TIME_REMAINING,ELAPSED_SECONDS,MESSAGE
    FROM V$SESSION_LONGOPS 
    WHERE  TOTALWORK <> 0
    AND SOFAR <> TOTALWORK
    ORDER BY SERIAL# ASC;
    

    SID SERIAL# OPNAME TARGET_DESC TOTALWORK SOFAR UNITS START_TIME TIME_REMAINING ELAPSED_SECONDS MESSAGE
    203
    7409
    RMAN: incremental datafile backup Set Count
    4096000
    1004926
    Blocks 01.08.2014 12:28:28
    557
    181
    RMAN: incremental datafile backup: Set Count 137085: 1004926 out of 4096000 Blocks done
    398
    15079
    RMAN: incremental datafile backup Set Count
    3018368
    1122302
    Blocks 01.08.2014 12:28:28
    306
    181
    RMAN: incremental datafile backup: Set Count 137088: 1122302 out of 3018368 Blocks done
    439
    17451
    RMAN: incremental datafile backup Set Count
    2560000
    1127038
    Blocks 01.08.2014 12:28:29
    229
    180
    RMAN: incremental datafile backup: Set Count 137089: 1127038 out of 2560000 Blocks done
    357
    31627
    RMAN: incremental datafile backup Set Count
    3044608
    1115518
    Blocks 01.08.2014 12:28:28
    313
    181
    RMAN: incremental datafile backup: Set Count 137087: 1115518 out of 3044608 Blocks done
    476
    55467
    RMAN: incremental datafile backup Set Count
    2557440
    940542
    Blocks 01.08.2014 12:28:29
    309
    180
    RMAN: incremental datafile backup: Set Count 137090: 940542 out of 2557440 Blocks done
    512
    55527
    RMAN: aggregate input backup
    76514816
    4360436
    Blocks 01.08.2014 12:28:28
    2085
    126
    RMAN: aggregate input: backup 33: 4360436 out of 76514816 Blocks done
    281
    56085
    RMAN: incremental datafile backup Set Count
    3251200
    969214
    Blocks 01.08.2014 12:28:28
    426
    181
    RMAN: incremental datafile backup: Set Count 137086: 969214 out of 3251200 Blocks done

    Let's join V$SESSION_LONGOPS with V$SESSION and (if desirable) V$PROCESS, to view only the detailed rows, and estimate the progress in percent. This query is very useful to quickly get an overview of your backup, whether or not it is progressing according to expected speed etc:

    SELECT S.CLIENT_INFO "Client Info", SL.OPNAME "Operation" ,SL.MESSAGE, SL.SID, SL.SERIAL#, P.SPID "OS Process ID", SL.SOFAR "So Far", SL.TOTALWORK "Totalwork", ROUND(SL.SOFAR/SL.TOTALWORK*100,2) "% complete"
    FROM V$SESSION_LONGOPS SL INNER JOIN V$SESSION S ON SL.SID = S.SID 
                              INNER JOIN V$PROCESS P ON P.ADDR = S.PADDR
    AND OPNAME LIKE 'RMAN%'
    AND OPNAME NOT LIKE '%aggregate%'
    AND TOTALWORK != 0
    AND SOFAR <> TOTALWORK
    /
    
    Client Info Operation MESSAGE SID SERIAL# OS Process ID So Far Totalwork % complete
    rman channel=tsm_channel_0 RMAN: incremental datafile backup RMAN: incremental datafile backup: Set Count 137096: 1173118 out of 2252800 Blocks done
    203
    7409
    42926864
    1173118
    2252800
    52,07
    rman channel=tsm_channel_1 RMAN: incremental datafile backup RMAN: incremental datafile backup: Set Count 137095: 2029054 out of 2393600 Blocks done
    281
    56085
    52101750
    2029054
    2393600
    84,77
    rman channel=tsm_channel_2 RMAN: incremental datafile backup RMAN: incremental datafile backup: Set Count 137097: 293246 out of 2096640 Blocks done
    357
    31627
    56361104
    293246
    2096640
    13,99
    rman channel=tsm_channel_3 RMAN: incremental datafile backup RMAN: incremental datafile backup: Set Count 137099: 174206 out of 2048000 Blocks done
    398
    15079
    52756692
    174206
    2048000
    8,51
    rman channel=tsm_channel_4 RMAN: incremental datafile backup RMAN: incremental datafile backup: Set Count 137098: 460286 out of 2048000 Blocks done
    439
    17451
    29032454
    460286
    2048000
    22,47
    rman channel=tsm_channel_5 RMAN: incremental datafile backup RMAN: incremental datafile backup: Set Count 137100: 196094 out of 2048000 Blocks done
    476
    55467
    9700724
    196094
    2048000
    9,57

    Notice how the V$RMAN_STATUS only hold information on aggregated level (2 rows, one for each operation although the first spawns the other):
    SELECT SID,OPERATION,STATUS,MBYTES_PROCESSED, START_TIME, END_TIME, OBJECT_TYPE, OUTPUT_DEVICE_TYPE 
    FROM V$RMAN_STATUS WHERE STATUS = 'RUNNING';
    

    SID OPERATION STATUS MBYTES_PROCESSED START_TIME OBJECT_TYPE OUTPUT_DEVICE_TYPE
    512
    BACKUP RUNNING
    226641,953125
    01.08.2014 12:28:28 DB INCR SBT_TAPE
    512
    RMAN RUNNING
    0
    01.08.2014 12:28:28