Wednesday, June 11, 2014

How to deal with ORA-00020: maximum number of processes (%s) exceeded


I recently had a situation where access to the database was completely blocked because of the infamous error message
ORA-00020: maximum number of processes (%s) exceeded
Facts:
  • The database had processes set to 1000.
  • The Cloud Control agent was spawning hundreds of processes (obviously an error to troubleshoot as a separate action)
  • Connecting through sqlplus with os authentication (sqlplus / as sysdba) didn't work due to the same reason

    At that time, the database had to become available to the users again ASAP.

    When I have encountered these situations in the past, I have had to kill all the operating system processes and restart the instance. A brut-force method that is not particularly pretty, but sometimes necessary:

    for a in $(ps -ef |grep $ORACLE_SID | grep -v grep | awk '{ print $2}'); do 
    >kill -9 $a; 
    >done
    

    It normally does the job when you really have no other option.
    This time however, after having killed all the processes, Oracle still rejected connections to the database using sqlplus:

    sqlplus /nolog
    
     SQL*Plus: Release 11.2.0.4.0 Production on Fri Jun 6 09:07:23 2014
    
     Copyright (c) 1982, 2013, Oracle.  All rights reserved.
     @ SQL> connect /
     ERROR:
     ORA-00020: maximum number of processes (1000) exceeded
    
    I then found the page by tech.e2sn.com that showed how to use sqlplus with the "preliminary connection".

    Simply by using
    
    sqlplus -prelim "/as sysdba"
    
    I was able to connect and shutdown the database with the abort option.
    sqlplus -prelim "/ as sysdba"
    
     SQL*Plus: Release 11.2.0.4.0 Production on Fri Jun 6 09:09:15 2014
     Copyright (c) 1982, 2013, Oracle.  All rights reserved.
    
     SQL> shutdown abort
     ORACLE instance shut down.
     SQL> exit
     Disconnected from ORACLE
    
    After this point the database could once again be restarted:
    sqlplus / as sysdba
    
     SQL*Plus: Release 11.2.0.4.0 Production on Fri Jun 6 09:10:38 2014
     Copyright (c) 1982, 2013, Oracle.  All rights reserved.
     Connected to an idle instance.
    SQL> startup
     ORACLE instance started.
    
     Total System Global Area 2137886720 bytes
     Fixed Size                  2248080 bytes
     Variable Size            1258291824 bytes
     Database Buffers          855638016 bytes
     Redo Buffers               21708800 bytes
     Database mounted.
     Databasen opened.
    

    The article referred to above is worth reading, but in short, the -prelim option will not try to create private session structures in the SGA. This allows you to connect to perform debugging or shutdown operations.
  • Great feature in adrci for searching in the alert log


    A potentially very timesaving feature in Oracle's adrci is the ability to search in the alert log for specific text, as I had to do to find when a specific parameter was set:

    adrci
    
     ADRCI: Release 11.2.0.4.0 - Production on On Jun 11 12:03:34 2014
    
     Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
    
     ADR base = "/u01/oracle/admin/proddb01/diagnostic"
     adrci> show homes
     ADR Homes:
     diag/rdbms/proddb01/proddb01
     adrci> show alert -p "message_text like '%event%'"
    
     ADR Home = /u01/oracle/admin/proddb01/diagnostic/diag/rdbms/proddb01/proddb01:
     *************************************************************************
     Output the results to file: /tmp/alert_3343326_1_proddb01_1.ado
     "/tmp/alert_3343326_1_proddb01_1.ado" 44 lines, 2408 characters
     2013-08-25 14:35:46.344000 +02:00
     One of the following events caused this:
     2014-01-02 10:56:34.311000 +01:00
     OS Pid: 8455160 executed alter system set events '10852 trace name context forever, level 16384'
     2014-01-02 10:56:47.555000 +01:00
     ALTER SYSTEM SET event='10852 trace name context forever, level 16384' SCOPE=SPFILE;
     2014-01-10 00:43:02.945000 +01:00
       event                    = "10852 trace name context forever, level 16384"
     2014-01-31 19:32:59.471000 +01:00
       event                    = "10852 trace name context forever, level 16384"
     2014-02-01 09:12:59.653000 +01:00
       event                    = "10852 trace name context forever, level 16384"
     CLOSE: Active sessions prevent database close operation
     2014-02-01 18:10:54.100000 +01:00
       event                    = "10852 trace name context forever, level 16384"
     2014-06-10 19:38:42.536000 +02:00
     ALTER SYSTEM SET event='10852 trace name context forever, level 16384 off' SCOPE=SPFILE;
     2014-06-10 19:43:12.770000 +02:00
       event                    = "10852 trace name context off"
    

    Without much effort I was able to find that the parameter was set 02.01.2014, and switched off 10.06.2014.

    How to disable an event parameter in the database

    During an Upgrade from 11.2.0.3 to 11.2.0.4 I had to remove an event-parameter in the database.

    The syntax for this is:
    alter system set event="10852 trace name context off" scope=spfile;
    

    Friday, June 6, 2014

    Why does the TIMESTAMP# column in the AUD$ table contain NULL values?

    According to Oracle Support Note 427296.1:

    "In database version 10gR1 and above, the TIMESTAMP# column is obsoleted in favor of the new NTIMESTAMP# column."

    So when exchanging the TIMESTAMP# with the NTIMESTAMP# column, my script works as intended, while it had previously showed NULL values:

    SELECT DBID "CURRENT DBID" FROM V$DATABASE;
    
     SET TIMING ON
     SET LINES 200
     COL "Earliest" format a30
     col "Latest" format a30
    
     PROMPT Counting the DBIDs and the number of audit entries each
     PROMPT Could take a while...
     COL TIMESTAMP# FORMAT A3
     SELECT DBID,COUNT(*),MIN(NTIMESTAMP#) "Earliest", MAX(NTIMESTAMP#) "Latest"
     FROM AUD$
     GROUP BY DBID;
    

    Output:
    Counting the DBIDs and the number of audit entries each
     Could take a while...
    
           DBID   COUNT(*) Earliest                       Latest
     ---------- ---------- ------------------------------ ------------------------------
    2367413790       1867 05.06.2014 14.01.30,193254     06.06.2014 06.17.08,485629
    

    The views built upon AUD$, for example DBA_AUDIT_TRAIL and DBA_FGA_AUDIT_TRAIL, will of course reflect the correct columns from AUD$ (NTIMESTAMP#) in their own TIMESTAMP column.

    Thursday, June 5, 2014

    Multiple DBIDs in AUD$

    I wanted to test the DBMS_AUDIT_MGMT.CLEAN_AUDIT_TRAIL package, with use_last_arch_timestamp to TRUE, to only purge records one month older than the minimum value found.

    I found this value by using the function ADD_MONTHS to add 1 month on top of the minimum value found in AUD$
    SET SERVEROUTPUT ON
     DECLARE
     currdate DATE;
     last_archtime DATE;
    
     BEGIN
    
     currdate := SYSTIMESTAMP;
    
     ---------------------------------------------------------
     -- Get the oldest timestamp from AUD$, then add one month.
     -- Use this timestamp as the last archive timestamp in
     -- procedure SET_LAST_ARCHIVE_TIMESTAMP
     ---------------------------------------------------------
     SELECT ADD_MONTHS(
       (
        SELECT MIN(TIMESTAMP) 
        FROM DBA_AUDIT_TRAIL
        ), 1)
     INTO last_archtime
     FROM DUAL;
     DBMS_OUTPUT.PUT_LINE('last_archtime: ' || last_archtime);
    
      DBMS_AUDIT_MGMT.SET_LAST_ARCHIVE_TIMESTAMP (
        AUDIT_TRAIL_TYPE => DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD,
        LAST_ARCHIVE_TIME => last_archtime);
    
     END;
     /
    
    Put all that in a file called set_last_timestamp_std.sql.
    First, check the DBA_AUDIT_MGMT_LAST_ARCH_TS for the last archive timestamp:
    SQL> SELECT * FROM DBA_AUDIT_MGMT_LAST_ARCH_TS;
    
    No rows selected.
    
    
    Execute the script created above:
    sqlplus / as sysdba @set_last_timestamp_std.sql
    
     last_archtime: 07.02.2009
    
     PL/SQL-procedure executed.
    
    Check the DBA_AUDIT_MGMT_LAST_ARCH_TS again:
    AUDIT_TRAIL          RAC_INSTANCE LAST_ARCHIVE_TS
     -------------------- ------------ ----------------------------------------
     STANDARD AUDIT TRAIL            0 07.02.2009 15.01.50,000000 +00:00
    

    I was now ready to execute the manual cleanup. Before I did so, I wanted to get an idea about how many rows that should be purged:
    SELECT COUNT(*)
    FROM DBA_AUDIT_TRAIL
    WHERE TIMESTAMP < (
    SELECT ADD_MONTHS(
       (SELECT TIMESTAMP
        FROM (SELECT TIMESTAMP FROM DBA_AUDIT_TRAIL ORDER BY TIMESTAMP ASC)
        WHERE ROWNUM <=1), 1)
     FROM DUAL)
    ;
    
      COUNT(*)
    ----------
       126405
    
    
    Compare with the total number of rows:
    SQL> SELECT COUNT(*)FROM DBA_AUDIT_TRAIL;
    
      COUNT(*)
    ----------
      33 664,540
    
    Sweet. 126405 records would be purged. I then executed:
    BEGIN
    DBMS_AUDIT_MGMT.CLEAN_AUDIT_TRAIL(
    audit_trail_type => DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD,
    use_last_arch_timestamp => TRUE);
    END;
    /
    
    The purge succeeded. But when I checked the number of rows again, it still returned 126405 rows. What I found was that what Oracle was executing the following statement internally when using the DBMS_AUDIT_MGMT package:
    DELETE FROM SYS.AUD$ WHERE DBID = 2367413790 AND NTIMESTAMP# < to_timestamp('2009-02-07 15:01:50', 'YYYY-MM-DD HH24:MI:SS.FF') AND ROWNUM <= 10000;
    
    So I tested to select the rows using the same predicate that was used during the purge:
    SQL> SELECT COUNT(*) FROM SYS.AUD$ WHERE DBID = 2367413790 AND NTIMESTAMP# < to_timestamp('2009-02-07 15:01:50', 'YYYY-MM-DD HH24:MI:SS.FF');
    
       COUNT(*)
     ----------
              0
    
    checked again against the dba_audit_trail
    SQL> SELECT COUNT(*) FROM DBA_AUDIT_TRAIL WHERE TIMESTAMP  < to_timestamp('2009-02-07 15:01:50', 'YYYY-MM-DD HH24:MI:SS.FF');
    
       COUNT(*)
     ----------
         126405
    
    So there are indeed records that are older than '2009-02-07 15:01:50'. Why is it not caught when querying the AUD$ table, only the DBA_AUDIT_TRAIL? Of course! The AUD$ table also has a reference to the DBID. And since the database was recently cloned, it has cycled through another incarnation:
    SQL> select DBID,MIN(NTIMESTAMP#)
    2  FROM AUD$
    3  GROUP BY DBID;
    
    DBID MIN(NTIMESTAMP#)
    ---------- ----------------------------
    2367413790 19.05.2014 07.07.13,675010
    848951741 07.01.2009 13.01.50,802413
    
    So the fact that minimum timestamp for DBID 2367413790 is 19.05.2014 is correct after all:
    SQL> SELECT MIN(TIMESTAMP) FROM DBA_AUDIT_TRAIL WHERE DBID=2367413790;
    
    MIN(TIMEST
    ----------
    19.05.2014
    
    In fact, the majority of the audit trail records are from a previous incarnation:
    SQL> select count(*) from aud$ where dbid = 848951741;
    
    COUNT(*)
    ----------
    33 612,411
    
    SQL> select DBID,MAX(NTIMESTAMP#)
    2  FROM AUD$
    3  GROUP BY DBID;
    
    DBID MAX(NTIMESTAMP#)
    ---------- --------------------------------
    2367413790 05.06.2014 08.42.59,749967
    848951741 15.05.2014 21.41.52,247344
    
    So the size of the AUD$ is 7481 MB:
    SQL> SELECT SUM(BYTES)/1024/1024 FROM DBA_SEGMENTS WHERE SEGMENT_NAME = 'AUD$';
    
    SUM(BYTES)/1024/1024
    --------------------
    7481
    
    Now the question is: since the procedure DBMS_AUDIT_MGMT.CLEAN_AUDIT_TRAIL with the parameter use_last_arch_timestamp set to TRUE only attempts to purge the rows from AUD$ that has the same DBID as the current database incarnation, will a "purge all" directive, indicated by use_last_arch_timestamp set to FALSE be equally selective? Since this is a test system, I tried it out by putting the following statements into a script:
    SET LINES 200
    SET SERVEROUTPUT ON
    
    SELECT DBID,COUNT(*)
    FROM AUD$
    GROUP BY DBID;
    
    BEGIN
    DBMS_AUDIT_MGMT.CLEAN_AUDIT_TRAIL(
    audit_trail_type => DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD,
    use_last_arch_timestamp => FALSE);
    END;
    /
    
    SELECT DBID,COUNT(*)
    FROM AUD$
    GROUP BY DBID;
    
    Execute it:
    sqlplus / as sysdba @clean_all_audit_trail.sql
    
    Result:
    DBID   COUNT(*)
    ---------- ----------
    2367413790      52560
    848951741   33612411
    
    PL/SQL-procedure executed.
    
    No rows selected.
    
    
    SQL> SELECT SUM(BYTES)/1024/1024 FROM DBA_SEGMENTS WHERE SEGMENT_NAME = 'AUD$';
    
    SUM(BYTES)/1024/1024
    --------------------
    ,0625
    

    So a "purge all" directive will certainly wipe out all of your audit trail, regardless of the presence of multiple DBIDs.

    Purging "up until a the last archive timestamp" will only select the audit entries for your current database incarnation's DBID.

    Audit parameters. What do they mean - short description



    When managing your audit trail, you will need to be familiar with these settings. Here is a short description of what they mean.
    SQL> SELECT NAME, VALUE FROM V$PARAMETER WHERE NAME LIKE '%audit%';
    

    NAME VALUE
    audit_sys_operations TRUE
    audit_file_dest /u01/oracle/admin/slyt/adump
    audit_syslog_level  
    audit_trail DB_EXTENDED

  • AUDIT_TRAIL
    When set to "DB", Oracle directs audit records to the database audit trail (the SYS.AUD$ table), except for mandatory and SYS audit records, which are always written to the operating system audit trail.

    When set to "DB,EXTENDED", oracle behaves as it would when AUDIT_TRAIL=DB, but also enables you to capture the SQL statement used in the action that was audited.

    When set to "NONE", standard auditing is disabled

    When set to "OS", Oracle directs all audit records to an operating system file.

    When set to "XML", Oracle writes to the operating system audit record file in XML format

    When set to "XML, EXTENDED", oracle behaves as it would with "AS AUDIT_TRAIL=XML", but also includes SQL text and SQL bind information

  • AUDIT_SYS_OPERATIONS
    Enables or disables the auditing of top-level operations, which are SQL statements directly issued by users when connecting with SYSDBA or SYSOPER privileges.

  • AUDIT_FILE_DEST
    specifies the operating system directory into which the audit trail is written when the AUDIT_TRAIL initialization parameter is set to os, xml, or xml,extended. It is also the location to which mandatory auditing information is written and, if so specified by the AUDIT_SYS_OPERATIONS initialization parameter, audit records for user SYS.

  • AUDIT_SYSLOG_LEVEL
    Allows SYS and standard OS audit records to be written to the system audit log using the SYSLOG utility


    Note: when you set the audit_trail parameter in the spfile, DO NOT use qotation marks around the values: Incorrect:
    alter system set audit_trail='DB,EXTENDED' scope=spfile;
    
    ORA-00096: invalid value DB,EXTENDED for parameter audit_trail, must be from among extended, xml, none, os, db
    

    Correct:
    alter system set audit_trail=db,extended scope=spfile;
    
    System altered.
    
  • Tuesday, June 3, 2014

    How to use the slibclean utility on AIX to clean up unused modules in memory

    On AIX, oracle will leave links to library usage in memory.

    This can prevent you from for example applying a patch using opatch.
    To clean up, use the slibclean utility, as demonstrated below:

    First, check for unused links by using the genld and genkld utilities:
    root@myserver> genld -l | grep /u01/oracle/product/11204
    root@myserver> genkld | grep /u01/oracle/product/11204
    
    Output will be similar to the following:
    900000002fb6000    141a3 /u01/oracle/product/11204/lib/libdbcfg11.so
    900000002f9e000    17eb6 /u01/oracle/product/11204/lib/libclsra11.so
    9000000036f6000   20ef63 /u01/oracle/product/11204/lib/libocrb11.so
    900000002efd000    a0f25 /u01/oracle/product/11204/lib/libocr11.so
    900000003055000   6a0dc1 /u01/oracle/product/11204/lib/libhasgen11.so
    900000002dcf000     cb95 /u01/oracle/product/11204/lib/libocrutl11.so
    900000002dcd000     1d7d /u01/oracle/product/11204/lib/libskgxn2.so
    900000010ba7000  2ddd1de /u01/oracle/product/11204/lib/libttsh11.so
    900000002a64000    1701f /u01/oracle/product/11204/lib/libons.so
    900000002a7c000   3508c9 /u01/oracle/product/11204/lib/libnnz11.so
    90000000cb99000  400d9b5 /u01/oracle/product/11204/lib/libolapapi11.so
    900000002173000     c05e /u01/oracle/product/11204/lib/libcorejava.so
    90000000211d000    5597d /u01/oracle/product/11204/lib/libxdb.so
    900000002007000     125f /u01/oracle/product/11204/lib/libodm11.so
    900000001f37000    cf4de /u01/oracle/product/11204/lib/libskgxp11.so
    
    Then clean up:
    root@myserver> slibclean
    
    Links are now gone:
    root@myserver> genkld | grep /u01/oracle/product/11204
    root@myserver>