Thursday, June 12, 2014

How to pin SGA in memory on an AIX server

On AIX, if you want to pin the SGA in memory, set the parameter LOCK_SGA to TRUE.

A prerequisite for locking the SGA is that you grant the proper capabilities to the OS user that owns and spawns the Oracle RDBMS processes.

Check the set capabilities as follows:
[root@myserver] lsuser -a capabilities oracle
oracle
Set capabilities:
[root@myserver] chuser capabilities=CAP_BYPASS_RAC_VMM,CAP_PROPAGATE oracle
[root@myserver] lsuser -a capabilities oracle
oracle capabilities=CAP_BYPASS_RAC_VMM,CAP_PROPAGATE
[root@myserver]
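
With the capabilities in place, set LOCK_SGA in the spfile and restart the instance for the change to take effect (LOCK_SGA is a static parameter):

SQL> alter system set lock_sga=true scope=spfile;
SQL> shutdown immediate
SQL> startup

You can then verify with an OS tool such as svmon that the shared memory segments are pinned.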


Wednesday, June 11, 2014

How to work around SEVERE: OUI-10197:Unable to create a new Oracle Home when using clone.pl

When cloning a database home onto a new server, the clone.pl script threw the following error:

perl clone.pl ORACLE_HOME=/u01/oracle/product/11204 ORACLE_HOME_NAME=11204 

SEVERE: OUI-10197:Unable to create a new Oracle Home at /u01/oracle/product/11204. Oracle Home already exists at this location. Select another location.
The server was freshly installed, with new volume groups, logical volumes, and file systems.

I then realized that since the software had been tarred and untarred into a new directory, there could potentially be an existing Oracle inventory referring to the home, and I would need to "detach" the ORACLE_HOME before cloning it. The tarball did indeed contain an inventory pointer file:

myserver>cd $ORACLE_HOME
myserver>cat oraInst.loc.org
inventory_loc=/u01/oracle/oraInventory
inst_group=dba
Since there was no such directory present, I created it:
myserver>mkdir /u01/oracle/oraInventory/ContentsXML/
myserver>cd /u01/oracle/oraInventory/ContentsXML/
myserver>vi inventory.xml
In the file inventory.xml, I added the following minimal information:
<!-- Copyright (c) 1999, 2013, Oracle and/or its affiliates.
All rights reserved. -->
<!-- Do not modify the contents of this file by hand. -->
<inventory>
  <version_info>
    <saved_with>11.2.0.4.0</saved_with>
    <minimum_ver>2.1.0.6.0</minimum_ver>
  </version_info>
  <home_list>
    <home idx="1" loc="/u01/oracle/product/11204" name="11204" type="O"/>
  </home_list>
  <compositehome_list>
  </compositehome_list>
</inventory>
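
The -invPtrLoc flag used in the steps below expects a pointer file at /u01/oracle/product/11204/oraInst.loc. Assuming the inventory location is unchanged, a copy of the backup file shown earlier will do:

myserver>cp $ORACLE_HOME/oraInst.loc.org $ORACLE_HOME/oraInst.loc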

I then added
 -invPtrLoc /u01/oracle/product/11204/oraInst.loc
to the file $ORACLE_HOME/clone/config/cs.properties.

Next, I detached the ORACLE_HOME:
myserver>cd $ORACLE_HOME/oui/bin
 myserver>./runInstaller -detachHome ORACLE_HOME=/u01/oracle/product/11204 -invPtrLoc /u01/oracle/product/11204/oraInst.loc
 Starting Oracle Universal Installer...

 Checking swap space: must be greater than 500 MB.   Actual 8192 MB    Passed
 The inventory pointer is located at /u01/oracle/product/11204/oraInst.loc
 The inventory is located at /u01/oracle/oraInventory
 'DetachHome' was successful.

Finally I was able to clone the ORACLE_HOME:

perl clone.pl ORACLE_HOME=/u01/oracle/product/11204 ORACLE_HOME_NAME=11204 ORACLE_BASE=/u01/oracle OSDBA_GROUP=dba

 ********************************************************************************

 Your platform requires the root user to perform certain pre-clone
 OS preparation.  The root user should run the shell script 'rootpre.sh' before
 you proceed with cloning.  rootpre.sh can be found at
 /u01/oracle/product/11204/clone directory.
 Answer 'y' if the root user has run 'rootpre.sh' script.

 ********************************************************************************

 Has 'rootpre.sh' been run by the root user? [y/n] (n)
 y
 ./runInstaller -clone -waitForCompletion  "ORACLE_HOME=/u01/oracle/product/11204" "ORACLE_HOME_NAME=11204" "ORACLE_BASE=/u01/oracle" "oracle_install_OSDBA=dba" -silent -noConfig -nowait -invPtrLoc /u01/oracle/product/11204/oraInst.loc
 Starting Oracle Universal Installer...

 Checking swap space: must be greater than 500 MB.   Actual 8192 MB    Passed
 Preparing to launch Oracle Universal Installer from /tmp/OraInstall2014-06-11_03-12-31PM. Please wait ...Oracle Universal Installer, Version 11.2.0.4.0 Production
 Copyright (C) 1999, 2013, Oracle. All rights reserved.

 You can find the log of this install session at:
  /u01/oracle/oraInventory/logs/cloneActions2014-06-11_03-12-31PM.log
 .................................................................................................... 100% Done.



 Installation in progress (Wednesday, June 11, 2014 3:12:45 PM CEST)
 ................................................................................                                                80% Done.
 Install successful

 Linking in progress (Wednesday, June 11, 2014 3:12:54 PM CEST)
 Link successful

 Setup in progress (Wednesday, June 11, 2014 3:15:32 PM CEST)
 Setup successful

 End of install phases.(Wednesday, June 11, 2014 3:15:59 PM CEST)
 WARNING:
 The following configuration scripts need to be executed as the "root" user.
 /u01/oracle/product/11204/root.sh
 To execute the configuration scripts:
     1. Open a terminal window
     2. Log in as "root"
     3. Run the scripts

 The cloning of 11204 was successful.
 Please check '/u01/oracle/oraInventory/logs/cloneActions2014-06-11_03-12-31PM.log' for more details.
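
As the warning in the output states, the final step is to run the root script as the root user:

[root@myserver] /u01/oracle/product/11204/root.sh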

How to deal with ORA-00020: maximum number of processes (%s) exceeded


I recently had a situation where access to the database was completely blocked because of the infamous error message
ORA-00020: maximum number of processes (%s) exceeded
Facts:
  • The database had processes set to 1000.
  • The Cloud Control agent was spawning hundreds of processes (obviously an error to troubleshoot as a separate action)
  • Connecting through sqlplus with OS authentication (sqlplus / as sysdba) failed for the same reason
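
    When the database is still reachable, V$RESOURCE_LIMIT is the quickest way to see how close you are to the processes limit (a preventive check; it will not help once ORA-00020 already blocks logins):

    SELECT RESOURCE_NAME, CURRENT_UTILIZATION, MAX_UTILIZATION, LIMIT_VALUE
    FROM V$RESOURCE_LIMIT
    WHERE RESOURCE_NAME IN ('processes', 'sessions');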

    At that time, the database had to become available to the users again ASAP.

    When I have encountered these situations in the past, I have had to kill all the operating system processes and restart the instance. A brute-force method that is not particularly pretty, but sometimes necessary:

    # kill every OS process whose command line contains the ORACLE_SID
    for a in $(ps -ef | grep $ORACLE_SID | grep -v grep | awk '{ print $2 }'); do
      kill -9 $a
    done
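
    If killing the background processes along with everything else feels too drastic, a narrower match takes out only the dedicated server processes, which on Unix show up in ps as oracle<SID> (LOCAL=NO) (a sketch; verify the pattern with ps on your system first):

    for a in $(ps -ef | grep "oracle${ORACLE_SID} (LOCAL=NO)" | grep -v grep | awk '{ print $2 }'); do
      kill -9 $a
    done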
    

    The brute-force approach normally does the job when you really have no other option.
    This time, however, after all the processes had been killed, Oracle still rejected sqlplus connections to the database:

    sqlplus /nolog
    
     SQL*Plus: Release 11.2.0.4.0 Production on Fri Jun 6 09:07:23 2014
    
     Copyright (c) 1982, 2013, Oracle.  All rights reserved.
      SQL> connect /
     ERROR:
     ORA-00020: maximum number of processes (1000) exceeded
    
    I then found a page at tech.e2sn.com that showed how to use sqlplus with a "preliminary connection".

    Simply by using
    
    sqlplus -prelim "/as sysdba"
    
    I was able to connect and shut down the database with the abort option:
    sqlplus -prelim "/ as sysdba"
    
     SQL*Plus: Release 11.2.0.4.0 Production on Fri Jun 6 09:09:15 2014
     Copyright (c) 1982, 2013, Oracle.  All rights reserved.
    
     SQL> shutdown abort
     ORACLE instance shut down.
     SQL> exit
     Disconnected from ORACLE
    
    After this, the database could once again be restarted:
    sqlplus / as sysdba
    
     SQL*Plus: Release 11.2.0.4.0 Production on Fri Jun 6 09:10:38 2014
     Copyright (c) 1982, 2013, Oracle.  All rights reserved.
     Connected to an idle instance.
    SQL> startup
     ORACLE instance started.
    
     Total System Global Area 2137886720 bytes
     Fixed Size                  2248080 bytes
     Variable Size            1258291824 bytes
     Database Buffers          855638016 bytes
     Redo Buffers               21708800 bytes
     Database mounted.
      Database opened.
    

    The article referred to above is worth reading, but in short, the -prelim option will not try to create private session structures in the SGA. This allows you to connect to perform debugging or shutdown operations.
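
    A preliminary connection can also be used for diagnostics before resorting to shutdown, for example with oradebug (a sketch; the hanganalyze level to use depends on the situation):

    sqlplus -prelim "/ as sysdba"
    SQL> oradebug setmypid
    SQL> oradebug hanganalyze 3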
    Great feature in adrci for searching in the alert log


    A potentially very time-saving feature in Oracle's adrci is the ability to search the alert log for specific text, as I had to do to find out when a specific parameter was set:

    adrci
    
     ADRCI: Release 11.2.0.4.0 - Production on Wed Jun 11 12:03:34 2014
    
     Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
    
     ADR base = "/u01/oracle/admin/proddb01/diagnostic"
     adrci> show homes
     ADR Homes:
     diag/rdbms/proddb01/proddb01
     adrci> show alert -p "message_text like '%event%'"
    
     ADR Home = /u01/oracle/admin/proddb01/diagnostic/diag/rdbms/proddb01/proddb01:
     *************************************************************************
     Output the results to file: /tmp/alert_3343326_1_proddb01_1.ado
     "/tmp/alert_3343326_1_proddb01_1.ado" 44 lines, 2408 characters
     2013-08-25 14:35:46.344000 +02:00
     One of the following events caused this:
     2014-01-02 10:56:34.311000 +01:00
     OS Pid: 8455160 executed alter system set events '10852 trace name context forever, level 16384'
     2014-01-02 10:56:47.555000 +01:00
     ALTER SYSTEM SET event='10852 trace name context forever, level 16384' SCOPE=SPFILE;
     2014-01-10 00:43:02.945000 +01:00
       event                    = "10852 trace name context forever, level 16384"
     2014-01-31 19:32:59.471000 +01:00
       event                    = "10852 trace name context forever, level 16384"
     2014-02-01 09:12:59.653000 +01:00
       event                    = "10852 trace name context forever, level 16384"
     CLOSE: Active sessions prevent database close operation
     2014-02-01 18:10:54.100000 +01:00
       event                    = "10852 trace name context forever, level 16384"
     2014-06-10 19:38:42.536000 +02:00
     ALTER SYSTEM SET event='10852 trace name context forever, level 16384 off' SCOPE=SPFILE;
     2014-06-10 19:43:12.770000 +02:00
       event                    = "10852 trace name context off"
    

    Without much effort, I was able to find that the parameter was set on 02.01.2014 and switched off on 10.06.2014.
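
    In the same vein, adrci can also tail the alert log directly, which beats hunting down the file path on an unfamiliar server:

    adrci> show alert -tail 50 -f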

    How to disable an event parameter in the database

    During an upgrade from 11.2.0.3 to 11.2.0.4, I had to remove an event parameter from the database.

    The syntax for this is:
    alter system set event="10852 trace name context off" scope=spfile;
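
    Note that event is a static parameter, so it can only be changed with scope=spfile, and the change takes effect at the next instance restart. Afterwards, you can verify that the event is gone with:

    SQL> show parameter event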
    

    Friday, June 6, 2014

    Why does the TIMESTAMP# column in the AUD$ table contain NULL values?

    According to Oracle Support Note 427296.1:

    "In database version 10gR1 and above, the TIMESTAMP# column is obsoleted in favor of the new NTIMESTAMP# column."

    So after exchanging the TIMESTAMP# column with the NTIMESTAMP# column, my script worked as intended, where it had previously shown NULL values:

    SELECT DBID "CURRENT DBID" FROM V$DATABASE;
    
     SET TIMING ON
     SET LINES 200
     COL "Earliest" format a30
     col "Latest" format a30
    
     PROMPT Counting the DBIDs and the number of audit entries each
     PROMPT Could take a while...
     SELECT DBID,COUNT(*),MIN(NTIMESTAMP#) "Earliest", MAX(NTIMESTAMP#) "Latest"
     FROM AUD$
     GROUP BY DBID;
    

    Output:
    Counting the DBIDs and the number of audit entries each
     Could take a while...
    
           DBID   COUNT(*) Earliest                       Latest
     ---------- ---------- ------------------------------ ------------------------------
    2367413790       1867 05.06.2014 14.01.30,193254     06.06.2014 06.17.08,485629
    

    The views built on top of the audit base tables, for example DBA_AUDIT_TRAIL (on AUD$) and DBA_FGA_AUDIT_TRAIL (on FGA_LOG$), expose the new NTIMESTAMP# column as their own TIMESTAMP column.
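
    You can verify this mapping yourself by inspecting the view definition (TEXT is a LONG column, so raise the display limit first):

    SQL> SET LONG 10000
    SQL> SELECT TEXT FROM DBA_VIEWS WHERE VIEW_NAME = 'DBA_AUDIT_TRAIL';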

    Thursday, June 5, 2014

    Multiple DBIDs in AUD$

    I wanted to test the DBMS_AUDIT_MGMT.CLEAN_AUDIT_TRAIL procedure with use_last_arch_timestamp set to TRUE, so that it would purge only the records older than one month past the oldest entry found.

    I found this timestamp by using the function ADD_MONTHS to add one month on top of the minimum value found in AUD$:
    SET SERVEROUTPUT ON
    DECLARE
      last_archtime DATE;
    BEGIN
      ---------------------------------------------------------
      -- Get the oldest timestamp from AUD$, then add one month.
      -- Use it as the last archive timestamp in procedure
      -- SET_LAST_ARCHIVE_TIMESTAMP.
      ---------------------------------------------------------
      SELECT ADD_MONTHS(MIN(TIMESTAMP), 1)
        INTO last_archtime
        FROM DBA_AUDIT_TRAIL;

      DBMS_OUTPUT.PUT_LINE('last_archtime: ' || last_archtime);

      DBMS_AUDIT_MGMT.SET_LAST_ARCHIVE_TIMESTAMP (
        AUDIT_TRAIL_TYPE  => DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD,
        LAST_ARCHIVE_TIME => last_archtime);
    END;
    /
    
    
    Put all that in a file called set_last_timestamp_std.sql.
    First, check the DBA_AUDIT_MGMT_LAST_ARCH_TS for the last archive timestamp:
    SQL> SELECT * FROM DBA_AUDIT_MGMT_LAST_ARCH_TS;
    
    No rows selected.
    
    
    Execute the script created above:
    sqlplus / as sysdba @set_last_timestamp_std.sql
    
     last_archtime: 07.02.2009
    
     PL/SQL-procedure executed.
    
    Check the DBA_AUDIT_MGMT_LAST_ARCH_TS again:
    AUDIT_TRAIL          RAC_INSTANCE LAST_ARCHIVE_TS
     -------------------- ------------ ----------------------------------------
     STANDARD AUDIT TRAIL            0 07.02.2009 15.01.50,000000 +00:00
    

    I was now ready to execute the manual cleanup. Before I did so, I wanted to get an idea of how many rows would be purged:
    SELECT COUNT(*)
    FROM DBA_AUDIT_TRAIL
    WHERE TIMESTAMP < (SELECT ADD_MONTHS(MIN(TIMESTAMP), 1)
                       FROM DBA_AUDIT_TRAIL);
    
      COUNT(*)
    ----------
       126405
    
    
    Compare with the total number of rows:
    SQL> SELECT COUNT(*)FROM DBA_AUDIT_TRAIL;
    
      COUNT(*)
    ----------
      33664540
    
    Sweet. 126405 records would be purged. I then executed:
    BEGIN
    DBMS_AUDIT_MGMT.CLEAN_AUDIT_TRAIL(
    audit_trail_type => DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD,
    use_last_arch_timestamp => TRUE);
    END;
    /
    
    The purge succeeded. But when I checked the number of rows again, the query still returned 126405 rows. What I found was that Oracle executes the following statement internally when using the DBMS_AUDIT_MGMT package, deleting in batches of 10000 rows:
    DELETE FROM SYS.AUD$ WHERE DBID = 2367413790 AND NTIMESTAMP# < to_timestamp('2009-02-07 15:01:50', 'YYYY-MM-DD HH24:MI:SS.FF') AND ROWNUM <= 10000;
    
    So I tried selecting the rows using the same predicate that was used during the purge:
    SQL> SELECT COUNT(*) FROM SYS.AUD$ WHERE DBID = 2367413790 AND NTIMESTAMP# < to_timestamp('2009-02-07 15:01:50', 'YYYY-MM-DD HH24:MI:SS.FF');
    
       COUNT(*)
     ----------
              0
    
    Then I checked again against DBA_AUDIT_TRAIL:
    SQL> SELECT COUNT(*) FROM DBA_AUDIT_TRAIL WHERE TIMESTAMP  < to_timestamp('2009-02-07 15:01:50', 'YYYY-MM-DD HH24:MI:SS.FF');
    
       COUNT(*)
     ----------
         126405
    
    So there were indeed records older than '2009-02-07 15:01:50'. Why were they not caught when querying the AUD$ table, only DBA_AUDIT_TRAIL? Of course! The AUD$ table also records the DBID. And since the database had recently been cloned, the audit trail spanned more than one incarnation of the database:
    SQL> select DBID,MIN(NTIMESTAMP#)
    2  FROM AUD$
    3  GROUP BY DBID;
    
          DBID MIN(NTIMESTAMP#)
    ---------- ----------------------------
    2367413790 19.05.2014 07.07.13,675010
     848951741 07.01.2009 13.01.50,802413
    
    So the minimum timestamp of 19.05.2014 for DBID 2367413790 was correct after all:
    SQL> SELECT MIN(TIMESTAMP) FROM DBA_AUDIT_TRAIL WHERE DBID=2367413790;
    
    MIN(TIMEST
    ----------
    19.05.2014
    
    In fact, the majority of the audit trail records are from a previous incarnation:
    SQL> select count(*) from aud$ where dbid = 848951741;
    
    COUNT(*)
    ----------
    33612411
    
    SQL> select DBID,MAX(NTIMESTAMP#)
    2  FROM AUD$
    3  GROUP BY DBID;
    
          DBID MAX(NTIMESTAMP#)
    ---------- --------------------------------
    2367413790 05.06.2014 08.42.59,749967
     848951741 15.05.2014 21.41.52,247344
    
    At this point, the size of AUD$ was 7481 MB:
    SQL> SELECT SUM(BYTES)/1024/1024 FROM DBA_SEGMENTS WHERE SEGMENT_NAME = 'AUD$';
    
    SUM(BYTES)/1024/1024
    --------------------
    7481
    
    Now the question was: since the procedure DBMS_AUDIT_MGMT.CLEAN_AUDIT_TRAIL with use_last_arch_timestamp set to TRUE only purges the rows in AUD$ that carry the DBID of the current database incarnation, will a "purge all" directive, indicated by use_last_arch_timestamp set to FALSE, be equally selective? Since this was a test system, I tried it out by putting the following statements into a script:
    SET LINES 200
    SET SERVEROUTPUT ON
    
    SELECT DBID,COUNT(*)
    FROM AUD$
    GROUP BY DBID;
    
    BEGIN
    DBMS_AUDIT_MGMT.CLEAN_AUDIT_TRAIL(
    audit_trail_type => DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD,
    use_last_arch_timestamp => FALSE);
    END;
    /
    
    SELECT DBID,COUNT(*)
    FROM AUD$
    GROUP BY DBID;
    
    Execute it:
    sqlplus / as sysdba @clean_all_audit_trail.sql
    
    Result:
          DBID   COUNT(*)
    ---------- ----------
    2367413790      52560
     848951741   33612411
    
    PL/SQL-procedure executed.
    
    No rows selected.
    
    
    SQL> SELECT SUM(BYTES)/1024/1024 FROM DBA_SEGMENTS WHERE SEGMENT_NAME = 'AUD$';
    
    SUM(BYTES)/1024/1024
    --------------------
    ,0625
    

    So a "purge all" directive will certainly wipe out all of your audit trail, regardless of the presence of multiple DBIDs. (The ,0625 above is 0.0625 MB with an NLS decimal comma, i.e. only the initial extent of AUD$ is left.)

    Purging "up until the last archive timestamp" will only select the audit entries carrying your current database incarnation's DBID.
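
    If you ever need to remove just the rows from an old incarnation, without touching the current ones, a direct delete against the base table is an option (a sketch using the old DBID from this example; test it first, and delete in batches the way Oracle itself does):

    -- repeat until 0 rows are deleted, committing between batches
    DELETE FROM SYS.AUD$ WHERE DBID = 848951741 AND ROWNUM <= 10000;
    COMMIT;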