Thursday, June 12, 2014

How to pin SGA in memory on an AIX server

On AIX, if you want to pin the SGA in memory, set the parameter LOCK_SGA to TRUE.

A prerequisite for locking the SGA is that you grant the proper capabilities to the OS user that owns and spawns the Oracle RDBMS processes.

Check the set capabilities as follows:
[root@myserver] lsuser -a capabilities oracle
oracle
Set capabilities:
[root@myserver] chuser capabilities=CAP_BYPASS_RAC_VMM,CAP_PROPAGATE oracle
[root@myserver] lsuser -a capabilities oracle
oracle capabilities=CAP_BYPASS_RAC_VMM,CAP_PROPAGATE
[root@myserver]
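
With the capabilities in place, set LOCK_SGA in the spfile and restart the instance for the change to take effect (LOCK_SGA is a static parameter):

SQL> alter system set lock_sga=true scope=spfile;
SQL> shutdown immediate
SQL> startup

You can then verify with an OS tool such as svmon that the shared memory segments are pinned.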


Wednesday, June 11, 2014

How to work around SEVERE: OUI-10197:Unable to create a new Oracle Home when using clone.pl

When cloning a database home onto a new server, the clone.pl script threw the following error:

perl clone.pl ORACLE_HOME=/u01/oracle/product/11204 ORACLE_HOME_NAME=11204 

SEVERE: OUI-10197:Unable to create a new Oracle Home at /u01/oracle/product/11204. Oracle Home already exists at this location. Select another location.
The server was freshly installed, with new volume groups, logical volumes, and file systems.

I then realized that since the software had been tarred and untarred into a new directory, there could potentially be an existing Oracle inventory referring to the home, and I would need to "detach" the ORACLE_HOME before cloning it. The tarball did indeed contain an inventory pointer file:

myserver>cd $ORACLE_HOME
myserver>cat oraInst.loc.org
inventory_loc=/u01/oracle/oraInventory
inst_group=dba
Since there was no such directory present, I created it:
myserver>mkdir /u01/oracle/oraInventory/ContentsXML/
myserver>cd /u01/oracle/oraInventory/ContentsXML/
myserver>vi inventory.xml
In the file inventory.xml, I added the following minimal information:
<!-- Copyright (c) 1999, 2013, Oracle and/or its affiliates.
All rights reserved. -->
<!-- Do not modify the contents of this file by hand. -->
<inventory>
  <version_info>
    <saved_with>11.2.0.4.0</saved_with>
    <minimum_ver>2.1.0.6.0</minimum_ver>
  </version_info>
  <home_list>
    <home idx="1" loc="/u01/oracle/product/11204" name="11204" type="O"/>
  </home_list>
  <compositehome_list>
  </compositehome_list>
</inventory>
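
The -invPtrLoc flag used in the steps below expects a pointer file at /u01/oracle/product/11204/oraInst.loc. Assuming the inventory location is unchanged, a copy of the backup file shown earlier will do:

myserver>cp $ORACLE_HOME/oraInst.loc.org $ORACLE_HOME/oraInst.loc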

I then added
 -invPtrLoc /u01/oracle/product/11204/oraInst.loc
to the file $ORACLE_HOME/clone/config/cs.properties.

Next, I detached the ORACLE_HOME:
myserver>cd $ORACLE_HOME/oui/bin
 myserver>./runInstaller -detachHome ORACLE_HOME=/u01/oracle/product/11204 -invPtrLoc /u01/oracle/product/11204/oraInst.loc
 Starting Oracle Universal Installer...

 Checking swap space: must be greater than 500 MB.   Actual 8192 MB    Passed
 The inventory pointer is located at /u01/oracle/product/11204/oraInst.loc
 The inventory is located at /u01/oracle/oraInventory
 'DetachHome' was successful.

Finally I was able to clone the ORACLE_HOME:

perl clone.pl ORACLE_HOME=/u01/oracle/product/11204 ORACLE_HOME_NAME=11204 ORACLE_BASE=/u01/oracle OSDBA_GROUP=dba

 ********************************************************************************

 Your platform requires the root user to perform certain pre-clone
 OS preparation.  The root user should run the shell script 'rootpre.sh' before
 you proceed with cloning.  rootpre.sh can be found at
 /u01/oracle/product/11204/clone directory.
 Answer 'y' if the root user has run 'rootpre.sh' script.

 ********************************************************************************

 Has 'rootpre.sh' been run by the root user? [y/n] (n)
 y
 ./runInstaller -clone -waitForCompletion  "ORACLE_HOME=/u01/oracle/product/11204" "ORACLE_HOME_NAME=11204" "ORACLE_BASE=/u01/oracle" "oracle_install_OSDBA=dba" -silent -noConfig -nowait -invPtrLoc /u01/oracle/product/11204/oraInst.loc
 Starting Oracle Universal Installer...

 Checking swap space: must be greater than 500 MB.   Actual 8192 MB    Passed
 Preparing to launch Oracle Universal Installer from /tmp/OraInstall2014-06-11_03-12-31PM. Please wait ...Oracle Universal Installer, Version 11.2.0.4.0 Production
 Copyright (C) 1999, 2013, Oracle. All rights reserved.

 You can find the log of this install session at:
  /u01/oracle/oraInventory/logs/cloneActions2014-06-11_03-12-31PM.log
 .................................................................................................... 100% Done.



 Installation in progress (Wednesday, June 11, 2014 3:12:45 PM CEST)
 ................................................................................                                                80% Done.
 Install successful

 Linking in progress (Wednesday, June 11, 2014 3:12:54 PM CEST)
 Link successful

 Setup in progress (Wednesday, June 11, 2014 3:15:32 PM CEST)
 Setup successful

 End of install phases.(Wednesday, June 11, 2014 3:15:59 PM CEST)
 WARNING:
 The following configuration scripts need to be executed as the "root" user.
 /u01/oracle/product/11204/root.sh
 To execute the configuration scripts:
     1. Open a terminal window
     2. Log in as "root"
     3. Run the scripts

 The cloning of 11204 was successful.
 Please check '/u01/oracle/oraInventory/logs/cloneActions2014-06-11_03-12-31PM.log' for more details.
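
As the warning in the output states, the final step is to run the root script as the root user:

[root@myserver] /u01/oracle/product/11204/root.sh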

How to deal with ORA-00020: maximum number of processes (%s) exceeded


I recently had a situation where access to the database was completely blocked because of the infamous error message
ORA-00020: maximum number of processes (%s) exceeded
Facts:
  • The database had processes set to 1000.
  • The Cloud Control agent was spawning hundreds of processes (obviously an error to troubleshoot as a separate action)
  • Connecting through sqlplus with OS authentication (sqlplus / as sysdba) failed for the same reason
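
    When the database is still reachable, V$RESOURCE_LIMIT is the quickest way to see how close you are to the processes limit (a preventive check; it will not help once ORA-00020 already blocks logins):

    SELECT RESOURCE_NAME, CURRENT_UTILIZATION, MAX_UTILIZATION, LIMIT_VALUE
    FROM V$RESOURCE_LIMIT
    WHERE RESOURCE_NAME IN ('processes', 'sessions');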

    At that time, the database had to become available to the users again ASAP.

    When I have encountered these situations in the past, I have had to kill all the operating system processes and restart the instance. A brute-force method that is not particularly pretty, but sometimes necessary:

    # kill every OS process whose command line contains the ORACLE_SID
    for a in $(ps -ef | grep $ORACLE_SID | grep -v grep | awk '{ print $2 }'); do
      kill -9 $a
    done
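
    If killing the background processes along with everything else feels too drastic, a narrower match takes out only the dedicated server processes, which on Unix show up in ps as oracle<SID> (LOCAL=NO) (a sketch; verify the pattern with ps on your system first):

    for a in $(ps -ef | grep "oracle${ORACLE_SID} (LOCAL=NO)" | grep -v grep | awk '{ print $2 }'); do
      kill -9 $a
    done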
    

    The brute-force approach normally does the job when you really have no other option.
    This time, however, after all the processes had been killed, Oracle still rejected sqlplus connections to the database:

    sqlplus /nolog
    
     SQL*Plus: Release 11.2.0.4.0 Production on Fri Jun 6 09:07:23 2014
    
     Copyright (c) 1982, 2013, Oracle.  All rights reserved.
      SQL> connect /
     ERROR:
     ORA-00020: maximum number of processes (1000) exceeded
    
    I then found a page at tech.e2sn.com that showed how to use sqlplus with a "preliminary connection".

    Simply by using
    
    sqlplus -prelim "/as sysdba"
    
    I was able to connect and shut down the database with the abort option:
    sqlplus -prelim "/ as sysdba"
    
     SQL*Plus: Release 11.2.0.4.0 Production on Fri Jun 6 09:09:15 2014
     Copyright (c) 1982, 2013, Oracle.  All rights reserved.
    
     SQL> shutdown abort
     ORACLE instance shut down.
     SQL> exit
     Disconnected from ORACLE
    
    After this, the database could once again be restarted:
    sqlplus / as sysdba
    
     SQL*Plus: Release 11.2.0.4.0 Production on Fri Jun 6 09:10:38 2014
     Copyright (c) 1982, 2013, Oracle.  All rights reserved.
     Connected to an idle instance.
    SQL> startup
     ORACLE instance started.
    
     Total System Global Area 2137886720 bytes
     Fixed Size                  2248080 bytes
     Variable Size            1258291824 bytes
     Database Buffers          855638016 bytes
     Redo Buffers               21708800 bytes
     Database mounted.
      Database opened.
    

    The article referred to above is worth reading, but in short, the -prelim option will not try to create private session structures in the SGA. This allows you to connect to perform debugging or shutdown operations.
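
    A preliminary connection can also be used for diagnostics before resorting to shutdown, for example with oradebug (a sketch; the hanganalyze level to use depends on the situation):

    sqlplus -prelim "/ as sysdba"
    SQL> oradebug setmypid
    SQL> oradebug hanganalyze 3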
    Great feature in adrci for searching in the alert log


    A potentially very time-saving feature in Oracle's adrci is the ability to search the alert log for specific text, as I had to do to find out when a specific parameter was set:

    adrci
    
     ADRCI: Release 11.2.0.4.0 - Production on Wed Jun 11 12:03:34 2014
    
     Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
    
     ADR base = "/u01/oracle/admin/proddb01/diagnostic"
     adrci> show homes
     ADR Homes:
     diag/rdbms/proddb01/proddb01
     adrci> show alert -p "message_text like '%event%'"
    
     ADR Home = /u01/oracle/admin/proddb01/diagnostic/diag/rdbms/proddb01/proddb01:
     *************************************************************************
     Output the results to file: /tmp/alert_3343326_1_proddb01_1.ado
     "/tmp/alert_3343326_1_proddb01_1.ado" 44 lines, 2408 characters
     2013-08-25 14:35:46.344000 +02:00
     One of the following events caused this:
     2014-01-02 10:56:34.311000 +01:00
     OS Pid: 8455160 executed alter system set events '10852 trace name context forever, level 16384'
     2014-01-02 10:56:47.555000 +01:00
     ALTER SYSTEM SET event='10852 trace name context forever, level 16384' SCOPE=SPFILE;
     2014-01-10 00:43:02.945000 +01:00
       event                    = "10852 trace name context forever, level 16384"
     2014-01-31 19:32:59.471000 +01:00
       event                    = "10852 trace name context forever, level 16384"
     2014-02-01 09:12:59.653000 +01:00
       event                    = "10852 trace name context forever, level 16384"
     CLOSE: Active sessions prevent database close operation
     2014-02-01 18:10:54.100000 +01:00
       event                    = "10852 trace name context forever, level 16384"
     2014-06-10 19:38:42.536000 +02:00
     ALTER SYSTEM SET event='10852 trace name context forever, level 16384 off' SCOPE=SPFILE;
     2014-06-10 19:43:12.770000 +02:00
       event                    = "10852 trace name context off"
    

    Without much effort, I was able to find that the parameter was set on 02.01.2014 and switched off on 10.06.2014.
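
    In the same vein, adrci can also tail the alert log directly, which beats hunting down the file path on an unfamiliar server:

    adrci> show alert -tail 50 -f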

    How to disable an event parameter in the database

    During an upgrade from 11.2.0.3 to 11.2.0.4, I had to remove an event parameter from the database.

    The syntax for this is:
    alter system set event="10852 trace name context off" scope=spfile;
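
    Note that event is a static parameter, so it can only be changed with scope=spfile, and the change takes effect at the next instance restart. Afterwards, you can verify that the event is gone with:

    SQL> show parameter event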
    

    Friday, June 6, 2014

    Why does the TIMESTAMP# column in the AUD$ table contain NULL values?

    According to Oracle Support Note 427296.1:

    "In database version 10gR1 and above, the TIMESTAMP# column is obsoleted in favor of the new NTIMESTAMP# column."

    So after exchanging the TIMESTAMP# column with the NTIMESTAMP# column, my script worked as intended, where it had previously shown NULL values:

    SELECT DBID "CURRENT DBID" FROM V$DATABASE;
    
     SET TIMING ON
     SET LINES 200
     COL "Earliest" format a30
     col "Latest" format a30
    
     PROMPT Counting the DBIDs and the number of audit entries each
     PROMPT Could take a while...
     SELECT DBID,COUNT(*),MIN(NTIMESTAMP#) "Earliest", MAX(NTIMESTAMP#) "Latest"
     FROM AUD$
     GROUP BY DBID;
    

    Output:
    Counting the DBIDs and the number of audit entries each
     Could take a while...
    
           DBID   COUNT(*) Earliest                       Latest
     ---------- ---------- ------------------------------ ------------------------------
    2367413790       1867 05.06.2014 14.01.30,193254     06.06.2014 06.17.08,485629
    

    The views built on top of the audit base tables, for example DBA_AUDIT_TRAIL (on AUD$) and DBA_FGA_AUDIT_TRAIL (on FGA_LOG$), expose the new NTIMESTAMP# column as their own TIMESTAMP column.
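
    You can verify this mapping yourself by inspecting the view definition (TEXT is a LONG column, so raise the display limit first):

    SQL> SET LONG 10000
    SQL> SELECT TEXT FROM DBA_VIEWS WHERE VIEW_NAME = 'DBA_AUDIT_TRAIL';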

    Thursday, June 5, 2014

    Multiple DBIDs in AUD$

    I wanted to test the DBMS_AUDIT_MGMT.CLEAN_AUDIT_TRAIL procedure with use_last_arch_timestamp set to TRUE, so that it would purge only the records older than one month past the oldest entry found.

    I found this timestamp by using the function ADD_MONTHS to add one month on top of the minimum value found in AUD$:
    SET SERVEROUTPUT ON
    DECLARE
      last_archtime DATE;
    BEGIN
      ---------------------------------------------------------
      -- Get the oldest timestamp from AUD$, then add one month.
      -- Use it as the last archive timestamp in procedure
      -- SET_LAST_ARCHIVE_TIMESTAMP.
      ---------------------------------------------------------
      SELECT ADD_MONTHS(MIN(TIMESTAMP), 1)
        INTO last_archtime
        FROM DBA_AUDIT_TRAIL;

      DBMS_OUTPUT.PUT_LINE('last_archtime: ' || last_archtime);

      DBMS_AUDIT_MGMT.SET_LAST_ARCHIVE_TIMESTAMP (
        AUDIT_TRAIL_TYPE  => DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD,
        LAST_ARCHIVE_TIME => last_archtime);
    END;
    /
    
    
    Put all that in a file called set_last_timestamp_std.sql.
    First, check the DBA_AUDIT_MGMT_LAST_ARCH_TS for the last archive timestamp:
    SQL> SELECT * FROM DBA_AUDIT_MGMT_LAST_ARCH_TS;
    
    No rows selected.
    
    
    Execute the script created above:
    sqlplus / as sysdba @set_last_timestamp_std.sql
    
     last_archtime: 07.02.2009
    
     PL/SQL-procedure executed.
    
    Check the DBA_AUDIT_MGMT_LAST_ARCH_TS again:
    AUDIT_TRAIL          RAC_INSTANCE LAST_ARCHIVE_TS
     -------------------- ------------ ----------------------------------------
     STANDARD AUDIT TRAIL            0 07.02.2009 15.01.50,000000 +00:00
    

    I was now ready to execute the manual cleanup. Before I did so, I wanted to get an idea of how many rows would be purged:
    SELECT COUNT(*)
    FROM DBA_AUDIT_TRAIL
    WHERE TIMESTAMP < (SELECT ADD_MONTHS(MIN(TIMESTAMP), 1)
                       FROM DBA_AUDIT_TRAIL);
    
      COUNT(*)
    ----------
       126405
    
    
    Compare with the total number of rows:
    SQL> SELECT COUNT(*)FROM DBA_AUDIT_TRAIL;
    
      COUNT(*)
    ----------
      33664540
    
    Sweet. 126405 records would be purged. I then executed:
    BEGIN
    DBMS_AUDIT_MGMT.CLEAN_AUDIT_TRAIL(
    audit_trail_type => DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD,
    use_last_arch_timestamp => TRUE);
    END;
    /
    
    The purge succeeded. But when I checked the number of rows again, the query still returned 126405 rows. What I found was that Oracle executes the following statement internally when using the DBMS_AUDIT_MGMT package, deleting in batches of 10000 rows:
    DELETE FROM SYS.AUD$ WHERE DBID = 2367413790 AND NTIMESTAMP# < to_timestamp('2009-02-07 15:01:50', 'YYYY-MM-DD HH24:MI:SS.FF') AND ROWNUM <= 10000;
    
    So I tried selecting the rows using the same predicate that was used during the purge:
    SQL> SELECT COUNT(*) FROM SYS.AUD$ WHERE DBID = 2367413790 AND NTIMESTAMP# < to_timestamp('2009-02-07 15:01:50', 'YYYY-MM-DD HH24:MI:SS.FF');
    
       COUNT(*)
     ----------
              0
    
    Then I checked again against DBA_AUDIT_TRAIL:
    SQL> SELECT COUNT(*) FROM DBA_AUDIT_TRAIL WHERE TIMESTAMP  < to_timestamp('2009-02-07 15:01:50', 'YYYY-MM-DD HH24:MI:SS.FF');
    
       COUNT(*)
     ----------
         126405
    
    So there were indeed records older than '2009-02-07 15:01:50'. Why were they not caught when querying the AUD$ table, only DBA_AUDIT_TRAIL? Of course! The AUD$ table also records the DBID. And since the database had recently been cloned, the audit trail spanned more than one incarnation of the database:
    SQL> select DBID,MIN(NTIMESTAMP#)
    2  FROM AUD$
    3  GROUP BY DBID;
    
          DBID MIN(NTIMESTAMP#)
    ---------- ----------------------------
    2367413790 19.05.2014 07.07.13,675010
     848951741 07.01.2009 13.01.50,802413
    
    So the minimum timestamp of 19.05.2014 for DBID 2367413790 was correct after all:
    SQL> SELECT MIN(TIMESTAMP) FROM DBA_AUDIT_TRAIL WHERE DBID=2367413790;
    
    MIN(TIMEST
    ----------
    19.05.2014
    
    In fact, the majority of the audit trail records are from a previous incarnation:
    SQL> select count(*) from aud$ where dbid = 848951741;
    
    COUNT(*)
    ----------
    33612411
    
    SQL> select DBID,MAX(NTIMESTAMP#)
    2  FROM AUD$
    3  GROUP BY DBID;
    
          DBID MAX(NTIMESTAMP#)
    ---------- --------------------------------
    2367413790 05.06.2014 08.42.59,749967
     848951741 15.05.2014 21.41.52,247344
    
    At this point, the size of AUD$ was 7481 MB:
    SQL> SELECT SUM(BYTES)/1024/1024 FROM DBA_SEGMENTS WHERE SEGMENT_NAME = 'AUD$';
    
    SUM(BYTES)/1024/1024
    --------------------
    7481
    
    Now the question was: since the procedure DBMS_AUDIT_MGMT.CLEAN_AUDIT_TRAIL with use_last_arch_timestamp set to TRUE only purges the rows in AUD$ that carry the DBID of the current database incarnation, will a "purge all" directive, indicated by use_last_arch_timestamp set to FALSE, be equally selective? Since this was a test system, I tried it out by putting the following statements into a script:
    SET LINES 200
    SET SERVEROUTPUT ON
    
    SELECT DBID,COUNT(*)
    FROM AUD$
    GROUP BY DBID;
    
    BEGIN
    DBMS_AUDIT_MGMT.CLEAN_AUDIT_TRAIL(
    audit_trail_type => DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD,
    use_last_arch_timestamp => FALSE);
    END;
    /
    
    SELECT DBID,COUNT(*)
    FROM AUD$
    GROUP BY DBID;
    
    Execute it:
    sqlplus / as sysdba @clean_all_audit_trail.sql
    
    Result:
          DBID   COUNT(*)
    ---------- ----------
    2367413790      52560
     848951741   33612411
    
    PL/SQL-procedure executed.
    
    No rows selected.
    
    
    SQL> SELECT SUM(BYTES)/1024/1024 FROM DBA_SEGMENTS WHERE SEGMENT_NAME = 'AUD$';
    
    SUM(BYTES)/1024/1024
    --------------------
    ,0625
    

    So a "purge all" directive will certainly wipe out all of your audit trail, regardless of the presence of multiple DBIDs. (The ,0625 above is 0.0625 MB with an NLS decimal comma, i.e. only the initial extent of AUD$ is left.)

    Purging "up until the last archive timestamp" will only select the audit entries carrying your current database incarnation's DBID.
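
    If you ever need to remove just the rows from an old incarnation, without touching the current ones, a direct delete against the base table is an option (a sketch using the old DBID from this example; test it first, and delete in batches the way Oracle itself does):

    -- repeat until 0 rows are deleted, committing between batches
    DELETE FROM SYS.AUD$ WHERE DBID = 848951741 AND ROWNUM <= 10000;
    COMMIT;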